ZADA
ZAYAZ Docs Architecture
1. Purpose
ZAYAZ Docs Architecture defines how documentation, supporting artifacts, search, and future registry-scale source files are organized across the platform.
The objective is to:
- preserve the strengths of the current system
- prevent documentation infrastructure from becoming a storage bottleneck
- enable future scalability for ESG data, registries, and computation layers
This architecture is normative for all documentation and documentation-adjacent systems.
2. Core Principles
2.1. Docusaurus is the Documentation Frontend
Docusaurus is responsible for:
- rendering MDX documents
- composing documentation pages
- rendering custom components
- integrating snippets and metadata
- providing navigation and documentation UX
Docusaurus MUST NOT be used as a storage layer for:
- large Excel libraries
- large structured registries
- operational datasets
- backend policy state
2.2. GitHub is the Authoring Layer
GitHub is the authoritative source for:
- MDX documentation
- documentation code
- custom Docusaurus components
- snippets and associated files
- schemas and lightweight examples
- documentation scripts
- Jira integration
All documentation MUST be version-controlled and reviewed via GitHub.
2.3. AWS is the Heavy Artifact Layer
AWS is used for:
- large Excel libraries
- bulk source datasets
- future structured registries
- resolver-backed datasets
- large binary artifacts
AWS is NOT a replacement for documentation authoring.
It is the correct storage layer for artifacts that are:
- large
- numerous
- operational
- not human-authored documentation
2.4. Cloudflare is the Delivery Layer
Cloudflare is responsible for:
- hosting the built documentation site
- edge delivery
- caching and performance
2.5. Fly.io is the Search Runtime Layer
Fly.io hosts:
- search indexing runtime
- search APIs
- ingestion outputs
This placement may evolve later but remains valid for v1.
3. Architectural Layers
3.1. Authoring Layer (GitHub)
Contains:
- MDX files
- snippets
- associated files
- schemas
- documentation scripts
- Jira integration
3.2. Presentation Layer (Docusaurus)
Contains:
- rendered documentation UI
- navigation
- custom components
- page composition
3.3. Delivery Layer (Cloudflare)
Contains:
- deployed documentation
- static assets
- edge distribution
3.4. Search Layer (Fly.io)
Contains:
- search index
- search API
- ingestion pipelines
3.5. Heavy Artifact Layer (AWS)
Contains:
- Excel source libraries
- large datasets
- future registries
- future resolver-backed services
4. Storage Decision Rules
4.1. Store in GitHub if:
- the asset is human-authored
- it is part of documentation
- it is reviewed like code
- it is tightly coupled to a page
- it is relatively small
Examples:
- .mdx files
- snippets
- schemas
- code examples
- associated files
4.2. Store in AWS if:
- the asset is large
- it is a source library
- it is not documentation text
- it is not required at build/runtime for docs
- it will feed future services
Examples:
- Excel workbooks
- bulk datasets
- large registries
- generated exports
5. Excel File Policy
5.1. Current State
Excel files are currently:
- stored in docusaurus/static/excel
- referenced via URL in documentation
- not actively processed at runtime
Example:
```json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry (schemas, implementations, dependencies).",
  "url": "/excel/compute_method_registry.xlsx"
}
```
5.2. Future State
Excel files SHALL be treated as external artifacts.
They SHOULD:
- be stored in AWS
- be accessed via stable URLs
- not reside inside the docs repository
5.3. Recommended URL Model
```json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry",
  "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
}
```
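Moving from the current model (5.1) to the recommended one comes down to rewriting each entry's `url` from a site-relative path to an absolute URL on the asset host. The following is a minimal Python sketch, assuming the AWS-hosted files keep the same `/excel/` path prefix under `https://assets.zayaz.io` and that all other metadata fields stay unchanged; `migrate_excel_url` and `ASSET_BASE_URL` are hypothetical names.

```python
import json

# Assumption: asset host taken from the recommended URL model; the /excel/
# prefix is kept unchanged after the files move to AWS.
ASSET_BASE_URL = "https://assets.zayaz.io"


def migrate_excel_url(entry: dict, base_url: str = ASSET_BASE_URL) -> dict:
    """Return a copy of a metadata entry whose site-relative /excel/ URL
    is rewritten to an absolute URL on the external asset host."""
    migrated = dict(entry)  # never mutate the caller's entry
    url = entry.get("url", "")
    if url.startswith("/excel/"):
        migrated["url"] = base_url.rstrip("/") + url
    return migrated  # absolute URLs are returned untouched


current = json.loads("""{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry",
  "url": "/excel/compute_method_registry.xlsx"
}""")
print(migrate_excel_url(current)["url"])
# prints https://assets.zayaz.io/excel/compute_method_registry.xlsx
```

Because only the `url` field changes, the `id` and `description` that the documentation displays stay stable across the migration.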
5.4. Documentation Behavior
Docusaurus MUST:
- display metadata about Excel tables
- link to the source files
- NOT load or process Excel files directly
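In practice, the documentation only needs the metadata fields to produce a link; the workbook is never fetched. The Python sketch below illustrates that logic only and is not the actual Docusaurus component, which would live in MDX/React. `render_excel_reference` and the output layout are hypothetical.

```python
from urllib.parse import urlparse


def render_excel_reference(entry: dict) -> str:
    """Render an Excel metadata entry as a Markdown list item.

    Only metadata is used: the workbook is never downloaded, opened,
    or parsed here, in line with the documentation behavior rules."""
    path = urlparse(entry["url"]).path
    filename = path.rsplit("/", 1)[-1]  # last path segment, e.g. "x.xlsx"
    return (f"- [{filename}]({entry['url']}): "
            f"{entry['description']} (id: `{entry['id']}`)")


print(render_excel_reference({
    "id": "compute_method_registry",
    "description": "ZAYAZ compute methods registry",
    "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx",
}))
```

The same function works for both the current relative URLs and the future absolute ones, so the rendering side is unaffected by the storage move.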
6. Excel Ingestion Architecture
The system contains an Excel ingestion pipeline:
ingestExcel()
This pipeline:
- scans .xlsx files
- extracts:
  - workbook name
  - sheet name
  - column headers
  - sample rows
- generates searchable documents
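The extraction steps above can be sketched as a pure function over already-parsed sheets. This is not the real ingestExcel() implementation. It assumes that .xlsx parsing happens upstream (for example with a spreadsheet library), that the first row of each sheet holds the column headers, and that three sample rows are enough. `SearchDocument`, `build_search_documents`, and `max_samples` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SearchDocument:
    workbook: str
    sheet: str
    headers: list[str]
    sample_rows: list[list[str]]
    text: str  # flattened text handed to the search index


def build_search_documents(workbook: str,
                           sheets: dict[str, list[list[str]]],
                           max_samples: int = 3) -> list[SearchDocument]:
    """Turn parsed sheets (first row = headers) into one searchable
    document per sheet, keeping only a few sample rows."""
    docs = []
    for sheet_name, rows in sheets.items():
        if not rows:
            continue  # skip empty sheets
        headers, body = rows[0], rows[1:max_samples + 1]
        text = " ".join([workbook, sheet_name, *headers,
                         *(cell for row in body for cell in row)])
        docs.append(SearchDocument(workbook, sheet_name, headers, body, text))
    return docs


# Hypothetical sheet contents, for illustration only.
docs = build_search_documents(
    "compute_method_registry",
    {"methods": [["method_id", "schema"], ["M1", "s1"], ["M2", "s2"]]},
)
print(docs[0].text)
```

Keeping the function independent of where the bytes come from is what allows the source to move to AWS without touching the document-building logic.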
6.1. Architectural Interpretation
Excel ingestion belongs to:
Search / indexing layer
NOT to:
Docusaurus runtime
6.2. Future Direction
Excel ingestion SHOULD evolve to:
- read from AWS storage
- optionally use local cache
- feed search/index systems
Docusaurus remains unaffected by this change.
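Reading from AWS with an optional local cache can be as simple as keying cached copies by URL. The following is a minimal sketch under stated assumptions: the download function is injected (it could wrap an HTTP client or the AWS SDK), and cached files are keyed by the SHA-256 of the URL. `cached_fetch` and its parameters are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, cache_dir: Path,
                 fetch: Callable[[str], bytes]) -> bytes:
    """Return the bytes at `url`, reusing a local cached copy when present.

    `fetch` performs the actual download and is only called on a cache
    miss; the result is then written to cache_dir for later runs."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cached = cache_dir / key
    if cached.exists():
        return cached.read_bytes()  # cache hit: no network access
    data = fetch(url)
    cached.write_bytes(data)
    return data
```

This sketch never invalidates entries, so a file replaced in place at the same URL would not be picked up. Versioned file names, or comparing an ETag before reusing the cache, are two common ways to handle that.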
7. Associated Files System
The Associated Files system remains a core capability.
It SHOULD contain:
- snippets
- examples
- small schemas
- page-specific supporting files
It MUST NOT be used for:
- bulk Excel libraries
- large datasets
- archive-scale artifacts
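A review or CI step could enforce these limits on associated files. The policy says only "small", so both byte limits below are hypothetical placeholders. The lower limit for workbooks reflects the rule that only very small demo Excel files may live next to a page. `associated_file_violations` is a hypothetical name.

```python
from pathlib import PurePosixPath

MAX_ASSOCIATED_FILE_BYTES = 1_000_000  # hypothetical cap for "small"
MAX_DEMO_EXCEL_BYTES = 100_000         # hypothetical cap for demo workbooks
EXCEL_EXTENSIONS = {".xlsx", ".xls", ".xlsm"}


def associated_file_violations(files: dict[str, int]) -> list[str]:
    """Given {path: size_in_bytes}, list the files that break the
    Associated Files policy, sorted by path."""
    problems = []
    for path, size in sorted(files.items()):
        is_excel = PurePosixPath(path).suffix.lower() in EXCEL_EXTENSIONS
        if is_excel and size > MAX_DEMO_EXCEL_BYTES:
            problems.append(f"{path}: Excel workbook too large for a demo; store in AWS")
        elif size > MAX_ASSOCIATED_FILE_BYTES:
            problems.append(f"{path}: {size} bytes exceeds the associated-file limit; store in AWS")
    return problems
```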
8. Custom Docusaurus Capabilities
The following systems remain valid and SHOULD be preserved:
- Jira integration
- auto-generated developer instructions
- snippet engine
- custom visualization components
- doc tooling
Docs Architecture extends these capabilities; it does not replace them.
9. Non-Negotiable Invariants
- MDX documentation remains in GitHub
- Docusaurus remains the documentation frontend
- GitHub remains the authoring system
- AWS stores heavy artifacts
- Excel files are external artifacts
- Search ingestion is decoupled from docs rendering
- Documentation MUST remain lightweight and fast
10. Phase Plan
Phase 1 — Stabilization
- keep current structure
- remove unnecessary large artifacts from repo
- clarify storage boundaries
Phase 2 — Excel Migration
- move Excel libraries to AWS
- update URLs in docs
- keep metadata in GitHub
Phase 3 — Registry Externalization
- move large registries to AWS-backed systems
- introduce resolver APIs
- keep documentation summaries in MDX
11. Final Principle
GitHub is for authored documentation.
Docusaurus is for presenting documentation.
AWS is for heavy artifacts and future registries.
This separation ensures:
- scalability
- maintainability
- performance
- long-term architectural clarity
APPENDIX A - File Placement Decision Tree
A.1. Purpose
This appendix defines the operational decision logic for determining where files MUST be stored within the ZAYAZ platform.
It is intended for:
- developers
- documentation authors
- data architects
- platform engineers
This appendix is normative and MUST be followed to ensure scalability and consistency.
A.2. Core Decision Rule
Every file MUST be evaluated using the following decision flow:
A.3. Decision Tree
Step 1 — Is this human-authored documentation?
- Written or maintained by humans
- Reviewed via GitHub PRs
- Intended to be read in the docs UI
YES → Store in GitHub (MDX / associated files)
NO → Continue
Step 2 — Is this tightly coupled to a specific documentation page?
- Used as a snippet, example, or schema
- Directly embedded or referenced in MDX
- Small and frequently edited
YES → Store in GitHub (associated files)
NO → Continue
Step 3 — Is this a large or growing dataset?
- Excel files
- bulk JSON
- registries
- generated exports
- datasets expected to scale
YES → Store in AWS
NO → Continue
Step 4 — Is this used by runtime systems (not just docs)?
- used by APIs
- used by search indexing
- used by computation engines
- used by GRPE / ZARA / Computation Hub
YES → Store in AWS (or runtime storage layer)
NO → Continue
Step 5 — Is this required at Docusaurus build time?
- needed to generate pages
- needed for static rendering
- lightweight enough for repo
YES → Store in GitHub
NO → Store in AWS
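Because the steps are evaluated in order and the first YES decides, the tree reduces to a short chain of checks. The following is a minimal Python sketch; `FileFacts` and `place_file` are hypothetical names, and answering each question remains a human judgment that the code does not attempt.

```python
from dataclasses import dataclass


@dataclass
class FileFacts:
    """Answers to the decision-tree questions for one file."""
    human_authored_doc: bool     # Step 1
    page_coupled: bool           # Step 2
    large_or_growing: bool       # Step 3
    used_by_runtime: bool        # Step 4
    needed_at_build_time: bool   # Step 5


def place_file(f: FileFacts) -> str:
    """Walk Steps 1-5 in order; the first YES decides the location."""
    if f.human_authored_doc:
        return "GitHub (MDX / associated files)"
    if f.page_coupled:
        return "GitHub (associated files)"
    if f.large_or_growing:
        return "AWS"
    if f.used_by_runtime:
        return "AWS (or runtime storage layer)"
    if f.needed_at_build_time:
        return "GitHub"
    return "AWS"  # Step 5 NO branch


# An Excel registry that feeds search indexing and is not tied to one page:
print(place_file(FileFacts(False, False, True, True, False)))  # prints AWS
```

The ordering matters: a small demo workbook embedded in one page stops at Step 2 and stays in GitHub, which matches the Excel-specific exception in A.5.2.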
A.4. Decision Table (Quick Reference)
| File Type | Location | Reason |
|---|---|---|
| MDX documentation | GitHub | Authoring + version control |
| Snippets / examples | GitHub | Page-coupled |
| Small schemas | GitHub | Lightweight + reusable |
| Custom components | GitHub | Part of frontend |
| Excel workbooks | AWS | Large + not docs-native |
| Large JSON registries | AWS | Scales beyond repo |
| Generated datasets | AWS | Not human-authored |
| Search ingestion sources | AWS | Runtime layer |
| Resolver / policy data | AWS | System-critical |
| Static assets (small) | GitHub | Docs usage |
| Static assets (large) | AWS | Performance + scale |
A.5. Excel-Specific Decision Rules
A.5.1 Always store in AWS if:
- file is an Excel workbook (.xlsx)
- file represents a registry or table
- file exceeds trivial size
- file is part of a growing library
A.5.2 Only store in GitHub if ALL are true:
- file is very small
- file is used as a demo/example
- file is tightly coupled to a page
- file will not scale
A.5.3 Recommended Pattern
DO:
```json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry",
  "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
}
```
DO NOT:
- store hundreds of Excel files in docusaurus/static/excel
- treat Excel as native docs content
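The DO NOT rule can be checked mechanically before merge. This is a minimal sketch, assuming the repository layout docusaurus/static/excel from 5.1, and assuming that documented exceptions (A.7) are supplied as a set of repository-relative paths. `excel_files_under` and `check_static_excel` are hypothetical names.

```python
import os
from pathlib import Path

EXCEL_EXTENSIONS = (".xlsx", ".xls", ".xlsm")


def excel_files_under(root: Path) -> list[Path]:
    """Return every Excel workbook found below `root`, sorted by path."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(EXCEL_EXTENSIONS):
                found.append(Path(dirpath) / name)
    return sorted(found)


def check_static_excel(repo_root: Path,
                       allowed: frozenset[str] = frozenset()) -> list[str]:
    """List workbooks under docusaurus/static/excel that are not in the
    documented exception list (POSIX paths relative to repo_root)."""
    static_excel = repo_root / "docusaurus" / "static" / "excel"
    if not static_excel.is_dir():
        return []  # nothing to check
    relative = (p.relative_to(repo_root).as_posix()
                for p in excel_files_under(static_excel))
    return [rel for rel in relative if rel not in allowed]
```

A non-empty result would fail the check. Requiring an explicit `allowed` entry keeps every exception visible in review, as A.7 asks.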
A.6. Anti-Patterns (Must Avoid)
A.6.1 Repository Bloat
- storing large datasets in GitHub
- committing generated artifacts repeatedly
- embedding large files in docs repo
A.6.2 Docusaurus as Data Lake
- using /static as bulk storage
- treating docs as a file distribution system
- loading large datasets into frontend
A.6.3 Mixed Responsibilities
- combining documentation and runtime data
- coupling search ingestion to docs runtime
- forcing Docusaurus to process heavy data
A.7. Allowed Exceptions
Exceptions MAY be made only if:
- file is required for build-time rendering
- file is small and stable
- file improves developer experience significantly
All exceptions SHOULD be documented.
A.8. Integration with ZAYAZ Systems
This decision model aligns with:
- GRPE → policy-driven routing, not docs storage
- ZARA → computation and validation logic
- Computation Hub → heavy data processing
- Search Indexer → ingest external sources (e.g. Excel in AWS)
Documentation remains decoupled from runtime systems.
A.9. Governance Rule
Before adding any new file:
If this file grows 100×, will GitHub still be the right place?
- If NO → store in AWS
- If YES → store in GitHub
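For the common case where growth means file size, the 100× question can be reduced to a worked check. This is a size-only simplification: the rule also covers growth in file count, which this sketch ignores. The repository budget below is a hypothetical placeholder because the policy does not set a number.

```python
# Hypothetical budget: the largest single file the docs repository should hold.
REPO_FILE_BUDGET_BYTES = 5_000_000
GROWTH_FACTOR = 100  # from the governance rule


def governance_location(current_size_bytes: int,
                        budget_bytes: int = REPO_FILE_BUDGET_BYTES) -> str:
    """Would GitHub still be the right place after 100x growth?"""
    projected = current_size_bytes * GROWTH_FACTOR
    return "GitHub" if projected <= budget_bytes else "AWS"


print(governance_location(20_000))   # 2 MB projected  -> GitHub
print(governance_location(200_000))  # 20 MB projected -> AWS
```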
A.10. Final Principle
Documentation is for humans.
Data is for systems.
ZAYAZ Docs MUST remain:
- fast
- readable
- maintainable
- scalable
A.11. One-Line Rule (for daily use)
If it explains → GitHub
If it scales → AWS