ZADA

ZAYAZ Docs Architecture

1. Purpose

ZAYAZ Docs Architecture defines how documentation, supporting artifacts, search, and future registry-scale source files are organized across the platform.

The objective is to:

preserve the strengths of the current system
prevent documentation infrastructure from becoming a storage bottleneck
enable future scalability for ESG data, registries, and computation layers

This architecture is normative for all documentation and documentation-adjacent systems.

2. Core Principles

2.1. Docusaurus is the Documentation Frontend

Docusaurus is responsible for:

rendering MDX documents
composing documentation pages
rendering custom components
integrating snippets and metadata
providing navigation and documentation UX

Docusaurus MUST NOT be used as a storage layer for:

large Excel libraries
large structured registries
operational datasets
backend policy state

2.2. GitHub is the Authoring Layer

GitHub is the authoritative source for:

MDX documentation
documentation code
custom Docusaurus components
snippets and associated files
schemas and lightweight examples
documentation scripts
Jira integration

All documentation MUST be version-controlled and reviewed via GitHub.

2.3. AWS is the Heavy Artifact Layer

AWS is used for:

large Excel libraries
bulk source datasets
future structured registries
resolver-backed datasets
large binary artifacts

AWS is NOT a replacement for documentation authoring.

It is the correct storage layer for artifacts that are:

large
numerous
operational
not human-authored documentation

2.4. Cloudflare is the Delivery Layer

Cloudflare is responsible for:

hosting the built documentation site
edge delivery
caching and performance

2.5. Fly.io is the Search Runtime Layer

Fly.io hosts:

search indexing runtime
search APIs
ingestion outputs

This hosting arrangement may change later, but it remains valid for v1.

3. Architectural Layers

3.1. Authoring Layer (GitHub)

Contains:

MDX files
snippets
associated files
schemas
documentation scripts
Jira integration

3.2. Presentation Layer (Docusaurus)

Contains:

rendered documentation UI
navigation
custom components
page composition

3.3. Delivery Layer (Cloudflare)

Contains:

deployed documentation
static assets
edge distribution

3.4. Search Layer (Fly.io)

Contains:

search index
search API
ingestion pipelines

3.5. Heavy Artifact Layer (AWS)

Contains:

Excel source libraries
large datasets
future registries
future resolver-backed services

4. Storage Decision Rules

4.1. Store in GitHub if:

the asset is human-authored
it is part of documentation
it is reviewed like code
it is tightly coupled to a page
it is relatively small

Examples:

.mdx
snippets
schemas
code examples
associated files

4.2. Store in AWS if:

the asset is large
it is a source library
it is not documentation text
it is not required at build/runtime for docs
it will feed future services

Examples:

Excel workbooks
bulk datasets
large registries
generated exports

5. Excel File Policy

5.1. Current State

Excel files are currently:

stored in docusaurus/static/excel
referenced via URL in documentation
not actively processed at runtime

Example:

excel-linking.json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry (schemas, implementations, dependencies).",
  "url": "/excel/compute_method_registry.xlsx"
}

5.2. Future State

Excel files SHALL be treated as external artifacts.

They SHOULD:

be stored in AWS
be accessed via stable URLs
not reside inside the docs repository

5.3. Recommended URL Model

url-model.json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry",
  "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
}
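The move from the relative URL in 5.1 to the absolute URL recommended here can be sketched as a small rewrite over a metadata entry. This is a minimal illustration, not the project's migration tooling: `ASSET_BASE` and `to_asset_url` are hypothetical names, and the host is simply the one shown in the example above.

```python
ASSET_BASE = "https://assets.zayaz.io"  # host taken from the example above


def to_asset_url(entry: dict) -> dict:
    """Return a copy of a metadata entry whose url points at the asset host.

    Relative URLs such as "/excel/foo.xlsx" are prefixed with ASSET_BASE;
    URLs that are already absolute are left unchanged.
    """
    url = entry["url"]
    if url.startswith("/"):
        url = ASSET_BASE + url
    return {**entry, "url": url}


current = {
    "id": "compute_method_registry",
    "description": "ZAYAZ compute methods registry",
    "url": "/excel/compute_method_registry.xlsx",
}
migrated = to_asset_url(current)
print(migrated["url"])
```

Returning a copy rather than mutating the entry keeps the rewrite safe to run more than once, since an already-migrated entry passes through unchanged.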

5.4. Documentation Behavior

Docusaurus MUST:

display metadata about Excel tables
link to the source files
NOT load or process Excel files directly
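As a rough sketch of this behavior, a link line can be built from the metadata entry alone, without ever opening the workbook. The function name `render_excel_link` and the required-field list are assumptions based on the JSON examples above. The sketch is written in Python for brevity, whereas a real Docusaurus component would be MDX or React.

```python
REQUIRED_FIELDS = ("id", "description", "url")  # fields seen in the JSON examples


def render_excel_link(entry: dict) -> str:
    """Build a plain-text link line from metadata only; the file is never read."""
    missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
    if missing:
        raise ValueError(f"metadata entry is missing: {', '.join(missing)}")
    if not entry["url"].lower().endswith(".xlsx"):
        raise ValueError(f"url does not point at an .xlsx file: {entry['url']}")
    return f"{entry['description']} ({entry['id']}): {entry['url']}"


line = render_excel_link({
    "id": "compute_method_registry",
    "description": "ZAYAZ compute methods registry",
    "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx",
})
print(line)
```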

6. Excel Ingestion Architecture

The system contains an Excel ingestion pipeline:

ingestExcel()

This pipeline:

scans .xlsx files
extracts the workbook name, sheet name, column headers, and sample rows
generates searchable documents
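The document-generation step of such a pipeline might look like the sketch below. It is not the actual `ingestExcel()` implementation: it assumes the .xlsx parsing has already happened elsewhere and starts from rows held in memory. The function name `build_search_documents`, the output field names, and the sample sheet and column names are all hypothetical.

```python
def build_search_documents(
    workbook_name: str,
    sheets: dict[str, list[list[object]]],
    sample_size: int = 3,
) -> list[dict]:
    """Turn parsed sheet rows into one searchable document per sheet.

    `sheets` maps a sheet name to its rows; the first row is treated as headers.
    """
    documents = []
    for sheet_name, rows in sheets.items():
        if not rows:
            continue  # an empty sheet has nothing to index
        headers = [str(cell) for cell in rows[0]]
        samples = [[str(cell) for cell in row] for row in rows[1 : 1 + sample_size]]
        documents.append({
            "workbook": workbook_name,
            "sheet": sheet_name,
            "columns": headers,
            "sample_rows": samples,
            # flat text field that a search index could tokenize
            "text": " ".join([workbook_name, sheet_name, *headers]),
        })
    return documents


docs = build_search_documents(
    "compute_method_registry.xlsx",
    {"methods": [["method_id", "name"], ["m1", "Example"]], "empty": []},
)
print(docs)
```

Keeping only a few sample rows per sheet bounds the size of each search document regardless of how large the workbook grows.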

6.1. Architectural Interpretation

Excel ingestion belongs to:

Search / indexing layer

NOT to:

Docusaurus runtime

6.2. Future Direction

Excel ingestion SHOULD evolve to:

read from AWS storage
optionally use local cache
feed search/index systems

Docusaurus remains unaffected by this change.
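One possible shape for "read from AWS storage, optionally use local cache" is a cache-first fetch with a pluggable downloader. The sketch below makes several assumptions: `cached_fetch` is a hypothetical name, the fetcher is passed in so that no particular AWS client is implied, and there is no cache invalidation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> bytes:
    """Return the bytes for `url`, using a local file cache when one exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # hash the URL so that any URL maps to a safe file name
    cache_file = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".xlsx")
    if cache_file.exists():
        return cache_file.read_bytes()
    data = fetch(url)  # for example, an HTTP GET against the asset host
    cache_file.write_bytes(data)
    return data


# Usage with a stand-in fetcher, so the sketch runs without network access.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"workbook-bytes"


with tempfile.TemporaryDirectory() as tmp:
    url = "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
    first = cached_fetch(url, Path(tmp), fake_fetch)
    second = cached_fetch(url, Path(tmp), fake_fetch)  # served from the cache
```

Because the cache lives entirely inside the ingestion process, nothing in this flow touches the Docusaurus build or runtime.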

7. Associated Files System

The Associated Files system remains a core capability.

It SHOULD contain:

snippets
examples
small schemas
page-specific supporting files

It MUST NOT be used for:

bulk Excel libraries
large datasets
archive-scale artifacts

8. Custom Docusaurus Capabilities

The following systems remain valid and SHOULD be preserved:

Jira integration
auto-generated developer instructions
snippet engine
custom visualization components
doc tooling

Docs Architecture extends these capabilities; it does not replace them.

9. Non-Negotiable Invariants

MDX documentation remains in GitHub
Docusaurus remains the documentation frontend
GitHub remains the authoring system
AWS stores heavy artifacts
Excel files are external artifacts
Search ingestion is decoupled from docs rendering
Documentation MUST remain lightweight and fast

10. Phase Plan

Phase 1 — Stabilization

keep current structure
remove unnecessary large artifacts from repo
clarify storage boundaries
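The Phase 1 step "remove unnecessary large artifacts from repo" could start from a scan that lists oversized files. The sketch below is only illustrative: `find_large_files` is a hypothetical helper, and the byte threshold is a placeholder, since this document does not define a size limit.

```python
import os
import tempfile
from pathlib import Path


def find_large_files(root: Path, max_bytes: int) -> list[tuple[str, int]]:
    """List (relative path, size) for files under `root` larger than `max_bytes`."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        for name in filenames:
            path = Path(dirpath) / name
            size = path.stat().st_size
            if size > max_bytes:
                oversized.append((path.relative_to(root).as_posix(), size))
    # largest files first, so the best migration candidates appear at the top
    return sorted(oversized, key=lambda item: item[1], reverse=True)


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "static" / "excel").mkdir(parents=True)
    (root / "static" / "excel" / "big.xlsx").write_bytes(b"x" * 2048)
    (root / "intro.mdx").write_bytes(b"# Intro\n")
    report = find_large_files(root, max_bytes=1024)
print(report)
```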

Phase 2 — Excel Migration

move Excel libraries to AWS
update URLs in docs
keep metadata in GitHub

Phase 3 — Registry Externalization

move large registries to AWS-backed systems
introduce resolver APIs
keep documentation summaries in MDX

11. Final Principle

GitHub is for authored documentation
Docusaurus is for presenting documentation
AWS is for heavy artifacts and future registries

This separation ensures:

scalability
maintainability
performance
long-term architectural clarity

APPENDIX A - File Placement Decision Tree

A.1. Purpose

This document defines the operational decision logic for determining where files MUST be stored within the ZAYAZ platform.

It is intended for:

developers
documentation authors
data architects
platform engineers

This document is normative and MUST be followed to ensure scalability and consistency.

A.2. Core Decision Rule

Every file MUST be evaluated using the decision flow in A.3, applying the steps in order and stopping at the first YES.

A.3. Decision Tree

Step 1 — Is this human-authored documentation?

Written or maintained by humans
Reviewed via GitHub PRs
Intended to be read in the docs UI

YES → Store in GitHub (MDX / associated files)
NO → Continue

Step 2 — Is this tightly coupled to a specific documentation page?

Used as a snippet, example, or schema
Directly embedded or referenced in MDX
Small and frequently edited

YES → Store in GitHub (associated files)
NO → Continue

Step 3 — Is this a large or growing dataset?

Excel files
bulk JSON
registries
generated exports
datasets expected to scale

YES → Store in AWS
NO → Continue

Step 4 — Is this used by runtime systems (not just docs)?

used by APIs
used by search indexing
used by computation engines
used by GRPE / ZARA / Computation Hub

YES → Store in AWS (or runtime storage layer)
NO → Continue

Step 5 — Is this required at Docusaurus build time?

needed to generate pages
needed for static rendering
lightweight enough for repo

YES → Store in GitHub
NO → Store in AWS
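The five steps above can be sketched as a function that checks each question in order and returns at the first YES. The `FileTraits` fields and the `place_file` name are hypothetical. Answering each question for a given file remains a human judgment that the code does not attempt.

```python
from dataclasses import dataclass


@dataclass
class FileTraits:
    human_authored_doc: bool = False    # Step 1
    page_coupled: bool = False          # Step 2
    large_or_growing: bool = False      # Step 3
    used_by_runtime: bool = False       # Step 4
    needed_at_build_time: bool = False  # Step 5


def place_file(traits: FileTraits) -> str:
    """Walk Steps 1 to 5 in order and return "GitHub" or "AWS"."""
    if traits.human_authored_doc:
        return "GitHub"
    if traits.page_coupled:
        return "GitHub"
    if traits.large_or_growing:
        return "AWS"
    if traits.used_by_runtime:
        return "AWS"
    return "GitHub" if traits.needed_at_build_time else "AWS"


print(place_file(FileTraits(human_authored_doc=True)))
print(place_file(FileTraits(large_or_growing=True, needed_at_build_time=True)))
```

The order matters. A large dataset that is also needed at build time is routed to AWS at Step 3 and never reaches Step 5.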

A.4. Decision Table (Quick Reference)

File Type	Location	Reason
MDX documentation	GitHub	Authoring + version control
Snippets / examples	GitHub	Page-coupled
Small schemas	GitHub	Lightweight + reusable
Custom components	GitHub	Part of frontend
Excel workbooks	AWS	Large + not docs-native
Large JSON registries	AWS	Scales beyond repo
Generated datasets	AWS	Not human-authored
Search ingestion sources	AWS	Runtime layer
Resolver / policy data	AWS	System-critical
Static assets (small)	GitHub	Docs usage
Static assets (large)	AWS	Performance + scale

A.5. Excel-Specific Decision Rules

A.5.1 Always store in AWS if:

file is an Excel workbook (.xlsx)
file represents a registry or table
file exceeds trivial size
file is part of a growing library

A.5.2 Only store in GitHub if ALL are true:

file is very small
file is used as a demo/example
file is tightly coupled to a page
file will not scale

A.5.3 Recommended Pattern

DO:

{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry",
  "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
}

DO NOT:

store hundreds of Excel files in docusaurus/static/excel
treat Excel as native docs content

A.6. Anti-Patterns (Must Avoid)

A.6.1 Repository Bloat

storing large datasets in GitHub
committing generated artifacts repeatedly
embedding large files in docs repo

A.6.2 Docusaurus as Data Lake

using /static as bulk storage
treating docs as a file distribution system
loading large datasets into frontend

A.6.3 Mixed Responsibilities

combining documentation and runtime data
coupling search ingestion to docs runtime
forcing Docusaurus to process heavy data

A.7. Allowed Exceptions

Exceptions MAY be made only if:

file is required for build-time rendering
file is small and stable
file improves developer experience significantly

All exceptions SHOULD be documented.

A.8. Integration with ZAYAZ Systems

This decision model aligns with:

GRPE → policy-driven routing, not docs storage
ZARA → computation and validation logic
Computation Hub → heavy data processing
Search Indexer → ingest external sources (e.g. Excel in AWS)

Documentation remains decoupled from runtime systems.

A.9. Governance Rule

Before adding any new file:

If this file grows 100×, will GitHub still be the right place?

If NO → store in AWS
If YES → store in GitHub
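The 100× question can be made concrete with simple arithmetic on file size. Size is only one proxy for "the right place", and the 50 MiB ceiling below is an arbitrary illustrative value, not a limit stated anywhere in this document. The function names are likewise hypothetical.

```python
GROWTH_FACTOR = 100                    # the "100x" from the rule above
REPO_CEILING_BYTES = 50 * 1024 * 1024  # assumed ceiling, not a stated ZAYAZ limit


def survives_growth(current_bytes: int) -> bool:
    """True if the file would still fit under the ceiling after growing 100x."""
    return current_bytes * GROWTH_FACTOR <= REPO_CEILING_BYTES


def governance_location(current_bytes: int) -> str:
    """Apply the governance rule: GitHub if the file survives growth, else AWS."""
    return "GitHub" if survives_growth(current_bytes) else "AWS"


print(governance_location(20 * 1024))        # a 20 KiB schema
print(governance_location(2 * 1024 * 1024))  # a 2 MiB workbook
```

Under these assumptions a 20 KiB schema grows to roughly 2 MiB and stays in GitHub, while a 2 MiB workbook grows to 200 MiB and belongs in AWS.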

A.10. Final Principle

Documentation is for humans
Data is for systems

ZAYAZ Docs MUST remain:

fast
readable
maintainable
scalable

A.11. One-Line Rule (for daily use)

If it explains → GitHub
If it scales → AWS
