ZADA

ZAYAZ Docs Architecture

1. Purpose

ZAYAZ Docs Architecture defines how documentation, supporting artifacts, search, and future registry-scale source files are organized across the platform.

The objective is to:

  • preserve the strengths of the current system
  • prevent documentation infrastructure from becoming a storage bottleneck
  • enable future scalability for ESG data, registries, and computation layers

This architecture is normative for all documentation and documentation-adjacent systems.


2. Core Principles

2.1. Docusaurus is the Documentation Frontend

Docusaurus is responsible for:

  • rendering MDX documents
  • composing documentation pages
  • rendering custom components
  • integrating snippets and metadata
  • providing navigation and documentation UX

Docusaurus MUST NOT be used as a storage layer for:

  • large Excel libraries
  • large structured registries
  • operational datasets
  • backend policy state

2.2. GitHub is the Authoring Layer

GitHub is the authoritative source for:

  • MDX documentation
  • documentation code
  • custom Docusaurus components
  • snippets and associated files
  • schemas and lightweight examples
  • documentation scripts
  • Jira integration

All documentation MUST be version-controlled and reviewed via GitHub.


2.3. AWS is the Heavy Artifact Layer

AWS is used for:

  • large Excel libraries
  • bulk source datasets
  • future structured registries
  • resolver-backed datasets
  • large binary artifacts

AWS is NOT a replacement for documentation authoring.

It is the correct storage layer for artifacts that are:

  • large
  • numerous
  • operational
  • not human-authored documentation

2.4. Cloudflare is the Delivery Layer

Cloudflare is responsible for:

  • hosting the built documentation site
  • edge delivery
  • caching and performance

2.5. Fly.io is the Search Runtime Layer

Fly.io hosts:

  • search indexing runtime
  • search APIs
  • ingestion outputs

This hosting choice may change later, but it remains valid for v1.


3. Architectural Layers

3.1. Authoring Layer (GitHub)

Contains:

  • MDX files
  • snippets
  • associated files
  • schemas
  • documentation scripts
  • Jira integration

3.2. Presentation Layer (Docusaurus)

Contains:

  • rendered documentation UI
  • navigation
  • custom components
  • page composition

3.3. Delivery Layer (Cloudflare)

Contains:

  • deployed documentation
  • static assets
  • edge distribution

3.4. Search Layer (Fly.io)

Contains:

  • search index
  • search API
  • ingestion pipelines

3.5. Heavy Artifact Layer (AWS)

Contains:

  • Excel source libraries
  • large datasets
  • future registries
  • future resolver-backed services

4. Storage Decision Rules

4.1. Store in GitHub if:

  • the asset is human-authored
  • it is part of documentation
  • it is reviewed like code
  • it is tightly coupled to a page
  • it is relatively small

Examples:

  • .mdx
  • snippets
  • schemas
  • code examples
  • associated files

4.2. Store in AWS if:

  • the asset is large
  • it is a source library
  • it is not documentation text
  • it is not required by the docs at build time or runtime
  • it will feed future services

Examples:

  • Excel workbooks
  • bulk datasets
  • large registries
  • generated exports

5. Excel File Policy

5.1. Current State

Excel files are currently:

  • stored in docusaurus/static/excel
  • referenced via URL in documentation
  • not actively processed at runtime

Example:

excel-linking.json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry (schemas, implementations, dependencies).",
  "url": "/excel/compute_method_registry.xlsx"
}

5.2. Future State

Excel files SHALL be treated as external artifacts.

They SHOULD:

  • be stored in AWS
  • be accessed via stable URLs
  • not reside inside the docs repository

url-model.json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry",
  "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
}
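The move from the repository-relative URL in 5.1 to the external URL shown here could be sketched as a small rewrite of each link entry. This is a minimal sketch, not the project's actual migration tooling: the function name `externalize_url` and the constant `ASSETS_BASE` are hypothetical, and the host is taken from the example in this section rather than from a confirmed production endpoint.

```python
# Assumption: the asset host from the example in this section.
ASSETS_BASE = "https://assets.zayaz.io"


def externalize_url(entry: dict, base: str = ASSETS_BASE) -> dict:
    """Return a copy of a link entry whose repo-relative Excel URL points at the asset host."""
    url = entry["url"]
    if url.startswith("/excel/"):
        # Only repo-relative Excel paths are rewritten; absolute URLs pass through unchanged.
        url = base.rstrip("/") + url
    return {**entry, "url": url}


entry = {
    "id": "compute_method_registry",
    "description": "ZAYAZ compute methods registry",
    "url": "/excel/compute_method_registry.xlsx",
}
migrated = externalize_url(entry)
print(migrated["url"])
```

Because already-external URLs are left alone, the rewrite can be run repeatedly over the same metadata without changing it further.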

5.4. Documentation Behavior

Docusaurus MUST:

  • display metadata about Excel tables
  • link to the source files
  • NOT load or process Excel files directly
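One way to picture this behavior is a renderer that works from the link metadata alone and never opens the workbook. The sketch below is illustrative only: `render_excel_link`, `REQUIRED_KEYS`, and the output format are assumptions, and the real site renders through Docusaurus components rather than Python.

```python
# Keys taken from the link-entry examples in this section.
REQUIRED_KEYS = ("id", "description", "url")


def render_excel_link(entry: dict) -> str:
    """Build a display line from metadata only; the Excel file itself is never read."""
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise ValueError(f"link entry is missing keys: {missing}")
    return f'{entry["id"]}: {entry["description"]} ({entry["url"]})'


line = render_excel_link(
    {
        "id": "compute_method_registry",
        "description": "ZAYAZ compute methods registry",
        "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx",
    }
)
print(line)
```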

6. Excel Ingestion Architecture

The system contains an Excel ingestion pipeline:

ingestExcel()

This pipeline:

  • scans .xlsx files
  • extracts:
      • workbook name
      • sheet name
      • column headers
      • sample rows
  • generates searchable documents
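The extraction steps listed above can be sketched as follows. This is not the actual `ingestExcel()` implementation: the function `build_search_documents`, the document fields, and the sample column names are hypothetical. The sketch also assumes the sheets have already been read into plain rows, since parsing `.xlsx` needs a third-party reader (for example openpyxl) that is left out here.

```python
from typing import Any


def build_search_documents(
    workbook: str,
    sheets: dict[str, list[list[Any]]],
    sample_size: int = 3,
) -> list[dict[str, Any]]:
    """Build one searchable document per non-empty sheet.

    `sheets` maps sheet name -> rows, where the first row holds the column headers.
    """
    documents = []
    for sheet_name, rows in sheets.items():
        if not rows:
            continue  # nothing to index in an empty sheet
        headers = [str(cell) for cell in rows[0]]
        documents.append(
            {
                "workbook": workbook,
                "sheet": sheet_name,
                "headers": headers,
                "sample_rows": rows[1 : 1 + sample_size],
                # Flat text field that a search index could tokenize.
                "text": " ".join([workbook, sheet_name, *headers]),
            }
        )
    return documents


# Illustrative input only; these columns and values are made up.
docs = build_search_documents(
    "compute_method_registry.xlsx",
    {"methods": [["method_id", "version"], ["m1", "1.0"], ["m2", "2.1"]], "empty": []},
)
print(docs[0]["text"])
```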

6.1. Architectural Interpretation

Excel ingestion belongs to:

Search / indexing layer

NOT to:

Docusaurus runtime


6.2. Future Direction

Excel ingestion SHOULD evolve to:

  • read from AWS storage
  • optionally use local cache
  • feed search/index systems

Docusaurus remains unaffected by this change.
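The "read from AWS storage, optionally use local cache" direction could look roughly like the sketch below. It is a sketch under stated assumptions: `fetch_with_cache` is a hypothetical helper, the download is injected as a callable so that no particular AWS client is implied, and `fake_fetch` stands in for a real download.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def fetch_with_cache(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> bytes:
    """Return the bytes for `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    cache_file = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if cache_file.exists():
        return cache_file.read_bytes()
    data = fetch(url)
    cache_file.write_bytes(data)
    return data


calls = []


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real download from AWS storage."""
    calls.append(url)
    return b"workbook-bytes"


with tempfile.TemporaryDirectory() as tmp:
    url = "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
    first = fetch_with_cache(url, Path(tmp), fake_fetch)
    second = fetch_with_cache(url, Path(tmp), fake_fetch)

print(len(calls))
```

The sketch never invalidates cached entries; a real pipeline would need some freshness check before trusting a cached workbook.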


7. Associated Files System

The Associated Files system remains a core capability.

It SHOULD contain:

  • snippets
  • examples
  • small schemas
  • page-specific supporting files

It MUST NOT be used for:

  • bulk Excel libraries
  • large datasets
  • archive-scale artifacts

8. Custom Docusaurus Capabilities

The following systems remain valid and SHOULD be preserved:

  • Jira integration
  • auto-generated developer instructions
  • snippet engine
  • custom visualization components
  • doc tooling

Docs Architecture extends these capabilities; it does not replace them.


9. Non-Negotiable Invariants

  • MDX documentation remains in GitHub
  • Docusaurus remains the documentation frontend
  • GitHub remains the authoring system
  • AWS stores heavy artifacts
  • Excel files are external artifacts
  • Search ingestion is decoupled from docs rendering
  • Documentation MUST remain lightweight and fast

10. Phase Plan

Phase 1 — Stabilization

  • keep current structure
  • remove unnecessary large artifacts from repo
  • clarify storage boundaries

Phase 2 — Excel Migration

  • move Excel libraries to AWS
  • update URLs in docs
  • keep metadata in GitHub

Phase 3 — Registry Externalization

  • move large registries to AWS-backed systems
  • introduce resolver APIs
  • keep documentation summaries in MDX

11. Final Principle

  • GitHub is for authored documentation
  • Docusaurus is for presenting documentation
  • AWS is for heavy artifacts and future registries

This separation ensures:

  • scalability
  • maintainability
  • performance
  • long-term architectural clarity

APPENDIX A - File Placement Decision Tree

A.1. Purpose

This document defines the operational decision logic for determining where files MUST be stored within the ZAYAZ platform.

It is intended for:

  • developers
  • documentation authors
  • data architects
  • platform engineers

This document is normative and MUST be followed to ensure scalability and consistency.


A.2. Core Decision Rule

Every file MUST be evaluated using the decision flow in A.3, taking the steps in order and stopping at the first YES.


A.3. Decision Tree

Step 1 — Is this human-authored documentation?

  • Written or maintained by humans
  • Reviewed via GitHub PRs
  • Intended to be read in the docs UI

YES → Store in GitHub (MDX / associated files)
NO → Continue


Step 2 — Is this tightly coupled to a specific documentation page?

  • Used as a snippet, example, or schema
  • Directly embedded or referenced in MDX
  • Small and frequently edited

YES → Store in GitHub (associated files)
NO → Continue


Step 3 — Is this a large or growing dataset?

  • Excel files
  • bulk JSON
  • registries
  • generated exports
  • datasets expected to scale

YES → Store in AWS
NO → Continue


Step 4 — Is this used by runtime systems (not just docs)?

  • used by APIs
  • used by search indexing
  • used by computation engines
  • used by GRPE / ZARA / Computation Hub

YES → Store in AWS (or runtime storage layer)
NO → Continue


Step 5 — Is this required at Docusaurus build time?

  • needed to generate pages
  • needed for static rendering
  • lightweight enough for repo

YES → Store in GitHub
NO → Store in AWS
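The five steps can be read as an ordered chain of checks in which the first YES decides the outcome. The sketch below encodes that reading. `FileFacts` and `storage_location` are hypothetical names, and the boolean answers still have to come from human judgment about each file.

```python
from dataclasses import dataclass


@dataclass
class FileFacts:
    """Answers to the five questions of the decision tree (field names are illustrative)."""

    human_authored_doc: bool = False
    page_coupled: bool = False
    large_or_growing: bool = False
    used_by_runtime: bool = False
    needed_at_build: bool = False


def storage_location(facts: FileFacts) -> str:
    """Walk Steps 1-5 in order and return "GitHub" or "AWS"."""
    if facts.human_authored_doc:  # Step 1
        return "GitHub"
    if facts.page_coupled:  # Step 2
        return "GitHub"
    if facts.large_or_growing:  # Step 3
        return "AWS"
    if facts.used_by_runtime:  # Step 4
        return "AWS"
    # Step 5: build-time inputs stay in the repo; everything else goes to AWS.
    return "GitHub" if facts.needed_at_build else "AWS"


print(storage_location(FileFacts(large_or_growing=True, needed_at_build=True)))
```

Order matters: in the usage line, a large dataset is routed to AWS at Step 3 even though it is also flagged as a build-time input, because Step 5 is never reached.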


A.4. Decision Table (Quick Reference)

| File Type | Location | Reason |
|---|---|---|
| MDX documentation | GitHub | Authoring + version control |
| Snippets / examples | GitHub | Page-coupled |
| Small schemas | GitHub | Lightweight + reusable |
| Custom components | GitHub | Part of frontend |
| Excel workbooks | AWS | Large + not docs-native |
| Large JSON registries | AWS | Scales beyond repo |
| Generated datasets | AWS | Not human-authored |
| Search ingestion sources | AWS | Runtime layer |
| Resolver / policy data | AWS | System-critical |
| Static assets (small) | GitHub | Docs usage |
| Static assets (large) | AWS | Performance + scale |

A.5. Excel-Specific Decision Rules

A.5.1 Always store in AWS if:

  • file is an Excel workbook (.xlsx)
  • file represents a registry or table
  • file exceeds trivial size
  • file is part of a growing library

A.5.2 Only store in GitHub if ALL are true:

  • file is very small
  • file is used as a demo/example
  • file is tightly coupled to a page
  • file will not scale
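Read together, A.5.1 and A.5.2 make AWS the default for Excel workbooks, with GitHub as a narrow exception that requires every condition to hold. The sketch below encodes that interpretation. `excel_location` is a hypothetical helper, and what counts as "very small" is left to the caller because the document sets no numeric limit.

```python
def excel_location(
    very_small: bool,
    demo_only: bool,
    page_coupled: bool,
    will_not_scale: bool,
) -> str:
    """Return "GitHub" only when all four A.5.2 conditions hold; otherwise "AWS"."""
    if all([very_small, demo_only, page_coupled, will_not_scale]):
        return "GitHub"
    return "AWS"


# A small demo workbook that is expected to grow still belongs in AWS.
print(excel_location(very_small=True, demo_only=True, page_coupled=True, will_not_scale=False))
```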

DO:

z1f82p.json
{
  "id": "compute_method_registry",
  "description": "ZAYAZ compute methods registry",
  "url": "https://assets.zayaz.io/excel/compute_method_registry.xlsx"
}

DO NOT:

  • store hundreds of Excel files in docusaurus/static/excel
  • treat Excel as native docs content

A.6. Anti-Patterns (Must Avoid)

A.6.1 Repository Bloat

  • storing large datasets in GitHub
  • committing generated artifacts repeatedly
  • embedding large files in docs repo
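A repository check along these lines could surface bloat before it is merged. This is a minimal sketch, not an existing ZAYAZ script: `find_bloat` and the `MAX_BYTES` threshold are assumptions, since the document defines no numeric size limit.

```python
import tempfile
from pathlib import Path

# Hypothetical limit; the document does not define a numeric threshold.
MAX_BYTES = 1_000_000


def find_bloat(root: Path, max_bytes: int = MAX_BYTES) -> list[Path]:
    """List files under `root` that are Excel workbooks or exceed `max_bytes`."""
    offenders = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() == ".xlsx" or path.stat().st_size > max_bytes:
            offenders.append(path.relative_to(root))
    return offenders


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "static" / "excel").mkdir(parents=True)
    (root / "static" / "excel" / "registry.xlsx").write_bytes(b"x")
    (root / "intro.mdx").write_text("# Intro")
    (root / "dump.json").write_bytes(b"0" * 2048)
    offenders = find_bloat(root, max_bytes=1024)

print(sorted(p.as_posix() for p in offenders))
```

The check flags every `.xlsx` file, so a workbook that qualifies for the A.5.2 exception would need to be allowlisted by a reviewer.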

A.6.2 Docusaurus as Data Lake

  • using /static as bulk storage
  • treating docs as a file distribution system
  • loading large datasets into frontend

A.6.3 Mixed Responsibilities

  • combining documentation and runtime data
  • coupling search ingestion to docs runtime
  • forcing Docusaurus to process heavy data

A.7. Allowed Exceptions

Exceptions MAY be made only if:

  • file is required for build-time rendering
  • file is small and stable
  • file improves developer experience significantly

All exceptions SHOULD be documented.


A.8. Integration with ZAYAZ Systems

This decision model aligns with:

  • GRPE → policy-driven routing, not docs storage
  • ZARA → computation and validation logic
  • Computation Hub → heavy data processing
  • Search Indexer → ingest external sources (e.g. Excel in AWS)

Documentation remains decoupled from runtime systems.


A.9. Governance Rule

Before adding any new file:

If this file grows 100×, will GitHub still be the right place?

  • If NO → store in AWS
  • If YES → store in GitHub
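For files where size is the main concern, the 100× question reduces to simple arithmetic. The sketch below is one possible reading, not a rule from this document: `location_after_growth` and `REPO_LIMIT_BYTES` are hypothetical, and the rule itself remains a judgment call that also covers file count and churn.

```python
# Hypothetical per-file repository budget; the document does not specify a number.
REPO_LIMIT_BYTES = 5 * 1024 * 1024


def location_after_growth(
    size_bytes: int,
    factor: int = 100,
    limit: int = REPO_LIMIT_BYTES,
) -> str:
    """Project the file to `factor` times its size and pick the layer that still fits."""
    projected = size_bytes * factor
    return "GitHub" if projected <= limit else "AWS"


print(location_after_growth(10 * 1024))   # 10 KiB projected to about 1 MiB
print(location_after_growth(200 * 1024))  # 200 KiB projected to about 20 MiB
```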

A.10. Final Principle

Documentation is for humans.
Data is for systems.

ZAYAZ Docs MUST remain:

  • fast
  • readable
  • maintainable
  • scalable

A.11. One-Line Rule (for daily use)

If it explains → GitHub
If it scales → AWS



