
ZAYAZ – Documentation & Registry Architecture

Unified Documentation, Schema Intelligence and MCP Integration

This repository powers the ZAYAZ documentation ecosystem, combining:

  • Human-readable documentation (Docusaurus)
  • Machine-readable registries (engines/modules, tables, signals)
  • Private structured assets (per-table Excel files)
  • Searchable MCP knowledge graph (for ChatGPT and internal AI tools)

The system is designed for:

  • Zero duplication
  • One source of truth
  • Automatic schema generation
  • Multi-audience builds
  • AI-safe, profile-based access

This document explains how everything fits together.


1. Three Sources of Truth

ZAYAZ documentation is driven by three synchronized inputs.

1.1 Engine/Module Registry (structural source of truth)

File:

  • config/system/registry.yaml

Compiled to:

  • docusaurus/config/system/registry.json

This defines:

  • Engines, Modules, Micro Engines
  • Their lifecycle metadata
  • Their associated tables
  • Owner/category information
  • Links to technical components

Example entry in registry.yaml:

- id: mice_fume
  type: micro_engine
  name: "FUME – Fuel Use Micro Engine"
  code: MICE_FUME
  owner: computation-hub
  lifecycle:
    status: ga
  tables:
    - name: zyz_mice_fume_input
      role: input
      excel:
        data_file_id: zyz_mice_fume_input
    - name: zyz_mice_fume_result
      role: output
      excel:
        data_file_id: zyz_mice_fume_result

registry.json is used by Docusaurus and MCP as the structural backbone.


1.2 Signal Registry (schema and column metadata source of truth)

Source Excel:

  • excel/SignalRegistry.xlsx

Compiled to:

  • docusaurus/config/system/signal_registry.json

Each row describes one column (signal) in one table, with fields like:

  • signal_name
  • signal_type
  • signal_description
  • source_table

Example row in the compiled JSON:

{
  "signal_name": "fuel_type",
  "signal_type": "string",
  "signal_description": "Normalized fuel category",
  "source_table": "zyz_mice_fume_input"
}

This registry is the single source of truth for table schemas and column descriptions.


1.3 Per-Table Excel Workbooks (row/data source of truth)

Folder:

  • excel/

Each Excel file represents one database table, for example:

  • excel/zyz_mice_fume_input.xlsx
  • excel/zyz_mice_fume_result.xlsx
  • excel/altd_event.xlsx

They are listed in:

  • config/system/excel_files.yaml

Compiled to:

  • docusaurus/config/system/excel_files.json

Example entries in excel_files.yaml:

- { id: zyz_mice_fume_input,  description: "FUME input table",  url: "/excel/zyz_mice_fume_input.xlsx" }
- { id: zyz_mice_fume_result, description: "FUME result table", url: "/excel/zyz_mice_fume_result.xlsx" }

These files contain the actual rows (potentially thousands) and are copied into the Docusaurus static folder and served as downloads, protected by Cloudflare Access.


2. How These Sources Combine in Docusaurus

Docusaurus consumes the three sources as follows:

  • registry.json
    • Defines which tables belong to which engine/module.
  • signal_registry.json
    • Defines which columns belong to which table, with types and descriptions.
  • excel_files.json
    • Defines where to download or view the full Excel tables.
  • NodeMeta component
    • Renders engine/module metadata and their related tables.
  • TableSignals component
    • Renders the schema (columns) for a given table.
  • /excel/*.xlsx
    • Full downloadable data tables, not rendered inline.

No table schema is manually written into MDX files. All schema information flows from the Signal Registry and Engine/Module registry.


3. MDX Rendering Architecture

MDX technical spec pages are intentionally thin. They contain:

  • Narrative text and conceptual explanations
  • A <NodeMeta id="..."/> component
  • Optional <TableSignals tableName="..."/> components for inline schemas
  • Optional audience blocks

Example MDX header and body:

---
id: mice_fume
title: MICE_FUME — Fuel Use Micro Engine
sidebar_label: MICE_FUME
doc_type: spec
version: 1.4.2
---

import NodeMeta from '@site/src/components/NodeMeta';
import TableSignals from '@site/src/components/TableSignals';

# MICE_FUME — Fuel Use Micro Engine

## Engine metadata and tables

<NodeMeta id="mice_fume" />

## Input schema (inline view)

<TableSignals tableName="zyz_mice_fume_input" />

## Output schema (inline view)

<TableSignals tableName="zyz_mice_fume_result" />

## Business logic overview

... narrative text here ...

3.1 <NodeMeta id="..."/>

NodeMeta:

  • Looks up the engine/module in registry.json using id.
  • Renders:
    • ID, code, name, type, lifecycle metadata.
    • Tables associated with the engine.
  • For each table:
    • Calls <TableSignals tableName={t.name} /> to show its column schema.
    • Resolves excel.data_file_id against excel_files.json to show a "Download / view full table" link.

The placement of <NodeMeta id="..."/> in the MDX determines where the engine tables appear.
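A minimal sketch of what NodeMeta does, assuming registry.json is an array of entries shaped like the registry.yaml example in section 1.1 (the real component lives in docusaurus/src/components/NodeMeta.js and may differ):

// Sketch of NodeMeta: look up the node, render metadata, tables and Excel links.
import React from 'react';
import registry from '@site/config/system/registry.json';
import excelFiles from '@site/config/system/excel_files.json';
import TableSignals from '@site/src/components/TableSignals';

export default function NodeMeta({ id }: { id: string }) {
  const node = (registry as any[]).find((n) => n.id === id);
  if (!node) return <p>Unknown registry id: {id}</p>;

  return (
    <div>
      <p>
        <strong>{node.code}</strong> – {node.name} ({node.type}, status: {node.lifecycle?.status})
      </p>
      {(node.tables ?? []).map((t: any) => {
        // Resolve the table's Excel download via excel_files.json.
        const file = (excelFiles as any[]).find((f) => f.id === t.excel?.data_file_id);
        return (
          <section key={t.name}>
            <h4>{t.name} ({t.role})</h4>
            <TableSignals tableName={t.name} />
            {file && <a href={file.url}>Download / view full table</a>}
          </section>
        );
      })}
    </div>
  );
}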

3.2 <TableSignals tableName="..."/>

TableSignals:

  • Reads signal_registry.json.
  • Filters entries where source_table === tableName.
  • Renders a simple schema table (Column, Type, Description) inline in the MDX page.

You can use it:

  • Inside or outside NodeMeta.
  • Inline, wherever a schema explanation is needed.
  • Inside audience blocks (for internal-only schema).
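A minimal sketch of TableSignals under the same assumptions (the real component is docusaurus/src/components/TableSignals.js; signal_registry.json is assumed to be a flat array of rows like the example in section 1.2):

// Sketch of TableSignals: filter the signal registry and render a schema table.
import React from 'react';
import signals from '@site/config/system/signal_registry.json';

export default function TableSignals({ tableName }: { tableName: string }) {
  const rows = (signals as any[]).filter((s) => s.source_table === tableName);
  return (
    <table>
      <thead>
        <tr><th>Column</th><th>Type</th><th>Description</th></tr>
      </thead>
      <tbody>
        {rows.map((s) => (
          <tr key={s.signal_name}>
            <td>{s.signal_name}</td>
            <td>{s.signal_type}</td>
            <td>{s.signal_description}</td>
          </tr>
        ))}
      </tbody>
    </table>
  );
}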

4. Multi-Audience Documentation

The system supports different documentation audiences:

  • internal
  • developer
  • client

Audience-specific content is wrapped in an AudienceBlock MDX component:

<AudienceBlock audience={['internal']}>
## Internal Notes
Highly sensitive implementation details...
</AudienceBlock>

<AudienceBlock audience={['developer', 'internal']}>
## Developer Details
API shapes, edge cases, error handling...
</AudienceBlock>

At build time:

  • An environment variable DOCS_AUDIENCE selects which audience is built (client, developer, or internal).
  • The Docusaurus config exposes this as customFields.audience.
  • AudienceBlock uses this to decide which content to keep.
  • Non-matching audience blocks are removed from the final HTML, not just hidden.
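A minimal sketch of how AudienceBlock can implement this (illustrative; the real component may differ):

// Sketch of AudienceBlock: reads the audience the site was built for from
// customFields and drops non-matching content entirely.
import React from 'react';
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';

export default function AudienceBlock({
  audience,
  children,
}: {
  audience: string[];
  children: React.ReactNode;
}) {
  const { siteConfig } = useDocusaurusContext();
  const active = siteConfig.customFields?.audience as string;
  // Returning null means the block is absent from the generated HTML,
  // not merely hidden with CSS.
  return audience.includes(active) ? <>{children}</> : null;
}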

Cloudflare Access enforces external vs internal access to the built sites.


5. MCP Integration

The MCP layer integrates two types of information:

  1. Documentation manifests (what pages exist, and which profiles can see them).
  2. Structured registries (what engines, tables and signals exist).

5.1 Documentation manifests

Each Docusaurus build produces:

  • manifest.json
  • manifest.json.sig
  • manifest.json.sha256

These manifest files describe:

  • Page URIs
  • Titles
  • Mapping to profiles (client, internal_full, etc.)
  • Hashes and signatures for integrity checking

MCP uses these manifests to:

  • Know which pages exist for each audience.
  • Restrict search and retrieval based on the active profile.

Example profile mapping:

  • Internal API key:
    • allowedProfiles = ["internal_full", "client"]
  • Client API key:
    • allowedProfiles = ["client"]

When MCP receives a query, it:

  • Identifies the API key and allowed profiles.
  • Limits documents to those profiles.
  • Then uses those pages as RAG context.
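A sketch of this filtering step, using hypothetical names (the real logic lives in mcp-server/src/server.ts):

// Sketch of profile-based document filtering (hypothetical types and names).
interface ManifestPage {
  uri: string;
  title: string;
  profiles: string[]; // e.g. ["client"] or ["internal_full"]
}

const API_KEY_PROFILES: Record<string, string[]> = {
  'internal-api-key': ['internal_full', 'client'],
  'client-api-key': ['client'],
};

// Only pages whose profiles intersect the key's allowed profiles are
// eligible as RAG context.
function visiblePages(apiKey: string, manifest: ManifestPage[]): ManifestPage[] {
  const allowed = API_KEY_PROFILES[apiKey] ?? [];
  return manifest.filter((page) =>
    page.profiles.some((profile) => allowed.includes(profile))
  );
}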

5.2 Structured registries

MCP also reads the structured JSON registries:

  • docusaurus/config/system/registry.json
  • docusaurus/config/system/signal_registry.json
  • docusaurus/config/system/excel_files.json

These allow structured queries such as:

  • "Which engine uses table zyz_mice_fume_input?"
  • "Which table contains column trust_score_at_event?"
  • "List all columns in altd_event."
  • "Where is the Excel file for table zyz_mice_fume_result?"

The MCP server provides HTTP endpoints such as:

  • GET /signals
    • Filter by name=, table=, q=
  • GET /tables
    • Filter by name=, q=, returns engines + schema + Excel link
  • GET /engines
    • Filter by id=, code=, q=, returns engine + tables + schema + Excel links
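For example (the base URL and port are hypothetical; the query parameters match the filters listed above):

// Example structured lookups against the MCP endpoints.
const BASE = 'http://localhost:8080'; // hypothetical base URL

async function demo() {
  // Which engine uses table zyz_mice_fume_input?
  const tables = await fetch(`${BASE}/tables?name=zyz_mice_fume_input`);
  console.log(await tables.json()); // engines + schema + Excel link

  // Which table contains column trust_score_at_event?
  const signals = await fetch(`${BASE}/signals?name=trust_score_at_event`);
  console.log(await signals.json());

  // Full engine view, including tables, schemas and Excel links:
  const engines = await fetch(`${BASE}/engines?code=MICE_FUME`);
  console.log(await engines.json());
}

demo();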

This effectively turns the registries into a small knowledge graph for use by AI tools.


6. Build Pipeline Overview

The build pipeline (CI) executes the following steps:

  1. Convert engine/module registry:
    • config/system/registry.yaml → docusaurus/config/system/registry.json
  2. Convert Excel file mapping:
    • config/system/excel_files.yaml → docusaurus/config/system/excel_files.json
  3. Convert Signal Registry Excel:
    • excel/SignalRegistry.xlsx → docusaurus/config/system/signal_registry.json
  4. Copy per-table Excel files:
    • excel/*.xlsx → docusaurus/static/excel/
  5. Build Docusaurus for each audience:
    • DOCS_AUDIENCE=client → client portal
    • DOCS_AUDIENCE=internal → internal portal
    • DOCS_AUDIENCE=developer (optional) → developer portal
  6. Generate and sign manifests:
    • manifest.json
    • manifest.json.sig
    • manifest.json.sha256
  7. Deploy builds to Cloudflare Pages:
    • Protected by Cloudflare Access policies.
  8. Generate architecture diagrams from JSON:
    • diagrams/zayaz-sssr-json-sequences.json → content/architecture/zayaz-sssr-json-registry-diagrams.mdx
    • Uses scripts/generate-diagram-docs.mjs to convert a machine-readable sequence spec into Mermaid diagrams embedded in MDX.

The MCP server has a /refresh endpoint that reloads registries after a deployment.
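As an illustration, step 1 can be as small as the following sketch (assuming the js-yaml package; the real script is scripts/generate-registry-json.js):

// Sketch of the registry.yaml → registry.json conversion (step 1 above).
import { readFileSync, writeFileSync } from 'node:fs';
import { load } from 'js-yaml';

const registry = load(readFileSync('config/system/registry.yaml', 'utf8'));
writeFileSync(
  'docusaurus/config/system/registry.json',
  JSON.stringify(registry, null, 2)
);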


7. Repository Structure

A simplified view of the relevant parts:

zayaz-docs/
├── config/
│   └── system/
│       ├── registry.yaml
│       ├── excel_files.yaml
│       └── signal_registry.json   (generated)
├── excel/
│   ├── zyz_mice_fume_input.xlsx
│   ├── zyz_mice_fume_result.xlsx
│   ├── altd_event.xlsx
│   └── SignalRegistry.xlsx
├── scripts/
│   ├── generate-registry-json.js
│   ├── generate-excel-files-json.js
│   └── generate-signal-registry-json.js
├── docusaurus/
│   ├── src/
│   │   └── components/
│   │       ├── NodeMeta.js
│   │       └── TableSignals.js
│   ├── config/
│   │   └── system/
│   │       ├── registry.json
│   │       ├── excel_files.json
│   │       └── signal_registry.json
│   └── static/
│       └── excel/
└── mcp-server/
    ├── src/server.ts
    ├── package.json
    └── tsconfig.json

8. Why This Architecture Works Well

  • No manual duplication of schemas in MDX.
  • Single source of truth:
    • Engines/modules and their tables → registry.
    • Column schemas → Signal Registry.
    • Data rows → per-table Excel workbooks.
  • Multi-audience builds with real removal of non-relevant content.
  • MCP-friendly: structured registries serve as a knowledge graph for AI.
  • Easy to audit: signed manifests and explicit registries.
  • Extensible: adding a new engine or table is a matter of updating the registries and Excel, not rewriting documentation by hand.

This makes the documentation ecosystem robust, maintainable and well suited for a complex ESG intelligence platform like EcoWorld ZAYAZ.

9. What docusaurus/static/schemas/ Is Meant For

It is for JSON Schema files — meaning:

Machine-readable, formal schemas used by:

  • engines
  • modules
  • API contracts
  • config validation
  • signal validation
  • data ingestion/formats

Think of these as:

✔ Schemas with $schema, properties, required, type, etc.
✔ Validation specifications (AJV, Zod import, etc.)
✔ Files developers can download and use directly in code.

Examples that belong here:

  • /schemas/mice-engine.schema.json
  • /schemas/zyz-signal.schema.json
  • /schemas/pef-run-request.schema.json
  • /schemas/carbon-passport.schema.json

These are not mere tables — they are formal programmatic structures.

Use /schemas when:

  • You need a formal JSON Schema
  • The structure is too complex for an Excel table
  • Machines must read it
  • Developers must validate against it
  • Engines or modules consume it

Visual breakdown

If it describes a data shape for code → /docusaurus/static/schemas

  • JSON Schema draft-07+
  • OpenAPI fragments
  • Validation schemas
  • API request/response schemas
  • Machine-friendly format definitions

If it documents relationships or metadata → /config/system

  • lineage graphs
  • table relationships
  • module registry
  • signal registry
  • metadata files

A rule of thumb...

If AJV can validate it → /schemas. If docs render it → /config/system.

Example files

/docusaurus/static/schemas/...

  • countries.json
  • efdb.json
  • nace.json
  • units.json

Here's why they were created, and what they would contain depending on which direction is chosen.

When we built the first version of:

  • TableSignals
  • TableRelations
  • GraphExplorer
  • Schema link resolver

…we discovered something:

👉 The docs refer to source files for reference dimensions

Like:

Table           Source File
dim_countries   countries.json
dim_units       units.json
ref_efdb        efdb.json
dim_nace        nace.json

We didn’t yet have JSON schemas for these dimensions, because in practice our master source is Excel.

To avoid broken links in:

/schemas/{file}

we created placeholder schemas so the docs would resolve links.

This prevented Docusaurus from rendering:

❌ broken /schemas/xxx.json links
❌ build-time errors
❌ warnings
❌ 404 pages

…while keeping the UI and metadata consistent.

1️⃣ Minimal Content

A schema that simply describes:

  • the file exists
  • the expected fields (but not enumerations)

⭐ Ideal for: quickly documenting formats
⭐ Very lightweight
⭐ Uses Excel as the real source of truth

Example: countries.json minimal schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Country Reference",
  "type": "object",
  "properties": {
    "iso_cc_id": { "type": "string" },
    "country_name": { "type": "string" },
    "region": { "type": "string" }
  },
  "required": ["iso_cc_id", "country_name"]
}

2️⃣ Full “enumerated dataset schema”

This version includes actual data lists, not just the structure.

⚠️ Only do this if the data is stable and not gigantic. For example, a few hundred countries is fine.

Example: units.json advanced schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Units of Measure",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "unit_code": { "type": "string" },
      "label": { "type": "string" },
      "description": { "type": "string" }
    },
    "required": ["unit_code", "label"]
  }
}

3️⃣ Machine-friendly “dictionary schema”

This approach uses the JSON Schema dictionary pattern:

{
  "type": "object",
  "patternProperties": {
    "^[A-Z0-9_]+$": {
      "type": "object",
      "properties": {
        "label": { "type": "string" },
        "description": { "type": "string" }
      }
    }
  }
}

This is especially good for:

  • EFDB emission factor IDs
  • NACE codes
  • Anything with a “code → metadata” mapping
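A short sketch of how a validator consumes this pattern (assuming the ajv package):

// Validating a code → metadata dictionary with AJV.
import Ajv from 'ajv';

const schema = {
  type: 'object',
  patternProperties: {
    '^[A-Z0-9_]+$': {
      type: 'object',
      properties: {
        label: { type: 'string' },
        description: { type: 'string' },
      },
    },
  },
};

const validate = new Ajv().compile(schema);

// Hypothetical NACE-style entry: code → metadata.
console.log(validate({ A01_1: { label: 'Growing of non-perennial crops' } })); // true
console.log(validate({ A01_1: { label: 42 } })); // false (label must be a string)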

✔ What each file should ideally represent

countries.json

A schema defining:

  • ISO numeric code
  • Name
  • Region / subregion
  • Possibly EU / OECD flags

efdb.json

A schema defining:

  • emission factor ID
  • category
  • gas or pollutant
  • default unit
  • metadata fields

nace.json

A schema specifying:

  • NACE code
  • Activity description
  • Level (1–4-digit)

units.json

A schema describing:

  • unit_code
  • label
  • conversion rules (optional)

Next Step

The recommended path is to fully define these schemas and auto-generate them from Excel:

✔ Best long-term outcome
✔ Requires defining "canonical fields"
✔ Fits perfectly into the ZAYAZ architecture
✔ Enables validation and autocomplete for developers

10. SSSR JSON Registry & JSON Schemas

Beyond table-based signals and Excel-driven registries, ZAYAZ also depends on a growing set of structured JSON artifacts managed by the Smart Searchable Signal Registry (SSSR). These JSON artifacts represent:

  • Complex rulesets (e.g. MICE engine configurations, validation matrices)
  • XBRL/iXBRL taxonomy fragments
  • Agent prompt templates (ZAAM / ZADIF)
  • Policy and configuration blocks that are too complex for flat Excel
  • Semantic mappings and ontology-driven configurations

10.1 Role of /docusaurus/static/schemas for SSSR JSON

The docs repo does not host the live SSSR JSON registry (that lives in the platform), but it does host the JSON Schemas that define the shape of those artifacts.

For SSSR-related JSON, the convention is:

  • Formal JSON Schemas live in:

    • /docusaurus/static/schemas/sssr-*.schema.json
    • Example: /docusaurus/static/schemas/sssr-json-registry.schema.json
  • These schemas describe things like:

    • How a “JSON artifact registry entry” must look
      (e.g. json_id, category, version, status, checksum_sha256, uso_tags, etc.)
    • How a “JSON prompt template” or “engine ruleset” is structured
    • Which fields are required for governance (e.g. status, source_reference, security_class)

Rule of thumb (aligned with section 9):

  • If AJV/validators can check it → it belongs in /docusaurus/static/schemas
  • If it’s metadata for docs/registries → it belongs in /config/system or the Excel-based registries

This keeps JSON machine-validated and aligned with the live SSSR registry in the platform.

10.2 How to add a new SSSR JSON type (developer workflow)

When we introduce a new SSSR-backed JSON artifact type (e.g. a new micro-engine config class or agent template family), the docs repo should be updated as follows:

  1. Define the JSON Schema

    • Create a new schema under /docusaurus/static/schemas, for example:

      • /docusaurus/static/schemas/sssr-json-artifact.schema.json
      • /docusaurus/static/schemas/sssr-agent-template.schema.json
    • Follow the minimal pattern:

      {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": "SSSR JSON Artifact",
        "type": "object",
        "properties": {
          "json_id": { "type": "string" },
          "category": { "type": "string" },
          "version": { "type": "string" },
          "status": { "type": "string" },
          "data": { "type": "object" },
          "uso_tags": { "type": "array", "items": { "type": "string" } }
        },
        "required": ["json_id", "category", "version", "status", "data"]
      }
    • Keep the schema as generic as possible at the “registry” level, and define more specific schemas for specialized artifact types (e.g. a specific engine config).

  2. Document how it is used

    • Add or update an MDX page (usually under content/system/ or content/architecture/) that explains:

      • What this JSON artifact type represents
      • Where in SSSR / the platform it is used
      • Links to the relevant schema under /schemas/...
    • Example in MDX:

      See `/schemas/sssr-json-artifact.schema.json` for the formal machine-readable definition.
  3. Wire into registry / MCP if needed

    • If MCP needs to query or validate these JSON artifacts directly:

      • Ensure the MCP server knows about the schema file path
      • Optionally expose an endpoint like GET /json-artifacts/schema?id=... for tooling
  4. Keep SSSR & docs in sync

    • When the live SSSR registry introduces a breaking change:

      • Bump the JSON Schema version in /docusaurus/static/schemas
      • Add a short change note in the relevant MDX page (what changed, why, and how to migrate)

This ensures the platform registry (SSSR) and the documentation registry (schemas + MDX) stay aligned, traceable, and safe for AI/automation consumers.
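For step 3 above, the optional schema endpoint could look roughly like this (a hypothetical sketch, assuming an Express-based MCP server):

// Hypothetical sketch of GET /json-artifacts/schema?id=...
import express from 'express';
import { existsSync, readFileSync } from 'node:fs';
import path from 'node:path';

const app = express();
const SCHEMA_DIR = path.resolve('docusaurus/static/schemas');

app.get('/json-artifacts/schema', (req, res) => {
  const id = String(req.query.id ?? '');
  // Only allow known sssr-* schema names; this also avoids path traversal.
  if (!/^sssr-[a-z0-9-]+$/.test(id)) {
    return res.status(400).json({ error: 'invalid schema id' });
  }
  const file = path.join(SCHEMA_DIR, `${id}.schema.json`);
  if (!existsSync(file)) {
    return res.status(404).json({ error: 'schema not found' });
  }
  res.type('application/json').send(readFileSync(file, 'utf8'));
});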

11. Mermaid & Auto-Generated Sequence Diagrams

ZAYAZ documentation also includes architecture and sequence diagrams that are generated automatically from a machine-readable JSON specification.

11.1 Source of Truth for Diagrams

The source for these diagrams is:

  • diagrams/zayaz-sssr-json-sequences.json

This file contains a list of diagrams, each with:

  • id, name, category
  • participants (id, label, type)
  • steps (ordered message flow: from, to, message, kind, sync)

Example minimal structure:

{
  "version": "1.0",
  "diagrams": [
    {
      "id": "json-registry-read",
      "name": "JSON Registry Read Flow (Engine Consumption)",
      "category": "registry_read",
      "participants": [
        { "id": "user", "label": "User", "type": "external" },
        { "id": "frontend", "label": "Frontend UI", "type": "system" }
      ],
      "steps": [
        {
          "index": 1,
          "from": "user",
          "to": "frontend",
          "message": "Request CSRD form generation",
          "kind": "request",
          "sync": true
        }
      ]
    }
  ]
}

Developers only modify this JSON file – never the generated MDX.

11.2 Generator Script → MDX with Mermaid

The conversion from JSON → rendered diagrams is handled by:

  • scripts/generate-diagram-docs.mjs

This script:

  1. Reads diagrams/zayaz-sssr-json-sequences.json
  2. Creates Mermaid sequenceDiagram definitions for each diagram
  3. Writes a single MDX page:
  • content/architecture/zayaz-sssr-json-registry-diagrams.mdx

The generated MDX contains fenced Mermaid code blocks like:
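For the json-registry-read example above, the generated block looks roughly like this (illustrative; the exact output depends on the generator):

```mermaid
sequenceDiagram
  participant user as User
  participant frontend as Frontend UI
  user->>frontend: Request CSRD form generation
```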

Docusaurus is configured with:

  • themes: ['@docusaurus/theme-mermaid']
  • markdown.mermaid: true

in docusaurus/docusaurus.config.js, allowing Mermaid to render the diagrams directly.
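A condensed sketch of what the generator does (field names follow the JSON example in section 11.1; the real script may differ):

// Condensed sketch of scripts/generate-diagram-docs.mjs.
import { readFileSync, writeFileSync } from 'node:fs';

const spec = JSON.parse(
  readFileSync('diagrams/zayaz-sssr-json-sequences.json', 'utf8')
);

const pages = spec.diagrams.map((d) => {
  const lines = ['sequenceDiagram'];
  for (const p of d.participants) {
    lines.push(`  participant ${p.id} as ${p.label}`);
  }
  for (const s of d.steps) {
    const arrow = s.sync ? '->>' : '-->>'; // sync vs async message
    lines.push(`  ${s.from}${arrow}${s.to}: ${s.message}`);
  }
  return '## ' + d.name + '\n\n```mermaid\n' + lines.join('\n') + '\n```';
});

writeFileSync(
  'content/architecture/zayaz-sssr-json-registry-diagrams.mdx',
  pages.join('\n\n') + '\n'
);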

11.3 CI Integration (GitHub Actions)

A dedicated GitHub Action keeps the diagrams in sync:

  • .github/workflows/generate-diagrams.yml

It:

  1. Runs on pushes (and optionally PRs)

  2. Executes:

    node scripts/generate-diagram-docs.mjs
  3. Writes/updates content/architecture/zayaz-sssr-json-registry-diagrams.mdx

  4. Optionally commits the changed MDX back to the repo (so Cloudflare sees the updated docs)

This guarantees that:

  • Editing one JSON file (diagrams/zayaz-sssr-json-sequences.json)
  • Automatically updates the corresponding documentation diagrams
  • Keeps the architecture docs executable and version-controlled

11.4 How to add or update a diagram

To add a new sequence diagram:

  1. Edit diagrams/zayaz-sssr-json-sequences.json:
    • Add a new diagrams[] entry with participants and steps.
    • Keep id stable and unique.
  2. Run the generator locally:
     node scripts/generate-diagram-docs.mjs
  3. Open content/architecture/zayaz-sssr-json-registry-diagrams.mdx and confirm your new diagram appears as a Mermaid block.
  4. Commit both:
    • diagrams/zayaz-sssr-json-sequences.json
    • content/architecture/zayaz-sssr-json-registry-diagrams.mdx
  5. Push to GitHub and let Cloudflare build/deploy.

This pattern keeps architecture diagrams as code – fully aligned with the rest of the ZAYAZ registry and documentation strategy.

12. GPT Search – Minimal Node + TypeScript Indexer

Environment variables summary

These can be overridden per environment (CI, local, Cloudflare worker, etc.):

  • ZAYAZ_DOCS_ROOT: Root of the monorepo. Default: four levels up from zayaz-search-indexer/src.
  • ZAYAZ_CONTENT_DIR: Where MDX content lives. Default: ${ZAYAZ_DOCS_ROOT}/content
  • ZAYAZ_CONFIG_DIR: Where config/system lives. Default: ${ZAYAZ_DOCS_ROOT}/config
  • ZAYAZ_EXCEL_DIR: Where Excel source files live. Default: ${ZAYAZ_DOCS_ROOT}/excel
  • ZAYAZ_INDEX_OUT: Output JSONL path. Default: ${ZAYAZ_DOCS_ROOT}/generated/search-index.jsonl
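A sketch of how the indexer can resolve these defaults (a hypothetical helper; see zayaz-search-indexer/src for the real code):

// Resolving indexer paths from environment variables, with the documented
// defaults (assumes a CommonJS build where __dirname is available).
import path from 'node:path';

const ROOT =
  process.env.ZAYAZ_DOCS_ROOT ?? path.resolve(__dirname, '../../../..');

export const paths = {
  content: process.env.ZAYAZ_CONTENT_DIR ?? path.join(ROOT, 'content'),
  config: process.env.ZAYAZ_CONFIG_DIR ?? path.join(ROOT, 'config'),
  excel: process.env.ZAYAZ_EXCEL_DIR ?? path.join(ROOT, 'excel'),
  indexOut:
    process.env.ZAYAZ_INDEX_OUT ??
    path.join(ROOT, 'generated/search-index.jsonl'),
};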

13. Config: Add searchApiBase to Docusaurus

In docusaurus/docusaurus.config.js, we have added a custom field:

// docusaurus/docusaurus.config.js

/** @type {import('@docusaurus/types').Config} */
const config = {
  // ...existing config...

  customFields: {
    // keep whatever you already have here
    audience: process.env.DOCS_AUDIENCE || 'internal',

    // base URL of the search API:
    searchApiBase:
      process.env.ZAYAZ_SEARCH_API_BASE ||
      'http://127.0.0.1:8787', // dev default; override in prod
  },

  // ...rest of config...
};

module.exports = config;

Later, for Cloudflare / prod, we set ZAYAZ_SEARCH_API_BASE to the public API URL (search.zayaz.io).
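Components can then read it via customFields, for example (an illustrative hook):

// Reading searchApiBase inside a React component.
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';

export function useSearchApiBase(): string {
  const { siteConfig } = useDocusaurusContext();
  return siteConfig.customFields?.searchApiBase as string;
}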

Search

See /workspaces/zayaz-docs/code/infrastructure/zayaz-search-indexer for all indexing files, e.g. /zayaz-search-indexer/src/ingest-mdx.ts.

Rebuild the index

cd /workspaces/zayaz-docs/code/infrastructure/zayaz-search-indexer
npm run build
npm run index

You should see something like:

[zayaz-indexer] Writing JSONL to: /workspaces/zayaz-docs/code/infrastructure/zayaz-search-indexer/data/index.jsonl

Then redeploy / restart the search API / worker:

cd /workspaces/zayaz-docs/code/infrastructure/zayaz-search-api
npm start