ZAYAZ – Documentation & Registry Architecture
Unified Documentation, Schema Intelligence and MCP Integration
This repository powers the ZAYAZ documentation ecosystem, combining:
- Human-readable documentation (Docusaurus)
- Machine-readable registries (engines/modules, tables, signals)
- Private structured assets (per-table Excel files)
- Searchable MCP knowledge graph (for ChatGPT and internal AI tools)
The system is designed for:
- Zero duplication
- One source of truth
- Automatic schema generation
- Multi-audience builds
- AI-safe, profile-based access
This document explains how everything fits together.
1. Three Sources of Truth
ZAYAZ documentation is driven by three synchronized inputs.
1.1 Engine/Module Registry (structural source of truth)
File:
config/system/registry.yaml
Compiled to:
docusaurus/config/system/registry.json
This defines:
- Engines, Modules, Micro Engines
- Their lifecycle metadata
- Their associated tables
- Owner/category information
- Links to technical components
Example entry in registry.yaml:
```yaml
- id: mice_fume
  type: micro_engine
  name: "FUME – Fuel Use Micro Engine"
  code: MICE_FUME
  owner: computation-hub
  lifecycle:
    status: ga
  tables:
    - name: zyz_mice_fume_input
      role: input
      excel:
        data_file_id: zyz_mice_fume_input
    - name: zyz_mice_fume_result
      role: output
      excel:
        data_file_id: zyz_mice_fume_result
```
registry.json is used by Docusaurus and MCP as the structural backbone.
1.2 Signal Registry (schema and column metadata source of truth)
Source Excel:
excel/SignalRegistry.xlsx
Compiled to:
docusaurus/config/system/signal_registry.json
Each row describes one column (signal) in one table, with fields like:
- `signal_name`
- `signal_type`
- `signal_description`
- `source_table`
Example row in the compiled JSON:
```json
{
  "signal_name": "fuel_type",
  "signal_type": "string",
  "signal_description": "Normalized fuel category",
  "source_table": "zyz_mice_fume_input"
}
```
This registry is the single source of truth for table schemas and column descriptions.
1.3 Per-Table Excel Workbooks (row/data source of truth)
Folder:
excel/
Each Excel file represents one database table, for example:
- `excel/zyz_mice_fume_input.xlsx`
- `excel/zyz_mice_fume_result.xlsx`
- `excel/altd_event.xlsx`
They are listed in:
config/system/excel_files.yaml
Compiled to:
docusaurus/config/system/excel_files.json
Example entries in excel_files.yaml:
```yaml
- { id: zyz_mice_fume_input, description: "FUME input table", url: "/excel/zyz_mice_fume_input.xlsx" }
- { id: zyz_mice_fume_result, description: "FUME result table", url: "/excel/zyz_mice_fume_result.xlsx" }
```
These files contain the actual rows (potentially thousands) and are copied into the Docusaurus static folder and served as downloads, protected by Cloudflare Access.
2. How These Sources Combine in Docusaurus
Docusaurus consumes the three sources as follows:
- `registry.json`: defines which tables belong to which engine/module.
- `signal_registry.json`: defines which columns belong to which table, with types and descriptions.
- `excel_files.json`: defines where to download or view the full Excel tables.
- `NodeMeta` component: renders engine/module metadata and their related tables.
- `TableSignals` component: renders the schema (columns) for a given table.
- `/excel/*.xlsx`: full downloadable data tables, not rendered inline.
No table schema is manually written into MDX files. All schema information flows from the Signal Registry and Engine/Module registry.
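For orientation, here is a TypeScript sketch of the shapes these files and components exchange. Field names are taken from the examples in this document; treat this as illustrative, not the authoritative schema:

```ts
// Illustrative types for the three compiled registries.
// The real JSON files may contain additional fields.

interface RegistryTable {
  name: string;                      // e.g. "zyz_mice_fume_input"
  role: "input" | "output";
  excel?: { data_file_id: string };  // resolved against excel_files.json
}

interface RegistryNode {
  id: string;                        // e.g. "mice_fume"
  type: "engine" | "module" | "micro_engine";
  name: string;
  code: string;                      // e.g. "MICE_FUME"
  owner: string;
  lifecycle: { status: string };
  tables: RegistryTable[];
}

interface SignalRow {
  signal_name: string;
  signal_type: string;
  signal_description: string;
  source_table: string;              // joins a signal to its table
}

interface ExcelFileEntry {
  id: string;
  description: string;
  url: string;                       // e.g. "/excel/zyz_mice_fume_input.xlsx"
}
```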
3. MDX Rendering Architecture
MDX technical spec pages are intentionally thin. They contain:
- Narrative text and conceptual explanations
- A `<NodeMeta id="..."/>` component
- Optional `<TableSignals tableName="..."/>` components for inline schemas
- Optional audience blocks
Example MDX header and body:
```mdx
---
id: mice_fume
title: MICE_FUME — Fuel Use Micro Engine
sidebar_label: MICE_FUME
doc_type: spec
version: 1.4.2
---

import NodeMeta from '@site/src/components/NodeMeta';
import TableSignals from '@site/src/components/TableSignals';

# MICE_FUME — Fuel Use Micro Engine

## Engine metadata and tables

<NodeMeta id="mice_fume" />

## Input schema (inline view)

<TableSignals tableName="zyz_mice_fume_input" />

## Output schema (inline view)

<TableSignals tableName="zyz_mice_fume_result" />

## Business logic overview

... narrative text here ...
```
3.1 <NodeMeta id="..."/>
NodeMeta:
- Looks up the engine/module in `registry.json` using `id`.
- Renders:
  - ID, code, name, type, lifecycle metadata.
  - Tables associated with the engine.
- For each table:
  - Calls `<TableSignals tableName={t.name} />` to show its column schema.
  - Resolves `excel.data_file_id` against `excel_files.json` to show a "Download / view full table" link.

The placement of `<NodeMeta id="..."/>` in the MDX determines where the engine tables appear.
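A minimal sketch of this lookup logic, assuming `registry.json` compiles to an array of node entries that can be imported directly (the real component also handles styling and missing data):

```tsx
import React from 'react';
import registry from '@site/config/system/registry.json';
import excelFiles from '@site/config/system/excel_files.json';
import TableSignals from '@site/src/components/TableSignals';

// Hypothetical simplified NodeMeta: resolve the node by id, then
// render its metadata, each table's schema, and its Excel link.
export default function NodeMeta({ id }: { id: string }) {
  const node = (registry as any[]).find((n) => n.id === id);
  if (!node) return <p>Unknown registry id: {id}</p>;

  return (
    <div>
      <p>
        <strong>{node.code}</strong> ({node.type}), lifecycle:{' '}
        {node.lifecycle?.status}
      </p>
      {node.tables?.map((t: any) => {
        // Resolve the table's Excel download via excel_files.json.
        const excel = (excelFiles as any[]).find(
          (f) => f.id === t.excel?.data_file_id,
        );
        return (
          <section key={t.name}>
            <h4>{t.name} ({t.role})</h4>
            <TableSignals tableName={t.name} />
            {excel && <a href={excel.url}>Download / view full table</a>}
          </section>
        );
      })}
    </div>
  );
}
```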
3.2 <TableSignals tableName="..."/>
TableSignals:
- Reads `signal_registry.json`.
- Filters entries where `source_table === tableName`.
- Renders a simple schema table (Column, Type, Description) inline in the MDX page.

You can use it:
- Inside or outside `NodeMeta`.
- Inline, wherever a schema explanation is needed.
- Inside audience blocks (for internal-only schema).
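A minimal sketch under the same assumptions (direct JSON import, no styling):

```tsx
import React from 'react';
import signals from '@site/config/system/signal_registry.json';

// Hypothetical simplified TableSignals: filter the compiled signal
// registry by source_table and render a three-column schema table.
export default function TableSignals({ tableName }: { tableName: string }) {
  const rows = (signals as any[]).filter((s) => s.source_table === tableName);
  if (rows.length === 0) return <p>No signals registered for {tableName}.</p>;

  return (
    <table>
      <thead>
        <tr><th>Column</th><th>Type</th><th>Description</th></tr>
      </thead>
      <tbody>
        {rows.map((s) => (
          <tr key={s.signal_name}>
            <td>{s.signal_name}</td>
            <td>{s.signal_type}</td>
            <td>{s.signal_description}</td>
          </tr>
        ))}
      </tbody>
    </table>
  );
}
```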
4. Multi-Audience Documentation
The system supports different documentation audiences:
- `internal`
- `developer`
- `client`
Audience-specific content is wrapped in an AudienceBlock MDX component:
```mdx
<AudienceBlock audience={['internal']}>

## Internal Notes
Highly sensitive implementation details...

</AudienceBlock>

<AudienceBlock audience={['developer', 'internal']}>

## Developer Details
API shapes, edge cases, error handling...

</AudienceBlock>
```
At build time:
- An environment variable `DOCS_AUDIENCE` selects which audience is built (`client`, `developer`, or `internal`).
- The Docusaurus config exposes this as `customFields.audience`.
- `AudienceBlock` uses this to decide which content to keep.
- Non-matching audience blocks are removed from the final HTML, not just hidden.
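A minimal sketch of how such a component can read the build-time audience (the actual implementation may prune differently; returning `null` during the static build means non-matching content never reaches the emitted HTML):

```tsx
import React from 'react';
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';

// Hypothetical simplified AudienceBlock: render children only when
// the build-time audience is in the allowed list.
export default function AudienceBlock({
  audience,
  children,
}: {
  audience: string[];
  children: React.ReactNode;
}) {
  const { siteConfig } = useDocusaurusContext();
  const current = siteConfig.customFields?.audience as string;
  if (!audience.includes(current)) return null;
  return <>{children}</>;
}
```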
Cloudflare Access enforces external vs internal access to the built sites.
5. MCP Integration
The MCP layer integrates two types of information:
- Documentation manifests (what pages exist, and which profiles can see them).
- Structured registries (what engines, tables and signals exist).
5.1 Documentation manifests
Each Docusaurus build produces:
- `manifest.json`
- `manifest.json.sig`
- `manifest.json.sha256`
These manifest files describe:
- Page URIs
- Titles
- Mapping to profiles (`client`, `internal_full`, etc.)
- Hashes and signatures for integrity checking
MCP uses these manifests to:
- Know which pages exist for each audience.
- Restrict search and retrieval based on the active profile.
Example profile mapping:
- Internal API key: `allowedProfiles = ["internal_full", "client"]`
- Client API key: `allowedProfiles = ["client"]`
When MCP receives a query, it:
- Identifies the API key and allowed profiles.
- Limits documents to those profiles.
- Then uses those pages as RAG context.
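A sketch of what this profile enforcement could look like; names such as `API_KEY_PROFILES` and `pagesForRequest` are illustrative, not the actual MCP server API:

```ts
// Hypothetical sketch: map the caller's API key to allowed profiles,
// then restrict the manifest pages used as RAG context accordingly.

interface ManifestPage {
  uri: string;
  title: string;
  profiles: string[];           // e.g. ["client"] or ["internal_full"]
}

const API_KEY_PROFILES: Record<string, string[]> = {
  // illustrative keys only
  'key-internal': ['internal_full', 'client'],
  'key-client': ['client'],
};

function pagesForRequest(
  apiKey: string,
  manifest: ManifestPage[],
): ManifestPage[] {
  const allowed = API_KEY_PROFILES[apiKey] ?? [];
  return manifest.filter((page) =>
    page.profiles.some((p) => allowed.includes(p)),
  );
}
```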
5.2 Structured registries
MCP also reads the structured JSON registries:
- `docusaurus/config/system/registry.json`
- `docusaurus/config/system/signal_registry.json`
- `docusaurus/config/system/excel_files.json`
These allow structured queries such as:
- "Which engine uses table
zyz_mice_fume_input?" - "Which table contains column
trust_score_at_event?" - "List all columns in
altd_event." - "Where is the Excel file for table
zyz_mice_fume_result?"
The MCP server provides HTTP endpoints such as:
- `GET /signals`: filter by `name=`, `table=`, `q=`
- `GET /tables`: filter by `name=`, `q=`; returns engines + schema + Excel link
- `GET /engines`: filter by `id=`, `code=`, `q=`; returns engine + tables + schema + Excel links
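A sketch of one such endpoint, assuming an Express-based server like `mcp-server/src/server.ts` (the handler shape here is illustrative):

```ts
import express from 'express';
import signals from './config/system/signal_registry.json';

// Hypothetical sketch of GET /signals: filter the compiled signal
// registry by exact name, source table, or free-text query.
const app = express();

app.get('/signals', (req, res) => {
  const { name, table, q } = req.query as Record<string, string>;
  let rows = signals as any[];
  if (name) rows = rows.filter((s) => s.signal_name === name);
  if (table) rows = rows.filter((s) => s.source_table === table);
  if (q) {
    const needle = q.toLowerCase();
    rows = rows.filter((s) =>
      `${s.signal_name} ${s.signal_description}`.toLowerCase().includes(needle),
    );
  }
  res.json(rows);
});

app.listen(3000);
```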
This effectively turns the registries into a small knowledge graph for use by AI tools.
6. Build Pipeline Overview
The build pipeline (CI) executes the following steps:
- Convert engine/module registry:
  `config/system/registry.yaml` → `docusaurus/config/system/registry.json`
- Convert Excel file mapping:
  `config/system/excel_files.yaml` → `docusaurus/config/system/excel_files.json`
- Convert Signal Registry Excel:
  `excel/SignalRegistry.xlsx` → `docusaurus/config/system/signal_registry.json`
- Copy per-table Excel files:
  `excel/*.xlsx` → `docusaurus/static/excel/`
- Build Docusaurus for each audience:
  - `DOCS_AUDIENCE=client` → client portal
  - `DOCS_AUDIENCE=internal` → internal portal
  - `DOCS_AUDIENCE=developer` (optional) → developer portal
- Generate and sign manifests:
  `manifest.json`, `manifest.json.sig`, `manifest.json.sha256`
- Deploy builds to Cloudflare Pages, protected by Cloudflare Access policies.
- Generate architecture diagrams from JSON:
  `diagrams/zayaz-sssr-json-sequences.json` → `content/architecture/zayaz-sssr-json-registry-diagrams.mdx`, using `scripts/generate-diagram-docs.mjs` to convert a machine-readable sequence spec into Mermaid diagrams embedded in MDX.
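As an illustration of the Signal Registry conversion step, a sketch assuming the SheetJS (`xlsx`) package and a single-sheet workbook whose header row matches the signal fields used in this document (the actual script is `scripts/generate-signal-registry-json.js`):

```ts
import * as XLSX from 'xlsx';
import { writeFileSync } from 'node:fs';

// Hypothetical sketch of Excel → signal_registry.json.
const workbook = XLSX.readFile('excel/SignalRegistry.xlsx');
const sheet = workbook.Sheets[workbook.SheetNames[0]];

// Each row becomes one signal entry keyed by its header names,
// e.g. { signal_name, signal_type, signal_description, source_table }.
const rows = XLSX.utils.sheet_to_json(sheet);

writeFileSync(
  'docusaurus/config/system/signal_registry.json',
  JSON.stringify(rows, null, 2),
);
```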
The MCP server has a /refresh endpoint that reloads registries after a deployment.
7. Repository Structure
A simplified view of the relevant parts:
```
zayaz-docs/
│
├── config/
│   └── system/
│       ├── registry.yaml
│       ├── excel_files.yaml
│       └── signal_registry.json   (generated)
│
├── excel/
│   ├── zyz_mice_fume_input.xlsx
│   ├── zyz_mice_fume_result.xlsx
│   ├── altd_event.xlsx
│   └── SignalRegistry.xlsx
│
├── scripts/
│   ├── generate-registry-json.js
│   ├── generate-excel-files-json.js
│   └── generate-signal-registry-json.js
│
├── docusaurus/
│   ├── src/
│   │   └── components/
│   │       ├── NodeMeta.js
│   │       └── TableSignals.js
│   ├── config/
│   │   └── system/
│   │       ├── registry.json
│   │       ├── excel_files.json
│   │       └── signal_registry.json
│   └── static/
│       └── excel/
│
└── mcp-server/
    ├── src/server.ts
    ├── package.json
    └── tsconfig.json
```
8. Why This Architecture Works Well
- No manual duplication of schemas in MDX.
- Single source of truth:
  - Engines/modules and their tables → the engine/module registry.
  - Column schemas → the Signal Registry.
  - Data rows → per-table Excel workbooks.
- Multi-audience builds with real removal of non-relevant content.
- MCP-friendly: structured registries serve as a knowledge graph for AI.
- Easy to audit: signed manifests and explicit registries.
- Extensible: adding a new engine or table is a matter of updating the registries and Excel, not rewriting documentation by hand.
This makes the documentation ecosystem robust, maintainable and well suited for a complex ESG intelligence platform like EcoWorld ZAYAZ.
9. What docusaurus/static/schemas/ Is Meant For
It is for JSON Schema files — meaning:
Machine-readable, formal schemas used by:
- engines
- modules
- API contracts
- config validation
- signal validation
- data ingestion/formats
Think of these as:
✔ Schemas with `$schema`, `properties`, `required`, `type`, etc.
✔ Validation specifications (AJV, Zod import, etc.)
✔ Files developers can download and use directly in code.
Examples that belong here:
- `/schemas/mice-engine.schema.json`
- `/schemas/zyz-signal.schema.json`
- `/schemas/pef-run-request.schema.json`
- `/schemas/carbon-passport.schema.json`
These are not mere tables — they are formal programmatic structures.
Use /schemas when:
- You need a formal JSON Schema
- The structure is too complex for an Excel table
- Machines must read it
- Developers must validate against it
- Engines or modules consume it
⸻
Visual breakdown
If it describes a data shape for code → /docusaurus/static/schemas
- JSON Schema draft-07+
- OpenAPI fragments
- Validation schemas
- API request/response schemas
- Machine-friendly format definitions
If it documents relationships or metadata → /config/system
- lineage graphs
- table relationships
- module registry
- signal registry
- metadata files
⸻
A rule of thumb...
If AJV can validate it → /schemas.
If docs render it → /config/system.
⸻
Example files
Under `/docusaurus/static/schemas/`:
- `countries.json`
- `efdb.json`
- `nace.json`
- `units.json`
Here’s why they were created, and what each file would contain depending on which direction is chosen.
When we built the first version of:
- TableSignals
- TableRelations
- GraphExplorer
- Schema link resolver
…we discovered something:
👉 The docs refer to source files for reference dimensions
Like:
| Table | Source File |
|---|---|
| dim_countries | countries.json |
| dim_units | units.json |
| ref_efdb | efdb.json |
| dim_nace | nace.json |
We didn’t yet have JSON schemas for these dimensions, because in practice our master source is Excel.
To avoid broken links in:
/schemas/{file}
we created placeholder schemas so the docs would resolve links.
This prevented Docusaurus from rendering:
❌ broken /schemas/xxx.json links
❌ build-time errors
❌ warnings
❌ 404 pages
…while keeping the UI and metadata consistent.
⸻
1️⃣ Minimal Content
A schema that simply describes:
- the file exists
- the expected fields (but not enumerations)
⭐ Ideal for: quickly documenting formats
⭐ Very lightweight
⭐ Uses Excel as the real source of truth
Example: countries.json minimal schema
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Country Reference",
  "type": "object",
  "properties": {
    "iso_cc_id": { "type": "string" },
    "country_name": { "type": "string" },
    "region": { "type": "string" }
  },
  "required": ["iso_cc_id", "country_name"]
}
```
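A usage sketch: validating one record against this schema with AJV (v8 exposes draft 2020-12 support via the `ajv/dist/2020` entry point; the sample record is invented):

```ts
import Ajv2020 from 'ajv/dist/2020';
import countrySchema from './schemas/countries.json';

// Compile the minimal countries schema and validate one record.
const ajv = new Ajv2020();
const validate = ajv.compile(countrySchema);

const record = { iso_cc_id: '276', country_name: 'Germany', region: 'Europe' };
if (!validate(record)) {
  console.error(validate.errors);
}
```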
⸻
2️⃣ Full “enumerated dataset schema”
This version includes actual data lists, not just the structure.
⚠️ Only do this if the data is stable and not gigantic. For example, a few hundred countries is fine.
Example: units.json advanced schema
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Units of Measure",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "unit_code": { "type": "string" },
      "label": { "type": "string" },
      "description": { "type": "string" }
    },
    "required": ["unit_code", "label"]
  }
}
```
⸻
3️⃣ Machine-friendly “dictionary schema”
This approach uses the JSON Schema dictionary pattern:
```json
{
  "type": "object",
  "patternProperties": {
    "^[A-Z0-9_]+$": {
      "type": "object",
      "properties": {
        "label": { "type": "string" },
        "description": { "type": "string" }
      }
    }
  }
}
```
This is especially good for:
- EFDB emission factor IDs
- NACE codes
- Anything with a “code → metadata” mapping
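For illustration, an instance matching this pattern, written as a typed TypeScript constant (codes abbreviated; real NACE labels are longer):

```ts
// Each key is an uppercase code; each value carries its metadata.
const naceCodes: Record<string, { label: string; description?: string }> = {
  A01: { label: 'Crop and animal production' },
  C10: { label: 'Manufacture of food products' },
};
```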
⸻
✔ What each file should ideally represent
⸻
countries.json
A schema defining:
- ISO numeric code
- Name
- Region / subregion
- Possibly EU / OECD flags
⸻
efdb.json
A schema defining:
- emission factor ID
- category
- gas or pollutant
- default unit
- metadata fields
⸻
nace.json
A schema specifying:
- NACE code
- Activity description
- Level (1–4-digit)
⸻
units.json
A schema describing:
- unit_code
- label
- conversion rules (optional)
⸻
You have three choices: the three schema styles above (minimal, enumerated, dictionary).
Next step
Recommended: fully define them and auto-generate from Excel.
✔ Best long-term outcome
✔ Requires defining “canonical fields”
✔ Fits perfectly into the ZAYAZ architecture
✔ Enables validation and autocomplete for developers
⸻
10. SSSR JSON Registry & JSON Schemas
Beyond table-based signals and Excel-driven registries, ZAYAZ also depends on a growing set of structured JSON artifacts managed by the Smart Searchable Signal Registry (SSSR). These JSON artifacts represent:
- Complex rulesets (e.g. MICE engine configurations, validation matrices)
- XBRL/iXBRL taxonomy fragments
- Agent prompt templates (ZAAM / ZADIF)
- Policy and configuration blocks that are too complex for flat Excel
- Semantic mappings and ontology-driven configurations
10.1 Role of /docusaurus/static/schemas for SSSR JSON
The docs repo does not host the live SSSR JSON registry (that lives in the platform), but it does host the JSON Schemas that define the shape of those artifacts.
For SSSR-related JSON, the convention is:
- Formal JSON Schemas live in `/docusaurus/static/schemas/sssr-*.schema.json`
  - Example: `/docusaurus/static/schemas/sssr-json-registry.schema.json`
- These schemas describe things like:
  - How a “JSON artifact registry entry” must look (e.g. `json_id`, `category`, `version`, `status`, `checksum_sha256`, `uso_tags`, etc.)
  - How a “JSON prompt template” or “engine ruleset” is structured
  - Which fields are required for governance (e.g. `status`, `source_reference`, `security_class`)
Rule of thumb (aligned with section 9):
- If AJV/validators can check it → it belongs in `/docusaurus/static/schemas`
- If it’s metadata for docs/registries → it belongs in `/config/system` or the Excel-based registries
This keeps JSON machine-validated and aligned with the live SSSR registry in the platform.
10.2 How to add a new SSSR JSON type (developer workflow)
When we introduce a new SSSR-backed JSON artifact type (e.g. a new micro-engine config class or agent template family), the docs repo should be updated as follows:
1. Define the JSON Schema
   - Create a new schema under `/docusaurus/static/schemas`, for example:
     - `/docusaurus/static/schemas/sssr-json-artifact.schema.json`
     - `/docusaurus/static/schemas/sssr-agent-template.schema.json`
   - Follow the minimal pattern:

     ```json
     {
       "$schema": "https://json-schema.org/draft/2020-12/schema",
       "title": "SSSR JSON Artifact",
       "type": "object",
       "properties": {
         "json_id": { "type": "string" },
         "category": { "type": "string" },
         "version": { "type": "string" },
         "status": { "type": "string" },
         "data": { "type": "object" },
         "uso_tags": { "type": "array", "items": { "type": "string" } }
       },
       "required": ["json_id", "category", "version", "status", "data"]
     }
     ```

   - Keep the schema as generic as possible at the “registry” level, and define more specific schemas for specialized artifact types (e.g. a specific engine config).
2. Document how it is used
   - Add or update an MDX page (usually under `content/system/` or `content/architecture/`) that explains:
     - What this JSON artifact type represents
     - Where in SSSR / the platform it is used
     - Links to the relevant schema under `/schemas/...`
   - Example in MDX:
     See `/schemas/sssr-json-artifact.schema.json` for the formal machine-readable definition.
3. Wire into registry / MCP if needed
   - If MCP needs to query or validate these JSON artifacts directly:
     - Ensure the MCP server knows about the schema file path
     - Optionally expose an endpoint like `GET /json-artifacts/schema?id=...` for tooling, as sketched below
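   A sketch of such an endpoint, assuming an Express-based server and schema files named `<id>.schema.json` under the static schemas folder (both are assumptions, not the current server’s behavior):

   ```ts
   import express from 'express';
   import { readFile } from 'node:fs/promises';
   import path from 'node:path';

   // Hypothetical GET /json-artifacts/schema?id=...: resolve an
   // artifact type id to its schema file and return the JSON Schema.
   const app = express();
   const SCHEMA_DIR = 'docusaurus/static/schemas';

   app.get('/json-artifacts/schema', async (req, res) => {
     const id = String(req.query.id ?? '');
     // Allow only simple ids to avoid path traversal.
     if (!/^[a-z0-9-]+$/.test(id)) {
       return res.status(400).json({ error: 'invalid id' });
     }
     try {
       const file = path.join(SCHEMA_DIR, `${id}.schema.json`);
       const schema = JSON.parse(await readFile(file, 'utf8'));
       res.json(schema);
     } catch {
       res.status(404).json({ error: `no schema for id "${id}"` });
     }
   });

   app.listen(3000);
   ```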
4. Keep SSSR & docs in sync
   - When the live SSSR registry introduces a breaking change:
     - Bump the JSON Schema version in `/docusaurus/static/schemas`
     - Add a short change note in the relevant MDX page (what changed, why, and how to migrate)
This ensures the platform registry (SSSR) and the documentation registry (schemas + MDX) stay aligned, traceable, and safe for AI/automation consumers.
⸻
11. Mermaid & Auto-Generated Sequence Diagrams
ZAYAZ documentation also includes architecture and sequence diagrams that are generated automatically from a machine-readable JSON specification.
11.1 Source of Truth for Diagrams
The source for these diagrams is:
diagrams/zayaz-sssr-json-sequences.json
This file contains a list of diagrams, each with:
- `id`, `name`, `category`
- `participants` (id, label, type)
- `steps` (ordered message flow: `from`, `to`, `message`, `kind`, `sync`)
Example minimal structure:
```json
{
  "version": "1.0",
  "diagrams": [
    {
      "id": "json-registry-read",
      "name": "JSON Registry Read Flow (Engine Consumption)",
      "category": "registry_read",
      "participants": [
        { "id": "user", "label": "User", "type": "external" },
        { "id": "frontend", "label": "Frontend UI", "type": "system" }
      ],
      "steps": [
        {
          "index": 1,
          "from": "user",
          "to": "frontend",
          "message": "Request CSRD form generation",
          "kind": "request",
          "sync": true
        }
      ]
    }
  ]
}
```
Developers only modify this JSON file – never the generated MDX.
11.2 Generator Script → MDX with Mermaid
The conversion from JSON → rendered diagrams is handled by:
- scripts/generate-diagram-docs.mjs
This script:
- Reads `diagrams/zayaz-sssr-json-sequences.json`
- Creates Mermaid `sequenceDiagram` definitions for each diagram
- Writes a single MDX page: `content/architecture/zayaz-sssr-json-registry-diagrams.mdx`
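A condensed sketch of what the generator’s core loop could look like (the real script is `scripts/generate-diagram-docs.mjs`; arrow styles and page layout are simplified here):

```ts
import { readFileSync, writeFileSync } from 'node:fs';

// Turn each diagram entry from the JSON spec into a Mermaid
// sequenceDiagram block and concatenate them into one MDX page.
const spec = JSON.parse(
  readFileSync('diagrams/zayaz-sssr-json-sequences.json', 'utf8'),
);

const blocks = spec.diagrams.map((d: any) => {
  const participants = d.participants
    .map((p: any) => `  participant ${p.id} as ${p.label}`)
    .join('\n');
  const steps = d.steps
    .map((s: any) => `  ${s.from}${s.sync ? '->>' : '-->>'}${s.to}: ${s.message}`)
    .join('\n');
  return `## ${d.name}\n\n\`\`\`mermaid\nsequenceDiagram\n${participants}\n${steps}\n\`\`\``;
});

writeFileSync(
  'content/architecture/zayaz-sssr-json-registry-diagrams.mdx',
  blocks.join('\n\n'),
);
```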
The generated MDX contains fenced Mermaid code blocks; for the sample spec above, the output looks roughly like:
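```mermaid
sequenceDiagram
  %% Illustrative output reconstructed from the sample JSON spec above
  participant user as User
  participant frontend as Frontend UI
  user->>frontend: Request CSRD form generation
```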
Docusaurus is configured with:
- `themes: ['@docusaurus/theme-mermaid']`
- `markdown: { mermaid: true }`
in docusaurus/docusaurus.config.js, allowing Mermaid to render the diagrams directly.
11.3 CI Integration (GitHub Actions)
A dedicated GitHub Action keeps the diagrams in sync:
.github/workflows/generate-diagrams.yml
It:
- Runs on pushes (and optionally PRs)
- Executes `node scripts/generate-diagram-docs.mjs`
- Writes/updates `content/architecture/zayaz-sssr-json-registry-diagrams.mdx`
- Optionally commits the changed MDX back to the repo (so Cloudflare sees the updated docs)
This guarantees that:
- Editing one JSON file (diagrams/zayaz-sssr-json-sequences.json)
- Automatically updates the corresponding documentation diagrams
- Keeps the architecture docs executable and version-controlled
11.4 How to add or update a diagram
To add a new sequence diagram:
- Edit `diagrams/zayaz-sssr-json-sequences.json`:
  - Add a new `diagrams[]` entry with participants and steps.
  - Keep `id` stable and unique.
- Run the generator locally: `node scripts/generate-diagram-docs.mjs`
- Open `content/architecture/zayaz-sssr-json-registry-diagrams.mdx` and confirm your new diagram has appeared as a Mermaid block.
- Commit both:
  - `diagrams/zayaz-sssr-json-sequences.json`
  - `content/architecture/zayaz-sssr-json-registry-diagrams.mdx`
- Push to GitHub and let Cloudflare build/deploy.
This pattern keeps architecture diagrams as code – fully aligned with the rest of the ZAYAZ registry and documentation strategy.
⸻
12. GPT Search – Minimal Node + TypeScript Indexer
Environment variables summary
These can be overridden per environment (CI, local, Cloudflare worker, etc.):

- `ZAYAZ_DOCS_ROOT`: root of the monorepo. Default: four levels up from `zayaz-search-indexer/src`.
- `ZAYAZ_CONTENT_DIR`: where MDX content lives. Default: `${ZAYAZ_DOCS_ROOT}/content`
- `ZAYAZ_CONFIG_DIR`: where `config/system` lives. Default: `${ZAYAZ_DOCS_ROOT}/config`
- `ZAYAZ_EXCEL_DIR`: where Excel source files live. Default: `${ZAYAZ_DOCS_ROOT}/excel`
- `ZAYAZ_INDEX_OUT`: output JSONL path. Default: `${ZAYAZ_DOCS_ROOT}/generated/search-index.jsonl`
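A sketch of how the indexer could resolve these defaults (path layout assumed from the defaults above; `__dirname` assumes a CommonJS build):

```ts
import path from 'node:path';

// Every value falls back to a default derived from ZAYAZ_DOCS_ROOT.
const DOCS_ROOT =
  process.env.ZAYAZ_DOCS_ROOT ?? path.resolve(__dirname, '../../../..');

export const paths = {
  contentDir: process.env.ZAYAZ_CONTENT_DIR ?? path.join(DOCS_ROOT, 'content'),
  configDir: process.env.ZAYAZ_CONFIG_DIR ?? path.join(DOCS_ROOT, 'config'),
  excelDir: process.env.ZAYAZ_EXCEL_DIR ?? path.join(DOCS_ROOT, 'excel'),
  indexOut:
    process.env.ZAYAZ_INDEX_OUT ??
    path.join(DOCS_ROOT, 'generated', 'search-index.jsonl'),
};
```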
⸻
13. Config: add searchApiBase to Docusaurus
In docusaurus/docusaurus.config.js, we have added a custom field:
```js
// docusaurus/docusaurus.config.js
/** @type {import('@docusaurus/types').Config} */
const config = {
  // ...existing config...
  customFields: {
    // keep whatever you already have here
    audience: process.env.DOCS_AUDIENCE || 'internal',
    // 👇 this:
    searchApiBase:
      process.env.ZAYAZ_SEARCH_API_BASE ||
      'http://127.0.0.1:8787', // dev default; override in prod
  },
  // ...rest of config...
};

module.exports = config;
```
Later, for Cloudflare / prod, we set ZAYAZ_SEARCH_API_BASE to the public API URL (search.zayaz.io).
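A sketch of a consumer component reading `searchApiBase` via `useDocusaurusContext` (the `/search?q=` endpoint shape and result fields are assumptions, not the actual search API contract):

```tsx
import React, { useState } from 'react';
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';

// Hypothetical search box: read searchApiBase from customFields
// and query the search API on each keystroke.
export default function SearchBox() {
  const { siteConfig } = useDocusaurusContext();
  const apiBase = siteConfig.customFields?.searchApiBase as string;
  const [results, setResults] = useState<any[]>([]);

  async function search(q: string) {
    const res = await fetch(`${apiBase}/search?q=${encodeURIComponent(q)}`);
    setResults(await res.json());
  }

  return (
    <div>
      <input onChange={(e) => search(e.target.value)} placeholder="Search…" />
      <ul>
        {results.map((r, i) => (
          <li key={i}>{r.title ?? JSON.stringify(r)}</li>
        ))}
      </ul>
    </div>
  );
}
```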
Search
See `/workspaces/zayaz-docs/code/infrastructure/zayaz-search-indexer` for all indexing files, e.g. `/zayaz-search-indexer/src/ingest-mdx.ts`.
Rebuild the index
```bash
cd /workspaces/zayaz-docs/code/infrastructure/zayaz-search-indexer
npm run build
npm run index
```
You should see something like:
```
[zayaz-indexer] Writing JSONL to: /workspaces/zayaz-docs/code/infrastructure/zayaz-search-indexer/data/index.jsonl
```
Then redeploy / restart the search API / worker:

```bash
cd /workspaces/zayaz-docs/code/infrastructure/zayaz-search-api
npm start
```