ASK-ZARA-INDEX

Ask ZARA Unified Index Record Schema + AWS Table Adapter Rules

1. Purpose

This specification defines the unified indexing contract for Ask ZARA across documentation, code, Excel workbooks, JSON files, AWS-hosted production databases, and structured registry tables.

Ask ZARA must be able to answer questions across a highly complex ZAYAZ architecture, including:

  • Docusaurus MDX documentation
  • developer specs
  • code files
  • Excel reference tables
  • JSON registry exports
  • AWS-hosted ZAR registry tables
  • ESGX, opcode, route binding, grant, institution, manifest, and ruleset metadata
  • future domain databases and production-grade reference stores

The goal is to create a scalable ingestion architecture where many source-specific exporters produce records in one common format, allowing Ask ZARA to ingest and retrieve across all sources coherently.


2. Core principle

Ask ZARA should be built around:

many curated source adapters
→ one normalized record schema
→ one unified retrieval/indexing system

The system should not rely on one monolithic index file, nor should it blindly dump raw tables into search. Each source adapter is responsible for turning source data into safe, meaningful, human-readable knowledge records.


3. Source export strategy

Use one export file per major knowledge domain or database family, not one file per table and not one global file for everything.

Recommended examples:

docs_index.jsonl
code_index.jsonl
excel_index.jsonl
json_config_index.jsonl
zar_registry_index.jsonl
zar_esgx_index.jsonl
computation_hub_index.jsonl
reports_hub_index.jsonl
input_hub_index.jsonl
mice_index.jsonl

3.2 Rationale

This allows each source family to have its own:

  • safety filters
  • export schedule
  • freshness policy
  • ownership model
  • transformation logic
  • visibility rules
  • truth ranking
  • testing pipeline

while still producing records that Ask ZARA can ingest into one unified retrieval fabric.


4. Unified record schema

Every indexed item should be emitted as one JSONL record using the same normalized schema.

4.1 Canonical record shape

canonical-record-shape.ts
export type AskZaraIndexRecord = {
  id: string;
  slug: string;
  title: string;
  heading?: string | null;
  content: string;
  sourceType: string;

  metadata: {
    kind: string;
    source_system: string;
    source_domain?: string | null;
    source_database?: string | null;
    source_schema?: string | null;
    source_table?: string | null;
    source_file?: string | null;

    entity_type?: string | null;
    entity_id?: string | null;
    entity_name?: string | null;

    status?: string | null;
    lifecycle_status?: string | null;
    version?: string | null;
    updated_at?: string | null;
    created_at?: string | null;

    visibility_tier: 'public' | 'internal' | 'admin' | 'restricted' | 'tenant_scoped' | 'institution_scoped';
    truth_rank: number;
    source_priority?: number | null;
    freshness_rank?: number | null;

    owner_team?: string | null;
    steward?: string | null;
    approved_by?: string | null;
    approved_at?: string | null;

    tags?: string[];
    related_entities?: Array<{
      relation: string;
      entity_type: string;
      entity_id: string;
    }>;

    [key: string]: unknown;
  };
};
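
As a rough illustration of emitting records in this shape, the sketch below serializes records to JSONL (one JSON object per line) and reads them back. The `MinimalRecord` type is a trimmed, hypothetical stand-in for `AskZaraIndexRecord`, and the `toJsonl` and `fromJsonl` helpers are assumptions for illustration, not part of the specification.

```typescript
// Trimmed, hypothetical stand-in for AskZaraIndexRecord (illustration only).
export type MinimalRecord = {
  id: string;
  slug: string;
  title: string;
  content: string;
  sourceType: string;
  metadata: {
    kind: string;
    source_system: string;
    visibility_tier: string;
    truth_rank: number;
    [key: string]: unknown;
  };
};

// Serialize records as JSONL: one compact JSON object per line.
// JSON.stringify escapes embedded newlines, so each record stays on one line.
export const toJsonl = (records: MinimalRecord[]): string =>
  records.map((record) => JSON.stringify(record)).join('\n') + (records.length > 0 ? '\n' : '');

// Parse JSONL back into records, skipping blank lines.
export const fromJsonl = (text: string): MinimalRecord[] =>
  text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as MinimalRecord);
```

Because newlines inside `content` are escaped during serialization, multi-line body text does not break the one-record-per-line layout.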

4.2 Required top-level fields

| Field | Required | Description |
| --- | --- | --- |
| id | Yes | Globally stable record identifier. |
| slug | Yes | Internal URL/path or virtual path for the record. |
| title | Yes | Human-readable title. |
| content | Yes | Searchable body text. |
| sourceType | Yes | Broad source class, such as mdx, code, excel, json, aws_table. |
| metadata | Yes | Structured metadata for filtering, ranking, provenance, and governance. |
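
The required fields above could be checked at ingestion time with something like the following sketch. The `findMissingFields` helper is hypothetical; it only checks presence and basic type, not the full metadata contract.

```typescript
// Required top-level string fields, taken from the table above.
const REQUIRED_STRING_FIELDS = ['id', 'slug', 'title', 'content', 'sourceType'] as const;

// Returns a list of human-readable problems; an empty list means the record passes.
export const findMissingFields = (candidate: Record<string, unknown>): string[] => {
  const problems: string[] = [];
  for (const field of REQUIRED_STRING_FIELDS) {
    const value = candidate[field];
    if (typeof value !== 'string' || value.trim() === '') {
      problems.push(`missing or empty field: ${field}`);
    }
  }
  // metadata must be a plain object, not null and not an array.
  const metadata = candidate['metadata'];
  if (typeof metadata !== 'object' || metadata === null || Array.isArray(metadata)) {
    problems.push('missing or invalid field: metadata');
  }
  return problems;
};
```

Returning a list of problems rather than throwing on the first one lets an exporter report every defect in a record in a single pass.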

5. sourceType values

Use controlled values where possible.

mdx
code
excel
json
aws_table
aws_view
aws_registry
structured_entity
generated_summary
runtime_summary

Guidance

  • mdx = Docusaurus/manual/spec pages
  • code = source code files
  • excel = workbook/sheet-derived records
  • json = static JSON config/schema/registry exports
  • aws_table = records derived directly from AWS-hosted production tables
  • aws_view = records derived from curated SQL views
  • aws_registry = canonical registry records from tables such as ZAR
  • structured_entity = synthesized records representing an entity assembled from multiple tables
  • generated_summary = AI- or deterministic-generated summary records
  • runtime_summary = summarized operational data, not raw logs
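
One way to keep these values controlled in code is a literal union with a runtime guard, sketched below. The names `SOURCE_TYPES`, `SourceType`, and `isSourceType` are assumptions for illustration.

```typescript
// Controlled sourceType values, copied from the list above.
export const SOURCE_TYPES = [
  'mdx',
  'code',
  'excel',
  'json',
  'aws_table',
  'aws_view',
  'aws_registry',
  'structured_entity',
  'generated_summary',
  'runtime_summary',
] as const;

// Union of the literal values: 'mdx' | 'code' | ...
export type SourceType = (typeof SOURCE_TYPES)[number];

// Runtime guard for values read from JSONL, where the compiler cannot help.
export const isSourceType = (value: string): value is SourceType =>
  (SOURCE_TYPES as readonly string[]).indexOf(value) !== -1;
```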

6. Metadata standards

6.1 Core provenance metadata

All records should include:

core-provenance-metadata.json
{
  "kind": "opcode",
  "source_system": "aws",
  "source_domain": "zar_registry",
  "source_database": "zar",
  "source_schema": "zar",
  "source_table": "opcode_registry",
  "entity_type": "opcode",
  "entity_id": "OP_1410"
}

6.2 Visibility metadata

Use visibility_tier to prepare for future access-aware retrieval.

Recommended values:

| Value | Meaning |
| --- | --- |
| public | Safe for public or external-facing answers. |
| internal | Internal ZAYAZ/Viroway knowledge. |
| admin | Admin or privileged operational knowledge. |
| restricted | Sensitive but generally non-tenant-specific. |
| tenant_scoped | Should only be retrieved within a tenant/client context. |
| institution_scoped | Should only be retrieved within a bank/institution context. |
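
A query-time filter built on these tiers might look like the sketch below. It is only an assumption about how enforcement could work: the `CallerContext` shape and the `tenant_id` metadata field are hypothetical (the schema above does not define a tenant identifier), while `institution_id` follows the access grant metadata example in this document.

```typescript
type VisibilityTier =
  | 'public'
  | 'internal'
  | 'admin'
  | 'restricted'
  | 'tenant_scoped'
  | 'institution_scoped';

type RecordVisibility = {
  visibility_tier: VisibilityTier;
  tenant_id?: string | null; // hypothetical field, not defined in the schema
  institution_id?: string | null;
};

// Hypothetical description of who is asking.
type CallerContext = {
  allowedTiers: VisibilityTier[]; // tiers this caller may see at all
  tenantId?: string;
  institutionId?: string;
};

export const canRetrieve = (record: RecordVisibility, caller: CallerContext): boolean => {
  if (caller.allowedTiers.indexOf(record.visibility_tier) === -1) return false;
  // Scoped tiers additionally require a matching scope identifier.
  if (record.visibility_tier === 'tenant_scoped') {
    return caller.tenantId !== undefined && record.tenant_id === caller.tenantId;
  }
  if (record.visibility_tier === 'institution_scoped') {
    return caller.institutionId !== undefined && record.institution_id === caller.institutionId;
  }
  return true;
};
```

The sketch fails closed: a scoped record with no matching caller scope is never returned.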

6.3 Truth rank

truth_rank should express source authority on a scale from 0 to 100, where higher values indicate more authoritative sources.

Suggested guidance:

| Source type | Suggested truth rank |
| --- | --- |
| Active canonical registry rows | 95–100 |
| Approved manifest records | 90–98 |
| Approved route/opcode/ruleset bindings | 90–98 |
| Curated SQL views | 85–95 |
| Current Docusaurus specs | 75–90 |
| Static JSON config | 70–90 |
| Excel reference sheets | 60–85 |
| Code comments/source files | 50–80 |
| Historical docs/drafts | 30–70 |
| Runtime summaries | 50–90 depending on freshness and curation |

Ask ZARA should usually prefer higher-truth records when sources conflict.


7. Content construction rules

7.1 Human-readable body

Every structured table record must include a human-readable content field. Do not rely on metadata alone.

Good example:

Opcode OP_1410: Pull From External System.
Operation key: pull_from_external_system.
Category: integration.
Intent class: stateful_execution.
Description: Pulls data from an approved external system through governed routing.
Status: active. Lifecycle: active.

Bad example:

bad-example-human-readable-body.json
{"opcode":"OP_1410","operation_key":"pull_from_external_system"}

7.2 Content should include synonyms and domain language

Where useful, include alternate terms that users may search for.

Example for ESGX:

ESGX is also known as ESG Exchange Code, external governed access code, and partner access reference.

7.3 Avoid full raw dumps

Do not dump full rows with every field unless all fields are safe and useful. Prefer curated summaries and selected metadata.
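
The content rules above can be sketched as a small template function that turns selected columns into readable sentences. The `OpcodeRow` shape and its field names are assumptions inferred from the good example in 7.1, not a confirmed table schema.

```typescript
// Hypothetical curated row: only fields that are safe and useful to index.
type OpcodeRow = {
  opcode: string;
  title: string;
  operation_key: string;
  category: string;
  intent_class: string;
  description: string;
  status: string;
  lifecycle_status: string;
};

// Ensure a free-text field ends with sentence punctuation.
const endWithPeriod = (text: string): string => {
  const trimmed = text.trim();
  return /[.!?]$/.test(trimmed) ? trimmed : `${trimmed}.`;
};

// Build the human-readable body from selected fields.
export const buildOpcodeContent = (row: OpcodeRow): string =>
  [
    `Opcode ${row.opcode}: ${row.title}.`,
    `Operation key: ${row.operation_key}.`,
    `Category: ${row.category}.`,
    `Intent class: ${row.intent_class}.`,
    `Description: ${endWithPeriod(row.description)}`,
    `Status: ${row.status}. Lifecycle: ${row.lifecycle_status}.`,
  ].join(' ');
```

Listing each field explicitly in the template means a newly added table column cannot reach the index until someone deliberately adds it.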


8. AWS table adapter architecture

src/
  ingest-mdx.ts
  ingest-code.ts
  ingest-excel.ts
  ingest-json.ts

  ingest-aws/
    index.ts
    db.ts
    types.ts
    adapters/
      zar-opcodes.ts
      zar-esgx-bindings.ts
      zar-api-route-bindings.ts
      zar-financial-institutions.ts
      zar-access-grants.ts
      zar-manifests.ts
      zar-rulesets.ts

8.2 Adapter responsibilities

Each adapter must define:

  • source tables or views
  • included columns
  • excluded sensitive columns
  • transformation logic
  • record ID pattern
  • slug pattern
  • content template
  • metadata mapping
  • visibility tier
  • truth rank
  • owner/steward if known

9. AWS table adapter rules

Rule 1 — Curate, do not dump

Adapters must export meaningful knowledge records, not raw table copies.

Rule 2 — Prefer canonical views for complex entities

When an entity requires joins across multiple tables, create a SQL view or adapter-level join.

Examples:

  • ESGX binding + opcode + route binding summary
  • access grant + institution + ESGX binding summary
  • manifest + CMI + supported opcodes summary

Rule 3 — Use stable IDs

Record IDs must be stable across reindexing.

Examples:

zar:opcode:OP_1410
zar:esgx:ESGX_4jbj6c76_1410
zar:api_binding:api.reports.quarterly.pull.post.v1
zar:financial_institution:nordbank
zar:access_grant:42
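
IDs like these can be derived from a namespace, an entity type, and the row's natural key, as in the sketch below. The `buildRecordId` helper is hypothetical; the point is that the ID depends only on values that survive reindexing, never on row order or export time.

```typescript
// Build "<namespace>:<entity_type>:<natural key>".
export const buildRecordId = (
  namespace: string,
  entityType: string,
  naturalKey: string | number,
): string => {
  const key = String(naturalKey).trim();
  if (key === '') {
    throw new Error(`empty natural key for ${namespace}:${entityType}`);
  }
  if (/\s/.test(key)) {
    throw new Error(`natural key must not contain whitespace: "${key}"`);
  }
  return `${namespace}:${entityType}:${key}`;
};
```

For example, `buildRecordId('zar', 'opcode', 'OP_1410')` yields `zar:opcode:OP_1410` on every run.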

Rule 4 — Include lifecycle metadata

Every production table record should include:

  • status
  • lifecycle_status
  • version if available
  • updated_at if available

Rule 5 — Do not index secrets

Never index:

  • tokens
  • credentials
  • raw auth configuration
  • API keys
  • private secrets
  • encrypted secret payloads
  • MFA/session data
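
A minimal sketch of enforcing this rule inside an adapter is shown below: columns are copied only from an explicit allow-list, and a name-based deny pattern acts as a second guard. The pattern and the `pickSafeColumns` helper are assumptions; a real adapter would maintain its own reviewed lists.

```typescript
// Hypothetical deny pattern for column names that suggest secret material.
const SENSITIVE_NAME_PATTERN = /(token|secret|password|credential|api_?key|auth_config|mfa|session)/i;

// Copy only allow-listed columns; refuse allow-lists that name a sensitive column.
export const pickSafeColumns = (
  row: Record<string, unknown>,
  includedColumns: readonly string[],
): Record<string, unknown> => {
  const safe: Record<string, unknown> = {};
  for (const column of includedColumns) {
    if (SENSITIVE_NAME_PATTERN.test(column)) {
      throw new Error(`column "${column}" looks sensitive and must not be indexed`);
    }
    if (Object.prototype.hasOwnProperty.call(row, column)) {
      safe[column] = row[column];
    }
  }
  return safe;
};
```

An allow-list fails safe: a sensitive column added to the table later is simply never copied, whereas a deny-list alone would export it until someone noticed.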

Rule 6 — Minimize client-sensitive data

For client-specific records, index only the minimum safe operational summary unless the record is explicitly intended for tenant-scoped retrieval.

Rule 7 — Use visibility tiers now, enforce later if needed

Even if query-time visibility filtering is not complete in v1, every record should carry visibility_tier from the start.

Rule 8 — Prefer summaries for high-volume operational tables

Do not index raw event logs or high-volume execution rows. Use daily, weekly, or entity-level summaries.
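
As an illustration of summarizing instead of indexing raw rows, the sketch below rolls execution events up into one summary per opcode per day. The `ExecutionEvent` shape is hypothetical, and the code assumes `occurredAt` is an ISO 8601 UTC timestamp so that its first ten characters are the calendar day.

```typescript
// Hypothetical raw event row from a high-volume execution table.
type ExecutionEvent = { opcode: string; occurredAt: string; succeeded: boolean };

type DailySummary = { opcode: string; day: string; total: number; failures: number };

export const summarizeByDay = (events: ExecutionEvent[]): DailySummary[] => {
  const summaries = new Map<string, DailySummary>();
  for (const event of events) {
    const day = event.occurredAt.slice(0, 10); // "YYYY-MM-DD"
    const key = `${day}|${event.opcode}`;
    let summary = summaries.get(key);
    if (summary === undefined) {
      summary = { opcode: event.opcode, day, total: 0, failures: 0 };
      summaries.set(key, summary);
    }
    summary.total += 1;
    if (!event.succeeded) summary.failures += 1;
  }
  // Sort by day, then opcode, so output order does not depend on input order.
  return Array.from(summaries.values()).sort((a, b) => {
    const left = `${a.day}|${a.opcode}`;
    const right = `${b.day}|${b.opcode}`;
    return left < right ? -1 : left > right ? 1 : 0;
  });
};
```

Each summary can then be turned into a single `runtime_summary` record, which keeps the index size proportional to entities and days rather than to event volume.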

Rule 9 — Make source ownership explicit

Include owner_team or steward where possible.

Rule 10 — Record generation should be deterministic

Given the same source state, the adapter should emit the same record IDs and the same content on every run. Content should change only when the underlying source data changes.
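
One way to make export output reproducible is to sort records by ID and serialize object keys in a fixed order, as sketched below. The `toDeterministicJsonl` helper is an assumption for illustration; it does not address nondeterminism in the source queries themselves.

```typescript
// Recursively rebuild a JSON value with object keys in sorted order.
const sortKeys = (value: unknown): unknown => {
  if (Array.isArray(value)) return value.map((item) => sortKeys(item));
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeys(source[key]);
    }
    return sorted;
  }
  return value;
};

// Emit records ordered by id, each with a stable key order.
export const toDeterministicJsonl = (
  records: Array<{ id: string } & Record<string, unknown>>,
): string =>
  records
    .slice() // do not mutate the caller's array
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((record) => JSON.stringify(sortKeys(record)))
    .join('\n');
```

With this in place, two exports of the same source state can be compared byte for byte, which makes a "stable reindex output" test straightforward.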


10. Domain adapter examples

10.1 Opcode registry adapter

Source table

zar.opcode_registry

Record granularity

One record per opcode.

Record ID

zar:opcode:<opcode>

Example

example-opcode-registry-adapter.json
{
  "id": "zar:opcode:OP_1410",
  "slug": "/zar/opcodes/OP_1410",
  "title": "Opcode OP_1410: Pull From External System",
  "heading": "OP_1410",
  "content": "Opcode OP_1410: Pull From External System. Operation key: pull_from_external_system. Category: integration. Intent class: stateful_execution. Description: Pulls data from an approved external system through governed routing. Status: active. Lifecycle: active.",
  "sourceType": "aws_registry",
  "metadata": {
    "kind": "opcode",
    "source_system": "aws",
    "source_domain": "zar_registry",
    "source_database": "zar",
    "source_schema": "zar",
    "source_table": "opcode_registry",
    "entity_type": "opcode",
    "entity_id": "OP_1410",
    "opcode": "OP_1410",
    "operation_key": "pull_from_external_system",
    "status": "active",
    "lifecycle_status": "active",
    "visibility_tier": "internal",
    "truth_rank": 98
  }
}

10.2 ESGX binding adapter

Source table

zar.external_opcode_binding_registry

Record granularity

One record per ESGX binding.

Record ID

zar:esgx:<esgx_code>

Content should include

  • ESGX code
  • binding name
  • semantic opcode
  • operation key
  • route binding if any
  • artifact type
  • output profiles
  • format negotiation setting
  • lifecycle status

10.3 Access grant adapter

Source table

zar.external_opcode_access_grant

Record granularity

One record per grant, or one summarized record per client-bank-ESGX relationship.

Safety rule

Access grant records should usually be tenant_scoped or restricted, not broadly internal, because they can reveal client-bank relationships.

Content example

Access grant allows institution NordBank to use ESGX_4jbj6c76_1410 for client_bigcorp under covenant_monitoring purpose. Access mode: rolling. Allowed output profiles: json, pdf, signed_package. Status: active.

Metadata example

example-metadata-access-grant-adapter.json
{
  "kind": "external_opcode_access_grant",
  "visibility_tier": "tenant_scoped",
  "truth_rank": 95,
  "institution_id": "nordbank",
  "grant_scope_ref": "client_bigcorp",
  "permitted_use_code": "covenant_monitoring"
}

10.4 Financial institution adapter

Source table

zar.financial_institution_registry

Record granularity

One record per approved institution.

Safety rule

Institution records may be internal or eventually public, depending on product policy. If the institution directory is customer-visible, only export approved public-safe fields.


10.5 API route binding adapter

Source table

zar.api_route_binding_registry

Record granularity

One record per API route binding.

Content should include

  • route template
  • method
  • operation key
  • opcode
  • artifact type
  • target resolution mode
  • request/response mapping profiles
  • ZSSR requirement
  • status/lifecycle

11. Recommended adapter interface

recommended-adapter-interface.ts
export type RawDocument = Record<string, unknown>;

export type QueryResult<Row extends Record<string, unknown> = Record<string, unknown>> = {
  rows: Row[];
};

export type Queryable = {
  query: <Row extends Record<string, unknown> = Record<string, unknown>>(
    sql: string,
    params?: readonly unknown[],
  ) => Promise<QueryResult<Row>>;
};

export type AwsTableAdapter<Document extends RawDocument = RawDocument> = {
  name: string;
  sourceDomain: string;
  sourceDatabase: string;
  exportDocuments: (db: Queryable) => Promise<Document[]>;
};

Example:

example-recommended-adapter-interface.ts
/**
 * AWS adapter + implementation
 */

export type RawDocument = Record<string, unknown>;

export type QueryResult<Row extends Record<string, unknown> = Record<string, unknown>> = {
  rows: Row[];
};

export type Queryable = {
  query: <Row extends Record<string, unknown> = Record<string, unknown>>(
    sql: string,
    params?: readonly unknown[],
  ) => Promise<QueryResult<Row>>;
};

export type AwsTableAdapter<Document extends RawDocument = RawDocument> = {
  name: string;
  sourceDomain: string;
  sourceDatabase: string;
  exportDocuments: (db: Queryable) => Promise<Document[]>;
};

/**
 * Implementation-specific function (must be provided or replaced).
 * Selects an explicit column list from zar.opcode_registry instead of
 * SELECT *, in line with Rule 1 (curate, do not dump). The column names
 * are assumed from the opcode example in 10.1; adjust to the real schema.
 * ORDER BY keeps the export order stable across runs (Rule 10).
 */
export const ingestOpcodes = async (db: Queryable): Promise<RawDocument[]> => {
  const result = await db.query(
    'SELECT opcode, operation_key, status, lifecycle_status FROM zar.opcode_registry ORDER BY opcode',
  );
  return result.rows;
};

/**
 * Adapter instance
 */
export const zarOpcodeAdapter: AwsTableAdapter = {
  name: 'zar-opcodes',
  sourceDomain: 'zar_registry',
  sourceDatabase: 'zar',
  exportDocuments: ingestOpcodes,
};
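
The example above returns query rows as they come back from the database. To produce unified index records, an adapter would also map each row into the record shape from section 4. The sketch below shows one possible mapping for opcodes; the column names (including `title`), the trimmed `IndexRecord` type, and the fixed visibility and truth rank values are assumptions based on the example in 10.1.

```typescript
type Row = Record<string, unknown>;
type Queryable = {
  query: (sql: string, params?: readonly unknown[]) => Promise<{ rows: Row[] }>;
};

// Trimmed, hypothetical version of the unified record shape.
type IndexRecord = {
  id: string;
  slug: string;
  title: string;
  content: string;
  sourceType: string;
  metadata: Record<string, unknown>;
};

// Explicit column list; column names are assumed, not confirmed.
const OPCODE_SQL =
  'SELECT opcode, title, operation_key, status, lifecycle_status FROM zar.opcode_registry ORDER BY opcode';

const toOpcodeRecord = (row: Row): IndexRecord => {
  const opcode = String(row['opcode']);
  const title = String(row['title']);
  const operationKey = String(row['operation_key']);
  const status = String(row['status']);
  const lifecycle = String(row['lifecycle_status']);
  return {
    id: `zar:opcode:${opcode}`,
    slug: `/zar/opcodes/${opcode}`,
    title: `Opcode ${opcode}: ${title}`,
    content: `Opcode ${opcode}: ${title}. Operation key: ${operationKey}. Status: ${status}. Lifecycle: ${lifecycle}.`,
    sourceType: 'aws_registry',
    metadata: {
      kind: 'opcode',
      source_system: 'aws',
      source_domain: 'zar_registry',
      source_table: 'opcode_registry',
      entity_type: 'opcode',
      entity_id: opcode,
      status,
      lifecycle_status: lifecycle,
      visibility_tier: 'internal',
      truth_rank: 98,
    },
  };
};

export const exportOpcodeRecords = async (db: Queryable): Promise<IndexRecord[]> => {
  const result = await db.query(OPCODE_SQL);
  return result.rows.map(toOpcodeRecord);
};
```

Because `Queryable` is a narrow interface, the mapping can be exercised in tests with an in-memory stub instead of a live database connection.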

12. Export file naming convention

Recommended pattern:

<source_domain>_index.jsonl

Examples:

zar_registry_index.jsonl
zar_esgx_index.jsonl
comp_hub_index.jsonl
reports_hub_index.jsonl

If source exports become large, split further:

zar_opcodes_index.jsonl
zar_esgx_index.jsonl
zar_manifests_index.jsonl
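
The naming pattern can be captured in a small helper so that every exporter derives its file name the same way. This is a sketch; the `exportFileName` function and its normalization rules (lowercase, non-alphanumerics collapsed to underscores) are assumptions.

```typescript
// Build "<source_domain>_index.jsonl" from a source domain label.
export const exportFileName = (sourceDomain: string): string => {
  const normalized = sourceDomain
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // collapse spaces, dashes, dots, etc.
    .replace(/^_+|_+$/g, ''); // strip leading and trailing underscores
  if (normalized === '') {
    throw new Error('source domain must contain at least one letter or digit');
  }
  return `${normalized}_index.jsonl`;
};
```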

13. Safety classification rules

Public-safe

Examples:

  • published docs
  • public API descriptions
  • public glossary items

Internal

Examples:

  • opcode definitions
  • route binding summaries
  • architecture records

Restricted

Examples:

  • resolver internals
  • policy-sensitive operational configuration

Tenant scoped

Examples:

  • client-specific ESGX grants
  • client-specific access relationships

Institution scoped

Examples:

  • bank-specific operational access summaries
  • institution-specific integration settings

14. Freshness and scheduling rules

Suggested schedules

| Source | Suggested refresh |
| --- | --- |
| Docusaurus docs | on commit / deploy |
| Code files | on commit / deploy |
| Excel/JSON configs | on commit / deploy |
| ZAR registry tables | every 15–60 minutes or on registry event |
| ESGX grants | event-driven or frequent scheduled refresh |
| Operational summaries | daily/hourly depending on use case |

Future event-driven model

Eventually, registry changes should emit index update events:

registry row changed
→ EventBridge event
→ export adapter refreshes affected entity
→ Ask ZARA index update
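
The flow above could be sketched as a pure function that turns a change event into a targeted index update. The envelope fields `source`, `detail-type`, and `detail` follow the standard EventBridge event structure; everything inside `detail`, the table-to-prefix mapping, and the `planIndexUpdate` name are hypothetical.

```typescript
// Subset of the EventBridge envelope; the "detail" payload is hypothetical.
type RegistryChangeEvent = {
  source: string;
  'detail-type': string;
  detail: {
    table: string;
    entityId: string;
    change: 'insert' | 'update' | 'delete';
  };
};

type IndexUpdate = { action: 'upsert' | 'delete'; recordId: string };

// Hypothetical mapping from source table to record ID prefix.
const ID_PREFIX_BY_TABLE: { [table: string]: string } = {
  'zar.opcode_registry': 'zar:opcode',
  'zar.external_opcode_binding_registry': 'zar:esgx',
  'zar.financial_institution_registry': 'zar:financial_institution',
};

// Decide which single record to refresh or remove; null means "not indexed".
export const planIndexUpdate = (event: RegistryChangeEvent): IndexUpdate | null => {
  const prefix = ID_PREFIX_BY_TABLE[event.detail.table];
  if (prefix === undefined) return null;
  const recordId = `${prefix}:${event.detail.entityId}`;
  return event.detail.change === 'delete'
    ? { action: 'delete', recordId }
    : { action: 'upsert', recordId };
};
```

Because record IDs are stable, an event only needs to carry the table and entity key for the affected record to be located and refreshed without a full re-export.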

15. Merge and ingestion model

Source adapters produce separate JSONL files, but Fly.io should ingest them into one unified Ask ZARA retrieval system.

Recommended flow:

Docusaurus exporter → docs_index.jsonl
AWS ZAR exporter → zar_registry_index.jsonl
AWS ESGX exporter → zar_esgx_index.jsonl
Other databases → domain_index.jsonl

All JSONL files
→ normalized ingestion
→ unified Ask ZARA retrieval index

The files are modular. The retrieval brain is unified.
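
A minimal sketch of the merge step is shown below: several JSONL exports are parsed into one in-memory index keyed by record ID. Treating a duplicate ID as an error is an assumption made here, on the reasoning that namespaced stable IDs should never collide across adapters; a real ingester might instead resolve duplicates by truth rank.

```typescript
type JsonlFile = { name: string; text: string };
type MergedRecord = { id: string; [key: string]: unknown };

export const mergeJsonlFiles = (files: JsonlFile[]): Map<string, MergedRecord> => {
  const index = new Map<string, MergedRecord>();
  for (const file of files) {
    file.text.split('\n').forEach((line, lineIndex) => {
      if (line.trim() === '') return; // tolerate blank and trailing lines
      const record = JSON.parse(line) as MergedRecord;
      if (typeof record.id !== 'string' || record.id === '') {
        throw new Error(`${file.name}:${lineIndex + 1} has no id`);
      }
      if (index.has(record.id)) {
        throw new Error(`duplicate record id ${record.id} in ${file.name}`);
      }
      index.set(record.id, record);
    });
  }
  return index;
};
```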


16. Conflict handling and answer preference

When multiple records conflict, Ask ZARA should prefer:

  1. active canonical registry rows
  2. approved manifests and bindings
  3. current specs/manuals
  4. static JSON/Excel reference exports
  5. code comments/source inference
  6. historical/draft docs

This should be supported by:

  • truth_rank
  • status
  • lifecycle_status
  • updated_at
  • source_priority
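
The preference order could be implemented as a comparator over the metadata fields listed above. The document names the fields but not their precedence, so the tie-break order used in this sketch (truth_rank, then active lifecycle, then source_priority, then newest updated_at) is an assumption, as are the `comparePreference` and `pickPreferred` names.

```typescript
type Rankable = {
  id: string;
  metadata: {
    truth_rank: number;
    lifecycle_status?: string | null;
    source_priority?: number | null;
    updated_at?: string | null;
  };
};

// Parse a timestamp; missing or unparseable values sort as oldest.
const toTime = (value?: string | null): number => {
  const parsed = value ? Date.parse(value) : NaN;
  return isNaN(parsed) ? 0 : parsed;
};

const activeScore = (record: Rankable): number =>
  record.metadata.lifecycle_status === 'active' ? 1 : 0;

// Negative result means "a is preferred over b".
export const comparePreference = (a: Rankable, b: Rankable): number => {
  const byTruth = b.metadata.truth_rank - a.metadata.truth_rank;
  if (byTruth !== 0) return byTruth;
  const byLifecycle = activeScore(b) - activeScore(a);
  if (byLifecycle !== 0) return byLifecycle;
  const byPriority = (b.metadata.source_priority ?? 0) - (a.metadata.source_priority ?? 0);
  if (byPriority !== 0) return byPriority;
  return toTime(b.metadata.updated_at) - toTime(a.metadata.updated_at);
};

// Return the most preferred candidate, or undefined for an empty list.
export const pickPreferred = <T extends Rankable>(candidates: T[]): T | undefined =>
  candidates.slice().sort(comparePreference)[0];
```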

17. Minimum viable AWS table adapter set

For the current ZAR/ESGX work, the first production adapters should be:

  1. zar-opcodes
  2. zar-api-route-bindings
  3. zar-esgx-bindings
  4. zar-financial-institutions
  5. zar-access-grants
  6. zar-ruleset-profile-bindings
  7. zar-artifact-manifests

18. Implementation checklist

For each new AWS table adapter:

  • define adapter owner
  • define included tables/views
  • define excluded fields
  • define record granularity
  • define ID pattern
  • define slug pattern
  • define content template
  • define metadata mapping
  • assign visibility tier
  • assign truth rank
  • test for empty exports
  • test for sensitive data leakage
  • test stable reindex output

19. Final principle

Ask ZARA should not merely search files. It should understand ZAYAZ as a governed, multi-source knowledge system.

That requires:

  • curated source adapters
  • normalized index records
  • authoritative metadata
  • trust ranking
  • visibility tiers
  • deterministic exports
  • unified retrieval

This schema and adapter rule set is the foundation for that system.



