ASK-ZARA-INDEX
Ask ZARA Unified Index Record Schema + AWS Table Adapter Rules
1. Purpose
This specification defines the unified indexing contract for Ask ZARA across documentation, code, Excel workbooks, JSON files, AWS-hosted production databases, and structured registry tables.
Ask ZARA must be able to answer questions across a highly complex ZAYAZ architecture, including:
- Docusaurus MDX documentation
- developer specs
- code files
- Excel reference tables
- JSON registry exports
- AWS-hosted ZAR registry tables
- ESGX, opcode, route binding, grant, institution, manifest, and ruleset metadata
- future domain databases and production-grade reference stores
The goal is to create a scalable ingestion architecture where many source-specific exporters produce records in one common format, allowing Ask ZARA to ingest and retrieve across all sources coherently.
2. Core principle
Ask ZARA should be built around:
many curated source adapters
→ one normalized record schema
→ one unified retrieval/indexing system
The system should not rely on one monolithic index file, nor should it blindly dump raw tables into search. Each source adapter is responsible for turning source data into safe, meaningful, human-readable knowledge records.
3. Source export strategy
3.1 Recommended export units
Use one export file per major knowledge domain or database family, not one file per table and not one global file for everything.
Recommended examples:
docs_index.jsonl
code_index.jsonl
excel_index.jsonl
json_config_index.jsonl
zar_registry_index.jsonl
esgx_registry_index.jsonl
computation_hub_index.jsonl
reports_hub_index.jsonl
input_hub_index.jsonl
mice_index.jsonl
3.2 Rationale
This allows each source family to have its own:
- safety filters
- export schedule
- freshness policy
- ownership model
- transformation logic
- visibility rules
- truth ranking
- testing pipeline
while still producing records that Ask ZARA can ingest into one unified retrieval fabric.
4. Unified record schema
Every indexed item should be emitted as one JSONL record using the same normalized schema.
4.1 Canonical record shape
export type AskZaraIndexRecord = {
  id: string;
  slug: string;
  title: string;
  heading?: string | null;
  content: string;
  sourceType: string;
  metadata: {
    kind: string;
    source_system: string;
    source_domain?: string | null;
    source_database?: string | null;
    source_schema?: string | null;
    source_table?: string | null;
    source_file?: string | null;
    entity_type?: string | null;
    entity_id?: string | null;
    entity_name?: string | null;
    status?: string | null;
    lifecycle_status?: string | null;
    version?: string | null;
    updated_at?: string | null;
    created_at?: string | null;
    visibility_tier:
      | 'public'
      | 'internal'
      | 'admin'
      | 'restricted'
      | 'tenant_scoped'
      | 'institution_scoped';
    truth_rank: number;
    source_priority?: number | null;
    freshness_rank?: number | null;
    owner_team?: string | null;
    steward?: string | null;
    approved_by?: string | null;
    approved_at?: string | null;
    tags?: string[];
    related_entities?: Array<{
      relation: string;
      entity_type: string;
      entity_id: string;
    }>;
    [key: string]: unknown;
  };
};
4.2 Required top-level fields
| Field | Required | Description |
|---|---|---|
| id | Yes | Globally stable record identifier. |
| slug | Yes | Internal URL/path or virtual path for the record. |
| title | Yes | Human-readable title. |
| content | Yes | Searchable body text. |
| sourceType | Yes | Broad source class, such as mdx, code, excel, json, aws_table. |
| metadata | Yes | Structured metadata for filtering, ranking, provenance, and governance. |
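A minimal sketch of how an ingester might check the required top-level fields before accepting a record. The helper name `findMissingRequiredFields` is hypothetical, and treating empty strings as missing is an assumption rather than something this schema states.

```typescript
// Required string fields from the table above; metadata is checked separately.
const REQUIRED_STRING_FIELDS = ['id', 'slug', 'title', 'content', 'sourceType'] as const;

// Hypothetical helper: returns the names of required fields that are missing or malformed.
// Assumption: empty or whitespace-only strings count as missing.
function findMissingRequiredFields(candidate: unknown): string[] {
  if (typeof candidate !== 'object' || candidate === null) {
    return [...REQUIRED_STRING_FIELDS, 'metadata'];
  }
  const fields = candidate as Record<string, unknown>;
  const missing: string[] = [];
  for (const key of REQUIRED_STRING_FIELDS) {
    const value = fields[key];
    if (typeof value !== 'string' || value.trim() === '') missing.push(key);
  }
  const metadata = fields['metadata'];
  if (typeof metadata !== 'object' || metadata === null || Array.isArray(metadata)) {
    missing.push('metadata');
  }
  return missing;
}
```

An exporter could run this check on every record and fail the export when the returned list is non-empty, so malformed records never reach the index.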
5. Recommended sourceType values
Use controlled values where possible.
mdx
code
excel
json
aws_table
aws_view
aws_registry
structured_entity
generated_summary
runtime_summary
Guidance
- mdx = Docusaurus/manual/spec pages
- code = source code files
- excel = workbook/sheet-derived records
- json = static JSON config/schema/registry exports
- aws_table = records derived directly from AWS-hosted production tables
- aws_view = records derived from curated SQL views
- aws_registry = canonical registry records from tables such as ZAR
- structured_entity = synthesized records representing an entity assembled from multiple tables
- generated_summary = AI- or deterministic-generated summary records
- runtime_summary = summarized operational data, not raw logs
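One way to keep sourceType controlled is a literal union with a type guard. This is a sketch only: the canonical schema above types sourceType as a plain string, and the names `SOURCE_TYPES` and `isSourceType` are hypothetical.

```typescript
// The recommended sourceType values from this section.
const SOURCE_TYPES = [
  'mdx',
  'code',
  'excel',
  'json',
  'aws_table',
  'aws_view',
  'aws_registry',
  'structured_entity',
  'generated_summary',
  'runtime_summary',
] as const;

type SourceType = (typeof SOURCE_TYPES)[number];

// Hypothetical guard: narrows an arbitrary string to one of the controlled values.
function isSourceType(value: string): value is SourceType {
  return (SOURCE_TYPES as readonly string[]).indexOf(value) !== -1;
}
```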
6. Metadata standards
6.1 Core provenance metadata
All records should include:
{
  "kind": "opcode",
  "source_system": "aws",
  "source_domain": "zar_registry",
  "source_database": "zar",
  "source_schema": "zar",
  "source_table": "opcode_registry",
  "entity_type": "opcode",
  "entity_id": "OP_1410"
}
6.2 Visibility metadata
Use visibility_tier to prepare for future access-aware retrieval.
Recommended values:
| Value | Meaning |
|---|---|
| public | Safe for public or external-facing answers. |
| internal | Internal ZAYAZ/Viroway knowledge. |
| admin | Admin or privileged operational knowledge. |
| restricted | Sensitive but generally non-tenant-specific. |
| tenant_scoped | Should only be retrieved within a tenant/client context. |
| institution_scoped | Should only be retrieved within a bank/institution context. |
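The following sketch illustrates how a future query-time filter might use visibility_tier. Everything about the caller is assumed: the `CallerContext` shape, the role names, the `tenant_id` metadata key, and the mapping of admin and restricted tiers to an admin role are illustrative choices, not part of this specification.

```typescript
type VisibilityTier =
  | 'public'
  | 'internal'
  | 'admin'
  | 'restricted'
  | 'tenant_scoped'
  | 'institution_scoped';

// Hypothetical caller context; none of these field names come from the spec.
type CallerContext = {
  role: 'anonymous' | 'staff' | 'admin';
  tenantId?: string;
  institutionId?: string;
};

type VisibilityMetadata = {
  visibility_tier: VisibilityTier;
  tenant_id?: string; // assumed metadata key for tenant-scoped records
  institution_id?: string; // key used in the access grant metadata example
};

// Fails closed: scoped tiers require a matching context, unknown tiers are denied.
function canRetrieve(caller: CallerContext, meta: VisibilityMetadata): boolean {
  switch (meta.visibility_tier) {
    case 'public':
      return true;
    case 'internal':
      return caller.role !== 'anonymous';
    case 'admin':
    case 'restricted':
      return caller.role === 'admin';
    case 'tenant_scoped':
      return caller.tenantId !== undefined && caller.tenantId === meta.tenant_id;
    case 'institution_scoped':
      return caller.institutionId !== undefined && caller.institutionId === meta.institution_id;
    default:
      return false;
  }
}
```

Note that in this sketch even an admin cannot read tenant-scoped records without a matching tenant context; whether that is the desired policy is a product decision.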
6.3 Truth rank
truth_rank should express source authority from 0 to 100.
Suggested guidance:
| Source type | Suggested truth rank |
|---|---|
| Active canonical registry rows | 95–100 |
| Approved manifest records | 90–98 |
| Approved route/opcode/ruleset bindings | 90–98 |
| Curated SQL views | 85–95 |
| Current Docusaurus specs | 75–90 |
| Static JSON config | 70–90 |
| Excel reference sheets | 60–85 |
| Code comments/source files | 50–80 |
| Historical docs/drafts | 30–70 |
| Runtime summaries | 50–90 depending on freshness and curation |
Ask ZARA should usually prefer higher-truth records when sources conflict.
7. Content construction rules
7.1 Human-readable body
Every structured table record must include a human-readable content field. Do not rely on metadata alone.
Good example:
Opcode OP_1410: Pull From External System.
Operation key: pull_from_external_system.
Category: integration.
Intent class: stateful_execution.
Description: Pulls data from an approved external system through governed routing.
Status: active. Lifecycle: active.
Bad example:
{"opcode":"OP_1410","operation_key":"pull_from_external_system"}
7.2 Content should include synonyms and domain language
Where useful, include alternate terms that users may search for.
Example for ESGX:
ESGX is also known as ESG Exchange Code, external governed access code, and partner access reference.
7.3 Avoid full raw dumps
Do not dump full rows with every field unless all fields are safe and useful. Prefer curated summaries and selected metadata.
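As an illustration of these rules, the sketch below builds the human-readable body from the good example in 7.1 out of a curated set of fields. The `OpcodeRow` shape and its field names (for example `display_name` and `intent_class`) are assumptions for illustration, not confirmed column names.

```typescript
// Hypothetical curated row; only fields that are safe and useful are included.
type OpcodeRow = {
  opcode: string;
  display_name: string;
  operation_key: string;
  category: string;
  intent_class: string;
  description: string;
  status: string;
  lifecycle_status: string;
};

// Ensures a sentence fragment ends with exactly one trailing period.
function withPeriod(text: string): string {
  const trimmed = text.trim();
  return trimmed.endsWith('.') ? trimmed : `${trimmed}.`;
}

// Builds the searchable content field as plain sentences rather than a JSON dump.
function buildOpcodeContent(row: OpcodeRow): string {
  return [
    `Opcode ${row.opcode}: ${withPeriod(row.display_name)}`,
    `Operation key: ${row.operation_key}.`,
    `Category: ${row.category}.`,
    `Intent class: ${row.intent_class}.`,
    `Description: ${withPeriod(row.description)}`,
    `Status: ${row.status}. Lifecycle: ${row.lifecycle_status}.`,
  ].join(' ');
}
```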
8. AWS table adapter architecture
8.1 Recommended directory structure
src/
  ingest-mdx.ts
  ingest-code.ts
  ingest-excel.ts
  ingest-json.ts
  ingest-aws/
    index.ts
    db.ts
    types.ts
    adapters/
      zar-opcodes.ts
      zar-esgx-bindings.ts
      zar-api-route-bindings.ts
      zar-financial-institutions.ts
      zar-access-grants.ts
      zar-manifests.ts
      zar-rulesets.ts
8.2 Adapter responsibilities
Each adapter must define:
- source tables or views
- included columns
- excluded sensitive columns
- transformation logic
- record ID pattern
- slug pattern
- content template
- metadata mapping
- visibility tier
- truth rank
- owner/steward if known
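One possible way to make these responsibilities explicit is a declarative definition that each adapter fills in. The `AdapterDefinition` type, the `toIndexRecord` helper, and the `internal_notes` column are hypothetical and shown only to illustrate the idea.

```typescript
type Row = Record<string, unknown>;

// Hypothetical declarative definition; roughly one property per responsibility above.
type AdapterDefinition = {
  sources: string[]; // source tables or views
  includedColumns: string[]; // columns copied into metadata
  excludedColumns: string[]; // sensitive columns that must never be exported
  recordId: (row: Row) => string; // record ID pattern
  slug: (row: Row) => string; // slug pattern
  content: (row: Row) => string; // content template
  visibilityTier: string;
  truthRank: number;
  ownerTeam?: string; // owner/steward if known
};

const opcodeDefinition: AdapterDefinition = {
  sources: ['zar.opcode_registry'],
  includedColumns: ['opcode', 'operation_key', 'status', 'lifecycle_status'],
  excludedColumns: ['internal_notes'], // assumed column name, for illustration only
  recordId: (row) => `zar:opcode:${String(row['opcode'])}`,
  slug: (row) => `/zar/opcodes/${String(row['opcode'])}`,
  content: (row) =>
    `Opcode ${String(row['opcode'])}. Operation key: ${String(row['operation_key'])}. Status: ${String(row['status'])}.`,
  visibilityTier: 'internal',
  truthRank: 98,
};

// Applies a definition to one row; excluded columns are skipped even if listed as included.
function toIndexRecord(def: AdapterDefinition, row: Row) {
  const metadata: Row = { visibility_tier: def.visibilityTier, truth_rank: def.truthRank };
  if (def.ownerTeam) metadata['owner_team'] = def.ownerTeam;
  for (const column of def.includedColumns) {
    if (def.excludedColumns.indexOf(column) === -1 && column in row) {
      metadata[column] = row[column];
    }
  }
  return { id: def.recordId(row), slug: def.slug(row), content: def.content(row), metadata };
}
```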
9. AWS table adapter rules
Rule 1 — Curate, do not dump
Adapters must export meaningful knowledge records, not raw table copies.
Rule 2 — Prefer canonical views for complex entities
When an entity requires joins across multiple tables, create a SQL view or adapter-level join.
Examples:
- ESGX binding + opcode + route binding summary
- access grant + institution + ESGX binding summary
- manifest + CMI + supported opcodes summary
Rule 3 — Use stable IDs
Record IDs must be stable across reindexing.
Examples:
zar:opcode:OP_1410
zar:esgx:ESGX_4jbj6c76_1410
zar:api_binding:api.reports.quarterly.pull.post.v1
zar:financial_institution:nordbank
zar:access_grant:42
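A small sketch of an ID builder that follows the pattern visible in these examples. The helper name `buildRecordId` and the three-part split into namespace, entity type, and natural key are assumptions read off the examples, not a stated rule.

```typescript
// Hypothetical helper: builds "<namespace>:<entity_type>:<natural_key>" IDs.
// The key must come from the source row (a primary or natural key), never from
// row position or export time, so the ID survives reindexing.
function buildRecordId(namespace: string, entityType: string, naturalKey: string | number): string {
  const key = String(naturalKey).trim();
  if (key === '') {
    throw new Error(`Cannot build a stable ID for ${entityType}: empty key`);
  }
  return `${namespace}:${entityType}:${key}`;
}
```

For example, `buildRecordId('zar', 'opcode', 'OP_1410')` yields `zar:opcode:OP_1410`, and `buildRecordId('zar', 'access_grant', 42)` yields `zar:access_grant:42`.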
Rule 4 — Include lifecycle metadata
Every production table record should include:
- status
- lifecycle_status
- version if available
- updated_at if available
Rule 5 — Do not index secrets
Never index:
- tokens
- credentials
- raw auth configuration
- API keys
- private secrets
- encrypted secret payloads
- MFA/session data
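As a defensive backstop for this rule, an adapter could drop any column whose name looks sensitive before building records. The pattern below is an assumed denylist for illustration; it deliberately over-matches (for example, it would also drop a column named `author`), and it complements rather than replaces an explicit per-table list of included columns.

```typescript
// Assumed name patterns for sensitive columns; tune per table.
const SENSITIVE_COLUMN_PATTERN = /(token|credential|password|secret|api_?key|auth|mfa|session)/i;

// Returns a copy of the row without columns whose names match the denylist.
function dropSensitiveColumns(row: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const column of Object.keys(row)) {
    if (!SENSITIVE_COLUMN_PATTERN.test(column)) {
      safe[column] = row[column];
    }
  }
  return safe;
}
```

A name-based filter cannot catch secrets stored in innocently named columns, so the leakage test in the implementation checklist remains necessary.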
Rule 6 — Minimize client-sensitive data
For client-specific records, index only the minimum safe operational summary unless the record is explicitly intended for tenant-scoped retrieval.
Rule 7 — Use visibility tiers now, enforce later if needed
Even if query-time visibility filtering is not complete in v1, every record should carry visibility_tier from the start.
Rule 8 — Prefer summaries for high-volume operational tables
Do not index raw event logs or high-volume execution rows. Use daily, weekly, or entity-level summaries.
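A minimal sketch of a daily summary, assuming a hypothetical execution row with an opcode, an ISO 8601 UTC timestamp, and a success flag. The row shape and the choice of per-opcode, per-day granularity are illustrative only.

```typescript
// Hypothetical shape of a high-volume execution row.
type ExecutionRow = { opcode: string; executedAt: string; succeeded: boolean };
type DailySummary = { opcode: string; day: string; total: number; failed: number };

// Collapses raw execution rows into one summary per opcode per day.
function summarizeByDay(rows: ExecutionRow[]): DailySummary[] {
  const byKey = new Map<string, DailySummary>();
  for (const row of rows) {
    const day = row.executedAt.slice(0, 10); // assumes ISO 8601 UTC timestamps
    const key = `${row.opcode}|${day}`;
    const summary = byKey.get(key) ?? { opcode: row.opcode, day, total: 0, failed: 0 };
    summary.total += 1;
    if (!row.succeeded) summary.failed += 1;
    byKey.set(key, summary);
  }
  // Sort by key so the output order does not depend on input order.
  return Array.from(byKey.keys())
    .sort()
    .map((key) => byKey.get(key)!);
}
```

Each summary can then be rendered into a single runtime_summary record instead of indexing every execution row.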
Rule 9 — Make source ownership explicit
Include owner_team or steward where possible.
Rule 10 — Record generation should be deterministic
Given the same source state, the adapter should emit the same record IDs and mostly stable content.
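One way to approach deterministic output is to serialize records with sorted keys and emit them in ID order, so the same source state produces byte-identical JSONL. The helpers `canonicalJson` and `toDeterministicJsonl` are hypothetical, and the sketch assumes records contain only JSON-safe values (no dates, functions, or class instances).

```typescript
// Serializes a value with object keys sorted recursively, so key order in the
// source does not change the output. Undefined properties are omitted, and
// undefined array items become null, mirroring JSON.stringify.
function canonicalJson(value: unknown): string {
  if (value === undefined) return 'null';
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(',')}]`;
  }
  if (typeof value === 'object' && value !== null) {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .filter((key) => obj[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${parts.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Emits one canonical line per record, ordered by record ID.
function toDeterministicJsonl(records: Array<{ id: string }>): string {
  return records
    .slice()
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((item) => canonicalJson(item))
    .join('\n');
}
```

With output like this, the "test stable reindex output" item in the implementation checklist can be a plain string comparison between two consecutive exports.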
10. Domain adapter examples
10.1 Opcode registry adapter
Source table
zar.opcode_registry
Record granularity
One record per opcode.
Record ID
zar:opcode:<opcode>
Example
{
  "id": "zar:opcode:OP_1410",
  "slug": "/zar/opcodes/OP_1410",
  "title": "Opcode OP_1410: Pull From External System",
  "heading": "OP_1410",
  "content": "Opcode OP_1410: Pull From External System. Operation key: pull_from_external_system. Category: integration. Intent class: stateful_execution. Description: Pulls data from an approved external system through governed routing. Status: active. Lifecycle: active.",
  "sourceType": "aws_registry",
  "metadata": {
    "kind": "opcode",
    "source_system": "aws",
    "source_domain": "zar_registry",
    "source_database": "zar",
    "source_schema": "zar",
    "source_table": "opcode_registry",
    "entity_type": "opcode",
    "entity_id": "OP_1410",
    "opcode": "OP_1410",
    "operation_key": "pull_from_external_system",
    "status": "active",
    "lifecycle_status": "active",
    "visibility_tier": "internal",
    "truth_rank": 98
  }
}
10.2 ESGX binding adapter
Source table
zar.external_opcode_binding_registry
Record granularity
One record per ESGX binding.
Record ID
zar:esgx:<esgx_code>
Content should include
- ESGX code
- binding name
- semantic opcode
- operation key
- route binding if any
- artifact type
- output profiles
- format negotiation setting
- lifecycle status
10.3 Access grant adapter
Source table
zar.external_opcode_access_grant
Record granularity
One record per grant, or one summarized record per client-bank-ESGX relationship.
Safety rule
Access grant records should usually be tenant_scoped or restricted, not broadly internal, because they can reveal client-bank relationships.
Content example
Access grant allows institution NordBank to use ESGX_4jbj6c76_1410 for client_bigcorp under covenant_monitoring purpose. Access mode: rolling. Allowed output profiles: json, pdf, signed_package. Status: active.
Metadata example
{
  "kind": "external_opcode_access_grant",
  "visibility_tier": "tenant_scoped",
  "truth_rank": 95,
  "institution_id": "nordbank",
  "grant_scope_ref": "client_bigcorp",
  "permitted_use_code": "covenant_monitoring"
}
10.4 Financial institution adapter
Source table
zar.financial_institution_registry
Record granularity
One record per approved institution.
Safety rule
Institution records may be internal or eventually public, depending on product policy. If the institution directory is customer-visible, export only approved public-safe fields.
10.5 API route binding adapter
Source table
zar.api_route_binding_registry
Record granularity
One record per API route binding.
Content should include
- route template
- method
- operation key
- opcode
- artifact type
- target resolution mode
- request/response mapping profiles
- ZSSR requirement
- status/lifecycle
11. Recommended adapter interface
export type RawDocument = Record<string, unknown>;

export type QueryResult<Row extends Record<string, unknown> = Record<string, unknown>> = {
  rows: Row[];
};

export type Queryable = {
  query: <Row extends Record<string, unknown> = Record<string, unknown>>(
    sql: string,
    params?: readonly unknown[],
  ) => Promise<QueryResult<Row>>;
};

export type AwsTableAdapter<Document extends RawDocument = RawDocument> = {
  name: string;
  sourceDomain: string;
  sourceDatabase: string;
  exportDocuments: (db: Queryable) => Promise<Document[]>;
};
Example:
/**
* AWS adapter + implementation
*/
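/**
 * Implementation-specific function (must be provided or replaced).
 * Reads from zar.opcode_registry, the source table named in section 10.1.
 * SELECT * is a placeholder: a production adapter should select only curated,
 * non-sensitive columns (Rules 1 and 5) and map each row to an AskZaraIndexRecord.
 */
export const ingestOpcodes = async (db: Queryable): Promise<RawDocument[]> => {
  const result = await db.query('SELECT * FROM zar.opcode_registry');
  return result.rows;
};

/**
 * Adapter instance
 */
export const zarOpcodeAdapter: AwsTableAdapter = {
  name: 'zar-opcodes',
  sourceDomain: 'zar_registry',
  sourceDatabase: 'zar',
  exportDocuments: ingestOpcodes,
};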
12. Export file naming convention
Recommended pattern:
<source_domain>_index.jsonl
Examples:
zar_registry_index.jsonl
zar_esgx_index.jsonl
computation_hub_index.jsonl
reports_hub_index.jsonl
If source exports become large, split further:
zar_opcodes_index.jsonl
zar_esgx_index.jsonl
zar_manifests_index.jsonl
13. Safety classification rules
Public-safe
Examples:
- published docs
- public API descriptions
- public glossary items
Internal
Examples:
- opcode definitions
- route binding summaries
- architecture records
Restricted
Examples:
- resolver internals
- policy-sensitive operational configuration
Tenant scoped
Examples:
- client-specific ESGX grants
- client-specific access relationships
Institution scoped
Examples:
- bank-specific operational access summaries
- institution-specific integration settings
14. Freshness and scheduling rules
Suggested schedules
| Source | Suggested refresh |
|---|---|
| Docusaurus docs | on commit / deploy |
| Code files | on commit / deploy |
| Excel/JSON configs | on commit / deploy |
| ZAR registry tables | every 15–60 minutes or on registry event |
| ESGX grants | event-driven or frequent scheduled refresh |
| Operational summaries | daily/hourly depending on use case |
Future event-driven model
Eventually, registry changes should emit index update events:
registry row changed
→ EventBridge event
→ export adapter refreshes affected entity
→ Ask ZARA index update
15. Merge and ingestion model
Source adapters produce separate JSONL files, but Fly.io should ingest them into one unified Ask ZARA retrieval system.
Recommended flow:
Docusaurus exporter → docs_index.jsonl
AWS ZAR exporter → zar_registry_index.jsonl
AWS ESGX exporter → zar_esgx_index.jsonl
Other databases → domain_index.jsonl
All JSONL files
→ normalized ingestion
→ unified Ask ZARA retrieval index
The files are modular. The retrieval brain is unified.
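The merge step could be sketched as follows: several JSONL exports are parsed and combined into one collection keyed by record ID. The function name `mergeJsonlExports` is hypothetical, and the "later file wins" behavior for duplicate IDs is an assumption that this specification does not define.

```typescript
type IndexedRecord = { id: string; [key: string]: unknown };

// Parses several JSONL exports and merges them into one map keyed by record ID.
// Assumption: when the same ID appears more than once, the later occurrence wins.
function mergeJsonlExports(jsonlExports: string[]): Map<string, IndexedRecord> {
  const merged = new Map<string, IndexedRecord>();
  for (const jsonl of jsonlExports) {
    for (const line of jsonl.split('\n')) {
      if (line.trim() === '') continue; // tolerate blank lines and trailing newlines
      const parsed = JSON.parse(line) as IndexedRecord;
      if (typeof parsed.id !== 'string') {
        throw new Error('Record is missing a string id');
      }
      merged.set(parsed.id, parsed);
    }
  }
  return merged;
}
```

Because record IDs are namespaced by source (for example `zar:opcode:...`), collisions across export files should be rare, and an ingester may prefer to log them rather than overwrite silently.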
16. Conflict handling and answer preference
When multiple records conflict, Ask ZARA should prefer sources in the following order, most authoritative first:
- active canonical registry rows
- approved manifests and bindings
- current specs/manuals
- static JSON/Excel reference exports
- code comments/source inference
- historical/draft docs
This should be supported by:
- truth_rank
- status
- lifecycle_status
- updated_at
- source_priority
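A sketch of how these fields might combine into a single ordering. The tie-break sequence (truth_rank, then active status, then source_priority, then recency) and the convention that a higher source_priority is preferred are assumptions; the specification names the fields but not how they interact.

```typescript
type RankingMetadata = {
  truth_rank: number;
  status?: string | null;
  lifecycle_status?: string | null;
  updated_at?: string | null;
  source_priority?: number | null;
};

// 1 when the record is active (a missing lifecycle_status is treated as active), else 0.
function activeScore(meta: RankingMetadata): number {
  return meta.status === 'active' && (meta.lifecycle_status ?? 'active') === 'active' ? 1 : 0;
}

// Milliseconds since epoch, or 0 when updated_at is missing or unparseable.
function updatedMs(meta: RankingMetadata): number {
  return meta.updated_at ? Date.parse(meta.updated_at) || 0 : 0;
}

// Comparator for Array.prototype.sort: negative when `a` should be preferred over `b`.
function comparePreference(a: RankingMetadata, b: RankingMetadata): number {
  return (
    b.truth_rank - a.truth_rank ||
    activeScore(b) - activeScore(a) ||
    (b.source_priority ?? 0) - (a.source_priority ?? 0) ||
    updatedMs(b) - updatedMs(a)
  );
}
```

Sorting conflicting candidates with `candidates.sort(comparePreference)` then puts the preferred record first.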
17. Minimum viable AWS table adapter set
For the current ZAR/ESGX work, the first production adapters should be:
- zar-opcodes
- zar-api-route-bindings
- zar-esgx-bindings
- zar-financial-institutions
- zar-access-grants
- zar-ruleset-profile-bindings
- zar-artifact-manifests
18. Implementation checklist
For each new AWS table adapter:
- define adapter owner
- define included tables/views
- define excluded fields
- define record granularity
- define ID pattern
- define slug pattern
- define content template
- define metadata mapping
- assign visibility tier
- assign truth rank
- test for empty exports
- test for sensitive data leakage
- test stable reindex output
19. Final principle
Ask ZARA should not merely search files. It should understand ZAYAZ as a governed, multi-source knowledge system.
That requires:
- curated source adapters
- normalized index records
- authoritative metadata
- trust ranking
- visibility tiers
- deterministic exports
- unified retrieval
This schema and adapter rule set is the foundation for that system.