ZRTS
ZAYAZ Registry Toolchain Specification
1. Purpose
The ZAYAZ Registry Toolchain is the authoritative execution layer that transforms semantic source artifacts into validated, versioned, and promotable registry datasets.
It sits between:
- ZARATHUSTRA — semantic authoring and framework creation
- ZARA — advanced validation, reasoning, and consistency enforcement
- FOGE / engines / APIs — downstream execution and consumption
The toolchain is designed to be:
- deterministic
- production-ready
- AWS-native
- least-privilege by default
- extensible through plugins and validation hooks
2. Architectural Position
ZARATHUSTRA (semantic authoring)
↓
ZAYAZ Registry Toolchain (ingest / normalize / validate / version / promote)
↓
ZARA (advanced validation and semantic enforcement)
↓
FOGE / APIs / computation engines / reports
3. Core Principle
ZARATHUSTRA defines meaning. The Registry Toolchain structures it. ZARA verifies it. ZAYAZ executes it.
4. Storage Model
4.1 Bucket Strategy
The toolchain operates against the existing zayaz-assets bucket unless and until security or lifecycle separation requires a split-bucket model.
Recommended v1 layout:
s3://zayaz-assets/
/dev/
/excel/
/datasets/
/schemas/
/exports/
/models/
/staging/
/datasets/
/schemas/
/exports/
/models/
/prod/
/datasets/
/schemas/
/exports/
/models/
4.2 Environment Rules
- dev may contain source Excel
- staging should contain normalized artifacts only, unless explicitly required for QA
- prod must not contain source Excel
- runtime systems consume datasets, not source spreadsheets
5. Canonical Inputs and Outputs
5.1 Inputs
The Registry Toolchain consumes only the minimal canonical object set:
- Registry Definition Object
- Row Validation Schema
- Source artifact (Excel / API / DB)
5.2 Outputs
The toolchain produces:
- Dataset Object
- Validation Report
- Promotion Report
- Registry Catalog
- Optional generation / hash / audit reports
6. Command Router
The toolchain exposes a single command router:
zyz-registry <command> [options]
6.1 Core Commands
- ingest
- validate
- promote
- catalog
- publish
- inspect
- scaffold
6.2 Initial Required Commands
For v1, the required executable commands are:
- gen:registry
- validate:registry
- promote:registry
- catalog:registry
6.3 Example Usage
zyz-registry ingest --registry sig_residency_region_policy --env dev
zyz-registry validate --registry sig_residency_region_policy --env dev
zyz-registry promote --registry sig_residency_region_policy --from dev --to staging
zyz-registry catalog --env dev
7. Typed Configuration
7.1 Purpose
All runtime behavior must be controlled by typed configuration, not by scattered constants or ad hoc environment handling.
7.2 Example Configuration Contract
interface RegistryToolchainConfig {
appEnv: "dev" | "staging" | "prod";
awsRegion: string;
bucketName: string;
sourcePrefix: string;
datasetPrefix: string;
schemaPrefix: string;
reportPrefix: string;
allowProdPromotion: boolean;
requireManualApprovalForProd: boolean;
}
7.3 Configuration Sources
Configuration should be loaded from:
- environment variables
- AWS Systems Manager Parameter Store
- AWS Secrets Manager (for secrets only)
7.4 Configuration Rule
Secrets must never be stored in GitHub or checked into local config files.
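The contract in 7.2 and the sources in 7.3 could be combined in a loader along these lines. This is a minimal sketch: the environment-variable names, defaults, and the `loadConfigFromEnv` function are illustrative assumptions, not part of the spec.

```typescript
type AppEnv = "dev" | "staging" | "prod";

interface RegistryToolchainConfig {
  appEnv: AppEnv;
  awsRegion: string;
  bucketName: string;
  sourcePrefix: string;
  datasetPrefix: string;
  schemaPrefix: string;
  reportPrefix: string;
  allowProdPromotion: boolean;
  requireManualApprovalForProd: boolean;
}

// Type guard so the string from the environment narrows to the AppEnv union.
function isAppEnv(value: string): value is AppEnv {
  return value === "dev" || value === "staging" || value === "prod";
}

function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Hypothetical loader: fails fast on missing or invalid values instead of
// falling back silently, and never reads secrets from checked-in files.
export function loadConfigFromEnv(
  env: Record<string, string | undefined>
): RegistryToolchainConfig {
  const appEnv = requireEnv("APP_ENV", env);
  if (!isAppEnv(appEnv)) throw new Error(`Invalid APP_ENV: ${appEnv}`);
  return {
    appEnv,
    awsRegion: requireEnv("AWS_REGION", env),
    bucketName: requireEnv("BUCKET_NAME", env),
    sourcePrefix: env.SOURCE_PREFIX ?? `${appEnv}/excel/`,
    datasetPrefix: env.DATASET_PREFIX ?? `${appEnv}/datasets/`,
    schemaPrefix: env.SCHEMA_PREFIX ?? `${appEnv}/schemas/`,
    reportPrefix: env.REPORT_PREFIX ?? `${appEnv}/exports/`,
    // Prod promotion stays disabled unless explicitly enabled.
    allowProdPromotion: env.ALLOW_PROD_PROMOTION === "true",
    requireManualApprovalForProd: env.REQUIRE_MANUAL_APPROVAL !== "false",
  };
}
```

Deriving the prefixes from the environment keeps the dev/staging/prod layout of section 4.1 consistent without per-environment config files.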
8. Plugin-Based Ingestion
8.1 Rationale
The toolchain must support multiple source modalities without rewriting the pipeline core.
8.2 Plugin Interface
/**
* Source types
*/
type SourceType = "excel" | "api" | "db";
/**
* Registry definition (input descriptor)
*/
export interface RegistryDefinition {
id: string;
sourceType: SourceType;
sourceConfig: Record<string, unknown>;
}
/**
* Pipeline context (runtime execution context)
*/
export interface PipelineContext {
executionId: string;
timestamp: string;
metadata?: Record<string, unknown>;
}
/**
* Normalized output rows
*/
export type NormalizedRow = Record<string, unknown>;
export type NormalizedRows = NormalizedRow[];
/**
* Ingestion Plugin Interface
*/
export interface IngestionPlugin {
sourceType: SourceType;
canHandle(definition: RegistryDefinition): boolean;
ingest(
definition: RegistryDefinition,
context: PipelineContext
): Promise<NormalizedRows>;
}
/**
* Example implementation (Excel plugin)
*/
export const ExcelIngestionPlugin: IngestionPlugin = {
sourceType: "excel",
canHandle(definition) {
return definition.sourceType === "excel";
},
async ingest(definition, context) {
console.log("Ingesting Excel source:", definition.id);
// Mock example
return [
{ row: 1, value: "example", source: definition.id },
{ row: 2, value: "data", source: definition.id }
];
}
};
/**
* Example usage
*/
const definition: RegistryDefinition = {
id: "registry-001",
sourceType: "excel",
sourceConfig: {
filePath: "/data/example.xlsx"
}
};
const context: PipelineContext = {
executionId: "run-123",
timestamp: new Date().toISOString()
};
async function runExample() {
if (ExcelIngestionPlugin.canHandle(definition)) {
const rows = await ExcelIngestionPlugin.ingest(definition, context);
console.log(rows);
}
}
runExample();
8.3 Initial Plugins
- ExcelIngestionPlugin
- ApiIngestionPlugin (stub)
- DatabaseIngestionPlugin (stub)
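For the pipeline core to stay source-agnostic, plugin selection can be centralized in a small resolver that picks the first registered plugin whose canHandle accepts the definition. The `PluginRegistry` class below is an illustrative sketch, not a prescribed API; it reuses the interfaces from 8.2.

```typescript
type SourceType = "excel" | "api" | "db";

interface RegistryDefinition {
  id: string;
  sourceType: SourceType;
  sourceConfig: Record<string, unknown>;
}

interface IngestionPlugin {
  sourceType: SourceType;
  canHandle(definition: RegistryDefinition): boolean;
}

export class PluginRegistry {
  private plugins: IngestionPlugin[] = [];

  register(plugin: IngestionPlugin): void {
    this.plugins.push(plugin);
  }

  resolve(definition: RegistryDefinition): IngestionPlugin {
    const plugin = this.plugins.find((p) => p.canHandle(definition));
    if (!plugin) {
      // An unsupported source type is a declared ingestion failure mode (15.1).
      throw new Error(`No ingestion plugin for source type: ${definition.sourceType}`);
    }
    return plugin;
  }
}
```

Adding the API or database modality then means registering a new plugin, with no change to the pipeline core.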
8.4 Excel Plugin Responsibilities
- read source artifact from private S3
- resolve sheet and header row
- normalize headers
- normalize row values
- emit canonical row objects
8.5 API Plugin Responsibilities
- fetch from controlled API source
- map remote payload into canonical rows
- attach lineage and fetch metadata
8.6 Database Plugin Responsibilities
- execute controlled query or snapshot read
- map resultset into canonical rows
- preserve source lineage
9. Validation Engine Hooks (ZARA-ready)
9.1 Principle
The toolchain must support deterministic validation hooks from day one and be ready for future ZARA integration without depending on LLM behavior.
9.2 Hook Interface
/**
* Validation context (input to validation hook)
*/
export interface ValidationContext {
recordId: string;
data: Record<string, unknown>;
metadata?: Record<string, unknown>;
}
/**
* Validation result
*/
export interface ValidationResult {
valid: boolean;
errors?: string[];
warnings?: string[];
}
/**
* Validation Hook Interface
*/
export interface ValidationHook {
name: string;
run(input: ValidationContext): Promise<ValidationResult>;
}
/**
* Example implementation
*/
export const RequiredFieldValidation: ValidationHook = {
name: "required-field-check",
async run(input: ValidationContext): Promise<ValidationResult> {
const errors: string[] = [];
if (!input.data["name"]) {
errors.push("Missing required field: name");
}
if (!input.data["id"]) {
errors.push("Missing required field: id");
}
return {
valid: errors.length === 0,
errors: errors.length > 0 ? errors : undefined,
warnings: []
};
}
};
/**
* Example usage
*/
const exampleInput: ValidationContext = {
recordId: "rec-001",
data: {
name: "Sample"
// id is missing → will trigger error
}
};
async function runValidation() {
const result = await RequiredFieldValidation.run(exampleInput);
console.log(result);
}
runValidation();
9.3 Required v1 Hooks
- JsonSchemaValidationHook
- IntegrityRulesValidationHook
- PrimaryKeyValidationHook
- PromotionEligibilityHook
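A set of hooks like these could be executed by a small deterministic runner that merges their results into one machine-consumable report, in line with 9.5. The `runHooks` function and the hook-name prefixing are illustrative assumptions; only the interfaces come from 9.2.

```typescript
interface ValidationContext {
  recordId: string;
  data: Record<string, unknown>;
  metadata?: Record<string, unknown>;
}

interface ValidationResult {
  valid: boolean;
  errors?: string[];
  warnings?: string[];
}

interface ValidationHook {
  name: string;
  run(input: ValidationContext): Promise<ValidationResult>;
}

// Runs every hook in order and aggregates errors and warnings.
// Hooks run sequentially so the report order is deterministic.
export async function runHooks(
  hooks: ValidationHook[],
  input: ValidationContext
): Promise<ValidationResult> {
  const errors: string[] = [];
  const warnings: string[] = [];
  for (const hook of hooks) {
    const result = await hook.run(input);
    // Prefix each message with the hook name so reports stay traceable.
    errors.push(...(result.errors ?? []).map((e) => `${hook.name}: ${e}`));
    warnings.push(...(result.warnings ?? []).map((w) => `${hook.name}: ${w}`));
  }
  return { valid: errors.length === 0, errors, warnings };
}
```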
9.4 Planned v2 Hooks
- CrossRegistryConsistencyHook
- UnitConsistencyHook
- ReferenceResolutionHook
- ZaraRuleEngineHook
9.5 ZARA-ready Clarification
In this context, ZARA-ready means:
- the toolchain can call deterministic validation and rule-engine hooks
- validation output is structured and machine-consumable
- an LLM-enabled ZARA layer may be added later as an assistive or advisory component
The base pipeline must remain deterministic and auditable.
10. Command Specifications
10.1 gen:registry
Purpose
Generate a normalized dataset from a source artifact.
Inputs
- source artifact
- Registry Definition Object
- Row Validation Schema
Outputs
- dataset object
- generation report
- versioned dataset object
Example
npm run gen:registry -- --registry sig_residency_region_policy --env dev
10.2 validate:registry
Purpose
Validate a generated dataset against row schema, integrity rules, and dataset invariants.
Inputs
- dataset object
- Registry Definition Object
- Row Validation Schema
Outputs
- validation report
Example
npm run validate:registry -- --registry sig_residency_region_policy --env dev
10.3 promote:registry
Purpose
Promote a validated dataset between environments.
Allowed Transitions
- dev -> staging
- staging -> prod
Inputs
- source dataset
- validation report
- Registry Definition Object
- promotion rules
Outputs
- promoted dataset
- promotion report
- copied schema and definition artifacts
Example
npm run promote:registry -- --registry sig_residency_region_policy --from dev --to staging
npm run promote:registry -- --registry sig_residency_region_policy --from staging --to prod --approve
10.4 catalog:registry
Purpose
Rebuild the environment-specific discovery index.
Inputs
- Registry Definition Objects
- Row Validation Schemas
- Dataset Objects
- optional reports
Outputs
registry_catalog.json
Example
npm run catalog:registry -- --env dev
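The spec only requires that registry_catalog.json be a per-environment discovery index; the entry fields and the `buildCatalog` helper below are assumptions sketched for illustration. Sorting entries makes rebuilds deterministic, which matters for the auditability goal in section 19.

```typescript
// Hypothetical catalog entry shape; the exact fields are not fixed by the spec.
interface CatalogEntry {
  registryId: string;
  datasetKey: string;
  schemaKey: string;
  validationStatus?: "pass" | "fail";
}

interface RegistryCatalog {
  env: "dev" | "staging" | "prod";
  generatedAt: string;
  entries: CatalogEntry[];
}

export function buildCatalog(
  env: RegistryCatalog["env"],
  entries: CatalogEntry[]
): RegistryCatalog {
  return {
    env,
    generatedAt: new Date().toISOString(),
    // Sort by registry id so repeated rebuilds produce identical entry order.
    entries: [...entries].sort((a, b) => a.registryId.localeCompare(b.registryId)),
  };
}
```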
11. AWS Execution Model
11.1 Design Goal
All authoritative execution must occur inside AWS so that:
- real source Excel never needs to be distributed to external parties
- production behavior is reproducible
- logs and approvals are centralized
- least-privilege access can be enforced
11.2 Recommended Runtime Components
- Amazon S3 — source artifacts and generated outputs
- AWS ECS Fargate or AWS Batch — job execution
- AWS Step Functions — orchestration
- Amazon EventBridge — triggers and schedules
- Amazon CloudWatch Logs — execution logs and traceability
- AWS IAM — role separation and least privilege
11.3 Example Flow
Private S3 source artifact
→ Step Functions
→ ingest task
→ validate task
→ catalog task
→ optional approval gate
→ promote task
→ S3 outputs + CloudWatch logs
12. IAM Boundaries
12.1 Security Principle
Code access must not imply source data access.
12.2 Recommended Access Tiers
Source Custodians
- 2–3 trusted operators
- upload/read private Excel
- approve sensitive promotions
Pipeline Engineers
- maintain code and infrastructure
- work with test fixtures
- no access to real source Excel by default
Reviewers / Approvers
- review validation and promotion outputs
- approve production publication
12.3 Runtime Roles
Ingestion Role
- read source artifacts from private S3
- write dev datasets and reports
Validation Role
- read definitions, schemas, and datasets
- write validation reports
Promotion Role
- read validation reports and source artifacts
- write target environment artifacts
Catalog Role
- read environment artifacts
- write registry_catalog.json
13. Artifact Lifecycle
13.1 Source Artifact
/dev/excel/<registry_id>.xlsx
13.2 Registry Definition Object
/<env>/schemas/registry_definitions/<registry_id>.definition.json
13.3 Row Validation Schema
/<env>/schemas/row_schemas/<registry_id>.row.schema.json
13.4 Dataset Object
/<env>/datasets/<registry_id>.json
/<env>/datasets/<registry_id>.v1.0.0.json
13.5 Reports
/<env>/exports/generation_reports/<registry_id>.generation.json
/<env>/exports/validation_reports/<registry_id>.validation.json
/<env>/exports/promotion_reports/<registry_id>.promotion.json
13.6 Registry Catalog
/<env>/datasets/registry_catalog.json
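The key layouts above map naturally onto small path helpers (the `paths.ts` utility in the repository layout of section 16). The function names below are illustrative; the key patterns are taken directly from 13.3 to 13.5.

```typescript
type Env = "dev" | "staging" | "prod";

// Unversioned "latest" key, or a versioned key when a version is given.
export function datasetKey(env: Env, registryId: string, version?: string): string {
  return version
    ? `${env}/datasets/${registryId}.v${version}.json`
    : `${env}/datasets/${registryId}.json`;
}

export function rowSchemaKey(env: Env, registryId: string): string {
  return `${env}/schemas/row_schemas/${registryId}.row.schema.json`;
}

export function validationReportKey(env: Env, registryId: string): string {
  return `${env}/exports/validation_reports/${registryId}.validation.json`;
}
```

Centralizing the patterns in one module keeps ingest, validate, promote, and catalog agreeing on object locations by construction.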
14. Promotion Gates
14.1 Dev → Staging
Requires:
- generated dataset exists
- validation status = pass
- promotion rules allow transition
14.2 Staging → Prod
Requires:
- validation status = pass
- promotion rules allow transition
- explicit manual approval when configured
- no Excel copied into prod
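The gates in 14.1 and 14.2 can be encoded as one deterministic eligibility check, which is roughly what the PromotionEligibilityHook from 9.3 would run. The `canPromote` function and its input shape are a sketch, assuming validation status and approval state have already been read from the relevant reports.

```typescript
type Env = "dev" | "staging" | "prod";

interface PromotionRequest {
  from: Env;
  to: Env;
  validationStatus: "pass" | "fail";
  manualApproval: boolean;
  requireManualApprovalForProd: boolean;
}

// The only allowed transitions, per section 10.3.
const ALLOWED: Array<[Env, Env]> = [
  ["dev", "staging"],
  ["staging", "prod"],
];

export function canPromote(req: PromotionRequest): { ok: boolean; reason?: string } {
  if (!ALLOWED.some(([f, t]) => f === req.from && t === req.to)) {
    return { ok: false, reason: `Disallowed transition: ${req.from} -> ${req.to}` };
  }
  if (req.validationStatus !== "pass") {
    return { ok: false, reason: "Validation status is not pass" };
  }
  if (req.to === "prod" && req.requireManualApprovalForProd && !req.manualApproval) {
    return { ok: false, reason: "Missing manual approval for prod promotion" };
  }
  return { ok: true };
}
```

Returning a reason rather than only a boolean lets the promotion report record why a transition was refused, matching the failure modes in 15.3.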
15. Failure Modes
15.1 Ingestion Failures
- source artifact missing
- invalid sheet layout
- unsupported plugin source type
15.2 Validation Failures
- row schema mismatch
- duplicate primary keys
- integrity rule failures
15.3 Promotion Failures
- missing validation report
- failed validation status
- missing approval marker
- disallowed environment transition
15.4 ZARA Failures (future hooks)
- semantic inconsistency
- cross-registry mismatch
- unresolved references
16. Repository Structure
Recommended code layout:
code/infrastructure/zayaz-registry-toolchain/
package.json
tsconfig.json
src/
cli/
index.ts
commands/
ingest.ts
validate.ts
promote.ts
catalog.ts
scaffold.ts
config/
index.ts
types.ts
loaders.ts
core/
registry-definition.ts
dataset.ts
validation.ts
promotion.ts
catalog.ts
plugins/
ingestion/
base.ts
excel.ts
api.ts
db.ts
validation/
base.ts
json-schema.ts
integrity.ts
pk.ts
zara-hook.ts
aws/
s3.ts
step-functions.ts
eventbridge.ts
logging.ts
types/
registry.ts
dataset.ts
reports.ts
utils/
hashing.ts
timestamps.ts
paths.ts
17. Local Codex Workbench Integration
The toolchain is AWS-executed, but it may receive controlled inputs from a local ZARATHUSTRA workbench.
Rule
- local workbenches may author or refine artifacts
- authoritative execution happens in AWS
- real source Excel must not be distributed broadly
This allows:
- controlled AI-assisted authoring
- centralized pipeline execution
- separation between semantic authoring and operational processing
18. Implementation Priorities
v1 Required
- command router
- typed config
- Excel ingestion plugin
- schema + integrity validation hooks
- promotion logic
- catalog generation
- AWS-native execution path
v2 Planned
- API / DB ingestion plugins
- content hashing and signed build outputs
- approval marker workflow
- semantic validation via ZARA hooks
- dataset attestation and verifier integration
19. Final Statement
The ZAYAZ Registry Toolchain is the production execution spine of the platform’s semantic data layer.
It ensures that:
- high-value source artifacts remain protected
- generated datasets are deterministic and auditable
- validation is structured and extensible
- promotion is controlled and reproducible
ZARATHUSTRA authors the semantic truth. The Registry Toolchain operationalizes it. ZARA enforces it. ZAYAZ runs on the result.
20. BASH Commands
Examples:
cd /workspaces/zayaz-docs/code/infrastructure/zayaz-registry-toolchain
# Placeholder credentials for illustration only; in practice use IAM roles
# or Secrets Manager rather than long-lived keys (see 7.4 and 12).
export AWS_ACCESS_KEY_ID="THE_KEY"
export AWS_SECRET_ACCESS_KEY="THE_SECRET"
export AWS_REGION="eu-north-1"
npm run ingest -- --registry sig_residency_region_policy --env dev --verbose
npm run validate -- --registry sig_residency_region_policy --env dev --verbose
npm run catalog -- --env dev --verbose
npm run promote -- --registry sig_residency_region_policy --from dev --to staging --verbose