
ZRTS

ZAYAZ Registry Toolchain Specification

1. Purpose

The ZAYAZ Registry Toolchain is the authoritative execution layer that transforms semantic source artifacts into validated, versioned, and promotable registry datasets.

It sits between:

  • ZARATHUSTRA — semantic authoring and framework creation
  • ZARA — advanced validation, reasoning, and consistency enforcement
  • FOGE / engines / APIs — downstream execution and consumption

The toolchain is designed to be:

  • deterministic
  • production-ready
  • AWS-native
  • least-privilege by default
  • extensible through plugins and validation hooks

2. Architectural Position

ZARATHUSTRA (semantic authoring)
→ ZAYAZ Registry Toolchain (ingest / normalize / validate / version / promote)
→ ZARA (advanced validation and semantic enforcement)
→ FOGE / APIs / computation engines / reports

3. Core Principle

ZARATHUSTRA defines meaning. The Registry Toolchain structures it. ZARA verifies it. ZAYAZ executes it.


4. Storage Model

4.1 Bucket Strategy

The toolchain operates against the existing zayaz-assets bucket unless and until security or lifecycle separation requires a split-bucket model.

Recommended v1 layout:

s3://zayaz-assets/
  /dev/
    /excel/
    /datasets/
    /schemas/
    /exports/
    /models/
  /staging/
    /datasets/
    /schemas/
    /exports/
    /models/
  /prod/
    /datasets/
    /schemas/
    /exports/
    /models/

4.2 Environment Rules

  • dev may contain source Excel
  • staging should contain normalized artifacts only unless explicitly required for QA
  • prod must not contain source Excel
  • runtime systems consume datasets, not source spreadsheets
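
The environment rules above can be expressed as a small write guard. This is a minimal sketch, not the toolchain's actual implementation: the names `isSourceExcel` and `isWriteAllowed`, the `qaOverride` flag, and the list of Excel extensions are all assumptions made for illustration.

```typescript
type AppEnv = "dev" | "staging" | "prod";

// Assumption: source spreadsheets are recognised by file extension only.
const EXCEL_EXTENSIONS = [".xlsx", ".xls", ".xlsm"];

function isSourceExcel(key: string): boolean {
  const lower = key.toLowerCase();
  return EXCEL_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Hypothetical guard applied before any object is written to an environment.
function isWriteAllowed(env: AppEnv, key: string, qaOverride = false): boolean {
  if (!isSourceExcel(key)) return true; // normalized artifacts are allowed everywhere
  if (env === "dev") return true; // dev may contain source Excel
  if (env === "staging") return qaOverride; // staging only when explicitly required for QA
  return false; // prod must not contain source Excel
}
```

A guard like this would sit in front of every S3 write, so that the "no Excel in prod" rule is enforced in code as well as by IAM.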

5. Canonical Inputs and Outputs

5.1 Inputs

The Registry Toolchain consumes only the minimal canonical object set:

  1. Registry Definition Object
  2. Row Validation Schema
  3. Source artifact (Excel / API / DB)

5.2 Outputs

The toolchain produces:

  1. Dataset Object
  2. Validation Report
  3. Promotion Report
  4. Registry Catalog
  5. Optional generation / hash / audit reports

6. Command Router

The toolchain exposes a single command router:

zyz-registry <command> [options]

6.1 Core Commands

  • ingest
  • validate
  • promote
  • catalog
  • publish
  • inspect
  • scaffold

6.2 Initial Required Commands

For v1, the required executable commands, exposed as npm scripts, are:

  • gen:registry
  • validate:registry
  • promote:registry
  • catalog:registry

6.3 Example Usage

zyz-registry ingest --registry sig_residency_region_policy --env dev
zyz-registry validate --registry sig_residency_region_policy --env dev
zyz-registry promote --registry sig_residency_region_policy --from dev --to staging
zyz-registry catalog --env dev
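
To illustrate how a single command router might interpret invocations like the ones above, here is a minimal argument-parsing sketch. The function `parseCommand` and the `ParsedCommand` shape are hypothetical; the real router may use a CLI library and stricter per-command option validation.

```typescript
type CommandName =
  | "ingest" | "validate" | "promote" | "catalog"
  | "publish" | "inspect" | "scaffold";

const COMMANDS: readonly CommandName[] = [
  "ingest", "validate", "promote", "catalog", "publish", "inspect", "scaffold",
];

interface ParsedCommand {
  command: CommandName;
  options: Record<string, string | boolean>;
}

// Hypothetical parser: the first token is the command, the rest are --key [value] pairs.
function parseCommand(argv: string[]): ParsedCommand {
  const [first, ...rest] = argv;
  const command = COMMANDS.find((c) => c === first);
  if (command === undefined) {
    throw new Error(`Unknown command: ${String(first)}`);
  }
  const options: Record<string, string | boolean> = {};
  for (let i = 0; i < rest.length; i++) {
    const token = rest[i];
    if (token === undefined) break;
    if (!token.startsWith("--")) {
      throw new Error(`Unexpected argument: ${token}`);
    }
    const key = token.slice(2);
    const next = rest[i + 1];
    if (next !== undefined && !next.startsWith("--")) {
      options[key] = next; // --key value
      i++;
    } else {
      options[key] = true; // bare flag such as --verbose
    }
  }
  return { command, options };
}
```

Each parsed command would then be dispatched to its handler module; that dispatch step is omitted here.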

7. Typed Configuration

7.1 Purpose

All runtime behavior must be controlled by typed configuration, not by scattered constants or ad hoc environment handling.

7.2 Example Configuration Contract

interface RegistryToolchainConfig {
  appEnv: "dev" | "staging" | "prod";
  awsRegion: string;
  bucketName: string;
  sourcePrefix: string;
  datasetPrefix: string;
  schemaPrefix: string;
  reportPrefix: string;
  allowProdPromotion: boolean;
  requireManualApprovalForProd: boolean;
}

7.3 Configuration Sources

Configuration should be loaded from:

  • environment variables
  • AWS Systems Manager Parameter Store
  • AWS Secrets Manager (for secrets only)
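
As an illustration of the first source, the sketch below builds the typed configuration from a plain environment map and fails fast on missing values. The variable names (`APP_ENV`, `ZAYAZ_BUCKET_NAME`, and so on) and the default prefixes are assumptions, chosen to match the storage layout in section 4.1; Parameter Store and Secrets Manager lookups are not shown.

```typescript
// Repeated from section 7.2 so the sketch is self-contained.
interface RegistryToolchainConfig {
  appEnv: "dev" | "staging" | "prod";
  awsRegion: string;
  bucketName: string;
  sourcePrefix: string;
  datasetPrefix: string;
  schemaPrefix: string;
  reportPrefix: string;
  allowProdPromotion: boolean;
  requireManualApprovalForProd: boolean;
}

type EnvMap = Record<string, string | undefined>;

function requireVar(env: EnvMap, key: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    throw new Error(`Missing required configuration: ${key}`);
  }
  return value;
}

function parseAppEnv(value: string): RegistryToolchainConfig["appEnv"] {
  if (value === "dev" || value === "staging" || value === "prod") return value;
  throw new Error(`Invalid APP_ENV: ${value}`);
}

// Hypothetical loader; all variable names and defaults are illustrative.
function loadConfig(env: EnvMap): RegistryToolchainConfig {
  return {
    appEnv: parseAppEnv(requireVar(env, "APP_ENV")),
    awsRegion: requireVar(env, "AWS_REGION"),
    bucketName: env["ZAYAZ_BUCKET_NAME"] ?? "zayaz-assets",
    sourcePrefix: env["ZAYAZ_SOURCE_PREFIX"] ?? "excel",
    datasetPrefix: env["ZAYAZ_DATASET_PREFIX"] ?? "datasets",
    schemaPrefix: env["ZAYAZ_SCHEMA_PREFIX"] ?? "schemas",
    reportPrefix: env["ZAYAZ_REPORT_PREFIX"] ?? "exports",
    // Safe defaults: prod promotion is off and approval is on unless stated otherwise.
    allowProdPromotion: env["ALLOW_PROD_PROMOTION"] === "true",
    requireManualApprovalForProd: env["REQUIRE_MANUAL_APPROVAL_FOR_PROD"] !== "false",
  };
}
```

In a Node.js entry point this would typically be called as `loadConfig(process.env)`.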

7.4 Configuration Rule

Secrets must never be stored in GitHub or checked into local config files.


8. Plugin-Based Ingestion

8.1 Rationale

The toolchain must support multiple source modalities without rewriting the pipeline core.

8.2 Plugin Interface

/**
 * Source types
 */
type SourceType = "excel" | "api" | "db";

/**
 * Registry definition (input descriptor)
 */
export interface RegistryDefinition {
  id: string;
  sourceType: SourceType;
  sourceConfig: Record<string, unknown>;
}

/**
 * Pipeline context (runtime execution context)
 */
export interface PipelineContext {
  executionId: string;
  timestamp: string;
  metadata?: Record<string, unknown>;
}

/**
 * Normalized output rows
 */
export type NormalizedRow = Record<string, unknown>;
export type NormalizedRows = NormalizedRow[];

/**
 * Ingestion Plugin Interface
 */
export interface IngestionPlugin {
  sourceType: SourceType;

  canHandle(definition: RegistryDefinition): boolean;

  ingest(
    definition: RegistryDefinition,
    context: PipelineContext
  ): Promise<NormalizedRows>;
}

/**
 * Example implementation (Excel plugin)
 */
export const ExcelIngestionPlugin: IngestionPlugin = {
  sourceType: "excel",

  canHandle(definition) {
    return definition.sourceType === "excel";
  },

  async ingest(definition, context) {
    console.log("Ingesting Excel source:", definition.id);

    // Mock example
    return [
      { row: 1, value: "example", source: definition.id },
      { row: 2, value: "data", source: definition.id }
    ];
  }
};

/**
 * Example usage
 */
const definition: RegistryDefinition = {
  id: "registry-001",
  sourceType: "excel",
  sourceConfig: {
    filePath: "/data/example.xlsx"
  }
};

const context: PipelineContext = {
  executionId: "run-123",
  timestamp: new Date().toISOString()
};

async function runExample() {
  if (ExcelIngestionPlugin.canHandle(definition)) {
    const rows = await ExcelIngestionPlugin.ingest(definition, context);
    console.log(rows);
  }
}

runExample();

8.3 Initial Plugins

  • ExcelIngestionPlugin
  • ApiIngestionPlugin (stub)
  • DatabaseIngestionPlugin (stub)
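
One way the pipeline core could select among these plugins without knowing their internals is sketched below. `createStubPlugin` and `resolvePlugin` are hypothetical helpers, and the interface declarations are repeated from section 8.2 only to keep the example self-contained.

```typescript
type SourceType = "excel" | "api" | "db";

interface RegistryDefinition {
  id: string;
  sourceType: SourceType;
  sourceConfig: Record<string, unknown>;
}

interface IngestionPlugin {
  sourceType: SourceType;
  canHandle(definition: RegistryDefinition): boolean;
  ingest(definition: RegistryDefinition): Promise<Record<string, unknown>[]>;
}

// Hypothetical stub factory: registers a source type before its plugin is implemented.
function createStubPlugin(sourceType: SourceType): IngestionPlugin {
  return {
    sourceType,
    canHandle: (definition) => definition.sourceType === sourceType,
    ingest: async () => {
      throw new Error(`${sourceType} ingestion is not implemented yet`);
    },
  };
}

// Picks the first plugin that accepts the definition, or fails with a clear error.
function resolvePlugin(
  plugins: readonly IngestionPlugin[],
  definition: RegistryDefinition
): IngestionPlugin {
  const plugin = plugins.find((p) => p.canHandle(definition));
  if (plugin === undefined) {
    throw new Error(`Unsupported source type: ${definition.sourceType}`);
  }
  return plugin;
}
```

Because resolution goes through `canHandle`, adding a new source modality only requires registering another plugin in the list.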

8.4 Excel Plugin Responsibilities

  • read source artifact from private S3
  • resolve sheet and header row
  • normalize headers
  • normalize row values
  • emit canonical row objects
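
The header and value normalization steps might look like the following sketch. The exact rules (snake_case headers, trimmed strings, empty cells mapped to `null`) are assumptions for illustration; the specification does not define them. Reading the workbook from S3 and resolving the sheet are out of scope here.

```typescript
// Assumed rule: headers become lower-case snake_case identifiers.
function normalizeHeader(header: string): string {
  return header
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse runs of other characters into "_"
    .replace(/^_+|_+$/g, ""); // strip leading and trailing underscores
}

// Assumed rule: strings are trimmed and empty strings become null.
function normalizeValue(value: unknown): unknown {
  if (typeof value !== "string") return value;
  const trimmed = value.trim();
  return trimmed === "" ? null : trimmed;
}

// Combines a header row and data rows into canonical row objects.
function toCanonicalRows(
  headers: string[],
  rows: unknown[][]
): Record<string, unknown>[] {
  const keys = headers.map(normalizeHeader);
  return rows.map((cells) => {
    const row: Record<string, unknown> = {};
    keys.forEach((key, index) => {
      if (key !== "") row[key] = normalizeValue(cells[index]);
    });
    return row;
  });
}
```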

8.5 API Plugin Responsibilities

  • fetch from controlled API source
  • map remote payload into canonical rows
  • attach lineage and fetch metadata

8.6 Database Plugin Responsibilities

  • execute controlled query or snapshot read
  • map resultset into canonical rows
  • preserve source lineage

9. Validation Engine Hooks (ZARA-ready)

9.1 Principle

The toolchain must support deterministic validation hooks from day one and be ready for future ZARA integration without depending on LLM behavior.

9.2 Hook Interface

/**
 * Validation context (input to validation hook)
 */
export interface ValidationContext {
  recordId: string;
  data: Record<string, unknown>;
  metadata?: Record<string, unknown>;
}

/**
 * Validation result
 */
export interface ValidationResult {
  valid: boolean;
  errors?: string[];
  warnings?: string[];
}

/**
 * Validation Hook Interface
 */
export interface ValidationHook {
  name: string;
  run(input: ValidationContext): Promise<ValidationResult>;
}

/**
 * Example implementation
 */
export const RequiredFieldValidation: ValidationHook = {
  name: "required-field-check",

  async run(input: ValidationContext): Promise<ValidationResult> {
    const errors: string[] = [];

    if (!input.data["name"]) {
      errors.push("Missing required field: name");
    }

    if (!input.data["id"]) {
      errors.push("Missing required field: id");
    }

    return {
      valid: errors.length === 0,
      errors: errors.length > 0 ? errors : undefined,
      warnings: []
    };
  }
};

/**
 * Example usage
 */
const exampleInput: ValidationContext = {
  recordId: "rec-001",
  data: {
    name: "Sample"
    // id is missing → will trigger error
  }
};

async function runValidation() {
  const result = await RequiredFieldValidation.run(exampleInput);
  console.log(result);
}

runValidation();

9.3 Required v1 Hooks

  • JsonSchemaValidationHook
  • IntegrityRulesValidationHook
  • PrimaryKeyValidationHook
  • PromotionEligibilityHook
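
As an example of what one of these hooks checks, the sketch below shows the core of a primary-key check over a whole dataset. It is written as a plain function rather than against the per-record `ValidationContext`, because uniqueness is a dataset-level property; the function name and the idea of passing the key field by name are assumptions.

```typescript
type DatasetRow = Record<string, unknown>;

interface PrimaryKeyCheckResult {
  valid: boolean;
  errors: string[];
}

// Reports rows whose primary key is missing or already seen earlier in the dataset.
function checkPrimaryKeys(rows: DatasetRow[], keyField: string): PrimaryKeyCheckResult {
  const seen = new Set<string>();
  const errors: string[] = [];
  rows.forEach((row, index) => {
    const raw = row[keyField];
    if (raw === undefined || raw === null || raw === "") {
      errors.push(`Row ${index}: missing primary key "${keyField}"`);
      return;
    }
    const key = String(raw);
    if (seen.has(key)) {
      errors.push(`Row ${index}: duplicate primary key "${key}"`);
    } else {
      seen.add(key);
    }
  });
  return { valid: errors.length === 0, errors };
}
```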

9.4 Planned v2 Hooks

  • CrossRegistryConsistencyHook
  • UnitConsistencyHook
  • ReferenceResolutionHook
  • ZaraRuleEngineHook

9.5 ZARA-ready Clarification

In this context, ZARA-ready means:

  • the toolchain can call deterministic validation and rule-engine hooks
  • validation output is structured and machine-consumable
  • an LLM-enabled ZARA layer may be added later as an assistive or advisory component

The base pipeline must remain deterministic and auditable.
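
A deterministic hook runner consistent with this requirement could look like the sketch below. Hooks run sequentially in list order so that the aggregated report is reproducible. `runHooks` and `AggregatedValidation` are hypothetical names; the hook interfaces are repeated from section 9.2 for self-containment.

```typescript
interface ValidationContext {
  recordId: string;
  data: Record<string, unknown>;
  metadata?: Record<string, unknown>;
}

interface ValidationResult {
  valid: boolean;
  errors?: string[];
  warnings?: string[];
}

interface ValidationHook {
  name: string;
  run(input: ValidationContext): Promise<ValidationResult>;
}

interface AggregatedValidation {
  valid: boolean;
  errors: string[];
  warnings: string[];
  hooksRun: string[];
}

// Runs hooks one after another (never in parallel) and merges their results.
async function runHooks(
  hooks: readonly ValidationHook[],
  input: ValidationContext
): Promise<AggregatedValidation> {
  const aggregated: AggregatedValidation = {
    valid: true,
    errors: [],
    warnings: [],
    hooksRun: [],
  };
  for (const hook of hooks) {
    const result = await hook.run(input);
    aggregated.hooksRun.push(hook.name);
    aggregated.valid = aggregated.valid && result.valid;
    // Prefix each message with the hook name so the report stays traceable.
    for (const e of result.errors ?? []) aggregated.errors.push(`[${hook.name}] ${e}`);
    for (const w of result.warnings ?? []) aggregated.warnings.push(`[${hook.name}] ${w}`);
  }
  return aggregated;
}
```

An advisory LLM-backed hook, if added later, could be slotted into the same list while contributing only to `warnings`, which keeps the pass/fail decision deterministic.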


10. Command Specifications

10.1 gen:registry

Purpose

Generate a normalized dataset from a source artifact.

Inputs

  • source artifact
  • Registry Definition Object
  • Row Validation Schema

Outputs

  • dataset object
  • generation report
  • versioned dataset object

Example

npm run gen:registry -- --registry sig_residency_region_policy --env dev

10.2 validate:registry

Purpose

Validate a generated dataset against row schema, integrity rules, and dataset invariants.

Inputs

  • dataset object
  • Registry Definition Object
  • Row Validation Schema

Outputs

  • validation report

Example

npm run validate:registry -- --registry sig_residency_region_policy --env dev

10.3 promote:registry

Purpose

Promote a validated dataset between environments.

Allowed Transitions

  • dev -> staging
  • staging -> prod
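
The allowed transitions can be captured in a lookup table, as in this small sketch (the names `ALLOWED_TRANSITIONS` and `isAllowedTransition` are illustrative, not part of the specification):

```typescript
type AppEnv = "dev" | "staging" | "prod";

// Each environment promotes to at most one target; prod is terminal.
const ALLOWED_TRANSITIONS: Record<AppEnv, AppEnv | null> = {
  dev: "staging",
  staging: "prod",
  prod: null,
};

function isAllowedTransition(from: AppEnv, to: AppEnv): boolean {
  return ALLOWED_TRANSITIONS[from] === to;
}
```

With this table, skipping staging (`dev -> prod`) and any backwards promotion are rejected by construction.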

Inputs

  • source dataset
  • validation report
  • Registry Definition Object
  • promotion rules

Outputs

  • promoted dataset
  • promotion report
  • copied schema and definition artifacts

Example

npm run promote:registry -- --registry sig_residency_region_policy --from dev --to staging
npm run promote:registry -- --registry sig_residency_region_policy --from staging --to prod --approve

10.4 catalog:registry

Purpose

Rebuild the environment-specific discovery index.

Inputs

  • Registry Definition Objects
  • Row Validation Schemas
  • Dataset Objects
  • optional reports

Outputs

  • registry_catalog.json

Example

npm run catalog:registry -- --env dev

11. AWS Execution Model

11.1 Design Goal

All authoritative execution must occur inside AWS so that:

  • real source Excel never needs to be distributed to external parties
  • production behavior is reproducible
  • logs and approvals are centralized
  • least-privilege access can be enforced

11.2 AWS Services

  • Amazon S3 — source artifacts and generated outputs
  • AWS ECS Fargate or AWS Batch — job execution
  • AWS Step Functions — orchestration
  • Amazon EventBridge — triggers and schedules
  • Amazon CloudWatch Logs — execution logs and traceability
  • AWS IAM — role separation and least privilege

11.3 Example Flow

Private S3 source artifact
→ Step Functions
→ ingest task
→ validate task
→ catalog task
→ optional approval gate
→ promote task
→ S3 outputs + CloudWatch logs
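
The ordering and fail-fast behavior of this flow can be modeled in-process as shown below. This is only an illustrative model of the task sequence, not a Step Functions state machine definition; the task names follow the flow above, and `planFlow` and `runFlow` are hypothetical.

```typescript
type TaskName = "ingest" | "validate" | "catalog" | "approve" | "promote";
type TaskHandlers = Record<TaskName, () => boolean>;

interface FlowOutcome {
  completed: TaskName[];
  failedAt: TaskName | null;
}

// The approval gate is included only when the configuration requires it.
function planFlow(requireApproval: boolean): TaskName[] {
  const plan: TaskName[] = ["ingest", "validate", "catalog"];
  if (requireApproval) plan.push("approve");
  plan.push("promote");
  return plan;
}

// Runs tasks in order and stops at the first one that reports failure.
function runFlow(handlers: TaskHandlers, requireApproval: boolean): FlowOutcome {
  const completed: TaskName[] = [];
  for (const task of planFlow(requireApproval)) {
    if (!handlers[task]()) {
      return { completed, failedAt: task };
    }
    completed.push(task);
  }
  return { completed, failedAt: null };
}
```

In the AWS deployment each handler would correspond to a task state, and a failed validation would prevent the promote task from ever being scheduled.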

12. IAM Boundaries

12.1 Security Principle

Code access must not imply source data access.

12.2 Human Roles

Source Custodians

  • 2–3 trusted operators
  • upload/read private Excel
  • approve sensitive promotions

Pipeline Engineers

  • maintain code and infrastructure
  • work with test fixtures
  • no access to real source Excel by default

Reviewers / Approvers

  • review validation and promotion outputs
  • approve production publication

12.3 Runtime Roles

Ingestion Role

  • read source artifacts from private S3
  • write dev datasets and reports

Validation Role

  • read definitions, schemas, and datasets
  • write validation reports

Promotion Role

  • read validation reports and source-environment artifacts (datasets, schemas, definitions)
  • write target environment artifacts

Catalog Role

  • read environment artifacts
  • write registry_catalog.json
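
To make the role boundaries concrete, the sketch below generates an IAM policy document for the Ingestion Role. The statement IDs and the exact key prefixes are assumptions derived from the layout in section 4.1; a real policy would likely also need `s3:ListBucket` on selected prefixes and KMS permissions if the bucket is encrypted.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical builder: read source Excel, write dev datasets and reports, nothing else.
function ingestionRolePolicy(bucket: string): PolicyDocument {
  const arn = `arn:aws:s3:::${bucket}`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "ReadSourceArtifacts",
        Effect: "Allow",
        Action: ["s3:GetObject"],
        Resource: [`${arn}/dev/excel/*`],
      },
      {
        Sid: "WriteDevDatasetsAndReports",
        Effect: "Allow",
        Action: ["s3:PutObject"],
        Resource: [`${arn}/dev/datasets/*`, `${arn}/dev/exports/*`],
      },
    ],
  };
}
```

The Validation, Promotion, and Catalog roles would follow the same pattern with their own prefixes, and none of them would be granted read access to `dev/excel/`.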

13. Artifact Lifecycle

13.1 Source Artifact

/dev/excel/<registry_id>.xlsx

13.2 Registry Definition Object

/<env>/schemas/registry_definitions/<registry_id>.definition.json

13.3 Row Validation Schema

/<env>/schemas/row_schemas/<registry_id>.row.schema.json

13.4 Dataset Object

/<env>/datasets/<registry_id>.json
/<env>/datasets/<registry_id>.v1.0.0.json

13.5 Reports

/<env>/exports/generation_reports/<registry_id>.generation.json
/<env>/exports/validation_reports/<registry_id>.validation.json
/<env>/exports/promotion_reports/<registry_id>.promotion.json

13.6 Registry Catalog

/<env>/datasets/registry_catalog.json
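
The path conventions in this section can be centralized in one helper so that no command builds keys by hand. The sketch below mirrors the paths listed above; the function and field names are illustrative.

```typescript
type AppEnv = "dev" | "staging" | "prod";

interface ArtifactPaths {
  definition: string;
  rowSchema: string;
  dataset: string;
  versionedDataset: string;
  generationReport: string;
  validationReport: string;
  promotionReport: string;
  catalog: string;
}

// Source Excel only ever lives under dev.
function sourceArtifactPath(registryId: string): string {
  return `/dev/excel/${registryId}.xlsx`;
}

// version is a semantic version string without the leading "v", e.g. "1.0.0".
function artifactPaths(env: AppEnv, registryId: string, version: string): ArtifactPaths {
  const base = `/${env}`;
  return {
    definition: `${base}/schemas/registry_definitions/${registryId}.definition.json`,
    rowSchema: `${base}/schemas/row_schemas/${registryId}.row.schema.json`,
    dataset: `${base}/datasets/${registryId}.json`,
    versionedDataset: `${base}/datasets/${registryId}.v${version}.json`,
    generationReport: `${base}/exports/generation_reports/${registryId}.generation.json`,
    validationReport: `${base}/exports/validation_reports/${registryId}.validation.json`,
    promotionReport: `${base}/exports/promotion_reports/${registryId}.promotion.json`,
    catalog: `${base}/datasets/registry_catalog.json`,
  };
}
```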

14. Promotion Gates

14.1 Dev → Staging

Requires:

  • generated dataset exists
  • validation status = pass
  • promotion rules allow transition

14.2 Staging → Prod

Requires:

  • validation status = pass
  • promotion rules allow transition
  • explicit manual approval when configured
  • no Excel copied into prod
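
The two gates can be evaluated by a single function that returns the list of blocking reasons, which also gives the promotion report something structured to record. This is a sketch under stated assumptions: the `PromotionRequest` shape is hypothetical, and the config fields reuse the names from section 7.2.

```typescript
type AppEnv = "dev" | "staging" | "prod";

interface PromotionRequest {
  from: AppEnv;
  to: AppEnv;
  datasetExists: boolean;
  validationStatus: "pass" | "fail" | "missing";
  transitionAllowedByRules: boolean;
  approved: boolean;
  artifactKeys: string[]; // keys that would be copied to the target environment
}

interface GateConfig {
  allowProdPromotion: boolean;
  requireManualApprovalForProd: boolean;
}

// Returns every reason the promotion is blocked; an empty list means the gate is open.
function evaluatePromotionGate(req: PromotionRequest, config: GateConfig): string[] {
  const blockers: string[] = [];
  if (!req.datasetExists) blockers.push("generated dataset is missing");
  if (req.validationStatus !== "pass") {
    blockers.push(`validation status is ${req.validationStatus}`);
  }
  if (!req.transitionAllowedByRules) {
    blockers.push("promotion rules disallow this transition");
  }
  if (req.to === "prod") {
    if (!config.allowProdPromotion) blockers.push("prod promotion is disabled");
    if (config.requireManualApprovalForProd && !req.approved) {
      blockers.push("manual approval marker is missing");
    }
    if (req.artifactKeys.some((k) => k.toLowerCase().endsWith(".xlsx"))) {
      blockers.push("source Excel must not be copied into prod");
    }
  }
  return blockers;
}
```

Collecting all blockers, rather than stopping at the first, lets an operator fix every problem in one pass.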

15. Failure Modes

15.1 Ingestion Failures

  • source artifact missing
  • invalid sheet layout
  • unsupported plugin source type

15.2 Validation Failures

  • row schema mismatch
  • duplicate primary keys
  • integrity rule failures

15.3 Promotion Failures

  • missing validation report
  • failed validation status
  • missing approval marker
  • disallowed environment transition

15.4 ZARA Failures (future hooks)

  • semantic inconsistency
  • cross-registry mismatch
  • unresolved references

16. Repository Structure

Recommended code layout:

code/infrastructure/zayaz-registry-toolchain/
  package.json
  tsconfig.json
  src/
    cli/
      index.ts
      commands/
        ingest.ts
        validate.ts
        promote.ts
        catalog.ts
        scaffold.ts
    config/
      index.ts
      types.ts
      loaders.ts
    core/
      registry-definition.ts
      dataset.ts
      validation.ts
      promotion.ts
      catalog.ts
    plugins/
      ingestion/
        base.ts
        excel.ts
        api.ts
        db.ts
      validation/
        base.ts
        json-schema.ts
        integrity.ts
        pk.ts
        zara-hook.ts
    aws/
      s3.ts
      step-functions.ts
      eventbridge.ts
      logging.ts
    types/
      registry.ts
      dataset.ts
      reports.ts
    utils/
      hashing.ts
      timestamps.ts
      paths.ts

17. Local Codex Workbench Integration

The toolchain is AWS-executed, but it may receive controlled inputs from a local ZARATHUSTRA workbench.

Rule

  • local workbenches may author or refine artifacts
  • authoritative execution happens in AWS
  • real source Excel must not be distributed broadly

This allows:

  • controlled AI-assisted authoring
  • centralized pipeline execution
  • separation between semantic authoring and operational processing

18. Implementation Priorities

v1 Required

  • command router
  • typed config
  • Excel ingestion plugin
  • schema + integrity validation hooks
  • promotion logic
  • catalog generation
  • AWS-native execution path

v2 Planned

  • API / DB ingestion plugins
  • content hashing and signed build outputs
  • approval marker workflow
  • semantic validation via ZARA hooks
  • dataset attestation and verifier integration

19. Final Statement

The ZAYAZ Registry Toolchain is the production execution spine of the platform’s semantic data layer.

It ensures that:

  • high-value source artifacts remain protected
  • generated datasets are deterministic and auditable
  • validation is structured and extensible
  • promotion is controlled and reproducible

ZARATHUSTRA authors the semantic truth. The Registry Toolchain operationalizes it. ZARA enforces it. ZAYAZ runs on the result.


20. BASH Commands

Examples:

cd /workspaces/zayaz-docs/code/infrastructure/zayaz-registry-toolchain

export AWS_ACCESS_KEY_ID="THE_KEY"
export AWS_SECRET_ACCESS_KEY="THE_SECRET"
export AWS_REGION="eu-north-1"

npm run ingest -- --registry sig_residency_region_policy --env dev --verbose
npm run validate -- --registry sig_residency_region_policy --env dev --verbose
npm run catalog -- --env dev --verbose
npm run promote -- --registry sig_residency_region_policy --from dev --to staging --verbose


