ZRTS
ZAYAZ Registry Toolchain Specification
1. Purpose
The ZAYAZ Registry Toolchain is the authoritative execution layer that transforms semantic source artifacts into validated, versioned, and promotable registry datasets.
It sits between:
- ZARATHUSTRA — semantic authoring and framework creation
- ZARA — advanced validation, reasoning, and consistency enforcement
- FOGE / engines / APIs — downstream execution and consumption
The toolchain is designed to be:
- deterministic
- production-ready
- AWS-native
- least-privilege by default
- extensible through plugins and validation hooks
2. Architectural Position
ZARATHUSTRA (semantic authoring)
↓
ZAYAZ Registry Toolchain (ingest / normalize / validate / version / promote)
↓
ZARA (advanced validation and semantic enforcement)
↓
FOGE / APIs / computation engines / reports
3. Core Principle
ZARATHUSTRA defines meaning. The Registry Toolchain structures it. ZARA verifies it. ZAYAZ executes it.
4. Storage Model
4.1 Bucket Strategy
The toolchain operates against the existing zayaz-assets bucket unless and until security or lifecycle separation requires a split-bucket model.
Recommended v1 layout:
s3://zayaz-assets/
/dev/
/excel/
/datasets/
/schemas/
/exports/
/models/
/staging/
/datasets/
/schemas/
/exports/
/models/
/prod/
/datasets/
/schemas/
/exports/
/models/
4.2 Environment Rules
- dev may contain source Excel
- staging should contain normalized artifacts only, unless explicitly required for QA
- prod must not contain source Excel
- runtime systems consume datasets, not source spreadsheets
5. Canonical Inputs and Outputs
5.1 Inputs
The Registry Toolchain consumes only the minimal canonical object set:
- Registry Definition Object
- Row Validation Schema
- Source artifact (Excel / API / DB)
5.2 Outputs
The toolchain produces:
- Dataset Object
- Validation Report
- Promotion Report
- Registry Catalog
- Optional generation / hash / audit reports
6. Command Router
The toolchain exposes a single command router:
zyz-registry <command> [options]
6.1 Core Commands
- ingest
- validate
- promote
- catalog
- publish
- inspect
- scaffold
6.2 Initial Required Commands
For v1, the required executable commands are:
- gen:registry
- validate:registry
- promote:registry
- catalog:registry
6.3 Example Usage
zyz-registry ingest --registry sig_residency_region_policy --env dev
zyz-registry validate --registry sig_residency_region_policy --env dev
zyz-registry promote --registry sig_residency_region_policy --from dev --to staging
zyz-registry catalog --env dev
7. Typed Configuration
7.1 Purpose
All runtime behavior must be controlled by typed configuration, not by scattered constants or ad hoc environment handling.
7.2 Example Configuration Contract
interface RegistryToolchainConfig {
appEnv: "dev" | "staging" | "prod";
awsRegion: string;
bucketName: string;
sourcePrefix: string;
datasetPrefix: string;
schemaPrefix: string;
reportPrefix: string;
allowProdPromotion: boolean;
requireManualApprovalForProd: boolean;
}
7.3 Configuration Sources
Configuration should be loaded from:
- environment variables
- AWS Systems Manager Parameter Store
- AWS Secrets Manager (for secrets only)
7.4 Configuration Rule
Secrets must never be stored in GitHub or checked into local config files.
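The contract in 7.2 and the sources in 7.3 could be combined in a loader along these lines. This is a minimal sketch: the environment-variable names, defaults, and the `loadConfigFromEnv` function are illustrative assumptions, not part of the spec.

```typescript
type AppEnv = "dev" | "staging" | "prod";

interface RegistryToolchainConfig {
  appEnv: AppEnv;
  awsRegion: string;
  bucketName: string;
  sourcePrefix: string;
  datasetPrefix: string;
  schemaPrefix: string;
  reportPrefix: string;
  allowProdPromotion: boolean;
  requireManualApprovalForProd: boolean;
}

// Type guard so the string from the environment narrows to the AppEnv union.
function isAppEnv(value: string): value is AppEnv {
  return value === "dev" || value === "staging" || value === "prod";
}

function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Hypothetical loader: fails fast on missing or invalid values instead of
// falling back silently, and never reads secrets from checked-in files.
export function loadConfigFromEnv(
  env: Record<string, string | undefined>
): RegistryToolchainConfig {
  const appEnv = requireEnv("APP_ENV", env);
  if (!isAppEnv(appEnv)) throw new Error(`Invalid APP_ENV: ${appEnv}`);
  return {
    appEnv,
    awsRegion: requireEnv("AWS_REGION", env),
    bucketName: requireEnv("BUCKET_NAME", env),
    sourcePrefix: env.SOURCE_PREFIX ?? `${appEnv}/excel/`,
    datasetPrefix: env.DATASET_PREFIX ?? `${appEnv}/datasets/`,
    schemaPrefix: env.SCHEMA_PREFIX ?? `${appEnv}/schemas/`,
    reportPrefix: env.REPORT_PREFIX ?? `${appEnv}/exports/`,
    // Prod promotion stays disabled unless explicitly enabled.
    allowProdPromotion: env.ALLOW_PROD_PROMOTION === "true",
    requireManualApprovalForProd: env.REQUIRE_MANUAL_APPROVAL !== "false",
  };
}
```

Deriving the prefixes from the environment keeps the dev/staging/prod layout of section 4.1 consistent without per-environment config files.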
8. Plugin-Based Ingestion
8.1 Rationale
The toolchain must support multiple source modalities without rewriting the pipeline core.
8.2 Plugin Interface
/**
* Source types
*/
type SourceType = "excel" | "api" | "db";
/**
* Registry definition (input descriptor)
*/
export interface RegistryDefinition {
id: string;
sourceType: SourceType;
sourceConfig: Record<string, unknown>;
}
/**
* Pipeline context (runtime execution context)
*/
export interface PipelineContext {
executionId: string;
timestamp: string;
metadata?: Record<string, unknown>;
}
/**
* Normalized output rows
*/
export type NormalizedRow = Record<string, unknown>;
export type NormalizedRows = NormalizedRow[];
/**
* Ingestion Plugin Interface
*/
export interface IngestionPlugin {
sourceType: SourceType;
canHandle(definition: RegistryDefinition): boolean;
ingest(
definition: RegistryDefinition,
context: PipelineContext
): Promise<NormalizedRows>;
}
/**
* Example implementation (Excel plugin)
*/
export const ExcelIngestionPlugin: IngestionPlugin = {
sourceType: "excel",
canHandle(definition) {
return definition.sourceType === "excel";
},
async ingest(definition, context) {
console.log("Ingesting Excel source:", definition.id);
// Mock example
return [
{ row: 1, value: "example", source: definition.id },
{ row: 2, value: "data", source: definition.id }
];
}
};
/**
* Example usage
*/
const definition: RegistryDefinition = {
id: "registry-001",
sourceType: "excel",
sourceConfig: {
filePath: "/data/example.xlsx"
}
};
const context: PipelineContext = {
executionId: "run-123",
timestamp: new Date().toISOString()
};
async function runExample() {
if (ExcelIngestionPlugin.canHandle(definition)) {
const rows = await ExcelIngestionPlugin.ingest(definition, context);
console.log(rows);
}
}
runExample();
8.3 Initial Plugins
- ExcelIngestionPlugin
- ApiIngestionPlugin (stub)
- DatabaseIngestionPlugin (stub)
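For the pipeline core to stay source-agnostic, plugin selection can be centralized in a small resolver that picks the first registered plugin whose canHandle accepts the definition. The `PluginRegistry` class below is an illustrative sketch, not a prescribed API; it reuses the interfaces from 8.2.

```typescript
type SourceType = "excel" | "api" | "db";

interface RegistryDefinition {
  id: string;
  sourceType: SourceType;
  sourceConfig: Record<string, unknown>;
}

interface IngestionPlugin {
  sourceType: SourceType;
  canHandle(definition: RegistryDefinition): boolean;
}

export class PluginRegistry {
  private plugins: IngestionPlugin[] = [];

  register(plugin: IngestionPlugin): void {
    this.plugins.push(plugin);
  }

  resolve(definition: RegistryDefinition): IngestionPlugin {
    const plugin = this.plugins.find((p) => p.canHandle(definition));
    if (!plugin) {
      // An unsupported source type is a declared ingestion failure mode (15.1).
      throw new Error(`No ingestion plugin for source type: ${definition.sourceType}`);
    }
    return plugin;
  }
}
```

Adding the API or database modality then means registering a new plugin, with no change to the pipeline core.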
8.4 Excel Plugin Responsibilities
- read source artifact from private S3
- resolve sheet and header row
- normalize headers
- normalize row values
- emit canonical row objects
8.5 API Plugin Responsibilities
- fetch from controlled API source
- map remote payload into canonical rows
- attach lineage and fetch metadata
8.6 Database Plugin Responsibilities
- execute controlled query or snapshot read
- map resultset into canonical rows
- preserve source lineage
9. Validation Engine Hooks (ZARA-ready)
9.1 Principle
The toolchain must support deterministic validation hooks from day one and be ready for future ZARA integration without depending on LLM behavior.
9.2 Hook Interface
/**
* Validation context (input to validation hook)
*/
export interface ValidationContext {
recordId: string;
data: Record<string, unknown>;
metadata?: Record<string, unknown>;
}
/**
* Validation result
*/
export interface ValidationResult {
valid: boolean;
errors?: string[];
warnings?: string[];
}
/**
* Validation Hook Interface
*/
export interface ValidationHook {
name: string;
run(input: ValidationContext): Promise<ValidationResult>;
}
/**
* Example implementation
*/
export const RequiredFieldValidation: ValidationHook = {
name: "required-field-check",
async run(input: ValidationContext): Promise<ValidationResult> {
const errors: string[] = [];
if (!input.data["name"]) {
errors.push("Missing required field: name");
}
if (!input.data["id"]) {
errors.push("Missing required field: id");
}
return {
valid: errors.length === 0,
errors: errors.length > 0 ? errors : undefined,
warnings: []
};
}
};
/**
* Example usage
*/
const exampleInput: ValidationContext = {
recordId: "rec-001",
data: {
name: "Sample"
// id is missing → will trigger error
}
};
async function runValidation() {
const result = await RequiredFieldValidation.run(exampleInput);
console.log(result);
}
runValidation();
9.3 Required v1 Hooks
- JsonSchemaValidationHook
- IntegrityRulesValidationHook
- PrimaryKeyValidationHook
- PromotionEligibilityHook
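A set of hooks like these could be executed by a small deterministic runner that merges their results into one machine-consumable report, in line with 9.5. The `runHooks` function and the hook-name prefixing are illustrative assumptions; only the interfaces come from 9.2.

```typescript
interface ValidationContext {
  recordId: string;
  data: Record<string, unknown>;
  metadata?: Record<string, unknown>;
}

interface ValidationResult {
  valid: boolean;
  errors?: string[];
  warnings?: string[];
}

interface ValidationHook {
  name: string;
  run(input: ValidationContext): Promise<ValidationResult>;
}

// Runs every hook in order and aggregates errors and warnings.
// Hooks run sequentially so the report order is deterministic.
export async function runHooks(
  hooks: ValidationHook[],
  input: ValidationContext
): Promise<ValidationResult> {
  const errors: string[] = [];
  const warnings: string[] = [];
  for (const hook of hooks) {
    const result = await hook.run(input);
    // Prefix each message with the hook name so reports stay traceable.
    errors.push(...(result.errors ?? []).map((e) => `${hook.name}: ${e}`));
    warnings.push(...(result.warnings ?? []).map((w) => `${hook.name}: ${w}`));
  }
  return { valid: errors.length === 0, errors, warnings };
}
```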
9.4 Planned v2 Hooks
- CrossRegistryConsistencyHook
- UnitConsistencyHook
- ReferenceResolutionHook
- ZaraRuleEngineHook
9.5 ZARA-ready Clarification
In this context, ZARA-ready means:
- the toolchain can call deterministic validation and rule-engine hooks
- validation output is structured and machine-consumable
- an LLM-enabled ZARA layer may be added later as an assistive or advisory component
The base pipeline must remain deterministic and auditable.
10. Command Specifications
10.1 gen:registry
Purpose
Generate a normalized dataset from a source artifact.
Inputs
- source artifact
- Registry Definition Object
- Row Validation Schema
Outputs
- dataset object
- generation report
- versioned dataset object
Example
npm run gen:registry -- --registry sig_residency_region_policy --env dev
10.2 validate:registry
Purpose
Validate a generated dataset against row schema, integrity rules, and dataset invariants.
Inputs
- dataset object
- Registry Definition Object
- Row Validation Schema
Outputs
- validation report
Example
npm run validate:registry -- --registry sig_residency_region_policy --env dev
10.3 promote:registry
Purpose
Promote a validated dataset between environments.
Allowed Transitions
- dev -> staging
- staging -> prod
Inputs
- source dataset
- validation report
- Registry Definition Object
- promotion rules
Outputs
- promoted dataset
- promotion report
- copied schema and definition artifacts
Example
npm run promote:registry -- --registry sig_residency_region_policy --from dev --to staging
npm run promote:registry -- --registry sig_residency_region_policy --from staging --to prod --approve
10.4 catalog:registry
Purpose
Rebuild the environment-specific discovery index.
Inputs
- Registry Definition Objects
- Row Validation Schemas
- Dataset Objects
- optional reports
Outputs
registry_catalog.json
Example
npm run catalog:registry -- --env dev
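The spec only requires that registry_catalog.json be a per-environment discovery index; the entry fields and the `buildCatalog` helper below are assumptions sketched for illustration. Sorting entries makes rebuilds deterministic, which matters for the auditability goal in section 19.

```typescript
// Hypothetical catalog entry shape; the exact fields are not fixed by the spec.
interface CatalogEntry {
  registryId: string;
  datasetKey: string;
  schemaKey: string;
  validationStatus?: "pass" | "fail";
}

interface RegistryCatalog {
  env: "dev" | "staging" | "prod";
  generatedAt: string;
  entries: CatalogEntry[];
}

export function buildCatalog(
  env: RegistryCatalog["env"],
  entries: CatalogEntry[]
): RegistryCatalog {
  return {
    env,
    generatedAt: new Date().toISOString(),
    // Sort by registry id so repeated rebuilds produce identical entry order.
    entries: [...entries].sort((a, b) => a.registryId.localeCompare(b.registryId)),
  };
}
```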
11. AWS Execution Model
11.1 Design Goal
All authoritative execution must occur inside AWS so that:
- real source Excel never needs to be distributed to external parties
- production behavior is reproducible
- logs and approvals are centralized
- least-privilege access can be enforced
11.2 Recommended Runtime Components
- Amazon S3 — source artifacts and generated outputs
- AWS ECS Fargate or AWS Batch — job execution
- AWS Step Functions — orchestration
- Amazon EventBridge — triggers and schedules
- Amazon CloudWatch Logs — execution logs and traceability
- AWS IAM — role separation and least privilege
11.3 Example Flow
Private S3 source artifact
→ Step Functions
→ ingest task
→ validate task
→ catalog task
→ optional approval gate
→ promote task
→ S3 outputs + CloudWatch logs
12. IAM Boundaries
12.1 Security Principle
Code access must not imply source data access.
12.2 Recommended Access Tiers
Source Custodians
- 2–3 trusted operators
- upload/read private Excel
- approve sensitive promotions
Pipeline Engineers
- maintain code and infrastructure
- work with test fixtures
- no access to real source Excel by default
Reviewers / Approvers
- review validation and promotion outputs
- approve production publication
12.3 Runtime Roles
Ingestion Role
- read source artifacts from private S3
- write dev datasets and reports
Validation Role
- read definitions, schemas, and datasets
- write validation reports
Promotion Role
- read validation reports and source artifacts
- write target environment artifacts
Catalog Role
- read environment artifacts
- write registry_catalog.json
13. Artifact Lifecycle
13.1 Source Artifact
/dev/excel/<registry_id>.xlsx
13.2 Registry Definition Object
/<env>/schemas/registry_definitions/<registry_id>.definition.json
13.3 Row Validation Schema
/<env>/schemas/row_schemas/<registry_id>.row.schema.json
13.4 Dataset Object
/<env>/datasets/<registry_id>.json
/<env>/datasets/<registry_id>.v1.0.0.json
13.5 Reports
/<env>/exports/generation_reports/<registry_id>.generation.json
/<env>/exports/validation_reports/<registry_id>.validation.json
/<env>/exports/promotion_reports/<registry_id>.promotion.json
13.6 Registry Catalog
/<env>/datasets/registry_catalog.json
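The key layouts above map naturally onto small path helpers (the `paths.ts` utility in the repository layout of section 16). The function names below are illustrative; the key patterns are taken directly from 13.3 to 13.5.

```typescript
type Env = "dev" | "staging" | "prod";

// Unversioned "latest" key, or a versioned key when a version is given.
export function datasetKey(env: Env, registryId: string, version?: string): string {
  return version
    ? `${env}/datasets/${registryId}.v${version}.json`
    : `${env}/datasets/${registryId}.json`;
}

export function rowSchemaKey(env: Env, registryId: string): string {
  return `${env}/schemas/row_schemas/${registryId}.row.schema.json`;
}

export function validationReportKey(env: Env, registryId: string): string {
  return `${env}/exports/validation_reports/${registryId}.validation.json`;
}
```

Centralizing the patterns in one module keeps ingest, validate, promote, and catalog agreeing on object locations by construction.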
14. Promotion Gates
14.1 Dev → Staging
Requires:
- generated dataset exists
- validation status = pass
- promotion rules allow transition
14.2 Staging → Prod
Requires:
- validation status = pass
- promotion rules allow transition
- explicit manual approval when configured
- no Excel copied into prod
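The gates in 14.1 and 14.2 can be encoded as one deterministic eligibility check, which is roughly what the PromotionEligibilityHook from 9.3 would run. The `canPromote` function and its input shape are a sketch, assuming validation status and approval state have already been read from the relevant reports.

```typescript
type Env = "dev" | "staging" | "prod";

interface PromotionRequest {
  from: Env;
  to: Env;
  validationStatus: "pass" | "fail";
  manualApproval: boolean;
  requireManualApprovalForProd: boolean;
}

// The only allowed transitions, per section 10.3.
const ALLOWED: Array<[Env, Env]> = [
  ["dev", "staging"],
  ["staging", "prod"],
];

export function canPromote(req: PromotionRequest): { ok: boolean; reason?: string } {
  if (!ALLOWED.some(([f, t]) => f === req.from && t === req.to)) {
    return { ok: false, reason: `Disallowed transition: ${req.from} -> ${req.to}` };
  }
  if (req.validationStatus !== "pass") {
    return { ok: false, reason: "Validation status is not pass" };
  }
  if (req.to === "prod" && req.requireManualApprovalForProd && !req.manualApproval) {
    return { ok: false, reason: "Missing manual approval for prod promotion" };
  }
  return { ok: true };
}
```

Returning a reason rather than only a boolean lets the promotion report record why a transition was refused, matching the failure modes in 15.3.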
15. Failure Modes
15.1 Ingestion Failures
- source artifact missing
- invalid sheet layout
- unsupported plugin source type
15.2 Validation Failures
- row schema mismatch
- duplicate primary keys
- integrity rule failures
15.3 Promotion Failures
- missing validation report
- failed validation status
- missing approval marker
- disallowed environment transition
15.4 ZARA Failures (future hooks)
- semantic inconsistency
- cross-registry mismatch
- unresolved references
16. Repository Structure
Recommended code layout:
code/infrastructure/zayaz-registry-toolchain/
package.json
tsconfig.json
src/
cli/
index.ts
commands/
ingest.ts
validate.ts
promote.ts
catalog.ts
scaffold.ts
config/
index.ts
types.ts
loaders.ts
core/
registry-definition.ts
dataset.ts
validation.ts
promotion.ts
catalog.ts
plugins/
ingestion/
base.ts
excel.ts
api.ts
db.ts
validation/
base.ts
json-schema.ts
integrity.ts
pk.ts
zara-hook.ts
aws/
s3.ts
step-functions.ts
eventbridge.ts
logging.ts
types/
registry.ts
dataset.ts
reports.ts
utils/
hashing.ts
timestamps.ts
paths.ts
17. Local Codex Workbench Integration
The toolchain is AWS-executed, but it may receive controlled inputs from a local ZARATHUSTRA workbench.
Rule
- local workbenches may author or refine artifacts
- authoritative execution happens in AWS
- real source Excel must not be distributed broadly
This allows:
- controlled AI-assisted authoring
- centralized pipeline execution
- separation between semantic authoring and operational processing
18. Implementation Priorities
v1 Required
- command router
- typed config
- Excel ingestion plugin
- schema + integrity validation hooks
- promotion logic
- catalog generation
- AWS-native execution path
v2 Planned
- API / DB ingestion plugins
- content hashing and signed build outputs
- approval marker workflow
- semantic validation via ZARA hooks
- dataset attestation and verifier integration
19. Final Statement
The ZAYAZ Registry Toolchain is the production execution spine of the platform’s semantic data layer.
It ensures that:
- high-value source artifacts remain protected
- generated datasets are deterministic and auditable
- validation is structured and extensible
- promotion is controlled and reproducible
ZARATHUSTRA authors the semantic truth. The Registry Toolchain operationalizes it. ZARA enforces it. ZAYAZ runs on the result.
20. BASH Commands
Examples:
cd /workspaces/zayaz-docs/code/infrastructure/zayaz-registry-toolchain
# Placeholder credentials for illustration only; in practice use IAM roles
# or Secrets Manager rather than long-lived keys (see 7.4 and 12).
export AWS_ACCESS_KEY_ID="THE_KEY"
export AWS_SECRET_ACCESS_KEY="THE_SECRET"
export AWS_REGION="eu-north-1"
npm run ingest -- --registry sig_residency_region_policy --env dev --verbose
npm run validate -- --registry sig_residency_region_policy --env dev --verbose
npm run catalog -- --env dev --verbose
npm run promote -- --registry sig_residency_region_policy --from dev --to staging --verbose