ZAR-FW

ZAYAZ Artifact Registry Framework

1. Introduction

The ZAYAZ Artifact Registry (ZAR) is the foundational persistence, lineage, and governance layer of the ZAYAZ platform.

It serves as the canonical system of record for all computational artifacts, enabling full traceability across the ESG data lifecycle—from raw inputs to validated disclosures.

ZAR is not a traditional registry. It is a deterministic, intelligence-aware infrastructure that connects:

  • data (signals)
  • computation (engines, models, rulesets)
  • governance (validation, assurance, audit)

into a single, unified system of truth.


2. Role Within the ZAYAZ Architecture

ZAR operates as a core pillar of the Shared Intelligence Stack (SIS) and ensures coherence across all platform modules.

It integrates directly with:

  • SSSR (Smart Searchable Signal Registry) → semantic definition of signals
  • USO (Universal Signal Ontology) → runtime lineage and instance tracking
  • ZSSR (Smart System Router) → routing and orchestration
  • ZARA / ZAAM → AI-driven reasoning and interaction
  • Verification & Assurance (VERA) → trust, validation, and audit workflows

Together, these systems form a closed-loop ESG intelligence architecture where every data point is:

Defined → Produced → Validated → Traced → Explained


3. The ZAYAZ Identifier System

At the core of ZAR lies the Canonical Identifier Architecture (CIA), which ensures that every element in the system is uniquely and immutably identifiable.

ZAYAZ distinguishes between three identity layers:

| Layer | Identifier | Purpose |
|---|---|---|
| Instance | USO ID | Identifies a specific occurrence of a signal |
| Type | CSI | Defines the semantic meaning of the signal |
| Artifact | CMI | Identifies the component that produced or processed the signal |

These identifiers operate at three distinct abstraction levels: CSI (type), CMI (artifact), and USO (instance). Together they form the ZAYAZ Identifier Trinity, which enables full traceability across the entire lifecycle of ESG data.


4. Canonical Signal Identifiers (CSI)

The Canonical Signal Identifier (CSI) defines the semantic identity of a signal and is governed within the SSSR.

Each signal field within the platform is assigned a CSI, making it:

  • discoverable
  • comparable
  • auditable
  • reusable across modules

CSI is governed within the SSSR and referenced by ZAR, but not owned by it.

CSI Structure

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<NAME>.v<MAJOR>_<MINOR>

Key Concepts

  • MODULE_CODE represents the top-level ZAYAZ module (e.g. comp, vera, inpt)
  • COMPONENT_ID corresponds to the frontmatter ID defined in the ZAYAZ manual
  • KIND defines the role of the signal (e.g. INPUT, OUTPUT, METRIC)
  • NAME is the canonical semantic identifier and is identical to the signal_name in the signal_registry. (The signal_name is a curated version of the column name.)
  • VERSION tracks the evolution of the signal’s meaning

Example

comp.PEF-ME.OUTPUT.CO2E.v1_0
vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0
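
As a minimal sketch of how the five segments can be read back out of a CSI string, the snippet below splits on dots and unpacks the version. The `Csi` dataclass and `parse_csi` function are illustrative names, not part of the ZAYAZ codebase, and the sketch checks only the overall shape, not the approved value lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Csi:
    """Illustrative container for the five CSI segments (hypothetical name)."""
    module_code: str
    component_id: str
    kind: str
    name: str
    major: int
    minor: int


def parse_csi(csi: str) -> Csi:
    """Split a CSI string into its dot-separated segments."""
    parts = csi.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 segments, got {len(parts)}: {csi!r}")
    module_code, component_id, kind, name, version = parts
    if not version.startswith("v") or version.count("_") != 1:
        raise ValueError(f"version must look like v<MAJOR>_<MINOR>: {version!r}")
    major, minor = version[1:].split("_")
    # int() raises ValueError if either part is not a number.
    return Csi(module_code, component_id, kind, name, int(major), int(minor))


print(parse_csi("comp.PEF-ME.OUTPUT.CO2E.v1_0"))
```

Splitting on dots works because segments may not contain dots themselves; full rule enforcement is covered by the validator described in Appendix D.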

Module Codes

| Module | Code |
|---|---|
| Input Hub | inpt |
| Computation Hub | comp |
| Reports & Insights | repo |
| SIS | siss |
| ZARA | zara |
| ZAAM | zaam |
| Risk (RIF) | risk |
| NetZero | netz |
| Verification & Assurance | vera |
| SEEL | seel |
| EcoWorld Academy | acad |

Design Principle — Documentation-Linked Identity

A core design principle of ZAYAZ is that:

Every CSI is directly resolvable to its originating component specification.

By aligning COMPONENT_ID with frontmatter IDs:

  • auditors can trace signals directly to documented logic
  • ZARA can explain how values are produced
  • developers maintain a single source of truth

5. What ZAR Registers

ZAR maintains a governed catalog of all artifacts within the platform, including:

  • computation engines (micro-engines, pipelines)
  • schemas and data contracts
  • AI models and feature generators
  • routing rulesets and orchestration logic
  • validation and assurance modules

Each artifact is assigned a Canonical Managed Identifier (CMI) and a ZAR Code, enabling:

  • deterministic lineage tracking
  • reproducible computations
  • audit-ready traceability

6. End-to-End Traceability

ZAR enables full traceability across five stages:

  1. Input → data is ingested via Input Hub (inpt)
  2. Processing → computed through engines in Computation Hub (comp)
  3. Validation → evaluated via Verification & Assurance (vera)
  4. Routing → orchestrated via ZSSR
  5. Disclosure → exposed through Reports & Insights (repo)

At every step:

  • the CMI identifies which artifact processed the data
  • the CSI defines what the data represents
  • the USO ID tracks which instance is being observed

This creates a fully connected lineage chain.


7. Core Capabilities

ZAR enables the following system-critical capabilities:

  1. Deterministic Lineage

Every signal instance can be traced through its complete processing chain.

  2. Replay & Reproducibility

Any ESG disclosure can be reconstructed using:

  • CSI (semantic definition)
  • CMI (execution logic)
  • USO lineage

  3. Audit-Ready Architecture

Supports:

  • CSRD assurance requirements
  • ESRS data traceability
  • ISO 14064 reproducibility

  4. AI Explainability

ZARA and ZAAM can:

  • resolve any CSI to its component
  • explain computation logic
  • surface assumptions and validation layers

  5. Modular Scalability

ZAR supports:

  • multi-tenant deployments
  • white-label configurations
  • global supply chain integration

8. Design Principles

ZAR is built on the following principles:

  1. Immutability
    • Identifiers and lineage records are append-only
  2. Separation of Concerns
    • SSSR → semantics (CSI)
    • ZAR → artifacts (CMI)
    • USO → runtime instances
  3. Deterministic Identity
    • Every element is uniquely and consistently identifiable
  4. Documentation as Infrastructure
    • Component identities are directly linked to system specifications
  5. Precision Before Automation
    • All computations must be explainable, auditable, and verifiable

9. Strategic Positioning

ZAR transforms conventional ESG reporting into a traceable ESG infrastructure layer.

It enables organizations to move from:

  • fragmented data handling to unified traceability
  • opaque computations to explainable decision chains
  • reactive compliance to auditable governance-by-design

In architectural terms, ZAR makes every sustainability-relevant output traceable back to:

  • its semantic definition
  • its producing artifact
  • its runtime processing history
  • its documented component specification

10. Transition to Canonical Identifier Architecture

The following section defines the Canonical Identifier Architecture (CIA) in detail, including:

  • CSI (signal identity)
  • CMI (artifact identity)
  • USO (runtime identity)
  • and their interaction across the ZAYAZ platform

APPENDIX A - CSI Naming Taxonomy

The <MODULE_CODE>, <COMPONENT_ID>, and <VERSION> segments are given.

Below are examples of <KIND> and <NAME> for CSIs.

A.1. KIND

Represents the role or artifact type that the signal belongs to. Typically one per schema or message family.

| KIND | Description |
|---|---|
| INPUT | Input schema or raw signal |
| OUTPUT | Output schema or derived signal |
| SIGNAL | Atomic reusable signal |
| SCHEMA | JSON Schema or tabular schema type |
| CONFIG | Configuration schema or parameter set |
| FEATURE | Derived ML feature |
| METRIC | Aggregated KPI or model output |
| EVENT | System event schema |
| VIEW | Analytical or reporting view |

A.2. NAME conventions (semantic or technical label)

Describes what the signal is semantically. Uppercase with underscores for clarity. Name must be unambiguous across all components and is equivalent to the signal's signal_name.

| Example | Meaning |
|---|---|
| TRUST_SCORE | Weighted trust index (0–1) |
| CO2E | Carbon equivalent emissions |
| EF_QUALITY | Emission factor quality |
| SUPPLIER_TRUST | Supplier reliability score |
| EF_TIER | Emission factor source tier |
| WATER_USE | Water consumption metric |

APPENDIX B - CMI Naming Taxonomy

The <MODULE_CODE>, <COMPONENT_ID>, and <VERSION> segments are largely given.

Below are examples of <KIND> and <NAME> for CMIs.

B.1. KIND

| KIND | Meaning |
|---|---|
| ENGINE | Executable micro-engine (Python, Node, etc.) |
| SCHEMA | Schema or data contract |
| SCRIPT | Script or ETL job |
| RULESET | Ruleset / policy definition |
| CONNECTOR | Integration adapter (e.g., SAP, QuickBooks) |
| MODEL | Trained ML model |
| UI | Front-end component |
| DASHBOARD | Visualization artifact |
| JOB | Orchestrated workflow (Airflow/StepFunction) |
| LIB | Shared library |
| TEST | Validation or regression test bundle |

B.2. NAME

The artifact or sub-function name within the component.

| Example | Meaning |
|---|---|
| Core | Main runtime module |
| Parser | Text parsing module |
| Validator | Rule validator |
| Connector | API connector |
| Decision | Output schema |
| OutputDecision | Decision schema type |
| InvoiceLines | Router ruleset |
| EU_Validator | Region-specific variant |

APPENDIX C - The Birth of a Signal

When a signal is born, a USO ID is created, and the appropriate CSI and CMI are assigned from their registries.

The canonical creation sequence

| Step | Action | Created / Assigned | Registry | Meaning |
|---|---|---|---|---|
| 1. Signal instance is generated | A micro-engine finishes a computation or data extraction. | | | “A new data record is born.” |
| 2. System mints a USO ID | New globally unique lineage identifier. | Created | USO (runtime) | “This is one unique signal instance.” |
| 3. System attaches CMI | Engine’s canonical artifact ID. | Assigned | ZAR | “It was produced by this artifact.” |
| 4. System attaches CSI | Canonical signal type ID. | Assigned | SSSR | “It is a signal of this conceptual type.” |
| 5. (Optional) Add origin_chain and origin_chain_codes | For future provenance hops. | Appended | USO | “Here’s its movement trail.” |

In plain language

  • USO ID → created at runtime (new each time a data point exists)
  • CMI → assigned from the ZAR registry (the artifact that produced it)
  • CSI → assigned from the SSSR registry (the conceptual type of signal)

Example — “Invoice CO2E” in context

| Layer | Field | Example | How it got there | Meaning |
|---|---|---|---|---|
| USO | uso_id | 01JBF0W8S9Q0R1S2T3U4V5W6X | Auto-created ULID at runtime | This is one unique signal instance. |
| USO | primary_origin_cmi | comp.TAC.ENGINE.CORE.1_1_0 | Assigned from ZAR | It was first produced by this artifact. |
| USO | csi | comp.TAC.OUTPUT.CO2E.v1_0 | Assigned from SSSR | It is a signal of this conceptual type. |
| USO | origin_chain | [comp.TAC.ENGINE.CORE.1_1_0] | Initialized from producing artifact | Here is the ordered chain of artifacts that touched it. |
| USO | origin_chain_codes | [TAC12] | Derived from CMI short code | Compact representation of the processing trail. |
| USO | born_at | 2025-10-25T12:40Z | Auto-timestamped | When this signal instance was created. |

Later, if the same record passes through TrustGate:

| Field | New Value | Why |
|---|---|---|
| current_cmi | vera.TG-CORE.ENGINE.CORE.1_0_0 | Assigned from ZAR |
| origin_chain | […, vera.TG-CORE.ENGINE.CORE.1_0_0] | Appended |
| origin_chain_codes | […, TG3K7] | Appended |

Mental shortcut

Think of each registry as a “naming service” that the runtime joins together:

| Registry | Gives you | Example |
|---|---|---|
| ZAR | Who produced or consumed it | MICE.InvoiceEmissions.Engine.1_1_0 |
| SSSR | What type of signal it is | MICE.InvoiceEmissions.OUTPUT.CO2E.v1_0 |
| USO (runtime) | Which specific instance this is | 01JBF0W8S9Q0R1S2T3U4V5W6X |

Summary statement

When a signal is generated, the ZAYAZ platform:

  1. Creates a new USO ID (unique lineage instance),
  2. Assigns the correct CMI (the producing artifact, from ZAR),
  3. Assigns the correct CSI (the signal type, from SSSR),
  4. Optionally begins its origin_chain with the producing CMI and short code.
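
The creation sequence and the later TrustGate hop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: `UsoRecord`, `mint_signal`, and `record_hop` are hypothetical names, a UUID stands in for the ULID because the standard library has no ULID generator, and setting current_cmi at birth is an assumption (the source only shows it on the later hop).

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class UsoRecord:
    """Hypothetical runtime record mirroring the fields in the example table."""
    uso_id: str
    csi: str
    primary_origin_cmi: str
    current_cmi: str
    origin_chain: list = field(default_factory=list)
    origin_chain_codes: list = field(default_factory=list)
    born_at: str = ""


def mint_signal(cmi: str, short_code: str, csi: str) -> UsoRecord:
    """Mint a USO ID, attach CMI and CSI, and start the origin chain."""
    return UsoRecord(
        uso_id=uuid.uuid4().hex,  # assumption: UUID used in place of a ULID
        csi=csi,
        primary_origin_cmi=cmi,
        current_cmi=cmi,  # assumption: initialized to the producing artifact
        origin_chain=[cmi],
        origin_chain_codes=[short_code],
        born_at=datetime.now(timezone.utc).isoformat(timespec="minutes"),
    )


def record_hop(record: UsoRecord, cmi: str, short_code: str) -> None:
    """Append a later processing hop; earlier chain entries are not rewritten."""
    record.current_cmi = cmi
    record.origin_chain.append(cmi)
    record.origin_chain_codes.append(short_code)


signal = mint_signal("comp.TAC.ENGINE.CORE.1_1_0", "TAC12", "comp.TAC.OUTPUT.CO2E.v1_0")
record_hop(signal, "vera.TG-CORE.ENGINE.CORE.1_0_0", "TG3K7")
print(signal.origin_chain_codes)
```

Note that primary_origin_cmi never changes after minting, while current_cmi follows the most recent hop.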

APPENDIX D - CSI Validation & SSSR Enforcement

D.1. Purpose

The CSI validator ensures that every Canonical Signal Identifier is:

  • syntactically valid
  • semantically well-formed
  • aligned with the ZAYAZ module system
  • linked to a valid documented component
  • versioned consistently

It should be enforced in:

  • CI/CD
  • schema publishing
  • SSSR inserts/updates
  • code generation pipelines
  • linting for MDX/manual examples

D.2. Canonical CSI Format

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<NAME>.v<MAJOR>_<MINOR>

Example

comp.PEF-ME.OUTPUT.CO2E.v1_0
vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0
inpt.FOGE-FORM.INPUT.WATER_USE.v1_0

Allowed Module Codes

inpt
comp
repo
siss
zara
zaam
risk
netz
vera
seel
acad

Allowed KIND values

INPUT
OUTPUT
SIGNAL
SCHEMA
CONFIG
FEATURE
METRIC
EVENT
VIEW

D.3. Field Rules

  1. MODULE_CODE
    • must be lowercase
    • must be one of the approved module codes
    • must be exactly one registered ZAYAZ module namespace
  2. COMPONENT_ID
    • must match a valid frontmatter ID
    • must be globally unique across the platform
    • recommended pattern:

^[A-Z0-9]+(?:-[A-Z0-9]+)*$

    • examples:
      • PEF-ME
      • TG-CORE
      • FOGE-FORM
  3. KIND
    • must be uppercase
    • must belong to the approved enum
  4. NAME
    • must be uppercase snake case or uppercase alphanumeric token
    • recommended pattern:

^[A-Z][A-Z0-9_]*$

    • examples:
      • CO2E
      • TRUST_SCORE
      • VALIDATION_STATUS
  5. VERSION
    • must use:

v<MAJOR>_<MINOR>

    • examples:
      • v1_0
      • v2_1
  6. Full CSI
    • no spaces
    • no extra segments
    • no lowercase in KIND or NAME
    • no missing v prefix
    • no dots inside segments

D.4. Regex

Use this as the base validator:

^(inpt|comp|repo|siss|zara|zaam|risk|netz|vera|seel|acad)\.([A-Z0-9]+(?:-[A-Z0-9]+)*)\.(INPUT|OUTPUT|SIGNAL|SCHEMA|CONFIG|FEATURE|METRIC|EVENT|VIEW)\.([A-Z][A-Z0-9_]*)\.v([0-9]+)_([0-9]+)$
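
A minimal sketch of Level 1 (syntax) validation using this regex with Python's `re` module follows. The names `CSI_PATTERN` and `is_valid_csi` are illustrative. The pattern is the same one given above, split across lines for readability.

```python
import re

# Same expression as the base validator, broken up segment by segment.
CSI_PATTERN = re.compile(
    r"^(inpt|comp|repo|siss|zara|zaam|risk|netz|vera|seel|acad)"
    r"\.([A-Z0-9]+(?:-[A-Z0-9]+)*)"
    r"\.(INPUT|OUTPUT|SIGNAL|SCHEMA|CONFIG|FEATURE|METRIC|EVENT|VIEW)"
    r"\.([A-Z][A-Z0-9_]*)"
    r"\.v([0-9]+)_([0-9]+)$"
)


def is_valid_csi(csi: str) -> bool:
    """Return True if the string passes syntax validation only."""
    # fullmatch rejects a trailing newline, which a bare `$` would tolerate.
    return CSI_PATTERN.fullmatch(csi) is not None


for candidate in ("comp.PEF-ME.OUTPUT.CO2E.v1_0", "calc.TAC.OUTPUT.CO2E.1_0"):
    print(candidate, is_valid_csi(candidate))
```

Passing this check says nothing about whether the component exists or the name collides with another signal; those are the registry and semantic levels.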

D.5. Validation Levels

Level 1 — Syntax Validation

Checks:

  • regex match
  • segment count
  • allowed character set
  • required v version prefix

Level 2 — Registry Validation

Checks:

  • MODULE_CODE exists
  • COMPONENT_ID exists in component/frontmatter registry
  • component belongs to correct module
  • KIND is valid enum

Level 3 — Semantic Validation

Checks:

  • CSI not already assigned conflicting meaning
  • version bump rules followed
  • deprecated CSI not reused
  • NAME uniqueness rules respected within intended scope

Level 4 — Governance Validation

Checks:

  • change approved if semantic meaning changed
  • major version bump for breaking semantic changes
  • minor version bump only for non-breaking semantic refinements

D.6. Versioning Rules

Minor bump (v1_0 → v1_1)

Use when:

  • description refined
  • metadata expanded
  • documentation clarified
  • no semantic meaning change

Major bump (v1_1 → v2_0)

Use when:

  • signal meaning changes
  • methodology changes materially
  • unit changes
  • value interpretation changes
  • framework mapping changes in a way that alters semantics

Forbidden

  • changing meaning without version bump
  • reusing deprecated CSI for new meaning
  • patch-style CSI versions like v1_0_1
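
The bump rules can be expressed as a small decision function. The sketch below assumes that bumps are single-step and that a major bump resets the minor number to 0, as in the v1_1 → v2_0 example; neither assumption is stated as a rule in this appendix. `check_bump` is a hypothetical name.

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor)


def check_bump(old: Version, new: Version, meaning_changed: bool) -> Tuple[bool, str]:
    """Classify a CSI version transition as allowed or forbidden."""
    if new == old:
        if meaning_changed:
            return False, "meaning changed without version bump"
        return True, "no bump needed"
    if new == (old[0] + 1, 0):
        # A major bump is always acceptable; it is required for semantic changes.
        return True, "major bump"
    if new == (old[0], old[1] + 1):
        if meaning_changed:
            return False, "semantic change requires a major bump"
        return True, "minor bump"
    return False, "not a single-step bump"


print(check_bump((1, 0), (1, 1), meaning_changed=False))
print(check_bump((1, 0), (1, 1), meaning_changed=True))
```

Whether a change counts as semantic (meaning, methodology, unit, interpretation) remains a governance judgment; the function only applies the rule once that judgment is made.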

D.7. Example CSIs

Example Valid CSIs

comp.PEF-ME.OUTPUT.CO2E.v1_0
vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0
inpt.FOGE-FORM.INPUT.WATER_USE.v1_0
repo.REPORT-BUILDER.VIEW.ESRS_DASHBOARD.v2_0
risk.RIF-CORE.EVENT.RISK_ALERT.v1_1

Example Invalid CSIs

calc.TAC.OUTPUT.CO2E.1_0

Invalid:

  • calc not approved
  • missing v

comp.pef-me.OUTPUT.CO2E.v1_0

Invalid:

  • component not uppercase frontmatter format

comp.PEF-ME.output.CO2E.v1_0

Invalid:

  • KIND not uppercase

comp.PEF-ME.OUTPUT.co2e.v1_0

Invalid:

  • NAME not uppercase

comp.PEF-ME.OUTPUT.CO2E.v1_0_1

Invalid:

  • CSI does not use patch versioning

D.8. CI Enforcement Spec

Required checks in CI

Every new or changed CSI should be validated against:

  1. regex format
  2. approved module code list
  3. frontmatter/component registry lookup
  4. duplicate/conflict detection in SSSR
  5. version bump policy

Suggested CI failure messages

Invalid CSI: calc.TAC.OUTPUT.CO2E.1_0
Reason: module_code 'calc' is not registered. Use one of: inpt, comp, repo, siss, zara, zaam, risk, netz, vera, seel, acad.

Invalid CSI: comp.PEF-ME.OUTPUT.CO2E.1_0
Reason: version must use 'v<MAJOR>_<MINOR>' format, e.g. v1_0.

Invalid CSI: comp.PEF-ME.output.CO2E.v1_0
Reason: KIND must be one of INPUT, OUTPUT, SIGNAL, SCHEMA, CONFIG, FEATURE, METRIC, EVENT, VIEW.
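
One way a CI step might produce these messages is sketched below. `csi_failure` is a hypothetical helper that reports only the first violated rule and covers just the three message types shown above; it does not check COMPONENT_ID or NAME patterns.

```python
import re
from typing import Optional

MODULES = ("inpt", "comp", "repo", "siss", "zara", "zaam", "risk", "netz", "vera", "seel", "acad")
KINDS = ("INPUT", "OUTPUT", "SIGNAL", "SCHEMA", "CONFIG", "FEATURE", "METRIC", "EVENT", "VIEW")
VERSION_RE = re.compile(r"v[0-9]+_[0-9]+")


def csi_failure(csi: str) -> Optional[str]:
    """Return a CI failure message for the first violated rule, or None."""
    parts = csi.split(".")
    if len(parts) != 5:
        reason = f"expected 5 dot-separated segments, found {len(parts)}."
    else:
        module_code, _component_id, kind, _name, version = parts
        if module_code not in MODULES:
            reason = (
                f"module_code '{module_code}' is not registered. "
                f"Use one of: {', '.join(MODULES)}."
            )
        elif kind not in KINDS:
            reason = f"KIND must be one of {', '.join(KINDS)}."
        elif not VERSION_RE.fullmatch(version):
            reason = "version must use 'v<MAJOR>_<MINOR>' format, e.g. v1_0."
        else:
            return None
    return f"Invalid CSI: {csi}\nReason: {reason}"


print(csi_failure("calc.TAC.OUTPUT.CO2E.1_0"))
```

Reporting a specific reason rather than a bare regex mismatch makes the failure actionable for the author of the change.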

D.9. SSSR Schema Enforcement Model

Since CSI does not have its own registry and lives inside SSSR, the cleanest model is:

  • keep CSI as a first-class field in signal_registry
  • validate it against:
    • module registry
    • component/frontmatter registry
    • CSI rules
  • optionally decompose it into indexed columns

Recommended signal_registry Structure

CREATE TABLE signal_registry (
    signal_id TEXT PRIMARY KEY,
    csi TEXT NOT NULL UNIQUE,

    module_code TEXT NOT NULL,
    component_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    signal_name TEXT NOT NULL,
    version_major INTEGER NOT NULL,
    version_minor INTEGER NOT NULL,

    display_name TEXT NOT NULL,
    description TEXT,
    value_type TEXT,
    unit TEXT,
    status TEXT NOT NULL DEFAULT 'active',
    deprecated_by_csi TEXT NULL,

    source_module_id TEXT NULL,
    framework_tags JSONB,
    metadata JSONB,

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by TEXT,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

(NOTE: This is just a part of the full signal_registry table.)


D.10. Why store decomposed CSI columns too

Do not rely only on the full csi string.

Store:

  • module_code
  • component_id
  • kind
  • signal_name
  • version_major
  • version_minor

This gives us:

  • faster filtering
  • easier joins
  • easier governance
  • safer validation
  • better analytics

The full csi remains the canonical string, but the decomposed columns make the system operable.


The following constraints enforce these rules at the database level:

  1. Module constraint

CHECK (module_code IN ('inpt','comp','repo','siss','zara','zaam','risk','netz','vera','seel','acad'))

  2. Kind constraint

CHECK (kind IN ('INPUT','OUTPUT','SIGNAL','SCHEMA','CONFIG','FEATURE','METRIC','EVENT','VIEW'))

  3. Version constraint

CHECK (version_major >= 0),
CHECK (version_minor >= 0)

  4. CSI format constraint

If the DB supports regex checks:

CHECK (
    csi ~ '^(inpt|comp|repo|siss|zara|zaam|risk|netz|vera|seel|acad)\.([A-Z0-9]+(?:-[A-Z0-9]+)*)\.(INPUT|OUTPUT|SIGNAL|SCHEMA|CONFIG|FEATURE|METRIC|EVENT|VIEW)\.([A-Z][A-Z0-9_]*)\.v([0-9]+)_([0-9]+)$'
)

  5. Canonical string consistency

Ensure decomposed fields match the csi string through trigger or generated column logic.
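
The consistency rule amounts to recomposing the canonical string from the decomposed columns and comparing it with the stored csi. A minimal Python sketch of that check, with hypothetical helper names `compose_csi` and `is_consistent`, and a plain dict standing in for a table row:

```python
def compose_csi(row: dict) -> str:
    """Rebuild the canonical CSI string from the decomposed columns."""
    return (
        f"{row['module_code']}.{row['component_id']}.{row['kind']}."
        f"{row['signal_name']}.v{row['version_major']}_{row['version_minor']}"
    )


def is_consistent(row: dict) -> bool:
    """True when the stored csi equals the string rebuilt from its parts."""
    return compose_csi(row) == row["csi"]


row = {
    "csi": "vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0",
    "module_code": "vera",
    "component_id": "TG-CORE",
    "kind": "OUTPUT",
    "signal_name": "TRUST_SCORE",
    "version_major": 1,
    "version_minor": 0,
}
print(is_consistent(row))
```

In the database itself the same comparison would live in a trigger or a generated column, as stated above; the Python form is only meant to make the rule concrete.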


Link signal_registry to the module/component documentation registry with foreign keys:

component_id TEXT NOT NULL REFERENCES documented_components(component_id)

And:

FOREIGN KEY (module_code, component_id) REFERENCES documented_components(module_code, component_id)

This is the strongest way to enforce:

  • frontmatter linkage
  • documentation integrity
  • ZARA explainability compatibility

D.13. Suggested documented_components Table

| Field | Purpose |
|---|---|
| component_id | Frontmatter ID, e.g. ZAR-FW, PEF-ME, TG-CORE |
| module_code | comp, vera, inpt, etc. |
| title | Human-readable title |
| slug | Docs route |
| source_file | MDX file path |
| doc_status | draft / review / active / deprecated |
| version | Spec version |
| owner_team | Responsible team |
| summary | One-paragraph description |
| parent_component_id | Optional link to parent spec/component |
| tags | Search/filter metadata |
| legacy_manual_ref | Optional backward reference |
| last_updated | Auditability / sync support |

CREATE TABLE documented_components (
    component_id TEXT PRIMARY KEY,
    module_code TEXT NOT NULL,
    title TEXT NOT NULL,
    slug TEXT,
    doc_status TEXT NOT NULL DEFAULT 'active',
    owner_team TEXT,
    source_file TEXT
);

Add uniqueness:

CREATE UNIQUE INDEX documented_components_module_component_uidx
ON documented_components(module_code, component_id);

D.14. Trigger Strategy

Use a trigger on signal_registry insert/update to:

  1. parse csi
  2. validate segment values
  3. populate decomposed fields
  4. verify (module_code, component_id) exists in documented_components
  5. reject semantic collisions

D.15. Pseudocode

on insert/update signal_registry:
    parse csi into module_code, component_id, kind, signal_name, version_major, version_minor
    assert module_code in allowed_modules
    assert kind in allowed_kinds
    assert documented_components contains (module_code, component_id)
    assert no conflicting active signal with same csi
    assert versioning rules are respected
    write parsed fields back into columns
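
For readers who prefer to prototype the trigger logic outside the database, the pseudocode can be approximated in Python. In this sketch `before_upsert` is a hypothetical function, in-memory sets stand in for documented_components and the set of other active CSIs (excluding the row being updated), and the versioning-rule assertion is left out because it needs the signal's version history.

```python
import re

ALLOWED_MODULES = {"inpt", "comp", "repo", "siss", "zara", "zaam", "risk", "netz", "vera", "seel", "acad"}
ALLOWED_KINDS = {"INPUT", "OUTPUT", "SIGNAL", "SCHEMA", "CONFIG", "FEATURE", "METRIC", "EVENT", "VIEW"}

# Shape-only pattern; module and KIND membership are asserted separately below.
CSI_SHAPE = re.compile(
    r"([a-z]+)\.([A-Z0-9]+(?:-[A-Z0-9]+)*)\.([A-Z]+)\.([A-Z][A-Z0-9_]*)\.v([0-9]+)_([0-9]+)"
)


def before_upsert(row: dict, documented_components: set, other_active_csis: set) -> dict:
    """Validate a row and return a copy with the decomposed columns filled in."""
    match = CSI_SHAPE.fullmatch(row["csi"])
    if match is None:
        raise ValueError(f"malformed csi: {row['csi']!r}")
    module_code, component_id, kind, signal_name, major, minor = match.groups()
    if module_code not in ALLOWED_MODULES:
        raise ValueError(f"unknown module_code: {module_code}")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown kind: {kind}")
    if (module_code, component_id) not in documented_components:
        raise ValueError(f"undocumented component: {module_code}.{component_id}")
    if row["csi"] in other_active_csis:
        raise ValueError(f"conflicting active signal: {row['csi']}")
    # Write the parsed fields back so they cannot drift from the csi string.
    return {
        **row,
        "module_code": module_code,
        "component_id": component_id,
        "kind": kind,
        "signal_name": signal_name,
        "version_major": int(major),
        "version_minor": int(minor),
    }


documented = {("comp", "PEF-ME")}
print(before_upsert({"csi": "comp.PEF-ME.OUTPUT.CO2E.v1_0"}, documented, set()))
```

Deriving the decomposed columns from csi, rather than accepting them from the caller, is what keeps the canonical string authoritative.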

Supporting indexes:

CREATE UNIQUE INDEX signal_registry_csi_uidx ON signal_registry(csi);
CREATE INDEX signal_registry_module_idx ON signal_registry(module_code);
CREATE INDEX signal_registry_component_idx ON signal_registry(component_id);
CREATE INDEX signal_registry_kind_idx ON signal_registry(kind);
CREATE INDEX signal_registry_name_idx ON signal_registry(signal_name);
CREATE INDEX signal_registry_status_idx ON signal_registry(status);

D.17. Strong Governance Rules for SSSR

A signal record must not be created unless:

  • CSI is valid
  • component exists in documentation registry
  • semantic definition is present
  • display name is present
  • value type is defined for machine handling

A signal record must be deprecated instead of overwritten when:

  • meaning changes
  • framework logic changes materially
  • unit/value interpretation changes

A signal record may be revised in place only when:

  • documentation is clarified
  • metadata is enriched without semantic change

D.18. Best Practice: Canonical + Display Split

In SSSR, keep:

  • csi as canonical machine identity
  • display_name as human label
  • description as semantic definition

Example:

| Field | Value |
|---|---|
| csi | vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0 |
| display_name | Trust Score |
| description | Weighted trust index between 0 and 1 for signal-level or aggregate validation confidence. |

This avoids semantic drift.


D.19. Final Recommendation

The strongest setup for ZAYAZ is:

  • CSI stays inside SSSR
  • CSI is validated by regex + registry + trigger
  • component linkage is enforced against frontmatter-derived documentation metadata
  • decomposed CSI fields are stored alongside the full canonical string
  • CI validates examples and schema changes before merge

That gives us:

  • documentation-linked identity
  • strong DB enforcement
  • reliable routing inputs
  • ZARA-readable architecture
  • audit-grade consistency

APPENDIX E — Signal Naming Governance Policy

E.1. Purpose

This appendix defines the governance policy for generating signal_name, classifying MODULE_CODE and KIND, and validating pre-version CSI structures for the ZAYAZ platform.

The policy governs the classification workflow driven by the Signal Classification Pipeline and applies to all signal records prepared for insertion into the SSSR signal registry.

The workflow relies on structured context extracted from signal_registry and table_registry, including:

  • component title
  • component description
  • table description
  • column reference
  • column description
  • cleaned datatype
  • enum values or example content
  • other relevant metadata required for classification

This information is exported into a working csi_registry for classification and review. Once approved, the enriched results are written back into signal_registry, where the final CSI is assembled.


E.2. Governing Principles

The following principles apply throughout the classification process:

  1. Classification before concatenation
    Semantic classification must be completed before the full CSI is assembled.

  2. Validation before review
    Automated checks must run before human review is triggered.

  3. JSON evidence before approval
    Every processed column must produce a JSON evidence record.

  4. Human review only where needed
    Manual review is reserved for low-confidence or flagged cases.

  5. Versioning remains outside the Signal Classification Pipeline
    CSI versioning is assigned manually and appended later during Excel concatenation.

  6. Semantics over storage
    signal_name and KIND must reflect semantic intent, not merely physical column names or storage formats.


E.3. Signal Classification Pipeline

The Signal Classification Pipeline prepares SSSR signal metadata in a deterministic, reviewable, and auditable way before final CSI concatenation.

The pipeline stages are:

  1. Datatype cleanup app
  2. MODULE_CODE app
  3. KIND app
  4. SIGNAL_NAME app
  5. Validator checks (pre-version only)
  6. JSON export
  7. Human review only for low-confidence cases
  8. Excel concatenation

E.4. Pipeline Stages

E.4.1. Datatype Cleanup App

Normalizes:

  • base datatype
  • nullability
  • enum structure
  • scalar vs array
  • object vs text
  • reference vs reference-list semantics
  • timestamp/date conventions

Inputs:

  • source_data_type (signal_type)
  • column_description (signal_description)
  • sample_values
  • table_prefix
  • source_table

Outputs:

  • cleaned_data_type
  • datatype_normalization_notes
  • datatype_confidence

E.4.2. MODULE_CODE App

Assigns each component to a module from the fixed list of approved module codes:

MODULE_CODE Dictionary

  • inpt
  • comp
  • repo
  • siss
  • zara
  • zaam
  • risk
  • netz
  • vera
  • seel
  • acad

Rules:

  • MODULE_CODE must match one of the values above.
  • Module classification is component-governed, not column-level.
  • A component should map to exactly one module unless explicitly documented.

Inputs:

  • component title (component_name)
  • component description (component_description)
  • table description (table_notes, short_description)
  • table type hint (table_prefix)
  • owning component context = component_name + component_description + table context
  • existing component-to-module mapping where available (recommended external lookup table)

Outputs:

  • module_code
  • module_confidence
  • rationale

Rule: MODULE_CODE should normally be determined at the component or table level, not independently per column.


E.4.3. KIND App

Classifies the semantic role of each field using the KIND policy defined below.

Inputs:

  • table baseline context (table_notes, short_description)
  • table type hint (table_prefix)
  • column description (column_description / signal_description)
  • cleaned datatype (cleaned_data_type)
  • enum values or example content (sample_values)
  • module context (module_code)
  • component context (component_name, component_description)

Outputs:

  • kind
  • kind_confidence
  • rationale

E.4.4. SIGNAL_NAME App

Generates the curated semantic signal_name used as the NAME segment in the CSI.

Inputs:

  • column_reference
  • column_description (signal_description)
  • cleaned datatype (cleaned_data_type)
  • enum values or example content (sample_values)
  • table context (source_table, table_prefix, table_notes, short_description)
  • component context (component_name, component_description)
  • module context (module_code)
  • approved naming governance rules

Outputs:

  • signal_name
  • signal_name_confidence
  • naming basis
  • review flags if ambiguous

E.4.5. Validator Checks

Runs pre-version validation on:

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<SIGNAL_NAME>

Example: comp.AIIL-CON.CONFIG.METHOD_VERSION

Checks include:

  • module validity
  • component linkage
  • allowed KIND
  • naming policy compliance
  • duplicate collision detection
  • near-collision detection
  • reserved-word and anti-pattern checks

Outputs:

  • pre_version_key
  • is_valid
  • collision_check_result
  • near_collision_result
  • needs_review
  • review_reason
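
A minimal sketch of the pre-version check, producing the output fields listed above, could look like this. It is an illustration under assumptions: `check_pre_version_key` is a hypothetical name, near-collisions are detected with the standard library's `difflib` at an arbitrary similarity cutoff of 0.9, and component linkage and reserved-word checks are omitted.

```python
import difflib
import re

MODULES = {"inpt", "comp", "repo", "siss", "zara", "zaam", "risk", "netz", "vera", "seel", "acad"}
KINDS = {"INPUT", "OUTPUT", "SIGNAL", "SCHEMA", "CONFIG", "FEATURE", "METRIC", "EVENT", "VIEW"}
NAME_RE = re.compile(r"[A-Z][A-Z0-9_]*")


def check_pre_version_key(module_code, component_id, kind, signal_name,
                          existing_keys, cutoff=0.9):
    """Validate <MODULE_CODE>.<COMPONENT_ID>.<KIND>.<SIGNAL_NAME> before versioning."""
    key = f"{module_code}.{component_id}.{kind}.{signal_name}"
    is_valid = (
        module_code in MODULES
        and kind in KINDS
        and NAME_RE.fullmatch(signal_name) is not None
    )
    collision = key in existing_keys
    # Compare only against other keys so an exact duplicate is not also "near".
    others = sorted(k for k in existing_keys if k != key)
    near = difflib.get_close_matches(key, others, n=5, cutoff=cutoff)
    reasons = []
    if not is_valid:
        reasons.append("invalid segment")
    if collision:
        reasons.append("duplicate key")
    if near:
        reasons.append("near-collision")
    return {
        "pre_version_key": key,
        "is_valid": is_valid,
        "collision_check_result": collision,
        "near_collision_result": near,
        "needs_review": bool(reasons),
        "review_reason": "; ".join(reasons),
    }


existing = {"comp.AIIL-CON.CONFIG.METHOD_VERSION"}
print(check_pre_version_key("comp", "AIIL-CON", "CONFIG", "METHOD_VERSIONS", existing))
```

String similarity is a crude proxy for semantic overlap; it catches plural and typo variants but not synonyms, which is one reason flagged records still go to human review.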

E.4.6. JSON Export

Exports one JSON record per processed column.

Recommended output file:

  • zarathustra-csi-proposals.json

This file serves as:

  • audit evidence
  • training data
  • QA input
  • migration/reference source
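
To make the shape of an evidence record concrete, the sketch below serializes one record using the output field names from the stages above. The record layout, the column name `method_version`, the datatype, and every confidence value are hypothetical; the source does not define the JSON schema.

```python
import json


def to_evidence_json(records: list) -> str:
    """Serialize evidence records with stable key order for diff-friendly audits."""
    return json.dumps(records, indent=2, sort_keys=True, ensure_ascii=False)


# Hypothetical record for one processed column; all values are illustrative.
record = {
    "column_reference": "method_version",
    "cleaned_data_type": "text",
    "datatype_confidence": 0.97,
    "module_code": "comp",
    "module_confidence": 0.99,
    "kind": "CONFIG",
    "kind_confidence": 0.93,
    "signal_name": "METHOD_VERSION",
    "signal_name_confidence": 0.95,
    "pre_version_key": "comp.AIIL-CON.CONFIG.METHOD_VERSION",
    "is_valid": True,
    "needs_review": False,
    "review_reason": "",
}

print(to_evidence_json([record]))
```

Sorting keys keeps successive exports comparable line by line, which helps when the file is used as audit evidence.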

E.4.7. Human Review

Only low-confidence or flagged records are reviewed manually.

Typical review triggers include:

  • ambiguous KIND
  • weak or missing column descriptions
  • near-collision results
  • naming policy exceptions
  • low total confidence scores
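
The review triggers can be sketched as a simple gate over a processed record. The 0.8 threshold, the use of the mean of the three confidence scores as the "total", and the function name `review_reasons` are all assumptions made for illustration; naming policy exceptions are not modeled.

```python
REVIEW_THRESHOLD = 0.8  # assumption: the source does not specify a cutoff


def review_reasons(record: dict, threshold: float = REVIEW_THRESHOLD) -> list:
    """Return the review triggers that fire for one processed column."""
    reasons = []
    if record["kind_confidence"] < threshold:
        reasons.append("ambiguous KIND")
    if not record.get("column_description", "").strip():
        reasons.append("weak or missing column description")
    if record.get("near_collision_result"):
        reasons.append("near-collision")
    scores = [
        record["module_confidence"],
        record["kind_confidence"],
        record["signal_name_confidence"],
    ]
    # Assumption: "total confidence" is taken as the mean of the three scores.
    if sum(scores) / len(scores) < threshold:
        reasons.append("low total confidence")
    return reasons


flagged = {
    "module_confidence": 0.9,
    "kind_confidence": 0.5,
    "signal_name_confidence": 0.9,
    "column_description": "",
    "near_collision_result": ["comp.AIIL-CON.CONFIG.METHOD_VERSION"],
}
print(review_reasons(flagged))
```

A record with an empty result list skips manual review and proceeds directly to concatenation.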

E.4.8. Excel Concatenation

Approved values are pasted into Excel and concatenated into the final CSI:

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<SIGNAL_NAME>.v<MAJOR>_<MINOR>

Versioning remains manual.


E.5. Classification Governance Rules

E.5.1. MODULE_CODE Governance

MODULE_CODE is component-governed.

It must not be invented independently for each field.

For most tables, all columns should inherit the same module as the owning component.

Example:

  • component: AIIL-CON
  • table: compute_method_registry
  • module: comp

All signals in that table therefore inherit the comp.* namespace unless a documented exception exists.


E.5.2. KIND Governance

KIND is field-governed, but table-aware.

It must not be guessed from the column name alone.

A baseline kind may be established at table level, but field-level overrides are allowed and expected where the semantic role differs.


E.5.3. SIGNAL_NAME Governance

SIGNAL_NAME is field-governed and semantics-first.

It must not be copied blindly from the physical column name unless the physical name already expresses the correct semantic meaning according to policy.

column_reference remains the physical storage reference. signal_name is the curated semantic identifier. The NAME segment in the CSI is derived from signal_name, not from column_reference.


E.6. Confidence Model

E.6.1. MODULE_CODE confidence

Usually high confidence when:

  • the component is already mapped
  • the table description is clearly anchored
  • MDX/frontmatter context is available

Low confidence when:

  • the component spans multiple modules
  • descriptions are vague
  • ownership is unclear

E.6.2. KIND confidence

Usually high confidence when:

  • datatype and description align
  • the table has a clear semantic role
  • the field meaning is obvious

Low confidence when:

  • the field is generic (value, status, type, data)
  • classification is ambiguous between CONFIG and SCHEMA
  • classification is ambiguous between OUTPUT and METRIC
  • classification is ambiguous between SIGNAL and FEATURE

E.6.3. SIGNAL_NAME confidence

Usually high confidence when:

  • the field description is specific
  • the semantic meaning is clear
  • naming matches approved suffix and token conventions
  • no collision or near-collision exists

Low confidence when:

  • the field is generic
  • the description is weak
  • multiple expansions are plausible
  • the field could be interpreted in more than one semantic way

E.7. KIND Classification Policy

E.7.1. Purpose

The KIND segment classifies the semantic role of a field or signal within the ZAYAZ platform.

It is not merely a datatype label and not merely a UI label. It expresses the field’s functional role in context.

Approved KIND values are:

  • INPUT
  • OUTPUT
  • SIGNAL
  • SCHEMA
  • CONFIG
  • FEATURE
  • METRIC
  • EVENT
  • VIEW

Rules:

  • KIND must match one of the values above.
  • KIND is field-governed but table-aware.
  • A table may define a baseline KIND, but field-level overrides are allowed.

Important constraints:

  • *_SCHEMA_REF → must be SCHEMA
  • CREATED_AT, UPDATED_AT (in registry tables) → must be CONFIG
  • Numeric fields are not automatically METRIC

E.7.2. Core Principle

KIND must describe the semantic role of the field in the platform, not just how the field happens to be stored.

This means:

  • a schema reference is not automatically an INPUT or OUTPUT
  • a registry timestamp is not automatically an EVENT
  • a config row is not automatically a METRIC
  • a physical column name must not determine KIND by itself

E.7.3. Decision Hierarchy

The KIND app should classify in the following order:

A. Table or component context

What kind of object is the table primarily describing?

Examples:

  • registry/config tables → baseline often CONFIG
  • runtime payload tables → may contain INPUT, OUTPUT, SIGNAL
  • analytical views → often VIEW or METRIC
  • event logs → often EVENT

B. Column semantic meaning

What does the field actually represent?

Examples:

  • schema references → often SCHEMA
  • lifecycle metadata → often CONFIG
  • computed KPI values → often METRIC
  • derived model variables → often FEATURE

C. Datatype and shape

Use cleaned datatype as a secondary signal, not the primary one.

Examples:

  • timestamp alone does not imply EVENT
  • JSON alone does not imply SCHEMA
  • enum alone does not imply CONFIG

E.7.4. KIND Definitions

INPUT

Use when the field represents an input value or input-facing signal consumed by a process, engine, form, or model.

Typical examples:

  • activity input value
  • emissions input quantity
  • user-entered data field
  • machine-provided input signal

Do not use for:

  • references to input schemas
  • method configuration describing inputs in general

OUTPUT

Use when the field represents a computed or emitted output value from a method, engine, or transformation.

Typical examples:

  • CO2E
  • TRUST_SCORE
  • VALIDATION_RESULT

Do not use for:

  • references to output schemas
  • report display metadata

SIGNAL

Use when the field represents an atomic reusable signal that is not best modeled as an explicit input, an explicit output, or a higher-level metric.

Use sparingly. Prefer INPUT, OUTPUT, or METRIC when those are clearly more accurate.


SCHEMA

Use when the field defines, references, or primarily concerns schema structure or data contracts.

Typical examples:

  • INPUTS_SCHEMA_REF
  • OPTIONS_SCHEMA_REF
  • OUTPUT_SCHEMA_REF

CONFIG

Use when the field represents configuration, registry metadata, lifecycle settings, implementation bindings, dependency metadata, or governance-related setup information.

Typical examples:

  • METHOD_ID
  • METHOD_NAME
  • METHOD_VERSION
  • LIFECYCLE_STATUS
  • IMPLEMENTATION_REF
  • MICRO_ENGINE_REF
  • DATASET_REQUIREMENTS
  • CREATED_AT
  • UPDATED_AT

FEATURE

Use when the field represents a derived model feature used for ML/statistical processing rather than a business-facing metric.


METRIC

Use when the field represents an aggregated KPI, score, index, benchmark, or business-facing measurement.

Typical examples:

  • TRUST_SCORE
  • ECO_SCORE
  • MATERIALITY_INDEX
  • ABATEMENT_COST

EVENT

Use when the field belongs to an event record or explicitly represents an event signal or state-change record.

Do not use for the following fields when they occur in registry/config tables:

  • CREATED_AT
  • UPDATED_AT


VIEW

Use when the field belongs to a read-model, dashboard projection, analytical presentation layer, or reporting-specific view model.


E.7.5. Baseline + Override Model

Do not force a rigid table-wide KIND.

Instead use:

  • a table-level baseline KIND
  • per-column overrides where justified

Example: compute_method_registry

Baseline:

  • CONFIG

Overrides:

  • INPUTS_SCHEMA_REF → SCHEMA
  • OPTIONS_SCHEMA_REF → SCHEMA
  • OUTPUT_SCHEMA_REF → SCHEMA
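A minimal sketch of the baseline-plus-override lookup for this example follows. The dictionary names and the `resolve_kind` helper are hypothetical; only the table name, the baseline, and the three overrides come from the example above.

```python
from typing import Dict, Tuple

# Table-level baseline KIND (from the example above).
BASELINE_KIND: Dict[str, str] = {"compute_method_registry": "CONFIG"}

# Per-column overrides keyed by (table, signal_name).
KIND_OVERRIDES: Dict[Tuple[str, str], str] = {
    ("compute_method_registry", "INPUTS_SCHEMA_REF"): "SCHEMA",
    ("compute_method_registry", "OPTIONS_SCHEMA_REF"): "SCHEMA",
    ("compute_method_registry", "OUTPUT_SCHEMA_REF"): "SCHEMA",
}


def resolve_kind(table: str, signal_name: str) -> str:
    """Use the column override when one exists, otherwise the table baseline."""
    return KIND_OVERRIDES.get((table, signal_name), BASELINE_KIND[table])
```

Keeping overrides in a separate mapping means every deviation from the baseline is explicit and can be reviewed on its own.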

E.7.6. Example Classification for compute_method_registry

| Field / signal_name | KIND |
| --- | --- |
| METHOD_ID | CONFIG |
| METHOD_NAME | CONFIG |
| METHOD_VERSION | CONFIG |
| LIFECYCLE_STATUS | CONFIG |
| METHOD_TYPE | CONFIG |
| DESCRIPTION | CONFIG |
| INPUTS_SCHEMA_REF | SCHEMA |
| OPTIONS_SCHEMA_REF | SCHEMA |
| OUTPUT_SCHEMA_REF | SCHEMA |
| IMPLEMENTATION_REF | CONFIG |
| MICRO_ENGINE_REF | CONFIG |
| ASSUMPTIONS_TEXT / ASSUMPTIONS_JSON | CONFIG |
| FRAMEWORK_REFS | CONFIG |
| DATASET_REQUIREMENTS | CONFIG |
| ACL_TAGS | CONFIG |
| CREATED_AT | CONFIG |
| UPDATED_AT | CONFIG |

E.7.7. Confidence and Review Rules

Auto-accept when:

  • table purpose is clear
  • field description clearly matches one KIND
  • datatype aligns with the interpretation
  • no near-equal alternative KIND is plausible

Human review required when:

  • field is generic (value, type, status, data)
  • ambiguous between CONFIG and SCHEMA
  • ambiguous between OUTPUT and METRIC
  • ambiguous between SIGNAL and FEATURE
  • description is weak or missing
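The review triggers above could be checked along these lines. This is a sketch under stated assumptions: `review_reasons`, the `kind_scores` mapping of candidate KIND to score, and the 0.10 `margin` used for "near-equal" are hypothetical, and the sketch only detects a missing description, not a weak one.

```python
from typing import Dict, List

GENERIC_NAMES = {"value", "type", "status", "data"}
AMBIGUOUS_PAIRS = [{"CONFIG", "SCHEMA"}, {"OUTPUT", "METRIC"}, {"SIGNAL", "FEATURE"}]


def review_reasons(column_name: str, description: str,
                   kind_scores: Dict[str, float], margin: float = 0.10) -> List[str]:
    """Return the reasons a column needs human review; an empty list means auto-accept."""
    reasons: List[str] = []
    if column_name.lower() in GENERIC_NAMES:
        reasons.append("generic field name")
    if not description.strip():
        reasons.append("description is weak or missing")

    # Compare the two best-scoring KIND candidates.
    ranked = sorted(kind_scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) >= 2:
        (best, best_score), (second, second_score) = ranked[0], ranked[1]
        if best_score - second_score < margin:
            pair = {best, second}
            label = "ambiguous between" if pair in AMBIGUOUS_PAIRS else "near-equal alternative:"
            reasons.append(f"{label} {best} and {second}")
    return reasons
```

Returning a list of reasons rather than a boolean lets the same result feed both a needs-review flag and a review-reason field.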

E.7.8. Hard Exclusions

The KIND app must never infer:

  • OUTPUT for *_SCHEMA_REF
  • INPUT for *_SCHEMA_REF
  • EVENT for CREATED_AT / UPDATED_AT in registry/config tables
  • METRIC only because a field is numeric

E.8. Controlled Classification Dictionaries

The following controlled dictionaries define the allowed values for MODULE_CODE, KIND, and table_prefix.

These tables serve as:

  • authoritative classification references
  • validation sources for the Signal Classification Pipeline
  • future candidates for formal registry tables in ZAR

E.8.1. MODULE_CODE Dictionary (zar.module_registry)

| Module | Module Code | Domain | Description |
| --- | --- | --- | --- |
| Input Hub | inpt | Data Acquisition | Structured ESG input, onboarding, system capability mapping |
| Computation Hub | comp | Analytics | Cross-domain computation & modeling |
| Reports & Insights Hub | repo | Disclosure | Report generation, visualization, stakeholder outputs |
| SIS | siss | Services | Shared governance services |
| ZARA | zara | Governance AI | Prompt-driven ESG orchestration |
| ZAAM | zaam | AI Assistance | Role-aware agent system |
| RIF | risk | Risk | ESG risk intelligence & escalation |
| NETZERO | netz | Climate | Decarbonization modeling & pathways |
| Verification & Assurance | vera | Trust | Verifier workflows & assurance logic |
| SEEL | seel | Materiality | Stakeholder engagement & materiality |
| EcoWorld Academy | acad | Education | Capacity building & ESG fluency |

Rules:

  • MODULE_CODE must match one of the values above.
  • Module classification is component-governed, not column-level.
  • A component should map to exactly one module unless explicitly documented.

zar.module_registry

CREATE TABLE zar.module_registry (
    module_code   TEXT PRIMARY KEY,
    module_name   TEXT NOT NULL,
    domain        TEXT NOT NULL,
    description   TEXT NOT NULL,
    sort_order    INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status        TEXT NOT NULL DEFAULT 'active',
    version       TEXT NOT NULL DEFAULT '1_0_0',
    source_doc_id TEXT,
    approved_by   TEXT,
    notes         TEXT,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT module_registry_module_code_chk
        CHECK (module_code ~ '^[a-z]{4}$'),

    CONSTRAINT module_registry_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT module_registry_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$')
);

CREATE UNIQUE INDEX module_registry_module_name_uidx
    ON zar.module_registry (module_name);
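If rows are validated before they reach the database (for example when loaded from a JSON file), the CHECK constraints above can be mirrored in the pipeline. The following is a sketch under that assumption; `module_row_errors` and the row-as-dict shape are hypothetical, and the patterns are copied from the DDL.

```python
import re
from typing import Dict, List

# Same patterns as the CHECK constraints; fullmatch replaces the ^...$ anchors.
MODULE_CODE_RE = re.compile(r"[a-z]{4}")
VERSION_RE = re.compile(r"[0-9]+_[0-9]+_[0-9]+")
ALLOWED_STATUS = {"active", "deprecated", "draft", "retired"}


def module_row_errors(row: Dict[str, object]) -> List[str]:
    """Return constraint violations for one candidate zar.module_registry row."""
    errors: List[str] = []
    if not MODULE_CODE_RE.fullmatch(str(row.get("module_code", ""))):
        errors.append("module_code must be exactly four lowercase letters")
    if row.get("status", "active") not in ALLOWED_STATUS:
        errors.append("status is not an allowed value")
    if not VERSION_RE.fullmatch(str(row.get("version", "1_0_0"))):
        errors.append("version must look like 1_0_0")
    sort_order = row.get("sort_order", 100)
    if not isinstance(sort_order, int) or sort_order < 0:
        errors.append("sort_order must be a non-negative integer")
    return errors
```

Missing optional keys fall back to the column defaults from the DDL, so a row that omits `status`, `version`, or `sort_order` still passes.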

Seed insert:

INSERT INTO zar.module_registry
(module_code, module_name, domain, description, sort_order, status, version)
VALUES
('inpt', 'Input Hub', 'Data Acquisition', 'Structured ESG input, onboarding, system capability mapping', 10, 'active', '1_0_0'),
('comp', 'Computation Hub', 'Analytics', 'Cross-domain computation & modeling', 20, 'active', '1_0_0'),
('repo', 'Reports & Insights Hub', 'Disclosure', 'Report generation, visualization, stakeholder outputs', 30, 'active', '1_0_0'),
('siss', 'SIS', 'Services', 'Shared governance services', 40, 'active', '1_0_0'),
('zara', 'ZARA', 'Governance AI', 'Prompt-driven ESG orchestration', 50, 'active', '1_0_0'),
('zaam', 'ZAAM', 'AI Assistance', 'Role-aware agent system', 60, 'active', '1_0_0'),
('risk', 'RIF', 'Risk', 'ESG risk intelligence & escalation', 70, 'active', '1_0_0'),
('netz', 'NETZERO', 'Climate', 'Decarbonization modeling & pathways', 80, 'active', '1_0_0'),
('vera', 'Verification & Assurance', 'Trust', 'Verifier workflows & assurance logic', 90, 'active', '1_0_0'),
('seel', 'SEEL', 'Materiality', 'Stakeholder engagement & materiality', 100, 'active', '1_0_0'),
('acad', 'EcoWorld Academy', 'Education', 'Capacity building & ESG fluency', 110, 'active', '1_0_0');

This table also exists as a JSON file that is used as input to the Signal Classification Pipeline: zarathustra-module-registry.json


E.8.2. KIND Dictionary (zar.kind_registry)

| KIND | Description |
| --- | --- |
| INPUT | Input schema or raw signal |
| OUTPUT | Output schema or derived signal |
| SIGNAL | Atomic reusable signal |
| SCHEMA | JSON Schema or tabular schema reference |
| CONFIG | Configuration, registry metadata, or parameters |
| FEATURE | Derived ML feature |
| METRIC | Aggregated KPI or model output |
| EVENT | System event or state-change record |
| VIEW | Analytical or reporting view |

Rules:

  • KIND must match one of the values above.
  • KIND is field-governed but table-aware.
  • A table may define a baseline KIND, but field-level overrides are allowed.

Important constraints:

  • *_SCHEMA_REF → must be SCHEMA
  • CREATED_AT, UPDATED_AT (in registry tables) → must be CONFIG
  • Numeric fields are not automatically METRIC

zar.kind_registry

CREATE TABLE zar.kind_registry (
    csi_kind             TEXT PRIMARY KEY,
    csi_kind_description TEXT NOT NULL,
    semantic_role        TEXT,
    usage_notes          TEXT,
    sort_order           INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status               TEXT NOT NULL DEFAULT 'active',
    version              TEXT NOT NULL DEFAULT '1_0_0',
    source_doc_id        TEXT,
    approved_by          TEXT,
    notes                TEXT,
    created_at           TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at           TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT kind_registry_csi_kind_chk
        CHECK (csi_kind IN (
            'INPUT',
            'OUTPUT',
            'SIGNAL',
            'SCHEMA',
            'CONFIG',
            'FEATURE',
            'METRIC',
            'EVENT',
            'VIEW'
        )),

    CONSTRAINT kind_registry_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT kind_registry_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$')
);

Seed insert:

INSERT INTO zar.kind_registry
(csi_kind, csi_kind_description, semantic_role, usage_notes, sort_order, status, version)
VALUES
('INPUT', 'Input schema or raw signal', 'Input-facing', 'Use for runtime or user/system-provided input values.', 10, 'active', '1_0_0'),
('OUTPUT', 'Output schema or derived signal', 'Output-facing', 'Use for computed or emitted result values.', 20, 'active', '1_0_0'),
('SIGNAL', 'Atomic reusable signal', 'Neutral semantic', 'Use sparingly when neither INPUT, OUTPUT, nor METRIC is the best fit.', 30, 'active', '1_0_0'),
('SCHEMA', 'JSON Schema or tabular schema reference', 'Structural', 'Use for schema-defining or schema-reference fields such as *_SCHEMA_REF.', 40, 'active', '1_0_0'),
('CONFIG', 'Configuration, registry metadata, or parameters', 'Configuration', 'Baseline kind for most registry and method-definition tables.', 50, 'active', '1_0_0'),
('FEATURE', 'Derived ML feature', 'ML feature', 'Use for engineered features intended for models or scoring.', 60, 'active', '1_0_0'),
('METRIC', 'Aggregated KPI or model output', 'Business metric', 'Use for KPIs, indexes, scores, and business-facing measurements.', 70, 'active', '1_0_0'),
('EVENT', 'System event or state-change record', 'Event-driven', 'Use for event logs, alerts, state transitions, and emitted event records.', 80, 'active', '1_0_0'),
('VIEW', 'Analytical or reporting view', 'Presentation', 'Use for read-models, marts, dashboards, and reporting-facing projections.', 90, 'active', '1_0_0');

This table also exists as a JSON file that is used as input to the Signal Classification Pipeline: zarathustra-kind-registry.json


E.8.3. Table Prefix Dictionary (zar.table_prefix_registry)

| Prefix | Description |
| --- | --- |
| data_ | Legacy or raw general-purpose data |
| dim_ | Dimension tables (countries, units, sectors) |
| fact_ | Fact/event tables (emissions, indicators, executions) |
| ref_ | Reference data (EFDB, NACE, method registries) |
| stg_ | Staging tables (raw Excel/API imports) |
| int_ | Intermediate tables (engine merge outputs) |
| agg_ | Aggregated data (KPI rollups) |
| mrt_ | Data marts (domain-tailored outputs) |
| tmp_ | Temporary pipeline tables |
| rl_ | Relation tables (many-to-many joins) |
| eng_ | Engine outputs (computed results, scored outputs) |
| mod_ | Module-owned business objects (user-facing state) |
| sig_ | Signal registry tables (signal definitions in SSSR) |

Usage in the Signal Classification Pipeline:

  • table_prefix is used as a strong heuristic signal for:
    • KIND baseline classification
    • table semantic role inference
    • validation consistency checks

Examples:

  • ref_ → typically CONFIG or SCHEMA-heavy tables
  • fact_ → often EVENT, SIGNAL, or METRIC
  • eng_ → often OUTPUT, FEATURE, or METRIC
  • mrt_ → often VIEW or METRIC
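A prefix-based baseline lookup might look like the sketch below. The hint lists mirror the four examples above; the helper names and the sample table names `ref_compute_method_registry` and `fact_emissions` are hypothetical.

```python
from typing import Dict, List, Optional

# KIND hints per prefix, mirroring the examples above (not the full dictionary).
PREFIX_KIND_HINTS: Dict[str, List[str]] = {
    "ref_": ["CONFIG", "SCHEMA"],
    "fact_": ["EVENT", "SIGNAL", "METRIC"],
    "eng_": ["OUTPUT", "FEATURE", "METRIC"],
    "mrt_": ["VIEW", "METRIC"],
}


def table_prefix(table_name: str) -> Optional[str]:
    """Return the registered prefix of a table name, or None if it has none."""
    head, sep, _rest = table_name.partition("_")
    prefix = head + sep
    return prefix if prefix in PREFIX_KIND_HINTS else None


def baseline_kind_hints(table_name: str) -> List[str]:
    """Return ordered KIND hints for a table; empty when the prefix is unknown."""
    prefix = table_prefix(table_name)
    return PREFIX_KIND_HINTS[prefix] if prefix else []
```

An unknown or missing prefix returns no hints instead of a default, which leaves the decision to the column-level rules or to human review.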

zar.table_prefix_registry

CREATE TABLE zar.table_prefix_registry (
    table_prefix       TEXT PRIMARY KEY,
    table_prefix_desc  TEXT NOT NULL,
    baseline_kind_hint TEXT[],
    usage_notes        TEXT,
    sort_order         INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status             TEXT NOT NULL DEFAULT 'active',
    version            TEXT NOT NULL DEFAULT '1_0_0',
    source_doc_id      TEXT,
    approved_by        TEXT,
    notes              TEXT,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT table_prefix_registry_prefix_chk
        CHECK (table_prefix ~ '^[a-z]+_$'),

    CONSTRAINT table_prefix_registry_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT table_prefix_registry_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$')
);

Seed insert:

INSERT INTO zar.table_prefix_registry
(table_prefix, table_prefix_desc, baseline_kind_hint, usage_notes, sort_order, status, version)
VALUES
('data_', 'Legacy / Raw general', ARRAY['SIGNAL','CONFIG'], 'General-purpose raw or inherited data structures.', 10, 'active', '1_0_0'),
('dim_', 'Dimensions (Countries, Units, Sectors)', ARRAY['CONFIG'], 'Reference-like dimensional structures used for joins and classification.', 20, 'active', '1_0_0'),
('fact_', 'Facts (events) (Emissions, indicators, executions)', ARRAY['EVENT','SIGNAL','METRIC'], 'Fact-style records often contain runtime observations, events, or measured outputs.', 30, 'active', '1_0_0'),
('ref_', 'Reference data (EFDB, NACE, method registries)', ARRAY['CONFIG','SCHEMA'], 'Reference and registry tables, often configuration-heavy with schema references.', 40, 'active', '1_0_0'),
('stg_', 'Staging (Raw Excel / API imports)', ARRAY['INPUT','SIGNAL'], 'Landing-zone data pending normalization or transformation.', 50, 'active', '1_0_0'),
('int_', 'Intermediate (Engine merge outputs)', ARRAY['SIGNAL','OUTPUT'], 'Intermediate computation structures between raw and final outputs.', 60, 'active', '1_0_0'),
('agg_', 'Aggregates (KPI rollups)', ARRAY['METRIC'], 'Aggregated KPI or rollup outputs.', 70, 'active', '1_0_0'),
('mrt_', 'Data marts (Domain-tailored outputs)', ARRAY['VIEW','METRIC'], 'Domain-facing analytical outputs and reporting structures.', 80, 'active', '1_0_0'),
('tmp_', 'Temporary (Pipeline intermediates)', ARRAY['SIGNAL','CONFIG'], 'Ephemeral pipeline support structures.', 90, 'active', '1_0_0'),
('rl_', 'Pure join tables / Relations (Many-to-many links)', ARRAY['CONFIG'], 'Relationship and join support tables.', 100, 'active', '1_0_0'),
('eng_', 'Outputs produced by computation engines (algorithmic results, scored outputs)', ARRAY['OUTPUT','FEATURE','METRIC'], 'Engine-produced computed outputs.', 110, 'active', '1_0_0'),
('mod_', 'Module-owned output tables (business objects, user-facing module state)', ARRAY['CONFIG','VIEW'], 'Business-object or user-facing module state tables.', 120, 'active', '1_0_0'),
('sig_', 'Signals registry (Signal definitions)', ARRAY['SIGNAL','CONFIG'], 'Signal-definition and metadata registry structures.', 130, 'active', '1_0_0');

This table also exists as a JSON file that is used as input to the Signal Classification Pipeline: zarathustra-table-prefix-registry.json


E.8.5 zar.component_module_map

This table stabilizes the entire Signal Classification Pipeline:

  • MODULE_CODE app = deterministic lookup

CREATE TABLE zar.component_module_map (
    component_id TEXT PRIMARY KEY,
    module_code  TEXT NOT NULL,
    confidence   NUMERIC(3,2) DEFAULT 1.00,
    source       TEXT DEFAULT 'manual',
    notes        TEXT,
    sort_order   INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status       TEXT NOT NULL DEFAULT 'active',
    version      TEXT NOT NULL DEFAULT '1_0_0',
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT fk_component_module
        FOREIGN KEY (module_code)
        REFERENCES zar.module_registry (module_code),

    CONSTRAINT component_module_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft'))
);

This table becomes the authoritative bridge between documentation (components) and runtime classification (modules).

Used by:

  • Signal Classification Pipeline MODULE_CODE
  • Validator
  • ZARA explainability
  • Auditors

Example seed entries:

INSERT INTO zar.component_module_map
(component_id, module_code, confidence, source, sort_order)
VALUES
('AIIL-CON', 'comp', 0.99, 'manual', 10),
('ZAR-FW', 'siss', 0.95, 'manual', 20),
('TG-CORE', 'vera', 0.99, 'manual', 30);

Test of the first real join:

SELECT
    c.component_id,
    c.module_code,
    m.module_name
FROM zar.component_module_map c
JOIN zar.module_registry m
    ON c.module_code = m.module_code;

Output:

 component_id | module_code | module_name
--------------+-------------+--------------------------
 AIIL-CON     | comp        | Computation Hub
 ZAR-FW       | siss        | SIS
 TG-CORE      | vera        | Verification & Assurance
(3 rows)


zar.documented_component_registry

The zar.documented_component_registry gives us:

  • canonical component_id
  • title
  • source MDX path
  • owner
  • status
  • stronger linkage for component_module_map
  • future ZARA explainability lookup

CREATE TABLE zar.documented_component_registry (
    component_id    TEXT PRIMARY KEY,
    component_title TEXT NOT NULL,
    module_code     TEXT NOT NULL,
    source_doc_id   TEXT,
    source_file     TEXT,
    slug            TEXT,
    owner_team      TEXT,
    status          TEXT NOT NULL DEFAULT 'active',
    version         TEXT NOT NULL DEFAULT '1_0_0',
    sort_order      INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    notes           TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT documented_component_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT documented_component_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$'),

    CONSTRAINT fk_documented_component_module
        FOREIGN KEY (module_code)
        REFERENCES zar.module_registry (module_code)
);

Architecture

ZAR Governance Layer (v1)

| Layer | Table | Purpose |
| --- | --- | --- |
| Module taxonomy | module_registry | System domains |
| Signal semantics | kind_registry | Field roles |
| Data structure | table_prefix_registry | Table meaning |
| Component mapping | component_module_map | System wiring |

These tables power the Signal Classification Pipeline, CSI, and ZARA.


Example validation query

SELECT *
FROM zar.module_registry
WHERE module_code = 'comp';

E.8.4. Design Note

These dictionaries should be treated as:

  • controlled vocabularies
  • validation constraints in the Signal Classification Pipeline
  • future candidates for formal ZAR registry tables

Over time, they should be promoted into:

  • module_registry
  • kind_registry
  • table_prefix_registry

within ZAR for full governance and traceability.

E.9. JSON Evidence Record

Each processed column must produce a JSON evidence record.

Recommended structure:

zarathustra-csi-proposals.json
{
"component_id": "AIIL-CON",
"table_name": "compute_method_registry",
"column_reference": "version",
"column_description": "Semantic version of the method implementation and schema (e.g., 1.0.0). Enables side-by-side versions.",
"data_type": "text",
"module_code": "comp",
"kind": "CONFIG",
"signal_name": "METHOD_VERSION",
"confidence_scores": {
"module_confidence": 0.91,
"kind_confidence": 0.97,
"signal_name_confidence": 0.92,
"total_score": 0.94
},
"review_reason": null,
"existing_similar_signals": [],
"datatype_normalization_notes": "No datatype normalization required. Source type 'text' retained.",
"naming_basis": [
"column_description indicates semantic version of method",
"generic VERSION expanded to domain-specific METHOD_VERSION",
"matches existing suffix conventions"
],
"needs_review": false,
"pre_version_key": "comp.AIIL-CON.CONFIG.METHOD_VERSION",
"suggested_csi_pattern": "comp.AIIL-CON.CONFIG.METHOD_VERSION.v<MAJOR>_<MINOR>",
"collision_check_result": "no_conflict",
"near_collision_result": [],
"generated_at": "2026-03-24T12:00:00Z",
"generator_version": "zarathustra-naming-0.1.0"
}
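A record of this shape can be checked for internal consistency before it is stored. The sketch below is a hedged illustration: `evidence_record_errors` is a hypothetical helper, and the rule that pre_version_key equals MODULE_CODE, component_id, KIND, and signal_name joined by dots is inferred from the example record rather than stated as a formal rule.

```python
from typing import Any, Dict, List

APPROVED_KINDS = {"INPUT", "OUTPUT", "SIGNAL", "SCHEMA", "CONFIG",
                  "FEATURE", "METRIC", "EVENT", "VIEW"}


def evidence_record_errors(record: Dict[str, Any]) -> List[str]:
    """Return consistency problems found in one JSON evidence record."""
    errors: List[str] = []
    if record["kind"] not in APPROVED_KINDS:
        errors.append("kind is not an approved KIND value")

    # Assumed composition, inferred from the example: module.component.KIND.signal_name
    expected_key = ".".join([record["module_code"], record["component_id"],
                             record["kind"], record["signal_name"]])
    if record["pre_version_key"] != expected_key:
        errors.append(f"pre_version_key should be {expected_key}")

    if not record["needs_review"] and record["review_reason"] is not None:
        errors.append("review_reason set although needs_review is false")

    for name, score in record["confidence_scores"].items():
        if not 0.0 <= score <= 1.0:
            errors.append(f"{name} is outside the range 0..1")
    return errors
```

Applied to the example record above, this check reports no errors under the stated assumptions.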


E.10. Summary

The Signal Naming Governance Policy ensures that ZAYAZ generates MODULE_CODE, KIND, and signal_name in a disciplined, explainable, and reviewable manner before final CSI concatenation.

It exists to ensure:

  • semantic consistency across the SSSR
  • documentation-linked traceability
  • reduced naming drift
  • collision prevention
  • auditable AI-assisted classification

APPENDIX F - Query Results - Tests

F.1. Inspect the latest view to confirm the new run outputs landed correctly

SELECT
    row_id,
    source_signal_id,
    column_reference,
    cleaned_data_type,
    module_code,
    kind,
    signal_name,
    pre_version_key,
    is_valid
FROM zar.v_codex_signal_registry_latest
ORDER BY row_id;

 row_id | source_signal_id | column_reference     | cleaned_data_type | module_code | kind   | signal_name          | pre_version_key                           | is_valid
--------+------------------+----------------------+-------------------+-------------+--------+----------------------+-------------------------------------------+----------
      1 | sssr-000343      | method_id            | TEXT              | comp        | CONFIG | METHOD_ID            | comp.AIIL-CON.CONFIG.METHOD_ID            | t
      2 | sssr-000344      | method_name          | TEXT              | comp        | CONFIG | METHOD_NAME          | comp.AIIL-CON.CONFIG.METHOD_NAME          | t
      3 | sssr-000345      | version              | TEXT              | comp        | CONFIG | METHOD_VERSION       | comp.AIIL-CON.CONFIG.METHOD_VERSION       | t
      4 | sssr-000346      | status               | TEXT              | comp        | CONFIG | LIFECYCLE_STATUS     | comp.AIIL-CON.CONFIG.LIFECYCLE_STATUS     | t
      5 | sssr-000347      | method_type          | ENUM              | comp        | CONFIG | METHOD_TYPE          | comp.AIIL-CON.CONFIG.METHOD_TYPE          | t
      6 | sssr-000348      | description          | TEXT              | comp        | CONFIG | DESCRIPTION          | comp.AIIL-CON.CONFIG.DESCRIPTION          | t
      7 | sssr-000349      | inputs_schema_json   | TEXT              | comp        | SCHEMA | INPUTS_SCHEMA_REF    | comp.AIIL-CON.SCHEMA.INPUTS_SCHEMA_REF    | t
      8 | sssr-000350      | options_schema_json  | TEXT              | comp        | SCHEMA | OPTIONS_SCHEMA_REF   | comp.AIIL-CON.SCHEMA.OPTIONS_SCHEMA_REF   | t
      9 | sssr-000351      | output_schema_json   | TEXT              | comp        | SCHEMA | OUTPUT_SCHEMA_REF    | comp.AIIL-CON.SCHEMA.OUTPUT_SCHEMA_REF    | t
     10 | sssr-000352      | implementation_ref   | TEXT              | comp        | CONFIG | IMPLEMENTATION_REF   | comp.AIIL-CON.CONFIG.IMPLEMENTATION_REF   | t
     11 | sssr-000353      | micro_engine_ref     | TEXT              | comp        | CONFIG | MICRO_ENGINE_REF     | comp.AIIL-CON.CONFIG.MICRO_ENGINE_REF     | t
     12 | sssr-000354      | assumptions_json     | JSONB             | comp        | CONFIG | ASSUMPTIONS_JSON     | comp.AIIL-CON.CONFIG.ASSUMPTIONS_JSON     | t
     13 | sssr-000355      | framework_refs       | JSONB             | comp        | CONFIG | FRAMEWORK_REFS       | comp.AIIL-CON.CONFIG.FRAMEWORK_REFS       | t
     14 | sssr-000356      | dataset_requirements | JSONB             | comp        | CONFIG | DATASET_REQUIREMENTS | comp.AIIL-CON.CONFIG.DATASET_REQUIREMENTS | t
     15 | sssr-000357      | acl_tags             | JSONB             | comp        | CONFIG | ACL_TAGS             | comp.AIIL-CON.CONFIG.ACL_TAGS             | t
     16 | sssr-000358      | created_at           | TIMESTAMPTZ       | comp        | CONFIG | CREATED_AT           | comp.AIIL-CON.CONFIG.CREATED_AT           | t
     17 | sssr-000359      | updated_at           | TIMESTAMPTZ       | comp        | CONFIG | UPDATED_AT           | comp.AIIL-CON.CONFIG.UPDATED_AT           | t


F.2. Follow-up query

SELECT
    row_id,
    column_reference,
    kind,
    kind_confidence,
    kind_rationale,
    kind_needs_review,
    kind_review_reason
FROM zar.v_codex_signal_registry_latest
ORDER BY row_id;

 row_id | column_reference     | kind   | kind_confidence | kind_rationale                                                                                                                                         | kind_needs_review | kind_review_reason
--------+----------------------+--------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+--------------------
      1 | method_id            | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
      2 | method_name          | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
      3 | version              | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
      4 | status               | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
      5 | method_type          | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
      6 | description          | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
      7 | inputs_schema_json   | SCHEMA |           0.980 | Schema-reference field override.                                                                                                                       | f                 |
      8 | options_schema_json  | SCHEMA |           0.980 | Schema-reference field override.                                                                                                                       | f                 |
      9 | output_schema_json   | SCHEMA |           0.980 | Schema-reference field override.                                                                                                                       | f                 |
     10 | implementation_ref   | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
     11 | micro_engine_ref     | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
     12 | assumptions_json     | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
     13 | framework_refs       | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
     14 | dataset_requirements | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
     15 | acl_tags             | CONFIG |           0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f                 |
     16 | created_at           | CONFIG |           0.960 | Registry timestamp field classified as CONFIG.                                                                                                         | f                 |
     17 | updated_at           | CONFIG |           0.960 | Registry timestamp field classified as CONFIG.                                                                                                         | f                 |



