
ZAR-FW

ZAYAZ Artifact Registry Framework

1. Introduction

The ZAYAZ Artifact Registry (ZAR) is the foundational persistence, lineage, and governance layer of the ZAYAZ platform.

It serves as the canonical system of record for all computational artifacts, enabling full traceability across the ESG data lifecycle—from raw inputs to validated disclosures.

ZAR is not a traditional registry. It is a deterministic, intelligence-aware infrastructure that connects:

data (signals)
computation (engines, models, rulesets)
governance (validation, assurance, audit)

into a single, unified system of truth.

2. Role Within the ZAYAZ Architecture

ZAR operates as a core pillar of the Shared Intelligence Stack (SIS) and ensures coherence across all platform modules.

It integrates directly with:

SSSR (Smart Searchable Signal Registry) → semantic definition of signals
USO (Universal Signal Ontology) → runtime lineage and instance tracking
ZSSR (Smart System Router) → routing and orchestration
ZARA / ZAAM → AI-driven reasoning and interaction
Verification & Assurance (VERA) → trust, validation, and audit workflows

Together, these systems form a closed-loop ESG intelligence architecture where every data point is:

Defined → Produced → Validated → Traced → Explained

3. The ZAYAZ Identifier System

At the core of ZAR lies the Canonical Identifier Architecture (CIA), which ensures that every element in the system is uniquely and immutably identifiable.

ZAYAZ distinguishes between three identity layers:

Layer	Identifier	Purpose
Instance	USO ID	Identifies a specific occurrence of a signal
Type	CSI	Defines the semantic meaning of the signal
Artifact	CMI	Identifies the component that produced or processed the signal

These identifiers operate at three distinct abstraction levels, CSI (type), CMI (artifact), and USO (instance), and together form the ZAYAZ Identifier Trinity, enabling full traceability across the entire lifecycle of ESG data.

4. Canonical Signal Identifiers (CSI)

The Canonical Signal Identifier (CSI) defines the semantic identity of a signal and is governed within the SSSR.

Each signal field within the platform is assigned a CSI, making it:

discoverable
comparable
auditable
reusable across modules

CSI is governed within the SSSR and referenced by ZAR, but not owned by it.

CSI Structure

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<NAME>.v<MAJOR>_<MINOR>

Key Concepts

MODULE_CODE represents the top-level ZAYAZ module (e.g. comp, vera, inpt)
COMPONENT_ID corresponds to the frontmatter ID defined in the ZAYAZ manual
KIND defines the role of the signal (e.g. INPUT, OUTPUT, METRIC)
NAME is the canonical semantic identifier and is identical to the signal_name in the signal_registry. (The signal_name is a curated version of the column name.)
VERSION tracks the evolution of the signal’s meaning

Example

comp.PEF-ME.OUTPUT.CO2E.v1_0
vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0

Module Codes

Module	Code
Input Hub	inpt
Computation Hub	comp
Reports & Insights	repo
SIS	siss
ZARA	zara
ZAAM	zaam
Risk (RIF)	risk
NetZero	netz
Verification & Assurance	vera
SEEL	seel
EcoWorld Academy	acad

Design Principle — Documentation-Linked Identity

A core design principle of ZAYAZ is that:

Every CSI is directly resolvable to its originating component specification.

By aligning COMPONENT_ID with frontmatter IDs:

auditors can trace signals directly to documented logic
ZARA can explain how values are produced
developers maintain a single source of truth
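
A minimal sketch of this resolution step, assuming a hypothetical in-memory map from frontmatter IDs to documentation metadata. The `ComponentDoc` type, the `COMPONENT_DOCS` map, and the slug values are illustrative only and are not part of the ZAYAZ specification:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentDoc:
    component_id: str  # frontmatter ID
    module_code: str   # owning module
    slug: str          # hypothetical docs route


# Hypothetical stand-in for the frontmatter-derived component registry.
COMPONENT_DOCS = {
    "PEF-ME": ComponentDoc("PEF-ME", "comp", "/docs/comp/pef-me"),
    "TG-CORE": ComponentDoc("TG-CORE", "vera", "/docs/vera/tg-core"),
}


def resolve_component(csi: str) -> ComponentDoc:
    """Return the documented component that a CSI points to."""
    # The first two dot-separated segments are MODULE_CODE and COMPONENT_ID.
    module_code, component_id = csi.split(".")[:2]
    doc = COMPONENT_DOCS.get(component_id)
    if doc is None:
        raise LookupError(f"no documented component for {component_id!r}")
    if doc.module_code != module_code:
        raise ValueError(
            f"{component_id!r} is documented under {doc.module_code!r}, "
            f"not {module_code!r}"
        )
    return doc
```

Because COMPONENT_ID is the lookup key, the same call could serve an auditor, a docs link generator, or an explanation layer without any extra mapping table.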

5. What ZAR Registers

ZAR maintains a governed catalog of all artifacts within the platform, including:

computation engines (micro-engines, pipelines)
schemas and data contracts
AI models and feature generators
routing rulesets and orchestration logic
validation and assurance modules

Each artifact is assigned a Canonical Managed Identifier (CMI) and a ZAR Code, enabling:

deterministic lineage tracking
reproducible computations
audit-ready traceability

6. End-to-End Traceability

ZAR enables full traceability across five stages:

Input → data is ingested via Input Hub (inpt)
Processing → computed through engines in Computation Hub (comp)
Validation → evaluated via Verification & Assurance (vera)
Routing → orchestrated via ZSSR
Disclosure → exposed through Reports & Insights (repo)

At every step:

the CMI identifies which artifact processed the data
the CSI defines what the data represents
the USO ID tracks which instance is being observed

This creates a fully connected lineage chain.

7. Core Capabilities

ZAR enables the following system-critical capabilities:

Deterministic Lineage

Every signal instance can be traced through its complete processing chain.

Replay & Reproducibility

Any ESG disclosure can be reconstructed using:

CSI (semantic definition)
CMI (execution logic)
USO lineage

Audit-Ready Architecture

Supports:

CSRD assurance requirements
ESRS data traceability
ISO 14064 reproducibility

AI Explainability

ZARA and ZAAM can:

resolve any CSI to its component
explain computation logic
surface assumptions and validation layers

Modular Scalability

ZAR supports:

multi-tenant deployments
white-label configurations
global supply chain integration

8. Design Principles

ZAR is built on the following principles:

Immutability

Identifiers and lineage records are append-only

Separation of Concerns

SSSR → semantics (CSI)
ZAR → artifacts (CMI)
USO → runtime instances

Deterministic Identity

Every element is uniquely and consistently identifiable

Documentation as Infrastructure

Component identities are directly linked to system specifications

Precision Before Automation

All computations must be explainable, auditable, and verifiable

9. Strategic Positioning

ZAR transforms conventional ESG reporting into a traceable ESG infrastructure layer.

It enables organizations to move from:

fragmented data handling to unified traceability
opaque computations to explainable decision chains
reactive compliance to auditable governance-by-design

In architectural terms, ZAR makes every sustainability-relevant output traceable back to:

its semantic definition
its producing artifact
its runtime processing history
its documented component specification

10. Transition to Canonical Identifier Architecture

The following section defines the Canonical Identifier Architecture (CIA) in detail, including:

CSI (signal identity)
CMI (artifact identity)
USO (runtime identity)
and their interaction across the ZAYAZ platform

APPENDIX A - CSI Naming Taxonomy

The <MODULE_CODE>, <COMPONENT_ID>, and <VERSION> segments are given.

Below are examples of <KIND> and <NAME> for CSIs.

A.1. KIND

Represents the role or artifact type that the signal belongs to. Typically one per schema or message family.

KIND	Description
INPUT	Input schema or raw signal
OUTPUT	Output schema or derived signal
SIGNAL	Atomic reusable signal
SCHEMA	JSON Schema or tabular schema type
CONFIG	Configuration schema or parameter set
FEATURE	Derived ML feature
METRIC	Aggregated KPI or model output
EVENT	System event schema
VIEW	Analytical or reporting view

A.2. NAME conventions (semantic or technical label)

Describes what the signal is semantically. Uppercase with underscores for clarity. Name must be unambiguous across all components and is equivalent to the signal's signal_name.

Example	Meaning
TRUST_SCORE	Weighted trust index (0–1)
CO2E	Carbon equivalent emissions
EF_QUALITY	Emission factor quality
SUPPLIER_TRUST	Supplier reliability score
EF_TIER	Emission factor source tier
WATER_USE	Water consumption metric

APPENDIX B - CMI Naming Taxonomy

The <MODULE_CODE>, <COMPONENT_ID>, and <VERSION> segments are largely given.

Below are examples of <KIND> and <NAME> for CMIs.

B.1. KIND

KIND	Meaning
ENGINE	Executable micro-engine (Python, Node, etc.)
SCHEMA	Schema or data contract
SCRIPT	Script or ETL job
RULESET	Ruleset / policy definition
CONNECTOR	Integration adapter (e.g., SAP, QuickBooks)
MODEL	Trained ML model
UI	Front-end component
DASHBOARD	Visualization artifact
JOB	Orchestrated workflow (Airflow/StepFunction)
LIB	Shared library
TEST	Validation or regression test bundle

B.2. NAME

The artifact or sub-function name within the component.

Example	Meaning
Core	Main runtime module
Parser	Text parsing module
Validator	Rule validator
Connector	API connector
Decision	Output schema
OutputDecision	Decision schema type
InvoiceLines	Router ruleset
EU_Validator	Region-specific variant

APPENDIX C - The Birth of a Signal

When a signal is born, a USO ID is created, and the appropriate CSI and CMI are assigned from their registries.

The canonical creation sequence

Step	Action	Created / Assigned	Registry	Meaning
1. Signal instance is generated	A micro-engine finishes a computation or data extraction.	—	—	“A new data record is born.”
2. System mints a USO ID	New globally unique lineage identifier.	Created	USO (runtime)	“This is one unique signal instance.”
3. System attaches CMI	Engine’s canonical artifact ID.	Assigned	ZAR	“It was produced by this artifact.”
4. System attaches CSI	Canonical signal type ID.	Assigned	SSSR	“It is a signal of this conceptual type.”
5. (Optional) Add origin_chain and origin_chain_codes	For future provenance hops.	Appended	USO	“Here’s its movement trail.”

In plain language

USO ID → created at runtime (new each time a data point exists)
CMI → assigned from the ZAR registry (the artifact that produced it)
CSI → assigned from the SSSR registry (the conceptual type of signal)

Example — “Invoice CO2E” in context

Layer	Field	Example	How it got there	Meaning
USO	uso_id	`01JBF0W8S9Q0R1S2T3U4V5W6X`	Auto-created ULID at runtime	This is one unique signal instance.
USO	primary_origin_cmi	`comp.TAC.ENGINE.CORE.1_1_0`	Assigned from ZAR	It was first produced by this artifact.
USO	csi	`comp.TAC.OUTPUT.CO2E.v1_0`	Assigned from SSSR	It is a signal of this conceptual type.
USO	origin_chain	`[comp.TAC.ENGINE.CORE.1_1_0]`	Initialized from producing artifact	Here is the ordered chain of artifacts that touched it.
USO	origin_chain_codes	`[TAC12]`	Derived from CMI short code	Compact representation of the processing trail.
USO	born_at	`2025-10-25T12:40Z`	Auto-timestamped	When this signal instance was created.

Later, if the same record passes through TrustGate:

Field	New Value	Why
current_cmi	vera.TG-CORE.ENGINE.CORE.1_0_0	Assigned from ZAR
origin_chain	[…, vera.TG-CORE.ENGINE.CORE.1_0_0]	Appended
origin_chain_codes	[…, TG3K7]	Appended

Mental shortcut

Think of each registry as a “naming service” that the runtime joins together:

Registry	Gives you	Example
ZAR	Who produced or consumed it	comp.TAC.ENGINE.CORE.1_1_0
SSSR	What type of signal it is	comp.TAC.OUTPUT.CO2E.v1_0
USO (runtime)	Which specific instance this is	01JBF0W8S9Q0R1S2T3U4V5W6X

Summary statement

When a signal is generated, the ZAYAZ platform:
Creates a new USO ID (unique lineage instance),
Assigns the correct CMI (the producing artifact, from ZAR),
Assigns the correct CSI (the signal type, from SSSR),
Optionally begins its origin_chain with the producing CMI and short code.
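
The creation sequence and the later TrustGate hop can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the `SignalInstance` class and the `birth_signal` / `record_hop` functions are hypothetical names, the ULID is generated with a small hand-written encoder (48-bit millisecond timestamp plus 80 random bits, Crockford base32), and `current_cmi` is left unset at birth because the tables above only show it being assigned on a later hop.

```python
import secrets
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid() -> str:
    """Encode a 48-bit ms timestamp and 80 random bits as 26 base32 characters."""
    value = (int(time.time() * 1000) << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))


@dataclass
class SignalInstance:
    uso_id: str
    csi: str
    primary_origin_cmi: str
    born_at: datetime
    origin_chain: List[str] = field(default_factory=list)
    origin_chain_codes: List[str] = field(default_factory=list)
    current_cmi: Optional[str] = None  # assumption: unset until a later hop


def birth_signal(csi: str, cmi: str, short_code: str) -> SignalInstance:
    """Steps 2 to 5: mint the USO ID, attach CMI and CSI, start the chain."""
    return SignalInstance(
        uso_id=new_ulid(),
        csi=csi,
        primary_origin_cmi=cmi,
        born_at=datetime.now(timezone.utc),
        origin_chain=[cmi],
        origin_chain_codes=[short_code],
    )


def record_hop(signal: SignalInstance, cmi: str, short_code: str) -> None:
    """Append a processing hop; existing chain entries are never rewritten."""
    signal.current_cmi = cmi
    signal.origin_chain.append(cmi)
    signal.origin_chain_codes.append(short_code)
```

Calling `birth_signal("comp.TAC.OUTPUT.CO2E.v1_0", "comp.TAC.ENGINE.CORE.1_1_0", "TAC12")` and then `record_hop(signal, "vera.TG-CORE.ENGINE.CORE.1_0_0", "TG3K7")` reproduces the two tables above, with `primary_origin_cmi` staying fixed while the chain grows.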

APPENDIX D - CSI Validation & SSSR Enforcement

D.1. Purpose

The CSI validator ensures that every Canonical Signal Identifier is:

syntactically valid
semantically well-formed
aligned with the ZAYAZ module system
linked to a valid documented component
versioned consistently

It should be enforced in:

CI/CD
schema publishing
SSSR inserts/updates
code generation pipelines
linting for MDX/manual examples

D.2. Canonical CSI Format

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<NAME>.v<MAJOR>_<MINOR>

Example

comp.PEF-ME.OUTPUT.CO2E.v1_0
vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0
inpt.FOGE-FORM.INPUT.WATER_USE.v1_0

Allowed Module Codes

inpt
comp
repo
siss
zara
zaam
risk
netz
vera
seel
acad

Allowed KIND values

INPUT
OUTPUT
SIGNAL
SCHEMA
CONFIG
FEATURE
METRIC
EVENT
VIEW

D.3. Field Rules

MODULE_CODE

must be lowercase
must be one of the approved module codes
must be exactly one registered ZAYAZ module namespace

COMPONENT_ID

must match a valid frontmatter ID
must be globally unique across the platform
recommended pattern:

^[A-Z0-9]+(?:-[A-Z0-9]+)*$

examples:
- PEF-ME
- TG-CORE
- FOGE-FORM

KIND

must be uppercase
must belong to the approved enum

NAME

must be uppercase snake case or uppercase alphanumeric token
recommended pattern:

^[A-Z][A-Z0-9_]*$

examples:
- CO2E
- TRUST_SCORE
- VALIDATION_STATUS

VERSION

must use:

v<MAJOR>_<MINOR>

examples:
- v1_0
- v2_1

Full CSI

no spaces
no extra segments
no lowercase in KIND or NAME
no missing v prefix
no dots inside segments

D.4. Regex

Use this as the base validator:

^(inpt|comp|repo|siss|zara|zaam|risk|netz|vera|seel|acad)\.([A-Z0-9]+(?:-[A-Z0-9]+)*)\.(INPUT|OUTPUT|SIGNAL|SCHEMA|CONFIG|FEATURE|METRIC|EVENT|VIEW)\.([A-Z][A-Z0-9_]*)\.v([0-9]+)_([0-9]+)$
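
A minimal sketch of how this base validator might be applied in Python. The pattern is the one above, split across lines for readability; the `parse_csi` function name and the returned dictionary layout are assumptions chosen to mirror the decomposed columns described later in this appendix.

```python
import re
from typing import Optional

CSI_PATTERN = re.compile(
    r"^(inpt|comp|repo|siss|zara|zaam|risk|netz|vera|seel|acad)"
    r"\.([A-Z0-9]+(?:-[A-Z0-9]+)*)"
    r"\.(INPUT|OUTPUT|SIGNAL|SCHEMA|CONFIG|FEATURE|METRIC|EVENT|VIEW)"
    r"\.([A-Z][A-Z0-9_]*)"
    r"\.v([0-9]+)_([0-9]+)$"
)


def parse_csi(csi: str) -> Optional[dict]:
    """Return the CSI segments, or None if the string fails syntax validation."""
    # fullmatch also rejects a trailing newline, which `$` alone would tolerate.
    match = CSI_PATTERN.fullmatch(csi)
    if match is None:
        return None
    module_code, component_id, kind, name, major, minor = match.groups()
    return {
        "module_code": module_code,
        "component_id": component_id,
        "kind": kind,
        "signal_name": name,
        "version_major": int(major),
        "version_minor": int(minor),
    }
```

This covers Level 1 (syntax) only; registry, semantic, and governance checks need access to the component registry and existing SSSR records.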

D.5. Validation Levels

Level 1 — Syntax Validation

Checks:

regex match
segment count
allowed character set
required v version prefix

Level 2 — Registry Validation

Checks:

MODULE_CODE exists
COMPONENT_ID exists in component/frontmatter registry
component belongs to correct module
KIND is valid enum

Level 3 — Semantic Validation

Checks:

CSI not already assigned conflicting meaning
version bump rules followed
deprecated CSI not reused
NAME uniqueness rules respected within intended scope

Level 4 — Governance Validation

Checks:

change approved if semantic meaning changed
major version bump for breaking semantic changes
minor version bump only for non-breaking semantic refinements

D.6. Versioning Rules

Minor bump (v1_0 → v1_1)

Use when:

description refined
metadata expanded
documentation clarified
no semantic meaning change

Major bump (v1_1 → v2_0)

Use when:

signal meaning changes
methodology changes materially
unit changes
value interpretation changes
framework mapping changes in a way that alters semantics

Forbidden

changing meaning without version bump
reusing deprecated CSI for new meaning
patch-style CSI versions like v1_0_1
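
The bump rules above can be sketched as a small check. This is a minimal illustration, assuming that a minor bump increments MINOR by exactly one and that a major bump increments MAJOR by one and resets MINOR to 0. That assumption is consistent with the v1_0 → v1_1 and v1_1 → v2_0 examples but is not stated as a rule in this section. The function names are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor)


def classify_bump(old: Version, new: Version) -> str:
    """Return 'minor' or 'major' for an allowed transition, else raise."""
    old_major, old_minor = old
    new_major, new_minor = new
    if new_major == old_major and new_minor == old_minor + 1:
        return "minor"
    if new_major == old_major + 1 and new_minor == 0:
        return "major"
    raise ValueError(f"unsupported version transition: {old} -> {new}")


def check_version_change(old: Version, new: Version, meaning_changed: bool) -> str:
    """Reject a semantic change that is not accompanied by a major bump."""
    bump = classify_bump(old, new)
    if meaning_changed and bump != "major":
        raise ValueError("a change in signal meaning requires a major version bump")
    return bump
```

Whether `meaning_changed` is true remains a governance decision (Level 4); the code only enforces the consequence of that decision.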

D.7. Example CSIs

Example Valid CSIs

comp.PEF-ME.OUTPUT.CO2E.v1_0
vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0
inpt.FOGE-FORM.INPUT.WATER_USE.v1_0
repo.REPORT-BUILDER.VIEW.ESRS_DASHBOARD.v2_0
risk.RIF-CORE.EVENT.RISK_ALERT.v1_1

Example Invalid CSIs

calc.TAC.OUTPUT.CO2E.1_0

Invalid:

calc not approved
missing v

comp.pef-me.OUTPUT.CO2E.v1_0

Invalid:

component not uppercase frontmatter format

comp.PEF-ME.output.CO2E.v1_0

Invalid:

KIND not uppercase

comp.PEF-ME.OUTPUT.co2e.v1_0

Invalid:

NAME not uppercase

comp.PEF-ME.OUTPUT.CO2E.v1_0_1

Invalid:

CSI does not use patch versioning

D.8. CI Enforcement Spec

Required checks in CI

Every new or changed CSI should be validated against:

regex format
approved module code list
frontmatter/component registry lookup
duplicate/conflict detection in SSSR
version bump policy

Suggested CI failure messages

Invalid CSI: calc.TAC.OUTPUT.CO2E.1_0
Reason: module_code 'calc' is not registered. Use one of: inpt, comp, repo, siss, zara, zaam, risk, netz, vera, seel, acad.

Invalid CSI: comp.PEF-ME.OUTPUT.CO2E.1_0
Reason: version must use 'v<MAJOR>_<MINOR>' format, e.g. v1_0.

Invalid CSI: comp.PEF-ME.output.CO2E.v1_0
Reason: KIND must be one of INPUT, OUTPUT, SIGNAL, SCHEMA, CONFIG, FEATURE, METRIC, EVENT, VIEW.
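
A minimal sketch of a CI helper that produces the messages above. The `ci_message` function name and the segment-count message are assumptions; the three other reason strings follow the suggested wording. Component and NAME pattern checks are omitted here for brevity.

```python
import re
from typing import Optional

ALLOWED_MODULES = ("inpt", "comp", "repo", "siss", "zara", "zaam",
                   "risk", "netz", "vera", "seel", "acad")
ALLOWED_KINDS = ("INPUT", "OUTPUT", "SIGNAL", "SCHEMA", "CONFIG",
                 "FEATURE", "METRIC", "EVENT", "VIEW")
VERSION_RE = re.compile(r"v[0-9]+_[0-9]+")


def ci_message(csi: str) -> Optional[str]:
    """Return a two-line failure message, or None if no problem was found."""
    parts = csi.split(".")
    if len(parts) != 5:
        # Hypothetical wording; not one of the suggested messages above.
        reason = f"expected 5 dot-separated segments, found {len(parts)}."
    else:
        module_code, _component_id, kind, _name, version = parts
        if module_code not in ALLOWED_MODULES:
            reason = (f"module_code '{module_code}' is not registered. "
                      f"Use one of: {', '.join(ALLOWED_MODULES)}.")
        elif kind not in ALLOWED_KINDS:
            reason = f"KIND must be one of {', '.join(ALLOWED_KINDS)}."
        elif not VERSION_RE.fullmatch(version):
            reason = "version must use 'v<MAJOR>_<MINOR>' format, e.g. v1_0."
        else:
            return None
    return f"Invalid CSI: {csi}\nReason: {reason}"
```

Reporting the first failing segment, rather than a bare regex mismatch, tells the author which part of the identifier to fix.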

D.9. SSSR Schema Enforcement Model

Since CSI does not have its own registry and lives inside SSSR, the cleanest model is:

keep CSI as a first-class field in signal_registry
validate it against:
- module registry
- component/frontmatter registry
- CSI rules
optionally decompose it into indexed columns

Recommended signal_registry Structure

CREATE TABLE signal_registry (
    signal_id                TEXT PRIMARY KEY,
    csi                      TEXT NOT NULL UNIQUE,

    module_code              TEXT NOT NULL,
    component_id             TEXT NOT NULL,
    kind                     TEXT NOT NULL,
    signal_name              TEXT NOT NULL,
    version_major            INTEGER NOT NULL,
    version_minor            INTEGER NOT NULL,

    display_name             TEXT NOT NULL,
    description              TEXT,
    value_type               TEXT,
    unit                     TEXT,
    status                   TEXT NOT NULL DEFAULT 'active',
    deprecated_by_csi        TEXT NULL,

    source_module_id         TEXT NULL,
    framework_tags           JSONB,
    metadata                 JSONB,

    created_at               TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by               TEXT,
    updated_at               TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

(NOTE: This is just a part of the full signal_registry table.)

D.10. Why store decomposed CSI columns too

Do not rely only on the full csi string.

Store:

module_code
component_id
kind
signal_name
version_major
version_minor

This gives us:

faster filtering
easier joins
easier governance
safer validation
better analytics

The full csi remains the canonical string, but the decomposed columns make the system operable.

D.11. Recommended Constraints

Module constraint

CHECK (module_code IN ('inpt','comp','repo','siss','zara','zaam','risk','netz','vera','seel','acad'))

Kind constraint

CHECK (kind IN ('INPUT','OUTPUT','SIGNAL','SCHEMA','CONFIG','FEATURE','METRIC','EVENT','VIEW'))

Version constraint

CHECK (version_major >= 0),
CHECK (version_minor >= 0)

CSI format constraint

If the DB supports regex checks:

CHECK (
  csi ~ '^(inpt|comp|repo|siss|zara|zaam|risk|netz|vera|seel|acad)\.([A-Z0-9]+(?:-[A-Z0-9]+)*)\.(INPUT|OUTPUT|SIGNAL|SCHEMA|CONFIG|FEATURE|METRIC|EVENT|VIEW)\.([A-Z][A-Z0-9_]*)\.v([0-9]+)_([0-9]+)$'
)

Canonical string consistency

Ensure decomposed fields match the csi string through trigger or generated column logic.

D.12. Recommended Foreign Keys

On signal_registry, reference the module/component documentation registry:

FOREIGN KEY (component_id) REFERENCES documented_components(component_id)

And:

FOREIGN KEY (module_code, component_id) REFERENCES documented_components(module_code, component_id)

This is the strongest way to enforce:

frontmatter linkage
documentation integrity
ZARA explainability compatibility

D.13. Suggested documented_components Table

Field	Purpose
component_id	Frontmatter ID, e.g. ZAR-FW, PEF-ME, TG-CORE
module_code	comp, vera, inpt, etc.
title	Human-readable title
slug	Docs route
source_file	MDX file path
doc_status	draft / review / active / deprecated
version	Spec version
owner_team	Responsible team
summary	One-paragraph description
parent_component_id	Optional link to parent spec/component
tags	Search/filter metadata
legacy_manual_ref	Optional backward reference
last_updated	Auditability / sync support

CREATE TABLE documented_components (
    component_id     TEXT PRIMARY KEY,
    module_code      TEXT NOT NULL,
    title            TEXT NOT NULL,
    slug             TEXT,
    doc_status       TEXT NOT NULL DEFAULT 'active',
    owner_team       TEXT,
    source_file      TEXT
);

Add uniqueness:

CREATE UNIQUE INDEX documented_components_module_component_uidx
ON documented_components(module_code, component_id);

D.14. Trigger Strategy

Use a trigger on signal_registry insert/update to:

parse csi
validate segment values
populate decomposed fields
verify (module_code, component_id) exists in documented_components
reject semantic collisions

D.15. Pseudocode

on insert/update signal_registry:
  parse csi into module_code, component_id, kind, signal_name, version_major, version_minor
  assert module_code in allowed_modules
  assert kind in allowed_kinds
  assert documented_components contains (module_code, component_id)
  assert no conflicting active signal with same csi
  assert versioning rules are respected
  write parsed fields back into columns
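
The same steps can be sketched in application code, for example as a pre-write hook used in tests or in CI before the database trigger exists. This is a sketch under assumptions: `before_write` is a hypothetical name, `documented_components` is modeled as a set of (module_code, component_id) pairs, and `active_csis` as a map from CSI to the owning signal_id. The versioning-rule assertion is not covered.

```python
import re

ALLOWED_MODULES = ("inpt", "comp", "repo", "siss", "zara", "zaam",
                   "risk", "netz", "vera", "seel", "acad")
ALLOWED_KINDS = ("INPUT", "OUTPUT", "SIGNAL", "SCHEMA", "CONFIG",
                 "FEATURE", "METRIC", "EVENT", "VIEW")
# Structural split only; module and kind membership are asserted separately.
SEGMENTS = re.compile(
    r"([a-z]+)\.([A-Z0-9]+(?:-[A-Z0-9]+)*)\.([A-Z]+)"
    r"\.([A-Z][A-Z0-9_]*)\.v([0-9]+)_([0-9]+)"
)


def before_write(row: dict, documented_components: set, active_csis: dict) -> dict:
    """Validate one signal row and return it with decomposed columns filled in."""
    csi = row["csi"]
    match = SEGMENTS.fullmatch(csi)
    if match is None:
        raise ValueError(f"malformed CSI: {csi}")
    module_code, component_id, kind, signal_name, major, minor = match.groups()
    if module_code not in ALLOWED_MODULES:
        raise ValueError(f"unknown module_code: {module_code}")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown KIND: {kind}")
    if (module_code, component_id) not in documented_components:
        raise ValueError(f"undocumented component: {module_code}/{component_id}")
    # An update of the same record is allowed; a second record is a conflict.
    owner = active_csis.get(csi)
    if owner is not None and owner != row["signal_id"]:
        raise ValueError(f"CSI already assigned to {owner}")
    return {
        **row,
        "module_code": module_code,
        "component_id": component_id,
        "kind": kind,
        "signal_name": signal_name,
        "version_major": int(major),
        "version_minor": int(minor),
    }
```

Deriving the decomposed columns from the CSI inside the hook, instead of accepting them from the caller, is one way to keep them consistent with the canonical string.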

D.16. Recommended Indexes

-- The first index is redundant if csi is already declared UNIQUE in the table definition (PostgreSQL creates that index automatically).
CREATE UNIQUE INDEX signal_registry_csi_uidx ON signal_registry(csi);
CREATE INDEX signal_registry_module_idx ON signal_registry(module_code);
CREATE INDEX signal_registry_component_idx ON signal_registry(component_id);
CREATE INDEX signal_registry_kind_idx ON signal_registry(kind);
CREATE INDEX signal_registry_name_idx ON signal_registry(signal_name);
CREATE INDEX signal_registry_status_idx ON signal_registry(status);

D.17. Strong Governance Rules for SSSR

A signal record must not be created unless:

CSI is valid
component exists in documentation registry
semantic definition is present
display name is present
value type is defined for machine handling

A signal record must be deprecated instead of overwritten when:

meaning changes
framework logic changes materially
unit/value interpretation changes

A signal record may be revised in place only when:

documentation is clarified
metadata is enriched without semantic change

D.18. Best Practice: Canonical + Display Split

In SSSR, keep:

csi as canonical machine identity
display_name as human label
description as semantic definition

Example:

Field	Value
`csi`	`vera.TG-CORE.OUTPUT.TRUST_SCORE.v1_0`
`display_name`	Trust Score
`description`	Weighted trust index between 0 and 1 for signal-level or aggregate validation confidence.

This avoids semantic drift.

D.19. Final Recommendation

The strongest setup for ZAYAZ is:

CSI stays inside SSSR
CSI is validated by regex + registry + trigger
component linkage is enforced against frontmatter-derived documentation metadata
decomposed CSI fields are stored alongside the full canonical string
CI validates examples and schema changes before merge

That gives us:

documentation-linked identity
strong DB enforcement
reliable routing inputs
ZARA-readable architecture
audit-grade consistency

APPENDIX E - Signal Naming Governance Policy

E.1. Purpose

This appendix defines the governance policy for generating signal_name, classifying MODULE_CODE and KIND, and validating pre-version CSI structures for the ZAYAZ platform.

The policy is used by the classification workflow built on the Signal Classification Pipeline and applies to all signal records prepared for insertion into the SSSR signal registry.

The workflow relies on structured context extracted from signal_registry and table_registry, including:

component title
component description
table description
column reference
column description
cleaned datatype
enum values or example content
other relevant metadata required for classification

This information is exported into a working csi_registry for classification and review. Once approved, the enriched results are written back into signal_registry, where the final CSI is assembled.

E.2. Governing Principles

The following principles apply throughout the classification process:

Classification before concatenation: Semantic classification must be completed before the full CSI is assembled.
Validation before review: Automated checks must run before human review is triggered.
JSON evidence before approval: Every processed column must produce a JSON evidence record.
Human review only where needed: Manual review is reserved for low-confidence or flagged cases.
Versioning remains outside the Signal Classification Pipeline: CSI versioning is assigned manually and appended later during Excel concatenation.
Semantics over storage: signal_name and KIND must reflect semantic intent, not merely physical column names or storage formats.

E.3. Signal Classification Pipeline

The Signal Classification Pipeline prepares SSSR signal metadata in a deterministic, reviewable, and auditable way before final CSI concatenation.

The pipeline stages are:

Datatype cleanup app
MODULE_CODE app
KIND app
SIGNAL_NAME app
Validator checks (pre-version only)
JSON export
Human review only for low-confidence cases
Excel concatenation

E.4. Pipeline Stages

E.4.1. Datatype Cleanup App

Normalizes:

base datatype
nullability
enum structure
scalar vs array
object vs text
reference vs reference-list semantics
timestamp/date conventions

Inputs:

source_data_type (signal_type)
column_description (signal_description)
sample_values
table_prefix
source_table

Outputs:

cleaned_data_type
datatype_normalization_notes
datatype_confidence

E.4.2. MODULE_CODE App

Classifies the correct module from the fixed list of approved module codes:

MODULE_CODE Dictionary

inpt
comp
repo
siss
zara
zaam
risk
netz
vera
seel
acad

Rules:

MODULE_CODE must match one of the values above.
Module classification is component-governed, not column-level.
A component should map to exactly one module unless explicitly documented.

Inputs:

component title (component_name)
component description (component_description)
table description (table_notes, short_description)
table type hint (table_prefix)
owning component context = component_name + component_description + table context
existing component-to-module mapping where available (recommended external lookup table)

Outputs:

module_code
module_confidence
rationale

Rule: MODULE_CODE should normally be determined at the component or table level, not independently per column.

E.4.3. KIND App

Classifies the semantic role of each field using the KIND policy defined below.

Inputs:

table baseline context (table_notes, short_description)
table type hint (table_prefix)
column description (column_description / signal_description)
cleaned datatype (cleaned_data_type)
enum values or example content (sample_values)
module context (module_code)
component context (component_name, component_description)

Outputs:

kind
kind_confidence
rationale

E.4.4. SIGNAL_NAME App

Generates the curated semantic signal_name used as the NAME segment in the CSI.

Inputs:

column_reference
column_description (signal_description)
cleaned datatype (cleaned_data_type)
enum values or example content (sample_values)
table context (source_table, table_prefix, table_notes, short_description)
component context (component_name, component_description)
module context (module_code)
approved naming governance rules

Outputs:

signal_name
signal_name_confidence
naming basis
review flags if ambiguous

E.4.5. Validator Checks

Runs pre-version validation on:

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<SIGNAL_NAME>

Example: comp.AIIL-CON.CONFIG.METHOD_VERSION

Checks include:

module validity
component linkage
allowed KIND
naming policy compliance
duplicate collision detection
near-collision detection
reserved-word and anti-pattern checks

Outputs:

pre_version_key
is_valid
collision_check_result
near_collision_result
needs_review
review_reason
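
A minimal sketch of the duplicate and near-collision checks on a pre-version key, using string similarity from the Python standard library. The similarity measure, the 0.9 threshold, the function names, and the `"collision"` / `"clear"` result values are all assumptions for illustration; the policy above does not prescribe how near-collisions are detected.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def near_collisions(key: str, existing_keys: Iterable[str],
                    threshold: float = 0.9) -> List[str]:
    """Return existing keys that are similar to, but not equal to, `key`."""
    hits = []
    for other in existing_keys:
        if other == key:
            continue  # exact duplicates are reported separately
        if SequenceMatcher(None, key, other).ratio() >= threshold:
            hits.append(other)
    return sorted(hits)


def check_pre_version_key(key: str, existing_keys: Iterable[str]) -> dict:
    """Produce the collision-related validator outputs for one key."""
    existing = set(existing_keys)
    collision = key in existing
    near = near_collisions(key, existing)
    reasons = []
    if collision:
        reasons.append("duplicate pre-version key")
    if near:
        reasons.append("near-collision with existing key")
    return {
        "pre_version_key": key,
        "collision_check_result": "collision" if collision else "clear",
        "near_collision_result": near,
        "needs_review": bool(reasons),
        "review_reason": "; ".join(reasons) or None,
    }
```

For example, with `comp.AIIL-CON.CONFIG.METHOD_VERSION` already registered, a proposed `comp.AIIL-CON.CONFIG.METHOD_VERSIONS` is not an exact duplicate but would be flagged for review as a near-collision.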

E.4.6. JSON Export

Exports one JSON record per processed column.

Recommended output file:

zarathustra-csi-proposals.json

This file serves as:

audit evidence
training data
QA input
migration/reference source
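
A minimal sketch of what one exported record might look like. The record layout is an assumption assembled from the output fields listed for the earlier pipeline stages; the `evidence_record` and `export_records` names are hypothetical, and the actual export schema may contain more fields.

```python
import json
from typing import List, Optional, Tuple

Scored = Tuple[str, float]  # (value, confidence) from a classification app


def evidence_record(column_reference: str, component_id: str,
                    module: Scored, kind: Scored, name: Scored,
                    needs_review: bool = False,
                    review_reason: Optional[str] = None) -> dict:
    """Build the JSON evidence record for one processed column."""
    module_code, module_confidence = module
    kind_value, kind_confidence = kind
    signal_name, signal_name_confidence = name
    return {
        "column_reference": column_reference,
        "component_id": component_id,
        "module_code": module_code,
        "module_confidence": module_confidence,
        "kind": kind_value,
        "kind_confidence": kind_confidence,
        "signal_name": signal_name,
        "signal_name_confidence": signal_name_confidence,
        # Version is deliberately absent: it is appended later, by hand.
        "pre_version_key": f"{module_code}.{component_id}.{kind_value}.{signal_name}",
        "needs_review": needs_review,
        "review_reason": review_reason,
    }


def export_records(records: List[dict]) -> str:
    """Serialize all records as one JSON array with stable key order."""
    return json.dumps(records, indent=2, sort_keys=True)
```

Sorting keys keeps successive exports diff-friendly, which helps when the file is used as audit evidence.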

E.4.7. Human Review

Only low-confidence or flagged records are reviewed manually.

Typical review triggers include:

ambiguous KIND
weak or missing column descriptions
near-collision results
naming policy exceptions
low total confidence scores

E.4.8. Excel Concatenation

Approved values are pasted into Excel and concatenated into the final CSI:

<MODULE_CODE>.<COMPONENT_ID>.<KIND>.<SIGNAL_NAME>.v<MAJOR>_<MINOR>

Versioning remains manual.

E.5. Classification Governance Rules

E.5.1. MODULE_CODE Governance

MODULE_CODE is component-governed.

It must not be invented independently for each field.

For most tables, all columns should inherit the same module as the owning component.

Example:

component: AIIL-CON
table: compute_method_registry
module: comp

All signals in that table therefore inherit the comp.* namespace unless a documented exception exists.

E.5.2. KIND Governance

KIND is field-governed, but table-aware.

It must not be guessed from the column name alone.

A baseline kind may be established at table level, but field-level overrides are allowed and expected where the semantic role differs.

E.5.3. SIGNAL_NAME Governance

SIGNAL_NAME is field-governed and semantics-first.

It must not be copied blindly from the physical column name unless the physical name already expresses the correct semantic meaning according to policy.

column_reference remains the physical storage reference. signal_name is the curated semantic identifier. The NAME segment in the CSI is derived from signal_name, not from column_reference.

E.6. Confidence Model

E.6.1. MODULE_CODE confidence

Usually high confidence when:

the component is already mapped
the table description is clearly anchored
MDX/frontmatter context is available

Low confidence when:

the component spans multiple modules
descriptions are vague
ownership is unclear

E.6.2. KIND confidence

Usually high confidence when:

datatype and description align
the table has a clear semantic role
the field meaning is obvious

Low confidence when:

the field is generic (value, status, type, data)
classification is ambiguous between CONFIG and SCHEMA
classification is ambiguous between OUTPUT and METRIC
classification is ambiguous between SIGNAL and FEATURE

E.6.3. SIGNAL_NAME confidence

Usually high confidence when:

the field description is specific
the semantic meaning is clear
naming matches approved suffix and token conventions
no collision or near-collision exists

Low confidence when:

the field is generic
the description is weak
multiple expansions are plausible
the field could be interpreted in more than one semantic way

E.7. KIND Classification Policy

E.7.1. Purpose

The KIND segment classifies the semantic role of a field or signal within the ZAYAZ platform.

It is not merely a datatype label and not merely a UI label. It expresses the field’s functional role in context.

Approved KIND values are:

INPUT
OUTPUT
SIGNAL
SCHEMA
CONFIG
FEATURE
METRIC
EVENT
VIEW

Rules:

KIND must match one of the values above.
KIND is field-governed but table-aware.
A table may define a baseline KIND, but field-level overrides are allowed.

Important constraints:

*_SCHEMA_REF → must be SCHEMA
CREATED_AT, UPDATED_AT (in registry tables) → must be CONFIG
Numeric fields are not automatically METRIC

E.7.2. Core Principle

KIND must describe the semantic role of the field in the platform, not just how the field happens to be stored.

This means:

a schema reference is not automatically an INPUT or OUTPUT
a registry timestamp is not automatically an EVENT
a config row is not automatically a METRIC
a physical column name must not determine KIND by itself

E.7.3. Decision Hierarchy

The KIND app should classify in the following order:

A. Table or component context: What kind of object is the table primarily describing?

Examples:

registry/config tables → baseline often CONFIG
runtime payload tables → may contain INPUT, OUTPUT, SIGNAL
analytical views → often VIEW or METRIC
event logs → often EVENT

B. Column semantic meaning: What does the field actually represent?

Examples:

schema references → often SCHEMA
lifecycle metadata → often CONFIG
computed KPI values → often METRIC
derived model variables → often FEATURE

C. Datatype and shape: Use cleaned datatype as a secondary signal, not the primary one.

Examples:

timestamp alone does not imply EVENT
JSON alone does not imply SCHEMA
enum alone does not imply CONFIG

E.7.4. KIND Definitions

INPUT: Use when the field represents an input value or input-facing signal consumed by a process, engine, form, or model.

Typical examples:

activity input value
emissions input quantity
user-entered data field
machine-provided input signal

Do not use for:

references to input schemas
method configuration describing inputs in general

OUTPUT: Use when the field represents a computed or emitted output value from a method, engine, or transformation.

Typical examples:

CO2E
TRUST_SCORE
VALIDATION_RESULT

Do not use for:

references to output schemas
report display metadata

SIGNAL: Use when the field represents an atomic reusable signal that is not best modeled as an explicit input, an explicit output, or a higher-level metric.

Use sparingly. Prefer INPUT, OUTPUT, or METRIC when those are clearly more accurate.

SCHEMA: Use when the field defines, references, or primarily concerns schema structure or data contracts.

Typical examples:

INPUTS_SCHEMA_REF
OPTIONS_SCHEMA_REF
OUTPUT_SCHEMA_REF

CONFIG: Use when the field represents configuration, registry metadata, lifecycle settings, implementation bindings, dependency metadata, or governance-related setup information.

Typical examples:

METHOD_ID
METHOD_NAME
METHOD_VERSION
LIFECYCLE_STATUS
IMPLEMENTATION_REF
MICRO_ENGINE_REF
DATASET_REQUIREMENTS
CREATED_AT
UPDATED_AT

FEATURE: Use when the field represents a derived model feature used for ML/statistical processing rather than a business-facing metric.

METRIC: Use when the field represents an aggregated KPI, score, index, benchmark, or business-facing measurement.

Typical examples:

TRUST_SCORE
ECO_SCORE
MATERIALITY_INDEX
ABATEMENT_COST

EVENT: Use when the field belongs to an event record or explicitly represents an event signal or state-change record.

Do not use for CREATED_AT or UPDATED_AT when those occur in registry/config tables.

VIEW: Use when the field belongs to a read-model, dashboard projection, analytical presentation layer, or reporting-specific view model.

E.7.5. Baseline + Override Model

Do not force a rigid table-wide KIND.

Instead use:

a table-level baseline KIND
per-column overrides where justified

Example: compute_method_registry

Baseline:

CONFIG

Overrides:

INPUTS_SCHEMA_REF → SCHEMA
OPTIONS_SCHEMA_REF → SCHEMA
OUTPUT_SCHEMA_REF → SCHEMA
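The baseline-plus-override model can be expressed as a two-level lookup. The sketch below uses the compute_method_registry example from this section; the dictionary and function names are hypothetical and only illustrate the resolution order (override first, baseline otherwise).

```python
# Table-level baseline KIND (from the compute_method_registry example).
BASELINE_KIND = {
    "compute_method_registry": "CONFIG",
}

# Per-column overrides, keyed by (table, signal_name).
COLUMN_OVERRIDES = {
    ("compute_method_registry", "INPUTS_SCHEMA_REF"): "SCHEMA",
    ("compute_method_registry", "OPTIONS_SCHEMA_REF"): "SCHEMA",
    ("compute_method_registry", "OUTPUT_SCHEMA_REF"): "SCHEMA",
}


def resolve_kind(table: str, signal_name: str) -> str:
    """Return the override for this column if one exists, else the table baseline."""
    return COLUMN_OVERRIDES.get((table, signal_name), BASELINE_KIND[table])


if __name__ == "__main__":
    for name in ("METHOD_ID", "INPUTS_SCHEMA_REF", "CREATED_AT"):
        print(name, resolve_kind("compute_method_registry", name))
```

Keeping overrides in a separate structure makes every departure from the baseline explicit and reviewable.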

E.7.6. Example Classification for compute_method_registry

Field / signal_name	KIND
METHOD_ID	CONFIG
METHOD_NAME	CONFIG
METHOD_VERSION	CONFIG
LIFECYCLE_STATUS	CONFIG
METHOD_TYPE	CONFIG
DESCRIPTION	CONFIG
INPUTS_SCHEMA_REF	SCHEMA
OPTIONS_SCHEMA_REF	SCHEMA
OUTPUT_SCHEMA_REF	SCHEMA
IMPLEMENTATION_REF	CONFIG
MICRO_ENGINE_REF	CONFIG
ASSUMPTIONS_TEXT / ASSUMPTIONS_JSON	CONFIG
FRAMEWORK_REFS	CONFIG
DATASET_REQUIREMENTS	CONFIG
ACL_TAGS	CONFIG
CREATED_AT	CONFIG
UPDATED_AT	CONFIG

E.7.7. Confidence and Review Rules

Auto-accept when:

table purpose is clear
field description clearly matches one KIND
datatype aligns with the interpretation
no near-equal alternative KIND is plausible

Human review required when:

field is generic (value, type, status, data)
ambiguous between CONFIG and SCHEMA
ambiguous between OUTPUT and METRIC
ambiguous between SIGNAL and FEATURE
description is weak or missing
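As an illustration, the review triggers above could be checked as follows. This is a sketch under stated assumptions: the margin that defines a "near-equal" alternative, the minimum description length, the function name, and the shape of the `scores` argument (candidate KIND mapped to a score) are all hypothetical and are not defined by this policy.

```python
from typing import Dict, Optional

GENERIC_NAMES = {"value", "type", "status", "data"}
AMBIGUOUS_PAIRS = [{"CONFIG", "SCHEMA"}, {"OUTPUT", "METRIC"}, {"SIGNAL", "FEATURE"}]
NEAR_EQUAL_MARGIN = 0.05   # hypothetical threshold
MIN_DESCRIPTION_LEN = 15   # hypothetical threshold for a "weak" description


def review_reason(field_name: str, description: str,
                  scores: Dict[str, float]) -> Optional[str]:
    """Return a reason for human review, or None when auto-accept is allowed."""
    if field_name.lower() in GENERIC_NAMES:
        return "generic field name"
    if len(description.strip()) < MIN_DESCRIPTION_LEN:
        return "description is weak or missing"
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return "no candidate KIND"
    if len(ranked) >= 2:
        (top_kind, top), (second_kind, second) = ranked[0], ranked[1]
        if top - second < NEAR_EQUAL_MARGIN:
            if {top_kind, second_kind} in AMBIGUOUS_PAIRS:
                return f"ambiguous between {top_kind} and {second_kind}"
            return "near-equal alternative KIND is plausible"
    return None


if __name__ == "__main__":
    print(review_reason("trust_score", "Computed trust score for the record",
                        {"OUTPUT": 0.80, "METRIC": 0.78}))
```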

E.7.8. Hard Exclusions

The KIND app must never infer:

OUTPUT for *_SCHEMA_REF
INPUT for *_SCHEMA_REF
EVENT for CREATED_AT / UPDATED_AT in registry/config tables
METRIC only because a field is numeric
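These exclusions lend themselves to a guard that runs after any KIND proposal. The sketch below is illustrative only: the function signature, the `in_registry_table` flag, and the `evidence` labels (a set naming what supported the proposal) are hypothetical conventions introduced for this example.

```python
from typing import Optional, Set

REGISTRY_TIMESTAMPS = {"CREATED_AT", "UPDATED_AT"}


def hard_exclusion(signal_name: str, proposed_kind: str,
                   in_registry_table: bool, evidence: Set[str]) -> Optional[str]:
    """Return the violated exclusion, or None if the proposal is allowed."""
    name = signal_name.upper()
    if name.endswith("_SCHEMA_REF") and proposed_kind in {"INPUT", "OUTPUT"}:
        return f"{proposed_kind} is forbidden for *_SCHEMA_REF fields"
    if name in REGISTRY_TIMESTAMPS and in_registry_table and proposed_kind == "EVENT":
        return "EVENT is forbidden for registry timestamps"
    # "Only because numeric": the datatype is the sole supporting evidence.
    if proposed_kind == "METRIC" and evidence == {"numeric_datatype"}:
        return "METRIC cannot rest on a numeric datatype alone"
    return None


if __name__ == "__main__":
    print(hard_exclusion("OUTPUT_SCHEMA_REF", "OUTPUT", True, {"column_name"}))
    print(hard_exclusion("CREATED_AT", "CONFIG", True, {"table_context"}))
```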

E.8. Controlled Classification Dictionaries

The following controlled dictionaries define the allowed values for MODULE_CODE, KIND, and table_prefix.

These tables serve as:

authoritative classification references
validation sources for the Signal Classification Pipeline
future candidates for formal registry tables in ZAR

E.8.1 MODULE_CODE Dictionary (zar.module_registry)

Module	Module Code	Domain	Description
Input Hub	`inpt`	Data Acquisition	Structured ESG input, onboarding, system capability mapping
Computation Hub	`comp`	Analytics	Cross-domain computation & modeling
Reports & Insights Hub	`repo`	Disclosure	Report generation, visualization, stakeholder outputs
SIS	`siss`	Services	Shared governance services
ZARA	`zara`	Governance AI	Prompt-driven ESG orchestration
ZAAM	`zaam`	AI Assistance	Role-aware agent system
RIF	`risk`	Risk	ESG risk intelligence & escalation
NETZERO	`netz`	Climate	Decarbonization modeling & pathways
Verification & Assurance	`vera`	Trust	Verifier workflows & assurance logic
SEEL	`seel`	Materiality	Stakeholder engagement & materiality
EcoWorld Academy	`acad`	Education	Capacity building & ESG fluency

Rules:

MODULE_CODE must match one of the values above.
Module classification is component-governed, not column-level.
A component should map to exactly one module unless explicitly documented.

zar.module_registry

CREATE TABLE zar.module_registry (
    module_code      TEXT PRIMARY KEY,
    module_name      TEXT NOT NULL,
    domain           TEXT NOT NULL,
    description      TEXT NOT NULL,
    sort_order       INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status           TEXT NOT NULL DEFAULT 'active',
    version          TEXT NOT NULL DEFAULT '1_0_0',
    source_doc_id    TEXT,
    approved_by      TEXT,
    notes            TEXT,
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT module_registry_module_code_chk
        CHECK (module_code ~ '^[a-z]{4}$'),

    CONSTRAINT module_registry_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT module_registry_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$')
);

CREATE UNIQUE INDEX module_registry_module_name_uidx
    ON zar.module_registry (module_name);

Seed insert:

INSERT INTO zar.module_registry
(module_code, module_name, domain, description, sort_order, status, version)
VALUES
('inpt', 'Input Hub', 'Data Acquisition', 'Structured ESG input, onboarding, system capability mapping', 10, 'active', '1_0_0'),
('comp', 'Computation Hub', 'Analytics', 'Cross-domain computation & modeling', 20, 'active', '1_0_0'),
('repo', 'Reports & Insights Hub', 'Disclosure', 'Report generation, visualization, stakeholder outputs', 30, 'active', '1_0_0'),
('siss', 'SIS', 'Services', 'Shared governance services', 40, 'active', '1_0_0'),
('zara', 'ZARA', 'Governance AI', 'Prompt-driven ESG orchestration', 50, 'active', '1_0_0'),
('zaam', 'ZAAM', 'AI Assistance', 'Role-aware agent system', 60, 'active', '1_0_0'),
('risk', 'RIF', 'Risk', 'ESG risk intelligence & escalation', 70, 'active', '1_0_0'),
('netz', 'NETZERO', 'Climate', 'Decarbonization modeling & pathways', 80, 'active', '1_0_0'),
('vera', 'Verification & Assurance', 'Trust', 'Verifier workflows & assurance logic', 90, 'active', '1_0_0'),
('seel', 'SEEL', 'Materiality', 'Stakeholder engagement & materiality', 100, 'active', '1_0_0'),
('acad', 'EcoWorld Academy', 'Education', 'Capacity building & ESG fluency', 110, 'active', '1_0_0');

This table also exists as a JSON file that is used as input to the Signal Classification Pipeline: zarathustra-module-registry.json
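When rows are prepared outside the database (for example, before loading), the CHECK constraints of zar.module_registry can be mirrored in application code. The following is a minimal sketch, assuming each row is a plain dictionary keyed by column name; the function name is hypothetical and the layout of the JSON file is not assumed here.

```python
import re
from typing import Dict, List

# Mirrors of the CHECK constraints in the DDL above.
MODULE_CODE_RE = re.compile(r"[a-z]{4}")
VERSION_RE = re.compile(r"[0-9]+_[0-9]+_[0-9]+")
ALLOWED_STATUS = {"active", "deprecated", "draft", "retired"}


def module_row_errors(row: Dict[str, str]) -> List[str]:
    """Return constraint violations for one candidate module_registry row."""
    errors = []
    if not MODULE_CODE_RE.fullmatch(row.get("module_code", "")):
        errors.append("module_code must be exactly four lowercase letters")
    # Missing status/version fall back to the column defaults in the DDL.
    if row.get("status", "active") not in ALLOWED_STATUS:
        errors.append("status is not an allowed value")
    if not VERSION_RE.fullmatch(row.get("version", "1_0_0")):
        errors.append("version must look like 1_0_0")
    return errors


if __name__ == "__main__":
    print(module_row_errors({"module_code": "comp"}))
    print(module_row_errors({"module_code": "COMP", "version": "1.0.0"}))
```

`fullmatch` is used so that the whole value must match, which corresponds to the `^...$` anchors in the SQL patterns.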

E.8.2 KIND Dictionary (zar.kind_registry)

KIND	Description
`INPUT`	Input schema or raw signal
`OUTPUT`	Output schema or derived signal
`SIGNAL`	Atomic reusable signal
`SCHEMA`	JSON Schema or tabular schema reference
`CONFIG`	Configuration, registry metadata, or parameters
`FEATURE`	Derived ML feature
`METRIC`	Aggregated KPI or model output
`EVENT`	System event or state-change record
`VIEW`	Analytical or reporting view

Rules:

KIND must match one of the values above.
KIND is field-governed but table-aware.
A table may define a baseline KIND, but field-level overrides are allowed.

Important constraints:

*_SCHEMA_REF → must be SCHEMA
CREATED_AT, UPDATED_AT (in registry tables) → must be CONFIG
Numeric fields are not automatically METRIC

zar.kind_registry

CREATE TABLE zar.kind_registry (
    csi_kind              TEXT PRIMARY KEY,
    csi_kind_description  TEXT NOT NULL,
    semantic_role         TEXT,
    usage_notes           TEXT,
    sort_order            INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status                TEXT NOT NULL DEFAULT 'active',
    version               TEXT NOT NULL DEFAULT '1_0_0',
    source_doc_id         TEXT,
    approved_by           TEXT,
    notes                 TEXT,
    created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT kind_registry_csi_kind_chk
        CHECK (csi_kind IN (
            'INPUT',
            'OUTPUT',
            'SIGNAL',
            'SCHEMA',
            'CONFIG',
            'FEATURE',
            'METRIC',
            'EVENT',
            'VIEW'
        )),

    CONSTRAINT kind_registry_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT kind_registry_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$')
);

Seed insert:

INSERT INTO zar.kind_registry
(csi_kind, csi_kind_description, semantic_role, usage_notes, sort_order, status, version)
VALUES
('INPUT',   'Input schema or raw signal',                      'Input-facing',     'Use for runtime or user/system-provided input values.', 10, 'active', '1_0_0'),
('OUTPUT',  'Output schema or derived signal',                 'Output-facing',    'Use for computed or emitted result values.', 20, 'active', '1_0_0'),
('SIGNAL',  'Atomic reusable signal',                          'Neutral semantic', 'Use sparingly when neither INPUT, OUTPUT, nor METRIC is the best fit.', 30, 'active', '1_0_0'),
('SCHEMA',  'JSON Schema or tabular schema reference',         'Structural',       'Use for schema-defining or schema-reference fields such as *_SCHEMA_REF.', 40, 'active', '1_0_0'),
('CONFIG',  'Configuration, registry metadata, or parameters', 'Configuration',    'Baseline kind for most registry and method-definition tables.', 50, 'active', '1_0_0'),
('FEATURE', 'Derived ML feature',                              'ML feature',       'Use for engineered features intended for models or scoring.', 60, 'active', '1_0_0'),
('METRIC',  'Aggregated KPI or model output',                  'Business metric',  'Use for KPIs, indexes, scores, and business-facing measurements.', 70, 'active', '1_0_0'),
('EVENT',   'System event or state-change record',             'Event-driven',     'Use for event logs, alerts, state transitions, and emitted event records.', 80, 'active', '1_0_0'),
('VIEW',    'Analytical or reporting view',                    'Presentation',     'Use for read-models, marts, dashboards, and reporting-facing projections.', 90, 'active', '1_0_0');

This table also exists as a JSON file that is used as input to the Signal Classification Pipeline: zarathustra-kind-registry.json

E.8.3 Table Prefix Dictionary (zar.table_prefix_registry)

Prefix	Description
`data_`	Legacy or raw general-purpose data
`dim_`	Dimension tables (countries, units, sectors)
`fact_`	Fact/event tables (emissions, indicators, executions)
`ref_`	Reference data (EFDB, NACE, method registries)
`stg_`	Staging tables (raw Excel/API imports)
`int_`	Intermediate tables (engine merge outputs)
`agg_`	Aggregated data (KPI rollups)
`mrt_`	Data marts (domain-tailored outputs)
`tmp_`	Temporary pipeline tables
`rl_`	Relation tables (many-to-many joins)
`eng_`	Engine outputs (computed results, scored outputs)
`mod_`	Module-owned business objects (user-facing state)
`sig_`	Signal registry tables (signal definitions in SSSR)

Usage in the Signal Classification Pipeline:

table_prefix is used as a strong heuristic signal for:
- KIND baseline classification
- table semantic role inference
- validation consistency checks

Examples:

ref_ → typically CONFIG or SCHEMA-heavy tables
fact_ → often EVENT, SIGNAL, or METRIC
eng_ → often OUTPUT, FEATURE, or METRIC
mrt_ → often VIEW or METRIC
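A prefix-based baseline lookup might look like the sketch below. Only the four example prefixes listed above are included, the function name is hypothetical, and the table name `ref_compute_method_registry` is used purely for illustration. Treating the first hint as the primary one is an assumption of this sketch.

```python
from typing import Dict, List

# KIND hints for the four example prefixes above; the first entry is
# treated as the primary hint (an assumption of this sketch).
PREFIX_KIND_HINTS: Dict[str, List[str]] = {
    "ref_": ["CONFIG", "SCHEMA"],
    "fact_": ["EVENT", "SIGNAL", "METRIC"],
    "eng_": ["OUTPUT", "FEATURE", "METRIC"],
    "mrt_": ["VIEW", "METRIC"],
}


def kind_hints_for_table(table_name: str) -> List[str]:
    """Return KIND hints for the table's prefix, or [] if the prefix is unknown."""
    head, sep, _ = table_name.partition("_")
    if not sep:
        return []  # no underscore, so no prefix to look up
    return list(PREFIX_KIND_HINTS.get(head + "_", []))


if __name__ == "__main__":
    print(kind_hints_for_table("ref_compute_method_registry"))
    print(kind_hints_for_table("compute_method_registry"))
```

An empty result means the prefix gives no baseline, so the classification has to rely on table context and column meaning alone.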

zar.table_prefix_registry

CREATE TABLE zar.table_prefix_registry (
    table_prefix       TEXT PRIMARY KEY,
    table_prefix_desc  TEXT NOT NULL,
    baseline_kind_hint TEXT[],
    usage_notes        TEXT,
    sort_order         INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status             TEXT NOT NULL DEFAULT 'active',
    version            TEXT NOT NULL DEFAULT '1_0_0',
    source_doc_id      TEXT,
    approved_by        TEXT,
    notes              TEXT,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT table_prefix_registry_prefix_chk
        CHECK (table_prefix ~ '^[a-z]+_$'),

    CONSTRAINT table_prefix_registry_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT table_prefix_registry_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$')
);

Seed insert:

INSERT INTO zar.table_prefix_registry
(table_prefix, table_prefix_desc, baseline_kind_hint, usage_notes, sort_order, status, version)
VALUES
('data_', 'Legacy / Raw general', ARRAY['SIGNAL','CONFIG'], 'General-purpose raw or inherited data structures.', 10, 'active', '1_0_0'),
('dim_',  'Dimensions (Countries, Units, Sectors)', ARRAY['CONFIG'], 'Reference-like dimensional structures used for joins and classification.', 20, 'active', '1_0_0'),
('fact_', 'Facts (events) (Emissions, indicators, executions)', ARRAY['EVENT','SIGNAL','METRIC'], 'Fact-style records often contain runtime observations, events, or measured outputs.', 30, 'active', '1_0_0'),
('ref_',  'Reference data (EFDB, NACE, method registries)', ARRAY['CONFIG','SCHEMA'], 'Reference and registry tables, often configuration-heavy with schema references.', 40, 'active', '1_0_0'),
('stg_',  'Staging (Raw Excel / API imports)', ARRAY['INPUT','SIGNAL'], 'Landing-zone data pending normalization or transformation.', 50, 'active', '1_0_0'),
('int_',  'Intermediate (Engine merge outputs)', ARRAY['SIGNAL','OUTPUT'], 'Intermediate computation structures between raw and final outputs.', 60, 'active', '1_0_0'),
('agg_',  'Aggregates (KPI rollups)', ARRAY['METRIC'], 'Aggregated KPI or rollup outputs.', 70, 'active', '1_0_0'),
('mrt_',  'Data marts (Domain-tailored outputs)', ARRAY['VIEW','METRIC'], 'Domain-facing analytical outputs and reporting structures.', 80, 'active', '1_0_0'),
('tmp_',  'Temporary (Pipeline intermediates)', ARRAY['SIGNAL','CONFIG'], 'Ephemeral pipeline support structures.', 90, 'active', '1_0_0'),
('rl_',   'Pure join tables / Relations (Many-to-many links)', ARRAY['CONFIG'], 'Relationship and join support tables.', 100, 'active', '1_0_0'),
('eng_',  'Outputs produced by computation engines (algorithmic results, scored outputs)', ARRAY['OUTPUT','FEATURE','METRIC'], 'Engine-produced computed outputs.', 110, 'active', '1_0_0'),
('mod_',  'Module-owned output tables (business objects, user-facing module state)', ARRAY['CONFIG','VIEW'], 'Business-object or user-facing module state tables.', 120, 'active', '1_0_0'),
('sig_',  'Signals registry (Signal definitions)', ARRAY['SIGNAL','CONFIG'], 'Signal-definition and metadata registry structures.', 130, 'active', '1_0_0');

This table also exists as a JSON file that is used as input to the Signal Classification Pipeline: zarathustra-table-prefix-registry.json

E.8.5 zar.component_module_map

This table stabilizes the entire Signal Classification Pipeline.

With it, the MODULE_CODE app becomes a deterministic lookup.

CREATE TABLE zar.component_module_map (
    component_id   TEXT PRIMARY KEY,
    module_code    TEXT NOT NULL,
    confidence     NUMERIC(3,2) DEFAULT 1.00,
    source         TEXT DEFAULT 'manual',
    notes          TEXT,
    sort_order     INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    status         TEXT NOT NULL DEFAULT 'active',
    version        TEXT NOT NULL DEFAULT '1_0_0',
    created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT fk_component_module
        FOREIGN KEY (module_code)
        REFERENCES zar.module_registry (module_code),

    CONSTRAINT component_module_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft'))
);

This table becomes the authoritative bridge between documentation (components) and runtime classification (modules).

Used by:

Signal Classification Pipeline MODULE_CODE
Validator
ZARA explainability
Auditors

Example seed entries:

INSERT INTO zar.component_module_map
(component_id, module_code, confidence, source, sort_order)
VALUES
('AIIL-CON', 'comp', 0.99, 'manual', 10),
('ZAR-FW',   'siss', 0.95, 'manual', 20),
('TG-CORE',  'vera', 0.99, 'manual', 30);
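To show what "deterministic lookup" means in practice, the sketch below rebuilds a reduced version of the two tables in an in-memory SQLite database. This is an illustration only: the platform DDL targets PostgreSQL, the `zar.` schema prefix and most columns are dropped, and the helper names are hypothetical. Only three module rows and the three seed mappings above are loaded.

```python
import sqlite3
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """Create simplified copies of module_registry and component_module_map."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE module_registry (
            module_code TEXT PRIMARY KEY,
            module_name TEXT NOT NULL
        );
        CREATE TABLE component_module_map (
            component_id TEXT PRIMARY KEY,
            module_code  TEXT NOT NULL REFERENCES module_registry (module_code)
        );
        INSERT INTO module_registry VALUES
            ('comp', 'Computation Hub'),
            ('siss', 'SIS'),
            ('vera', 'Verification & Assurance');
        INSERT INTO component_module_map VALUES
            ('AIIL-CON', 'comp'),
            ('ZAR-FW',   'siss'),
            ('TG-CORE',  'vera');
    """)
    return conn


def lookup_module_code(conn: sqlite3.Connection, component_id: str) -> Optional[str]:
    """Deterministic lookup: one component_id maps to at most one module_code."""
    row = conn.execute(
        "SELECT module_code FROM component_module_map WHERE component_id = ?",
        (component_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = build_demo_db()
    print(lookup_module_code(db, "ZAR-FW"))
    print(lookup_module_code(db, "NOT-MAPPED"))
```

Because component_id is the primary key, a component cannot map to two modules, and the foreign key rejects mappings to module codes that are not in the registry.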

First join test:

SELECT 
    c.component_id,
    c.module_code,
    m.module_name
FROM zar.component_module_map c
JOIN zar.module_registry m
  ON c.module_code = m.module_code;

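Expected output, assuming only the three seed entries above are loaded (row order is not guaranteed without an ORDER BY):

component_id | module_code | module_name
AIIL-CON | comp | Computation Hub
ZAR-FW | siss | SIS
TG-CORE | vera | Verification & Assurance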

zar.documented_component_registry

The zar.documented_component_registry gives us:

canonical component_id
title
source MDX path
owner
status
stronger linkage for component_module_map
future ZARA explainability lookup

CREATE TABLE zar.documented_component_registry (
    component_id        TEXT PRIMARY KEY,
    component_title     TEXT NOT NULL,
    module_code         TEXT NOT NULL,
    source_doc_id       TEXT,
    source_file         TEXT,
    slug                TEXT,
    owner_team          TEXT,
    status              TEXT NOT NULL DEFAULT 'active',
    version             TEXT NOT NULL DEFAULT '1_0_0',
    sort_order          INTEGER NOT NULL DEFAULT 100 CHECK (sort_order >= 0),
    notes               TEXT,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT documented_component_status_chk
        CHECK (status IN ('active', 'deprecated', 'draft', 'retired')),

    CONSTRAINT documented_component_version_chk
        CHECK (version ~ '^[0-9]+_[0-9]+_[0-9]+$'),

    CONSTRAINT fk_documented_component_module
        FOREIGN KEY (module_code)
        REFERENCES zar.module_registry (module_code)
);

Architecture

ZAR Governance Layer (v1)

Layer	Table	Purpose
Module taxonomy	module_registry	System domains
Signal semantics	kind_registry	Field roles
Data structure	table_prefix_registry	Table meaning
Component mapping	component_module_map	System wiring

These tables power the Signal Classification Pipeline, CSI generation, and ZARA.

Example validation query

SELECT *
FROM zar.module_registry
WHERE module_code = 'comp';

E.8.4 Design Note

These dictionaries should be treated as:

controlled vocabularies
validation constraints in the Signal Classification Pipeline
future candidates for formal ZAR registry tables

Over time, they should be promoted into:

module_registry
kind_registry
table_prefix_registry

within ZAR for full governance and traceability.

E.9. JSON Evidence Record

Each processed column must produce a JSON evidence record.

Recommended structure:

zarathustra-csi-proposals.json
{
  "component_id": "AIIL-CON",
  "table_name": "compute_method_registry",
  "column_reference": "version",
  "column_description": "Semantic version of the method implementation and schema (e.g., 1.0.0). Enables side-by-side versions.",
  "data_type": "text",
  "module_code": "comp",
  "kind": "CONFIG",
  "signal_name": "METHOD_VERSION",
  "confidence_scores": {
    "module_confidence": 0.91,
    "kind_confidence": 0.97,
    "signal_name_confidence": 0.92,
    "total_score": 0.94
  },
  "review_reason": null,
  "existing_similar_signals": [],
  "datatype_normalization_notes": "No datatype normalization required. Source type 'text' retained.",
  "naming_basis": [
    "column_description indicates semantic version of method",
    "generic VERSION expanded to domain-specific METHOD_VERSION",
    "matches existing suffix conventions"
  ],
  "needs_review": false,
  "pre_version_key": "comp.AIIL-CON.CONFIG.METHOD_VERSION",
  "suggested_csi_pattern": "comp.AIIL-CON.CONFIG.METHOD_VERSION.v<MAJOR>_<MINOR>",
  "collision_check_result": "no_conflict",
  "near_collision_result": [],
  "generated_at": "2026-03-24T12:00:00Z",
  "generator_version": "zarathustra-naming-0.1.0"
}
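A consumer of these records might run a basic consistency check before accepting them. The sketch below is not the pipeline's validator: the set of required keys is a hypothetical subset, and the composition rule for `pre_version_key` (module_code, component_id, kind, and signal_name joined by dots) is inferred from the example record above rather than taken from a formal definition.

```python
from typing import Any, Dict, List

# Hypothetical minimum set of keys; the full record carries more fields.
REQUIRED_KEYS = {
    "component_id", "table_name", "column_reference", "module_code",
    "kind", "signal_name", "confidence_scores", "needs_review",
    "pre_version_key",
}


def expected_pre_version_key(record: Dict[str, Any]) -> str:
    """Compose the key as it appears in the example record."""
    return ".".join([record["module_code"], record["component_id"],
                     record["kind"], record["signal_name"]])


def evidence_record_errors(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record is consistent."""
    missing = sorted(REQUIRED_KEYS - record.keys())
    if missing:
        return [f"missing key: {key}" for key in missing]
    errors = []
    if record["pre_version_key"] != expected_pre_version_key(record):
        errors.append("pre_version_key does not match its component fields")
    for name, score in record["confidence_scores"].items():
        if not 0.0 <= score <= 1.0:
            errors.append(f"{name} is outside [0, 1]")
    return errors


if __name__ == "__main__":
    example = {
        "component_id": "AIIL-CON",
        "table_name": "compute_method_registry",
        "column_reference": "version",
        "module_code": "comp",
        "kind": "CONFIG",
        "signal_name": "METHOD_VERSION",
        "confidence_scores": {"kind_confidence": 0.97},
        "needs_review": False,
        "pre_version_key": "comp.AIIL-CON.CONFIG.METHOD_VERSION",
    }
    print(evidence_record_errors(example))
```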

E.10. Summary

The Signal Naming Governance Policy ensures that ZAYAZ generates MODULE_CODE, KIND, and signal_name in a disciplined, explainable, and reviewable manner before final CSI concatenation.

It exists to ensure:

semantic consistency across the SSSR
documentation-linked traceability
reduced naming drift
collision prevention
auditable AI-assisted classification

APPENDIX F - Query Results - Tests

F.1. Inspect the latest view to confirm the new run outputs landed correctly

SELECT
    row_id,
    source_signal_id,
    column_reference,
    cleaned_data_type,
    module_code,
    kind,
    signal_name,
    pre_version_key,
    is_valid
FROM zar.v_codex_signal_registry_latest
ORDER BY row_id;

F.2. Follow up query

SELECT
    row_id,
    column_reference,
    kind,
    kind_confidence,
    kind_rationale,
    kind_needs_review,
    kind_review_reason
FROM zar.v_codex_signal_registry_latest
ORDER BY row_id;

 row_id | column_reference | kind | kind_confidence | kind_rationale | kind_needs_review | kind_review_reason
--------+------------------+------+-----------------+----------------+-------------------+--------------------
 1 | method_id | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 2 | method_name | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 3 | version | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 4 | status | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 5 | method_type | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 6 | description | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 7 | inputs_schema_json | SCHEMA | 0.980 | Schema-reference field override. | f |
 8 | options_schema_json | SCHEMA | 0.980 | Schema-reference field override. | f |
 9 | output_schema_json | SCHEMA | 0.980 | Schema-reference field override. | f |
 10 | implementation_ref | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 11 | micro_engine_ref | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 12 | assumptions_json | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 13 | framework_refs | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 14 | dataset_requirements | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 15 | acl_tags | CONFIG | 0.960 | Baseline classification from zar.table_prefix_registry primary_kind_hint=CONFIG for table_prefix=ref_. Secondary hints: ['SCHEMA']. trust_score=0.960. | f |
 16 | created_at | CONFIG | 0.960 | Registry timestamp field classified as CONFIG. | f |
 17 | updated_at | CONFIG | 0.960 | Registry timestamp field classified as CONFIG. | f |
