Skip to main content
Jira progress: loading…

ZHTRA-SIGNAL

ZAR Signal Registry Population (CODEX → CSI → USO → TrustGate)

1. Purpose

The ZAR Signal Registry Population Pipeline is the link between ZARATHUSTRA and CSI/USO/Trustgate and defines how raw source data is transformed into:

  • Canonical Signals (SSSR)
  • Canonical Signal Identifiers (CSI)
  • Universal Signal Ontology Types (USO Types)
  • Runtime Lineage Records (USO Instances)
  • TrustGate Telemetry Events

This pipeline is the foundation of auditability within ZAYAZ.

It ensures that every signal:

  • has a validated structure
  • is semantically classified
  • is traceable end-to-end
  • can be audited against ESRS / IPCC / PEF

2. Pipeline Overview


3. CODEX Working Pipeline

The pipeline begins in:

zar.codex_signal_registry_working

Each row represents a candidate signal derived from:

  • database schema
  • Excel ingestion
  • API metadata
  • documentation (MDX)

4. Latest Results View (Core Review Interface)

To unify all pipeline stages, the system exposes:

zar.v_codex_signal_registry_latest

This view provides one row per source signal, enriched with:

  • datatype normalization
  • module classification
  • kind classification
  • signal naming
  • validation results

Key Design Principle

Each stage resolves using:

LEFT JOIN LATERAL (...) ORDER BY created_at DESC, result_id DESC LIMIT 1

This ensures:

  • deterministic “latest result”
  • stable review surface
  • no ambiguity in pipeline state

5. Core SQL View

v-codex-signal-registry-latest.sqlGitHub ↗
CREATE OR REPLACE VIEW zar.v_codex_signal_registry_latest AS
SELECT
w.row_id,

-- Source / canonical working fields
w.source_table,
w.table_prefix,
w.table_notes,
w.short_description,
w.component_name,
w.component_description,
w.column_reference,
w.column_description,
w.source_data_type,
w.sample_values,
w.source_doc_id,
w.source_file,
w.slug,
w.status AS working_status,
w.version AS working_version,
w.sort_order,
w.created_at AS working_created_at,
w.updated_at AS working_updated_at,

-- Datatype cleanup (latest)
d.result_id AS datatype_result_id,
d.run_id AS datatype_run_id,
d.cleaned_data_type,
d.datatype_normalization_notes,
d.datatype_confidence,
d.needs_review AS datatype_needs_review,
d.review_reason AS datatype_review_reason,
d.created_at AS datatype_created_at,

-- Module classification (latest)
m.result_id AS module_result_id,
m.run_id AS module_run_id,
m.module_code,
m.module_confidence,
m.module_rationale,
m.needs_review AS module_needs_review,
m.review_reason AS module_review_reason,
m.created_at AS module_created_at,

-- Kind classification (latest)
k.result_id AS kind_result_id,
k.run_id AS kind_run_id,
k.kind,
k.kind_confidence,
k.kind_rationale,
k.needs_review AS kind_needs_review,
k.review_reason AS kind_review_reason,
k.created_at AS kind_created_at,

-- Signal name generation (latest)
s.result_id AS signal_name_result_id,
s.run_id AS signal_name_run_id,
s.signal_name,
s.signal_name_confidence,
s.naming_basis,
s.needs_review AS signal_name_needs_review,
s.review_reason AS signal_name_review_reason,
s.created_at AS signal_name_created_at,

-- Validator results (latest)
v.result_id AS validator_result_id,
v.run_id AS validator_run_id,
v.pre_version_key,
v.is_valid,
v.collision_check_result,
v.near_collision_result,
v.suggested_csi_pattern,
v.needs_review AS validator_needs_review,
v.review_reason AS validator_review_reason,
v.created_at AS validator_created_at

FROM zar.codex_signal_registry_working w

LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_datatype_results d
WHERE d.row_id = w.row_id
AND d.status = 'active'
ORDER BY d.created_at DESC, d.result_id DESC
LIMIT 1
) d ON TRUE

LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_module_results m
WHERE m.row_id = w.row_id
AND m.status = 'active'
ORDER BY m.created_at DESC, m.result_id DESC
LIMIT 1
) m ON TRUE

LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_kind_results k
WHERE k.row_id = w.row_id
AND k.status = 'active'
ORDER BY k.created_at DESC, k.result_id DESC
LIMIT 1
) k ON TRUE

LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_signal_name_results s
WHERE s.row_id = w.row_id
AND s.status = 'active'
ORDER BY s.created_at DESC, s.result_id DESC
LIMIT 1
) s ON TRUE

LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_validator_results v
WHERE v.row_id = w.row_id
AND v.status = 'active'
ORDER BY v.created_at DESC, v.result_id DESC
LIMIT 1
) v ON TRUE;

Quick test

After creating it, run:

SELECT *
FROM zar.v_codex_signal_registry_latest
ORDER BY row_id
LIMIT 20;

Very useful filtered tests

Show rows still missing stage outputs

SELECT row_id, component_name, column_reference
FROM zar.v_codex_signal_registry_latest
WHERE cleaned_data_type IS NULL
OR module_code IS NULL
OR kind IS NULL
OR signal_name IS NULL
ORDER BY row_id;

Show rows needing review

SELECT
row_id,
component_name,
column_reference,
datatype_needs_review,
module_needs_review,
kind_needs_review,
signal_name_needs_review,
validator_needs_review
FROM zar.v_codex_signal_registry_latest
WHERE datatype_needs_review = TRUE
OR module_needs_review = TRUE
OR kind_needs_review = TRUE
OR signal_name_needs_review = TRUE
OR validator_needs_review = TRUE
ORDER BY row_id;

Show rows ready for Excel concatenation

SELECT
row_id,
module_code,
component_name,
kind,
signal_name,
pre_version_key,
suggested_csi_pattern
FROM zar.v_codex_signal_registry_latest
WHERE module_code IS NOT NULL
AND kind IS NOT NULL
AND signal_name IS NOT NULL
AND (is_valid IS TRUE OR is_valid IS NULL)
ORDER BY row_id;

6. Operational Queries

6.1. Missing Pipeline Outputs

missing-pipeline-outputs.sqlGitHub ↗
SELECT row_id, component_name, column_reference
FROM zar.v_codex_signal_registry_latest
WHERE cleaned_data_type IS NULL
OR module_code IS NULL
OR kind IS NULL
OR signal_name IS NULL;

6.2. Review Queue

review-queue.sqlGitHub ↗
SELECT *
FROM zar.v_codex_signal_registry_latest
WHERE datatype_needs_review = TRUE
OR module_needs_review = TRUE
OR kind_needs_review = TRUE
OR signal_name_needs_review = TRUE
OR validator_needs_review = TRUE;

6.3. Ready for CSI Generation

ready-for-csi-generation.sqlGitHub ↗
SELECT *
FROM zar.v_codex_signal_registry_latest
WHERE module_code IS NOT NULL
AND kind IS NOT NULL
AND signal_name IS NOT NULL;

7. Transition to Signal Publish

Once a signal passes validation:

POST /admin/signal-registry/proposals/publish

This triggers:

7.1. Signal Registry Insert

zar.signal_registry

7.2. CSI Binding

zar.signal_csi_binding

7.3. USO Type Resolution

zar.uso_type_registry

7.4. USO Instance Creation (Signal Birth)

zar.uso_instance


8. USO Integration (New Architecture)

8.1. At Publish Time

The system:

  1. Resolves or creates a USO Type
  2. Creates a USO Instance (ULID)
  3. Links:
    • CMI (origin)
    • CSI (signal type)
    • USO Type (semantic class)

8.2. Result

Each signal becomes:

CMI → CSI → USO Type → USO Instance

This is the minimum unit of auditability.


9. TrustGate Telemetry Integration

After signal creation, runtime events are captured via:

POST /admin/trustgate/telemetry/ingest

Example BASH

curl -X POST \
-H "Authorization: Bearer <ADMIN_TOKEN>" \
-H "Content-Type: application/json" \
https://zayaz-search-api.fly.dev/admin/trustgate/telemetry/ingest \
-d '{
"signal_id": "sssr:compute_method_registry.created_at",
"trustgate_stage": "validation",
"trustgate_decision": "pass",
"trustgate_status": "observed",
"trust_score": 0.9825,
"policy_id": "TG-POLICY-SIGNAL-VALIDATION-v1",
"policy_result": "pass",
"validator_id": "trustgate.signal.required_fields.v1",
"validator_result": "pass",
"updated_by": "cto@viroway.com"
}'

Stored In

zar.trustgate_telemetry

Purpose

  • track signal validation lifecycle
  • compute trust scores
  • enable audit replay
  • support AI reasoning

10. Additional View

Create a review queue view:

zar.v_codex_signal_registry_review_queue

Should include:

  • low-confidence signals
  • validation failures
  • collisions
  • missing pipeline outputs

This becomes the primary QA interface.


11. Data Ingestion (Bootstrap)

Example BASH:

\COPY zar.codex_signal_registry_working(
source_signal_id,
source_table,
table_prefix,
table_notes,
short_description,
component_name,
component_description,
column_reference,
column_description,
source_data_type,
sample_values
)
FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ';');

Diagrams

CODEX Signal Registry Population Flow


Signal Publish and Identity Binding Flow


Full Signal Identity Chain


USO Type Resolution Flow


CODEX Signal Registry Population Flow


USO Instance and Origin Chain Flow


TrustGate Telemetry Ingestion Flow


Ask ZARA CMI Proposal Flow


Complete ZAYAZ Governance Chain




GitHub RepoRequest for Change (RFC)