ZHTRA-SIGNAL
ZAR Signal Registry Population (CODEX → CSI → USO → TrustGate)
1. Purpose
The ZAR Signal Registry Population Pipeline is the link between ZARATHUSTRA and CSI/USO/Trustgate and defines how raw source data is transformed into:
- Canonical Signals (SSSR)
- Canonical Signal Identifiers (CSI)
- Universal Signal Ontology Types (USO Types)
- Runtime Lineage Records (USO Instances)
- TrustGate Telemetry Events
This pipeline is the foundation of auditability within ZAYAZ.
It ensures that every signal:
- has a validated structure
- is semantically classified
- is traceable end-to-end
- can be audited against ESRS / IPCC / PEF
2. Pipeline Overview
3. CODEX Working Pipeline
The pipeline begins in:
zar.codex_signal_registry_working
Each row represents a candidate signal derived from:
- database schema
- Excel ingestion
- API metadata
- documentation (MDX)
4. Latest Results View (Core Review Interface)
To unify all pipeline stages, the system exposes:
zar.v_codex_signal_registry_latest
This view provides one row per source signal, enriched with:
- datatype normalization
- module classification
- kind classification
- signal naming
- validation results
Key Design Principle
Each stage resolves using:
LEFT JOIN LATERAL (...) ORDER BY created_at DESC, result_id DESC LIMIT 1
This ensures:
- deterministic “latest result”
- stable review surface
- no ambiguity in pipeline state
5. Core SQL View
CREATE OR REPLACE VIEW zar.v_codex_signal_registry_latest AS
SELECT
w.row_id,
-- Source / canonical working fields
w.source_table,
w.table_prefix,
w.table_notes,
w.short_description,
w.component_name,
w.component_description,
w.column_reference,
w.column_description,
w.source_data_type,
w.sample_values,
w.source_doc_id,
w.source_file,
w.slug,
w.status AS working_status,
w.version AS working_version,
w.sort_order,
w.created_at AS working_created_at,
w.updated_at AS working_updated_at,
-- Datatype cleanup (latest)
d.result_id AS datatype_result_id,
d.run_id AS datatype_run_id,
d.cleaned_data_type,
d.datatype_normalization_notes,
d.datatype_confidence,
d.needs_review AS datatype_needs_review,
d.review_reason AS datatype_review_reason,
d.created_at AS datatype_created_at,
-- Module classification (latest)
m.result_id AS module_result_id,
m.run_id AS module_run_id,
m.module_code,
m.module_confidence,
m.module_rationale,
m.needs_review AS module_needs_review,
m.review_reason AS module_review_reason,
m.created_at AS module_created_at,
-- Kind classification (latest)
k.result_id AS kind_result_id,
k.run_id AS kind_run_id,
k.kind,
k.kind_confidence,
k.kind_rationale,
k.needs_review AS kind_needs_review,
k.review_reason AS kind_review_reason,
k.created_at AS kind_created_at,
-- Signal name generation (latest)
s.result_id AS signal_name_result_id,
s.run_id AS signal_name_run_id,
s.signal_name,
s.signal_name_confidence,
s.naming_basis,
s.needs_review AS signal_name_needs_review,
s.review_reason AS signal_name_review_reason,
s.created_at AS signal_name_created_at,
-- Validator results (latest)
v.result_id AS validator_result_id,
v.run_id AS validator_run_id,
v.pre_version_key,
v.is_valid,
v.collision_check_result,
v.near_collision_result,
v.suggested_csi_pattern,
v.needs_review AS validator_needs_review,
v.review_reason AS validator_review_reason,
v.created_at AS validator_created_at
FROM zar.codex_signal_registry_working w
LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_datatype_results d
WHERE d.row_id = w.row_id
AND d.status = 'active'
ORDER BY d.created_at DESC, d.result_id DESC
LIMIT 1
) d ON TRUE
LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_module_results m
WHERE m.row_id = w.row_id
AND m.status = 'active'
ORDER BY m.created_at DESC, m.result_id DESC
LIMIT 1
) m ON TRUE
LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_kind_results k
WHERE k.row_id = w.row_id
AND k.status = 'active'
ORDER BY k.created_at DESC, k.result_id DESC
LIMIT 1
) k ON TRUE
LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_signal_name_results s
WHERE s.row_id = w.row_id
AND s.status = 'active'
ORDER BY s.created_at DESC, s.result_id DESC
LIMIT 1
) s ON TRUE
LEFT JOIN LATERAL (
SELECT *
FROM zar.codex_validator_results v
WHERE v.row_id = w.row_id
AND v.status = 'active'
ORDER BY v.created_at DESC, v.result_id DESC
LIMIT 1
) v ON TRUE;
Quick test
After creating it, run:
SELECT *
FROM zar.v_codex_signal_registry_latest
ORDER BY row_id
LIMIT 20;
Very useful filtered tests
Show rows still missing stage outputs
SELECT row_id, component_name, column_reference
FROM zar.v_codex_signal_registry_latest
WHERE cleaned_data_type IS NULL
OR module_code IS NULL
OR kind IS NULL
OR signal_name IS NULL
ORDER BY row_id;
Show rows needing review
SELECT
row_id,
component_name,
column_reference,
datatype_needs_review,
module_needs_review,
kind_needs_review,
signal_name_needs_review,
validator_needs_review
FROM zar.v_codex_signal_registry_latest
WHERE datatype_needs_review = TRUE
OR module_needs_review = TRUE
OR kind_needs_review = TRUE
OR signal_name_needs_review = TRUE
OR validator_needs_review = TRUE
ORDER BY row_id;
Show rows ready for Excel concatenation
SELECT
row_id,
module_code,
component_name,
kind,
signal_name,
pre_version_key,
suggested_csi_pattern
FROM zar.v_codex_signal_registry_latest
WHERE module_code IS NOT NULL
AND kind IS NOT NULL
AND signal_name IS NOT NULL
AND (is_valid IS TRUE OR is_valid IS NULL)
ORDER BY row_id;
6. Operational Queries
6.1. Missing Pipeline Outputs
SELECT row_id, component_name, column_reference
FROM zar.v_codex_signal_registry_latest
WHERE cleaned_data_type IS NULL
OR module_code IS NULL
OR kind IS NULL
OR signal_name IS NULL;
6.2. Review Queue
SELECT *
FROM zar.v_codex_signal_registry_latest
WHERE datatype_needs_review = TRUE
OR module_needs_review = TRUE
OR kind_needs_review = TRUE
OR signal_name_needs_review = TRUE
OR validator_needs_review = TRUE;
6.3. Ready for CSI Generation
SELECT *
FROM zar.v_codex_signal_registry_latest
WHERE module_code IS NOT NULL
AND kind IS NOT NULL
AND signal_name IS NOT NULL;
7. Transition to Signal Publish
Once a signal passes validation:
POST /admin/signal-registry/proposals/publish
This triggers:
7.1. Signal Registry Insert
→ zar.signal_registry
7.2. CSI Binding
→ zar.signal_csi_binding
7.3. USO Type Resolution
→ zar.uso_type_registry
7.4. USO Instance Creation (Signal Birth)
→ zar.uso_instance
8. USO Integration (New Architecture)
8.1. At Publish Time
The system:
- Resolves or creates a USO Type
- Creates a USO Instance (ULID)
- Links:
- CMI (origin)
- CSI (signal type)
- USO Type (semantic class)
8.2. Result
Each signal becomes:
CMI → CSI → USO Type → USO Instance
This is the minimum unit of auditability.
9. TrustGate Telemetry Integration
After signal creation, runtime events are captured via:
POST /admin/trustgate/telemetry/ingest
Example BASH
curl -X POST \
-H "Authorization: Bearer <ADMIN_TOKEN>" \
-H "Content-Type: application/json" \
https://zayaz-search-api.fly.dev/admin/trustgate/telemetry/ingest \
-d '{
"signal_id": "sssr:compute_method_registry.created_at",
"trustgate_stage": "validation",
"trustgate_decision": "pass",
"trustgate_status": "observed",
"trust_score": 0.9825,
"policy_id": "TG-POLICY-SIGNAL-VALIDATION-v1",
"policy_result": "pass",
"validator_id": "trustgate.signal.required_fields.v1",
"validator_result": "pass",
"updated_by": "cto@viroway.com"
}'
Stored In
zar.trustgate_telemetry
Purpose
- track signal validation lifecycle
- compute trust scores
- enable audit replay
- support AI reasoning
10. Additional View
Create a review queue view:
zar.v_codex_signal_registry_review_queue
Should include:
- low-confidence signals
- validation failures
- collisions
- missing pipeline outputs
This becomes the primary QA interface.
11. Data Ingestion (Bootstrap)
Example BASH:
\COPY zar.codex_signal_registry_working(
source_signal_id,
source_table,
table_prefix,
table_notes,
short_description,
component_name,
component_description,
column_reference,
column_description,
source_data_type,
sample_values
)
FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ';');