
SSSR

Smart Searchable Signal Registry

1. Introduction

ZAYAZ supports thousands of ESG metrics across diverse frameworks like CSRD, ESRS, GRI, and TCFD. To manage and operationalize this ecosystem effectively, the platform requires a structured, intelligent system that understands not just the metrics—but the underlying data architecture that supports them.

The Smart Searchable Signal Registry (SSSR) is that system. It is a structured registry of all available table columns, powering the entire data flow of ZAYAZ. Acting as the semantic backbone of the platform, SSSR enables dynamic form generation, automated extrapolation, advanced analytics, AI-driven reasoning, and traceable reporting logic. It ensures that every column-level signal can be discovered, routed, processed, and validated consistently across the platform.

What Is a Signal? A signal is a column-level data definition—not a metric itself, but a data source that can feed one or more metrics, calculations, extrapolations, visualizations, or decision models.

The Signal Registry documents every available column across normalized tables, APIs, and JSON structures. It defines the context, type, and operational behavior of that field—allowing ZAYAZ engines to treat data not as isolated values, but as intelligent, traceable building blocks.

Each signal includes metadata that tells the system:

  • How to reference it
  • How to validate it
  • Whether it can be extrapolated or inferred
  • Which engines or workflows depend on it
  • How it connects to other signals or data structures

Why a Signal Registry? The registry is designed to serve developers, integrators, validators, and automation engines. It answers critical questions such as:

  • Where does this field live?
  • What kind of data is it?
  • How should it be processed or validated?
  • What systems depend on this column?
  • Can it be extrapolated, inferred, or manually entered?

By centralizing these definitions, SSSR eliminates redundancy, reduces hardcoding, and supports data-driven orchestration across the ZAYAZ platform.

1.1. Strategic Benefits

Smart Search & Discovery: Search across thousands of table columns using keyword matching, semantic tagging, AI embeddings, and hierarchical filters (e.g., by scope, sector, or signal type).

Automated Routing & Processing: Signals automatically route to the correct processing engines (e.g., SEM, FOGE, DAIM, RIF) based on their metadata—eliminating hardcoded logic paths.

Scalable Form & Workflow Generation: Forms, dashboards, validation flows, and extrapolation triggers are all dynamically generated based on registry mappings.

Framework Interoperability: A single signal may be referenced by multiple metrics and frameworks, and interpreted differently based on sector, geography, or reporting level.

Auditability & Traceability: Every signal has a full metadata profile—capturing input origin, update logic, processing history, fallback rules, and AI usage—ensuring transparency and regulatory compliance.

Core Use Cases

  • Developers use the registry to generate and bind dynamic form fields via FOGE.
  • Compliance teams use it to align inputs with ESRS, CSRD, and GRI logic.
  • AI/ML engines use it to locate valid inputs for extrapolation, classification, or prediction.
  • Extrapolation modules (e.g., SEM) query it for fallback hierarchies and sufficiency logic.
  • Auditors use it to trace how values were calculated, inferred, or adjusted.

1.2. Position in the ZAYAZ Architecture

The Signal Registry integrates with and supports all major computation and reporting modules:

  • FOGE – Form Generator Engine
  • SEM – Smart Extrapolation Module
  • RIF – Risk Intelligence Framework
  • DAIM – Dynamic Actionable Insights Module
  • Validator & DaVE – Data Validation & Trust Engines

It functions as the semantic control plane of ZAYAZ—turning raw column definitions into dynamic, intelligent system behaviors.

The Signal Registry Table (signal_registry). Simplified; see the full signal_registry table for all columns:

Column – Description

  • signal_id – Unique internal ID (UUID or slug): sssr:source_table.column_reference
  • signal_name – Canonical name (e.g., hazard_level)
  • signal_type – Data type and shape (e.g., Integer (1–4), Decimal, JSON, Boolean)
  • signal_description – Human-readable explanation of what this column represents
  • source_table – Fully qualified table name or object path (e.g., chemical_registry, product.hazards[]). Dropdown: Hub > Subsystem > Table (e.g., ECO-Number > SupplierPlatform > suppliers_eco)
  • column_reference – Name of the exact column/field (e.g., hazard_level)
  • access_path – Fully resolved field path for JSON / nested objects
  • schema_link – Direct link to DB schema or API spec
  • source_format – SQL, JSON, API, CSV, Derived, etc.
  • source_schema – Optional: schema or namespace if a multi-schema setup exists
  • signal_domain – One of the 6 USO domains; used to enable Level 4 routing. Values: ["ENV", "SOC", "GOV", "EMO", "CORE", "REF"]
  • availability_status – active, deprecated, pending, future, etc.
  • fallback_logic – If this column is unavailable, what should be used instead (e.g., use hazard_code_category)
  • data_hierarchy – Optional: whether this field is region-, sector-, or time-hierarchical
  • linked_metrics – List of metrics that reference this column (for reverse lookup)
  • signal_handler – Which source handles it (Dropdown: SIS > Engine > Micro Engine (e.g., SEM > Hierarchy Resolver))
  • ai_suggestion_type – If AI can operate here: extrapolation, tagging, inference
  • validation_required – Boolean: should this signal be included in data validation rules
  • dependencies – If this field is derived: list of signal_ids used to compute it
  • created_at – Timestamp of first registration
  • updated_at – Timestamp of last modification
  • update_frequency – Expected update frequency
  • notes – Freeform field for documentation / technical remarks
  • sample_values – Array of values to illustrate what's stored
  • docs_link – Link to internal documentation or wiki
  • xbrl_binding – If tied to structured reporting outputs
  • form_binding_flags – JSON for FOGE compatibility (e.g., required, pre-fillable, read-only)
  • visibility_scope* – Access control for USO lists (dropdown or parsed) (Admin, NGO, User)
  • preferred_chart_type – Suggests default visual for this signal in isolation (e.g., "line", "bar", "choropleth", "donut", "heatmap"). Not binding; can be overridden in the reporting table.
  • default_granularity – Indicates aggregation level (e.g., daily, monthly, yearly)
  • value_distribution_type – Describes data nature: categorical, continuous, percentage, binary, etc. Useful for auto-visualization.
  • unit_of_measure – For y-axis scaling logic
  • signal_role – e.g., "KPI", "benchmark", "flag", "input", etc.
  • allow_visualization – Boolean toggle for whether this signal should be visualized
  • hierarchical_axis_grouping – Allows auto-drilldown (e.g., Country → Region → Facility)
  • agent_profile_id – Enables dynamic triggering of agents based on signal issues, scope, or missing data

Note: *visibility_scope values:

  • admin – Only backend admins, analysts
  • user – Business users, form builders
  • system – Internal engines only (e.g., DaVE, SEM, Validators)
  • public – Exposed to external clients or portals

2. AI-Signal Auto-Resolver

The AI-Signal Auto-Resolver is a small AI assistant module that:

  • Intercepts unresolved signal references (e.g., "regional_water_index"),
  • Queries the Signal Registry,
  • Attempts resolution, and
  • If not found, proposes a new entry and routes it for admin approval or adds it in draft mode.

What it does:

Task – Example

  • Detect – Catches an unregistered signal name in a formula
  • Match – Suggests similar signal names from the registry
  • Propose – Uses an LLM to generate a smart draft signal entry
  • Learn – Stores confirmation logic for future use

How to build it:

  1. Hook into the validation layer of FOGE, SEM, or any formula parser.
  2. When a signal isn't found:
  • Run a fuzzy match (pg_trgm, FAISS, or GPT-based).
  • Prompt the user or admin: "This signal is unknown. Did you mean…?"
  3. If accepted, write a new entry:
  • status = draft
  • created_by = auto_resolver
  • Include LLM-inferred metadata (source table guess, description, type).
  4. Add to the admin dashboard for review or batch approval.
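The fuzzy-match step above can be sketched with Python's stdlib difflib as a stand-in for pg_trgm, FAISS, or embedding similarity. This is a minimal sketch, not the production resolver: the `registered_signals` list and the 0.6 cutoff are illustrative assumptions.

```python
import difflib

# Illustrative stand-in for the real Signal Registry contents.
registered_signals = [
    "regional_water_stress_index",
    "hazard_level",
    "scope3_emissions_total",
]

def match_candidates(unknown_name, registry, cutoff=0.6):
    """Return up to three ranked close matches for an unresolved signal name."""
    return difflib.get_close_matches(unknown_name, registry, n=3, cutoff=cutoff)

def resolve(unknown_name, registry):
    matches = match_candidates(unknown_name, registry)
    if matches:
        # Step 2: prompt "Did you mean…?" with ranked candidates.
        return {"status": "suggest", "candidates": matches}
    # Step 3: no match, so propose a draft entry for admin review.
    return {
        "status": "proposed",
        "draft": {
            "signal_name": unknown_name,
            "status": "draft",
            "created_by": "auto_resolver",
        },
    }
```

In production the same flow would rank matches by trigram or vector similarity and attach LLM-inferred metadata to the draft before routing it to the admin dashboard.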

We will use GPT to power this logic in early versions. Just send the formula and ask for signal resolution suggestions from a stored dictionary of registered signals.

3. Smart Signal Enrichment: Scaling SSSR with Intelligence

<!-- ZAYAZ-TODO: STRUCTURE | Must be upgraded to an OpenAI Codex App description -->

An OpenAI Codex App is being set up to do this work, so this chapter might become obsolete or be upgraded.

3.1. Populating the Registry at Scale

Populating the Smart Searchable Signal Registry (SSSR) with 3,000–4,000 signals may appear overwhelming, but ZAYAZ uses a modular, intelligent enrichment strategy to make this scalable, auditable, and AI-accelerated.

Rather than relying on manual tagging or one-off scripts, we implement a 6-Stage Smart Population Strategy, supported by a powerful technique: the prompt-ready CSV format for batch LLM suggestions.

This allows us to:

  • Automate 70–90% of metadata enrichment using pattern logic and join inference
  • Inject contextual, domain-aware suggestions from trusted LLMs (e.g., ChatGPT, GPT-4o, ZAYAZ Architect GPT)
  • Maintain full editorial control through QA dashboards or spreadsheet reviews
  • Future-proof the registry by making all enrichments traceable and upgradeable

Workflow Overview

  1. Extract blank or incomplete fields from SSSR
  2. Generate prompt-ready rows with instructions tailored to each field
  3. Feed rows into GPT for batch enrichment
  4. Review suggestions and re-import or stage them
  5. Log changes and track enrichment status per field
  6. Repeat as new fields or signals are added

This approach allows ZAYAZ to build and maintain the “signal brain” with rigor, speed, and minimal overhead — while preserving flexibility to align with ESRS, NACE, CBAM, trust models, and multi-role governance logic.

3.2. 6-Stage Smart Population Strategy

  1. Leverage Existing Metadata (Join-Based Enrichment)
  • Auto-fill: metric_category, signal_type, unit_of_measure, source_table, source_id, nace_relevance
  • Method: SQL joins with:
    • esrs_metrics_list
    • nace-codes
    • countries.xlsx for geo-context
    • IPCC-EFDB for emission types
  • Benefit: ~40–50% of rows get filled by logic-based enrichment.
  2. Rule-Based Assignment (Field Patterns & Naming)
  • If signal_name contains "emissions" → metric_category = Environmental, signal_type = numeric_energy_input
  • If column_reference ends in _id or _code → input_type = choice, ai_assist_allowed = false
  • Use regex + pattern matchers to auto-fill 20–25% more.
  3. Heuristic-Based NACE Relevance
  • Use sector keywords in source_table, column_reference, and linked_metrics
  • Cross-match with nace_applicability_map's applicability_flags (e.g., "CBAM_covered")
  • Output: nace_relevance + trust_config_id inference
  4. Profile Inheritance
  • Signals from the same table/module often share:
    • agent_profile_id
    • role_scope
    • prompt_enrichment_refs
  • Inference: group-by logic and inheritance filling
  5. Fallback Models (Lightweight ML or LLM)
  • Use OpenAI or an internal LLM to batch-process missing descriptions, ai_suggestion_types, materiality_tags
  • Example prompt:
"Suggest ai_suggestion_type, input_type, and agent_profile for signal_name:
'total_scopes_emissions', unit: tCO2e, role_scope: supplier"
  6. Review via Smart QA Dashboard
  • Show rows with:
    • conflicting auto-generated vs. manual tags
    • unresolved agent_profile_id
    • ambiguous nace_relevance
  • Use dashboards with filters: ai_assist_allowed = null, trust_config_id = null
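Stage 4 (Profile Inheritance) can be sketched as a group-by fill: missing agent_profile_id values inherit the most common value within the same source table. The rows and field names below are illustrative stand-ins for real SSSR records.

```python
from collections import Counter

# Illustrative SSSR rows; only the two fields relevant to this stage.
rows = [
    {"source_table": "suppliers_eco", "agent_profile_id": "form_assistant"},
    {"source_table": "suppliers_eco", "agent_profile_id": None},
    {"source_table": "chemical_registry", "agent_profile_id": "default_signal_resolver"},
]

def inherit_profiles(rows):
    """Fill missing agent_profile_id by inheriting the most common value per table."""
    by_table = {}
    for r in rows:
        if r["agent_profile_id"]:
            by_table.setdefault(r["source_table"], Counter())[r["agent_profile_id"]] += 1
    for r in rows:
        if not r["agent_profile_id"] and r["source_table"] in by_table:
            r["agent_profile_id"] = by_table[r["source_table"]].most_common(1)[0][0]
    return rows
```

The same group-by pattern extends to role_scope and prompt_enrichment_refs.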

3.3. Prompt-Ready CSV Format for Batch LLM Suggestions

The prompt-ready CSV format is a lightweight but powerful way to scale up signal enrichment using GPT or any LLM, without building complex pipelines. It treats the LLM as an on-demand assistant that suggests missing values for structured ESG metadata.

What It Is A CSV file where each row is:

  • A single signal (from the SSSR registry)
  • With known fields filled in…
  • …and missing/enrichable fields left blank
  • Plus a “prompt_instruction” column that tells the LLM what to infer

We then:

  1. Feed the CSV to GPT in chunks
  2. Let GPT fill in the blanks
  3. Re-import the enriched CSV into the system
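The first step, extracting signals with blank fields and attaching a prompt_instruction column, might look like this sketch. The target-field list and prompt template are assumptions; the real pipeline would pull from the SSSR database.

```python
import csv
import io

# Illustrative registry rows with one enrichable field left blank.
signals = [
    {"signal_id": "sssr:e1-1", "signal_name": "total_energy_kwh",
     "unit_of_measure": "kWh", "signal_type": ""},
    {"signal_id": "sssr:g1-3", "signal_name": "anti_corruption_policy",
     "unit_of_measure": "", "signal_type": "binary_policy"},
]

TARGET_FIELDS = ["signal_type", "unit_of_measure"]  # fields the LLM should infer

def to_prompt_ready_csv(signals):
    """Write only rows with blanks, appending a tailored prompt_instruction column."""
    buf = io.StringIO()
    fields = list(signals[0].keys()) + ["prompt_instruction"]
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for s in signals:
        missing = [f for f in TARGET_FIELDS if not s[f]]
        if missing:
            row = dict(s)
            row["prompt_instruction"] = (
                "Suggest " + " and ".join(missing) + " for this signal."
            )
            writer.writerow(row)
    return buf.getvalue()
```

The resulting CSV is then fed to the LLM in chunks, and the enriched output re-imported.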

Simplified Sample Format

signal_id | signal_name | unit_of_measure | signal_type | agent_profile_id | trust_config_id | ai_assist_allowed | prompt_instruction
sssr:e1-1 | total_energy_kwh | kWh | | | | | "Suggest signal_type, trust_config_id, and whether an agent should assist for this metric."
sssr:g1-3 | anti_corruption_policy | | | | | | "Suggest unit_of_measure, input_type, and agent profile ID for this policy signal."
sssr:e1-9 | scope3_emissions_total | tCO2e | | | | | "Suggest unit_of_measure, input_type, and agent profile ID for this emissions signal."

How It Works (Workflow):

  1. Extract & Generate CSV: Filter signals with missing/blank fields
  2. Feed in Batches to GPT:
  • Example input:
For the following signals, infer the missing fields based on name, unit, and
description. Output in CSV format with columns:
signal_id, signal_type, metric_category, agent_profile_id, ai_assist_allowed,
trust_config_id
  3. Review Output: You or a QA agent review rows before import
  4. Re-import: Update your SSSR registry or use for bulk pre-fill in the admin panel

3.4. How to Continue Enrichment in Smart Iterations

Strategy: Iterative Prompt-Ready Sheets

Instead of creating a new file each time, extend the existing one by:

  1. Keeping enriched fields locked (e.g., agent_profile_id, signal_type)
  2. Adding new target columns like:
  • metric_category
  • role_scope
  • materiality_tags
  • prompt_enrichment_refs
  3. Appending or regenerating prompt_instruction with new enrichment goals

Note: The LLM generates prompts based on the uploaded Excel file, extracting a few columns from it into a CSV file. Upload that CSV file and ask the LLM to populate the blank fields.

Then import the populated CSV, add five new, empty columns, and repeat the process: ask the LLM to generate prompts for the extended CSV, upload the updated file with the new prompts, and ask it to populate the empty fields. Repeat until all columns (at least the required fields) are populated. Finally, import the result back into the Excel file and convert it to a database table when complete.

3.5. Additional Tools:

Example of script to apply rule-based and join-based filling

sssr_populator.pyGitHub ↗
# sssr_populator.py
# ZAYAZ Smart Signal Populator Script
# Applies rule-based and join-based logic to enrich SSSR metadata

import pandas as pd

# === Load Source Files ===
sssr_file = "sssr_registry.csv"  # Your raw SSSR file
esrs_file = "esrs_metrics_list.csv"
nace_file = "nace_applicability_map.csv"

sssr = pd.read_csv(sssr_file)
esrs = pd.read_csv(esrs_file)
nace = pd.read_csv(nace_file)

# === Helper Functions ===
def infer_signal_type(name, column):
    name = name.lower()
    column = column.lower()
    if "emission" in name or "kwh" in column:
        return "numeric_energy_input"
    if "id" in column or "code" in column:
        return "identifier_string"
    if "policy" in name:
        return "binary_policy"
    return "reference_string"

def infer_metric_category(name):
    if any(k in name.lower() for k in ["ghg", "energy", "climate"]):
        return "Environmental"
    if any(k in name.lower() for k in ["employee", "diversity", "labor"]):
        return "Social"
    if "policy" in name.lower():
        return "Governance"
    return "Other"

# === Rule-Based Enrichment ===
sssr["signal_type"] = sssr.apply(
    lambda r: infer_signal_type(r["signal_name"], r["column_reference"]), axis=1
)
sssr["metric_category"] = sssr["signal_name"].apply(infer_metric_category)

# === Join-Based Enrichment ===
sssr = sssr.merge(
    esrs[["metric_id", "unit_of_measure"]],
    how="left", left_on="source_id", right_on="metric_id",
)

# Join NACE relevance
nace_grouped = (
    nace.groupby("source_id")["nace_code"].apply(list).reset_index(name="nace_relevance")
)
sssr = sssr.merge(nace_grouped, how="left", on="source_id")

# === Export Enriched File ===
sssr.to_csv("sssr_registry_enriched.csv", index=False)
print("✅ SSSR enrichment complete. Output: sssr_registry_enriched.csv")

3.6. Signal Enrichment Dashboard:

Admin-facing panel for batch review and correction, including AI-powered autocomplete fields. This is the solution for adding data once the Excel file has been converted into a database table.

sssr_populator.jsGitHub ↗
import React, { useState } from "react";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Table, TableHeader, TableRow, TableCell } from "@/components/ui/table";
import { Input } from "@/components/ui/input";
import { Badge } from "@/components/ui/badge";
import { Popover, PopoverTrigger, PopoverContent } from "@/components/ui/popover";
import { cn } from "@/lib/utils";

const mockSignals = [
  {
    signal_id: "sssr:e1-1",
    signal_name: "total_energy_kwh",
    unit_of_measure: "kWh",
    signal_type: "",
    metric_category: "",
    agent_profile_id: "",
    trust_config_id: "",
    status: "incomplete",
  },
  {
    signal_id: "sssr:s1-3",
    signal_name: "diversity_index",
    unit_of_measure: "%",
    signal_type: "",
    metric_category: "",
    agent_profile_id: "",
    trust_config_id: "",
    status: "incomplete",
  },
];

const aiSuggestions = {
  signal_type: ["numeric_energy_input", "binary_policy", "reference_string"],
  metric_category: ["Environmental", "Social", "Governance"],
  agent_profile_id: ["form_assistant", "geo_lookup_agent", "default_signal_resolver"],
  trust_config_id: ["trust_default", "refdata_pass_through", "manual_review_optional"],
};

function AutocompleteInput({ placeholder, suggestions, defaultValue }) {
  const [value, setValue] = useState(defaultValue || "");
  const [open, setOpen] = useState(false);

  return (
    <Popover open={open} onOpenChange={setOpen}>
      <PopoverTrigger asChild>
        <Input
          value={value}
          placeholder={placeholder}
          onChange={(e) => setValue(e.target.value)}
          onFocus={() => setOpen(true)}
        />
      </PopoverTrigger>
      <PopoverContent className="p-2 w-full">
        {suggestions
          .filter((s) => s.toLowerCase().includes(value.toLowerCase()))
          .map((s, idx) => (
            <div
              key={idx}
              className="cursor-pointer p-1 hover:bg-gray-100 rounded"
              onClick={() => {
                setValue(s);
                setOpen(false);
              }}
            >
              {s}
            </div>
          ))}
      </PopoverContent>
    </Popover>
  );
}

export default function SignalEnrichmentDashboard() {
  return (
    <div className="p-6 grid gap-6">
      <h1 className="text-2xl font-bold">SSSR Signal Enrichment Dashboard</h1>
      <Card className="shadow-md">
        <CardContent className="p-4">
          <Table>
            <TableHeader>
              <TableRow>
                <TableCell>ID</TableCell>
                <TableCell>Name</TableCell>
                <TableCell>Unit</TableCell>
                <TableCell>Signal Type</TableCell>
                <TableCell>Metric Category</TableCell>
                <TableCell>Agent Profile</TableCell>
                <TableCell>Trust Config</TableCell>
                <TableCell>Status</TableCell>
                <TableCell>Actions</TableCell>
              </TableRow>
            </TableHeader>
            {mockSignals.map((s, i) => (
              <TableRow key={i} className="items-center">
                <TableCell>{s.signal_id}</TableCell>
                <TableCell>{s.signal_name}</TableCell>
                <TableCell>{s.unit_of_measure}</TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., numeric_energy_input"
                    suggestions={aiSuggestions.signal_type}
                    defaultValue={s.signal_type}
                  />
                </TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., Environmental"
                    suggestions={aiSuggestions.metric_category}
                    defaultValue={s.metric_category}
                  />
                </TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., form_assistant"
                    suggestions={aiSuggestions.agent_profile_id}
                    defaultValue={s.agent_profile_id}
                  />
                </TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., trust_default"
                    suggestions={aiSuggestions.trust_config_id}
                    defaultValue={s.trust_config_id}
                  />
                </TableCell>
                <TableCell><Badge>{s.status}</Badge></TableCell>
                <TableCell><Button variant="secondary">Save</Button></TableCell>
              </TableRow>
            ))}
          </Table>
        </CardContent>
      </Card>
    </div>
  );
}

Optimal number of rows that can be processed per batch:

  • ChatGPT UI: 5–8 (interactive, best for QA rounds)
  • OpenAI API (scripted): 10–20 (streamlined, token-limited)
  • Fine-tuned batch model: 50–100 (structured input/output required)
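Whichever batch size applies, splitting rows into fixed-size batches before sending them to the LLM is straightforward; a minimal sketch:

```python
def batches(rows, size):
    """Yield successive fixed-size batches from a list of CSV rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# e.g. 23 rows at API batch size 10 → batches of 10, 10, and 3.
```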


4. AI-Signal Auto-Resolver

4.1. Implementation Spec & Flow Diagram

The AI-Signal Auto-Resolver is a backend module that intercepts unresolved signal references in formulas, extrapolation models, or dynamic form generation. It uses fuzzy logic and AI (e.g., GPT-4-turbo) to match against the Signal Registry, propose new signals if needed, and route them for approval or insertion in draft mode.

This feature reduces developer friction, prevents broken references, and creates a human-AI loop for expanding ZAYAZ’s signal intelligence.

Functional Goals

  • Detect missing signal references during validation or compilation.
  • Match unresolved names against registered signals.
  • Suggest best matches or propose a new signal with inferred metadata.
  • Log all resolver events for traceability and learning.
  • Route proposed signals to admin dashboard for approval.

4.2. System Components

  1. Signal Resolver Core
  • Triggered by any module parsing a formula, rule, or form field.
  • Intercepts unknown signal_name.
  • Queries the Signal Registry for best match candidates.
  • Ranks matches using vector similarity (e.g., pg_trgm, embeddings, or GPT).
  2. AI Suggestion Layer (Optional First Phase via GPT-4-turbo)
  • If there is no high-confidence match, ask GPT-4 to infer the signal's purpose.
  • Generate proposed metadata:
    • Name
    • Description
    • Data type
    • Fallback logic (if the pattern is clear)
    • Draft source info (e.g., "Likely from emissions_model")
  3. Resolver Log Table (signal_resolver_log):

  • resolver_trace_id (UUID) – Unique trace for this resolution event
  • unknown_signal_name (TEXT) – Signal that failed to resolve
  • matched_signal_ids (JSON[]) – Ranked suggestions from the registry
  • suggestion_confidence_score (FLOAT[]) – Optional: match confidence per suggestion
  • selected_signal_id (TEXT) – If the user accepted a match
  • new_signal_proposal (JSON) – Proposed metadata for a new signal
  • gpt_input_context (TEXT) – Input prompt sent to the LLM (optional logging)
  • gpt_response_text (TEXT) – The LLM's raw output (optional logging)
  • created_by (TEXT) – E.g., auto_resolver, admin_manual, etc.
  • status (TEXT) – matched, proposed, approved, dismissed
  • created_at (TIMESTAMP) – Resolution timestamp

This log is the learning engine, trust layer, and governance audit trail for AI-generated or fuzzy-matched signals.
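The log table above can be sketched as DDL, exercised here against an in-memory SQLite database for illustration; production would use PostgreSQL with native UUID and JSONB types.

```python
import sqlite3

# Sketch of signal_resolver_log; SQLite TEXT stands in for UUID/JSONB columns.
DDL = """
CREATE TABLE signal_resolver_log (
    resolver_trace_id TEXT PRIMARY KEY,        -- UUID in PostgreSQL
    unknown_signal_name TEXT NOT NULL,
    matched_signal_ids TEXT,                   -- JSON array of ranked suggestions
    suggestion_confidence_score TEXT,          -- JSON array of floats
    selected_signal_id TEXT,
    new_signal_proposal TEXT,                  -- JSON proposal for a new signal
    gpt_input_context TEXT,
    gpt_response_text TEXT,
    created_by TEXT,
    status TEXT CHECK (status IN ('matched', 'proposed', 'approved', 'dismissed')),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(
    "INSERT INTO signal_resolver_log "
    "(resolver_trace_id, unknown_signal_name, status, created_by) "
    "VALUES (?, ?, ?, ?)",
    ("trace-0001", "regional_water_index", "proposed", "auto_resolver"),
)
```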

  4. Admin Review UI
  • Admin dashboard showing pending signal proposals.
  • Options to edit, approve, reject, or promote to a full signal.
  5. signal_id
  • All signal_ids follow the same format:
    • Start with "sssr:", then the table name (source_table), a "." separator, and finally the column name (column_reference), e.g., "sssr:signal_resolver_log.resolver_trace_id".
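The format can be captured in a one-line helper:

```python
def make_signal_id(source_table, column_reference):
    """Build a signal_id in the sssr:source_table.column_reference format."""
    return f"sssr:{source_table}.{column_reference}"

# e.g. make_signal_id("signal_resolver_log", "resolver_trace_id")
#      → "sssr:signal_resolver_log.resolver_trace_id"
```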

Other:

  • Learning model from accepted proposals to improve suggestions.
  • Auto-suggest schema (e.g., likely source_table and column_reference).
  • UI-integrated DocSearch fallback when confidence is low.
  • Automatic notification system for unreviewed proposed signals.

4.3. System Flow Diagram

System Flow Diagram (PDF)

Link to the Whimsical file for editing...


5. How to Use the Smart Searchable Signal Registry

A practical guide to integrating, querying, and working with column-level signals in ZAYAZ

  1. Understand the Role of the Registry
  • Before using the registry, clarify this:
    • This is not a list of metrics.
    • It is a registry of columns across ZAYAZ's tables, APIs, and JSON structures.
    • Each entry tells the system:
      • Where a column is
      • What type of data it holds
      • How it should be validated or processed
      • Whether it can be extrapolated, inferred, or visualized
  2. Register a New Signal
  • When adding a new column to any dataset (e.g., a new emissions table or compliance form), you must also register it as a signal.

Example

Field – Value

  • signal_name – hazard_level
  • signal_type – Integer (1–4)
  • signal_description – "Hazard level derived from H-codes (1 = most hazardous, 4 = least)"
  • source_table – chemical_registry
  • column_reference – hazard_level
  • created_at – 2025-05-13T10:22:00Z

Note: This is just an excerpt of the columns, but enough to get it registered. Several other fields must also be filled in. Completely new tables must be registered in the table_registry.

  3. Link from one table to a metric in a specific row in another table
link-table-to-metric.sqlGitHub ↗
[SSSR_REF: <signal_id> @ <row_id>] 
e.g. [SSSR_REF: sssr:iso_3166-1_alpha-3.notes@ICC-0004]
  4. Look Up a Signal for Use in a Form or Engine
  • Goal: You want to find the canonical source for a data point like “GHG Scope 3 – Transport”.

Step-by-Step:

look-up-signal.sqlGitHub ↗
SELECT * FROM signal_registry
WHERE signal_name ILIKE '%ghg%' AND signal_description ILIKE '%transport%';

This returns a set of candidates with their:

  • Source table and column
  • Processing engine (e.g., SEM)
  • Trust profile
  • Data format

Now the form engine (FOGE) or extrapolation engine (SEM) knows exactly what to use and how to process it.

  5. Use the Signal in a Form Generator (FOGE). When FOGE builds forms dynamically, it queries the Signal Registry like this:
query-signal-registry.sqlGitHub ↗
SELECT signal_name, signal_type, signal_description, column_reference
FROM signal_registry
WHERE processing_engine = 'FOGE'
AND availability_status = 'active'
AND validation_required = true;

This ensures that only active, validatable columns are shown to users—and they’re rendered with the correct data types and descriptions.

  6. Enable Fallback Logic in SEM. If data is missing and extrapolation_allowed = true, the Smart Extrapolation Module uses:
  • fallback_logic (JSON or signal IDs)
  • data_hierarchy (e.g., NACE, Country)

Example Fallback:

example-fallback.jsonGitHub ↗
{
"fallback_path": ["scope3_transport_company", "scope3_transport_sector_avg", "scope3_transport_continent"]
}

SEM traverses the hierarchy and uses the registry to identify which column to extrapolate from.
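SEM's traversal of such a fallback path can be sketched as a first-available lookup; the `available` mapping is a stand-in for a real data-availability query against the registry.

```python
# fallback_path mirrors the example-fallback.json above.
fallback_path = [
    "scope3_transport_company",
    "scope3_transport_sector_avg",
    "scope3_transport_continent",
]

def resolve_fallback(path, available):
    """Return the first signal in the fallback path that has data available."""
    for signal_name in path:
        if signal_name in available:
            return signal_name
    return None  # no level of the hierarchy has data
```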

  7. Integrate with AI or Validation Engines
  • DAIM uses ai_suggestion_type to know which signals are AI-supported.
  • RIF / DaVE use trust_profile to classify signals into risk or validation buckets.
  • Validators look at validation_required and processing_engine to check how to run cross-checks.

  8. Trace and Debug Signal Usage. To understand how a value was derived or processed:

trace-signal-usage.sqlGitHub ↗
SELECT * FROM signal_registry
WHERE signal_name = 'hazard_level';

Check:

  • dependencies — is this signal calculated from others?
  • fallback_logic — was this a backup column?
  • updated_at — was this signal recently modified?

Use this data to generate audit logs, XBRL tags, or trust dashboards.

  9. Best Practices for Working with the Registry
  • Always register new columns at the time of table creation or schema change.
  • Use consistent naming conventions for signal_name and column_reference.
  • Deprecate unused signals instead of deleting them (use availability_status = 'deprecated').
  • Document every fallback_logic and processing_engine with clarity.
  • Avoid hardcoded logic in engines—always route through the registry.

Developer Tools (Recommended Setup)

  • Store in PostgreSQL, using JSONB for flexible fields like fallback_logic and dependencies.
  • Create an admin UI (e.g., via React + Material UI) for adding/editing/viewing registry entries.
  • Support REST or GraphQL API for registry queries across modules.
  • Use a search index (like pg_trgm or Meilisearch) for smart search integration.
  • Maintain changelog/version tracking (auto-generated diffs or manual commits).

6. Developer Tools Specification

API Wrapper, Visual Mapper UI, and Signal Impact Explorer

6.1. Signal Registry API Wrapper

Purpose: Expose the Signal Registry to all modules and external tools in a secure, queryable form. The API enables systems like FOGE, SEM, DAIM, validators, and the admin UI to query or update the registry consistently.

REST Endpoints (Suggested)

GET /signals

  • Query Params: search, filter, processing_engine, availability_status, data_hierarchy, validation_required
  • Returns: Array of signal registry entries with metadata

GET /signals/:id

  • Returns full metadata for a single signal

POST /signals

  • Inserts a new signal (draft or active)
  • Required fields: signal_name, column_reference, source_table, signal_type

PUT /signals/:id

  • Updates metadata fields (admin-only fields: availability_status, processing_engine, etc.)

GET /signals/search

  • Full-text search with synonyms, typos, and semantic scoring
  • Optional: return ranked matches using vector search

GET /signals/dependencies/:id

  • Returns a dependency tree (what this signal relies on or is used in)
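Client-side, a query against GET /signals might be assembled like this sketch; the base URL is hypothetical, and the parameter names follow the list above.

```python
from urllib.parse import urlencode

def signals_url(base, **params):
    """Build a GET /signals URL from the query parameters listed above."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}/signals?{query}" if query else f"{base}/signals"
```

For example, `signals_url("https://zayaz.example/api", processing_engine="SEM", availability_status="active")` produces a filtered registry query that FOGE, SEM, or the admin UI could issue.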

GraphQL Schema (Sample)

graphql-schema-sample.sqlGitHub ↗
type Signal {
  id: ID!
  name: String!
  description: String
  column_reference: String
  source_table: String
  linked_metrics: [String]
  dependencies: [Signal]
  processing_engine: String
  created_at: DateTime
  updated_at: DateTime
}

schema-sample.graphqlGitHub ↗
query {
  signals(filter: SignalFilter): [Signal]
  signal(id: ID!): Signal
  searchSignals(text: String!): [Signal]
}

Security & Governance

  • JWT-authenticated access
  • Role-based permissions: read, write, admin, reviewer
  • Audit logging for all writes/updates

6.2. Visual Signal Mapper UI

Purpose: Provides a web-based, visual interface to:

  • Understand how signals relate to metrics and engines
  • Browse or debug signal relationships and lineage
  • Enable non-technical stakeholders to explore signal usage

Core Features

  • Signal Graph View:
    • Nodes: Signal, Metric, Engine
    • Edges: feeds, depends_on, processed_by
  • Search Bar + Filters:
    • Find signals by name, engine, or metric tag
  • Detail Drawer on Click:
    • Show full metadata, source table, description, dependencies
  • Export to CSV / JSON / SVG
  • Highlight by Engine: Toggle visibility by processing_engine

Example Use Case: "Where is hazard_level used, and which engines depend on it?" The graph shows:

  • hazard_level → metric: Worker Exposure Risk
  • hazard_level → SEM
  • hazard_level → validator

Stack Recommendation

  • Frontend: Vue3 or React.js + D3.js or Cytoscape.js
  • Backend: Use Signal Registry API Wrapper
  • Optional: Neo4j for relationship queries and graph rendering

6.3. Signal Impact Explorer

Purpose: Diagnose what breaks if a signal becomes unavailable, fails validation, or is deprecated.

Inputs

  • signal_id
  • Current status or trust_score

Outputs

  • Direct Dependencies: Metrics or calculations directly using the signal
  • Engine Triggers: FOGE, SEM, DAIM, validators using this signal
  • Cascade Path: Signals that rely on this signal via dependencies

View Modes

  • Graph View: Similar to Visual Signal Mapper, but filtered to active impact path
  • Table View:
    • Column: Downstream Item
    • Column: Type (Metric, Engine, Signal)
    • Column: Impact (e.g., "validation blocked", "metric fallback triggered")

Trigger Use Cases

  • When marking a signal as deprecated
  • When a validator detects low confidence or mismatch
  • When updating formulas in FOGE and checking for ripple effects

Summary: These three tools form the operational and visual intelligence layer of the ZAYAZ Signal Registry:

  • The API Wrapper enables modular system integration.
  • The Visual Signal Mapper makes complexity navigable.
  • The Impact Explorer makes governance scalable and risk-aware.
Together, they are the keys to scaling your signal-based architecture without compromising clarity, traceability, or control.

7. GraphQL API Wrapper for Signal Registry

This provides the validation layer's structured access to signal data (uso_table), enabling creation, update, and intelligent search.

7.1. Types

  1. Query Types
  • signals(filter: SignalFilter): [Signal]
    • Fetch multiple signals with advanced filtering.
  • signal(id: ID!): Signal
    • Fetch a single signal by its unique ID.
  • searchSignals(text: String!): [Signal]
    • Text-based search for autocomplete, resolvers, etc.
  2. SignalFilter Input Type

signalfilter-input-types.graphql

```graphql
input SignalFilter {
  signal_domain: [String]        # Must match uso_table.level_4_code (e.g., ENV, SOC)
  level_0_code: [String]         # → uso_table.level_0_code (e.g., ZIH, ZCH)
  level_1_code: [String]         # → uso_table.level_1_code (e.g., AIA, FOGE)
  level_2_code: [String]         # → uso_table.level_2_code (e.g., NLP, TRFM)
  level_3_code: [String]         # → uso_table.level_3_code (e.g., GHG, WST)
  signal_name_contains: String
  processing_engine: [String]    # Must match MICE or defined engine list
  ai_suggestion_type: [String]
  availability_status: [String]  # One of: active, draft, deprecated, orphaned
  extrapolation_allowed: Boolean
  validation_required: Boolean
}
```

Supports dropdown filtering, AI targeting, and resolver lookups.
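For example, a dropdown-driven UI might issue a query like the following (the filter values and selected fields are illustrative only):

```graphql
query ActiveEnvSignals {
  signals(filter: {
    signal_domain: ["ENV"]
    processing_engine: ["SEM"]
    availability_status: ["active"]
    extrapolation_allowed: true
  }) {
    id
    signal_name
    column_reference
    linked_metrics
  }
}
```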

  3. Signal Type Definition

signal-type-definition.graphql

```graphql
type Signal {
  id: ID!
  signal_name: String!
  signal_description: String
  column_reference: String
  source_table: String
  signal_domain: String  # Must match uso_table.level_4_code (e.g., ENV, SOC)
  level_0_code: String   # → uso_table.level_0_code (e.g., ZIH, ZCH)
  level_1_code: String   # → uso_table.level_1_code (e.g., AIA, FOGE)
  level_2_code: String   # → uso_table.level_2_code (e.g., NLP, TRFM)
  level_3_code: String   # → uso_table.level_3_code (e.g., GHG, WST)
  level_4_code: String   # Optional, if domains are mapped
  preferred_input_source: String
  data_hierarchy: String
  extrapolation_allowed: Boolean
  validation_required: Boolean
  ai_suggestion_type: String
  processing_engine: String
  fallback_logic: String
  trust_profile: String
  visualization_type: String
  dependencies: [Signal]  # Recursive
  linked_metrics: [String]
  availability_status: String
  created_at: DateTime
  updated_at: DateTime
}
```
  4. Mutation Types
  • addSignal(input: AddSignalInput!): Signal

mutation-types.graphql

```graphql
input AddSignalInput {
  signal_name: String!
  signal_description: String
  column_reference: String!
  source_table: String!
  signal_domain: String!
  level_0_code: String!
  level_1_code: String!
  level_2_code: String!
  level_3_code: String!
  level_4_code: String
  processing_engine: String
  linked_metrics: [String]
  dependencies: [ID]
  availability_status: String = "draft"
}
```
  • updateSignal(id: ID!, input: UpdateSignalInput!): Signal
    • Partial updates for corrections, approvals, etc.
  • proposeSignalFromTrace(trace_id: String!): SignalProposal
    • Optional — connect to telemetry/resolver systems.
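A hypothetical addSignal call might look like this (the signal name and path values are examples only; availability_status falls back to its "draft" default):

```graphql
mutation {
  addSignal(input: {
    signal_name: "scope1_emissions"
    column_reference: "eco_number_db.scope1_emissions"
    source_table: "eco_number_db"
    signal_domain: "ENV"
    level_0_code: "ZIH"
    level_1_code: "AIA"
    level_2_code: "NLP"
    level_3_code: "GHG"
  }) {
    id
    availability_status
  }
}
```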

Alternatively, use typed enums when level codes are known and stable, or sync them into build-time enums (example):

```graphql
enum Level0Code {
  ZIH
  ZCH
  ZRH
  SIS
  EXT
}
```
  5. Use Case Hooks

| Module | How it uses this API |
|---|---|
| FOGE | Loads filtered signals by level/module/domain |
| SEM | Queries fallback-eligible signals |
| Signal Resolver | Suggests new entries via mutation |
| Admin UI | Lists signals by status/domain/engine |
| Validator | Cross-checks validation_required fields |

7.2. Foreign Key Mapping Between the signal_registry and uso_table

  1. PostgreSQL Schema Snippet

Assumptions

  • A normalized uso_table exists that includes all levels: level_0_code, level_1_code, …, level_4_code
  • signal_registry uses these as foreign keys (single-row linkage per level)

uso_table Structure

uso-table.sql

```sql
CREATE TABLE uso_table (
  id SERIAL PRIMARY KEY,
  level_0_code TEXT NOT NULL,
  level_1_code TEXT,
  level_2_code TEXT,
  level_3_code TEXT,
  level_4_code TEXT,
  uso_path TEXT,  -- e.g., "ZIH-AIA-NLP-GHG-ENV"
  level_3_name TEXT,
  level_4_name TEXT,
  ai_tag_context TEXT,
  UNIQUE (level_0_code, level_1_code, level_2_code, level_3_code, level_4_code)
);
```

signal_registry Table with Foreign Keys

sssr-fk.sql

```sql
CREATE TABLE signal_registry (
  signal_id UUID PRIMARY KEY,
  signal_name TEXT NOT NULL,
  column_reference TEXT,
  source_table TEXT,
  level_0_code TEXT NOT NULL,
  level_1_code TEXT,
  level_2_code TEXT,
  level_3_code TEXT,
  level_4_code TEXT,
  signal_domain TEXT,  -- Redundant with level_4_code but may exist for UI clarity

  -- FK constraint: soft link on normalized hierarchy
  FOREIGN KEY (level_0_code, level_1_code, level_2_code, level_3_code, level_4_code)
    REFERENCES uso_table (level_0_code, level_1_code, level_2_code, level_3_code, level_4_code)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);
```
  • Ensures only USO-valid signal paths can be created
  • No duplication of path logic in the registry
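For illustration, a conforming insert might look like this (the signal name and path values are hypothetical; gen_random_uuid() is built into PostgreSQL 13+):

```sql
-- Hypothetical insert; succeeds only if this exact five-level path
-- already exists in uso_table.
INSERT INTO signal_registry
  (signal_id, signal_name, level_0_code, level_1_code,
   level_2_code, level_3_code, level_4_code)
VALUES
  (gen_random_uuid(), 'scope1_emissions', 'ZIH', 'AIA', 'NLP', 'GHG', 'ENV');
```

One caveat worth noting: with PostgreSQL's default MATCH SIMPLE semantics, a composite foreign key is not checked when any of its columns is NULL, so partial paths (e.g., only level_0_code set) bypass validation unless MATCH FULL or additional constraints are added.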
  2. GraphQL Schema with USO Join
schema-uso-join.graphql

```graphql
type Signal {
  id: ID!
  signal_name: String!
  column_reference: String
  source_table: String

  level_0_code: String!
  level_1_code: String
  level_2_code: String
  level_3_code: String
  level_4_code: String

  uso_metadata: USOReference
}

type USOReference {
  level_0_code: String!
  level_1_code: String
  level_2_code: String
  level_3_code: String
  level_4_code: String
  level_3_name: String
  level_4_name: String
  ai_tag_context: String
  uso_path: String
}
```

Backend Resolver Join (TypeScript)

signal-resolvers.ts

```typescript
/**
 * Example GraphQL resolver for fetching USO metadata
 * based on hierarchical signal codes.
 */

type UsoRecord = {
  level_0_code: string | null;
  level_1_code: string | null;
  level_2_code: string | null;
  level_3_code: string | null;
  level_4_code: string | null;
};

type SignalParent = {
  level_0_code: string | null;
  level_1_code: string | null;
  level_2_code: string | null;
  level_3_code: string | null;
  level_4_code: string | null;
};

type ResolverContext = {
  db: {
    uso_table: {
      findFirst: (args: {
        where: {
          level_0_code: string | null;
          level_1_code: string | null;
          level_2_code: string | null;
          level_3_code: string | null;
          level_4_code: string | null;
        };
      }) => Promise<UsoRecord | null>;
    };
  };
};

const Signal: {
  uso_metadata?: (
    signal: SignalParent,
    args: Record<string, never>,
    ctx: ResolverContext
  ) => Promise<UsoRecord | null>;
} = {};

/**
 * Resolver: Signal.uso_metadata
 * Retrieves the corresponding metadata record
 * from the USO table using the signal's level codes.
 */
Signal.uso_metadata = async (
  signal: SignalParent,
  _args: Record<string, never>,
  ctx: ResolverContext
): Promise<UsoRecord | null> => {
  return ctx.db.uso_table.findFirst({
    where: {
      level_0_code: signal.level_0_code,
      level_1_code: signal.level_1_code,
      level_2_code: signal.level_2_code,
      level_3_code: signal.level_3_code,
      level_4_code: signal.level_4_code,
    },
  });
};

export { Signal };
```

8. Telemetry-Based Schema Audit

Telemetry-based schema audits are a powerful way to future-proof the ZAYAZ architecture and catch issues before they impact trust, compliance, or usability.

It’s a system that continuously monitors live data usage (e.g., which signals are invoked by which engines, under which modules) and compares that activity to the declared structure in the Signal Registry and USO table.

When it detects mismatches or anomalies, it can:

  • Log the issue
  • Suggest updates
  • Trigger warnings (e.g., in FOGE or SEM)
  • Feed insights back into AI tuning or audit dashboards

8.1. Architecture

  1. signal_telemetry_log captures runtime behavior

Every signal usage is logged:

signal-telemetry-log.json

```json
{
  "signal_id": "GHG_scope1",
  "engine_invoked": "TRFM",
  "module": "SEM",
  "timestamp": "2025-05-14T18:20:12Z",
  "used_in_path": "ZCH-SEM-CALC-GHG-ENV",
  "trace_id": "abc123"
}
```
  2. Audit Engine compares usage to registry

A scheduled audit compares each telemetry event against:

  • level_3_dependency
  • signal_domain
  • processing_engine
  • availability_status

If something is undeclared, deprecated, or misaligned, flag it.
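The comparison step can be sketched as follows for a single telemetry event; the record shapes and the `declared_engines` field are assumptions based on the log examples in this section:

```typescript
// Sketch: check one telemetry event against its declared registry entry.
// Shapes are illustrative, not the platform's actual types.

interface TelemetryEvent {
  signal_id: string;
  engine_invoked: string;
}

interface RegistryEntry {
  signal_id: string;
  declared_engines: string[];
  availability_status: "active" | "draft" | "deprecated" | "orphaned";
}

interface AuditIssue {
  signal_id: string;
  issue_type: string;
  expected?: string[];
  actual?: string[];
}

function auditEvent(event: TelemetryEvent, entry: RegistryEntry): AuditIssue[] {
  const issues: AuditIssue[] = [];

  // Runtime engine not in declared deps → undeclared_engine_usage
  if (!entry.declared_engines.includes(event.engine_invoked)) {
    issues.push({
      signal_id: event.signal_id,
      issue_type: "undeclared_engine_usage",
      expected: entry.declared_engines,
      actual: [event.engine_invoked],
    });
  }

  // Usage of a deprecated signal → deprecated_signal_invoked
  if (entry.availability_status === "deprecated") {
    issues.push({ signal_id: event.signal_id, issue_type: "deprecated_signal_invoked" });
  }

  return issues;
}
```

Each returned issue would then be persisted to schema_audit_log for review.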

  3. schema_audit_log stores results

schema-audit-log.json

```json
{
  "signal_id": "GHG_scope1",
  "issue_type": "undeclared_engine_usage",
  "expected": ["CALC", "VALI"],
  "actual": ["TRFM"],
  "first_seen": "2025-05-14T18:20:12Z",
  "impact_score": 0.92,
  "suggested_fix": {
    "add_engine": "TRFM",
    "review_dependency": true
  },
  "resolved": false
}
```
  4. Admin/AI Review Panel (UI)

Create a dashboard with tabs like:

  • Unresolved Issues
  • Suggested Registry Updates
  • Signals with Conflicting Domains
  • Unused/Orphaned Signals
  • Automatic Re-mapping Candidates

8.2. Audit Types to Track

| Audit Type | Trigger Condition |
|---|---|
| undeclared_engine_usage | Runtime engine not in declared deps |
| domain_conflict | Signal processed in a different signal_domain than declared |
| deprecated_signal_invoked | Usage of a signal with availability_status = deprecated |
| missing_linked_metric | Signal in use but not mapped to any metric |
| empty_dependency_signal | Signal with no declared processing_engine or deps, yet being used |

Benefits

  • Autonomous signal registry governance
  • AI feedback loop for continuous tuning of dependencies
  • Auditor-ready traceability (“we can prove why this signal was processed this way”)
  • Fast debugging of missing signals, form errors, or SEM fallback mismatches

8.3. Example Usage: Metrics Table Metadata

From the Metrics Table:

| Column Name | Purpose | Example |
|---|---|---|
| table_columns | Names and types of required table columns | {"country_code": "ISO 3166", "risk_level": "String", "action_required": "Text"} |
| data_sources | References to data sources (using SSSR IDs or DB references) | {"country_code": "sssr:countries_table.iso3166", "risk_level": "sssr:risk.child_labour.ilo_index"} |
| placeholder_notes | User-facing placeholder notes for empty fields | {"risk_level": "Select risk from ILO list if not auto-populated"} |
| special_logic | Instructions for dynamic logic or data retrieval | "lookup_risk_level_using_iso3166" |

Detailed Example (for clarity)

Metric: S1.SBM-3_10 – Countries/areas at risk of child labour

Metadata example:

risk-of-child-labor.json

```json
{
  "table_columns": {
    "country_code": "ISO 3166",
    "location_id": "String",
    "employee_count": "Integer",
    "gender_breakdown": "JSON",
    "child_labour_risk": "Boolean"
  },
  "data_sources": {
    "country_code": "sssr:eco_number_db.country_code",
    "location_id": "sssr:eco_number_db.location_id",
    "employee_count": "sssr:eco_number_db.employee_count",
    "gender_breakdown": "sssr:eco_number_db.gender_breakdown",
    "child_labour_risk": "sssr:countries_table.child_labour_risk_status"
  },
  "placeholder_notes": {
    "child_labour_risk": "Verify auto-populated risk status."
  },
  "special_logic": "populate_child_labour_risk_based_on_iso3166"
}
```

Integration with Smart Searchable Signal Registry (SSSR):

  • data_sources fields explicitly reference SSSR IDs for precise, traceable, and auditable linkage to authoritative data.
  • special_logic instructions precisely guide FOGE to execute more complex logic, such as dynamic lookups or conditional workflows.
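As a sketch of the first step in such a lookup, an engine could parse the sssr: references in data_sources into table/column pairs before resolving them. The parser and the assumed `sssr:<table>.<column>` grammar are illustrative, inferred from the examples above:

```typescript
// Sketch: split an "sssr:" data-source reference into a table/column pair.
// The reference grammar is an assumption based on the document's examples.

interface SssrRef {
  table: string;
  column: string;
}

function parseSssrRef(ref: string): SssrRef {
  // First dot separates the table; the rest is the (possibly dotted) column path.
  const match = /^sssr:([^.]+)\.(.+)$/.exec(ref);
  if (!match) {
    throw new Error(`not an SSSR reference: ${ref}`);
  }
  return { table: match[1], column: match[2] };
}
```

A FOGE-style workflow could then use the parsed pair to fetch the authoritative value, e.g. from countries_table, before applying the special_logic instruction.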


