
SSSR

Smart Searchable Signal Registry

1. Introduction

ZAYAZ supports thousands of ESG metrics across diverse frameworks like CSRD, ESRS, GRI, and TCFD. To manage and operationalize this ecosystem effectively, the platform requires a structured, intelligent system that understands not just the metrics—but the underlying data architecture that supports them.

The Smart Searchable Signal Registry (SSSR) is that system. It is a structured registry of all available table columns, powering the entire data flow of ZAYAZ. Acting as the semantic backbone of the platform, SSSR enables dynamic form generation, automated extrapolation, advanced analytics, AI-driven reasoning, and traceable reporting logic. It ensures that every column-level signal can be discovered, routed, processed, and validated consistently across the platform.

What Is a Signal? A signal is a column-level data definition—not a metric itself, but a data source that can feed one or more metrics, calculations, extrapolations, visualizations, or decision models.

The Signal Registry documents every available column across normalized tables, APIs, and JSON structures. It defines the context, type, and operational behavior of that field—allowing ZAYAZ engines to treat data not as isolated values, but as intelligent, traceable building blocks.

Each signal includes metadata that tells the system:

  • How to reference it
  • How to validate it
  • Whether it can be extrapolated or inferred
  • Which engines or workflows depend on it
  • How it connects to other signals or data structures

Why a Signal Registry? The registry is designed to serve developers, integrators, validators, and automation engines. It answers critical questions such as:

  • Where does this field live?
  • What kind of data is it?
  • How should it be processed or validated?
  • What systems depend on this column?
  • Can it be extrapolated, inferred, or manually entered?

By centralizing these definitions, SSSR eliminates redundancy, reduces hardcoding, and supports data-driven orchestration across the ZAYAZ platform.

1.1. Strategic Benefits

Smart Search & Discovery: Search across thousands of table columns using keyword matching, semantic tagging, AI embeddings, and hierarchical filters (e.g., by scope, sector, or signal type).

Automated Routing & Processing: Signals automatically route to the correct processing engines (e.g., SEM, FOGE, DAIM, RIF) based on their metadata—eliminating hardcoded logic paths.

Scalable Form & Workflow Generation: Forms, dashboards, validation flows, and extrapolation triggers are all dynamically generated based on registry mappings.

Framework Interoperability: A single signal may be referenced by multiple metrics and frameworks, and interpreted differently based on sector, geography, or reporting level.

Auditability & Traceability: Every signal has a full metadata profile—capturing input origin, update logic, processing history, fallback rules, and AI usage—ensuring transparency and regulatory compliance.

Core Use Cases

  • Developers use the registry to generate and bind dynamic form fields via FOGE.
  • Compliance teams use it to align inputs with ESRS, CSRD, and GRI logic.
  • AI/ML engines use it to locate valid inputs for extrapolation, classification, or prediction.
  • Extrapolation modules (e.g., SEM) query it for fallback hierarchies and sufficiency logic.
  • Auditors use it to trace how values were calculated, inferred, or adjusted.

1.2. Position in the ZAYAZ Architecture

The Signal Registry integrates with and supports all major computation and reporting modules:

  • FOGE – Form Generator Engine
  • SEM – Smart Extrapolation Module
  • RIF – Risk Intelligence Framework
  • DAIM – Dynamic Actionable Insights Module
  • Validator & DaVE – Data Validation & Trust Engines

It functions as the semantic control plane of ZAYAZ—turning raw column definitions into dynamic, intelligent system behaviors.

The Signal Registry Table (signal_registry). Simplified; see the full signal_registry table for all columns:

Column – Description

  • signal_id – Unique internal ID (UUID or slug): sssr:source_table.column_reference
  • signal_name – Canonical name (e.g., hazard_level)
  • signal_type – Data type and shape (e.g., Integer (1–4), Decimal, JSON, Boolean)
  • signal_description – Human-readable explanation of what this column represents
  • source_table – Fully qualified table name or object path (e.g., chemical_registry, product.hazards[]). Dropdown: Hub > Subsystem > Table (e.g., ECO-Number > SupplierPlatform > suppliers_eco)
  • column_reference – Name of the exact column/field (e.g., hazard_level)
  • access_path – Fully resolved field path for JSON / nested objects
  • schema_link – Direct link to DB schema or API spec
  • source_format – SQL, JSON, API, CSV, Derived, etc.
  • source_schema – Optional: schema or namespace if a multi-schema setup exists
  • signal_domain – One of the 6 USO domains; used to enable Level 4 routing. Values: ["ENV", "SOC", "GOV", "EMO", "CORE", "REF"]
  • availability_status – active, deprecated, pending, future, etc.
  • fallback_logic – If this column is unavailable, what should be used instead (e.g., use hazard_code_category)
  • data_hierarchy – Optional: whether this field is region-, sector-, or time-hierarchical
  • linked_metrics – List of metrics that reference this column (for reverse lookup)
  • signal_handler – Which source handles it (Dropdown: SIS > Engine > Micro Engine (e.g., SEM > Hierarchy Resolver))
  • ai_suggestion_type – If AI can operate here: extrapolation, tagging, inference
  • validation_required – Boolean: should this signal be included in data validation rules
  • dependencies – If this field is derived: list of signal_ids used to compute it
  • created_at – Timestamp of first registration
  • updated_at – Timestamp of last modification
  • update_frequency – Expected update frequency
  • notes – Freeform field for documentation / technical remarks
  • sample_values – Array of values to illustrate what's stored
  • docs_link – Link to internal documentation or wiki
  • xbrl_binding – If tied to structured reporting outputs
  • form_binding_flags – JSON for FOGE compatibility (e.g., required, pre-fillable, read-only)
  • visibility_scope* – Access control for USO lists (dropdown or parsed) (Admin, NGO, User)
  • preferred_chart_type – Suggests default visual for this signal in isolation (e.g., "line", "bar", "choropleth", "donut", "heatmap"). Not binding; can be overridden in the reporting table.
  • default_granularity – Indicates aggregation level (e.g., daily, monthly, yearly)
  • value_distribution_type – Describes data nature: categorical, continuous, percentage, binary, etc. Useful for auto-visualization.
  • unit_of_measure – For y-axis scaling logic
  • signal_role – e.g., "KPI", "benchmark", "flag", "input", etc.
  • allow_visualization – Boolean toggle for whether this signal should be visualized
  • hierarchical_axis_grouping – Allows auto-drilldown (e.g., Country → Region → Facility)
  • agent_profile_id – Enables dynamic triggering of agents based on signal issues, scope, or missing data

Note: *visibility_scope values:

  • admin – Only backend admins, analysts
  • user – Business users, form builders
  • system – Internal engines only (e.g., DaVE, SEM, Validators)
  • public – Exposed to external clients or portals

2. AI-Signal Auto-Resolver

The AI-Signal Auto-Resolver is a small AI assistant module that:

  • Intercepts unresolved signal references (e.g., "regional_water_index"),
  • Queries the Signal Registry,
  • Attempts resolution, and
  • If not found, proposes a new entry and routes it for admin approval or adds it in draft mode.

What it does:

Task – Example

  • Detect – Catches an unregistered signal name in a formula
  • Match – Suggests similar signal names from the registry
  • Propose – Uses an LLM to generate a smart draft signal entry
  • Learn – Stores confirmation logic for future use

How to build it:

  1. Hook into the validation layer of FOGE, SEM, or any formula parser.
  2. When a signal isn't found:
  • Run a fuzzy match (pg_trgm, FAISS, or GPT-based).
  • Prompt the user or admin: "This signal is unknown. Did you mean…?"
  3. If accepted, write a new entry:
  • status = draft
  • created_by = auto_resolver
  • Include LLM-inferred metadata (source table guess, description, type).
  4. Add to the admin dashboard for review or batch approval.
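The fuzzy-match step above can be sketched with Python's stdlib difflib as a stand-in for pg_trgm, FAISS, or embedding similarity. This is a minimal sketch, not the production resolver: the `registered_signals` list and the 0.6 cutoff are illustrative assumptions.

```python
import difflib

# Illustrative stand-in for the real Signal Registry contents.
registered_signals = [
    "regional_water_stress_index",
    "hazard_level",
    "scope3_emissions_total",
]

def match_candidates(unknown_name, registry, cutoff=0.6):
    """Return up to three ranked close matches for an unresolved signal name."""
    return difflib.get_close_matches(unknown_name, registry, n=3, cutoff=cutoff)

def resolve(unknown_name, registry):
    matches = match_candidates(unknown_name, registry)
    if matches:
        # Step 2: prompt "Did you mean…?" with ranked candidates.
        return {"status": "suggest", "candidates": matches}
    # Step 3: no match, so propose a draft entry for admin review.
    return {
        "status": "proposed",
        "draft": {
            "signal_name": unknown_name,
            "status": "draft",
            "created_by": "auto_resolver",
        },
    }
```

In production the same flow would rank matches by trigram or vector similarity and attach LLM-inferred metadata to the draft before routing it to the admin dashboard.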

We will use GPT to power this logic in early versions. Just send the formula and ask for signal resolution suggestions from a stored dictionary of registered signals.

3. Smart Signal Enrichment: Scaling SSSR with Intelligence

<!-- ZAYAZ-TODO: STRUCTURE | Must be upgraded to an OpenAI Codex App description -->

An OpenAI Codex App is being set up to do this work, so this chapter might become obsolete or be upgraded.

3.1. Populating the Registry at Scale

Populating the Smart Searchable Signal Registry (SSSR) with 3,000–4,000 signals may appear overwhelming, but ZAYAZ uses a modular, intelligent enrichment strategy to make this scalable, auditable, and AI-accelerated.

Rather than relying on manual tagging or one-off scripts, we implement a 6-Stage Smart Population Strategy, supported by a powerful technique: the prompt-ready CSV format for batch LLM suggestions.

This allows us to:

  • Automate 70–90% of metadata enrichment using pattern logic and join inference
  • Inject contextual, domain-aware suggestions from trusted LLMs (e.g., ChatGPT, GPT-4o, ZAYAZ Architect GPT)
  • Maintain full editorial control through QA dashboards or spreadsheet reviews
  • Future-proof the registry by making all enrichments traceable and upgradeable

Workflow Overview

  1. Extract blank or incomplete fields from SSSR
  2. Generate prompt-ready rows with instructions tailored to each field
  3. Feed rows into GPT for batch enrichment
  4. Review suggestions and re-import or stage them
  5. Log changes and track enrichment status per field
  6. Repeat as new fields or signals are added

This approach allows ZAYAZ to build and maintain the “signal brain” with rigor, speed, and minimal overhead — while preserving flexibility to align with ESRS, NACE, CBAM, trust models, and multi-role governance logic.

3.2. 6-Stage Smart Population Strategy

  1. Leverage Existing Metadata (Join-Based Enrichment)
  • Auto-fill: metric_category, signal_type, unit_of_measure, source_table, source_id, nace_relevance
  • Method: SQL joins with:
    • esrs_metrics_list
    • nace-codes
    • countries.xlsx for geo-context
    • IPCC-EFDB for emission types
  • Benefit: ~40–50% of rows get filled by logic-based enrichment.
  2. Rule-Based Assignment (Field Patterns & Naming)
  • If signal_name contains "emissions" → metric_category = Environmental, signal_type = numeric_energy_input
  • If column_reference ends in _id or _code → input_type = choice, ai_assist_allowed = false
  • Use regex + pattern matchers to auto-fill 20–25% more.
  3. Heuristic-Based NACE Relevance
  • Use sector keywords in source_table, column_reference, and linked_metrics
  • Cross-match with nace_applicability_map's applicability_flags (e.g., "CBAM_covered")
  • Output: nace_relevance + trust_config_id inference
  4. Profile Inheritance
  • Signals from the same table/module often share:
    • agent_profile_id
    • role_scope
    • prompt_enrichment_refs
  • Inference: group-by logic and inheritance filling
  5. Fallback Models (Lightweight ML or LLM)
  • Use OpenAI or an internal LLM to batch-process missing descriptions, ai_suggestion_types, materiality_tags
  • Example prompt:
"Suggest ai_suggestion_type, input_type, and agent_profile for signal_name:
'total_scopes_emissions', unit: tCO2e, role_scope: supplier"
  6. Review via Smart QA Dashboard
  • Show rows with:
    • conflicting auto-generated vs. manual tags
    • unresolved agent_profile_id
    • ambiguous nace_relevance
  • Use dashboards with filters: ai_assist_allowed = null, trust_config_id = null
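Stage 4 (Profile Inheritance) can be sketched as a group-by fill: missing agent_profile_id values inherit the most common value within the same source table. The rows and field names below are illustrative stand-ins for real SSSR records.

```python
from collections import Counter

# Illustrative SSSR rows; only the two fields relevant to this stage.
rows = [
    {"source_table": "suppliers_eco", "agent_profile_id": "form_assistant"},
    {"source_table": "suppliers_eco", "agent_profile_id": None},
    {"source_table": "chemical_registry", "agent_profile_id": "default_signal_resolver"},
]

def inherit_profiles(rows):
    """Fill missing agent_profile_id by inheriting the most common value per table."""
    by_table = {}
    for r in rows:
        if r["agent_profile_id"]:
            by_table.setdefault(r["source_table"], Counter())[r["agent_profile_id"]] += 1
    for r in rows:
        if not r["agent_profile_id"] and r["source_table"] in by_table:
            r["agent_profile_id"] = by_table[r["source_table"]].most_common(1)[0][0]
    return rows
```

The same group-by pattern extends to role_scope and prompt_enrichment_refs.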

3.3. Prompt-Ready CSV Format for Batch LLM Suggestions

The prompt-ready CSV format is a lightweight but powerful way to scale up signal enrichment using GPT or any LLM, without building complex pipelines. It treats the LLM as an on-demand assistant that suggests missing values for structured ESG metadata.

What It Is A CSV file where each row is:

  • A single signal (from the SSSR registry)
  • With known fields filled in…
  • …and missing/enrichable fields left blank
  • Plus a “prompt_instruction” column that tells the LLM what to infer

We then:

  1. Feed the CSV to GPT in chunks
  2. Let GPT fill in the blanks
  3. Re-import the enriched CSV into the system
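The first step, extracting signals with blank fields and attaching a prompt_instruction column, might look like this sketch. The target-field list and prompt template are assumptions; the real pipeline would pull from the SSSR database.

```python
import csv
import io

# Illustrative registry rows with one enrichable field left blank.
signals = [
    {"signal_id": "sssr:e1-1", "signal_name": "total_energy_kwh",
     "unit_of_measure": "kWh", "signal_type": ""},
    {"signal_id": "sssr:g1-3", "signal_name": "anti_corruption_policy",
     "unit_of_measure": "", "signal_type": "binary_policy"},
]

TARGET_FIELDS = ["signal_type", "unit_of_measure"]  # fields the LLM should infer

def to_prompt_ready_csv(signals):
    """Write only rows with blanks, appending a tailored prompt_instruction column."""
    buf = io.StringIO()
    fields = list(signals[0].keys()) + ["prompt_instruction"]
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for s in signals:
        missing = [f for f in TARGET_FIELDS if not s[f]]
        if missing:
            row = dict(s)
            row["prompt_instruction"] = (
                "Suggest " + " and ".join(missing) + " for this signal."
            )
            writer.writerow(row)
    return buf.getvalue()
```

The resulting CSV is then fed to the LLM in chunks, and the enriched output re-imported.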

Simplified Sample Format

signal_id | signal_name | unit_of_measure | signal_type | agent_profile_id | trust_config_id | ai_assist_allowed | prompt_instruction
sssr:e1-1 | total_energy_kwh | kWh | | | | | "Suggest signal_type, trust_config_id, and whether an agent should assist for this metric."
sssr:g1-3 | anti_corruption_policy | | | | | | "Suggest unit_of_measure, input_type, and agent profile ID for this policy signal."
sssr:e1-9 | scope3_emissions_total | tCO2e | | | | | "Suggest unit_of_measure, input_type, and agent profile ID for this emissions signal."

How It Works (Workflow):

  1. Extract & Generate CSV: Filter signals with missing/blank fields
  2. Feed in Batches to GPT:
  • Example input:
For the following signals, infer the missing fields based on name, unit, and
description. Output in CSV format with columns:
signal_id, signal_type, metric_category, agent_profile_id, ai_assist_allowed,
trust_config_id
  3. Review Output: You or a QA agent review rows before import
  4. Re-import: Update your SSSR registry or use for bulk pre-fill in the admin panel

3.4. How to Continue Enrichment in Smart Iterations

Strategy: Iterative Prompt-Ready Sheets

Instead of creating a new file each time, extend the existing one by:

  1. Keeping enriched fields locked (e.g., agent_profile_id, signal_type)
  2. Adding new target columns like:
  • metric_category
  • role_scope
  • materiality_tags
  • prompt_enrichment_refs
  3. Appending or regenerating prompt_instruction with new enrichment goals

Note: The LLM generates prompts based on the uploaded Excel file, extracting a few columns from it into a CSV file. Upload that CSV file and ask the LLM to populate the blank fields.

Then import the populated CSV, add five new, empty columns, and repeat the process: ask the LLM to generate prompts for the extended CSV, upload the updated file with the new prompts, and ask it to populate the empty fields. Repeat until all columns (at least the required fields) are populated. Finally, import the result back into the Excel file and convert it to a database table when complete.

3.5. Additional Tools:

Example of script to apply rule-based and join-based filling

sssr_populator.pyGitHub ↗
# sssr_populator.py
# ZAYAZ Smart Signal Populator Script
# Applies rule-based and join-based logic to enrich SSSR metadata

import pandas as pd

# === Load Source Files ===
sssr_file = "sssr_registry.csv"  # Your raw SSSR file
esrs_file = "esrs_metrics_list.csv"
nace_file = "nace_applicability_map.csv"

sssr = pd.read_csv(sssr_file)
esrs = pd.read_csv(esrs_file)
nace = pd.read_csv(nace_file)

# === Helper Functions ===
def infer_signal_type(name, column):
    name = name.lower()
    column = column.lower()
    if "emission" in name or "kwh" in column:
        return "numeric_energy_input"
    if "id" in column or "code" in column:
        return "identifier_string"
    if "policy" in name:
        return "binary_policy"
    return "reference_string"

def infer_metric_category(name):
    if any(k in name.lower() for k in ["ghg", "energy", "climate"]):
        return "Environmental"
    if any(k in name.lower() for k in ["employee", "diversity", "labor"]):
        return "Social"
    if "policy" in name.lower():
        return "Governance"
    return "Other"

# === Rule-Based Enrichment ===
sssr["signal_type"] = sssr.apply(
    lambda r: infer_signal_type(r["signal_name"], r["column_reference"]), axis=1
)
sssr["metric_category"] = sssr["signal_name"].apply(infer_metric_category)

# === Join-Based Enrichment ===
sssr = sssr.merge(
    esrs[["metric_id", "unit_of_measure"]],
    how="left", left_on="source_id", right_on="metric_id",
)

# Join NACE relevance
nace_grouped = (
    nace.groupby("source_id")["nace_code"].apply(list).reset_index(name="nace_relevance")
)
sssr = sssr.merge(nace_grouped, how="left", on="source_id")

# === Export Enriched File ===
sssr.to_csv("sssr_registry_enriched.csv", index=False)
print("✅ SSSR enrichment complete. Output: sssr_registry_enriched.csv")

3.6. Signal Enrichment Dashboard:

Admin-facing panel for batch review and correction, including AI-powered autocomplete fields. This is the solution for adding data once the Excel file has been converted into a database table.

sssr_populator.jsGitHub ↗
import React, { useState } from "react";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Table, TableHeader, TableRow, TableCell } from "@/components/ui/table";
import { Input } from "@/components/ui/input";
import { Badge } from "@/components/ui/badge";
import { Popover, PopoverTrigger, PopoverContent } from "@/components/ui/popover";
import { cn } from "@/lib/utils";

const mockSignals = [
  {
    signal_id: "sssr:e1-1",
    signal_name: "total_energy_kwh",
    unit_of_measure: "kWh",
    signal_type: "",
    metric_category: "",
    agent_profile_id: "",
    trust_config_id: "",
    status: "incomplete",
  },
  {
    signal_id: "sssr:s1-3",
    signal_name: "diversity_index",
    unit_of_measure: "%",
    signal_type: "",
    metric_category: "",
    agent_profile_id: "",
    trust_config_id: "",
    status: "incomplete",
  },
];

const aiSuggestions = {
  signal_type: ["numeric_energy_input", "binary_policy", "reference_string"],
  metric_category: ["Environmental", "Social", "Governance"],
  agent_profile_id: ["form_assistant", "geo_lookup_agent", "default_signal_resolver"],
  trust_config_id: ["trust_default", "refdata_pass_through", "manual_review_optional"],
};

function AutocompleteInput({ placeholder, suggestions, defaultValue }) {
  const [value, setValue] = useState(defaultValue || "");
  const [open, setOpen] = useState(false);

  return (
    <Popover open={open} onOpenChange={setOpen}>
      <PopoverTrigger asChild>
        <Input
          value={value}
          placeholder={placeholder}
          onChange={(e) => setValue(e.target.value)}
          onFocus={() => setOpen(true)}
        />
      </PopoverTrigger>
      <PopoverContent className="p-2 w-full">
        {suggestions
          .filter((s) => s.toLowerCase().includes(value.toLowerCase()))
          .map((s, idx) => (
            <div
              key={idx}
              className="cursor-pointer p-1 hover:bg-gray-100 rounded"
              onClick={() => {
                setValue(s);
                setOpen(false);
              }}
            >
              {s}
            </div>
          ))}
      </PopoverContent>
    </Popover>
  );
}

export default function SignalEnrichmentDashboard() {
  return (
    <div className="p-6 grid gap-6">
      <h1 className="text-2xl font-bold">SSSR Signal Enrichment Dashboard</h1>
      <Card className="shadow-md">
        <CardContent className="p-4">
          <Table>
            <TableHeader>
              <TableRow>
                <TableCell>ID</TableCell>
                <TableCell>Name</TableCell>
                <TableCell>Unit</TableCell>
                <TableCell>Signal Type</TableCell>
                <TableCell>Metric Category</TableCell>
                <TableCell>Agent Profile</TableCell>
                <TableCell>Trust Config</TableCell>
                <TableCell>Status</TableCell>
                <TableCell>Actions</TableCell>
              </TableRow>
            </TableHeader>
            {mockSignals.map((s, i) => (
              <TableRow key={i} className="items-center">
                <TableCell>{s.signal_id}</TableCell>
                <TableCell>{s.signal_name}</TableCell>
                <TableCell>{s.unit_of_measure}</TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., numeric_energy_input"
                    suggestions={aiSuggestions.signal_type}
                    defaultValue={s.signal_type}
                  />
                </TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., Environmental"
                    suggestions={aiSuggestions.metric_category}
                    defaultValue={s.metric_category}
                  />
                </TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., form_assistant"
                    suggestions={aiSuggestions.agent_profile_id}
                    defaultValue={s.agent_profile_id}
                  />
                </TableCell>
                <TableCell>
                  <AutocompleteInput
                    placeholder="e.g., trust_default"
                    suggestions={aiSuggestions.trust_config_id}
                    defaultValue={s.trust_config_id}
                  />
                </TableCell>
                <TableCell><Badge>{s.status}</Badge></TableCell>
                <TableCell><Button variant="secondary">Save</Button></TableCell>
              </TableRow>
            ))}
          </Table>
        </CardContent>
      </Card>
    </div>
  );
}

Optimal number of rows that can be processed per batch:

  • ChatGPT UI: 5–8 (interactive, best for QA rounds)
  • OpenAI API (scripted): 10–20 (streamlined, token-limited)
  • Fine-tuned batch model: 50–100 (structured input/output required)
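Whichever batch size applies, splitting rows into fixed-size batches before sending them to the LLM is straightforward; a minimal sketch:

```python
def batches(rows, size):
    """Yield successive fixed-size batches from a list of CSV rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# e.g. 23 rows at API batch size 10 → batches of 10, 10, and 3.
```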


4. AI-Signal Auto-Resolver

4.1. Implementation Spec & Flow Diagram

The AI-Signal Auto-Resolver is a backend module that intercepts unresolved signal references in formulas, extrapolation models, or dynamic form generation. It uses fuzzy logic and AI (e.g., GPT-4-turbo) to match against the Signal Registry, propose new signals if needed, and route them for approval or insertion in draft mode.

This feature reduces developer friction, prevents broken references, and creates a human-AI loop for expanding ZAYAZ’s signal intelligence.

Functional Goals

  • Detect missing signal references during validation or compilation.
  • Match unresolved names against registered signals.
  • Suggest best matches or propose a new signal with inferred metadata.
  • Log all resolver events for traceability and learning.
  • Route proposed signals to admin dashboard for approval.

4.2. System Components

  1. Signal Resolver Core
  • Triggered by any module parsing a formula, rule, or form field.
  • Intercepts unknown signal_name.
  • Queries the Signal Registry for best match candidates.
  • Ranks matches using vector similarity (e.g., pg_trgm, embeddings, or GPT).
  2. AI Suggestion Layer (Optional First Phase via GPT-4-turbo)
  • If there is no high-confidence match, ask GPT-4 to infer the signal's purpose.
  • Generate proposed metadata:
    • Name
    • Description
    • Data type
    • Fallback logic (if the pattern is clear)
    • Draft source info (e.g., "Likely from emissions_model")
  3. Resolver Log Table (signal_resolver_log):

  • resolver_trace_id (UUID) – Unique trace for this resolution event
  • unknown_signal_name (TEXT) – Signal that failed to resolve
  • matched_signal_ids (JSON[]) – Ranked suggestions from the registry
  • suggestion_confidence_score (FLOAT[]) – Optional: match confidence per suggestion
  • selected_signal_id (TEXT) – If the user accepted a match
  • new_signal_proposal (JSON) – Proposed metadata for a new signal
  • gpt_input_context (TEXT) – Input prompt sent to the LLM (optional logging)
  • gpt_response_text (TEXT) – The LLM's raw output (optional logging)
  • created_by (TEXT) – E.g., auto_resolver, admin_manual, etc.
  • status (TEXT) – matched, proposed, approved, dismissed
  • created_at (TIMESTAMP) – Resolution timestamp

This log is the learning engine, trust layer, and governance audit trail for AI-generated or fuzzy-matched signals.
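The log table above can be sketched as DDL, exercised here against an in-memory SQLite database for illustration; production would use PostgreSQL with native UUID and JSONB types.

```python
import sqlite3

# Sketch of signal_resolver_log; SQLite TEXT stands in for UUID/JSONB columns.
DDL = """
CREATE TABLE signal_resolver_log (
    resolver_trace_id TEXT PRIMARY KEY,        -- UUID in PostgreSQL
    unknown_signal_name TEXT NOT NULL,
    matched_signal_ids TEXT,                   -- JSON array of ranked suggestions
    suggestion_confidence_score TEXT,          -- JSON array of floats
    selected_signal_id TEXT,
    new_signal_proposal TEXT,                  -- JSON proposal for a new signal
    gpt_input_context TEXT,
    gpt_response_text TEXT,
    created_by TEXT,
    status TEXT CHECK (status IN ('matched', 'proposed', 'approved', 'dismissed')),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(
    "INSERT INTO signal_resolver_log "
    "(resolver_trace_id, unknown_signal_name, status, created_by) "
    "VALUES (?, ?, ?, ?)",
    ("trace-0001", "regional_water_index", "proposed", "auto_resolver"),
)
```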

  4. Admin Review UI
  • Admin dashboard showing pending signal proposals.
  • Options to edit, approve, reject, or promote to a full signal.
  5. signal_id
  • All signal_ids follow the same format:
    • Start with "sssr:", then the table name (source_table), a "." separator, and finally the column name (column_reference), e.g., "sssr:signal_resolver_log.resolver_trace_id".
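The format can be captured in a one-line helper:

```python
def make_signal_id(source_table, column_reference):
    """Build a signal_id in the sssr:source_table.column_reference format."""
    return f"sssr:{source_table}.{column_reference}"

# e.g. make_signal_id("signal_resolver_log", "resolver_trace_id")
#      → "sssr:signal_resolver_log.resolver_trace_id"
```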

Other:

  • Learning model from accepted proposals to improve suggestions.
  • Auto-suggest schema (e.g., likely source_table and column_reference).
  • UI-integrated DocSearch fallback when confidence is low.
  • Automatic notification system for unreviewed proposed signals.

4.3. System Flow Diagram

System Flow Diagram (PDF)

Link to the Whimsical file for editing...


5. How to Use the Smart Searchable Signal Registry

A practical guide to integrating, querying, and working with column-level signals in ZAYAZ

  1. Understand the Role of the Registry
  • Before using the registry, clarify this:
    • This is not a list of metrics.
    • It is a registry of columns across ZAYAZ's tables, APIs, and JSON structures.
    • Each entry tells the system:
      • Where a column is
      • What type of data it holds
      • How it should be validated or processed
      • Whether it can be extrapolated, inferred, or visualized
  2. Register a New Signal
  • When adding a new column to any dataset (e.g., a new emissions table or compliance form), you must also register it as a signal.

Example

Field – Value

  • signal_name – hazard_level
  • signal_type – Integer (1–4)
  • signal_description – "Hazard level derived from H-codes (1 = most hazardous, 4 = least)"
  • source_table – chemical_registry
  • column_reference – hazard_level
  • created_at – 2025-05-13T10:22:00Z

Note: This is just an excerpt of the columns, but enough to get it registered. Several other fields must also be filled in. Completely new tables must be registered in the table_registry.

  3. Link from one table to a metric in a specific row in another table
link-table-to-metric.sqlGitHub ↗
[SSSR_REF: <signal_id> @ <row_id>] 
e.g. [SSSR_REF: sssr:iso_3166-1_alpha-3.notes@ICC-0004]
  4. Look Up a Signal for Use in a Form or Engine
  • Goal: You want to find the canonical source for a data point like “GHG Scope 3 – Transport”.

Step-by-Step:

look-up-signal.sqlGitHub ↗
SELECT * FROM signal_registry
WHERE signal_name ILIKE '%ghg%' AND signal_description ILIKE '%transport%';

This returns a set of candidates with their:

  • Source table and column
  • Processing engine (e.g., SEM)
  • Trust profile
  • Data format

Now the form engine (FOGE) or extrapolation engine (SEM) knows exactly what to use and how to process it.

  5. Use the Signal in a Form Generator (FOGE). When FOGE builds forms dynamically, it queries the Signal Registry like this:
query-signal-registry.sqlGitHub ↗
SELECT signal_name, signal_type, signal_description, column_reference
FROM signal_registry
WHERE processing_engine = 'FOGE'
AND availability_status = 'active'
AND validation_required = true;

This ensures that only active, validatable columns are shown to users—and they’re rendered with the correct data types and descriptions.

  6. Enable Fallback Logic in SEM. If data is missing and extrapolation_allowed = true, the Smart Extrapolation Module uses:
  • fallback_logic (JSON or signal IDs)
  • data_hierarchy (e.g., NACE, Country)

Example Fallback:

example-fallback.jsonGitHub ↗
{
"fallback_path": ["scope3_transport_company", "scope3_transport_sector_avg", "scope3_transport_continent"]
}

SEM traverses the hierarchy and uses the registry to identify which column to extrapolate from.
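SEM's traversal of such a fallback path can be sketched as a first-available lookup; the `available` mapping is a stand-in for a real data-availability query against the registry.

```python
# fallback_path mirrors the example-fallback.json above.
fallback_path = [
    "scope3_transport_company",
    "scope3_transport_sector_avg",
    "scope3_transport_continent",
]

def resolve_fallback(path, available):
    """Return the first signal in the fallback path that has data available."""
    for signal_name in path:
        if signal_name in available:
            return signal_name
    return None  # no level of the hierarchy has data
```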

  7. Integrate with AI or Validation Engines
  • DAIM uses ai_suggestion_type to know which signals are AI-supported.
  • RIF / DaVE use trust_profile to classify signals into risk or validation buckets.
  • Validators look at validation_required and processing_engine to check how to run cross-checks.

  8. Trace and Debug Signal Usage. To understand how a value was derived or processed:

trace-signal-usage.sqlGitHub ↗
SELECT * FROM signal_registry
WHERE signal_name = 'hazard_level';

Check:

  • dependencies — is this signal calculated from others?
  • fallback_logic — was this a backup column?
  • updated_at — was this signal recently modified?

Use this data to generate audit logs, XBRL tags, or trust dashboards.

  9. Best Practices for Working with the Registry
  • Always register new columns at the time of table creation or schema change.
  • Use consistent naming conventions for signal_name and column_reference.
  • Deprecate unused signals instead of deleting them (use availability_status = 'deprecated').
  • Document every fallback_logic and processing_engine with clarity.
  • Avoid hardcoded logic in engines—always route through the registry.

Developer Tools (Recommended Setup)

  • Store in PostgreSQL, using JSONB for flexible fields like fallback_logic and dependencies.
  • Create an admin UI (e.g., via React + Material UI) for adding/editing/viewing registry entries.
  • Support REST or GraphQL API for registry queries across modules.
  • Use a search index (like pg_trgm or Meilisearch) for smart search integration.
  • Maintain changelog/version tracking (auto-generated diffs or manual commits).

6. Developer Tools Specification

API Wrapper, Visual Mapper UI, and Signal Impact Explorer

6.1. Signal Registry API Wrapper

Purpose: Expose the Signal Registry to all modules and external tools in a secure, queryable form. The API enables systems like FOGE, SEM, DAIM, validators, and the admin UI to query or update the registry consistently.

REST Endpoints (Suggested)

GET /signals

  • Query Params: search, filter, processing_engine, availability_status, data_hierarchy, validation_required
  • Returns: Array of signal registry entries with metadata

GET /signals/:id

  • Returns full metadata for a single signal

POST /signals

  • Inserts a new signal (draft or active)
  • Required fields: signal_name, column_reference, source_table, signal_type

PUT /signals/:id

  • Updates metadata fields (admin-only fields: availability_status, processing_engine, etc.)

GET /signals/search

  • Full-text search with synonyms, typos, and semantic scoring
  • Optional: return ranked matches using vector search

GET /signals/dependencies/:id

  • Returns a dependency tree (what this signal relies on or is used in)
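Client-side, a query against GET /signals might be assembled like this sketch; the base URL is hypothetical, and the parameter names follow the list above.

```python
from urllib.parse import urlencode

def signals_url(base, **params):
    """Build a GET /signals URL from the query parameters listed above."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}/signals?{query}" if query else f"{base}/signals"
```

For example, `signals_url("https://zayaz.example/api", processing_engine="SEM", availability_status="active")` produces a filtered registry query that FOGE, SEM, or the admin UI could issue.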

GraphQL Schema (Sample)

graphql-schema-sample.sqlGitHub ↗
type Signal {
  id: ID!
  name: String!
  description: String
  column_reference: String
  source_table: String
  linked_metrics: [String]
  dependencies: [Signal]
  processing_engine: String
  created_at: DateTime
  updated_at: DateTime
}

schema-sample.graphqlGitHub ↗
query {
  signals(filter: SignalFilter): [Signal]
  signal(id: ID!): Signal
  searchSignals(text: String!): [Signal]
}

Security & Governance

  • JWT-authenticated access
  • Role-based permissions: read, write, admin, reviewer
  • Audit logging for all writes/updates

6.2. Visual Signal Mapper UI

Purpose: Provides a web-based, visual interface to:

  • Understand how signals relate to metrics and engines
  • Browse or debug signal relationships and lineage
  • Enable non-technical stakeholders to explore signal usage

Core Features

  • Signal Graph View:
    • Nodes: Signal, Metric, Engine
    • Edges: feeds, depends_on, processed_by
  • Search Bar + Filters:
    • Find signals by name, engine, or metric tag
  • Detail Drawer on Click:
    • Show full metadata, source table, description, dependencies
  • Export to CSV / JSON / SVG
  • Highlight by Engine: Toggle visibility by processing_engine

Example Use Case: "Where is hazard_level used, and which engines depend on it?" The graph shows:

  • hazard_level → metric: Worker Exposure Risk
  • hazard_level → SEM
  • hazard_level → validator

Stack Recommendation

  • Frontend: Vue3 or React.js + D3.js or Cytoscape.js
  • Backend: Use Signal Registry API Wrapper
  • Optional: Neo4j for relationship queries and graph rendering

6.3. Signal Impact Explorer

Purpose: Diagnose what breaks if a signal becomes unavailable, fails validation, or is deprecated.

Inputs

  • signal_id
  • Current status or trust_score

Outputs

  • Direct Dependencies: Metrics or calculations directly using the signal
  • Engine Triggers: FOGE, SEM, DAIM, validators using this signal
  • Cascade Path: Signals that rely on this signal via dependencies

View Modes

  • Graph View: Similar to Visual Signal Mapper, but filtered to active impact path
  • Table View:
    • Column: Downstream Item
    • Column: Type (Metric, Engine, Signal)
    • Column: Impact (e.g., "validation blocked", "metric fallback triggered")

Trigger Use Cases

  • When marking a signal as deprecated
  • When a validator detects low confidence or mismatch
  • When updating formulas in FOGE and checking for ripple effects

Summary: These three tools form the operational and visual intelligence layer of the ZAYAZ Signal Registry:

  • The API Wrapper enables modular system integration.
  • The Visual Signal Mapper makes complexity navigable.
  • The Impact Explorer makes governance scalable and risk-aware.
Together, they are the keys to scaling your signal-based architecture without compromising clarity, traceability, or control.

7. GraphQL API Wrapper for Signal Registry

This provides the validation layer's structured access to signal data (uso_table), enabling creation, update, and intelligent search.

7.1. Types

  1. Query Types
  • signals(filter: SignalFilter): [Signal]
    • Fetch multiple signals with advanced filtering.
  • signal(id: ID!): Signal
    • Fetch a single signal by its unique ID.
  • searchSignals(text: String!): [Signal]
    • Text-based search for autocomplete, resolvers, etc.
  2. SignalFilter Input Type

signalfilter-input-types.graphql

```graphql
input SignalFilter {
  signal_domain: [String]        # Must match uso_table.level_4_code (e.g., ENV, SOC)
  level_0_code: [String]         # → uso_table.level_0_code (e.g., ZIH, ZCH)
  level_1_code: [String]         # → uso_table.level_1_code (e.g., AIA, FOGE)
  level_2_code: [String]         # → uso_table.level_2_code (e.g., NLP, TRFM)
  level_3_code: [String]         # → uso_table.level_3_code (e.g., GHG, WST)
  signal_name_contains: String
  processing_engine: [String]    # Must match MICE or defined engine list
  ai_suggestion_type: [String]
  availability_status: [String]  # One of: active, draft, deprecated, orphaned
  extrapolation_allowed: Boolean
  validation_required: Boolean
}
```

Supports dropdown filtering, AI targeting, and resolver lookups.
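For example, a dropdown-driven UI might issue a query like the following (the filter values and selected fields are illustrative only):

```graphql
query ActiveEnvSignals {
  signals(filter: {
    signal_domain: ["ENV"]
    processing_engine: ["SEM"]
    availability_status: ["active"]
    extrapolation_allowed: true
  }) {
    id
    signal_name
    column_reference
    linked_metrics
  }
}
```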

  3. Signal Type Definition

signal-type-definition.graphql

```graphql
type Signal {
  id: ID!
  signal_name: String!
  signal_description: String
  column_reference: String
  source_table: String
  signal_domain: String  # Must match uso_table.level_4_code (e.g., ENV, SOC)
  level_0_code: String   # → uso_table.level_0_code (e.g., ZIH, ZCH)
  level_1_code: String   # → uso_table.level_1_code (e.g., AIA, FOGE)
  level_2_code: String   # → uso_table.level_2_code (e.g., NLP, TRFM)
  level_3_code: String   # → uso_table.level_3_code (e.g., GHG, WST)
  level_4_code: String   # Optional, if domains are mapped
  preferred_input_source: String
  data_hierarchy: String
  extrapolation_allowed: Boolean
  validation_required: Boolean
  ai_suggestion_type: String
  processing_engine: String
  fallback_logic: String
  trust_profile: String
  visualization_type: String
  dependencies: [Signal]  # Recursive
  linked_metrics: [String]
  availability_status: String
  created_at: DateTime
  updated_at: DateTime
}
```
  4. Mutation Types
  • addSignal(input: AddSignalInput!): Signal

mutation-types.graphql

```graphql
input AddSignalInput {
  signal_name: String!
  signal_description: String
  column_reference: String!
  source_table: String!
  signal_domain: String!
  level_0_code: String!
  level_1_code: String!
  level_2_code: String!
  level_3_code: String!
  level_4_code: String
  processing_engine: String
  linked_metrics: [String]
  dependencies: [ID]
  availability_status: String = "draft"
}
```
  • updateSignal(id: ID!, input: UpdateSignalInput!): Signal
    • Partial updates for corrections, approvals, etc.
  • proposeSignalFromTrace(trace_id: String!): SignalProposal
    • Optional — connect to telemetry/resolver systems.
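A hypothetical addSignal call might look like this (the signal name and path values are examples only; availability_status falls back to its "draft" default):

```graphql
mutation {
  addSignal(input: {
    signal_name: "scope1_emissions"
    column_reference: "eco_number_db.scope1_emissions"
    source_table: "eco_number_db"
    signal_domain: "ENV"
    level_0_code: "ZIH"
    level_1_code: "AIA"
    level_2_code: "NLP"
    level_3_code: "GHG"
  }) {
    id
    availability_status
  }
}
```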

Alternatively, use typed enums when level codes are known and stable, or sync them into build-time enums (example):

```graphql
enum Level0Code {
  ZIH
  ZCH
  ZRH
  SIS
  EXT
}
```
  5. Use Case Hooks

| Module | How it uses this API |
|---|---|
| FOGE | Loads filtered signals by level/module/domain |
| SEM | Queries fallback-eligible signals |
| Signal Resolver | Suggests new entries via mutation |
| Admin UI | Lists signals by status/domain/engine |
| Validator | Cross-checks validation_required fields |

7.2. Foreign Key Mapping Between the signal_registry and uso_table

  1. PostgreSQL Schema Snippet

Assumptions

  • A normalized uso_table exists that includes all levels: level_0_code, level_1_code, …, level_4_code
  • signal_registry uses these as foreign keys (single-row linkage per level)

uso_table Structure

uso-table.sql

```sql
CREATE TABLE uso_table (
  id SERIAL PRIMARY KEY,
  level_0_code TEXT NOT NULL,
  level_1_code TEXT,
  level_2_code TEXT,
  level_3_code TEXT,
  level_4_code TEXT,
  uso_path TEXT,  -- e.g., "ZIH-AIA-NLP-GHG-ENV"
  level_3_name TEXT,
  level_4_name TEXT,
  ai_tag_context TEXT,
  UNIQUE (level_0_code, level_1_code, level_2_code, level_3_code, level_4_code)
);
```

signal_registry Table with Foreign Keys

sssr-fk.sql

```sql
CREATE TABLE signal_registry (
  signal_id UUID PRIMARY KEY,
  signal_name TEXT NOT NULL,
  column_reference TEXT,
  source_table TEXT,
  level_0_code TEXT NOT NULL,
  level_1_code TEXT,
  level_2_code TEXT,
  level_3_code TEXT,
  level_4_code TEXT,
  signal_domain TEXT,  -- Redundant with level_4_code but may exist for UI clarity

  -- FK constraint: soft link on normalized hierarchy
  FOREIGN KEY (level_0_code, level_1_code, level_2_code, level_3_code, level_4_code)
    REFERENCES uso_table (level_0_code, level_1_code, level_2_code, level_3_code, level_4_code)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);
```
  • Ensures only USO-valid signal paths can be created
  • No duplication of path logic in the registry
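For illustration, a conforming insert might look like this (the signal name and path values are hypothetical; gen_random_uuid() is built into PostgreSQL 13+):

```sql
-- Hypothetical insert; succeeds only if this exact five-level path
-- already exists in uso_table.
INSERT INTO signal_registry
  (signal_id, signal_name, level_0_code, level_1_code,
   level_2_code, level_3_code, level_4_code)
VALUES
  (gen_random_uuid(), 'scope1_emissions', 'ZIH', 'AIA', 'NLP', 'GHG', 'ENV');
```

One caveat worth noting: with PostgreSQL's default MATCH SIMPLE semantics, a composite foreign key is not checked when any of its columns is NULL, so partial paths (e.g., only level_0_code set) bypass validation unless MATCH FULL or additional constraints are added.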
  2. GraphQL Schema with USO Join
schema-uso-join.graphql

```graphql
type Signal {
  id: ID!
  signal_name: String!
  column_reference: String
  source_table: String

  level_0_code: String!
  level_1_code: String
  level_2_code: String
  level_3_code: String
  level_4_code: String

  uso_metadata: USOReference
}

type USOReference {
  level_0_code: String!
  level_1_code: String
  level_2_code: String
  level_3_code: String
  level_4_code: String
  level_3_name: String
  level_4_name: String
  ai_tag_context: String
  uso_path: String
}
```

Backend Resolver Join (TypeScript)

signal-resolvers.ts

```typescript
/**
 * Example GraphQL resolver for fetching USO metadata
 * based on hierarchical signal codes.
 */

type UsoRecord = {
  level_0_code: string | null;
  level_1_code: string | null;
  level_2_code: string | null;
  level_3_code: string | null;
  level_4_code: string | null;
};

type SignalParent = {
  level_0_code: string | null;
  level_1_code: string | null;
  level_2_code: string | null;
  level_3_code: string | null;
  level_4_code: string | null;
};

type ResolverContext = {
  db: {
    uso_table: {
      findFirst: (args: {
        where: {
          level_0_code: string | null;
          level_1_code: string | null;
          level_2_code: string | null;
          level_3_code: string | null;
          level_4_code: string | null;
        };
      }) => Promise<UsoRecord | null>;
    };
  };
};

const Signal: {
  uso_metadata?: (
    signal: SignalParent,
    args: Record<string, never>,
    ctx: ResolverContext
  ) => Promise<UsoRecord | null>;
} = {};

/**
 * Resolver: Signal.uso_metadata
 * Retrieves the corresponding metadata record
 * from the USO table using the signal's level codes.
 */
Signal.uso_metadata = async (
  signal: SignalParent,
  _args: Record<string, never>,
  ctx: ResolverContext
): Promise<UsoRecord | null> => {
  return ctx.db.uso_table.findFirst({
    where: {
      level_0_code: signal.level_0_code,
      level_1_code: signal.level_1_code,
      level_2_code: signal.level_2_code,
      level_3_code: signal.level_3_code,
      level_4_code: signal.level_4_code,
    },
  });
};

export { Signal };
```

8. Telemetry-Based Schema Audit

Telemetry-based schema audits are a powerful way to future-proof the ZAYAZ architecture and catch issues before they impact trust, compliance, or usability.

It’s a system that continuously monitors live data usage (e.g., which signals are invoked by which engines, under which modules) and compares that activity to the declared structure in the Signal Registry and USO table.

When it detects mismatches or anomalies, it can:

  • Log the issue
  • Suggest updates
  • Trigger warnings (e.g., in FOGE or SEM)
  • Feed insights back into AI tuning or audit dashboards

8.1. Architecture

  1. signal_telemetry_log captures runtime behavior

Every signal usage is logged:

signal-telemetry-log.json

```json
{
  "signal_id": "GHG_scope1",
  "engine_invoked": "TRFM",
  "module": "SEM",
  "timestamp": "2025-05-14T18:20:12Z",
  "used_in_path": "ZCH-SEM-CALC-GHG-ENV",
  "trace_id": "abc123"
}
```
  2. Audit Engine compares usage to registry

A scheduled audit compares each telemetry event against:

  • level_3_dependency
  • signal_domain
  • processing_engine
  • availability_status

If something is undeclared, deprecated, or misaligned, flag it.
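The comparison step can be sketched as follows for a single telemetry event; the record shapes and the `declared_engines` field are assumptions based on the log examples in this section:

```typescript
// Sketch: check one telemetry event against its declared registry entry.
// Shapes are illustrative, not the platform's actual types.

interface TelemetryEvent {
  signal_id: string;
  engine_invoked: string;
}

interface RegistryEntry {
  signal_id: string;
  declared_engines: string[];
  availability_status: "active" | "draft" | "deprecated" | "orphaned";
}

interface AuditIssue {
  signal_id: string;
  issue_type: string;
  expected?: string[];
  actual?: string[];
}

function auditEvent(event: TelemetryEvent, entry: RegistryEntry): AuditIssue[] {
  const issues: AuditIssue[] = [];

  // Runtime engine not in declared deps → undeclared_engine_usage
  if (!entry.declared_engines.includes(event.engine_invoked)) {
    issues.push({
      signal_id: event.signal_id,
      issue_type: "undeclared_engine_usage",
      expected: entry.declared_engines,
      actual: [event.engine_invoked],
    });
  }

  // Usage of a deprecated signal → deprecated_signal_invoked
  if (entry.availability_status === "deprecated") {
    issues.push({ signal_id: event.signal_id, issue_type: "deprecated_signal_invoked" });
  }

  return issues;
}
```

Each returned issue would then be persisted to schema_audit_log for review.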

  3. schema_audit_log stores results

schema-audit-log.json

```json
{
  "signal_id": "GHG_scope1",
  "issue_type": "undeclared_engine_usage",
  "expected": ["CALC", "VALI"],
  "actual": ["TRFM"],
  "first_seen": "2025-05-14T18:20:12Z",
  "impact_score": 0.92,
  "suggested_fix": {
    "add_engine": "TRFM",
    "review_dependency": true
  },
  "resolved": false
}
```
  4. Admin/AI Review Panel (UI)

Create a dashboard with tabs like:

  • Unresolved Issues
  • Suggested Registry Updates
  • Signals with Conflicting Domains
  • Unused/Orphaned Signals
  • Automatic Re-mapping Candidates

8.2. Audit Types to Track

| Audit Type | Trigger Condition |
|---|---|
| undeclared_engine_usage | Runtime engine not in declared deps |
| domain_conflict | Signal processed in a different signal_domain than declared |
| deprecated_signal_invoked | Usage of a signal with availability_status = deprecated |
| missing_linked_metric | Signal in use but not mapped to any metric |
| empty_dependency_signal | Signal with no declared processing_engine or deps, yet being used |

Benefits

  • Autonomous signal registry governance
  • AI feedback loop for continuous tuning of dependencies
  • Auditor-ready traceability (“we can prove why this signal was processed this way”)
  • Fast debugging of missing signals, form errors, or SEM fallback mismatches

8.3. Example Usage: Metrics Table Metadata

From the Metrics Table:

| Column Name | Purpose | Example |
|---|---|---|
| table_columns | Names and types of required table columns | {"country_code": "ISO 3166", "risk_level": "String", "action_required": "Text"} |
| data_sources | References to data sources (using SSSR IDs or DB references) | {"country_code": "sssr:countries_table.iso3166", "risk_level": "sssr:risk.child_labour.ilo_index"} |
| placeholder_notes | User-facing placeholder notes for empty fields | {"risk_level": "Select risk from ILO list if not auto-populated"} |
| special_logic | Instructions for dynamic logic or data retrieval | "lookup_risk_level_using_iso3166" |

Detailed Example (for clarity)

Metric: S1.SBM-3_10 – Countries/areas at risk of child labour

Metadata example:

risk-of-child-labor.json

```json
{
  "table_columns": {
    "country_code": "ISO 3166",
    "location_id": "String",
    "employee_count": "Integer",
    "gender_breakdown": "JSON",
    "child_labour_risk": "Boolean"
  },
  "data_sources": {
    "country_code": "sssr:eco_number_db.country_code",
    "location_id": "sssr:eco_number_db.location_id",
    "employee_count": "sssr:eco_number_db.employee_count",
    "gender_breakdown": "sssr:eco_number_db.gender_breakdown",
    "child_labour_risk": "sssr:countries_table.child_labour_risk_status"
  },
  "placeholder_notes": {
    "child_labour_risk": "Verify auto-populated risk status."
  },
  "special_logic": "populate_child_labour_risk_based_on_iso3166"
}
```

Integration with Smart Searchable Signal Registry (SSSR):

  • data_sources fields explicitly reference SSSR IDs for precise, traceable, and auditable linkage to authoritative data.
  • special_logic instructions precisely guide FOGE to execute more complex logic, such as dynamic lookups or conditional workflows.
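As a sketch of the first step in such a lookup, an engine could parse the sssr: references in data_sources into table/column pairs before resolving them. The parser and the assumed `sssr:<table>.<column>` grammar are illustrative, inferred from the examples above:

```typescript
// Sketch: split an "sssr:" data-source reference into a table/column pair.
// The reference grammar is an assumption based on the document's examples.

interface SssrRef {
  table: string;
  column: string;
}

function parseSssrRef(ref: string): SssrRef {
  // First dot separates the table; the rest is the (possibly dotted) column path.
  const match = /^sssr:([^.]+)\.(.+)$/.exec(ref);
  if (!match) {
    throw new Error(`not an SSSR reference: ${ref}`);
  }
  return { table: match[1], column: match[2] };
}
```

A FOGE-style workflow could then use the parsed pair to fetch the authoritative value, e.g. from countries_table, before applying the special_logic instruction.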


