Skip to main content
Jira progress: loading…

simz-dev

Statistical Inference Engine

1. Identity

<Identity meid="MEID_STAT01" />

Depends on module:

Canonical computation and modeling domain for ESG calculations, simulations, extrapolation, aggregation, normalization, and decision-grade metric synthesis. Provides governed, auditable compute services to other modules (e.g., Reporting, Risk, Net Zero, ZARA).
Domain:
computation-hub
Category:
analytics-modeling
Classification:
module
Lifecycle status:
active
Semver:
1.0.0
Introduced in:
v0.3
Governance
AI risk level:
high
Trust threshold:
0.9
Human review required:
true
Verifier involved:
false
Audit required:
true
Ownership
Primary owner:
Platform
Architecture board:
true
White-label allowed:
true
Entrypoints
Docs:
/computation-hub
UI:
/app/computation-hub
API:
/api/computation-hub
Dependencies
Modules
  • sis
  • input-hub
Unresolved tokens
  • BUME
  • SEM
  • VTE
Engines (declared)
  • DICE
  • DaVE
  • VTE
  • SEM
  • BUME
Micro-engines (from registry)
Micro-engines (declared)
None
Signals
USO
  • DATA.COMPUTATION
  • DATA.VALIDATION
  • DATA.EXTRAPOLATION
  • MODEL.SIMULATION
  • MODEL.UNCERTAINTY
CSI
  • CSI_COMPUTATION_HUB
SSSR tags
  • computation
  • modeling
  • simulation
  • monte-carlo
  • bayesian
  • extrapolation
  • aggregation
  • normalization
  • validation
  • mice
Workflows & Outputs
Workflows
  • MicroEngineRouting
  • RuleBasedComputeDispatch
  • ScenarioModeling
  • UncertaintyQuantification
  • AggregationAndRollups
  • NormalizationAndBenchmarking
  • ExtrapolationAndGapFilling
  • ComputeAuditAndReplay
Outputs
  • computed_metrics
  • scenario_ranges
  • confidence_intervals
  • normalized_values
  • validation_findings
  • trust_scored_compute_outputs
Audit
Ledger:
ALTD
Replay supported:
true
PII policy:
no_pii_in_omr
Tags

Background

Hypothesis testing can become a core trust, validation, and intelligence layer inside ZAYAZ if implemented correctly.

But it needs to be deeply integrated into the architecture (MICE, DICE, VTE, ZARA, SIS) — not treated as a generic statistics feature.


🧠 Where Hypothesis Testing Fits in ZAYAZ (Strategic View)

ZAYAZ is already:

  • Signal-driven (SSSR)
  • Validation-driven (DICE, DaVE, VTE)
  • AI-governed and audit-logged (ALTD, AIGS) 

Hypothesis testing adds a formal statistical decision layer on top of this:

👉 From:

“This looks wrong / anomalous”

👉 To:

“We reject H₀ with 99% confidence: this supplier’s emissions trend is statistically inconsistent with historical baseline”

That shift is massive for CSRD-grade credibility.


🔧 Core Use Cases (High-Impact)

1. 📊 Data Validation & Anomaly Detection (DICE + VTE Enhancement)

Problem today:

  • Outliers flagged heuristically
  • Trust scores based on rules + AI

Add hypothesis testing:

  • Null hypothesis (H₀): Data follows expected distribution
  • Alternative (H₁): Data deviates significantly

Example:

  • Electricity consumption spike
  • Test: Z-score / t-test / Bayesian posterior deviation

Result:

  • Instead of “flagged anomaly”
  • You get:
  • p-value
  • confidence interval
  • statistical justification

👉 This directly strengthens auditability + verifier trust


2. 🔍 Scope 3 Estimation Validation (SEM + Bayesian Engines)

You already:

  • Use SEM extrapolation in ZARA 
  • Run probabilistic modeling (Computation Hub)

Hypothesis testing can:

  • Validate extrapolated values vs known distributions
  • Compare supplier-reported vs model-estimated values

Example:

  • H₀: Supplier emissions = expected industry mean
  • H₁: Supplier deviates significantly

👉 Output:

  • “Supplier emissions 27% higher than sector baseline (p < 0.01)”

This becomes:

  • Benchmarking
  • Risk scoring
  • Materiality signal

3. 🏭 Supplier / Value Chain Benchmarking Engine

Integrate into:

  • NORM + AGGR + RMAP micro engines 

Hypothesis-driven benchmarking:

  • Compare:
  • Company vs sector (NACE)
  • Supplier vs peers
  • Region vs global baseline

Example:

  • H₀: Company is within normal emission range for NACE code
  • Reject → triggers:
  • Risk flag
  • Governance escalation
  • ZARA explanation

👉 This is next-gen ESG intelligence, not just reporting


4. 📈 Impact Measurement & ESG Strategy Validation

ZAYAZ tracks:

  • Goals, KPIs, timelines 

Hypothesis testing enables:

Example:

  • “Did our sustainability initiative reduce emissions?”
  • H₀: No effect
  • H₁: Reduction occurred

Use:

  • A/B testing sustainability actions
  • Policy effectiveness validation
  • CAPEX justification

👉 This is financial + ESG convergence


5. 🧪 AI Governance & Model Validation (CRITICAL)

You already require:

  • Model validation
  • Drift detection
  • audit logs 

Hypothesis testing should be embedded into:

AI Validation SOP:

  • H₀: Model performance unchanged
  • H₁: Model drift detected

Used for:

  • Retraining triggers
  • Bias detection
  • Model degradation alerts

👉 This aligns perfectly with:

  • EU AI Act
  • CSRD AI traceability

6. 🎯 Materiality & Stakeholder Intelligence (SEEL)

Materiality today:

  • Qualitative + scoring

Add:

  • Statistical significance testing on stakeholder inputs

Example:

  • H₀: Issue not materially significant
  • H₁: Stakeholder concern statistically significant

👉 Outcome:

  • Defensible double materiality decisions

🧱 Architectural Implementation (ZAYAZ-Native)

New Micro Engine Category

Add:

MEID_STATXX_v1 → Statistical Inference Engine

Core capabilities:

  • Hypothesis testing (t-test, chi-square, Bayesian)
  • Confidence intervals
  • Distribution fitting
  • Significance scoring

Integrates with:

  • CALC (computation)
  • VALI (validation)
  • CFIL (confidence filter)

Integration Map

LayerRole
SSSRDefine expected distributions per signal
MICE (STAT Engine)Run tests
DICECombine with rule validation
VTEConvert to trust score impact
ZARAExplain results in plain language
ALTDLog hypothesis + result for audit

Example Data Structure

{
"signal_id": "ghg_scope2_energy",
"test_type": "z_test",
"null_hypothesis": "within_expected_range",
"p_value": 0.003,
"confidence": 0.997,
"result": "reject_null",
"impact": "high_risk_flag",
"logged_at": "timestamp"
}

🚀 Strategic Advantage (This is the Big One)

If you implement this properly, ZAYAZ becomes:

❌ Not:

  • ESG reporting tool
  • ESG data platform

✅ But:

A statistically defensible ESG decision engine

That means:

  • Verifiers trust it
  • Regulators respect it
  • CFOs use it
  • Boards rely on it

⚠️ Important Caveats (Design Principles)

  1. “Precision Before Automation”

You already state this clearly 

👉 Hypothesis testing must:

  • Use validated distributions (IPCC, Ecoinvent, etc.)
  • Be explainable
  • Never be “black box stats”

  1. Avoid Misuse

Not all ESG data fits classical stats:

  • Small sample sizes
  • Missing data
  • Non-normal distributions

👉 Solution:

  • Bayesian methods > classical in many cases

  1. UX is Critical (ZARA Layer)

Never show:

“p = 0.03”

Instead:

“This value is statistically unlikely compared to expected patterns (97% confidence).”


🔮 Next-Level Extensions (Where This Gets Powerful)

  • Real-time anomaly detection across supply chains
  • Predictive compliance breach detection
  • Dynamic ESG risk pricing (insurance / finance)
  • Automated auditor assistance (“statistical red flags”)
  • Carbon market validation (credit integrity scoring)

🧭 Final Verdict

👉 Hypothesis testing is not just useful — it is a foundational upgrade to:

  • Trust engine (VTE)
  • Validation layer (DICE)
  • AI governance (AIGS)
  • Decision intelligence (ZARA)  

1. Engine Identity

  • Engine ID: MEID_STAT01_v1
  • Readable Name: Statistical Inference Engine
  • Category Code: STAT
  • Domain: Statistical validation, inference, significance testing, uncertainty quantification
  • Primary Hub: Computation Hub
  • Secondary Consumers: Input Hub, Reports & Insights Hub, Shared Intelligence Stack
  • Lifecycle Status: Proposed
  • Risk Class: Medium by default, High when outputs influence compliance decisions, verifier workflows, or AI-triggered escalations. This fits ZAYAZ’s AI governance model, where higher-impact AI/statistical modules require stronger oversight and logging. 

2. Strategic Purpose

MEID_STAT01_v1 provides a formal statistical decision layer for ZAYAZ.

It converts weak statements like:

  • “this looks abnormal”
  • “this trend may be suspicious”
  • “this estimate seems high”

into structured, defensible outputs like:

  • “baseline consistency rejected at significance threshold 0.01”
  • “reported value falls outside expected peer distribution”
  • “post-intervention change is statistically meaningful”
  • “model drift likely based on distributional shift”

This engine is not meant to replace rule validation, AI reasoning, or verifier judgment. It is meant to strengthen them with reproducible inference.


3. Placement in ZAYAZ Architecture

3.1 Role in MICE

ZAYAZ already defines Micro Engines as modular computation units invoked through structured routing and rule logic, with categories such as CALC, VALI, NORM, AGGR, RCAS, and CFIL. MEID_STAT01_v1 should be added as a first-class engine category in that same family. 

3.2 Architectural Dependencies

Consumes from:

  • SSSR signal metadata
  • NACE / sector / geography context
  • historical observations
  • peer benchmarks
  • input-source trust metadata
  • DICE validation outputs
  • telemetry event streams
  • AI model validation data

Feeds into:

  • DICE
  • VTE / trust logic
  • ZARA explanations
  • ZAAM inline guidance
  • ALTD audit trails
  • Reports Hub visualizations
  • AI governance validation logs

3.3 Positioning

  • DICE answers: “Is the data structurally valid?”
  • STAT answers: “Is the data statistically credible?”
  • VTE answers: “How should that affect trust?”
  • ZARA answers: “What does it mean in human language?”
  • ALTD answers: “Can we prove what happened later?”

That separation matches the modular and governed approach already defined across ZAYAZ.   


4. Core Objectives

MEID_STAT01_v1 shall support six primary objectives:

  1. Outlier and anomaly significance testing For values, trends, ratios, and distributions.

  2. Benchmark comparison testing Company vs peer group, site vs portfolio, supplier vs sector baseline, region vs region.

  3. Pre/post intervention testing To assess whether a policy, capex action, training, or operational change had measurable effect.

  4. Estimation plausibility testing Especially for SEM/extrapolated values, inferred Scope 3 values, and partially imputed datasets.

  5. Distribution shift / drift detection For AI model governance, telemetry quality, or changing operational patterns.

  6. Uncertainty packaging Confidence intervals, credible intervals, posterior probabilities, support strength, and test quality metadata.


5. Supported Use Cases

5.1 ESG Data Quality

  • unusual energy use
  • improbable waste intensity
  • water-use spikes
  • unsafe year-over-year jumps
  • inconsistent supplier reporting

5.2 Scope 1, 2, 3 Emissions

  • compare reported emissions to expected ranges by activity / NACE / geography
  • test whether supplier-reported factors are materially inconsistent with known baselines
  • test whether modeled estimates are statistically plausible

5.3 Materiality and Stakeholder Signals

  • identify whether stakeholder issue concentration is statistically meaningful across segments
  • detect whether issue salience differs by geography or stakeholder class

5.4 AI Governance

ZAYAZ’s governance material already requires structured validation, drift checks, supervised retraining, logging, and human oversight for more critical modules. STAT should become one of the standard inference backbones behind those checks. 

5.5 Verification Support

  • produce machine-readable “statistical red flags”
  • support verifiers with ranked review candidates
  • distinguish “structurally valid but statistically suspicious” from “invalid”

6. Operating Principles

6.1 Precision Before Automation

The ZAYAZ manual explicitly prioritizes trust, explainability, and traceability over blind automation. STAT must inherit that principle directly. 

6.2 No Silent Statistical Decisions

Every material test must log:

  • test type
  • hypothesis definition
  • sample context
  • thresholds used
  • assumptions
  • result
  • confidence / support level
  • downstream effect

6.3 Test Appropriateness First

The engine must not force classical null-hypothesis testing where assumptions are weak. In many ESG contexts:

  • samples are small
  • data is skewed
  • observations are missing
  • sources are mixed quality
  • peer groups are uneven

Therefore the engine must support both:

  • classical tests
  • Bayesian / resampling / robust alternatives

6.4 Explainability Layer Required

No raw p-values should be surfaced to ordinary users without contextual interpretation. ZARA/ZAAM should translate them into decision-grade language. This aligns with ZAYAZ’s agent architecture and explainability goals.  


7. Supported Test Families

7.1 Baseline Statistical Families

Difference tests

  • one-sample t-test
  • two-sample t-test
  • Welch t-test
  • paired t-test
  • Mann–Whitney U
  • Wilcoxon signed-rank

Proportion / categorical tests

  • chi-square goodness-of-fit
  • chi-square independence
  • Fisher exact test
  • z-test for proportions

Distribution / consistency tests

  • Kolmogorov–Smirnov
  • Anderson–Darling
  • Shapiro-style normality checks only for internal suitability checks
  • population stability / drift indices

Variance / dispersion tests

  • Levene / Brown-Forsythe
  • F-test only when justified

Time-series / change tests

  • changepoint detection
  • CUSUM-type detection
  • drift detection on residuals
  • rolling z-score / robust z-score

7.2 Bayesian / Robust Families

Preferred for many ESG applications:

  • posterior probability of exceedance
  • Bayesian mean comparison
  • credible intervals
  • prior-updated peer expectation models
  • bootstrap confidence intervals
  • permutation testing
  • robust median-based deviation testing

7.3 Special ZAYAZ Modes

Plausibility mode For checking if a value is plausible under known distributions.

Comparative mode For comparing entities, suppliers, sites, or years.

Impact mode For testing if a change initiative had measurable effect.

Drift mode For model governance and telemetry monitoring.

Assurance mode For verifier-facing support packets.


8. Input Contract

8.1 Required Input Envelope

{
"engine_id": "MEID_STAT01_v1",
"mode": "plausibility",
"signal_id": "ghg_scope2_market_based",
"entity_id": "eco196123456789",
"reporting_period": "2025",
"comparison_scope": {
"peer_group_id": "nace_c25_eu_midcap",
"geography": "EU",
"sector_code": "C25"
},
"dataset_ref": "zar://dataset/....",
"hypothesis_template_id": "HT_STAT_PLAUS_003",
"significance_profile_id": "SIGPROF_SCOPE2_STANDARD",
"context": {
"source_mix": ["erp", "invoice", "manual"],
"sample_size": 38,
"input_trust_score": 0.84,
"estimation_flag": false
}
}

8.2 Input Sources

  • structured tables
  • registry-linked observations
  • time-series
  • peer benchmark extracts
  • imputation outputs
  • AI model validation logs
  • verifier-reviewed samples

8.3 Required Metadata from SSSR

Each signal eligible for STAT must support additional metadata in SSSR:

  • stat_test_eligible
  • recommended_test_families
  • expected_distribution_type
  • minimum_sample_policy
  • peer_group_strategy
  • significance_profile_default
  • escalation_policy_id
  • explainability_template_id

SSSR already functions as the smart metadata backbone for signals, so this is the correct place to store statistical eligibility and routing metadata. 


9. Hypothesis Template Registry

A dedicated registry should be created, for example:

stat_hypothesis_templates

Suggested fields:

  • template_id
  • template_name
  • signal_type
  • test_family
  • null_hypothesis_text
  • alternative_hypothesis_text
  • assumptions
  • fallback_test_family
  • default_alpha
  • bayesian_supported
  • effect_size_required
  • human_explanation_template
  • verifier_explanation_template
  • status
  • version

Example

{
"template_id": "HT_STAT_PLAUS_003",
"signal_type": "energy_intensity",
"test_family": "robust_zscore_plus_bootstrap",
"null_hypothesis_text": "The observed value is consistent with expected peer-range behavior for the selected comparison scope.",
"alternative_hypothesis_text": "The observed value is not consistent with expected peer-range behavior.",
"default_alpha": 0.01,
"bayesian_supported": true,
"effect_size_required": true
}

10. Significance Profiles

Statistical thresholds should not be hardcoded globally. They should be controlled by policy profiles.

stat_significance_profiles

Fields:

  • profile_id
  • name
  • alpha_default
  • effect_size_floor
  • minimum_sample_size
  • multiple_testing_policy
  • bayesian_probability_threshold
  • high_risk_override
  • verifier_review_required
  • human_approval_required
  • status

Example profiles

  • SIGPROF_LOW_IMPACT_MONITORING
  • SIGPROF_SCOPE3_ESTIMATION
  • SIGPROF_AUDIT_ESCALATION
  • SIGPROF_AI_DRIFT_HIGH_RISK

This is consistent with ZAYAZ’s governance pattern of explicit thresholds, risk registers, and structured review gates. 


11. Output Contract

11.1 Standard Output

{
"engine_id": "MEID_STAT01_v1",
"run_id": "statrun-2026-04-04-000184",
"signal_id": "ghg_scope2_market_based",
"mode": "plausibility",
"test_family_used": "welch_t_plus_bootstrap",
"hypothesis": {
"null": "Observed value is consistent with peer baseline.",
"alternative": "Observed value differs materially from peer baseline."
},
"sample": {
"n_observed": 38,
"n_peer": 412,
"quality_flag": "acceptable"
},
"results": {
"decision": "reject_null",
"p_value": 0.004,
"effect_size": 0.71,
"confidence_interval": [0.19, 0.54],
"posterior_exceedance_probability": 0.973
},
"quality": {
"assumption_fit": "moderate",
"fallback_used": true,
"multiple_testing_adjusted": false
},
"impact": {
"trust_delta": -0.12,
"risk_flag": "high",
"escalation_triggered": true,
"recommended_action": "verifier_review"
},
"explainability": {
"user_message": "This value is statistically unusual compared with similar entities in the selected benchmark group.",
"verifier_message": "Observed value materially exceeds expected peer baseline under selected comparison profile."
},
"audit": {
"logged_to_altd": true,
"model_or_engine_version": "MEID_STAT01_v1",
"timestamp_utc": "2026-04-04T08:12:14Z"
}
}

11.2 Output Classes

  • supports_null
  • rejects_null
  • inconclusive
  • insufficient_sample
  • assumption_failure
  • fallback_applied

The engine must explicitly distinguish “no evidence of difference” from “not enough evidence.”


12. Routing and Invocation Logic

12.1 Trigger Sources

  • DICE validation suspicion
  • VTE trust reassessment
  • ZARA prompted analysis
  • ZAAM inline agent help
  • telemetry anomaly event
  • verifier workflow request
  • scheduled periodic scans
  • AI validation schedule

12.2 ZADIF / Rule Engine Invocation

The ZAYAZ architecture already uses routing logic, agent dispatching, and rule-driven activation. STAT should be routable through the same dispatch pattern.  

Example:

{
"dispatch_condition": {
"signal_type": "ghg_emission",
"trust_score_below": 0.88,
"reporting_context": "csrd",
"sample_size_min": 12
},
"target_engine": "MEID_STAT01_v1",
"mode": "plausibility"
}

13. Interaction with DICE and VTE

13.1 DICE

DICE remains first-pass structural validator. STAT should only run when:

  • data passes minimum structural validation, or
  • DICE explicitly requests a statistical diagnostic path

13.2 VTE

STAT should not directly set trust scores. It should output structured evidence that VTE uses.

Recommended contribution model:

  • abnormality severity
  • effect size
  • assumption quality
  • sample adequacy
  • source-trust interaction
  • repeat anomaly history

That keeps inference and trust logic separate, which is cleaner and more auditable.


14. Interaction with ZARA and ZAAM

14.1 ZARA

ZARA should use STAT outputs for narrative explanations such as:

  • why a signal is suspicious
  • why a benchmark comparison matters
  • why additional evidence is required

ZARA is already positioned as the prompt-driven orchestration and reporting intelligence layer, so it should consume STAT outputs as explainable evidence objects. 

14.2 ZAAM

ZAAM agents can trigger or explain STAT in context:

  • Form Assistant: “This value looks unusually high for your peer group.”
  • Compliance Scout: “This trend could require additional disclosure attention.”
  • Trust Guardian: “The statistical evidence reduced trust because the result diverges from prior validated patterns.”

That aligns with ZAAM’s scoped, trust-aware, role-aware agent model. 


15. Audit and Governance Requirements

All material STAT runs must write to ALTD-compatible audit records. ZAYAZ already emphasizes tamper detection, audit readiness, and governed AI lifecycle records.  

Required audit fields

  • run_id
  • engine_id
  • engine_version
  • signal_id
  • entity_id
  • dataset_ref
  • test_family_used
  • hypothesis_template_id
  • significance_profile_id
  • thresholds_applied
  • result_decision
  • human_review_required
  • downstream_effects
  • initiated_by
  • timestamp_utc

Governance flags

  • used_for_compliance_decision
  • used_for_ai_validation
  • verifier_visible
  • human_approved
  • high_risk_path

16. Failure Modes and Fallbacks

16.1 Failure Classes

  • insufficient sample
  • missing benchmark group
  • incompatible distribution assumptions
  • poor data completeness
  • excessive missingness
  • conflicting source populations
  • no approved significance profile

16.2 Fallback Policy

Fallback order:

  1. preferred test family
  2. robust non-parametric equivalent
  3. bootstrap/permutation
  4. descriptive-only mode
  5. escalate as “statistical inconclusive”

The engine must never fabricate certainty.


17. Security and Compliance

17.1 Privacy

STAT outputs must avoid exposing unnecessary peer-level details if peer groups are confidential.

17.2 Human Oversight

If STAT materially affects:

  • compliance filings
  • verifier routing
  • AI retraining authorization
  • public-facing trust displays

then review gates should apply according to ZAYAZ’s existing governance approach for higher-risk AI logic. 

17.3 EU AI Act Alignment

STAT itself is not a full AI model in all cases, but once combined with automated decision routing, trust scoring, or self-healing actions, it enters the governed AI/decision-support zone and should inherit AIGS/AI governance controls.


  1. Data Model Additions

Recommended tables:

stat_engine_registry

  • engine_id
  • version
  • status
  • supported_modes
  • supported_test_families
  • default_config
  • mode_docs_url

stat_hypothesis_templates

As above.

stat_significance_profiles

As above.

stat_run_log

Execution records.

stat_benchmark_profiles

Defines peer group construction logic.

stat_signal_policy_map

Maps signal IDs to allowed tests, thresholds, peer strategies, and escalation policies.


19. API Draft

POST /api/mice/stat/run

Executes a statistical inference run.

POST /api/mice/stat/explain

Returns user / verifier / board phrasing for a completed run.

GET /api/mice/stat/run/{run_id}

Returns audit-grade run details.

POST /api/mice/stat/batch

Batch mode for portfolio scans, annual pre-checks, or verifier preparation.

POST /api/mice/stat/validate-profile

Validates significance profiles and test mappings before activation.


20. MVP Scope

Phase 1 MVP

Start with:

  • plausibility mode
  • comparative mode
  • drift mode
  • robust z-score
  • Welch t-test
  • Mann–Whitney U
  • chi-square
  • bootstrap CI
  • descriptive fallback
  • ALTD logging
  • VTE evidence output
  • ZARA explanation strings

Phase 2

Add:

  • Bayesian comparison
  • changepoint detection
  • intervention impact mode
  • multiple testing correction profiles
  • verifier evidence packets
  • dashboard components

Phase 3

Add:

  • adaptive priors by NACE/geography
  • federated peer baselines
  • cross-entity anomaly propagation
  • automated materiality significance models

APPENDIX A - ESRS Metrics Most Likely to Benefit from Hypothesis Testing

Highest-value metric families

A. Climate and energy

Best candidates because they are numeric, recurring, comparable, and often benchmarkable.

  • Scope 1 emissions
  • Scope 2 emissions
  • Scope 3 category values
  • electricity consumption
  • fuel consumption
  • energy intensity ratios
  • renewable energy share

Why: high recurrence, strong peer comparability, good anomaly-detection value.

B. Water

  • total withdrawal
  • discharge volumes
  • recycled/reused water share
  • water intensity per unit output

Why: site-level trend testing and peer comparison are often valuable.

C. Waste and circularity

  • hazardous waste
  • non-hazardous waste
  • waste diverted from disposal
  • recycling rates
  • recovery rates
  • material efficiency ratios

Why: distributions are often skewed, so robust methods are useful.

D. Workforce / S metrics

  • injury frequency
  • absenteeism
  • training completion
  • diversity ratios
  • turnover

Why: less suited to raw outlier testing than climate data, but useful for proportions and trend shifts.

E. Governance / process metrics

Often lower statistical value unless repeated over many entities or periods:

  • policy coverage
  • training completion
  • incident counts
  • whistleblower case patterns

Why: many are binary or low-frequency, so use categorical or proportion-based methods only.

Best fit categories for Phase 1

  1. energy and emissions
  2. water
  3. waste
  4. selected workforce ratios

Lower-value categories for initial rollout

  • narrative disclosures
  • one-time governance statements
  • policy existence flags
  • low-frequency event fields

These are better handled by rule logic, traceability, and document validation than by formal hypothesis tests.


APPENDIX B - Draft “Statistical Trust Score” Layer for VTE

Purpose

Convert statistical evidence into a bounded contribution to trust, without letting statistics dominate source quality, verification state, or audit provenance.

Principle

STAT informs trust; it does not own trust.

Proposed subscore

Create a VTE-compatible subscore:

statistical_trust_component = 0.00 to 1.00

Inputs

  • test decision strength
  • effect size magnitude
  • confidence / posterior support
  • sample adequacy
  • assumption quality
  • data completeness
  • peer-group relevance
  • repeat anomaly history
  • whether value is estimated or directly observed

Example weighted model

STC =
0.20 * decision_strength
+ 0.15 * effect_size_quality
+ 0.15 * sample_adequacy
+ 0.10 * assumption_quality
+ 0.10 * completeness_quality
+ 0.10 * peer_group_fit
+ 0.10 * source_integrity_interaction
+ 0.10 * anomaly_history_modifier

Interpretation

  • 0.85–1.00 statistically well-supported
  • 0.65–0.84 acceptable / monitor
  • 0.40–0.64 weak statistical confidence
  • 0.00–0.39 statistically problematic / escalate

Decision strength mapping

Example:

  • supports_null strongly: 0.95
  • inconclusive: 0.55
  • rejects_null moderately: 0.35
  • rejects_null strongly with large effect: 0.15

This sounds inverted, but the point is trust falls when the statistical evidence suggests abnormality relative to expectation.

Guardrails

  • never reduce trust purely from one weak test
  • require stronger effect for high-volatility metrics
  • reduce penalty when source is verified and benchmark fit is weak
  • increase penalty for repeated anomalies across periods
  • cap maximum trust delta from STAT alone, for example ±0.15 per run

Recommended VTE integration

Final VTE score might combine:

  • source provenance
  • structural validation
  • verifier status
  • historical consistency
  • statistical trust component
  • AI-origin penalty/adjustment

That aligns well with ZAYAZ’s trust-centric architecture and explainable validation model.  


APPENDIX C - ZAYAZ Statistical Inference Layer

Technical Implementation Pack v0.1

C.1. Scope of this package

This package defines five implementation layers:

  1. SQL table schemas
  2. JSON schemas for API input/output
  3. SSSR field additions
  4. VTE integration logic
  5. Example statistical test designs for 5 ESRS-relevant metric families

This is an architectural draft, not a locked final schema. The goal is to make the first implementation:

  • auditable
  • modular
  • backward-compatible
  • explainable
  • safe to deploy in stages

C.2. Core architecture overview

C.2.1. Proposed runtime flow

FOGE / API / Import / Telemetry / Verifier Request

DICE

Rule Engine / ZADIF

MEID_STAT01_v1

Statistical Result Object

VTE Trust Logic

ZARA / ZAAM Explanation Layer

ALTD / Audit + Reports Hub

C.2.2. Design principle

STAT should behave like a governed evidence engine, not a black-box scoring engine.

It should:

  • evaluate statistical consistency
  • package uncertainty explicitly
  • return bounded evidence objects
  • avoid direct final decisions where governance requires human review

That matches the broader ZAYAZ governance and trust philosophy already described in the manuals.  


C.3. SQL schema pack

Below is a practical relational design for Postgres-style deployment.


C.3.1. stat_engine_registry

Purpose: register statistical engines and supported modes.

CREATE TABLE stat_engine_registry (
engine_id VARCHAR(50) PRIMARY KEY,
readable_name VARCHAR(255) NOT NULL,
version VARCHAR(30) NOT NULL,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'experimental', 'deprecated', 'archived')),
supported_modes JSONB NOT NULL,
supported_test_families JSONB NOT NULL,
default_config JSONB,
mode_docs_url TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Example row

{
"engine_id": "MEID_STAT01_v1",
"readable_name": "Statistical Inference Engine",
"version": "1.0.0",
"status": "experimental",
"supported_modes": ["plausibility", "comparative", "drift", "impact"],
"supported_test_families": ["welch_t", "mann_whitney_u", "chi_square", "bootstrap_ci", "robust_zscore"]
}

C.3.2. stat_hypothesis_templates

Purpose: standardized test logic by metric family / signal type.

CREATE TABLE stat_hypothesis_templates (
template_id VARCHAR(60) PRIMARY KEY,
template_name VARCHAR(255) NOT NULL,
signal_type VARCHAR(100) NOT NULL,
metric_family VARCHAR(100),
default_test_family VARCHAR(80) NOT NULL,
fallback_test_family VARCHAR(80),
null_hypothesis_text TEXT NOT NULL,
alternative_hypothesis_text TEXT NOT NULL,
assumptions JSONB,
default_alpha NUMERIC(6,5) NOT NULL DEFAULT 0.05,
default_effect_size_floor NUMERIC(8,4),
bayesian_supported BOOLEAN NOT NULL DEFAULT FALSE,
bootstrap_supported BOOLEAN NOT NULL DEFAULT TRUE,
effect_size_required BOOLEAN NOT NULL DEFAULT TRUE,
explainability_template_id VARCHAR(60),
verifier_template_id VARCHAR(60),
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'deprecated', 'draft', 'archived')),
version VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.3. stat_significance_profiles

Purpose: policy-based thresholds rather than hardcoded alpha values.

CREATE TABLE stat_significance_profiles (
profile_id VARCHAR(60) PRIMARY KEY,
profile_name VARCHAR(255) NOT NULL,
alpha_default NUMERIC(6,5) NOT NULL,
minimum_sample_size INTEGER,
effect_size_floor NUMERIC(8,4),
bayesian_probability_threshold NUMERIC(6,5),
multiple_testing_policy VARCHAR(50),
confidence_interval_level NUMERIC(6,5) DEFAULT 0.95,
assumption_failure_policy VARCHAR(50) NOT NULL DEFAULT 'fallback',
inconclusive_policy VARCHAR(50) NOT NULL DEFAULT 'no_penalty',
verifier_review_required BOOLEAN NOT NULL DEFAULT FALSE,
human_approval_required BOOLEAN NOT NULL DEFAULT FALSE,
high_risk_override JSONB,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'deprecated', 'draft', 'archived')),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.4. stat_benchmark_profiles

Purpose: define peer groups and benchmark construction logic.

CREATE TABLE stat_benchmark_profiles (
benchmark_profile_id VARCHAR(60) PRIMARY KEY,
benchmark_name VARCHAR(255) NOT NULL,
scope_type VARCHAR(50) NOT NULL CHECK (scope_type IN ('sector', 'geography', 'size_band', 'client_portfolio', 'custom')),
nace_codes JSONB,
geographies JSONB,
size_bands JSONB,
reporting_frameworks JSONB,
signal_filters JSONB,
inclusion_rules JSONB,
exclusion_rules JSONB,
minimum_peer_count INTEGER NOT NULL DEFAULT 20,
freshness_days INTEGER,
confidentiality_policy VARCHAR(50) NOT NULL DEFAULT 'aggregate_only',
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'draft', 'deprecated', 'archived')),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.5. stat_signal_policy_map

Purpose: map individual signals to statistical policies.

CREATE TABLE stat_signal_policy_map (
signal_id VARCHAR(120) PRIMARY KEY,
stat_test_eligible BOOLEAN NOT NULL DEFAULT FALSE,
preferred_mode VARCHAR(50),
preferred_test_family VARCHAR(80),
fallback_test_family VARCHAR(80),
hypothesis_template_id VARCHAR(60),
significance_profile_id VARCHAR(60),
benchmark_profile_id VARCHAR(60),
expected_distribution_type VARCHAR(50),
minimum_sample_size INTEGER,
requires_effect_size BOOLEAN NOT NULL DEFAULT TRUE,
multiple_testing_group VARCHAR(100),
escalation_policy_id VARCHAR(60),
explainability_template_id VARCHAR(60),
verifier_packet_required BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
CONSTRAINT fk_stat_hypothesis_template
FOREIGN KEY (hypothesis_template_id) REFERENCES stat_hypothesis_templates(template_id),
CONSTRAINT fk_stat_significance_profile
FOREIGN KEY (significance_profile_id) REFERENCES stat_significance_profiles(profile_id),
CONSTRAINT fk_stat_benchmark_profile
FOREIGN KEY (benchmark_profile_id) REFERENCES stat_benchmark_profiles(benchmark_profile_id)
);

C.3.6. stat_run_log

Purpose: immutable log of each statistical execution.

CREATE TABLE stat_run_log (
run_id VARCHAR(80) PRIMARY KEY,
engine_id VARCHAR(50) NOT NULL,
engine_version VARCHAR(30) NOT NULL,
signal_id VARCHAR(120) NOT NULL,
entity_id VARCHAR(80),
reporting_period VARCHAR(40),
mode VARCHAR(50) NOT NULL,
initiated_by VARCHAR(80) NOT NULL,
dataset_ref TEXT,
benchmark_profile_id VARCHAR(60),
hypothesis_template_id VARCHAR(60),
significance_profile_id VARCHAR(60),
test_family_requested VARCHAR(80),
test_family_used VARCHAR(80),
fallback_used BOOLEAN NOT NULL DEFAULT FALSE,
null_hypothesis_text TEXT,
alternative_hypothesis_text TEXT,
sample_metadata JSONB,
assumptions_metadata JSONB,
results_payload JSONB NOT NULL,
decision_class VARCHAR(50) NOT NULL,
trust_delta NUMERIC(8,4),
escalation_triggered BOOLEAN NOT NULL DEFAULT FALSE,
escalation_reason TEXT,
human_review_required BOOLEAN NOT NULL DEFAULT FALSE,
human_review_status VARCHAR(40),
altd_logged BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.7. stat_explainability_templates

Purpose: human-readable output templates for ZARA / ZAAM / verifiers.

CREATE TABLE stat_explainability_templates (
template_id VARCHAR(60) PRIMARY KEY,
audience_type VARCHAR(40) NOT NULL CHECK (audience_type IN ('user', 'verifier', 'board', 'internal_ops', 'agent')),
language_code VARCHAR(10) NOT NULL DEFAULT 'en',
template_text TEXT NOT NULL,
severity_mapping JSONB,
variable_schema JSONB,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'draft', 'deprecated', 'archived')),
version VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.8. stat_test_catalog

Purpose: controlled list of allowed methods.

CREATE TABLE stat_test_catalog (
test_family_id VARCHAR(80) PRIMARY KEY,
readable_name VARCHAR(255) NOT NULL,
class_type VARCHAR(50) NOT NULL,
supports_small_samples BOOLEAN NOT NULL DEFAULT FALSE,
supports_non_normal BOOLEAN NOT NULL DEFAULT FALSE,
supports_missingness_robustness BOOLEAN NOT NULL DEFAULT FALSE,
supports_effect_size BOOLEAN NOT NULL DEFAULT TRUE,
supports_bootstrap BOOLEAN NOT NULL DEFAULT FALSE,
supports_bayesian BOOLEAN NOT NULL DEFAULT FALSE,
default_for_modes JSONB,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'draft', 'deprecated', 'archived'))
);

C.4. JSON schema pack

Below is a practical API contract design, expressed as JSON Schema.

C.4.1. Run request schema

{
"type": "object",
"required": ["engine_id", "mode", "signal_id", "initiated_by"],
"properties": {
"engine_id": { "type": "string", "enum": ["MEID_STAT01_v1"] },
"mode": {
"type": "string",
"enum": ["plausibility", "comparative", "drift", "impact", "assurance"]
},
"signal_id": { "type": "string" },
"entity_id": { "type": "string" },
"reporting_period": { "type": "string" },
"initiated_by": { "type": "string" },
"dataset_ref": { "type": "string" },
"benchmark_profile_id": { "type": "string" },
"hypothesis_template_id": { "type": "string" },
"significance_profile_id": { "type": "string" },
"test_family_requested": { "type": "string" },
"context": {
"type": "object",
"properties": {
"source_mix": { "type": "array", "items": { "type": "string" } },
"input_trust_score": { "type": "number", "minimum": 0, "maximum": 1 },
"estimation_flag": { "type": "boolean" },
"peer_group_override": { "type": "object" },
"sample_metadata": { "type": "object" }
}
}
}
}
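
The C.4.1 contract can be enforced at the dispatch boundary. A minimal sketch in Python, checking only required fields and enum membership (in practice the full schema would be handed to a JSON Schema validator such as `jsonschema`; the function name here is illustrative):

```python
# Minimal sketch of request validation against the C.4.1 contract.
# Covers required fields and enum membership only.

REQUIRED = ("engine_id", "mode", "signal_id", "initiated_by")
MODES = {"plausibility", "comparative", "drift", "impact", "assurance"}
ENGINES = {"MEID_STAT01_v1"}

def validate_run_request(payload: dict) -> list:
    """Return a list of validation errors; an empty list means dispatchable."""
    errors = [f"missing required field: {f}" for f in REQUIRED if f not in payload]
    if "engine_id" in payload and payload["engine_id"] not in ENGINES:
        errors.append("unknown engine_id")
    if "mode" in payload and payload["mode"] not in MODES:
        errors.append("unknown mode")
    return errors
```

A request that passes (empty error list) can be routed to the engine; anything else is rejected before a run_id is ever allocated.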

C.4.2. Run response schema

{
"type": "object",
"required": [
"run_id",
"engine_id",
"signal_id",
"mode",
"decision_class",
"results",
"impact",
"audit"
],
"properties": {
"run_id": { "type": "string" },
"engine_id": { "type": "string" },
"engine_version": { "type": "string" },
"signal_id": { "type": "string" },
"entity_id": { "type": "string" },
"mode": { "type": "string" },
"test_family_used": { "type": "string" },
"fallback_used": { "type": "boolean" },
"decision_class": {
"type": "string",
"enum": [
"supports_null",
"rejects_null",
"inconclusive",
"insufficient_sample",
"assumption_failure",
"fallback_applied"
]
},
"results": {
"type": "object",
"properties": {
"p_value": { "type": ["number", "null"] },
"effect_size": { "type": ["number", "null"] },
"confidence_interval": {
"type": ["array", "null"],
"items": { "type": "number" },
"minItems": 2,
"maxItems": 2
},
"posterior_exceedance_probability": { "type": ["number", "null"] },
"test_statistic": { "type": ["number", "null"] },
"assumption_fit": { "type": "string" }
}
},
"impact": {
"type": "object",
"properties": {
"trust_delta": { "type": ["number", "null"] },
"risk_flag": { "type": "string" },
"escalation_triggered": { "type": "boolean" },
"recommended_action": { "type": "string" }
}
},
"explainability": {
"type": "object",
"properties": {
"user_message": { "type": "string" },
"verifier_message": { "type": "string" },
"board_message": { "type": "string" }
}
},
"audit": {
"type": "object",
"properties": {
"logged_to_altd": { "type": "boolean" },
"timestamp_utc": { "type": "string" },
"human_review_required": { "type": "boolean" }
}
}
}
}

C.4.3. Batch request schema

{
"type": "object",
"required": ["engine_id", "mode", "initiated_by", "items"],
"properties": {
"engine_id": { "type": "string" },
"mode": { "type": "string" },
"initiated_by": { "type": "string" },
"items": {
"type": "array",
"items": {
"type": "object",
"required": ["signal_id"],
"properties": {
"signal_id": { "type": "string" },
"entity_id": { "type": "string" },
"reporting_period": { "type": "string" },
"dataset_ref": { "type": "string" }
}
}
}
}
}

C.5. SSSR field additions

Because SSSR is the correct place for signal-level intelligence, routing metadata, and structured lookup behavior in ZAYAZ, the statistical layer should be attached there rather than scattered across engine configs. 

C.5.1. New SSSR fields for statistical readiness

Add these fields to the signal metadata layer:

{
"stat_test_eligible": true,
"stat_priority_level": "high",
"stat_metric_family": "climate_energy",
"preferred_stat_mode": "plausibility",
"recommended_test_families": ["welch_t", "bootstrap_ci", "robust_zscore"],
"expected_distribution_type": "right_skewed",
"minimum_sample_policy": {
"preferred_min_n": 20,
"absolute_min_n": 8
},
"benchmark_profile_id": "BENCH_NACE_C25_EU",
"hypothesis_template_id": "HT_SCOPE2_PLAUS_001",
"significance_profile_id": "SIGPROF_SCOPE2_STANDARD",
"effect_size_required": true,
"multiple_testing_group": "esrs_e1_energy",
"stat_explainability_template_id": "STAT_USER_GENERIC_001",
"verifier_packet_required": true,
"stat_retest_cooldown_days": 30,
"stat_escalation_policy_id": "ESC_STAT_HIGH_SCOPE2"
}

C.5.2. Strong recommendation

Do not add just a single boolean such as supports_statistics; that is too weak. Use a structured object so that:

  • routing stays deterministic
  • governance stays inspectable
  • benchmark strategies stay versioned

C.6. VTE integration logic

C.6.1. Principle

STAT should contribute a bounded trust evidence component into VTE, not replace provenance, document validation, or verifier approval.

C.6.2. Proposed VTE composition

Final_Trust_Score =
0.30 * provenance_component
+ 0.20 * structural_validation_component
+ 0.15 * verifier_component
+ 0.15 * historical_consistency_component
+ 0.10 * statistical_trust_component
+ 0.10 * ai_origin_adjustment_component

This is just a starting balance. For ZAYAZ, we should keep STAT at 10% to 15% max in early phases.
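
As a sketch, the composition above maps directly onto a weighted sum. Component names mirror the formula; each input is assumed to arrive already normalized to 0.0–1.0, and the function name is illustrative:

```python
# Sketch of the C.6.2 composition. Missing components default to 0.0.

VTE_WEIGHTS = {
    "provenance_component": 0.30,
    "structural_validation_component": 0.20,
    "verifier_component": 0.15,
    "historical_consistency_component": 0.15,
    "statistical_trust_component": 0.10,
    "ai_origin_adjustment_component": 0.10,
}

def final_trust_score(components: dict) -> float:
    """Weighted sum of trust components, clamped to [0, 1]."""
    score = sum(VTE_WEIGHTS[name] * components.get(name, 0.0) for name in VTE_WEIGHTS)
    return round(min(max(score, 0.0), 1.0), 4)
```

Keeping the weights in one governed table (rather than inline constants) is what allows the STAT share to be rebalanced per phase without code changes.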

C.6.3. Statistical trust component formula

STC =
0.20 * decision_strength
+ 0.15 * effect_size_quality
+ 0.15 * sample_adequacy
+ 0.10 * assumption_quality
+ 0.10 * completeness_quality
+ 0.10 * peer_group_fit
+ 0.10 * source_integrity_interaction
+ 0.10 * anomaly_history_modifier

Normalize to 0.00–1.00.

C.6.4. Decision strength mapping

{
"supports_null_strong": 0.95,
"supports_null_moderate": 0.82,
"inconclusive": 0.58,
"rejects_null_moderate": 0.35,
"rejects_null_strong": 0.15,
"insufficient_sample": 0.50,
"assumption_failure": 0.52
}
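
The table above leaves the strong/moderate split implicit. A hypothetical `map_decision_strength` helper is sketched below; the 0.5 effect-size cutoff is an illustrative assumption, not a platform-defined threshold:

```python
# Sketch: resolve a decision class (plus effect size) to a 0-1 strength score
# using the C.6.4 mapping. Unknown classes fall back to the neutral
# inconclusive score.

DECISION_STRENGTH = {
    "supports_null_strong": 0.95,
    "supports_null_moderate": 0.82,
    "inconclusive": 0.58,
    "rejects_null_moderate": 0.35,
    "rejects_null_strong": 0.15,
    "insufficient_sample": 0.50,
    "assumption_failure": 0.52,
}

def map_decision_strength(decision_class: str, results: dict) -> float:
    if decision_class in ("supports_null", "rejects_null"):
        effect = results.get("effect_size") or 0.0
        grade = "strong" if abs(effect) >= 0.5 else "moderate"  # assumed cutoff
        decision_class = f"{decision_class}_{grade}"
    return DECISION_STRENGTH.get(decision_class, DECISION_STRENGTH["inconclusive"])
```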

C.6.5. Trust delta rule

STAT should output both:

  • statistical_trust_component
  • suggested_trust_delta

Suggested rule:

suggested_trust_delta = (STC - 0.70) * 0.20

Then cap:

  • minimum delta: -0.15
  • maximum delta: +0.08

This prevents statistics from overpowering the total trust score.
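
A minimal sketch of the rule and caps:

```python
# Sketch of the C.6.5 rule: center the delta at STC = 0.70, scale by 0.20,
# then cap so statistics cannot overpower the composite trust score.

DELTA_MIN, DELTA_MAX = -0.15, 0.08

def suggested_trust_delta(stc: float) -> float:
    """Map a statistical trust component (0-1) to a bounded trust delta."""
    delta = (stc - 0.70) * 0.20
    return round(min(max(delta, DELTA_MIN), DELTA_MAX), 4)
```

Note that with STC in [0, 1] the raw delta spans -0.14 to +0.06, so the caps act as defensive headroom rather than binding limits under the suggested rule.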

C.6.6. Escalation thresholds

Example:

  • trust_delta <= -0.10 and effect_size >= 0.6 → verifier review
  • repeated anomaly 3 periods in a row → high-risk escalation
  • inconclusive + verified source → no penalty
  • assumption failure + missing benchmark → route to descriptive-only mode
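
These thresholds can be expressed as an ordered rule list, evaluated first match wins. A sketch, where `consecutive_anomalies` and `source_verified` are assumed context inputs rather than defined run-log fields:

```python
# Sketch of the C.6.6 escalation examples as an ordered rule list.

def escalation_action(run: dict) -> str:
    """Resolve the first matching escalation rule; 'none' if nothing fires."""
    if run.get("trust_delta", 0.0) <= -0.10 and run.get("effect_size", 0.0) >= 0.6:
        return "verifier_review"
    if run.get("consecutive_anomalies", 0) >= 3:
        return "high_risk_escalation"
    if run.get("decision_class") == "inconclusive" and run.get("source_verified"):
        return "no_penalty"
    if run.get("decision_class") == "assumption_failure" and not run.get("benchmark_profile_id"):
        return "descriptive_only_mode"
    return "none"
```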

C.6.7. Pseudocode

def compute_statistical_trust_component(run):
    # Map each evidence dimension onto a 0-1 quality score
    decision_strength = map_decision_strength(run.decision_class, run.results)
    effect_size_quality = map_effect_size(run.results.get("effect_size"))
    sample_adequacy = map_sample_quality(run.sample_metadata)
    assumption_quality = map_assumption_fit(run.results.get("assumption_fit"))
    completeness_quality = map_completeness(run.sample_metadata)
    peer_group_fit = map_peer_group_fit(run.sample_metadata)
    source_integrity_interaction = map_source_integrity(run.context)
    anomaly_history_modifier = map_history(run.entity_id, run.signal_id)

    # Weighted blend per C.6.3
    stc = (
        0.20 * decision_strength +
        0.15 * effect_size_quality +
        0.15 * sample_adequacy +
        0.10 * assumption_quality +
        0.10 * completeness_quality +
        0.10 * peer_group_fit +
        0.10 * source_integrity_interaction +
        0.10 * anomaly_history_modifier
    )

    # Clamp to [0, 1] and round to 4 decimals for storage
    return round(min(max(stc, 0.0), 1.0), 4)

C.7. Example implementation logic for 5 ESRS-relevant metric families

These are not legal ESRS interpretations. They are implementation archetypes for statistical support inside ZAYAZ.


C.7.1. Family A: Scope 2 electricity / energy emissions

Typical signals

  • electricity consumption
  • location-based Scope 2
  • market-based Scope 2
  • energy intensity ratio

Best modes

  • plausibility
  • comparative
  • drift

Preferred tests

  • Welch t-test
  • robust z-score
  • bootstrap confidence interval

Hypothesis example

  • H0: entity value is consistent with sector/geography peer baseline
  • H1: entity value differs materially from peer baseline

Signal policy example

{
"signal_id": "ghg_scope2_market_based",
"preferred_stat_mode": "plausibility",
"recommended_test_families": ["welch_t", "bootstrap_ci", "robust_zscore"],
"expected_distribution_type": "right_skewed",
"minimum_sample_policy": {
"preferred_min_n": 20,
"absolute_min_n": 8
},
"effect_size_required": true,
"verifier_packet_required": true
}

Notes

This is one of the strongest early candidates because it is recurring, numeric, and highly comparable.
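
The robust z-score named in the preferred tests can be computed from the peer sample's median and MAD. A stdlib-only sketch (the 1.4826 factor makes MAD consistent with the standard deviation under normality; the peer values in the test are a made-up illustration):

```python
# Sketch: deviation from the peer median, scaled by the median absolute
# deviation (MAD). Resistant to the outliers that break a plain z-score.
from statistics import median

def robust_zscore(value: float, peers: list) -> float:
    """Return (value - median) / (1.4826 * MAD) over the peer sample."""
    med = median(peers)
    mad = median(abs(p - med) for p in peers)
    if mad == 0:
        raise ValueError("zero MAD: peer sample has no spread")
    return (value - med) / (1.4826 * mad)
```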


C.7.2. Family B: Scope 3 business travel / upstream transport

Typical signals

  • flight emissions
  • travel activity values
  • freight emissions
  • transport intensity

Best modes

  • plausibility
  • comparative
  • impact

Preferred tests

  • Mann–Whitney U
  • bootstrap CI
  • changepoint detection for trends

Hypothesis example

  • H0: travel-related emission intensity is unchanged from prior operating profile
  • H1: a meaningful shift occurred

Special caution

These metrics can be structurally volatile. Therefore:

  • stronger effect-size thresholds
  • more tolerant anomaly penalties
  • more emphasis on trend context than one-off outliers

C.7.3. Family C: Water withdrawal / discharge

Typical signals

  • total water withdrawn
  • recycled water share
  • water intensity per production unit
  • discharge volume

Best modes

  • plausibility
  • comparative
  • impact

Preferred tests

  • Welch t-test
  • paired test for pre/post interventions
  • bootstrap CI

Hypothesis example

  • H0: water intensity after intervention is unchanged
  • H1: water intensity decreased meaningfully after intervention

High-value use

Very good for demonstrating measurable change after capex, policy, or operational changes.


C.7.4. Family D: Waste and circularity

Typical signals

  • hazardous waste
  • non-hazardous waste
  • diverted from disposal
  • recycled fraction
  • circular material use ratios

Best modes

  • plausibility
  • comparative
  • drift

Preferred tests

  • Mann–Whitney U
  • chi-square for disposal category proportions
  • bootstrap CI

Hypothesis example

  • H0: waste diversion pattern is consistent with prior validated pattern
  • H1: waste diversion pattern differs materially

Notes

This family is often skewed and operationally messy. Robust and non-parametric methods should dominate.


C.7.5. Family E: Workforce safety / social ratios

Typical signals

  • injury rate
  • lost-time incident rate
  • turnover
  • diversity proportions
  • training completion ratios

Best modes

  • comparative
  • drift
  • impact

Preferred tests

  • z-test for proportions
  • Fisher exact test
  • chi-square
  • change-point or rolling drift methods

Hypothesis example

  • H0: injury rate proportion is consistent with prior baseline
  • H1: injury rate changed materially

Notes

For social metrics, category and rate tests matter more than continuous-value comparisons.


C.8. Example seeded records

C.8.1. Example significance profile

{
"profile_id": "SIGPROF_SCOPE2_STANDARD",
"profile_name": "Scope 2 Standard Statistical Review",
"alpha_default": 0.01,
"minimum_sample_size": 12,
"effect_size_floor": 0.40,
"bayesian_probability_threshold": 0.95,
"multiple_testing_policy": "benjamini_hochberg",
"confidence_interval_level": 0.95,
"assumption_failure_policy": "fallback",
"inconclusive_policy": "no_penalty",
"verifier_review_required": false,
"human_approval_required": false
}
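
The benjamini_hochberg policy above implies a step-up procedure over the p-values of each multiple_testing_group. A sketch of the standard Benjamini-Hochberg rule, assuming all group members are tested jointly:

```python
# Sketch of Benjamini-Hochberg: sort p-values ascending, find the largest
# rank k with p_(k) <= (k / m) * alpha, and reject the k smallest.
# Controls the false discovery rate across a multiple_testing_group.

def benjamini_hochberg(p_values: list, alpha: float = 0.05) -> list:
    """Return a parallel list of booleans: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / m) * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```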

C.8.2. Example hypothesis template

{
"template_id": "HT_SCOPE2_PLAUS_001",
"template_name": "Scope 2 Peer Plausibility Check",
"signal_type": "ghg_emission",
"metric_family": "climate_energy",
"default_test_family": "welch_t",
"fallback_test_family": "bootstrap_ci",
"null_hypothesis_text": "The reported Scope 2 value is consistent with the expected peer baseline for comparable entities.",
"alternative_hypothesis_text": "The reported Scope 2 value differs materially from the expected peer baseline for comparable entities.",
"default_alpha": 0.01,
"default_effect_size_floor": 0.40,
"bayesian_supported": true,
"bootstrap_supported": true,
"effect_size_required": true
}

C.9. API endpoint draft

POST /api/mice/stat/run

Example request

{
"engine_id": "MEID_STAT01_v1",
"mode": "plausibility",
"signal_id": "ghg_scope2_market_based",
"entity_id": "eco196123456789",
"reporting_period": "2025",
"initiated_by": "dice_auto_rule",
"dataset_ref": "zar://dataset/scope2/2025/entity/eco196123456789",
"benchmark_profile_id": "BENCH_NACE_C25_EU",
"hypothesis_template_id": "HT_SCOPE2_PLAUS_001",
"significance_profile_id": "SIGPROF_SCOPE2_STANDARD",
"context": {
"source_mix": ["erp", "invoice"],
"input_trust_score": 0.86,
"estimation_flag": false
}
}

Example response

{
"run_id": "statrun-000001",
"engine_id": "MEID_STAT01_v1",
"engine_version": "1.0.0",
"signal_id": "ghg_scope2_market_based",
"entity_id": "eco196123456789",
"mode": "plausibility",
"test_family_used": "welch_t",
"fallback_used": false,
"decision_class": "rejects_null",
"results": {
"p_value": 0.0041,
"effect_size": 0.68,
"confidence_interval": [0.21, 0.49],
"posterior_exceedance_probability": 0.973,
"test_statistic": 2.91,
"assumption_fit": "moderate"
},
"impact": {
"trust_delta": -0.11,
"risk_flag": "high",
"escalation_triggered": true,
"recommended_action": "verifier_review"
},
"explainability": {
"user_message": "This value appears statistically unusual compared with similar entities in the selected benchmark.",
"verifier_message": "Observed Scope 2 value materially exceeds peer baseline under the active profile.",
"board_message": "A statistically significant deviation has been detected and should be reviewed before final disclosure."
},
"audit": {
"logged_to_altd": true,
"timestamp_utc": "2026-04-04T09:12:14Z",
"human_review_required": false
}
}

C.10. Governance controls

Because ZAYAZ already has a formal AI governance charter, validation SOP, retraining log model, and risk register concept, the STAT engine should be onboarded through that same discipline rather than introduced as an informal utility. 

Required controls for launch

  • register MEID_STAT01_v1 in engine registry
  • assign risk level
  • define validation frequency
  • define fallback and failure policies
  • require ALTD logging for material runs
  • define human-review thresholds
  • define statistical method approval list
  • prohibit silent threshold changes

Recommended initial risk classification

  • Medium for passive advisory/statistical evidence
  • High if directly driving trust score changes for compliance-critical disclosures
  • High if used in automated verifier escalation or AI self-healing actions

C.11. Phased rollout plan

Phase 0

Schema-only

  • create tables
  • seed 3 test families
  • seed 2 significance profiles
  • add SSSR metadata fields
  • no user-facing outputs yet

Phase 1

Passive evidence mode

  • run STAT after DICE for selected climate/energy signals
  • write outputs to ALTD
  • do not alter visible trust score yet
  • expose only to internal ops and verifier sandbox

Phase 2

Bounded VTE integration

  • allow limited trust delta
  • enable ZARA explanations
  • enable internal dashboard flags

Phase 3

Verifier-facing support

  • assurance packets
  • review queues
  • batch scans before report export

Phase 4

Advanced modes

  • Bayesian support
  • intervention-effect mode
  • drift support for AI governance and telemetry

C.12. Initial signal scope

I would begin with 12 to 20 signals at most.

Best first set:

  • Scope 2 market-based emissions
  • Scope 2 location-based emissions
  • electricity consumption
  • fuel consumption
  • energy intensity
  • water withdrawal
  • water intensity
  • hazardous waste
  • non-hazardous waste
  • waste diversion rate
  • LTIR or equivalent injury rate
  • employee turnover ratio

This is enough to validate the architecture without creating test sprawl.


C.13. Final architecture recommendation

The cleanest long-term pattern is this:

  • SSSR owns eligibility and mapping
  • STAT owns inference
  • VTE owns trust interpretation
  • ZARA/ZAAM own explanation
  • ALTD owns evidence trail
  • AI governance owns approval boundaries

That keeps ZAYAZ modular, future-proof, and defensible under audit and regulatory scrutiny. It also fits the platform’s existing decomposition into registries, agents, trust layers, micro-engines, and governed workflows.    



