Skip to main content
Jira progress: loading…

simz-dev

Statistical Inference Engine

1. Identity

<Identity meid="MEID_STAT01" />

Depends on module:

Canonical computation and modeling domain for ESG calculations, simulations, extrapolation, aggregation, normalization, and decision-grade metric synthesis. Provides governed, auditable compute services to other modules (e.g., Reporting, Risk, Net Zero, ZARA).
Domain:
computation-hub
Category:
analytics-modeling
Classification:
module
Lifecycle status:
active
Semver:
1.0.0
Introduced in:
v0.3
Governance
AI risk level:
high
Trust threshold:
0.9
Human review required:
true
Verifier involved:
false
Audit required:
true
Ownership
Primary owner:
Platform
Architecture board:
true
White-label allowed:
true
Entrypoints
Docs:
/computation-hub
UI:
/app/computation-hub
API:
/api/computation-hub
Dependencies
Modules
  • sis
  • input-hub
Unresolved tokens
  • BUME
  • SEM
  • VTE
Engines (declared)
  • DICE
  • DaVE
  • VTE
  • SEM
  • BUME
Micro-engines (from registry)
Micro-engines (declared)
None
Signals
USO
  • DATA.COMPUTATION
  • DATA.VALIDATION
  • DATA.EXTRAPOLATION
  • MODEL.SIMULATION
  • MODEL.UNCERTAINTY
CSI
  • CSI_COMPUTATION_HUB
SSSR tags
  • computation
  • modeling
  • simulation
  • monte-carlo
  • bayesian
  • extrapolation
  • aggregation
  • normalization
  • validation
  • mice
Workflows & Outputs
Workflows
  • MicroEngineRouting
  • RuleBasedComputeDispatch
  • ScenarioModeling
  • UncertaintyQuantification
  • AggregationAndRollups
  • NormalizationAndBenchmarking
  • ExtrapolationAndGapFilling
  • ComputeAuditAndReplay
Outputs
  • computed_metrics
  • scenario_ranges
  • confidence_intervals
  • normalized_values
  • validation_findings
  • trust_scored_compute_outputs
Audit
Ledger:
ALTD
Replay supported:
true
PII policy:
no_pii_in_omr
Tags

Background

Hypothesis testing can become a core trust, validation, and intelligence layer inside ZAYAZ if implemented correctly.

But it needs to be deeply integrated into the architecture (MICE, DICE, VTE, ZARA, SIS) — not treated as a generic statistics feature.


🧠 Where Hypothesis Testing Fits in ZAYAZ (Strategic View)

ZAYAZ is already:

  • Signal-driven (SSSR)
  • Validation-driven (DICE, DaVE, VTE)
  • AI-governed and audit-logged (ALTD, AIGS) 

Hypothesis testing adds a formal statistical decision layer on top of this:

👉 From:

“This looks wrong / anomalous”

👉 To:

“We reject H₀ with 99% confidence: this supplier’s emissions trend is statistically inconsistent with historical baseline”

That shift is massive for CSRD-grade credibility.


🔧 Core Use Cases (High-Impact)

1. 📊 Data Validation & Anomaly Detection (DICE + VTE Enhancement)

Problem today:

  • Outliers flagged heuristically
  • Trust scores based on rules + AI

Add hypothesis testing:

  • Null hypothesis (H₀): Data follows expected distribution
  • Alternative (H₁): Data deviates significantly

Example:

  • Electricity consumption spike
  • Test: Z-score / t-test / Bayesian posterior deviation

Result:

  • Instead of “flagged anomaly”
  • You get:
  • p-value
  • confidence interval
  • statistical justification

👉 This directly strengthens auditability + verifier trust


2. 🔍 Scope 3 Estimation Validation (SEM + Bayesian Engines)

You already:

  • Use SEM extrapolation in ZARA 
  • Run probabilistic modeling (Computation Hub)

Hypothesis testing can:

  • Validate extrapolated values vs known distributions
  • Compare supplier-reported vs model-estimated values

Example:

  • H₀: Supplier emissions = expected industry mean
  • H₁: Supplier deviates significantly

👉 Output:

  • “Supplier emissions 27% higher than sector baseline (p < 0.01)”

This becomes:

  • Benchmarking
  • Risk scoring
  • Materiality signal

3. 🏭 Supplier / Value Chain Benchmarking Engine

Integrate into:

  • NORM + AGGR + RMAP micro engines 

Hypothesis-driven benchmarking:

  • Compare:
  • Company vs sector (NACE)
  • Supplier vs peers
  • Region vs global baseline

Example:

  • H₀: Company is within normal emission range for NACE code
  • Reject → triggers:
  • Risk flag
  • Governance escalation
  • ZARA explanation

👉 This is next-gen ESG intelligence, not just reporting


4. 📈 Impact Measurement & ESG Strategy Validation

ZAYAZ tracks:

  • Goals, KPIs, timelines 

Hypothesis testing enables:

Example:

  • “Did our sustainability initiative reduce emissions?”
  • H₀: No effect
  • H₁: Reduction occurred

Use:

  • A/B testing sustainability actions
  • Policy effectiveness validation
  • CAPEX justification

👉 This is financial + ESG convergence


5. 🧪 AI Governance & Model Validation (CRITICAL)

You already require:

  • Model validation
  • Drift detection
  • audit logs 

Hypothesis testing should be embedded into:

AI Validation SOP:

  • H₀: Model performance unchanged
  • H₁: Model drift detected

Used for:

  • Retraining triggers
  • Bias detection
  • Model degradation alerts

👉 This aligns perfectly with:

  • EU AI Act
  • CSRD AI traceability

6. 🎯 Materiality & Stakeholder Intelligence (SEEL)

Materiality today:

  • Qualitative + scoring

Add:

  • Statistical significance testing on stakeholder inputs

Example:

  • H₀: Issue not materially significant
  • H₁: Stakeholder concern statistically significant

👉 Outcome:

  • Defensible double materiality decisions

🧱 Architectural Implementation (ZAYAZ-Native)

New Micro Engine Category

Add:

MEID_STATXX_v1 → Statistical Inference Engine

Core capabilities:

  • Hypothesis testing (t-test, chi-square, Bayesian)
  • Confidence intervals
  • Distribution fitting
  • Significance scoring

Integrates with:

  • CALC (computation)
  • VALI (validation)
  • CFIL (confidence filter)

Integration Map

LayerRole
SSSRDefine expected distributions per signal
MICE (STAT Engine)Run tests
DICECombine with rule validation
VTEConvert to trust score impact
ZARAExplain results in plain language
ALTDLog hypothesis + result for audit

Example Data Structure

{
"signal_id": "ghg_scope2_energy",
"test_type": "z_test",
"null_hypothesis": "within_expected_range",
"p_value": 0.003,
"confidence": 0.997,
"result": "reject_null",
"impact": "high_risk_flag",
"logged_at": "timestamp"
}

🚀 Strategic Advantage (This is the Big One)

If you implement this properly, ZAYAZ becomes:

❌ Not:

  • ESG reporting tool
  • ESG data platform

✅ But:

A statistically defensible ESG decision engine

That means:

  • Verifiers trust it
  • Regulators respect it
  • CFOs use it
  • Boards rely on it

⚠️ Important Caveats (Design Principles)

  1. “Precision Before Automation”

You already state this clearly 

👉 Hypothesis testing must:

  • Use validated distributions (IPCC, Ecoinvent, etc.)
  • Be explainable
  • Never be “black box stats”

  1. Avoid Misuse

Not all ESG data fits classical stats:

  • Small sample sizes
  • Missing data
  • Non-normal distributions

👉 Solution:

  • Bayesian methods > classical in many cases

  1. UX is Critical (ZARA Layer)

Never show:

“p = 0.03”

Instead:

“This value is statistically unlikely compared to expected patterns (97% confidence).”


🔮 Next-Level Extensions (Where This Gets Powerful)

  • Real-time anomaly detection across supply chains
  • Predictive compliance breach detection
  • Dynamic ESG risk pricing (insurance / finance)
  • Automated auditor assistance (“statistical red flags”)
  • Carbon market validation (credit integrity scoring)

🧭 Final Verdict

👉 Hypothesis testing is not just useful — it is a foundational upgrade to:

  • Trust engine (VTE)
  • Validation layer (DICE)
  • AI governance (AIGS)
  • Decision intelligence (ZARA)  

1. Engine Identity

  • Engine ID: MEID_STAT01_v1
  • Readable Name: Statistical Inference Engine
  • Category Code: STAT
  • Domain: Statistical validation, inference, significance testing, uncertainty quantification
  • Primary Hub: Computation Hub
  • Secondary Consumers: Input Hub, Reports & Insights Hub, Shared Intelligence Stack
  • Lifecycle Status: Proposed
  • Risk Class: Medium by default, High when outputs influence compliance decisions, verifier workflows, or AI-triggered escalations. This fits ZAYAZ’s AI governance model, where higher-impact AI/statistical modules require stronger oversight and logging. 

2. Strategic Purpose

MEID_STAT01_v1 provides a formal statistical decision layer for ZAYAZ.

It converts weak statements like:

  • “this looks abnormal”
  • “this trend may be suspicious”
  • “this estimate seems high”

into structured, defensible outputs like:

  • “baseline consistency rejected at significance threshold 0.01”
  • “reported value falls outside expected peer distribution”
  • “post-intervention change is statistically meaningful”
  • “model drift likely based on distributional shift”

This engine is not meant to replace rule validation, AI reasoning, or verifier judgment. It is meant to strengthen them with reproducible inference.


3. Placement in ZAYAZ Architecture

3.1 Role in MICE

ZAYAZ already defines Micro Engines as modular computation units invoked through structured routing and rule logic, with categories such as CALC, VALI, NORM, AGGR, RCAS, and CFIL. MEID_STAT01_v1 should be added as a first-class engine category in that same family. 

3.2 Architectural Dependencies

Consumes from:

  • SSSR signal metadata
  • NACE / sector / geography context
  • historical observations
  • peer benchmarks
  • input-source trust metadata
  • DICE validation outputs
  • telemetry event streams
  • AI model validation data

Feeds into:

  • DICE
  • VTE / trust logic
  • ZARA explanations
  • ZAAM inline guidance
  • ALTD audit trails
  • Reports Hub visualizations
  • AI governance validation logs

3.3 Positioning

  • DICE answers: “Is the data structurally valid?”
  • STAT answers: “Is the data statistically credible?”
  • VTE answers: “How should that affect trust?”
  • ZARA answers: “What does it mean in human language?”
  • ALTD answers: “Can we prove what happened later?”

That separation matches the modular and governed approach already defined across ZAYAZ.   


4. Core Objectives

MEID_STAT01_v1 shall support six primary objectives:

  1. Outlier and anomaly significance testing For values, trends, ratios, and distributions.

  2. Benchmark comparison testing Company vs peer group, site vs portfolio, supplier vs sector baseline, region vs region.

  3. Pre/post intervention testing To assess whether a policy, capex action, training, or operational change had measurable effect.

  4. Estimation plausibility testing Especially for SEM/extrapolated values, inferred Scope 3 values, and partially imputed datasets.

  5. Distribution shift / drift detection For AI model governance, telemetry quality, or changing operational patterns.

  6. Uncertainty packaging Confidence intervals, credible intervals, posterior probabilities, support strength, and test quality metadata.


5. Supported Use Cases

5.1 ESG Data Quality

  • unusual energy use
  • improbable waste intensity
  • water-use spikes
  • unsafe year-over-year jumps
  • inconsistent supplier reporting

5.2 Scope 1, 2, 3 Emissions

  • compare reported emissions to expected ranges by activity / NACE / geography
  • test whether supplier-reported factors are materially inconsistent with known baselines
  • test whether modeled estimates are statistically plausible

5.3 Materiality and Stakeholder Signals

  • identify whether stakeholder issue concentration is statistically meaningful across segments
  • detect whether issue salience differs by geography or stakeholder class

5.4 AI Governance

ZAYAZ’s governance material already requires structured validation, drift checks, supervised retraining, logging, and human oversight for more critical modules. STAT should become one of the standard inference backbones behind those checks. 

5.5 Verification Support

  • produce machine-readable “statistical red flags”
  • support verifiers with ranked review candidates
  • distinguish “structurally valid but statistically suspicious” from “invalid”

6. Operating Principles

6.1 Precision Before Automation

The ZAYAZ manual explicitly prioritizes trust, explainability, and traceability over blind automation. STAT must inherit that principle directly. 

6.2 No Silent Statistical Decisions

Every material test must log:

  • test type
  • hypothesis definition
  • sample context
  • thresholds used
  • assumptions
  • result
  • confidence / support level
  • downstream effect

6.3 Test Appropriateness First

The engine must not force classical null-hypothesis testing where assumptions are weak. In many ESG contexts:

  • samples are small
  • data is skewed
  • observations are missing
  • sources are mixed quality
  • peer groups are uneven

Therefore the engine must support both:

  • classical tests
  • Bayesian / resampling / robust alternatives

6.4 Explainability Layer Required

No raw p-values should be surfaced to ordinary users without contextual interpretation. ZARA/ZAAM should translate them into decision-grade language. This aligns with ZAYAZ’s agent architecture and explainability goals.  


7. Supported Test Families

7.1 Baseline Statistical Families

Difference tests

  • one-sample t-test
  • two-sample t-test
  • Welch t-test
  • paired t-test
  • Mann–Whitney U
  • Wilcoxon signed-rank

Proportion / categorical tests

  • chi-square goodness-of-fit
  • chi-square independence
  • Fisher exact test
  • z-test for proportions

Distribution / consistency tests

  • Kolmogorov–Smirnov
  • Anderson–Darling
  • Shapiro-style normality checks only for internal suitability checks
  • population stability / drift indices

Variance / dispersion tests

  • Levene / Brown-Forsythe
  • F-test only when justified

Time-series / change tests

  • changepoint detection
  • CUSUM-type detection
  • drift detection on residuals
  • rolling z-score / robust z-score

7.2 Bayesian / Robust Families

Preferred for many ESG applications:

  • posterior probability of exceedance
  • Bayesian mean comparison
  • credible intervals
  • prior-updated peer expectation models
  • bootstrap confidence intervals
  • permutation testing
  • robust median-based deviation testing

7.3 Special ZAYAZ Modes

Plausibility mode For checking if a value is plausible under known distributions.

Comparative mode For comparing entities, suppliers, sites, or years.

Impact mode For testing if a change initiative had measurable effect.

Drift mode For model governance and telemetry monitoring.

Assurance mode For verifier-facing support packets.


8. Input Contract

8.1 Required Input Envelope

{
"engine_id": "MEID_STAT01_v1",
"mode": "plausibility",
"signal_id": "ghg_scope2_market_based",
"entity_id": "eco196123456789",
"reporting_period": "2025",
"comparison_scope": {
"peer_group_id": "nace_c25_eu_midcap",
"geography": "EU",
"sector_code": "C25"
},
"dataset_ref": "zar://dataset/....",
"hypothesis_template_id": "HT_STAT_PLAUS_003",
"significance_profile_id": "SIGPROF_SCOPE2_STANDARD",
"context": {
"source_mix": ["erp", "invoice", "manual"],
"sample_size": 38,
"input_trust_score": 0.84,
"estimation_flag": false
}
}

8.2 Input Sources

  • structured tables
  • registry-linked observations
  • time-series
  • peer benchmark extracts
  • imputation outputs
  • AI model validation logs
  • verifier-reviewed samples

8.3 Required Metadata from SSSR

Each signal eligible for STAT must support additional metadata in SSSR:

  • stat_test_eligible
  • recommended_test_families
  • expected_distribution_type
  • minimum_sample_policy
  • peer_group_strategy
  • significance_profile_default
  • escalation_policy_id
  • explainability_template_id

SSSR already functions as the smart metadata backbone for signals, so this is the correct place to store statistical eligibility and routing metadata. 


9. Hypothesis Template Registry

A dedicated registry should be created, for example:

stat_hypothesis_templates

Suggested fields:

  • template_id
  • template_name
  • signal_type
  • test_family
  • null_hypothesis_text
  • alternative_hypothesis_text
  • assumptions
  • fallback_test_family
  • default_alpha
  • bayesian_supported
  • effect_size_required
  • human_explanation_template
  • verifier_explanation_template
  • status
  • version

Example

{
"template_id": "HT_STAT_PLAUS_003",
"signal_type": "energy_intensity",
"test_family": "robust_zscore_plus_bootstrap",
"null_hypothesis_text": "The observed value is consistent with expected peer-range behavior for the selected comparison scope.",
"alternative_hypothesis_text": "The observed value is not consistent with expected peer-range behavior.",
"default_alpha": 0.01,
"bayesian_supported": true,
"effect_size_required": true
}

10. Significance Profiles

Statistical thresholds should not be hardcoded globally. They should be controlled by policy profiles.

stat_significance_profiles

Fields:

  • profile_id
  • name
  • alpha_default
  • effect_size_floor
  • minimum_sample_size
  • multiple_testing_policy
  • bayesian_probability_threshold
  • high_risk_override
  • verifier_review_required
  • human_approval_required
  • status

Example profiles

  • SIGPROF_LOW_IMPACT_MONITORING
  • SIGPROF_SCOPE3_ESTIMATION
  • SIGPROF_AUDIT_ESCALATION
  • SIGPROF_AI_DRIFT_HIGH_RISK

This is consistent with ZAYAZ’s governance pattern of explicit thresholds, risk registers, and structured review gates. 


11. Output Contract

11.1 Standard Output

{
"engine_id": "MEID_STAT01_v1",
"run_id": "statrun-2026-04-04-000184",
"signal_id": "ghg_scope2_market_based",
"mode": "plausibility",
"test_family_used": "welch_t_plus_bootstrap",
"hypothesis": {
"null": "Observed value is consistent with peer baseline.",
"alternative": "Observed value differs materially from peer baseline."
},
"sample": {
"n_observed": 38,
"n_peer": 412,
"quality_flag": "acceptable"
},
"results": {
"decision": "reject_null",
"p_value": 0.004,
"effect_size": 0.71,
"confidence_interval": [0.19, 0.54],
"posterior_exceedance_probability": 0.973
},
"quality": {
"assumption_fit": "moderate",
"fallback_used": true,
"multiple_testing_adjusted": false
},
"impact": {
"trust_delta": -0.12,
"risk_flag": "high",
"escalation_triggered": true,
"recommended_action": "verifier_review"
},
"explainability": {
"user_message": "This value is statistically unusual compared with similar entities in the selected benchmark group.",
"verifier_message": "Observed value materially exceeds expected peer baseline under selected comparison profile."
},
"audit": {
"logged_to_altd": true,
"model_or_engine_version": "MEID_STAT01_v1",
"timestamp_utc": "2026-04-04T08:12:14Z"
}
}

11.2 Output Classes

  • supports_null
  • rejects_null
  • inconclusive
  • insufficient_sample
  • assumption_failure
  • fallback_applied

The engine must explicitly distinguish “no evidence of difference” from “not enough evidence.”


12. Routing and Invocation Logic

12.1 Trigger Sources

  • DICE validation suspicion
  • VTE trust reassessment
  • ZARA prompted analysis
  • ZAAM inline agent help
  • telemetry anomaly event
  • verifier workflow request
  • scheduled periodic scans
  • AI validation schedule

12.2 ZADIF / Rule Engine Invocation

The ZAYAZ architecture already uses routing logic, agent dispatching, and rule-driven activation. STAT should be routable through the same dispatch pattern.  

Example:

{
"dispatch_condition": {
"signal_type": "ghg_emission",
"trust_score_below": 0.88,
"reporting_context": "csrd",
"sample_size_min": 12
},
"target_engine": "MEID_STAT01_v1",
"mode": "plausibility"
}

13. Interaction with DICE and VTE

13.1 DICE

DICE remains first-pass structural validator. STAT should only run when:

  • data passes minimum structural validation, or
  • DICE explicitly requests a statistical diagnostic path

13.2 VTE

STAT should not directly set trust scores. It should output structured evidence that VTE uses.

Recommended contribution model:

  • abnormality severity
  • effect size
  • assumption quality
  • sample adequacy
  • source-trust interaction
  • repeat anomaly history

That keeps inference and trust logic separate, which is cleaner and more auditable.


14. Interaction with ZARA and ZAAM

14.1 ZARA

ZARA should use STAT outputs for narrative explanations such as:

  • why a signal is suspicious
  • why a benchmark comparison matters
  • why additional evidence is required

ZARA is already positioned as the prompt-driven orchestration and reporting intelligence layer, so it should consume STAT outputs as explainable evidence objects. 

14.2 ZAAM

ZAAM agents can trigger or explain STAT in context:

  • Form Assistant: “This value looks unusually high for your peer group.”
  • Compliance Scout: “This trend could require additional disclosure attention.”
  • Trust Guardian: “The statistical evidence reduced trust because the result diverges from prior validated patterns.”

That aligns with ZAAM’s scoped, trust-aware, role-aware agent model. 


15. Audit and Governance Requirements

All material STAT runs must write to ALTD-compatible audit records. ZAYAZ already emphasizes tamper detection, audit readiness, and governed AI lifecycle records.  

Required audit fields

  • run_id
  • engine_id
  • engine_version
  • signal_id
  • entity_id
  • dataset_ref
  • test_family_used
  • hypothesis_template_id
  • significance_profile_id
  • thresholds_applied
  • result_decision
  • human_review_required
  • downstream_effects
  • initiated_by
  • timestamp_utc

Governance flags

  • used_for_compliance_decision
  • used_for_ai_validation
  • verifier_visible
  • human_approved
  • high_risk_path

16. Failure Modes and Fallbacks

16.1 Failure Classes

  • insufficient sample
  • missing benchmark group
  • incompatible distribution assumptions
  • poor data completeness
  • excessive missingness
  • conflicting source populations
  • no approved significance profile

16.2 Fallback Policy

Fallback order:

  1. preferred test family
  2. robust non-parametric equivalent
  3. bootstrap/permutation
  4. descriptive-only mode
  5. escalate as “statistical inconclusive”

The engine must never fabricate certainty.


17. Security and Compliance

17.1 Privacy

STAT outputs must avoid exposing unnecessary peer-level details if peer groups are confidential.

17.2 Human Oversight

If STAT materially affects:

  • compliance filings
  • verifier routing
  • AI retraining authorization
  • public-facing trust displays

then review gates should apply according to ZAYAZ’s existing governance approach for higher-risk AI logic. 

17.3 EU AI Act Alignment

STAT itself is not a full AI model in all cases, but once combined with automated decision routing, trust scoring, or self-healing actions, it enters the governed AI/decision-support zone and should inherit AIGS/AI governance controls.


  1. Data Model Additions

Recommended tables:

stat_engine_registry

  • engine_id
  • version
  • status
  • supported_modes
  • supported_test_families
  • default_config
  • mode_docs_url

stat_hypothesis_templates

As above.

stat_significance_profiles

As above.

stat_run_log

Execution records.

stat_benchmark_profiles

Defines peer group construction logic.

stat_signal_policy_map

Maps signal IDs to allowed tests, thresholds, peer strategies, and escalation policies.


19. API Draft

POST /api/mice/stat/run

Executes a statistical inference run.

POST /api/mice/stat/explain

Returns user / verifier / board phrasing for a completed run.

GET /api/mice/stat/run/{run_id}

Returns audit-grade run details.

POST /api/mice/stat/batch

Batch mode for portfolio scans, annual pre-checks, or verifier preparation.

POST /api/mice/stat/validate-profile

Validates significance profiles and test mappings before activation.


20. MVP Scope

Phase 1 MVP

Start with:

  • plausibility mode
  • comparative mode
  • drift mode
  • robust z-score
  • Welch t-test
  • Mann–Whitney U
  • chi-square
  • bootstrap CI
  • descriptive fallback
  • ALTD logging
  • VTE evidence output
  • ZARA explanation strings

Phase 2

Add:

  • Bayesian comparison
  • changepoint detection
  • intervention impact mode
  • multiple testing correction profiles
  • verifier evidence packets
  • dashboard components

Phase 3

Add:

  • adaptive priors by NACE/geography
  • federated peer baselines
  • cross-entity anomaly propagation
  • automated materiality significance models

APPENDIX A - ESRS Metrics Most Likely to Benefit from Hypothesis Testing

Highest-value metric families

A. Climate and energy

Best candidates because they are numeric, recurring, comparable, and often benchmarkable.

  • Scope 1 emissions
  • Scope 2 emissions
  • Scope 3 category values
  • electricity consumption
  • fuel consumption
  • energy intensity ratios
  • renewable energy share

Why: high recurrence, strong peer comparability, good anomaly-detection value.

B. Water

  • total withdrawal
  • discharge volumes
  • recycled/reused water share
  • water intensity per unit output

Why: site-level trend testing and peer comparison are often valuable.

C. Waste and circularity

  • hazardous waste
  • non-hazardous waste
  • waste diverted from disposal
  • recycling rates
  • recovery rates
  • material efficiency ratios

Why: distributions are often skewed, so robust methods are useful.

D. Workforce / S metrics

  • injury frequency
  • absenteeism
  • training completion
  • diversity ratios
  • turnover

Why: less suited to raw outlier testing than climate data, but useful for proportions and trend shifts.

E. Governance / process metrics

Often lower statistical value unless repeated over many entities or periods:

  • policy coverage
  • training completion
  • incident counts
  • whistleblower case patterns

Why: many are binary or low-frequency, so use categorical or proportion-based methods only.

Best fit categories for Phase 1

  1. energy and emissions
  2. water
  3. waste
  4. selected workforce ratios

Lower-value categories for initial rollout

  • narrative disclosures
  • one-time governance statements
  • policy existence flags
  • low-frequency event fields

These are better handled by rule logic, traceability, and document validation than by formal hypothesis tests.


APPENDIX B - Draft “Statistical Trust Score” Layer for VTE

Purpose

Convert statistical evidence into a bounded contribution to trust, without letting statistics dominate source quality, verification state, or audit provenance.

Principle

STAT informs trust; it does not own trust.

Proposed subscore

Create a VTE-compatible subscore:

statistical_trust_component = 0.00 to 1.00

Inputs

  • test decision strength
  • effect size magnitude
  • confidence / posterior support
  • sample adequacy
  • assumption quality
  • data completeness
  • peer-group relevance
  • repeat anomaly history
  • whether value is estimated or directly observed

Example weighted model

STC =
0.20 * decision_strength
+ 0.15 * effect_size_quality
+ 0.15 * sample_adequacy
+ 0.10 * assumption_quality
+ 0.10 * completeness_quality
+ 0.10 * peer_group_fit
+ 0.10 * source_integrity_interaction
+ 0.10 * anomaly_history_modifier

Interpretation

  • 0.85–1.00 statistically well-supported
  • 0.65–0.84 acceptable / monitor
  • 0.40–0.64 weak statistical confidence
  • 0.00–0.39 statistically problematic / escalate

Decision strength mapping

Example:

  • supports_null strongly: 0.95
  • inconclusive: 0.55
  • rejects_null moderately: 0.35
  • rejects_null strongly with large effect: 0.15

This sounds inverted, but the point is trust falls when the statistical evidence suggests abnormality relative to expectation.

Guardrails

  • never reduce trust purely from one weak test
  • require stronger effect for high-volatility metrics
  • reduce penalty when source is verified and benchmark fit is weak
  • increase penalty for repeated anomalies across periods
  • cap maximum trust delta from STAT alone, for example ±0.15 per run

Recommended VTE integration

Final VTE score might combine:

  • source provenance
  • structural validation
  • verifier status
  • historical consistency
  • statistical trust component
  • AI-origin penalty/adjustment

That aligns well with ZAYAZ’s trust-centric architecture and explainable validation model.  


APPENDIX C - ZAYAZ Statistical Inference Layer

Technical Implementation Pack v0.1

C.1. Scope of this package

This package defines five implementation layers:

  1. SQL table schemas
  2. JSON schemas for API input/output
  3. SSSR field additions
  4. VTE integration logic
  5. Example statistical test designs for 5 ESRS-relevant metric families

This is an architectural draft, not a locked final schema. The goal is to make the first implementation:

  • auditable
  • modular
  • backward-compatible
  • explainable
  • safe to deploy in stages

C.2. Core architecture overview

C.2.1. Proposed runtime flow

FOGE / API / Import / Telemetry / Verifier Request

DICE

Rule Engine / ZADIF

MEID_STAT01_v1

Statistical Result Object

VTE Trust Logic

ZARA / ZAAM Explanation Layer

ALTD / Audit + Reports Hub

C.2.2. Design principle

STAT should behave like a governed evidence engine, not a black-box scoring engine.

It should:

  • evaluate statistical consistency
  • package uncertainty explicitly
  • return bounded evidence objects
  • avoid direct final decisions where governance requires human review

That matches the broader ZAYAZ governance and trust philosophy already described in the manuals.  


C.3. SQL schema pack

Below is a practical relational design for Postgres-style deployment.


C.3.1. stat_engine_registry

Purpose: register statistical engines and supported modes.

CREATE TABLE stat_engine_registry (
engine_id VARCHAR(50) PRIMARY KEY,
readable_name VARCHAR(255) NOT NULL,
version VARCHAR(30) NOT NULL,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'experimental', 'deprecated', 'archived')),
supported_modes JSONB NOT NULL,
supported_test_families JSONB NOT NULL,
default_config JSONB,
mode_docs_url TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Example row

{
"engine_id": "MEID_STAT01_v1",
"readable_name": "Statistical Inference Engine",
"version": "1.0.0",
"status": "experimental",
"supported_modes": ["plausibility", "comparative", "drift", "impact"],
"supported_test_families": ["welch_t", "mann_whitney_u", "chi_square", "bootstrap_ci", "robust_zscore"]
}

C.3.2. stat_hypothesis_templates

Purpose: standardized test logic by metric family / signal type.

CREATE TABLE stat_hypothesis_templates (
template_id VARCHAR(60) PRIMARY KEY,
template_name VARCHAR(255) NOT NULL,
signal_type VARCHAR(100) NOT NULL,
metric_family VARCHAR(100),
default_test_family VARCHAR(80) NOT NULL,
fallback_test_family VARCHAR(80),
null_hypothesis_text TEXT NOT NULL,
alternative_hypothesis_text TEXT NOT NULL,
assumptions JSONB,
default_alpha NUMERIC(6,5) NOT NULL DEFAULT 0.05,
default_effect_size_floor NUMERIC(8,4),
bayesian_supported BOOLEAN NOT NULL DEFAULT FALSE,
bootstrap_supported BOOLEAN NOT NULL DEFAULT TRUE,
effect_size_required BOOLEAN NOT NULL DEFAULT TRUE,
explainability_template_id VARCHAR(60),
verifier_template_id VARCHAR(60),
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'deprecated', 'draft', 'archived')),
version VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.3. stat_significance_profiles

Purpose: policy-based thresholds rather than hardcoded alpha values.

CREATE TABLE stat_significance_profiles (
profile_id VARCHAR(60) PRIMARY KEY,
profile_name VARCHAR(255) NOT NULL,
alpha_default NUMERIC(6,5) NOT NULL,
minimum_sample_size INTEGER,
effect_size_floor NUMERIC(8,4),
bayesian_probability_threshold NUMERIC(6,5),
multiple_testing_policy VARCHAR(50),
confidence_interval_level NUMERIC(6,5) DEFAULT 0.95,
assumption_failure_policy VARCHAR(50) NOT NULL DEFAULT 'fallback',
inconclusive_policy VARCHAR(50) NOT NULL DEFAULT 'no_penalty',
verifier_review_required BOOLEAN NOT NULL DEFAULT FALSE,
human_approval_required BOOLEAN NOT NULL DEFAULT FALSE,
high_risk_override JSONB,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'deprecated', 'draft', 'archived')),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.4. stat_benchmark_profiles

Purpose: define peer groups and benchmark construction logic.

CREATE TABLE stat_benchmark_profiles (
benchmark_profile_id VARCHAR(60) PRIMARY KEY,
benchmark_name VARCHAR(255) NOT NULL,
scope_type VARCHAR(50) NOT NULL CHECK (scope_type IN ('sector', 'geography', 'size_band', 'client_portfolio', 'custom')),
nace_codes JSONB,
geographies JSONB,
size_bands JSONB,
reporting_frameworks JSONB,
signal_filters JSONB,
inclusion_rules JSONB,
exclusion_rules JSONB,
minimum_peer_count INTEGER NOT NULL DEFAULT 20,
freshness_days INTEGER,
confidentiality_policy VARCHAR(50) NOT NULL DEFAULT 'aggregate_only',
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'draft', 'deprecated', 'archived')),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.5. stat_signal_policy_map

Purpose: map individual signals to statistical policies.

CREATE TABLE stat_signal_policy_map (
signal_id VARCHAR(120) PRIMARY KEY,
stat_test_eligible BOOLEAN NOT NULL DEFAULT FALSE,
preferred_mode VARCHAR(50),
preferred_test_family VARCHAR(80),
fallback_test_family VARCHAR(80),
hypothesis_template_id VARCHAR(60),
significance_profile_id VARCHAR(60),
benchmark_profile_id VARCHAR(60),
expected_distribution_type VARCHAR(50),
minimum_sample_size INTEGER,
requires_effect_size BOOLEAN NOT NULL DEFAULT TRUE,
multiple_testing_group VARCHAR(100),
escalation_policy_id VARCHAR(60),
explainability_template_id VARCHAR(60),
verifier_packet_required BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
CONSTRAINT fk_stat_hypothesis_template
FOREIGN KEY (hypothesis_template_id) REFERENCES stat_hypothesis_templates(template_id),
CONSTRAINT fk_stat_significance_profile
FOREIGN KEY (significance_profile_id) REFERENCES stat_significance_profiles(profile_id),
CONSTRAINT fk_stat_benchmark_profile
FOREIGN KEY (benchmark_profile_id) REFERENCES stat_benchmark_profiles(benchmark_profile_id)
);

C.3.6. stat_run_log

Purpose: immutable log of each statistical execution.

CREATE TABLE stat_run_log (
run_id VARCHAR(80) PRIMARY KEY,
engine_id VARCHAR(50) NOT NULL,
engine_version VARCHAR(30) NOT NULL,
signal_id VARCHAR(120) NOT NULL,
entity_id VARCHAR(80),
reporting_period VARCHAR(40),
mode VARCHAR(50) NOT NULL,
initiated_by VARCHAR(80) NOT NULL,
dataset_ref TEXT,
benchmark_profile_id VARCHAR(60),
hypothesis_template_id VARCHAR(60),
significance_profile_id VARCHAR(60),
test_family_requested VARCHAR(80),
test_family_used VARCHAR(80),
fallback_used BOOLEAN NOT NULL DEFAULT FALSE,
null_hypothesis_text TEXT,
alternative_hypothesis_text TEXT,
sample_metadata JSONB,
assumptions_metadata JSONB,
results_payload JSONB NOT NULL,
decision_class VARCHAR(50) NOT NULL,
trust_delta NUMERIC(8,4),
escalation_triggered BOOLEAN NOT NULL DEFAULT FALSE,
escalation_reason TEXT,
human_review_required BOOLEAN NOT NULL DEFAULT FALSE,
human_review_status VARCHAR(40),
altd_logged BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.7. stat_explainability_templates

Purpose: human-readable output templates for ZARA / ZAAM / verifiers.

CREATE TABLE stat_explainability_templates (
template_id VARCHAR(60) PRIMARY KEY,
audience_type VARCHAR(40) NOT NULL CHECK (audience_type IN ('user', 'verifier', 'board', 'internal_ops', 'agent')),
language_code VARCHAR(10) NOT NULL DEFAULT 'en',
template_text TEXT NOT NULL,
severity_mapping JSONB,
variable_schema JSONB,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'draft', 'deprecated', 'archived')),
version VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

C.3.8. stat_test_catalog

Purpose: controlled list of allowed methods.

CREATE TABLE stat_test_catalog (
test_family_id VARCHAR(80) PRIMARY KEY,
readable_name VARCHAR(255) NOT NULL,
class_type VARCHAR(50) NOT NULL,
supports_small_samples BOOLEAN NOT NULL DEFAULT FALSE,
supports_non_normal BOOLEAN NOT NULL DEFAULT FALSE,
supports_missingness_robustness BOOLEAN NOT NULL DEFAULT FALSE,
supports_effect_size BOOLEAN NOT NULL DEFAULT TRUE,
supports_bootstrap BOOLEAN NOT NULL DEFAULT FALSE,
supports_bayesian BOOLEAN NOT NULL DEFAULT FALSE,
default_for_modes JSONB,
status VARCHAR(30) NOT NULL CHECK (status IN ('active', 'draft', 'deprecated', 'archived'))
);

C.4. JSON schema pack

Below is a practical API contract design, expressed as JSON Schema.

C.4.1. Run request schema

{
"type": "object",
"required": ["engine_id", "mode", "signal_id", "initiated_by"],
"properties": {
"engine_id": { "type": "string", "enum": ["MEID_STAT01_v1"] },
"mode": {
"type": "string",
"enum": ["plausibility", "comparative", "drift", "impact", "assurance"]
},
"signal_id": { "type": "string" },
"entity_id": { "type": "string" },
"reporting_period": { "type": "string" },
"initiated_by": { "type": "string" },
"dataset_ref": { "type": "string" },
"benchmark_profile_id": { "type": "string" },
"hypothesis_template_id": { "type": "string" },
"significance_profile_id": { "type": "string" },
"test_family_requested": { "type": "string" },
"context": {
"type": "object",
"properties": {
"source_mix": { "type": "array", "items": { "type": "string" } },
"input_trust_score": { "type": "number", "minimum": 0, "maximum": 1 },
"estimation_flag": { "type": "boolean" },
"peer_group_override": { "type": "object" },
"sample_metadata": { "type": "object" }
}
}
}
}
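
The C.4.1 contract can be enforced at the dispatch boundary. A minimal sketch in Python, checking only required fields and enum membership (in practice the full schema would be handed to a JSON Schema validator such as `jsonschema`; the function name here is illustrative):

```python
# Minimal sketch of request validation against the C.4.1 contract.
# Covers required fields and enum membership only.

REQUIRED = ("engine_id", "mode", "signal_id", "initiated_by")
MODES = {"plausibility", "comparative", "drift", "impact", "assurance"}
ENGINES = {"MEID_STAT01_v1"}

def validate_run_request(payload: dict) -> list:
    """Return a list of validation errors; an empty list means dispatchable."""
    errors = [f"missing required field: {f}" for f in REQUIRED if f not in payload]
    if "engine_id" in payload and payload["engine_id"] not in ENGINES:
        errors.append("unknown engine_id")
    if "mode" in payload and payload["mode"] not in MODES:
        errors.append("unknown mode")
    return errors
```

A request that passes (empty error list) can be routed to the engine; anything else is rejected before a run_id is ever allocated.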

C.4.2. Run response schema

{
"type": "object",
"required": [
"run_id",
"engine_id",
"signal_id",
"mode",
"decision_class",
"results",
"impact",
"audit"
],
"properties": {
"run_id": { "type": "string" },
"engine_id": { "type": "string" },
"engine_version": { "type": "string" },
"signal_id": { "type": "string" },
"entity_id": { "type": "string" },
"mode": { "type": "string" },
"test_family_used": { "type": "string" },
"fallback_used": { "type": "boolean" },
"decision_class": {
"type": "string",
"enum": [
"supports_null",
"rejects_null",
"inconclusive",
"insufficient_sample",
"assumption_failure",
"fallback_applied"
]
},
"results": {
"type": "object",
"properties": {
"p_value": { "type": ["number", "null"] },
"effect_size": { "type": ["number", "null"] },
"confidence_interval": {
"type": ["array", "null"],
"items": { "type": "number" },
"minItems": 2,
"maxItems": 2
},
"posterior_exceedance_probability": { "type": ["number", "null"] },
"test_statistic": { "type": ["number", "null"] },
"assumption_fit": { "type": "string" }
}
},
"impact": {
"type": "object",
"properties": {
"trust_delta": { "type": ["number", "null"] },
"risk_flag": { "type": "string" },
"escalation_triggered": { "type": "boolean" },
"recommended_action": { "type": "string" }
}
},
"explainability": {
"type": "object",
"properties": {
"user_message": { "type": "string" },
"verifier_message": { "type": "string" },
"board_message": { "type": "string" }
}
},
"audit": {
"type": "object",
"properties": {
"logged_to_altd": { "type": "boolean" },
"timestamp_utc": { "type": "string" },
"human_review_required": { "type": "boolean" }
}
}
}
}

C.4.3. Batch request schema

{
"type": "object",
"required": ["engine_id", "mode", "initiated_by", "items"],
"properties": {
"engine_id": { "type": "string" },
"mode": { "type": "string" },
"initiated_by": { "type": "string" },
"items": {
"type": "array",
"items": {
"type": "object",
"required": ["signal_id"],
"properties": {
"signal_id": { "type": "string" },
"entity_id": { "type": "string" },
"reporting_period": { "type": "string" },
"dataset_ref": { "type": "string" }
}
}
}
}
}

C.5. SSSR field additions

Because SSSR is the correct place for signal-level intelligence, routing metadata, and structured lookup behavior in ZAYAZ, the statistical layer should be attached there rather than scattered across engine configs. 

C.5.1. New SSSR fields for statistical readiness

Add these fields to the signal metadata layer:

{
"stat_test_eligible": true,
"stat_priority_level": "high",
"stat_metric_family": "climate_energy",
"preferred_stat_mode": "plausibility",
"recommended_test_families": ["welch_t", "bootstrap_ci", "robust_zscore"],
"expected_distribution_type": "right_skewed",
"minimum_sample_policy": {
"preferred_min_n": 20,
"absolute_min_n": 8
},
"benchmark_profile_id": "BENCH_NACE_C25_EU",
"hypothesis_template_id": "HT_SCOPE2_PLAUS_001",
"significance_profile_id": "SIGPROF_SCOPE2_STANDARD",
"effect_size_required": true,
"multiple_testing_group": "esrs_e1_energy",
"stat_explainability_template_id": "STAT_USER_GENERIC_001",
"verifier_packet_required": true,
"stat_retest_cooldown_days": 30,
"stat_escalation_policy_id": "ESC_STAT_HIGH_SCOPE2"
}

C.5.2. Strong recommendation

Do not add just a single boolean such as supports_statistics; that is too weak. Use a structured object so that:

  • routing stays deterministic
  • governance stays inspectable
  • benchmark strategies stay versioned

C.6. VTE integration logic

C.6.1. Principle

STAT should contribute a bounded trust evidence component into VTE, not replace provenance, document validation, or verifier approval.

C.6.2. Proposed VTE composition

Final_Trust_Score =
0.30 * provenance_component
+ 0.20 * structural_validation_component
+ 0.15 * verifier_component
+ 0.15 * historical_consistency_component
+ 0.10 * statistical_trust_component
+ 0.10 * ai_origin_adjustment_component

This is just a starting balance. For ZAYAZ, we should keep STAT at 10% to 15% max in early phases.
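
As a sketch, the composition above maps directly onto a weighted sum. Component names mirror the formula; each input is assumed to arrive already normalized to 0.0–1.0, and the function name is illustrative:

```python
# Sketch of the C.6.2 composition. Missing components default to 0.0.

VTE_WEIGHTS = {
    "provenance_component": 0.30,
    "structural_validation_component": 0.20,
    "verifier_component": 0.15,
    "historical_consistency_component": 0.15,
    "statistical_trust_component": 0.10,
    "ai_origin_adjustment_component": 0.10,
}

def final_trust_score(components: dict) -> float:
    """Weighted sum of trust components, clamped to [0, 1]."""
    score = sum(VTE_WEIGHTS[name] * components.get(name, 0.0) for name in VTE_WEIGHTS)
    return round(min(max(score, 0.0), 1.0), 4)
```

Keeping the weights in one governed table (rather than inline constants) is what allows the STAT share to be rebalanced per phase without code changes.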

C.6.3. Statistical trust component formula

STC =
0.20 * decision_strength
+ 0.15 * effect_size_quality
+ 0.15 * sample_adequacy
+ 0.10 * assumption_quality
+ 0.10 * completeness_quality
+ 0.10 * peer_group_fit
+ 0.10 * source_integrity_interaction
+ 0.10 * anomaly_history_modifier

Normalize to 0.00–1.00.

C.6.4. Decision strength mapping

{
"supports_null_strong": 0.95,
"supports_null_moderate": 0.82,
"inconclusive": 0.58,
"rejects_null_moderate": 0.35,
"rejects_null_strong": 0.15,
"insufficient_sample": 0.50,
"assumption_failure": 0.52
}
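
The table above leaves the strong/moderate split implicit. A hypothetical `map_decision_strength` helper is sketched below; the 0.5 effect-size cutoff is an illustrative assumption, not a platform-defined threshold:

```python
# Sketch: resolve a decision class (plus effect size) to a 0-1 strength score
# using the C.6.4 mapping. Unknown classes fall back to the neutral
# inconclusive score.

DECISION_STRENGTH = {
    "supports_null_strong": 0.95,
    "supports_null_moderate": 0.82,
    "inconclusive": 0.58,
    "rejects_null_moderate": 0.35,
    "rejects_null_strong": 0.15,
    "insufficient_sample": 0.50,
    "assumption_failure": 0.52,
}

def map_decision_strength(decision_class: str, results: dict) -> float:
    if decision_class in ("supports_null", "rejects_null"):
        effect = results.get("effect_size") or 0.0
        grade = "strong" if abs(effect) >= 0.5 else "moderate"  # assumed cutoff
        decision_class = f"{decision_class}_{grade}"
    return DECISION_STRENGTH.get(decision_class, DECISION_STRENGTH["inconclusive"])
```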

C.6.5. Trust delta rule

STAT should output both:

  • statistical_trust_component
  • suggested_trust_delta

Suggested rule:

suggested_trust_delta = (STC - 0.70) * 0.20

Then cap:

  • minimum delta: -0.15
  • maximum delta: +0.08

This prevents statistics from overpowering the total trust score.
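
A minimal sketch of the rule and caps:

```python
# Sketch of the C.6.5 rule: center the delta at STC = 0.70, scale by 0.20,
# then cap so statistics cannot overpower the composite trust score.

DELTA_MIN, DELTA_MAX = -0.15, 0.08

def suggested_trust_delta(stc: float) -> float:
    """Map a statistical trust component (0-1) to a bounded trust delta."""
    delta = (stc - 0.70) * 0.20
    return round(min(max(delta, DELTA_MIN), DELTA_MAX), 4)
```

Note that with STC in [0, 1] the raw delta spans -0.14 to +0.06, so the caps act as defensive headroom rather than binding limits under the suggested rule.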

C.6.6. Escalation thresholds

Example:

  • trust_delta <= -0.10 and effect_size >= 0.6 → verifier review
  • repeated anomaly 3 periods in a row → high-risk escalation
  • inconclusive + verified source → no penalty
  • assumption failure + missing benchmark → route to descriptive-only mode
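
These thresholds can be expressed as an ordered rule list, evaluated first match wins. A sketch, where `consecutive_anomalies` and `source_verified` are assumed context inputs rather than defined run-log fields:

```python
# Sketch of the C.6.6 escalation examples as an ordered rule list.

def escalation_action(run: dict) -> str:
    """Resolve the first matching escalation rule; 'none' if nothing fires."""
    if run.get("trust_delta", 0.0) <= -0.10 and run.get("effect_size", 0.0) >= 0.6:
        return "verifier_review"
    if run.get("consecutive_anomalies", 0) >= 3:
        return "high_risk_escalation"
    if run.get("decision_class") == "inconclusive" and run.get("source_verified"):
        return "no_penalty"
    if run.get("decision_class") == "assumption_failure" and not run.get("benchmark_profile_id"):
        return "descriptive_only_mode"
    return "none"
```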

C.6.7. Pseudocode

def compute_statistical_trust_component(run):
    # Map each evidence dimension onto a 0-1 quality score
    decision_strength = map_decision_strength(run.decision_class, run.results)
    effect_size_quality = map_effect_size(run.results.get("effect_size"))
    sample_adequacy = map_sample_quality(run.sample_metadata)
    assumption_quality = map_assumption_fit(run.results.get("assumption_fit"))
    completeness_quality = map_completeness(run.sample_metadata)
    peer_group_fit = map_peer_group_fit(run.sample_metadata)
    source_integrity_interaction = map_source_integrity(run.context)
    anomaly_history_modifier = map_history(run.entity_id, run.signal_id)

    # Weighted blend per C.6.3
    stc = (
        0.20 * decision_strength +
        0.15 * effect_size_quality +
        0.15 * sample_adequacy +
        0.10 * assumption_quality +
        0.10 * completeness_quality +
        0.10 * peer_group_fit +
        0.10 * source_integrity_interaction +
        0.10 * anomaly_history_modifier
    )

    # Clamp to [0, 1] and round to 4 decimals for storage
    return round(min(max(stc, 0.0), 1.0), 4)

C.7. Example implementation logic for 5 ESRS-relevant metric families

These are not legal ESRS interpretations. They are implementation archetypes for statistical support inside ZAYAZ.


C.7.1. Family A: Scope 2 electricity / energy emissions

Typical signals

  • electricity consumption
  • location-based Scope 2
  • market-based Scope 2
  • energy intensity ratio

Best modes

  • plausibility
  • comparative
  • drift

Preferred tests

  • Welch t-test
  • robust z-score
  • bootstrap confidence interval

Hypothesis example

  • H0: entity value is consistent with sector/geography peer baseline
  • H1: entity value differs materially from peer baseline

Signal policy example

{
"signal_id": "ghg_scope2_market_based",
"preferred_stat_mode": "plausibility",
"recommended_test_families": ["welch_t", "bootstrap_ci", "robust_zscore"],
"expected_distribution_type": "right_skewed",
"minimum_sample_policy": {
"preferred_min_n": 20,
"absolute_min_n": 8
},
"effect_size_required": true,
"verifier_packet_required": true
}

Notes

This is one of the strongest early candidates because it is recurring, numeric, and highly comparable.
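
The robust z-score named in the preferred tests can be computed from the peer sample's median and MAD. A stdlib-only sketch (the 1.4826 factor makes MAD consistent with the standard deviation under normality; the peer values in the test are a made-up illustration):

```python
# Sketch: deviation from the peer median, scaled by the median absolute
# deviation (MAD). Resistant to the outliers that break a plain z-score.
from statistics import median

def robust_zscore(value: float, peers: list) -> float:
    """Return (value - median) / (1.4826 * MAD) over the peer sample."""
    med = median(peers)
    mad = median(abs(p - med) for p in peers)
    if mad == 0:
        raise ValueError("zero MAD: peer sample has no spread")
    return (value - med) / (1.4826 * mad)
```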


C.7.2. Family B: Scope 3 business travel / upstream transport

Typical signals

  • flight emissions
  • travel activity values
  • freight emissions
  • transport intensity

Best modes

  • plausibility
  • comparative
  • impact

Preferred tests

  • Mann–Whitney U
  • bootstrap CI
  • changepoint detection for trends

Hypothesis example

  • H0: travel-related emission intensity is unchanged from prior operating profile
  • H1: a meaningful shift occurred

Special caution

These metrics can be structurally volatile. Therefore:

  • stronger effect-size thresholds
  • more tolerant anomaly penalties
  • more emphasis on trend context than one-off outliers

C.7.3. Family C: Water withdrawal / discharge

Typical signals

  • total water withdrawn
  • recycled water share
  • water intensity per production unit
  • discharge volume

Best modes

  • plausibility
  • comparative
  • impact

Preferred tests

  • Welch t-test
  • paired test for pre/post interventions
  • bootstrap CI

Hypothesis example

  • H0: water intensity after intervention is unchanged
  • H1: water intensity decreased meaningfully after intervention

High-value use

Very good for demonstrating measurable change after capex, policy, or operational changes.


C.7.4. Family D: Waste and circularity

Typical signals

  • hazardous waste
  • non-hazardous waste
  • diverted from disposal
  • recycled fraction
  • circular material use ratios

Best modes

  • plausibility
  • comparative
  • drift

Preferred tests

  • Mann–Whitney U
  • chi-square for disposal category proportions
  • bootstrap CI

Hypothesis example

  • H0: waste diversion pattern is consistent with prior validated pattern
  • H1: waste diversion pattern differs materially

Notes

This family is often skewed and operationally messy. Robust and non-parametric methods should dominate.


C.7.5. Family E: Workforce safety / social ratios

Typical signals

  • injury rate
  • lost-time incident rate
  • turnover
  • diversity proportions
  • training completion ratios

Best modes

  • comparative
  • drift
  • impact

Preferred tests

  • z-test for proportions
  • Fisher exact test
  • chi-square
  • change-point or rolling drift methods

Hypothesis example

  • H0: injury rate proportion is consistent with prior baseline
  • H1: injury rate changed materially

Notes

For social metrics, category and rate tests matter more than continuous-value comparisons.


C.8. Example seeded records

C.8.1. Example significance profile

{
"profile_id": "SIGPROF_SCOPE2_STANDARD",
"profile_name": "Scope 2 Standard Statistical Review",
"alpha_default": 0.01,
"minimum_sample_size": 12,
"effect_size_floor": 0.40,
"bayesian_probability_threshold": 0.95,
"multiple_testing_policy": "benjamini_hochberg",
"confidence_interval_level": 0.95,
"assumption_failure_policy": "fallback",
"inconclusive_policy": "no_penalty",
"verifier_review_required": false,
"human_approval_required": false
}
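
The benjamini_hochberg policy above implies a step-up procedure over the p-values of each multiple_testing_group. A sketch of the standard Benjamini-Hochberg rule, assuming all group members are tested jointly:

```python
# Sketch of Benjamini-Hochberg: sort p-values ascending, find the largest
# rank k with p_(k) <= (k / m) * alpha, and reject the k smallest.
# Controls the false discovery rate across a multiple_testing_group.

def benjamini_hochberg(p_values: list, alpha: float = 0.05) -> list:
    """Return a parallel list of booleans: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / m) * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```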

C.8.2. Example hypothesis template

{
"template_id": "HT_SCOPE2_PLAUS_001",
"template_name": "Scope 2 Peer Plausibility Check",
"signal_type": "ghg_emission",
"metric_family": "climate_energy",
"default_test_family": "welch_t",
"fallback_test_family": "bootstrap_ci",
"null_hypothesis_text": "The reported Scope 2 value is consistent with the expected peer baseline for comparable entities.",
"alternative_hypothesis_text": "The reported Scope 2 value differs materially from the expected peer baseline for comparable entities.",
"default_alpha": 0.01,
"default_effect_size_floor": 0.40,
"bayesian_supported": true,
"bootstrap_supported": true,
"effect_size_required": true
}

C.9. API endpoint draft

POST /api/mice/stat/run

Example request

{
"engine_id": "MEID_STAT01_v1",
"mode": "plausibility",
"signal_id": "ghg_scope2_market_based",
"entity_id": "eco196123456789",
"reporting_period": "2025",
"initiated_by": "dice_auto_rule",
"dataset_ref": "zar://dataset/scope2/2025/entity/eco196123456789",
"benchmark_profile_id": "BENCH_NACE_C25_EU",
"hypothesis_template_id": "HT_SCOPE2_PLAUS_001",
"significance_profile_id": "SIGPROF_SCOPE2_STANDARD",
"context": {
"source_mix": ["erp", "invoice"],
"input_trust_score": 0.86,
"estimation_flag": false
}
}

Example response

{
"run_id": "statrun-000001",
"engine_id": "MEID_STAT01_v1",
"engine_version": "1.0.0",
"signal_id": "ghg_scope2_market_based",
"entity_id": "eco196123456789",
"mode": "plausibility",
"test_family_used": "welch_t",
"fallback_used": false,
"decision_class": "rejects_null",
"results": {
"p_value": 0.0041,
"effect_size": 0.68,
"confidence_interval": [0.21, 0.49],
"posterior_exceedance_probability": 0.973,
"test_statistic": 2.91,
"assumption_fit": "moderate"
},
"impact": {
"trust_delta": -0.11,
"risk_flag": "high",
"escalation_triggered": true,
"recommended_action": "verifier_review"
},
"explainability": {
"user_message": "This value appears statistically unusual compared with similar entities in the selected benchmark.",
"verifier_message": "Observed Scope 2 value materially exceeds peer baseline under the active profile.",
"board_message": "A statistically significant deviation has been detected and should be reviewed before final disclosure."
},
"audit": {
"logged_to_altd": true,
"timestamp_utc": "2026-04-04T09:12:14Z",
"human_review_required": false
}
}

C.10. Governance controls

Because ZAYAZ already has a formal AI governance charter, validation SOP, retraining log model, and risk register concept, the STAT engine should be onboarded through that same discipline rather than introduced as an informal utility. 

Required controls for launch

  • register MEID_STAT01_v1 in engine registry
  • assign risk level
  • define validation frequency
  • define fallback and failure policies
  • require ALTD logging for material runs
  • define human-review thresholds
  • define statistical method approval list
  • prohibit silent threshold changes

Recommended initial risk classification

  • Medium for passive advisory/statistical evidence
  • High if directly driving trust score changes for compliance-critical disclosures
  • High if used in automated verifier escalation or AI self-healing actions

C.11. Phased rollout plan

Phase 0

Schema-only

  • create tables
  • seed 3 test families
  • seed 2 significance profiles
  • add SSSR metadata fields
  • no user-facing outputs yet

Phase 1

Passive evidence mode

  • run STAT after DICE for selected climate/energy signals
  • write outputs to ALTD
  • do not alter visible trust score yet
  • expose only to internal ops and verifier sandbox

Phase 2

Bounded VTE integration

  • allow limited trust delta
  • enable ZARA explanations
  • enable internal dashboard flags

Phase 3

Verifier-facing support

  • assurance packets
  • review queues
  • batch scans before report export

Phase 4

Advanced modes

  • Bayesian support
  • intervention-effect mode
  • drift support for AI governance and telemetry

C.12. Initial signal scope

I would begin with 12 to 20 signals at most.

Best first set:

  • Scope 2 market-based emissions
  • Scope 2 location-based emissions
  • electricity consumption
  • fuel consumption
  • energy intensity
  • water withdrawal
  • water intensity
  • hazardous waste
  • non-hazardous waste
  • waste diversion rate
  • LTIR or equivalent injury rate
  • employee turnover ratio

This is enough to validate the architecture without creating test sprawl.


C.13. Final architecture recommendation

The cleanest long-term pattern is this:

  • SSSR owns eligibility and mapping
  • STAT owns inference
  • VTE owns trust interpretation
  • ZARA/ZAAM own explanation
  • ALTD owns evidence trail
  • AI governance owns approval boundaries

That keeps ZAYAZ modular, future-proof, and defensible under audit and regulatory scrutiny. It also fits the platform’s existing decomposition into registries, agents, trust layers, micro-engines, and governed workflows.    



