AIIL-DCM
Data & Compliance Models
1. Global Disclosure Ontology (GDO)
1.1. Purpose & Role
The Global Disclosure Ontology (GDO) is the unifying data model of ZAYAZ. Its role is to:
- Harmonize disclosure requirements across ESRS, ISSB, SEC, GRI, and voluntary frameworks.
- Provide a single schema that all ingestion, retrieval, computation, and reporting workflows map to.
- Enable crosswalks between frameworks, so one disclosure can be repurposed or reconciled across multiple jurisdictions.
- Ensure version control and provenance for every disclosure node, so regulators, auditors, and stakeholders know exactly which standard, version, and clause is being referenced.
The GDO is the source of truth that allows ZAYAZ AI to deliver regulator-ready outputs without fragmentation.
1.2. Core Components
- Entities
- Company: organizational unit, subsidiaries, reporting boundaries.
- Activity: NACE/ISIC-coded economic activity (basis for EU taxonomy alignment).
- Value Chain: upstream/downstream partners, supply chain tiers.
- Geography: country, region, jurisdiction-specific disclosure requirements.
- Disclosure Nodes
- Requirement: Core element (e.g., ESRS E1-6, ISSB S2-14, SEC 1500).
- Sub-requirement: Application Requirement, methodology, or calculation guidance.
- Illustrative Guidance: Optional/non-binding guidance.
- Crosswalk Links: connections across frameworks (e.g., ESRS E1-6 ↔ ISSB S2-14).
- Metrics & Indicators
- Quantitative: GHG emissions, energy use, workforce counts, pay ratios.
- Qualitative: governance policies, biodiversity management, human rights commitments.
- Computed: lifecycle impacts, carbon intensity, transition finance KPIs.
- Metadata Layer
- Framework ID (ESRS, ISSB, SEC, GRI).
- Version ID (e.g., ESRS July 2025 amendments).
- Jurisdiction (EU, global, US).
- Source hash (cryptographic checksum).
- Assurance status (audited, self-declared, verifier-attested).
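A minimal sketch of how these components can come together in a single disclosure node (field names, IDs, and values are illustrative, not the production schema):
{
  "node_id": "ESRS-E1-6",
  "type": "Requirement",
  "framework_id": "ESRS",
  "version_id": "ESRS-2025-07",
  "jurisdiction": "EU",
  "entity": {
    "company": "ExampleCo Group",
    "activity": { "nace_code": "C.24.10" },
    "geography": "EU"
  },
  "metrics": [
    { "metric_id": "ghg_scope1_total", "kind": "quantitative", "unit": "tCO2e" }
  ],
  "crosswalk_links": ["ISSB-S2-14"],
  "source_hash": "sha256:...",
  "assurance_status": "self-declared"
}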
1.3. Crosswalks
Crosswalks are the translation layer that connects standards.
- One-to-One: ESRS Disclosure Requirement (DR) ↔ equivalent ISSB requirement (direct mapping).
- One-to-Many: One disclosure requirement in ESRS may satisfy multiple GRI disclosures.
- Many-to-One: Several SEC items may correspond to one ESRS disclosure node.
- No Direct Mapping: Ontology flags gaps where disclosures cannot be reconciled, highlighting compliance risks.
Crosswalks are stored in machine-readable tables, so ZAYAZ can generate reconciliation tables and support dual-reporting clients.
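As an illustration, a machine-readable crosswalk entry could take roughly this shape (field names and mappings are assumptions for this sketch):
[
  {
    "source_node": "ESRS-E1-6",
    "target_node": "ISSB-S2-14",
    "mapping_type": "one-to-one",
    "coverage": "full"
  },
  {
    "source_node": "ESRS-S1-9",
    "target_node": null,
    "mapping_type": "no-direct-mapping",
    "coverage": "none",
    "notes": "Illustrative example of a flagged reconciliation gap."
  }
]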
1.4. Implementation
- Schema Format: JSON Schema + Graph Database (Neo4j/JanusGraph).
- Storage: Each disclosure node is stored as a graph node with relationships:
- isPartOf (requirement hierarchy)
- mapsTo (crosswalk)
- requires (data dependency)
- APIs:
- /gdo/resolve: Resolve a disclosure requirement to its canonical node.
- /gdo/compare: Compare requirements across frameworks.
- /gdo/version: Retrieve schema for a given framework version.
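A sketch of a possible /gdo/resolve request and response, assuming a simple REST interface (payload shape is illustrative):
GET /gdo/resolve?requirement=ESRS-E1-6&version=2025-07

{
  "node_id": "ESRS-E1-6",
  "canonical_name": "Gross Scopes 1, 2, 3 and Total GHG emissions",
  "framework_id": "ESRS",
  "version_id": "ESRS-2025-07",
  "isPartOf": "ESRS-E1",
  "mapsTo": ["ISSB-S2-14"],
  "requires": ["ghg_scope1_total", "ghg_scope2_total", "ghg_scope3_total"],
  "source_hash": "sha256:..."
}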
1.5. Governance
- Versioning: Every ontology update is tied to a framework version (e.g., ESRS July 2025 → GDO v1.3).
- Change Management: Changes logged, reviewed by compliance experts, validated against ingestion pipelines.
- Auditability: All nodes tagged with source_hash for immutable provenance.
- Stakeholder Feedback: Verifiers and customers can propose ontology improvements; changes tracked via GitOps workflow.
1.6. Other Features
- Automated Crosswalk Updates: AI-driven mapping suggestions when new standards are released.
- Sector-Specific Overlays: Add industry-specific disclosure layers (e.g., SASB industries, EU Taxonomy sectors).
- Machine-Readable Assurance: Ontology integrates with XBRL/iXBRL tags from the EFRAG digital taxonomy.
- Scenario Adaptation: GDO used not only for compliance but also for forward-looking scenario modeling (climate risk, biodiversity pathways).
The Global Disclosure Ontology ensures that ZAYAZ AI speaks a single, unified compliance language, even as frameworks proliferate and evolve. It is the semantic anchor for all reporting, analysis, and assurance.
2. Domain-Specific Data Models
2.1. Purpose & Role
Domain-specific data models provide the granular structures for climate, environmental, social, and governance disclosures. They define how raw data, metrics, and qualitative information are captured, validated, and transformed into disclosure-ready outputs.
This chapter also introduces the Computation Hub and its AI–Compute Contract, ensuring that all numbers presented by ZAYAZ AI are computed via validated models and datasets, never fabricated by the LLM.
2.2. Climate & Environmental Models
The climate domain is the most computation-heavy area of ESG.
Core Metrics
- GHG Emissions:
- Scope 1: Direct (stationary combustion, process emissions).
- Scope 2: Indirect from purchased energy (location- and market-based).
- Scope 3: Upstream/downstream, supplier and customer activities.
- Energy Metrics: Total consumption, renewable vs non-renewable split, efficiency ratios.
- Water & Waste: Withdrawal, discharge, recycling; hazardous vs non-hazardous waste.
- Biodiversity & Land Use: Land footprint, restoration areas, ecosystem services dependencies.
Data Sources
- IPCC Emission Factor Database (EFDB).
- EU Taxonomy Technical Screening Criteria.
- Company operational data, supplier disclosures, verifier attestations.
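For illustration, a stored Scope 2 record could capture both calculation bases together with its sources (field names and values are placeholders):
{
  "metric_id": "ghg_scope2",
  "reporting_year": 2025,
  "values": {
    "location_based": { "value": 12450.0, "unit": "tCO2e" },
    "market_based": { "value": 9870.0, "unit": "tCO2e" }
  },
  "data_sources": ["energy_invoices_2025", "supplier_disclosures"],
  "methodology": "GHG Protocol Scope 2 Guidance"
}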
2.3. Social Models
Core Metrics
- Workforce: Headcount, diversity (gender, age, ethnicity), contract type.
- Health & Safety: Lost-time injury frequency rate (LTIFR), fatalities, incident reporting.
- Human Rights: Policy adoption, grievance mechanisms, remediation actions.
- Supply Chain: Transparency across tiers, child/forced labor risk screening, country-level exposure.
Data Sources
- ILO standards and conventions.
- Supplier codes of conduct.
- Whistleblower systems and grievance logs.
2.4. Governance Models
Core Metrics
- Board Composition: Independence, diversity, ESG oversight responsibilities.
- Executive Remuneration: ESG-linked pay, pay ratio disclosures.
- Anti-Corruption: Policies, investigations, training completion rates.
- Risk Management: Integration of ESG into enterprise risk frameworks.
Data Sources
- Corporate governance codes (e.g., OECD Principles).
- SEC filings (proxy statements, 10-K).
- EU directives on board oversight.
2.5. Computation Hub & AI–Compute Contract
The Computation Hub is the numerical backbone of ZAYAZ. It is a modular system that executes validated computational models (Bayesian, Monte Carlo, LCA, sector-specific calculators) to generate quantitative disclosures.
AI–Compute Contract
The contract defines the strict boundary between AI reasoning and numerical outputs:
- No Fabrication
- The LLM is prohibited from generating raw numbers.
- All quantitative answers must originate from a validated computation module.
- Mandatory Tool Invocation
- When a disclosure requires numbers (e.g., Scope 3 emissions), the LLM must call the Computation Hub via API.
- Example:
{
"tool": "ghg_scope3_calculator",
"inputs": {
"activity_data": "...",
"emission_factors": "IPCC:2023-EFDB-v5"
}
}
- Provenance & Transparency
- Every computed result includes:
- Model ID and version.
- Input dataset hashes.
- Assumptions applied (e.g., allocation methods).
- Outputs tagged with metadata for auditability.
- Audit Logs
- Each computation request logged with trace ID.
- Stored alongside RAG retrieval logs and behavioral calibration IDs.
- Enables regulators/auditors to reproduce the exact number using the same model and dataset version.
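A sketch of the result such a call might return, carrying the provenance and audit fields the contract requires (the exact schema, IDs, and values are assumptions):
{
  "trace_id": "comp-2025-000123",
  "tool": "ghg_scope3_calculator",
  "model_version": "v2.1.0",
  "inputs": {
    "activity_data_hash": "sha256:...",
    "emission_factors": "IPCC:2023-EFDB-v5"
  },
  "assumptions": ["spend-based allocation for purchased goods and services"],
  "result": { "value": 48210.5, "unit": "tCO2e" },
  "computed_at": "2025-11-03T10:42:00Z",
  "assurance_status": "unaudited"
}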
2.6. Governance & Validation
- Model Validation: Every Computation Hub module is peer-reviewed, tested against reference datasets, and version-controlled.
- Assumption Disclosure: Each calculation is accompanied by a clear note on methodological assumptions.
- Critical/Blocker Rules: If mandatory data is missing (e.g., activity data), the computation halts and the AI responds with a refusal: “Required input missing — manual completion required.” (see the sketch after this list).
- Audit Readiness: All computations aligned to ISO 14064 (GHG accounting) and EU CSRD assurance expectations.
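A sketch of how a blocker refusal could surface through the Computation Hub API (structure and IDs assumed):
{
  "trace_id": "comp-2025-000124",
  "status": "blocked",
  "rule": "critical/blocker: mandatory input missing",
  "missing_inputs": ["activity_data"],
  "message": "Required input missing — manual completion required."
}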
Other Features
- Automated Data Feeds: Direct ingestion of energy bills, IoT sensor data, supplier portals.
- Scenario Models: Link to NGFS climate scenarios and biodiversity pathways.
- Finance Linkage: Carbon pricing, transition finance modeling, double materiality integration.
- Verifier Integration: Third-party assurance providers can plug into the Computation Hub for independent recalculation.
With the Computation Hub and AI–Compute Contract, ZAYAZ ensures that narratives are generated by AI, but numbers are always computed by validated models — providing a regulator-ready safeguard against hallucination or fabrication.
3. Reference Data Integration
3.1. Purpose & Role
Reference data integration ensures that all disclosures, metrics, and computations in ZAYAZ are aligned with authoritative external taxonomies and datasets.
Its role is to:
- Provide standardized classifications for sectors, geographies, and activities.
- Ensure emission factors and benchmarks are consistent with international authorities.
- Enable cross-report comparability by mapping company-specific disclosures into globally recognized reference systems.
3.2. NACE Codes & Economic Activity Mapping
Purpose
NACE (the EU statistical classification of economic activities) is the backbone for EU Taxonomy alignment and sector-specific ESRS disclosures.
Implementation
- Each company activity is tagged with one or more NACE codes.
- Codes linked to:
- ESRS sector guidance (e.g., Energy vs Manufacturing vs Agriculture).
- EU Taxonomy Technical Screening Criteria.
- Stored as part of the Global Disclosure Ontology (GDO) → entity.activity.nace_code.
Use Cases
- Identify which disclosures are mandatory for a given sector.
- Sector-specific intensity metrics (e.g., emissions per ton steel vs emissions per MWh).
- Regulatory overlays (e.g., EU CBAM reporting linked to specific NACE activities).
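For illustration, an activity entry tagged this way in the GDO might look as follows (entity name and field names are hypothetical):
{
  "entity": "ExampleCo Steelworks",
  "activity": {
    "nace_code": "C.24.10",
    "description": "Manufacture of basic iron and steel",
    "esrs_sector_guidance": "Manufacturing",
    "eu_taxonomy_activity": "Manufacture of iron and steel"
  }
}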
3.3. Countries & Regional Data
Purpose
Country and regional mappings enable jurisdiction-aware disclosures, supply chain transparency, and carbon intensity normalization.
Implementation
- Country ISO Codes stored as canonical references.
- Each country entry enriched with:
- Carbon intensity of electricity grid (IEA/Eurostat data).
- Jurisdictional reporting rules (EU vs US vs APAC).
- Risk flags (human rights risks, biodiversity stress).
- Used by Computation Hub for Scope 2 calculations and supply chain risk scoring.
Use Cases
- Location-based Scope 2 emissions (EU grid vs China grid).
- Regional workforce disclosures (e.g., gender pay gaps in UK vs EU vs US).
- Supply chain exposure heatmaps.
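A country reference entry could be enriched roughly like this (the intensity value is a placeholder, not live IEA/Eurostat data):
{
  "iso_code": "DE",
  "country": "Germany",
  "grid_carbon_intensity": { "value": 380, "unit": "gCO2e/kWh" },
  "jurisdiction": "EU",
  "reporting_rules": ["CSRD/ESRS"],
  "risk_flags": []
}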
3.4. IPCC Emission Factor Database (EFDB)
Purpose
The IPCC EFDB provides authoritative emission factors for GHG accounting, ensuring ZAYAZ numbers are scientifically consistent with global climate methodologies.
Implementation
- EFDB imported and versioned in ZAYAZ data lake.
- Linked to Computation Hub modules (e.g., Scope 1 combustion emissions, Scope 3 purchased goods).
- Referenced by emission_factor_id in Computation Hub audit logs.
Use Cases
- Automatic emission factor lookup during Scope 3 computations.
- Verification of reported emission factors against IPCC defaults.
- Reconciliation between corporate assumptions and authoritative references.
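An imported factor might be stored roughly as follows (schema assumed; the value shown is illustrative rather than quoted from the EFDB):
{
  "emission_factor_id": "IPCC:2023-EFDB-v5:stationary-combustion:natural-gas:CO2",
  "value": 56100,
  "unit": "kg CO2 / TJ",
  "source_hash": "sha256:...",
  "valid_from": "2023-01-01"
}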
3.5. Verifier Attestations & Assurance Data
Purpose
- Integrate third-party verification and assurance evidence directly into the ZAYAZ data model.
Implementation
- Each disclosure tagged with assurance_status:
- Self-declared.
- Limited assurance.
- Reasonable assurance.
- Verifier reports ingested as structured data, linked to disclosure nodes.
- Attestation metadata stored: verifier ID, scope of assurance, countries, methodology used.
- Note: A verifier must be licensed by ZAYAZ to verify in one or more countries, based on local requirements. Only verifiers qualified for a specific country appear as approved verifiers for clients reporting in that country.
Use Cases
- Distinguish between verified vs unverified disclosures in reports.
- Provide auditors with direct access to attestation documents.
- Automate assurance coverage metrics (% of disclosures covered).
3.6. Governance & Provenance
- Version Control: Each reference dataset (NACE, EFDB, country data) versioned and timestamped.
- Cryptographic Hashing: Every external dataset stored with a checksum for audit reproducibility.
- Update Cycle: Automatic ingestion when authorities release updates (e.g., Eurostat NACE revisions, IPCC EFDB updates).
- Crosswalks: References linked into GDO to maintain interoperability across frameworks.
3.7. Other Features
- Other Taxonomies: Integration of NAICS (North America), ANZSIC (Australia/NZ), and CIIU (LatAm).
- Sectoral Pathways: Link to IEA Net Zero scenarios, NGFS stress testing datasets.
- Nature/Biodiversity Datasets: Integration of GBF (Kunming-Montreal Global Biodiversity Framework) metrics.
- Verifier APIs: Direct API connections to assurance providers for near real-time verification.
4. Data Validation & Assurance Contracts
4.1. Purpose & Role
The Data Validation & Assurance Contracts layer ensures that all disclosures in ZAYAZ are:
- Complete (no missing mandatory fields).
- Consistent (units, boundaries, methodologies correctly applied).
- Traceable (linked back to source documents, computations, and verifiers).
- Assurable (structured for external verification under CSRD, ISSB, SEC, or GRI).
This layer provides the compliance-grade governance that transforms raw ESG data into regulator-ready disclosures.
4.2. Validation Principles
- Mandatory Field Enforcement
- Every disclosure node in the Global Disclosure Ontology (GDO) has critical/blocker fields.
- If a mandatory field is missing (e.g., Scope 1 emissions total), the disclosure fails validation.
- Unit & Methodology Consistency
- All metrics stored with explicit units (kg CO₂e, MWh, headcount).
- Calculation methodologies tagged (location-based vs market-based for Scope 2).
- Automatic conversion where necessary, with provenance logged.
- Cross-Framework Alignment
- Dual-reporting companies validated against crosswalk consistency (e.g., ESRS vs ISSB overlaps).
- Gaps flagged where frameworks diverge.
- Evidence Tagging
- Every metric/disclosure linked to:
- Source document (file ID, page, paragraph).
- Computation Hub log (trace ID, dataset hashes).
- Assurance record (verifier, scope, methodology).
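A sketch of how these validation principles could be encoded as machine-readable rules (rule names and structure are assumptions):
{
  "disclosure_node": "ESRS-E1-6",
  "rules": [
    { "rule": "mandatory_field", "field": "ghg_scope1_total", "severity": "blocker" },
    { "rule": "unit_check", "field": "ghg_scope1_total", "expected_unit": "tCO2e" },
    { "rule": "methodology_tag", "field": "ghg_scope2_total", "allowed": ["location-based", "market-based"] },
    { "rule": "evidence_required", "links": ["source_document", "computation_log", "assurance_record"] }
  ]
}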
4.3. Assurance Contracts
Definition: An Assurance Contract is a machine-readable agreement that defines how a disclosure can be audited and verified.
- Components:
- disclosure_node → GDO reference.
- assurance_status → self-declared, limited assurance, reasonable assurance.
- verifier_id → assurance provider, credentials.
- evidence_links → documents, datasets, computation logs.
- validation_rules → applied checks (units, thresholds, consistency).
- Workflow:
- Disclosure prepared.
- Validation checks applied (blocker/critical rules).
- Assurance contract generated (JSON schema).
- Verifier reviews contract + linked evidence.
- Contract signed (digitally or manually).
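Putting the components and workflow together, a generated assurance contract might look like this (JSON shape, IDs, and link formats are illustrative, not a normative schema):
{
  "contract_id": "ac-2025-0001",
  "disclosure_node": "ESRS-E1-6",
  "assurance_status": "limited assurance",
  "verifier_id": "verifier-0042",
  "evidence_links": [
    "doc://annual-report-2025#page-42",
    "trace://comp-2025-000123"
  ],
  "validation_rules": ["mandatory_field", "unit_check", "crosswalk_consistency"],
  "signature": null,
  "status": "pending verification"
}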
4.4. Integration with Computation Hub & RAG
- Computation Hub: Numbers validated against reference datasets (e.g., IPCC EFDB) before being attached to a disclosure.
- RAG Retrieval: If a disclosure cites a regulatory clause, validation checks that the retrieved clause matches the referenced node in GDO.
- Behavioral Calibration: Ensures outputs clearly state whether a disclosure is audited or unaudited.
4.5. Governance & Auditability
- Cryptographic Logging: Every validation and assurance step logged with immutable trace ID.
- Assurance Dashboards: Customers can view which disclosures are verified vs pending.
- Audit Export: On-demand export of assurance contracts in XBRL/iXBRL for regulators.
- Change Control: If a disclosure changes post-assurance, ZAYAZ flags it as “assurance invalidated” until re-verified.
4.6. Other Features
- Automated Assurance APIs: Direct integration with assurance providers (e.g., Big 4, accredited verifiers).
- Smart Contracts for Assurance: Blockchain-based assurance contracts to guarantee immutability and non-repudiation.
- Regulator-Ready Submission: Generate machine-readable filings aligned with EFRAG digital taxonomy and SEC XBRL climate rules.
- Continuous Assurance: Move from annual assurance to near real-time validation/verification as new data ingested.