TAC
Tagged Accounting Crawler
1. Identity
Execution Mode: Deterministic ingestion & normalization
Depends on module:
2. Purpose
The Tagged Accounting Crawler micro-engine ingests ERP accounting data and converts tagged transition-related financial transactions into structured, reconciled, and audit-traceable transition finance objects.
It enables:
- Automated ESRS E1-1 CapEx/OpEx population
- EU Taxonomy CapEx alignment input
- Transition ROI modeling
- Stranded asset analysis
- Audit-grade financial traceability
3. Non-Goals
This engine does NOT:
- Perform EU Taxonomy eligibility determination
- Perform scenario modeling
- Perform Monte Carlo climate modeling
- Alter IFRS classifications
- Override accounting records
It is strictly an ingestion, normalization, classification, and reconciliation engine.
4. Architectural Position
ERP → Connector Runtime → Normalizer → Classifier → Reconciler → Transition Objects → Computation Hub
It operates upstream of:
- Transition Plan Validator
- CapEx Alignment Engine
- Taxonomy Engine
- Climate Risk Engine
Downstream KPI Responsibility
- CapEx/OpEx per tCO₂e reduced, abatement cost, NPV/IRR, and payback are computed by MEID_CALC_TRANSITION_ROI.
- This engine publishes transition_project_cost_profile (cost + provenance) to enable deterministic ROI computation when paired with an emissions reduction profile.
5. Data Sources
Supported ERP systems (extensible):
- SAP S/4HANA
- SAP ECC
- Oracle Fusion / EBS
- Microsoft Dynamics 365
- NetSuite
- Xero
- QuickBooks
Minimum required data extracts:
- General Ledger entries
- Project codes
- Cost centers
- Trial balance
- Asset register (recommended)
6. Decarbonisation Tagging Protocol (DPTP)
The engine detects transition finance through structured tagging.
Accepted Tag Types
- Project code pattern: <DECARB>
- GL attribute: DECARB_CAPEX, DECARB_OPEX
- Cost center: ESG-TRANSITION
Rules are version-controlled and tenant-configurable.
7. Canonical Ingestion Schema (ERP-Agnostic)
7.1. Canonical GL Entry
{
"tenant_id": "string",
"entity_id": "string",
"source_system": "string",
"posting_date": "date",
"fiscal_year": 2026,
"gl_account": "string",
"project_code": "string|null",
"cost_center": "string|null",
"amount": "number",
"currency": "string",
"ifrs_classification": "string|null",
"asset_id": "string|null",
"source_reference": "string",
"job_id": "string"
}
7.2. Canonical Trial Balance
{
"tenant_id": "string",
"entity_id": "string",
"fiscal_year": 2026,
"gl_account": "string",
"balance": "number",
"currency": "string"
}
8. Classification Logic
The classifier determines:
- CapEx vs OpEx
- IAS 16 / IAS 38 / expense
- Project aggregation
- Tag validity
Time-Phasing Support (for discounting downstream):
Where possible, the engine SHOULD emit cashflow_timeline at monthly granularity derived from posting dates. This enables present value (PV) discounting in MEID_CALC_TRANSITION_ROI without requiring any additional ERP integration.
Output object:
{
"transition_project_id": "ZYZ-DECARB-2026-001",
"project_type": "capex|opex",
"total_amount": 48250000,
"currency": "EUR",
"linked_gl_accounts": ["1500", "1510"],
"source_entities": ["Entity_A"],
"reconciliation_status": "pending|passed|failed"
}
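The time-phasing described above can be sketched as a simple monthly aggregation over posting dates. This is a minimal illustration, not the engine's implementation: the function name build_cashflow_timeline and the per-line transition_type field are assumptions introduced here, and amounts are assumed to be already in one currency.

```python
from collections import defaultdict
from decimal import Decimal


def build_cashflow_timeline(lines):
    """Aggregate classified GL lines into monthly capex/opex buckets.

    Each line is a dict with 'posting_date' (ISO 'YYYY-MM-DD'), 'amount',
    and a hypothetical 'transition_type' of 'capex' or 'opex'.
    Lines with any other transition_type are skipped, not guessed.
    """
    buckets = defaultdict(lambda: {"capex": Decimal("0"), "opex": Decimal("0")})
    for line in lines:
        kind = line["transition_type"]
        if kind in ("capex", "opex"):
            period = line["posting_date"][:7]  # 'YYYY-MM'
            buckets[period][kind] += Decimal(str(line["amount"]))
    # Sorted periods keep the output deterministic across runs.
    return [
        {"period": p, "capex": buckets[p]["capex"], "opex": buckets[p]["opex"]}
        for p in sorted(buckets)
    ]
```

Sorting the periods keeps the emitted timeline stable for identical inputs, which matters for the engine's deterministic execution mode.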
9. Reconciliation Engine
Reconciliation is mandatory.
Validation:
Sum(Tagged GL Entries) = Declared Transition CapEx/OpEx
Cross-checks:
- Trial balance totals
- CapEx account group totals
- Duplicate detection
- Currency normalization accuracy
If mismatch:
financial_integrity_flag = true
10. Integrity & Audit Trail
Each ingestion job produces:
- Job manifest
- Raw extract hash
- Normalization log
- Reconciliation report
- Classification ruleset version reference
All of these are stored in an immutable evidence vault.
11. Outputs
The engine publishes:
- transition_line_items
- transition_project_aggregates
- reconciliation_report
- financial_integrity_flags
- transition_project_cost_profile
These are consumed by:
- Transition Plan Validator
- Report Hub
- Taxonomy Engine
- Monte Carlo Risk Engine
11.1. Transition Project Cost Profile (ROI Handoff Artifact)
The engine MUST publish a normalized project-level cost artifact designed for downstream ROI and abatement-cost engines.
Artifact Type: transition_project_cost_profile (ROI handoff artifact: CapEx/OpEx + timeline + provenance)
Primary Consumer: MEID_CALC_TRANSITION_ROI
Keying: transition_project_id (must match emissions reduction profile keys)
{
"transition_project_id": "ZYZ-DECARB-2026-001",
"tenant_id": "TENANT_X",
"entity_id": "Entity_A",
"reporting_year": 2026,
"currency": "EUR",
"capex_total": 48250000,
"opex_total": 9100000,
"cashflow_timeline": [
{ "period": "2026-01", "capex": 1200000, "opex": 100000 },
{ "period": "2026-02", "capex": 900000, "opex": 80000 }
],
"reconciliation_status": "passed_with_warnings",
"integrity_flags": [],
"ledger_refs": [
{ "source_system": "SAP_S4", "source_reference": "1900001:1", "gl_account": "1500", "posting_date": "2026-03-01", "amount": 1250000, "currency": "EUR" }
],
"source": {
"source_system": "SAP_S4",
"job_id": "JOB-UUID",
"extracted_at": "2026-02-16T00:00:00.000Z"
}
}
- cashflow_timeline is optional but recommended for discounting in ROI calculations.
- The accounting crawler does not compute €/tCO₂e. It publishes cost profiles only.
12. Security & Tenant Isolation
- Read-only ERP access
- Encrypted storage
- Tenant-scoped processing
- No financial record mutation
- Full audit logging
13. Observability
Metrics:
- Extraction completeness %
- Reconciliation pass rate
- Tag detection coverage
- Processing latency
Alerts triggered when:
- Integrity flag = true
- Extract missing mandatory fields
- Currency conversion failure
14. Governance Alignment
- Cannot override IFRS classification
- Cannot override platform logic
- Operates under deterministic rules
- Fully traceable for assurance engagements
15. Strategic Impact
This micro-engine:
- Converts ESG finance from manual entry to audited ingestion
- Bridges CFO systems with sustainability logic
- Creates defensible transition finance disclosures
- Enables quantifiable climate ROI modeling
It is a foundational engine in the ZAYAZ Transition Finance Stack.
APPENDIX A - Connector Interface Contract (SDK specification)
A.1. Connector Philosophy
The connector must:
- Extract ERP-native data
- Not apply business logic
- Not interpret ESG meaning
- Not aggregate
- Not classify
- Be idempotent
- Be stateless
It is a transport layer, nothing more.
A.2. Connector Identity Contract
Every connector must expose:
{
"connector_id": "string",
"connector_name": "string",
"erp_system": "SAP_S4 | SAP_ECC | ORACLE_FUSION | DYNAMICS_365 | NETSUITE | XERO | QUICKBOOKS | CUSTOM",
"version": "string",
"supported_entities": ["legal_entity_code"],
"supports_asset_register": true,
"supports_trial_balance": true,
"supports_projects": true
}
A.3. Mandatory Interface Methods
All connectors must implement:
A.3.1. discover()
Purpose: Identify ERP structure.
{
"entities": ["Entity_A", "Entity_B"],
"ledgers": ["Primary Ledger"],
"currencies": ["EUR", "USD"],
"dimensions": {
"project": "field_name",
"cost_center": "field_name",
"gl_account": "field_name"
}
}
A.3.2. extract_gl_lines()
Core function.
Input
{
"entity_id": "Entity_A",
"date_from": "2026-01-01",
"date_to": "2026-12-31",
"filters": {
"project_code_like": "%DECARB%",
"gl_accounts": []
}
}
Output (ERP-native, NOT normalized)
[
{
"posting_date": "2026-03-01",
"document_number": "1900001",
"line_number": 1,
"gl_account": "1500",
"project_code": "ZYZ-DECARB-2026-001",
"cost_center": "ESG-TRANSITION",
"amount": 1250000,
"currency": "EUR",
"debit_credit_indicator": "D",
"asset_id": "A-102"
}
]
Rules:
- Must include source reference keys.
- Must preserve original sign logic.
- Must not aggregate.
A.3.3. extract_trial_balance()
Used for reconciliation.
Input
{
"entity_id": "Entity_A",
"fiscal_year": 2026
}
Output
[
{
"gl_account": "1500",
"closing_balance": 88000000,
"currency": "EUR"
}
]
A.3.4. extract_projects()
[
{
"project_code": "ZYZ-DECARB-2026-001",
"project_name": "Electrification Line A",
"start_date": "2026-01-01",
"status": "ACTIVE"
}
]
A.3.5. extract_assets()
[
{
"asset_id": "A-102",
"asset_class": "Manufacturing Equipment",
"capitalization_date": "2026-02-01",
"useful_life_years": 15,
"net_book_value": 44000000
}
]
A.3.6. Sample Plan Request Payload
{
"intent": {
"intent_id": "INTENT-ACCT-CRAWLER-DEFAULT-2026-02-20",
"applies_to_meid": "MEID_ACCT_CRAWLER",
"ruleset_family": "acct_crawler",
"target": {
"environment": "dev",
"tenant_id": null,
"entity_id": null
},
"output": {
"folder": "/workspaces/zayaz-docs/code/associated-files/computation-hub-calcs/micro-engines/tagged-accounting-crawler",
"naming": {
"version": "1_0_0",
"file_style": "kebab"
}
},
"governance": {
"created_by": "governance",
"owners": ["cto@viroway.com"],
"status": "draft",
"changelog": "Initial default acct crawler rulesets"
},
"bundle": {
"create_bundle": true,
"bundle_name": "acct_crawler_default",
"strict_mode": false,
"allow_tenant_overrides": true,
"execution_order": [
"acct_crawler_tag_detection",
"acct_crawler_classification",
"acct_crawler_reconciliation_policy"
]
},
"rulesets": [
{
"artifact_name": "acct_crawler_tag_detection",
"ruleset_kind": "tag_detection",
"zrr": {
"crid": "ruleset.tagging.finance.global.transition.decarb.standard.WARNING.1_0_0",
"rule_type": "tagging",
"domain": "finance",
"severity": "WARNING",
"enforcement_mode": "soft",
"fallback_logic": "none",
"linked_frameworks": ["GLOBAL"],
"linked_signal_ids": [],
"ontology_binding": [],
"audit_required": true,
"execution_engine": "MEID_ACCT_CRAWLER"
},
"compatibility": {
"min_schema_ref": "ZAR:schema:canonical_gl_entry@v1",
"max_schema_ref": "ZAR:schema:canonical_gl_entry@v1"
},
"rules": { "precedence": ["gl_attribute", "project_code", "cost_center"] }
}
]
},
"context": {
"env": "dev",
"actor": "docusaurus-ui",
"now": "2026-02-24T10:00:00.000Z"
}
}
A.4. Error Handling Contract
Connector must return structured errors:
{
"error_type": "AUTHENTICATION | PERMISSION | RATE_LIMIT | DATA_SCHEMA | NETWORK",
"error_message": "string",
"retryable": true
}
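A caller might consume this structured error roughly as follows. This is a sketch under stated assumptions: the ConnectorError class, the next_retry_delay helper, and the exponential backoff parameters are illustrative and not part of the contract; only the three fields and the error_type values come from the specification above.

```python
from dataclasses import dataclass

# Error types listed in the contract above.
ERROR_TYPES = {"AUTHENTICATION", "PERMISSION", "RATE_LIMIT", "DATA_SCHEMA", "NETWORK"}


@dataclass(frozen=True)
class ConnectorError:
    error_type: str
    error_message: str
    retryable: bool

    def __post_init__(self):
        if self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error_type: {self.error_type!r}")


def next_retry_delay(error, attempt, base_seconds=2.0, max_attempts=5):
    """Return seconds to wait before the next retry, or None to give up.

    'attempt' is zero-based. The backoff schedule is an assumption,
    not a requirement of the connector contract.
    """
    if not error.retryable or attempt >= max_attempts:
        return None
    return base_seconds * (2 ** attempt)
```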
A.5. Security Contract
Connector must:
- Use read-only ERP roles
- Support OAuth2 or service principal
- Never persist credentials locally
- Encrypt transport (TLS 1.2+)
- Log access attempts
A.6. Idempotency Requirements
Each extract must include:
{
"job_id": "uuid",
"extract_window_hash": "sha256",
"record_count": 12450,
"source_checksum": "hash_if_available"
}
Re-running the same extraction window must produce identical results unless the underlying ERP data has changed.
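One way to obtain a stable extract_window_hash is to hash a canonical serialization of the extraction request. The specification does not say which fields feed the hash, so the choice below (entity, date window, filters) is an assumption, as is the function name.

```python
import hashlib
import json


def extract_window_hash(entity_id, date_from, date_to, filters=None):
    """Return a SHA-256 hex digest that is stable for identical windows.

    Keys are sorted and whitespace is removed so that logically equal
    requests always serialize to the same bytes.
    """
    payload = {
        "entity_id": entity_id,
        "date_from": date_from,
        "date_to": date_to,
        "filters": filters or {},
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs over the same window then share a hash, which lets the orchestrator detect re-runs and compare record_count and source_checksum between them.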
A.7. Performance Requirements
Minimum performance target:
- 1 million GL lines per hour per tenant
- Incremental extraction supported
- Pagination mandatory
A.8. Connector Certification Checklist
Before production approval:
- ✔ Contract tests passed
- ✔ Schema validation passed
- ✔ Reconciliation dry-run successful
- ✔ Security audit complete
- ✔ Load test complete
A.9. Versioning Strategy
- Connector versioned independently.
- Canonical schema version controlled by MEID-ACCT-CRAWLER.
- Mapping rules versioned via ZAR.
- Breaking changes require:
- New connector version
- Migration notice
- Regression suite execution
A.10. Strategic Note
This SDK specification ensures:
- ERP-agnostic extensibility
- Enterprise-grade audit traceability
- Zero ESG logic inside connectors
- Clear separation of concerns
- Massive future integration surface
APPENDIX B - Reconciliation Algorithm
Below is a deterministic, audit-grade reconciliation algorithm for MEID-ACCT-CRAWLER that reconciles (1) extracted GL lines vs (2) trial balance and (3) tagged transition subsets (CapEx/OpEx) vs configured control totals. It’s designed to be ERP-agnostic, idempotent, and assurance-ready.
B.0. What we reconcile (three layers)
Layer A — Extraction completeness (GL ↔ Trial Balance)
Ensures the crawler extracted a complete and accurate slice of the ledger for the period.
Layer B — Tag integrity (Tagged subset ↔ Parent population)
Ensures tagged transition entries are a consistent subset of extracted population (no sign flips, duplicates, or missing dimensions).
Layer C — Disclosure readiness (Tagged aggregates ↔ finance controls)
Ensures reported transition CapEx/OpEx totals reconcile to finance-defined control groups (e.g., CapEx accounts, project ledger).
B.1. Inputs
Canonical GL lines (normalized)
Each record includes:
- tenant_id, entity_id, fiscal_year, posting_date
- gl_account, amount_signed, currency
- project_code, cost_center, gl_attribute
- source_reference (ERP doc+line unique key)
- job_id
Canonical Trial Balance (TB)
- by tenant_id, entity_id, fiscal_year, gl_account
- closing_balance (and ideally period_activity or debit_total/credit_total if available)
Engine configuration (per tenant/entity)
- fiscal calendar mapping
- currency conversion rules (reporting currency)
- tag rules (DPTP patterns)
- optional control groups:
  - capex_gl_accounts[]
  - opex_gl_accounts[]
  - capex_projects_prefixes[]
  - etc.
- tolerance policy
B.2. Deterministic normalization prerequisites
B.2.1. Sign normalization (critical)
All GL line amounts must be normalized to a single signed convention:
- amount_signed = debit - credit (or equivalent), consistently across connectors.
- Preserve original fields too (debit_credit_indicator, raw amount) for audit.
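A minimal sketch of this sign convention follows. It assumes the connector delivers an unsigned magnitude plus a "D"/"C" indicator, as in the extract_gl_lines() sample; ERPs that deliver already-signed amounts would need a different mapping. The function name normalize_sign is hypothetical.

```python
from decimal import Decimal


def normalize_sign(line):
    """Return a copy of the line with amount_signed = debit - credit.

    The raw 'amount' and 'debit_credit_indicator' are kept untouched
    so the original posting remains available for audit.
    """
    indicator = line["debit_credit_indicator"]
    magnitude = abs(Decimal(str(line["amount"])))
    if indicator == "D":
        signed = magnitude
    elif indicator == "C":
        signed = -magnitude
    else:
        # Do not guess: an unknown indicator is a data-schema problem.
        raise ValueError(f"unknown debit/credit indicator: {indicator!r}")
    return {**line, "amount_signed": signed}
```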
B.2.2. Currency normalization
For reconciliation, do both:
- native-currency reconciliation where possible (preferred)
- reporting-currency reconciliation (for disclosure readiness)
Store:
- amount_native, currency_native
- amount_reporting, currency_reporting, fx_rate_id
B.2.3. Deduplication key
Define immutable key:
dedupe_key = hash(source_system + entity_id + source_reference)
Duplicate lines across pagination/backfills are removed deterministically:
- keep earliest ingested_at, record the duplicate count.
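The deduplication rule could look like the sketch below. Two details are assumptions: a "|" delimiter is inserted between the key parts to avoid ambiguous concatenations, and ingested_at is assumed to be an ISO 8601 string in a single timezone so that string ordering equals time ordering.

```python
import hashlib


def dedupe_key(source_system, entity_id, source_reference):
    """Hash the three identifying fields into an immutable key."""
    raw = "|".join([source_system, entity_id, source_reference])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def dedupe(lines):
    """Keep the earliest-ingested line per key; return (unique, duplicate_count)."""
    kept = {}
    duplicates = 0
    for line in sorted(lines, key=lambda l: l["ingested_at"]):
        key = dedupe_key(line["source_system"], line["entity_id"],
                         line["source_reference"])
        if key in kept:
            duplicates += 1  # recorded for the reconciliation report
        else:
            kept[key] = line
    return list(kept.values()), duplicates
```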
B.3. Tolerance policy (must be explicit)
Define tolerances by layer:
- TB match: abs(diff) <= max(absolute_tol, relative_tol * abs(tb_value))
- Tag subset: should be exact except for FX rounding; same tolerance rule but smaller.
- Disclosure controls: can allow slightly larger rounding if built from reporting-currency conversion.
Recommended defaults:
- absolute_tol: 1.00 (currency units) per account per period
- relative_tol: 0.0001 (1 bp)
- fx_rounding_tol: 5.00 (units) per entity per period
All tolerance parameters must be stored in the reconciliation report.
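The TB match rule translates directly into a small predicate. The sketch below uses the recommended defaults from this section; the use of Decimal to avoid binary floating-point drift is a choice made here, not something the specification mandates.

```python
from decimal import Decimal


def within_tolerance(diff, reference, absolute_tol="1.00", relative_tol="0.0001"):
    """Return True when abs(diff) <= max(absolute_tol, relative_tol * abs(reference))."""
    diff = Decimal(str(diff))
    reference = Decimal(str(reference))
    limit = max(Decimal(absolute_tol), Decimal(relative_tol) * abs(reference))
    return abs(diff) <= limit
```

For a trial balance value of 1,000,000 the relative term dominates (limit 100), while for small balances the absolute floor of 1.00 applies.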
B.4. Algorithm steps
Step 1 — Build reconciliation scope
Inputs:
- entity_id
- period = [date_from, date_to]
- fiscal_year
- scope_filters (if extraction is filtered, reconciliation must reflect it)
Compute:
- gl_population = all canonical_gl_entry within period & entity
- tb_population = all canonical_trial_balance for entity & fiscal_year (or period activity if available)
If TB only provides closing balances, we need either:
- period activity TB, or
- prior-period closing balances to derive activity.
If neither is available, TB reconciliation can only be partial (flag it).
Step 2 — Extraction integrity checks (pre-reconcile)
2.1 Completeness checks
- Missing mandatory fields rate
- Currency coverage
- Posting dates inside window
- Unmapped GL accounts ratio
- Duplicate rate (dedupe_key collisions)
Fail-fast if:
- mandatory field missing > threshold (e.g., 0.1%)
- duplicates > threshold (e.g., 0.5%) unless expected for incremental loads
Step 3 — Primary reconciliation: GL activity ↔ TB activity (by account)
Preferred method: reconcile period activity.
- Compute gl_sum_by_account = Σ amount_signed_native for each gl_account
- Obtain tb_activity_by_account (from TB if provided)
Compare per gl_account:
- diff = gl_sum_by_account - tb_activity_by_account
- mark PASS/FAIL using tolerance policy
If TB only provides closing balances:
- derive tb_activity = closing_balance_current - closing_balance_prior
- else: mark reconciliation as PARTIAL and set integrity_flag.
Output per account:
- account_status: PASS|FAIL|MISSING_TB|MISSING_GL
- diff_native, diff_reporting
- tolerance_used
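Step 3 can be sketched as a per-account comparison in native currency. This is an illustration only: the function name reconcile_accounts and the input shapes (a list of line dicts and a mapping of account to TB activity) are assumptions, and the reporting-currency diff is omitted for brevity.

```python
from collections import defaultdict
from decimal import Decimal


def reconcile_accounts(gl_lines, tb_activity,
                       absolute_tol=Decimal("1.00"), relative_tol=Decimal("0.0001")):
    """Compare summed GL activity with TB activity for every account seen."""
    gl_sum = defaultdict(Decimal)
    for line in gl_lines:
        gl_sum[line["gl_account"]] += Decimal(str(line["amount_signed"]))

    results = {}
    for acct in sorted(set(gl_sum) | set(tb_activity)):
        if acct not in tb_activity:
            results[acct] = {"account_status": "MISSING_TB"}
            continue
        if acct not in gl_sum:
            results[acct] = {"account_status": "MISSING_GL"}
            continue
        tb_value = Decimal(str(tb_activity[acct]))
        diff = gl_sum[acct] - tb_value
        tol = max(absolute_tol, relative_tol * abs(tb_value))
        results[acct] = {
            "account_status": "PASS" if abs(diff) <= tol else "FAIL",
            "diff_native": diff,
            "tolerance_used": tol,  # stored so the report is reproducible
        }
    return results
```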
Step 4 — Secondary reconciliation: totals (entity-level)
Compute:
- gl_total = Σ all accounts
- tb_total = Σ tb_activity
- validate that entity-level totals match within tolerance
This catches “offsetting errors” that pass at account-level due to missing accounts.
Step 5 — Tagged subset construction (DPTP)
Define is_tagged_transition:
True if any:
- project_code matches .*-DECARB-.*
- OR gl_attribute in {DECARB_CAPEX, DECARB_OPEX}
- OR cost_center == ESG-TRANSITION
(tenant ruleset may add more patterns)
Partition:
- tagged_population = subset(gl_population where is_tagged_transition)
- untagged_population = remainder
Also classify each tagged line:
- transition_type = CAPEX|OPEX|UNKNOWN using:
  - explicit gl_attribute if present
  - else gl_account membership in configured groups
  - else heuristics → UNKNOWN (do not silently guess)
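The tagging predicate and the classification fallback chain might be sketched as follows. The constants mirror the default tag values named in this document; in practice they would come from the tenant's ruleset rather than being hard-coded, and the function signatures here are assumptions.

```python
import re

# Defaults taken from the tag rules in this document; tenant rulesets may extend them.
PROJECT_PATTERN = re.compile(r".*-DECARB-.*")
TAG_ATTRIBUTES = {"DECARB_CAPEX", "DECARB_OPEX"}
TAG_COST_CENTERS = {"ESG-TRANSITION"}


def is_tagged_transition(line):
    """True if the project code, GL attribute, or cost center carries a tag."""
    project = line.get("project_code") or ""
    return (
        PROJECT_PATTERN.fullmatch(project) is not None
        or line.get("gl_attribute") in TAG_ATTRIBUTES
        or line.get("cost_center") in TAG_COST_CENTERS
    )


def transition_type(line, capex_accounts, opex_accounts):
    """Classify a tagged line; fall back to UNKNOWN instead of guessing."""
    attribute = line.get("gl_attribute")
    if attribute == "DECARB_CAPEX":
        return "CAPEX"
    if attribute == "DECARB_OPEX":
        return "OPEX"
    account = line.get("gl_account")
    if account in capex_accounts:
        return "CAPEX"
    if account in opex_accounts:
        return "OPEX"
    return "UNKNOWN"
```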
Step 6 — Tagged subset integrity checks
6.1 Subset consistency
- every tagged line must exist in GL population (by dedupe_key): trivial if derived
- no duplicates within tagged subset
- mandatory dimension presence rate:
- tagged lines must have project_code OR gl_attribute (configurable)
- any UNKNOWN transition_type above threshold triggers classification_flag
6.2 Signed amount sanity checks
- detect sign inversions:
  - if account is in capex group and tagged capex is predominantly negative unexpectedly → flag (do not auto-fix; report)
Step 7 — Tagged-to-control reconciliation (disclosure readiness)
This is where we reconcile transition totals to finance-approved control views.
Control 1: CapEx accounts control total
- If tenant supplies capex_gl_accounts[]:
  - capex_total_from_gl = Σ GL where gl_account in capex_gl_accounts
  - capex_tagged_total = Σ tagged where transition_type=CAPEX
  - compute capex_tag_coverage = capex_tagged_total / capex_total_from_gl
- This is not “must equal” (not all CapEx is decarb), but:
  - ensure capex_tagged_total <= capex_total_from_gl + tol
  - ensure capex_tagged_total is not implausibly high (policy threshold e.g., >90% triggers review)
Control 2: Project ledger / WBS control
- If ERP provides project totals, reconcile:
  - per project_code: sum GL lines ↔ project system totals
  - detect projects with GL activity but missing in project dimension extract
Control 3: OpEx accounts control
- Similar to CapEx:
  - ensure opex_tagged_total ≤ opex_control_total (if control total defined)
All controls are policy-based and produce:
- PASS|WARN|FAIL, not just pass/fail
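Control 1 could be evaluated as in the sketch below, which returns one of the three outcomes. The function name, the return shape, and the default tolerance are assumptions; the 90% review threshold follows the example given above.

```python
def check_capex_control(capex_tagged_total, capex_total_from_gl,
                        tol=1.0, review_threshold=0.90):
    """Return PASS, WARN, or FAIL for the tagged-CapEx-vs-parent control."""
    # Coverage is undefined when the parent total is zero.
    coverage = (capex_tagged_total / capex_total_from_gl
                if capex_total_from_gl else None)
    if capex_tagged_total > capex_total_from_gl + tol:
        # A subset can never legitimately exceed its parent population.
        return {"status": "FAIL", "flag": "CONTROL_EXCEEDS_PARENT_TOTAL",
                "capex_tag_coverage": coverage}
    if coverage is not None and coverage > review_threshold:
        return {"status": "WARN", "flag": None, "capex_tag_coverage": coverage}
    return {"status": "PASS", "flag": None, "capex_tag_coverage": coverage}
```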
Step 8 — Output artifacts
Produce a reconciliation_report with:
- scope: tenant/entity/period/job_id
- method: TB activity or derived
- tolerance_policy
- dedupe_stats
- account_level_results[]
- entity_level_totals
- tagged_subset_stats:
  - counts, totals by CAPEX/OPEX/UNKNOWN
  - tag source breakdown (project_code vs gl_attribute vs cost_center)
- control_checks[] (capex coverage, op-ex coverage, outliers)
- integrity_flags[] (hard failures)
- review_flags[] (warnings)
Integrity flag taxonomy (suggested)
- TB_MISSING
- TB_PARTIAL_ACTIVITY
- ACCOUNT_FAIL
- ENTITY_TOTAL_FAIL
- DUPLICATE_RATE_HIGH
- FX_CONVERSION_GAP
- TAG_CLASSIFICATION_UNKNOWN_HIGH
- SIGN_ANOMALY
- CONTROL_EXCEEDS_PARENT_TOTAL
B.5. Deterministic decision rules
Publish rules (recommended)
- If any integrity_flags in {ENTITY_TOTAL_FAIL, ACCOUNT_FAIL above threshold}:
  - publish outputs but set:
    - financial_integrity_flag = true
    - reconciliation_status = failed
- If only warnings:
  - reconciliation_status = passed_with_warnings
This supports operational continuity while still being honest.
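These publish rules can be expressed as a pure function over the flag lists. The sketch makes two assumptions that the text leaves open: any flag that does not trigger a hard failure is treated as a warning, and a clean run maps to "passed". The names decide_status and account_fail_threshold are hypothetical.

```python
def decide_status(integrity_flags, review_flags,
                  account_fail_count=0, account_fail_threshold=0):
    """Map reconciliation flags to a publish decision (outputs are always published)."""
    hard_failure = (
        "ENTITY_TOTAL_FAIL" in integrity_flags
        or ("ACCOUNT_FAIL" in integrity_flags
            and account_fail_count > account_fail_threshold)
    )
    if hard_failure:
        return {"publish": True, "financial_integrity_flag": True,
                "reconciliation_status": "failed"}
    if integrity_flags or review_flags:
        # Assumption: remaining flags are surfaced as warnings only.
        return {"publish": True, "financial_integrity_flag": False,
                "reconciliation_status": "passed_with_warnings"}
    return {"publish": True, "financial_integrity_flag": False,
            "reconciliation_status": "passed"}
```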
B.6. Pseudocode (high-level)
RECONCILE(job_id, entity_id, date_from, date_to):
    scope = {entity_id, date_from, date_to}

    # Load, deduplicate, and validate the extracted GL population
    gl = load_gl(job_id, entity_id, date_from, date_to)
    gl = dedupe(gl)
    assert mandatory_fields_ok(gl)

    # Load the trial balance and decide between activity and derived mode
    tb = load_tb(job_id, entity_id, fiscal_year(date_from))
    tb_mode = determine_tb_mode(tb)
    gl_by_acct = sum_by(gl, gl_account, amount_native_signed)
    tb_by_acct = get_tb_activity(tb, tb_mode, prior_tb_optional)

    # Account-level reconciliation; accounts present on one side only are reported, not compared
    acct_results = []
    for acct in union(keys(gl_by_acct), keys(tb_by_acct)):
        if acct not in tb_by_acct:
            acct_results.append({acct, MISSING_TB, null})
        else if acct not in gl_by_acct:
            acct_results.append({acct, MISSING_GL, null})
        else:
            diff = gl_by_acct[acct] - tb_by_acct[acct]
            status = within_tolerance(diff, tb_by_acct[acct]) ? PASS : FAIL
            acct_results.append({acct, status, diff})

    # Entity-level reconciliation
    entity_diff = sum(gl_by_acct) - sum(tb_by_acct)
    entity_status = within_tolerance(entity_diff, sum(tb_by_acct)) ? PASS : FAIL

    # Tagged subset construction and integrity checks
    tagged = filter(gl, is_tagged_transition)
    tagged = classify_capex_opex(tagged, ruleset)
    tag_stats = compute_tag_stats(tagged)
    tag_integrity = validate_tag_integrity(tagged)

    # Policy-based control checks
    controls = []
    controls.append(check_parent_totals(tagged, gl, capex_accounts, opex_accounts))
    controls.append(check_project_totals(tagged, project_extract_optional))

    flags = derive_flags(acct_results, entity_status, tag_integrity, controls)
    return reconciliation_report(job_id, entity_id, scope, acct_results, entity_status, tag_stats, controls, flags)
B.7. What this gives ZAYAZ
- Account-level and entity-level reconciliation
- Immutable trail from TB ↔ GL ↔ tagged subset
- Clear rules for “warnings vs failures”
- Deterministic outputs for ZAR versioning
- Compatibility with auditors (evidence pack is reproducible)
APPENDIX C - Event model (JobStarted, Canonicalized, Reconciled, Published)
C.1 Event principles
- Immutable: append-only, never updated
- Idempotent: events carry event_id + deterministic correlation_id
- Traceable: every event includes tenant_id, job_id, meid, ruleset_ref, schema_ref
- Consumable: SIS / Reports Hub / CH engines can subscribe without ERP knowledge
C.2 Core event envelope (all events)
{
"event_id": "uuid",
"event_type": "string",
"event_time": "2026-02-16T12:34:56.000Z",
"tenant_id": "string",
"entity_id": "string",
"meid": "MEID-ACCT-CRAWLER",
"job_id": "uuid",
"correlation_id": "string",
"actor": "system|user",
"severity": "info|warn|error",
"ruleset_ref": "ZAR:ruleset:acct_crawler_dptp@<hash_or_rev>",
"schema_ref": "ZAR:schema:canonical_gl_entry@<hash_or_rev>",
"payload": {}
}
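An event in this envelope might be assembled as shown below. The specification requires a deterministic correlation_id but does not define how it is derived, so hashing tenant, job, and event type is purely an assumption, as is the build_event helper itself.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def correlation_id(tenant_id, job_id, event_type):
    """Derive a deterministic id so that re-emitted events can be matched."""
    raw = f"{tenant_id}|{job_id}|{event_type}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def build_event(event_type, tenant_id, entity_id, job_id, payload,
                ruleset_ref, schema_ref, severity="info", actor="system"):
    """Return a new, append-only event dict in the core envelope shape."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "event_id": str(uuid.uuid4()),  # unique per emission
        "event_type": event_type,
        "event_time": now.replace("+00:00", "Z"),
        "tenant_id": tenant_id,
        "entity_id": entity_id,
        "meid": "MEID-ACCT-CRAWLER",
        "job_id": job_id,
        "correlation_id": correlation_id(tenant_id, job_id, event_type),
        "actor": actor,
        "severity": severity,
        "ruleset_ref": ruleset_ref,
        "schema_ref": schema_ref,
        "payload": payload,
    }
```

Consumers can deduplicate on correlation_id while still treating each event_id as a distinct, immutable record.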
C.3 Job state events (minimum set)
AcctCrawler.JobStarted
Emitted once per job execution window.
{
"event_type": "AcctCrawler.JobStarted",
"payload": {
"window": { "date_from": "2026-01-01", "date_to": "2026-01-31" },
"source_system": "SAP_S4",
"mode": "incremental|backfill",
"requested_by": "scheduler|user"
}
}
AcctCrawler.Extracted
After connector pulls raw data (counts + hashes).
{
"event_type": "AcctCrawler.Extracted",
"payload": {
"extract_artifacts": [
{ "type": "gl_lines", "record_count": 120034, "sha256": "..." },
{ "type": "trial_balance", "record_count": 520, "sha256": "..." },
{ "type": "projects", "record_count": 3400, "sha256": "..." }
],
"connector_id": "conn.sap_s4.odata",
"connector_version": "1.3.2"
}
}
AcctCrawler.Canonicalized
After normalization into canonical schema.
{
"event_type": "AcctCrawler.Canonicalized",
"payload": {
"canonical_artifacts": [
{ "type": "canonical_gl_entry", "record_count": 119990, "sha256": "..." },
{ "type": "canonical_trial_balance", "record_count": 520, "sha256": "..." }
],
"dedupe": { "duplicates_removed": 44, "duplicate_rate": 0.00037 },
"fx": { "reporting_currency": "EUR", "fx_rate_set_ref": "ZAR:fxset:ECB@..." }
}
}
AcctCrawler.Classified
After tagging + CAPEX/OPEX classification.
{
"event_type": "AcctCrawler.Classified",
"payload": {
"tag_stats": {
"tagged_total": 8450,
"capex_lines": 2100,
"opex_lines": 6100,
"unknown_lines": 250,
"tag_sources": { "project_code": 7600, "gl_attribute": 700, "cost_center": 150 }
},
"transition_aggregates": [
{ "type": "capex_total", "amount_reporting": 48250000 },
{ "type": "opex_total", "amount_reporting": 9100000 }
]
}
}
AcctCrawler.Reconciled
After reconciliation report is computed.
{
"event_type": "AcctCrawler.Reconciled",
"payload": {
"reconciliation_status": "passed|passed_with_warnings|failed|partial",
"integrity_flags": ["TB_PARTIAL_ACTIVITY"],
"account_fail_count": 2,
"entity_total_diff_reporting": 3.21,
"report_ref": "ZAR:artifact:reconciliation_report@..."
}
}
AcctCrawler.Published
When outputs are made available to downstream systems.
{
"event_type": "AcctCrawler.Published",
"payload": {
"published_artifacts": [
{ "type": "transition_line_items", "ref": "ZAR:artifact:transition_line_items@..." },
{ "type": "transition_project_aggregates", "ref": "ZAR:artifact:transition_project_aggregates@..." },
{ "type": "reconciliation_report", "ref": "ZAR:artifact:reconciliation_report@..." }
],
"availability": { "api": true, "report_hub": true, "sis_indexed": true }
}
}
AcctCrawler.JobFailed (terminal)
{
"event_type": "AcctCrawler.JobFailed",
"payload": {
"failed_stage": "extract|canonicalize|classify|reconcile|publish",
"error_type": "AUTH|PERMISSION|RATE_LIMIT|DATA_SCHEMA|NETWORK",
"retryable": true,
"error_ref": "ZAR:artifact:error_log@..."
}
}
That’s the full job lifecycle. Downstream systems can subscribe to Published only, or use intermediate events for observability.
APPENDIX D - Ruleset version governance inside ZAR
This appendix defines the default global rulesets for MEID_ACCT_CRAWLER.
MEID is stable. Behavioral changes occur only via ZAR-managed rulesets.
All rulesets are:
- content-addressed (sha256 hash in ZAR)
- explicitly referenced in JobStarted + all downstream artifacts
- replay-safe
- compatible-version validated before activation
D.1. Ruleset version governance inside ZAR (no MEID versions)
D.1.1. Principle
D.1. ZAR Ruleset Pack (v0.1 Default)
D.1.1. MEID-ACCT-CRAWLER (Tagged Accounting Crawler)
- Tag Detection Ruleset
- project code regex patterns (e.g., .*-DECARB-.*)
- accepted cost centers, departments, custom segments
- accepted GL attributes/tags
- precedence rules (attribute > project > cost center)
- “tag confidence” scoring (optional)
- Classification Ruleset (CapEx/OpEx + buckets)
- capex/opex mapping logic (explicit tag, then account group, then cost center)
- account group mappings (capex_accounts, opex_accounts, plus “transition-eligible” accounts)
- unknown thresholds and behavior:
- keep as UNKNOWN
- block publish?
- publish with warning?
- optional: cost category mapping (equipment/consulting/etc)
- Reconciliation Policy Ruleset
- tolerance policy (absolute/relative)
- TB reconciliation mode policy:
- activity from TB if available
- derived from prior closing
- partial allowed?
- pass/warn/fail thresholds (per-account fail count, entity-level diff)
- duplicate-rate thresholds, missing-field thresholds
- currency policy: native-only vs also reporting-currency checks
- Extraction Scope Policy Ruleset (recommended as separate). This avoids hardcoding scope behavior into the orchestrator.
- default extraction window policy (monthly/weekly)
- incremental keys (lastModifiedDate, postingDate)
- backfill limits
- “filters allowed” policy (e.g., allow filtering by project_code_like)
- FX & Reporting Currency Policy Ruleset (optional, but very useful)
- reporting currency per tenant/entity
- FX source preference (ECB, ERP, custom)
- rounding policy
- acceptable FX age / missing FX behavior
D.1.2. MEID_CALC_TRANSITION_ROI (Transition ROI Calculator)
This engine needs its own ZAR rulesets because behavior varies a lot by client finance policy.
- ROI Calculation Policy Ruleset
- PV timing assumption when timeline missing:
- treat totals at t=0, or spread across year
- PV method allowed: monthly vs annual
- discounting behavior: per project override allowed?
- rounding rules
- which outputs are computed (simple ratios always, PV ratios when possible)
- Rollup Policy Ruleset
- allowed group_by keys
- weighting policy:
- by tCO2e (recommended)
- by cost
- unweighted
- rollup inclusion rules:
- exclude projects with R=0 from ratio rollups?
- include but flag?
- portfolio distribution metrics set (median/p25/p75, outlier thresholds)
- Validation & Flagging Policy Ruleset
- reject vs warn for:
- costs unreconciled
- confidence low
- currency mismatch
- missing annual reductions
- standardized flags mapping (so UI & reports are consistent)
- Abatement Cost Policy Ruleset (when we compute “net cost / tCO2e” with benefits)
- what counts as “benefit” (energy savings, avoided carbon tax, subsidies)
- whether to allow carbon price monetization in v1
- required horizons for abatement cost vs simple €/tCO2e
- Signal Policy Ruleset (only if you keep supported_modes: signal)
- what constitutes outliers
- ranking criteria
- minimum sample sizes for rollup scoring
D.1.3. Shared / Platform rulesets (cross-engine)
These are not engine-specific but should be ZAR-managed for consistency.
- Metric Type Registry / Alias Map (EngineAliasMap)
- maps engine outputs to platform-wide signal IDs, labels, units
- Unit + Currency Normalization Policy
- rounding
- precision
- currency formatting
- unit canonicalization (tCO2e vs kgCO2e)
- Confidence Model Policy
- how “low/medium/high” maps to numeric weights / warnings
Example identifiers:
- ZAR:ruleset:acct_crawler_tag_detection@sha256:<hash>
- ZAR:ruleset:acct_crawler_classification@sha256:<hash>
- ZAR:ruleset:acct_crawler_reconciliation_policy@sha256:<hash>
D.1.4. Activation model (how a job picks a ruleset)
At job start, orchestrator resolves:
- global_default_ruleset_ref (engine default)
- tenant_ruleset_override_ref (if exists and approved)
- entity_ruleset_override_ref (optional, highest priority)
Then writes the resolved ruleset_ref into:
- JobStarted event
- all subsequent events
- all output artifacts metadata
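The resolution order reduces to a most-specific-wins lookup. In the sketch below, the caller is assumed to pass only overrides that already exist and are approved; the function name is hypothetical.

```python
def resolve_ruleset_ref(global_default_ruleset_ref,
                        tenant_ruleset_override_ref=None,
                        entity_ruleset_override_ref=None):
    """Return the most specific ruleset reference available.

    Priority: entity override, then tenant override, then global default.
    """
    return (
        entity_ruleset_override_ref
        or tenant_ruleset_override_ref
        or global_default_ruleset_ref
    )
```

The returned value is what would be written into the JobStarted event and carried through all later events and artifact metadata.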
D.1.5. Change control
Ruleset changes must be:
- proposed (draft)
- validated (unit tests + replay tests on sample extracts)
- approved (owner + governance role)
- activated (set as tenant default)
Replay safety requirement
- When rules change, we can replay an old job using the old ruleset_ref and reproduce the same outputs.
D.1.6. Compatibility contract
Every ruleset must declare:
- applies_to_meid: MEID_ACCT_CRAWLER
- min_schema_ref and max_schema_ref supported
So ZAR can prevent activating incompatible rulesets.
D.1.7. Canonical Rule Identifier (CRID) Architecture
Every executable rule in ZAYAZ must possess a Canonical Rule Identifier (CRID).
The CRID is the governance identity of a rule. It is immutable. It is independent from content hash. It is versioned semantically.
CRID Format
ruleset.<rule_type>......<X_Y_Z>
Example
ruleset.validation.finance.global.reconciliation.standard.critical.1_0_0
ruleset.compute.finance.global.transition-roi.standard.blocking.1_0_0
ruleset.classification.finance.global.capex-opex.standard.warning.1_0_0
CRID Principles
- A CRID uniquely identifies the logical intent of a rule.
- A CRID version change reflects semantic behavior change.
- A CRID does NOT contain execution hash.
- A CRID may map to multiple historical hashes (lineage).
- CRID version increments follow CMCB governance (PATCH / MINOR / MAJOR).
CRID vs ZAR Hash
| Layer | Purpose |
|---|---|
| CRID | Governance identity |
| ZAR hash | Immutable execution identity |
Execution logs must record both.
D.2. Ruleset Governance Model
D.2.1 Ruleset Storage in ZAR
Each ruleset is stored in ZAR as:
ZAR:ruleset:<ruleset_name>@sha256:<hash>
Every ruleset must declare:
- applies_to_meid
- ruleset_family
- ruleset_kind
- min_schema_ref
- max_schema_ref
- status (draft | approved | deprecated)
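A content-addressed identifier of this form can be derived by hashing the stored artifact bytes. The sketch assumes the hash is taken over the raw file content exactly as stored; whether ZAR canonicalizes the YAML before hashing is not stated in this document.

```python
import hashlib


def zar_ruleset_ref(ruleset_name, artifact_bytes):
    """Build 'ZAR:ruleset:<ruleset_name>@sha256:<hash>' from raw artifact bytes."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return f"ZAR:ruleset:{ruleset_name}@sha256:{digest}"
```

Because the reference changes whenever a single byte changes, an old job can be replayed against exactly the ruleset content it originally used.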
D.2.2 Rule Artifact Storage Model (ZAR Binding)
All executable rules in ZAYAZ are stored as ZAR ruleset artifacts.
A rule is not considered valid unless:
- It exists as a content-addressed YAML artifact.
- It is registered in ZRR with a valid CRID.
- It declares execution bindings (MEID + domain).
- It declares compatibility boundaries (schema refs).
- It is approved under CMCB governance.
The YAML file is the canonical executable representation of the rule.
ZRR is the governance registry. ZAR is the immutable artifact store. Execution engines must load only ZAR-resolved artifacts.
No rule may execute directly from source code or ad hoc configuration.
D.3. Default Rulesets — Accounting Crawler
D.3.0. Ruleset YAML Header Standard
Every ruleset YAML file must begin with a standardized header block.
This header ensures:
- Deterministic hashing
- MEID compatibility control
- CRID traceability
- Schema governance enforcement
- CI/CD validation automation
Required Top-Level Structure
zar:
  artifact_type: ruleset
  artifact_name: <string>
  applies_to_meid: <MEID_...>
  ruleset_family: <string>
  ruleset_kind: <string>
zrr:
  crid: <Canonical Rule Identifier>
  rule_type: <Validation|Computation|...>
  domain: <GHG|FINANCE|GOV|...>
  severity: <INFO|WARNING|CRITICAL|BLOCKING>
  linked_signal_ids: []
  linked_frameworks: []
  execution_engine: <MEID_...>
  enforcement_mode: advisory|soft|hard|blocking
  fallback_logic: sem|manual_escalation|none
  ontology_binding: [] # array of USO node refs
  audit_required: true
lifecycle:
  status: draft|approved|deprecated
  owners: []
  approved_by: []
  created_by: system|governance|admin
  created_at: <ISO8601>
  supersedes: null
  deprecated_by: null
  changelog: "Initial default tag detection rules"
compatibility:
  min_schema_ref: <ZAR:schema:...>
  max_schema_ref: <ZAR:schema:...>
rules:
  # engine-specific logic
Ruleset File Naming Convention (Deterministic)
All ruleset YAML files must follow the strict naming convention:
<artifact_name>-<version>.yaml
Where:
- artifact_name in YAML uses underscores, e.g. acct_crawler_tag_detection
- filename uses hyphens
- version = semantic version (human-facing only)
Example:
acct-crawler-tag-detection-1_0_0.yaml
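A minimal sketch of deriving the filename from the YAML artifact_name and a version. It assumes the version is supplied in dotted form (for example 1.0.0) and that dots become underscores in the filename, which is inferred from the example above rather than stated by the convention. `ruleset_filename` is a hypothetical helper.

```python
import re

ARTIFACT_NAME = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*")
SEMVER = re.compile(r"\d+\.\d+\.\d+")


def ruleset_filename(artifact_name: str, version: str) -> str:
    """Build <artifact_name>-<version>.yaml using hyphens in the name part."""
    if not ARTIFACT_NAME.fullmatch(artifact_name):
        raise ValueError(f"unexpected artifact_name: {artifact_name!r}")
    if not SEMVER.fullmatch(version):
        raise ValueError(f"unexpected version: {version!r}")
    # Assumption: underscores -> hyphens in the name, dots -> underscores in the version.
    return f"{artifact_name.replace('_', '-')}-{version.replace('.', '_')}.yaml"
```

For instance, `ruleset_filename("acct_crawler_tag_detection", "1.0.0")` reproduces the example filename above.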
D.3.1. Tag Detection Ruleset
Identifier pattern:
ZAR:ruleset:acct_crawler_tag_detection@sha256:<hash>
zar:
  artifact_type: ruleset
  artifact_name: acct_crawler_tag_detection
  applies_to_meid: MEID_ACCT_CRAWLER
  ruleset_family: acct_crawler
  ruleset_kind: tag_detection
zrr:
  crid: ruleset.tagging.finance.global.transition.decarb.WARNING.1_0_0
  rule_type: tagging
  domain: finance
  severity: WARNING
  linked_signal_ids: []
  linked_frameworks: ["GLOBAL"]
  execution_engine: MEID_ACCT_CRAWLER
  enforcement_mode: soft
  fallback_logic: none
  ontology_binding: []
  audit_required: true
lifecycle:
  status: draft
  owners: ["cto@viroway.com"]
  approved_by: []
  created_by: governance
  created_at: 2026-02-17T00:00:00.000Z
  supersedes: null
  deprecated_by: null
  changelog: "Initial default tag detection rules"
compatibility:
  min_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
  max_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
rules:
  precedence:
    - gl_attribute
    - project_code
    - cost_center
  tag_sources:
    gl_attribute:
      field: gl_attribute
      accepted_values:
        - DECARB_CAPEX
        - DECARB_OPEX
        - DECARB
    project_code:
      field: project_code
      patterns:
        - ".*-DECARB-.*"
        - "^ZYZ-DECARB-.*"
    cost_center:
      field: cost_center
      accepted_values:
        - ESG-TRANSITION
  thresholds:
    min_tagged_fields_present: 1
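A minimal sketch of how the tag detection rules above might be evaluated against one canonical GL entry. Several points are assumptions: the rules are shown as an already-loaded dict, patterns are applied with `re.search`, `min_tagged_fields_present` is read as the minimum number of tag sources that must match, and the first match in precedence order wins. `detect_tag` is a hypothetical function name.

```python
import re
from typing import Optional, Tuple

RULES = {
    "precedence": ["gl_attribute", "project_code", "cost_center"],
    "tag_sources": {
        "gl_attribute": {
            "field": "gl_attribute",
            "accepted_values": ["DECARB_CAPEX", "DECARB_OPEX", "DECARB"],
        },
        "project_code": {
            "field": "project_code",
            "patterns": [r".*-DECARB-.*", r"^ZYZ-DECARB-.*"],
        },
        "cost_center": {"field": "cost_center", "accepted_values": ["ESG-TRANSITION"]},
    },
    "thresholds": {"min_tagged_fields_present": 1},
}


def detect_tag(entry: dict, rules: dict = RULES) -> Optional[Tuple[str, str]]:
    """Return (source, value) of the highest-precedence matching tag, or None."""
    hits = []
    for source in rules["precedence"]:
        spec = rules["tag_sources"][source]
        value = entry.get(spec["field"])
        if not value:
            continue
        by_value = value in spec.get("accepted_values", ())
        by_pattern = any(re.search(p, value) for p in spec.get("patterns", ()))
        if by_value or by_pattern:
            hits.append((source, value))
    if len(hits) < rules["thresholds"]["min_tagged_fields_present"]:
        return None
    return hits[0]
```

When an entry matches more than one source, the GL attribute takes priority over the project code, which in turn takes priority over the cost center.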
D.3.2. Classification Ruleset
Identifier:
ZAR:ruleset:acct_crawler_classification@sha256:<hash>
zar:
  artifact_type: ruleset
  artifact_name: acct_crawler_classification
  applies_to_meid: MEID_ACCT_CRAWLER
  ruleset_family: acct_crawler
  ruleset_kind: classification
zrr:
  crid: ruleset.classification.finance.global.capex-opex.standard.WARNING.1_0_0
  rule_type: classification
  domain: finance
  severity: WARNING
  linked_signal_ids: []
  linked_frameworks: ["GLOBAL"]
  execution_engine: MEID_ACCT_CRAWLER
  enforcement_mode: hard
  fallback_logic: manual_escalation
  ontology_binding: []
  audit_required: true
lifecycle:
  status: draft
  owners: ["cto@viroway.com"]
  approved_by: []
  created_by: governance
  created_at: 2026-02-17T00:00:00.000Z
  supersedes: null
  deprecated_by: null
  changelog: "Default capex/opex classification logic"
compatibility:
  min_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
  max_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
rules:
  transition_type_precedence:
    - gl_attribute
    - gl_account_group
    - cost_center
  gl_attribute_map:
    DECARB_CAPEX: CAPEX
    DECARB_OPEX: OPEX
  gl_account_groups:
    capex_accounts: []
    opex_accounts: []
  fallback:
    unknown_transition_type: UNKNOWN
  thresholds:
    max_unknown_ratio: 0.02
    max_unknown_count: 250
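A minimal sketch of the classification logic implied by the ruleset above. It is an illustration under assumptions: the `gl_account` field name is hypothetical, the cost_center step is omitted because the ruleset defines no cost-center mapping, and both thresholds are treated as limits that must hold together.

```python
from typing import Iterable

GL_ATTRIBUTE_MAP = {"DECARB_CAPEX": "CAPEX", "DECARB_OPEX": "OPEX"}


def classify(entry: dict,
             capex_accounts: frozenset = frozenset(),
             opex_accounts: frozenset = frozenset()) -> str:
    """Apply gl_attribute first, then GL account group, then the fallback."""
    mapped = GL_ATTRIBUTE_MAP.get(entry.get("gl_attribute"))
    if mapped is not None:
        return mapped
    account = entry.get("gl_account")  # hypothetical field name
    if account in capex_accounts:
        return "CAPEX"
    if account in opex_accounts:
        return "OPEX"
    return "UNKNOWN"


def unknown_thresholds_ok(types: Iterable[str],
                          max_unknown_ratio: float = 0.02,
                          max_unknown_count: int = 250) -> bool:
    """Check the share and count of UNKNOWN results against the thresholds."""
    types = list(types)
    unknown = sum(1 for t in types if t == "UNKNOWN")
    ratio = unknown / len(types) if types else 0.0
    return ratio <= max_unknown_ratio and unknown <= max_unknown_count
```

With the default empty account groups, an entry tagged only with the generic DECARB attribute falls through to UNKNOWN, which is why tenants would normally populate capex_accounts and opex_accounts.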
D.3.3. Reconciliation Policy Ruleset
Identifier:
ZAR:ruleset:acct_crawler_reconciliation_policy@sha256:<hash>
zar:
  artifact_type: ruleset
  artifact_name: acct_crawler_reconciliation_policy
  applies_to_meid: MEID_ACCT_CRAWLER
  ruleset_family: acct_crawler
  ruleset_kind: reconciliation_policy
zrr:
  crid: ruleset.validation.finance.global.reconciliation.standard.WARNING.1_0_0
  rule_type: validation
  domain: finance
  severity: WARNING
  linked_signal_ids: []
  linked_frameworks: ["GLOBAL"]
  execution_engine: MEID_ACCT_CRAWLER
  enforcement_mode: blocking
  fallback_logic: manual_escalation
  ontology_binding: []
  audit_required: true
lifecycle:
  status: draft
  owners: ["cto@viroway.com"]
  approved_by: []
  created_by: governance
  created_at: 2026-02-17T00:00:00.000Z
  supersedes: null
  deprecated_by: null
  changelog: "Default reconciliation tolerances"
compatibility:
  min_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
  max_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
rules:
  tolerances:
    absolute_tol: 1.00
    relative_tol: 0.0001
    fx_rounding_tol: 5.00
  tb_mode_policy:
    preferred: activity_if_available
    allow_derived_activity: true
    allow_partial: true
  thresholds:
    max_account_fail_count: 0
    max_entity_total_diff_abs: 5.00
    max_duplicate_rate: 0.005
    max_missing_mandatory_field_rate: 0.001
  decisions:
    on_entity_fail: publish_with_integrity_flag
    on_account_fail: publish_with_integrity_flag
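A minimal sketch of an account-level tolerance check using the default values above. How the two tolerances combine is an assumption here: the difference passes if it is within the larger of the absolute tolerance and the relative tolerance applied to the trial balance amount. The fx_rounding_tol value is not modelled, and the plain `publish` outcome is a hypothetical label for the passing case.

```python
from decimal import Decimal

ABSOLUTE_TOL = Decimal("1.00")
RELATIVE_TOL = Decimal("0.0001")


def within_tolerance(gl_total: Decimal, tb_total: Decimal,
                     absolute_tol: Decimal = ABSOLUTE_TOL,
                     relative_tol: Decimal = RELATIVE_TOL) -> bool:
    """Compare a GL total with the trial balance total for one account."""
    diff = abs(gl_total - tb_total)
    # Assumption: pass when either tolerance is satisfied.
    allowed = max(absolute_tol, relative_tol * abs(tb_total))
    return diff <= allowed


def account_decision(gl_total: Decimal, tb_total: Decimal) -> str:
    """Map the check result to the on_account_fail decision of the ruleset."""
    if within_tolerance(gl_total, tb_total):
        return "publish"  # hypothetical label for the passing case
    return "publish_with_integrity_flag"
```

Decimal arithmetic is used so that currency amounts are compared exactly rather than through binary floating point.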
D.4. Ruleset Activation Model
At job start the orchestrator resolves:
- global_default_ruleset_ref
- tenant_ruleset_override_ref (if approved)
- entity_ruleset_override_ref (highest priority)
Resolved ruleset_ref must be written into:
- JobStarted event
- Canonicalized
- Reconciled
- all output artifacts (transition_project_cost_profile)
- integrity reports
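A minimal sketch of the resolution order at job start. The function signature and the `tenant_override_approved` flag are illustrative assumptions; the source only states that the entity override has the highest priority and that the tenant override applies if approved.

```python
from typing import Optional


def resolve_ruleset_ref(global_default_ruleset_ref: str,
                        tenant_ruleset_override_ref: Optional[str] = None,
                        tenant_override_approved: bool = False,
                        entity_ruleset_override_ref: Optional[str] = None) -> str:
    """Pick the effective ruleset_ref: entity, then approved tenant, then global."""
    if entity_ruleset_override_ref:
        return entity_ruleset_override_ref
    if tenant_ruleset_override_ref and tenant_override_approved:
        return tenant_ruleset_override_ref
    return global_default_ruleset_ref
```

The returned value is the single ruleset_ref that would then be written into the events, output artifacts, and integrity reports listed above.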
D.5. Change Control & Replay Safety
Ruleset lifecycle:
- Draft
- Validation (unit + replay test suite)
- Governance approval
- Activation
- Deprecated (optional)
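The lifecycle above can be sketched as a forward-only sequence of stages. Treating the stages as strictly sequential with no skipping or moving back is an assumption, and the stage identifiers are hypothetical spellings of the names in the list.

```python
LIFECYCLE = ["draft", "validation", "governance_approval", "activation", "deprecated"]


def can_transition(current: str, target: str) -> bool:
    """Allow only a move to the immediately following lifecycle stage."""
    if current not in LIFECYCLE or target not in LIFECYCLE:
        return False
    return LIFECYCLE.index(target) == LIFECYCLE.index(current) + 1
```

Under this sketch a draft cannot be activated without passing validation and governance approval first, and deprecation remains an optional final step that is reachable only from an activated ruleset.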
Replay requirement:
- Old jobs must remain reproducible by re-running with the original ruleset_ref.
- No MEID versioning is allowed.
- Behavior is controlled exclusively through rulesets.