
TAC

Tagged Accounting Crawler

1. Identity

Execution Mode: Deterministic ingestion & normalization



2. Purpose

The Tagged Accounting Crawler micro-engine ingests ERP accounting data and converts tagged transition-related financial transactions into structured, reconciled, and audit-traceable transition finance objects.

It enables:

  • Automated ESRS E1-1 CapEx/OpEx population
  • EU Taxonomy CapEx alignment input
  • Transition ROI modeling
  • Stranded asset analysis
  • Audit-grade financial traceability

3. Non-Goals

This engine does NOT:

  • Perform EU Taxonomy eligibility determination
  • Perform scenario modeling
  • Perform Monte Carlo climate modeling
  • Alter IFRS classifications
  • Override accounting records

It is strictly an ingestion, normalization, classification, and reconciliation engine.


4. Architectural Position

ERP → Connector Runtime → Normalizer → Classifier → Reconciler → Transition Objects → Computation Hub

It operates upstream of:

  • Transition Plan Validator
  • CapEx Alignment Engine
  • Taxonomy Engine
  • Climate Risk Engine

Downstream KPI Responsibility

  • CapEx/OpEx per tCO₂e reduced, abatement cost, NPV/IRR, and payback are computed by MEID_CALC_TRANSITION_ROI.
  • This engine publishes transition_project_cost_profile (cost + provenance) to enable deterministic ROI computation when paired with an emissions reduction profile.

5. Data Sources

Supported ERP systems (extensible):

  • SAP S/4HANA
  • SAP ECC
  • Oracle Fusion / EBS
  • Microsoft Dynamics 365
  • NetSuite
  • Xero
  • QuickBooks

Minimum required data extracts:

  • General Ledger entries
  • Project codes
  • Cost centers
  • Trial balance
  • Asset register (recommended)

6. Decarbonisation Tagging Protocol (DPTP)

The engine detects transition finance through structured tagging.

Accepted Tag Types

  1. Project code pattern:

<DECARB>

  2. GL attribute:

DECARB_CAPEX
DECARB_OPEX

  3. Cost center:

ESG-TRANSITION

Rules are version-controlled and tenant-configurable.
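The three tag types can be checked with a small detector, sketched below in Python. It makes the following assumptions:

- The precedence is gl_attribute > project_code > cost_center, as in the sample ruleset and the tag-detection ruleset description.
- The project-code pattern is the regex `.*-DECARB-.*`.
- `TagRules` and `detect_tag_source` are hypothetical names. Real values would come from the tenant's versioned ruleset.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagRules:
    """Hypothetical in-memory view of a DPTP tag-detection ruleset."""

    project_code_pattern: str = r".*-DECARB-.*"
    gl_attributes: frozenset = frozenset({"DECARB_CAPEX", "DECARB_OPEX"})
    cost_centers: frozenset = frozenset({"ESG-TRANSITION"})
    # The first matching source wins and is recorded as the line's tag source.
    precedence: tuple = ("gl_attribute", "project_code", "cost_center")


def detect_tag_source(line: dict, rules: TagRules) -> Optional[str]:
    """Return the highest-precedence tag source that matches, or None if untagged."""
    project_code = line.get("project_code") or ""
    checks = {
        "gl_attribute": line.get("gl_attribute") in rules.gl_attributes,
        "project_code": re.fullmatch(rules.project_code_pattern, project_code) is not None,
        "cost_center": line.get("cost_center") in rules.cost_centers,
    }
    for source in rules.precedence:
        if checks[source]:
            return source
    return None
```

The function returns the matched source rather than a bare boolean. That way the same call can also feed a per-source tag breakdown.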


7. Canonical Ingestion Schema (ERP-Agnostic)

7.1. Canonical GL Entry

canonical-gl-entry.json
{
"tenant_id": "string",
"entity_id": "string",
"source_system": "string",
"posting_date": "date",
"fiscal_year": 2026,
"gl_account": "string",
"project_code": "string|null",
"cost_center": "string|null",
"amount": "number",
"currency": "string",
"ifrs_classification": "string|null",
"asset_id": "string|null",
"source_reference": "string",
"job_id": "string"
}

7.2. Canonical Trial Balance

canonical-trial-balance.json
{
"tenant_id": "string",
"entity_id": "string",
"fiscal_year": 2026,
"gl_account": "string",
"balance": "number",
"currency": "string"
}

8. Classification Logic

The classifier determines:

  • CapEx vs OpEx
  • IAS 16 / IAS 38 / expense
  • Project aggregation
  • Tag validity

Time-Phasing Support (for discounting downstream):
Where possible, the engine SHOULD emit cashflow_timeline at monthly granularity derived from posting dates. This enables present value (PV) discounting in MEID_CALC_TRANSITION_ROI without requiring any additional ERP integration.
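Monthly time-phasing can be derived purely from posting dates. The sketch below is hypothetical and assumes:

- each classified line carries `posting_date` (a `datetime.date`), `amount` (a `Decimal`), and `transition_type`;
- UNKNOWN lines are left out of the timeline rather than guessed into a bucket.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def period_of(posting_date: date) -> str:
    """Monthly bucket key, e.g. date(2026, 1, 15) -> "2026-01"."""
    return posting_date.strftime("%Y-%m")


def build_cashflow_timeline(lines: list[dict]) -> list[dict]:
    """Bucket classified transition lines into monthly capex/opex totals.

    Lines whose transition_type is not CAPEX or OPEX are skipped.
    """
    buckets: dict = defaultdict(lambda: {"capex": Decimal("0"), "opex": Decimal("0")})
    for line in lines:
        kind = line["transition_type"].lower()
        if kind not in ("capex", "opex"):
            continue
        buckets[period_of(line["posting_date"])][kind] += line["amount"]
    # Sorted periods keep the artifact deterministic regardless of input order.
    return [{"period": p, **buckets[p]} for p in sorted(buckets)]
```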

Output object:

output-object.json
{
"transition_project_id": "ZYZ-DECARB-2026-001",
"project_type": "capex|opex",
"total_amount": 48250000,
"currency": "EUR",
"linked_gl_accounts": ["1500", "1510"],
"source_entities": ["Entity_A"],
"reconciliation_status": "pending|passed|failed"
}

9. Reconciliation Engine

Reconciliation is mandatory.

Validation:

Sum(Tagged GL Entries) = Declared Transition CapEx/OpEx

Cross-checks:

  • Trial balance totals
  • CapEx account group totals
  • Duplicate detection
  • Currency normalization accuracy

If mismatch:

financial_integrity_flag = true

10. Integrity & Audit Trail

Each ingestion job produces:

  • Job manifest
  • Raw extract hash
  • Normalization log
  • Reconciliation report
  • Classification ruleset version reference

All of these artifacts are stored in the immutable evidence vault.


11. Outputs

The engine publishes:

  1. transition_line_items
  2. transition_project_aggregates
  3. reconciliation_report
  4. financial_integrity_flags
  5. transition_project_cost_profile

These are consumed by:

  • Transition Plan Validator
  • Report Hub
  • Taxonomy Engine
  • Monte Carlo Risk Engine

11.1. Transition Project Cost Profile (ROI Handoff Artifact)

The engine MUST publish a normalized project-level cost artifact designed for downstream ROI and abatement-cost engines.

Artifact Type: transition_project_cost_profile (ROI handoff artifact: CapEx/OpEx + timeline + provenance)
Primary Consumer: MEID_CALC_TRANSITION_ROI
Keying: transition_project_id (must match emissions reduction profile keys)

roi-handoff-artifact.json
{
"transition_project_id": "ZYZ-DECARB-2026-001",
"tenant_id": "TENANT_X",
"entity_id": "Entity_A",
"reporting_year": 2026,
"currency": "EUR",
"capex_total": 48250000,
"opex_total": 9100000,
"cashflow_timeline": [
{ "period": "2026-01", "capex": 1200000, "opex": 100000 },
{ "period": "2026-02", "capex": 900000, "opex": 80000 }
],
"reconciliation_status": "passed_with_warnings",
"integrity_flags": [],
"ledger_refs": [
{ "source_system": "SAP_S4", "source_reference": "1900001:1", "gl_account": "1500", "posting_date": "2026-03-01", "amount": 1250000, "currency": "EUR" }
],
"source": {
"source_system": "SAP_S4",
"job_id": "JOB-UUID",
"extracted_at": "2026-02-16T00:00:00.000Z"
}
}
  • cashflow_timeline is optional but recommended for discounting in ROI calculations.
  • The accounting crawler does not compute €/tCO₂e. It publishes cost profiles only.

12. Security & Tenant Isolation

  • Read-only ERP access
  • Encrypted storage
  • Tenant-scoped processing
  • No financial record mutation
  • Full audit logging

13. Observability

Metrics:

  • Extraction completeness %
  • Reconciliation pass rate
  • Tag detection coverage
  • Processing latency

Alerts are triggered when:

  • The integrity flag is true
  • An extract is missing mandatory fields
  • Currency conversion fails

14. Governance Alignment

  • Cannot override IFRS classification
  • Cannot override platform logic
  • Operates under deterministic rules
  • Fully traceable for assurance engagements

15. Strategic Impact

This micro-engine:

  • Converts ESG finance from manual entry to audited ingestion
  • Bridges CFO systems with sustainability logic
  • Creates defensible transition finance disclosures
  • Enables quantifiable climate ROI modeling

It is a foundational engine in the ZAYAZ Transition Finance Stack.


APPENDIX A - Connector Interface Contract (SDK specification)

A.1. Connector Philosophy

The connector must:

  • Extract ERP-native data
  • Not apply business logic
  • Not interpret ESG meaning
  • Not aggregate
  • Not classify
  • Be idempotent
  • Be stateless

It is a transport layer, nothing more.


A.2. Connector Identity Contract

Every connector must expose:

connector-identity.json
{
"connector_id": "string",
"connector_name": "string",
"erp_system": "SAP_S4 | SAP_ECC | ORACLE_FUSION | DYNAMICS_365 | NETSUITE | XERO | QUICKBOOKS | CUSTOM",
"version": "string",
"supported_entities": ["legal_entity_code"],
"supports_asset_register": true,
"supports_trial_balance": true,
"supports_projects": true
}

A.3. Mandatory Interface Methods

All connectors must implement:


A.3.1. discover()

Purpose: Identify ERP structure.

erp-identitfy.json
{
"entities": ["Entity_A", "Entity_B"],
"ledgers": ["Primary Ledger"],
"currencies": ["EUR", "USD"],
"dimensions": {
"project": "field_name",
"cost_center": "field_name",
"gl_account": "field_name"
}
}

A.3.2. extract_gl_lines()

Core function.

Input

gl-extract-input.json
{
"entity_id": "Entity_A",
"date_from": "2026-01-01",
"date_to": "2026-12-31",
"filters": {
"project_code_like": "%DECARB%",
"gl_accounts": []
}
}

Output (ERP-native, NOT normalized)

gl-extract-output.json
[
{
"posting_date": "2026-03-01",
"document_number": "1900001",
"line_number": 1,
"gl_account": "1500",
"project_code": "ZYZ-DECARB-2026-001",
"cost_center": "ESG-TRANSITION",
"amount": 1250000,
"currency": "EUR",
"debit_credit_indicator": "D",
"asset_id": "A-102"
}
]

Rules:

  • Must include source reference keys.
  • Must preserve original sign logic.
  • Must not aggregate.

A.3.3. extract_trial_balance()

Used for reconciliation.

Input

trial-balance-input.json
{
"entity_id": "Entity_A",
"fiscal_year": 2026
}

Output

trial-balance-output.json
[
{
"gl_account": "1500",
"closing_balance": 88000000,
"currency": "EUR"
}
]

A.3.4. extract_projects()

extract-projects.json
[
{
"project_code": "ZYZ-DECARB-2026-001",
"project_name": "Electrification Line A",
"start_date": "2026-01-01",
"status": "ACTIVE"
}
]

A.3.5. extract_assets()

extract-assets.json
[
{
"asset_id": "A-102",
"asset_class": "Manufacturing Equipment",
"capitalization_date": "2026-02-01",
"useful_life_years": 15,
"net_book_value": 44000000
}
]
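The five mandatory methods can be expressed as a `typing.Protocol`, as in the sketch below. It assumes:

- dates are passed as ISO-8601 strings;
- records are plain dicts;
- `extract_projects()` and `extract_assets()` take no arguments, since the contract shows only their outputs.

`ErpConnector` and `InMemoryConnector` are hypothetical names. The in-memory class is a test double, not a real ERP connector. It applies only the date window and performs no aggregation or classification.

```python
from typing import Any, Protocol, runtime_checkable

Record = dict[str, Any]


@runtime_checkable
class ErpConnector(Protocol):
    """A.3 contract: every method returns ERP-native, non-normalized records."""

    def discover(self) -> Record: ...

    def extract_gl_lines(
        self, entity_id: str, date_from: str, date_to: str, filters: Record
    ) -> list[Record]: ...

    def extract_trial_balance(self, entity_id: str, fiscal_year: int) -> list[Record]: ...

    def extract_projects(self) -> list[Record]: ...

    def extract_assets(self) -> list[Record]: ...


class InMemoryConnector:
    """Hypothetical test double serving canned ERP-native rows for one entity.

    entity_id arguments and filters are accepted but ignored.
    """

    def __init__(self, gl_lines, tb_rows, projects=(), assets=(), entity_id="Entity_A"):
        self._gl = list(gl_lines)
        self._tb = list(tb_rows)
        self._projects = list(projects)
        self._assets = list(assets)
        self._entity_id = entity_id

    def discover(self) -> Record:
        return {
            "entities": [self._entity_id],
            "ledgers": ["Primary Ledger"],
            "currencies": sorted({r["currency"] for r in self._gl}),
            "dimensions": {"project": "project_code", "cost_center": "cost_center", "gl_account": "gl_account"},
        }

    def extract_gl_lines(self, entity_id, date_from, date_to, filters):
        # ISO-8601 date strings compare correctly as plain strings.
        return [dict(r) for r in self._gl if date_from <= r["posting_date"] <= date_to]

    def extract_trial_balance(self, entity_id, fiscal_year):
        return [dict(r) for r in self._tb]

    def extract_projects(self):
        return [dict(r) for r in self._projects]

    def extract_assets(self):
        return [dict(r) for r in self._assets]
```

Because `runtime_checkable` checks only that the methods exist, any real connector would still need the contract tests from the certification checklist.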

A.3.6. Sample Plan Request Payload

sample-plan-request-payload.json
{
"intent": {
"intent_id": "INTENT-ACCT-CRAWLER-DEFAULT-2026-02-20",
"applies_to_meid": "MEID_ACCT_CRAWLER",
"ruleset_family": "acct_crawler",

"target": {
"environment": "dev",
"tenant_id": null,
"entity_id": null
},

"output": {
"folder": "/workspaces/zayaz-docs/code/associated-files/computation-hub-calcs/micro-engines/tagged-accounting-crawler",
"naming": {
"version": "1_0_0",
"file_style": "kebab"
}
},

"governance": {
"created_by": "governance",
"owners": ["cto@viroway.com"],
"status": "draft",
"changelog": "Initial default acct crawler rulesets"
},

"bundle": {
"create_bundle": true,
"bundle_name": "acct_crawler_default",
"strict_mode": false,
"allow_tenant_overrides": true,
"execution_order": [
"acct_crawler_tag_detection",
"acct_crawler_classification",
"acct_crawler_reconciliation_policy"
]
},

"rulesets": [
{
"artifact_name": "acct_crawler_tag_detection",
"ruleset_kind": "tag_detection",
"zrr": {
"crid": "ruleset.tagging.finance.global.transition.decarb.standard.WARNING.1_0_0",
"rule_type": "tagging",
"domain": "finance",
"severity": "WARNING",
"enforcement_mode": "soft",
"fallback_logic": "none",
"linked_frameworks": ["GLOBAL"],
"linked_signal_ids": [],
"ontology_binding": [],
"audit_required": true,
"execution_engine": "MEID_ACCT_CRAWLER"
},
"compatibility": {
"min_schema_ref": "ZAR:schema:canonical_gl_entry@v1",
"max_schema_ref": "ZAR:schema:canonical_gl_entry@v1"
},
"rules": { "precedence": ["gl_attribute", "project_code", "cost_center"] }
}
]
},

"context": {
"env": "dev",
"actor": "docusaurus-ui",
"now": "2026-02-24T10:00:00.000Z"
}
}

A.4. Error Handling Contract

Connector must return structured errors:

error-handling.json
{
"error_type": "AUTHENTICATION | PERMISSION | RATE_LIMIT | DATA_SCHEMA | NETWORK",
"error_message": "string",
"retryable": true
}
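The `retryable` field lets the orchestrator decide whether to try again without knowing anything about the ERP. The sketch below illustrates this with exponential backoff. The backoff policy and its defaults are assumptions, not part of the contract, and `ConnectorError` and `call_with_retry` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

ERROR_TYPES = frozenset({"AUTHENTICATION", "PERMISSION", "RATE_LIMIT", "DATA_SCHEMA", "NETWORK"})


class ConnectorError(Exception):
    """Structured connector error carrying the A.4 contract fields."""

    def __init__(self, error_type: str, error_message: str, retryable: bool):
        if error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error_type: {error_type!r}")
        super().__init__(f"{error_type}: {error_message}")
        self.error_type = error_type
        self.error_message = error_message
        self.retryable = retryable

    def to_dict(self) -> dict:
        return {"error_type": self.error_type, "error_message": self.error_message, "retryable": self.retryable}


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn with exponential backoff, but only for retryable ConnectorErrors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectorError as err:
            # Non-retryable errors (e.g. AUTHENTICATION) surface immediately.
            if not err.retryable or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```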

A.5. Security Contract

Connector must:

  • Use read-only ERP roles
  • Support OAuth2 or service principal
  • Never persist credentials locally
  • Encrypt transport (TLS 1.2+)
  • Log access attempts

A.6. Idempotency Requirements

Each extract must include:

requirements.json
{
"job_id": "uuid",
"extract_window_hash": "sha256",
"record_count": 12450,
"source_checksum": "hash_if_available"
}

Re-running the same window must produce identical results unless the underlying ERP data has changed.
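One way to make reruns comparable is to hash canonical encodings of both the window and the records. The sketch below assumes:

- `extract_window_hash` is a SHA-256 over a canonical JSON encoding of the extract window (entity, dates, filters);
- records hold JSON-native values only.

`records_digest` is a hypothetical extra. It is an order-independent content hash that lets two runs of the same window be compared, and it is not part of the A.6 fields.

```python
import hashlib
import json
import uuid
from typing import Optional


def canonical_bytes(obj) -> bytes:
    """Deterministic JSON encoding: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def records_digest(records: list[dict]) -> str:
    """Order-independent SHA-256 over the extracted records."""
    # Sorting the per-record encodings makes pagination order irrelevant;
    # compact JSON never contains a raw newline, so b"\n" is a safe separator.
    h = hashlib.sha256()
    for encoded in sorted(canonical_bytes(r) for r in records):
        h.update(encoded)
        h.update(b"\n")
    return h.hexdigest()


def idempotency_record(window: dict, records: list[dict], source_checksum: Optional[str] = None) -> dict:
    """Build the A.6 block; window holds entity_id, date_from, date_to and filters."""
    return {
        "job_id": str(uuid.uuid4()),  # unique per run
        "extract_window_hash": hashlib.sha256(canonical_bytes(window)).hexdigest(),  # stable per window
        "record_count": len(records),
        "source_checksum": source_checksum,  # only if the ERP exposes one
    }
```

Two runs over an unchanged window then differ only in `job_id`. A changed `records_digest` points to a change in the ERP rather than a crawler defect.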


A.7. Performance Requirements

Minimum performance target:

  • 1 million GL lines per hour per tenant
  • Incremental extraction supported
  • Pagination mandatory
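Mandatory pagination can be handled with a streaming generator, sketched below in Python. It assumes a cursor-based page API, `fetch_page(cursor) -> (rows, next_cursor)`, which is a hypothetical signature since real ERP APIs differ. Incremental extraction (for example, a lastModifiedDate watermark) is not shown.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page-fetch signature: fetch_page(cursor) -> (rows, next_cursor),
# where next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_gl_lines(fetch_page: FetchPage, max_pages: int = 100_000) -> Iterator[dict]:
    """Stream rows page by page instead of materializing the whole extract."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        rows, cursor = fetch_page(cursor)
        yield from rows  # rows pass through unmodified: no aggregation
        if cursor is None:
            return
    # Guard against a connector that never returns a terminal cursor.
    raise RuntimeError(f"page limit {max_pages} reached; possible cursor loop")
```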

A.8. Connector Certification Checklist

Before production approval:

  • ✔ Contract tests passed
  • ✔ Schema validation passed
  • ✔ Reconciliation dry-run successful
  • ✔ Security audit complete
  • ✔ Load test complete

A.9. Versioning Strategy

  • Connector versioned independently.
  • Canonical schema version controlled by MEID-ACCT-CRAWLER.
  • Mapping rules versioned via ZAR.
  • Breaking changes require:
    • New connector version
    • Migration notice
    • Regression suite execution

A.10. Strategic Note

This SDK specification ensures:

  • ERP-agnostic extensibility
  • Enterprise-grade audit traceability
  • Zero ESG logic inside connectors
  • Clear separation of concerns
  • Massive future integration surface

APPENDIX B - Reconciliation Algorithm

Below is a deterministic, audit-grade reconciliation algorithm for MEID-ACCT-CRAWLER. It reconciles three things:

  1. extracted GL lines against the trial balance;
  2. tagged transition entries against the extracted parent population;
  3. tagged CapEx/OpEx aggregates against configured control totals.

It is designed to be ERP-agnostic, idempotent, and assurance-ready.


B.0. What we reconcile (three layers)

Layer A — Extraction completeness (GL ↔ Trial Balance)

Ensures the crawler extracted a complete and accurate slice of the ledger for the period.

Layer B — Tag integrity (Tagged subset ↔ Parent population)

Ensures tagged transition entries are a consistent subset of extracted population (no sign flips, duplicates, or missing dimensions).

Layer C — Disclosure readiness (Tagged aggregates ↔ finance controls)

Ensures reported transition CapEx/OpEx totals reconcile to finance-defined control groups (e.g., CapEx accounts, project ledger).


B.1. Inputs

Canonical GL lines (normalized)

Each record includes:

  • tenant_id, entity_id, fiscal_year, posting_date
  • gl_account, amount_signed, currency
  • project_code, cost_center, gl_attribute
  • source_reference (ERP doc+line unique key)
  • job_id

Canonical Trial Balance (TB)

  • by tenant_id, entity_id, fiscal_year, gl_account
  • closing_balance (and ideally period_activity or debit_total/credit_total if available)

Engine configuration (per tenant/entity)

  • fiscal calendar mapping
  • currency conversion rules (reporting currency)
  • tag rules (DPTP patterns)
  • optional control groups:
    • capex_gl_accounts[]
    • opex_gl_accounts[]
    • capex_projects_prefixes[] etc.
  • tolerance policy

B.2. Deterministic normalization prerequisites

B.2.1. Sign normalization (critical)

All GL line amounts must be normalized to a single signed convention:

  • amount_signed = debit - credit (or equivalent), consistently across connectors.
  • Preserve original fields too (debit_credit_indicator, raw amount) for audit.
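Sign normalization could look like the sketch below, which maps ERP-native amounts to `amount_signed = debit - credit` and keeps the raw fields for audit. It assumes each connector reports one of two things:

- an unsigned magnitude plus a "D"/"C" indicator, as in the A.3.2 sample;
- with `already_signed=True`, a value that is already debit minus credit.

A negative magnitude arriving with an indicator is rejected rather than guessed.

```python
from decimal import Decimal
from typing import Optional


def normalize_sign(raw_amount: Decimal, debit_credit_indicator: Optional[str],
                   already_signed: bool = False) -> dict:
    """Map an ERP-native amount to amount_signed = debit - credit."""
    if already_signed:
        signed = raw_amount
    elif raw_amount < 0:
        # A negative magnitude plus an indicator is ambiguous; refuse to guess.
        raise ValueError("negative raw amount with debit/credit indicator")
    elif debit_credit_indicator == "D":
        signed = raw_amount
    elif debit_credit_indicator == "C":
        signed = -raw_amount
    else:
        raise ValueError(f"unknown debit_credit_indicator: {debit_credit_indicator!r}")
    # Raw fields travel with the normalized value for the audit trail.
    return {
        "amount_signed": signed,
        "amount_raw": raw_amount,
        "debit_credit_indicator": debit_credit_indicator,
    }
```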

B.2.2. Currency normalization

For reconciliation, do both:

  • native-currency reconciliation where possible (preferred)
  • reporting-currency reconciliation (for disclosure readiness)

Store:

  • amount_native, currency_native
  • amount_reporting, currency_reporting, fx_rate_id

B.2.3. Deduplication key

Define immutable key:

dedupe_key = hash(source_system + entity_id + source_reference)

Duplicate lines across pagination/backfills are removed deterministically:

  • keep earliest ingested_at, record the duplicate count.
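The dedupe rule can be sketched in Python as follows, under these assumptions:

- `ingested_at` is an ISO-8601 UTC string, so string order equals time order.
- Ties are broken on full line content, so the kept line never depends on input order.
- The key joins its parts with a unit-separator character. This is a deliberate deviation from plain concatenation, so that ("AB", "C") and ("A", "BC") cannot collide.

```python
import hashlib
import json


def dedupe_key(source_system: str, entity_id: str, source_reference: str) -> str:
    """SHA-256 over the three key parts, separated by a unit-separator character."""
    joined = "\x1f".join((source_system, entity_id, source_reference))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def dedupe(lines: list[dict]) -> tuple[list[dict], dict[str, int]]:
    """Keep the earliest-ingested line per key and count removed duplicates."""
    ordered = sorted(
        lines,
        key=lambda line: (line["ingested_at"], json.dumps(line, sort_keys=True, default=str)),
    )
    kept: dict[str, dict] = {}
    duplicate_counts: dict[str, int] = {}
    for line in ordered:
        key = dedupe_key(line["source_system"], line["entity_id"], line["source_reference"])
        if key in kept:
            duplicate_counts[key] = duplicate_counts.get(key, 0) + 1
        else:
            kept[key] = line
    return list(kept.values()), duplicate_counts
```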

B.3. Tolerance policy (must be explicit)

Define tolerances by layer:

  • TB match: abs(diff) <= max(absolute_tol, relative_tol * abs(tb_value))
  • Tag subset: should be exact except for FX rounding; same tolerance rule but smaller.
  • Disclosure controls: can allow slightly larger rounding if built from reporting-currency conversion.

Recommended defaults:

  • absolute_tol: 1.00 (currency units) per account per period
  • relative_tol: 0.0001 (1 bp)
  • fx_rounding_tol: 5.00 (units) per entity per period

All tolerance parameters must be stored in the reconciliation report.
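The tolerance rule `abs(diff) <= max(absolute_tol, relative_tol * abs(tb_value))` can be applied with the recommended defaults, as in the sketch below. It uses `Decimal` to avoid binary rounding noise. `TOLERANCE_POLICY`, `tolerance_for`, and `within_tolerance` are hypothetical names. `fx_rounding_tol` is carried in the policy only so it can be written into the report, and is not used by these two functions.

```python
from decimal import Decimal

# Recommended defaults from B.3; the whole dict is copied into the report.
TOLERANCE_POLICY = {
    "absolute_tol": Decimal("1.00"),
    "relative_tol": Decimal("0.0001"),  # 1 basis point
    "fx_rounding_tol": Decimal("5.00"),
}


def tolerance_for(reference: Decimal, policy: dict = TOLERANCE_POLICY) -> Decimal:
    """max(absolute_tol, relative_tol * abs(reference))."""
    return max(policy["absolute_tol"], policy["relative_tol"] * abs(reference))


def within_tolerance(diff: Decimal, reference: Decimal, policy: dict = TOLERANCE_POLICY) -> bool:
    return abs(diff) <= tolerance_for(reference, policy)
```

For an account with activity of 88,000,000 the relative term dominates, giving a tolerance of 8,800. For small accounts the absolute floor of 1.00 applies. A tighter policy dict can be passed for the tag-subset layer.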


B.4. Algorithm steps

Step 1 — Build reconciliation scope

Inputs:

  • entity_id
  • period = [date_from, date_to]
  • fiscal_year
  • scope_filters (if extraction is filtered, reconciliation must reflect it)

Compute:

  • gl_population = all canonical_gl_entry within period & entity
  • tb_population = all canonical_trial_balance for entity & fiscal_year (or period activity if available)

If TB only provides closing balances, we need either:

  • a period-activity TB, or
  • prior-period closing balances to derive activity.

If neither is available, TB reconciliation can only be partial (flag it).

Step 2 — Extraction integrity checks (pre-reconcile)

2.1 Completeness checks

  • Missing mandatory fields rate
  • Currency coverage
  • Posting dates inside window
  • Unmapped GL accounts ratio
  • Duplicate rate (dedupe_key collisions)

Fail-fast if:

  • mandatory field missing > threshold (e.g., 0.1%)
  • duplicates > threshold (e.g., 0.5%) unless expected for incremental loads
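The fail-fast gate could compute both rates from counts the pipeline already has, as in the sketch below. It assumes the mandatory fields are the non-nullable fields of the canonical GL entry. The duplicate rate is taken as duplicates removed divided by extracted records, which matches the Canonicalized event example (44 / 120034 ≈ 0.00037). The code `MISSING_FIELDS_HIGH` is hypothetical; `DUPLICATE_RATE_HIGH` comes from the flag taxonomy.

```python
MANDATORY_FIELDS = (
    "tenant_id", "entity_id", "source_system", "posting_date", "fiscal_year",
    "gl_account", "amount", "currency", "source_reference", "job_id",
)


def precheck(lines: list[dict], extracted_count: int, duplicates_removed: int,
             missing_threshold: float = 0.001, duplicate_threshold: float = 0.005,
             incremental_load: bool = False) -> dict:
    """Compute pre-reconcile rates and return the fail-fast reasons (empty = proceed)."""
    missing = sum(
        1 for line in lines if any(line.get(f) in (None, "") for f in MANDATORY_FIELDS)
    )
    missing_rate = missing / len(lines) if lines else 0.0
    duplicate_rate = duplicates_removed / extracted_count if extracted_count else 0.0
    failures = []
    if missing_rate > missing_threshold:
        failures.append("MISSING_FIELDS_HIGH")
    # Incremental loads may legitimately re-deliver lines, so the duplicate gate is skipped.
    if duplicate_rate > duplicate_threshold and not incremental_load:
        failures.append("DUPLICATE_RATE_HIGH")
    return {"missing_rate": missing_rate, "duplicate_rate": duplicate_rate, "fail_fast": failures}
```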

Step 3 — Primary reconciliation: GL activity ↔ TB activity (by account)

Preferred method: reconcile period activity.

  • Compute gl_sum_by_account = Σ amount_signed_native for each gl_account
  • Obtain tb_activity_by_account (from TB if provided)

Compare per gl_account:

  • diff = gl_sum_by_account - tb_activity_by_account
  • mark PASS/FAIL using tolerance policy

If TB only provides closing balances:

  • if prior-period closing balances are available, derive tb_activity = closing_balance_current - closing_balance_prior
  • otherwise, mark reconciliation as PARTIAL and set the TB_PARTIAL_ACTIVITY integrity flag.

Output per account:

  • account_status: PASS|FAIL|MISSING_TB|MISSING_GL
  • diff_native, diff_reporting
  • tolerance_used
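Steps 3 and the closing-balance fallback can be sketched together in Python, with these assumptions:

- Inputs are per-account `Decimal` sums.
- Accounts with zero derived TB activity are dropped, so that dormant accounts do not surface as MISSING_GL. The spec does not state this rule.
- The defaults repeat the B.3 recommendations.

```python
from decimal import Decimal

ZERO = Decimal("0")


def derive_tb_activity(closing_current: dict, closing_prior: dict) -> dict:
    """tb_activity = closing_balance_current - closing_balance_prior, per account."""
    activity = {}
    for acct in set(closing_current) | set(closing_prior):
        delta = closing_current.get(acct, ZERO) - closing_prior.get(acct, ZERO)
        if delta != 0:  # assumption: zero-activity accounts are omitted
            activity[acct] = delta
    return activity


def reconcile_accounts(gl_by_acct: dict, tb_by_acct: dict,
                       absolute_tol: Decimal = Decimal("1.00"),
                       relative_tol: Decimal = Decimal("0.0001")) -> list[dict]:
    """Per-account PASS | FAIL | MISSING_TB | MISSING_GL with diff and tolerance used."""
    results = []
    for acct in sorted(set(gl_by_acct) | set(tb_by_acct)):  # sorted: stable report order
        gl_amt = gl_by_acct.get(acct, ZERO)
        tb_amt = tb_by_acct.get(acct, ZERO)
        diff = gl_amt - tb_amt
        tol = max(absolute_tol, relative_tol * abs(tb_amt))
        if acct not in tb_by_acct:
            status = "MISSING_TB"
        elif acct not in gl_by_acct:
            status = "MISSING_GL"
        else:
            status = "PASS" if abs(diff) <= tol else "FAIL"
        results.append({"gl_account": acct, "account_status": status,
                        "diff_native": diff, "tolerance_used": tol})
    return results
```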

Step 4 — Secondary reconciliation: totals (entity-level)

Compute:

  • gl_total = Σ all accounts
  • tb_total = Σ tb_activity
  • validate that entity-level totals match within tolerance

This catches “offsetting errors” that pass at account-level due to missing accounts.


Step 5 — Tagged subset construction (DPTP)

Define is_tagged_transition:

True if any:

  • project_code matches *-DECARB-*
  • OR gl_attribute in {DECARB_CAPEX, DECARB_OPEX}
  • OR cost_center == ESG-TRANSITION (tenant ruleset may add more patterns)

Partition:

  • tagged_population = subset(gl_population where is_tagged_transition)
  • untagged_population = remainder

Also classify each tagged line:

  • transition_type = CAPEX|OPEX|UNKNOWN using:
    • explicit gl_attribute if present
    • else gl_account membership in configured groups
    • else heuristics → UNKNOWN (do not silently guess)
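The classification order above can be sketched as follows:

1. An explicit `gl_attribute` (DECARB_CAPEX / DECARB_OPEX) decides first.
2. Otherwise, membership in exactly one configured account group decides.
3. Anything else becomes UNKNOWN.

Tenant-specific heuristics are not modeled. An account listed in both groups is treated as ambiguous, which is an assumption.

```python
def classify_transition_type(line: dict, capex_gl_accounts: set, opex_gl_accounts: set) -> str:
    """Return CAPEX | OPEX | UNKNOWN following the Step 5 order."""
    # 1. An explicit GL attribute always wins.
    attribute = line.get("gl_attribute")
    if attribute == "DECARB_CAPEX":
        return "CAPEX"
    if attribute == "DECARB_OPEX":
        return "OPEX"
    # 2. Membership in exactly one configured account group.
    account = line.get("gl_account")
    in_capex = account in capex_gl_accounts
    in_opex = account in opex_gl_accounts
    if in_capex and not in_opex:
        return "CAPEX"
    if in_opex and not in_capex:
        return "OPEX"
    # 3. Unmapped or ambiguous: report UNKNOWN rather than guess.
    return "UNKNOWN"
```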

Step 6 — Tagged subset integrity checks

6.1 Subset consistency

  • every tagged line must exist in GL population (by dedupe_key): trivial if derived
  • no duplicates within tagged subset
  • mandatory dimension presence rate:
    • tagged lines must have project_code OR gl_attribute (configurable)
  • any UNKNOWN transition_type above threshold triggers classification_flag

6.2 Signed amount sanity checks

  • detect sign inversions:
    • if the account is in the CapEx group and tagged CapEx is unexpectedly predominantly negative → flag SIGN_ANOMALY (do not auto-fix; report)

Step 7 — Tagged-to-control reconciliation (disclosure readiness)

This is where we reconcile transition totals to finance-approved control views.

Control 1: CapEx accounts control total

  • If tenant supplies capex_gl_accounts[]:
    • capex_total_from_gl = Σ GL where gl_account in capex_gl_accounts
    • capex_tagged_total = Σ tagged where transition_type=CAPEX
    • compute capex_tag_coverage = capex_tagged_total / capex_total_from_gl
    • This is not “must equal” (not all CapEx is decarb), but:
      • ensure capex_tagged_total <= capex_total_from_gl + tol
      • ensure capex_tagged_total is not implausibly high (policy threshold e.g., >90% triggers review)

Control 2: Project ledger / WBS control

  • If ERP provides project totals, reconcile:
    • per project_code: sum GL lines ↔ project system totals
    • detect projects with GL activity but missing in project dimension extract

Control 3: OpEx accounts control

  • Similar to CapEx:
    • ensure opex_tagged_total ≤ opex_control_total (if control total defined)

All controls are policy-based and produce:

  • PASS|WARN|FAIL statuses (not just pass/fail)
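Controls 1 and 3 can share one check, sketched below in Python. It makes the following assumptions:

- FAIL maps to CONTROL_EXCEEDS_PARENT_TOTAL.
- The 90% review threshold maps to WARN.
- Coverage is reported as None when the control total is zero.
- `control_check` is a hypothetical name.

```python
from decimal import Decimal
from typing import Optional


def control_check(tagged_total: Decimal, control_total: Decimal, tol: Decimal = Decimal("1.00"),
                  review_ratio: Decimal = Decimal("0.90")) -> dict:
    """Compare a tagged CapEx/OpEx total with its finance control total.

    Equality is not expected: not all CapEx/OpEx is transition-related.
    """
    coverage: Optional[Decimal] = tagged_total / control_total if control_total else None
    if tagged_total > control_total + tol:
        # A tagged subset can never legitimately exceed its parent population.
        status, flag = "FAIL", "CONTROL_EXCEEDS_PARENT_TOTAL"
    elif coverage is not None and coverage > review_ratio:
        status, flag = "WARN", None  # implausibly high share: route to review
    else:
        status, flag = "PASS", None
    return {"status": status, "coverage": coverage, "flag": flag}
```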

Step 8 — Output artifacts

Produce a reconciliation_report with:

  • scope: tenant/entity/period/job_id
  • method: TB activity or derived
  • tolerance_policy
  • dedupe_stats
  • account_level_results[]
  • entity_level_totals
  • tagged_subset_stats:
    • counts, totals by CAPEX/OPEX/UNKNOWN
    • tag source breakdown (project_code vs gl_attribute vs cost_center)
  • control_checks[] (CapEx coverage, OpEx coverage, outliers)
  • integrity_flags[] (hard failures)
  • review_flags[] (warnings)

Integrity flag taxonomy (suggested)

  • TB_MISSING
  • TB_PARTIAL_ACTIVITY
  • ACCOUNT_FAIL
  • ENTITY_TOTAL_FAIL
  • DUPLICATE_RATE_HIGH
  • FX_CONVERSION_GAP
  • TAG_CLASSIFICATION_UNKNOWN_HIGH
  • SIGN_ANOMALY
  • CONTROL_EXCEEDS_PARENT_TOTAL

B.5. Deterministic decision rules

Publish rules (recommended)

  • If any integrity_flags in {ENTITY_TOTAL_FAIL, ACCOUNT_FAIL above threshold}:
    • publish outputs but set:
      • financial_integrity_flag = true
      • reconciliation_status = failed
  • If only warnings:
    • reconciliation_status = passed_with_warnings

This keeps outputs available for operational continuity while making every failure explicit to downstream consumers.
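The publish rules reduce to a small decision function, sketched below. The first branch follows B.5 directly. Two branches are assumptions that the recommended rules do not state:

- TB_PARTIAL_ACTIVITY maps to the "partial" status seen in the Reconciled event.
- Any other flag is treated as a warning unless tenant policy escalates it.

```python
def decide_publication(integrity_flags: set, review_flags: set,
                       account_fail_count: int, account_fail_threshold: int) -> dict:
    """Apply the B.5 publish rules; outputs are always published."""
    hard_failure = "ENTITY_TOTAL_FAIL" in integrity_flags or (
        "ACCOUNT_FAIL" in integrity_flags and account_fail_count > account_fail_threshold
    )
    if hard_failure:
        return {"reconciliation_status": "failed", "financial_integrity_flag": True}
    if "TB_PARTIAL_ACTIVITY" in integrity_flags:
        # Assumption: partial TB reconciliation maps to the "partial" status.
        return {"reconciliation_status": "partial", "financial_integrity_flag": True}
    if integrity_flags or review_flags:
        # Assumption: remaining flags are warnings unless policy escalates them.
        return {"reconciliation_status": "passed_with_warnings", "financial_integrity_flag": False}
    return {"reconciliation_status": "passed", "financial_integrity_flag": False}
```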


B.6. Pseudocode (high-level)

RECONCILE(job_id, entity_id, date_from, date_to):

    scope = {job_id, entity_id, date_from, date_to}

    gl = load_gl(job_id, entity_id, date_from, date_to)
    gl = dedupe(gl)
    assert mandatory_fields_ok(gl)

    tb = load_tb(job_id, entity_id, fiscal_year(date_from))
    tb_mode = determine_tb_mode(tb)            # activity | derived | partial

    gl_by_acct = sum_by(gl, gl_account, amount_signed_native)
    tb_by_acct = get_tb_activity(tb, tb_mode, prior_tb_optional)

    acct_results = []
    for acct in union(keys(gl_by_acct), keys(tb_by_acct)):
        gl_amt = gl_by_acct.get(acct, 0)       # an account may be absent on either side
        tb_amt = tb_by_acct.get(acct, 0)
        diff = gl_amt - tb_amt
        if acct not in tb_by_acct:
            status = MISSING_TB
        else if acct not in gl_by_acct:
            status = MISSING_GL
        else:
            status = within_tolerance(diff, tb_amt) ? PASS : FAIL
        acct_results.append({acct, status, diff})

    entity_diff = sum(values(gl_by_acct)) - sum(values(tb_by_acct))
    entity_status = within_tolerance(entity_diff, sum(values(tb_by_acct))) ? PASS : FAIL

    tagged = filter(gl, is_tagged_transition)
    tagged = classify_capex_opex(tagged, ruleset)

    tag_stats = compute_tag_stats(tagged)
    tag_integrity = validate_tag_integrity(tagged)

    controls = []
    controls.append(check_parent_totals(tagged, gl, capex_accounts, opex_accounts))
    controls.append(check_project_totals(tagged, project_extract_optional))

    flags = derive_flags(acct_results, entity_status, tag_integrity, controls)

    return reconciliation_report(job_id, entity_id, scope, tb_mode, acct_results, entity_status, tag_stats, controls, flags)

B.7. What this gives ZAYAZ

  • Account-level and entity-level reconciliation
  • Immutable trail from TB ↔ GL ↔ tagged subset
  • Clear rules for “warnings vs failures”
  • Deterministic outputs for ZAR versioning
  • Compatibility with auditors (evidence pack is reproducible)

APPENDIX C - Event model (JobStarted, Canonicalized, Reconciled, Published)

C.1 Event principles

  • Immutable: append-only, never updated
  • Idempotent: events carry event_id + deterministic correlation_id
  • Traceable: every event includes tenant_id, job_id, meid, ruleset_ref, schema_ref
  • Consumable: SIS / Reports Hub / CH engines can subscribe without ERP knowledge

C.2 Core event envelope (all events)

core-event-envelopes.json
{
"event_id": "uuid",
"event_type": "string",
"event_time": "2026-02-16T12:34:56.000Z",
"tenant_id": "string",
"entity_id": "string",
"meid": "MEID-ACCT-CRAWLER",
"job_id": "uuid",
"correlation_id": "string",
"actor": "system|user",
"severity": "info|warn|error",
"ruleset_ref": "ZAR:ruleset:acct_crawler_dptp@<hash_or_rev>",
"schema_ref": "ZAR:schema:canonical_gl_entry@<hash_or_rev>",
"payload": {}
}
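An envelope builder might look like the sketch below. It uses a fresh random `event_id` per event and a deterministic `correlation_id` shared by every event of a job. Deriving the correlation ID as a UUIDv5 over tenant and job is an assumption; the spec only requires it to be deterministic. The `acct-crawler/...` name string and `make_event` are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_event(event_type: str, tenant_id: str, entity_id: str, job_id: str,
               ruleset_ref: str, schema_ref: str, payload: dict,
               severity: str = "info", actor: str = "system",
               now: Optional[datetime] = None) -> dict:
    """Build a C.2 envelope with a fresh event_id and a deterministic correlation_id."""
    now = now or datetime.now(timezone.utc)
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        # Millisecond UTC timestamp with a trailing "Z", as in the envelope example.
        "event_time": now.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "tenant_id": tenant_id,
        "entity_id": entity_id,
        "meid": "MEID-ACCT-CRAWLER",
        "job_id": job_id,
        # uuid5 over tenant and job: every event of one job shares this value.
        "correlation_id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"acct-crawler/{tenant_id}/{job_id}")),
        "actor": actor,
        "severity": severity,
        "ruleset_ref": ruleset_ref,
        "schema_ref": schema_ref,
        "payload": payload,
    }
```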

C.3 Job state events (minimum set)

AcctCrawler.JobStarted

Emitted once per job execution window.

acct-crawler-job-started.json
{
"event_type": "AcctCrawler.JobStarted",
"payload": {
"window": { "date_from": "2026-01-01", "date_to": "2026-01-31" },
"source_system": "SAP_S4",
"mode": "incremental|backfill",
"requested_by": "scheduler|user"
}
}

AcctCrawler.Extracted

After connector pulls raw data (counts + hashes).

acct-crawler-extracted.json
{
"event_type": "AcctCrawler.Extracted",
"payload": {
"extract_artifacts": [
{ "type": "gl_lines", "record_count": 120034, "sha256": "..." },
{ "type": "trial_balance", "record_count": 520, "sha256": "..." },
{ "type": "projects", "record_count": 3400, "sha256": "..." }
],
"connector_id": "conn.sap_s4.odata",
"connector_version": "1.3.2"
}
}

AcctCrawler.Canonicalized

After normalization into canonical schema.

acct-crawler-canonicalized.json
{
"event_type": "AcctCrawler.Canonicalized",
"payload": {
"canonical_artifacts": [
{ "type": "canonical_gl_entry", "record_count": 119990, "sha256": "..." },
{ "type": "canonical_trial_balance", "record_count": 520, "sha256": "..." }
],
"dedupe": { "duplicates_removed": 44, "duplicate_rate": 0.00037 },
"fx": { "reporting_currency": "EUR", "fx_rate_set_ref": "ZAR:fxset:ECB@..." }
}
}

AcctCrawler.Classified

After tagging + CAPEX/OPEX classification.

acct-crawler-classified.json
{
"event_type": "AcctCrawler.Classified",
"payload": {
"tag_stats": {
"tagged_total": 8450,
"capex_lines": 2100,
"opex_lines": 6100,
"unknown_lines": 250,
"tag_sources": { "project_code": 7600, "gl_attribute": 700, "cost_center": 150 }
},
"transition_aggregates": [
{ "type": "capex_total", "amount_reporting": 48250000 },
{ "type": "opex_total", "amount_reporting": 9100000 }
]
}
}

AcctCrawler.Reconciled

After reconciliation report is computed.

acct-crawler-reconciled.json
{
"event_type": "AcctCrawler.Reconciled",
"payload": {
"reconciliation_status": "passed|passed_with_warnings|failed|partial",
"integrity_flags": ["TB_PARTIAL_ACTIVITY"],
"account_fail_count": 2,
"entity_total_diff_reporting": 3.21,
"report_ref": "ZAR:artifact:reconciliation_report@..."
}
}

AcctCrawler.Published

When outputs are made available to downstream systems.

acct-crawler-published.json
{
"event_type": "AcctCrawler.Published",
"payload": {
"published_artifacts": [
{ "type": "transition_line_items", "ref": "ZAR:artifact:transition_line_items@..." },
{ "type": "transition_project_aggregates", "ref": "ZAR:artifact:transition_project_aggregates@..." },
{ "type": "reconciliation_report", "ref": "ZAR:artifact:reconciliation_report@..." }
],
"availability": { "api": true, "report_hub": true, "sis_indexed": true }
}
}

AcctCrawler.JobFailed (terminal)

acct-crawler-job-failed.json
{
"event_type": "AcctCrawler.JobFailed",
"payload": {
"failed_stage": "extract|canonicalize|classify|reconcile|publish",
"error_type": "AUTH|PERMISSION|RATE_LIMIT|DATA_SCHEMA|NETWORK",
"retryable": true,
"error_ref": "ZAR:artifact:error_log@..."
}
}

That’s the full job lifecycle. Downstream systems can subscribe to Published only, or use intermediate events for observability.
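A subscriber that relies on intermediate events might validate ordering per job, as in the sketch below. It assumes:

- every stage emits exactly one event, in the order listed above;
- JobFailed may only appear as the last event, after JobStarted and never after Published.

Retries that re-emit a stage event would need a looser rule.

```python
LIFECYCLE = (
    "AcctCrawler.JobStarted",
    "AcctCrawler.Extracted",
    "AcctCrawler.Canonicalized",
    "AcctCrawler.Classified",
    "AcctCrawler.Reconciled",
    "AcctCrawler.Published",
)
JOB_FAILED = "AcctCrawler.JobFailed"


def is_valid_lifecycle(event_types: list[str]) -> bool:
    """True if one job's events are a prefix of LIFECYCLE, optionally ending in JobFailed."""
    events = list(event_types)
    if events and events[-1] == JOB_FAILED:
        events = events[:-1]
        # A failure needs a started job and cannot follow publication.
        if not events or events[-1] == LIFECYCLE[-1]:
            return False
    return bool(events) and tuple(events) == LIFECYCLE[: len(events)]
```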


APPENDIX D - Ruleset version governance inside ZAR

This appendix defines the default global rulesets for MEID_ACCT_CRAWLER.

MEID is stable. Behavioral changes occur only via ZAR-managed rulesets.

All rulesets are:

  • content-addressed (sha256 hash in ZAR)
  • explicitly referenced in JobStarted + all downstream artifacts
  • replay-safe
  • compatible-version validated before activation

D.1. Ruleset version governance inside ZAR (no MEID versions)

D.1.1. Principle


D.1. ZAR Ruleset Pack (v0.1 Default)

D.1.1. MEID-ACCT-CRAWLER (Tagged Accounting Crawler)

  1. Tag Detection Ruleset
    • project code regex patterns (e.g., .*-DECARB-.*)
    • accepted cost centers, departments, custom segments
    • accepted GL attributes/tags
    • precedence rules (attribute > project > cost center)
    • “tag confidence” scoring (optional)
  2. Classification Ruleset (CapEx/OpEx + buckets)
    • capex/opex mapping logic (explicit tag, then account group, then cost center)
    • account group mappings (capex_accounts, opex_accounts, plus “transition-eligible” accounts)
    • unknown thresholds and behavior:
      • keep as UNKNOWN
      • block publish?
      • publish with warning?
    • optional: cost category mapping (equipment/consulting/etc.)
  3. Reconciliation Policy Ruleset
    • tolerance policy (absolute/relative)
    • TB reconciliation mode policy:
      • activity from TB if available
      • derived from prior closing
      • partial allowed?
    • pass/warn/fail thresholds (per-account fail count, entity-level diff)
    • duplicate-rate thresholds, missing-field thresholds
    • currency policy: native-only vs also reporting-currency checks
  4. Extraction Scope Policy Ruleset (recommended as a separate ruleset; this avoids hardcoding scope behavior into the orchestrator)
    • default extraction window policy (monthly/weekly)
    • incremental keys (lastModifiedDate, postingDate)
    • backfill limits
    • “filters allowed” policy (e.g., allow filtering by project_code_like)
  5. FX & Reporting Currency Policy Ruleset (optional, but very useful)
    • reporting currency per tenant/entity
    • FX source preference (ECB, ERP, custom)
    • rounding policy
    • acceptable FX age / missing FX behavior

D.1.2. MEID_CALC_TRANSITION_ROI (Transition ROI Calculator)

This engine needs its own ZAR rulesets because its behavior varies significantly with each client's finance policy.

  1. ROI Calculation Policy Ruleset
    • PV timing assumption when timeline missing:
      • treat totals at t=0, or spread across year
    • PV method allowed: monthly vs annual
    • discounting behavior: per project override allowed?
    • rounding rules
    • which outputs are computed (simple ratios always, PV ratios when possible)
  2. Rollup Policy Ruleset
    • allowed group_by keys
    • weighting policy:
      • by tCO2e (recommended)
      • by cost
      • unweighted
    • rollup inclusion rules:
      • exclude projects with R=0 from ratio rollups?
      • include but flag?
    • portfolio distribution metrics set (median/p25/p75, outlier thresholds)
  3. Validation & Flagging Policy Ruleset
    • reject vs warn for:
      • costs unreconciled
      • confidence low
      • currency mismatch
      • missing annual reductions
    • standardized flags mapping (so UI & reports are consistent)
  4. Abatement Cost Policy Ruleset (when we compute “net cost / tCO2e” with benefits)
    • what counts as “benefit” (energy savings, avoided carbon tax, subsidies)
    • whether to allow carbon price monetization in v1
    • required horizons for abatement cost vs simple €/tCO2e
  5. Signal Policy Ruleset (only if you keep supported_modes: signal)
    • what constitutes outliers
    • ranking criteria
    • minimum sample sizes for rollup scoring

D.1.3. Shared / Platform rulesets (cross-engine)

These are not engine-specific but should be ZAR-managed for consistency.

  1. Metric Type Registry / Alias Map (EngineAliasMap)
    • maps engine outputs to platform-wide signal IDs, labels, units
  2. Unit + Currency Normalization Policy
    • rounding
    • precision
    • currency formatting
    • unit canonicalization (tCO2e vs kgCO2e)
  3. Confidence Model Policy
    • how “low/medium/high” maps to numeric weights / warnings

Example identifiers:

  • ZAR:ruleset:acct_crawler_tag_detection@sha256:<hash>
  • ZAR:ruleset:acct_crawler_classification@sha256:<hash>
  • ZAR:ruleset:acct_crawler_reconciliation_policy@sha256:<hash>

D.1.4. Activation model (how a job picks a ruleset)

At job start, the orchestrator resolves the following, in increasing order of priority:

  1. global_default_ruleset_ref (engine default)
  2. tenant_ruleset_override_ref (if exists and approved)
  3. entity_ruleset_override_ref (optional, highest priority)

Then writes the resolved ruleset_ref into:

  • JobStarted event
  • all subsequent events
  • all output artifacts metadata
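The resolution order can be sketched as a simple fall-through: entity override first, then tenant override, then the global default. The source requires approval only for the tenant override; applying the same approval requirement to the entity override here is an assumption. `approved_refs` stands in for whatever approval lookup ZAR provides and is hypothetical.

```python
from typing import Optional


def resolve_ruleset_ref(global_default_ref: str,
                        tenant_override_ref: Optional[str] = None,
                        entity_override_ref: Optional[str] = None,
                        approved_refs: frozenset = frozenset()) -> str:
    """Entity override > tenant override > global default."""
    for candidate in (entity_override_ref, tenant_override_ref):
        # Unapproved overrides are skipped, falling through to the next level.
        if candidate is not None and candidate in approved_refs:
            return candidate
    return global_default_ref
```

The returned ref is what gets written into JobStarted, every later event, and all artifact metadata. Replaying a job then only needs that one value.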

D.1.5. Change control

Ruleset changes must be:

  • proposed (draft)
  • validated (unit tests + replay tests on sample extracts)
  • approved (owner + governance role)
  • activated (set as tenant default)

Replay safety requirement

  • When rules change, we can replay an old job using the old ruleset_ref and reproduce the same outputs.

D.1.6. Compatibility contract

Every ruleset must declare:

  • applies_to_meid: MEID_ACCT_CRAWLER
  • the supported min_schema_ref and max_schema_ref

This lets ZAR prevent activation of incompatible rulesets.

D.1.7. Canonical Rule Identifier (CRID) Architecture

Every executable rule in ZAYAZ must possess a Canonical Rule Identifier (CRID).

The CRID is the governance identity of a rule. It is immutable. It is independent from content hash. It is versioned semantically.

CRID Format

ruleset.<rule_type>......<X_Y_Z>

Example

ruleset.validation.finance.global.reconciliation.standard.critical.1_0_0
ruleset.compute.finance.global.transition-roi.standard.blocking.1_0_0
ruleset.classification.finance.global.capex-opex.standard.warning.1_0_0

CRID Principles

  1. A CRID uniquely identifies the logical intent of a rule.
  2. A CRID version change reflects semantic behavior change.
  3. A CRID does NOT contain execution hash.
  4. A CRID may map to multiple historical hashes (lineage).
  5. CRID version increments follow CMCB governance (PATCH / MINOR / MAJOR).
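The CRID format line above has lost its intermediate placeholder names, but every example has the same shape: `ruleset`, the rule type, five further dot-separated segments, and an `X_Y_Z` version. The sketch below parses that shape and derives a version-free logical identifier, in line with principles 1 and 2 (the intent is stable; the version marks behavior changes). It deliberately leaves the five middle segments unnamed, since their names cannot be recovered from the text; `Crid` and `parse_crid` are hypothetical.

```python
from typing import NamedTuple


class Crid(NamedTuple):
    rule_type: str
    segments: tuple[str, ...]          # the five segments between rule_type and version
    version: tuple[int, int, int]      # (MAJOR, MINOR, PATCH)

    @property
    def logical_id(self) -> str:
        """The CRID without its version: the rule's intent across versions."""
        return ".".join(("ruleset", self.rule_type, *self.segments))


def parse_crid(crid: str) -> Crid:
    """Parse ruleset.<rule_type>.<5 segments>.<X_Y_Z> as seen in the examples."""
    parts = crid.split(".")
    if len(parts) != 8 or parts[0] != "ruleset":
        raise ValueError(f"malformed CRID: {crid!r}")
    major, minor, patch = (int(p) for p in parts[7].split("_"))
    return Crid(parts[1], tuple(parts[2:7]), (major, minor, patch))
```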

CRID vs ZAR Hash

| Layer | Purpose |
|---|---|
| CRID | Governance identity |
| ZAR hash | Immutable execution identity |

Execution logs must record both.
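A minimal sketch of an execution log entry carrying both identities follows. It assumes the ZAR hash is SHA-256 over the exact YAML bytes of the artifact (consistent with content addressing, though the hash input is not specified here); `RuleExecutionLog`, `zar_ref_for`, and the `job_id` field are hypothetical.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RuleExecutionLog:
    crid: str      # governance identity: stable for a given rule version
    zar_ref: str   # execution identity: changes whenever the artifact bytes change
    job_id: str    # hypothetical job identifier


def zar_ref_for(artifact_name: str, yaml_bytes: bytes) -> str:
    """Build a ZAR ruleset ref, assuming SHA-256 over the raw artifact bytes."""
    digest = hashlib.sha256(yaml_bytes).hexdigest()
    return f"ZAR:ruleset:{artifact_name}@sha256:{digest}"
```

Because the CRID and the hash answer different questions (which rule, and exactly which bytes), a log that records only one cannot support both governance queries and byte-exact replay.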


D.2. Ruleset Governance Model

D.2.1. Ruleset Storage in ZAR

Each ruleset is stored in ZAR as:

ZAR:ruleset:<ruleset_name>@sha256:<hash>

Every ruleset must declare:

  • applies_to_meid
  • ruleset_family
  • ruleset_kind
  • min_schema_ref
  • max_schema_ref
  • status (draft | approved | deprecated)
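Checking these declarations is mechanical. The sketch below reads them from an already-parsed YAML header laid out per the header standard in D.3.0 (identity fields under `zar`, schema bounds under `compatibility`, status under `lifecycle`); `declaration_errors` is a hypothetical helper.

```python
# Where each required declaration lives in the standard header (see D.3.0).
FIELD_LOCATIONS = {
    "applies_to_meid": "zar",
    "ruleset_family": "zar",
    "ruleset_kind": "zar",
    "min_schema_ref": "compatibility",
    "max_schema_ref": "compatibility",
    "status": "lifecycle",
}
ALLOWED_STATUS = {"draft", "approved", "deprecated"}


def declaration_errors(header: dict) -> list[str]:
    """Return a list of problems; an empty list means all declarations are present."""
    errors = []
    for field, section in FIELD_LOCATIONS.items():
        if not (header.get(section) or {}).get(field):
            errors.append(f"missing {section}.{field}")
    status = (header.get("lifecycle") or {}).get("status")
    if status and status not in ALLOWED_STATUS:
        errors.append(f"invalid lifecycle.status: {status!r}")
    return errors
```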

D.2.2. Rule Artifact Storage Model (ZAR Binding)

All executable rules in ZAYAZ are stored as ZAR ruleset artifacts.

A rule is not considered valid unless:

  1. It exists as a content-addressed YAML artifact.
  2. It is registered in ZRR with a valid CRID.
  3. It declares execution bindings (MEID + domain).
  4. It declares compatibility boundaries (schema refs).
  5. It is approved under CMCB governance.

The YAML file is the canonical executable representation of the rule.

ZRR is the governance registry. ZAR is the immutable artifact store. Execution engines must load only ZAR-resolved artifacts.

No rule may execute directly from source code or ad hoc configuration.
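The requirement that engines load only ZAR-resolved artifacts can be enforced by re-hashing the bytes before use. The sketch assumes SHA-256 over the raw artifact bytes and uses a plain dict as a stand-in for the ZAR store; `load_verified` and `UnverifiedArtifactError` are hypothetical names.

```python
import hashlib
import re

REF = re.compile(r"ZAR:ruleset:(?P<name>[a-z0-9_]+)@sha256:(?P<digest>[0-9a-f]{64})")


class UnverifiedArtifactError(Exception):
    """Raised when an artifact cannot be proven to match its ZAR reference."""


def load_verified(ref: str, store: dict[str, bytes]) -> bytes:
    """Return artifact bytes only if they exist in the store and hash to ref's digest."""
    match = REF.fullmatch(ref)
    if match is None:
        raise UnverifiedArtifactError(f"not a ZAR ruleset ref: {ref!r}")
    try:
        data = store[ref]
    except KeyError:
        raise UnverifiedArtifactError(f"ref not found in ZAR: {ref}") from None
    if hashlib.sha256(data).hexdigest() != match["digest"]:
        raise UnverifiedArtifactError(f"content hash mismatch for {ref}")
    return data
```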


D.3. Default Rulesets — Accounting Crawler

D.3.0. Ruleset YAML Header Standard

Every ruleset YAML file must begin with a standardized header block.

This header ensures:

  • Deterministic hashing
  • MEID compatibility control
  • CRID traceability
  • Schema governance enforcement
  • CI/CD validation automation

Required Top-Level Structure

top-level-ruleset-structure.yaml
```yaml
zar:
  artifact_type: ruleset
  artifact_name: <string>
  applies_to_meid: <MEID_...>
  ruleset_family: <string>
  ruleset_kind: <string>

zrr:
  crid: <Canonical Rule Identifier>
  rule_type: <Validation|Computation|...>
  domain: <GHG|FINANCE|GOV|...>
  severity: <INFO|WARNING|CRITICAL|BLOCKING>
  linked_signal_ids: []
  linked_frameworks: []
  execution_engine: <MEID_...>
  enforcement_mode: advisory|soft|hard|blocking
  fallback_logic: sem|manual_escalation|none
  ontology_binding: [] # array of USO node refs
  audit_required: true

lifecycle:
  status: draft|approved|deprecated
  owners: []
  approved_by: []
  created_by: system|governance|admin
  created_at: <ISO8601>
  supersedes: null
  deprecated_by: null
  changelog: <string>

compatibility:
  min_schema_ref: <ZAR:schema:...>
  max_schema_ref: <ZAR:schema:...>

rules:
  # engine-specific logic
```

Ruleset File Naming Convention (Deterministic)

All ruleset YAML files must follow the strict naming convention:

<artifact_name>-<version>.yaml

Where:

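  • artifact_name is the value declared in the YAML header, e.g. acct_crawler_tag_detection
  • the filename writes artifact_name with hyphens in place of underscores
  • version is the semantic version written as X_Y_Z (human-facing only)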

Example:

acct-crawler-tag-detection-1_0_0.yaml
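Because the naming rule is deterministic, the filename can be derived from the header. A sketch, assuming the version is supplied as `MAJOR.MINOR.PATCH`; `ruleset_filename` is hypothetical.

```python
import re

SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def ruleset_filename(artifact_name: str, version: str) -> str:
    """Build <artifact_name>-<version>.yaml: hyphenated name, X_Y_Z version."""
    match = SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"{artifact_name.replace('_', '-')}-{'_'.join(match.groups())}.yaml"
```

`ruleset_filename("acct_crawler_tag_detection", "1.0.0")` returns `"acct-crawler-tag-detection-1_0_0.yaml"`, matching the example above.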

D.3.1. Tag Detection Ruleset

Identifier pattern:

ZAR:ruleset:acct_crawler_tag_detection@sha256:<hash>
acct-crawler-tag-detection-1_0_0.yaml
```yaml
zar:
  artifact_type: ruleset
  artifact_name: acct_crawler_tag_detection
  applies_to_meid: MEID_ACCT_CRAWLER
  ruleset_family: acct_crawler
  ruleset_kind: tag_detection

zrr:
  crid: ruleset.tagging.finance.global.transition.decarb.WARNING.1_0_0
  rule_type: tagging
  domain: finance
  severity: WARNING
  linked_signal_ids: []
  linked_frameworks: ["GLOBAL"]
  execution_engine: MEID_ACCT_CRAWLER
  enforcement_mode: soft
  fallback_logic: none
  ontology_binding: []
  audit_required: true

lifecycle:
  status: draft
  owners: ["cto@viroway.com"]
  approved_by: []
  created_by: governance
  created_at: 2026-02-17T00:00:00.000Z
  supersedes: null
  deprecated_by: null
  changelog: "Initial default tag detection rules"

compatibility:
  min_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
  max_schema_ref: "ZAR:schema:canonical_gl_entry@v1"

rules:
  precedence:
    - gl_attribute
    - project_code
    - cost_center

  tag_sources:
    gl_attribute:
      field: gl_attribute
      accepted_values:
        - DECARB_CAPEX
        - DECARB_OPEX
        - DECARB

    project_code:
      field: project_code
      patterns:
        - ".*-DECARB-.*"
        - "^ZYZ-DECARB-.*"

    cost_center:
      field: cost_center
      accepted_values:
        - ESG-TRANSITION

  thresholds:
    min_tagged_fields_present: 1
```
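How an engine might apply this ruleset to a single canonical GL entry is sketched below. Assumptions: the `rules:` block is loaded as a dict, `patterns` are matched from the start of the value with `re.match`, `min_tagged_fields_present` counts how many tag sources match, and the winning source is the first match in `precedence` order; `detect_tag` is hypothetical.

```python
import re
from typing import Optional

# The `rules:` block of acct_crawler_tag_detection 1_0_0, as a dict.
RULES = {
    "precedence": ["gl_attribute", "project_code", "cost_center"],
    "tag_sources": {
        "gl_attribute": {"field": "gl_attribute",
                         "accepted_values": ["DECARB_CAPEX", "DECARB_OPEX", "DECARB"]},
        "project_code": {"field": "project_code",
                         "patterns": [".*-DECARB-.*", "^ZYZ-DECARB-.*"]},
        "cost_center": {"field": "cost_center", "accepted_values": ["ESG-TRANSITION"]},
    },
    "thresholds": {"min_tagged_fields_present": 1},
}


def _source_matches(entry: dict, source: dict) -> bool:
    value = entry.get(source["field"])
    if not value:
        return False
    if value in source.get("accepted_values", []):
        return True
    # Assumption: patterns are anchored at the start of the value (re.match).
    return any(re.match(p, value) for p in source.get("patterns", []))


def detect_tag(entry: dict, rules: dict = RULES) -> tuple[bool, Optional[str]]:
    """Return (is_tagged, winning_source) for one canonical GL entry."""
    matched = [name for name in rules["precedence"]
               if _source_matches(entry, rules["tag_sources"][name])]
    is_tagged = len(matched) >= rules["thresholds"]["min_tagged_fields_present"]
    return is_tagged, (matched[0] if is_tagged and matched else None)
```

With these defaults, an entry whose project code is `P100-DECARB-01` is tagged via `project_code`. Note that any value matching `^ZYZ-DECARB-.*` also matches `.*-DECARB-.*`.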

D.3.2. Classification Ruleset

Identifier:

ZAR:ruleset:acct_crawler_classification@sha256:<hash>
acct_crawler_classification-1_0_0.yaml
```yaml
zar:
  artifact_type: ruleset
  artifact_name: acct_crawler_classification
  applies_to_meid: MEID_ACCT_CRAWLER
  ruleset_family: acct_crawler
  ruleset_kind: classification

zrr:
  crid: ruleset.classification.finance.global.capex-opex.standard.WARNING.1_0_0
  rule_type: classification
  domain: finance
  severity: WARNING
  linked_signal_ids: []
  linked_frameworks: ["GLOBAL"]
  execution_engine: MEID_ACCT_CRAWLER
  enforcement_mode: hard
  fallback_logic: manual_escalation
  ontology_binding: []
  audit_required: true

lifecycle:
  status: draft
  owners: ["cto@viroway.com"]
  approved_by: []
  created_by: governance
  created_at: 2026-02-17T00:00:00.000Z
  supersedes: null
  deprecated_by: null
  changelog: "Default capex/opex classification logic"

compatibility:
  min_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
  max_schema_ref: "ZAR:schema:canonical_gl_entry@v1"

rules:
  transition_type_precedence:
    - gl_attribute
    - gl_account_group
    - cost_center

  gl_attribute_map:
    DECARB_CAPEX: CAPEX
    DECARB_OPEX: OPEX

  gl_account_groups:
    capex_accounts: []
    opex_accounts: []

  fallback:
    unknown_transition_type: UNKNOWN

  thresholds:
    max_unknown_ratio: 0.02
    max_unknown_count: 250
```
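A sketch of the classification precedence and the unknown-rate gate follows. Assumptions: the GL account is read from a hypothetical `gl_account` field, breaching either threshold triggers manual escalation, and the `cost_center` step falls through because the default ruleset defines no cost-center mapping; `classify` and `needs_escalation` are hypothetical.

```python
# Values from acct_crawler_classification 1_0_0; account groups are empty by default.
GL_ATTRIBUTE_MAP = {"DECARB_CAPEX": "CAPEX", "DECARB_OPEX": "OPEX"}
CAPEX_ACCOUNTS: set[str] = set()
OPEX_ACCOUNTS: set[str] = set()
UNKNOWN = "UNKNOWN"
MAX_UNKNOWN_RATIO = 0.02
MAX_UNKNOWN_COUNT = 250


def classify(entry: dict) -> str:
    """Apply transition_type_precedence; return UNKNOWN if nothing resolves."""
    attr = entry.get("gl_attribute")                  # 1. gl_attribute
    if attr in GL_ATTRIBUTE_MAP:
        return GL_ATTRIBUTE_MAP[attr]
    account = entry.get("gl_account")                 # 2. gl_account_group (hypothetical field)
    if account in CAPEX_ACCOUNTS:
        return "CAPEX"
    if account in OPEX_ACCOUNTS:
        return "OPEX"
    return UNKNOWN                                    # 3. cost_center: no default mapping


def needs_escalation(types: list[str]) -> bool:
    """Assumption: exceeding either unknown threshold triggers manual escalation."""
    unknown = types.count(UNKNOWN)
    ratio = unknown / len(types) if types else 0.0
    return unknown > MAX_UNKNOWN_COUNT or ratio > MAX_UNKNOWN_RATIO
```

Under the default ruleset, an entry carrying the bare `DECARB` attribute (accepted by tag detection) classifies as `UNKNOWN` until the account groups are populated.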

D.3.3. Reconciliation Policy Ruleset

Identifier:

ZAR:ruleset:acct_crawler_reconciliation_policy@sha256:<hash>
acct_crawler_reconciliation_policy-1_0_0.yaml
```yaml
zar:
  artifact_type: ruleset
  artifact_name: acct_crawler_reconciliation_policy
  applies_to_meid: MEID_ACCT_CRAWLER
  ruleset_family: acct_crawler
  ruleset_kind: reconciliation_policy

zrr:
  crid: ruleset.validation.finance.global.reconciliation.standard.WARNING.1_0_0
  rule_type: validation
  domain: finance
  severity: WARNING
  linked_signal_ids: []
  linked_frameworks: ["GLOBAL"]
  execution_engine: MEID_ACCT_CRAWLER
  enforcement_mode: blocking
  fallback_logic: manual_escalation
  ontology_binding: []
  audit_required: true

lifecycle:
  status: draft
  owners: ["cto@viroway.com"]
  approved_by: []
  created_by: governance
  created_at: 2026-02-17T00:00:00.000Z
  supersedes: null
  deprecated_by: null
  changelog: "Default reconciliation tolerances"

compatibility:
  min_schema_ref: "ZAR:schema:canonical_gl_entry@v1"
  max_schema_ref: "ZAR:schema:canonical_gl_entry@v1"

rules:
  tolerances:
    absolute_tol: 1.00
    relative_tol: 0.0001
    fx_rounding_tol: 5.00

  tb_mode_policy:
    preferred: activity_if_available
    allow_derived_activity: true
    allow_partial: true

  thresholds:
    max_account_fail_count: 0
    max_entity_total_diff_abs: 5.00
    max_duplicate_rate: 0.005
    max_missing_mandatory_field_rate: 0.001

  decisions:
    on_entity_fail: publish_with_integrity_flag
    on_account_fail: publish_with_integrity_flag
```
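A sketch of how these tolerances could be applied follows. It assumes an account passes when the crawled and trial-balance totals satisfy Python's `math.isclose` with `rel_tol=relative_tol` and `abs_tol=absolute_tol`, that FX-converted accounts use `fx_rounding_tol` as their absolute tolerance instead, and that the entity check compares the absolute net difference across accounts. `AccountCheck`, `account_ok`, `entity_decision`, and the `publish` outcome are hypothetical.

```python
import math
from dataclasses import dataclass

# Values from acct_crawler_reconciliation_policy 1_0_0.
ABSOLUTE_TOL = 1.00
RELATIVE_TOL = 0.0001
FX_ROUNDING_TOL = 5.00
MAX_ACCOUNT_FAIL_COUNT = 0
MAX_ENTITY_TOTAL_DIFF_ABS = 5.00


@dataclass
class AccountCheck:
    account: str
    crawled_total: float      # sum of canonical GL entries for the account
    tb_total: float           # trial balance figure for the same account
    fx_converted: bool = False


def account_ok(check: AccountCheck) -> bool:
    # Assumption: FX-converted accounts use fx_rounding_tol as the absolute tolerance.
    abs_tol = FX_ROUNDING_TOL if check.fx_converted else ABSOLUTE_TOL
    return math.isclose(check.crawled_total, check.tb_total,
                        rel_tol=RELATIVE_TOL, abs_tol=abs_tol)


def entity_decision(checks: list[AccountCheck]) -> str:
    """Apply the account and entity thresholds and return the decision."""
    failed = [c.account for c in checks if not account_ok(c)]
    total_diff = abs(sum(c.crawled_total - c.tb_total for c in checks))
    if len(failed) > MAX_ACCOUNT_FAIL_COUNT or total_diff > MAX_ENTITY_TOTAL_DIFF_ABS:
        return "publish_with_integrity_flag"   # decisions.on_account_fail / on_entity_fail
    return "publish"
```

`math.isclose(a, b, rel_tol=r, abs_tol=t)` passes when |a - b| <= max(r * max(|a|, |b|), t), so for small balances the absolute tolerance dominates and for large ones the relative tolerance does. Production code would likely use `Decimal` amounts rather than floats.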

D.4. Ruleset Activation Model

At job start the orchestrator resolves:

  1. global_default_ruleset_ref
  2. tenant_ruleset_override_ref (if approved)
  3. entity_ruleset_override_ref (highest priority)

Resolved ruleset_ref must be written into:

  • JobStarted event
  • Canonicalized event
  • Reconciled event
  • all output artifacts (transition_project_cost_profile)
  • integrity reports

D.5. Change Control & Replay Safety

Ruleset lifecycle:

  1. Draft
  2. Validation (unit + replay test suite)
  3. Governance approval
  4. Activation
  5. Deprecated (optional)
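The lifecycle above behaves like a small state machine. The sketch below assumes stages advance strictly in order and that an approved or active ruleset may be deprecated; the `validated` and `active` stage names are hypothetical tracking states, since the YAML `status` field itself only distinguishes draft, approved, and deprecated. `Stage` and `advance` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    VALIDATED = "validated"    # hypothetical: unit + replay suite passed
    APPROVED = "approved"
    ACTIVE = "active"          # hypothetical: set as tenant default
    DEPRECATED = "deprecated"


# Assumption: strictly ordered progression; deprecation allowed once approved.
ALLOWED = {
    Stage.DRAFT: {Stage.VALIDATED},
    Stage.VALIDATED: {Stage.APPROVED},
    Stage.APPROVED: {Stage.ACTIVE, Stage.DEPRECATED},
    Stage.ACTIVE: {Stage.DEPRECATED},
    Stage.DEPRECATED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to target if the transition is allowed, otherwise raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```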

Replay requirement:

  • Old jobs must remain reproducible by re-running with the original ruleset_ref.
  • MEIDs are never versioned; engine behavior is controlled exclusively through rulesets.


