
EWC-CAT

EWC Hazard Classification

1. Identity


Depends on module:

Purpose
Classifies waste line items as hazardous or non-hazardous (and optionally assigns hazard class metadata) by resolving an EWC-based mapping dataset (e.g. ewc_all_categories) under ZAR version control.

This engine is the missing governance step that makes hazardous waste reporting auditable:

  • the classification rule is explicit,
  • dataset version is pinned,
  • and every item carries provenance.

Typical usage

  • Ingest waste contractor manifests/invoices containing EWC codes
  • Enrich line items with: hazardous, hazard_class, canonical ewc_item id, recommended unit
  • Feed clean items into MEID_CALC_WASTE_AGGR (which then becomes a pure aggregator)

2. Contract References (ZAR)

2.1 Input Schema

ZAR Address: schema.compute.waste.haz_classify.inputs.v1_0_0

Required conceptual fields:

  • items: list of waste line items to classify
  • mapping_ref: ZAR reference to the EWC mapping dataset (defaults allowed)
  • jurisdiction: optional (e.g., EU, NO, UK) for regional variations
  • alignment: BY_YEAR | BY_INDEX (default BY_YEAR)

Each item conceptually includes:

  • period (typically year)
  • value
  • unit (e.g., kg, tonne)
  • one of:
    • ewc_code (preferred, e.g. "01 03 04*" or "010304*")
    • ewc_item (internal key, e.g. "04")
  • optional:
    • site_id
    • supplier_id / contractor_id
    • invoice_id / manifest_id
    • description (free text)
    • regional_code (if different from jurisdiction)

2.2 Options Schema

ZAR Address: schema.compute.waste.haz_classify.options.v1_0_0

Common options:

  • match_mode: EWC_CODE | EWC_ITEM | AUTO (default AUTO)
  • normalize_ewc_code: boolean (default true)
  • asterisk_means_hazardous: boolean (default true)
    (EU convention: * indicates hazardous entry)
  • unknown_code_policy: ERROR | FLAG | ASSIGN_NON_HAZARDOUS | ASSIGN_UNKNOWN (default FLAG)
  • regional_variations_policy: APPLY_IF_PRESENT | IGNORE (default APPLY_IF_PRESENT)
  • unit_normalization: NO_CONVERT | CONVERT_TO_RECOMMENDED (default NO_CONVERT)
  • rounding: optional digits

Unit conversion is delegated, per the architecture rule. This engine requests conversion only when unit_normalization = CONVERT_TO_RECOMMENDED.
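The option defaults listed above could be captured in a small options object. The following is a minimal sketch, not the schema stored at the ZAR address: the class name HazClassifyOptions and the helper needs_unit_conversion are hypothetical, and the field names simply mirror the option names in this section.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical container; field names mirror the options listed in 2.2.
@dataclass(frozen=True)
class HazClassifyOptions:
    match_mode: str = "AUTO"                       # EWC_CODE | EWC_ITEM | AUTO
    normalize_ewc_code: bool = True
    asterisk_means_hazardous: bool = True          # EU convention for "*"
    unknown_code_policy: str = "FLAG"              # ERROR | FLAG | ASSIGN_NON_HAZARDOUS | ASSIGN_UNKNOWN
    regional_variations_policy: str = "APPLY_IF_PRESENT"
    unit_normalization: str = "NO_CONVERT"         # NO_CONVERT | CONVERT_TO_RECOMMENDED
    rounding: Optional[int] = None                 # optional number of digits

    def needs_unit_conversion(self) -> bool:
        # Conversion is delegated; the engine only requests it in this mode.
        return self.unit_normalization == "CONVERT_TO_RECOMMENDED"
```

Freezing the dataclass keeps the options immutable for the duration of a run, which fits the audit goal of recording exactly which options were used.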


2.3 Output Schema

ZAR Address: schema.compute.waste.haz_classify.output.v1_0_0

Outputs include:

  • items_classified: same items enriched with classification fields
  • summary: counts by classification outcome
  • metadata: mapping version/hash, unknown handling stats, jurisdiction used

Enriched fields per item:

  • hazardous: boolean or enum
  • hazard_class: HAZARDOUS | NON_HAZARDOUS | UNKNOWN
  • ewc_item (canonical internal key)
  • ewc_code_normalized
  • recommended_unit
  • reuse_recyclable_flag
  • classification_confidence (v1: 1.0 for exact match, lower for heuristic)
  • classification_provenance (mapping_ref + row id + rule path)

3. Mapping Dataset Contract (ewc_all_categories)

The mapping dataset MUST provide at least these columns:

  • ewc_item (canonical id)
  • ewc_code (string, may include *)
  • hazard_class (HAZARDOUS/NON_HAZARDOUS)
  • optional:
    • hazardous (boolean or Yes/No)
    • regional_variations (json/text)
    • recommended_unit
    • reuse_recyclable_flag
    • description

The engine MUST treat the dataset as authoritative and versioned:

  • mapping_ref MUST resolve to an immutable ZAR artifact
  • the output MUST carry mapping_ref + build_hash (or dataset hash if stored separately)
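Where the dataset hash is stored separately, one way to derive it is a digest over a canonical rendering of the mapping rows. This is a sketch under assumptions: the function names dataset_hash and provenance_header are hypothetical, and hashing canonical JSON with SHA-256 is an illustrative choice, not a method this specification prescribes.

```python
import hashlib
import json


def dataset_hash(rows: list) -> str:
    """SHA-256 over a canonical JSON rendering of the mapping rows (assumed scheme)."""
    # sort_keys and fixed separators make the rendering independent of key order.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def provenance_header(mapping_ref: str, rows: list) -> dict:
    """Bundle the reference and hash so they can travel with the output."""
    return {"mapping_ref": mapping_ref, "dataset_hash": dataset_hash(rows)}
```

Row order still affects the digest in this sketch, so the rows would need a stable ordering before hashing.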

4. Classification Semantics (Normative)

Let an input item be i.

4.1 EWC code normalization

If normalize_ewc_code = true, transform ewc_code into canonical form:

  • remove spaces
  • uppercase
  • keep trailing * if present

Example:

  • "01 03 04*" → "010304*"
  • "01 03 04 *" → "010304*"

Call the normalized code code(i).
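The normalization rules above are small enough to express directly. A minimal sketch follows; the function name normalize_ewc_code is chosen to match the option name and is not taken from any existing implementation.

```python
def normalize_ewc_code(raw: str) -> str:
    """Remove all whitespace and uppercase; a trailing '*' survives unchanged."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the parts drops spaces before the asterisk as well.
    return "".join(raw.split()).upper()


print(normalize_ewc_code("01 03 04*"))    # 010304*
print(normalize_ewc_code("01 03 04 *"))   # 010304*
```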

4.2 Matching strategy

Depending on match_mode:

  • EWC_CODE: match only by normalized ewc_code
  • EWC_ITEM: match only by ewc_item
  • AUTO:
    1. try ewc_code
    2. fallback to ewc_item

Let the matched mapping row be m(i).

If no match:

  • apply unknown_code_policy
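The three match modes can be sketched as one lookup function over two indexes. The names match_row, by_code and by_item are hypothetical; the sketch assumes the dataset has been indexed by normalized ewc_code and by ewc_item beforehand, and it returns None so that the caller can apply unknown_code_policy.

```python
from typing import Optional


def normalize(code: str) -> str:
    # Same rule as 4.1: drop whitespace, uppercase, keep the trailing "*".
    return "".join(code.split()).upper()


def match_row(item: dict, by_code: dict, by_item: dict,
              match_mode: str = "AUTO") -> Optional[dict]:
    """Return the matched mapping row m(i), or None when nothing matches."""
    code = item.get("ewc_code")
    key = item.get("ewc_item")
    if match_mode in ("EWC_CODE", "AUTO") and code is not None:
        row = by_code.get(normalize(code))
        # In EWC_CODE mode a miss is final; AUTO falls through to ewc_item.
        if row is not None or match_mode == "EWC_CODE":
            return row
    if match_mode in ("EWC_ITEM", "AUTO") and key is not None:
        return by_item.get(key)
    return None
```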

4.3 Hazard determination (EU star rule)

If asterisk_means_hazardous = true and the matched ewc_code ends with *, the item is classified as HAZARDOUS even if the dataset row lacks a hazard flag.

Final hazard class:

  • If match exists:
    • hazard_class(i) = hazard_class(m(i))
      unless overridden by star rule
  • If no match:
    • depends on unknown_code_policy

Star override (if enabled):

hazard_class(i) ← HAZARDOUS   if code(i) ends with *
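The determination rules in this subsection can be sketched as a single function that returns both the class and the rule path. The function name hazard_class_for is hypothetical. Two points are assumptions rather than normative statements: FLAG and ASSIGN_UNKNOWN both yield UNKNOWN here, and the rule path is labelled dataset_match+asterisk_override whenever the star rule applies, as in Example A.

```python
from typing import Optional, Tuple


def hazard_class_for(code_norm: Optional[str], row: Optional[dict],
                     asterisk_means_hazardous: bool = True,
                     unknown_code_policy: str = "FLAG") -> Tuple[str, str]:
    """Return (hazard_class, rule_path) for one item."""
    if row is None:
        if unknown_code_policy == "ERROR":
            raise ValueError("WASTE_HAZ_UNKNOWN_CODE_ERROR")
        if unknown_code_policy == "ASSIGN_NON_HAZARDOUS":
            return "NON_HAZARDOUS", "unknown_code_policy"
        # ASSUMPTION: FLAG and ASSIGN_UNKNOWN both produce UNKNOWN in this sketch.
        return "UNKNOWN", "unknown_code_policy"
    # The star may be visible on the matched row's code or on the item's own code.
    matched_code = "".join(str(row.get("ewc_code") or "").split())
    starred = matched_code.endswith("*") or (code_norm or "").endswith("*")
    if asterisk_means_hazardous and starred:
        return "HAZARDOUS", "dataset_match+asterisk_override"
    return row.get("hazard_class", "UNKNOWN"), "dataset_match"
```

Returning the rule path alongside the class keeps the provenance requirement cheap to satisfy, since the caller does not have to reconstruct which branch fired.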

4.4 Regional variations

If regional_variations_policy = APPLY_IF_PRESENT and a region key matches the request (jurisdiction or regional_code), apply overrides from regional_variations.

The output MUST record whether a regional override was applied.
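A sketch of the override step is shown here. The encoding of regional_variations is not defined in this document, so the sketch assumes a JSON object keyed by region code whose values are field overrides, for example {"NO": {"hazard_class": "HAZARDOUS"}}. It also assumes that an item-level regional_code takes precedence over the request jurisdiction. The function name apply_regional_override is hypothetical.

```python
import json
from typing import Optional, Tuple


def apply_regional_override(row: dict, jurisdiction: Optional[str] = None,
                            regional_code: Optional[str] = None,
                            policy: str = "APPLY_IF_PRESENT") -> Tuple[dict, Optional[str]]:
    """Return (effective_row, applied_region); applied_region is None if no override was used."""
    raw = row.get("regional_variations")
    if policy != "APPLY_IF_PRESENT" or not raw:
        return dict(row), None
    try:
        # ASSUMPTION: a JSON object keyed by region code, values are field overrides.
        variations = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError as exc:
        raise ValueError("WASTE_HAZ_REGIONAL_OVERRIDE_INVALID") from exc
    # ASSUMPTION: the item-level regional_code wins over the request jurisdiction.
    for region in (regional_code, jurisdiction):
        if region and region in variations:
            return {**row, **variations[region]}, region
    return dict(row), None
```

The second element of the returned pair is what the output would record to show whether an override was applied.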

4.5 Recommended unit

If the mapping row contains recommended_unit, set:

  • recommended_unit(i) = recommended_unit(m(i))

If unit_normalization = CONVERT_TO_RECOMMENDED, the engine MUST:

  • emit a conversion request (or call the conversion dependency in execution)
  • record the conversion provenance
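Because conversion is delegated, the engine's part reduces to deciding whether a request is needed and describing it. The sketch below illustrates that decision only; the function name conversion_request and the request fields (value, from_unit, to_unit) are hypothetical and do not describe the interface of the conversion dependency.

```python
from typing import Optional


def conversion_request(item: dict, recommended_unit: Optional[str],
                       unit_normalization: str = "NO_CONVERT") -> Optional[dict]:
    """Return a request for the delegated converter, or None if none is needed."""
    if unit_normalization != "CONVERT_TO_RECOMMENDED":
        return None
    if not recommended_unit or item["unit"] == recommended_unit:
        return None
    # Hypothetical request shape; this engine performs no unit arithmetic itself.
    return {
        "value": item["value"],
        "from_unit": item["unit"],
        "to_unit": recommended_unit,
    }
```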

5. Input/Output Examples

Example A — simple EWC code classification

{
  "mapping_ref": "DATASET.WASTE.EWC_ALL_CATEGORIES.v1",
  "jurisdiction": "EU",
  "items": [
    { "period": 2025, "value": 40, "unit": "tonne", "ewc_code": "01 03 04*" },
    { "period": 2025, "value": 500, "unit": "tonne", "ewc_code": "17 01 07" }
  ]
}

Output (illustrative):

{
  "items_classified": [
    {
      "period": 2025,
      "value": 40,
      "unit": "tonne",
      "ewc_code": "01 03 04*",
      "ewc_code_normalized": "010304*",
      "ewc_item": "04",
      "hazard_class": "HAZARDOUS",
      "hazardous": true,
      "recommended_unit": "tonnes",
      "reuse_recyclable_flag": false,
      "classification_confidence": 1.0,
      "classification_provenance": {
        "mapping_ref": "DATASET.WASTE.EWC_ALL_CATEGORIES.v1",
        "matched_key": "ewc_code",
        "matched_row_ewc_item": "04",
        "rule_path": "dataset_match+asterisk_override"
      }
    },
    {
      "period": 2025,
      "value": 500,
      "unit": "tonne",
      "ewc_code": "17 01 07",
      "ewc_code_normalized": "170107",
      "hazard_class": "NON_HAZARDOUS",
      "hazardous": false,
      "classification_confidence": 1.0,
      "classification_provenance": {
        "mapping_ref": "DATASET.WASTE.EWC_ALL_CATEGORIES.v1",
        "matched_key": "ewc_code",
        "rule_path": "dataset_match"
      }
    }
  ],
  "summary": {
    "total_items": 2,
    "hazardous_items": 1,
    "non_hazardous_items": 1,
    "unknown_items": 0
  }
}
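The summary block in the output above can be derived from the classified items alone. A minimal sketch, assuming every classified item carries hazard_class and that a missing value counts as UNKNOWN; the function name summarize is hypothetical.

```python
from collections import Counter


def summarize(items_classified: list) -> dict:
    """Count items per classification outcome, using the summary keys from Example A."""
    counts = Counter(item.get("hazard_class", "UNKNOWN") for item in items_classified)
    return {
        "total_items": len(items_classified),
        "hazardous_items": counts["HAZARDOUS"],
        "non_hazardous_items": counts["NON_HAZARDOUS"],
        "unknown_items": counts["UNKNOWN"],
    }
```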

6. Validation & Error Model

Invariants

  • items must be non-empty
  • each item must provide either ewc_code or ewc_item
  • mapping dataset must resolve and be readable
  • normalized codes must be syntactically valid (v1: simple regex)
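The request-level invariants above could be checked before any lookup takes place. In this sketch the regex is an assumption (six digits with an optional trailing asterisk), since the document only says "simple regex". The code WASTE_HAZ_ITEMS_EMPTY is also hypothetical, because the suggested error codes do not name one for an empty items list. The mapping-resolution invariant is left out, as it depends on the resolver.

```python
import re

# ASSUMPTION: a normalized EWC code is six digits plus an optional trailing "*".
EWC_PATTERN = re.compile(r"\d{6}\*?")


def validate_request(request: dict) -> list:
    """Return a list of (error_code, item_index) tuples; an empty list means valid."""
    errors = []
    items = request.get("items") or []
    if not items:
        # Hypothetical code: the suggested list has no entry for this case.
        errors.append(("WASTE_HAZ_ITEMS_EMPTY", None))
    for index, item in enumerate(items):
        code, key = item.get("ewc_code"), item.get("ewc_item")
        if code is None and key is None:
            errors.append(("WASTE_HAZ_ITEM_MISSING_EWC_KEY", index))
        elif code is not None:
            normalized = "".join(code.split()).upper()
            if not EWC_PATTERN.fullmatch(normalized):
                errors.append(("WASTE_HAZ_INVALID_EWC_FORMAT", index))
    return errors
```

Collecting all violations instead of stopping at the first one lets a caller report every bad line of a manifest in a single pass.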

Error codes (suggested)

  • WASTE_HAZ_MAP_NOT_FOUND
  • WASTE_HAZ_ITEM_MISSING_EWC_KEY
  • WASTE_HAZ_INVALID_EWC_FORMAT
  • WASTE_HAZ_UNKNOWN_CODE_ERROR
  • WASTE_HAZ_REGIONAL_OVERRIDE_INVALID

Errors MUST include:

  • engine cmi_short_code
  • item index + any manifest/invoice identifier + offending ewc code
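An error payload that satisfies the two requirements above might be assembled as follows. The function name build_error and the payload key names are hypothetical; only the required contents (engine cmi_short_code, item index, manifest or invoice identifier, offending EWC code) come from this section.

```python
def build_error(code: str, cmi_short_code: str, item_index: int, item: dict) -> dict:
    """Assemble an error payload carrying the fields the error model requires."""
    error = {
        "code": code,
        "cmi_short_code": cmi_short_code,
        "item_index": item_index,
        "ewc_code": item.get("ewc_code"),
    }
    # Include whichever document identifiers the item actually carries.
    for field in ("manifest_id", "invoice_id"):
        if item.get(field) is not None:
            error[field] = item[field]
    return error
```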

7. Dependencies

MEID_TRANS_WASTE_HAZ_CLASSIFY depends on:

  • schema resolver (ZAR)
  • mapping dataset resolver (mapping_ref)
  • optional unit conversion capability (delegated; only if unit normalization enabled)

Declared via ZAR dependencies.


8. Federation & Audit Requirements

To reproduce hazard classification externally, the export MUST include:

  • engine identity (cmi or zar_code)
  • engine build proof (execution_ref + build_hash)
  • mapping dataset reference/version/hash (mapping_ref)
  • jurisdiction/regional options used
  • unknown handling policy
  • per-item provenance (matched row / rule path)

Provenance chain MUST show:

… → MEID_TRANS_WASTE_HAZ_CLASSIFY → MEID_CALC_WASTE_AGGR → …

with cmi_short_code recorded in USO tail arrays.


9. Performance Notes

  • Complexity: O(n) over items with hash-map lookup on ewc_code_normalized
  • Memory: O(n) output size
  • Dataset should be cached by mapping_ref in-memory per worker
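The caching note above can be sketched as a per-worker object that builds the hash-map index once per mapping_ref. The class name MappingCache and the loader callable are hypothetical; the loader stands in for whatever resolves mapping_ref to rows. Because mapping_ref resolves to an immutable artifact, the sketch never invalidates an entry.

```python
from typing import Callable, Dict, List


class MappingCache:
    """Per-worker cache of code indexes, keyed by mapping_ref."""

    def __init__(self, loader: Callable[[str], List[dict]]):
        self._loader = loader            # hypothetical: mapping_ref -> list of row dicts
        self._indexes: Dict[str, Dict[str, dict]] = {}

    def index_for(self, mapping_ref: str) -> Dict[str, dict]:
        # Build the normalized-code index on first use, then reuse it.
        if mapping_ref not in self._indexes:
            rows = self._loader(mapping_ref)
            self._indexes[mapping_ref] = {
                "".join(row["ewc_code"].split()).upper(): row for row in rows
            }
        return self._indexes[mapping_ref]
```

Each item then costs one dictionary lookup, which is what gives the O(n) behaviour over items.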

10. Methods Served (v1)

  • Waste.item.classified (enrichment output)
  • (optional convenience) hazardous/non-hazardous subtotals if requested


