WST-CLASS

Waste Treatment Classification Micro Engine

1. Identity

Purpose
Normalizes contractor/manifest treatment codes (often inconsistent across regions and vendors) into a canonical treatment taxonomy:

  • recycled
  • recovered
  • disposed
  • unknown

This engine makes MEID_CALC_WASTE_DIV audit-ready by ensuring:

  • treatment definitions are explicit and versioned,
  • vendor codes are mapped consistently,
  • and each line item carries provenance (which mapping row/rule was used).

Typical usage

  • Ingest waste contractor statements, manifests, tickets
  • Enrich line items with canonical treatment categories
  • Feed treatment-tagged flows into MEID_CALC_WASTE_DIV

2. Contract References (ZAR)

2.1 Input Schema

ZAR Address: schema.compute.waste.treatment_classify.inputs.v1_0_0

Conceptual fields (required unless marked optional or given a default):

  • items: list of waste treatment line items to classify
  • mapping_ref: ZAR reference to treatment mapping dataset (defaults allowed)
  • jurisdiction: optional (e.g., EU, NO, UK) for regional variations
  • alignment: BY_YEAR | BY_INDEX (default BY_YEAR)

Each item conceptually includes:

  • period
  • value
  • unit (e.g. kg, tonne)
  • one or more treatment identifiers:
    • contractor_treatment_code (preferred)
    • manifest_treatment_code (e.g., R/D codes, EWC treatment, local codes)
    • treatment_text (free-text fallback)
  • optional:
    • contractor_id
    • facility_id (treatment facility)
    • manifest_id / ticket_id
    • site_id
    • ewc_code / ewc_item (if available; helps routing rules)
    • hazard_class (if already classified upstream)

2.2 Options Schema

ZAR Address: schema.compute.waste.treatment_classify.options.v1_0_0

Common options:

  • match_mode: CODE_ONLY | TEXT_ONLY | AUTO (default AUTO)
  • normalize_codes: boolean (default true)
  • use_eu_r_d_rules: boolean (default true)
    (if an item has an EU-style R/D code, classify by deterministic rules)
  • unknown_code_policy: ERROR | FLAG | ASSIGN_UNKNOWN | ASSIGN_DISPOSED (default FLAG)
  • regional_variations_policy: APPLY_IF_PRESENT | IGNORE (default APPLY_IF_PRESENT)
  • unit_normalization: NO_CONVERT | CONVERT_TO_RECOMMENDED (default NO_CONVERT)
  • rounding: optional digits

Unit conversion is delegated; this engine only requests conversion if enabled.
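
The options and defaults listed above could be captured in a small typed structure. The following is a minimal sketch only: the class name `ClassifyOptions` and the `ALLOWED` table are hypothetical and not part of the schema; the field names, allowed values, and defaults are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed enum values per option, copied from the options list (table name is hypothetical).
ALLOWED = {
    "match_mode": {"CODE_ONLY", "TEXT_ONLY", "AUTO"},
    "unknown_code_policy": {"ERROR", "FLAG", "ASSIGN_UNKNOWN", "ASSIGN_DISPOSED"},
    "regional_variations_policy": {"APPLY_IF_PRESENT", "IGNORE"},
    "unit_normalization": {"NO_CONVERT", "CONVERT_TO_RECOMMENDED"},
}


@dataclass(frozen=True)
class ClassifyOptions:
    """Hypothetical container for the options; defaults follow the spec."""
    match_mode: str = "AUTO"
    normalize_codes: bool = True
    use_eu_r_d_rules: bool = True
    unknown_code_policy: str = "FLAG"
    regional_variations_policy: str = "APPLY_IF_PRESENT"
    unit_normalization: str = "NO_CONVERT"
    rounding: Optional[int] = None  # optional number of digits

    def __post_init__(self) -> None:
        # Reject values outside the documented enums.
        for name, values in ALLOWED.items():
            if getattr(self, name) not in values:
                raise ValueError(f"invalid {name}: {getattr(self, name)!r}")
```

Calling `ClassifyOptions()` with no arguments yields the documented defaults; an unsupported value such as `match_mode="FUZZY"` raises `ValueError`.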


2.3 Output Schema

ZAR Address: schema.compute.waste.treatment_classify.output.v1_0_0

Outputs include:

  • items_classified: same items enriched with canonical treatment fields
  • summary: totals and counts per treatment outcome
  • metadata: mapping version/hash, unknown handling stats, jurisdiction used

Enriched fields per item:

  • treatment_canonical: recycled | recovered | disposed | unknown
  • treatment_family: DIVERTED | NOT_DIVERTED | UNKNOWN (derived)
  • treatment_code_normalized
  • recommended_unit (if provided by mapping)
  • classification_confidence
  • classification_provenance (mapping_ref + row id + rule path)

3. Canonical Treatment Definitions (Normative)

v1 canonical treatments:

  • recycled: material recycling, reprocessing, composting (where treated as recycling by policy)
  • recovered: energy recovery (e.g., incineration with energy recovery), other recovery operations
  • disposed: landfill, incineration without recovery, permanent storage, deep well injection, etc.
  • unknown: treatment not known or not classifiable under policies

Derived treatment family:

  • DIVERTED = recycled or recovered
  • NOT_DIVERTED = disposed
  • UNKNOWN = unknown
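
The family derivation can be sketched as a pure function. The function name is hypothetical; the sketch assumes that anything other than the three known treatments maps to the UNKNOWN family listed in the output schema.

```python
# Treatments counted as diverted, per the definitions above.
DIVERTED_TREATMENTS = {"recycled", "recovered"}


def treatment_family(treatment_canonical: str) -> str:
    """Derive treatment_family from treatment_canonical (hypothetical helper)."""
    if treatment_canonical in DIVERTED_TREATMENTS:
        return "DIVERTED"
    if treatment_canonical == "disposed":
        return "NOT_DIVERTED"
    # Assumption: "unknown" and any unrecognized value fall into UNKNOWN.
    return "UNKNOWN"
```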

4. Mapping Dataset Contract (treatment_map)

The treatment mapping dataset (ZAR referenced) SHOULD include:

  • provider / contractor_id (optional; if provider-specific codes)
  • treatment_code (string)
  • treatment_code_normalized (optional precomputed)
  • treatment_text_match (optional regex/keywords)
  • treatment_canonical (recycled|recovered|disposed|unknown)
  • confidence_default (optional)
  • regional_variations (optional json/text)
  • recommended_unit (optional)
  • notes (optional)

This dataset MUST be:

  • versioned and immutable once released
  • referenced via mapping_ref and recorded in output metadata

5. Classification Semantics (Normative)

Let an input item be i.

5.1 Code normalization

If normalize_codes = true, normalize any code input:

  • trim spaces
  • uppercase
  • remove common punctuation
  • canonicalize common forms (implementation-specific but deterministic)

Call the normalized treatment code code(i).
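
One possible deterministic normalization is sketched below. The spec leaves the details implementation-specific, so the punctuation set and the zero-padding rule (e.g. "R01" to "R1") are assumptions made for illustration, as is the function name.

```python
import re


def normalize_code(raw: str) -> str:
    """Hypothetical normalizer: trim, uppercase, strip punctuation, canonicalize."""
    code = raw.strip().upper()
    # Assumed "common punctuation": whitespace, dots, hyphens, underscores, slashes.
    code = re.sub(r"[\s.\-_/]+", "", code)
    # Assumed canonical form: drop zero padding in R/D codes ("R01" -> "R1").
    match = re.fullmatch(r"([RD])0*(\d+)", code)
    if match:
        code = f"{match.group(1)}{int(match.group(2))}"
    return code
```

Under these assumptions, `normalize_code(" r-01 ")` gives `"R1"` and `normalize_code("rec-mat")` gives `"RECMAT"`, so the mapping dataset would need to store codes in the same normalized form.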

5.2 Matching strategy

Behavior by match_mode:

  • CODE_ONLY: use code-based matching only
  • TEXT_ONLY: use text-based matching only
  • AUTO:
    1. if use_eu_r_d_rules and an R/D code is detected → apply deterministic rule (below)
    2. else attempt mapping dataset code match
    3. else attempt mapping dataset text/keyword match
    4. else apply unknown_code_policy (see 5.5)
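
The AUTO order amounts to an ordered fallback chain in which the first matcher that returns a result wins and its name becomes the rule path. The sketch below illustrates only that control flow: the function name, the rule-path labels, and the three toy matchers are hypothetical stand-ins, not the engine's real rules or dataset.

```python
from typing import Callable, Optional, Sequence, Tuple

Matcher = Callable[[dict], Optional[str]]


def classify_auto(item: dict, matchers: Sequence[Tuple[str, Matcher]]) -> Tuple[Optional[str], str]:
    """Try matchers in order; return (treatment, rule_path) of the first hit."""
    for rule_path, matcher in matchers:
        treatment = matcher(item)
        if treatment is not None:
            return treatment, rule_path
    # Nothing matched: the caller applies unknown_code_policy.
    return None, "unknown_code_policy"


# Toy matchers for illustration only.
def rd_rule(item: dict) -> Optional[str]:
    code = item.get("contractor_treatment_code", "")
    return "disposed" if code[:1] == "D" and code[1:].isdigit() else None


def dataset_code(item: dict) -> Optional[str]:
    return {"REC-MAT": "recycled"}.get(item.get("contractor_treatment_code"))


def dataset_text(item: dict) -> Optional[str]:
    return "disposed" if "landfill" in item.get("treatment_text", "").lower() else None


AUTO_ORDER = [("eu_rd_rule", rd_rule), ("dataset_code", dataset_code), ("dataset_text", dataset_text)]
```

With use_eu_r_d_rules disabled, the first entry would simply be left out of the chain; CODE_ONLY and TEXT_ONLY correspond to chains containing only the code or only the text matcher.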

5.3 EU R/D deterministic rules (if enabled)

If item includes an EU-style operation code:

  • R1–R13 → recovered by default
    (with an exception list: some R codes can be considered recycling if policy defines it)
  • D1–D15 → disposed

v1 conservative defaults:

  • R1 → recovered (energy recovery)
  • R2–R9 → recycled (material recovery operations)
  • R10 → recycled (land treatment beneficial to agriculture/ecology)
  • R11 → recycled (use of wastes obtained from R1–R10)
  • R12–R13 → recovered (exchange/storage pending recovery)

This split can be policy-controlled later; v1 should record which R-code mapping table was used in metadata.
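
The v1 defaults above can be expressed as a small lookup. This is a sketch, assuming the code has already been normalized to the form "R5" or "D10"; the function and table names are hypothetical.

```python
import re
from typing import Optional

# v1 conservative defaults for R codes, as listed above.
R_CODE_DEFAULTS = {1: "recovered", 12: "recovered", 13: "recovered"}
R_CODE_DEFAULTS.update({n: "recycled" for n in range(2, 12)})  # R2..R11


def classify_rd_code(code: str) -> Optional[str]:
    """Return the canonical treatment for an EU R/D code, or None if not one."""
    match = re.fullmatch(r"([RD])(\d{1,2})", code)
    if not match:
        return None
    kind, number = match.group(1), int(match.group(2))
    if kind == "R":
        return R_CODE_DEFAULTS.get(number)  # None outside R1..R13
    return "disposed" if 1 <= number <= 15 else None  # D1..D15
```

Returning `None` for anything outside R1–R13 and D1–D15 lets the caller fall through to dataset matching rather than guessing.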

5.4 Dataset mapping

If a mapping row m(i) is found:

  • assign treatment_canonical(i) = treatment_canonical(m(i))
  • copy recommended unit if present
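
A dataset lookup that also records provenance might look like the sketch below. The in-memory `TREATMENT_MAP`, its row ids, the code "T1", and the preference for a provider-specific row over a generic one are all assumptions for illustration; the spec only says that provider is optional on mapping rows.

```python
from typing import Optional

# Hypothetical rows keyed by (contractor_id or None, normalized code).
TREATMENT_MAP = {
    ("ACME", "T1"): {"row_id": "r-001", "treatment_canonical": "recycled", "recommended_unit": "tonne"},
    (None, "T1"): {"row_id": "r-002", "treatment_canonical": "disposed"},
}


def apply_mapping(item: dict, mapping_ref: str) -> Optional[dict]:
    """Return an enriched copy of item, or None when no mapping row matches."""
    code = item["treatment_code_normalized"]
    # Assumption: a provider-specific row wins over a generic one.
    row = TREATMENT_MAP.get((item.get("contractor_id"), code)) or TREATMENT_MAP.get((None, code))
    if row is None:
        return None
    enriched = dict(item)
    enriched["treatment_canonical"] = row["treatment_canonical"]
    enriched["classification_provenance"] = {
        "mapping_ref": mapping_ref,
        "row_id": row["row_id"],
        "rule_path": "dataset_code",
    }
    if "recommended_unit" in row:  # copy only if the row provides one
        enriched["recommended_unit"] = row["recommended_unit"]
    return enriched
```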

5.5 Unknown handling

If no match exists, apply unknown_code_policy:

  • ERROR: fail
  • FLAG (default): assign unknown and flag the item for review
  • ASSIGN_UNKNOWN: assign unknown without flagging
  • ASSIGN_DISPOSED: assign disposed (conservative)
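
The four policies can be sketched as a single dispatch function. The `flagged` field, the exception class, and the function name are hypothetical; the sketch also assumes that ASSIGN_UNKNOWN and ASSIGN_DISPOSED do not flag the item.

```python
class UnknownTreatmentError(ValueError):
    """Raised under the ERROR policy (hypothetical exception type)."""


def apply_unknown_policy(policy: str, code_sample: str) -> dict:
    """Return the fields to set on an unmatched item, or raise under ERROR."""
    if policy == "ERROR":
        raise UnknownTreatmentError(f"WASTE_TREAT_UNKNOWN_CODE_ERROR: {code_sample!r}")
    if policy == "FLAG":
        return {"treatment_canonical": "unknown", "flagged": True}
    if policy == "ASSIGN_UNKNOWN":
        return {"treatment_canonical": "unknown", "flagged": False}
    if policy == "ASSIGN_DISPOSED":
        # Conservative choice: unmatched waste counts against diversion.
        return {"treatment_canonical": "disposed", "flagged": False}
    raise ValueError(f"unsupported unknown_code_policy: {policy!r}")
```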

6. Outputs & Totals (Convenience)

This engine is primarily an enrichment transformer, but v1 may optionally return per-period totals for convenience:

For each period t:

$$W_k(t) = \sum_{i \in I_k(t)} v(i)$$

where k ∈ {recycled, recovered, disposed, unknown}, I_k(t) is the set of items in period t classified as k, and v(i) is the value of item i.

These totals are directly consumable by MEID_CALC_WASTE_DIV.
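
The per-period totals can be computed in a single pass over the classified items. This sketch assumes all values within a period already share one unit (unit conversion is delegated elsewhere); the function name is hypothetical.

```python
from collections import defaultdict

TREATMENTS = ("recycled", "recovered", "disposed", "unknown")


def totals_by_period(items_classified: list) -> dict:
    """Sum item values per period and canonical treatment."""
    # Every period starts with a zero for each of the four treatments.
    totals = defaultdict(lambda: dict.fromkeys(TREATMENTS, 0.0))
    for item in items_classified:
        totals[item["period"]][item["treatment_canonical"]] += item["value"]
    return dict(totals)
```

For example, three items in period 2025 with values 600 (recycled), 150 (recovered), and 400 (disposed) yield totals of 600, 150, 400, and 0 for unknown.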


7. Examples

Example A — Contractor code mapping

{
  "mapping_ref": "DATASET.WASTE.TREATMENT_MAP.v1",
  "jurisdiction": "EU",
  "items": [
    { "period": 2025, "value": 600, "unit": "tonne", "contractor_treatment_code": "REC-MAT" },
    { "period": 2025, "value": 150, "unit": "tonne", "contractor_treatment_code": "R1" },
    { "period": 2025, "value": 400, "unit": "tonne", "contractor_treatment_code": "D1" }
  ]
}

Example B — Free-text fallback

{
  "mapping_ref": "DATASET.WASTE.TREATMENT_MAP.v1",
  "items": [
    { "period": 2025, "value": 10, "unit": "tonne", "treatment_text": "sent to landfill" },
    { "period": 2025, "value": 5, "unit": "tonne", "treatment_text": "incineration with energy recovery" }
  ],
  "match_mode": "AUTO"
}

8. Validation & Error Model

Invariants

  • items must be non-empty
  • each item must provide at least one treatment identifier (code or text)
  • mapping dataset must resolve if code/text matching is required
  • values must be finite

Error codes (suggested)

  • WASTE_TREAT_MAP_NOT_FOUND
  • WASTE_TREAT_ITEM_MISSING_TREATMENT_KEY
  • WASTE_TREAT_UNKNOWN_CODE_ERROR
  • WASTE_TREAT_INVALID_CODE_FORMAT
  • WASTE_TREAT_NON_FINITE_VALUE

Errors MUST include:

  • engine cmi_short_code
  • item index + manifest/ticket id + offending code/text sample
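
A pre-classification check covering the item-level invariants could be sketched as follows. The error-record layout, the function name, and the use of "WST-CLASS" as the cmi_short_code are assumptions; the spec suggests no error code for an empty items list, so the sketch raises a plain `ValueError` there.

```python
import math

TREATMENT_KEYS = ("contractor_treatment_code", "manifest_treatment_code", "treatment_text")


def validate_items(items: list, cmi_short_code: str = "WST-CLASS") -> list:
    """Return a list of error records; an empty list means the items are valid."""
    if not items:
        raise ValueError("items must be non-empty")
    errors = []
    for index, item in enumerate(items):
        base = {
            "engine": cmi_short_code,
            "item_index": index,
            "manifest_or_ticket_id": item.get("manifest_id") or item.get("ticket_id"),
        }
        # At least one treatment identifier (code or text) must be present.
        if not any(item.get(key) for key in TREATMENT_KEYS):
            errors.append({**base, "code": "WASTE_TREAT_ITEM_MISSING_TREATMENT_KEY", "sample": None})
        # Values must be real, finite numbers (rejects NaN, inf, None, strings).
        value = item.get("value")
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            errors.append({**base, "code": "WASTE_TREAT_NON_FINITE_VALUE", "sample": repr(value)})
    return errors
```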

9. Dependencies

MEID_TRANS_WASTE_TREATMENT_CLASSIFY depends on:

  • schema resolver (ZAR)
  • mapping dataset resolver (mapping_ref)
  • optional unit conversion capability (delegated)

Declared via ZAR dependencies.


10. Federation & Audit Requirements

To reproduce treatment classification externally, the export MUST include:

  • engine identity (cmi or zar_code)
  • engine build proof (execution_ref + build_hash)
  • mapping dataset reference/version/hash (mapping_ref)
  • jurisdiction/options used (R/D rules, unknown handling)
  • per-item provenance (matched row / rule path)

Provenance chain MUST show:

… → MEID_TRANS_WASTE_TREATMENT_CLASSIFY → MEID_CALC_WASTE_DIV → …

with cmi_short_code recorded in USO tail arrays.


11. Performance Notes

  • Complexity: O(n) over items (hash-map lookup for code matches)
  • Memory: O(n) output size
  • Mapping datasets should be cached by mapping_ref per worker
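
Because released mapping datasets are immutable, caching by mapping_ref is safe within a worker process. A minimal sketch using the standard library's `functools.lru_cache` follows; the loader is a hypothetical stand-in for resolving the dataset through ZAR, and the cache size is an arbitrary choice.

```python
from functools import lru_cache

load_calls = []  # records loader invocations so the cache effect is observable


def load_mapping_dataset(mapping_ref: str) -> dict:
    """Hypothetical loader; a real one would resolve mapping_ref through ZAR."""
    load_calls.append(mapping_ref)
    return {"RECMAT": "recycled"}


@lru_cache(maxsize=32)
def get_mapping(mapping_ref: str) -> dict:
    """Per-process cache keyed by mapping_ref."""
    return load_mapping_dataset(mapping_ref)
```

Repeated calls with the same mapping_ref return the same cached object, so callers should treat it as read-only.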

12. Methods Served (v1)

  • Waste.item.treatment_classified
  • Waste.treatment.recycled.abs
  • Waste.treatment.recovered.abs
  • Waste.treatment.disposed.abs
  • Waste.treatment.unknown.abs
