
EXTR

Extraction Engines

1. Overview

Extraction Engines (EXTR) are micro-engines responsible for converting unstructured or semi-structured content into structured data suitable for computation.

EXTR engines sit at the boundary between raw content and structured pipelines: everything upstream of them is raw content, and everything downstream consumes structured data.

2. Design Principles

  1. Loss-Aware Extraction
    Extraction uncertainty must be made explicit: every extracted value carries a confidence score rather than being presented as certain.

  2. Source Traceability
    Extracted values retain references to original content.

  3. Non-Interpretive
    EXTR engines extract data; they do not infer meaning.
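
The three principles above can be read as constraints on the shape of a single extracted value. The following is a minimal sketch, not the engine's actual data model: the names `SourceRef` and `ExtractedValue` and all of their fields are hypothetical, chosen only to show a value that always carries a confidence score and a pointer back to the source, and that stores the text as found rather than an interpretation of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Pointer back to the original content (hypothetical shape)."""
    document_id: str
    page: int
    char_start: int
    char_end: int


@dataclass(frozen=True)
class ExtractedValue:
    """One extracted field: raw text, explicit uncertainty, and provenance."""
    field: str
    raw_text: Optional[str]      # text as it appears in the source; None if not found
    confidence: float            # 0.0 to 1.0, always present (loss-aware)
    source: Optional[SourceRef]  # where the value came from (traceability)

    def __post_init__(self) -> None:
        # Reject values that hide their uncertainty or their origin.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        if self.raw_text is not None and self.source is None:
            raise ValueError("a found value must carry a source reference")


# Illustrative data only; the document ID and offsets are made up.
value = ExtractedValue(
    field="invoice_total",
    raw_text="1,250.00 EUR",
    confidence=0.93,
    source=SourceRef(document_id="doc-001", page=2, char_start=410, char_end=422),
)
```

Keeping `raw_text` as a string, instead of parsing it into a number and a currency, reflects the non-interpretive principle: deciding what the text means is left to downstream engines.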

3. Scope of Responsibility

3.1. What EXTR Engines Do

  • OCR of documents
  • NLP-based field extraction
  • Table and invoice parsing
  • Document structure recognition

4. What EXTR Engines Do Not Do

  • ❌ Compute ESG metrics
  • ❌ Validate business logic
  • ❌ Aggregate results
  • ❌ Apply policy or risk logic

5. Inputs

  • Unstructured documents or text
  • Extraction schemas
  • Optional confidence thresholds

6. Outputs

  • Structured extracted payloads
  • Confidence scores
  • Source references
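
To make the input and output contract concrete, here is a small sketch of how the three inputs could map to the three outputs. It rests on several assumptions: the function name `extract`, the regex-based schema format, the payload layout, and the toy confidence heuristic are all hypothetical, and a real engine would take its scores from an OCR or NLP model instead.

```python
import re
from typing import Optional


def extract(text: str, schema: dict[str, str],
            min_confidence: Optional[float] = None) -> dict:
    """Apply a regex-based schema (field name -> pattern) to text.

    Confidence is a toy heuristic: 1.0 for a unique match, 0.5 when the
    pattern matches more than once. Fields with no match are omitted.
    """
    fields = {}
    for name, pattern in schema.items():
        matches = list(re.finditer(pattern, text))
        if not matches:
            continue
        first = matches[0]
        confidence = 1.0 if len(matches) == 1 else 0.5
        fields[name] = {
            "value": first.group(1),
            "confidence": confidence,
            # Character offsets of the captured value in the input text.
            "source": {"char_start": first.start(1), "char_end": first.end(1)},
            # Low-confidence values are flagged, not dropped (an assumption).
            "below_threshold": min_confidence is not None
            and confidence < min_confidence,
        }
    return {"fields": fields}


text = "Invoice No: INV-42\nTotal: 1250.00\nTotal: 1300.00"
schema = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "total": r"Total:\s*([\d.]+)",
}
payload = extract(text, schema, min_confidence=0.8)
```

In this example `invoice_no` matches once and passes the threshold, while `total` matches twice and is flagged with `below_threshold`. The sketch flags such values instead of discarding them, so the decision about what to do with an uncertain value stays with the consumer.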

7. Canonical Identification

  • Engine Type: EXTR
  • USO Code: EXTR
  • Category: Micro Engine (MICE)

Status: Stable
Owner: Computation Hub / MICE


