Extraction Engines

1. Overview

Extraction Engines (EXTR) are micro-engines responsible for converting unstructured or semi-structured content into structured data suitable for computation.

EXTR engines act at the boundary between raw content and structured pipelines.

2. Design Principles

Loss-Aware Extraction: Extraction uncertainty must be made explicit.
Source Traceability: Extracted values retain references to the original content.
Non-Interpretive: EXTR engines extract data; they do not infer meaning.
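
As an illustration of these principles, the following minimal sketch shows one possible shape for an extracted value. The names `SourceRef`, `ExtractedValue`, and `resolve` are hypothetical and are not part of any documented EXTR interface; the confidence figure is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Pointer back to the original content (hypothetical shape)."""
    document_id: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass(frozen=True)
class ExtractedValue:
    """A raw extracted value; no interpretation or conversion is applied."""
    field: str
    raw_text: Optional[str]      # None when nothing was found
    confidence: float            # 0.0 to 1.0, always reported (loss-aware)
    source: Optional[SourceRef]  # where the value came from (traceability)


def resolve(ref: SourceRef, document_text: str) -> str:
    """Return the exact span of the original text that a reference points to."""
    return document_text[ref.start:ref.end]


text = "Invoice total: 1,250.00 EUR"
start = text.index("1,250.00")
value = ExtractedValue(
    field="total",
    raw_text="1,250.00",  # kept as text: parsing it as a number is left downstream
    confidence=0.93,      # illustrative number, not a measured score
    source=SourceRef("doc-001", start, start + len("1,250.00")),
)
```

Keeping `raw_text` as a string and attaching a `SourceRef` lets a downstream consumer check the value against the original document with `resolve(value.source, text)`.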

3. Scope of Responsibility

3.1. What EXTR Engines Do

OCR of documents
NLP-based field extraction
Table and invoice parsing
Document structure recognition

4. What EXTR Engines Do Not Do

❌ Compute ESG metrics
❌ Validate business logic
❌ Aggregate results
❌ Apply policy or risk logic

5. Inputs

Unstructured documents or text
Extraction schemas
Optional confidence thresholds

6. Outputs

Structured extracted payloads
Confidence scores
Source references
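
To show how the inputs and outputs listed above might fit together, here is a toy sketch. It assumes a schema expressed as a mapping from field name to a regular expression with one capture group, and it uses a stand-in scoring rule; the `extract` function, the schema format, the `below_threshold` key, and the scores are all assumptions made for illustration, not the actual EXTR contract.

```python
import re
from typing import Optional

# Assumed schema format: field name -> regex with exactly one capture group.
Schema = dict[str, str]


def extract(document_id: str, text: str, schema: Schema,
            min_confidence: Optional[float] = None) -> list[dict]:
    """Return one entry per schema field with value, confidence, and source."""
    payload = []
    for field, pattern in schema.items():
        # Stand-in scoring: a strict match scores 1.0, a case-insensitive
        # fallback scores 0.6, and a miss scores 0.0.
        match, confidence = re.search(pattern, text), 1.0
        if match is None:
            match, confidence = re.search(pattern, text, re.IGNORECASE), 0.6
        if match is None:
            confidence = 0.0
        payload.append({
            "field": field,
            "value": match.group(1) if match else None,
            "confidence": confidence,
            "source": (
                {"document_id": document_id,
                 "start": match.start(1), "end": match.end(1)}
                if match else None
            ),
            # Low-confidence and missing fields are flagged, never dropped.
            "below_threshold": (
                min_confidence is not None and confidence < min_confidence
            ),
        })
    return payload


schema = {"invoice_number": r"Invoice No\. (\d+)", "total": r"Total: ([\d.,]+)"}
text = "INVOICE NO. 4711\nTotal: 1,250.00 EUR"
payload = extract("doc-001", text, schema, min_confidence=0.8)
```

In this sketch the threshold only annotates entries. Deciding what to do with a flagged field is left to the consumer, in keeping with the rule that EXTR engines do not validate business logic or apply policy.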

7. Canonical Identification

Engine Type: EXTR
USO Code: EXTR
Category: Micro Engine (MICE)

Status: Stable
Owner: Computation Hub / MICE

