EXTR
Extraction Engines
1. Overview
Extraction Engines (EXTR) are micro-engines responsible for converting unstructured or semi-structured content into structured data suitable for computation.
EXTR engines act at the boundary between raw content and structured pipelines.
2. Design Principles
- Loss-Aware Extraction: Extraction uncertainty must be made explicit.
- Source Traceability: Extracted values retain references to original content.
- Non-Interpretive: EXTR engines extract data; they do not infer meaning.
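As a minimal sketch of how these principles could show up in a data type, the Python record below keeps a confidence score and a source reference next to each extracted value and stores the value verbatim. The names SourceRef and ExtractedValue, their fields, and the 0.0 to 1.0 confidence range are hypothetical, chosen only for illustration; they are not part of any EXTR specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical pointer back to the original content (traceability)."""
    document_id: str
    page: int
    char_start: int
    char_end: int


@dataclass(frozen=True)
class ExtractedValue:
    """Hypothetical extracted field: raw value, uncertainty, and origin."""
    name: str
    raw_value: Optional[str]      # text exactly as found; None if nothing was found
    confidence: float             # assumed range 0.0 to 1.0, always reported
    source: Optional[SourceRef]   # where in the original content the value came from

    def __post_init__(self) -> None:
        # Uncertainty must be explicit and well-formed.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")


value = ExtractedValue(
    name="invoice_total",
    raw_value="1,250.00 EUR",  # kept verbatim: no parsing, conversion, or inference
    confidence=0.87,
    source=SourceRef(document_id="doc-001", page=2, char_start=410, char_end=422),
)
```

Keeping the raw text untouched leaves any interpretation, such as currency conversion or validation, to downstream engines.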
3. Scope of Responsibility
3.1. What EXTR Engines Do
- OCR of documents
- NLP-based field extraction
- Table and invoice parsing
- Document structure recognition
4. What EXTR Engines Do Not Do
- ❌ Compute ESG metrics
- ❌ Validate business logic
- ❌ Aggregate results
- ❌ Apply policy or risk logic
5. Inputs
- Unstructured documents or text
- Extraction schemas
- Optional confidence thresholds
6. Outputs
- Structured extracted payloads
- Confidence scores
- Source references
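The input and output contract above can be sketched as a single function. This is an illustration under stated assumptions, not the actual engine: the schema is assumed to be a mapping from field name to a regular expression, the confidence value is a fixed placeholder, and the optional threshold is assumed to flag low-confidence fields rather than drop them. The names SCHEMA, extract, and meets_threshold are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical extraction schema: field name -> regex with one capture group.
SCHEMA: Dict[str, str] = {
    "invoice_number": r"Invoice\s+No\.?\s*:\s*(\S+)",
    "total": r"Total\s*:\s*([\d.,]+)",
}


def extract(
    text: str,
    schema: Dict[str, str],
    min_confidence: Optional[float] = None,
) -> List[dict]:
    """Return one payload entry per schema field found in the text."""
    results = []
    for name, pattern in schema.items():
        match = re.search(pattern, text)
        if match is None:
            continue
        # Placeholder score: a real engine would take this from its OCR/NLP model.
        confidence = 0.9
        results.append({
            "field": name,
            "value": match.group(1),  # raw text, not interpreted
            "confidence": confidence,
            # Assumption: the threshold flags entries instead of removing them.
            "meets_threshold": min_confidence is None or confidence >= min_confidence,
            # Character offsets into the input text serve as the source reference.
            "source": {"char_start": match.start(1), "char_end": match.end(1)},
        })
    return results


text = "Invoice No: INV-42\nTotal: 1,250.00"
payload = extract(text, SCHEMA, min_confidence=0.5)
```

Each entry carries the three outputs listed above: the extracted value, its confidence score, and offsets that point back to the exact span in the input.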
7. Canonical Identification
- Engine Type: EXTR
- USO Code: EXTR
- Category: Micro Engine (MICE)
Status: Stable
Owner: Computation Hub / MICE