

IoT Device Inference

ZAYAZ IoT Device Inference – V1 Sketch

V1 Design Goal (non-negotiable)

Reduce IoT data-source onboarding friction by ≥50% while increasing provenance transparency.

Not “perfect classification”. Not “magic AI”. Faster, safer, auditable onboarding.

1. V1 Scope (what we deliberately include / exclude)

✅ Included in V1

  • Probabilistic device category inference (not model-level)
  • Entropy-based + metadata-based features only
  • Prefill of limited, low-risk fields
  • Explicit user confirmation / override
  • Full provenance & audit trail
  • Stateless inference engine (easy to scale)

❌ Explicitly excluded from V1

  • Deep packet inspection
  • Automatic compliance mapping
  • Autonomous learning loops
  • “Black box” ML
  • Any inference driving reporting without confirmation

This keeps V1 safe, fast, and credible.

2. V1 Target Outputs (what the system actually produces)

Primary Output

IoT Device Profile – Draft

Fields prefilled with confidence:

| Field | Prefill? | Notes |
| --- | --- | --- |
| Device category | ✅ | e.g. Sensor / Meter / Camera / Gateway |
| Sub-category | ⚠️ (Top-3) | e.g. Temperature / Energy / Occupancy |
| Expected data cadence | ✅ | periodic / event-driven / bursty |
| Expected unit family | ⚠️ | energy / environmental / binary events |
| Data risk flag | ✅ | low / medium / anomalous |
| AI confidence | ✅ | 0–1 |
| Evidence link | ✅ | mandatory |

Everything else remains manual in V1.

3. Core Component: MEID-IOT-V1 (Micro-Engine)

Purpose

Generate a provenance hypothesis for an unknown IoT data stream.

Inputs (minimal & realistic)

  • Flow metadata (NetFlow-like)
  • Timestamped packet sizes
  • Destination domains / IPs
  • Protocol/port hints (no payload parsing)
  • Optional MAC OUI (if available)

Feature Set (V1)

Entropy Features

  • Payload size entropy
  • Inter-arrival time entropy
  • Destination entropy
  • Session duration variance

Structural Signals

  • Periodicity score
  • Burstiness index
  • Endpoint stability score
  • Avg bytes / minute

Light Identity Hints

  • MAC OUI → vendor family (optional)
  • Domain pattern match (vendor clouds)

⚠️ All features are non-PII and privacy-safe.

4. Inference Logic (V1 = transparent, not fancy)

  • Rule-weighted Bayesian classifier
    • Human-readable priors
    • Easy to tune
    • Explainable

Example (simplified):

IF low time entropy
AND low size entropy
AND single stable endpoint
→ P(sensor) ↑↑

IF high size variance
AND burst traffic
AND high destination entropy
→ P(camera/gateway) ↑↑
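A minimal sketch of how such rule-weighted priors can be combined; labels, priors, feature names, and rule weights below are illustrative assumptions, not the shipped MEID-IOT-V1 rule set.

rule_weighted_classifier_sketch.py
# Illustrative only: priors and rule weights are assumptions.
PRIORS = {"sensor": 0.4, "meter": 0.3, "camera_gateway": 0.3}  # human-readable priors

RULES = [
    # (condition over observed features, label boosted, weight)
    (lambda f: f["time_entropy"] == "low" and f["size_entropy"] == "low"
               and f["endpoint_count"] == 1,
     "sensor", 3.0),
    (lambda f: f["size_variance"] == "high" and f["bursty"]
               and f["dest_entropy"] == "high",
     "camera_gateway", 3.0),
]

def classify(features: dict) -> dict:
    """Boost priors by the weight of every rule that fires, then normalize."""
    scores = dict(PRIORS)
    for condition, label, weight in RULES:
        if condition(features):
            scores[label] *= weight
    total = sum(scores.values())
    return {label: round(score / total, 2) for label, score in scores.items()}

print(classify({"time_entropy": "low", "size_entropy": "low", "endpoint_count": 1,
                "size_variance": "low", "bursty": False, "dest_entropy": "low"}))
# → {'sensor': 0.67, 'meter': 0.17, 'camera_gateway': 0.17}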

Output

iot-inference-logic-example.json
{
  "predictions": [
    {"label": "Environmental Sensor", "p": 0.82},
    {"label": "Energy Meter", "p": 0.11},
    {"label": "Gateway", "p": 0.07}
  ],
  "confidence": 0.82,
  "model_version": "MEID-IOT-V1.0"
}

5. FOGE Integration: Prefill with Guardrails

UI Behaviour (critical)

  • Fields show “AI-suggested” badge
  • Confidence shown inline
  • “Why?” button reveals evidence summary
  • “Change” opens dropdown with Top-3 + manual search

Hard Rule

Inference may prefill forms, but never auto-lock fields.

This aligns with ZAAM trust principles.

6. Override = First-Class Data Asset

Every override creates:

Inference Event
→ User Override
→ Confirmed Device Profile

Stored with:

  • Old prediction
  • New label
  • Confidence delta
  • Timestamp
  • Tenant context

This becomes future training data, but:

  • Not auto-used
  • Only via governed retraining cycle

7. Data Model Additions (SSSR-aligned)

New entity (V1 minimal)

iot_source_inference

  • inference_id
  • source_id
  • predicted_labels[]
  • confidence
  • evidence_refs[]
  • model_version
  • created_at

Extend existing source entity

  • confirmed_device_category
  • confirmation_method = MANUAL | AI_ASSISTED
  • confirmation_timestamp

This is enough for audit and scale.
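As a hedged sketch, the additions could look like this in code; the concrete types are assumptions, field names follow the lists above.

iot_source_inference_sketch.py
# Sketch of the V1 entities as dataclasses; types are assumptions.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class ConfirmationMethod(Enum):
    MANUAL = "MANUAL"
    AI_ASSISTED = "AI_ASSISTED"

@dataclass
class IotSourceInference:
    inference_id: str
    source_id: str
    predicted_labels: list[str]   # predicted_labels[]
    confidence: float
    evidence_refs: list[str]      # evidence_refs[]
    model_version: str
    created_at: datetime

@dataclass
class SourceConfirmationExtension:  # fields added to the existing source entity
    confirmed_device_category: str | None = None
    confirmation_method: ConfirmationMethod | None = None
    confirmation_timestamp: datetime | None = None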

8. KPIs for V1 (decide success early)

The following should be measured from day one:

| KPI | Target |
| --- | --- |
| Avg onboarding time | −50% |
| % AI-assisted registrations | ≥60% |
| Override rate | 20–40% (healthy!) |
| Post-confirmation correction rate | <5% |
| Auditor objections | 0 |

⚠️ High override rate is good early — it means engagement and learning.

9. V1 Risks & How We Neutralize Them

| Risk | Mitigation |
| --- | --- |
| “AI guessing” distrust | Confidence + evidence mandatory |
| Wrong assumptions | No auto-use in reporting |
| Overfitting early | Simple rules, no auto-learning |
| Security concerns | Metadata-only inference |
| Scope creep | Explicit V1 exclusions |

10. Why this V1 is strategically sound

  • Delivers immediate operational value
  • Strengthens ZAYAZ’s data provenance story
  • Creates labeled data for future intelligence
  • Does not threaten compliance credibility
  • Fits perfectly into MICE + FOGE + ZAAM

ZAYAZ IoT Device Category Ontology — V1

Design principles (important to state explicitly)

  1. Behavior-first, not vendor-first
  2. Category ≠ Metric ≠ Compliance use
  3. Probabilistic inference allowed; reporting use is not
  4. Every category must imply expectations:
     • cadence
     • data shape
     • unit family
     • risk profile

Level 1: Device Class (Top-Level)

This is the only level inferred automatically in V1.

| Code | Device Class | Definition |
| --- | --- | --- |
| DC1 | Sensor | Measures a physical or environmental variable |
| DC2 | Meter | Quantifies consumption or flow over time |
| DC3 | Actuator | Performs actions / control (often bidirectional) |
| DC4 | Imaging / AV | Produces image, video, or audio streams |
| DC5 | Gateway / Hub | Aggregates, relays, or transforms other devices’ data |
| DC6 | Mobility / Asset | Associated with moving assets (vehicles, containers) |
| DC7 | Controller / PLC | Industrial or building automation control logic |
| DC8 | Unknown / Hybrid | Cannot be reliably classified yet |

⚠️ Rule: If confidence < threshold → default to DC8.
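A minimal sketch of that fallback rule, assuming the 0.55 confidence threshold defined later in the V1 default params:

device_class_fallback_sketch.py
# Device classes from the Level 1 table; threshold from V1 default params.
DEVICE_CLASSES = {
    "DC1": "Sensor", "DC2": "Meter", "DC3": "Actuator", "DC4": "Imaging / AV",
    "DC5": "Gateway / Hub", "DC6": "Mobility / Asset",
    "DC7": "Controller / PLC", "DC8": "Unknown / Hybrid",
}

CONFIDENCE_THRESHOLD = 0.55

def resolve_device_class(predicted_code: str, confidence: float) -> str:
    """Low confidence never yields a firm class: default to DC8."""
    return predicted_code if confidence >= CONFIDENCE_THRESHOLD else "DC8"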


Level 2: Functional Category (V1-controlled list)

This is suggested (Top-3), never auto-selected.

DC1 — Sensor

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| S1 | Environmental | Temp, humidity, CO₂, air quality |
| S2 | Occupancy / Presence | Motion, people count, desk usage |
| S3 | Condition / State | Vibration, tilt, open/close |
| S4 | Safety | Smoke, gas, leak detection |

DC2 — Meter

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| M1 | Electricity | kWh, voltage, current |
| M2 | Thermal / Heat | Heat flow, temperature delta |
| M3 | Water | Volume, flow |
| M4 | Gas | Volume, pressure |

DC3 — Actuator

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| A1 | HVAC Control | Setpoints, valve positions |
| A2 | Lighting Control | On/off, dimming |
| A3 | Industrial Actuation | Motors, relays |

DC4 — Imaging / AV

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| I1 | Video Camera | High bandwidth, bursty |
| I2 | Audio Sensor | Moderate bandwidth |
| I3 | Multimodal | AV + metadata |

DC5 — Gateway / Hub

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| G1 | IoT Gateway | Many inbound devices |
| G2 | Protocol Bridge | BACnet ↔ IP, Modbus ↔ MQTT |
| G3 | Edge Compute | Pre-aggregation, filtering |

DC6 — Mobility / Asset

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| T1 | Vehicle Telematics | GPS + speed + events |
| T2 | Asset Tracker | Periodic location |
| T3 | Mobile Equipment | Forklifts, machinery |

DC7 — Controller / PLC

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| C1 | BMS Controller | Deterministic cycles |
| C2 | PLC / SCADA | Industrial protocols |
| C3 | Safety Controller | Highly deterministic |

DC8 — Unknown / Hybrid

| Code | Functional Category | Typical Signals |
| --- | --- | --- |
| U1 | Unknown | Insufficient data |
| U2 | Hybrid | Multiple behaviors |

Level 3: Attributes (NOT inferred in V1)

These are derived or user-confirmed later, but the ontology anticipates them.

  • Measurement unit family (energy, temperature, events)
  • Control capability (read-only / write)
  • Safety criticality
  • Data sensitivity
  • ESRS relevance mapping
  • Verification requirements

This separation is intentional.


Inference → Ontology Mapping (V1 rules)

The inference engine only assigns:

Device Class (DCx)
+ Top-3 Functional Categories
+ Confidence score

Everything else is human-confirmed or rule-derived later.


Why this ontology works (strategically)

✔ Small

  • ~30 total functional categories
  • Easy to explain
  • Easy to maintain

✔ Expressive

  • Enough to infer:
    • expected cadence
    • expected entropy
    • expected unit family

✔ ESG-safe

  • No compliance claims
  • No metric assumptions
  • No automatic ESRS mapping

✔ Extensible

Future V2/V3 can add:

  • Industry-specific subclasses
  • Vendor/model layers
  • Carbon passport linkages
  • Product-level digital twins

Critical UX Rule (must be enforced)

Ontology terms must be visible to users.

No hidden magic labels. Users must see:

  • “Environmental Sensor”
  • “Energy Meter”
  • “Gateway”

This builds trust and audit defensibility.


ZAYAZ IoT Category → Behavior Matrix (V1)

How to read this matrix (important)

Each Functional Category defines:

  • Expected behavior envelope (ranges, not absolutes)
  • Typical entropy profile
  • Default risk posture
  • Validation heuristics (soft rules)

Violations do not mean “wrong”; they mean “needs attention”.


Legend

  • Cadence: typical reporting frequency
  • Volume: data size per day (order of magnitude)
  • Entropy (Time / Size / Destination): Low / Medium / High
  • Risk: data misuse or misclassification impact
  • Primary Use: what the data usually represents (not enforced)

DC1 — Sensors

S1 Environmental Sensor

| Attribute | Expected |
| --- | --- |
| Cadence | 30s – 15 min |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Low |
| Primary use | Temperature, air quality, comfort |

Validation heuristics

  • Regular periodicity
  • Stable packet size
  • Single/few endpoints

⚠️ Flag if bursty or MB/hour scale


S2 Occupancy / Presence

| Attribute | Expected |
| --- | --- |
| Cadence | Event-driven + keepalive |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium (privacy) |
| Primary use | Space utilization |

Validation heuristics

  • Irregular events
  • Small payloads

⚠️ Flag if continuous streaming

S3 Condition / State

| Attribute | Expected |
| --- | --- |
| Cadence | 1–60 min |
| Volume | Low |
| Time entropy | Low–Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Low |
| Primary use | Maintenance, wear |

S4 Safety Sensor

| Attribute | Expected |
| --- | --- |
| Cadence | Periodic + rare events |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Alerts, compliance |

⚠️ Any data loss or silence = flag


DC2 — Meters

M1 Electricity Meter

| Attribute | Expected |
| --- | --- |
| Cadence | 1–15 min |
| Volume | Low |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Very low |
| Risk | High |
| Primary use | Scope 2, billing |

Strong invariants

  • Highly regular cadence

⚠️ High entropy = misclassification or gateway

M2 Thermal / Heat Meter

| Attribute | Expected |
| --- | --- |
| Cadence | 1–15 min |
| Volume | Low |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Energy efficiency |

M3 Water Meter

| Attribute | Expected |
| --- | --- |
| Cadence | 15 min – 1 h |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |

M4 Gas Meter

| Attribute | Expected |
| --- | --- |
| Cadence | 15 min – 1 h |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |

DC3 — Actuators

A1 HVAC Control

| Attribute | Expected |
| --- | --- |
| Cadence | Event-driven + polling |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Control, not measurement |

⚠️ Bidirectional traffic expected

A2 Lighting Control

| Attribute | Expected |
| --- | --- |
| Cadence | Event-driven |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |

A3 Industrial Actuation

| Attribute | Expected |
| --- | --- |
| Cadence | Deterministic cycles |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |

DC4 — Imaging / AV

I1 Video Camera

| Attribute | Expected |
| --- | --- |
| Cadence | Continuous or burst |
| Volume | Very high |
| Time entropy | High |
| Size entropy | High |
| Destination entropy | Medium |
| Risk | Very High |
| Primary use | Security, monitoring |

⚠️ If classified as “sensor” → hard contradiction

I2 Audio Sensor

| Attribute | Expected |
| --- | --- |
| Cadence | Event or continuous |
| Volume | Medium |
| Time entropy | High |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |

I3 Multimodal AV

| Attribute | Expected |
| --- | --- |
| Cadence | Burst-heavy |
| Volume | Very high |
| Time entropy | High |
| Size entropy | High |
| Destination entropy | Medium |
| Risk | Very High |

DC5 — Gateway / Hub

G1 IoT Gateway

| Attribute | Expected |
| --- | --- |
| Cadence | Continuous |
| Volume | Medium–High |
| Time entropy | High |
| Size entropy | Medium |
| Destination entropy | High |
| Risk | Medium |
| Primary use | Aggregation |

Key signal: many inbound → one outbound

G2 Protocol Bridge

| Attribute | Expected |
| --- | --- |
| Cadence | Deterministic |
| Volume | Medium |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |

G3 Edge Compute

| Attribute | Expected |
| --- | --- |
| Cadence | Irregular |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | Medium |

DC6 — Mobility / Asset

T1 Vehicle Telematics

| Attribute | Expected |
| --- | --- |
| Cadence | 10s – 5 min |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |

T2 Asset Tracker

| Attribute | Expected |
| --- | --- |
| Cadence | 5 min – hours |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Medium |
| Risk | Medium |

T3 Mobile Equipment

| Attribute | Expected |
| --- | --- |
| Cadence | Event-driven |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |

DC7 — Controller / PLC

C1 BMS Controller

| Attribute | Expected |
| --- | --- |
| Cadence | Deterministic cycles |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |

C2 PLC / SCADA

| Attribute | Expected |
| --- | --- |
| Cadence | Fixed cycle |
| Volume | Medium–High |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |

C3 Safety Controller

| Attribute | Expected |
| --- | --- |
| Cadence | Fixed + watchdog |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Very low |
| Risk | Extreme |

DC8 — Unknown / Hybrid

U1 Unknown

| Attribute | Expected |
| --- | --- |
| Cadence | Undefined |
| Volume | Undefined |
| Entropy | Mixed |
| Risk | Unknown |

U2 Hybrid

| Attribute | Expected |
| --- | --- |
| Cadence | Multiple modes |
| Volume | Variable |
| Entropy | High variance |
| Risk | Medium–High |

How this matrix is used in V1 (very important)

  1. Inference
     • Compare observed behavior → matrix envelope
     • Assign probabilities
  2. Prefill
     • Suggest category + expected cadence + risk
  3. Validation
     • Detect contradictions (camera-like behavior labeled “meter”)
  4. Audit
     • Explain why something was flagged or suggested

What this enables next (without changing V1)

  • Automatic sanity checks
  • Early data poisoning detection
  • Per-category default validation rules
  • Future ESRS relevance suggestions (still manual)

Table set (what’s inside)

  1. iot_device_class — DC1..DC8 (top-level classes)
  2. iot_functional_category — S1..U2 (functional categories linked to device class)
  3. iot_behavior_profile — cadence/volume/entropy/risk expectations per functional category
  4. iot_validation_heuristic — a minimal V1 set of contradiction/anomaly rules (template-style)
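To make the iot_behavior_profile shape concrete, a sketch of two rows transcribed from the behavior matrix above; the column names are assumptions about the table layout.

behavior_profile_rows_sketch.py
# Two illustrative iot_behavior_profile rows; values taken from the matrix.
BEHAVIOR_PROFILES = {
    "S1": {  # Environmental Sensor
        "device_class": "DC1",
        "cadence_type": "periodic",        # 30s – 15 min
        "volume_band": "very_low",
        "entropy_time": "low",
        "entropy_size": "low",
        "entropy_destination": "low",
        "risk_level": "low",
    },
    "M1": {  # Electricity Meter
        "device_class": "DC2",
        "cadence_type": "periodic",        # 1–15 min
        "volume_band": "low",
        "entropy_time": "very_low",
        "entropy_size": "very_low",
        "entropy_destination": "very_low",
        "risk_level": "high",
    },
}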

What’s in the dictionary (V1)

One table with multiple band_types:

  • volume_band (very_low … very_high, plus variable/undefined) with approx bytes/day ranges
  • entropy_band (very_low … high, plus mixed/high-variance) with ordinals
  • risk_level (low … extreme, unknown) with ordinals
  • cadence_type (periodic, event-driven, deterministic, bursty, etc.) with ordinals + has_regular_period

Why ordinals matter

They let you score differences like:

  • expected entropy_time=very_low (10) but observed high (40) → delta 30 (strong contradiction)
  • expected volume_band=low (20) but observed medium (30) → delta 10 (soft mismatch)
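A minimal sketch of such a dictionary; the exact ordinal spacing is an assumption, chosen to be consistent with the two examples above (very_low=10 vs high=40 → delta 30; low=20 vs medium=30 → delta 10).

band_ordinals_sketch.py
# Illustrative ordinals; the real values live in behavior_band_dictionary.
ENTROPY_ORDINALS = {"very_low": 10, "low": 20, "medium": 30, "medium-high": 35, "high": 40}
VOLUME_ORDINALS = {"very_low": 10, "low": 20, "medium": 30, "high": 40, "very_high": 50}

def band_delta(ordinals: dict, expected: str, observed: str) -> int:
    """Ordinal distance between an expected and an observed band."""
    return abs(ordinals[observed] - ordinals[expected])

print(band_delta(ENTROPY_ORDINALS, "very_low", "high"))  # 30 → strong contradiction
print(band_delta(VOLUME_ORDINALS, "low", "medium"))      # 10 → soft mismatch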

MEID-IOT-V1 Scoring Spec (Tiny)

0. Inputs

Observed signals (per source, over an observation window W)

Minimum viable set:

  • bytes_per_day
  • inter_arrival_cv (coefficient of variation for inter-arrival times)
  • periodicity_score (0..1)
  • burstiness_index (0..1)
  • destination_count and/or destination_entropy_shannon
  • packet_size_entropy_shannon

Expected profile (from iot_behavior_profile)

  • cadence_type
  • volume_band
  • entropy_time
  • entropy_size
  • entropy_destination
  • risk_level (used for triage, not classification)

Band dictionary (from behavior_band_dictionary)

  • ordinals for each band type
  • volume band byte ranges

1. Normalize observed signals into observed bands

1.1 Volume band mapping

Use bytes_per_day against behavior_band_dictionary ranges:

  • Find the volume_band whose [min,max) contains bytes_per_day.
  • If none matches → volume_band = variable (or undefined if missing).

1.2 Time entropy band mapping (cheap + robust)

Use periodicity + CV as proxy:

  • If periodicity_score >= 0.85 and inter_arrival_cv <= 0.25 → very_low
  • Else if periodicity_score >= 0.70 and inter_arrival_cv <= 0.50 → low
  • Else if periodicity_score >= 0.50 → medium
  • Else if burstiness_index >= 0.70 → high
  • Else → medium-high

(If you also compute Shannon entropy of inter-arrival bins, you can map via quantiles later; V1 can stay proxy-based.)

1.3 Size entropy band mapping

If you have Shannon entropy of packet sizes (H_size, normalized 0..1):

  • H_size < 0.15 → very_low
  • 0.15–0.30 → low
  • 0.30–0.45 → medium
  • 0.45–0.60 → medium-high
  • >0.60 → high

If missing, infer from packet_size_std/mean similarly.

1.4 Destination entropy band mapping

Pick one:

  • Count-based: map destination_count into bands (1→very_low, 2–3→low, 4–8→medium, 9–20→high, >20→high-variance)
  • Entropy-based: map normalized Shannon destination entropy (0..1) using the same cutoffs as size entropy.
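A sketch of steps 1.1–1.4 in code. The volume byte ranges are placeholder assumptions standing in for behavior_band_dictionary; the time, size, and destination rules transcribe the cutoffs above.

band_mapping_sketch.py
VOLUME_RANGES = {  # band → [min, max) bytes/day; illustrative only
    "very_low": (0, 1e5),
    "low": (1e5, 1e7),
    "medium": (1e7, 1e9),
    "high": (1e9, 1e11),
    "very_high": (1e11, float("inf")),
}

def map_volume_band(bytes_per_day: float) -> str:
    for band, (lo, hi) in VOLUME_RANGES.items():
        if lo <= bytes_per_day < hi:
            return band
    return "variable"  # rule 1.1 fallback: no range matched

def map_time_entropy(periodicity_score: float, inter_arrival_cv: float,
                     burstiness_index: float) -> str:
    # Proxy-based rules from 1.2, evaluated top to bottom.
    if periodicity_score >= 0.85 and inter_arrival_cv <= 0.25:
        return "very_low"
    if periodicity_score >= 0.70 and inter_arrival_cv <= 0.50:
        return "low"
    if periodicity_score >= 0.50:
        return "medium"
    if burstiness_index >= 0.70:
        return "high"
    return "medium-high"

def map_size_entropy(h_size_norm: float) -> str:
    # Normalized Shannon entropy cutoffs from 1.3.
    if h_size_norm < 0.15:
        return "very_low"
    if h_size_norm < 0.30:
        return "low"
    if h_size_norm < 0.45:
        return "medium"
    if h_size_norm < 0.60:
        return "medium-high"
    return "high"

def map_destination_count(destination_count: int) -> str:
    # Count-based option from 1.4.
    if destination_count == 1:
        return "very_low"
    if destination_count <= 3:
        return "low"
    if destination_count <= 8:
        return "medium"
    if destination_count <= 20:
        return "high"
    return "high-variance"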

2. Distance scoring against each candidate category

For each functional category c with expected profile E(c):

2.1 Ordinal deltas

Let ord(type, code) return ordinal from dictionary.

Compute:

  • ΔV = abs(ord(volume_band_obs) - ord(volume_band_exp))
  • ΔT = abs(ord(entropy_time_obs) - ord(entropy_time_exp))
  • ΔS = abs(ord(entropy_size_obs) - ord(entropy_size_exp))
  • ΔD = abs(ord(entropy_destination_obs) - ord(entropy_destination_exp))

2.2 Weighted distance (V1 defaults)

Weights (tuned for “entropy-first” classification):

  • wV=0.30, wT=0.30, wS=0.20, wD=0.20

Raw distance:

dist_raw(c) = wV*ΔV + wT*ΔT + wS*ΔS + wD*ΔD

2.3 Normalize to similarity score

Convert distance to similarity in 0..1:

sim(c) = exp( - dist_raw(c) / τ )

Where τ is a temperature; V1 default τ = 12.

(Why this form: small deltas barely hurt; big contradictions collapse similarity quickly.)
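The same scoring as a runnable sketch, using the V1 default weights and τ; deltas come from the ordinal dictionary.

distance_scoring_sketch.py
import math

WEIGHTS = {"V": 0.30, "T": 0.30, "S": 0.20, "D": 0.20}  # wV, wT, wS, wD
TAU = 12  # V1 default temperature

def similarity(deltas: dict) -> float:
    """Weighted ordinal distance collapsed into a 0..1 similarity."""
    dist_raw = sum(WEIGHTS[axis] * deltas[axis] for axis in WEIGHTS)
    return math.exp(-dist_raw / TAU)

# A perfect match keeps similarity at 1.0; one 30-point contradiction on a
# heavily weighted axis cuts it roughly in half:
print(similarity({"V": 0, "T": 0, "S": 0, "D": 0}))   # 1.0
print(similarity({"V": 0, "T": 30, "S": 0, "D": 0}))  # exp(-9/12) ≈ 0.47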

3. Produce probabilities (Top-K)

For all candidates:

p(c) = sim(c) / Σ sim(all)

Return:

  • top_3 candidates by p(c)
  • confidence = p(top_1) (simple and interpretable)

V1 usage:

  • If confidence < 0.55 → classify as DC8/U1 Unknown (prefill minimal fields only)
  • Else → prefill device class + suggest top-3 functional categories
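A minimal sketch of the normalization and fallback policy:

topk_probabilities_sketch.py
def rank_candidates(sims: dict, threshold: float = 0.55, k: int = 3) -> dict:
    """Normalize similarities to p(c) and apply the V1 fallback policy."""
    total = sum(sims.values())
    ranked = sorted(((c, s / total) for c, s in sims.items()),
                    key=lambda pair: pair[1], reverse=True)
    top_k = ranked[:k]
    confidence = top_k[0][1]  # confidence = p(top_1)
    return {
        "top_k": top_k,
        "confidence": confidence,
        "fallback_applied": confidence < threshold,  # → DC8 / U1, minimal prefill
    }

print(rank_candidates({"M1": 0.71, "M2": 0.10, "G2": 0.06}))
# confidence ≈ 0.82 → usable prefill; below 0.55 it would fall back to DC8/U1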

4. Contradiction severity scoring (separate from classification)

This is what powers flags and trust posture.

4.1 Severity score

Define:

sev(c*) = max(
ΔV / 30,
ΔT / 30,
ΔS / 30,
ΔD / 30
)

Where 30 is a rough “big delta” scale (based on your ordinals spacing).

4.2 Severity thresholds

  • sev < 0.25 → OK
  • 0.25–0.45 → WARN
  • 0.45–0.70 → HIGH
  • >0.70 → CRITICAL

4.3 Hard contradiction rules (V1)

Independent of ordinals, add fast “if-then” checks:

  • If volume_band_obs ∈ {high, very_high} and expected is {very_low, low} → at least HIGH
  • If expected is any meter (M1–M4) and entropy_destination_obs ∈ {medium, high, high-variance} → at least HIGH
  • If expected is camera (I1/I3) but volume is very_low/low → at least WARN (maybe metadata-only camera)

These map cleanly to the iot_validation_heuristic table later.
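A sketch of the severity scale from 4.1/4.2 plus the three hard rules from 4.3, using the band names from this spec:

contradiction_severity_sketch.py
DELTA_SCALE = 30  # rough "big delta" scale from 4.1

def severity_score(deltas: dict) -> float:
    return max(delta / DELTA_SCALE for delta in deltas.values())

def severity_label(sev: float) -> str:
    if sev < 0.25:
        return "OK"
    if sev < 0.45:
        return "WARN"
    if sev < 0.70:
        return "HIGH"
    return "CRITICAL"

def hard_rule_flags(expected_category: str, expected: dict, observed: dict) -> list:
    """Fast if-then checks; each returns the minimum severity to enforce."""
    flags = []
    if observed["volume"] in {"high", "very_high"} and \
       expected["volume"] in {"very_low", "low"}:
        flags.append(("HC1", "HIGH"))
    if expected_category in {"M1", "M2", "M3", "M4"} and \
       observed["destination"] in {"medium", "high", "high-variance"}:
        flags.append(("HC2", "HIGH"))
    if expected_category in {"I1", "I3"} and observed["volume"] in {"very_low", "low"}:
        flags.append(("HC3", "WARN"))  # maybe metadata-only camera
    return flags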

5. Prefill guardrails (V1 policy)

  • Always show: Top-3 + confidence + “Why” (evidence = band comparisons + raw observed stats)
  • Never auto-lock.
  • Never allow unconfirmed labels to drive compliance outputs.

6. Minimal “Why” explanation payload (for audit + UX)

Return per candidate c:

return-per-candidate.json
{
  "category": "M1",
  "p": 0.82,
  "distance": 6.0,
  "deltas": {"ΔV": 0, "ΔT": 5, "ΔS": 0, "ΔD": 0},
  "observed_bands": {"volume": "low", "time": "very_low", "size": "very_low", "dest": "very_low"},
  "expected_bands": {"volume": "low", "time": "very_low", "size": "very_low", "dest": "very_low"},
  "flags": [{"severity": "WARN", "reason": "..."}]
}

This is enough to be transparent without exposing payload contents.


V1 Default Params (so teams don’t debate forever)

  • Observation window W = 24h (fallback 6h)
  • τ = 12
  • weights: wV=0.30, wT=0.30, wS=0.20, wD=0.20
  • confidence threshold: 0.55 for “usable prefill”

As a single config object

To be consumed by MEID-IOT-V1 directly.

meid_iot_v1_scoring_config.json
{
  "config_name": "MEID_IOT_V1_SCORING",
  "version": "1.0.0",
  "description": "Scoring and contradiction spec for IoT device inference using behavior bands.",
  "observation_window": {
    "primary_hours": 24,
    "fallback_hours": 6
  },
  "band_mapping": {
    "volume_band": {
      "source_metric": "bytes_per_day",
      "fallback": "variable"
    },
    "entropy_time": {
      "method": "proxy",
      "rules": [
        { "if": "periodicity_score >= 0.85 and inter_arrival_cv <= 0.25", "band": "very_low" },
        { "if": "periodicity_score >= 0.70 and inter_arrival_cv <= 0.50", "band": "low" },
        { "if": "periodicity_score >= 0.50", "band": "medium" },
        { "if": "burstiness_index >= 0.70", "band": "high" },
        { "else": "medium-high" }
      ]
    },
    "entropy_size": {
      "method": "shannon_normalized",
      "thresholds": [
        { "max": 0.15, "band": "very_low" },
        { "max": 0.3, "band": "low" },
        { "max": 0.45, "band": "medium" },
        { "max": 0.6, "band": "medium-high" },
        { "else": "high" }
      ]
    },
    "entropy_destination": {
      "method": "shannon_or_count",
      "thresholds": [
        { "max": 0.15, "band": "very_low" },
        { "max": 0.3, "band": "low" },
        { "max": 0.5, "band": "medium" },
        { "max": 0.7, "band": "high" },
        { "else": "high-variance" }
      ]
    }
  },
  "distance_scoring": {
    "weights": {
      "volume": 0.3,
      "entropy_time": 0.3,
      "entropy_size": 0.2,
      "entropy_destination": 0.2
    },
    "temperature_tau": 12,
    "similarity_function": "exp(-distance/tau)"
  },
  "classification_policy": {
    "top_k": 3,
    "confidence_threshold": 0.55,
    "fallback_category": "U1",
    "fallback_device_class": "DC8"
  },
  "severity_scoring": {
    "delta_scale": 30,
    "thresholds": {
      "ok": 0.25,
      "warn": 0.45,
      "high": 0.7,
      "critical": 1.0
    }
  },
  "hard_contradiction_rules": [
    {
      "id": "HC1",
      "if": "observed.volume_band in ['high','very_high'] and expected.volume_band in ['very_low','low']",
      "min_severity": "high",
      "message": "Observed volume contradicts expected low-volume behavior."
    },
    {
      "id": "HC2",
      "if": "expected.category in ['M1','M2','M3','M4'] and observed.entropy_destination in ['medium','high','high-variance']",
      "min_severity": "high",
      "message": "Meter expected deterministic single-endpoint behavior."
    },
    {
      "id": "HC3",
      "if": "expected.category in ['I1','I3'] and observed.volume_band in ['very_low','low']",
      "min_severity": "warn",
      "message": "Imaging device shows unusually low data volume."
    }
  ],
  "ui_policy": {
    "show_top_k": true,
    "show_confidence": true,
    "show_evidence": true,
    "lock_fields": false
  },
  "governance": {
    "auto_learning": false,
    "override_logging": true,
    "versioned": true,
    "requires_confirmation_for_reporting": true
  }
}

What this config gives us (why it’s strong)

This file is intentionally operational, not academic. It centralizes all tunables so we can:

  • Adjust behavior per tenant / sector without code changes
  • Keep inference deterministic and explainable
  • Version and audit scoring logic like any other governed artifact
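A minimal loading sketch, assuming the config ships as the JSON file above; the sanity checks are illustrative, not a mandated validation suite.

load_scoring_config_sketch.py
import json

with open("meid_iot_v1_scoring_config.json") as fh:
    cfg = json.load(fh)

weights = cfg["distance_scoring"]["weights"]
assert abs(sum(weights.values()) - 1.0) < 1e-9, "distance weights must sum to 1"
assert 0.0 < cfg["classification_policy"]["confidence_threshold"] < 1.0

# Stamp this pair into every inference result for forensic traceability:
print(f'{cfg["config_name"]}@{cfg["version"]}')  # MEID_IOT_V1_SCORING@1.0.0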

Key sections (how they’re used)

  1. observation_window
     • Standardizes inference windows (24h / 6h fallback)
     • Prevents “short window noise” from polluting classification
  2. band_mapping
     • Canonical logic for mapping raw signals → bands
     • Keeps entropy logic out of code and in policy
     • Easy to evolve (e.g. replace proxies with Shannon later)
  3. distance_scoring
     • Weighted ordinal distance → similarity
     • Temperature (tau) gives you graceful decay instead of hard cutoffs
     • This is the heart of probabilistic inference
  4. classification_policy
     • Enforces conservative behavior:
       • Top-3 only
       • Explicit confidence threshold
       • Safe fallback to DC8 / U1
  5. severity_scoring
     • Independent contradiction severity scale
     • Decouples “best guess” from “trustworthiness”
  6. hard_contradiction_rules
     • Fast, deterministic safety rails
     • Auditor-friendly because they are explicit and readable
  7. ui_policy
     • Guarantees no dark patterns
     • Aligns with ZAYAZ trust-first UX
  8. governance
     • Explicitly disables auto-learning
     • Forces human confirmation before reporting
     • Makes this safe for CSRD/ESRS environments

Architectural note (important)

This config should be treated as:

Reference data + policy, not application config

Meaning:

  • Version it
  • Sign it (later)
  • Reference the version in every inference result:
"scoring_config_version": "MEID_IOT_V1_SCORING@1.0.0"

That gives us full forensic traceability.


MEID-IOT-V1 Explain Payload Schema

1. Envelope (what every inference event returns)

payload-schema-example.json
{
  "inference_id": "uuid",
  "source_id": "uuid-or-stable-id",
  "observed_window": {
    "start_at": "2026-01-08T00:00:00Z",
    "end_at": "2026-01-09T00:00:00Z",
    "duration_s": 86400
  },
  "model": {
    "engine": "MEID_IOT_V1",
    "engine_version": "1.0.0",
    "scoring_config": {
      "name": "MEID_IOT_V1_SCORING",
      "version": "1.0.0"
    }
  },
  "status": {
    "result_quality": "OK|LOW_DATA|DEGRADED|ERROR",
    "confidence": 0.82,
    "fallback_applied": false
  },
  "prediction": {
    "device_class": { "code": "DC2", "label": "Meter", "p": 0.90 },
    "top_k_functional_categories": [
      { "code": "M1", "label": "Electricity", "p": 0.82, "score": 0.71, "distance": 6.0 },
      { "code": "M2", "label": "Thermal / Heat", "p": 0.11, "score": 0.22, "distance": 18.0 },
      { "code": "G2", "label": "Protocol Bridge", "p": 0.07, "score": 0.14, "distance": 22.0 }
    ]
  },
  "explain": {
    "observed_features": { },
    "observed_bands": { },
    "expected_bands_by_candidate": { },
    "deltas_by_candidate": { },
    "flags": [ ]
  },
  "provenance": {
    "data_sources": [
      { "type": "NETFLOW", "ref": "flow_log_id_or_bucket_key" }
    ],
    "evidence_refs": [
      { "type": "FEATURE_SNAPSHOT", "ref": "hash-or-object-key", "hash": "sha256:..." }
    ]
  }
}

Why this envelope works

  • Prediction is separated from explainability, so UI can render quickly.
  • You can store the entire object for audit, but also index just the prediction fields.
  • score and distance are included for debugging + model tuning (optional to show in UI).

2. explain.observed_features (raw, privacy-safe stats)

Keep this minimal, numeric, and non-PII:

observed-feature-example.json
{
  "bytes_per_day": 420000,
  "avg_bytes_per_min": 291,
  "packet_size_entropy_norm": 0.12,
  "destination_entropy_norm": 0.05,
  "periodicity_score": 0.92,
  "inter_arrival_cv": 0.18,
  "burstiness_index": 0.08,
  "destination_count": 1
}

Optional extensions (still safe):

  • tls_seen: true/false
  • ports_top: [443, 8883]
  • sni_domains_top: hashed/normalized (avoid raw domain unless allowed)

3. explain.observed_bands (derived, in the expectations format)

observed-bands-example.json
{
  "volume_band": "low",
  "entropy_time": "very_low",
  "entropy_size": "very_low",
  "entropy_destination": "very_low",
  "cadence_type_hint": "periodic"
}

These are exactly the values the matrix expects and the scoring uses.


4. Expected bands per candidate (Top-K only)

Only include Top-K candidates to keep payload small:

expected-bands-by-candidate.json
{
  "M1": {
    "volume_band": "low",
    "entropy_time": "very_low",
    "entropy_size": "very_low",
    "entropy_destination": "very_low",
    "cadence_type": "periodic"
  },
  "M2": {
    "volume_band": "low",
    "entropy_time": "very_low",
    "entropy_size": "very_low",
    "entropy_destination": "low",
    "cadence_type": "periodic"
  }
}

5. Deltas (this is the “why” in one line)

deltas-by-candidate.json
{
  "M1": { "ΔV": 0, "ΔT": 0, "ΔS": 0, "ΔD": 0 },
  "M2": { "ΔV": 0, "ΔT": 0, "ΔS": 0, "ΔD": 10 }
}

This enables a clean UI:

  • “Matches expected cadence and low entropy”
  • “Slight mismatch in destination behavior”

6. Flags (contradictions & risk signals)

This is the trust layer, not the classifier:

flags.json
[
  {
    "id": "HC2",
    "severity": "HIGH",
    "scope": "candidate:M1",
    "message": "Meter expected deterministic single-endpoint behavior.",
    "evidence": {
      "observed": { "entropy_destination": "high" },
      "expected": { "entropy_destination": "very_low" }
    }
  }
]

Also support general flags:

  • scope: "source" (applies regardless of category)
  • scope: "candidate:<code>"

V1 Storage & Indexing recommendation (important)

Store two representations:

A. Full JSON (append-only)

  • For audit, replay, verifier APIs
  • Content-addressable evidence references

B. Indexed columns (for fast queries)

  • source_id, inference_id, created_at
  • predicted_device_class, top1_category, confidence
  • max_severity, flag_count, fallback_applied
  • scoring_config_version, engine_version
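A hedged sketch of deriving representation B from representation A; the field paths follow the envelope schema above, and the severity ordering is an assumption.

index_columns_sketch.py
SEVERITY_ORDER = {"OK": 0, "WARN": 1, "HIGH": 2, "CRITICAL": 3}

def index_row(envelope: dict) -> dict:
    """Flatten a full inference envelope into the indexed columns."""
    flags = envelope["explain"]["flags"]
    top1 = envelope["prediction"]["top_k_functional_categories"][0]
    return {
        "source_id": envelope["source_id"],
        "inference_id": envelope["inference_id"],
        "predicted_device_class": envelope["prediction"]["device_class"]["code"],
        "top1_category": top1["code"],
        "confidence": envelope["status"]["confidence"],
        "fallback_applied": envelope["status"]["fallback_applied"],
        "flag_count": len(flags),
        "max_severity": max((f["severity"] for f in flags),
                            key=SEVERITY_ORDER.get, default=None),
        "scoring_config_version": envelope["model"]["scoring_config"]["version"],
        "engine_version": envelope["model"]["engine_version"],
    }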

Minimal JSON Schema (practical constraints)

If strict validation is wanted, enforce:

  • required: inference_id, source_id, observed_window, model, status, prediction, explain
  • limit: Top-K must match config
  • enums for severities and band values from behavior_band_dictionary
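A minimal validation sketch using the third-party Python jsonschema package (one option, not a mandated dependency); the schema is deliberately partial and the enums abbreviated.

envelope_schema_validation_sketch.py
import jsonschema

ENVELOPE_SCHEMA = {
    "type": "object",
    "required": ["inference_id", "source_id", "observed_window",
                 "model", "status", "prediction", "explain"],
    "properties": {
        "status": {
            "type": "object",
            "properties": {
                "result_quality": {"enum": ["OK", "LOW_DATA", "DEGRADED", "ERROR"]},
            },
        },
        "prediction": {
            "type": "object",
            "properties": {
                # maxItems must match classification_policy.top_k in the config
                "top_k_functional_categories": {"type": "array", "maxItems": 3},
            },
        },
    },
}

def validate_envelope(envelope: dict) -> None:
    jsonschema.validate(instance=envelope, schema=ENVELOPE_SCHEMA)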

UI rendering rules (tiny but powerful)

For each inferred field:

  • show label + (AI, 82%)
  • “Why?” expands:
    • observed bands
    • expected bands for top1
    • 1–3 deltas
    • flags (if any)

This makes the assistant trust-mediating, not “guessing”.



