IoT-DI
IoT Device Inference
1. ZAYAZ IoT Device Inference – V1 Sketch
V1 Design Goal (non-negotiable)
Reduce IoT data-source onboarding friction by ≥50% while increasing provenance transparency.
Not “perfect classification”. Not “magic AI”. Faster, safer, auditable onboarding.
1. V1 Scope (what we deliberately include / exclude)
✅ Included in V1
- Probabilistic device category inference (not model-level)
- Entropy-based + metadata-based features only
- Prefill of limited, low-risk fields
- Explicit user confirmation / override
- Full provenance & audit trail
- Stateless inference engine (easy to scale)
❌ Explicitly excluded from V1
- Deep packet inspection
- Automatic compliance mapping
- Autonomous learning loops
- “Black box” ML
- Any inference driving reporting without confirmation
This keeps V1 safe, fast, and credible.
2. V1 Target Outputs (what the system actually produces)
Primary Output
IoT Device Profile – Draft
Fields prefilled with confidence:
| Field | Prefill? | Notes |
|---|---|---|
| Device category | ✅ | e.g. Sensor / Meter / Camera / Gateway |
| Sub-category | ⚠️ (Top-3) | e.g. Temperature / Energy / Occupancy |
| Expected data cadence | ✅ | periodic / event-driven / bursty |
| Expected unit family | ⚠️ | energy / environmental / binary events |
| Data risk flag | ✅ | low / medium / anomalous |
| AI confidence | ✅ | 0–1 |
| Evidence link | ✅ | mandatory |
Everything else remains manual in V1.
3. Core Component: MEID-IOT-V1 (Micro-Engine)
Purpose
Generate a provenance hypothesis for an unknown IoT data stream.
Inputs (minimal & realistic)
- Flow metadata (NetFlow-like)
- Timestamped packet sizes
- Destination domains / IPs
- Protocol/port hints (no payload parsing)
- Optional MAC OUI (if available)
Feature Set (V1)
Entropy Features
- Payload size entropy
- Inter-arrival time entropy
- Destination entropy
- Session duration variance
Structural Signals
- Periodicity score
- Burstiness index
- Endpoint stability score
- Avg bytes / minute
Light Identity Hints
- MAC OUI → vendor family (optional)
- Domain pattern match (vendor clouds)
⚠️ All features are non-PII and privacy-safe.
4. Inference Logic (V1 = transparent, not fancy)
Model choice (recommended)
- Rule-weighted Bayesian classifier
- Human-readable priors
- Easy to tune
- Explainable
Example (simplified):
IF low time entropy
AND low size entropy
AND single stable endpoint
→ P(sensor) ↑↑
IF high size variance
AND burst traffic
AND high destination entropy
→ P(camera/gateway) ↑↑
Output
{
"predictions": [
{"label": "Environmental Sensor", "p": 0.82},
{"label": "Energy Meter", "p": 0.11},
{"label": "Gateway", "p": 0.07}
],
"confidence": 0.82,
"model_version": "MEID-IOT-V1.0"
}
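A minimal sketch of how such a rule-weighted classifier could look. The feature names, thresholds, and multiplicative weights below are illustrative placeholders, not production values:

```python
def score_labels(features: dict) -> dict:
    """Return normalized label probabilities from human-readable rule weights."""
    # Uniform priors over an illustrative candidate set.
    scores = {"Environmental Sensor": 1.0, "Energy Meter": 1.0, "Gateway": 1.0}

    low = lambda x: x < 0.3
    high = lambda x: x > 0.7

    # IF low time entropy AND low size entropy AND single stable endpoint → P(sensor) ↑↑
    if low(features["time_entropy"]) and low(features["size_entropy"]) \
            and features["endpoint_count"] == 1:
        scores["Environmental Sensor"] *= 4.0
        scores["Energy Meter"] *= 2.0  # meters share the low-entropy signature

    # IF high size variance AND burst traffic AND high destination entropy → P(camera/gateway) ↑↑
    if high(features["size_entropy"]) and features["bursty"] \
            and high(features["destination_entropy"]):
        scores["Gateway"] *= 4.0

    total = sum(scores.values())
    return {label: round(s / total, 2) for label, s in scores.items()}
```

Because every rule is a readable `if`, the scoring stays tunable and explainable, which is the point of the V1 model choice.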
5. FOGE Integration: Prefill with Guardrails
UI Behaviour (critical)
- Fields show “AI-suggested” badge
- Confidence shown inline
- “Why?” button reveals evidence summary
- “Change” opens dropdown with Top-3 + manual search
Hard Rule
Inference may prefill forms, but never auto-lock fields.
This aligns with ZAAM trust principles.
6. Override = First-Class Data Asset
Every override creates:
Inference Event
→ User Override
→ Confirmed Device Profile
Stored with:
- Old prediction
- New label
- Confidence delta
- Timestamp
- Tenant context
This becomes future training data, but:
- Not auto-used
- Only via governed retraining cycle
7. Data Model Additions (SSSR-aligned)
New entity (V1 minimal)
iot_source_inference
- inference_id
- source_id
- predicted_labels[]
- confidence
- evidence_refs[]
- model_version
- created_at
Extend existing source entity
- confirmed_device_category
- confirmation_method = MANUAL | AI_ASSISTED
- confirmation_timestamp
This is enough for audit and scale.
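As an illustration, the two entities above could be shaped like this in application code; field names follow the spec, while the concrete types and defaults are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IotSourceInference:
    """V1 minimal inference record (types are illustrative)."""
    inference_id: str
    source_id: str
    predicted_labels: list[str]
    confidence: float
    evidence_refs: list[str]
    model_version: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class SourceConfirmation:
    """Extension fields on the existing source entity."""
    confirmed_device_category: str
    confirmation_method: str  # "MANUAL" | "AI_ASSISTED"
    confirmation_timestamp: datetime
```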
8. KPIs for V1 (decide success early)
The following should be measured from day one:
| KPI | Target |
|---|---|
| Avg onboarding time | −50% |
| % AI-assisted registrations | ≥60% |
| Override rate | 20–40% (healthy!) |
| Post-confirmation correction rate | <5% |
| Auditor objections | 0 |
⚠️ High override rate is good early — it means engagement and learning.
9. V1 Risks & How We Neutralize Them
| Risk | Mitigation |
|---|---|
| “AI guessing” distrust | Confidence + evidence mandatory |
| Wrong assumptions | No auto-use in reporting |
| Overfitting early | Simple rules, no auto-learning |
| Security concerns | Metadata-only inference |
| Scope creep | Explicit V1 exclusions |
10. Why this V1 is strategically sound
- Delivers immediate operational value
- Strengthens ZAYAZ’s data provenance story
- Creates labeled data for future intelligence
- Does not threaten compliance credibility
- Fits perfectly into MICE + FOGE + ZAAM
ZAYAZ IoT Device Category Ontology — V1
Design principles (important to state explicitly)
- Behavior-first, not vendor-first
- Category ≠ Metric ≠ Compliance use
- Probabilistic inference allowed; reporting use is not
- Every category must imply expectations:
  - cadence
  - data shape
  - unit family
  - risk profile
Level 1: Device Class (Top-Level)
This is the only level inferred automatically in V1.
| Code | Device Class | Definition |
|---|---|---|
| DC1 | Sensor | Measures a physical or environmental variable |
| DC2 | Meter | Quantifies consumption or flow over time |
| DC3 | Actuator | Performs actions / control (often bidirectional) |
| DC4 | Imaging / AV | Produces image, video, or audio streams |
| DC5 | Gateway / Hub | Aggregates, relays, or transforms other devices’ data |
| DC6 | Mobility / Asset | Associated with moving assets (vehicles, containers) |
| DC7 | Controller / PLC | Industrial or building automation control logic |
| DC8 | Unknown / Hybrid | Cannot be reliably classified yet |
⚠️ Rule: If confidence < threshold → default to DC8.
Level 2: Functional Category (V1-controlled list)
This is suggested (Top-3), never auto-selected.
DC1 — Sensor
| Code | Functional Category | Typical Signals |
|---|---|---|
| S1 | Environmental | Temp, humidity, CO₂, air quality |
| S2 | Occupancy / Presence | Motion, people count, desk usage |
| S3 | Condition / State | Vibration, tilt, open/close |
| S4 | Safety | Smoke, gas, leak detection |
DC2 — Meter
| Code | Functional Category | Typical Signals |
|---|---|---|
| M1 | Electricity | kWh, voltage, current |
| M2 | Thermal / Heat | Heat flow, temperature delta |
| M3 | Water | Volume, flow |
| M4 | Gas | Volume, pressure |
DC3 — Actuator
| Code | Functional Category | Typical Signals |
|---|---|---|
| A1 | HVAC Control | Setpoints, valve positions |
| A2 | Lighting Control | On/off, dimming |
| A3 | Industrial Actuation | Motors, relays |
DC4 — Imaging / AV
| Code | Functional Category | Typical Signals |
|---|---|---|
| I1 | Video Camera | High bandwidth, bursty |
| I2 | Audio Sensor | Moderate bandwidth |
| I3 | Multimodal | AV + metadata |
DC5 — Gateway / Hub
| Code | Functional Category | Typical Signals |
|---|---|---|
| G1 | IoT Gateway | Many inbound devices |
| G2 | Protocol Bridge | BACnet ↔ IP, Modbus ↔ MQTT |
| G3 | Edge Compute | Pre-aggregation, filtering |
DC6 — Mobility / Asset
| Code | Functional Category | Typical Signals |
|---|---|---|
| T1 | Vehicle Telematics | GPS + speed + events |
| T2 | Asset Tracker | Periodic location |
| T3 | Mobile Equipment | Forklifts, machinery |
DC7 — Controller / PLC
| Code | Functional Category | Typical Signals |
|---|---|---|
| C1 | BMS Controller | Deterministic cycles |
| C2 | PLC / SCADA | Industrial protocols |
| C3 | Safety Controller | Highly deterministic |
DC8 — Unknown / Hybrid
| Code | Functional Category | Typical Signals |
|---|---|---|
| U1 | Unknown | Insufficient data |
| U2 | Hybrid | Multiple behaviors |
Level 3: Attributes (NOT inferred in V1)
These are derived or user-confirmed later, but the ontology anticipates them.
- Measurement unit family (energy, temperature, events)
- Control capability (read-only / write)
- Safety criticality
- Data sensitivity
- ESRS relevance mapping
- Verification requirements
This separation is intentional.
Inference → Ontology Mapping (V1 rules)
The inference engine only assigns:
Device Class (DCx)
+ Top-3 Functional Categories
+ Confidence score
Everything else is human-confirmed or rule-derived later.
Why this ontology works (strategically)
✔ Small
- ~30 total functional categories
- Easy to explain
- Easy to maintain
✔ Expressive
- Enough to infer:
  - expected cadence
  - expected entropy
  - expected unit family
✔ ESG-safe
- No compliance claims
- No metric assumptions
- No automatic ESRS mapping
✔ Extensible
Future V2/V3 can add:
- Industry-specific subclasses
- Vendor/model layers
- Carbon passport linkages
- Product-level digital twins
Critical UX Rule (must be enforced)
Ontology terms must be visible to users.
No hidden magic labels. Users must see:
- “Environmental Sensor”
- “Energy Meter”
- “Gateway”
This builds trust and audit defensibility.
ZAYAZ IoT Category → Behavior Matrix (V1)
How to read this matrix (important)
Each Functional Category defines:
- Expected behavior envelope (ranges, not absolutes)
- Typical entropy profile
- Default risk posture
- Validation heuristics (soft rules)
Violations do not mean “wrong”. They mean “needs attention”.
Legend
- Cadence: typical reporting frequency
- Volume: data size per day (order of magnitude)
- Entropy (Time / Size / Destination): Low / Medium / High
- Risk: data misuse or misclassification impact
- Primary Use: what the data usually represents (not enforced)
DC1 — Sensors
S1 Environmental Sensor
| Attribute | Expected |
|---|---|
| Cadence | 30s – 15 min |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Low |
| Primary use | Temperature, air quality, comfort |
Validation heuristics
- Regular periodicity
- Stable packet size
- Single/few endpoints ⚠️ Flag if bursty or MB/hour scale
S2 Occupancy / Presence
| Attribute | Expected |
|---|---|
| Cadence | Event-driven + keepalive |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium (privacy) |
| Primary use | Space utilization |
Validation heuristics
- Irregular events
- Small payloads ⚠️ Flag if continuous streaming
S3 Condition / State
| Attribute | Expected |
|---|---|
| Cadence | 1–60 min |
| Volume | Low |
| Time entropy | Low–Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Low |
| Primary use | Maintenance, wear |
S4 Safety Sensor
| Attribute | Expected |
|---|---|
| Cadence | Periodic + rare events |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Alerts, compliance |
⚠️ Any data loss or silence = flag
DC2 — Meters
M1 Electricity Meter
| Attribute | Expected |
|---|---|
| Cadence | 1–15 min |
| Volume | Low |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Very low |
| Risk | High |
| Primary use | Scope 2, billing |
Strong invariants
- Highly regular cadence ⚠️ High entropy = misclassification or gateway
M2 Thermal / Heat Meter
| Attribute | Expected |
|---|---|
| Cadence | 1–15 min |
| Volume | Low |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Energy efficiency |
M3 Water Meter
| Attribute | Expected |
|---|---|
| Cadence | 15 min – 1 h |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |
M4 Gas Meter
| Attribute | Expected |
|---|---|
| Cadence | 15 min – 1 h |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
DC3 — Actuators
A1 HVAC Control
| Attribute | Expected |
|---|---|
| Cadence | Event-driven + polling |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Control, not measurement |
⚠️ Bidirectional traffic expected
A2 Lighting Control
| Attribute | Expected |
|---|---|
| Cadence | Event-driven |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |
A3 Industrial Actuation
| Attribute | Expected |
|---|---|
| Cadence | Deterministic cycles |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |
DC4 — Imaging / AV
I1 Video Camera
| Attribute | Expected |
|---|---|
| Cadence | Continuous or burst |
| Volume | Very high |
| Time entropy | High |
| Size entropy | High |
| Destination entropy | Medium |
| Risk | Very High |
| Primary use | Security, monitoring |
⚠️ If classified as “sensor” → hard contradiction
I2 Audio Sensor
| Attribute | Expected |
|---|---|
| Cadence | Event or continuous |
| Volume | Medium |
| Time entropy | High |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |
I3 Multimodal AV
| Attribute | Expected |
|---|---|
| Cadence | Burst-heavy |
| Volume | Very high |
| Time entropy | High |
| Size entropy | High |
| Destination entropy | Medium |
| Risk | Very High |
DC5 — Gateway / Hub
G1 IoT Gateway
| Attribute | Expected |
|---|---|
| Cadence | Continuous |
| Volume | Medium–High |
| Time entropy | High |
| Size entropy | Medium |
| Destination entropy | High |
| Risk | Medium |
| Primary use | Aggregation |
Key signal: many inbound → one outbound
G2 Protocol Bridge
| Attribute | Expected |
|---|---|
| Cadence | Deterministic |
| Volume | Medium |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |
G3 Edge Compute
| Attribute | Expected |
|---|---|
| Cadence | Irregular |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | Medium |
DC6 — Mobility / Asset
T1 Vehicle Telematics
| Attribute | Expected |
|---|---|
| Cadence | 10s – 5 min |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |
T2 Asset Tracker
| Attribute | Expected |
|---|---|
| Cadence | 5 min – hours |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Medium |
| Risk | Medium |
T3 Mobile Equipment
| Attribute | Expected |
|---|---|
| Cadence | Event-driven |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |
DC7 — Controller / PLC
C1 BMS Controller
| Attribute | Expected |
|---|---|
| Cadence | Deterministic cycles |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |
C2 PLC / SCADA
| Attribute | Expected |
|---|---|
| Cadence | Fixed cycle |
| Volume | Medium–High |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |
C3 Safety Controller
| Attribute | Expected |
|---|---|
| Cadence | Fixed + watchdog |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Very low |
| Risk | Extreme |
DC8 — Unknown / Hybrid
U1 Unknown
| Attribute | Expected |
|---|---|
| Cadence | Undefined |
| Volume | Undefined |
| Entropy | Mixed |
| Risk | Unknown |
U2 Hybrid
| Attribute | Expected |
|---|---|
| Cadence | Multiple modes |
| Volume | Variable |
| Entropy | High variance |
| Risk | Medium–High |
How this matrix is used in V1 (very important)
- Inference
  - Compare observed behavior → matrix envelope
  - Assign probabilities
- Prefill
  - Suggest category + expected cadence + risk
- Validation
  - Detect contradictions (camera-like behavior labeled “meter”)
- Audit
  - Explain why something was flagged or suggested
What this enables next (without changing V1)
- Automatic sanity checks
- Early data poisoning detection
- Per-category default validation rules
- Future ESRS relevance suggestions (still manual)
Table set (what’s inside)
- iot_device_class — DC1..DC8 (top-level classes)
- iot_functional_category — S1..U2 (functional categories linked to device class)
- iot_behavior_profile — cadence/volume/entropy/risk expectations per functional category
- iot_validation_heuristic — a minimal V1 set of contradiction/anomaly rules (template-style)
What’s in the dictionary (V1)
One table with multiple band_types:
- volume_band (very_low … very_high, plus variable/undefined) with approx bytes/day ranges
- entropy_band (very_low … high, plus mixed/high-variance) with ordinals
- risk_level (low … extreme, unknown) with ordinals
- cadence_type (periodic, event-driven, deterministic, bursty, etc.) with ordinals + has_regular_period
Why ordinals matter
They let you score differences like:
- expected entropy_time=very_low (10) but observed high (40) → delta 30 (strong contradiction)
- expected volume_band=low (20) but observed medium (30) → delta 10 (soft mismatch)
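The ordinal-delta calculation can be sketched in a few lines, assuming the 10/20/30/40 spacing implied by the examples above:

```python
# Assumed ordinal spacing for the band dictionary (illustrative values).
ORDINALS = {"very_low": 10, "low": 20, "medium": 30, "high": 40}

def band_delta(expected: str, observed: str) -> int:
    """Absolute ordinal distance between an expected and an observed band."""
    return abs(ORDINALS[observed] - ORDINALS[expected])

# The two examples from the text:
assert band_delta("very_low", "high") == 30  # strong contradiction
assert band_delta("low", "medium") == 10     # soft mismatch
```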
MEID-IOT-V1 Scoring Spec (Tiny)
0. Inputs
Observed signals (per source, over an observation window W)
Minimum viable set:
- bytes_per_day
- inter_arrival_cv (coefficient of variation for inter-arrival times)
- periodicity_score (0..1)
- burstiness_index (0..1)
- destination_count and/or destination_entropy_shannon
- packet_size_entropy_shannon
Expected profile (from iot_behavior_profile)
- cadence_type
- volume_band
- entropy_time
- entropy_size
- entropy_destination
- risk_level (used for triage, not classification)
Band dictionary (from behavior_band_dictionary)
- ordinals for each band type
- volume band byte ranges
1.0 Normalize observed signals into observed bands
1.1 Volume band mapping
Use bytes_per_day against behavior_band_dictionary ranges:
- Find the volume_band whose [min,max) contains bytes_per_day.
- If none matches → volume_band = variable (or undefined if missing).
1.2 Time entropy band mapping (cheap + robust)
Use periodicity + CV as proxy:
If periodicity_score >= 0.85 and inter_arrival_cv <= 0.25 → very_low
Else if periodicity_score >= 0.70 and inter_arrival_cv <= 0.50 → low
Else if periodicity_score >= 0.50 → medium
Else if burstiness_index >= 0.70 → high
Else → medium-high
(If you also compute Shannon entropy of inter-arrival bins, you can map via quantiles later; V1 can stay proxy-based.)
1.3 Size entropy band mapping
If you have Shannon entropy of packet sizes (H_size, normalized 0..1):
H_size < 0.15 → very_low
0.15–0.30 → low
0.30–0.45 → medium
0.45–0.60 → medium-high
> 0.60 → high
If missing, infer from packet_size_std/mean similarly.
1.4 Destination entropy band mapping
Pick one:
- Count-based: map destination_count into bands (1→very_low, 2–3→low, 4–8→medium, 9–20→high, >20→high-variance)
- Entropy-based: map normalized Shannon destination entropy (0..1) using the same cutoffs as size entropy.
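The mappings in 1.1–1.4 could be sketched as follows; the bytes/day ranges are placeholders standing in for the real behavior_band_dictionary values:

```python
# Placeholder bytes/day ranges: (band, min_inclusive, max_exclusive).
VOLUME_BANDS = [
    ("very_low", 0, 1e5),
    ("low", 1e5, 1e6),
    ("medium", 1e6, 1e8),
    ("high", 1e8, 1e10),
    ("very_high", 1e10, float("inf")),
]

def volume_band(bytes_per_day: float) -> str:
    for band, lo, hi in VOLUME_BANDS:
        if lo <= bytes_per_day < hi:
            return band
    return "variable"  # 1.1 fallback when no range matches

def time_entropy_band(periodicity: float, cv: float, burstiness: float) -> str:
    # Proxy rules from 1.2, evaluated in order.
    if periodicity >= 0.85 and cv <= 0.25:
        return "very_low"
    if periodicity >= 0.70 and cv <= 0.50:
        return "low"
    if periodicity >= 0.50:
        return "medium"
    if burstiness >= 0.70:
        return "high"
    return "medium-high"

def size_entropy_band(h_norm: float) -> str:
    # Thresholds from 1.3 (normalized Shannon entropy, 0..1).
    for cutoff, band in [(0.15, "very_low"), (0.30, "low"),
                         (0.45, "medium"), (0.60, "medium-high")]:
        if h_norm < cutoff:
            return band
    return "high"
```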
2 Distance scoring against each candidate category
For each functional category c with expected profile E(c):
2.1 Ordinal deltas
Let ord(type, code) return ordinal from dictionary.
Compute:
ΔV = abs(ord(volume_band_obs) - ord(volume_band_exp))
ΔT = abs(ord(entropy_time_obs) - ord(entropy_time_exp))
ΔS = abs(ord(entropy_size_obs) - ord(entropy_size_exp))
ΔD = abs(ord(entropy_destination_obs) - ord(entropy_destination_exp))
2.2 Weighted distance (V1 defaults)
Weights (tuned for “entropy-first” classification):
wV = 0.30, wT = 0.30, wS = 0.20, wD = 0.20
Raw distance:
dist_raw(c) = wV*ΔV + wT*ΔT + wS*ΔS + wD*ΔD
2.3 Normalize to similarity score
Convert distance to similarity in 0..1:
sim(c) = exp( - dist_raw(c) / τ )
Where τ is a temperature; V1 default τ = 12.
(Why this form: small deltas barely hurt; big contradictions collapse similarity quickly.)
3 Produce probabilities (Top-K)
For all candidates:
p(c) = sim(c) / Σ sim(all)
Return:
- top_3 candidates by p(c)
- confidence = p(top_1) (simple and interpretable)
V1 usage:
- If confidence < 0.55 → classify as DC8/U1 Unknown (prefill minimal fields only)
- Else → prefill device class + suggest top-3 functional categories
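Sections 2–3 combined could be sketched as below. The ordinal values and candidate profiles are illustrative; real values come from the band dictionary and iot_behavior_profile:

```python
import math

# Illustrative ordinals and the V1 default weights and temperature.
ORD = {"very_low": 10, "low": 20, "medium": 30, "medium-high": 35, "high": 40}
WEIGHTS = {"volume": 0.30, "time": 0.30, "size": 0.20, "dest": 0.20}
TAU = 12.0

def similarity(observed: dict, expected: dict) -> float:
    # Weighted ordinal distance → exponential decay (section 2).
    dist = sum(w * abs(ORD[observed[k]] - ORD[expected[k]])
               for k, w in WEIGHTS.items())
    return math.exp(-dist / TAU)

def classify(observed: dict, profiles: dict, threshold: float = 0.55):
    # Normalize similarities into Top-3 probabilities (section 3),
    # then apply the conservative fallback policy.
    sims = {c: similarity(observed, e) for c, e in profiles.items()}
    total = sum(sims.values())
    ranked = sorted(((c, s / total) for c, s in sims.items()),
                    key=lambda x: -x[1])[:3]
    if ranked[0][1] < threshold:
        return [("U1", ranked[0][1])]  # DC8/U1 Unknown fallback
    return ranked
```

Note that with many near-identical candidate profiles the normalized top probability shrinks, so the confidence threshold naturally pushes ambiguous sources toward U1.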
4 Contradiction severity scoring (separate from classification)
This is what powers flags and trust posture.
4.1 Severity score
Define:
sev(c*) = max(
ΔV / 30,
ΔT / 30,
ΔS / 30,
ΔD / 30
)
Where 30 is a rough “big delta” scale (based on your ordinals spacing).
4.2 Severity thresholds
sev < 0.25 → OK
0.25–0.45 → WARN
0.45–0.70 → HIGH
> 0.70 → CRITICAL
4.3 Hard contradiction rules (V1)
Independent of ordinals, add fast “if-then” checks:
If volume_band_obs ∈ {high, very_high} and expected is {very_low, low} → at least HIGH
If expected is any meter (M1–M4) and entropy_destination_obs ∈ {medium, high, high-variance} → at least HIGH
If expected is camera (I1/I3) but volume is very_low/low → at least WARN (maybe metadata-only camera)
These map cleanly to the iot_validation_heuristic table later.
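A sketch of the severity scale plus one hard rule (HC1); the thresholds and delta scale follow the V1 defaults stated above, while the ordinal deltas themselves are assumed inputs:

```python
DELTA_SCALE = 30.0  # rough "big delta" scale from 4.1

def severity_label(deltas: dict) -> str:
    """Map the max normalized delta (4.1) onto the thresholds in 4.2."""
    sev = max(deltas.values()) / DELTA_SCALE
    if sev < 0.25:
        return "OK"
    if sev < 0.45:
        return "WARN"
    if sev <= 0.70:
        return "HIGH"
    return "CRITICAL"

def hc1(observed_volume: str, expected_volume: str):
    """HC1: observed high volume contradicts an expected low-volume profile."""
    if observed_volume in ("high", "very_high") and expected_volume in ("very_low", "low"):
        return "HIGH"
    return None
```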
5 Prefill guardrails (V1 policy)
- Always show: Top-3 + confidence + “Why” (evidence = band comparisons + raw observed stats)
- Never auto-lock.
- Never allow unconfirmed labels to drive compliance outputs.
6 Minimal “Why” explanation payload (for audit + UX)
Return per candidate c:
{
"category": "M1",
"p": 0.82,
"distance": 6.0,
"deltas": {"ΔV": 0, "ΔT": 5, "ΔS": 0, "ΔD": 0},
"observed_bands": {"volume":"low","time":"very_low","size":"very_low","dest":"very_low"},
"expected_bands": {"volume":"low","time":"very_low","size":"very_low","dest":"very_low"},
"flags": [{"severity":"WARN","reason":"..." }]
}
This is enough to be transparent without exposing payload contents.
V1 Default Params (so teams don’t debate forever)
- Observation window: W = 24h (fallback 6h)
- τ = 12
- weights: wV = 0.30, wT = 0.30, wS = 0.20, wD = 0.20
- confidence threshold: 0.55 for “usable prefill”
As a single config object
To be consumed by MEID-IOT-V1 directly.
meid_iot_v1_scoring_config.json
{
"config_name": "MEID_IOT_V1_SCORING",
"version": "1.0.0",
"description": "Scoring and contradiction spec for IoT device inference using behavior bands.",
"observation_window": {
"primary_hours": 24,
"fallback_hours": 6
},
"band_mapping": {
"volume_band": {
"source_metric": "bytes_per_day",
"fallback": "variable"
},
"entropy_time": {
"method": "proxy",
"rules": [
{
"if": "periodicity_score >= 0.85 and inter_arrival_cv <= 0.25",
"band": "very_low"
},
{
"if": "periodicity_score >= 0.70 and inter_arrival_cv <= 0.50",
"band": "low"
},
{
"if": "periodicity_score >= 0.50",
"band": "medium"
},
{
"if": "burstiness_index >= 0.70",
"band": "high"
},
{
"else": "medium-high"
}
]
},
"entropy_size": {
"method": "shannon_normalized",
"thresholds": [
{
"max": 0.15,
"band": "very_low"
},
{
"max": 0.3,
"band": "low"
},
{
"max": 0.45,
"band": "medium"
},
{
"max": 0.6,
"band": "medium-high"
},
{
"else": "high"
}
]
},
"entropy_destination": {
"method": "shannon_or_count",
"thresholds": [
{
"max": 0.15,
"band": "very_low"
},
{
"max": 0.3,
"band": "low"
},
{
"max": 0.5,
"band": "medium"
},
{
"max": 0.7,
"band": "high"
},
{
"else": "high-variance"
}
]
}
},
"distance_scoring": {
"weights": {
"volume": 0.3,
"entropy_time": 0.3,
"entropy_size": 0.2,
"entropy_destination": 0.2
},
"temperature_tau": 12,
"similarity_function": "exp(-distance/tau)"
},
"classification_policy": {
"top_k": 3,
"confidence_threshold": 0.55,
"fallback_category": "U1",
"fallback_device_class": "DC8"
},
"severity_scoring": {
"delta_scale": 30,
"thresholds": {
"ok": 0.25,
"warn": 0.45,
"high": 0.7,
"critical": 1.0
}
},
"hard_contradiction_rules": [
{
"id": "HC1",
"if": "observed.volume_band in ['high','very_high'] and expected.volume_band in ['very_low','low']",
"min_severity": "high",
"message": "Observed volume contradicts expected low-volume behavior."
},
{
"id": "HC2",
"if": "expected.category in ['M1','M2','M3','M4'] and observed.entropy_destination in ['medium','high','high-variance']",
"min_severity": "high",
"message": "Meter expected deterministic single-endpoint behavior."
},
{
"id": "HC3",
"if": "expected.category in ['I1','I3'] and observed.volume_band in ['very_low','low']",
"min_severity": "warn",
"message": "Imaging device shows unusually low data volume."
}
],
"ui_policy": {
"show_top_k": true,
"show_confidence": true,
"show_evidence": true,
"lock_fields": false
},
"governance": {
"auto_learning": false,
"override_logging": true,
"versioned": true,
"requires_confirmation_for_reporting": true
}
}
What this config gives us (why it’s strong)
This file is intentionally operational, not academic. It centralizes all tunables so we can:
- Adjust behavior per tenant / sector without code changes
- Keep inference deterministic and explainable
- Version and audit scoring logic like any other governed artifact
Key sections (how they’re used)
observation_window
- Standardizes inference windows (24h / 6h fallback)
- Prevents “short window noise” from polluting classification
band_mapping
- Canonical logic for mapping raw signals → bands
- Keeps entropy logic out of code and in policy
- Easy to evolve (e.g. replace proxies with Shannon later)
distance_scoring
- Weighted ordinal distance → similarity
- Temperature (tau) gives you graceful decay instead of hard cutoffs
- This is the heart of probabilistic inference
classification_policy
- Enforces conservative behavior:
- Top-3 only
- Explicit confidence threshold
- Safe fallback to DC8 / U1
severity_scoring
- Independent contradiction severity scale
- Decouples “best guess” from “trustworthiness”
hard_contradiction_rules
- Fast, deterministic safety rails
- These are auditor-friendly because they are explicit and readable
ui_policy
- Guarantees no dark patterns
- Aligns with ZAYAZ trust-first UX
governance
- Explicitly disables auto-learning
- Forces human confirmation before reporting
- Makes this safe for CSRD/ESRS environments
Architectural note (important)
This config should be treated as:
Reference data + policy, not application config
Meaning:
- Version it
- Sign it (later)
- Reference the version in every inference result:
"scoring_config_version": "MEID_IOT_V1_SCORING@1.0.0"
That gives us full forensic traceability.
MEID-IOT-V1 Explain Payload Schema
1. Envelope (what every inference event returns)
{
"inference_id": "uuid",
"source_id": "uuid-or-stable-id",
"observed_window": {
"start_at": "2026-01-08T00:00:00Z",
"end_at": "2026-01-09T00:00:00Z",
"duration_s": 86400
},
"model": {
"engine": "MEID_IOT_V1",
"engine_version": "1.0.0",
"scoring_config": {
"name": "MEID_IOT_V1_SCORING",
"version": "1.0.0"
}
},
"status": {
"result_quality": "OK|LOW_DATA|DEGRADED|ERROR",
"confidence": 0.82,
"fallback_applied": false
},
"prediction": {
"device_class": { "code": "DC2", "label": "Meter", "p": 0.90 },
"top_k_functional_categories": [
{ "code": "M1", "label": "Electricity", "p": 0.82, "score": 0.71, "distance": 6.0 },
{ "code": "M2", "label": "Thermal / Heat", "p": 0.11, "score": 0.22, "distance": 18.0 },
{ "code": "G2", "label": "Protocol Bridge", "p": 0.07, "score": 0.14, "distance": 22.0 }
]
},
"explain": {
"observed_features": { },
"observed_bands": { },
"expected_bands_by_candidate": { },
"deltas_by_candidate": { },
"flags": [ ]
},
"provenance": {
"data_sources": [
{ "type": "NETFLOW", "ref": "flow_log_id_or_bucket_key" }
],
"evidence_refs": [
{ "type": "FEATURE_SNAPSHOT", "ref": "hash-or-object-key", "hash": "sha256:..." }
]
}
}
Why this envelope works
- Prediction is separated from explainability, so UI can render quickly.
- You can store the entire object for audit, but also index just the prediction fields.
- score and distance are included for debugging + model tuning (optional to show in UI).
2. explain.observed_features (raw, privacy-safe stats)
Keep this minimal, numeric, and non-PII:
{
"bytes_per_day": 420000,
"avg_bytes_per_min": 291,
"packet_size_entropy_norm": 0.12,
"destination_entropy_norm": 0.05,
"periodicity_score": 0.92,
"inter_arrival_cv": 0.18,
"burstiness_index": 0.08,
"destination_count": 1
}
Optional extensions (still safe):
- tls_seen: true/false
- ports_top: [443, 8883]
- sni_domains_top: hashed/normalized (avoid raw domain unless allowed)
3. explain.observed_bands (derived band values)
{
"volume_band": "low",
"entropy_time": "very_low",
"entropy_size": "very_low",
"entropy_destination": "very_low",
"cadence_type_hint": "periodic"
}
These are exactly the values the matrix expects and the scoring uses.
4. Expected bands per candidate (Top-K only)
Only include Top-K candidates to keep payload small:
{
"M1": {
"volume_band": "low",
"entropy_time": "very_low",
"entropy_size": "very_low",
"entropy_destination": "very_low",
"cadence_type": "periodic"
},
"M2": {
"volume_band": "low",
"entropy_time": "very_low",
"entropy_size": "very_low",
"entropy_destination": "low",
"cadence_type": "periodic"
}
}
5. Deltas (this is the “why” in one line)
{
"M1": { "ΔV": 0, "ΔT": 0, "ΔS": 0, "ΔD": 0 },
"M2": { "ΔV": 0, "ΔT": 0, "ΔS": 0, "ΔD": 10 }
}
This enables a clean UI:
- “Matches expected cadence and low entropy”
- “Slight mismatch in destination behavior”
6. Flags (contradictions & risk signals)
This is the trust layer, not the classifier:
[
{
"id": "HC2",
"severity": "HIGH",
"scope": "candidate:M1",
"message": "Meter expected deterministic single-endpoint behavior.",
"evidence": {
"observed": { "entropy_destination": "high" },
"expected": { "entropy_destination": "very_low" }
}
}
]
Also support general flags:
- scope: "source" (applies regardless of category)
- scope: "candidate:<code>"
V1 Storage & Indexing recommendation (important)
Store two representations:
A. Full JSON (append-only)
- For audit, replay, verifier APIs
- Content-addressable evidence references
B. Indexed columns (for fast queries)
- source_id, inference_id, created_at
- predicted_device_class, top1_category, confidence
- max_severity, flag_count, fallback_applied
- scoring_config_version, engine_version
Minimal JSON Schema (practical constraints)
If strict validation is wanted, enforce:
- required: inference_id, source_id, observed_window, model, status, prediction, explain
- limit: Top-K must match config
- enums for severities and band values from behavior_band_dictionary
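A stdlib-only sketch of those constraints; a production system would likely use a full JSON Schema validator, and the enum sets here are assumptions drawn from the spec above:

```python
REQUIRED = {"inference_id", "source_id", "observed_window",
            "model", "status", "prediction", "explain"}
SEVERITIES = {"OK", "WARN", "HIGH", "CRITICAL"}

def validate_envelope(payload: dict, top_k: int = 3) -> list:
    """Return a list of violation messages; empty list means valid."""
    errors = []
    missing = REQUIRED - payload.keys()
    if missing:
        errors.append(f"missing required fields: {sorted(missing)}")
    # Top-K must match the scoring config limit.
    cats = payload.get("prediction", {}).get("top_k_functional_categories", [])
    if len(cats) > top_k:
        errors.append(f"top_k exceeds configured limit of {top_k}")
    # Severities must come from the known enum.
    for flag in payload.get("explain", {}).get("flags", []):
        if flag.get("severity") not in SEVERITIES:
            errors.append(f"unknown severity: {flag.get('severity')}")
    return errors
```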
UI rendering rules (tiny but powerful)
For each inferred field:
- show label + (AI, 82%)
- “Why?” expands:
- observed bands
- expected bands for top1
- 1–3 deltas
- flags (if any)
This makes the assistant trust-mediating, not “guessing”.