IoT-DI
IoT Device Inference
1. ZAYAZ IoT Device Inference – V1 Sketch
V1 Design Goal (non-negotiable)
Reduce IoT data-source onboarding friction by ≥50% while increasing provenance transparency.
Not “perfect classification”. Not “magic AI”. Faster, safer, auditable onboarding.
1. V1 Scope (what we deliberately include / exclude)
✅ Included in V1
- Probabilistic device category inference (not model-level)
- Entropy-based + metadata-based features only
- Prefill of limited, low-risk fields
- Explicit user confirmation / override
- Full provenance & audit trail
- Stateless inference engine (easy to scale)
❌ Explicitly excluded from V1
- Deep packet inspection
- Automatic compliance mapping
- Autonomous learning loops
- “Black box” ML
- Any inference driving reporting without confirmation
This keeps V1 safe, fast, and credible.
2. V1 Target Outputs (what the system actually produces)
Primary Output
IoT Device Profile – Draft
Fields prefilled with confidence:
| Field | Prefill? | Notes |
|---|---|---|
| Device category | ✅ | e.g. Sensor / Meter / Camera / Gateway |
| Sub-category | ⚠️ (Top-3) | e.g. Temperature / Energy / Occupancy |
| Expected data cadence | ✅ | periodic / event-driven / bursty |
| Expected unit family | ⚠️ | energy / environmental / binary events |
| Data risk flag | ✅ | low / medium / anomalous |
| AI confidence | ✅ | 0–1 |
| Evidence link | ✅ | mandatory |
Everything else remains manual in V1.
3. Core Component: MEID-IOT-V1 (Micro-Engine)
Purpose
Generate a provenance hypothesis for an unknown IoT data stream.
Inputs (minimal & realistic)
- Flow metadata (NetFlow-like)
- Timestamped packet sizes
- Destination domains / IPs
- Protocol/port hints (no payload parsing)
- Optional MAC OUI (if available)
Feature Set (V1)
Entropy Features
- Payload size entropy
- Inter-arrival time entropy
- Destination entropy
- Session duration variance
Structural Signals
- Periodicity score
- Burstiness index
- Endpoint stability score
- Avg bytes / minute
Light Identity Hints
- MAC OUI → vendor family (optional)
- Domain pattern match (vendor clouds)
⚠️ All features are non-PII and privacy-safe.
4. Inference Logic (V1 = transparent, not fancy)
Model choice (recommended)
- Rule-weighted Bayesian classifier
- Human-readable priors
- Easy to tune
- Explainable
Example (simplified):
IF low time entropy
AND low size entropy
AND single stable endpoint
→ P(sensor) ↑↑
IF high size variance
AND burst traffic
AND high destination entropy
→ P(camera/gateway) ↑↑
Output
{
"predictions": [
{"label": "Environmental Sensor", "p": 0.82},
{"label": "Energy Meter", "p": 0.11},
{"label": "Gateway", "p": 0.07}
],
"confidence": 0.82,
"model_version": "MEID-IOT-V1.0"
}
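A minimal sketch of how such a rule-weighted classifier could look. The feature names, thresholds, and multiplicative weights below are illustrative placeholders, not production values:

```python
def score_labels(features: dict) -> dict:
    """Return normalized label probabilities from human-readable rule weights."""
    # Uniform priors over an illustrative candidate set.
    scores = {"Environmental Sensor": 1.0, "Energy Meter": 1.0, "Gateway": 1.0}

    low = lambda x: x < 0.3
    high = lambda x: x > 0.7

    # IF low time entropy AND low size entropy AND single stable endpoint → P(sensor) ↑↑
    if low(features["time_entropy"]) and low(features["size_entropy"]) \
            and features["endpoint_count"] == 1:
        scores["Environmental Sensor"] *= 4.0
        scores["Energy Meter"] *= 2.0  # meters share the low-entropy signature

    # IF high size variance AND burst traffic AND high destination entropy → P(camera/gateway) ↑↑
    if high(features["size_entropy"]) and features["bursty"] \
            and high(features["destination_entropy"]):
        scores["Gateway"] *= 4.0

    total = sum(scores.values())
    return {label: round(s / total, 2) for label, s in scores.items()}
```

Because every rule is a readable `if`, the scoring stays tunable and explainable, which is the point of the V1 model choice.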
5. FOGE Integration: Prefill with Guardrails
UI Behaviour (critical)
- Fields show “AI-suggested” badge
- Confidence shown inline
- “Why?” button reveals evidence summary
- “Change” opens dropdown with Top-3 + manual search
Hard Rule
Inference may prefill forms, but never auto-lock fields.
This aligns with ZAAM trust principles.
6. Override = First-Class Data Asset
Every override creates:
Inference Event
→ User Override
→ Confirmed Device Profile
Stored with:
- Old prediction
- New label
- Confidence delta
- Timestamp
- Tenant context
This becomes future training data, but:
- Not auto-used
- Only via governed retraining cycle
7. Data Model Additions (SSSR-aligned)
New entity (V1 minimal)
iot_source_inference
- inference_id
- source_id
- predicted_labels[]
- confidence
- evidence_refs[]
- model_version
- created_at
Extend existing source entity
- confirmed_device_category
- confirmation_method = MANUAL | AI_ASSISTED
- confirmation_timestamp
This is enough for audit and scale.
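As an illustration, the two entities above could be shaped like this in application code; field names follow the spec, while the concrete types and defaults are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IotSourceInference:
    """V1 minimal inference record (types are illustrative)."""
    inference_id: str
    source_id: str
    predicted_labels: list[str]
    confidence: float
    evidence_refs: list[str]
    model_version: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class SourceConfirmation:
    """Extension fields on the existing source entity."""
    confirmed_device_category: str
    confirmation_method: str  # "MANUAL" | "AI_ASSISTED"
    confirmation_timestamp: datetime
```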
8. KPIs for V1 (decide success early)
The following should be measured from day one:
| KPI | Target |
|---|---|
| Avg onboarding time | −50% |
| % AI-assisted registrations | ≥60% |
| Override rate | 20–40% (healthy!) |
| Post-confirmation correction rate | <5% |
| Auditor objections | 0 |
⚠️ High override rate is good early — it means engagement and learning.
9. V1 Risks & How We Neutralize Them
| Risk | Mitigation |
|---|---|
| “AI guessing” distrust | Confidence + evidence mandatory |
| Wrong assumptions | No auto-use in reporting |
| Overfitting early | Simple rules, no auto-learning |
| Security concerns | Metadata-only inference |
| Scope creep | Explicit V1 exclusions |
10. Why this V1 is strategically sound
- Delivers immediate operational value
- Strengthens ZAYAZ’s data provenance story
- Creates labeled data for future intelligence
- Does not threaten compliance credibility
- Fits perfectly into MICE + FOGE + ZAAM
ZAYAZ IoT Device Category Ontology — V1
Design principles (important to state explicitly)
- Behavior-first, not vendor-first
- Category ≠ Metric ≠ Compliance use
- Probabilistic inference allowed; reporting use is not
- Every category must imply expectations:
  - cadence
  - data shape
  - unit family
  - risk profile
Level 1: Device Class (Top-Level)
This is the only level inferred automatically in V1.
| Code | Device Class | Definition |
|---|---|---|
| DC1 | Sensor | Measures a physical or environmental variable |
| DC2 | Meter | Quantifies consumption or flow over time |
| DC3 | Actuator | Performs actions / control (often bidirectional) |
| DC4 | Imaging / AV | Produces image, video, or audio streams |
| DC5 | Gateway / Hub | Aggregates, relays, or transforms other devices’ data |
| DC6 | Mobility / Asset | Associated with moving assets (vehicles, containers) |
| DC7 | Controller / PLC | Industrial or building automation control logic |
| DC8 | Unknown / Hybrid | Cannot be reliably classified yet |
⚠️ Rule: If confidence < threshold → default to DC8.
Level 2: Functional Category (V1-controlled list)
This is suggested (Top-3), never auto-selected.
DC1 — Sensor
| Code | Functional Category | Typical Signals |
|---|---|---|
| S1 | Environmental | Temp, humidity, CO₂, air quality |
| S2 | Occupancy / Presence | Motion, people count, desk usage |
| S3 | Condition / State | Vibration, tilt, open/close |
| S4 | Safety | Smoke, gas, leak detection |
DC2 — Meter
| Code | Functional Category | Typical Signals |
|---|---|---|
| M1 | Electricity | kWh, voltage, current |
| M2 | Thermal / Heat | Heat flow, temperature delta |
| M3 | Water | Volume, flow |
| M4 | Gas | Volume, pressure |
DC3 — Actuator
| Code | Functional Category | Typical Signals |
|---|---|---|
| A1 | HVAC Control | Setpoints, valve positions |
| A2 | Lighting Control | On/off, dimming |
| A3 | Industrial Actuation | Motors, relays |
DC4 — Imaging / AV
| Code | Functional Category | Typical Signals |
|---|---|---|
| I1 | Video Camera | High bandwidth, bursty |
| I2 | Audio Sensor | Moderate bandwidth |
| I3 | Multimodal | AV + metadata |
DC5 — Gateway / Hub
| Code | Functional Category | Typical Signals |
|---|---|---|
| G1 | IoT Gateway | Many inbound devices |
| G2 | Protocol Bridge | BACnet ↔ IP, Modbus ↔ MQTT |
| G3 | Edge Compute | Pre-aggregation, filtering |
DC6 — Mobility / Asset
| Code | Functional Category | Typical Signals |
|---|---|---|
| T1 | Vehicle Telematics | GPS + speed + events |
| T2 | Asset Tracker | Periodic location |
| T3 | Mobile Equipment | Forklifts, machinery |
DC7 — Controller / PLC
| Code | Functional Category | Typical Signals |
|---|---|---|
| C1 | BMS Controller | Deterministic cycles |
| C2 | PLC / SCADA | Industrial protocols |
| C3 | Safety Controller | Highly deterministic |
DC8 — Unknown / Hybrid
| Code | Functional Category | Typical Signals |
|---|---|---|
| U1 | Unknown | Insufficient data |
| U2 | Hybrid | Multiple behaviors |
Level 3: Attributes (NOT inferred in V1)
These are derived or user-confirmed later, but the ontology anticipates them.
- Measurement unit family (energy, temperature, events)
- Control capability (read-only / write)
- Safety criticality
- Data sensitivity
- ESRS relevance mapping
- Verification requirements
This separation is intentional.
Inference → Ontology Mapping (V1 rules)
The inference engine only assigns:
Device Class (DCx)
+ Top-3 Functional Categories
+ Confidence score
Everything else is human-confirmed or rule-derived later.
Why this ontology works (strategically)
✔ Small
- ~30 total functional categories
- Easy to explain
- Easy to maintain
✔ Expressive
- Enough to infer:
  - expected cadence
  - expected entropy
  - expected unit family
✔ ESG-safe
- No compliance claims
- No metric assumptions
- No automatic ESRS mapping
✔ Extensible
Future V2/V3 can add:
- Industry-specific subclasses
- Vendor/model layers
- Carbon passport linkages
- Product-level digital twins
Critical UX Rule (must be enforced)
Ontology terms must be visible to users.
No hidden magic labels. Users must see:
- “Environmental Sensor”
- “Energy Meter”
- “Gateway”
This builds trust and audit defensibility.
ZAYAZ IoT Category → Behavior Matrix (V1)
How to read this matrix (important)
Each Functional Category defines:
- Expected behavior envelope (ranges, not absolutes)
- Typical entropy profile
- Default risk posture
- Validation heuristics (soft rules)
Violations do not mean “wrong”. They mean “needs attention”.
Legend
- Cadence: typical reporting frequency
- Volume: data size per day (order of magnitude)
- Entropy (Time / Size / Destination): Low / Medium / High
- Risk: data misuse or misclassification impact
- Primary Use: what the data usually represents (not enforced)
DC1 — Sensors
S1 Environmental Sensor
| Attribute | Expected |
|---|---|
| Cadence | 30s – 15 min |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Low |
| Primary use | Temperature, air quality, comfort |
Validation heuristics
- Regular periodicity
- Stable packet size
- Single/few endpoints ⚠️ Flag if bursty or MB/hour scale
S2 Occupancy / Presence
| Attribute | Expected |
|---|---|
| Cadence | Event-driven + keepalive |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium (privacy) |
| Primary use | Space utilization |
Validation heuristics
- Irregular events
- Small payloads ⚠️ Flag if continuous streaming
S3 Condition / State
| Attribute | Expected |
|---|---|
| Cadence | 1–60 min |
| Volume | Low |
| Time entropy | Low–Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Low |
| Primary use | Maintenance, wear |
S4 Safety Sensor
| Attribute | Expected |
|---|---|
| Cadence | Periodic + rare events |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Alerts, compliance |
⚠️ Any data loss or silence = flag
DC2 — Meters
M1 Electricity Meter
| Attribute | Expected |
|---|---|
| Cadence | 1–15 min |
| Volume | Low |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Very low |
| Risk | High |
| Primary use | Scope 2, billing |
Strong invariants
- Highly regular cadence ⚠️ High entropy = misclassification or gateway
M2 Thermal / Heat Meter
| Attribute | Expected |
|---|---|
| Cadence | 1–15 min |
| Volume | Low |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Energy efficiency |
M3 Water Meter
| Attribute | Expected |
|---|---|
| Cadence | 15 min – 1 h |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |
M4 Gas Meter
| Attribute | Expected |
|---|---|
| Cadence | 15 min – 1 h |
| Volume | Very low |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
DC3 — Actuators
A1 HVAC Control
| Attribute | Expected |
|---|---|
| Cadence | Event-driven + polling |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | High |
| Primary use | Control, not measurement |
⚠️ Bidirectional traffic expected
A2 Lighting Control
| Attribute | Expected |
|---|---|
| Cadence | Event-driven |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |
A3 Industrial Actuation
| Attribute | Expected |
|---|---|
| Cadence | Deterministic cycles |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |
DC4 — Imaging / AV
I1 Video Camera
| Attribute | Expected |
|---|---|
| Cadence | Continuous or burst |
| Volume | Very high |
| Time entropy | High |
| Size entropy | High |
| Destination entropy | Medium |
| Risk | Very High |
| Primary use | Security, monitoring |
⚠️ If classified as “sensor” → hard contradiction
I2 Audio Sensor
| Attribute | Expected |
|---|---|
| Cadence | Event or continuous |
| Volume | Medium |
| Time entropy | High |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |
I3 Multimodal AV
| Attribute | Expected |
|---|---|
| Cadence | Burst-heavy |
| Volume | Very high |
| Time entropy | High |
| Size entropy | High |
| Destination entropy | Medium |
| Risk | Very High |
DC5 — Gateway / Hub
G1 IoT Gateway
| Attribute | Expected |
|---|---|
| Cadence | Continuous |
| Volume | Medium–High |
| Time entropy | High |
| Size entropy | Medium |
| Destination entropy | High |
| Risk | Medium |
| Primary use | Aggregation |
Key signal: many inbound → one outbound
G2 Protocol Bridge
| Attribute | Expected |
|---|---|
| Cadence | Deterministic |
| Volume | Medium |
| Time entropy | Low |
| Size entropy | Low |
| Destination entropy | Low |
| Risk | Medium |
G3 Edge Compute
| Attribute | Expected |
|---|---|
| Cadence | Irregular |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | Medium |
DC6 — Mobility / Asset
T1 Vehicle Telematics
| Attribute | Expected |
|---|---|
| Cadence | 10s – 5 min |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |
T2 Asset Tracker
| Attribute | Expected |
|---|---|
| Cadence | 5 min – hours |
| Volume | Low |
| Time entropy | Medium |
| Size entropy | Low |
| Destination entropy | Medium |
| Risk | Medium |
T3 Mobile Equipment
| Attribute | Expected |
|---|---|
| Cadence | Event-driven |
| Volume | Medium |
| Time entropy | Medium |
| Size entropy | Medium |
| Destination entropy | Medium |
| Risk | High |
DC7 — Controller / PLC
C1 BMS Controller
| Attribute | Expected |
|---|---|
| Cadence | Deterministic cycles |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |
C2 PLC / SCADA
| Attribute | Expected |
|---|---|
| Cadence | Fixed cycle |
| Volume | Medium–High |
| Time entropy | Very low |
| Size entropy | Low |
| Destination entropy | Very low |
| Risk | Very High |
C3 Safety Controller
| Attribute | Expected |
|---|---|
| Cadence | Fixed + watchdog |
| Volume | Medium |
| Time entropy | Very low |
| Size entropy | Very low |
| Destination entropy | Very low |
| Risk | Extreme |
DC8 — Unknown / Hybrid
U1 Unknown
| Attribute | Expected |
|---|---|
| Cadence | Undefined |
| Volume | Undefined |
| Entropy | Mixed |
| Risk | Unknown |
U2 Hybrid
| Attribute | Expected |
|---|---|
| Cadence | Multiple modes |
| Volume | Variable |
| Entropy | High variance |
| Risk | Medium–High |
How this matrix is used in V1 (very important)
- Inference
  - Compare observed behavior → matrix envelope
  - Assign probabilities
- Prefill
  - Suggest category + expected cadence + risk
- Validation
  - Detect contradictions (camera-like behavior labeled “meter”)
- Audit
  - Explain why something was flagged or suggested
What this enables next (without changing V1)
- Automatic sanity checks
- Early data poisoning detection
- Per-category default validation rules
- Future ESRS relevance suggestions (still manual)
Table set (what’s inside)
- iot_device_class — DC1..DC8 (top-level classes)
- iot_functional_category — S1..U2 (functional categories linked to device class)
- iot_behavior_profile — cadence/volume/entropy/risk expectations per functional category
- iot_validation_heuristic — a minimal V1 set of contradiction/anomaly rules (template-style)
What’s in the dictionary (V1)
One table with multiple band_types:
- volume_band (very_low … very_high, plus variable/undefined) with approx bytes/day ranges
- entropy_band (very_low … high, plus mixed/high-variance) with ordinals
- risk_level (low … extreme, unknown) with ordinals
- cadence_type (periodic, event-driven, deterministic, bursty, etc.) with ordinals + has_regular_period
Why ordinals matter
They let you score differences like:
- expected entropy_time=very_low (10) but observed high (40) → delta 30 (strong contradiction)
- expected volume_band=low (20) but observed medium (30) → delta 10 (soft mismatch)
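The ordinal-delta calculation can be sketched in a few lines, assuming the 10/20/30/40 spacing implied by the examples above:

```python
# Assumed ordinal spacing for the band dictionary (illustrative values).
ORDINALS = {"very_low": 10, "low": 20, "medium": 30, "high": 40}

def band_delta(expected: str, observed: str) -> int:
    """Absolute ordinal distance between an expected and an observed band."""
    return abs(ORDINALS[observed] - ORDINALS[expected])

# The two examples from the text:
assert band_delta("very_low", "high") == 30  # strong contradiction
assert band_delta("low", "medium") == 10     # soft mismatch
```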
MEID-IOT-V1 Scoring Spec (Tiny)
0. Inputs
Observed signals (per source, over an observation window W)
Minimum viable set:
- bytes_per_day
- inter_arrival_cv (coefficient of variation for inter-arrival times)
- periodicity_score (0..1)
- burstiness_index (0..1)
- destination_count and/or destination_entropy_shannon
- packet_size_entropy_shannon
Expected profile (from iot_behavior_profile)
- cadence_type
- volume_band
- entropy_time
- entropy_size
- entropy_destination
- risk_level (used for triage, not classification)
Band dictionary (from behavior_band_dictionary)
- ordinals for each band type
- volume band byte ranges
1.0 Normalize observed signals into observed bands
1.1 Volume band mapping
Use bytes_per_day against behavior_band_dictionary ranges:
- Find the volume_band whose [min,max) contains bytes_per_day.
- If none matches → volume_band = variable (or undefined if missing).
1.2 Time entropy band mapping (cheap + robust)
Use periodicity + CV as proxy:
If periodicity_score >= 0.85 and inter_arrival_cv <= 0.25 → very_low
Else if periodicity_score >= 0.70 and inter_arrival_cv <= 0.50 → low
Else if periodicity_score >= 0.50 → medium
Else if burstiness_index >= 0.70 → high
Else → medium-high
(If you also compute Shannon entropy of inter-arrival bins, you can map via quantiles later; V1 can stay proxy-based.)
1.3 Size entropy band mapping
If you have Shannon entropy of packet sizes (H_size, normalized 0..1):
H_size < 0.15 → very_low
0.15–0.30 → low
0.30–0.45 → medium
0.45–0.60 → medium-high
> 0.60 → high
If missing, infer from packet_size_std/mean similarly.
1.4 Destination entropy band mapping
Pick one:
- Count-based: map destination_count into bands (1→very_low, 2–3→low, 4–8→medium, 9–20→high, >20→high-variance)
- Entropy-based: map normalized Shannon destination entropy (0..1) using the same cutoffs as size entropy.
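The mappings in 1.1–1.4 could be sketched as follows; the bytes/day ranges are placeholders standing in for the real behavior_band_dictionary values:

```python
# Placeholder bytes/day ranges: (band, min_inclusive, max_exclusive).
VOLUME_BANDS = [
    ("very_low", 0, 1e5),
    ("low", 1e5, 1e6),
    ("medium", 1e6, 1e8),
    ("high", 1e8, 1e10),
    ("very_high", 1e10, float("inf")),
]

def volume_band(bytes_per_day: float) -> str:
    for band, lo, hi in VOLUME_BANDS:
        if lo <= bytes_per_day < hi:
            return band
    return "variable"  # 1.1 fallback when no range matches

def time_entropy_band(periodicity: float, cv: float, burstiness: float) -> str:
    # Proxy rules from 1.2, evaluated in order.
    if periodicity >= 0.85 and cv <= 0.25:
        return "very_low"
    if periodicity >= 0.70 and cv <= 0.50:
        return "low"
    if periodicity >= 0.50:
        return "medium"
    if burstiness >= 0.70:
        return "high"
    return "medium-high"

def size_entropy_band(h_norm: float) -> str:
    # Thresholds from 1.3 (normalized Shannon entropy, 0..1).
    for cutoff, band in [(0.15, "very_low"), (0.30, "low"),
                         (0.45, "medium"), (0.60, "medium-high")]:
        if h_norm < cutoff:
            return band
    return "high"
```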
2 Distance scoring against each candidate category
For each functional category c with expected profile E(c):
2.1 Ordinal deltas
Let ord(type, code) return ordinal from dictionary.
Compute:
ΔV = abs(ord(volume_band_obs) - ord(volume_band_exp))
ΔT = abs(ord(entropy_time_obs) - ord(entropy_time_exp))
ΔS = abs(ord(entropy_size_obs) - ord(entropy_size_exp))
ΔD = abs(ord(entropy_destination_obs) - ord(entropy_destination_exp))
2.2 Weighted distance (V1 defaults)
Weights (tuned for “entropy-first” classification):
wV = 0.30, wT = 0.30, wS = 0.20, wD = 0.20
Raw distance:
dist_raw(c) = wV*ΔV + wT*ΔT + wS*ΔS + wD*ΔD
2.3 Normalize to similarity score
Convert distance to similarity in 0..1:
sim(c) = exp( - dist_raw(c) / τ )
Where τ is a temperature; V1 default τ = 12.
(Why this form: small deltas barely hurt; big contradictions collapse similarity quickly.)
3 Produce probabilities (Top-K)
For all candidates:
p(c) = sim(c) / Σ sim(all)
Return:
- top_3 candidates by p(c)
- confidence = p(top_1) (simple and interpretable)
V1 usage:
- If confidence < 0.55 → classify as DC8/U1 Unknown (prefill minimal fields only)
- Else → prefill device class + suggest top-3 functional categories
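Sections 2–3 combined could be sketched as below. The ordinal values and candidate profiles are illustrative; real values come from the band dictionary and iot_behavior_profile:

```python
import math

# Illustrative ordinals and the V1 default weights and temperature.
ORD = {"very_low": 10, "low": 20, "medium": 30, "medium-high": 35, "high": 40}
WEIGHTS = {"volume": 0.30, "time": 0.30, "size": 0.20, "dest": 0.20}
TAU = 12.0

def similarity(observed: dict, expected: dict) -> float:
    # Weighted ordinal distance → exponential decay (section 2).
    dist = sum(w * abs(ORD[observed[k]] - ORD[expected[k]])
               for k, w in WEIGHTS.items())
    return math.exp(-dist / TAU)

def classify(observed: dict, profiles: dict, threshold: float = 0.55):
    # Normalize similarities into Top-3 probabilities (section 3),
    # then apply the conservative fallback policy.
    sims = {c: similarity(observed, e) for c, e in profiles.items()}
    total = sum(sims.values())
    ranked = sorted(((c, s / total) for c, s in sims.items()),
                    key=lambda x: -x[1])[:3]
    if ranked[0][1] < threshold:
        return [("U1", ranked[0][1])]  # DC8/U1 Unknown fallback
    return ranked
```

Note that with many near-identical candidate profiles the normalized top probability shrinks, so the confidence threshold naturally pushes ambiguous sources toward U1.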
4 Contradiction severity scoring (separate from classification)
This is what powers flags and trust posture.
4.1 Severity score
Define:
sev(c*) = max(
ΔV / 30,
ΔT / 30,
ΔS / 30,
ΔD / 30
)
Where 30 is a rough “big delta” scale (based on your ordinals spacing).
4.2 Severity thresholds
sev < 0.25 → OK
0.25–0.45 → WARN
0.45–0.70 → HIGH
> 0.70 → CRITICAL
4.3 Hard contradiction rules (V1)
Independent of ordinals, add fast “if-then” checks:
If volume_band_obs ∈ {high, very_high} and expected is {very_low, low} → at least HIGH
If expected is any meter (M1–M4) and entropy_destination_obs ∈ {medium, high, high-variance} → at least HIGH
If expected is camera (I1/I3) but volume is very_low/low → at least WARN (maybe metadata-only camera)
These map cleanly to the iot_validation_heuristic table later.
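A sketch of the severity scale plus one hard rule (HC1); the thresholds and delta scale follow the V1 defaults stated above, while the ordinal deltas themselves are assumed inputs:

```python
DELTA_SCALE = 30.0  # rough "big delta" scale from 4.1

def severity_label(deltas: dict) -> str:
    """Map the max normalized delta (4.1) onto the thresholds in 4.2."""
    sev = max(deltas.values()) / DELTA_SCALE
    if sev < 0.25:
        return "OK"
    if sev < 0.45:
        return "WARN"
    if sev <= 0.70:
        return "HIGH"
    return "CRITICAL"

def hc1(observed_volume: str, expected_volume: str):
    """HC1: observed high volume contradicts an expected low-volume profile."""
    if observed_volume in ("high", "very_high") and expected_volume in ("very_low", "low"):
        return "HIGH"
    return None
```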
5 Prefill guardrails (V1 policy)
- Always show: Top-3 + confidence + “Why” (evidence = band comparisons + raw observed stats)
- Never auto-lock.
- Never allow unconfirmed labels to drive compliance outputs.
6 Minimal “Why” explanation payload (for audit + UX)
Return per candidate c:
{
"category": "M1",
"p": 0.82,
"distance": 6.0,
"deltas": {"ΔV": 0, "ΔT": 5, "ΔS": 0, "ΔD": 0},
"observed_bands": {"volume":"low","time":"very_low","size":"very_low","dest":"very_low"},
"expected_bands": {"volume":"low","time":"very_low","size":"very_low","dest":"very_low"},
"flags": [{"severity":"WARN","reason":"..." }]
}
This is enough to be transparent without exposing payload contents.
V1 Default Params (so teams don’t debate forever)
- Observation window: W = 24h (fallback 6h)
- τ = 12
- weights: wV = 0.30, wT = 0.30, wS = 0.20, wD = 0.20
- confidence threshold: 0.55 for “usable prefill”
As a single config object
To be consumed by MEID-IOT-V1 directly.
meid_iot_v1_scoring_config.json
{
"config_name": "MEID_IOT_V1_SCORING",
"version": "1.0.0",
"description": "Scoring and contradiction spec for IoT device inference using behavior bands.",
"observation_window": {
"primary_hours": 24,
"fallback_hours": 6
},
"band_mapping": {
"volume_band": {
"source_metric": "bytes_per_day",
"fallback": "variable"
},
"entropy_time": {
"method": "proxy",
"rules": [
{
"if": "periodicity_score >= 0.85 and inter_arrival_cv <= 0.25",
"band": "very_low"
},
{
"if": "periodicity_score >= 0.70 and inter_arrival_cv <= 0.50",
"band": "low"
},
{
"if": "periodicity_score >= 0.50",
"band": "medium"
},
{
"if": "burstiness_index >= 0.70",
"band": "high"
},
{
"else": "medium-high"
}
]
},
"entropy_size": {
"method": "shannon_normalized",
"thresholds": [
{
"max": 0.15,
"band": "very_low"
},
{
"max": 0.3,
"band": "low"
},
{
"max": 0.45,
"band": "medium"
},
{
"max": 0.6,
"band": "medium-high"
},
{
"else": "high"
}
]
},
"entropy_destination": {
"method": "shannon_or_count",
"thresholds": [
{
"max": 0.15,
"band": "very_low"
},
{
"max": 0.3,
"band": "low"
},
{
"max": 0.5,
"band": "medium"
},
{
"max": 0.7,
"band": "high"
},
{
"else": "high-variance"
}
]
}
},
"distance_scoring": {
"weights": {
"volume": 0.3,
"entropy_time": 0.3,
"entropy_size": 0.2,
"entropy_destination": 0.2
},
"temperature_tau": 12,
"similarity_function": "exp(-distance/tau)"
},
"classification_policy": {
"top_k": 3,
"confidence_threshold": 0.55,
"fallback_category": "U1",
"fallback_device_class": "DC8"
},
"severity_scoring": {
"delta_scale": 30,
"thresholds": {
"ok": 0.25,
"warn": 0.45,
"high": 0.7,
"critical": 1.0
}
},
"hard_contradiction_rules": [
{
"id": "HC1",
"if": "observed.volume_band in ['high','very_high'] and expected.volume_band in ['very_low','low']",
"min_severity": "high",
"message": "Observed volume contradicts expected low-volume behavior."
},
{
"id": "HC2",
"if": "expected.category in ['M1','M2','M3','M4'] and observed.entropy_destination in ['medium','high','high-variance']",
"min_severity": "high",
"message": "Meter expected deterministic single-endpoint behavior."
},
{
"id": "HC3",
"if": "expected.category in ['I1','I3'] and observed.volume_band in ['very_low','low']",
"min_severity": "warn",
"message": "Imaging device shows unusually low data volume."
}
],
"ui_policy": {
"show_top_k": true,
"show_confidence": true,
"show_evidence": true,
"lock_fields": false
},
"governance": {
"auto_learning": false,
"override_logging": true,
"versioned": true,
"requires_confirmation_for_reporting": true
}
}
What this config gives us (why it’s strong)
This file is intentionally operational, not academic. It centralizes all tunables so we can:
- Adjust behavior per tenant / sector without code changes
- Keep inference deterministic and explainable
- Version and audit scoring logic like any other governed artifact
Key sections (how they’re used)
observation_window
- Standardizes inference windows (24h / 6h fallback)
- Prevents “short window noise” from polluting classification
band_mapping
- Canonical logic for mapping raw signals → bands
- Keeps entropy logic out of code and in policy
- Easy to evolve (e.g. replace proxies with Shannon later)
distance_scoring
- Weighted ordinal distance → similarity
- Temperature (tau) gives you graceful decay instead of hard cutoffs
- This is the heart of probabilistic inference
classification_policy
- Enforces conservative behavior:
- Top-3 only
- Explicit confidence threshold
- Safe fallback to DC8 / U1
severity_scoring
- Independent contradiction severity scale
- Decouples “best guess” from “trustworthiness”
hard_contradiction_rules
- Fast, deterministic safety rails
- These are auditor-friendly because they are explicit and readable
ui_policy
- Guarantees no dark patterns
- Aligns with ZAYAZ trust-first UX
governance
- Explicitly disables auto-learning
- Forces human confirmation before reporting
- Makes this safe for CSRD/ESRS environments
Architectural note (important)
This config should be treated as:
Reference data + policy, not application config
Meaning:
- Version it
- Sign it (later)
- Reference the version in every inference result:
"scoring_config_version": "MEID_IOT_V1_SCORING@1.0.0"
That gives us full forensic traceability.
MEID-IOT-V1 Explain Payload Schema
1. Envelope (what every inference event returns)
{
"inference_id": "uuid",
"source_id": "uuid-or-stable-id",
"observed_window": {
"start_at": "2026-01-08T00:00:00Z",
"end_at": "2026-01-09T00:00:00Z",
"duration_s": 86400
},
"model": {
"engine": "MEID_IOT_V1",
"engine_version": "1.0.0",
"scoring_config": {
"name": "MEID_IOT_V1_SCORING",
"version": "1.0.0"
}
},
"status": {
"result_quality": "OK|LOW_DATA|DEGRADED|ERROR",
"confidence": 0.82,
"fallback_applied": false
},
"prediction": {
"device_class": { "code": "DC2", "label": "Meter", "p": 0.90 },
"top_k_functional_categories": [
{ "code": "M1", "label": "Electricity", "p": 0.82, "score": 0.71, "distance": 6.0 },
{ "code": "M2", "label": "Thermal / Heat", "p": 0.11, "score": 0.22, "distance": 18.0 },
{ "code": "G2", "label": "Protocol Bridge", "p": 0.07, "score": 0.14, "distance": 22.0 }
]
},
"explain": {
"observed_features": { },
"observed_bands": { },
"expected_bands_by_candidate": { },
"deltas_by_candidate": { },
"flags": [ ]
},
"provenance": {
"data_sources": [
{ "type": "NETFLOW", "ref": "flow_log_id_or_bucket_key" }
],
"evidence_refs": [
{ "type": "FEATURE_SNAPSHOT", "ref": "hash-or-object-key", "hash": "sha256:..." }
]
}
}
Why this envelope works
- Prediction is separated from explainability, so UI can render quickly.
- You can store the entire object for audit, but also index just the prediction fields.
- score and distance are included for debugging + model tuning (optional to show in UI).
2. explain.observed_features (raw, privacy-safe stats)
Keep this minimal, numeric, and non-PII:
{
"bytes_per_day": 420000,
"avg_bytes_per_min": 291,
"packet_size_entropy_norm": 0.12,
"destination_entropy_norm": 0.05,
"periodicity_score": 0.92,
"inter_arrival_cv": 0.18,
"burstiness_index": 0.08,
"destination_count": 1
}
Optional extensions (still safe):
- tls_seen: true/false
- ports_top: [443, 8883]
- sni_domains_top: hashed/normalized (avoid raw domain unless allowed)
3. explain.observed_bands (derived band values)
{
"volume_band": "low",
"entropy_time": "very_low",
"entropy_size": "very_low",
"entropy_destination": "very_low",
"cadence_type_hint": "periodic"
}
These are exactly the values the matrix expects and the scoring uses.
4. Expected bands per candidate (Top-K only)
Only include Top-K candidates to keep payload small:
{
"M1": {
"volume_band": "low",
"entropy_time": "very_low",
"entropy_size": "very_low",
"entropy_destination": "very_low",
"cadence_type": "periodic"
},
"M2": {
"volume_band": "low",
"entropy_time": "very_low",
"entropy_size": "very_low",
"entropy_destination": "low",
"cadence_type": "periodic"
}
}
5. Deltas (this is the “why” in one line)
{
"M1": { "ΔV": 0, "ΔT": 0, "ΔS": 0, "ΔD": 0 },
"M2": { "ΔV": 0, "ΔT": 0, "ΔS": 0, "ΔD": 10 }
}
This enables a clean UI:
- “Matches expected cadence and low entropy”
- “Slight mismatch in destination behavior”
6. Flags (contradictions & risk signals)
This is the trust layer, not the classifier:
[
{
"id": "HC2",
"severity": "HIGH",
"scope": "candidate:M1",
"message": "Meter expected deterministic single-endpoint behavior.",
"evidence": {
"observed": { "entropy_destination": "high" },
"expected": { "entropy_destination": "very_low" }
}
}
]
Also support general flags:
- scope: "source" (applies regardless of category)
- scope: "candidate:<code>"
V1 Storage & Indexing recommendation (important)
Store two representations:
A. Full JSON (append-only)
- For audit, replay, verifier APIs
- Content-addressable evidence references
B. Indexed columns (for fast queries)
- source_id, inference_id, created_at
- predicted_device_class, top1_category, confidence
- max_severity, flag_count, fallback_applied
- scoring_config_version, engine_version
Minimal JSON Schema (practical constraints)
If strict validation is wanted, enforce:
- required: inference_id, source_id, observed_window, model, status, prediction, explain
- limit: Top-K must match config
- enums for severities and band values from behavior_band_dictionary
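A stdlib-only sketch of those constraints; a production system would likely use a full JSON Schema validator, and the enum sets here are assumptions drawn from the spec above:

```python
REQUIRED = {"inference_id", "source_id", "observed_window",
            "model", "status", "prediction", "explain"}
SEVERITIES = {"OK", "WARN", "HIGH", "CRITICAL"}

def validate_envelope(payload: dict, top_k: int = 3) -> list:
    """Return a list of violation messages; empty list means valid."""
    errors = []
    missing = REQUIRED - payload.keys()
    if missing:
        errors.append(f"missing required fields: {sorted(missing)}")
    # Top-K must match the scoring config limit.
    cats = payload.get("prediction", {}).get("top_k_functional_categories", [])
    if len(cats) > top_k:
        errors.append(f"top_k exceeds configured limit of {top_k}")
    # Severities must come from the known enum.
    for flag in payload.get("explain", {}).get("flags", []):
        if flag.get("severity") not in SEVERITIES:
            errors.append(f"unknown severity: {flag.get('severity')}")
    return errors
```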
UI rendering rules (tiny but powerful)
For each inferred field:
- show label + (AI, 82%)
- “Why?” expands:
- observed bands
- expected bands for top1
- 1–3 deltas
- flags (if any)
This makes the assistant trust-mediating, not “guessing”.