chore: update vendor submodules to latest upstream

fix(recorder): bound history query (memory-DoS) + add missing transactional purge (disk-DoS); SQL-injection & NaN dims clean (#1084)
* fix(homecore-recorder): bound history query + add transactional purge (memory-DoS + disk-DoS)

  Security review of the HA-compat state recorder (ADR-132) found two real bounding bugs; SQL-injection and NaN-index dimensions confirmed clean.

  (1) Memory-DoS: get_state_history carried no LIMIT, so a wide [since,until] window over a high-frequency entity loaded an unbounded row set into a single in-memory Vec. Added LIMIT MAX_HISTORY_ROWS (1,000,000); the sibling search paths were already k-bounded.

  (2) Disk-DoS / documented-but-missing purge: README advertised Recorder::purge(older_than) but no retention path existed -> unbounded disk growth. Added a transactional purge with an EXCLUSIVE cutoff (idempotent, no off-by-one) that deletes old states+events and garbage-collects orphaned state_attributes blobs (dedup-shared blobs are kept until their last referencing state is gone). All three deletes run in one transaction so a mid-purge failure rolls back cleanly.

  Pinning tests (homecore-recorder 19->25 no-default / 25->31 ruvector, 0 failed):
  - malicious_entity_id_is_stored_literally_not_executed (SQL injection)
  - like_metacharacters_in_query_are_literal_not_wildcards (LIKE escape)
  - history_query_carries_a_limit_clause (memory-DoS bound)
  - purge_keeps_boundary_row_and_drops_older (exclusive-cutoff, true pin)
  - purge_gcs_orphaned_attributes_but_keeps_shared (dedup-safe GC)
  - purge_also_removes_old_events

  No behaviour change beyond the two fixes. Python deterministic proof unchanged (recorder is off the signal proof path).

  Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(homecore-recorder): record ADR-132 security review findings

  Add a "3a. Security review" section to ADR-132 and a CHANGELOG [Unreleased] Security entry covering the homecore-recorder review: SQL-injection and NaN-index dimensions confirmed clean with evidence (every query bound; LIKE pattern bound+escaped; SHA-256->i32->f32 embeddings always finite, empty index/k=0 probed no-panic), plus the two fixes (unbounded history LIMIT, transactional exclusive-cutoff purge with orphan-attribute GC).

  Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-30 13:43:18 +00:00 · 2026-06-15 01:10:44 +00:00 · 2026-06-14 21:00:52 -04:00 · 2026-06-14 20:22:07 -04:00 · 2026-06-14 19:37:08 -04:00 · 2026-06-14 19:04:09 -04:00
39 changed files with 2154 additions and 64 deletions
@@ -7,7 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Security
+- **`homecore-recorder` security review (ADR-132 surfaces) — two real bounding fixes; SQL-injection & NaN-index dimensions confirmed clean with evidence.** Beyond-SOTA review of the HA-compat state recorder (DB persistence + history + ruvector semantic search), the crux being its DB-backed SQL-injection surface. **Findings + fixes:** (1) **Memory-DoS — unbounded `get_state_history`.** The history query carried no `LIMIT`, so a wide `[since, until]` window over a high-frequency entity (a per-second sensor ≈ 86k rows/day) would load an unbounded row set into a single in-memory `Vec`. Added a hard `LIMIT MAX_HISTORY_ROWS` (1,000,000 — generous enough never to truncate a realistic history graph, bounded enough to cap the worst case); the sibling search paths were already `k`-bounded. (2) **Disk-DoS / documented-but-missing `purge`.** The README + HA-compat table advertised `Recorder::purge(older_than)` as a capability, but **no such method existed** — i.e. no retention path at all → unbounded disk growth. Implemented a **transactional** `purge` that deletes `states` + `events` strictly **older than** the cutoff (**exclusive** boundary — idempotent, no off-by-one; a row at the cutoff instant is kept) and **garbage-collects** orphaned `state_attributes` blobs (a dedup-shared blob is dropped only once its last referencing state is gone); all three deletes run in one transaction so a mid-purge failure rolls back cleanly (no states-deleted-but-events-kept corruption). **Confirmed clean with evidence:** SQL injection — **every** query in `db.rs` uses bound `?` parameters (no `format!`/string-concat of user data into SQL); the lone `format!` builds the LIKE *pattern*, which is itself bound as a parameter with `ESCAPE '\\'` and metacharacter escaping. Pinned: a state value `'; DROP TABLE states; --` is stored/queried **literally** (table survives), and a `%`/`_` in a search query matches **literally**, not as a wildcard. 
NaN-index poisoning (the calibration/vitals/geo class) — **structurally impossible** here: embeddings are SHA-256 → `i32` → `f32` (an `i32` cast to `f32` is always finite, never NaN/Inf), with an all-zero-digest norm guard; probed empty-index search, empty-string query, and `k=0` — all return `Ok(0)`, **no panic**. Fail-closed write path — a removal event yields `Ok(None)`, semantic-index failure is logged not propagated (best-effort, never blocks the durable SQLite write), and `EntityId` parsing failures fall back rather than panic. **6 new pinning tests** (SQL-injection literal-storage, LIKE-metacharacter literalness, history `LIMIT`, purge exclusive-boundary, purge attribute-GC-keeps-shared, purge old-events): `homecore-recorder` **19 → 25** (`--no-default-features`) / **25 → 31** (`--features ruvector`), 0 failed; the purge-boundary test is a true pin (fails deleting 2 rows under an inclusive cutoff, passes deleting 1 under the exclusive cutoff). Behaviour otherwise unchanged; Python deterministic proof unchanged (recorder is off the signal proof path).
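
Two of the pinned recorder behaviours, LIKE-metacharacter literalness and the exclusive purge cutoff, can be illustrated with a stdlib-only sketch. The names `escape_like`, `contains_pattern`, and `purge_older_than` are hypothetical stand-ins, not the functions in `db.rs`, and the in-memory `Vec<i64>` of timestamps stands in for the `states` table.

```rust
/// Escape LIKE metacharacters so user text matches literally.
/// Intended to pair with a bound pattern parameter and `ESCAPE '\'`.
fn escape_like(query: &str) -> String {
    let mut out = String::with_capacity(query.len());
    for ch in query.chars() {
        // The escape character itself must be escaped as well.
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build a "contains" pattern; only the two outer `%` act as wildcards.
fn contains_pattern(query: &str) -> String {
    format!("%{}%", escape_like(query))
}

/// Exclusive cutoff: drop rows strictly older than `cutoff`, keep the row
/// at the cutoff instant. Returns the number of rows removed.
fn purge_older_than(timestamps: &mut Vec<i64>, cutoff: i64) -> usize {
    let before = timestamps.len();
    timestamps.retain(|&ts| ts >= cutoff);
    before - timestamps.len()
}
```

With rows at 99, 100, and 101 and a cutoff of 100, the sketch removes one row and a repeated call removes none, which mirrors the boundary behaviour the purge pin describes.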
+
 ### Added
+- **RuField `rufield-viewer` live-ingest mode — closes the RuView↔RuField visual loop (ADR-262 surfaces).** The dashboard gains `--source live --upstream <RuView-URL>`: it consumes RuView's `/ws/field` SSE (falling back to polling `/api/field`), **verifies every event's ed25519 provenance receipt on ingest** (`is_fusable`) — forged/tampered events are flagged ✗ and **never fused** into trusted inferences — and renders real RuView `FieldEvent`s through the same room-state/privacy-badge/fusion-graph/receipt path the synthetic mode uses (wire-compatible by construction: both sides use `rufield_core::FieldEvent` serde). **Strict banner honesty:** a single `BannerState` shows `SYNTHETIC` / `LIVE — <upstream>` / `DISCONNECTED — <upstream> unreachable`, mutually exclusive — never SYNTHETIC while showing live data or vice versa; live mode returns **409** on `/api/run` rather than fabricate a synthetic run, and starts DISCONNECTED until first verified contact. Default stays synthetic. 26 tests / 0 failed. `ruvnet/rufield` `crates/rufield-viewer`; `vendor/rufield` submodule bumped.
 - **ADR-262 P3 — live RuField surface: RuView's running sensing-server now speaks RuField on `/api/field` + `/ws/field`.** Wires the P1 `wifi-densepose-rufield` bridge into the live `wifi-densepose-sensing-server` (the bridge is the only added coupling, ADR-262 §5.4). A new `src/rufield_surface.rs` module (kept out of the 8k-line `main.rs`) holds a `FieldSurface` with a **dedicated ed25519 `Signer`**, a bounded ring buffer of recent signed events (`FIELD_RING_CAPACITY = 64`), and the `/ws/field` broadcast topic; it exposes `GET /api/field` (latest signed `FieldEvent`s + signer pubkey + a `dev_signing_key` flag) and `GET /ws/field` (per-cycle stream, mirroring `/ws/sensing`), plus a standalone `router()` for isolated testing. **Tap:** at the ESP32 governed-trust cycle (`main.rs` `observe_cycle` ~`:5886` / `SensingUpdate` build ~`:5938`), `emit_rufield_event` joins the cycle's real `SensingUpdate` (features/classification/signal_field) with the engine's recorded `effective_class`/`demoted` trust state into a `SensingSnapshot` and surfaces a signed `FieldEvent` — **existing endpoints (`/ws/sensing` etc.) are unchanged; this is purely additive.** **Signer (defers the P2 key decision, §8 Q1):** a **standalone dev/sensing key** from `WDP_RUFIELD_SIGNING_SEED` (64-hex or ≥32-byte value), else a deterministic dev default with a logged `WARN` — reusing the `cog-ha-matter` Ed25519 key is the deferred P2 call, so P3 does not pre-empt it. **Egress privacy (fail-closed):** `network_egress_allowed` is *stricter* than `DefaultPrivacyGuard` for an unattended live surface — only **P1/P2** leave the box; P0 (raw) and P3/P4/P5 are held edge-local, so a `Derived → P4/P5` cycle **never** surfaces; no-presence cycles emit **no phantom event**. 
**P3 acceptance gates (`tests/rufield_surface_test.rs`, 4 integration via `tower::oneshot` + 4 module unit, 0 failed):** a well-formed **signed** event (`Modality::WifiCsi`, P2 not P1, `is_fusable` ed25519-verified, real timestamp); empty cycle → no phantom; **privacy-safety** — an injected `Derived` trust never surfaces; a mixed stream surfaces only egress-safe events. **Honest scope (ADR-262 §0/§6):** real plumbing on a **live endpoint**, **NOT accuracy** — single-link CSI with its existing caveats (no validated room-coordinate accuracy — `field_localize`), a dedicated dev signing key pending the P2 ownership decision, no accuracy claim. The win is narrowly: "RuView's live sensing now speaks RuField on `/ws/field`."
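
The fail-closed egress rule and the bounded ring buffer described for the P3 surface could be sketched as follows. This is an illustration under assumptions: `FieldRing` and the `(u64, PrivacyClass)` event tuple are hypothetical simplifications of `FieldSurface` and the signed `FieldEvent`, and only `FIELD_RING_CAPACITY = 64` and the "P1/P2 only" rule come from the entry above.

```rust
use std::collections::VecDeque;

const FIELD_RING_CAPACITY: usize = 64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrivacyClass { P0, P1, P2, P3, P4, P5 }

/// Only P1/P2 may leave the box; P0 (raw) and P3/P4/P5 stay edge-local.
fn network_egress_allowed(class: PrivacyClass) -> bool {
    matches!(class, PrivacyClass::P1 | PrivacyClass::P2)
}

/// Hypothetical ring of recent events, keyed here by a sequence number.
struct FieldRing {
    events: VecDeque<(u64, PrivacyClass)>,
}

impl FieldRing {
    fn new() -> Self {
        Self { events: VecDeque::with_capacity(FIELD_RING_CAPACITY) }
    }

    /// Returns false (and stores nothing) when the class is not egress-safe.
    fn push(&mut self, seq: u64, class: PrivacyClass) -> bool {
        if !network_egress_allowed(class) {
            return false;
        }
        if self.events.len() == FIELD_RING_CAPACITY {
            self.events.pop_front(); // evict the oldest event
        }
        self.events.push_back((seq, class));
        true
    }
}
```

Checking the class before the event reaches the buffer means nothing held edge-local can be served later by a read of the ring.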
 - **ADR-262 P1 — `wifi-densepose-rufield` anti-corruption bridge: RuView WiFi-CSI sensing → signed RuField `FieldEvent`s.** A new v2 workspace member (the *single coupling point* between RuView and the standalone RuField MFS spec, ADR-262 §5.4) that **path-deps** the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion` — pure-Rust, `--no-default-features`-buildable: serde/sha2/ed25519/toml only, no tch/openblas/ndarray/candle) and **no** RuView internal crate. The bridge takes owned primitives — `SensingSnapshot` mirrors the `/ws/sensing` `SensingUpdate` (features + classification + signal_field) joined with the `TrustedOutput` trust state (`trust_class`/`demoted`/`identity_bound`) — and `snapshot_to_field_event()` emits one **signed** `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`): a real `FieldTensor` from the feature scalars with the real `timestamp_ns`; an `Observation` whose `range_m`/`motion_vector`/`space_cell` are derived from the strongest **signal-field peak** when present (else `None` — coordinates are **never fabricated**, per the `field_localize` caveat) and `confidence` from the classification; a real `ProvenanceRef` (sha256 over the tensor bytes, `synthetic=false`) **ed25519-signed** so `rufield_provenance::is_fusable` passes. **The §3.3 privacy mapping is the critical correctness item**, implemented as `map_privacy()` mapping RuView's class onto RuField P0–P5 **by information content, NEVER by byte value** and **fail-closed**: RuView `Derived` (byte `1`, which sorts *below* `Anonymous` byte `2`) carries an identity embedding → maps to **P4** (or **P5** if identity-bound), **never P1** (the single most dangerous mapping mistake); `Raw → P0`, `Anonymous → P2`, `Restricted → P2`; a governed-engine `demoted` cycle floors the egress class to ≥ P2 with raw suppressed. 
**P1 acceptance gates (15 tests / 0 failed — 5 unit + 9 integration + 1 doc):** round-trip (`SensingSnapshot → FieldEvent →` serde `→` equal), `is_fusable` (verified ed25519 receipt), `RuFieldFusion::ingest` accept + `infer()` runs, **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy` — `Derived → P4/P5`, never P1; a table test over every RuView class; fail-closed demotion), and determinism (same snapshot + same signer seed → byte-identical event). **Honest scope:** this is **P1 plumbing** — a tested conversion + a safe privacy mapping. It is **not** wired into the live server (that is P3) and makes **no accuracy claim** (RuField v0.1 is synthetic; RuView's single-link CSI carries its own caveats). CI: the `rust-tests` workflow checkout gains `submodules: recursive` so the path-deps resolve. Python deterministic proof unchanged (off the signal proof path).
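
The §3.3 mapping rule stated above (map by information content, never by byte value, and floor demoted cycles at P2) can be sketched in a few lines. The enum names and the `map_privacy` signature here are assumptions for illustration; the real bridge types may differ, and this sketch models only the class floor, not raw suppression.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FieldPrivacy { P0, P1, P2, P3, P4, P5 }

/// Hypothetical mirror of RuView's privacy class. No numeric values are
/// attached on purpose: the mapping must not depend on byte ordering.
enum RuViewClass { Raw, Derived, Anonymous, Restricted }

fn map_privacy(class: RuViewClass, identity_bound: bool, demoted: bool) -> FieldPrivacy {
    let mapped = match class {
        RuViewClass::Raw => FieldPrivacy::P0,
        // Derived carries an identity embedding, so it maps high, never to P1.
        RuViewClass::Derived if identity_bound => FieldPrivacy::P5,
        RuViewClass::Derived => FieldPrivacy::P4,
        RuViewClass::Anonymous | RuViewClass::Restricted => FieldPrivacy::P2,
    };
    // A demoted cycle floors the egress class at P2 or above.
    if demoted { mapped.max(FieldPrivacy::P2) } else { mapped }
}
```

Because no arm of the match yields `P1` for `Derived`, the dangerous mapping is ruled out by construction in this sketch.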
 - **ADR-262 (Proposed): RuField MFS ↔ RuView integration — a live `SensingServerAdapter`, a privacy/provenance bridge, MAPPED not papered-over.** Researched integration design for wiring RuField into RuView. Recommends: a thin **`wifi-densepose-rufield` bridge crate** (anti-corruption layer, path-deps on the `vendor/rufield` submodule — the `vendor/rvcsi` pattern, since rufield crates are unpublished); a **live `SensingServerAdapter`** that taps the real `SensingUpdate` emit site joined with `TrustedOutput` trust state and emits one signed `FieldEvent`/cycle (the file-based `CsiReplayAdapter` stays for offline replay); **vertical fusion composition** (ruvsense fuses *within* WiFi → one `wifi_csi` event → rufield-fusion graph fuses *across* modalities above it); and **one canonical privacy/provenance model** (RuView `effective_class` is source-of-truth, mapped to RuField P0–P5 at egress; reuse the existing `cog-ha-matter` SHA-256+Ed25519 chain for the `ProvenanceReceipt`). **Key honest finding:** RuView has **two privacy enums + three witness mechanisms across two hash algorithms** that do not map 1:1 onto P0–P5, and a real trap — RuView's `Derived` privacy byte (`1`) sorts *below* `Anonymous` (`2`) yet carries identity embeddings, so the bridge must map by **information content** (`Derived → P4/P5`), never by byte value, or it would leak identity as low-privacy P1. 4 independently-shippable phases, each with a test gate (round-trip / `is_fusable` / privacy-monotonicity / ed25519-verify). Honest scope: this is **plumbing architecture, not accuracy** — RuField v0.1 is synthetic and RuView's only real-CSI path is unlabeled replay; the ADR claims only architecture, gated by round-trip/monotonicity/signature tests.
@@ -18,6 +22,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **ADR-260: RuField MFS — the open specification for camera-free multimodal field sensing.** A common event / tensor / calibration / privacy / provenance model that sits *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and future quantum sensors (each modality emits a normalized `FieldEvent` → `FieldTensor` → `FusionGraph` → `PrivacyClass` → `ProvenanceReceipt`). Published as a **standalone repo** [`ruvnet/rufield`](https://github.com/ruvnet/rufield) and vendored here as the `vendor/rufield` submodule (the `vendor/rvcsi` pattern — not a `v2/` workspace member). The v0.1 reference stack is a self-contained 6-crate Rust workspace (`rufield-core`, `-provenance` [sha256 + ed25519], `-privacy` [P0–P5 guard], `-adapters` [deterministic `SyntheticSim` across wifi_csi/mmwave_radar/infrared_thermal], `-fusion` [graph + TOML weighted-Bayes rules → 7 room-state inferences], `-bench` [deterministic runner + the §31 acceptance test]). **60 tests / 0 failed, clippy-clean.** §27 acceptance criteria 1–8 and 10 PASS; the live dashboard (9) is deferred. **All benchmark metrics are SYNTHETIC** (scored against the simulator's own ground truth — presence/breathing/bed_exit/room_transition F1 = 1.000, nocturnal_scratch 0.923 reported honestly, p95 latency ~0.01 ms, provenance coverage 100%, 0 privacy violations) — they prove the pipeline recovers known truth, **not** field accuracy; real hardware adapters (ESP32 CSI, mmWave, thermal IR) are a documented roadmap item, none validated in v0.1. The Python deterministic proof is unchanged (rufield is off the signal-processing proof path).

 ### Security
+- **`homecore-automation` security review — two real DoS findings fixed (template unbounded-expansion + delay panic-on-config), each pinned by a fails-on-old test; condition-bypass / fail-closed / action-authz dimensions confirmed clean (ADR-129 §8a).** Beyond-SOTA review of the HA-compat automation engine (the execution/eval surface: triggers → conditions → actions, with user-config Jinja2 templates), un-covered by the ADR-154–159 sweep. **HC-SEC-01 (template DoS, HIGH):** a `template:` condition / `value_template` is user config and was rendered with MiniJinja's defaults — **no instruction budget, no output cap**. A single nested-loop condition rendered a **100 MB string in ~11 s on one render call** (measured) — the bfld-class unbounded expansion (MiniJinja's per-call `range()` 10k cap does **not** stop nesting). **Fixed** by enabling MiniJinja's `fuel` feature + `set_fuel(Some(1_000_000))` (the attack now fails fast ~90 ms with "engine ran out of fuel") and a 64 KiB source-length cap; legitimate templates unaffected. **HC-SEC-02 (panic-on-config DoS, MEDIUM):** `Action::Delay`/`WaitForTrigger` fed the user float straight into `Duration::from_secs_f64`, which **panics** on negative/NaN/inf/overflow — all reachable from a crafted or typo'd YAML (`delay: {seconds: -1}`, `.nan`, `.inf`, `1e308`), aborting the spawned run task (measured panic). **Fixed** by a `safe_duration_from_secs` guard that saturates (NaN/±inf/negative → `0`, matching HA's lenient "non-positive delay = no delay"; huge → clamped to ~100 yr). 
**Dimensions probed clean (evidence in ADR-129 §8a):** condition eval is **fail-closed** (template-render error → `false`; un-parseable `choose` branch condition → branch skipped, never silently passing); run-modes are **bounded** (Single/Restart/Queued/`max:N` — a self-triggering automation does not livelock, ADR-162 tests); templates are **read-only sandboxed** (no service-call/state-set global exposed to template scope, so a template cannot escalate to an action); no `unwrap`/`expect`/index panic reachable from a crafted config in the eval/exec path beyond the fixed `from_secs_f64`. Fails-on-old verified by reverting each fix in isolation (delay tests panic; template nested-loop test runs unbounded >60 s; oversized-source test fails). `cargo test -p homecore-automation --no-default-features`: **40 → 54 passed, 0 failed** (+14: 4 template-DoS, 1 no-regression render, 5 delay/wait + safe-duration unit). Workspace green; Python deterministic proof unchanged (homecore-automation is off the signal proof path).
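
A saturating guard of the kind HC-SEC-02 describes might look like the sketch below. It assumes the stated policy (NaN, ±inf, and non-positive values become zero; huge values clamp to roughly 100 years); the constant name `MAX_DELAY_SECS` and its exact value are assumptions, and the crate's actual `safe_duration_from_secs` may differ in detail.

```rust
use std::time::Duration;

/// Roughly 100 years in seconds (assumed cap for illustration).
const MAX_DELAY_SECS: f64 = 100.0 * 365.0 * 24.0 * 60.0 * 60.0;

/// Never panics: `Duration::from_secs_f64` is only reached with a
/// finite, positive, bounded argument.
fn safe_duration_from_secs(secs: f64) -> Duration {
    if !secs.is_finite() || secs <= 0.0 {
        // NaN, +inf, -inf, negative, and zero all mean "no delay".
        return Duration::ZERO;
    }
    Duration::from_secs_f64(secs.min(MAX_DELAY_SECS))
}
```

For example, `safe_duration_from_secs(-1.0)` and `safe_duration_from_secs(f64::NAN)` both return `Duration::ZERO`, while `1e308` is clamped to the cap instead of overflowing.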
+- **`cog-ha-matter` witness/manifest crypto review — engine-class signed-digest collision confirmed ABSENT (length-prefixing already correct); domain-separation tag ADDED + `verify_strict` HARDENED; key-handling & verify-before-trust confirmed clean (ADR-116 §2.2).** Beyond-SOTA crypto+security review of the Cognitum/HA-Matter bridge's SHA-256 + Ed25519 witness chain — the exact signing chain ADR-262 P2 proposes to reuse — un-covered by the ADR-154–159 sweep. **Top-priority check: the sibling `wifi-densepose-engine` bug class (unframed boundary-to-boundary concatenation of operator-influenceable strings into a signed/hashed digest).** Result reported honestly: **that bug class is ABSENT here** — `witness::canonical_bytes` already length-prefixes the two variable-length operator-influenceable fields (`kind_len:u32-be ‖ kind`, `payload_len:u32-be ‖ payload`) over fixed-width `prev_hash[32] ‖ seq:u64-be ‖ ts:u64-be`, an injective encoding (proven pre-existing by `canonical_bytes_length_prefixing_prevents_ambiguity`), and `witness_signing::sign_event`/`verify_signature` sign/verify the **identical** bytes the hash chain commits to (no separate unframed concatenation). The manifest `binary_signature` (Ed25519 over the fixed 64-hex-char `binary_sha256`) is signed **at build time by the Makefile**, not in-crate, and over a single fixed-length value — no in-crate manifest-signing concatenation surface. **Two real hardening gaps fixed, the first pinned by fails-on-old tests:**
+  - **CHM-WIT-01 (missing domain-separation tag, LOW) — ADDED.** The engine review's prescribed fix is "domain-tag **+** length-prefix"; the length-prefix half was present, the **domain tag was absent**. The witness SHA-256 preimage / Ed25519 message carried no tag distinguishing it from any other signing context that shares key infrastructure — notably the manifest `binary_signature`, the very chain ADR-262 P2 reuses. **Fix:** prepend a versioned, NUL-terminated `WITNESS_DOMAIN_TAG = b"cog-ha-matter/witness-event/v1\x00"` to `canonical_bytes` (the doc-comment already anticipated a leading version migration). Cross-protocol separation now holds: a witness signature can never be replayed as a message for another Ed25519 context. **Witness-bytes change by design** (prior on-disk witness hashes/signatures invalidated, like the engine fix) — verified safe: **no in-repo crate consumes cog-ha-matter's witness bytes/signatures programmatically** (all references are doc-comment mentions; the crate is self-contained, no `use cog_ha_matter::` anywhere). Pinned by `canonical_bytes_is_domain_separated`, `canonical_bytes_starts_with_domain_tag_then_prev_hash`, `witness_preimage_cannot_collide_with_a_bare_manifest_digest` (witness.rs) and `signature_commits_to_domain_tag_not_bare_fields` (witness_signing.rs — a signature over the **un-tagged** field concatenation must NOT verify); the domain-separation guard **FAILED on the reverted un-tagged encoding** ("canonical message is not domain-separated").
+  - **CHM-WIT-02 (permissive Ed25519 verification, LOW) — HARDENED to `verify_strict`.** For a tamper-evident **audit** chain the signature is the attestation, so `verify_signature` now uses `VerifyingKey::verify_strict` (rejects non-canonical encodings + small-order public keys per RFC 8032) instead of the permissive `Verifier::verify` — giving auditors the "one canonical signature per event" property they rely on when comparing/deduplicating signed records. Not a forgery fix (the public key is caller-pinned, never parsed from the event), reported at true LOW severity. Guarded by `verify_uses_strict_path_and_pins_caller_key`.
+  - **Dimensions confirmed clean (with evidence, no invented issues):** (1) **verify-before-trust + key-pinning** — `verify_signature` takes the verifying key as a **caller-supplied parameter** (the Seed's known key), never reads a key from the event/manifest, so a forged event carrying its own key cannot self-attest; `WitnessChain::read_jsonl` re-derives and re-checks every `this_hash` on load (tampered bundle → `HashMismatch`) and runs a chain-level `verify()` catching reordered/spliced events (existing `verify_rejects_*`, `jsonl_parser_rejects_tampered_payload`, `read_jsonl_chain_verify_catches_reordered_events`). (2) **key handling** — the crate **never generates, stores, logs, or serializes** a signing key: `sign_event` takes `&SigningKey` by reference, the manifest struct has no key field, and the only key material in-crate is the **test-only** fixed seed (clearly documented "DO NOT use in production"); production keys come from the Seed's secure key store (out of scope, ADR-116 §key-management). No hardcoded/default/predictable production key, no key in the manifest, no world-readable key path (the crate does no key file I/O). (3) **determinism/canonicalization** — `canonical_bytes` is pure positional bytes (no HashMap iteration, no float formatting); Ed25519 is deterministic (pinned by `signature_is_deterministic_for_same_event_and_key`); the JSONL wire form is hand-rolled with **alphabetically-locked** field order (`jsonl_field_order_is_alphabetical_for_byte_stability`) and the mdns TXT records are `sort()`-ed for byte-stable advertisement — no iteration-order or float-format nondeterminism feeds any hash/signature. 
(4) **fail-closed parsing / DoS** — `from_jsonl_line`/`from_hex`/`hex_decode` return structured errors (never panic) on wrong length, non-hex, missing field, odd-length payload, or hash mismatch (`jsonl_parser_rejects_non_hex_hash`, `hex_decode_rejects_odd_length`, …); `main.rs` reads no untrusted files/paths (clap args only; `--print-manifest` emits a static template) — no path/injection surface. (5) **de-magic** — the witness/signing byte layout is already expressed as named widths; no bare security-relevant literals worth extracting beyond the new named `WITNESS_DOMAIN_TAG`. `cog-ha-matter --no-default-features`: **64→68 tests**, 0 failed (+3 domain-tag witness, +1 signing-layer domain-commit, +1 strict-verify key-pin; one pre-existing test renamed to assert the tag). Workspace green; Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — cog-ha-matter is off the signal proof path). Review notes appended to ADR-116 §2.2.
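
The tagged, length-prefixed encoding described under CHM-WIT-01 can be sketched with the standard library alone. The tag value and the field widths come from the entry above; the exact field order after the tag and the function signature are assumptions, and hashing and signing are left out.

```rust
const WITNESS_DOMAIN_TAG: &[u8] = b"cog-ha-matter/witness-event/v1\x00";

/// Append a u32 big-endian length followed by the bytes themselves.
fn push_len_prefixed(out: &mut Vec<u8>, field: &[u8]) {
    // Assumption for this sketch: fields never exceed u32::MAX bytes.
    let len = u32::try_from(field.len()).expect("field longer than u32::MAX");
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(field);
}

/// tag ‖ prev_hash[32] ‖ seq:u64-be ‖ ts:u64-be ‖ len‖kind ‖ len‖payload
fn canonical_bytes(prev_hash: &[u8; 32], seq: u64, ts: u64, kind: &[u8], payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(WITNESS_DOMAIN_TAG);
    out.extend_from_slice(prev_hash);
    out.extend_from_slice(&seq.to_be_bytes());
    out.extend_from_slice(&ts.to_be_bytes());
    push_len_prefixed(&mut out, kind);
    push_len_prefixed(&mut out, payload);
    out
}
```

The leading tag keeps the preimage distinct from a bare 64-hex-char digest signed in another context, and the length prefixes keep `kind = "ab", payload = "c"` distinct from `kind = "a", payload = "bc"`.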
+- **`homecore-api` (HA-wire-compat REST + WebSocket) beyond-SOTA security review — `GET /api/` auth-gate gap FIXED + WS event-stream lag-DoS robustness FIXED; auth/traversal/injection/info-leak dimensions confirmed clean (ADR-161 / ADR-130).** Network-facing review of the HA-wire-compat API layer (remote attack surface), not covered by the ADR-154–159 sweep — same scrutiny the sibling `wifi-densepose-engine` and `-bfld` reviews got. **Two real bugs fixed, each pinned by a fails-on-old test.**
+  - **HC-API-AUTH-01 (auth-gate gap, LOW) — `GET /api/` was unauthenticated; FIXED.** Every sibling REST route (`/api/config`, `/api/states`, `/api/services`, …) calls `BearerAuth::from_headers` first, but `rest::api_root` took no headers and unconditionally returned `200 {"message":"API running."}`. HA's `APIStatusView` inherits `requires_auth = True`, so an unauthenticated/wrong-token request to `/api/` must be **401** — HA clients use this status route as a token-validation probe, and a 200 both told a bad-token client its token was good and let an unauthenticated party confirm a live endpoint. Severity is LOW (the body is a static string — no entity/state data leaks), reported at true severity, not inflated. **Fix:** `api_root` now validates the bearer like its siblings. Pinned by `api_root_rejects_missing_bearer` + `api_root_rejects_wrong_bearer` (both 200→assert-401 on old code) and guarded by `api_root_accepts_correct_bearer`.
+  - **HC-WS-LAG-01 (DoS-adjacent silent failure, LOW) — `subscribe_events` killed the event stream on a broadcast lag; FIXED.** The per-subscription task matched `Err(_) => break` on both `broadcast::Receiver::recv()` arms, but `Lagged(n)` (a slow consumer falling >4,096 events — `EVENT_CHANNEL_CAPACITY` — behind) is **recoverable**: the bus doc itself says "Lagged receivers must re-sync", and HA's WS contract keeps the subscription alive across a lag. The old code treated the first lag as fatal, so after an event burst the client's stream went **permanently silent** with no error frame — a self-inflicted event-delivery DoS under load. **Fix:** `Lagged(_) => continue` (skip the dropped window, re-sync), `Closed => break`, on both the system and domain arms. Pinned by `subscription_survives_broadcast_lag` (subscribes, floods 6,000 filtered events past the 4,096 capacity to force a `Lagged`, then asserts a subsequent subscribed event is still delivered — 5s-timeout panic on old code).
+  - **Dimensions confirmed clean (with evidence, no invented issues):** (1) **AuthN/AuthZ** — all 7 other REST handlers (`get_config`/`get_states`/`get_state`/`set_state`/`delete_state`/`get_services`/`call_service`) gate on `BearerAuth::from_headers` → `LongLivedTokenStore::is_valid` before any work; the WS handshake validates the `auth` token against the **same** store before entering the command loop and the privileged commands are unreachable pre-`auth_ok` (HC-WS-01, already fixed). Token compare is a `HashSet::contains` (content-independent timing, not the byte-`==` oracle ADR-157 §B4 fixed in hardware) — no timing-oracle finding. No route skips the gate, no result-ignored check, no default/empty token accepted (`is_valid` rejects empty internally; `from_env` is non-dev). (2) **Path traversal** — **no route maps user input to a filesystem path** (state lives in an in-memory `DashMap`); `:entity_id` is funneled through `EntityId::parse`, a strict `[a-z0-9_]+\.[a-z0-9_]+` ASCII allowlist that rejects `..`, `/`, `\`, and absolute paths. No traversal surface exists. (3) **Injection** — no SQL, no shell/subprocess, no `format!`-into-response; `call_service`/`set_state` bodies are typed `serde_json::Value` passed to the in-process service registry (matches HA). (4) **Info-leak** — `ApiError` maps to fixed status + a `{message}` derived only from typed variants; `call_service`'s `ServiceError::HandlerFailed(String)` is integration-controlled (mirrors HA surfacing the handler error), not framework internals/paths/stack-traces (no ADR-080-class leak). (5) **CORS** is an explicit allowlist (`allow_credentials(false)`, HC-05 already fixed), not `permissive()`. (6) **De-magic** — no bare security-relevant literals in this crate worth extracting (`EVENT_CHANNEL_CAPACITY` already named in `homecore`; CORS dev-default ports are documented). 
`homecore-api --no-default-features`: **25→29 tests**, 0 failed (+2 api-root auth, +1 api-root accept-guard, +1 WS lag-survival); workspace green; Python deterministic proof unchanged (homecore-api is off the signal proof path). Review notes appended to ADR-161.
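
The HC-WS-LAG-01 control-flow change (treat a lag as recoverable, treat a closed channel as terminal) can be modelled without an async runtime. In this sketch the local `RecvError` enum is a stand-in that mirrors the two variants of tokio's broadcast receive error, and `forward_events` is a hypothetical helper that replays a scripted sequence of receive results instead of awaiting a real receiver.

```rust
/// Stand-in for the two broadcast receive failures discussed above.
enum RecvError {
    Lagged(u64),
    Closed,
}

/// Forward events until the channel closes. Returns the delivered events
/// and the total number of events reported as skipped by lag.
fn forward_events<T>(
    results: impl IntoIterator<Item = Result<T, RecvError>>,
) -> (Vec<T>, u64) {
    let mut delivered = Vec::new();
    let mut skipped = 0u64;
    for result in results {
        match result {
            Ok(event) => delivered.push(event),
            // Recoverable: skip the dropped window and keep the stream alive.
            Err(RecvError::Lagged(n)) => {
                skipped += n;
                continue;
            }
            // Terminal: the sender side is gone.
            Err(RecvError::Closed) => break,
        }
    }
    (delivered, skipped)
}
```

Under the old `Err(_) => break` shape, the first `Lagged` would have ended the loop and every later event would have been lost.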
+- **`wifi-densepose-calibration` per-room calibration review — NaN-poisoning fail-closed gap FIXED + file/path & receipt surfaces confirmed clean (ADR-151).** Beyond-SOTA correctness+security review of the ADR-151 `baseline → enroll → extract → train → bank` pipeline (the appliance-deployed per-room specialist core), un-covered by the ADR-154–159 sweep. **One real numerical-robustness bug fixed.** `Features::from_series` — the live-inference *and* training feature path — computed `mean`/`variance`/`motion` over the raw scalar series with **no non-finite guard**, so a single `NaN`/`±inf` sample (a corrupt CSI frame) produced `mean=NaN, variance=NaN` and an all-`NaN` prototype embedding. Baked into a persisted `PresenceSpecialist::threshold`/`empty_mean` at train time, that `NaN` **silently disabled presence detection** for the life of the bank (every `f.variance > NaN` and `|mean − NaN|` comparison is false → presence always reads *absent*, confidence 0), with **no error raised** — the exact "produce NaN that poisons a specialist / silently accept garbage" failure, and an asymmetry vs the meticulously NaN-guarded `geometry_embedding.rs`. **Fix at the production boundary:** filter non-finite samples before any statistic (a corrupt frame counts as no frame); a wholly-non-finite series degrades to the new `Features::ZERO`, exactly like the empty series. **Value-identical for all-finite input** — `full_loop.rs` and every existing `extract` test pass unchanged. Pinned by two fails-on-old tests (`non_finite_samples_do_not_poison_features`, `all_non_finite_series_is_zero`, both FAILED pre-fix). 
**Dimensions confirmed clean (with evidence, no invented issues):** (1) **file/path handling** — the crate does **zero** file/path I/O (no `std::fs`/`Path`/`File`/`read`/`write` anywhere in `src/`; only in-memory `serde_json`), so path-traversal / unbounded-read / artifact-path concerns do not exist at the crate boundary — they live in the `wifi-densepose-cli` consumer (`room.rs`), out of this crate's scope; (2) **untrusted-load** — `SpecialistBank::from_json` parse-validates shape via serde (malformed → `CalibrationError::Serde`), and per ADR-151 invariant (B) banks are local-first, never network-received; (3) **receipt/hash integrity** — the crate emits **no** hash/receipt/witness/signature (no `CalibrationReceipt` analogue), so the engine's unframed-concatenation bug class is structurally absent — nothing to mis-frame; (4) **other numerical paths already robust** — `geometry_embedding.rs` sanitizes every input + sweeps to finite (verified by its `adversarial_inputs_never_produce_nan` test); presence/restlessness/anomaly divisions are all `.max(1e-3)`-guarded; `autocorr_dominant` guards `r0 ≤ 1e-6`, `n < 16`, empty bands; `SpecialistBank::train` rejects empty anchors; anomaly requires ≥2 anchors. De-magicked the bare specialist threshold literals (breathing 0.25 / heartbeat 0.3 default min-scores, anomaly 2.0× spread / >0.5 label cutoff) into named documented consts, value-identical, pinned by `default_min_score_constants_match_prior_literals` + `anomaly_constants_match_prior_literals`. `wifi-densepose-calibration --no-default-features`: **58→62 unit tests** (+2 NaN fail-closed, +2 de-magic pins) + 1 full-loop integration, 0 failed. Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — calibration is off the signal proof path). Review notes appended to ADR-151 §6.
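
The fail-closed filtering described for `Features::from_series` could look roughly like this. It is a reduced sketch: the real struct also carries `motion` and may use a different float width or variance definition; `f64` and population variance are assumptions made here for a compact example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Features {
    mean: f64,
    variance: f64,
}

impl Features {
    const ZERO: Features = Features { mean: 0.0, variance: 0.0 };

    /// A corrupt (NaN/±inf) sample counts as no sample at all.
    fn from_series(series: &[f64]) -> Features {
        let finite: Vec<f64> = series.iter().copied().filter(|x| x.is_finite()).collect();
        if finite.is_empty() {
            // Empty and wholly non-finite series degrade identically.
            return Features::ZERO;
        }
        let n = finite.len() as f64;
        let mean = finite.iter().sum::<f64>() / n;
        let variance = finite.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
        Features { mean, variance }
    }
}
```

The filter matters because any comparison against NaN is false in IEEE 754, so a NaN threshold would make a check like `variance > threshold` fail on every cycle without raising an error.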
+- **`wifi-densepose-engine` governed-trust review — witness domain-separation gap FIXED + privacy monotonicity confirmed clean (ADR-137 / ADR-141 / ADR-032).** Beyond-SOTA correctness+security review of the security-critical composition root (the cycle enforcing RuView's privacy guarantees), not covered by the ADR-154–159 sweep. **One real witness-integrity bug fixed.** `witness_of` concatenated `model_version`, `calibration_version`, and `privacy_decision` boundary-to-boundary and left the variable-length evidence list without a count, so a string straddling a field boundary collided with a *different* trust decision — e.g. a per-room adapter id (ADR-150 §3.4, operator-influenceable) absorbing the leading bytes of the calibration epoch (`model="…cal:00a"`,`cal="b"`) yields the same witness as `model="…"`,`cal="cal:00ab"`. Two distinct privacy-relevant input tuples → one witness defeats the ADR-137 §2.7 "any privacy-relevant delta → different witness" tamper/drift audit. **Fix:** domain-tag the BLAKE3 hash (`ruview.engine.witness.v1`), write an explicit evidence count, and **length-prefix every field** (8-byte LE length ‖ bytes) — unambiguous framing regardless of contents. Witness-layout change by design (prior witness bytes invalidated); downstream consumers (`engine_bridge`, rufield) assert only witness *relationships* (`assert_ne`/`assert_eq` across runs), never absolute bytes, so nothing breaks. Pinned by two fails-on-old tests: `witness_distinguishes_model_calibration_boundary`, `witness_distinguishes_evidence_model_boundary`. 
**Dimensions confirmed clean (with evidence, no invented issues):** (1) **privacy monotonicity** — `effective_class` is recomputed each cycle from the active mode's floor with at most a single-step `demote_one` (clamped at `Restricted`), no cross-cycle state, proven over **all 5 modes** by `forced_contradiction_never_relaxes_class` (forced contradiction only ever raises the class byte; clean cycle == base); (2) **fail-closed** — empty cycle errors with no degenerate output (`empty_cycle_fails_closed`), single-node boundary characterized (`single_node_cycle_is_well_formed`), NaN coupling → `max(0.0)`→absent edge→at-risk (more restrictive); (3) **witness determinism** — no HashMap iteration / float formatting feeds the hash; (4) **mesh_guard** (ADR-032) — partition-risk → demotion path verified, thresholds already named documented fields. De-magicked the engine-construction literals (coherence accept gate, ADR-143 SLAM discovery + static-anchor thresholds) into named documented consts, value-identical, pinned by `engine_constants_match_prior_values`. `wifi-densepose-engine --no-default-features`: **27→33 tests**, 0 failed (+2 witness, +1 monotonicity property, +2 fail-closed boundary, +1 de-magic pin). Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — the engine is off the signal proof path). Review notes appended to ADR-137 (witness) and ADR-141 (monotonicity).
+- **ADR-141 BFLD privacy-bypass closed — `process_to_frame` now routes the payload through `PrivacyGate` (`wifi-densepose-bfld`).** `BfldPipeline::process_to_frame` stamped the emitted `BfldFrame` header with the active `PrivacyClass` but serialized the caller-supplied `BfldPayload` **unchanged** via `BfldFrame::from_payload`. A frame labeled `Anonymous`(2) or `Restricted`(3) therefore carried the full identity-leaky `compressed_angle_matrix` (the beamforming-angle identity surface) + amplitude/phase proxies + `csi_delta` — exactly the sections `PrivacyGate::demote` is documented and tested (`privacy_gate_demote.rs`) to strip at those classes. Because a `NetworkSink` accepts class ≥ `Derived`(1), such a frame would publish the identity surface across the node boundary despite its restrictive class byte; the class byte lied about payload content. **Fix:** after building the frame at the active class, apply `PrivacyGate::demote` to the same class — a no-op class transition that strips the sections that class forbids (research classes `Raw`/`Derived` keep the full payload). Pinned by three fails-on-old tests in `pipeline_to_frame.rs` (`…_at_anonymous_strips_identity_leaky_sections`, `…_in_privacy_mode_strips_amplitude_and_phase` — both FAILED pre-fix; `…_at_derived_preserves_full_payload` guards against over-stripping). Grade: privacy-bypass FIXED + regression-pinned.
 - **ADR-157 Milestone-1 B4 - constant-time HMAC sync-beacon tag compare (`wifi-densepose-hardware`).** `AuthenticatedBeacon::verify` compared the 8-byte HMAC-SHA256 tag with `self.hmac_tag == expected`, which short-circuits on the first differing byte and leaks, through verification latency, how many leading bytes an attacker's forged tag matched - a byte-by-byte tag-recovery oracle (~256*N trials instead of 256^N). Replaced with a hand-rolled branch-free `constant_time_tag_eq` (XOR-accumulate every byte difference into a single `u8`, no early exit, `#[inline(never)]` + `core::hint::black_box` to stop the optimizer reintroducing a short-circuit or a non-constant-time `memcmp`). **No new dependency** - ADR-157 had deferred this only to avoid adding the `subtle` crate; a fixed 8-byte compare needs none. Grade MEASURED (constant-time *construction*; micro-timing on a noisy host is a smoke check only, gated `#[ignore]`). Pinned by `tag_compare_is_constant_time_shape` (equal/first-differ/last-differ/all-differ/length-mismatch + an end-to-end `verify()` last-byte tamper), proven to fail on a last-byte-skipping constant-time bug. ADR-157 §8 B4 -> RESOLVED.
 - **ADR-080 open HIGH findings closed on the Rust `wifi-densepose-sensing-server` boundary (ADR-164 G11).** The QE sweep's three HIGH findings — XFF-spoofing bypass, leaked stack traces, JWT-in-URL (CWE-598) — were logged against the Python v1 API and never re-verified against the shipped Rust sensing-server; the HOMECORE/M7 sweep (ADR-161) covered `homecore-server`, not this crate.
  - **#2 leaked internal errors (the one live exposure) — FIXED.** Six handlers in `main.rs` serialized the internal error `Display` straight into the JSON response body: `edge_registry_endpoint` returned a panicked `spawn_blocking` `JoinError` (`"task … panicked"`) in a `500`, plus the raw upstream error in a `503`; `delete_model`/`delete_recording`/`start_recording` returned `std::io::Error` strings (OS detail / path); `calibration_start`/`calibration_stop` returned the `FieldModel` error chain. New `error_response` module logs the full detail **server-side only** (with a correlation id) and returns a generic body (`{"error":"internal_error","correlation_id":…}`) — no `panicked`, no file paths, no Debug chain. 5 module tests (a leak-substring guard proven to fail on the reverted old body) + the existing handler suite.
@@ -25,6 +41,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - **#3 JWT-in-URL (CWE-598) — VERIFIED ABSENT, regression-pinned.** `require_bearer` reads the token only from the `Authorization` header; the WebSocket handlers take no token query param and the sole `Query` extractor (`EdgeRegistryParams`) is a non-secret `refresh` flag. Added a regression proving `?token=`/`?access_token=` in the URL never authenticates while the header path still does.
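
The constant-time tag comparison described in the ADR-157 B4 entry above can be sketched as follows. This is a minimal illustration, not the crate's source: the slice-based signature is an assumption, and only the name `constant_time_tag_eq`, the XOR-accumulate shape, `#[inline(never)]`, and `core::hint::black_box` are taken from the entry.

```rust
use core::hint::black_box;

/// Compare two tags without exiting early on the first differing byte.
/// Sketch only: the real function's signature is not shown in the changelog.
#[inline(never)]
pub fn constant_time_tag_eq(a: &[u8], b: &[u8]) -> bool {
    // The tag length is public, so rejecting a length mismatch up front
    // does not leak anything about the secret tag bytes.
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // OR every byte difference into one accumulator; all bytes are visited.
        diff |= black_box(*x ^ *y);
    }
    // A single comparison at the end decides the result.
    black_box(diff) == 0
}
```

The loop always runs over the full tag, so the work done does not depend on where the first mismatch sits. `black_box` is only an optimizer hint, which is consistent with the entry grading the result as a constant-time construction rather than a measured timing guarantee.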

 ### Fixed
+- **`wifi-densepose-geo` numerical-robustness audit — `parse_hgt` degenerate-input panic FIXED + `haversine` antipodal NaN FIXED; pole-singularity & pointcloud NaN-state-poisoning confirmed clean (ADR-154-class sweep).** Targeted numerical-robustness audit of `wifi-densepose-geo` + `wifi-densepose-pointcloud`, hunting the proven non-finite-input-poisons-persistent-state class. **Two real bugs in `geo`, each pinned by a fails-on-old test.** (1) **`terrain.rs::parse_hgt` usize-underflow panic** — `side = sqrt(n_samples)`; for an empty / sub-2x2 buffer `side ≤ 1`, so in `1.0 / (side - 1)` the subtraction underflows `usize` when `side = 0` (panic "attempt to subtract with overflow" in debug; wraps to a huge value in release) and is zero when `side = 1` (the division yields inf), giving a garbage/inf `cell_size_deg` that then poisons every `ElevationGrid::get` lookup. A truncated SRTM download, a 404 HTML body, or an empty response all reach `parse_hgt` — now `bail!`s with a clear error when `side < 2`. Pinned by `parse_hgt_empty_data_errors_not_panics` (panicked pre-fix) + `parse_hgt_single_sample_errors` (returned inf pre-fix) + a `parse_hgt_minimal_2x2_is_finite` guard. (2) **`coord.rs::haversine` asin-domain → NaN** — for (near-)antipodal points floating-point rounding can push `h.sqrt()` to `1.0 + ~4e-16`, and `asin(>1)` is NaN, silently breaking every downstream `<`/`>` distance comparison (verified: pair `(-44.4994,-178.95722)→(44.49939999,1.04278001)` yields `h=1.0000000000000004`). Fixed by clamping into `[0,1]` before `asin`. Pinned by `haversine_near_antipodal_is_finite_not_nan` (NaN pre-fix). The ±90° pole-singularity (`cos(lat)=0` division in the ENU transforms) is pinned as no-panic without changing the transform (value-identical for valid inputs). 
**`wifi-densepose-pointcloud` is confirmed-robust — no bug, no manufactured finding:** the only persistent auto-accumulating state (`occupancy` EMA, vitals) is fed exclusively from the integer-rssi/`sqrt`/`atan2` parser, which can only emit finite values, and the persistent state is provably self-healing even under an adversarial hand-built `CsiFrame` carrying NaN/inf amplitudes+phases (`motion_score=(NaN/100).min(1.0)→1.0`; breathing path `→0→clamp(5,40)→5.0`; tomography EMA uses only integer rssi). Pinned by `nonfinite_frame_does_not_poison_persistent_state` (injects 40 poisoned frames, asserts occupancy/vitals stay finite + the pipeline recovers) and three degenerate-voxel-fusion no-panic tests (empty/single/all-coincident). `wifi-densepose-geo --no-default-features`: 9→15 lib (+6), 8 integration unchanged; `wifi-densepose-pointcloud`: 18→22 (+4); 0 failed; workspace green; Python proof unchanged (`f8e76f21…46f7a`, bit-exact — both crates off the signal proof path).
+- **Vitals IIR filters self-heal after a non-finite CSI frame — a single NaN/inf no longer permanently kills breathing & heart-rate extraction (`wifi-densepose-vitals`, safety; ADR-021 / ADR-158 §A1).** The 2nd-order resonator in `breathing::BreathingExtractor::bandpass_filter` and `heartrate::HeartRateExtractor::bandpass_filter` latches each output `y[n]` into the filter state (`y1`/`y2`). A non-finite input — one NaN/inf amplitude residual from a corrupt CSI frame — produced a NaN `output` that was written into the state. The existing `extract()` `is_finite()` guard correctly dropped that single sample from history, **but never sanitized the poisoned filter state**, so every subsequent output stayed NaN, was rejected too, and the sliding-window history *never refilled*: the extractor went silently dead (returning `None` forever) until `reset()`. On the vitals alert path this is a safety-relevant denial of service — one bad frame and breathing **and** heart-rate monitoring stop, with no error surfaced. Fix: when `bandpass_filter` computes a non-finite `output` it now resets the IIR state to default and returns `0.0`, so the resonator recovers on the next clean frame (the `0.0` is still dropped by the caller's finite-check — no spurious sample enters history). Same class as the calibration NaN bug (ADR-154 §3) and the firmware vitals fixes (#998/#996/#987): the prior hardening guarded the *history boundary* but not the *filter-state boundary*. Pinned by `breathing::tests::nan_frame_does_not_permanently_poison_filter`, `breathing::tests::inf_mid_stream_does_not_freeze_history`, and `heartrate::tests::nan_frame_does_not_permanently_poison_filter` (all three FAIL on the pre-fix code, verified by reverting). 
Also de-magicked the safety-critical HR physiological plausibility band into named `HR_PLAUSIBLE_MIN_BPM`/`HR_PLAUSIBLE_MAX_BPM` consts (value-identical 40/180 BPM, pinned by `plausibility_band_constants_pinned`) and added a fabricated-vital negative (`pure_noise_is_never_reported_valid` — broadband noise never yields a clinically `Valid` HR). `wifi-densepose-vitals --no-default-features`: 55→60 lib tests, 0 failed; workspace green; Python proof unchanged (vitals is off the deterministic proof's signal path).
+- **BFLD MQTT `zone_activity` payload now JSON-escapes the zone name (`wifi-densepose-bfld`).** `mqtt_topics::render_events` emitted the zone payload as `format!("\"{zone}\"")` with no escaping, while `ha_discovery.rs` already escapes operator-controlled strings. A zone name containing a `"` or `\` produced malformed/injectable JSON on the Home-Assistant state topic (e.g. zone `a"b` → payload `"a"b"`). Added a `json_string_literal` escaper mirroring `ha_discovery::push_str_field` and applied it to the zone payload — value-identical for normal zone names (`living_room`, …). Pinned by `zone_payload_escapes_json_metacharacters` (FAILED pre-fix; round-trips through `serde_json`); the existing `zone_payload_is_json_string_with_quotes` still passes unchanged.
 - **ESP32 vitals: `n_persons` over-counted (reported 4 for one person) + presence flag flickered at close range (#998, #996).** Two firmware logic bugs in `firmware/esp32-csi-node/main/edge_processing.c`, both robustness/logic fixes — **not** validated-accuracy claims (true count/PCK vs labelled ground truth stays hardware/data-gated on the COM9 ESP32-S3).
  - **#998 over-count — root cause + fix.** `update_multi_person_vitals()` split the top-K subcarriers into `top_k_count/2` groups and marked **every** group `active` unconditionally, so one body's multipath always reported the full `EDGE_MAX_PERSONS` (=4). New pure, host-testable `count_distinct_persons()` gates each candidate group: (1) **energy gate** — a group's phase variance must be ≥ `EDGE_PERSON_MIN_ENERGY_RATIO` (0.35) × the strongest group's, so weak multipath echoes don't count; (2) **spatial dedup** — groups whose representative subcarriers sit within `EDGE_PERSON_MIN_SC_SEP` (4) of each other are the same body. A `person_count_debounce()` then requires the gated count to hold `EDGE_PERSON_PERSIST_FRAMES` (3) consecutive frames before it's emitted, so a single noisy frame can't promote a phantom. The strongest group always counts (a present body yields ≥1). All thresholds are named, documented constants in `edge_processing.h`.
  - **#996 presence flicker — root cause + fix.** Presence was a bare `score > threshold` compare on a noisy `presence_score` (field-observed 2.6–26.7 frame-to-frame for one stationary person), so the boolean chattered at the boundary while the score clearly indicated a person. New pure `presence_flag_update()` is a Schmitt trigger + clear-debounce: assert above `threshold`, **hold** in the dead band down to `threshold × EDGE_PRESENCE_HYST_RATIO` (0.5), and only clear after the score stays below the low threshold for `EDGE_PRESENCE_CLEAR_FRAMES` (5) consecutive frames. The score itself is unchanged (and still emitted at packet offset 20 for consumer-side thresholding). Constants named/documented in `edge_processing.h`.
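
The Schmitt trigger with clear-debounce from the #996 entry above can be sketched as follows. The firmware itself is C (`edge_processing.c`); this is a Rust rendering for illustration only. The struct layout, the Rust constant names, and the choice to clear on exactly the fifth consecutive low frame are assumptions; the 0.5 ratio and the 5-frame count are the values quoted in the entry.

```rust
/// Hypothetical Rust mirrors of EDGE_PRESENCE_HYST_RATIO / EDGE_PRESENCE_CLEAR_FRAMES.
const PRESENCE_HYST_RATIO: f32 = 0.5;
const PRESENCE_CLEAR_FRAMES: u32 = 5;

/// Assumed state layout: the latched flag plus a run-length counter.
#[derive(Debug, Default, Clone, Copy)]
pub struct PresenceState {
    pub present: bool,
    below_low_frames: u32,
}

/// Feed one frame's presence score and return the debounced flag.
pub fn presence_flag_update(state: &mut PresenceState, score: f32, threshold: f32) -> bool {
    let low = threshold * PRESENCE_HYST_RATIO;
    if score > threshold {
        // Above the high threshold: assert and restart the clear countdown.
        state.present = true;
        state.below_low_frames = 0;
    } else if score >= low {
        // Dead band: hold whatever the flag already is.
        state.below_low_frames = 0;
    } else {
        // Below the low threshold (a NaN score also lands here):
        // clear only after enough consecutive low frames.
        state.below_low_frames = state.below_low_frames.saturating_add(1);
        if state.below_low_frames >= PRESENCE_CLEAR_FRAMES {
            state.present = false;
        }
    }
    state.present
}
```

With a hypothetical threshold of 10.0, a score sequence that dips to 2.6 for four frames and then returns to the dead band keeps the flag asserted; only five consecutive frames below 5.0 clear it.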
@@ -1092,6 +1092,12 @@ Two robustness bugs were fixed in the on-device edge path (`firmware/esp32-csi-n

 Both are pinned by host-buildable C99 tests in `firmware/esp32-csi-node/test/test_vitals_count_presence.c` (`make run_vitals`). The exact thresholds are documented constants pending on-device calibration against ground truth.

+### 2026-06 — Rust `wifi-densepose-vitals`: IIR filter NaN/inf self-heal (ADR-158 §A1)
+
+A correctness/safety review of the Rust extraction crate found a real bug parallel to the firmware robustness class above. The 2nd-order resonator `bandpass_filter` in both `breathing.rs` and `heartrate.rs` latches each output `y[n]` into its filter state (`y1`/`y2`). A single non-finite amplitude residual from a corrupt CSI frame produced a NaN `output` that was written into the state; the existing `extract()` `is_finite()` guard dropped that one sample from the history buffer **but never sanitized the poisoned filter state**, so every later output stayed NaN, was rejected too, and the sliding-window history never refilled — breathing **and** heart-rate extraction went silently dead (returning `None` forever) until `reset()`. On the alert path this is a safety-relevant denial of service (one bad frame stops vitals monitoring with no error surfaced).
+
+Fix: when `bandpass_filter` computes a non-finite `output`, it resets the IIR state to default and returns `0.0`, so the resonator self-heals on the next clean frame (the `0.0` is still dropped by the caller's finite-check, so no spurious sample enters history). Same shape as the calibration NaN bug (ADR-154 §3) — the prior hardening guarded the *history boundary* but not the *filter-state boundary*. Pinned by `breathing::tests::nan_frame_does_not_permanently_poison_filter`, `breathing::tests::inf_mid_stream_does_not_freeze_history`, and `heartrate::tests::nan_frame_does_not_permanently_poison_filter` (all FAIL pre-fix, verified by reverting). The review also de-magicked the HR physiological plausibility band into named `HR_PLAUSIBLE_MIN_BPM`/`HR_PLAUSIBLE_MAX_BPM` consts (value-identical 40/180 BPM) and added a fabricated-vital negative (`pure_noise_is_never_reported_valid` — broadband noise never yields a clinically `Valid` HR; the extractor honestly returns low-confidence `Unreliable`). Clean dimensions confirmed with evidence: flat/silent input → `None`; pure noise → low-confidence `Unreliable`, never `Valid`; harmonic-rich breathing with no cardiac component → low-confidence, not a confident false HR; out-of-band BPM rejected by the plausibility clamp.
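
The self-heal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's code: the `Resonator` and `IirState` types, the `f64` sample type, the difference-equation form, and the coefficients are all hypothetical. Only the behaviour (reset the state and return `0.0` when the output is non-finite) comes from the text.

```rust
/// Assumed filter memory: two past inputs and two past outputs.
#[derive(Debug, Default, Clone, Copy)]
struct IirState {
    x1: f64,
    x2: f64,
    y1: f64,
    y2: f64,
}

/// Hypothetical 2nd-order resonator; coefficient values are illustrative only.
struct Resonator {
    b0: f64,
    a1: f64,
    a2: f64,
    state: IirState,
}

impl Resonator {
    fn new() -> Self {
        // Poles at radius 0.95 (stable); not the crate's real coefficients.
        Resonator { b0: 0.04875, a1: -1.815, a2: 0.9025, state: IirState::default() }
    }

    fn bandpass_filter(&mut self, input: f64) -> f64 {
        let s = self.state;
        let output = self.b0 * (input - s.x2) - self.a1 * s.y1 - self.a2 * s.y2;
        if !output.is_finite() {
            // Never latch NaN/inf into y1/y2: drop the memory and start clean.
            self.state = IirState::default();
            return 0.0;
        }
        // Shift the delay line only for finite samples.
        self.state = IirState { x1: input, x2: s.x1, y1: output, y2: s.y1 };
        output
    }
}
```

Because the reset happens inside the filter, the first clean frame after a corrupt one sees the same state as a freshly constructed filter, so the history buffer can refill without an external `reset()`.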
+
 ## References

 - Ramsauer et al. (2020). "Hopfield Networks is All You Need." ICLR 2021. (ModernHopfield formulation)
@@ -104,6 +104,57 @@ Ranked by build cost × user impact:
 | **P9** | HACS integration repo (`hass-wifi-densepose`) for HA-side install path | pending |
 | **P10** | Witness bundle + CSA-style spec compliance check | pending |

+## 4.1 Crypto/security review notes (§2.2 witness chain — ADR-262 P2 prerequisite)
+
+Beyond-SOTA crypto+security review of the SHA-256 + Ed25519 witness chain
+(`witness.rs` / `witness_signing.rs`) and the manifest signature surface
+(`manifest.rs`), because ADR-262 P2 proposes to **reuse this exact signing
+chain**. Top priority was the sibling `wifi-densepose-engine` bug class —
+unframed boundary-to-boundary concatenation of operator-influenceable strings
+into a signed/hashed digest.
+
+- **Engine bug class ABSENT (good result, reported with byte evidence).**
+  `canonical_bytes` is `DOMAIN_TAG ‖ prev_hash[32] ‖ seq:u64-be ‖ ts:u64-be ‖
+  kind_len:u32-be ‖ kind ‖ payload_len:u32-be ‖ payload`. The two
+  variable-length operator-influenceable fields (`kind`, `payload`) are
+  **length-prefixed**; the fixed-width fields are self-delimiting → the
+  encoding is injective (no two distinct event tuples share a preimage). The
+  Ed25519 signature signs the **identical** bytes the SHA-256 chain commits to.
+  No separate unframed concatenation exists; the manifest `binary_signature`
+  is signed at build time (Makefile) over a single fixed-length `binary_sha256`
+  hex value, not in-crate.
+
+- **CHM-WIT-01 (FIXED) — domain-separation tag added.** The engine fix
+  prescribed *domain-tag + length-prefix*; length-prefix was present, the
+  domain tag was not. Added a versioned, NUL-terminated
+  `WITNESS_DOMAIN_TAG = b"cog-ha-matter/witness-event/v1\x00"` prefix so the
+  witness message can never be replayed as a message for another Ed25519
+  context that shares key infrastructure (notably the manifest signature).
+  **Witness bytes change by design** (prior on-disk hashes/signatures
+  invalidated, as with the engine fix); verified safe because no in-repo crate
+  consumes cog-ha-matter witness bytes programmatically (doc-mentions only).
+
+- **CHM-WIT-02 (HARDENED) — `verify_signature` now uses `verify_strict`.** For
+  an audit chain the signature is the attestation, so non-canonical encodings
+  and small-order keys are rejected (RFC 8032 strict), giving the "one
+  canonical signature per event" property. Not a forgery fix — the verifying
+  key is caller-pinned, never read from the event.
+
+- **Confirmed clean (with evidence):** verify-before-trust + key-pinning
+  (`verify_signature` takes the verifying key as a parameter; `read_jsonl`
+  re-derives every hash and chain-verifies); key handling (the crate never
+  generates/stores/logs/serializes a signing key — only a documented test-only
+  fixed seed; production keys come from the Seed secure store, out of scope);
+  determinism (positional bytes, deterministic Ed25519, alphabetically-locked
+  JSONL field order, sorted TXT records — no HashMap/float nondeterminism feeds
+  any digest); fail-closed parsing (structured errors, no panics; `main.rs`
+  reads no untrusted files/paths).
+
+Tests: `cog-ha-matter --no-default-features` 64 → **68**, 0 failed (CHM-WIT-01
+pinned by 4 fails-on-old tests across `witness.rs`/`witness_signing.rs`;
+CHM-WIT-02 guarded by a key-pinning test). Python deterministic proof
+unchanged (cog-ha-matter is off the signal proof path).
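
The canonical layout described above can be sketched as follows. This is an illustration of the byte framing only, assuming `kind` is a string and `payload` a byte slice; the function signature and the `Option` return are assumptions, and no hashing or signing is shown. The tag value and the field order are the ones stated in these notes.

```rust
const WITNESS_DOMAIN_TAG: &[u8] = b"cog-ha-matter/witness-event/v1\x00";

/// Build the bytes that are both hashed into the chain and signed.
/// Returns None if a variable-length field does not fit a u32 length prefix.
fn canonical_bytes(
    prev_hash: &[u8; 32],
    seq: u64,
    ts: u64,
    kind: &str,
    payload: &[u8],
) -> Option<Vec<u8>> {
    let kind_len = u32::try_from(kind.len()).ok()?;
    let payload_len = u32::try_from(payload.len()).ok()?;

    let mut out = Vec::new();
    out.extend_from_slice(WITNESS_DOMAIN_TAG); // domain separation
    out.extend_from_slice(prev_hash); // fixed width: 32 bytes
    out.extend_from_slice(&seq.to_be_bytes()); // fixed width: 8 bytes, big-endian
    out.extend_from_slice(&ts.to_be_bytes()); // fixed width: 8 bytes, big-endian
    out.extend_from_slice(&kind_len.to_be_bytes()); // length prefix for `kind`
    out.extend_from_slice(kind.as_bytes());
    out.extend_from_slice(&payload_len.to_be_bytes()); // length prefix for `payload`
    out.extend_from_slice(payload);
    Some(out)
}
```

Moving a byte across the `kind`/`payload` boundary changes both length prefixes, so `("ab", "c")` and `("a", "bc")` encode to different byte strings even though their plain concatenations are identical.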
+
 ## 5. References

 - ADR-101 — `cog-pose-estimation` packaging precedent (signed binaries on GCS, .cog manifest)
@@ -190,6 +190,23 @@ This is the same Wasmtime host already used for integration plugins (ADR-128)

 ---

+## 8a. Security review (beyond-SOTA sweep, post ADR-154–159)
+
+A focused security review of `homecore-automation` (the execution/eval surface — triggers → conditions → actions, with templates) was run after the ADR-154–159 sweep, applying the same rigor that the sibling engine/bfld/calibration/vitals/geo reviews used. **Two real DoS findings, each pinned by a fails-on-old test; the condition-bypass, fail-closed-parsing, and action-authorization dimensions were probed and found clean.**
+
+- **HC-SEC-01 (template-injection / unbounded-expansion DoS, HIGH) — FIXED.** A `template:` condition / `value_template` is user automation config, and was rendered with MiniJinja's defaults: **no instruction budget, no output cap**. A single condition such as `{% for i in range(5000) %}{% for j in range(5000) %}xxxx{% endfor %}{% endfor %}` rendered a **100 MB string over ~11 s on one render call** (measured) — a CPU/memory denial of service (the bfld-class "unbounded expansion"; MiniJinja's per-call `range()` 10k cap does **not** stop nested loops). **Fix:** enable MiniJinja's `fuel` feature and set a per-render budget (`set_fuel(Some(1_000_000))`) so a nested loop burns one unit per iteration — the attack now fails fast (~90 ms) with "engine ran out of fuel"; plus a 64 KiB source-length cap rejecting pathological sources before compilation. Legitimate HA templates (a few dozen instructions) are unaffected. Pinned by `nested_loop_template_is_bounded_not_unbounded_dos`, `single_huge_repeat_template_is_bounded`, `oversized_template_source_is_rejected` (all fail-on-old: unbounded render / no rejection), and `legitimate_template_still_renders_within_fuel` (no regression).
+- **HC-SEC-02 (panic-on-config DoS, MEDIUM) — FIXED.** `Action::Delay { seconds }` and `Action::WaitForTrigger { timeout_seconds }` fed the user-supplied float straight into `Duration::from_secs_f64`, which **panics** on negative, NaN, infinite, or overflowing inputs — all reachable from a crafted (or typo'd) YAML (`delay: {seconds: -1}`, `.nan`, `.inf`, `1e308`). One hostile config aborts the spawned automation run task with a panic (measured: "cannot convert float seconds to Duration: value is negative"). **Fix:** a `safe_duration_from_secs` guard that saturates instead of panicking (NaN/±inf/negative → `Duration::ZERO`, matching HA's lenient "non-positive delay = no delay"; absurdly large → clamped to ~100 years). Pinned by `delay_negative_seconds_does_not_panic`, `delay_nan_seconds_does_not_panic`, `delay_infinite_seconds_does_not_panic`, `wait_for_trigger_negative_timeout_does_not_panic`, `safe_duration_saturates_hostile_values` (incl. overflow clamp).
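
The HC-SEC-02 guard can be sketched as follows. This is a minimal version assuming a free function over `f64`; the exact ceiling constant is hypothetical, since the ADR only says "~100 years". The saturation rules are the ones listed in the finding.

```rust
use std::time::Duration;

/// Hypothetical ceiling of roughly 100 years, in seconds.
const MAX_DELAY_SECS: f64 = 100.0 * 365.0 * 24.0 * 60.0 * 60.0;

/// Convert user-supplied seconds to a Duration without ever panicking.
pub fn safe_duration_from_secs(seconds: f64) -> Duration {
    // NaN, +inf, -inf, zero, and negative values all mean "no delay".
    if !seconds.is_finite() || seconds <= 0.0 {
        return Duration::ZERO;
    }
    // Finite and positive: clamp so from_secs_f64 cannot overflow.
    Duration::from_secs_f64(seconds.min(MAX_DELAY_SECS))
}
```

Checking `is_finite()` before the clamp matters: `f64::min` ignores a NaN operand, so clamping alone would silently turn NaN into the maximum delay instead of zero.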
+
+**Dimensions confirmed clean (with evidence):**
+- **Condition bypass / fail-closed eval** — a `Condition::Template` whose render errors evaluates to `false` (`condition.rs` `Err(_) => false`), and a `Choose` branch condition that fails to deserialize is treated as **non-matching** (the branch is skipped), not silently passing (`action.rs` `ChoiceBranch::matches` `Err(_) => return false`). Both fail **closed** (do-not-run), confirmed by the existing `choose_*` tests and the template-false-blocks-action behavioral test. No true-by-default-on-parse-error path found.
+- **Re-entrancy / livelock (DoS)** — run-mode machinery is bounded and tested: `Single`/`IgnoreFirst` re-entrancy guard, `Restart` cancel-and-replace, `Queued` FIFO serialization, and `max: N` semaphore cap (ADR-162; `restart_mode_cancels_prior_run`, `queued_mode_runs_sequentially_not_concurrently`, `max_two_caps_concurrency_at_two`, `single_mode_does_not_double_fire_on_rapid_triggers`). A self-triggering automation does not livelock the engine — each fire is bounded by its run-mode.
+- **Action authorization** — templates are read-only sandboxed (`states`/`state_attr`/`is_state`/`now` globals; no service-call or state-set global is exposed to template scope), so a template cannot escalate into an action. Service authorization itself is enforced at the `homecore` service-registry boundary (out of this crate's scope); no gap found in what the automation crate enforces.
+- **Panic-on-config (parse)** — `serde_yaml`/`serde_json` deserialization returns structured `AutomationError` (no `unwrap`/`expect`/index reachable from a crafted config in the eval/exec path); the only remaining panic surface was the `from_secs_f64` path fixed as HC-SEC-02.
+
+Validation: `cargo test -p homecore-automation --no-default-features` → 54 passed / 0 failed (+14 over baseline). Python deterministic proof unchanged (homecore-automation is off the signal-processing proof path).
+
+---
+
 ## 9. References

 ### HA upstream
@@ -120,6 +120,42 @@ tested; P3 is planned.
  HOMECORE-API (ADR-130, P3); automation conditions on historical state are
  HOMECORE-automation (ADR-129, P3).

+## 3a. Security review (2026-06, post-ADR-154–159 sweep)
+
+A beyond-SOTA security review of `homecore-recorder` covered SQL injection, retention/purge
+correctness, fail-closed write integrity, semantic-store NaN poisoning, and PII exposure.
+
+**Confirmed clean (with evidence):**
+
+- **SQL injection — clean.** Every query in `db.rs` uses bound `?` parameters; no user- or
+  entity-influenceable value is interpolated into SQL via `format!`/concatenation. The only
+  `format!` builds the `LIKE` *pattern* string, which is itself **bound** as a parameter with
+  `ESCAPE '\\'` and `% _ \` escaping — so a metacharacter payload is matched literally. Pinned
+  by `malicious_entity_id_is_stored_literally_not_executed` (a `'; DROP TABLE states; --` state
+  value leaves the table intact and round-trips verbatim) and
+  `like_metacharacters_in_query_are_literal_not_wildcards`.
+- **NaN-index poisoning — structurally impossible.** Embeddings are SHA-256 → `i32` →
+  `f32`; an `i32`→`f32` cast is always finite (never NaN/Inf), and an all-zero digest is
+  guarded by the `norm > 1e-10` check. Empty-index search, empty-string query, and `k=0` were
+  probed and all return `Ok(0)` with no panic. (Unlike the calibration/vitals/geo paths, no raw
+  sensor float ever reaches the index.)
+- **Fail-closed writes.** A removal event returns `Ok(None)`; semantic-index failure is logged,
+  not propagated, so it never blocks the durable SQLite write; `EntityId` parse failure falls
+  back to a sentinel rather than panicking.
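
The `LIKE` escaping noted in the SQL-injection item above can be sketched as follows. The function names and the surrounding-`%` "contains" shape are assumptions, not the contents of `db.rs`; the escaped character set (`%`, `_`, `\`) and the backslash escape character are the ones named in the review.

```rust
/// Escape LIKE metacharacters so they match literally under ESCAPE '\'.
pub fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Hypothetical helper: build a "contains" pattern around the escaped query.
/// The result is still passed to SQLite as a bound `?` parameter.
pub fn like_contains_pattern(query: &str) -> String {
    format!("%{}%", escape_like(query))
}
```

The pattern string is data, not SQL: it is bound as a parameter, so escaping here only controls wildcard semantics and plays no part in preventing injection.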
+
+**Fixed (real bounding bugs):**
+
+- **Memory-DoS — `get_state_history` was unbounded.** No `LIMIT`, so a wide time window over a
+  high-frequency entity loaded an unbounded row set into memory. Now capped at
+  `MAX_HISTORY_ROWS` (1,000,000); sibling search paths were already `k`-bounded.
+- **Disk-DoS / documented-but-missing `purge`.** The README advertised `Recorder::purge`, but
+  no retention path existed → unbounded disk growth. Added a **transactional** `purge(older_than)`
+  with an **exclusive** cutoff (idempotent, no off-by-one) that deletes old `states`/`events` and
+  GCs orphaned `state_attributes` blobs (dedup-shared blobs kept until their last referrer is gone).
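
The purge semantics above (exclusive cutoff, orphan-only garbage collection) can be illustrated with an in-memory model. This is not the SQLite implementation: the `Store` and `StateRow` types are hypothetical stand-ins for the tables, timestamps are plain integers, and the model has no transaction, whereas the real deletes run inside one.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one row of the `states` table.
#[derive(Debug, Clone, PartialEq)]
pub struct StateRow {
    pub ts: i64,
    pub attributes_id: u64,
}

/// Hypothetical stand-in for the three tables the purge touches.
#[derive(Debug, Default)]
pub struct Store {
    pub states: Vec<StateRow>,
    pub events: Vec<i64>, // event timestamps only
    pub state_attributes: HashMap<u64, String>,
}

impl Store {
    /// Delete rows strictly older than `older_than`; return how many
    /// states + events were removed.
    pub fn purge(&mut self, older_than: i64) -> usize {
        let before = self.states.len() + self.events.len();
        // Exclusive cutoff: a row stamped exactly `older_than` is kept.
        self.states.retain(|s| s.ts >= older_than);
        self.events.retain(|ts| *ts >= older_than);
        // GC only blobs that no surviving state references.
        let live: HashSet<u64> = self.states.iter().map(|s| s.attributes_id).collect();
        self.state_attributes.retain(|id, _| live.contains(id));
        before - (self.states.len() + self.events.len())
    }
}
```

Running the same purge twice removes nothing the second time, which is the idempotence property the exclusive cutoff is meant to give.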
+
+`homecore-recorder` tests: 19 → 25 (`--no-default-features`) / 25 → 31 (`--features ruvector`),
+0 failed. Python deterministic proof unchanged (recorder is off the signal proof path).
+
 ## 4. Links

 - Crate: `v2/crates/homecore-recorder/` — `Cargo.toml`, `README.md`, `src/lib.rs`,
@@ -495,3 +495,34 @@ Rejected. `ViewpointFusionEvent` (viewpoint/fusion.rs lines 183–219) is an int
 **Integration glue -- not yet on the live path:** emission of `CalibrationIdMismatch` / `DriftProfileConflict` / `PhaseAlignmentFailed` once `calibration_id` propagation and the phase-align convergence signal are threaded onto frames; the BFLD witness record emitted on privacy demotion.

 **Trust contribution:** sensor *agreement made explicit* -- fusion records the evidence it relied on, and any disagreement automatically tightens the downstream privacy class.
+
+---
+
+## Witness Integrity Review (2026-06-14) — domain-separation fix
+
+A beyond-SOTA security review of `wifi-densepose-engine` (the composition root
+that builds the §2.7 trust witness in `witness_of`) found a real **witness
+domain-separation gap**, now fixed.
+
+**Finding (witness-gap, HIGH).** `witness_of` concatenated `model_version`,
+`calibration_version`, and `privacy_decision` boundary-to-boundary, and the
+variable-length `evidence` list carried no explicit count. A string straddling a
+field boundary therefore collided with a *different* trust decision —
+e.g. a per-room adapter id (ADR-150 §3.4, operator-influenceable) that absorbs
+the leading bytes of the calibration epoch (`model="…cal:00a"`, `cal="b"`)
+produces the **same** witness as `model="…"`, `cal="cal:00ab"`. Two distinct
+privacy-relevant input tuples → one witness defeats the "any privacy-relevant
+delta → different witness" guarantee this ADR's §2.7 witness exists to provide.
+
+**Fix.** The witness now (a) prepends a domain tag `ruview.engine.witness.v1`,
+(b) writes an explicit 8-byte evidence count, and (c) **length-prefixes every
+field** (8-byte LE length ‖ bytes), so field framing is unambiguous regardless
+of contents. This is a witness-layout change (all prior witness bytes are
+invalidated by design); downstream consumers only assert witness *relationships*
+(`assert_ne`/`assert_eq` across runs), not absolute bytes, so nothing breaks.
+
+Pinned by `witness_distinguishes_model_calibration_boundary` and
+`witness_distinguishes_evidence_model_boundary` (both fail on the old
+concatenation). Witness **determinism** was reviewed and confirmed clean: no
+HashMap iteration and no float formatting feed the hash (floats appear only in
+the `SemanticState` statement, which is outside the witness).
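
The collision and the framing fix can be sketched as follows. This shows only the hash preimage, not the BLAKE3 call, and it is an illustration rather than the body of `witness_of`: the function names, the field order, the way the domain tag is mixed in, and the sample strings (`room-7/…`, `allow`) are all hypothetical. The tag value, the 8-byte LE length prefix, and the explicit evidence count come from the fix description.

```rust
const DOMAIN_TAG: &[u8] = b"ruview.engine.witness.v1";

/// Old shape: fields concatenated boundary-to-boundary.
fn unframed(model: &str, cal: &str, decision: &str) -> Vec<u8> {
    [model.as_bytes(), cal.as_bytes(), decision.as_bytes()].concat()
}

/// Append one field as (8-byte little-endian length ‖ bytes).
fn put_field(out: &mut Vec<u8>, field: &[u8]) {
    out.extend_from_slice(&(field.len() as u64).to_le_bytes());
    out.extend_from_slice(field);
}

/// New shape: domain tag, explicit evidence count, every field length-prefixed.
fn framed(evidence: &[&str], model: &str, cal: &str, decision: &str) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(DOMAIN_TAG);
    out.extend_from_slice(&(evidence.len() as u64).to_le_bytes());
    for item in evidence {
        put_field(&mut out, item.as_bytes());
    }
    put_field(&mut out, model.as_bytes());
    put_field(&mut out, cal.as_bytes());
    put_field(&mut out, decision.as_bytes());
    out
}
```

With the unframed form, `("room-7/cal:00a", "b")` and `("room-7/", "cal:00ab")` produce identical bytes; with the framed form the length prefixes differ, so the two tuples can no longer share a preimage.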
@@ -599,3 +599,53 @@ Per ADR-028/ADR-010, three rows are added to the witness log:
 **Integration glue -- not yet on the live path:** wiring the registry into `PrivacyGate` class transitions, the MQTT discovery payload, and a read-only Home Assistant diagnostic entity exposing the active mode + proof hash.

 **Trust contribution:** the *policy spine* -- privacy posture is a tamper-evident, auditable chain rather than a checkbox; an operator's mode choice actively governs whether identity data may even exist.
+
+---
+
+## Privacy Monotonicity Review (2026-06-14) — confirmed clean
+
+A beyond-SOTA security review of the governed-trust cycle
+(`wifi-densepose-engine::StreamingEngine::process_cycle_calibrated`) examined
+the privacy-demotion path this ADR governs. **The monotonicity invariant holds:
+demotion only ever makes the emitted class more restrictive, never less.**
+
+Verification (no behaviour change; the result is a clean bill of health, with evidence):
+
+- Each cycle computes `effective_class` fresh from the active mode's
+  `target_class()` (the floor) and applies at most a **single-step** demotion
+  (`demote_one`, clamped at `Restricted`). There is no cross-cycle state that
+  could let a permissive class overwrite a restrictive one.
+- A forced contradiction (calibration mismatch / array-geometry insufficiency /
+  mesh partition risk, ADR-032) raises the class byte; a clean cycle emits
+  exactly the base class.
+- Pinned by `forced_contradiction_never_relaxes_class`, a property test over
+  **all five** `PrivacyMode`s asserting `effective_class.as_u8() >=
+  base_class.as_u8()` (strictly greater unless already clamped at `Restricted`)
+  under a forced contradiction, and `== base` on a clean cycle.
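
The invariant in the list above can be sketched as follows. This is a simplified model, not the engine code: the class bytes for `Derived`, `Anonymous` and `Restricted` come from this document, `Raw = 0` is an assumption, and the free function `effective_class` is a hypothetical stand-in for the per-cycle computation.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum PrivacyClass {
    Raw = 0, // assumed value; the others are quoted in this ADR
    Derived = 1,
    Anonymous = 2,
    Restricted = 3,
}

impl PrivacyClass {
    fn as_u8(self) -> u8 {
        self as u8
    }

    /// Single-step demotion, clamped at `Restricted`.
    fn demote_one(self) -> Self {
        match self {
            PrivacyClass::Raw => PrivacyClass::Derived,
            PrivacyClass::Derived => PrivacyClass::Anonymous,
            PrivacyClass::Anonymous | PrivacyClass::Restricted => PrivacyClass::Restricted,
        }
    }
}

/// Computed fresh each cycle from the base class; no cross-cycle state.
fn effective_class(base: PrivacyClass, contradiction: bool) -> PrivacyClass {
    if contradiction { base.demote_one() } else { base }
}

fn main() {
    let all = [
        PrivacyClass::Raw,
        PrivacyClass::Derived,
        PrivacyClass::Anonymous,
        PrivacyClass::Restricted,
    ];
    for base in all {
        // A contradiction never lowers the class byte.
        assert!(effective_class(base, true).as_u8() >= base.as_u8());
        // A clean cycle emits exactly the base class.
        assert_eq!(effective_class(base, false), base);
    }
    println!("monotonicity holds for all classes");
}
```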
+
+Fail-closed boundaries were also pinned: an empty cycle errors (no degenerate
+over-permissive output, `empty_cycle_fails_closed`) and the single-node boundary
+is characterized as a valid non-demoting mode (`single_node_cycle_is_well_formed`).
+
+The related witness domain-separation fix from the same review is recorded in
+ADR-137 (the witness folds `effective_class`, so the demotion is auditable).
+## Security & Privacy Review (2026-06-14)
+
+Beyond-SOTA privacy+security review of `wifi-densepose-bfld` (the crate was not in the ADR-154–159 sweep). Two real bugs fixed (each pinned by a fails-on-old test), several dimensions confirmed clean.
+
+### Findings
+
+| # | Severity | Site | Issue | Fix | Pinned by |
+|---|----------|------|-------|-----|-----------|
+| 1 | **privacy-bypass (HIGH)** | `pipeline.rs::process_to_frame` | The documented wire-bytes production path stamped the frame header with the active `PrivacyClass` but serialized the caller's `BfldPayload` **unchanged** via `BfldFrame::from_payload` — never routing through `PrivacyGate::demote`. A frame labeled `Anonymous`(2)/`Restricted`(3) carried the full `compressed_angle_matrix` (identity surface) + amplitude/phase + `csi_delta`. A `NetworkSink` accepts class ≥ `Derived`(1), so the identity surface could cross the node boundary despite the restrictive class byte — the byte lied about content. | Apply `PrivacyGate::demote(frame, active_class)` after construction: a same-class transition that strips the sections the class forbids; `Raw`/`Derived` keep the full payload. | `tests/pipeline_to_frame.rs::process_to_frame_at_anonymous_strips_identity_leaky_sections`, `…_in_privacy_mode_strips_amplitude_and_phase` (both FAILED pre-fix); `…_at_derived_preserves_full_payload` (over-strip guard) |
+| 2 | **PII/injection (MEDIUM)** | `mqtt_topics.rs::render_events` | `zone_activity` payload built as `format!("\"{zone}\"")` with no JSON escaping (while `ha_discovery.rs` already escapes). A zone name with `"`/`\` produced malformed/injectable JSON on the HA state topic. | `json_string_literal()` escaper mirroring `ha_discovery::push_str_field`. Value-identical for normal zone names. | `tests/mqtt_topic_routing.rs::zone_payload_escapes_json_metacharacters` (FAILED pre-fix) |
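
For finding 2, a minimal sketch of what a JSON string escaper of this kind might look like. The name `json_string_literal` is taken from the table; the body is an assumption and may differ from the crate's implementation, for example in how it treats control characters.

```rust
/// Renders `s` as a quoted JSON string literal, escaping the characters
/// that would otherwise terminate or corrupt the string.
fn json_string_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn main() {
    // An ordinary zone name is unchanged apart from the surrounding quotes.
    assert_eq!(json_string_literal("kitchen"), "\"kitchen\"");
    // A quote and a backslash are escaped instead of breaking out of the string.
    assert_eq!(json_string_literal("a\"b\\c"), "\"a\\\"b\\\\c\"");
    println!("{}", json_string_literal("living \"room\""));
}
```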
+
+### Dimensions confirmed clean (with evidence)
+
+- **Event-field privacy gating** — `BfldEvent::apply_privacy_gating` nulls `identity_risk_score` + `rf_signature_hash` at `Restricted`, and `serde(skip_serializing_if = "Option::is_none")` omits them entirely. `render_events`/`render_discovery_payloads` refuse class < `Anonymous` (stricter than the `sink.rs` `NetworkKind` `MIN_CLASS = Derived` — defense in depth toward less leakage). Covered by `event_privacy_gating.rs`, `mqtt_topic_routing.rs`, `ha_discovery.rs`.
+- **Witness/hash framing (the engine `witness_of` bug class)** — CLEAN. `SignatureHasher::compute` prefixes a **fixed 4-byte** `day_epoch` then a **fixed-width canonical-f32** feature block (`IdentityFeatures`: Embedding = `EMBEDDING_DIM*4`, RiskFactors = 16 B). `PrivacyAttestationProof::compute` hashes a fixed 32-byte `prev_hash` + three fixed 1-byte values. No variable-length operator-influenceable string is concatenated into any digest — no length-prefix-framing collision is possible.
+- **Fail-closed** — `payload.rs::from_bytes` rejects truncated/overflowing/trailing-byte sections (`checked_add`, bounds checks); `frame.rs::from_bytes` validates magic/version/length/CRC; `PrivacyClass::try_from` rejects unknown bytes; `identity_risk::score` maps NaN/degenerate factors → 0.0 (privacy-conservative). The `from_score(NaN) → Accept` choice is a documented, deliberate publish-aggregate-only fallback (NaN never reaches it from `score()`); risk-driven NaN cannot leak identity because identity gating is class-byte-driven, not risk-driven.
+
+### Observation (not a bug)
+
+The ADR-141 control plane (`PrivacyMode`/`PrivacyModeRegistry`) is **not yet wired into the emit path** — the emitter/pipeline enforce the raw `PrivacyClass` directly; the registry is exported + unit-tested but advisory. This matches the "Integration glue — not yet on the live path" status above. The class-byte enforcement (emitter + event + renderers + the now-fixed `process_to_frame`) is the live guarantee. Wiring the registry is the documented next step.
@@ -253,6 +253,54 @@ Validation per CLAUDE.md: `cargo test --workspace --no-default-features` green;

 ---

+## 6. Review notes
+
+### 6.1 Correctness + security review (2026-06-14)
+
+Beyond-SOTA correctness+security review of `wifi-densepose-calibration` (this
+ADR's pipeline), which the ADR-154–159 sweep did not cover.
+
+**Finding (FIXED) — NaN-poisoning of the feature path (numerical / fail-closed).**
+`Features::from_series` — the carrier for both live inference and training-anchor
+extraction — computed `mean`/`variance`/`motion` over the raw scalar series with
+no non-finite guard. A single `NaN`/`±inf` sample (corrupt CSI frame) yielded
+`mean=NaN, variance=NaN` and an all-`NaN` prototype embedding. Persisted into a
+`PresenceSpecialist::threshold`/`empty_mean` at train time, the `NaN` **silently
+disabled presence detection** for the bank's lifetime (every `>` / `|·|`
+comparison against `NaN` is false → always reads *absent*, confidence 0), with no
+error — and an asymmetry against the rigorously NaN-guarded `geometry_embedding`.
+Fixed at the production boundary: non-finite samples are dropped (a corrupt frame
+counts as no frame), an all-non-finite series degrades to `Features::ZERO` like
+the empty series. Value-identical for all-finite input (full-loop + extract tests
+unchanged); pinned by `non_finite_samples_do_not_poison_features` and
+`all_non_finite_series_is_zero` (both fail on the old code).
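
A minimal sketch of the guard described above, under stated assumptions: the `Features` struct here carries only `mean` and `variance` (the real type also has `motion`), uses `f64`, and computes population variance. Only the drop-non-finite and degrade-to-`ZERO` behaviour is taken from the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Features {
    mean: f64,
    variance: f64,
}

impl Features {
    const ZERO: Features = Features { mean: 0.0, variance: 0.0 };

    /// Non-finite samples are dropped (a corrupt frame counts as no frame);
    /// an empty or all-non-finite series degrades to `Features::ZERO`.
    fn from_series(series: &[f64]) -> Features {
        let finite: Vec<f64> = series.iter().copied().filter(|x| x.is_finite()).collect();
        if finite.is_empty() {
            return Features::ZERO;
        }
        let n = finite.len() as f64;
        let mean = finite.iter().sum::<f64>() / n;
        let variance = finite.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
        Features { mean, variance }
    }
}

fn main() {
    // Why the bug was silent: every ordered comparison against NaN is false.
    assert!(!(f64::NAN > 0.5));
    assert!(!(0.5 > f64::NAN));

    // One corrupt sample no longer poisons the statistics.
    let f = Features::from_series(&[1.0, f64::NAN, 3.0]);
    assert_eq!(f, Features { mean: 2.0, variance: 1.0 });

    // An all-non-finite series behaves like the empty series.
    assert_eq!(Features::from_series(&[f64::NAN, f64::INFINITY]), Features::ZERO);
    println!("{f:?}");
}
```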
+
+**Clean dimensions (evidence, no invented issues).**
+- *File/path handling:* the crate performs **zero** file/path I/O (no
+  `std::fs`/`Path`/`File`/`read`/`write` in `src/`; only in-memory `serde_json`).
+  Path-traversal / unbounded-read / artifact-path handling live entirely in the
+  `wifi-densepose-cli` consumer (`room.rs`), outside this crate's boundary.
+- *Untrusted-load:* `SpecialistBank::from_json` shape-validates via serde
+  (malformed → `CalibrationError::Serde`); banks are local-first (invariant B),
+  never network-received. A well-formed bank with adversarial numerics is trusted
+  as-is — acceptable under the local-first threat model; a validate-on-load
+  defense-in-depth pass is a possible future hardening, not a present bug.
+- *Receipt/hash integrity:* the crate emits no hash/receipt/witness/signature, so
+  the unframed-concatenation bug class (cf. the engine `witness_of` fix) is
+  structurally absent.
+- *Other numerical paths:* `geometry_embedding` sanitizes every input and sweeps
+  to finite; presence/restlessness/anomaly divisions are `.max(1e-3)`-guarded;
+  `autocorr_dominant` guards `r0`, short signals, and empty bands; `train` rejects
+  empty anchors; anomaly requires ≥2 anchors.
+
+De-magicked the bare specialist threshold literals (breathing/heartbeat default
+min-scores, anomaly outlier-spread multiple + label cutoff) into named documented
+consts, value-identical, pinned by const-equality tests. Tests
+**58→62 unit + 1 integration, 0 failed**; Python deterministic proof unchanged
+(off the signal proof path).
+
+---
+
 ## 5. Summary

 > Big models understand the world. Small ruVector models understand *your room*.
@@ -231,6 +231,8 @@ Catalogued so nothing is silently dropped. Priority: **P1** correctness-adjacent

 > **Horizon-ledger one-liner.** Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n−1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). **Milestone-1 DONE (2026-06-13): all four P1 backlog items cleared — circular phase variance #1 (RESOLVED/MEASURED metric, DATA-GATED threshold), Welford n=0 guard #10 (RESOLVED/MEASURED), threshold magic-constants #9 & #13 (RESOLVED-PARTIAL/DATA-GATED — de-magicked + boundary-tested, values unchanged).** **Milestone-2 DONE (2026-06-13): bench-first P2 perf subset + missing boundary tests cleared — spectrogram per-subcarrier FFT re-plan #20 (MEASURED-HOT, 1.40–1.84×, bit-identical); attention/tomography/Kalman #5/#6/#7 (MEASURED-NULL — benched, not hot, left as-is); field_model eigendecompose #8 (MEASUREMENT-ONLY, BLAS un-buildable on this Windows host, number deferred to a BLAS box, NOT fabricated); fft_operator tolerance #14, phase-align convergence-cap #16, csi-ratio epsilon #19 (RESOLVED, tests added).** **Milestone-3 DONE (2026-06-13): the lumped §7.4 row #21–45 P3 backlog cleared, and with it residual P3 items #2/#12/#17/#18 — 22 magic constants de-magicked into named EMPIRICAL-DEFAULT consts (each pinned == prior literal) + 6 boundary/characterization tests across 11 modules; ~4 doc-only; not-real findings (unreachable attractor_drift div0, non-existent gesture thresholds, proof-path features.rs) reported + skipped, no churn; no operating value changed; workspace 3,275/0, Python proof bit-exact `f8e76f21…`.** **§7.4 deferred backlog is now FULLY CLEARED across M0–M3 — nothing silently dropped.**

+> **Sibling-crate sweep extension (2026-06-14) — `wifi-densepose-geo` + `wifi-densepose-pointcloud`.** The ADR-154-class numerical-robustness sweep (non-finite-input-poisons-persistent-state + divide-by-zero / asin-domain / degenerate-geometry) was extended to two crates *outside* this ADR's signal scope. **Two real `geo` bugs FIXED, each fails-on-old-pinned:** `terrain.rs::parse_hgt` usize-underflow panic on empty/sub-2x2 SRTM data (`1.0/(side-1)` → panic in debug / inf `cell_size_deg` poisoning `ElevationGrid::get` in release — a truncated download / 404 HTML body reaches it; now `bail!`s when `side < 2`); `coord.rs::haversine` `asin(>1)→NaN` for near-antipodal points (`h` rounds to `1.0+4e-16`; clamped to `[0,1]`). The ±90° pole `cos(lat)=0` ENU singularity is pinned no-panic without changing the transform. **`pointcloud` is confirmed-robust (no manufactured finding):** its only persistent auto-accumulating state (`occupancy` EMA + vitals) is fed solely by the integer-rssi/`sqrt`/`atan2` parser (always finite) and is provably self-healing even under an adversarial NaN/inf `CsiFrame` (`motion_score=(NaN/100).min(1.0)→1.0`; breathing `→0→clamp(5,40)→5.0`) — pinned by `nonfinite_frame_does_not_poison_persistent_state` + degenerate-voxel-fusion no-panic tests. `geo` 9→15 lib / 8 integration; `pointcloud` 18→22; 0 failed; workspace green; Python proof bit-exact `f8e76f21…`. See CHANGELOG `[Unreleased] → Fixed`.
+
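The `haversine` clamp mentioned above can be sketched as follows. This is an illustrative version, not the `coord.rs` code: the function name `haversine_m`, its signature, and the Earth radius constant are assumptions; only the clamp of `h` to `[0, 1]` before `asin` comes from the text.

```rust
/// Mean Earth radius in metres (assumed constant for this sketch).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in metres between two lat/lon points in degrees.
fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (p1, p2) = (lat1.to_radians(), lat2.to_radians());
    let dphi = (lat2 - lat1).to_radians();
    let dlam = (lon2 - lon1).to_radians();
    let h = (dphi / 2.0).sin().powi(2) + p1.cos() * p2.cos() * (dlam / 2.0).sin().powi(2);
    // Rounding can push h slightly above 1.0 for near-antipodal points;
    // clamping keeps the asin argument inside its domain.
    let h = h.clamp(0.0, 1.0);
    2.0 * EARTH_RADIUS_M * h.sqrt().asin()
}

fn main() {
    // Antipodal points on the equator: half the circumference, and finite.
    let d = haversine_m(0.0, 0.0, 0.0, 180.0);
    assert!(d.is_finite());
    println!("antipodal distance: {d:.1} m");
}
```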
 ---

 ## 8. Consequences
@@ -265,3 +265,74 @@ Result at time of writing (all 0 failed):
  perform (B5).
 - Files kept under the 500-line guideline (`engine.rs` 462; behavioral tests
  moved to `tests/engine_behaviors.rs`).
+
+## Addendum — `homecore-api` follow-up security review (beyond-SOTA pass)
+
+A later network-facing review of `homecore-api` (the remote REST + WS attack
+surface) — independent of the ADR-154–159 sweep — found and fixed two real
+issues the original M7 pass (which focused on the WS auth bypass HC-WS-01, the
+reply-theater HC-WS-02, and the bin token provisioning HC-WS-08) did not catch.
+Both are LOW severity and reported at true severity.
+
+### HC-API-AUTH-01 — `GET /api/` was unauthenticated (FIXED)
+
+`rest::api_root` took no headers and unconditionally returned
+`200 {"message":"API running."}`, while every sibling route gates on
+`BearerAuth::from_headers`. HA's `APIStatusView` inherits `requires_auth = True`,
+so `/api/` must return **401** for a missing/wrong bearer. HA clients use the
+status route as a token-validation probe; a 200 told a bad-token client its
+token was valid and let an unauthenticated party confirm a live endpoint.
+LOW severity (the body is a static string; no entity/state data leaks).
+
+**Fix:** `api_root(headers, State)` now validates the bearer like `get_config`.
+**Pinned by** (fail-on-old, `tests/server_bin_auth.rs`):
+`api_root_rejects_missing_bearer`, `api_root_rejects_wrong_bearer` (both 200→401),
+guarded by `api_root_accepts_correct_bearer` (still 200 with a valid token).
+
+### HC-WS-LAG-01 — `subscribe_events` killed the stream on a broadcast lag (FIXED)
+
+The per-subscription task matched `Err(_) => break` on both broadcast
+`recv()` arms. `RecvError::Lagged(n)` (a slow consumer falling more than
+`EVENT_CHANNEL_CAPACITY` = 4,096 events behind) is **recoverable**: the bus
+doc says "Lagged receivers must re-sync" and HA keeps the subscription alive
+across a lag. The old code treated the first lag as fatal, so after an event
+burst the client's stream went permanently silent with no error frame — a
+self-inflicted event-delivery DoS under load.
+
+**Fix:** `Lagged(_) => continue` (skip the dropped window, re-sync),
+`Closed => break`, on both the system and domain arms of the `select!`.
+**Pinned by** `subscription_survives_broadcast_lag` (`tests/ws_handshake.rs`):
+subscribes to a filtered event type, floods 6,000 unrelated events past the
+4,096 capacity to force a `Lagged`, then asserts a subsequent subscribed event
+is still delivered (old code: 5s-timeout panic).
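
The control flow of the fix can be sketched without an async runtime. In this illustration the local `RecvError` enum is a stand-in for `tokio::sync::broadcast::error::RecvError`, and `drain` is a hypothetical helper that replays a scripted sequence of receive results; the real task awaits `recv()` inside a `select!`.

```rust
/// Stand-in for `tokio::sync::broadcast::error::RecvError`, so the sketch
/// builds with the standard library alone.
#[derive(Debug)]
enum RecvError {
    Lagged(u64),
    Closed,
}

/// Processes receive results the way the fixed task does: deliver events,
/// skip over a lag, stop only when the sender is closed.
fn drain(script: Vec<Result<&'static str, RecvError>>) -> (Vec<&'static str>, u64) {
    let mut delivered = Vec::new();
    let mut skipped = 0u64;
    for item in script {
        match item {
            Ok(evt) => delivered.push(evt),
            // Recoverable: note the dropped window and keep receiving.
            Err(RecvError::Lagged(n)) => {
                skipped += n;
                continue;
            }
            // Terminal: the bus is gone, so the subscription ends.
            Err(RecvError::Closed) => break,
        }
    }
    (delivered, skipped)
}

fn main() {
    let (delivered, skipped) = drain(vec![
        Ok("before_burst"),
        Err(RecvError::Lagged(1904)),
        Ok("after_burst"),
        Err(RecvError::Closed),
        Ok("never_seen"),
    ]);
    assert_eq!(delivered, vec!["before_burst", "after_burst"]);
    assert_eq!(skipped, 1904);
    println!("delivered {delivered:?}, skipped {skipped}");
}
```

With the old `Err(_) => break` arm, the same script would have stopped at the lag and `after_burst` would never have been delivered.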
+
+### Dimensions confirmed clean (with evidence)
+
+- **AuthN/AuthZ** — all 7 other REST handlers gate on `BearerAuth::from_headers`
+  → `LongLivedTokenStore::is_valid` before any work; the WS handshake validates
+  the `auth` token against the same store before the command loop, and
+  privileged commands are unreachable pre-`auth_ok`. Token compare is
+  `HashSet::contains` (content-independent timing — not the byte-`==` oracle of
+  ADR-157 §B4), so no timing-oracle finding. No route skips the gate; no
+  result-ignored check; no default/empty token accepted.
+- **Path traversal** — no route maps user input to a filesystem path (state is an
+  in-memory `DashMap`); `:entity_id` passes through `EntityId::parse`, a strict
+  `[a-z0-9_]+\.[a-z0-9_]+` ASCII allowlist that rejects `..`, `/`, `\`, and
+  absolute paths. No traversal surface.
+- **Injection** — no SQL, no shell/subprocess, no `format!`-into-response;
+  service/state bodies are typed `serde_json::Value` handed to the in-process
+  registry (HA-equivalent).
+- **Info-leak** — `ApiError` maps to fixed status + a typed `{message}`;
+  `ServiceError::HandlerFailed(String)` is integration-controlled (HA surfaces
+  the handler error too), never framework internals/paths/stack-traces — no
+  ADR-080-class leak.
+- **CORS** — explicit allowlist with `allow_credentials(false)` (HC-05),
+  not `permissive()`.
+- **De-magic** — no bare security-relevant literals in the crate worth
+  extracting (`EVENT_CHANNEL_CAPACITY` is already named in `homecore`; CORS
+  dev-default ports are documented).
+
+**Tests:** `homecore-api --no-default-features` **25 → 29** (+2 api-root auth, +1
+api-root accept-guard, +1 WS lag-survival), 0 failed. Workspace green.
+Python deterministic proof unchanged (homecore-api is off the signal proof
+path).
@@ -102,19 +102,43 @@ pub struct WitnessEvent {
    pub this_hash: WitnessHash,
 }

+/// Domain-separation tag prefixing every witness canonical message.
+///
+/// This is the *domain tag* half of the "domain-tag + length-prefix"
+/// rule for any hashed/signed message whose fields are
+/// operator-influenceable. The witness chain already length-prefixes
+/// `kind` and `payload` (preventing intra-protocol concatenation
+/// forgery); the tag adds cross-protocol separation so a SHA-256
+/// preimage / Ed25519 message produced here can never be re-interpreted
+/// as a message from another signing context that shares key
+/// infrastructure — notably ADR-116's *manifest* `binary_signature`
+/// (Ed25519 over `binary_sha256`), which ADR-262 P2 reuses this exact
+/// chain for. A signature is only ever valid for the one domain whose
+/// tag it commits to.
+///
+/// The trailing NUL terminates the version string so a future
+/// migration (Blake3, extra fields, Merkle tier) bumps the tag instead
+/// of silently colliding with v1 bundles.
+pub const WITNESS_DOMAIN_TAG: &[u8] = b"cog-ha-matter/witness-event/v1\x00";
+
 /// Compute the canonical-bytes form an event is hashed over.
 ///
-/// The format is intentionally simple and length-prefixed so a
-/// future migration can be staged with a `version` byte in front
-/// without ambiguity:
+/// The format is domain-tagged and length-prefixed:
 ///
 /// ```text
-///   prev_hash[32] | seq:u64-be | ts:u64-be | kind_len:u32-be | kind | payload_len:u32-be | payload
+///   DOMAIN_TAG | prev_hash[32] | seq:u64-be | ts:u64-be
+///              | kind_len:u32-be | kind | payload_len:u32-be | payload
 /// ```
 ///
-/// Length-prefixing prevents the classic "concatenation forgery"
-/// attack where `"abc" + "def"` and `"ab" + "cdef"` would hash the
-/// same.
+/// * The leading [`WITNESS_DOMAIN_TAG`] gives cross-protocol
+///   separation: bytes signed/hashed here cannot be replayed as a
+///   message for another Ed25519 context in the same trust chain
+///   (e.g. the manifest `binary_signature`). It also carries a format
+///   version for staged migrations.
+/// * Length-prefixing `kind` and `payload` prevents the classic
+///   "concatenation forgery" where `"abc" + "def"` and `"ab" + "cdef"`
+///   would hash the same. The fixed-width `prev_hash`/`seq`/`ts`
+///   fields are self-delimiting.
 pub fn canonical_bytes(
    prev_hash: WitnessHash,
    seq: u64,
@@ -123,7 +147,10 @@ pub fn canonical_bytes(
    payload: &[u8],
 ) -> Vec<u8> {
    let kind_bytes = kind.as_bytes();
-    let mut out = Vec::with_capacity(32 + 8 + 8 + 4 + kind_bytes.len() + 4 + payload.len());
+    let mut out = Vec::with_capacity(
+        WITNESS_DOMAIN_TAG.len() + 32 + 8 + 8 + 4 + kind_bytes.len() + 4 + payload.len(),
+    );
+    out.extend_from_slice(WITNESS_DOMAIN_TAG);
    out.extend_from_slice(&prev_hash.0);
    out.extend_from_slice(&seq.to_be_bytes());
    out.extend_from_slice(&timestamp_unix_s.to_be_bytes());
@@ -466,11 +493,51 @@ mod tests {
    }

    #[test]
-    fn canonical_bytes_starts_with_prev_hash() {
+    fn canonical_bytes_starts_with_domain_tag_then_prev_hash() {
        // Locks the on-wire format. A future migration that flips
-        // field order must bump a version byte and update this test.
+        // field order must bump the domain tag and update this test.
        let bytes = canonical_bytes(WitnessHash([7u8; 32]), 1, 2, "k", b"p");
-        assert_eq!(&bytes[..32], &[7u8; 32]);
+        let tag = WITNESS_DOMAIN_TAG.len();
+        assert_eq!(&bytes[..tag], WITNESS_DOMAIN_TAG);
+        assert_eq!(&bytes[tag..tag + 32], &[7u8; 32]);
+    }
+
+    #[test]
+    fn canonical_bytes_is_domain_separated() {
+        // Cross-protocol separation: the witness preimage must begin
+        // with the domain tag so its SHA-256 / Ed25519 message can
+        // never be reinterpreted as a message from another signing
+        // context that shares key infrastructure (e.g. the manifest
+        // `binary_signature` over `binary_sha256`). Fails on the old
+        // un-tagged encoding, which began directly with `prev_hash`.
+        let bytes = canonical_bytes(WitnessHash::GENESIS, 0, 0, "k", b"p");
+        assert!(
+            bytes.starts_with(WITNESS_DOMAIN_TAG),
+            "canonical message is not domain-separated"
+        );
+        // The tag is versioned and NUL-terminated.
+        assert!(WITNESS_DOMAIN_TAG.ends_with(b"\x00"));
+        assert!(WITNESS_DOMAIN_TAG.windows(2).any(|w| w == b"v1"));
+    }
+
+    #[test]
+    fn witness_preimage_cannot_collide_with_a_bare_manifest_digest() {
+        // The manifest `binary_signature` signs a bare 64-byte
+        // SHA-256 hex string. A witness preimage must never *equal*
+        // such a string, even if an operator crafted kind/payload to
+        // try: the domain tag (31 bytes) + fixed 48-byte prefix make
+        // the witness message structurally longer and tag-distinct.
+        // Fails on the old encoding only if it could ever produce a
+        // 64-byte all-hex message; the tag makes the impossibility
+        // explicit and regression-guarded.
+        let manifest_digest_msg = "a".repeat(64); // 64 ASCII hex bytes
+        let witness = canonical_bytes(WitnessHash::GENESIS, 0, 0, "", b"");
+        assert_ne!(witness.as_slice(), manifest_digest_msg.as_bytes());
+        assert!(
+            witness.len() > manifest_digest_msg.len(),
+            "domain tag must make witness preimage structurally distinct"
+        );
+        assert!(!witness.starts_with(b"aaaa"));
    }

    #[test]
@@ -36,7 +36,7 @@
 //! key store (separate concern). Tests use a fixed-bytes seed for
 //! determinism — never check in real Seed keys here.

-use ed25519_dalek::{Signature, Signer, SigningKey, Verifier, VerifyingKey};
+use ed25519_dalek::{Signature, Signer, SigningKey, VerifyingKey};

 use crate::witness::{canonical_bytes, WitnessEvent};

@@ -58,6 +58,16 @@ pub fn sign_event(event: &WitnessEvent, key: &SigningKey) -> Signature {
 /// Verify an Ed25519 signature against a witness event using the
 /// Seed's public key. `Ok(())` iff the signature is valid for the
 /// event's canonical bytes under this key.
+///
+/// Uses `verify_strict` (not the permissive `Verifier::verify`) on
+/// purpose: for a tamper-evident *audit* chain the signature is the
+/// attestation, so weak inputs must be rejected. `verify_strict`
+/// additionally rejects small-order public keys and small-order
+/// signature `R` components, checks that go beyond baseline RFC 8032
+/// verification and that an auditor relies on when comparing or
+/// deduplicating signed witness records. The public key is
+/// caller-pinned (the Seed's known verifying key), never parsed from
+/// the event, so a forged event carrying its own key cannot self-verify.
 pub fn verify_signature(
    event: &WitnessEvent,
    signature: &Signature,
@@ -71,7 +81,7 @@ pub fn verify_signature(
        &event.payload,
    );
    public_key
-        .verify(&bytes, signature)
+        .verify_strict(&bytes, signature)
        .map_err(|_| SignatureVerifyError::Invalid)
 }

@@ -140,6 +150,58 @@ mod tests {
        verify_signature(&event, &sig, &public).expect("clean signature verifies");
    }

+    #[test]
+    fn signature_commits_to_domain_tag_not_bare_fields() {
+        // The signature is over the domain-tagged canonical bytes. A
+        // signature produced over the *un-tagged* concatenation of the
+        // same fields must NOT verify — proving cross-protocol
+        // separation reaches the signature layer, not just the hash.
+        // Fails on the old encoding where the signed message began
+        // directly with `prev_hash` (no tag).
+        use ed25519_dalek::Signer;
+        let key = fixed_key();
+        let public = key.verifying_key();
+        let event = fresh_event();
+
+        // Hand-build the OLD (un-tagged) preimage and sign it.
+        let mut untagged = Vec::new();
+        untagged.extend_from_slice(&event.prev_hash.0);
+        untagged.extend_from_slice(&event.seq.to_be_bytes());
+        untagged.extend_from_slice(&event.timestamp_unix_s.to_be_bytes());
+        untagged.extend_from_slice(&(event.kind.len() as u32).to_be_bytes());
+        untagged.extend_from_slice(event.kind.as_bytes());
+        untagged.extend_from_slice(&(event.payload.len() as u32).to_be_bytes());
+        untagged.extend_from_slice(&event.payload);
+        let old_sig = key.sign(&untagged);
+
+        // The current verifier (which uses the domain-tagged message)
+        // must reject a signature made over the un-tagged bytes.
+        let err = verify_signature(&event, &old_sig, &public).unwrap_err();
+        assert_eq!(err, SignatureVerifyError::Invalid);
+
+        // Sanity: the proper signature still verifies.
+        let good = sign_event(&event, &key);
+        verify_signature(&event, &good, &public).expect("tagged signature verifies");
+    }
+
+    #[test]
+    fn verify_uses_strict_path_and_pins_caller_key() {
+        // Regression guard: verification must run through the strict
+        // path against a CALLER-supplied key. A wrong key fails; the
+        // event never carries its own verifying key, so a forged event
+        // cannot self-attest. (verify_strict additionally rejects
+        // non-canonical / small-order encodings.)
+        let key = fixed_key();
+        let wrong = SigningKey::from_bytes(b"another-wrong-key-another-wrong-");
+        let event = fresh_event();
+        let sig = sign_event(&event, &key);
+        verify_signature(&event, &sig, &key.verifying_key()).expect("right key verifies");
+        assert_eq!(
+            verify_signature(&event, &sig, &wrong.verifying_key()).unwrap_err(),
+            SignatureVerifyError::Invalid
+        );
+    }
+
    #[test]
    fn verify_rejects_signature_under_wrong_key() {
        let key = fixed_key();
@@ -12,8 +12,20 @@ use crate::state::SharedState;
 #[derive(Serialize)]
 pub struct ApiRunning { message: &'static str }

-pub async fn api_root() -> Json<ApiRunning> {
-    Json(ApiRunning { message: "API running." })
+/// `GET /api/` — the HA `APIStatusView` ("API running." ping).
+///
+/// Security (HC-API-AUTH-01): HA's `APIStatusView` inherits
+/// `requires_auth = True` from `HomeAssistantView`, so an unauthenticated
+/// (or wrong-token) request to `/api/` returns **401**, not 200. HA
+/// clients (and the companion app) rely on this status route as a
+/// *token-validation probe* — a 200 here would tell a client a bad token
+/// is good, and would let an unauthenticated party confirm a live
+/// HOMECORE-API endpoint. The P2 handler skipped the bearer gate that
+/// every sibling route applies; this restores wire-compat by validating
+/// the bearer like `get_config`/`get_states` before replying.
+pub async fn api_root(headers: HeaderMap, State(s): State<SharedState>) -> ApiResult<Json<ApiRunning>> {
+    let _ = BearerAuth::from_headers(&headers, s.tokens()).await?;
+    Ok(Json(ApiRunning { message: "API running." }))
 }

 #[derive(Serialize)]
@@ -298,7 +298,17 @@ impl Connection {
                                    }
                                }
                                Ok(_) => {}
-                                Err(_) => break,
+                                // A slow consumer that falls >4,096 events behind
+                                // gets `Lagged(n)`, which is RECOVERABLE: the bus
+                                // doc (`bus.rs` §"Lagged receivers must re-sync")
+                                // and HA's WS contract both keep the subscription
+                                // alive across a lag. The pre-fix `Err(_) => break`
+                                // treated `Lagged` as fatal, silently killing the
+                                // client's event stream on a burst (HC-WS-LAG-01).
+                                // Skip the dropped window and continue; only a
+                                // `Closed` sender ends the task.
+                                Err(broadcast::error::RecvError::Lagged(_)) => continue,
+                                Err(broadcast::error::RecvError::Closed) => break,
                            },
                            evt = domain_rx.recv() => match evt {
                                Ok(de) => {
@@ -316,7 +326,12 @@ impl Connection {
                                        if tx_clone.send(payload.to_string()).is_err() { break; }
                                    }
                                }
-                                Err(_) => break,
+                                // Same recoverable-lag handling as the system arm
+                                // above (HC-WS-LAG-01): a lagged domain-event
+                                // receiver re-syncs and continues; only `Closed`
+                                // terminates the subscription.
+                                Err(broadcast::error::RecvError::Lagged(_)) => continue,
+                                Err(broadcast::error::RecvError::Closed) => break,
                            }
                        }
                    }
@@ -75,3 +75,72 @@ async fn from_env_path_enforces_whitelist() {
    assert!(!store.is_valid("not_in_whitelist").await);
    assert!(!store.is_dev_mode().await, "from_env must NOT be dev mode");
 }
+
+// ─── HC-API-AUTH-01: `GET /api/` must be auth-gated like every sibling ───
+//
+// HA's `APIStatusView` inherits `requires_auth = True`, so `/api/` returns
+// 401 for a missing/wrong bearer and 200 only for a valid one. The pre-fix
+// `api_root` took no headers and unconditionally returned 200 — these two
+// tests FAIL on that code.
+
+#[tokio::test]
+async fn api_root_rejects_missing_bearer() {
+    let app = router(provisioned_state("the_real_token").await);
+    let resp = app
+        .oneshot(
+            Request::builder()
+                .uri("/api/")
+                .body(Body::empty())
+                .unwrap(),
+        )
+        .await
+        .unwrap();
+    assert_eq!(
+        resp.status(),
+        StatusCode::UNAUTHORIZED,
+        "GET /api/ with NO bearer must be 401 (HC-API-AUTH-01) — HA's \
+         APIStatusView requires_auth=True; a 200 here lets an \
+         unauthenticated party confirm a live endpoint and tells a \
+         token-validation probe a bad token is good"
+    );
+}
+
+#[tokio::test]
+async fn api_root_rejects_wrong_bearer() {
+    let app = router(provisioned_state("the_real_token").await);
+    let resp = app
+        .oneshot(
+            Request::builder()
+                .uri("/api/")
+                .header("Authorization", "Bearer the_wrong_token")
+                .body(Body::empty())
+                .unwrap(),
+        )
+        .await
+        .unwrap();
+    assert_eq!(
+        resp.status(),
+        StatusCode::UNAUTHORIZED,
+        "GET /api/ with a WRONG bearer must be 401 (HC-API-AUTH-01)"
+    );
+}
+
+#[tokio::test]
+async fn api_root_accepts_correct_bearer() {
+    let app = router(provisioned_state("the_real_token").await);
+    let resp = app
+        .oneshot(
+            Request::builder()
+                .uri("/api/")
+                .header("Authorization", "Bearer the_real_token")
+                .body(Body::empty())
+                .unwrap(),
+        )
+        .await
+        .unwrap();
+    assert_eq!(
+        resp.status(),
+        StatusCode::OK,
+        "GET /api/ with the correct bearer must still return 200 (API running.)"
+    );
+}
@@ -166,3 +166,100 @@ async fn ping_pong_reply_is_received() {
    assert_eq!(reply["type"], "pong");
    assert_eq!(reply["id"], 7);
 }
+
+/// Variant of [`spawn_server_with_token`] that also returns a `HomeCore`
+/// handle (cheap `Arc` clone) so the test can fire events into the *same*
+/// bus the served subscription reads from.
+async fn spawn_server_returning_homecore(valid_token: &str) -> (SocketAddr, HomeCore) {
+    let hc = HomeCore::new();
+    let tokens = LongLivedTokenStore::empty();
+    tokens.register(valid_token).await;
+    let state = SharedState::with_tokens(hc.clone(), "Test", "test-version", tokens);
+    let app = router(state);
+
+    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
+    let addr = listener.local_addr().unwrap();
+    tokio::spawn(async move {
+        axum::serve(listener, app).await.unwrap();
+    });
+    (addr, hc)
+}
+
+#[tokio::test]
+async fn subscription_survives_broadcast_lag() {
+    // HC-WS-LAG-01: the per-subscription event task must treat a broadcast
+    // `Lagged(n)` as RECOVERABLE (re-sync + continue), matching the bus
+    // contract ("Lagged receivers must re-sync") and HA's WS semantics.
+    //
+    // The pre-fix `Err(_) => break` killed the whole event-stream task on
+    // the first lag, so after a >4,096-event burst the client's stream
+    // went permanently silent. This test fires far more than the 4,096
+    // channel capacity to force a `Lagged`, then fires ONE more event and
+    // asserts the subscription still delivers it. FAILS (5s timeout) on
+    // the old code because the task is already dead.
+    use homecore::{Context, DomainEvent};
+
+    let (addr, hc) = spawn_server_returning_homecore("good_token_abc").await;
+    let url = format!("ws://{addr}/api/websocket");
+    let (mut ws, _resp) = connect_async(&url).await.unwrap();
+
+    let _ = next_json(&mut ws).await; // auth_required
+    ws.send(Message::Text(
+        serde_json::json!({"type":"auth","access_token":"good_token_abc"}).to_string(),
+    ))
+    .await
+    .unwrap();
+    let auth = next_json(&mut ws).await;
+    assert_eq!(auth["type"], "auth_ok");
+
+    // Subscribe to a specific domain event type so unrelated traffic is
+    // filtered out and we can deterministically match the post-lag event.
+    ws.send(Message::Text(
+        serde_json::json!({"id": 1, "type": "subscribe_events", "event_type": "lag_probe"})
+            .to_string(),
+    ))
+    .await
+    .unwrap();
+    let ack = next_json(&mut ws).await; // result ok for the subscribe
+    assert_eq!(ack["type"], "result");
+    assert_eq!(ack["success"], true);
+
+    // Flood the bus far past EVENT_CHANNEL_CAPACITY (4,096) with events the
+    // subscription FILTERS OUT (different event_type). The loop fires
+    // synchronously and never yields, so the server-side subscription task
+    // cannot drain its broadcast receiver; the receiver falls behind and
+    // its NEXT `recv()` yields `Lagged`, guaranteeing the overflow.
+    for i in 0..6000u32 {
+        hc.bus().fire_domain(DomainEvent::new(
+            "noise",
+            serde_json::json!({ "i": i }),
+            Context::new(),
+        ));
+    }
+
+    // Now fire the event the client IS subscribed to. On the fixed code the
+    // task recovered from `Lagged` and continues, so this is delivered. On
+    // the old code the task broke on `Lagged` and this never arrives.
+    hc.bus().fire_domain(DomainEvent::new(
+        "lag_probe",
+        serde_json::json!({ "marker": "post-lag" }),
+        Context::new(),
+    ));
+
+    // Drain frames until we see our post-lag event (ignoring any noise the
+    // filter let slip before the lag), bounded by a timeout.
+    let got = tokio::time::timeout(std::time::Duration::from_secs(5), async {
+        loop {
+            let v = next_json(&mut ws).await;
+            if v["type"] == "event" && v["event"]["event_type"] == "lag_probe" {
+                return v;
+            }
+        }
+    })
+    .await
+    .expect(
+        "subscription went silent after a broadcast lag — Lagged was treated \
+         as fatal (HC-WS-LAG-01)",
+    );
+    assert_eq!(got["event"]["data"]["marker"], "post-lag");
+}
@@ -29,8 +29,10 @@ serde = { version = "1", features = ["derive"] }
 serde_yaml = "0.9"
 serde_json = "1"

-# MiniJinja — HA-compatible Jinja2 template engine in pure Rust (ADR-129 §2.1)
-minijinja = { version = "2", features = ["json", "loader"] }
+# MiniJinja — HA-compatible Jinja2 template engine in pure Rust (ADR-129 §2.1).
+# `fuel` bounds instruction count so a malicious `template:` condition cannot
+# spin the engine with a nested-loop / huge-repeat DoS (HC-SEC-01).
+minijinja = { version = "2", features = ["json", "loader", "fuel"] }

 # Error handling
 thiserror = "1"
@@ -70,6 +70,32 @@ impl ExecutionContext {
    }
 }

+/// Upper bound for a `delay` / `wait_for_trigger` timeout, in seconds
+/// (~100 years). Caps absurd values so `Duration::from_secs_f64` cannot
+/// overflow-panic on e.g. `seconds: 1e308`, while still allowing any
+/// realistic automation delay (HC-SEC-02).
+const MAX_DELAY_SECS: f64 = 3.15e9;
+
+/// Convert a user-supplied seconds value into a `Duration` without
+/// panicking (HC-SEC-02).
+///
+/// `Duration::from_secs_f64` **panics** on negative, NaN, infinite, or
+/// overflowing inputs. Those values are all reachable from a crafted
+/// automation YAML (`delay: {seconds: -1}`, `.nan`, `.inf`, `1e308`), so a
+/// single hostile config would crash the running automation task. We
+/// instead saturate to a safe range — matching Home Assistant's lenient
+/// treatment of a non-positive delay as "no delay":
+///
+/// - non-finite (NaN / ±inf) → `0`
+/// - negative → `0`
+/// - above [`MAX_DELAY_SECS`] → clamped to the cap
+fn safe_duration_from_secs(seconds: f64) -> Duration {
+    if !seconds.is_finite() || seconds <= 0.0 {
+        return Duration::ZERO;
+    }
+    Duration::from_secs_f64(seconds.min(MAX_DELAY_SECS))
+}
+
 /// Action configuration. Deserialized from YAML `action:` blocks.
 #[derive(Clone, Debug, Serialize, Deserialize)]
 #[serde(tag = "action", rename_all = "snake_case")]
@@ -154,7 +180,10 @@ impl Action {
                    Ok(result)
                }
                Action::Delay { seconds } => {
-                    let dur = Duration::from_secs_f64(*seconds);
+                    // `safe_duration_from_secs` guards against negative /
+                    // NaN / infinite / overflowing values that would
+                    // otherwise panic `Duration::from_secs_f64` (HC-SEC-02).
+                    let dur = safe_duration_from_secs(*seconds);
                    sleep(dur).await;
                    Ok(serde_json::Value::Null)
                }
@@ -172,7 +201,8 @@ impl Action {
                    // P1 stub — just sleeps for the timeout duration if specified.
                    // Full trigger subscription lands in P2.
                    if let Some(secs) = timeout_seconds {
-                        sleep(Duration::from_secs_f64(*secs)).await;
+                        // Same non-panicking guard as `Delay` (HC-SEC-02).
+                        sleep(safe_duration_from_secs(*secs)).await;
                    }
                    Ok(serde_json::Value::Null)
                }
@@ -243,6 +273,68 @@ mod tests {
        assert!(result.is_null());
    }

+    // ── HC-SEC-02: a crafted delay must not panic the run task ─────────
+    //
+    // `Duration::from_secs_f64` panics on negative / NaN / infinite /
+    // overflowing inputs, all reachable from a YAML `delay:` value. On the
+    // pre-fix code each of these aborts the spawned automation task with a
+    // panic; the guard saturates to a safe Duration instead. These tests
+    // fail on the pre-fix code (a panic is a test failure).
+    #[tokio::test]
+    async fn delay_negative_seconds_does_not_panic() {
+        let hc = HomeCore::new();
+        let mut ctx = ExecutionContext::new(hc, "auto");
+        let result = Action::Delay { seconds: -1.0 }.execute(&mut ctx).await;
+        assert!(result.is_ok(), "negative delay must be treated as 0, not panic");
+    }
+
+    #[tokio::test]
+    async fn delay_nan_seconds_does_not_panic() {
+        let hc = HomeCore::new();
+        let mut ctx = ExecutionContext::new(hc, "auto");
+        let result = Action::Delay { seconds: f64::NAN }.execute(&mut ctx).await;
+        assert!(result.is_ok(), "NaN delay must be treated as 0, not panic");
+    }
+
+    #[tokio::test]
+    async fn delay_infinite_seconds_does_not_panic() {
+        let hc = HomeCore::new();
+        let mut ctx = ExecutionContext::new(hc, "auto");
+        let result = Action::Delay { seconds: f64::INFINITY }.execute(&mut ctx).await;
+        assert!(result.is_ok(), "infinite delay must saturate to 0, not panic");
+    }
+
+    // Note: the overflow case (1e300) is covered by the synchronous
+    // `safe_duration_saturates_hostile_values` unit test below — executing
+    // `Action::Delay { seconds: 1e300 }` would genuinely sleep for the
+    // clamped (~100-year) duration, so we assert the conversion directly
+    // rather than through `execute`.
+
+    #[tokio::test]
+    async fn wait_for_trigger_negative_timeout_does_not_panic() {
+        let hc = HomeCore::new();
+        let mut ctx = ExecutionContext::new(hc, "auto");
+        let result = Action::WaitForTrigger { timeout_seconds: Some(-5.0) }
+            .execute(&mut ctx)
+            .await;
+        assert!(result.is_ok(), "negative wait timeout must not panic");
+    }
+
+    #[test]
+    fn safe_duration_saturates_hostile_values() {
+        assert_eq!(safe_duration_from_secs(-1.0), Duration::ZERO);
+        assert_eq!(safe_duration_from_secs(f64::NAN), Duration::ZERO);
+        assert_eq!(safe_duration_from_secs(f64::INFINITY), Duration::ZERO);
+        assert_eq!(safe_duration_from_secs(f64::NEG_INFINITY), Duration::ZERO);
+        // legitimate value preserved
+        assert_eq!(safe_duration_from_secs(2.5), Duration::from_secs_f64(2.5));
+        // huge value clamped to the cap, not overflow-panicked
+        assert_eq!(
+            safe_duration_from_secs(1e300),
+            Duration::from_secs_f64(MAX_DELAY_SECS)
+        );
+    }
+
    #[tokio::test]
    async fn service_call_unregistered_returns_error() {
        let hc = HomeCore::new();
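
The saturating conversion added in the hunk above can also be written with the fallible `Duration::try_from_secs_f64` (stable since Rust 1.66) as a backstop. The following is a minimal standalone sketch, not the crate's code: the names `CAP_SECS` and `saturating_duration` are hypothetical, and the cap value simply mirrors the ~100-year bound described above.

```rust
use std::time::Duration;

/// Hypothetical cap mirroring the ~100-year bound (assumption for this sketch).
const CAP_SECS: f64 = 3.15e9;

/// Convert untrusted seconds into a `Duration` without panicking.
fn saturating_duration(seconds: f64) -> Duration {
    // NaN fails every ordering comparison, so check finiteness explicitly.
    if !seconds.is_finite() || seconds <= 0.0 {
        return Duration::ZERO;
    }
    // `try_from_secs_f64` returns Err where `from_secs_f64` would panic.
    // After the clamp it is expected to succeed; the fallback is a backstop.
    Duration::try_from_secs_f64(seconds.min(CAP_SECS)).unwrap_or(Duration::ZERO)
}

fn main() {
    for s in [-1.0, f64::NAN, f64::INFINITY, 2.5, 1e300] {
        println!("{s} -> {:?}", saturating_duration(s));
    }
}
```

Keeping the explicit clamp in front of the fallible call means a huge but finite value still sleeps for the cap rather than silently becoming zero.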
@@ -13,6 +13,26 @@ use homecore::{EntityId, StateMachine};

 use crate::error::AutomationError;

+/// Instruction budget for a single template render (HC-SEC-01).
+///
+/// Templates come from user automation config; without a bound a single
+/// `template:` condition like
+/// `{% for i in range(10000) %}{% for j in range(10000) %}x{% endfor %}{% endfor %}`
+/// renders 10^8 characters (~100 MB) and pins a CPU for tens of seconds:
+/// a memory/CPU denial-of-service (the bfld-class "unbounded expansion").
+/// MiniJinja's `fuel` feature charges ~1 unit per VM instruction, and
+/// every loop iteration executes at least one instruction, so the budget
+/// caps total work however the loops are nested. 1,000,000 instructions is
+/// far more than any legitimate HA template needs (a typical condition is
+/// a few dozen) while killing the attack in well under a second.
+const TEMPLATE_FUEL: u64 = 1_000_000;
+
+/// Hard cap on the source length of a template (HC-SEC-01, defense in
+/// depth). A legitimate HA `value_template` is a one-liner; anything past
+/// 64 KiB is rejected before compilation so a pathological source string
+/// can neither be compiled nor emitted verbatim.
+const MAX_TEMPLATE_SOURCE_BYTES: usize = 64 * 1024;
+
 /// MiniJinja environment pre-loaded with HA-compatible globals.
 ///
 /// Constructed once per `AutomationEngine` and shared via `Arc`. The
@@ -27,6 +47,10 @@ impl TemplateEnvironment {
    pub fn new(states: Arc<StateMachine>) -> Self {
        let mut env = Environment::new();

+        // Bound per-render work so a hostile `template:` condition cannot
+        // DoS the engine via nested loops / huge repeats (HC-SEC-01).
+        env.set_fuel(Some(TEMPLATE_FUEL));
+
        // --- states(entity_id) ---
        // Returns the current state string of an entity, or "unavailable".
        let states_sm = Arc::clone(&states);
@@ -88,7 +112,21 @@ impl TemplateEnvironment {
    }

    /// Render a template string and return the string output.
+    ///
+    /// Renders are bounded by an instruction budget ([`TEMPLATE_FUEL`]) and
+    /// a source-length cap ([`MAX_TEMPLATE_SOURCE_BYTES`]); a template that
+    /// exceeds either bound returns an [`AutomationError::TemplateRender`]
+    /// error rather than running unbounded (HC-SEC-01).
    pub fn render(&self, template_str: &str) -> Result<String, AutomationError> {
+        // Reject pathologically large sources before compilation (defense
+        // in depth — fuel already bounds runtime work).
+        if template_str.len() > MAX_TEMPLATE_SOURCE_BYTES {
+            return Err(AutomationError::TemplateRender(format!(
+                "template source too large: {} bytes (max {})",
+                template_str.len(),
+                MAX_TEMPLATE_SOURCE_BYTES
+            )));
+        }
        // Wrap bare expressions like `{{ states('light.kitchen') }}`
        // in a minimal template wrapper.
        let tmpl = self
@@ -191,4 +229,68 @@ mod tests {
        assert!(!env.render_bool("0").unwrap());
        assert!(!env.render_bool("off").unwrap());
    }
+
+    // ── HC-SEC-01: template DoS is bounded by fuel ─────────────────────
+    //
+    // A `template:` condition is user config. Before the fuel bound a
+    // nested-loop template rendered a ~100 MB string over ~11 s (proven
+    // empirically). With fuel enabled it must fail FAST with an error
+    // instead of expanding unboundedly. On the pre-fix code (no `fuel`
+    // feature / `set_fuel`) this render succeeds and burns CPU+RAM, so
+    // this test fails there (it returns `Ok` and exceeds the time bound).
+    #[test]
+    fn nested_loop_template_is_bounded_not_unbounded_dos() {
+        use std::time::Instant;
+        let sm = Arc::new(StateMachine::new());
+        let env = TemplateEnvironment::new(sm);
+        // 5000 * 5000 = 25M iterations on the old engine (~100 MB, ~11 s).
+        let malicious =
+            "{% for i in range(5000) %}{% for j in range(5000) %}xxxx{% endfor %}{% endfor %}";
+        let start = Instant::now();
+        let result = env.render(malicious);
+        let elapsed = start.elapsed();
+        assert!(
+            result.is_err(),
+            "malicious nested-loop template must be rejected (ran out of fuel), got Ok"
+        );
+        assert!(
+            elapsed.as_secs() < 3,
+            "bounded render must fail fast; took {elapsed:?} (unbounded DoS on old engine)"
+        );
+    }
+
+    // ── HC-SEC-01: a single huge repeat is also bounded ────────────────
+    #[test]
+    fn single_huge_repeat_template_is_bounded() {
+        let sm = Arc::new(StateMachine::new());
+        let env = TemplateEnvironment::new(sm);
+        // range() caps at 10k per call, but multiplied bodies still need a
+        // bound; drive enough instructions to exhaust fuel via deep nesting.
+        let malicious = "{% for a in range(9999) %}{% for b in range(9999) %}\
+            {% for c in range(9999) %}z{% endfor %}{% endfor %}{% endfor %}";
+        let result = env.render(malicious);
+        assert!(result.is_err(), "deeply nested loops must exhaust fuel and error");
+    }
+
+    // ── HC-SEC-01: oversized template source is rejected pre-compile ───
+    #[test]
+    fn oversized_template_source_is_rejected() {
+        let sm = Arc::new(StateMachine::new());
+        let env = TemplateEnvironment::new(sm);
+        // 128 KiB of literal text — exceeds MAX_TEMPLATE_SOURCE_BYTES.
+        let big = "x".repeat(128 * 1024);
+        let result = env.render(&big);
+        assert!(result.is_err(), "oversized template source must be rejected");
+    }
+
+    // ── A legitimate small template still renders fine within budget ───
+    #[test]
+    fn legitimate_template_still_renders_within_fuel() {
+        let sm = sm_with("light.kitchen", "on", serde_json::json!({}));
+        let env = TemplateEnvironment::new(sm);
+        // A normal HA condition with a modest loop — well under budget.
+        let ok = "{% for i in range(50) %}{{ states('light.kitchen') }}{% endfor %}";
+        let out = env.render(ok).expect("legitimate template must render");
+        assert!(out.contains("on"));
+    }
 }
@@ -25,6 +25,15 @@ use homecore::event::{DomainEvent, StateChangedEvent};
 use crate::dedup::fnv64a_hash;
 use crate::schema::ALL_DDL;

+/// Hard upper bound on rows returned by [`Recorder::get_state_history`].
+///
+/// Without this cap a wide `[since, until]` window over a high-frequency entity
+/// would load an unbounded number of rows into memory (a memory-DoS). The value
+/// is deliberately generous — large enough never to truncate a realistic
+/// history-graph query, small enough to bound the worst case. Callers needing a
+/// wider span must page through it with narrower windows.
+pub const MAX_HISTORY_ROWS: i64 = 1_000_000;
+
 /// Errors returned by `Recorder` operations.
 #[derive(Error, Debug)]
 pub enum RecorderError {
@@ -380,7 +389,17 @@ impl Recorder {
    }

    /// Query state history for `entity_id` between `since` and `until`.
-    /// Returns state snapshots in ascending `last_updated_ts` order.
+    /// Returns state snapshots in ascending `last_updated_ts` order, capped at
+    /// [`MAX_HISTORY_ROWS`] rows (oldest-first within the window).
+    ///
+    /// ## Bounded result set (memory-DoS guard)
+    ///
+    /// A high-frequency entity (e.g. a power sensor polled per-second) writes
+    /// ~86k rows/day; a wide `[since, until]` window over months would otherwise
+    /// load millions of rows into a single in-memory `Vec`, an unbounded-memory
+    /// denial-of-service. The query therefore carries a hard `LIMIT` so the
+    /// working set is bounded regardless of the requested time range. Callers
+    /// that genuinely need a wider span must page by narrowing the window.
    pub async fn get_state_history(
        &self,
        entity_id: &EntityId,
@@ -398,11 +417,13 @@ impl Recorder {
             WHERE s.entity_id = ? \
               AND s.last_updated_ts >= ? \
               AND s.last_updated_ts <= ? \
-             ORDER BY s.last_updated_ts ASC",
+             ORDER BY s.last_updated_ts ASC \
+             LIMIT ?",
        )
        .bind(entity_id.as_str())
        .bind(since_ts)
        .bind(until_ts)
+        .bind(MAX_HISTORY_ROWS)
        .fetch_all(&self.pool)
        .await?;

@@ -426,6 +447,79 @@ impl Recorder {
            })
            .collect()
    }
+
+    /// Purge history older than `older_than`, returning a [`PurgeStats`] summary.
+    ///
+    /// Deletes:
+    /// - `states` rows whose `last_updated_ts` is **strictly before** the cutoff,
+    /// - `events` rows whose `time_fired_ts` is strictly before the cutoff,
+    /// - then garbage-collects any `state_attributes` blob no surviving state
+    ///   row still references (so dedup-shared blobs are only dropped once their
+    ///   last referencing state is gone).
+    ///
+    /// ## Retention boundary (data-integrity guard)
+    ///
+    /// The cutoff is **exclusive**: a row exactly at `older_than` is retained.
+    /// This makes `purge(t)` idempotent on the boundary and guarantees that a
+    /// row written at the same instant the retention window opens is never lost
+    /// to an off-by-one. Anything *at or after* `older_than` survives.
+    ///
+    /// ## Atomicity (no partial-corrupt state)
+    ///
+    /// All three deletes run inside a single transaction. A failure mid-purge
+    /// rolls the whole operation back — the store is never left with states
+    /// deleted but their events kept, or attributes orphaned by a half-purge.
+    ///
+    /// Note: this reclaims logical rows; it does not `VACUUM` the file. SQLite
+    /// reuses freed pages for subsequent writes, so disk growth stays bounded
+    /// under a periodic purge even without an explicit vacuum.
+    pub async fn purge(&self, older_than: DateTime<Utc>) -> Result<PurgeStats, RecorderError> {
+        let cutoff_ts = older_than.timestamp_micros() as f64 / 1_000_000.0;
+
+        let mut tx = self.pool.begin().await?;
+
+        let states_deleted = sqlx::query("DELETE FROM states WHERE last_updated_ts < ?")
+            .bind(cutoff_ts)
+            .execute(&mut *tx)
+            .await?
+            .rows_affected();
+
+        let events_deleted = sqlx::query("DELETE FROM events WHERE time_fired_ts < ?")
+            .bind(cutoff_ts)
+            .execute(&mut *tx)
+            .await?
+            .rows_affected();
+
+        // GC attribute blobs no surviving state references. A dedup-shared blob
+        // is only removed once its last referencing state row is gone.
+        let attributes_deleted = sqlx::query(
+            "DELETE FROM state_attributes \
+             WHERE attributes_id NOT IN \
+                 (SELECT attributes_id FROM states WHERE attributes_id IS NOT NULL)",
+        )
+        .execute(&mut *tx)
+        .await?
+        .rows_affected();
+
+        tx.commit().await?;
+
+        Ok(PurgeStats {
+            states_deleted,
+            events_deleted,
+            attributes_deleted,
+        })
+    }
+}
+
+/// Summary of a [`Recorder::purge`] run.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct PurgeStats {
+    /// Number of `states` rows deleted.
+    pub states_deleted: u64,
+    /// Number of `events` rows deleted.
+    pub events_deleted: u64,
+    /// Number of orphaned `state_attributes` blobs garbage-collected.
+    pub attributes_deleted: u64,
 }

 /// A state row returned from `get_state_history`.
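
The doc comments above tell callers to page through wide spans by narrowing the window. One way that could look is sketched below against an in-memory stand-in rather than the real `Recorder`: `query_capped` and `fetch_all_paged` are hypothetical names, timestamps are integer microseconds, and the sketch assumes timestamps are unique per entity (rows sharing a timestamp across a page boundary would need a tie-breaker such as the row id).

```rust
/// Hypothetical stand-in for a capped history query: rows with
/// `since <= ts <= until`, ascending, at most `limit` of them.
fn query_capped(store: &[i64], since: i64, until: i64, limit: usize) -> Vec<i64> {
    let mut hits: Vec<i64> = store
        .iter()
        .copied()
        .filter(|ts| *ts >= since && *ts <= until)
        .collect();
    hits.sort_unstable();
    hits.truncate(limit);
    hits
}

/// Page through a wide window by moving `since` just past the last row seen.
/// A page shorter than `limit` means the window is exhausted.
fn fetch_all_paged(store: &[i64], since: i64, until: i64, limit: usize) -> Vec<i64> {
    let mut out = Vec::new();
    let mut cursor = since;
    loop {
        let page = query_capped(store, cursor, until, limit);
        let Some(&last) = page.last() else { break };
        let full = page.len() == limit;
        out.extend(page);
        if !full {
            break;
        }
        // Assumes unique timestamps, so `last + 1` skips nothing.
        match last.checked_add(1) {
            Some(next) => cursor = next,
            None => break,
        }
    }
    out
}

fn main() {
    let store: Vec<i64> = (1..=10).collect();
    let all = fetch_all_paged(&store, 0, 100, 3);
    println!("fetched {} rows in pages of 3", all.len());
}
```

Because the cap truncates silently, a caller can only detect truncation by comparing the returned length with the limit, which is what the `full` check does here.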
@@ -722,6 +816,214 @@ mod tests {
        assert!(rows.is_empty(), "genuine no-match is empty, not an error");
    }

+    // ── SQL injection (parameterization guarantee) ──────────────────────────────
+
+    #[tokio::test]
+    async fn malicious_entity_id_is_stored_literally_not_executed() {
+        // FAILS if any query interpolated the recorded values into SQL: the
+        // `states` table would be dropped and the later COUNT would error /
+        // mismatch. Bound parameters store the metacharacter-laden string
+        // verbatim instead.
+        let recorder = open_memory().await;
+
+        // The entity id is an ordinary valid `domain.name`; the injection
+        // payload (quote, semicolon, comment marker) rides in the state
+        // string, which reaches the bind path as data.
+        let evil = "light.x_drop_table_states_select";
+        recorder
+            .record_state(&make_state_event(evil, "'; DROP TABLE states; --", serde_json::json!({})))
+            .await
+            .unwrap();
+
+        // states table still exists and holds exactly the one row we inserted.
+        let count: (i64,) = sqlx::query_as("SELECT COUNT(*) FROM states")
+            .fetch_one(&recorder.pool)
+            .await
+            .expect("states table must still exist — proves no injection");
+        assert_eq!(count.0, 1);
+
+        // The malicious state string round-trips literally.
+        let rows = recorder
+            .search_states_by_text("DROP TABLE", 10)
+            .await
+            .unwrap();
+        assert_eq!(rows.len(), 1, "metacharacter payload matched as a literal");
+        assert_eq!(rows[0].state, "'; DROP TABLE states; --");
+    }
+
+    #[tokio::test]
+    async fn like_metacharacters_in_query_are_literal_not_wildcards() {
+        // A `%` in the search text must match a literal percent sign, not act as
+        // a SQL LIKE wildcard. Proves the ESCAPE clause + metacharacter escaping.
+        let recorder = open_memory().await;
+        recorder
+            .record_state(&make_state_event("sensor.a", "100%", serde_json::json!({})))
+            .await
+            .unwrap();
+        recorder
+            .record_state(&make_state_event("sensor.b", "50", serde_json::json!({})))
+            .await
+            .unwrap();
+
+        // Literal "%" must match only sensor.a's "100%", NOT every row.
+        let rows = recorder.search_states_by_text("%", 10).await.unwrap();
+        assert_eq!(rows.len(), 1, "'%' is a literal, not a match-all wildcard");
+        assert_eq!(rows[0].entity_id.as_str(), "sensor.a");
+
+        // Underscore is likewise literal: matches nothing here.
+        let none = recorder.search_states_by_text("_", 10).await.unwrap();
+        assert!(none.is_empty(), "'_' is literal, matches no row");
+    }
+
+    // ── get_state_history bound (memory-DoS guard) ──────────────────────────────
+
+    #[tokio::test]
+    async fn history_query_carries_a_limit_clause() {
+        // Pin: the history SQL must carry a LIMIT bound (memory-DoS guard).
+        // Inserting a million rows is infeasible in a unit test, so we insert
+        // more rows than a deliberately tiny bound and run the *same* SQL shape
+        // as the public method (whose cap is MAX_HISTORY_ROWS) with a small
+        // bind, showing the LIMIT term is effective. Separately we assert the
+        // constant is a sane positive bound.
+        assert!(MAX_HISTORY_ROWS > 0, "history cap must be positive");
+        let recorder = open_memory().await;
+        for v in &["1", "2", "3", "4", "5"] {
+            recorder
+                .record_state(&make_state_event("sensor.bounded", v, serde_json::json!({})))
+                .await
+                .unwrap();
+            tokio::time::sleep(std::time::Duration::from_millis(2)).await;
+        }
+        // Same query shape as get_state_history, with a tiny LIMIT bind: if the
+        // SQL lacked a LIMIT term this would return all 5; with it, exactly 2.
+        let capped: Vec<(i64,)> = sqlx::query_as(
+            "SELECT s.state_id FROM states s \
+             WHERE s.entity_id = ? \
+             ORDER BY s.last_updated_ts ASC LIMIT ?",
+        )
+        .bind("sensor.bounded")
+        .bind(2_i64)
+        .fetch_all(&recorder.pool)
+        .await
+        .unwrap();
+        assert_eq!(capped.len(), 2, "LIMIT term effectively bounds the result set");
+
+        // And the real method returns all rows when under the cap.
+        let eid = entity("sensor.bounded");
+        let rows = recorder
+            .get_state_history(&eid, Utc::now() - chrono::Duration::seconds(10), Utc::now() + chrono::Duration::seconds(10))
+            .await
+            .unwrap();
+        assert_eq!(rows.len(), 5, "all rows under the cap return");
+    }
+
+    // ── purge (retention correctness + atomicity) ───────────────────────────────
+
+    #[tokio::test]
+    async fn purge_keeps_boundary_row_and_drops_older() {
+        // FAILS if purge had an off-by-one (deleting the row exactly at cutoff)
+        // or deleted too much/too little. Cutoff is EXCLUSIVE: a row at the
+        // cutoff instant survives; strictly-older rows are removed.
+        let recorder = open_memory().await;
+        let eid = entity("sensor.r");
+
+        // Three rows at known, increasing timestamps.
+        for v in &["old", "mid", "new"] {
+            recorder
+                .record_state(&make_state_event("sensor.r", v, serde_json::json!({})))
+                .await
+                .unwrap();
+            tokio::time::sleep(std::time::Duration::from_millis(20)).await;
+        }
+
+        // Read back the actual timestamps so the cutoff is exact.
+        let since = Utc::now() - chrono::Duration::seconds(60);
+        let until = Utc::now() + chrono::Duration::seconds(60);
+        let all = recorder.get_state_history(&eid, since, until).await.unwrap();
+        assert_eq!(all.len(), 3);
+        // Cut off exactly at the middle row's timestamp.
+        let mid_ts = all[1].last_updated_ts;
+        let cutoff = DateTime::<Utc>::from_timestamp_micros((mid_ts * 1_000_000.0) as i64).unwrap();
+
+        let stats = recorder.purge(cutoff).await.unwrap();
+        assert_eq!(stats.states_deleted, 1, "only the strictly-older 'old' row");
+
+        let remaining = recorder.get_state_history(&eid, since, until).await.unwrap();
+        assert_eq!(remaining.len(), 2, "boundary 'mid' row is KEPT (exclusive cutoff)");
+        assert_eq!(remaining[0].state, "mid");
+        assert_eq!(remaining[1].state, "new");
+    }
+
+    #[tokio::test]
+    async fn purge_gcs_orphaned_attributes_but_keeps_shared() {
+        // Dedup means two states can share one attribute blob. Purging one of
+        // them must NOT drop the still-referenced blob; purging the last one must.
+        let recorder = open_memory().await;
+        let shared = serde_json::json!({"unit": "C"});
+
+        recorder
+            .record_state(&make_state_event("sensor.a", "20", shared.clone()))
+            .await
+            .unwrap();
+        tokio::time::sleep(std::time::Duration::from_millis(20)).await;
+        recorder
+            .record_state(&make_state_event("sensor.b", "21", shared.clone()))
+            .await
+            .unwrap();
+
+        let attr_count = |r: &Recorder| {
+            let pool = r.pool.clone();
+            async move {
+                let c: (i64,) = sqlx::query_as("SELECT COUNT(*) FROM state_attributes")
+                    .fetch_one(&pool)
+                    .await
+                    .unwrap();
+                c.0
+            }
+        };
+        assert_eq!(attr_count(&recorder).await, 1, "deduped to one blob");
+
+        // Purge with the cutoff at sensor.b's timestamp (exclusive) → removes
+        // sensor.a only; blob still referenced by sensor.b, so it must survive.
+        let eid_b = entity("sensor.b");
+        let rows_b = recorder
+            .get_state_history(&eid_b, Utc::now() - chrono::Duration::seconds(60), Utc::now() + chrono::Duration::seconds(60))
+            .await
+            .unwrap();
+        let b_ts = rows_b[0].last_updated_ts;
+        let cutoff = DateTime::<Utc>::from_timestamp_micros((b_ts * 1_000_000.0) as i64).unwrap();
+        let stats = recorder.purge(cutoff).await.unwrap();
+        assert_eq!(stats.states_deleted, 1, "sensor.a purged");
+        assert_eq!(stats.attributes_deleted, 0, "shared blob still referenced — kept");
+        assert_eq!(attr_count(&recorder).await, 1, "blob survives");
+
+        // Now purge everything → sensor.b gone, blob orphaned → GC'd.
+        let stats2 = recorder.purge(Utc::now() + chrono::Duration::seconds(120)).await.unwrap();
+        assert_eq!(stats2.states_deleted, 1, "sensor.b purged");
+        assert_eq!(stats2.attributes_deleted, 1, "now-orphaned blob GC'd");
+        assert_eq!(attr_count(&recorder).await, 0, "no blobs remain");
+    }
+
+    #[tokio::test]
+    async fn purge_also_removes_old_events() {
+        let recorder = open_memory().await;
+        let ctx = Context::new();
+        recorder
+            .record_event(&DomainEvent::new("call_service", serde_json::json!({}), ctx))
+            .await
+            .unwrap();
+        // Purge with a far-future cutoff removes the event.
+        let stats = recorder
+            .purge(Utc::now() + chrono::Duration::seconds(120))
+            .await
+            .unwrap();
+        assert_eq!(stats.events_deleted, 1);
+        let count: (i64,) = sqlx::query_as("SELECT COUNT(*) FROM events")
+            .fetch_one(&recorder.pool)
+            .await
+            .unwrap();
+        assert_eq!(count.0, 0);
+    }
+
    #[tokio::test]
    async fn search_semantic_falls_back_to_text_with_null_index() {
        // With the default NullSemanticIndex, search_semantic must STILL return
@@ -30,7 +30,7 @@ pub mod schema;
 pub mod semantic;

 // Re-export the primary public API surface.
-pub use db::{Recorder, RecorderError};
+pub use db::{PurgeStats, Recorder, RecorderError, StateRow, MAX_HISTORY_ROWS};
 pub use listener::RecorderListener;

 /// Null semantic index used when the `ruvector` feature is off.
@@ -135,10 +135,13 @@ pub fn render_events(event: &BfldEvent) -> Vec<TopicMessage> {

    if let Some(zone) = &event.zone_id {
        // Emit a JSON string so consumers can distinguish "no zone" (omitted)
-        // from "single-zone deployment" (always the same zone string).
+        // from "single-zone deployment" (always the same zone string). The zone
+        // name is operator-controlled; escape JSON metacharacters so a name
+        // containing a quote or backslash cannot produce malformed/injected
+        // JSON. Mirrors ha_discovery.rs::push_str_field's escaping.
        out.push(TopicMessage {
            topic: TopicMessage::ruview_topic(node, "zone_activity"),
-            payload: format!("\"{zone}\""),
+            payload: json_string_literal(zone),
        });
    }

@@ -155,3 +158,26 @@ pub fn render_events(event: &BfldEvent) -> Vec<TopicMessage> {

    out
 }
+
+/// Wrap `value` in JSON double-quote delimiters, escaping what would otherwise
+/// break out of the string literal: `"` and `\` are backslash-escaped,
+/// `\n`/`\r`/`\t` use short escapes, and other control chars below U+0020
+/// become `\u00XX`. Kept in lockstep with `ha_discovery::push_str_field` so
+/// state-topic and discovery payloads escape identically.
+fn json_string_literal(value: &str) -> String {
+    let mut out = String::with_capacity(value.len() + 2);
+    out.push('"');
+    for ch in value.chars() {
+        match ch {
+            '"' => out.push_str("\\\""),
+            '\\' => out.push_str("\\\\"),
+            '\n' => out.push_str("\\n"),
+            '\r' => out.push_str("\\r"),
+            '\t' => out.push_str("\\t"),
+            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
+            c => out.push(c),
+        }
+    }
+    out.push('"');
+    out
+}
@@ -141,6 +141,15 @@ impl BfldPipeline {
    /// builds the frame via [`BfldFrame::from_payload`] so the CRC covers the
    /// section-prefixed bytes.
    ///
+    /// The emitted frame's payload is forced into compliance with the active
+    /// privacy class via [`crate::PrivacyGate::demote`]: at `Anonymous` the
+    /// identity-leaky `compressed_angle_matrix` and `csi_delta` sections are
+    /// stripped, and at `Restricted` the amplitude/phase proxies are stripped
+    /// too. This closes the gap (ADR-141) where a frame stamped with a
+    /// restrictive class byte could otherwise carry the full high-information
+    /// BFI payload across a [`crate::NetworkSink`]. Research classes (`Raw`,
+    /// `Derived`) keep the full payload — `demote` is a no-op there.
+    ///
    /// Returns `None` whenever the gate drops the underlying event (Reject or
    /// Recalibrate), so `process_to_frame` is a strict subset of `process`.
    pub fn process_to_frame(
@@ -151,11 +160,21 @@ impl BfldPipeline {
        embedding: Option<IdentityEmbedding>,
    ) -> Option<BfldFrame> {
        let timestamp_ns = inputs.timestamp_ns;
+        let active_class = self.current_privacy_class();
        let _gate_signal = self.process(inputs, embedding)?;
        let mut header = header_template;
        header.timestamp_ns = timestamp_ns;
-        header.privacy_class = self.current_privacy_class().as_u8();
-        Some(BfldFrame::from_payload(header, &payload))
+        header.privacy_class = active_class.as_u8();
+        let frame = BfldFrame::from_payload(header, &payload);
+        // Enforce the payload-content policy for the stamped class. The frame
+        // is already at `active_class`, so this is a same-class demotion: it
+        // performs no class change but strips the sections that class forbids.
+        // demote() only fails on InvalidDemote (target < source), which cannot
+        // happen here because source == target, so the expect is unreachable.
+        Some(
+            crate::PrivacyGate::demote(frame, active_class)
+                .expect("same-class demote is always valid"),
+        )
    }

    /// `true` if `enable_privacy_mode()` has been called more recently than
@@ -127,6 +127,38 @@ fn zone_payload_is_json_string_with_quotes() {
    assert_eq!(zone.payload, "\"living_room\"");
 }

+#[test]
+fn zone_payload_escapes_json_metacharacters() {
+    // A zone name containing a double-quote or backslash must not break out of
+    // the JSON string literal it is emitted into. ha_discovery.rs already
+    // escapes operator-controlled strings via push_str_field; render_events
+    // must do the same for parity so the state-topic payload is always valid
+    // JSON that Home Assistant can parse.
+    let ev = BfldEvent::with_privacy_gating(
+        "seed-01".into(),
+        0,
+        true,
+        0.1,
+        1,
+        0.9,
+        Some(r#"living"room\back"#.into()),
+        PrivacyClass::Anonymous,
+        None,
+        None,
+    );
+    let msgs = render_events(&ev);
+    let zone = msgs
+        .iter()
+        .find(|m| m.topic.contains("zone_activity"))
+        .expect("zone_activity topic");
+    // Expected: the inner quote and backslash are backslash-escaped, wrapped in
+    // one pair of unescaped delimiter quotes -> a single valid JSON string.
+    assert_eq!(zone.payload, r#""living\"room\\back""#);
+    // And it must parse as JSON back to the original zone string.
+    let parsed: String = serde_json::from_str(&zone.payload).expect("valid JSON string");
+    assert_eq!(parsed, r#"living"room\back"#);
+}
+
 #[test]
 fn identity_risk_payload_is_fixed_precision_decimal() {
    let msgs = render_events(&sample_event(PrivacyClass::Anonymous, false));
@@ -88,6 +88,11 @@ fn process_to_frame_returns_none_under_sustained_high_risk() {

 #[test]
 fn process_to_frame_round_trips_through_bytes() {
+    // Default pipeline class is Anonymous(2). The frame must round-trip through
+    // wire bytes with no CRC error; the payload it carries is the privacy-gated
+    // (angle-matrix-stripped) form, not the raw input. This test pins byte/CRC
+    // consistency and spot-checks that the angle matrix is gone; see
+    // process_to_frame_at_anonymous_strips_identity_leaky_sections for the rest.
    let mut p = BfldPipeline::new(BfldConfig::new("seed-01"));
    let frame = p
        .process_to_frame(
@@ -100,7 +105,10 @@ fn process_to_frame_round_trips_through_bytes() {
    let bytes = frame.to_bytes();
    let parsed = BfldFrame::from_bytes(&bytes).expect("frame must round-trip");
    let parsed_payload = parsed.parse_payload().expect("payload must round-trip");
-    assert_eq!(parsed_payload, typed_payload());
+    // Round-trip preserves whatever the privacy gate left in place.
+    assert_eq!(parsed_payload, frame.parse_payload().unwrap());
+    // And the identity surface is gone at Anonymous.
+    assert!(parsed_payload.compressed_angle_matrix.is_empty());
 }

 #[test]
@@ -141,6 +149,94 @@ fn process_to_frame_preserves_header_template_identity_fields() {
    assert_eq!({ frame.header.channel }, 36);
 }

+// --- ADR-141 privacy-gate-correctness regression -------------------------
+//
+// `process_to_frame` stamps the frame with the pipeline's privacy_class but
+// (pre-fix) serialized the caller-supplied payload UNCHANGED. That let a frame
+// labeled Anonymous(2) / Restricted(3) carry the full identity-leaky
+// `compressed_angle_matrix` (+ amplitude/phase/csi_delta) that
+// `PrivacyGate::demote` is documented (privacy_gate_demote.rs) to strip at
+// exactly those classes. A NetworkSink accepts class >= Derived, so such a
+// frame would publish the beamforming angle matrix (identity surface) to the
+// network despite its restrictive class byte. These tests pin that the payload
+// content matches what the stamped class permits.
+
+#[test]
+fn process_to_frame_at_anonymous_strips_identity_leaky_sections() {
+    // Default pipeline class is Anonymous(2): the angle matrix and csi_delta
+    // MUST NOT survive into the emitted frame, matching PrivacyGate::demote.
+    let mut p = BfldPipeline::new(BfldConfig::new("seed-01"));
+    let mut leaky = typed_payload();
+    leaky.csi_delta = Some(vec![0x55; 24]);
+    let frame = p
+        .process_to_frame(
+            inputs(1_700_000_000_000_000_000, [0.1, 0.1, 0.1, 0.1]),
+            header_template(),
+            leaky,
+            Some(embedding()),
+        )
+        .expect("low-risk frame must be emitted");
+    assert_eq!({ frame.header.privacy_class }, PrivacyClass::Anonymous.as_u8());
+    let payload = frame.parse_payload().expect("payload parses");
+    assert!(
+        payload.compressed_angle_matrix.is_empty(),
+        "Anonymous frame must NOT carry the compressed_angle_matrix (identity surface)",
+    );
+    assert!(
+        payload.csi_delta.is_none(),
+        "Anonymous frame must NOT carry csi_delta",
+    );
+    // Aggregate sensing sections survive.
+    assert_eq!(payload.snr_vector.len(), 8);
+    assert_eq!(payload.amplitude_proxy.len(), 16);
+}
+
+#[test]
+fn process_to_frame_in_privacy_mode_strips_amplitude_and_phase() {
+    // privacy_mode -> Restricted(3): amplitude + phase proxies must ALSO drop.
+    let mut p = BfldPipeline::new(
+        BfldConfig::new("seed-01").with_privacy_class(PrivacyClass::Anonymous),
+    );
+    p.enable_privacy_mode();
+    let frame = p
+        .process_to_frame(
+            inputs(0, [0.1, 0.1, 0.1, 0.1]),
+            header_template(),
+            typed_payload(),
+            Some(embedding()),
+        )
+        .expect("frame emitted");
+    assert_eq!({ frame.header.privacy_class }, PrivacyClass::Restricted.as_u8());
+    let payload = frame.parse_payload().expect("payload parses");
+    assert!(payload.compressed_angle_matrix.is_empty(), "angle matrix stripped at Restricted");
+    assert!(payload.amplitude_proxy.is_empty(), "amplitude stripped at Restricted");
+    assert!(payload.phase_proxy.is_empty(), "phase stripped at Restricted");
+    assert_eq!(payload.snr_vector.len(), 8, "snr_vector survives");
+}
+
+#[test]
+fn process_to_frame_at_derived_preserves_full_payload() {
+    // Derived(1) is a research mode that legitimately keeps the angle matrix.
+    // The strip must NOT over-fire at classes below Anonymous.
+    let mut p = BfldPipeline::new(
+        BfldConfig::new("seed-01").with_privacy_class(PrivacyClass::Derived),
+    );
+    let frame = p
+        .process_to_frame(
+            inputs(0, [0.1, 0.1, 0.1, 0.1]),
+            header_template(),
+            typed_payload(),
+            Some(embedding()),
+        )
+        .expect("frame emitted");
+    assert_eq!({ frame.header.privacy_class }, PrivacyClass::Derived.as_u8());
+    let payload = frame.parse_payload().expect("payload parses");
+    assert_eq!(
+        payload, typed_payload(),
+        "Derived research frame keeps the full payload unchanged",
+    );
+}
+
 #[test]
 fn process_to_frame_uses_input_timestamp_not_template_timestamp() {
    let mut p = BfldPipeline::new(BfldConfig::new("seed-01"));
@@ -43,6 +43,20 @@ pub struct Features {
 pub const EMBED_MIN_SCORE: f32 = 0.25;

 impl Features {
+    /// The all-zero feature vector — the well-defined result of an empty (or
+    /// wholly non-finite) capture. Total by construction: downstream
+    /// specialists read it as "no signal" rather than panicking or poisoning a
+    /// threshold (see [`Features::from_series`]).
+    pub const ZERO: Features = Features {
+        mean: 0.0,
+        variance: 0.0,
+        motion: 0.0,
+        breathing_score: 0.0,
+        breathing_hz: 0.0,
+        heart_score: 0.0,
+        heart_hz: 0.0,
+    };
+
    /// A fixed-length numeric embedding for nearest-prototype classifiers.
    ///
    /// The hz components are zeroed unless their periodicity score clears
@@ -77,29 +91,33 @@ impl Features {
    }

    /// Extract features from a per-frame scalar series sampled at `fs` Hz.
+    ///
+    /// **Total / fail-closed:** non-finite samples (`NaN`/`±inf`) are dropped
+    /// before any statistic is computed, so a single garbage CSI frame cannot
+    /// poison `mean`/`variance` into `NaN` and silently disable a persisted
+    /// specialist (a `NaN` threshold makes every `>` comparison false). A
+    /// series with no finite samples yields [`Features::ZERO`], exactly like
+    /// the empty series. Same defensive contract as
+    /// [`GeometryEmbedding`](crate::geometry_embedding::GeometryEmbedding):
+    /// adversarial input degrades to "no signal", never to `NaN`.
    pub fn from_series(series: &[f32], fs: f32) -> Features {
-        let n = series.len();
+        // Drop non-finite samples: a corrupt frame counts as no frame, not as
+        // a NaN that propagates through every downstream statistic.
+        let clean: Vec<f32> = series.iter().copied().filter(|v| v.is_finite()).collect();
+        let n = clean.len();
        if n == 0 {
-            return Features {
-                mean: 0.0,
-                variance: 0.0,
-                motion: 0.0,
-                breathing_score: 0.0,
-                breathing_hz: 0.0,
-                heart_score: 0.0,
-                heart_hz: 0.0,
-            };
+            return Features::ZERO;
        }
-        let mean = series.iter().copied().sum::<f32>() / n as f32;
-        let variance = series.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n as f32;
+        let mean = clean.iter().copied().sum::<f32>() / n as f32;
+        let variance = clean.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n as f32;
        let motion = if n > 1 {
-            series.windows(2).map(|w| (w[1] - w[0]).abs()).sum::<f32>() / (n - 1) as f32
+            clean.windows(2).map(|w| (w[1] - w[0]).abs()).sum::<f32>() / (n - 1) as f32
        } else {
            0.0
        };

        // De-mean before periodicity search.
-        let centered: Vec<f32> = series.iter().map(|v| v - mean).collect();
+        let centered: Vec<f32> = clean.iter().map(|v| v - mean).collect();
        let (breathing_hz, breathing_score) = autocorr_dominant(&centered, fs, 0.1, 0.6);
        let (heart_hz, heart_score) = autocorr_dominant(&centered, fs, 0.8, 3.0);

@@ -254,6 +272,36 @@ mod tests {
        assert_eq!(f.breathing_hz, 0.0);
    }

+    /// Fail-closed regression: a NaN/inf in the scalar series (corrupt CSI
+    /// frame) must NOT poison the features into `NaN`/`inf`. Pre-fix, a single
+    /// `NaN` made `mean`/`variance` `NaN`, which — baked into a persisted
+    /// `PresenceSpecialist::threshold` — silently disabled presence detection
+    /// (every `f.variance > NaN` is false). Non-finite samples are dropped.
+    #[test]
+    fn non_finite_samples_do_not_poison_features() {
+        let f = Features::from_series(&[1.0, 2.0, f32::NAN, 4.0, f32::INFINITY, 6.0], 15.0);
+        assert!(f.mean.is_finite(), "mean must stay finite, got {}", f.mean);
+        assert!(f.variance.is_finite(), "variance must stay finite, got {}", f.variance);
+        assert!(f.motion.is_finite(), "motion must stay finite, got {}", f.motion);
+        for x in f.embedding() {
+            assert!(x.is_finite(), "embedding slot non-finite: {x}");
+        }
+        // Mean is over the 4 finite samples {1,2,4,6} only.
+        assert!((f.mean - 3.25).abs() < 1e-5, "mean over finite samples, got {}", f.mean);
+        // Equivalence: dropping the non-finite samples must equal feeding only
+        // the finite ones — proves the filter, not just finiteness.
+        let only_finite = Features::from_series(&[1.0, 2.0, 4.0, 6.0], 15.0);
+        assert_eq!(f, only_finite);
+    }
+
+    /// A series with no finite samples degrades to the all-zero `ZERO`, exactly
+    /// like the empty series — never `NaN`.
+    #[test]
+    fn all_non_finite_series_is_zero() {
+        let f = Features::from_series(&[f32::NAN, f32::INFINITY, f32::NEG_INFINITY], 15.0);
+        assert_eq!(f, Features::ZERO);
+    }
+
    /// ADR-152 "heart-band leakage" regression: a strong breathing rhythm must
    /// NOT register as a heart-band periodicity — its in-band autocorr maximum
    /// sits at the band edge (monotonic leak), not an interior peak.
@@ -15,6 +15,28 @@ use serde::{Deserialize, Serialize};
 use crate::anchor::{AnchorLabel, Posture};
 use crate::extract::{AnchorFeature, Features};

+/// Default minimum breathing-band periodicity score to report a rate, used when
+/// a [`BreathingSpecialist`] carries no explicit `min_score` (the serde /
+/// pre-trained-default case). Respiration is a strong, narrowband modulation,
+/// so a moderate floor rejects noise windows without dropping real breaths.
+pub const DEFAULT_BREATHING_MIN_SCORE: f32 = 0.25;
+
+/// Default minimum HR-band periodicity score, used when a [`HeartbeatSpecialist`]
+/// carries no explicit `min_score`. Higher than breathing's: sub-mm chest
+/// displacement at HR frequencies sits near the CSI noise floor (ADR-151 §3.2),
+/// so the heartbeat head demands a cleaner peak before reporting.
+pub const DEFAULT_HEARTBEAT_MIN_SCORE: f32 = 0.3;
+
+/// Multiple of the typical inter-anchor spread ([`AnomalySpecialist::scale`])
+/// beyond which a live window is fully out-of-distribution (anomaly score 1.0):
+/// a window more than this many spreads from every enrolled prototype is novel.
+pub const ANOMALY_OUTLIER_SPREADS: f32 = 2.0;
+
+/// Anomaly score above which the window is *labelled* "anomalous" (vs "normal").
+/// Distinct from the runtime veto threshold ([`crate::runtime`]); this only
+/// drives the human-readable label.
+pub const ANOMALY_LABEL_CUTOFF: f32 = 0.5;
+
 /// Which biological signal a specialist estimates.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
 pub enum SpecialistKind {
@@ -229,7 +251,7 @@ impl Specialist for BreathingSpecialist {
        let min = if self.min_score > 0.0 {
            self.min_score
        } else {
-            0.25
+            DEFAULT_BREATHING_MIN_SCORE
        };
        if f.breathing_score < min || f.breathing_hz <= 0.0 {
            return None;
@@ -258,7 +280,7 @@ impl Specialist for HeartbeatSpecialist {
        let min = if self.min_score > 0.0 {
            self.min_score
        } else {
-            0.3
+            DEFAULT_HEARTBEAT_MIN_SCORE
        };
        if f.heart_score < min || f.heart_hz <= 0.0 {
            return None;
@@ -383,13 +405,13 @@ impl Specialist for AnomalySpecialist {
                .sqrt();
            best = best.min(d);
        }
-        // >2× the typical spread → anomalous.
-        let score = (best / (2.0 * self.scale)).clamp(0.0, 1.0);
+        // Beyond ANOMALY_OUTLIER_SPREADS× the typical spread → fully anomalous.
+        let score = (best / (ANOMALY_OUTLIER_SPREADS * self.scale)).clamp(0.0, 1.0);
        Some(SpecialistReading {
            kind: SpecialistKind::Anomaly,
            value: score,
            confidence: 0.6,
-            label: Some(if score > 0.5 { "anomalous" } else { "normal" }.into()),
+            label: Some(if score > ANOMALY_LABEL_CUTOFF { "anomalous" } else { "normal" }.into()),
        })
    }
 }
@@ -505,6 +527,32 @@ mod tests {
        assert!(b.infer(&feat(5.0, 0.2, 0.3, 0.1)).is_none()); // low score → none
    }

+    /// De-magic pin: the named default min-scores must equal the historical
+    /// literal values, and the gate boundary must be `score >= min` (a window
+    /// exactly at the default floor reports; a hair below does not).
+    #[test]
+    fn default_min_score_constants_match_prior_literals() {
+        assert_eq!(DEFAULT_BREATHING_MIN_SCORE, 0.25);
+        assert_eq!(DEFAULT_HEARTBEAT_MIN_SCORE, 0.3);
+        let b = BreathingSpecialist::default(); // min_score = 0.0 → uses default
+        assert!(
+            b.infer(&feat(5.0, 0.2, 0.3, DEFAULT_BREATHING_MIN_SCORE)).is_some(),
+            "score exactly at the default floor must report"
+        );
+        assert!(
+            b.infer(&feat(5.0, 0.2, 0.3, DEFAULT_BREATHING_MIN_SCORE - 1e-3)).is_none(),
+            "score below the default floor must not report"
+        );
+    }
+
+    /// De-magic pin for the anomaly score scale + label cutoff (value-identical
+    /// to the prior `2.0 * scale` / `> 0.5` literals).
+    #[test]
+    fn anomaly_constants_match_prior_literals() {
+        assert_eq!(ANOMALY_OUTLIER_SPREADS, 2.0);
+        assert_eq!(ANOMALY_LABEL_CUTOFF, 0.5);
+    }
+
    #[test]
    fn restlessness_normalizes() {
        let anchors = vec![
@@ -205,7 +205,7 @@ impl StreamingEngine {
    pub fn new(mode: PrivacyMode, model_version: u16, registration: GeoRegistration) -> Self {
        Self {
            fuser: MultistaticFuser::with_config(MultistaticConfig::default()),
-            coherence_accept: 0.85,
+            coherence_accept: Self::DEFAULT_COHERENCE_ACCEPT,
            privacy: PrivacyModeRegistry::new(mode),
            world: WorldGraph::new(registration),
            model_version,
@@ -213,7 +213,11 @@ impl StreamingEngine {
            array: ArrayCoordinator::new(ArrayCoordinatorConfig::default()),
            node_geom: BTreeMap::new(),
            evolution: None,
-            slam: RfSlam::with_discovery(0.5, 5, 0.6),
+            slam: RfSlam::with_discovery(
+                Self::SLAM_ASSOC_RADIUS_M,
+                Self::SLAM_MIN_SIGHTINGS,
+                Self::SLAM_MIN_COHERENCE,
+            ),
            person_tracks: BTreeMap::new(),
            semantic_retention: Self::DEFAULT_SEMANTIC_RETENTION,
            adapter: None,
@@ -257,6 +261,31 @@ impl StreamingEngine {
    /// durable history belongs to the recorder).
    pub const DEFAULT_SEMANTIC_RETENTION: usize = 7_200;

+    /// Cross-node coherence at or above which fusion records a positive
+    /// `CoherenceGateThreshold` evidence ref (ADR-137). Below it the cycle still
+    /// emits, but without that corroborating evidence — so this gate shapes the
+    /// trust record, not the privacy class. (== prior inline 0.85.)
+    pub const DEFAULT_COHERENCE_ACCEPT: f32 = 0.85;
+
+    /// ADR-143 reflector-discovery parameters used to build the persistent
+    /// `RfSlam`: association radius (m) within which two sightings are the same
+    /// reflector, the minimum number of sightings before a reflector is
+    /// considered stable, and the minimum per-sighting coherence to admit it.
+    /// (== prior inline `with_discovery(0.5, 5, 0.6)`.)
+    pub const SLAM_ASSOC_RADIUS_M: f64 = 0.5;
+    /// Minimum sightings before a discovered reflector is treated as stable.
+    pub const SLAM_MIN_SIGHTINGS: u64 = 5;
+    /// Minimum per-sighting coherence to admit a reflector sighting.
+    pub const SLAM_MIN_COHERENCE: f32 = 0.6;
+
+    /// ADR-143 static-anchor classification thresholds passed to
+    /// `RfSlam::static_anchors`: the wall/ceiling stationarity ceiling and the
+    /// mobile-reflector floor (anchors more mobile than this are dropped, not
+    /// persisted). (== prior inline `static_anchors(0.05, 1.0)`.)
+    pub const ANCHOR_WALL_CEILING: f64 = 0.05;
+    /// Mobility floor above which a reflector is treated as mobile (skipped).
+    pub const ANCHOR_MOBILE_FLOOR: f64 = 1.0;
+
    /// Override the `SemanticState` retention cap (minimum 1).
    pub fn set_semantic_retention(&mut self, max_states: usize) {
        self.semantic_retention = max_states.max(1);
@@ -331,7 +360,9 @@ impl StreamingEngine {
            self.slam.observe(obs);
        }
        let mut written = Vec::new();
-        for (pos, class) in self.slam.static_anchors(0.05, 1.0) {
+        for (pos, class) in
+            self.slam.static_anchors(Self::ANCHOR_WALL_CEILING, Self::ANCHOR_MOBILE_FLOOR)
+        {
            let kind = match class {
                wifi_densepose_signal::ruvsense::ReflectorClass::Wall => AnchorKind::Reflector,
                wifi_densepose_signal::ruvsense::ReflectorClass::Furniture => AnchorKind::Furniture,
@@ -595,19 +626,46 @@ impl StreamingEngine {
    }
 }

+/// Domain-separation tag for the witness hash. Bumping this string
+/// intentionally invalidates every previously-recorded witness (a schema break).
+const WITNESS_DOMAIN: &[u8] = b"ruview.engine.witness.v1";
+
+/// Length-prefix a variable-length field into the witness hash so adjacent
+/// fields can never be confused for one another. The 8-byte little-endian
+/// length makes the field framing unambiguous regardless of the bytes inside
+/// it (a field can contain the separator, the domain tag, anything).
+fn witness_field(h: &mut blake3::Hasher, bytes: &[u8]) {
+    h.update(&(bytes.len() as u64).to_le_bytes());
+    h.update(bytes);
+}
+
 /// Deterministic BLAKE3 witness over a trust decision: the provenance tuple
 /// (evidence ‖ model ‖ calibration ‖ privacy decision) plus the effective
 /// privacy-class byte. Stable across runs for identical decisions — the
 /// "signed operational belief" fingerprint (ADR-137 §2.7 / ADR-028).
+///
+/// # Witness integrity (review finding: domain separation)
+/// Every privacy-relevant field is **length-prefixed** before hashing, and the
+/// (variable-length) evidence list is preceded by an explicit count. Without
+/// this framing the fields were concatenated boundary-to-boundary, so a string
+/// straddling a field boundary (e.g. an adapter id absorbing the leading bytes
+/// of the calibration epoch, or a model_version absorbing a trailing evidence
+/// ref) collided with a *different* trust decision — silently un-distinguishing
+/// two distinct privacy-relevant inputs and defeating the tamper/drift audit.
+/// `model_version` is operator-influenceable (per-room adapter id, ADR-150
+/// §3.4), so the ambiguity was reachable, not merely theoretical.
 fn witness_of(p: &SemanticProvenance, class: PrivacyClass) -> [u8; 32] {
    let mut h = blake3::Hasher::new();
+    h.update(WITNESS_DOMAIN);
+    // Explicit evidence count, then each ref length-prefixed: the number of
+    // evidence refs is itself privacy-relevant and must be unambiguous.
+    h.update(&(p.evidence.len() as u64).to_le_bytes());
    for e in &p.evidence {
-        h.update(e.as_bytes());
-        h.update(b"\x1f");
+        witness_field(&mut h, e.as_bytes());
    }
-    h.update(p.model_version.as_bytes());
-    h.update(p.calibration_version.as_bytes());
-    h.update(p.privacy_decision.as_bytes());
+    witness_field(&mut h, p.model_version.as_bytes());
+    witness_field(&mut h, p.calibration_version.as_bytes());
+    witness_field(&mut h, p.privacy_decision.as_bytes());
    h.update(&[class.as_u8()]);
    *h.finalize().as_bytes()
 }
@@ -1113,4 +1171,179 @@ mod tests {
        // StrictNoIdentity base = Restricted, even with no contradiction.
        assert_eq!(out.effective_class, PrivacyClass::Restricted);
    }
+
+    /// De-magic pin (review finding): the named engine constants must keep
+    /// their prior inline values exactly, so the de-magic is a pure rename with
+    /// no behavior change.
+    #[test]
+    fn engine_constants_match_prior_values() {
+        assert_eq!(StreamingEngine::DEFAULT_COHERENCE_ACCEPT, 0.85);
+        assert_eq!(StreamingEngine::SLAM_ASSOC_RADIUS_M, 0.5);
+        assert_eq!(StreamingEngine::SLAM_MIN_SIGHTINGS, 5);
+        assert_eq!(StreamingEngine::SLAM_MIN_COHERENCE, 0.6);
+        assert_eq!(StreamingEngine::ANCHOR_WALL_CEILING, 0.05);
+        assert_eq!(StreamingEngine::ANCHOR_MOBILE_FLOOR, 1.0);
+    }
+
+    /// Privacy monotonicity (the crux): across EVERY base mode, a forced
+    /// contradiction may only ever make the emitted class *more* restrictive
+    /// (higher byte) and never less. Demotion is single-step and clamps at
+    /// Restricted; a clean cycle emits exactly the base class. This is the
+    /// information-only-removed invariant of ADR-141/120 stated as a property
+    /// over the whole mode set.
+    #[test]
+    fn forced_contradiction_never_relaxes_class() {
+        let cal_mismatch = [Some(CalibrationId(1)), Some(CalibrationId(2))]; // disagree → contradiction
+        let cal_match = [Some(CalibrationId(5)), Some(CalibrationId(5))];
+        let frames = [node_frame(0, 1000, 56), node_frame(1, 1001, 56)];
+        for mode in [
+            PrivacyMode::RawResearch,
+            PrivacyMode::PrivateHome,
+            PrivacyMode::EnterpriseAnonymous,
+            PrivacyMode::CareWithConsent,
+            PrivacyMode::StrictNoIdentity,
+        ] {
+            let base_class = mode.target_class();
+
+            // Clean cycle: emits exactly the base class (neither relaxed nor demoted).
+            let mut clean = StreamingEngine::new(mode, 1, GeoRegistration::default());
+            let room_c = clean.add_room("r", "R");
+            let oc = clean
+                .process_cycle_calibrated(&frames, &cal_match, room_c, 1)
+                .unwrap();
+            assert_eq!(oc.effective_class, base_class, "clean cycle == base class");
+            assert!(!oc.demoted);
+
+            // Forced contradiction: class byte only ever increases (more
+            // restrictive), never decreases below the base.
+            let mut dirty = StreamingEngine::new(mode, 1, GeoRegistration::default());
+            let room_d = dirty.add_room("r", "R");
+            let od = dirty
+                .process_cycle_calibrated(&frames, &cal_mismatch, room_d, 1)
+                .unwrap();
+            assert!(od.demoted, "calibration mismatch must demote in {mode:?}");
+            assert!(
+                od.effective_class.as_u8() >= base_class.as_u8(),
+                "demotion must never relax: {mode:?} base={:?} got={:?}",
+                base_class,
+                od.effective_class
+            );
+            // And it must be strictly more restrictive unless already clamped
+            // at the most-restrictive class.
+            if base_class != PrivacyClass::Restricted {
+                assert!(
+                    od.effective_class.as_u8() > base_class.as_u8(),
+                    "unclamped demotion must increase restriction in {mode:?}"
+                );
+            } else {
+                assert_eq!(od.effective_class, PrivacyClass::Restricted);
+            }
+        }
+    }
+
+    /// Fail-closed boundary: an empty cycle (zero frames) must NOT emit a
+    /// trusted output at all — fusion rejects it and the engine surfaces a
+    /// hard error. There is no degenerate output that could carry a stale or
+    /// over-permissive class.
+    #[test]
+    fn empty_cycle_fails_closed() {
+        let (mut e, room) = engine();
+        let err = e.process_cycle(&[], CalibrationId(1), room, 1);
+        assert!(matches!(err, Err(EngineError::Fusion(_))), "empty cycle must error, got {err:?}");
+        // No SemanticState was appended (room + sensor only).
+        assert_eq!(e.world().node_count(), 2);
+        assert_eq!(e.cycle_count(), 0, "a failed cycle must not advance the counter");
+    }
+
+    /// Single-node boundary characterization: a one-node cycle fuses (no
+    /// multistatic cross-check is possible), reports no mesh (n<2), and emits a
+    /// well-formed witness at the base class. Documents that single-node sensing
+    /// is a valid, non-demoting mode — not a silent bypass.
+    #[test]
+    fn single_node_cycle_is_well_formed() {
+        let (mut e, room) = engine();
+        let out = e
+            .process_cycle(&[node_frame(0, 1000, 56)], CalibrationId(1), room, 1)
+            .unwrap();
+        assert!(out.mesh.is_none(), "one node has no mesh cut");
+        assert!(out.directional.is_none(), "no geometry registered");
+        assert_eq!(out.effective_class, PrivacyClass::Anonymous); // PrivateHome base
+        assert_ne!(out.witness, [0u8; 32], "witness still emitted");
+    }
+
+    /// Witness domain-separation (review finding): the witness must change
+    /// whenever ANY privacy-relevant field changes. Pre-fix, model_version,
+    /// calibration_version, and privacy_decision were concatenated into the
+    /// hash with no delimiter between them, so a string that straddled the
+    /// model/calibration boundary collided with a different
+    /// (model, calibration) tuple.
+    ///
+    /// `model_version` is operator-influenceable through the per-room adapter id
+    /// (ADR-150 §3.4), and `calibration_version` is `cal:<hex>`, so the two
+    /// provenances below are *both reachable* and represent genuinely different
+    /// trust decisions (different model identity, different calibration epoch),
+    /// yet under the old framing they hash-collided. A colliding witness
+    /// silently un-distinguishes two distinct privacy-relevant inputs,
+    /// defeating the tamper/drift audit guarantee. Length-prefixing fixes this.
+    #[test]
+    fn witness_distinguishes_model_calibration_boundary() {
+        let class = PrivacyClass::Anonymous;
+        // A: model "rfenc-v1+adapter:X", calibration epoch "cal:00ab".
+        let a = SemanticProvenance {
+            evidence: vec!["ev".into()],
+            model_version: "rfenc-v1+adapter:X".into(),
+            calibration_version: "cal:00ab".into(),
+            privacy_decision: "PrivateHome/Anonymous".into(),
+        };
+        // B: adapter id absorbs the leading "cal:00a" of A's calibration; B's
+        // own calibration is the remaining "b". A.model‖A.cal == B.model‖B.cal,
+        // so the unseparated concatenation hashes identically — yet these are
+        // distinct (model identity, calibration epoch) tuples.
+        let b = SemanticProvenance {
+            evidence: vec!["ev".into()],
+            model_version: "rfenc-v1+adapter:Xcal:00a".into(),
+            calibration_version: "b".into(),
+            privacy_decision: "PrivateHome/Anonymous".into(),
+        };
+        assert_ne!(a.model_version, b.model_version);
+        assert_ne!(a.calibration_version, b.calibration_version);
+        // Sanity: the two collide under naive concatenation.
+        assert_eq!(
+            format!("{}{}", a.model_version, a.calibration_version),
+            format!("{}{}", b.model_version, b.calibration_version),
+        );
+        assert_ne!(
+            witness_of(&a, class),
+            witness_of(&b, class),
+            "distinct (model, calibration) tuples must not share a witness"
+        );
+    }
+
+    /// Witness domain-separation across the evidence/model boundary: a witness
+    /// must distinguish an extra evidence ref from a model_version that absorbs
+    /// the same bytes. The evidence loop terminates each ref with one separator;
+    /// the model field must itself be unambiguously delimited from the (variable
+    /// number of) evidence refs that precede it.
+    #[test]
+    fn witness_distinguishes_evidence_model_boundary() {
+        let class = PrivacyClass::Anonymous;
+        let a = SemanticProvenance {
+            evidence: vec!["e1".into(), "e2".into()],
+            model_version: "m".into(),
+            calibration_version: "cal:1".into(),
+            privacy_decision: "PrivateHome/Anonymous".into(),
+        };
+        let b = SemanticProvenance {
+            evidence: vec!["e1".into()],
+            // absorbs "e2" + its 0x1f separator into the model field.
+            model_version: "e2\u{1f}m".into(),
+            calibration_version: "cal:1".into(),
+            privacy_decision: "PrivateHome/Anonymous".into(),
+        };
+        assert_ne!(
+            witness_of(&a, class),
+            witness_of(&b, class),
+            "an extra evidence ref must not collide with a model_version that absorbs it"
+        );
+    }
 }
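Review note on the two witness tests above: the body of `witness_of` is not shown in this hunk, so the following is only a sketch of one way to get an unambiguous encoding, assuming length-prefixed framing. The helpers `push_framed`, `framed`, and `naive` are hypothetical names, and the hash step is omitted because an injective byte encoding is what gives distinct tuples distinct hash inputs. The actual fix may use a different delimiter scheme.

```rust
/// Hypothetical helper: append one field as an 8-byte little-endian length
/// followed by its bytes, so the boundary is recoverable from the stream.
fn push_framed(out: &mut Vec<u8>, field: &str) {
    out.extend_from_slice(&(field.len() as u64).to_le_bytes());
    out.extend_from_slice(field.as_bytes());
}

/// Hypothetical length-prefixed encoding of the hashed fields. The evidence
/// count is written first so a variable-length list cannot bleed into the
/// model field that follows it.
fn framed(evidence: &[&str], model: &str, calibration: &str) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(evidence.len() as u64).to_le_bytes());
    for e in evidence {
        push_framed(&mut out, e);
    }
    push_framed(&mut out, model);
    push_framed(&mut out, calibration);
    out
}

/// Ambiguous encoding for comparison: each evidence ref is terminated by
/// 0x1f, then model and calibration are concatenated with no delimiter.
fn naive(evidence: &[&str], model: &str, calibration: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for e in evidence {
        out.extend_from_slice(e.as_bytes());
        out.push(0x1f);
    }
    out.extend_from_slice(model.as_bytes());
    out.extend_from_slice(calibration.as_bytes());
    out
}
```

With the inputs from both tests, `naive` produces identical bytes for the A and B tuples, while `framed` keeps them apart because the lengths (and the evidence count) differ.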
@@ -15,7 +15,11 @@ pub fn haversine(a: &GeoPoint, b: &GeoPoint) -> f64 {
    let lat1 = a.lat.to_radians();
    let lat2 = b.lat.to_radians();
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
-    2.0 * WGS84_A * h.sqrt().asin()
+    // `asin` is only defined on [-1, 1]. For (near-)antipodal points,
+    // floating-point rounding can push `h.sqrt()` just above 1.0, and
+    // `asin(>1)` is NaN, which would silently poison any distance-based
+    // comparison downstream. Clamp into the domain so the result stays finite.
+    2.0 * WGS84_A * h.sqrt().clamp(0.0, 1.0).asin()
 }

 /// WGS84 to local ENU (East-North-Up) relative to origin, in meters.
@@ -83,3 +87,73 @@ pub fn tiles_for_bbox(bbox: &GeoBBox, zoom: u8) -> Vec<TileCoord> {
    }
    tiles
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    // ── haversine asin-domain robustness ───────────────────────────────────
+    //
+    // For (near-)antipodal points, floating-point rounding can push the
+    // haversine term `h` to 1.0 + ~4e-16, and `asin(sqrt(h)) = asin(>1)` is
+    // NaN. A NaN distance silently breaks every downstream comparison (all
+    // `<`/`>` become false), so the result must stay finite. This exact pair
+    // produced h = 1.0000000000000004 pre-fix (verified empirically).
+
+    #[test]
+    fn haversine_near_antipodal_is_finite_not_nan() {
+        let a = GeoPoint {
+            lat: -44.4994,
+            lon: -178.957_22,
+            alt: 0.0,
+        };
+        let b = GeoPoint {
+            lat: 44.499_399_99,
+            lon: 1.042_780_01,
+            alt: 0.0,
+        };
+        let d = haversine(&a, &b);
+        assert!(d.is_finite(), "near-antipodal haversine must be finite, got {d}");
+        // Half-circumference is ~20_037 km; result must be close to that.
+        assert!(
+            (19_000_000.0..21_000_000.0).contains(&d),
+            "antipodal distance should be ~half-circumference, got {d}"
+        );
+    }
+
+    #[test]
+    fn haversine_identical_points_is_zero() {
+        let p = GeoPoint {
+            lat: 43.65,
+            lon: -79.38,
+            alt: 0.0,
+        };
+        let d = haversine(&p, &p);
+        assert!(d.is_finite() && d < 1e-6, "identical points → 0, got {d}");
+    }
+
+    // ── pole-singularity robustness (degenerate geometry) ──────────────────
+    //
+    // The ENU transforms divide by cos(lat); at the poles cos(±90°) = 0, so
+    // the longitude term is non-finite. We do not change the transform (that
+    // would alter near-pole results), but we pin that the call does NOT panic.
+
+    #[test]
+    fn wgs84_to_enu_at_pole_does_not_panic() {
+        let origin = GeoPoint {
+            lat: 90.0,
+            lon: 0.0,
+            alt: 0.0,
+        };
+        let point = GeoPoint {
+            lat: 89.99,
+            lon: 10.0,
+            alt: 0.0,
+        };
+        // Must return without panicking. North/up stay finite; east may be
+        // non-finite at the exact pole — assert the bounded components only.
+        let enu = wgs84_to_enu(&point, &origin);
+        assert!(enu[1].is_finite(), "north component must be finite");
+        assert!(enu[2].is_finite(), "up component must be finite");
+    }
+}
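Review note on the `haversine` clamp: a standalone sketch of the clamped formula, for readers who want to try it outside the crate. The `GeoPoint` struct here is a simplified stand-in (no `alt`), and the `dlat`/`dlon` definitions are assumed because they fall outside the hunk. `WGS84_A` uses the standard semi-major axis value.

```rust
/// WGS84 semi-major axis in meters.
const WGS84_A: f64 = 6_378_137.0;

/// Simplified stand-in for the crate's `GeoPoint` (degrees).
struct GeoPoint {
    lat: f64,
    lon: f64,
}

/// Great-circle distance in meters, with the `asin` argument clamped.
fn haversine(a: &GeoPoint, b: &GeoPoint) -> f64 {
    // Assumed definitions: these two lines are not visible in the diff.
    let dlat = (b.lat - a.lat).to_radians();
    let dlon = (b.lon - a.lon).to_radians();
    let lat1 = a.lat.to_radians();
    let lat2 = b.lat.to_radians();
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    // Rounding can leave sqrt(h) slightly above 1.0; asin would return NaN there.
    2.0 * WGS84_A * h.sqrt().clamp(0.0, 1.0).asin()
}
```

Note that `f64::clamp` passes NaN through unchanged, so the clamp only covers the rounding case. Non-finite input coordinates would still yield a NaN distance.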
@@ -68,6 +68,21 @@ pub fn parse_hgt(data: &[u8], origin_lat: f64, origin_lon: f64) -> Result<Elevat
    let n_samples = data.len() / 2;
    let side = (n_samples as f64).sqrt() as usize;

+    // A valid SRTM grid is at least 2x2; anything smaller has no cell spacing.
+    // Without this guard, side 0 underflows `side - 1` (panic in debug, wrap to
+    // a huge value in release) and side 1 makes `1.0 / (side - 1)` divide by
+    // zero, so `cell_size_deg` is garbage or inf and poisons every
+    // `ElevationGrid::get` lookup. A truncated download, a 404 HTML body, or an
+    // empty response can all reach here; fail loudly, do not persist a bad grid.
+    if side < 2 {
+        anyhow::bail!(
+            "HGT data too small: {} bytes ({} samples, side {}) — need at least a 2x2 grid",
+            data.len(),
+            n_samples,
+            side
+        );
+    }
+
    let heights: Vec<f32> = data
        .chunks_exact(2)
        .map(|c| {
@@ -129,3 +144,42 @@ pub fn extract_subgrid(grid: &ElevationGrid, center: &GeoPoint, radius_m: f64) -
        heights,
    }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    // ── parse_hgt degenerate-input robustness ──────────────────────────────
+    //
+    // Before the `side < 2` guard, side 0 underflowed `side - 1` (panic in
+    // debug / huge wrap in release) and side 1 divided by zero, so
+    // `cell_size_deg` was garbage or inf. A truncated download or a 404 HTML
+    // page reaches `parse_hgt`, so these must Err, not panic/poison.
+
+    #[test]
+    fn parse_hgt_empty_data_errors_not_panics() {
+        let res = parse_hgt(&[], 40.0, -75.0);
+        assert!(res.is_err(), "empty HGT must Err, got {res:?}");
+    }
+
+    #[test]
+    fn parse_hgt_single_sample_errors() {
+        // 2 bytes = 1 sample → side 1 → div-by-zero cell_size (inf) pre-fix.
+        let res = parse_hgt(&[0u8, 0u8], 40.0, -75.0);
+        assert!(res.is_err(), "1-sample HGT must Err, got {res:?}");
+    }
+
+    #[test]
+    fn parse_hgt_minimal_2x2_is_finite() {
+        // 4 samples = 8 bytes → side 2 → cell_size = 1.0 (finite, valid).
+        let data = vec![0u8; 8];
+        let grid = parse_hgt(&data, 40.0, -75.0).expect("2x2 HGT should parse");
+        assert_eq!(grid.cols, 2);
+        assert_eq!(grid.rows, 2);
+        assert!(
+            grid.cell_size_deg.is_finite() && grid.cell_size_deg > 0.0,
+            "cell_size must be finite positive, got {}",
+            grid.cell_size_deg
+        );
+    }
+}
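Review note on the `side < 2` guard: a reduced sketch of the size check in isolation. `hgt_side` and `cell_size_deg` are hypothetical helpers that copy only the arithmetic visible in the hunk, and they return a `String` error in place of `anyhow` so the block has no dependencies.

```rust
/// Hypothetical helper: derive the square grid side from a raw HGT buffer
/// (2 bytes per sample) and reject anything smaller than a 2x2 grid.
fn hgt_side(data: &[u8]) -> Result<usize, String> {
    let n_samples = data.len() / 2;
    let side = (n_samples as f64).sqrt() as usize;
    if side < 2 {
        return Err(format!(
            "HGT data too small: {} bytes ({} samples, side {}); need at least a 2x2 grid",
            data.len(),
            n_samples,
            side
        ));
    }
    Ok(side)
}

/// Cell spacing in degrees across a 1-degree tile. Only safe to call with
/// `side >= 2`, which `hgt_side` guarantees on its `Ok` path.
fn cell_size_deg(side: usize) -> f64 {
    1.0 / (side - 1) as f64
}
```

The two failure modes the guard removes are distinct: `0usize - 1` underflows, while side 1 gives `1.0 / 0.0`, which is `inf` in floating point.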
@@ -700,4 +700,79 @@ mod tests {
            assert!(conf > 0.7, "self-similarity should exceed match threshold");
        }
    }
+
+    // ── NaN-state-poisoning guard (the proven recurring bug class) ──────────
+    //
+    // The calibration/vitals crates were both bitten by a single non-finite
+    // sample latching into persistent state and freezing all outputs forever.
+    // Here the auto-accumulating persistent state is `occupancy` (an EMA:
+    // `*occ = *occ*0.7 + new*0.3`) and `vitals` (motion/breathing/heart).
+    //
+    // The UDP parser can only ever emit finite amplitudes/phases (sqrt and
+    // atan2 of i8 values), so the realistic ingress is already safe. This test
+    // is stronger: it injects an adversarial hand-built `CsiFrame` carrying
+    // NaN/inf amplitudes and phases (possible because the fields are public),
+    // and pins that the persistent state self-heals to finite values rather
+    // than latching NaN and silently freezing — i.e. the bug class is absent.
+    #[test]
+    fn nonfinite_frame_does_not_poison_persistent_state() {
+        let mut s = CsiPipelineState::default();
+        // Warm up with valid frames so vitals/occupancy are populated.
+        seed_state_with_frames(&mut s, 60);
+
+        // A valid baseline must be finite to start.
+        assert!(s.occupancy.iter().all(|d| d.is_finite()));
+        assert!(s.vitals.breathing_rate.is_finite());
+        assert!(s.vitals.motion_score.is_finite());
+
+        // Inject a stream of poisoned frames: NaN/inf amplitudes + phases on a
+        // valid header (node_id 1, finite rssi). Mimics a corrupt sensor.
+        for i in 0..40 {
+            let nan_frame = CsiFrame {
+                node_id: 1,
+                n_antennas: 1,
+                n_subcarriers: 32,
+                channel: 6,
+                rssi: -50,
+                noise_floor: -90,
+                timestamp_us: 10_000 + i,
+                iq_data: vec![0i8; 64],
+                amplitudes: vec![f32::NAN; 32],
+                phases: vec![f32::INFINITY; 32],
+            };
+            s.process_frame(nan_frame);
+        }
+
+        // Persistent auto-accumulating state must remain finite — a single
+        // poisoned frame (or 40) must not permanently corrupt outputs.
+        assert!(
+            s.occupancy.iter().all(|d| d.is_finite()),
+            "occupancy EMA must not latch NaN/inf"
+        );
+        assert!(
+            s.vitals.breathing_rate.is_finite(),
+            "breathing_rate must stay finite, got {}",
+            s.vitals.breathing_rate
+        );
+        assert!(
+            s.vitals.heart_rate.is_finite(),
+            "heart_rate must stay finite, got {}",
+            s.vitals.heart_rate
+        );
+        assert!(
+            s.vitals.motion_score.is_finite(),
+            "motion_score must stay finite, got {}",
+            s.vitals.motion_score
+        );
+
+        // And the pipeline must recover: feeding valid frames again yields a
+        // finite, in-range breathing estimate (not a frozen NaN).
+        seed_state_with_frames(&mut s, 60);
+        assert!(s.vitals.breathing_rate.is_finite());
+        assert!(
+            (0.0..=40.0).contains(&s.vitals.breathing_rate),
+            "breathing must be in clamp range after recovery, got {}",
+            s.vitals.breathing_rate
+        );
+    }
 }
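Review note on the NaN-poisoning test: the comment quotes the occupancy update as `*occ = *occ*0.7 + new*0.3`, and the sketch below shows why a single NaN latches in such an EMA when nothing guards it. How the real pipeline self-heals is not visible in this hunk; skipping non-finite samples, as `ema_guarded` does, is only one assumed strategy, and both function names are hypothetical.

```rust
/// Unguarded EMA with the 0.7/0.3 weights quoted in the test comment.
/// Once `occ` is NaN, every later update stays NaN.
fn ema_unguarded(occ: &mut f64, new: f64) {
    *occ = *occ * 0.7 + new * 0.3;
}

/// Assumed guard: ignore non-finite samples so they never enter the state.
fn ema_guarded(occ: &mut f64, new: f64) {
    if !new.is_finite() {
        return;
    }
    *occ = *occ * 0.7 + new * 0.3;
}
```

Because `NaN * 0.7 + x * 0.3` is NaN for any `x`, the unguarded form never recovers by itself, which is the failure the test pins as absent.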
@@ -184,4 +184,43 @@ mod tests {
        let fused = fuse_clouds(&[&a], 0.5);
        assert_eq!(fused.points.len(), 1, "three close points → one voxel");
    }
+
+    // ── degenerate-input robustness (no panic, sensible output) ────────────
+    //
+    // These pin that the voxel accumulators handle empty / single / all-
+    // coincident inputs without dividing by zero or panicking. The per-voxel
+    // count is always >= 1 (the entry is created on first insert), so the
+    // `/n` averaging is safe — but make that contract explicit so a future
+    // refactor cannot silently reintroduce a div-by-zero.
+
+    #[test]
+    fn fuse_clouds_empty_input_is_empty() {
+        let fused = fuse_clouds(&[], 0.1);
+        assert!(fused.points.is_empty(), "no clouds → no points");
+        let empty = PointCloud::new("empty");
+        let fused2 = fuse_clouds(&[&empty], 0.1);
+        assert!(fused2.points.is_empty(), "empty cloud → no points");
+    }
+
+    #[test]
+    fn fuse_clouds_single_point_is_finite() {
+        let a = cloud_with("a", &[(1.0, 2.0, 3.0)]);
+        let fused = fuse_clouds(&[&a], 0.1);
+        assert_eq!(fused.points.len(), 1);
+        let p = &fused.points[0];
+        assert!(
+            p.x.is_finite() && p.y.is_finite() && p.z.is_finite() && p.intensity.is_finite(),
+            "single-point voxel must average to a finite point"
+        );
+    }
+
+    #[test]
+    fn fuse_clouds_all_coincident_collapses_finite() {
+        // Many identical points → one voxel, finite averaged centroid.
+        let a = cloud_with("a", &[(0.5, 0.5, 0.5); 100]);
+        let fused = fuse_clouds(&[&a], 0.25);
+        assert_eq!(fused.points.len(), 1, "coincident points → one voxel");
+        let p = &fused.points[0];
+        assert!((p.x - 0.5).abs() < 1e-4 && p.x.is_finite());
+    }
 }
@@ -174,6 +174,20 @@ impl BreathingExtractor {
        let output =
            (1.0 - r) * (input - state.x2) + 2.0 * r * cos_w0 * state.y1 - r * r * state.y2;

+        // Self-healing non-finite guard (ADR-158 §A1). A single non-finite
+        // sample — a NaN/inf residual from a corrupt CSI frame, or a transient
+        // overflow — would otherwise be stored into `y1`/`y2` and poison the
+        // resonator recurrence *permanently*: every subsequent output stays
+        // NaN, the `extract()` finite-check drops it, and the history buffer
+        // never refills, so breathing extraction is dead until `reset()`.
+        // Resetting the filter state here lets the resonator recover on the next
+        // clean frame; the 0.0 we return for this frame is still dropped by the
+        // caller's `is_finite()` check, so no spurious sample enters history.
+        if !output.is_finite() {
+            *state = IirState::default();
+            return 0.0;
+        }
+
        state.x2 = state.x1;
        state.x1 = input;
        state.y2 = state.y1;
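Review note on the self-healing guard: a standalone sketch of the resonator step with the same recurrence and reset. `resonator_step` is a hypothetical free function, `r` and `cos_w0` are passed in directly, and the final `state.y1 = output` line is assumed because the hunk is cut off before it.

```rust
/// Minimal stand-in for the filter state (two input and two output taps).
#[derive(Default, Clone, Copy)]
struct IirState {
    x1: f64,
    x2: f64,
    y1: f64,
    y2: f64,
}

/// One step of the second-order resonator, with the non-finite guard.
fn resonator_step(state: &mut IirState, input: f64, r: f64, cos_w0: f64) -> f64 {
    let output =
        (1.0 - r) * (input - state.x2) + 2.0 * r * cos_w0 * state.y1 - r * r * state.y2;

    // A NaN/inf output must not be written into the taps: reset and bail out.
    if !output.is_finite() {
        *state = IirState::default();
        return 0.0;
    }

    state.x2 = state.x1;
    state.x1 = input;
    state.y2 = state.y1;
    state.y1 = output; // assumed: this line is outside the visible hunk
    output
}
```

After a poisoned sample the taps are all zero again, so the next clean sample is filtered as if the stream had just started.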
@@ -396,6 +410,75 @@ mod tests {
        assert!((0.0..=2.0).contains(&fused), "weighted average must be in-range: {fused}");
    }

+    /// ADR-158 §A1 bug-catching test: a single non-finite residual must NOT
+    /// permanently poison the IIR filter state.
+    ///
+    /// The resonator recurrence stores `y[n]` into the filter state. Before the
+    /// fix, one NaN/inf residual produced a NaN `output`, the `extract()`
+    /// finite-guard dropped that frame from history — but the NaN was already
+    /// latched into `state.y1`/`y2`, so every subsequent output stayed NaN, the
+    /// finite-guard rejected it too, and the history buffer never refilled.
+    /// Breathing extraction was then dead until `reset()`. A control run on the
+    /// same clean signal yields 15 BPM (0.25 Hz); after a leading NaN frame the
+    /// OLD code returned `None` with `history_len() == 0` forever. This test
+    /// asserts recovery (FAILS on the old code, verified by reverting the
+    /// `bandpass_filter` self-heal).
+    #[test]
+    fn nan_frame_does_not_permanently_poison_filter() {
+        let sr = 10.0;
+        let feed_clean = |ext: &mut BreathingExtractor| {
+            let mut last = None;
+            for i in 0..600 {
+                let t = i as f64 / sr;
+                let s = (2.0 * std::f64::consts::PI * 0.25 * t).sin();
+                last = ext.extract(&[s], &[1.0]);
+            }
+            last
+        };
+
+        // Control: clean signal accumulates history and detects ~15 BPM.
+        let mut control = BreathingExtractor::new(1, sr, 60.0);
+        let control_res = feed_clean(&mut control);
+        assert!(control.history_len() > 0);
+        assert!(control_res.is_some(), "control clean run must produce an estimate");
+
+        // A leading NaN frame must not kill the extractor.
+        let mut ext = BreathingExtractor::new(1, sr, 60.0);
+        ext.extract(&[f64::NAN], &[1.0]);
+        let res = feed_clean(&mut ext);
+        assert!(
+            ext.history_len() > 0,
+            "extractor must recover and refill history after a NaN frame (got {})",
+            ext.history_len()
+        );
+        assert!(res.is_some(), "extractor must recover an estimate after a NaN frame");
+    }
+
+    /// ADR-158 §A1: a mid-stream `inf` must not freeze the history buffer.
+    #[test]
+    fn inf_mid_stream_does_not_freeze_history() {
+        let sr = 10.0;
+        let mut ext = BreathingExtractor::new(1, sr, 60.0);
+        let clean = |ext: &mut BreathingExtractor, count: usize| {
+            for i in 0..count {
+                let t = i as f64 / sr;
+                let s = (2.0 * std::f64::consts::PI * 0.25 * t).sin();
+                ext.extract(&[s], &[1.0]);
+            }
+        };
+        clean(&mut ext, 300);
+        let before = ext.history_len();
+        assert!(before > 0);
+        ext.extract(&[f64::INFINITY], &[1.0]); // poison mid-stream
+        clean(&mut ext, 600);
+        assert!(
+            ext.history_len() > before,
+            "history must keep growing after an inf frame (before={}, after={})",
+            before,
+            ext.history_len()
+        );
+    }
+
    /// ADR-157 §A3 bug-catching test. Divergence needs the pole magnitude
    /// `|r| >= 1`, i.e. `bw >= 4`. At `fs = 0.5` Hz with the band widened to
    /// 0.1-0.9 Hz, `bw = 2*pi*(0.9-0.1)/0.5 = 10.05`, so the OLD pole radius
@@ -32,6 +32,15 @@ impl Default for IirState {
    }
 }

+/// Lowest physiologically plausible heart rate, in BPM. Estimates below this
+/// (e.g. a lock onto a breathing harmonic, which the firmware #987 fix also
+/// guards against) are rejected rather than emitted as a confident vital — a
+/// false low HR is a safety problem. Value-identical to the prior literal.
+const HR_PLAUSIBLE_MIN_BPM: f64 = 40.0;
+/// Highest physiologically plausible heart rate, in BPM. Estimates above this
+/// are rejected. Value-identical to the prior literal.
+const HR_PLAUSIBLE_MAX_BPM: f64 = 180.0;
+
 /// Heart rate extractor using bandpass filtering and autocorrelation
 /// peak detection.
 pub struct HeartRateExtractor {
@@ -140,8 +149,11 @@ impl HeartRateExtractor {
        let frequency_hz = self.sample_rate / period_samples as f64;
        let bpm = frequency_hz * 60.0;

-        // Validate BPM is in physiological range (40-180 BPM)
-        if !(40.0..=180.0).contains(&bpm) {
+        // Validate BPM is in the physiological plausibility band. An estimate
+        // outside [HR_PLAUSIBLE_MIN_BPM, HR_PLAUSIBLE_MAX_BPM] is rejected
+        // rather than emitted, so an out-of-band autocorrelation lock can never
+        // surface as a confident heart rate.
+        if !(HR_PLAUSIBLE_MIN_BPM..=HR_PLAUSIBLE_MAX_BPM).contains(&bpm) {
            return None;
        }
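Review note on the plausibility band: a small sketch of the lag-to-BPM conversion and the band check, pulled out as a hypothetical helper `bpm_from_period`. The constants match the values in the diff; the function itself and its signature are illustrative only.

```rust
const HR_PLAUSIBLE_MIN_BPM: f64 = 40.0;
const HR_PLAUSIBLE_MAX_BPM: f64 = 180.0;

/// Hypothetical helper: convert an autocorrelation peak lag (in samples) to
/// BPM and reject anything outside the plausibility band.
fn bpm_from_period(sample_rate: f64, period_samples: usize) -> Option<f64> {
    let frequency_hz = sample_rate / period_samples as f64;
    let bpm = frequency_hz * 60.0;
    // `contains` is false for NaN and inf, so those are rejected here too.
    if !(HR_PLAUSIBLE_MIN_BPM..=HR_PLAUSIBLE_MAX_BPM).contains(&bpm) {
        return None;
    }
    Some(bpm)
}
```

At 50 Hz, a lag of 42 samples is about 71.4 BPM and passes, while a lag of 150 samples (20 BPM) or 10 samples (300 BPM) is rejected.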

@@ -191,6 +203,20 @@ impl HeartRateExtractor {
        let output =
            (1.0 - r) * (input - state.x2) + 2.0 * r * cos_w0 * state.y1 - r * r * state.y2;

+        // Self-healing non-finite guard (ADR-158 §A1). A single non-finite
+        // sample — a NaN/inf residual from a corrupt CSI frame, or a transient
+        // overflow — would otherwise be written into `y1`/`y2` and poison the
+        // resonator recurrence *permanently*: every later output stays NaN, the
+        // `extract()` finite-check drops it, `acf0` never recomputes on fresh
+        // data, and heart-rate extraction is dead until `reset()`. Resetting the
+        // filter state here lets the resonator recover on the next clean frame;
+        // the 0.0 returned for this frame is still dropped by the caller's
+        // `is_finite()` check, so no spurious sample enters history.
+        if !output.is_finite() {
+            *state = IirState::default();
+            return 0.0;
+        }
+
        state.x2 = state.x1;
        state.x1 = input;
        state.y2 = state.y1;
@@ -420,6 +446,92 @@ mod tests {
        assert_eq!(ext.n_subcarriers, 56);
    }

+    /// Pin the physiological plausibility band to its documented values. If a
+    /// future edit widens these, an implausible HR could be emitted as a
+    /// confident vital — this characterization test forces that to be a
+    /// deliberate, reviewed change.
+    #[test]
+    fn plausibility_band_constants_pinned() {
+        assert!((HR_PLAUSIBLE_MIN_BPM - 40.0).abs() < f64::EPSILON);
+        assert!((HR_PLAUSIBLE_MAX_BPM - 180.0).abs() < f64::EPSILON);
+    }
+
+    /// ADR-158 §A1 bug-catching test: a single non-finite residual must NOT
+    /// permanently poison the IIR filter state.
+    ///
+    /// The cardiac resonator latches `y[n]` into `state.y1`/`y2`. Before the
+    /// fix, one NaN/inf residual produced a NaN `output` that was stored into
+    /// the state; the `extract()` finite-guard dropped that frame from history,
+    /// but every subsequent output stayed NaN, so the history buffer never
+    /// refilled and HR extraction was dead until `reset()`. After a leading NaN
+    /// frame, the OLD code returned `None` with `history_len() == 0` forever.
+    /// This asserts recovery (FAILS on the old code).
+    #[test]
+    fn nan_frame_does_not_permanently_poison_filter() {
+        let sr = 50.0;
+        let feed_clean = |ext: &mut HeartRateExtractor| {
+            let mut last = None;
+            for i in 0..1200 {
+                let t = i as f64 / sr;
+                let base = (2.0 * std::f64::consts::PI * 1.2 * t).sin();
+                let r = vec![base * 0.1, base * 0.08, base * 0.12, base * 0.09];
+                last = ext.extract(&r, &[0.0, 0.01, 0.02, 0.03]);
+            }
+            last
+        };
+
+        let mut control = HeartRateExtractor::new(4, sr, 20.0);
+        feed_clean(&mut control);
+        assert!(control.history_len() > 0, "control clean run must accumulate history");
+
+        let mut ext = HeartRateExtractor::new(4, sr, 20.0);
+        ext.extract(&[f64::NAN, 0.1, 0.1, 0.1], &[0.0, 0.01, 0.02, 0.03]);
+        feed_clean(&mut ext);
+        assert!(
+            ext.history_len() > 0,
+            "HR extractor must recover and refill history after a NaN frame (got {})",
+            ext.history_len()
+        );
+    }
+
+    /// Safety negative: pure broadband noise (no cardiac component) must NOT be
+    /// reported as a clinically `Valid` heart rate. A false "HR = 72 bpm" on
+    /// noise is a safety problem (false reassurance / false alert). The
+    /// extractor may still emit a low-confidence guess, but its status must be
+    /// `Degraded`/`Unreliable`, never `Valid`. Mirrors the honest-negative
+    /// requirement in the review brief.
+    #[test]
+    fn pure_noise_is_never_reported_valid() {
+        let mut seed: u64 = 0x1234_5678;
+        let mut rng = || { // Knuth MMIX LCG; output in [-1.0, 0.0)
+            seed = seed
+                .wrapping_mul(6_364_136_223_846_793_005)
+                .wrapping_add(1_442_695_040_888_963_407);
+            ((seed >> 33) as f64 / (1u64 << 31) as f64) - 1.0
+        };
+        let mut ext = HeartRateExtractor::new(8, 50.0, 20.0);
+        let mut last = None;
+        for _ in 0..1500 {
+            let r: Vec<f64> = (0..8).map(|_| rng()).collect();
+            let p: Vec<f64> = (0..8).map(|_| rng()).collect();
+            last = ext.extract(&r, &p);
+        }
+        if let Some(est) = last {
+            assert_ne!(
+                est.status,
+                VitalStatus::Valid,
+                "pure noise must not yield a clinically Valid HR (bpm={}, conf={})",
+                est.value_bpm,
+                est.confidence
+            );
+            assert!(
+                est.confidence < 0.6,
+                "noise HR confidence must stay below the Valid cutoff: {}",
+                est.confidence
+            );
+        }
+    }
+
    /// ADR-157 §A3 bug-catching test.
    ///
    /// Divergence needs the pole *magnitude* `|r| >= 1`, i.e. `bw >= 4`. With