Compare commits

..

2 Commits

Author SHA1 Message Date
rUv 865f9dee77 perf(beyond-sota): ADR-154 M2 — FFT planner hoist (1.84x, bit-identical) + 3 honest perf nulls + boundary tests (#1055)
* perf(signal): hoist FFT planner across subcarriers (ADR-154 §7.4 #20)

compute_multi_subcarrier_spectrogram called compute_spectrogram once per
subcarrier, and each call built a fresh FftPlanner + re-planned the same
length-window_size FFT. Hoist the plan + window out of the per-subcarrier
loop via a new compute_spectrogram_with_plan core that takes a pre-planned
Arc<dyn Fft> and pre-built window. compute_spectrogram delegates to it
(unchanged behaviour); the multi-subcarrier path plans once and reuses.

MEASURED-HOT (dsp_perf_bench, this box): at 56 subcarriers, window 128,
fresh-planner-per-subcarrier 467.88 µs -> hoisted-plan 254.75 µs = 1.84x;
window 256: 627.27 µs -> 448.39 µs = 1.40x. Plan-forward cost alone is
~1.86 µs (w128), x56 subcarriers ~= the removed delta.

Output is bit-identical: multi_subcarrier_hoisted_plan_bit_identical
compares f64::to_bits of every spectrogram value + freq/time resolution
against the per-call fresh-planner path across all 4 window functions x
{power,magnitude} on a 56-subcarrier matrix. The numeric STFT body is the
old loop verbatim; only plan/window construction is lifted.

Co-Authored-By: claude-flow <ruv@ruv.net>

* test(signal): boundary/tolerance tests for ADR-154 §7.4 #14 #16 #19

Three "+ test" backlog gaps closed — pure additions, no behaviour change
(phase_align refactor is internal: estimate_phase_offsets still returns the
identical offset vector; a counted core is split out only to observe the
iteration count).

#14 cir.rs fft_operator — fft_operator_within_tolerance_of_dense_canonical56:
  the opt-in FFT Φ/Φᴴ path changes the witness hash, so pin it numerically
  CLOSE to the dense path (not silently divergent). Asserts the full Cir
  output (every tap within 1e-2·dominant, dominant idx/ratio, active_tap_count,
  ranging_valid, rms_delay_spread) on the production canonical-56 config
  across τ ∈ {20,50,90} ns. Extends the existing HT20/single-τ test.

#16 phase_align.rs — refinement_terminates_at_iteration_cap_when_not_converging:
  forces non-convergence (tolerance=0.0, unreachable) and asserts the loop
  runs exactly max_iterations then returns — proving the cap, not convergence,
  bounds the loop (no infinite spin). Companion
  refinement_converges_before_cap_on_easy_input proves the cap is an upper
  bound, not the only exit.

#19 csi_ratio.rs — ratio_finite_at_and_below_1e_12_epsilon: the module
  implements the CSI ratio as the conjugate product H_i·conj(H_j) (no
  division), so it is finite even at/below the 1e-12 magnitude boundary a
  naive H_i/H_j division would need an epsilon to guard. Pins finiteness +
  bit-exact conjugate product at the boundary (zero target → zero, never
  inf/NaN), through the amplitude/phase extraction.

cargo test -p wifi-densepose-signal --no-default-features --lib: 447 passed,
0 failed; --features cir --lib: 447 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-154): record Milestone-2 P2-perf verdicts + boundary tests (§7.4)

§7.4: #20 MEASURED-HOT (1.40–1.84× spectrogram FFT-plan hoist, bit-identical);
#5/#6/#7 MEASURED-NULL (benched, not hot, left as-is — sub-µs / stack-only /
alloc-once); #8 MEASUREMENT-ONLY (per-call 56×56 eigh cost; eigenvalue/BLAS
backend un-buildable on this Windows host, number deferred to a BLAS box, NOT
fabricated; also corrects the finding — extract_perturbation reuses cached
modes, the recompute is in estimate_occupancy). #14/#16/#19 RESOLVED (tolerance
/ convergence-cap / epsilon-boundary tests). Updated §7.4 intro + Horizon-ledger
(deferred count 41→36). CHANGELOG [Unreleased] entry added.

Co-Authored-By: claude-flow <ruv@ruv.net>

* bench(signal): committed P2 bench-first benches (ADR-154 §7.4 #5/#6/#7/#8/#20)

New dsp_perf_bench.rs backs every Milestone-2 perf verdict with a committed
criterion bench — no speedup claimed without a before/after number here, and
a benched NULL is the proof a micro-opt was unnecessary (the §5.x "already
amortized" pattern). Registered in Cargo.toml [[bench]].

MEASURED (this box, criterion medians):
  #20 spectrogram_multi_subcarrier (fresh vs hoisted plan):
      MEASURED-HOT — 467.88→254.75 µs (1.84x) @ sc56/w128; 627.27→448.39 µs
      (1.40x) @ sc56/w256. Optimized in the prior commit.
  #5 multistatic_attention/weights: MEASURED-NULL — 181 ns (2 nodes) ..
      848 ns (8 nodes); sub-µs, no hot-path alloc — left as-is.
  #6 tomography_reconstruct/solve: MEASURED-NULL — 47.5 µs (16 links) /
      60.4 µs (32 links) for a full 50-iter ISTA solve; the 2 per-solve voxel
      buffers (~4 KB) are negligible vs O(iters·links·voxels) compute, and
      reconstruct(&self) reuses them across iterations already — left as-is.
  #7 pose_kalman_update/cycles: MEASURED-NULL — 150 ns (17 kpts) / 2.82 µs
      (170); the Kalman "gain matrices" are fixed-size STACK arrays
      ([[f32;3];6]), zero heap — nothing to reuse — left as-is.
  #8 field_model_occupancy (eigenvalue feature): MEASUREMENT-ONLY — quantifies
      the per-call n×n eigendecomposition cost; incremental SVD is a sized
      future project, not attempted (number recorded in ADR-154 §7.4).

Reproduce:
  cargo bench -p wifi-densepose-signal --no-default-features --bench dsp_perf_bench
  cargo bench -p wifi-densepose-signal --bench dsp_perf_bench  # adds #8

Cargo.lock: dev-dep (criterion/clap) graph + crate version bumps from the
build; no runtime-dependency change.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-13 17:34:37 -04:00
rUv cf2a85db66 feat(beyond-sota): ADR-157 M1 — constant-time HMAC compare + MEASURED 5.57x native wlanapi scan (#1054)
* fix(hardware): constant-time HMAC sync-beacon tag compare (ADR-157 §B4)

AuthenticatedBeacon::verify compared the 8-byte HMAC-SHA256 tag with
`self.hmac_tag == expected`, which short-circuits on the first differing
byte and leaks, via verification latency, how many leading bytes a forged
tag matched — a byte-by-byte tag-recovery oracle (~256·N trials vs 256^N).

Replace with a hand-rolled branch-free `constant_time_tag_eq`: XOR-accumulate
every byte difference into a single u8 with no early exit, compare to zero
once. `#[inline(never)]` + `core::hint::black_box(diff)` resist the optimizer
reintroducing a short-circuit or a non-constant-time memcmp; length mismatch
returns false without inspecting contents. No new dependency — ADR-157 had
deferred this only to avoid the `subtle` crate; a fixed 8-byte compare needs
none.

Test (hard gate): tag_compare_is_constant_time_shape — equal / first-differ /
last-differ / all-differ / length-mismatch + end-to-end verify() last-byte
tamper. Proven to fail on a last-byte-skipping constant-time bug. A coarse
timing smoke check (tag_compare_timing_invariance_smoke) is #[ignore]d to
avoid CI flakiness. Grade MEASURED (constant-time construction).

ADR-157 §8 §B4 → RESOLVED. wifi-densepose-hardware: 164 passed / 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(wifiscan): MEASURE native wlanapi.dll vs netsh throughput (ADR-157 §5 #4)

ADR-157 §5 #4 recorded the native wlanapi.dll multi-BSSID fast path as
"asserted but NOT implemented; live scanner is the ~2 Hz netsh shim". Audit
finding: that status is stale — wlanapi_native::scan_native already implements
the real WlanOpenHandle → WlanEnumInterfaces → WlanGetNetworkBssList →
WlanFreeMemory/WlanCloseHandle FFI (handle cleanup on all exits, length-bounded
buffer walks, #[cfg(windows)] with typed Unsupported off-Windows), and
WlanApiScanner::scan_instrumented already wires it native-first with a netsh
fallback. The missing piece was an honest MEASUREMENT.

Add benchmark_backend(backend, window): drives one specific backend over a
fixed wall-clock window so netsh is timed independently (the existing
benchmark() picks native-first and so never measures netsh on a box where
native works). Returns None for an unavailable native path (honest negative,
not a fabricated number).

MEASURED on this box (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13), 10 s window:
  native 21.42 Hz vs netsh 3.84 Hz = 5.57× (mean 5.0 BSSIDs/scan each).
  native-only run: 18.0 Hz. 50/50 back-to-back native scans, no handle leak.
A real positive result — NOT a fabricated 10×. Achieved 21.4 Hz is in the
asserted >2 Hz regime, below the asserted 10–20 Hz upper bound.

Tests (live-WLAN, #[ignore] for CI, RUN here):
  measure_native_vs_netsh_throughput, native_scans_dont_leak_handles,
  measure_native_scan_rate. Non-ignored pin native_scan_runs_real_ffi_on_windows
  (pre-existing) stays green. wifi-densepose-wifiscan: 94 passed / 0 failed.

ADR-157 §5 #4 + §8 → MEASURED (was ACCEPTED-FUTURE / CLAIMED-unmeasured).

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-13 16:32:34 -04:00
12 changed files with 1021 additions and 34 deletions
+7
View File
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Security
- **ADR-157 Milestone-1 B4 - constant-time HMAC sync-beacon tag compare (`wifi-densepose-hardware`).** `AuthenticatedBeacon::verify` compared the 8-byte HMAC-SHA256 tag with `self.hmac_tag == expected`, which short-circuits on the first differing byte and leaks, through verification latency, how many leading bytes an attacker's forged tag matched - a byte-by-byte tag-recovery oracle (~256*N trials instead of 256^N). Replaced with a hand-rolled branch-free `constant_time_tag_eq` (XOR-accumulate every byte difference into a single `u8`, no early exit, `#[inline(never)]` + `core::hint::black_box` to stop the optimizer reintroducing a short-circuit or a non-constant-time `memcmp`). **No new dependency** - ADR-157 had deferred this only to avoid adding the `subtle` crate; a fixed 8-byte compare needs none. Grade MEASURED (constant-time *construction*; micro-timing on a noisy host is a smoke check only, gated `#[ignore]`). Pinned by `tag_compare_is_constant_time_shape` (equal/first-differ/last-differ/all-differ/length-mismatch + an end-to-end `verify()` last-byte tamper), proven to fail on a last-byte-skipping constant-time bug. ADR-157 §8 B4 -> RESOLVED.
- **ADR-080 open HIGH findings closed on the Rust `wifi-densepose-sensing-server` boundary (ADR-164 G11).** The QE sweep's three HIGH findings — XFF-spoofing bypass, leaked stack traces, JWT-in-URL (CWE-598) — were logged against the Python v1 API and never re-verified against the shipped Rust sensing-server; the HOMECORE/M7 sweep (ADR-161) covered `homecore-server`, not this crate.
- **#2 leaked internal errors (the one live exposure) — FIXED.** Six handlers in `main.rs` serialized the internal error `Display` straight into the JSON response body: `edge_registry_endpoint` returned a panicked `spawn_blocking` `JoinError` (`"task … panicked"`) in a `500`, plus the raw upstream error in a `503`; `delete_model`/`delete_recording`/`start_recording` returned `std::io::Error` strings (OS detail / path); `calibration_start`/`calibration_stop` returned the `FieldModel` error chain. New `error_response` module logs the full detail **server-side only** (with a correlation id) and returns a generic body (`{"error":"internal_error","correlation_id":…}`) — no `panicked`, no file paths, no Debug chain. 5 module tests (a leak-substring guard proven to fail on the reverted old body) + the existing handler suite.
- **#1 XFF-spoofing bypass — VERIFIED ABSENT, regression-pinned.** The sensing-server has no XFF-trusting control to bypass: there is no IP-based rate-limiter or IP-allowlist, and neither `bearer_auth` (token-only) nor `host_validation` (Host-header only) reads `X-Forwarded-For`/`X-Forwarded-Host` (no `forwarded`/`peer_addr`/`client_ip` anywhere in the crate). Added regression tests proving a spoofed `X-Forwarded-For` never flips an auth decision and a spoofed `X-Forwarded-Host` never bypasses the Host allowlist.
@@ -26,9 +27,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Published HuggingFace model was unloadable — RVF format mismatch (#894).** The `ProgressiveLoader` rejected the published `ruvnet/wifi-densepose-pretrained` model with the opaque `invalid magic at offset 0: expected 0x52564653 (RVFS), got 0x77455735`, then silently fell back to signal heuristics (the "10 persons for 1" garbage reporters saw). The HF repo ships `model.safetensors`, `model-q{2,4,8}.bin` (magic `0x77455735` = "5WEw"), and `model.rvf.jsonl` — none carry the binary-RVF magic. New `model_format` module **auto-detects** RVFS / safetensors / HF-quant-bin / JSONL by magic+name, returns a **typed actionable** `ModelLoadError` (lists accepted formats + the one-command convert path — never the opaque magic), and **converts** `model.safetensors` / `model.rvf.jsonl` → RVF in-memory so the published full-precision model now loads via `--model`. A `--convert-model <in> --convert-out <out>` CLI subcommand gives a one-command offline path; the silent heuristics fallback is now a loud, actionable error. **Honest scope:** the converter wires the format/load path (safetensors F32 tensors → RVF weight segment, manifest written, Layer A/B/C all succeed, weights round-trip) — it does **not** claim end-to-end pose accuracy, since the HF pose-decoder architecture differs from this crate's inference head (still data-gated in #894). Quantized `.bin` blobs are rejected with a typed error pointing at the safetensors path. Pinned by `safetensors_converts_and_loads` + `hf_quant_classifies_to_actionable_error` (both fail on the old opaque-magic path).
### Changed
- **ADR-157 Milestone-1 §5 #4 - native `wlanapi.dll` multi-BSSID throughput MEASURED on real hardware (`wifi-densepose-wifiscan`).** The ADR's prior status ("asserted but NOT implemented; live scanner is the ~2 Hz netsh shim") is now stale: `wlanapi_native.rs` already implements the real `WlanOpenHandle` -> `WlanEnumInterfaces` -> `WlanGetNetworkBssList` -> `WlanFreeMemory`/`WlanCloseHandle` FFI and `WlanApiScanner` already wires it native-first with a netsh fallback. This milestone **measured it on this box** (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13): a new `benchmark_backend(backend, window)` drives each backend over the same fixed 10 s wall-clock window so netsh is timed independently (the prior `benchmark()` picked native-first and never measured netsh on a Windows box where native works). **MEASURED: native 21.42 Hz vs netsh 3.84 Hz = 5.57x** (mean 5.0 BSSIDs/scan, both paths); a separate native-only run measured 18.0 Hz. Native genuinely beats netsh - this is a real positive result, not a fabricated "10x". 50 back-to-back native scans completed 50/50 with no handle leak/degradation. Live-WLAN tests (`measure_native_vs_netsh_throughput`, `native_scans_dont_leak_handles`, `measure_native_scan_rate`) are `#[ignore]` for CI but were RUN here; `native_scan_runs_real_ffi_on_windows` is a non-ignored schema-valid pin. ADR-157 §5 #4 + §8 -> MEASURED (was ACCEPTED-FUTURE / CLAIMED-unmeasured).
- **Mesh partition risk now demotes the privacy class and is witnessed (ADR-032).** The dynamic min-cut guard's `at_risk` signal was advisory-only (it fed the recalibration advisor). It now also contributes to the ADR-141 privacy demotion alongside fusion- and array-level contradictions: a mesh close to partitioning makes the fused belief less trustworthy, so the cycle emits at a more restricted class (monotonic — information only removed). Because `effective_class` feeds the BLAKE3 witness, a fragmenting array now shifts the witness — partition risk is auditable, not just logged. The mesh computation moved ahead of the demotion step in `process_cycle`; new `mesh_guard_mut()` exposes risk-threshold tuning. Test proves a forced-risk 3-node cycle demotes PrivateHome Anonymous→Restricted and shifts the witness vs a clean *same-topology* baseline (the only delta between the two cycles is the forced risk).
### Added
- **ADR-154 Milestone-2 — bench-first P2 perf subset + missing boundary tests (`wifi-densepose-signal`, §7.4 #5/#6/#7/#8/#14/#16/#19/#20).** PROOF discipline (ADR-154 §0): every perf item was **benched before being touched** (new committed `benches/dsp_perf_bench.rs`, criterion, this Windows box); only the one item the bench proved hot was optimized, the rest are committed MEASURED-NULLs — a benched null is the proof the micro-opt was unnecessary, the §5.1 "already amortized" pattern. Every behaviour-changing edit is pinned bit-identical (or documented-tolerance). Signal crate lib `--no-default-features`: **447 passed, 0 failed, 1 ignored**; `--features cir`: **447 passed, 0 failed**.
- **#20 MEASURED-HOT, optimized (bit-identical).** `compute_multi_subcarrier_spectrogram` re-planned a fresh `FftPlanner` for *every* subcarrier (via `compute_spectrogram`). Hoisted the plan + window out of the per-subcarrier loop (new `compute_spectrogram_with_plan` core; `compute_spectrogram` delegates, unchanged). **56-subcarrier: 467.88 µs → 254.75 µs = 1.84×** (window 128); **627.27 µs → 448.39 µs = 1.40×** (window 256). Bit-identical via `multi_subcarrier_hoisted_plan_bit_identical` (`f64::to_bits` of every value across all 4 window functions × {power,magnitude}). The §7.4 intro's predicted "most likely real win" — confirmed.
- **#5 / #6 / #7 MEASURED-NULL, left as-is.** `node_attention_weights` 181 ns (2 nodes)…848 ns (8) — sub-µs, no hot-path alloc. `tomography reconstruct` (full 50-iter ISTA, 256 voxels) 47.5 µs (16 links) / 60.4 µs (32) — the 2 voxel buffers are already alloc-once + `.fill`-reused, negligible vs O(iters·links·voxels). `pose_tracker` Kalman cycle 150 ns (17 keypoints) / 2.82 µs (170) — the "gain matrices" are fixed-size **stack** arrays, zero heap to reuse. No rewrite shipped; the committed benches prove each is not hot.
- **#8 MEASUREMENT-ONLY, BLAS-gated (number deferred, not fabricated).** Correction to the finding: `extract_perturbation` does **not** recompute the SVD (it projects against cached `finalize_calibration` modes); the real per-call eigendecomposition is the `eigenvalue`-feature `estimate_occupancy` (`cov.eigh()` on a 56×56 covariance). The `eig` bench is committed but `openblas-src` won't build on this Windows host ("Non-vcpkg builds are not supported on Windows" — the exact reason the project gate runs `--no-default-features`), so its µs cost must come from a Linux/BLAS box. Recorded, not estimated. Incremental SVD stays a sized future item.
- **#14 / #16 / #19 RESOLVED — tests added (no behaviour change).** `fft_operator_within_tolerance_of_dense_canonical56` pins the full `Cir` output of the opt-in FFT path within a documented relative tolerance of the dense path on the production canonical-56 config (τ ∈ {20,50,90} ns) — it changes the witness hash, so it must be provably *close*, not silently divergent. `refinement_terminates_at_iteration_cap_when_not_converging` (+ convergent companion) proves the LO-offset refinement terminates at exactly `max_iterations` on a non-converging input (cap, not convergence, bounds the loop; internal `…_counted` refactor returns the identical offsets). `ratio_finite_at_and_below_1e_12_epsilon` pins that the conjugate-product CSI-ratio (no division → no `1e-12` divide-guard needed) is finite + bit-exact at/below the epsilon boundary and at exact zero (where a naive `H_i/H_j` ratio is ±inf/NaN).
- **ADR-156 §8 Milestone-1: RaBitQ Pass-2 randomized rotation + multi-bit experiment — IMPLEMENTED & MEASURED (RESOLVED-PARTIAL).** Closes the §8 "Multi-bit / Extended RaBitQ" backlog item. New `wifi-densepose-ruvector/src/rotation.rs`: a deterministic randomized orthogonal rotation `R = H·D`**Fast Hadamard Transform** (`O(d log d)`, in-place, `1/√m`-normalized so norm-preserving) + seeded ±1 sign flips (SplitMix64 from a stored `u64` seed; identical at index + query time). Chosen over a dense `d×d` matrix (`O(d²)`, infeasible at the 65,535-d the wire format provisions for); pads to `next_pow2(d)`. Additive, backward-compatible API (`Sketch::from_embedding_rotated`, `SketchBank::with_rotation` + `insert_embedding`/`topk_embedding`/`novelty_embedding`); Pass-1 and the wire format are byte-for-byte unchanged. New `coverage.rs` single-source-of-truth top-K coverage harness (anisotropic planted-cluster fixture, cosine ground truth) backs both a `#[test]` report and the `sketch_bench` coverage table. **MEASURED (dim=128 N=2048 K=8, 64 clusters, noise=0.35, 128 queries, seeded):** at the strict `candidate_k=K` bar, rotation lifts coverage **36.13% → 46.39%**; Pass-2 reaches the **ADR-084 ≥90% bar at candidate_k=24 (~3× over-fetch)**; multi-bit Pass-3 reaches 54%/67%/74% at 2/3/4-bit (strict bar). **Honest verdict: neither rotation nor ≤4-bit multi-bit clears the strict-K 90% bar on this distribution — the bar is met only via the over-fetch "candidate set" pattern ADR-084 specifies.** No benchmark was tuned to manufacture a pass; the strict-bar gap is documented (ADR-156 §10, ADR-084 "Pass 2" section). +19 tests in the crate (100→119), workspace **3,225 / 0 failed**, Python proof VERDICT: PASS (`f8e76f21…`, unchanged — sketch is not on the proof's signal path).
- **Beyond-SOTA `v2/crates/` sweep (ADR-154158) + full stub-implementation push — every claim MEASURED or graded.** A 5-milestone review/optimize/secure/benchmark/validate sweep, then a verified-audit-driven push to replace every production stub with real, tested logic (no labels, no placeholders). Each fix is pinned by a test that fails on the old code; every number ships with a reproduce command. Workspace: **3,122 tests / 0 failed** (`cargo test --workspace --no-default-features`), Python proof **VERDICT: PASS** (bit-exact).
- **ADR-154 Signal/DSP** — revived a dead ADR-134 CIR coherence gate (canonical-56 vs ht20 mismatch meant it never ran in production: 8/8 Err → 8/8 Ok); NaN-bypass + window div0 guards; PSD FFT-planner cache (**2.03.1×**) + honored DTW band (**2.44.1×**).
+12 -10
View File
@@ -199,7 +199,9 @@ The §2–§5 fixes are **ACCEPTED and committed**: dead CIR gate fixed, NaN byp
Catalogued so nothing is silently dropped. Priority: **P1** correctness-adjacent, **P2** perf, **P3** clarity/style.
**Milestone-1 update (2026-06-13):** the **four P1 backlog items** (#1, #9, #10, #13) are now cleared — #1 and #10 **RESOLVED (MEASURED)**, #9 and #13 **RESOLVED-PARTIAL (DATA-GATED:** de-magicked + boundary-tested, operating values unchanged**)**. ~41 P2/P3 items remain deferred. Each fix is pinned by a regression test that fails on the old behaviour (commits `fd32f094a`, `4a9f2bcf4`, `d672fa602`, `5193f6369`); workspace `--no-default-features` green, Python proof unchanged (bit-exact).
**Milestone-1 update (2026-06-13):** the **four P1 backlog items** (#1, #9, #10, #13) are now cleared — #1 and #10 **RESOLVED (MEASURED)**, #9 and #13 **RESOLVED-PARTIAL (DATA-GATED:** de-magicked + boundary-tested, operating values unchanged**)**. Each fix is pinned by a regression test that fails on the old behaviour (commits `fd32f094a`, `4a9f2bcf4`, `d672fa602`, `5193f6369`); workspace `--no-default-features` green, Python proof unchanged (bit-exact).
**Milestone-2 update (2026-06-13):** the **bench-first P2 perf subset** (#5, #6, #7, #8, #20) and the **three missing boundary tests** (#14, #16, #19) are now cleared — ~36 P2/P3 items remain deferred. PROOF discipline (§0): every perf item was **benched before being touched** — committed in `benches/dsp_perf_bench.rs` (criterion, this Windows box). Only **#20** proved hot and was optimized; **#5/#6/#7** are committed **MEASURED-NULLs** (benched, not hot, left as-is for clarity — exactly the §5.1 "already amortized" pattern); **#8** is **MEASUREMENT-ONLY** but its `eigenvalue`/BLAS backend won't build on this Windows host, so its µs cost must come from a Linux/BLAS box (recorded, not fabricated). Commits `e839fa8f1` (#20 fix), `02e5dd13a` (#14/#16/#19 tests), `aad9464f0` (benches). Workspace `--no-default-features` green; Python proof unchanged (#20 is bit-identical, off the proof path).
| # | Module | Finding | Pri | Why deferred |
|---|--------|---------|-----|--------------|
@@ -207,25 +209,25 @@ Catalogued so nothing is silently dropped. Priority: **P1** correctness-adjacent
| 2 | calibration.rs ~311 | `subtract_in_place` had a vacuous `if active_input {ki} else {ki}` branch implying a full-FFT→bin remap that didn't exist | P3 | **Resolved here** (branch removed, sequential-convention documented to match the sibling `extract_first_stream`). Listed for visibility — behavior unchanged. |
| 3 | spectrogram.rs / bvp.rs | FFT planner built once-per-call (already amortized across frames) | P2 | Marginal vs the per-frame PSD site; cache if these become hot. |
| 4 | features.rs ~347 | Doppler FFT planner planned once per call, reused across subcarriers | P2 | Already amortized within the call. |
| 5 | multistatic.rs | `node_attention_weights` recomputes consensus/softmax each call; no SIMD | P2 | Needs a bench before touching; not obviously hot. |
| 6 | tomography.rs | ISTA L1 solver re-allocates voxel buffers per solve | P2 | Bench first. |
| 7 | pose_tracker.rs | Kalman gain matrices reallocated per update | P2 | Bench first. |
| 8 | field_model.rs | SVD recomputed on every perturbation extract | P2 | Incremental SVD is a real project, not a micro-fix. |
| 5 | multistatic.rs | `node_attention_weights` recomputes consensus/softmax each call; no SIMD | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** `multistatic_attention/weights`: **181 ns** (2 nodes) … **848 ns** (8 nodes) @ 56 subcarriers — sub-µs, no hot-path allocation. A precompute/SIMD rewrite buys nothing measurable at the realistic 28 node fan-in; the cosine/softmax cost is dwarfed by the surrounding fusion + per-frame FFT. Bench `multistatic_attention` in `dsp_perf_bench.rs`. |
| 6 | tomography.rs | ISTA L1 solver re-allocates voxel buffers per solve | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** A full 50-iteration `reconstruct` (256 voxels): **47.5 µs** (16 links) / **60.4 µs** (32 links). The two voxel buffers (`x`, `gradient`; ~4 KB) are already allocated *once* per `reconstruct()` and `.fill`-reused across iterations — the per-solve alloc is a negligible fraction of the O(iters·links·voxels) inner product. Reusing scratch across *calls* would force `reconstruct(&self)``&mut self` (API break) for no measurable gain. Bench `tomography_reconstruct`. |
| 7 | pose_tracker.rs | Kalman gain matrices reallocated per update | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** A Kalman predict+update cycle: **150 ns** (17 keypoints) / **2.82 µs** (170). The "gain matrices" (`s:[f32;3]`, `k:[[f32;3];6]`) are fixed-size **stack** arrays, *not* heap — there is no per-update allocation to reuse; the compiler keeps them in registers/stack. Bench `pose_kalman_update`. |
| 8 | field_model.rs | SVD recomputed on every perturbation extract | P2 | **MEASUREMENT-ONLY (`aad9464f0`) — BLAS-gated, not measurable on this host.** Correction: `extract_perturbation` does **not** recompute the SVD — it projects against the cached `modes` from `finalize_calibration`. The real per-call eigendecomposition is in the `eigenvalue`-feature `estimate_occupancy` (`cov.eigh()` on a 56×56 covariance, an O(n³)≈175k-flop symmetric eigensolve + O(n²·frames) covariance build, run per call). The bench (`dsp_perf_bench`'s `eig` module) is committed, but `openblas-src` **fails to build on this Windows box** ("Non-vcpkg builds are not supported on Windows" — the very reason the project gate runs `--no-default-features`), so a measured µs number must come from a Linux/BLAS host; **not estimated/fabricated here.** Incremental SVD remains a sized future project, not a micro-fix. |
| 9 | coherence.rs / coherence_gate.rs | Z-score thresholds are magic constants, untested at boundaries | P1 | **RESOLVED-PARTIAL (`5193f6369`) — DATA-GATED.** De-magicked `classify_drift` (`DRIFT_STABLE_SCORE=0.85`, `DRIFT_STEP_CHANGE_MAX_STALE=10`) and the `coherence_gate.rs` defaults (`DEFAULT_ACCEPT_THRESHOLD`/`…REJECT…`/`…MAX_STALE_FRAMES`/`…PREDICT_ONLY_NOISE`) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (`classify_drift_*_boundary`) + `*_consts_unchanged_from_literals`. **Operating values explicitly NOT changed** — defensible values still need labelled stable/drifting traces. The gate already exposed these via `GatePolicyConfig` (config seam). |
| 10 | longitudinal.rs | Welford update not numerically guarded for n=0 | P1 | **RESOLVED (`4a9f2bcf4`) — MEASURED.** The shared `WelfordStats` (`field_model.rs`, consumed by longitudinal.rs) `count < 2` guards already prevent the n=0 NaN / n=1 div0 / `(count1)` underflow, but the boundary was untested. Added `welford_finite_at_n0_and_n1` (finite + documented 0.0 sentinel at n=0/n=1). Fails-on-old proof: removing the `sample_variance` guard makes the test panic with "attempt to subtract with overflow" at the `(count 1)` underflow. |
| 11 | cross_room.rs | Fingerprint hash collisions unhandled | P2 | Low collision prob; needs design. |
| 12 | gesture.rs | `euclidean_distance` no length-mismatch guard | P3 | Caller-enforced; add `debug_assert`. |
| 13 | adversarial.rs | Gini/consistency thresholds are magic constants | P1 | **RESOLVED-PARTIAL (`d672fa602`) — DATA-GATED.** Lifted the bare literals in `check`/`check_consistency` (`FIELD_MODEL_GINI_VIOLATION=0.8`, `ENERGY_RATIO_HIGH_VIOLATION=2.0`, `ENERGY_RATIO_LOW_VIOLATION=0.1`, `CONSISTENCY_ACTIVE_FRACTION_OF_MEAN=0.1`, `SCORE_W_*`) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (`energy_ratio_high_boundary`, `energy_ratio_low_boundary`, `field_model_gini_boundary`, `consistency_active_fraction_boundary`) + `tuning_consts_unchanged_from_literals`. **Operating values explicitly NOT changed** — defensible values still need labelled spoofed/clean CSI (Wi-Spoof, §6.2/§7.3). Bumping a const fails a boundary test (verified). |
| 14 | cir.rs | `fft_operator` path changes the witness hash (documented) — no test that it's *numerically close* to dense | P2 | Add a tolerance test. |
| 14 | cir.rs | `fft_operator` path changes the witness hash (documented) — no test that it's *numerically close* to dense | P2 | **RESOLVED (`02e5dd13a`) — tolerance test added.** `fft_operator_within_tolerance_of_dense_canonical56` pins the **full `Cir` output** of the FFT path within a *documented* relative tolerance of the dense path on the production **canonical-56** config across τ ∈ {20,50,90} ns: every tap within `1e-2·|dominant|`, identical `dominant_tap_idx`, `active_tap_count`, `ranging_valid`, `dominant_tap_ratio` within `1e-2`, `rms_delay_spread` within `1e-2` rel. A regression that lets the FFT path drift (scaling/Φ-column bug) now fails here instead of silently corrupting a downstream witness. Extends the existing HT20/single-τ `fft_estimate_matches_dense_dominant_tap`. |
| 15 | multistatic.rs | `cir_gate_coherence` only estimates the **first** node/channel; multi-node CIR consensus unused | P2 | Design item (which node's CIR is authoritative?). |
| 16 | phase_align.rs | Iterative LO offset estimation has no convergence cap test | P2 | Add iteration-cap test. |
| 16 | phase_align.rs | Iterative LO offset estimation has no convergence cap test | P2 | **RESOLVED (`02e5dd13a`) — cap test added.** `refinement_terminates_at_iteration_cap_when_not_converging` forces non-convergence (`tolerance = 0.0`, unreachable since `max_update ≥ 0`) and asserts the loop runs **exactly `max_iterations`** then returns — proving the cap (not convergence) bounds the loop, so a non-converging input can never spin forever. Companion `refinement_converges_before_cap_on_easy_input` proves the cap is an upper bound, not the only exit. Internal-only refactor: `estimate_phase_offsets` still returns the identical offset vector; a `…_counted` core surfaces the iteration count for the test. |
| 17 | hampel.rs | Window edge handling at series boundaries | P3 | Cosmetic. |
| 18 | motion.rs | Threshold constants undocumented | P3 | Doc-only. |
| 19 | csi_ratio.rs | Division guard relies on `1e-12` epsilon; no test | P2 | Add boundary test. |
| 20 | spectrogram.rs | `compute_multi_subcarrier_spectrogram` re-plans per subcarrier via `compute_spectrogram` | P2 | Hoist the planner (relates to #3). |
| 19 | csi_ratio.rs | Division guard relies on `1e-12` epsilon; no test | P2 | **RESOLVED (`02e5dd13a`) — boundary test added.** Finding clarification: `csi_ratio.rs` implements the CSI *ratio model* as the **conjugate product** `H_i·conj(H_j)` (SpotFi/IndoTrack) — there is **no division**, hence no literal `1e-12` epsilon; the classic `H_i/H_j` ratio (which a `1e-12` guard protects) is deliberately avoided. `ratio_finite_at_and_below_1e_12_epsilon` pins the property the finding cares about: at and below the `1e-12` target magnitude (and at exact zero — where a division ratio is ±inf/NaN) the conjugate-product output is **finite**, exactly the conjugate product (bit-exact), collapses toward zero (the physically correct "no path" answer), and stays finite through `ratio_to_amplitude_phase`. |
| 20 | spectrogram.rs | `compute_multi_subcarrier_spectrogram` re-plans per subcarrier via `compute_spectrogram` | P2 | **MEASURED-HOT (`e839fa8f1`) — optimized, bit-identical.** Hoisted the FFT plan + window out of the per-subcarrier loop (new `compute_spectrogram_with_plan` core). **56-subcarrier** multi-spectrogram: **467.88 µs → 254.75 µs = 1.84×** (window 128); **627.27 µs → 448.39 µs = 1.40×** (window 256). The removed cost is the per-subcarrier `FftPlanner` re-plan (~1.86 µs/plan @ w128 × 56). Bit-identical (`multi_subcarrier_hoisted_plan_bit_identical`, `f64::to_bits` across all 4 windows × {power,magnitude}). The most likely real win predicted by the §7.4 intro — confirmed. (Relates to #3, which stays deferred: `spectrogram.rs`/`bvp.rs` single-signal callers already plan once-per-call.) |
| 2145 | (assorted) | Remaining clarity/doc/magic-constant/missing-boundary-test findings across `ruvsense/*`, `features.rs`, `motion.rs` | P3 | Bulk-addressable in a dedicated "test-the-boundaries + de-magic-constant" follow-up; not high-leverage individually. |
> **Horizon-ledger one-liner.** Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). **Milestone-1 DONE (2026-06-13): all four P1 backlog items cleared — circular phase variance #1 (RESOLVED/MEASURED metric, DATA-GATED threshold), Welford n=0 guard #10 (RESOLVED/MEASURED), threshold magic-constants #9 & #13 (RESOLVED-PARTIAL/DATA-GATED — de-magicked + boundary-tested, values unchanged).** DEFERRED to follow-up: the ~41 remaining P2/P3 findings in §7.4 — none silently dropped.
> **Horizon-ledger one-liner.** Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). **Milestone-1 DONE (2026-06-13): all four P1 backlog items cleared — circular phase variance #1 (RESOLVED/MEASURED metric, DATA-GATED threshold), Welford n=0 guard #10 (RESOLVED/MEASURED), threshold magic-constants #9 & #13 (RESOLVED-PARTIAL/DATA-GATED — de-magicked + boundary-tested, values unchanged).** **Milestone-2 DONE (2026-06-13): bench-first P2 perf subset + missing boundary tests cleared — spectrogram per-subcarrier FFT re-plan #20 (MEASURED-HOT, 1.401.84×, bit-identical); attention/tomography/Kalman #5/#6/#7 (MEASURED-NULL — benched, not hot, left as-is); field_model eigendecompose #8 (MEASUREMENT-ONLY, BLAS un-buildable on this Windows host, number deferred to a BLAS box, NOT fabricated); fft_operator tolerance #14, phase-align convergence-cap #16, csi-ratio epsilon #19 (RESOLVED, tests added).** DEFERRED to follow-up: the ~36 remaining P2/P3 findings in §7.4 — none silently dropped.
---
@@ -85,9 +85,11 @@ A new criterion bench (`harness = false`, registered in `Cargo.toml`) drives eac
`OpportunisticCsiBridge::ingest` built `CsiReportPayload { n_subcarriers: self.amp_accum.len() as u16, … }`. The `as u16` would silently wrap a count above 65 535. **This is unreachable in practice**: `ingest` gates `frame.subcarrier_count() > MAX_REPORT_SUBCARRIERS` (484) at entry and returns `None`, and `report.validate()` independently rejects oversized counts downstream. We replaced the cast with `u16::try_from(self.amp_accum.len()).ok()?` (drop-instead-of-truncate) so the construction is **correct-by-construction** rather than relying on the upstream gate. We disclose this as **defense-in-depth on an unreachable path, not a live bug** — no behavior change, no new test (the gate already prevents the input that would exercise it).
### 2.6 §B4 — constant-time HMAC tag compare: **DEFERRED, not landed** (disclosed)
### 2.6 §B4 — constant-time HMAC tag compare: **RESOLVED — no-dependency hand-rolled constant-time compare (Milestone-1)**
`secure_tdm.rs:284` compares the 8-byte HMAC tag with `self.hmac_tag == expected` (data-dependent, non-constant-time). The research authorized adding `subtle::ConstantTimeEq` **only if `subtle` were already a direct dependency** — it is not (only transitive, via a crypto crate). Per that guidance, and because this is an **8-byte tag on a LAN multistatic sync beacon** (not a remote attacker-controlled timing-oracle surface), we **do not add a direct dependency** for it. Tracked in §8 as a deferred item, not silently dropped.
`secure_tdm.rs` compared the 8-byte HMAC tag with `self.hmac_tag == expected` (data-dependent, non-constant-time: short-circuits on the first differing byte, leaking through verification latency how many leading bytes a forged tag matched — a byte-by-byte tag-recovery oracle). Milestone-3 deferred this **only** to avoid adding the `subtle` crate as a direct dependency. Milestone-1 resolves it **without any dependency**: a hand-rolled `constant_time_tag_eq(a, b)` that XOR-accumulates every byte difference into a single `u8` with **no early exit**, then compares the accumulator to zero exactly once. `#[inline(never)]` + `core::hint::black_box(diff)` stop the optimizer from reintroducing a short-circuit or lowering the loop into a non-constant-time `memcmp`; a length mismatch returns `false` without inspecting contents. The former `==` verify site now calls this helper.
**Test (fails on old code, the hard gate):** `tag_compare_is_constant_time_shape` — asserts correct accept/reject for equal, first-byte-differ, last-byte-differ, all-byte-differ, and length-mismatch tags, plus an end-to-end `verify()` last-byte-only tamper. Verified to **bite**: introducing a classic constant-time bug (loop `take(LEN-1)`, skipping the last byte) makes it fail on `last-byte-differ must reject`. A coarse timing-invariance smoke check `tag_compare_timing_invariance_smoke` exists but is `#[ignore]`d (noisy host — not a CI gate). **Grade MEASURED** (constant-time *construction*; micro-timing on a noisy host is only a smoke check, disclosed honestly). Tracked RESOLVED in §8.
---
@@ -143,7 +145,7 @@ Grades: **MEASURED** (source measured it, ideally public method/code), **CLAIMED
| 1 | **CSI vital signs (HR/BR)** | Deep-CSI vital-sign models report **MAE ~23 BPM** vs our classical IIR-bandpass + autocorrelation/zero-crossing. | **DATA-GATED + CLAIMED** | **NO ACTION on method.** A deep model needs **paired PPG/ECG ground truth** we do not have, and no public ESP32 artifact reproduces the cited MAE on commodity CSI. Our classical method is the honest commodity baseline; the real wins this milestone are the A1/A3 robustness fixes, not a new model. |
| 2 | **802.11bf-2025 conformance** | Adopt a conformance test-vector suite for the `ieee80211bf/` forward-compat model. | **CLAIMED (not public)** | **NO ACTION.** No commodity silicon ships a conformant 802.11bf interface as of 2026, and the conformance suites are **WBA / Wi-Fi Alliance pre-certification** material, **not public**. Our model's "no OTA encoding until silicon exists" posture (ADR-153) is the correct one. Tracked in §8: *add SBP conformance vectors when the WFA publishes a test plan* — we will **not invent vectors**. |
| 3 | **Per-room calibration (ADR-151)** | Bank-of-specialists + drift-veto vs a 2026 calibration SOTA. | **CLAIMED on numbers, DATA-GATED on a head-to-head** | **NO ACTION on architecture.** The bank-of-specialists + drift-veto design is SOTA-shaped, but we have **no head-to-head PCK** against a published method (no paired multi-room data). The geometry-conditioned LoRA head is **built-but-unconsumed** and data-gated → **ACCEPTED-FUTURE** (§8), not built now. |
| 4 | **Multi-BSSID throughput (wifiscan)** | The module docs assert a native `wlanapi.dll` FFI 1020 Hz path; the current `WlanApiScanner` wraps `netsh` (~2 Hz). | **CLAIMED-unmeasured** | **NO ACTION + corrected expectation.** The native FFI fast path is **asserted but NOT implemented** — the live scanner is the ~2 Hz netsh shim. The "10×" is unmeasured. → **ACCEPTED-FUTURE** (§8). **We explicitly do NOT claim a speedup that does not exist.** |
| 4 | **Multi-BSSID throughput (wifiscan)** | The module docs assert a native `wlanapi.dll` FFI 1020 Hz path; the current `WlanApiScanner` wraps `netsh` (~2 Hz). | **MEASURED (Milestone-1)** | **IMPLEMENTED + MEASURED — real positive win.** Status corrected: the native FFI is **fully implemented and wired live** (`wlanapi_native::scan_native` calls `WlanOpenHandle`/`WlanEnumInterfaces`/`WlanGetNetworkBssList`/`WlanFreeMemory`/`WlanCloseHandle`; `WlanApiScanner::scan_instrumented` runs it native-first with a netsh fallback). Milestone-1 **measured both paths on this box** (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13) over an identical 10 s wall-clock window via a new `benchmark_backend`: **native 21.42 Hz vs netsh 3.84 Hz = 5.57× MEASURED** (mean 5.0 BSSIDs/scan each; native-only run 18.0 Hz). Native genuinely beats netsh — a real measured multiple, **not** a fabricated 10×; the achieved 21.4 Hz lands in the asserted >2 Hz regime though below the asserted 1020 Hz upper bound. 50 back-to-back native scans = 50/50 OK, no handle leak. → §8 MEASURED. |
---
@@ -176,10 +178,10 @@ Grades: **MEASURED** (source measured it, ideally public method/code), **CLAIMED
## 8. Deferred backlog (NOT silently dropped)
- **§B4 constant-time HMAC compare** — `secure_tdm.rs:284` uses `==` on the 8-byte tag. Add `subtle::ConstantTimeEq` **if** `subtle` becomes a direct dependency for another reason; not worth a new dependency for an 8-byte LAN sync-beacon tag (out of the current threat model). Deferred, not dropped.
- **§B4 constant-time HMAC compare** — **RESOLVED (Milestone-1).** Replaced the short-circuiting `==` on the 8-byte tag with a hand-rolled branch-free `constant_time_tag_eq` (XOR-accumulate, no early exit, `#[inline(never)]` + `black_box`). **No new dependency** — the `subtle` crate was the only reason this was deferred, and a fixed 8-byte compare needs none. Pinned by `tag_compare_is_constant_time_shape` (proven to fail on a last-byte-skipping bug). Grade MEASURED (constant-time construction). See §2.6.
- **802.11bf SBP conformance vectors** (§5 #2) — add real conformance test vectors to the `ieee80211bf/` model **when the Wi-Fi Alliance / WBA publishes a public test plan**. Do not invent vectors before then.
- **Geometry-conditioned LoRA calibration head** (§5 #3) — built-but-unconsumed and **data-gated** on paired multi-room PCK data (ADR-152 measurement (b): data, not architecture, is the bottleneck). ACCEPTED-FUTURE.
- **Native `wlanapi.dll` FFI multi-BSSID fast path** (§5 #4) — the asserted 1020 Hz path is **not implemented**; the live scanner is the ~2 Hz netsh shim. Implement and **measure** the real throughput before claiming any multiple. ACCEPTED-FUTURE, CLAIMED-unmeasured until then.
- **Native `wlanapi.dll` FFI multi-BSSID fast path** (§5 #4) — **RESOLVED + MEASURED (Milestone-1).** The native FFI is implemented and wired live (native-first, netsh fallback). Measured on this box (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13): **native 21.42 Hz vs netsh 3.84 Hz = 5.57×**, mean 5.0 BSSIDs/scan, 50/50 native scans with no handle leak. Real positive result — no fabricated 10×. See §5 #4. (Note: a prior sweep recorded 9.74 Hz on a different/older adapter; the per-adapter number varies, the ratio over netsh is the claim.)
- **Deep-CSI vital-sign model** (§5 #1) — DATA-GATED on paired PPG/ECG ground truth. No public ESP32 artifact reproduces the cited ~23 BPM MAE. Not on the near-term path.
---
Generated
+3 -3
View File
@@ -10835,7 +10835,7 @@ dependencies = [
[[package]]
name = "wifi-densepose-cli"
version = "0.3.0"
version = "0.3.1"
dependencies = [
"anyhow",
"assert_cmd",
@@ -11067,7 +11067,7 @@ dependencies = [
[[package]]
name = "wifi-densepose-sensing-server"
version = "0.3.2"
version = "0.3.3"
dependencies = [
"axum",
"chrono",
@@ -11101,7 +11101,7 @@ dependencies = [
[[package]]
name = "wifi-densepose-signal"
version = "0.3.3"
version = "0.3.4"
dependencies = [
"chrono",
"criterion",
@@ -47,6 +47,42 @@ type HmacSha256 = Hmac<Sha256>;
/// Size of the HMAC-SHA256 truncated tag (manual crypto mode).
const HMAC_TAG_SIZE: usize = 8;
/// Constant-time comparison of two fixed-size HMAC/auth tags.
///
/// ADR-157 §B4: the previous `self.hmac_tag == expected` short-circuits on the
/// first differing byte, leaking how many leading bytes matched through its
/// execution time. For an authentication tag that is a timing oracle: an
/// attacker who can submit forged beacons and measure verification latency can
/// recover the correct tag byte-by-byte (~256·N trials instead of 256^N).
///
/// This hand-rolled compare avoids adding the `subtle` crate (ADR-157 deferred
/// B4 only to dodge that dependency — a fixed 8-byte compare needs none). We
/// XOR-accumulate every byte difference into a single `u8` with **no early
/// exit**, so the work done is identical regardless of where (or whether) the
/// tags differ. The accumulator is non-zero iff any byte differed; we compare
/// it to zero exactly once at the end.
///
/// `#[inline(never)]` plus `black_box` on the accumulator stop the optimizer
/// from reintroducing a short-circuit or hoisting the loop into a `memcmp`
/// (which is itself non-constant-time). The two slices are required to be the
/// same length by construction (both `[u8; HMAC_TAG_SIZE]`); a length mismatch
/// returns `false` without inspecting contents.
#[inline(never)]
fn constant_time_tag_eq(a: &[u8], b: &[u8]) -> bool {
if a.len() != b.len() {
return false;
}
let mut diff: u8 = 0;
for (x, y) in a.iter().zip(b.iter()) {
// Branch-free: accumulate the bitwise difference of every byte.
diff |= x ^ y;
}
// black_box prevents the compiler from proving `diff == 0` early and
// short-circuiting the loop above. The single equality check is the only
// data-dependent branch, and it is on the fully-accumulated value.
core::hint::black_box(diff) == 0
}
/// Size of the nonce field (manual crypto mode).
const NONCE_SIZE: usize = 4;
@@ -281,7 +317,10 @@ impl AuthenticatedBeacon {
msg[..16].copy_from_slice(&self.beacon.to_bytes());
msg[16..20].copy_from_slice(&self.nonce.to_le_bytes());
let expected = Self::compute_tag(&msg, key);
if self.hmac_tag == expected {
// ADR-157 §B4: constant-time compare — `==` on the tag would leak,
// via short-circuit timing, how many leading bytes an attacker's
// forged tag matched, enabling byte-by-byte tag recovery.
if constant_time_tag_eq(&self.hmac_tag, &expected) {
Ok(())
} else {
Err(SecureTdmError::BeaconAuthFailed)
@@ -752,6 +791,124 @@ mod tests {
));
}
// ---- ADR-157 §B4: constant-time tag compare ----
/// Functional pin proving the new constant-time helper is wired and correct
/// for the four tag-shape cases. This is the *hard gate* for §B4 — it fails
/// on the old `==` path only if the helper is removed/unwired, and it
/// guarantees accept/reject semantics are byte-exact. Grade: MEASURED
/// (constant-time *construction*); micro-timing on a noisy host is only a
/// smoke check (see `tag_compare_timing_invariance_smoke`, #[ignore]).
#[test]
fn tag_compare_is_constant_time_shape() {
let base = [0xA5u8; HMAC_TAG_SIZE];
// Equal tags accept.
assert!(constant_time_tag_eq(&base, &base), "equal tags must accept");
// First byte differs → reject.
let mut first = base;
first[0] ^= 0xFF;
assert!(
!constant_time_tag_eq(&base, &first),
"first-byte-differ must reject"
);
// Last byte differs → reject.
let mut last = base;
last[HMAC_TAG_SIZE - 1] ^= 0x01;
assert!(
!constant_time_tag_eq(&base, &last),
"last-byte-differ must reject"
);
// Every byte differs → reject.
let all = [0x5Au8; HMAC_TAG_SIZE]; // bitwise-inverse of 0xA5
assert!(
!constant_time_tag_eq(&base, &all),
"all-bytes-differ must reject"
);
// Length mismatch → reject without inspecting contents.
assert!(
!constant_time_tag_eq(&base, &base[..HMAC_TAG_SIZE - 1]),
"length mismatch must reject"
);
// End-to-end through verify(): a tag whose only difference is the
// *last* byte must still be rejected exactly like a first-byte diff.
let beacon = SyncBeacon {
cycle_id: 7,
cycle_period: Duration::from_millis(50),
drift_correction_us: 0,
generated_at: std::time::Instant::now(),
};
let key = DEFAULT_TEST_KEY;
let nonce = 1u32;
let mut msg = [0u8; 20];
msg[..16].copy_from_slice(&beacon.to_bytes());
msg[16..20].copy_from_slice(&nonce.to_le_bytes());
let mut tag = AuthenticatedBeacon::compute_tag(&msg, &key);
tag[HMAC_TAG_SIZE - 1] ^= 0x01; // tamper the LAST byte only
let auth = AuthenticatedBeacon {
beacon,
nonce,
hmac_tag: tag,
};
assert!(
matches!(auth.verify(&key), Err(SecureTdmError::BeaconAuthFailed)),
"last-byte tamper must fail verify()"
);
}
/// Coarse timing-invariance smoke check. #[ignore]d so it never flakes CI —
/// the host is noisy and a hard timing bound is unreliable. Run manually
/// with `cargo test -p wifi-densepose-hardware -- --ignored
/// tag_compare_timing_invariance_smoke --nocapture`. The assertion is a
/// deliberately *generous* ratio bound (4×): a short-circuit `==` would show
/// last-byte-differ ≫ first-byte-differ; the constant-time helper should not.
#[test]
#[ignore = "timing smoke check — noisy host, run manually with --ignored"]
fn tag_compare_timing_invariance_smoke() {
use std::time::Instant;
const ITERS: u32 = 2_000_000;
let base = [0xA5u8; HMAC_TAG_SIZE];
let mut first = base;
first[0] ^= 0xFF;
let mut last = base;
last[HMAC_TAG_SIZE - 1] ^= 0x01;
// Warm up.
for _ in 0..ITERS / 10 {
core::hint::black_box(constant_time_tag_eq(&base, &first));
}
let t0 = Instant::now();
let mut acc = false;
for _ in 0..ITERS {
acc ^= constant_time_tag_eq(&base, &first);
}
core::hint::black_box(acc);
let dt_first = t0.elapsed().as_nanos() as f64;
let t1 = Instant::now();
let mut acc2 = false;
for _ in 0..ITERS {
acc2 ^= constant_time_tag_eq(&base, &last);
}
core::hint::black_box(acc2);
let dt_last = t1.elapsed().as_nanos() as f64;
let ratio = dt_last.max(dt_first) / dt_last.min(dt_first).max(1.0);
println!(
"first-differ {dt_first:.0}ns, last-differ {dt_last:.0}ns, ratio {ratio:.3}"
);
assert!(
ratio < 4.0,
"timing ratio {ratio:.3} too large — possible short-circuit leak"
);
}
#[test]
fn test_auth_beacon_too_short() {
let result = AuthenticatedBeacon::from_bytes(&[0u8; 10]);
@@ -71,6 +71,12 @@ harness = false
name = "features_bench"
harness = false
## ADR-154 Milestone-2: P2 "bench-first" perf items (§7.4 #5/#6/#7/#8/#20).
## #8 (field_model eigendecompose) is measured only under the eigenvalue feature.
[[bench]]
name = "dsp_perf_bench"
harness = false
## ADR-134: CIR estimator throughput benchmarks
[[bench]]
name = "cir_bench"
@@ -0,0 +1,353 @@
//! ADR-154 Milestone-2 perf benchmarks (§7.4 P2 "bench-first" items).
//!
//! PROOF discipline (ADR-154 §0): every P2 item is **benched before touched**.
//! A micro-opt is landed only if the bench proves the path hot; otherwise the
//! committed bench *is* the result — a MEASURED-NULL that proves the rewrite was
//! unnecessary (exactly the §5.x "already amortized" pattern). No speedup is
//! claimed without a before/after number from here.
//!
//! Reproduce (compile-only):
//! cargo bench -p wifi-densepose-signal --no-default-features \
//! --bench dsp_perf_bench --no-run
//!
//! Reproduce (full run, writes target/criterion/ HTML):
//! cargo bench -p wifi-densepose-signal --no-default-features --bench dsp_perf_bench
//!
//! Groups:
//! * `multistatic_attention` (#5) — `node_attention_weights` at 2..8 nodes ×
//! 56 subcarriers. Re-derives consensus/softmax each call; no scratch to
//! reuse → expected MEASURED-NULL.
//! * `tomography_reconstruct` (#6) — full ISTA solve. The two voxel buffers are
//! allocated once per `reconstruct()` (then `.fill`-reused across
//! iterations), so the per-solve alloc is 2×n_voxels vs an
//! O(iters·links·voxels) compute → expected MEASURED-NULL.
//! * `pose_kalman_update` (#7) — Kalman predict+update loop. The "gain
//! matrices" are fixed-size **stack** arrays (`[[f32;3];6]`), not heap —
//! nothing to reuse → expected MEASURED-NULL.
//! * `spectrogram_multi_subcarrier` (#20) — `compute_multi_subcarrier_spectrogram`:
//! fresh-planner-per-subcarrier (BEFORE) vs hoisted-plan (AFTER, shipped).
//! The per-subcarrier FFT re-plan is the likely real win.
//! * `field_model_occupancy` (#8, `eigenvalue` only) — per-call n×n
//! eigendecomposition in `estimate_occupancy`. MEASUREMENT-ONLY: quantifies
//! the recompute cost; incremental SVD is a sized future project, not a
//! micro-fix.
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use ndarray::Array2;
use rustfft::FftPlanner;
use std::f64::consts::PI;
use std::time::Duration;
use wifi_densepose_signal::ruvsense::multistatic::node_attention_weights;
use wifi_densepose_signal::ruvsense::pose_tracker::KeypointState;
use wifi_densepose_signal::ruvsense::tomography::{
LinkGeometry, Position3D, RfTomographer, TomographyConfig,
};
use wifi_densepose_signal::spectrogram::{
compute_multi_subcarrier_spectrogram, compute_spectrogram, Spectrogram, SpectrogramConfig,
WindowFunction,
};
// ---------------------------------------------------------------------------
// #5 multistatic node_attention_weights
// ---------------------------------------------------------------------------
fn make_node_amplitudes(n_nodes: usize, n_sub: usize) -> Vec<Vec<f32>> {
(0..n_nodes)
.map(|n| {
(0..n_sub)
.map(|s| {
let phase = (n as f32 * 0.31 + s as f32 * 0.07) % std::f32::consts::TAU;
0.5 + 0.4 * phase.sin()
})
.collect()
})
.collect()
}
fn bench_multistatic_attention(c: &mut Criterion) {
let mut group = c.benchmark_group("multistatic_attention");
group.measurement_time(Duration::from_secs(3));
let n_sub = 56; // canonical-56 grid
for &n_nodes in &[2usize, 4, 8] {
let owned = make_node_amplitudes(n_nodes, n_sub);
let refs: Vec<&[f32]> = owned.iter().map(|v| v.as_slice()).collect();
group.throughput(Throughput::Elements(1));
group.bench_with_input(
BenchmarkId::new("weights", n_nodes),
&refs,
|b, amplitudes| {
b.iter(|| black_box(node_attention_weights(black_box(amplitudes), 1.0)));
},
);
}
group.finish();
}
// ---------------------------------------------------------------------------
// #6 tomography reconstruct (ISTA L1)
// ---------------------------------------------------------------------------
fn make_tomographer(n_links: usize) -> (RfTomographer, Vec<f64>) {
// A modest 8x8x4 grid (256 voxels), n_links TX/RX pairs around the box.
let config = TomographyConfig {
nx: 8,
ny: 8,
nz: 4,
bounds: [0.0, 0.0, 0.0, 4.0, 4.0, 2.0],
lambda: 0.01,
max_iterations: 50,
tolerance: 1e-6,
min_links: 8,
};
let mut links = Vec::with_capacity(n_links);
for i in 0..n_links {
let t = i as f64 / n_links as f64;
links.push(LinkGeometry {
tx: Position3D {
x: 4.0 * (t * PI).cos().abs(),
y: 0.0,
z: 1.0,
},
rx: Position3D {
x: 4.0 * (t * PI).sin().abs(),
y: 4.0,
z: 1.0,
},
link_id: i,
});
}
let tomo = RfTomographer::new(config, &links).unwrap();
// Deterministic attenuations (one occupied region in the middle).
let attenuations: Vec<f64> = (0..n_links)
.map(|i| 0.1 + 0.05 * ((i as f64 * 0.3).sin()))
.collect();
(tomo, attenuations)
}
fn bench_tomography_reconstruct(c: &mut Criterion) {
let mut group = c.benchmark_group("tomography_reconstruct");
group.measurement_time(Duration::from_secs(4));
for &n_links in &[16usize, 32] {
let (tomo, atten) = make_tomographer(n_links);
group.throughput(Throughput::Elements(1));
group.bench_with_input(
BenchmarkId::new("solve", n_links),
&(tomo, atten),
|b, (tomo, atten)| {
b.iter(|| black_box(tomo.reconstruct(black_box(atten)).unwrap().occupied_count));
},
);
}
group.finish();
}
// ---------------------------------------------------------------------------
// #7 pose tracker Kalman update loop
// ---------------------------------------------------------------------------
fn bench_pose_kalman_update(c: &mut Criterion) {
let mut group = c.benchmark_group("pose_kalman_update");
group.measurement_time(Duration::from_secs(3));
// 17 keypoints (COCO-17), N predict+update cycles — a realistic frame batch.
for &n_updates in &[17usize, 170] {
group.throughput(Throughput::Elements(n_updates as u64));
group.bench_with_input(BenchmarkId::new("cycles", n_updates), &n_updates, |b, &n| {
b.iter(|| {
let mut acc = 0.0_f32;
for k in 0..n {
let mut state = KeypointState::new(
(k as f32 * 0.1).sin(),
(k as f32 * 0.2).cos(),
1.0 + (k as f32 * 0.05),
);
state.predict(0.05, 0.5);
let meas = [
(k as f32 * 0.1).sin() + 0.01,
(k as f32 * 0.2).cos() - 0.01,
1.0 + (k as f32 * 0.05),
];
state.update(&meas, 0.1, 1.0);
acc += state.state[0];
}
black_box(acc)
});
});
}
group.finish();
}
// ---------------------------------------------------------------------------
// #20 multi-subcarrier spectrogram: fresh-planner vs hoisted plan
// ---------------------------------------------------------------------------
fn make_csi_temporal(n_samples: usize, n_sc: usize) -> Array2<f64> {
Array2::from_shape_fn((n_samples, n_sc), |(t, sc)| {
let freq = 0.7 + sc as f64 * 0.13;
(2.0 * PI * freq * t as f64 / 100.0).sin()
+ 0.3 * (2.0 * PI * (freq * 2.1) * t as f64 / 100.0).cos()
})
}
/// BEFORE: re-plan the FFT inside `compute_spectrogram` for every subcarrier.
/// Faithful transcription of the pre-ADR-154-M2 `compute_multi_subcarrier_spectrogram`.
fn multi_fresh_planner(
csi: &Array2<f64>,
sample_rate: f64,
config: &SpectrogramConfig,
) -> Vec<Spectrogram> {
let (_, n_sc) = csi.dim();
(0..n_sc)
.map(|sc| {
let col: Vec<f64> = csi.column(sc).to_vec();
// compute_spectrogram builds a fresh FftPlanner on every call.
compute_spectrogram(&col, sample_rate, config).unwrap()
})
.collect()
}
fn bench_spectrogram_multi_subcarrier(c: &mut Criterion) {
let mut group = c.benchmark_group("spectrogram_multi_subcarrier");
group.measurement_time(Duration::from_secs(5));
let sample_rate = 100.0;
// Realistic: 600 temporal samples (~6 s @ 100 Hz) across 56 subcarriers,
// window 128. n_sc re-plans removed by the hoist.
for &(n_samples, n_sc, window) in &[(600usize, 56usize, 128usize), (600, 56, 256)] {
let csi = make_csi_temporal(n_samples, n_sc);
let config = SpectrogramConfig {
window_size: window,
hop_size: 64,
window_fn: WindowFunction::Hann,
power: true,
};
group.throughput(Throughput::Elements(n_sc as u64));
// BEFORE: fresh planner per subcarrier.
group.bench_with_input(
BenchmarkId::new("fresh_planner", format!("sc{n_sc}_w{window}")),
&config,
|b, cfg| {
b.iter(|| black_box(multi_fresh_planner(black_box(&csi), sample_rate, cfg).len()));
},
);
// AFTER: hoisted plan (the shipped `compute_multi_subcarrier_spectrogram`).
group.bench_with_input(
BenchmarkId::new("hoisted_plan", format!("sc{n_sc}_w{window}")),
&config,
|b, cfg| {
b.iter(|| {
black_box(
compute_multi_subcarrier_spectrogram(black_box(&csi), sample_rate, cfg)
.unwrap()
.len(),
)
});
},
);
}
group.finish();
}
// A standalone FftPlanner sanity micro-bench documenting the cost the hoist
// removes: building+planning a length-N forward FFT once.
fn bench_fft_plan_cost(c: &mut Criterion) {
let mut group = c.benchmark_group("fft_plan_cost");
group.measurement_time(Duration::from_secs(2));
for &n in &[128usize, 256] {
group.bench_with_input(BenchmarkId::new("plan_forward", n), &n, |b, &n| {
b.iter(|| {
let mut planner = FftPlanner::<f64>::new();
black_box(planner.plan_fft_forward(black_box(n)))
});
});
}
group.finish();
}
// ---------------------------------------------------------------------------
// #8 field_model SVD/eigendecomposition recompute (MEASUREMENT-ONLY)
// ---------------------------------------------------------------------------
// `estimate_occupancy` builds an n×n covariance and eigendecomposes it on every
// call (BLAS, `eigenvalue` feature). This bench quantifies that per-call cost so
// ADR-154 §7.4 #8 can record a number; incremental SVD is a sized future item,
// NOT attempted here.
#[cfg(feature = "eigenvalue")]
mod eig {
use super::*;
use wifi_densepose_signal::ruvsense::field_model::{FieldModel, FieldModelConfig};
fn calibrated_model(n_sub: usize, n_links: usize) -> FieldModel {
let config = FieldModelConfig {
n_subcarriers: n_sub,
n_links,
n_modes: 3,
min_calibration_frames: 20,
baseline_expiry_s: 86_400.0,
};
let mut model = FieldModel::new(config).unwrap();
// Feed deterministic calibration frames: [n_links][n_sub] per observation.
for f in 0..30 {
let obs: Vec<Vec<f64>> = (0..n_links)
.map(|l| {
(0..n_sub)
.map(|s| {
0.5 + 0.3
* ((f as f64 * 0.1 + l as f64 * 0.2 + s as f64 * 0.05).sin())
})
.collect()
})
.collect();
model.feed_calibration(&obs).unwrap();
}
model.finalize_calibration(0, 0).unwrap();
model
}
pub fn bench_field_model_occupancy(c: &mut Criterion) {
let mut group = c.benchmark_group("field_model_occupancy");
group.measurement_time(Duration::from_secs(4));
let n_sub = 56;
let model = calibrated_model(n_sub, 4);
// Sliding window of recent frames (50 ~ 2.5 s @ 20 Hz).
let frames: Vec<Vec<f64>> = (0..50)
.map(|t| {
(0..n_sub)
.map(|s| 0.5 + 0.3 * ((t as f64 * 0.15 + s as f64 * 0.07).sin()))
.collect()
})
.collect();
group.throughput(Throughput::Elements(1));
group.bench_function(BenchmarkId::new("eigh", n_sub), |b| {
b.iter(|| black_box(model.estimate_occupancy(black_box(&frames))));
});
group.finish();
}
}
#[cfg(feature = "eigenvalue")]
criterion_group!(
benches,
bench_multistatic_attention,
bench_tomography_reconstruct,
bench_pose_kalman_update,
bench_spectrogram_multi_subcarrier,
bench_fft_plan_cost,
eig::bench_field_model_occupancy,
);
#[cfg(not(feature = "eigenvalue"))]
criterion_group!(
benches,
bench_multistatic_attention,
bench_tomography_reconstruct,
bench_pose_kalman_update,
bench_spectrogram_multi_subcarrier,
bench_fft_plan_cost,
);
criterion_main!(benches);
@@ -197,4 +197,61 @@ mod tests {
Err(CsiRatioError::LengthMismatch { .. })
));
}
// ADR-154 §7.4 #19: the CSI *ratio model*. The classic ratio is
// `H_i[k] / H_j[k]`, which blows up (±inf / NaN) when `H_j[k]` approaches
// zero — the case a `1e-12` division-guard epsilon is meant to protect. This
// module deliberately implements the ratio as the **conjugate product**
// `H_i * conj(H_j)` (SpotFi/IndoTrack), which has *no division* and is
// therefore finite even at and below the `1e-12` magnitude boundary. This
// test pins that property: at the epsilon boundary the output is finite and
// exactly the conjugate product (no silent NaN/inf from a hidden divide).
#[test]
fn ratio_finite_at_and_below_1e_12_epsilon() {
let eps = 1e-12_f64;
// Reference at unit magnitude; target swept across / under the epsilon
// boundary a naive H_i/H_j division would need to guard.
let h_ref = vec![
Complex64::from_polar(1.0, 0.3),
Complex64::from_polar(1.0, 0.3),
Complex64::from_polar(1.0, 0.3),
Complex64::from_polar(1.0, 0.3),
];
let h_target = vec![
Complex64::new(eps, 0.0), // exactly at the epsilon
Complex64::new(eps * 0.5, 0.0), // below the epsilon
Complex64::new(0.0, eps), // imaginary axis, at epsilon
Complex64::new(0.0, 0.0), // exact zero — div would be inf/NaN
];
let ratio = conjugate_multiply(&h_ref, &h_target).unwrap();
assert_eq!(ratio.len(), 4);
for (k, r) in ratio.iter().enumerate() {
assert!(
r.re.is_finite() && r.im.is_finite(),
"conjugate-multiply ratio must be finite at boundary k={k}: {r:?}"
);
}
// The near-zero / zero target collapses the product toward zero (the
// physically correct "no measurable path" answer), never to inf/NaN.
assert!(
ratio[3].norm() == 0.0,
"exact-zero target → zero product, got {}",
ratio[3].norm()
);
// The at-epsilon entries equal the exact conjugate product (bit-exact).
let expected0 = h_ref[0] * h_target[0].conj();
assert_eq!(ratio[0].re.to_bits(), expected0.re.to_bits());
assert_eq!(ratio[0].im.to_bits(), expected0.im.to_bits());
// The full pipeline (amplitude/phase extraction) is also finite here.
let mut m = Array2::<Complex64>::zeros((1, 4));
for (k, &v) in ratio.iter().enumerate() {
m[[0, k]] = v;
}
let (amp, phase) = ratio_to_amplitude_phase(&m);
assert!(amp.iter().all(|a| a.is_finite()));
assert!(phase.iter().all(|p| p.is_finite()));
}
}
@@ -1458,6 +1458,79 @@ mod tests {
}
}
/// ADR-154 §7.4 #14: the `fft_operator` path *changes the witness hash*
/// (documented in `CirConfig::fft_operator`), so it must be pinned as
/// numerically **close** to the dense path — not silently divergent. The
/// existing `fft_estimate_matches_dense_dominant_tap` covers HT20 / one tau;
/// this test asserts the **full `Cir` output** (every tap + every scalar
/// field) stays within a documented relative tolerance on the production
/// **canonical-56** config across several realistic delays. A regression
/// that lets the FFT path drift (wrong scaling, off-by-one Φ column, etc.)
/// fails here instead of corrupting a downstream witness unnoticed.
#[test]
fn fft_operator_within_tolerance_of_dense_canonical56() {
// Relative tolerances — documented, not silent. The FFT operator sums the
// same Φ entries in a different order, so taps agree to ~float epsilon
// scaled by the dominant-tap magnitude; ISTA can differ by a few last
// bits over its trajectory, hence 1e-2 (same order as the existing test).
const TAP_REL_TOL: f32 = 1e-2;
const RATIO_ABS_TOL: f32 = 1e-2;
const SPREAD_REL_TOL: f64 = 1e-2;
for &tau in &[20e-9_f64, 50e-9, 90e-9] {
let dense_cfg = CirConfig::canonical56();
let mut fft_cfg = CirConfig::canonical56();
fft_cfg.fft_operator = true;
let frame = make_single_tap_frame(dense_cfg.num_subcarriers, tau);
let dense = CirEstimator::new(dense_cfg).estimate(&frame).unwrap();
let fast = CirEstimator::new(fft_cfg).estimate(&frame).unwrap();
assert_eq!(dense.taps.len(), fast.taps.len());
// Full tap vector close (relative to the dominant tap magnitude).
let dom = dense.taps[dense.dominant_tap_idx].norm().max(1e-6);
let mut max_tap_err = 0.0_f32;
for (a, b) in dense.taps.iter().zip(&fast.taps) {
max_tap_err = max_tap_err.max((a - b).norm());
}
assert!(
max_tap_err <= TAP_REL_TOL * dom,
"tau={tau:e}: FFT taps diverged from dense — max err {max_tap_err} > {TAP_REL_TOL} * {dom} (NOT numerically close)"
);
// The dominant tap and the scalar summary fields must agree too —
// these feed the witness, so a silent divergence here is the bug #14
// guards against.
assert_eq!(
dense.dominant_tap_idx, fast.dominant_tap_idx,
"tau={tau:e}: dominant tap index moved"
);
assert!(
(dense.dominant_tap_ratio - fast.dominant_tap_ratio).abs() <= RATIO_ABS_TOL,
"tau={tau:e}: dominant_tap_ratio drift {} vs {}",
dense.dominant_tap_ratio,
fast.dominant_tap_ratio
);
assert_eq!(
dense.active_tap_count, fast.active_tap_count,
"tau={tau:e}: active_tap_count changed"
);
assert_eq!(
dense.ranging_valid, fast.ranging_valid,
"tau={tau:e}: ranging_valid flipped"
);
let spread_ref = dense.rms_delay_spread_s.abs().max(1e-12);
assert!(
(dense.rms_delay_spread_s - fast.rms_delay_spread_s).abs()
<= SPREAD_REL_TOL * spread_ref,
"tau={tau:e}: rms_delay_spread drift {} vs {}",
dense.rms_delay_spread_s,
fast.rms_delay_spread_s
);
}
}
/// The default configs keep the FFT operator off — the dense, bit-exact
/// witness path is the default (enabling FFT shifts float results).
#[test]
@@ -201,12 +201,29 @@ fn find_static_subcarriers(
/// Estimate per-channel phase offsets using iterative Neumann-style refinement.
///
/// Channel 0 is the reference (offset = 0).
/// Channel 0 is the reference (offset = 0). Thin wrapper that drops the
/// iteration count; `estimate_phase_offsets_counted` is the instrumented core.
fn estimate_phase_offsets(
frames: &[CanonicalCsiFrame],
static_indices: &[usize],
config: &PhaseAlignConfig,
) -> std::result::Result<Vec<f32>, PhaseAlignError> {
estimate_phase_offsets_counted(frames, static_indices, config).map(|(offsets, _iters)| offsets)
}
/// Core of [`estimate_phase_offsets`], also returning the number of refinement
/// iterations actually executed.
///
/// The returned count is bounded by `config.max_iterations` — that bound is the
/// convergence cap that guarantees termination on inputs the damped Neumann
/// update never drives below `config.tolerance` (ADR-154 §7.4 #16). The offset
/// vector is identical to the public `estimate_phase_offsets` path; only the
/// iteration count is surfaced (for the cap test).
fn estimate_phase_offsets_counted(
frames: &[CanonicalCsiFrame],
static_indices: &[usize],
config: &PhaseAlignConfig,
) -> std::result::Result<(Vec<f32>, usize), PhaseAlignError> {
let n_ch = frames.len();
let mut offsets = vec![0.0_f32; n_ch];
@@ -220,7 +237,7 @@ fn estimate_phase_offsets(
}
// Iterative refinement (Neumann-style)
for _iter in 0..config.max_iterations {
for iter in 0..config.max_iterations {
let mut max_update = 0.0_f32;
for c in 1..n_ch {
@@ -241,12 +258,13 @@ fn estimate_phase_offsets(
}
if max_update < config.tolerance {
return Ok(offsets);
return Ok((offsets, iter + 1));
}
}
// Even if we do not converge tightly, return best estimate
Ok(offsets)
// Even if we do not converge tightly, return best estimate. The loop ran the
// full cap — termination is guaranteed by `config.max_iterations`.
Ok((offsets, config.max_iterations))
}
/// Apply phase correction: subtract offset from each subcarrier phase.
@@ -446,6 +464,73 @@ mod tests {
assert_eq!(cfg.min_static_subcarriers, 5);
}
// ADR-154 §7.4 #16: the iterative LO-offset refinement must TERMINATE at the
// `max_iterations` cap on a non-converging input — no unbounded loop.
//
// We force non-convergence by setting `tolerance` to an unreachable value
// (the damped Neumann update on bounded phase residuals can never drive
// `max_update` below 0.0), so the `max_update < tolerance` early-exit is
// never taken. The instrumented core must then run *exactly*
// `max_iterations` and return — proving the cap, not convergence, is what
// bounds the loop.
#[test]
fn refinement_terminates_at_iteration_cap_when_not_converging() {
let n_sub = 56;
let max_iterations = 7;
let config = PhaseAlignConfig {
max_iterations,
// Unreachable tolerance: `max_update` is always ≥ 0, never < 0.0,
// so the convergence branch can never fire.
tolerance: 0.0,
static_fraction: 0.3,
min_static_subcarriers: 5,
};
// Two channels with a real, persistent offset so each iteration keeps
// producing a non-zero update.
let f0 = make_frame_with_phase(n_sub, 0.0, 0.0);
let f1 = make_frame_with_phase(n_sub, 0.0, 1.3);
let frames = vec![f0, f1];
let static_indices = find_static_subcarriers(&frames, &config).unwrap();
let (offsets, iters) =
estimate_phase_offsets_counted(&frames, &static_indices, &config).unwrap();
// The cap, not convergence, terminated the loop.
assert_eq!(
iters, max_iterations,
"expected the loop to run the full cap ({max_iterations}), got {iters}"
);
// It still returns a finite best-estimate offset vector.
assert_eq!(offsets.len(), 2);
assert!(offsets.iter().all(|o| o.is_finite()));
// Reference channel offset stays 0.
assert_eq!(offsets[0], 0.0);
}
// Convergent companion: a near-identical input converges *before* the cap,
// so the cap is an upper bound, not the only exit.
#[test]
fn refinement_converges_before_cap_on_easy_input() {
let n_sub = 56;
let config = PhaseAlignConfig {
max_iterations: 50,
tolerance: 1e-2, // loose: a tiny offset converges in a few iters
static_fraction: 0.3,
min_static_subcarriers: 5,
};
let f0 = make_frame_with_phase(n_sub, 0.0, 0.0);
let f1 = make_frame_with_phase(n_sub, 0.0, 0.02);
let frames = vec![f0, f1];
let static_indices = find_static_subcarriers(&frames, &config).unwrap();
let (_offsets, iters) =
estimate_phase_offsets_counted(&frames, &static_indices, &config).unwrap();
assert!(
iters < config.max_iterations,
"easy input should converge before the cap, ran {iters}/{}",
config.max_iterations
);
}
#[test]
fn phase_correction_preserves_amplitude() {
let mut aligner = PhaseAligner::new(2);
@@ -9,9 +9,10 @@
use ndarray::Array2;
use num_complex::Complex64;
use rustfft::FftPlanner;
use rustfft::{Fft, FftPlanner};
use ruvector_attn_mincut::attn_mincut;
use std::f64::consts::PI;
use std::sync::Arc;
/// Configuration for spectrogram generation.
#[derive(Debug, Clone)]
@@ -87,12 +88,40 @@ pub fn compute_spectrogram(
return Err(SpectrogramError::InvalidWindowSize);
}
let n_frames = (signal.len() - config.window_size) / config.hop_size + 1;
let n_freq = config.window_size / 2 + 1;
let window = make_window(config.window_fn, config.window_size);
let mut planner = FftPlanner::new();
let fft = planner.plan_fft_forward(config.window_size);
let window = make_window(config.window_fn, config.window_size);
Ok(compute_spectrogram_with_plan(
signal,
sample_rate,
config,
&fft,
&window,
))
}
/// STFT core that runs against a **pre-planned** FFT and pre-built window.
///
/// ADR-154 §7.4 #20: `compute_spectrogram` re-plans the FFT on every call, so
/// `compute_multi_subcarrier_spectrogram` (which calls it once per subcarrier)
/// re-planned the same length-`window_size` FFT for *every* subcarrier. This
/// helper hoists the plan + window out of the per-subcarrier loop. The numeric
/// body is byte-for-byte the old loop — only the plan/window construction is
/// lifted — so the output is **bit-identical** to the per-call path (asserted by
/// `multi_subcarrier_hoisted_plan_bit_identical`). Callers must pass a plan
/// built for exactly `config.window_size` and a window of that length.
fn compute_spectrogram_with_plan(
signal: &[f64],
sample_rate: f64,
config: &SpectrogramConfig,
fft: &Arc<dyn Fft<f64>>,
window: &[f64],
) -> Spectrogram {
debug_assert_eq!(window.len(), config.window_size, "window/plan size mismatch");
debug_assert_eq!(fft.len(), config.window_size, "FFT/window size mismatch");
let n_frames = (signal.len() - config.window_size) / config.hop_size + 1;
let n_freq = config.window_size / 2 + 1;
let mut data = Array2::zeros((n_freq, n_frames));
@@ -116,13 +145,13 @@ pub fn compute_spectrogram(
}
}
Ok(Spectrogram {
Spectrogram {
data,
n_freq,
n_time: n_frames,
freq_resolution: sample_rate / config.window_size as f64,
time_resolution: config.hop_size as f64 / sample_rate,
})
}
}
/// Compute spectrogram for each subcarrier from a temporal CSI matrix.
@@ -134,12 +163,40 @@ pub fn compute_multi_subcarrier_spectrogram(
sample_rate: f64,
config: &SpectrogramConfig,
) -> Result<Vec<Spectrogram>, SpectrogramError> {
let (_, n_sc) = csi_temporal.dim();
let mut spectrograms = Vec::with_capacity(n_sc);
let (n_samples, n_sc) = csi_temporal.dim();
// ADR-154 §7.4 #20: validate *once* (same checks `compute_spectrogram`
// makes), then plan the FFT + build the window *once* and reuse them across
// every subcarrier instead of re-planning per column. The window length is
// identical for all subcarriers, so this is pure hoisting — output stays
// bit-identical to the per-call path.
if n_samples < config.window_size {
return Err(SpectrogramError::SignalTooShort {
signal_len: n_samples,
window_size: config.window_size,
});
}
if config.hop_size == 0 {
return Err(SpectrogramError::InvalidHopSize);
}
if config.window_size == 0 {
return Err(SpectrogramError::InvalidWindowSize);
}
let mut planner = FftPlanner::new();
let fft = planner.plan_fft_forward(config.window_size);
let window = make_window(config.window_fn, config.window_size);
let mut spectrograms = Vec::with_capacity(n_sc);
for sc in 0..n_sc {
let col: Vec<f64> = csi_temporal.column(sc).to_vec();
spectrograms.push(compute_spectrogram(&col, sample_rate, config)?);
spectrograms.push(compute_spectrogram_with_plan(
&col,
sample_rate,
config,
&fft,
&window,
));
}
Ok(spectrograms)
@@ -372,6 +429,67 @@ mod tests {
assert_eq!(spec.n_freq, 65);
}
}
// ADR-154 §7.4 #20: the FFT-planner hoist in
// `compute_multi_subcarrier_spectrogram` must produce **bit-identical**
// output to calling `compute_spectrogram` (fresh planner) per subcarrier.
// We compare `f64::to_bits` of every spectrogram value across several
// window functions and a realistic 56-subcarrier CSI matrix — the planner
// change only reorders *when* the (identical) plan is built, never the math.
#[test]
fn multi_subcarrier_hoisted_plan_bit_identical() {
let n_samples = 600;
let n_sc = 56; // canonical-56 grid — the production subcarrier count
let sample_rate = 100.0;
let csi = Array2::from_shape_fn((n_samples, n_sc), |(t, sc)| {
// Deterministic, non-trivial per-subcarrier content.
let freq = 0.7 + sc as f64 * 0.13;
(2.0 * PI * freq * t as f64 / sample_rate).sin()
+ 0.3 * (2.0 * PI * (freq * 2.1) * t as f64 / sample_rate).cos()
});
for window_fn in [
WindowFunction::Hann,
WindowFunction::Hamming,
WindowFunction::Blackman,
WindowFunction::Rectangular,
] {
for &power in &[true, false] {
let config = SpectrogramConfig {
window_size: 128,
hop_size: 37, // non-divisor hop to exercise frame edges
window_fn,
power,
};
// AFTER: hoisted-plan path.
let hoisted =
compute_multi_subcarrier_spectrogram(&csi, sample_rate, &config).unwrap();
// BEFORE: independent per-subcarrier fresh-planner path.
let reference: Vec<Spectrogram> = (0..n_sc)
.map(|sc| {
let col: Vec<f64> = csi.column(sc).to_vec();
compute_spectrogram(&col, sample_rate, &config).unwrap()
})
.collect();
assert_eq!(hoisted.len(), reference.len());
for (sc, (h, r)) in hoisted.iter().zip(reference.iter()).enumerate() {
assert_eq!(h.data.dim(), r.data.dim(), "dim sc={sc} {window_fn:?}");
for (a, b) in h.data.iter().zip(r.data.iter()) {
assert_eq!(
a.to_bits(),
b.to_bits(),
"bit mismatch sc={sc} {window_fn:?} power={power}: {a} vs {b}"
);
}
assert_eq!(h.freq_resolution.to_bits(), r.freq_resolution.to_bits());
assert_eq!(h.time_resolution.to_bits(), r.time_resolution.to_bits());
}
}
}
}
}
#[cfg(test)]
@@ -309,6 +309,61 @@ impl WlanApiScanner {
})
}
/// Measure the **real** achieved rate of a *specific* backend over a
/// fixed wall-clock `window`, for an honest native-vs-netsh comparison.
///
/// Unlike [`benchmark`](Self::benchmark) (which picks native-first and so
/// never exercises netsh on a box where native works), this runs back-to-
/// back scans on **exactly** the requested backend until `window` elapses,
/// then reports the measured scans/second and mean BSSIDs/scan. This is the
/// ADR-157 §5 #4 measurement primitive: drive it once per backend over the
/// same window and compare the two `rate_hz` values — no rate is assumed.
///
/// Returns `None` for [`ScanBackend::Native`] when the native path is
/// unavailable (non-Windows or WLAN service error), so a caller can report
/// the honest negative rather than a fabricated number.
///
/// # Errors
///
/// Propagates the first scan error from the chosen backend.
pub fn benchmark_backend(
&self,
backend: ScanBackend,
window: Duration,
) -> Result<Option<BenchmarkResult>, WifiScanError> {
// Probe native availability first so an unavailable native path is an
// honest `None`, not an error charged against the comparison.
if backend == ScanBackend::Native && wlanapi_native::scan_native().is_err() {
return Ok(None);
}
let start = Instant::now();
let mut iterations: u32 = 0;
let mut total_bssids: u64 = 0;
while start.elapsed() < window {
let list = match backend {
ScanBackend::Native => wlanapi_native::scan_native()?,
ScanBackend::Netsh => self.inner.scan_sync()?,
};
total_bssids += list.len() as u64;
iterations += 1;
}
let total = start.elapsed();
let secs = total.as_secs_f64().max(f64::MIN_POSITIVE);
Ok(Some(BenchmarkResult {
iterations,
total,
rate_hz: f64::from(iterations) / secs,
mean_bssids: if iterations == 0 {
0.0
} else {
total_bssids as f64 / f64::from(iterations)
},
backend,
}))
}
/// Perform an async scan by offloading the blocking call to a
/// background thread (native-first, netsh fallback inside the task).
///
@@ -560,4 +615,76 @@ mod tests {
);
assert!(bench.rate_hz > 0.0);
}
/// ADR-157 §5 #4 honest native-vs-netsh throughput comparison. `#[ignore]`
/// (live WLAN, ~20 s). Run with:
/// `cargo test -p wifi-densepose-wifiscan -- --ignored --nocapture
/// measure_native_vs_netsh_throughput`. Drives BOTH backends over the same
/// fixed wall-clock window and prints the measured Hz + BSSIDs/scan for
/// each, plus the ratio — the real number, whatever it is (a null/negative
/// result is a valid outcome and must be reported, not hidden).
#[cfg(windows)]
#[test]
#[ignore = "live WLAN native-vs-netsh comparison; run with --ignored --nocapture"]
fn measure_native_vs_netsh_throughput() {
let scanner = WlanApiScanner::new();
let window = Duration::from_secs(10);
let native = scanner
.benchmark_backend(ScanBackend::Native, window)
.expect("native benchmark must not error");
let netsh = scanner
.benchmark_backend(ScanBackend::Netsh, window)
.expect("netsh benchmark must not error")
.expect("netsh is always available on Windows");
match native {
Some(n) => {
println!(
"NATIVE: {:.2} Hz ({} scans / {:?}), mean {:.1} BSSIDs/scan",
n.rate_hz, n.iterations, n.total, n.mean_bssids
);
println!(
"NETSH: {:.2} Hz ({} scans / {:?}), mean {:.1} BSSIDs/scan",
netsh.rate_hz, netsh.iterations, netsh.total, netsh.mean_bssids
);
let ratio = n.rate_hz / netsh.rate_hz.max(f64::MIN_POSITIVE);
println!("RATIO native/netsh: {ratio:.2}x");
assert!(n.rate_hz > 0.0 && netsh.rate_hz > 0.0);
}
None => {
println!(
"NATIVE: unavailable on this box (WLAN service error). \
NETSH: {:.2} Hz, mean {:.1} BSSIDs/scan",
netsh.rate_hz, netsh.mean_bssids
);
}
}
}
/// Determinism + handle-cleanup pin: N back-to-back native scans must all
/// succeed (or all be the same typed error) with no resource exhaustion —
/// a `WlanOpenHandle`/`WlanCloseHandle` leak would, after enough calls,
/// surface as a `ScanFailed`. Running 50 iterations here exercises the
/// open→enum→getlist→free→close cycle repeatedly. `#[ignore]` for CI (live
/// WLAN service) but RUN on this box to verify no leak.
#[cfg(windows)]
#[test]
#[ignore = "live WLAN handle-cleanup check; run with --ignored --nocapture"]
fn native_scans_dont_leak_handles() {
let scanner = WlanApiScanner::new();
let mut ok = 0u32;
let mut failed = 0u32;
for _ in 0..50 {
match scanner.scan_native() {
Ok(_) => ok += 1,
Err(WifiScanError::ScanFailed { .. }) => failed += 1,
Err(e) => panic!("unexpected error during leak check: {e:?}"),
}
}
println!("native leak check: {ok} ok, {failed} scan-failed of 50");
// No leak ⇒ behavior is consistent across all 50 calls (all ok, or all
// the same WLAN-service-off failure) — not a degrade partway through.
assert!(ok == 50 || failed == 50, "inconsistent results suggest a leak: {ok} ok / {failed} failed");
}
}