Merge pull request #1030 from ruvnet/feat/v2-beyond-sota-sweep-m9

Beyond-SOTA sweep M9 (ADR-163): edge-latency measurement debt → MEASURED-on-host benches
docs(ADR-163): edge-latency RESULTS + PROOF/prove.sh wiring (T3)
2026-06-13 10:53:20 +00:00 · 2026-06-12 08:14:57 -04:00 · 2026-06-12 08:02:07 -04:00 · 2026-06-12 08:01:50 -04:00 · 2026-06-12 08:01:29 -04:00
13 changed files with 1328 additions and 5 deletions
@@ -55,6 +55,8 @@ trained checkpoint) so you can reproduce them yourself.
 | zero-copy ORT input ~1.48× (ADR-155) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-nn --features onnx --bench onnx_bench` |
 | pointcloud splats 9→2 passes ~1.24× (ADR-160 research) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-pointcloud --bench splats_bench` |
 | native wlanapi multi-BSSID scan 9.74 Hz (vs netsh ~2 Hz) | **MEASURED (Windows)** | `cd v2 && cargo test -p wifi-densepose-wifiscan -- --ignored measure_native_scan_rate` |
+| wasm-edge `process_frame` hot-path latency (host proxy, ADR-163) | **MEASURED-on-host** (NOT the ESP32/WASM3 budget — needs hardware) | `cd v2/crates/wifi-densepose-wasm-edge && cargo bench --features std` |
+| cog steady-state CPU infer latency ~305 µs (ADR-163; NOT the manifest cold-start) | **MEASURED-on-host** | `cd v2 && cargo bench -p cog-person-count -p cog-pose-estimation --no-default-features --bench infer_bench` |

 ## What we do NOT claim (the honest negatives — the strongest anti-slop signal)

@@ -68,8 +70,9 @@ trained checkpoint) so you can reproduce them yourself.

 ## Provenance

-Every claim above traces to a committed ADR (`docs/adr/ADR-154`…`ADR-160`), a
-test, a criterion bench, or `benchmarks/wiflow-std/RESULTS.md`. The history
+Every claim above traces to a committed ADR (`docs/adr/ADR-154`…`ADR-163`), a
+test, a criterion bench, `benchmarks/wiflow-std/RESULTS.md`, or
+`benchmarks/edge-latency/RESULTS.md`. The history
 includes published **retractions** (the 92.9% PCK retraction; the WiFlow-STD
 shipped-checkpoint refutation; the NV-diamond BOM reality check) — a faker hides
 failures; we commit them.
@@ -0,0 +1,137 @@
+# Edge-Latency Benchmark Results — ADR-163
+
+Converting **CLAIMED** edge latency budgets into **MEASURED-on-host** numbers,
+closing the measurement debt flagged by Milestones 5/6 (ADR-159 / ADR-160).
+Benches + docs only — **no production-code behavior changed**.
+
+## The honest caveat, up front (read before citing any number)
+
+Two distinct gaps separate every number below from the figure it is converting:
+
+1. **Host ≠ ESP32.** The wasm-edge skill modules document budgets *"on ESP32-S3
+   WASM3"* (e.g. `exo_time_crystal`: "H (<10 ms)"). These benches run **native
+   x86_64 on a development laptop**, not the Xtensa/WASM3 target. A native host
+   median measures the **algorithm's native cost**, a lower bound on the ESP32 latency, not the ESP32 number.
+   WASM3 interpretation on a ~240 MHz Xtensa core is typically 1–2 orders of
+   magnitude slower than native `-O` host code, so a host median far under the
+   budget **does NOT prove the ESP32 meets it.** *The ESP32 figure is NOT
+   reproduced here — it needs hardware.*
+
+2. **Bench ≠ the doc-claimed measurement.** For the cogs, the pose cog's manifest cites a
+   **cold-start** number (`cold_start_ms_avg`, weight-load included); these
+   benches measure **steady-state** per-frame `infer` (warm, weights resident).
+   Different measurements; we report both, labelled.
+
+Grades (per `benchmarks/wiflow-std/RESULTS.md` / ADR-152 vocabulary):
+- **MEASURED-on-host** — reproduced in this repo on the machine below, exact
+  command recorded. NOT the ESP32 / NOT the cold-start figure.
+- **CLAIMED (ESP32)** — the doc budget; UNMEASURED on hardware here.
+
+## Machine
+
+| | |
+|---|---|
+| Host | `ruvzen` (Windows 11, this dev box) |
+| CPU | Intel Core Ultra 9 285H |
+| Toolchain | `cargo 1.91.1`, `--release` (opt-level per crate profile) |
+| Bench harness | criterion 0.5 (`time: [low **median** high]` reported below) |
+| Date | 2026-06-12 |
+
+Run-to-run spread on this box is non-trivial (criterion's low/high values sit a
+few % either side of the median); the medians below are single-session captures with the smoke
+settings `--warm-up-time 1 --measurement-time 2` (wasm-edge) / `3` (cogs). Re-run
+for your own machine — the absolute numbers are host-specific.
+
+---
+
+## T1 — wasm-edge `process_frame` hot paths (ADR-160 deferred item → DONE host)
+
+The crate is **excluded from the v2 workspace**; bench from the crate dir.
+
+```bash
+cd v2/crates/wifi-densepose-wasm-edge
+cargo bench --features std -- --warm-up-time 1 --measurement-time 2
+# med_seizure_detect is medical-experimental-gated:
+cargo bench --features std,medical-experimental -- --warm-up-time 1 --measurement-time 2 med_seizure
+```
+
+| Hot path (M6-audit-named) | Bench id | Host median | Grade | Doc budget (CLAIMED, ESP32) |
+|---|---|---|---|---|
+| `exo_time_crystal` 256-pt × 128-lag autocorrelation (full buffer) | `exo_time_crystal::process_frame[autocorr_256x128]` | **17.3 µs** | MEASURED-on-host | "H (<10 ms) on ESP32-S3 WASM3" — **NOT reproduced here (needs hardware)** |
+| `exo_ghost_hunter` empty-room periodicity + hidden-breathing | `exo_ghost_hunter::process_frame[empty_room_periodicity]` | **1.44 µs** | MEASURED-on-host | research/exotic; no firm ESP32 figure — host proxy only |
+| `sec_weapon_detect` per-subcarrier Welford (MAX_SC=32) | `sec_weapon_detect::process_frame[per_sc_welford]` | **0.42 µs** (420 ns) | MEASURED-on-host | research-grade; calibration-gated — host proxy only |
+| `med_seizure_detect` clonic-phase rhythm path (steady-state frame) | `med_seizure_detect::process_frame[clonic_rhythm]` | **0.10 µs** (105 ns) | MEASURED-on-host (feature-gated) | doc budget "S (<5 ms) on ESP32"; **NOT reproduced here** |
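To make the "256-pt × 128-lag" row concrete, here is a minimal Rust sketch of a circular autocorrelation over a 256-sample ring buffer and 128 lags. It is a hypothetical illustration, not the `exo_time_crystal` source: the circular indexing, `circular_autocorr`, and the square-wave test signal are assumptions chosen so that every lag does exactly 256 multiply-accumulates (256 × 128 = 32,768 per frame).

```rust
// HYPOTHETICAL sketch: sizes mirror the 256-pt x 128-lag row above; this is
// NOT the exo_time_crystal implementation.
const N: usize = 256;
const MAX_LAG: usize = 128;

/// Circular autocorrelation r[k] = sum_n x[n] * x[(n + k) mod N] for k in 0..MAX_LAG.
/// Also returns the number of multiply-accumulates (MACs) performed.
fn circular_autocorr(x: &[f32; N]) -> ([f32; MAX_LAG], usize) {
    let mut r = [0.0f32; MAX_LAG];
    let mut macs = 0usize;
    for (k, rk) in r.iter_mut().enumerate() {
        let mut acc = 0.0f32;
        for n in 0..N {
            acc += x[n] * x[(n + k) % N];
            macs += 1;
        }
        *rk = acc;
    }
    (r, macs)
}

/// Period-16 square wave (8 samples at +1, then 8 at -1), repeated across the buffer.
fn square_wave_period16() -> [f32; N] {
    let mut x = [0.0f32; N];
    for (n, v) in x.iter_mut().enumerate() {
        *v = if (n / 8) % 2 == 0 { 1.0 } else { -1.0 };
    }
    x
}

fn main() {
    let (r, macs) = circular_autocorr(&square_wave_period16());
    println!("MACs per frame: {macs}"); // 256 * 128 = 32768
    // A full-period lag (16) correlates perfectly; a half-period lag (8) anti-correlates.
    println!("r[0] = {}, r[8] = {}, r[16] = {}", r[0], r[8], r[16]);
}
```

The fixed per-lag cost is what makes this the heaviest of the four paths: the work does not depend on the signal, only on the buffer length and lag count.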
+
+Reading these honestly:
+
+- `exo_time_crystal` at **17.3 µs host** is the only one whose host cost reaches
+  even ~0.2% of its 10 ms ESP32 budget (17.3 µs / 10 ms ≈ 0.17%); it does the most
+  work (256 × 128 ≈ 32K MACs/frame). 17.3 µs native says the algorithm is cheap; it
+  says **nothing** about whether WASM3-on-Xtensa lands under 10 ms. A naïve
+  host→ESP32 extrapolation (assume a 100× interpreter+clock penalty, the top of the
+  1–2 orders-of-magnitude range) would put it near 1.7 ms, comfortably under, **but
+  that is an extrapolation, not a measurement**, recorded here only to show the
+  host number is not obviously in tension with the budget. ESP32 figure: **UNMEASURED**.
+- `med_seizure_detect`'s 105 ns is the **steady-state** per-frame cost; the
+  expensive clonic autocorrelation only fires when the state machine is in the
+  clonic phase, so this is a lower bound on the heavy path, not the worst case.
+  It is still a real, committed host datapoint.
+- The pre-existing `tests/budget_compliance.rs` already asserts the L/S/H
+  wall-clock tiers (25 passing tests); these criterion benches add the
+  regression-grade, reproducible median that ADR-160 deferred.
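The extrapolation in the first bullet is plain arithmetic, sketched below so a reader can rerun it with a different penalty. The 10× and 100× factors are the assumed "1–2 orders of magnitude" range, not measured values; `extrapolate_us` and `break_even_penalty` are illustrative names. With the 17.3 µs host median and the 10 ms budget, the budget would only be missed if WASM3-on-Xtensa were more than ~578× slower than the host (10,000 / 17.3 ≈ 578), which remains a guess until hardware numbers exist.

```rust
/// Naive host -> ESP32-S3/WASM3 extrapolation: multiply the host median by an
/// ASSUMED slowdown factor. This is arithmetic on a guess, NOT a measurement.
fn extrapolate_us(host_median_us: f64, assumed_penalty: f64) -> f64 {
    host_median_us * assumed_penalty
}

/// Slowdown factor at which the extrapolated latency would exactly hit the budget.
fn break_even_penalty(host_median_us: f64, budget_us: f64) -> f64 {
    budget_us / host_median_us
}

fn main() {
    let host_us = 17.3; // exo_time_crystal host median from the table above
    let budget_us = 10_000.0; // doc budget "H (<10 ms)"
    // 10x and 100x bracket the assumed "1-2 orders of magnitude" penalty.
    for penalty in [10.0, 100.0] {
        let est = extrapolate_us(host_us, penalty);
        println!("{penalty:>5}x penalty -> ~{est:.0} us ({:.1}% of budget)", 100.0 * est / budget_us);
    }
    println!("budget missed only beyond ~{:.0}x", break_even_penalty(host_us, budget_us));
}
```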
+
+---
+
+## T2 — cog steady-state inference latency (ADR-159/160 deferred item → DONE)
+
+Cog crates are normal workspace members; bench from `v2/`. Real weights
+(`count_v1.safetensors` / `pose_v1.safetensors`) ship in-repo under each cog's
+`cog/artifacts/`, so the bench measures the **real Candle CPU forward**, not the
+stub (the bench `assert!`s `backend().starts_with("candle-")`).
+
+```bash
+cd v2
+cargo bench -p cog-person-count  --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3
+cargo bench -p cog-pose-estimation --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3
+```
+
+| Cog | Bench id | Host median (steady-state infer, CPU) | Grade | Manifest cold-start (CLAIMED, different measurement + machine) |
+|---|---|---|---|---|
+| cog-person-count | `cog_person_count::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | — (person-count manifest carries comparable provenance) |
+| cog-pose-estimation | `cog_pose_estimation::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | `cold_start_ms_avg: 5.4` (30 invocations, **ruvultra/RTX 5080 host**, candle 0.9 cpu) — **cold-start, NOT steady-state; NOT this machine** |
+
+> Spread caveat (observed, honest): both medians above were captured with the box
+> otherwise idle. A re-run of the same bench command *while a second cargo job
+> was loading the same cores* gave 385 µs (person-count) / 973 µs (pose), and
+> the criterion low/high bracket widened to ~0.34–1.18 ms under contention. The
+> 305 µs figures are the idle-box datapoints; the absolute number is host- and
+> load-dependent (the ~3× pose swing, 305 → 973 µs, is core contention, not a code change).
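The contention swing can be checked directly against the figures quoted above; the sketch below only divides the loaded-box numbers by the 305 µs idle median (`slowdown` is an illustrative helper). It gives roughly 1.26× for person-count, about 3.2× for pose, and under 4× even at the ~1.18 ms top of the widened bracket.

```rust
/// Ratio of a loaded-box median to the idle-box median (how many times slower).
fn slowdown(idle_us: f64, loaded_us: f64) -> f64 {
    loaded_us / idle_us
}

fn main() {
    let idle_us = 305.0; // idle-box median, both cogs
    for (cog, loaded_us) in [("person-count", 385.0), ("pose", 973.0)] {
        println!("{cog}: {:.2}x slower under contention", slowdown(idle_us, loaded_us));
    }
    // Top of the widened criterion bracket (~1.18 ms) against the idle median.
    println!("bracket top: {:.2}x", slowdown(idle_us, 1_180.0));
}
```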
+
+Reading these honestly:
+
+- **Steady-state ≠ cold-start.** The pose manifest's `5.4 ms` folds in one-time
+  weight load / mmap / first-forward allocation. This bench warms the engine
+  first and times only the recurring per-frame forward, on a *different
+  machine*. The two numbers are not comparable and we do not claim this bench
+  reproduces the 5.4 ms manifest figure.
+- Both cogs share the same conv encoder; person-count adds a count head +
+  confidence head, pose adds a 256-wide MLP head. The host steady-state cost is
+  dominated by the three dilated Conv1d layers (56→64→128→128) shared by both —
+  which is why both land at ~305 µs.
+- **Empirical confirmation of the steady-state/cold-start gap:** pose
+  steady-state (305 µs host) is ~18× *under* the manifest's 5.4 ms cold-start.
+  Even accounting for the different machine, this is the expected shape — the
+  bulk of cold-start is one-time setup, not the forward pass — and it is exactly
+  why conflating the two would be dishonest.
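A back-of-envelope MAC count shows why a shared convolutional encoder would dominate both cogs' forward pass. This is a sketch under stated assumptions: only the 56→64→128→128 channel progression comes from the text; the kernel size of 3, the 64-step window, "same" padding, and the single 128→256 dense head are hypothetical placeholders, not the cogs' real hyper-parameters.

```rust
/// MACs for one Conv1d layer with "same" padding (output length == input length):
/// every output sample of every output channel sums c_in * kernel products.
fn conv1d_macs(c_in: u64, c_out: u64, kernel: u64, len: u64) -> u64 {
    c_in * c_out * kernel * len
}

/// MACs for one dense (fully connected) layer.
fn linear_macs(inputs: u64, outputs: u64) -> u64 {
    inputs * outputs
}

/// Encoder cost for the 56 -> 64 -> 128 -> 128 progression. Dilation widens the
/// receptive field but does not change the number of taps per output sample.
fn encoder_macs(kernel: u64, len: u64) -> u64 {
    [(56, 64), (64, 128), (128, 128)]
        .iter()
        .map(|&(ci, co)| conv1d_macs(ci, co, kernel, len))
        .sum()
}

fn main() {
    // HYPOTHETICAL: kernel 3 and a 64-step window are placeholders only.
    let (kernel, len) = (3, 64);
    let enc = encoder_macs(kernel, len);
    // HYPOTHETICAL head: one 128 -> 256 dense layer on a pooled feature vector.
    let head = linear_macs(128, 256);
    println!(
        "encoder ~{enc} MACs, head ~{head} MACs ({:.1}% of total)",
        100.0 * head as f64 / (enc + head) as f64
    );
}
```

Under these placeholder numbers the head is well under 1% of the work, which is consistent with (but does not prove) the claim that both cogs land at the same latency.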
+
+---
+
+## Status vs the deferred items
+
+| Deferred item | Was | Now |
+|---|---|---|
+| ADR-160 "Criterion benches for `process_frame` budget claims" | ACCEPTED-FUTURE | **DONE (host)**; ESP32-on-hardware still **PENDING** (needs the wasm32 target + a flashed ESP32-S3) |
+| ADR-159/160 cog inference latency (`cold_start_ms_avg` uncommitted-benched) | CLAIMED | **MEASURED-on-host (steady-state)**; cold-start-on-ruvultra remains the manifest's separate claim |
+
+Nothing here changes runtime behavior — these are benches + this results file
+only. No crate needs republishing.
@@ -182,9 +182,15 @@ label or behavior change, consistent with leaving their claim surface intact.)
  sign-language claim requires labelled clinical/affective/ASL data and reference
  standards that do not exist in this repo. The disclaimers + feature gate are the
  honest stand-in. Nothing is claimed that is not measured.
- **Criterion benches for `process_frame` budget claims** — **ACCEPTED-FUTURE**.
-  `tests/budget_compliance.rs` asserts L/S/H tier wall-clock budgets (25 tests,
-  passing), but a regression-grade criterion bench is not yet wired.
+- **Criterion benches for `process_frame` budget claims** — **DONE (host)**
+  (ADR-163, 2026-06-12). `benches/process_frame_bench.rs` benches the heaviest
+  hot paths (`exo_time_crystal` 256×128 autocorrelation, `exo_ghost_hunter`
+  periodicity, `sec_weapon_detect` per-subcarrier Welford, `med_seizure_detect`
+  clonic rhythm) and reports committed **host** medians
+  (`benchmarks/edge-latency/RESULTS.md`). `tests/budget_compliance.rs` continues
+  to assert the L/S/H tier wall-clock budgets (25 tests, passing).
+  **ESP32-on-hardware (Xtensa/WASM3) latency remains PENDING**: the host bench
+  is a native algorithm-cost proxy (a lower bound), NOT the ESP32 figure (needs hardware).
 - **`wasm32-unknown-unknown` `static_mut_refs` confirmation** — **ACCEPTED-FUTURE**
  (toolchain): the source pattern is eliminated; a CI job on the wasm target should
  assert zero `static_mut_refs` once the target is added to the build image.
@@ -0,0 +1,123 @@
+# ADR-163: Edge-Latency Measurement — CLAIMED budgets → MEASURED-on-host
+
+- **Status**: accepted
+- **Date**: 2026-06-12
+- **Deciders**: ruv
+- **Tags**: edge-latency, wasm-edge, esp32, cog-inference, criterion, prove-everything, measurement-debt
+- **Amends**: ADR-160 (deferred "criterion benches for process_frame budget claims" line now DONE-on-host); ADR-159 (cog inference latency)
+
+## Context — Milestone 9 of the beyond-SOTA sweep
+
+Prior milestones (M5/M6, ADR-159/ADR-160) flagged **measurement debt**: edge
+latency budgets asserted in doc-comments and manifests but **never reproduced by
+a committed benchmark**. Specifically:
+
+- Many `wifi-densepose-wasm-edge` skill modules document a timing budget *"on
+  ESP32-S3 WASM3"* (e.g. `exo_time_crystal`: "H (heavy, <10 ms)"). These were
+  **CLAIMED**, not benchmarked. ADR-160's deferred backlog named exactly this:
+  *"Criterion benches for `process_frame` budget claims — ACCEPTED-FUTURE."*
+- `cog-pose-estimation`'s manifest cites `cold_start_ms_avg: 5.4`, but neither
+  cog had a `benches/` directory or any committed inference-latency number.
+
+Under the project's **prove-everything / anti-"AI-slop"** directive, a CLAIMED
+latency budget that a skeptic cannot reproduce is debt. M9 pays it down — benches
+and docs only, **no production-code behavior change** (so nothing republishes).
+
+## Headline
+
+**Converted the CLAIMED edge-latency budgets into MEASURED-on-host numbers, with
+the honest host-vs-ESP32 caveat stated everywhere.** Added committed criterion
+benches over the heaviest hot paths and a results file a skeptic can re-run. The
+ESP32-on-hardware figure remains explicitly **UNMEASURED** — this milestone does
+not pretend a laptop reproduces an Xtensa/WASM3 budget.
+
+## Decision — benches landed
+
+### T1 — wasm-edge `process_frame` budget benches
+
+`v2/crates/wifi-densepose-wasm-edge/benches/process_frame_bench.rs` (criterion,
+`harness = false`, `required-features = ["std"]`). The crate is **excluded from
+the v2 workspace**, so it runs from the crate dir. Benches the M6-audit-named
+heaviest hot paths over a **fixed synthetic CSI frame**, each driven through the
+public `process_frame` after warming the relevant ring/phase buffers so the
+expensive path actually executes:
+
+- `exo_time_crystal::process_frame` — full 256-pt × 128-lag autocorrelation.
+- `exo_ghost_hunter::process_frame` — empty-room periodicity / hidden-breathing.
+- `sec_weapon_detect::process_frame` — per-subcarrier (MAX_SC=32) Welford.
+- `med_seizure_detect::process_frame` — clonic-rhythm path (`#[cfg(feature =
+  "medical-experimental")]`, only built/run with that gate).
+
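For readers unfamiliar with the third hot path, a per-subcarrier Welford update keeps a running mean and variance per subcarrier with O(1) work per sample, so one frame costs `MAX_SC` updates. The sketch below is a generic Welford implementation, not the `sec_weapon_detect` code; `Welford`, `PerSubcarrier`, and the synthetic frames are hypothetical.

```rust
const MAX_SC: usize = 32; // subcarrier count named in the text

/// Running mean/variance via Welford's online algorithm.
#[derive(Clone, Copy, Default, Debug)]
struct Welford {
    n: u32,
    mean: f32,
    m2: f32, // sum of squared deviations from the running mean
}

impl Welford {
    fn update(&mut self, x: f32) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f32;
        self.m2 += delta * (x - self.mean); // uses the updated mean
    }

    /// Sample variance (n - 1 denominator); 0 until two samples are seen.
    fn variance(&self) -> f32 {
        if self.n < 2 { 0.0 } else { self.m2 / (self.n - 1) as f32 }
    }
}

/// One Welford accumulator per subcarrier (HYPOTHETICAL structure).
struct PerSubcarrier {
    stats: [Welford; MAX_SC],
}

impl PerSubcarrier {
    fn new() -> Self {
        Self { stats: [Welford::default(); MAX_SC] }
    }

    /// One frame = one amplitude per subcarrier; zip stops at MAX_SC.
    fn process_frame(&mut self, amplitudes: &[f32]) {
        for (s, &a) in self.stats.iter_mut().zip(amplitudes.iter()) {
            s.update(a);
        }
    }
}

fn main() {
    let mut acc = PerSubcarrier::new();
    // Four synthetic frames; every subcarrier sees amplitudes 1, 2, 3, 4.
    for v in [1.0f32, 2.0, 3.0, 4.0] {
        acc.process_frame(&[v; MAX_SC]);
    }
    let s = acc.stats[0];
    println!("sc0: n={} mean={} var={:.4}", s.n, s.mean, s.variance());
}
```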
+The lib's `bench = false` was set so the libtest harness does not intercept
+criterion CLI flags; the `ghost_hunter` bin is already `standalone-bin`-gated and
+not built under `--features std`.
+
+**Measured host medians** (Intel Core Ultra 9 285H, native `--release`):
+`exo_time_crystal` **17.3 µs** · `exo_ghost_hunter` **1.44 µs** ·
+`sec_weapon_detect` **0.42 µs** · `med_seizure_detect` **0.10 µs**.
+
+### T2 — cog inference latency benches
+
+`v2/crates/cog-person-count/benches/infer_bench.rs` and
+`v2/crates/cog-pose-estimation/benches/infer_bench.rs` (criterion,
+`harness = false`). Each loads the **real** shipped weights from the in-repo
+`cog/artifacts/`, asserts the Candle CPU backend (so the stub can never be
+silently benched), warms one forward, then times steady-state
+`InferenceEngine::infer` over a fixed CSI window on `Device::Cpu`.
+
+**Measured host medians:** cog-person-count **305 µs** · cog-pose-estimation
+**305 µs** (steady-state, CPU, real weights).
+
+### T3 — results file
+
+`benchmarks/edge-latency/RESULTS.md`, in the `benchmarks/wiflow-std/RESULTS.md`
+style: each number with its exact reproduce command, the machine, the
+MEASURED-on-host grade, and the honest caveat.
+
+## The honest caveat (recorded, non-negotiable)
+
+1. **Host ≠ ESP32.** The wasm-edge benches run native x86_64, not Xtensa/WASM3.
+   A host median is the **algorithm's native cost** (a lower bound on ESP32 latency), not the ESP32 number;
+   WASM3 interpretation on a ~240 MHz core is 1–2 orders of magnitude slower than
+   native `-O`. A host median under budget does **not** prove the ESP32 meets it.
+   **The ESP32 figure is NOT reproduced here — it needs hardware.**
+2. **Bench ≠ the doc-claimed measurement.** The pose cog's manifest cites a
+   **cold-start** number (weight-load included); these benches measure
+   **steady-state** per-frame `infer`. We report both, labelled, and do not
+   conflate them. Empirically, pose steady-state (305 µs host) is ~18× under the
+   5.4 ms cold-start: the expected shape, and exactly why conflating them would mislead.
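The cold-start/steady-state distinction in caveat 2 can be sketched with `std::time::Instant` alone. This is not the manifest's harness or the criterion bench: `fake_setup` (a 5 ms sleep standing in for weight loading) and `fake_infer` (a sum of squares standing in for a forward pass) are hypothetical. The cold figure times setup plus the first call once; the steady figure is the median of repeated calls on already-initialised state, which is the shape of what `infer_bench` measures.

```rust
use std::hint::black_box;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Times `setup` + the first `call` together (a cold-start figure), then the
/// median of `iters` further calls on the already-initialised state (steady-state).
fn cold_vs_steady<T>(
    setup: impl FnOnce() -> T,
    mut call: impl FnMut(&T),
    iters: usize,
) -> (Duration, Duration) {
    assert!(iters > 0, "need at least one steady-state sample");
    let t0 = Instant::now();
    let state = setup(); // one-time cost
    call(&state); // first call, still counted as cold
    let cold = t0.elapsed();

    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let t = Instant::now();
            call(&state);
            t.elapsed()
        })
        .collect();
    samples.sort();
    (cold, samples[iters / 2])
}

/// HYPOTHETICAL model state.
struct FakeModel {
    weights: Vec<f32>,
}

/// HYPOTHETICAL setup: the 5 ms sleep stands in for loading weights.
fn fake_setup() -> FakeModel {
    sleep(Duration::from_millis(5));
    FakeModel { weights: vec![0.5; 4096] }
}

/// HYPOTHETICAL per-frame work over the resident weights.
fn fake_infer(m: &FakeModel) {
    black_box(m.weights.iter().map(|x| x * x).sum::<f32>());
}

fn main() {
    let (cold, steady) = cold_vs_steady(fake_setup, fake_infer, 101);
    println!("cold-start: {cold:?}, steady-state median: {steady:?}");
}
```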
+
+## Deferred / still-pending (nothing dropped)
+
+- **ESP32-on-hardware `process_frame` latency** — **PENDING (hardware)**. Needs
+  the `wasm32-unknown-unknown` target built + flashed to an ESP32-S3 and timed
+  under WASM3. The host bench is the algorithm-cost proxy until then.
+- **Per-skill *accuracy*** remains **DATA-GATED** (unchanged from ADR-160) —
+  this ADR measures latency only, never claims detection accuracy.
+
+## Reproduction (MEASURED)
+
+```bash
+# T1 — wasm-edge (workspace-excluded → run from the crate dir)
+cd v2/crates/wifi-densepose-wasm-edge
+cargo bench --features std -- --warm-up-time 1 --measurement-time 2
+cargo bench --features std,medical-experimental -- --warm-up-time 1 --measurement-time 2 med_seizure
+
+# T2 — cogs (workspace members)
+cd ../..   # back to v2/ (the workspace root)
+cargo bench -p cog-person-count   --no-default-features --bench infer_bench
+cargo bench -p cog-pose-estimation --no-default-features --bench infer_bench
+
+# existing tests still green (behavior unchanged)
+cargo test -p cog-person-count -p cog-pose-estimation --no-default-features
+```
+
+## Consequences
+
+- ADR-160's deferred *"Criterion benches for `process_frame` budget claims"* line
+  is now **DONE (host)**; the ESP32-on-hardware confirmation is explicitly the
+  one remaining pending item.
+- The cogs now ship committed, reproducible steady-state inference-latency
+  numbers, cleanly distinguished from the manifest's cold-start claim.
+- No runtime behavior changed; no crate republishes. `PROOF.md`'s performance
+  table and `scripts/prove.sh`'s gated section reference the new benches.
@@ -131,6 +131,7 @@ else
  SKIP "named person-identity — DATA-GATED: needs a real enrollment feeding the AETHER/body-resonance channel (see docs/research/soul/)"
  SKIP "OccWorld trained accuracy — needs a trained checkpoint (predict() carries weights_trained=false until then)"
  SKIP "native wlanapi 9.74 Hz scan — Windows-only; run: cargo test -p wifi-densepose-wifiscan -- --ignored measure_native_scan_rate"
+  SKIP "edge-latency benches (ADR-163) — host medians, not asserted here: (cd v2/crates/wifi-densepose-wasm-edge && cargo bench --features std) and (cd v2 && cargo bench -p cog-person-count -p cog-pose-estimation --no-default-features --bench infer_bench). HOST proxy only — the ESP32/WASM3 budget is NOT reproduced on a laptop; see benchmarks/edge-latency/RESULTS.md"
  echo "  (re-run with --full to attempt the feature-gated subset where prereqs exist)"
 fi
 hr
@@ -1015,6 +1015,7 @@ dependencies = [
 "candle-core 0.9.2",
 "candle-nn 0.9.2",
 "clap",
+ "criterion",
 "safetensors 0.4.5",
 "serde",
 "serde_json",
@@ -1034,6 +1035,7 @@ dependencies = [
 "candle-core 0.9.2",
 "candle-nn 0.9.2",
 "clap",
+ "criterion",
 "hex",
 "safetensors 0.4.5",
 "serde",
@@ -34,6 +34,12 @@ safetensors = "0.4"
 [dev-dependencies]
 tempfile = "3"
 approx = "0.5"
+# ADR-163: steady-state infer latency bench (real count_v1 weights, Device::Cpu).
+criterion = { version = "0.5", features = ["html_reports"] }
+
+[[bench]]
+name = "infer_bench"
+harness = false

 [features]
 default = []
@@ -0,0 +1,95 @@
+//! Criterion bench for `cog-person-count` steady-state inference latency
+//! (ADR-163, closing the ADR-159/160 deferred "cog inference latency bench" item).
+//!
+//! ## What this measures — and what the manifest's `cold_start_ms` does NOT
+//!
+//! This benches **steady-state** `InferenceEngine::infer` over a FIXED CSI
+//! window on `Device::Cpu` with the **real** shipped `count_v1.safetensors`
+//! weights — i.e. the per-frame cost once the model is loaded and warm.
+//!
+//! The cog manifest's `build_metadata.cold_start_ms_avg` (in the pose cog;
+//! person-count's manifest carries comparable provenance) is a **DIFFERENT
+//! measurement**: it includes one-time weight load / mmap / first-forward
+//! allocation. Cold-start is a startup cost paid once; steady-state infer is the
+//! recurring per-frame cost. They are not comparable and we do not conflate them.
+//! `cold_start` was measured on ruvultra (RTX 5080 host, candle 0.9 cpu); this
+//! bench runs on whatever machine you run it on — see `benchmarks/edge-latency/RESULTS.md`
+//! for the host the committed numbers were taken on.
+//!
+//! If the weights file is absent the engine falls back to the zero-confidence
+//! stub. We do not benchmark the stub (that number would be meaningless);
+//! instead the bench prints a notice and times a clearly-labelled no-op so
+//! criterion still produces a datapoint.
+//!
+//! Run (cog crates are normal workspace members):
+//!   cd v2 && cargo bench -p cog-person-count --no-default-features
+//!   cd v2 && cargo bench -p cog-person-count --no-default-features -- --warm-up-time 1 --measurement-time 2
+
+use std::hint::black_box;
+use std::path::Path;
+
+use criterion::{criterion_group, criterion_main, Criterion};
+
+use cog_person_count::inference::{CsiWindow, InferenceEngine, INPUT_SUBCARRIERS, INPUT_TIMESTEPS};
+
+/// Deterministic fixed CSI window (seed-stable LCG), normalised-ish amplitudes.
+fn fixed_window() -> CsiWindow {
+    let mut s = 0x00C0_FFEEu32;
+    let data: Vec<f32> = (0..INPUT_SUBCARRIERS * INPUT_TIMESTEPS)
+        .map(|_| {
+            s = s.wrapping_mul(1103515245).wrapping_add(12345);
+            (s >> 16) as f32 / 32768.0 // (s >> 16) is 0..=65535, so this spans [0, 2)
+        })
+        .collect();
+    CsiWindow { data }
+}
+
+/// Locate the real weights from the crate dir or the repo root.
+fn real_weights() -> Option<std::path::PathBuf> {
+    let candidates = [
+        "cog/artifacts/count_v1.safetensors",
+        "v2/crates/cog-person-count/cog/artifacts/count_v1.safetensors",
+        "crates/cog-person-count/cog/artifacts/count_v1.safetensors",
+    ];
+    candidates
+        .iter()
+        .map(Path::new)
+        .find(|p| p.exists())
+        .map(|p| p.to_path_buf())
+}
+
+fn bench_infer(c: &mut Criterion) {
+    let window = fixed_window();
+
+    match real_weights() {
+        Some(path) => {
+            let engine =
+                InferenceEngine::with_weights(Some(&path)).expect("load real count_v1 weights");
+            assert!(
+                engine.backend().starts_with("candle-"),
+                "expected real Candle backend, got {} — bench would measure the stub",
+                engine.backend()
+            );
+            // Sanity: one real inference before timing.
+            let _ = engine.infer(&window).expect("warmup infer");
+
+            c.bench_function("cog_person_count::infer[cpu_real_weights_steady_state]", |b| {
+                b.iter(|| {
+                    black_box(engine.infer(black_box(&window)).expect("infer"));
+                });
+            });
+        }
+        None => {
+            eprintln!(
+                "NOTE: count_v1.safetensors not found — skipping the real-weights infer bench. \
+                 (The committed RESULTS.md numbers require the in-repo weights.)"
+            );
+            c.bench_function("cog_person_count::infer[SKIPPED_no_weights]", |b| {
+                b.iter(|| black_box(1 + 1));
+            });
+        }
+    }
+}
+
+criterion_group!(benches, bench_infer);
+criterion_main!(benches);
@@ -39,6 +39,12 @@ wifi-densepose-train = { version = "0.3.1", path = "../wifi-densepose-train", de

 [dev-dependencies]
 tempfile = "3"
+# ADR-163: steady-state infer latency bench (real pose_v1 weights, Device::Cpu).
+criterion = { version = "0.5", features = ["html_reports"] }
+
+[[bench]]
+name = "infer_bench"
+harness = false

 [features]
 default = []
@@ -0,0 +1,89 @@
+//! Criterion bench for `cog-pose-estimation` steady-state inference latency
+//! (ADR-163, closing the ADR-159/160 deferred "cog inference latency bench" item).
+//!
+//! ## What this measures — and what the manifest's `cold_start_ms_avg` does NOT
+//!
+//! The pose cog's manifest (`cog/artifacts/manifests/x86_64/manifest.json`)
+//! cites `build_metadata.cold_start_ms_avg: 5.4` (30 invocations, measured on
+//! ruvultra / RTX 5080 host, candle 0.9 cpu). **That is a cold-start number** —
+//! it folds in one-time weight load / mmap / first-forward allocation.
+//!
+//! This bench measures the **steady-state** per-frame cost instead:
+//! `InferenceEngine::infer` over a FIXED CSI window on `Device::Cpu` with the
+//! **real** shipped `pose_v1.safetensors`, after a warm-up forward. Steady-state
+//! and cold-start are different measurements; we label both honestly and do not
+//! claim this reproduces the 5.4 ms manifest figure (different machine, different
+//! measurement). See `benchmarks/edge-latency/RESULTS.md`.
+//!
+//! Run (cog crates are normal workspace members):
+//!   cd v2 && cargo bench -p cog-pose-estimation --no-default-features
+//!   cd v2 && cargo bench -p cog-pose-estimation --no-default-features -- --warm-up-time 1 --measurement-time 2
+
+use std::hint::black_box;
+use std::path::Path;
+
+use criterion::{criterion_group, criterion_main, Criterion};
+
+use cog_pose_estimation::inference::{
+    CsiWindow, InferenceEngine, INPUT_SUBCARRIERS, INPUT_TIMESTEPS,
+};
+
+/// Deterministic fixed CSI window (seed-stable LCG).
+fn fixed_window() -> CsiWindow {
+    let mut s = 0x00C0_FFEEu32;
+    let data: Vec<f32> = (0..INPUT_SUBCARRIERS * INPUT_TIMESTEPS)
+        .map(|_| {
+            s = s.wrapping_mul(1103515245).wrapping_add(12345);
+            (s >> 16) as f32 / 32768.0 // (s >> 16) is 0..=65535, so this spans [0, 2)
+        })
+        .collect();
+    CsiWindow { data }
+}
+
+fn real_weights() -> Option<std::path::PathBuf> {
+    let candidates = [
+        "cog/artifacts/pose_v1.safetensors",
+        "v2/crates/cog-pose-estimation/cog/artifacts/pose_v1.safetensors",
+        "crates/cog-pose-estimation/cog/artifacts/pose_v1.safetensors",
+    ];
+    candidates
+        .iter()
+        .map(Path::new)
+        .find(|p| p.exists())
+        .map(|p| p.to_path_buf())
+}
+
+fn bench_infer(c: &mut Criterion) {
+    let window = fixed_window();
+
+    match real_weights() {
+        Some(path) => {
+            let engine =
+                InferenceEngine::with_weights(Some(&path)).expect("load real pose_v1 weights");
+            assert!(
+                engine.backend().starts_with("candle-"),
+                "expected real Candle backend, got {} — bench would measure the stub",
+                engine.backend()
+            );
+            let _ = engine.infer(&window).expect("warmup infer");
+
+            c.bench_function("cog_pose_estimation::infer[cpu_real_weights_steady_state]", |b| {
+                b.iter(|| {
+                    black_box(engine.infer(black_box(&window)).expect("infer"));
+                });
+            });
+        }
+        None => {
+            eprintln!(
+                "NOTE: pose_v1.safetensors not found — skipping the real-weights infer bench. \
+                 (The committed RESULTS.md numbers require the in-repo weights.)"
+            );
+            c.bench_function("cog_pose_estimation::infer[SKIPPED_no_weights]", |b| {
+                b.iter(|| black_box(1 + 1));
+            });
+        }
+    }
+}
+
+criterion_group!(benches, bench_infer);
+criterion_main!(benches);
@@ -2,6 +2,33 @@
 # It is not intended for manual editing.
 version = 4

+[[package]]
+name = "aho-corasick"
+version = "1.1.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301"
+dependencies = [
+ "memchr",
+]
+
+[[package]]
+name = "anes"
+version = "0.1.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4b46cbb362ab8752921c97e041f5e366ee6297bd428a31275b9fcf1e380f7299"
+
+[[package]]
+name = "anstyle"
+version = "1.0.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "940b3a0ca603d1eade50a4846a2afffd5ef57a9feac2c0e2ec2e14f9ead76000"
+
+[[package]]
+name = "autocfg"
+version = "1.5.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f2032f911046de80f0a198e0901378627c33f59ea0ac00e363d481118bd70a53"
+
 [[package]]
 name = "block-buffer"
 version = "0.10.4"
@@ -11,12 +38,76 @@ dependencies = [
 "generic-array",
 ]

+[[package]]
+name = "bumpalo"
+version = "3.20.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "72f5acc6cb2ba439de613abc23857ec3d78374d8ed5ac84e9d11336e87da8649"
+
+[[package]]
+name = "cast"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
+
 [[package]]
 name = "cfg-if"
 version = "1.0.4"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"

+[[package]]
+name = "ciborium"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "42e69ffd6f0917f5c029256a24d0161db17cea3997d185db0d35926308770f0e"
+dependencies = [
+ "ciborium-io",
+ "ciborium-ll",
+ "serde",
+]
+
+[[package]]
+name = "ciborium-io"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "05afea1e0a06c9be33d539b876f1ce3692f4afea2cb41f740e7743225ed1c757"
+
+[[package]]
+name = "ciborium-ll"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "57663b653d948a338bfb3eeba9bb2fd5fcfaecb9e199e87e1eda4d9e8b240fd9"
+dependencies = [
+ "ciborium-io",
+ "half",
+]
+
+[[package]]
+name = "clap"
+version = "4.6.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1ddb117e43bbf7dacf0a4190fef4d345b9bad68dfc649cb349e7d17d28428e51"
+dependencies = [
+ "clap_builder",
+]
+
+[[package]]
+name = "clap_builder"
+version = "4.6.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "714a53001bf66416adb0e2ef5ac857140e7dc3a0c48fb28b2f10762fc4b5069f"
+dependencies = [
+ "anstyle",
+ "clap_lex",
+]
+
+[[package]]
+name = "clap_lex"
+version = "1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9"
+
 [[package]]
 name = "cpufeatures"
 version = "0.2.17"
@@ -26,6 +117,73 @@ dependencies = [
 "libc",
 ]

+[[package]]
+name = "criterion"
+version = "0.5.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f2b12d017a929603d80db1831cd3a24082f8137ce19c69e6447f54f5fc8d692f"
+dependencies = [
+ "anes",
+ "cast",
+ "ciborium",
+ "clap",
+ "criterion-plot",
+ "is-terminal",
+ "itertools",
+ "num-traits",
+ "once_cell",
+ "oorandom",
+ "plotters",
+ "rayon",
+ "regex",
+ "serde",
+ "serde_derive",
+ "serde_json",
+ "tinytemplate",
+ "walkdir",
+]
+
+[[package]]
+name = "criterion-plot"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6b50826342786a51a89e2da3a28f1c32b06e387201bc2d19791f622c673706b1"
+dependencies = [
+ "cast",
+ "itertools",
+]
+
+[[package]]
+name = "crossbeam-deque"
+version = "0.8.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51"
+dependencies = [
+ "crossbeam-epoch",
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-epoch"
+version = "0.9.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e"
+dependencies = [
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-utils"
+version = "0.8.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
+
+[[package]]
+name = "crunchy"
+version = "0.2.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5"
+
 [[package]]
 name = "crypto-common"
 version = "0.1.7"
@@ -46,6 +204,36 @@ dependencies = [
 "crypto-common",
 ]

+[[package]]
+name = "either"
+version = "1.16.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "91622ff5e7162018101f2fea40d6ebf4a78bbe5a49736a2020649edf9693679e"
+
+[[package]]
+name = "futures-core"
+version = "0.3.32"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d"
+
+[[package]]
+name = "futures-task"
+version = "0.3.32"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393"
+
+[[package]]
+name = "futures-util"
+version = "0.3.32"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6"
+dependencies = [
+ "futures-core",
+ "futures-task",
+ "pin-project-lite",
+ "slab",
+]
+
 [[package]]
 name = "generic-array"
 version = "0.14.7"
@@ -56,6 +244,60 @@ dependencies = [
 "version_check",
 ]

+[[package]]
+name = "half"
+version = "2.7.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6ea2d84b969582b4b1864a92dc5d27cd2b77b622a8d79306834f1be5ba20d84b"
+dependencies = [
+ "cfg-if",
+ "crunchy",
+ "zerocopy",
+]
+
+[[package]]
+name = "hermit-abi"
+version = "0.5.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fc0fef456e4baa96da950455cd02c081ca953b141298e41db3fc7e36b1da849c"
+
+[[package]]
+name = "is-terminal"
+version = "0.4.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3640c1c38b8e4e43584d8df18be5fc6b0aa314ce6ebf51b53313d4306cca8e46"
+dependencies = [
+ "hermit-abi",
+ "libc",
+ "windows-sys",
+]
+
+[[package]]
+name = "itertools"
+version = "0.10.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b0fd2260e829bddf4cb6ea802289de2f86d6a7a690192fbe91b3f46e0f2c8473"
+dependencies = [
+ "either",
+]
+
+[[package]]
+name = "itoa"
+version = "1.0.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682"
+
+[[package]]
+name = "js-sys"
+version = "0.3.100"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f2025f20d7a4fa7785846e7b63d10a76d3f1cee98ee5cb79ea59703f95e42162"
+dependencies = [
+ "cfg-if",
+ "futures-util",
+ "wasm-bindgen",
+]
+
 [[package]]
 name = "libc"
 version = "0.2.182"
@@ -68,6 +310,192 @@ version = "0.2.16"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "b6d2cec3eae94f9f509c767b45932f1ada8350c4bdb85af2fcab4a3c14807981"

+[[package]]
+name = "memchr"
+version = "2.8.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "88904434abc2901f197fe8cc55f0445e7ded921dba5911dad2e2b39b48e663c4"
+
+[[package]]
+name = "num-traits"
+version = "0.2.19"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841"
+dependencies = [
+ "autocfg",
+]
+
+[[package]]
+name = "once_cell"
+version = "1.21.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9f7c3e4beb33f85d45ae3e3a1792185706c8e16d043238c593331cc7cd313b50"
+
+[[package]]
+name = "oorandom"
+version = "11.1.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d6790f58c7ff633d8771f42965289203411a5e5c68388703c06e14f24770b41e"
+
+[[package]]
+name = "pin-project-lite"
+version = "0.2.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd"
+
+[[package]]
+name = "plotters"
+version = "0.3.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5aeb6f403d7a4911efb1e33402027fc44f29b5bf6def3effcc22d7bb75f2b747"
+dependencies = [
+ "num-traits",
+ "plotters-backend",
+ "plotters-svg",
+ "wasm-bindgen",
+ "web-sys",
+]
+
+[[package]]
+name = "plotters-backend"
+version = "0.3.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "df42e13c12958a16b3f7f4386b9ab1f3e7933914ecea48da7139435263a4172a"
+
+[[package]]
+name = "plotters-svg"
+version = "0.3.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "51bae2ac328883f7acdfea3d66a7c35751187f870bc81f94563733a154d7a670"
+dependencies = [
+ "plotters-backend",
+]
+
+[[package]]
+name = "proc-macro2"
+version = "1.0.106"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "quote"
+version = "1.0.45"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924"
+dependencies = [
+ "proc-macro2",
+]
+
+[[package]]
+name = "rayon"
+version = "1.12.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fb39b166781f92d482534ef4b4b1b2568f42613b53e5b6c160e24cfbfa30926d"
+dependencies = [
+ "either",
+ "rayon-core",
+]
+
+[[package]]
+name = "rayon-core"
+version = "1.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "22e18b0f0062d30d4230b2e85ff77fdfe4326feb054b9783a3460d8435c8ab91"
+dependencies = [
+ "crossbeam-deque",
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "regex"
+version = "1.12.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f1292b7759ae1cb9ec195452d1390a074f0cd8541ab7a5a8c31cd6db45d4a6ba"
+dependencies = [
+ "aho-corasick",
+ "memchr",
+ "regex-automata",
+ "regex-syntax",
+]
+
+[[package]]
+name = "regex-automata"
+version = "0.4.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f"
+dependencies = [
+ "aho-corasick",
+ "memchr",
+ "regex-syntax",
+]
+
+[[package]]
+name = "regex-syntax"
+version = "0.8.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d6f6ff9a378485b298a5286656da665ba74413d36db0979633275d2e708145d4"
+
+[[package]]
+name = "rustversion"
+version = "1.0.22"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d"
+
+[[package]]
+name = "same-file"
+version = "1.0.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "93fc1dc3aaa9bfed95e02e6eadabb4baf7e3078b0bd1b4d7b6b0b68378900502"
+dependencies = [
+ "winapi-util",
+]
+
+[[package]]
+name = "serde"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e"
+dependencies = [
+ "serde_core",
+ "serde_derive",
+]
+
+[[package]]
+name = "serde_core"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad"
+dependencies = [
+ "serde_derive",
+]
+
+[[package]]
+name = "serde_derive"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "serde_json"
+version = "1.0.150"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e8014e44b4736ed0538adeecded0fce2a272f22dc9578a7eb6b2d9993c74cfb9"
+dependencies = [
+ "itoa",
+ "memchr",
+ "serde",
+ "serde_core",
+ "zmij",
+]
+
 [[package]]
 name = "sha2"
 version = "0.10.9"
@@ -79,22 +507,171 @@ dependencies = [
 "digest",
 ]

+[[package]]
+name = "slab"
+version = "0.4.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5"
+
+[[package]]
+name = "syn"
+version = "2.0.117"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "unicode-ident",
+]
+
+[[package]]
+name = "tinytemplate"
+version = "1.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "be4d6b5f19ff7664e8c98d03e2139cb510db9b0a60b55f8e8709b689d939b6bc"
+dependencies = [
+ "serde",
+ "serde_json",
+]
+
 [[package]]
 name = "typenum"
 version = "1.19.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "562d481066bde0658276a35467c4af00bdc6ee726305698a55b86e61d7ad82bb"

+[[package]]
+name = "unicode-ident"
+version = "1.0.24"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75"
+
 [[package]]
 name = "version_check"
 version = "0.9.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"

+[[package]]
+name = "walkdir"
+version = "2.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "29790946404f91d9c5d06f9874efddea1dc06c5efe94541a7d6863108e3a5e4b"
+dependencies = [
+ "same-file",
+ "winapi-util",
+]
+
+[[package]]
+name = "wasm-bindgen"
+version = "0.2.123"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a254a4b10c19a76f09a27640e7ffbf9bc30bf67e16a3bf28aaefa4920fe81563"
+dependencies = [
+ "cfg-if",
+ "once_cell",
+ "rustversion",
+ "wasm-bindgen-macro",
+ "wasm-bindgen-shared",
+]
+
+[[package]]
+name = "wasm-bindgen-macro"
+version = "0.2.123"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "24a40fc75b0ec6f3746ceb10d36f53a93dcd68a93b11b6445983945d79eba0dc"
+dependencies = [
+ "quote",
+ "wasm-bindgen-macro-support",
+]
+
+[[package]]
+name = "wasm-bindgen-macro-support"
+version = "0.2.123"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "908f34bd9b9ce3d4caf07b72dfab63d61504d156856c6bd3cd87fa350cf3985b"
+dependencies = [
+ "bumpalo",
+ "proc-macro2",
+ "quote",
+ "syn",
+ "wasm-bindgen-shared",
+]
+
+[[package]]
+name = "wasm-bindgen-shared"
+version = "0.2.123"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7acbf7616c27b194bbb550bf77ed0c2c3e5b7fd1260a93082b95fb7f47959b92"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "web-sys"
+version = "0.3.100"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6e0871acf327f283dc6da28a1696cdc64fb355ba9f935d052021fa77f35cce69"
+dependencies = [
+ "js-sys",
+ "wasm-bindgen",
+]
+
 [[package]]
 name = "wifi-densepose-wasm-edge"
 version = "0.3.0"
 dependencies = [
+ "criterion",
 "libm",
 "sha2",
 ]
+
+[[package]]
+name = "winapi-util"
+version = "0.1.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c2a7b1c03c876122aa43f3020e6c3c3ee5c05081c9a00739faf7503aeba10d22"
+dependencies = [
+ "windows-sys",
+]
+
+[[package]]
+name = "windows-link"
+version = "0.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5"
+
+[[package]]
+name = "windows-sys"
+version = "0.61.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc"
+dependencies = [
+ "windows-link",
+]
+
+[[package]]
+name = "zerocopy"
+version = "0.8.52"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ce1022995ff5ff5d841ad7d994facc23098cd40152f2c1d11cd607c6f530653f"
+dependencies = [
+ "zerocopy-derive",
+]
+
+[[package]]
+name = "zerocopy-derive"
+version = "0.8.52"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1ae7f38b72ec2a254e2b87ef277cf2cd4fb97cbebf944faa6f33354da0867930"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "zmij"
+version = "1.0.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa"
@@ -11,6 +11,20 @@ categories = ["embedded", "wasm", "science"]

 [lib]
 crate-type = ["cdylib", "rlib"]
+# The lib's libtest harness does not understand criterion CLI flags
+# (`--warm-up-time` etc.), so exclude it from `cargo bench` — only the criterion
+# bench target below should receive bench args (ADR-163).
+bench = false
+
+# ADR-163: host-measured process_frame latency benches (closes the ADR-160
+# "criterion benches for process_frame budget claims" deferred item — HOST only;
+# the ESP32-S3 WASM3 budget remains unmeasured, see the bench header).
+# `std` is required (criterion is a host crate); the crate is workspace-EXCLUDED
+# so run from the crate dir: `cargo bench --features std`.
+[[bench]]
+name = "process_frame_bench"
+harness = false
+required-features = ["std"]

 [dependencies]
 # no_std math
@@ -18,6 +32,11 @@ libm = "0.2"
 # SHA-256 for RVF build hash (optional, used by builder)
 sha2 = { version = "0.10", optional = true, default-features = false }

+[dev-dependencies]
+# Host-only latency regression benches (ADR-163). Same criterion 0.5 line as
+# the rest of the workspace's bench crates.
+criterion = { version = "0.5", features = ["html_reports"] }
+
 [features]
 default = ["default-pipeline"]
 # Enable std for testing on host + RVF builder
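With `harness = false`, the `process_frame_bench` target supplies its own `main` through `criterion_main!`. Each bench then uses `iter_batched`, so the warm-up (for example the 300-frame `prefilled_time_crystal`) runs as untimed setup and only one `process_frame` call is measured. The std-only sketch below illustrates that setup/routine split with `std::time::Instant`. `time_batched` and `median` are hypothetical helpers, not criterion's API, and the sketch omits criterion's warm-up phase, sample scaling, and outlier analysis.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: run `setup` untimed, then time only `routine` on the
/// fresh state, `iters` times. Mirrors the idea behind criterion's
/// `iter_batched` with one input per batch (not its actual implementation).
fn time_batched<S, T>(
    iters: usize,
    mut setup: impl FnMut() -> S,
    mut routine: impl FnMut(S) -> T,
) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let state = setup(); // excluded from the measurement
        let start = Instant::now();
        // black_box discourages the optimizer from discarding the routine's result.
        let _out = std::hint::black_box(routine(state));
        samples.push(start.elapsed());
    }
    samples
}

/// Median of the collected samples (upper median for even lengths).
fn median(mut samples: Vec<Duration>) -> Duration {
    samples.sort();
    samples[samples.len() / 2]
}
```

For example, `median(time_batched(101, || vec![1.0f32; 256], |v: Vec<f32>| v.iter().sum::<f32>()))` times only the summation, never the allocation of the input vector.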
@@ -0,0 +1,259 @@
+//! Criterion benches for the heaviest `process_frame` hot paths in the edge
+//! skill library (ADR-163, closing the ADR-160 §"Deferred Backlog" item
+//! "Criterion benches for process_frame budget claims").
+//!
+//! ## HONEST SCOPE — read this before citing any number here
+//!
+//! These benches measure **HOST** wall-clock latency on a development laptop.
+//! The per-module doc budgets (e.g. `exo_time_crystal` "H (heavy, <10ms) on
+//! ESP32-S3 WASM3") are **for a different target**: an Xtensa ESP32-S3 running
+//! the WASM3 interpreter. A native x86_64 host with `-O` is an **upper-bound
+//! proxy for the ALGORITHM cost only**; it is NOT the ESP32 number and does NOT
+//! reproduce the ESP32 budget. WASM3 interpretation on a ~240 MHz Xtensa core is
+//! typically 1-2 orders of magnitude slower than native host code, so a host
+//! median well under the budget does NOT prove the ESP32 meets it — it only
+//! bounds the work. The ESP32 figure remains UNMEASURED (needs hardware).
+//!
+//! What these benches DO prove (MEASURED-on-host):
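+//!   * the hot paths run, on deterministic synthetic CSI input, with a real median;
+//!   * a baseline exists, so a future change that 10×'s the host cost shows up in
+//!     criterion's change report (e.g. against a `--save-baseline` run) during dev,
+//!     even before anyone reflashes an ESP32. No threshold is asserted automatically.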
+//!
+//! Run (the crate is EXCLUDED from the v2 workspace — bench from the crate dir):
+//!   cd v2/crates/wifi-densepose-wasm-edge
+//!   cargo bench --features std
+//!   # quick smoke:
+//!   cargo bench --features std -- --warm-up-time 1 --measurement-time 2
+//!
+//! `med_seizure_detect` is gated behind `medical-experimental`; its bench is
+//! `#[cfg(feature = "medical-experimental")]` and only runs when that feature is
+//! also enabled:
+//!   cargo bench --features std,medical-experimental
+
+use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
+use std::hint::black_box;
+
+use wifi_densepose_wasm_edge::exo_ghost_hunter::GhostHunterDetector;
+use wifi_densepose_wasm_edge::exo_time_crystal::TimeCrystalDetector;
+use wifi_densepose_wasm_edge::sec_weapon_detect::WeaponDetector;
+
+// ── Fixed synthetic CSI fixtures (deterministic LCG, seed-stable) ────────────
+
+/// Deterministic pseudo-random value in [0, 1) from a 32-bit LCG, matching the
+/// generator style used by `tests/budget_compliance.rs`. The upper 16 bits are
+/// masked to 15 so that dividing by 32768 stays below 1.0.
+fn lcg(seed: &mut u32) -> f32 {
+    *seed = seed.wrapping_mul(1103515245).wrapping_add(12345);
+    ((*seed >> 16) & 0x7FFF) as f32 / 32768.0
+}
+
+fn synthetic_phases(n: usize, seed: u32) -> Vec<f32> {
+    let mut s = seed;
+    (0..n).map(|_| lcg(&mut s) * 6.2832 - 3.1416).collect()
+}
+
+fn synthetic_amplitudes(n: usize, seed: u32) -> Vec<f32> {
+    let mut s = seed;
+    (0..n).map(|_| lcg(&mut s) * 10.0 + 0.1).collect()
+}
+
+fn synthetic_variance(n: usize, seed: u32) -> Vec<f32> {
+    let mut s = seed;
+    (0..n).map(|_| lcg(&mut s) * 2.0 + 0.05).collect()
+}
+
+const N_SC: usize = 32; // per-subcarrier width (matches MAX_SC in exo_ghost_hunter and sec_weapon_detect)
+
+// ── exo_time_crystal: compute_autocorrelation 256×128 hot path ───────────────
+//
+// `compute_autocorrelation` is private, so we drive it through the public
+// `process_frame`. To hit the full 256-point × 128-lag autocorrelation the
+// circular buffer must be FULL (≥256 samples) and the signal must be
+// non-constant (the module early-outs on `buf_var < 1e-8`). We pre-fill once
+// with a periodic-plus-noise motion-energy stream, then bench a single
+// `process_frame` (each call recomputes the full 256×128 autocorrelation =
+// ~32K multiply-accumulates, the M6-audit-named hot path).
+
+fn prefilled_time_crystal() -> TimeCrystalDetector {
+    let mut d = TimeCrystalDetector::new();
+    let mut s = 0xC0FFEEu32;
+    // 300 frames (> BUF_LEN=256) so the buffer is full and statistics are warm.
+    for i in 0..300 {
+        // period-10 square wave + small noise → guarantees buf_var > 0 and a
+        // genuine autocorrelation structure (the expensive path runs).
+        let base = if (i % 10) < 5 { 1.0 } else { 0.0 };
+        let me = base + lcg(&mut s) * 0.05;
+        black_box(d.process_frame(black_box(me)));
+    }
+    d
+}
+
+fn bench_exo_time_crystal(c: &mut Criterion) {
+    c.bench_function("exo_time_crystal::process_frame[autocorr_256x128]", |b| {
+        let mut s = 0x1357_9BDFu32;
+        b.iter_batched(
+            prefilled_time_crystal,
+            |mut d| {
+                // One frame = one full 256×128 autocorrelation pass.
+                let base = if (d.frame_count() % 10) < 5 { 1.0 } else { 0.0 };
+                let me = base + lcg(&mut s) * 0.05;
+                black_box(d.process_frame(black_box(me)));
+            },
+            BatchSize::SmallInput,
+        );
+    });
+}
+
+// ── exo_ghost_hunter: periodicity + hidden-breathing hot path ────────────────
+//
+// Heaviest path runs only when the room is reported EMPTY (presence == 0):
+// per-group anomaly accumulation + aggregate-phase autocorrelation for hidden
+// periodic (breathing) signatures. We warm the noise floor + phase buffer first,
+// then bench one empty-room frame.
+
+fn prefilled_ghost_hunter() -> GhostHunterDetector {
+    let mut d = GhostHunterDetector::new();
+    let mut s = 0xBADC0DEu32;
+    // Warm the per-group EWMA noise floors + fill the phase buffer (PHASE_BUF_LEN=64)
+    // with a periodic phase signal so the periodicity autocorrelation has structure.
+    for i in 0..120u32 {
+        let phases: Vec<f32> = (0..N_SC)
+            .map(|k| libm::sinf(i as f32 * 0.4 + k as f32 * 0.1) * 0.3 + lcg(&mut s) * 0.02)
+            .collect();
+        let amps = synthetic_amplitudes(N_SC, 4000 + i);
+        let var = synthetic_variance(N_SC, 4500 + i);
+        black_box(d.process_frame(&phases, &amps, &var, 0, 0.05));
+    }
+    d
+}
+
+fn bench_exo_ghost_hunter(c: &mut Criterion) {
+    let amps = synthetic_amplitudes(N_SC, 9000);
+    let var = synthetic_variance(N_SC, 9500);
+    c.bench_function("exo_ghost_hunter::process_frame[empty_room_periodicity]", |b| {
+        let mut s = 0x2468_ACE0u32;
+        b.iter_batched(
+            prefilled_ghost_hunter,
+            |mut d| {
+                let i = d.frame_count();
+                let phases: Vec<f32> = (0..N_SC)
+                    .map(|k| libm::sinf(i as f32 * 0.4 + k as f32 * 0.1) * 0.3 + lcg(&mut s) * 0.02)
+                    .collect();
+                black_box(d.process_frame(
+                    black_box(&phases),
+                    black_box(&amps),
+                    black_box(&var),
+                    black_box(0),
+                    black_box(0.05),
+                ));
+            },
+            BatchSize::SmallInput,
+        );
+    });
+}
+
+// ── sec_weapon_detect: per-subcarrier Welford hot path ───────────────────────
+//
+// After calibration the detector runs a per-subcarrier online Welford update
+// over MAX_SC=32 subcarriers each frame (the M6-audit-named hot path). We
+// calibrate first (the early frames just accumulate baseline stats), then bench
+// one steady-state frame.
+
+fn calibrated_weapon_detector() -> WeaponDetector {
+    let mut d = WeaponDetector::new();
+    // Drive enough empty-room frames to complete calibration + warm the running
+    // Welford state. Calibration window is internal; 200 frames is comfortably
+    // past it for MAX_SC=32.
+    for i in 0..200u32 {
+        let phases = synthetic_phases(N_SC, 6000 + i);
+        let amps = synthetic_amplitudes(N_SC, 6500 + i);
+        let var = synthetic_variance(N_SC, 7000 + i);
+        black_box(d.process_frame(&phases, &amps, &var, 0.05, 0));
+    }
+    d
+}
+
+fn bench_sec_weapon_detect(c: &mut Criterion) {
+    c.bench_function("sec_weapon_detect::process_frame[per_sc_welford]", |b| {
+        let mut seed = 8000u32;
+        b.iter_batched(
+            calibrated_weapon_detector,
+            |mut d| {
+                seed = seed.wrapping_add(1);
+                let phases = synthetic_phases(N_SC, seed);
+                let amps = synthetic_amplitudes(N_SC, seed.wrapping_add(500));
+                let var = synthetic_variance(N_SC, seed.wrapping_add(1000));
+                black_box(d.process_frame(
+                    black_box(&phases),
+                    black_box(&amps),
+                    black_box(&var),
+                    black_box(0.3),
+                    black_box(1),
+                ));
+            },
+            BatchSize::SmallInput,
+        );
+    });
+}
+
+// ── med_seizure_detect: detect_rhythm / clonic autocorrelation hot path ──────
+//
+// Gated behind `medical-experimental` (ADR-160 §A1). The clonic-phase rhythm
+// detection autocorrelates the amplitude ring buffer (PHASE_WINDOW=100); we warm
+// the buffers with a high-energy rhythmic signal, then bench one frame.
+#[cfg(feature = "medical-experimental")]
+mod med {
+    use super::*;
+    use wifi_densepose_wasm_edge::med_seizure_detect::SeizureDetector;
+
+    fn warmed_seizure_detector() -> SeizureDetector {
+        let mut d = SeizureDetector::new();
+        let mut s = 0x5EE_D00Du32;
+        // High-energy ~4 Hz rhythmic (period ~5 frames at 20 Hz) → exercises the
+        // clonic-phase rhythm/autocorrelation path, with presence asserted.
+        for i in 0..150u32 {
+            let me = 2.5 + libm::sinf(i as f32 * 1.25) * 1.5;
+            let amp = 1.0 + lcg(&mut s) * 0.2;
+            black_box(d.process_frame(0.0, amp, me, 1));
+        }
+        d
+    }
+
+    pub fn bench_med_seizure_detect(c: &mut Criterion) {
+        c.bench_function("med_seizure_detect::process_frame[clonic_rhythm]", |b| {
+            let mut s = 0x9A_BCDE_F0u32;
+            b.iter_batched(
+                warmed_seizure_detector,
+                |mut d| {
+                    let i = d.frame_count();
+                    let me = 2.5 + libm::sinf(i as f32 * 1.25) * 1.5;
+                    let amp = 1.0 + lcg(&mut s) * 0.2;
+                    black_box(d.process_frame(
+                        black_box(0.0),
+                        black_box(amp),
+                        black_box(me),
+                        black_box(1),
+                    ));
+                },
+                BatchSize::SmallInput,
+            );
+        });
+    }
+}
+
+#[cfg(feature = "medical-experimental")]
+criterion_group!(
+    benches,
+    bench_exo_time_crystal,
+    bench_exo_ghost_hunter,
+    bench_sec_weapon_detect,
+    med::bench_med_seizure_detect,
+);
+
+#[cfg(not(feature = "medical-experimental"))]
+criterion_group!(
+    benches,
+    bench_exo_time_crystal,
+    bench_exo_ghost_hunter,
+    bench_sec_weapon_detect,
+);
+
+criterion_main!(benches);
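Two common ways of scaling the high bits of this 32-bit LCG (multiplier 1103515245, increment 12345, as in the bench's `lcg`) differ by a factor of two. Dividing the full upper 16 bits by 32768 yields values in [0, 2), while masking to 15 bits first yields [0, 1), which is the range the `* 6.2832 - 3.1416` mapping in `synthetic_phases` assumes when it targets roughly [-π, π). The helper names below are hypothetical; the sketch only checks both ranges empirically.

```rust
/// Same recurrence as the bench's `lcg` (multiplier 1103515245, increment 12345).
fn step(seed: &mut u32) -> u32 {
    *seed = seed.wrapping_mul(1103515245).wrapping_add(12345);
    *seed
}

/// Full upper 16 bits / 32768: values in [0, 2).
fn unmasked(seed: &mut u32) -> f32 {
    (step(seed) >> 16) as f32 / 32768.0
}

/// Upper 16 bits masked to 15 bits / 32768: values in [0, 1).
fn masked(seed: &mut u32) -> f32 {
    ((step(seed) >> 16) & 0x7FFF) as f32 / 32768.0
}

/// Largest value seen over `n` draws from `f`, starting at `seed`.
fn max_over(n: usize, seed: u32, f: fn(&mut u32) -> f32) -> f32 {
    let mut s = seed;
    (0..n).map(|_| f(&mut s)).fold(f32::MIN, f32::max)
}
```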
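The bench header's "~32K multiply-accumulates" figure is 256 samples × 128 lags. `compute_autocorrelation` is private and its exact form is not shown in this diff, so the sketch below assumes a circular (wrap-around) autocorrelation over a full 256-sample ring, normalized by the buffer variance, with the `buf_var < 1e-8` early-out the bench comments describe. `autocorr_circular` and its MAC counter are hypothetical.

```rust
const BUF_LEN: usize = 256;
const MAX_LAG: usize = 128;

/// Hypothetical circular autocorrelation over a full ring buffer.
/// Returns the normalized coefficients for lags 1..=MAX_LAG and the number of
/// multiply-accumulates performed, or None if the signal is effectively constant.
fn autocorr_circular(buf: &[f32; BUF_LEN]) -> Option<(Vec<f32>, usize)> {
    let mean = buf.iter().sum::<f32>() / BUF_LEN as f32;
    let var = buf.iter().map(|x| (x - mean) * (x - mean)).sum::<f32>() / BUF_LEN as f32;
    if var < 1e-8 {
        return None; // constant input: skip the expensive pass
    }
    let mut macs = 0usize;
    let mut coeffs = Vec::with_capacity(MAX_LAG);
    for lag in 1..=MAX_LAG {
        let mut acc = 0.0f32;
        for i in 0..BUF_LEN {
            // Wrap-around indexing: every lag touches all BUF_LEN samples.
            acc += (buf[i] - mean) * (buf[(i + lag) % BUF_LEN] - mean);
            macs += 1;
        }
        coeffs.push(acc / (BUF_LEN as f32 * var));
    }
    Some((coeffs, macs))
}
```

Feeding it the same period-10 square wave that `prefilled_time_crystal` uses gives a strong positive coefficient at lag 10 and a strong negative one at lag 5, and exactly 256 × 128 = 32768 multiply-accumulates per call.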
Author	SHA1	Message	Date
rUv	3fb40a9deb	Merge pull request #1030 from ruvnet/feat/v2-beyond-sota-sweep-m9 Beyond-SOTA sweep M9 (ADR-163): edge-latency measurement debt → MEASURED-on-host benches	2026-06-12 08:14:57 -04:00
ruv	1a17cc5b06	docs(ADR-163): edge-latency RESULTS + PROOF/prove.sh wiring (T3) Adds benchmarks/edge-latency/RESULTS.md (wiflow-std RESULTS style: each measured number with reproduce command, machine, MEASURED-on-host grade, and the honest host-vs-ESP32 / steady-state-vs-cold-start caveats) and ADR-163 (HEADLINE: CLAIMED latency budgets -> MEASURED-on-host, closing M5/M6 measurement debt; ESP32-on-hardware still pending). - ADR-160 deferred 'criterion benches for process_frame budget claims' line updated to DONE (host) with the ESP32-pending note. - PROOF.md performance table gains the two edge-latency reproduce rows; provenance ADR range extended to ADR-163. - prove.sh gated section gains the edge-latency bench note (host proxy only; not asserted, never claims the ESP32 figure). Benches/docs only; no crate republishes. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-12 08:02:07 -04:00
ruv	7c13ec6a00	bench(cogs): steady-state CPU infer latency benches (ADR-163 T2) Criterion benches over InferenceEngine::infer for cog-person-count and cog-pose-estimation, on Device::Cpu with the real shipped safetensors weights (asserts candle backend so the stub is never silently benched), over a fixed CSI window after a warm-up forward. HOST-MEASURED steady-state medians (idle box): ~305us each. This is the recurring per-frame cost and is explicitly NOT the pose manifest's cold_start_ms_avg=5.4 (a different measurement, weight-load included, taken on ruvultra/RTX 5080) -- the two are labelled and not conflated. Closes the ADR-159/160 deferred cog inference-latency item. No production-code behavior change. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-12 08:01:50 -04:00
ruv	d3606d51a7	bench(wasm-edge): host process_frame latency benches (ADR-163 T1) Criterion benches over the M6-audit-named heaviest hot paths: exo_time_crystal 256x128 autocorrelation, exo_ghost_hunter periodicity, sec_weapon_detect per-subcarrier Welford, med_seizure_detect clonic rhythm (medical-experimental-gated). Drives each through the public process_frame on a fixed synthetic CSI frame after warming the relevant buffers. Crate is workspace-excluded: run from the crate dir with --features std. Set lib bench=false so libtest does not intercept criterion CLI flags. HOST-MEASURED medians (Intel Core Ultra 9 285H, native --release), NOT the ESP32/WASM3 doc budget (that needs hardware): time_crystal 17.3us, ghost_hunter 1.44us, weapon 0.42us, seizure 0.10us. Closes the ADR-160 deferred 'criterion benches for process_frame budget claims' item on host. No production-code behavior change. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-12 08:01:29 -04:00
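The T1 commit names a per-subcarrier Welford update as the `sec_weapon_detect` hot path. The module's internal state is not part of this diff; a minimal std-only sketch of Welford's online mean/variance over 32 subcarriers (the `MAX_SC=32` the bench comments cite) could look like this, with `WelfordBank` a hypothetical type. Each frame costs O(MAX_SC) with no allocation, which is why the host median for this path is small.

```rust
const MAX_SC: usize = 32;

/// Hypothetical bank of Welford accumulators, one per subcarrier.
struct WelfordBank {
    count: u32,
    mean: [f32; MAX_SC],
    m2: [f32; MAX_SC], // running sum of squared deviations from the mean
}

impl WelfordBank {
    fn new() -> Self {
        Self { count: 0, mean: [0.0; MAX_SC], m2: [0.0; MAX_SC] }
    }

    /// One frame update. Values beyond MAX_SC are ignored; a frame is assumed
    /// to carry at least MAX_SC subcarriers.
    fn update(&mut self, amps: &[f32]) {
        self.count += 1;
        let n = self.count as f32;
        for (k, &x) in amps.iter().take(MAX_SC).enumerate() {
            let delta = x - self.mean[k];
            self.mean[k] += delta / n;
            // Welford's recurrence: multiply by the deviation from the UPDATED mean.
            self.m2[k] += delta * (x - self.mean[k]);
        }
    }

    /// Sample variance (n - 1 denominator) for subcarrier `k`.
    fn variance(&self, k: usize) -> f32 {
        if self.count < 2 { 0.0 } else { self.m2[k] / (self.count - 1) as f32 }
    }
}
```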