docs(adr): ADR-180 — through-wall camera↔CSI hand-off demo ("Behind the Wall")

Proposed design for the HTML demo: camera-supervised CSI model infers a full skeleton, hands off camera→RF when you walk behind a wall, and keeps inferring the skeleton through the wall (S3 + C6 mmWave + Pi5 nexmon multistatic fusion + AETHER re-ID). Dead-reckoning Kalman smoother (reuses pose_tracker.rs) keeps the figure fluid through dropped CSI with bounded extrapolation → LOST, never a phantom. Honesty mechanism: a far-side camera (cognitum-v0) provides ground truth behind the wall so the through-wall skeleton PCK is MEASURED + published (metric-locked, ADR-173), not claimed. Reuses ADR-079 supervision, the multistatic fuser, the calibration crate, and the Observatory UI; new code is a hand-off module + dead-reckoning smoother + a single-file HTML viewer.

Co-Authored-By: claude-flow <ruv@ruv.net>
fix(wasm-edge): sanitize non-finite host floats at the WASM↔host frame boundary (#1102)
2026-06-16 11:23:19 +00:00 · 2026-06-15 15:30:59 -04:00 · 2026-06-15 13:06:46 -04:00 · 2026-06-15 12:35:29 -04:00 · 2026-06-15 12:01:17 -04:00 · 2026-06-15 11:11:19 -04:00
676 changed files with 70365 additions and 28371 deletions
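
The commit message describes a dead-reckoning smoother that extrapolates through dropped CSI frames for a bounded time and then reports LOST rather than drawing a phantom. A minimal Rust sketch of that state machine follows. It assumes an alpha-beta filter as a simplified stand-in for the Kalman smoother; `JointTrack`, `TrackState`, `MAX_COAST_FRAMES`, and the gain values are hypothetical names and numbers for illustration, not taken from `pose_tracker.rs`.

```rust
/// Per-joint tracking state (hypothetical; not the pose_tracker.rs types).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TrackState {
    /// A measurement arrived on the latest frame.
    Tracked,
    /// Extrapolating on velocity alone for `frames` consecutive dropped frames.
    Coasting { frames: u32 },
    /// The extrapolation budget is spent: report nothing rather than a phantom.
    Lost,
}

/// Assumed extrapolation budget, in consecutive dropped frames.
pub const MAX_COAST_FRAMES: u32 = 5;
/// Assumed alpha-beta gains (position and velocity correction).
pub const ALPHA: f32 = 0.5;
pub const BETA: f32 = 0.25;

#[derive(Debug, Clone, Copy)]
pub struct JointTrack {
    pub pos: [f32; 3],
    pub vel: [f32; 3],
    pub state: TrackState,
}

impl JointTrack {
    pub fn new(pos: [f32; 3]) -> Self {
        Self { pos, vel: [0.0; 3], state: TrackState::Tracked }
    }

    /// Advance one frame of `dt` seconds. `measurement` is `None` when the
    /// CSI frame was dropped.
    pub fn step(&mut self, measurement: Option<[f32; 3]>, dt: f32) {
        match (measurement, self.state) {
            // Re-acquire after LOST: the old velocity is stale, so restart.
            (Some(z), TrackState::Lost) => {
                self.pos = z;
                self.vel = [0.0; 3];
                self.state = TrackState::Tracked;
            }
            // Normal update: predict, then correct by the residual.
            (Some(z), _) => {
                for i in 0..3 {
                    let predicted = self.pos[i] + self.vel[i] * dt;
                    let residual = z[i] - predicted;
                    self.pos[i] = predicted + ALPHA * residual;
                    if dt > 0.0 {
                        self.vel[i] += BETA * residual / dt;
                    }
                }
                self.state = TrackState::Tracked;
            }
            // Already lost and still no data: stay lost, do not move.
            (None, TrackState::Lost) => {}
            // Dropped frame: coast on velocity until the budget runs out.
            (None, state) => {
                let frames = match state {
                    TrackState::Coasting { frames } => frames + 1,
                    _ => 1,
                };
                if frames > MAX_COAST_FRAMES {
                    self.state = TrackState::Lost;
                } else {
                    for i in 0..3 {
                        self.pos[i] += self.vel[i] * dt;
                    }
                    self.state = TrackState::Coasting { frames };
                }
            }
        }
    }

    /// Position to draw, or `None` once the track is lost.
    pub fn visible_pos(&self) -> Option<[f32; 3]> {
        match self.state {
            TrackState::Lost => None,
            _ => Some(self.pos),
        }
    }
}
```

The design point is that `visible_pos` returns `None` after the budget is exhausted, so a viewer built on it cannot keep rendering an extrapolated figure indefinitely.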
@@ -33,6 +33,8 @@ jobs:
        working-directory: v2
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Install Rust toolchain
        run: rustup show && rustc --version
@@ -0,0 +1,199 @@
+name: Bench Regression Guard
+
+# Sub-deliverable 8.3 of the benchmark/optimization milestone.
+#
+# HONEST SCOPE (read this before assuming this gates on timing):
+#   * The `bench-compile` job is a REAL, HARD-FAILING regression gate. It runs
+#     `cargo bench --no-default-features --no-run`, which type-checks and links
+#     EVERY criterion bench in the v2/ workspace without running a single
+#     measurement. Benches are not part of `cargo test`, so they silently
+#     bit-rot when a public API they call changes — this job catches that the
+#     moment it happens. This is the part of this workflow that can fail a PR.
+#
+#   * The `bench-fast-run` job runs a small, curated subset of pure-CPU benches
+#     in criterion "quick mode" (short warm-up / measurement / 10 samples) and
+#     is INFORMATIONAL ONLY (`continue-on-error: true`). It does NOT gate on
+#     timing. Wall-clock timings on shared GitHub-hosted runners vary by
+#     2-3x run-to-run (noisy neighbours, CPU throttling, no pinned frequency),
+#     so a hard ">X ms" threshold here would flake constantly and teach
+#     everyone to ignore it. We deliberately do not pretend to do timing
+#     regression-gating we cannot deliver reliably. The numbers are surfaced in
+#     the job log + uploaded as an artifact for humans to eyeball trends.
+#
+# WHY NO criterion --baseline COMPARE GATE:
+#   criterion's `--save-baseline` / `--baseline` compare is the textbook
+#   regression mechanism, but it only produces a trustworthy verdict when the
+#   baseline and the candidate were measured on the SAME hardware under the SAME
+#   conditions. GitHub-hosted runners give neither (the baseline commit and the
+#   PR commit land on different physical machines). Committing a baseline JSON
+#   measured on one runner and comparing a different runner against it would
+#   manufacture false regressions. If/when these benches run on a dedicated,
+#   frequency-pinned self-hosted runner, a `--baseline` compare with a generous
+#   (>2x) noise floor becomes honest and can be added then. Until then,
+#   compile-verify + informational-run is the honest gate.
+
+on:
+  push:
+    branches: [ main, develop, 'feat/*' ]
+    paths:
+      - 'v2/crates/**/benches/**'
+      - 'v2/crates/**/Cargo.toml'
+      - 'v2/crates/**/src/**'
+      - 'v2/Cargo.toml'
+      - 'v2/Cargo.lock'
+      - '.github/workflows/bench-regression.yml'
+  pull_request:
+    paths:
+      - 'v2/crates/**/benches/**'
+      - 'v2/crates/**/Cargo.toml'
+      - 'v2/crates/**/src/**'
+      - 'v2/Cargo.toml'
+      - 'v2/Cargo.lock'
+      - '.github/workflows/bench-regression.yml'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+env:
+  CARGO_TERM_COLOR: always
+  # Debuginfo is useless in CI and the 38-crate workspace target dir otherwise
+  # exhausts the runner disk (mirrors ci.yml's rust-tests job). The bench
+  # profile inherits release + debug = true (v2/Cargo.toml [profile.bench]);
+  # force it off so the link step does not run out of space.
+  CARGO_PROFILE_BENCH_DEBUG: "0"
+  CARGO_PROFILE_RELEASE_DEBUG: "0"
+
+jobs:
+  # ── HARD GATE: every bench must still compile + link ─────────────────────
+  bench-compile:
+    name: bench compile-verify (--no-run)
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout (recursive — wifi-densepose-rufield path-deps vendor/rufield)
+        uses: actions/checkout@v4
+        with:
+          # The workspace includes `wifi-densepose-rufield`, which path-deps the
+          # `vendor/rufield` submodule crates. Without a recursive checkout the
+          # whole workspace fails to resolve before any bench is built.
+          submodules: recursive
+
+      # The workspace pulls in `wifi-densepose-desktop` (Tauri v2) whose -sys
+      # crates need the GTK/WebKit/serial dev libraries via pkg-config, exactly
+      # as ci.yml's rust-tests job documents. A `--workspace` bench build links
+      # the whole graph, so these are required here too.
+      - name: Install Tauri / GTK / serial system dev libraries
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends \
+            libglib2.0-dev \
+            libgtk-3-dev \
+            libsoup-3.0-dev \
+            libjavascriptcoregtk-4.1-dev \
+            libwebkit2gtk-4.1-dev \
+            libayatana-appindicator3-dev \
+            librsvg2-dev \
+            libxdo-dev \
+            libudev-dev \
+            libdbus-1-dev \
+            libssl-dev \
+            pkg-config
+
+      - name: Install Rust toolchain
+        uses: dtolnay/rust-toolchain@stable
+
+      - name: Cache cargo (Swatinem/rust-cache)
+        uses: Swatinem/rust-cache@v2
+        with:
+          workspaces: v2
+          # Distinct cache scope from ci.yml's rust-tests so the bench profile
+          # artifacts (release+opt) do not evict the test profile cache.
+          key: bench-regression
+
+      # The core regression guard. `--no-run` compiles + links every bench
+      # target in the workspace with default features DISABLED but runs no
+      # measurement, so it is deterministic and fast-ish (build only). A bench
+      # that stops compiling because a type/signature it calls changed and
+      # nobody updated it fails the build here. `--no-default-features` is the
+      # workspace's standard gate flag (openblas/tch/ort/onnx stay off).
+      - name: Compile all workspace benches (no default features)
+        working-directory: v2
+        run: cargo bench --workspace --no-default-features --no-run
+
+      # Feature-gated benches are skipped by the default build above because
+      # their `[[bench]]` entries carry `required-features`. Compile the ones we
+      # can guard so they are also covered against bit-rot.
+      #   * cir → wifi-densepose-signal/benches/cir_bench.rs (ADR-134). The
+      #     `cir` feature is pure-Rust (`cir = []`), so it builds on the stock
+      #     runner and is a real, hard-failing guard like the step above.
+      #
+      # NOT guarded here (honest scope):
+      #   * crv → wifi-densepose-ruvector/benches/crv_bench.rs. The `crv` feature
+      #     pulls the crates.io dependency `ruvector-crv 0.1.1`, which currently
+      #     FAILS to compile on stable (E0308 type mismatch in its own
+      #     `stage_iii.rs` — an UPSTREAM bug, unrelated to bench bit-rot).
+      #     Adding a hard `--features crv` compile step would make this workflow
+      #     red for a reason this gate is not meant to police. Re-add this step
+      #     once `ruvector-crv` ships a fixed release. (mqtt/onnx benches are
+      #     likewise left to their own crate workflows.)
+      - name: Compile feature-gated benches (cir)
+        working-directory: v2
+        run: cargo bench -p wifi-densepose-signal --no-default-features --features cir --bench cir_bench --no-run
+
+  # ── INFORMATIONAL: run a curated fast subset (never gates) ───────────────
+  bench-fast-run:
+    name: bench fast-run (informational, non-gating)
+    runs-on: ubuntu-latest
+    # NEVER fail the workflow on this job — timings are noise-prone on shared
+    # runners (see header). It exists to surface trends for humans, not to gate.
+    continue-on-error: true
+    needs: [bench-compile]
+    steps:
+      - name: Checkout (recursive)
+        uses: actions/checkout@v4
+        with:
+          submodules: recursive
+
+      - name: Install Rust toolchain
+        uses: dtolnay/rust-toolchain@stable
+
+      - name: Cache cargo (Swatinem/rust-cache)
+        uses: Swatinem/rust-cache@v2
+        with:
+          workspaces: v2
+          key: bench-regression
+
+      # Curated subset = pure-CPU, fast, dependency-light criterion benches that
+      # finish in seconds under quick-mode flags. Each is targeted by `--bench`
+      # (NOT a bare `cargo bench -p`) because the crates' lib targets use the
+      # libtest harness, which rejects criterion's CLI flags (--warm-up-time
+      # etc.) and aborts the run. Quick-mode: 1s warm-up, 2s measure, 10 samples.
+      - name: nvsim pipeline_throughput (quick)
+        working-directory: v2
+        run: |
+          mkdir -p ../bench-out
+          cargo bench -p nvsim --no-default-features --bench pipeline_throughput -- \
+            --warm-up-time 1 --measurement-time 2 --sample-size 10 \
+            | tee ../bench-out/nvsim_pipeline_throughput.txt
+
+      - name: ruvector sketch_bench (quick)
+        working-directory: v2
+        run: |
+          cargo bench -p wifi-densepose-ruvector --no-default-features --bench sketch_bench -- \
+            --warm-up-time 1 --measurement-time 2 --sample-size 10 \
+            | tee ../bench-out/ruvector_sketch_bench.txt
+
+      - name: ruvector fusion_bench (quick)
+        working-directory: v2
+        run: |
+          cargo bench -p wifi-densepose-ruvector --no-default-features --bench fusion_bench -- \
+            --warm-up-time 1 --measurement-time 2 --sample-size 10 \
+            | tee ../bench-out/ruvector_fusion_bench.txt
+
+      - name: Upload informational bench logs
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: bench-fast-run-logs
+          path: bench-out/
+          if-no-files-found: warn
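
The workflow header above argues that a timing comparison is only honest with a generous (>2x) noise floor on matched hardware. A minimal Rust sketch of such a ratio check follows, purely to illustrate the idea. The `Verdict` enum and `compare_to_baseline` function are hypothetical helpers, not part of criterion or of this workflow, and the 2.0 floor in the usage note is an assumed reading of ">2x".

```rust
/// Outcome of comparing one timing against a baseline (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// The ratio sits inside the noise band; no conclusion can be drawn.
    WithinNoise,
    /// Candidate is slower than the baseline by more than the noise floor.
    PossibleRegression,
    /// Candidate is faster than the baseline by more than the noise floor.
    PossibleImprovement,
}

/// Compare a candidate timing to a baseline with a multiplicative noise
/// floor (for example 2.0 for "more than 2x"). Returns `None` for inputs
/// that cannot be compared (non-finite, non-positive, or a floor below 1.0).
pub fn compare_to_baseline(
    baseline_ns: f64,
    candidate_ns: f64,
    noise_floor: f64,
) -> Option<Verdict> {
    let usable = |v: f64| v.is_finite() && v > 0.0;
    if !usable(baseline_ns) || !usable(candidate_ns) {
        return None;
    }
    if !noise_floor.is_finite() || noise_floor < 1.0 {
        return None;
    }
    let ratio = candidate_ns / baseline_ns;
    if ratio > noise_floor {
        Some(Verdict::PossibleRegression)
    } else if ratio < 1.0 / noise_floor {
        Some(Verdict::PossibleImprovement)
    } else {
        Some(Verdict::WithinNoise)
    }
}
```

With a floor of 2.0, a 1.5x slowdown is reported as `WithinNoise`, which matches the stated rationale: on runners that vary 2-3x run-to-run, anything tighter would flag noise as regressions.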
@@ -53,6 +53,8 @@ jobs:
    steps:
      - name: Checkout
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
@@ -42,6 +42,8 @@ jobs:
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Determine deployment environment
      id: determine-env
@@ -86,6 +88,8 @@ jobs:
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up kubectl
      uses: azure/setup-kubectl@v3
@@ -132,6 +136,8 @@ jobs:
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up kubectl
      uses: azure/setup-kubectl@v3
@@ -29,6 +29,7 @@ jobs:
      continue-on-error: true
      uses: actions/checkout@v4
      with:
+        submodules: recursive
        fetch-depth: 0

    - name: Set up Python
@@ -82,6 +83,13 @@ jobs:
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
+      # ADR-262 P1: `wifi-densepose-rufield` path-deps the `vendor/rufield`
+      # submodule. Without a recursive checkout the workspace build fails to
+      # resolve those path deps in CI even though it passes locally.
+      with:
+        submodules: recursive

    # `wifi-densepose-desktop` is a Tauri v2 app — `glib-sys`, `gtk-sys`,
    # `webkit2gtk-sys`, etc. need the Linux dev libraries via pkg-config or the
@@ -108,23 +116,36 @@ jobs:
    - name: Install Rust toolchain
      uses: dtolnay/rust-toolchain@stable

-    - name: Cache cargo
-      uses: actions/cache@v4
+    # Swatinem/rust-cache replaces a naive `actions/cache` of the whole
+    # `v2/target`. That manual cache of a 38-crate target dir (multi-GB) was an
+    # intermittent failure source — several CI runs this cycle died at the
+    # cache/setup step (after toolchain install, before "Run Rust tests"),
+    # needing a rerun. rust-cache is purpose-built for Rust: it caches the
+    # registry + git + a pruned target, evicts stale deps, and restores far more
+    # reliably (and faster) on large workspaces. `workspaces: v2` points it at
+    # the v2/ cargo workspace (keys on v2/Cargo.lock, caches v2/target).
+    - name: Cache cargo (Swatinem/rust-cache)
+      uses: Swatinem/rust-cache@v2
      with:
-        path: |
-          ~/.cargo/registry
-          ~/.cargo/git
-          v2/target
-        key: ${{ runner.os }}-cargo-${{ hashFiles('v2/Cargo.lock') }}
-        restore-keys: |
-          ${{ runner.os }}-cargo-
+        workspaces: v2

+    # The 38-crate workspace debug build exhausts the runner's disk when built
+    # with full debuginfo (observed: "final link failed: No space left on
+    # device" once the engine/benchmark crates landed; the same tree's local
+    # debug target measured 151 GB). Debuginfo is useless in CI — tests either
+    # pass or print their failure — so build without it; target shrinks ~5-10x.
    - name: Run Rust tests
      working-directory: v2
+      env:
+        CARGO_PROFILE_DEV_DEBUG: "0"
+        CARGO_PROFILE_TEST_DEBUG: "0"
      run: cargo test --workspace --no-default-features

    - name: Run ADR-147 worldmodel tests
      working-directory: v2
+      env:
+        CARGO_PROFILE_DEV_DEBUG: "0"
+        CARGO_PROFILE_TEST_DEBUG: "0"
      run: cargo test -p wifi-densepose-worldmodel --no-default-features

    # ADR-134 CIR tests are behind the `cir` feature so the bench dependency
@@ -189,6 +210,8 @@ jobs:
    - name: Checkout code
      continue-on-error: true
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up Python ${{ matrix.python-version }}
      continue-on-error: true
@@ -254,6 +277,8 @@ jobs:
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up Python
      uses: actions/setup-python@v6
@@ -267,28 +292,36 @@ jobs:
        pip install -r requirements.txt
        pip install pytest   # the perf suite is pytest, not locust

-    - name: Start application
-      working-directory: archive/v1
-      env:
-        # No CSI hardware in CI — serve mock pose data so the pose endpoints
-        # respond 200 under load instead of erroring "requires real CSI data".
-        MOCK_POSE_DATA: "true"
-      run: |
-        uvicorn src.api.main:app --host 0.0.0.0 --port 8000 &
-        sleep 10
+    # No "Start application" step: the gated test (test_frame_budget.py) drives
+    # the CSIProcessor pipeline in-process and makes no HTTP calls, so the old
+    # uvicorn server + `sleep 10` were dead weight — they only existed for the
+    # now-excluded api_throughput/inference_speed tests, and on every run dumped
+    # ~50 misleading "router requires hardware setup" ERROR lines for a server
+    # no test touched. MOCK_POSE_DATA is server-only and unused here.

    - name: Run performance tests
      working-directory: archive/v1
-      env:
-        MOCK_POSE_DATA: "true"
      run: |
-        # The repo's performance suite is pytest (test_api_throughput.py,
-        # test_frame_budget.py, test_inference_speed.py) — there is no
-        # locustfile.py, so the old `locust -f tests/performance/locustfile.py`
-        # command always failed with "Could not find ...". Run the real suite.
-        # -o addopts="" drops the root pyproject's --cov/--cov-fail-under=100
-        # flags (pytest-cov isn't installed here and 100% cov is for unit tests).
-        pytest tests/performance/ -o addopts="" -v --junitxml=perf-junit.xml
+        # Gate only on the genuine, deterministic perf guard:
+        # test_frame_budget.py times the *real* CSIProcessor pipeline against
+        # the ADR 50 ms per-frame budget (single-frame, p95 over 100 frames,
+        # +Doppler) — a true regression signal.
+        #
+        # test_api_throughput.py / test_inference_speed.py are excluded: every
+        # test there is a TDD red-phase stub (suffix `_should_fail_initially`)
+        # that times a *mock that sleeps* — meaningless as a perf signal, with
+        # machine-dependent wall-clock asserts (e.g. `actual_rps >= 40`,
+        # `batch_time < individual_time`) that are inherently flaky on shared
+        # CI runners, plus a cross-class fixture-scope bug. Forcing them green
+        # would be manufacturing a false signal; they stay in-repo for local
+        # TDD but do not gate CI until the underlying features are implemented.
+        #
+        # `python -m pytest` (not the bare `pytest` script) puts the cwd
+        # (archive/v1) on sys.path so `from src.core...` resolves — the bare
+        # script omits cwd and raises ModuleNotFoundError: No module named 'src'.
+        # -o addopts="" drops the root pyproject's --cov/--cov-fail-under=100.
+        python -m pytest tests/performance/test_frame_budget.py \
+          -o addopts="" -v --junitxml=perf-junit.xml

    - name: Upload performance results
      if: always()
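
The hunk above describes the gated perf check as a p95 over 100 frames against a 50 ms per-frame budget. A minimal Rust sketch of that kind of check follows, assuming the nearest-rank percentile definition. It is not the implementation in `test_frame_budget.py`, and `percentile_nearest_rank` and `within_budget` are hypothetical helper names.

```rust
/// Nearest-rank percentile of `samples` for an integer percentile `pct`
/// (0..=100). Returns `None` for an empty slice, an out-of-range `pct`,
/// or any non-finite sample.
pub fn percentile_nearest_rank(samples: &[f64], pct: usize) -> Option<f64> {
    if samples.is_empty() || pct > 100 || samples.iter().any(|v| !v.is_finite()) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // rank = ceil(pct / 100 * n), computed in integers to avoid float rounding.
    let rank = (sorted.len() * pct + 99) / 100;
    // Ranks are 1-based; clamp so that pct = 0 maps to the smallest sample.
    Some(sorted[rank.max(1) - 1])
}

/// True when the 95th-percentile frame time is within `budget_ms`.
pub fn within_budget(frame_ms: &[f64], budget_ms: f64) -> Option<bool> {
    percentile_nearest_rank(frame_ms, 95).map(|p95| p95 <= budget_ms)
}
```

Under this definition, with 100 frames the p95 is the 95th smallest sample, so up to five slow outliers are tolerated before the check fails.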
@@ -314,6 +347,8 @@ jobs:
    - name: Checkout code
      continue-on-error: true
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up Docker Buildx
      continue-on-error: true
@@ -386,6 +421,8 @@ jobs:
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up Python
      uses: actions/setup-python@v6
@@ -35,6 +35,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Fetch /traffic/clones + /traffic/views from GitHub
        env:
@@ -28,6 +28,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
@@ -78,6 +80,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
@@ -145,6 +149,8 @@ jobs:
      vars.HAS_GCP_CREDENTIALS == 'true'
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Download x86_64 artifact
        uses: actions/download-artifact@v4
@@ -20,6 +20,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - uses: dtolnay/rust-toolchain@stable
        with: { targets: wasm32-unknown-unknown }
@@ -26,6 +26,8 @@ jobs:
    steps:
      - name: Checkout main
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Install Rust + wasm32 target
        uses: dtolnay/rust-toolchain@stable
@@ -28,6 +28,8 @@ jobs:
    steps:
      - name: Checkout
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Setup Node.js
        uses: actions/setup-node@v6
@@ -83,6 +85,8 @@ jobs:
    steps:
      - name: Checkout
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Setup Node.js
        uses: actions/setup-node@v6
@@ -131,6 +135,8 @@ jobs:
    steps:
      - name: Checkout
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Download all artifacts
        uses: actions/download-artifact@v4
@@ -22,6 +22,8 @@ jobs:
    if: github.ref_type == 'tag'
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
      - name: Check firmware version.txt == tag
        run: |
          # Tag form: vX.Y.Z-esp32  →  expect version.txt to contain X.Y.Z
@@ -71,6 +73,8 @@ jobs:

    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Build firmware (${{ matrix.variant }})
        working-directory: firmware/esp32-csi-node
@@ -100,6 +100,8 @@ jobs:

    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Download QEMU artifact
        uses: actions/download-artifact@v4
@@ -214,6 +216,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Install clang
        run: |
@@ -263,6 +267,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Install NVS generator
        run: pip install esp-idf-nvs-partition-gen
@@ -317,6 +323,8 @@ jobs:

    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Download QEMU artifact
        uses: actions/download-artifact@v4
@@ -22,6 +22,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - uses: actions/setup-python@v6
        with:
@@ -41,6 +41,8 @@ jobs:

    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Install mosquitto + clients and start with allow_anonymous
        run: |
@@ -26,6 +26,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - uses: docker/setup-buildx-action@v3

@@ -76,6 +76,8 @@ jobs:
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive

      # Linux aarch64 needs QEMU for cross-build on x86_64 runners.
      - name: Set up QEMU
@@ -121,6 +123,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
      - name: Install maturin
        run: pip install "maturin>=1.7"
      - name: Build sdist
@@ -144,6 +148,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
@@ -29,6 +29,8 @@ jobs:
    steps:
      - name: Checkout main
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Stage viewer for Pages
        run: |
@@ -40,6 +40,8 @@ jobs:
          - { label: 'full+train',       flags: '--features full,train' }
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
      - uses: dtolnay/rust-toolchain@stable
      - name: Cache cargo
        uses: actions/cache@v4
@@ -60,6 +62,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
      # v2/rust-toolchain.toml pins channel "1.89" with profile "minimal" (no
      # clippy). dtolnay@stable installs clippy on the floating "stable"
      # toolchain, but the override makes cargo use the separate "1.89"
@@ -93,6 +97,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
      - uses: dtolnay/rust-toolchain@stable
      - name: Cache cargo
        uses: actions/cache@v4
@@ -127,6 +133,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
      - name: publish = false is present (no accidental crates.io publish)
        run: |
          CARGO=v2/crates/ruview-swarm/Cargo.toml
@@ -28,6 +28,7 @@ jobs:
      continue-on-error: true
      uses: actions/checkout@v4
      with:
+        submodules: recursive
        fetch-depth: 0

    - name: Set up Python
@@ -46,7 +47,10 @@ jobs:

    - name: Run Bandit security scan
      run: |
-        bandit -r src/ -f sarif -o bandit-results.sarif
+        # The Python codebase lives under archive/v1/src (it moved there when
+        # the runtime was rewritten in Rust). Scanning `src/` matched nothing,
+        # so this SAST step was a silent no-op.
+        bandit -r archive/v1/src/ -f sarif -o bandit-results.sarif
      continue-on-error: true

    - name: Upload Bandit results to GitHub Security
@@ -57,22 +61,20 @@ jobs:
        sarif_file: bandit-results.sarif
        category: bandit

-    - name: Run Semgrep security scan
-      continue-on-error: true
-      uses: returntocorp/semgrep-action@v1
-      with:
-        config: >-
-          p/security-audit
-          p/secrets
-          p/python
-          p/docker
-          p/kubernetes
-      env:
-        SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
-        
-    - name: Generate Semgrep SARIF
+    # Removed the deprecated `returntocorp/semgrep-action@v1` step: it was
+    # redundant (the pip `semgrep --sarif` below is what feeds GitHub Security;
+    # the action only pushed to the Semgrep cloud app via SEMGREP_APP_TOKEN) and
+    # it pulled `returntocorp/semgrep-agent:v1` from Docker Hub on every run,
+    # which intermittently timed out and turned this check red. The pip semgrep
+    # (installed above) needs no Docker pull. The action's `p/docker` +
+    # `p/kubernetes` rulesets are folded into the command below so coverage is
+    # preserved.
+    - name: Run Semgrep + generate SARIF
      run: |
-        semgrep --config=p/security-audit --config=p/secrets --config=p/python --sarif --output=semgrep.sarif src/
+        semgrep \
+          --config=p/security-audit --config=p/secrets --config=p/python \
+          --config=p/docker --config=p/kubernetes \
+          --sarif --output=semgrep.sarif archive/v1/src/
      continue-on-error: true

    - name: Upload Semgrep results to GitHub Security
@@ -96,6 +98,8 @@ jobs:
    - name: Checkout code
      continue-on-error: true
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up Python
      continue-on-error: true
@@ -163,6 +167,8 @@ jobs:
    - name: Checkout code
      continue-on-error: true
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up Docker Buildx
      continue-on-error: true
@@ -244,6 +250,8 @@ jobs:
    - name: Checkout code
      continue-on-error: true
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Run Checkov IaC scan
      continue-on-error: true
@@ -306,6 +314,7 @@ jobs:
      continue-on-error: true
      uses: actions/checkout@v4
      with:
+        submodules: recursive
        fetch-depth: 0

    - name: Run TruffleHog secret scan
@@ -340,6 +349,8 @@ jobs:
    - name: Checkout code
      continue-on-error: true
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Set up Python
      continue-on-error: true
@@ -377,6 +388,8 @@ jobs:
    - name: Checkout code
      continue-on-error: true
      uses: actions/checkout@v4
+      with:
+        submodules: recursive

    - name: Check security policy files
      continue-on-error: true
@@ -30,6 +30,8 @@ jobs:
    steps:
      - name: Checkout main
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Stage demos for Pages
        run: |
@@ -30,6 +30,8 @@ jobs:
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
+        with:
+          submodules: recursive

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v6
@@ -16,6 +16,15 @@ firmware/esp32-csi-node/sdkconfig.defaults.bak
 # ESP-IDF set-target backup (local only)
 firmware/esp32-hello-world/sdkconfig.old

+# Host-built firmware test binaries (compiled from test/*.c, not source)
+firmware/esp32-csi-node/test/test_adr110
+firmware/esp32-csi-node/test/test_vitals
+firmware/esp32-csi-node/test/fuzz_serialize
+firmware/esp32-csi-node/test/fuzz_edge
+firmware/esp32-csi-node/test/fuzz_nvs
+firmware/esp32-csi-node/test/*.exe
+firmware/esp32-csi-node/test/*.obj
+
 # Claude Flow swarm runtime state
 .swarm/

@@ -14,3 +14,10 @@
 	path = vendor/rvcsi
 	url = https://github.com/ruvnet/rvcsi
 	branch = main
+[submodule "v2/crates/ruv-neural"]
+	path = v2/crates/ruv-neural
+	url = https://github.com/ruvnet/ruv-neural.git
+	branch = main
+[submodule "vendor/rufield"]
+	path = vendor/rufield
+	url = https://github.com/ruvnet/rufield
@@ -10,17 +10,20 @@ Dual codebase: Python v1 (`v1/`) and Rust port (`v2/`).
 | `wifi-densepose-core` | Core types, traits, error types, CSI frame primitives |
 | `wifi-densepose-signal` | SOTA signal processing + RuvSense multistatic sensing (16 modules) |
 | `wifi-densepose-nn` | Neural network inference (ONNX, PyTorch, Candle backends) |
-| `wifi-densepose-train` | Training pipeline with ruvector integration + ruview_metrics |
+| `wifi-densepose-train` | Training pipeline with ruvector integration + ruview_metrics; MAE pretraining recipe (`mae.rs`, ADR-152 §2.3) + WiFlow-STD port (`wiflow_std/`, tch-gated) |
 | `wifi-densepose-mat` | Mass Casualty Assessment Tool — disaster survivor detection |
-| `wifi-densepose-hardware` | ESP32 aggregator, TDM protocol, channel hopping firmware |
+| `wifi-densepose-hardware` | ESP32 aggregator, TDM protocol, channel hopping firmware; `ieee80211bf/` 802.11bf forward-compat protocol model (ADR-153) |
 | `wifi-densepose-ruvector` | RuVector v2.0.4 integration + cross-viewpoint fusion (5 modules) |
 | `wifi-densepose-wasm` | WebAssembly bindings for browser deployment |
-| `wifi-densepose-cli` | CLI tool (`wifi-densepose` binary) |
+| `wifi-densepose-cli` | CLI tool (`wifi-densepose` binary) — `calibrate`/`calibrate-serve`/`enroll`/`train-room`/`room-watch` + MAT (MAT gated behind the `mat` feature; build `--no-default-features` for the aarch64/appliance calibration binary) |
+| `wifi-densepose-calibration` | ADR-151 per-room calibration & specialist training — `baseline → enroll → extract → train` → bank of small specialists (presence/posture/breathing/heartbeat/restlessness/anomaly) + multistatic fusion; pure Rust, edge-deployable |
 | `wifi-densepose-sensing-server` | Lightweight Axum server for WiFi sensing UI |
 | `wifi-densepose-wifiscan` | Multi-BSSID WiFi scanning (ADR-022) |
 | `wifi-densepose-vitals` | ESP32 CSI-grade vital sign extraction (ADR-021) |
 | `nvsim` | Deterministic NV-diamond magnetometer pipeline simulator (ADR-089) — standalone leaf, WASM-ready |
 | `vendor/rvcsi` (submodule) | **rvCSI** — edge RF sensing runtime (ADR-095/096): 9 crates (`rvcsi-core`/`-dsp`/`-events`/`-adapter-file`/`-adapter-nexmon`/`-ruvector`/`-runtime`/`-node`/`-cli`). Lives in its own repo ([github.com/ruvnet/rvcsi](https://github.com/ruvnet/rvcsi)), vendored here under `vendor/rvcsi`, published to crates.io as `rvcsi-* 0.3.x` and to npm as `@ruv/rvcsi`. Not a `v2/` workspace member — depend on the published crates (or the submodule's `crates/rvcsi-*` paths). Normalized `CsiFrame`/`CsiWindow`/`CsiEvent` schema, validate-before-FFI, reusable DSP, typed confidence-scored events, the napi-c Nexmon shim (real nexmon_csi `.pcap` from a Raspberry Pi 5 / 4 / 3B+ — BCM43455c0), the napi-rs SDK, the `rvcsi` CLI, a Claude Code plugin. |
+| `vendor/rufield` (submodule) | **RuField MFS** — the open spec for camera-free multimodal field sensing (ADR-260). A common `FieldEvent`/`FieldTensor`/`FusionGraph`/`PrivacyClass`/`ProvenanceReceipt` model *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and quantum sensors. Lives in its own repo ([github.com/ruvnet/rufield](https://github.com/ruvnet/rufield)), vendored here under `vendor/rufield`. Not a `v2/` workspace member. v0.1 reference stack = 7 crates (`rufield-core`/`-provenance`/`-privacy`/`-adapters`/`-fusion`/`-bench`/`-viewer`), 72 tests/0 failed; `rufield-viewer` is an Axum + vanilla-JS read-only dashboard (`cargo run -p rufield-viewer`) completing ADR-260 §27.9. The WiFi-CSI modality is now **real-replay-backed** via `CsiReplayAdapter` (ingests real captured `.csi.jsonl` → fused presence/breathing inferences; replay-from-file, unlabeled CSI-variance proxy, not validated accuracy); mmWave/thermal + all synthetic-bench F1 numbers remain **SYNTHETIC** (no live hardware — live streaming + labeled accuracy are roadmap). |
+| `wifi-densepose-rufield` | ADR-262 P1 **anti-corruption bridge** — converts RuView WiFi-CSI sensing output (`SensingSnapshot` mirroring `SensingUpdate` + `TrustedOutput`, owned primitives, no dep on `wifi-densepose-sensing-server`) into **signed RuField `FieldEvent`s** (`Modality::WifiCsi`, real `timestamp_ns`, sha256 + ed25519 provenance, `synthetic=false`). The single coupling point between RuView and the standalone RuField MFS spec (§5.4); path-deps the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion`). **Critical §3.3 privacy mapping** (`map_privacy`): maps RuView class → RuField P0–P5 by **information content, never byte value**, fail-closed (`Derived → P4/P5`, never P1; `demoted` floors to ≥ P2). 15 tests / 0 failed (round-trip / `is_fusable` / fusion-ingest / privacy-safety / determinism). P1 plumbing — not wired into the live server (P3), no accuracy claim. |
 | `ruview-swarm` | Drone swarm control system (ADR-148) — hierarchical-mesh topology, Raft consensus, MARL, CSI sensing payload, MAVLink/PX4 compat, Ruflo AI-agent integration |

 ### RuvSense Modules (`signal/src/ruvsense/`)
@@ -72,6 +75,8 @@ All 5 ruvector crates integrated in workspace:
 - ADR-031: RuView sensing-first RF mode (Proposed)
 - ADR-032: Multistatic mesh security hardening (Proposed)
 - ADR-148: Drone swarm control system / `ruview-swarm` (In Progress)
+- ADR-152: WiFi-Pose SOTA 2026 intake — geometry conditioning, WiFlow-STD benchmark (measurement (a) complete: claims MEASURED-EQUIVALENT at ~96% PCK@20), MAE recipe (Proposed; §2.1–2.3, 2.6 implemented)
+- ADR-153: IEEE 802.11bf-2025 forward-compatibility protocol model (Accepted — amends ADR-152 §2.4)

 ### Supported Hardware

@@ -0,0 +1,78 @@
+# PROOF — reproduce every claim, or find the one we can't yet
+
+This project (RuView / wifi-densepose) has been publicly called "AI slop" and
+"fake." This document is the answer: **a skeptic can clone the repo, run one
+script, and have every headline claim either verified on their own machine or
+shown — explicitly — as "CLAIMED, not yet reproduced (here's exactly what it
+needs)."** Nothing below is asserted without a command you can run.
+
+```bash
+git clone https://github.com/ruvnet/RuView && cd RuView
+bash scripts/prove.sh          # core gate + the anti-slop assertion tests
+bash scripts/prove.sh --full   # also attempt the feature-gated subset
+```
+
+`prove.sh` exits 0 only if every **non-gated** claim passes. Gated claims never
+fail the run; they print the prerequisite (a GPU, a dataset, real hardware, a
+trained checkpoint) so you can reproduce them yourself.
+
+## Grading
+
+- **MEASURED** — reproduced on our hardware, with the exact command recorded, and
+  pinned by a test that *fails on the pre-fix code*. `prove.sh` re-runs these.
+- **CLAIMED** — cited from a source, or measured by the source, but not
+  reproduced in this repo's automated harness.
+- **DATA-GATED / HARDWARE-GATED** — the *code path* is real and tested, but the
+  *accuracy/throughput claim* needs data or hardware we don't ship. We never
+  fabricate the number; the code carries a typed error or a `weights_trained`/
+  provenance flag instead.
+
+## The hard gate (run on any machine with Rust + Python)
+
+| Claim | Grade | Reproduce |
+|---|---|---|
+| Rust workspace: 3,128 tests, 0 failed | **MEASURED** | `cd v2 && cargo test --workspace --no-default-features` |
+| Deterministic CSI pipeline proof (bit-exact SHA-256) | **MEASURED** | `python archive/v1/data/proof/verify.py` → `VERDICT: PASS` |
+
+## Anti-slop assertion tests (each fails on the pre-fix code)
+
+| Claim | Grade | Test (run via `cargo test -p <crate> <name>`) |
+|---|---|---|
+| Fusion crafted-input DoS panics are closed (ADR-156 §2.2) | **MEASURED** | `wifi-densepose-ruvector :: triangulation_out_of_range_index_returns_none_no_panic` |
+| **The "Soul Signature" identity claim, honestly bounded:** on WiFi-only cardiac+respiratory channels two people are **not separable** (gap ≈ 0.0005) | **MEASURED** | `wifi-densepose-bfld :: cardiac_alone_cannot_separate_identity_matches_audit` |
+| OccWorld `predict()` is real (input-dependent), not random noise | **MEASURED** | `wifi-densepose-occworld-candle :: predict_is_deterministic_for_same_input` |
+| Pose runtime emits frames under its own default config (ADR-159 A1) | **MEASURED** | `cog-pose-estimation :: default_config_emits_frames_with_real_model` |
+| Person-count flags untrained classes — no count inflation (ADR-159 A2) | **MEASURED** | `cog-person-count :: untrained_class_argmax_is_flagged_low_confidence` |
+| Medical edge skills carry a "not a medical device" disclaimer (ADR-160 A1) | **MEASURED** | `wifi-densepose-wasm-edge :: a1_med_modules_have_clinical_disclaimer` (`--features std`) |
+| Survivor dedup 3→1, count-inflation killed (ADR-158 §2) | **MEASURED** | `wifi-densepose-mat :: test_identical_vitals_no_location_dedup_to_one` (`--features mat`) |
+
+## Measured performance (criterion; reproduce on your machine)
+
+| Claim | Grade | Reproduce |
+|---|---|---|
+| PSD FFT-planner cache 2.0–3.1×, DTW band 2.4–4.1× (ADR-154) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-signal` |
+| fuse() double-clone removed ~2.17× marshalling (ADR-156) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-ruvector --bench fusion_bench` |
+| zero-copy ORT input ~1.48× (ADR-155) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-nn --features onnx --bench onnx_bench` |
+| pointcloud splats 9→2 passes ~1.24× (ADR-160 research) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-pointcloud --bench splats_bench` |
+| native wlanapi multi-BSSID scan 9.74 Hz (vs netsh ~2 Hz) | **MEASURED (Windows)** | `cd v2 && cargo test -p wifi-densepose-wifiscan -- --ignored measure_native_scan_rate` |
+| wasm-edge `process_frame` hot-path latency (host proxy, ADR-163) | **MEASURED-on-host** (NOT the ESP32/WASM3 budget — needs hardware) | `cd v2/crates/wifi-densepose-wasm-edge && cargo bench --features std` |
+| cog steady-state CPU infer latency ~305 µs (ADR-163; NOT the manifest cold-start) | **MEASURED-on-host** | `cd v2 && cargo bench -p cog-person-count -p cog-pose-estimation --no-default-features --bench infer_bench` |
+
+## What we do NOT claim (the honest negatives — the strongest anti-slop signal)
+
+| Capability | Status |
+|---|---|
+| **Named person-identity from WiFi** | **NOT achieved, and measured why.** The §3.6 matcher is real, but identity does not lock on WiFi-only channels (gap 0.0005). DATA-GATED on a real enrollment feeding the AETHER/body-resonance channel — never done. No named-identity claim is made. |
+| WiFlow-STD ~96% PCK@20 | **CLAIMED-reproduced** on our RTX 5080 (`benchmarks/wiflow-std/RESULTS.md`); HARDWARE-GATED for you (needs an NVIDIA GPU + the MM-Fi dataset). The upstream *shipped checkpoint* was **REFUTED** (0.08% PCK) — we publish that. |
+| OccWorld trajectory accuracy | DATA-GATED on a trained checkpoint; `predict()` carries `weights_trained=false` until one is loaded — never silently faked. |
+| Edge-skill detection accuracy (seizure, weapon, affect, …) | UNVALIDATED — every such module is now disclaimer-gated as experimental/research; the DSP is real, the accuracy is not claimed. |
+| 802.11bf-2025 OTA conformance | No commodity silicon ships a conformant interface as of 2026; ours is a simulation-tested forward-compat protocol model, not a certified implementation. |
+
+## Provenance
+
+Every claim above traces to a committed ADR (`docs/adr/ADR-154`…`ADR-163`), a
+test, a criterion bench, `benchmarks/wiflow-std/RESULTS.md`, or
+`benchmarks/edge-latency/RESULTS.md`. The history
+includes published **retractions** (the 92.9% PCK retraction; the WiFlow-STD
+shipped-checkpoint refutation; the NV-diamond BOM reality check) — a faker hides
+failures; we commit them.
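
The gate semantics described for `prove.sh` (exit 0 only if every non-gated claim passes; gated claims print their prerequisite and never fail the run) can be sketched as below. This is an illustrative Python model, not the contents of `scripts/prove.sh`; the `Claim` record, its fields, and `exit_code` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    """Hypothetical record for one row of the proof tables."""
    name: str
    gated: bool                 # DATA-GATED / HARDWARE-GATED rows
    passed: Optional[bool]      # None means the check was not attempted
    prerequisite: str = ""      # what a gated claim needs (GPU, dataset, ...)


def exit_code(claims: List[Claim]) -> int:
    """Return 0 only if every non-gated claim passed; gated claims never fail."""
    ok = True
    for claim in claims:
        if claim.gated:
            if not claim.passed:
                print(f"GATED  {claim.name}: needs {claim.prerequisite}")
            continue
        if claim.passed is not True:
            print(f"FAIL   {claim.name}")
            ok = False
    return 0 if ok else 1


claims = [
    Claim("workspace tests", gated=False, passed=True),
    Claim("WiFlow-STD PCK@20", gated=True, passed=None,
          prerequisite="an NVIDIA GPU and the MM-Fi dataset"),
]
print(exit_code(claims))
```

The design point is that a gated row can only ever produce a message, so a machine without a GPU or dataset still gets a meaningful pass/fail on everything it can actually run.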
@@ -194,7 +194,7 @@ The separate **17-keypoint pose-estimation model** is now published at [`ruvnet/
 | **Efficiency frontier** | [`docs/benchmarks/wifi-pose-efficiency-frontier.md`](docs/benchmarks/wifi-pose-efficiency-frontier.md) | SOTA-beating WiFi pose in a 20 KB int4 edge model |
 | **Pretrained encoder** | [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained) | 82.3% held-out temporal-triplet, 8 KB int4 |
 | **Reproducible proof (Trust Kill Switch)** | [`archive/v1/data/proof/verify.py`](archive/v1/data/proof/verify.py) + [`expected_features.sha256`](archive/v1/data/proof/expected_features.sha256) | one-command deterministic pipeline replay (SHA-256 of output vs published hash) |
-| **Benchmark-proof ADR** | [ADR-147](docs/adr/ADR-147-benchmark-proof.md) | how the numbers are produced and verified |
+| **Benchmark-proof ADR** | [ADR-168](docs/adr/ADR-168-benchmark-proof.md) | how the numbers are produced and verified |
 | **Witness attestation** | [`docs/WITNESS-LOG-028.md`](docs/WITNESS-LOG-028.md) | 33-row capability attestation matrix with per-claim evidence |

 ```bash
@@ -501,7 +501,7 @@ Every WiFi signal that passes through a room creates a unique fingerprint of tha
 **What it does in plain terms:**
 - Turns any WiFi signal into a 128-number "fingerprint" that uniquely describes what's happening in a room
 - Learns entirely on its own from raw WiFi data — no cameras, no labeling, no human supervision needed
-- Recognizes rooms, detects intruders, identifies people, and classifies activities using only WiFi
+- Recognizes rooms, detects intruders, and classifies activities using only WiFi (named person-identity is an experimental, data-gated research capability — see below, not a shipped feature)
 - Runs on an $8 ESP32 chip (the entire model fits in 55 KB of memory)
 - Produces both body pose tracking AND environment fingerprints in a single computation

@@ -512,7 +512,7 @@ Every WiFi signal that passes through a room creates a unique fingerprint of tha
 | **Self-supervised learning** | The model watches WiFi signals and teaches itself what "similar" and "different" look like, without any human-labeled data | Deploy anywhere — just plug in a WiFi sensor and wait 10 minutes |
 | **Room identification** | Each room produces a distinct WiFi fingerprint pattern | Know which room someone is in without GPS or beacons |
 | **Anomaly detection** | An unexpected person or event creates a fingerprint that doesn't match anything seen before | Automatic intrusion and fall detection as a free byproduct |
-| **Person re-identification** | Each person disturbs WiFi in a slightly different way, creating a personal signature | Track individuals across sessions without cameras |
+| **Person re-identification** *(experimental, research)* | A real per-channel similarity matcher (Soul Signature §3.6, `wifi-densepose-bfld`); **measured** result: on WiFi-only cardiac+respiratory channels alone two people are *not* separable (gap ~0.0005) | Honest research capability — **named identity is not claimed** and is data-gated on enrollment with the decisive AETHER/body-resonance channel. See [#1021](https://github.com/ruvnet/RuView/issues/1021) |
 | **Environment adaptation** | MicroLoRA adapters (1,792 parameters per room) fine-tune the model for each new space | Adapts to a new room with minimal data — 93% less than retraining from scratch |
 | **Memory preservation** | EWC++ regularization remembers what was learned during pretraining | Switching to a new task doesn't erase prior knowledge |
 | **Hard-negative mining** | Training focuses on the most confusing examples to learn faster | Better accuracy with the same amount of training data |
@@ -610,7 +610,7 @@ Verify the plugin structure: `bash plugins/ruview/scripts/smoke.sh`. Full detail
 | [User Guide](docs/user-guide.md) | Step-by-step guide: installation, first run, API usage, hardware setup, training |
 | [Build Guide](docs/build-guide.md) | Building from source (Rust and Python) |
 | [**Home Assistant + Matter Integration**](docs/integrations/home-assistant.md) | **Works with Home Assistant** via MQTT auto-discovery + **Works with Matter** (Apple Home / Google Home / Alexa / SmartThings) — full entity catalog, 3 starter blueprints, Lovelace dashboards, privacy mode, threshold tuning ([ADR-115](docs/adr/ADR-115-home-assistant-integration.md)). |
-| [**BFLD — Beamforming Feedback Layer for Detection**](v2/crates/wifi-densepose-bfld/README.md) | New privacy-gated WiFi sensing layer that measures + structurally prevents identity leakage from 802.11ac/ax Beamforming Feedback Information. Three type-enforced invariants (raw BFI never exits node, identity embedding is in-RAM-only, cross-site correlation cryptographically impossible via per-site BLAKE3 keyed hash + daily rotation). Ships full operator surface (`BfldPipeline`, `BfldPipelineHandle`, Soul Signature `SoulMatchOracle` integration), MQTT topic router + HA-DISCO + availability + LWT, 3 operator HA blueprints, two runnable examples, eclipse-mosquitto:2 CI service container. 327+ tests. [ADR-118](docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md) umbrella + sub-ADRs [119](docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md)/[120](docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md)/[121](docs/adr/ADR-121-bfld-identity-risk-scoring.md)/[122](docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md)/[123](docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md). Research dossier: [`docs/research/BFLD/`](docs/research/BFLD/) (11 files, 13,544 words). |
+| [**BFLD — Beamforming Feedback Layer for Detection**](v2/crates/wifi-densepose-bfld/README.md) | New privacy-gated WiFi sensing layer that measures + structurally prevents identity leakage from 802.11ac/ax Beamforming Feedback Information. Three type-enforced invariants (raw BFI never exits node, identity embedding is in-RAM-only, cross-site correlation cryptographically impossible via per-site BLAKE3 keyed hash + daily rotation). Ships full operator surface (`BfldPipeline`, `BfldPipelineHandle`, the Soul Signature §3.6 per-channel matcher `EnrolledMatcher`/`SoulMatchOracle` — experimental; named identity is data-gated, **measured** as not-separable on WiFi-only channels alone), MQTT topic router + HA-DISCO + availability + LWT, 3 operator HA blueprints, two runnable examples, eclipse-mosquitto:2 CI service container. 327+ tests. [ADR-118](docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md) umbrella + sub-ADRs [119](docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md)/[120](docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md)/[121](docs/adr/ADR-121-bfld-identity-risk-scoring.md)/[122](docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md)/[123](docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md). Research dossier: [`docs/research/BFLD/`](docs/research/BFLD/) (11 files, 13,544 words). |
 | [**SENSE-BRIDGE — rvagent MCP server**](tools/ruview-mcp/README.md) | Dual-transport MCP server (`@ruvnet/rvagent`) bridging the RuView sensing stack to AI agents (Claude Code, Cursor, ruflo swarms). 6 tools wired: `ruview.presence.now`, `ruview.vitals.get_{breathing,heart_rate,all}`, `ruview.bfld.last_scan`, `ruview.bfld.subscribe`. stdio + Streamable HTTP (`POST /mcp`, Origin-validated, bearer-token auth, `127.0.0.1` bind). Full 20-tool Zod schema barrel + 5 RUVIEW-POLICY governance tools. 93 tests. [ADR-124](docs/adr/ADR-124-rvagent-mcp-ruvector-npm-integration.md). Try: `npx @ruvnet/rvagent stdio`. |
 | [Semantic Primitives — Precision/Recall](docs/integrations/semantic-primitives-metrics.md) | Per-primitive F1 on the held-out paired-capture set: someone-sleeping, possible-distress, room-active, elderly-inactivity-anomaly, meeting, bathroom, fall-risk, bed-exit, no-movement, multi-room. |
 | [Claude Code / Codex Plugin](plugins/ruview/README.md) | The `ruview` plugin + marketplace — skills, `/ruview-*` commands, agents, and the Codex prompt mirror |
@@ -221,11 +221,15 @@ class ESP32BinaryParser:

        snr = float(rssi - noise_floor)
        frequency = float(freq_mhz) * 1e6
-        bandwidth = 20e6  # default; could infer from n_subcarriers

-        if n_subcarriers <= 56:
+        # Bandwidth inference (issue #1005): HE-LTF uses a 4x denser tone
+        # grid than HT-LTF on the same channel width — an HE-SU frame with
+        # 256 bins (242 active HE20 tones) is a *20 MHz* capture, not 160.
+        if ppdu_byte in (1, 2, 3):  # HE-SU / HE-MU / HE-TB
+            bandwidth = 40e6 if (flags_byte & 0x01) or n_subcarriers > 256 else 20e6
+        elif n_subcarriers <= 64:  # ESP32 HT20 delivers the full 64-bin FFT
            bandwidth = 20e6
-        elif n_subcarriers <= 114:
+        elif n_subcarriers <= 128:
            bandwidth = 40e6
        elif n_subcarriers <= 242:
            bandwidth = 80e6
@@ -0,0 +1,137 @@
+# Edge-Latency Benchmark Results — ADR-163
+
+Converting **CLAIMED** edge latency budgets into **MEASURED-on-host** numbers,
+closing the measurement debt flagged by Milestones 5/6 (ADR-159 / ADR-160).
+Benches + docs only — **no production-code behavior changed**.
+
+## The honest caveat, up front (read before citing any number)
+
+Two distinct gaps separate every number below from the figure it is converting:
+
+1. **Host ≠ ESP32.** The wasm-edge skill modules document budgets *"on ESP32-S3
+   WASM3"* (e.g. `exo_time_crystal`: "H (<10 ms)"). These benches run **native
+   x86_64 on a development laptop**, not the Xtensa/WASM3 target. A native host
+   median is a **lower bound on the ESP32 cost**, not the ESP32 number.
+   WASM3 interpretation on a ~240 MHz Xtensa core is typically 1–2 orders of
+   magnitude slower than native `-O` host code, so a host median far under the
+   budget **does NOT prove the ESP32 meets it.** *The ESP32 figure is NOT
+   reproduced here — it needs hardware.*
+
+2. **Bench ≠ the doc-claimed measurement.** For the cogs, the manifest cites a
+   **cold-start** number (`cold_start_ms_avg`, weight-load included); these
+   benches measure **steady-state** per-frame `infer` (warm, weights resident).
+   Different measurements; we report both, labelled.
+
+Grades (per `benchmarks/wiflow-std/RESULTS.md` / ADR-152 vocabulary):
+- **MEASURED-on-host** — reproduced in this repo on the machine below, exact
+  command recorded. NOT the ESP32 / NOT the cold-start figure.
+- **CLAIMED (ESP32)** — the doc budget; UNMEASURED on hardware here.
+
+## Machine
+
+| | |
+|---|---|
+| Host | `ruvzen` (Windows 11, this dev box) |
+| CPU | Intel Core Ultra 9 285H |
+| Toolchain | `cargo 1.91.1`, `--release` (opt-level per crate profile) |
+| Bench harness | criterion 0.5 (`time: [low **median** high]` reported below) |
+| Date | 2026-06-12 |
+
+Run-to-run spread on this box is non-trivial (criterion's low/high bracket the
+median by a few %); the medians below are single-session captures with the smoke
+settings `--warm-up-time 1 --measurement-time 2` (wasm-edge) / `3` (cogs). Re-run
+for your own machine — the absolute numbers are host-specific.
+
+---
+
+## T1 — wasm-edge `process_frame` hot paths (ADR-160 deferred item → DONE host)
+
+The crate is **excluded from the v2 workspace**; bench from the crate dir.
+
+```bash
+cd v2/crates/wifi-densepose-wasm-edge
+cargo bench --features std -- --warm-up-time 1 --measurement-time 2
+# med_seizure_detect is medical-experimental-gated:
+cargo bench --features std,medical-experimental -- --warm-up-time 1 --measurement-time 2 med_seizure
+```
+
+| Hot path (M6-audit-named) | Bench id | Host median | Grade | Doc budget (CLAIMED, ESP32) |
+|---|---|---|---|---|
+| `exo_time_crystal` 256-pt × 128-lag autocorrelation (full buffer) | `exo_time_crystal::process_frame[autocorr_256x128]` | **17.3 µs** | MEASURED-on-host | "H (<10 ms) on ESP32-S3 WASM3" — **NOT reproduced here (needs hardware)** |
+| `exo_ghost_hunter` empty-room periodicity + hidden-breathing | `exo_ghost_hunter::process_frame[empty_room_periodicity]` | **1.44 µs** | MEASURED-on-host | research/exotic; no firm ESP32 figure — host proxy only |
+| `sec_weapon_detect` per-subcarrier Welford (MAX_SC=32) | `sec_weapon_detect::process_frame[per_sc_welford]` | **0.42 µs** (420 ns) | MEASURED-on-host | research-grade; calibration-gated — host proxy only |
+| `med_seizure_detect` clonic-phase rhythm path (steady-state frame) | `med_seizure_detect::process_frame[clonic_rhythm]` | **0.10 µs** (105 ns) | MEASURED-on-host (feature-gated) | doc budget "S (<5 ms) on ESP32"; **NOT reproduced here** |
+
+Reading these honestly:
+
+- `exo_time_crystal` at **17.3 µs host** is the only one whose host cost reaches
+  even the *thousandths* of its 10 ms ESP32 budget (~0.17%); it does the most work
+  (~32K MACs/frame). 17.3 µs native says the algorithm is cheap; it says
+  **nothing** about whether WASM3-on-Xtensa lands under 10 ms. A naïve
+  host→ESP32 extrapolation (assume 100× interpreter+clock penalty) would put it
+  near ~1.7 ms, comfortably under — **but that is an extrapolation, not a
+  measurement**, and is recorded here only to show the host number is not
+  obviously in tension with the budget. ESP32 figure: **UNMEASURED**.
+- `med_seizure_detect`'s 105 ns is the **steady-state** per-frame cost; the
+  expensive clonic autocorrelation only fires when the state machine is in the
+  clonic phase, so this is a lower bound on the heavy path, not the worst case.
+  It is still a real, committed host datapoint.
+- The pre-existing `tests/budget_compliance.rs` already asserts the L/S/H
+  wall-clock tiers (25 passing tests); these criterion benches add the
+  regression-grade, reproducible median that ADR-160 deferred.
+
+---
+
+## T2 — cog steady-state inference latency (ADR-159/160 deferred item → DONE)
+
+Cog crates are normal workspace members; bench from `v2/`. Real weights
+(`count_v1.safetensors` / `pose_v1.safetensors`) ship in-repo under each cog's
+`cog/artifacts/`, so the bench measures the **real Candle CPU forward**, not the
+stub (the bench `assert!`s `backend().starts_with("candle-")`).
+
+```bash
+cd v2
+cargo bench -p cog-person-count  --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3
+cargo bench -p cog-pose-estimation --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3
+```
+
+| Cog | Bench id | Host median (steady-state infer, CPU) | Grade | Manifest cold-start (CLAIMED, different measurement + machine) |
+|---|---|---|---|---|
+| cog-person-count | `cog_person_count::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | — (person-count manifest carries comparable provenance) |
+| cog-pose-estimation | `cog_pose_estimation::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | `cold_start_ms_avg: 5.4` (30 invocations, **ruvultra/RTX 5080 host**, candle 0.9 cpu) — **cold-start, NOT steady-state; NOT this machine** |
+
+> Spread caveat (observed, honest): both medians above were captured with the box
+> otherwise idle. A re-run of the validate-form command *while a second cargo job
+> was loading the same cores* gave 385 µs (person-count) / 973 µs (pose) —
+> the criterion low/high bracket widens to ~0.34–1.18 ms under contention. The
+> 305 µs figures are the idle-box datapoints; the absolute number is host- and
+> load-dependent (the ~3× pose swing is core contention, not a code change).
+
+Reading these honestly:
+
+- **Steady-state ≠ cold-start.** The pose manifest's `5.4 ms` folds in one-time
+  weight load / mmap / first-forward allocation. This bench warms the engine
+  first and times only the recurring per-frame forward, on a *different
+  machine*. The two numbers are not comparable and we do not claim this bench
+  reproduces the 5.4 ms manifest figure.
+- Both cogs share the same conv encoder; person-count adds a count head +
+  confidence head, pose adds a 256-wide MLP head. The host steady-state cost is
+  dominated by the three dilated Conv1d layers (56→64→128→128) shared by both —
+  which is why both land at ~305 µs.
+- **Empirical confirmation of the steady-state/cold-start gap:** pose
+  steady-state (305 µs host) is ~18× *under* the manifest's 5.4 ms cold-start.
+  Even accounting for the different machine, this is the expected shape — the
+  bulk of cold-start is one-time setup, not the forward pass — and it is exactly
+  why conflating the two would be dishonest.
+
+---
+
+## Status vs the deferred items
+
+| Deferred item | Was | Now |
+|---|---|---|
+| ADR-160 "Criterion benches for `process_frame` budget claims" | ACCEPTED-FUTURE | **DONE (host)**; ESP32-on-hardware still **PENDING** (needs the wasm32 target + a flashed ESP32-S3) |
+| ADR-159/160 cog inference latency (`cold_start_ms_avg` had no committed bench) | CLAIMED | **MEASURED-on-host (steady-state)**; cold-start-on-ruvultra remains the manifest's separate claim |
+
+Nothing here changes runtime behavior — these are benches + this results file
+only. No crate needs republishing.
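
The arithmetic behind the ratios quoted in this results file can be checked with a few lines of Python. This is a sketch for readers who want to redo the numbers; the helper names are hypothetical, and the 100x penalty is the document's own stated assumption, not a measured WASM3 slowdown.

```python
def extrapolate_ms(host_us: float, penalty: float = 100.0) -> float:
    """Scale a host median (microseconds) by an assumed slowdown; result in ms."""
    return host_us * penalty / 1000.0


def ratio(numerator_us: float, denominator_us: float) -> float:
    """Plain ratio of two latencies expressed in the same unit."""
    return numerator_us / denominator_us


# exo_time_crystal host median under the naive 100x penalty (an extrapolation).
time_crystal_ms = extrapolate_ms(17.3)
# Pose manifest cold-start (5.4 ms) versus steady-state host median (305 us).
cold_vs_steady = ratio(5.4 * 1000.0, 305.0)
# Pose median under core contention (973 us) versus the idle box (305 us).
contended_vs_idle = ratio(973.0, 305.0)

print(f"{time_crystal_ms:.2f} ms, {cold_vs_steady:.1f}x, {contended_vs_idle:.1f}x")
```

None of these values is a new measurement. They only restate the tables above in one place, which makes it easy to substitute medians from your own machine.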
@@ -0,0 +1,132 @@
+# Edge-Skill Synthetic-Ground-Truth Validation — RESULTS
+
+**Crate:** `v2/crates/wifi-densepose-wasm-edge` (workspace-EXCLUDED — build from its own dir)
+**Branch:** `feat/edge-skills-synthetic-validation`
+**ADR:** [ADR-160](../../docs/adr/ADR-160-edge-skill-library-honest-labeling.md)
+**Date:** 2026-06-13
+**Harness:** `tests/synthetic_validation.rs`
+
+> **HONESTY BOUNDARY — read first.** Everything below is **synthetic-ground-truth
+> validation**: a signal is *planted* with a known answer, the **real** detector
+> is run, and detection accuracy / precision / recall / rate-error is **measured**.
+> This is **NOT field accuracy.** A skill that recovers a planted sinusoid here is
+> proven to do the math it claims on a *constructed* signal; it is **NOT** proven
+> to work on real CSI in a real room. Skills whose detection target cannot be
+> honestly planted (clinical, weapon, affect, sleep-stage, sign-language) are
+> **NOT** given a number — they are listed under **DATA-GATED** with the real
+> data each would require.
+
+## Reproduce
+
+```bash
+cd v2/crates/wifi-densepose-wasm-edge   # workspace-excluded; build here
+cargo test --features std --test synthetic_validation -- --nocapture
+# also runs under the medical tier (med_* skills stay DATA-GATED, not validated):
+cargo test --features std,medical-experimental --test synthetic_validation -- --nocapture
+```
+
+Each `MEASURED-on-synthetic | …` line printed by the harness is the source of the
+table below. Numbers are deterministic (no RNG; pseudo-noise uses a fixed LCG seed).
+
+---
+
+## MEASURED-on-synthetic (constructible skills)
+
+| Skill | What was planted (ground truth) | Result | Grade |
+|-------|----------------------------------|--------|-------|
+| **vital_trend** | BPM held N≥6 calls at each threshold band (brady/tachy-pnea <12 / >25, brady/tachy-cardia <50 / >120, apnea breathing<1.0 for ≥20) vs normal | **acc 1.000, prec 1.000, recall 1.000** (TP5 FP0 TN5 FN0) | MEASURED |
+| **exo_time_crystal** | period-2 coordinated motion vs pseudo-noise + flat | **acc 1.000** (TP1 FP0 TN2 FN0) | MEASURED † |
+| **exo_ghost_hunter** (hidden breathing) | phase sinusoid at lag-8 (breathing band 5–15) in an empty room vs flat phase | **acc 1.000**; planted score **1.000**, flat **0.000** | MEASURED |
+| **occupancy** | 220-frame flat-amplitude calibration, then strong per-zone amplitude variance vs flat | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
+| **intrusion** | calibrate→arm (330 quiet frames), then per-subcarrier Δphase>1.5 + Δamp≫3σ vs quiet | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
+| **exo_rain_detect** | empty room, 60-frame baseline, then broadband variance (8/8 groups, ratio≫2.5) for ≥10 frames vs stable-low | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
+| **sig_flash_attention** | sustained high phase+amplitude in each of the 8 subcarrier groups; assert reported attention peak == planted group | **peak-localization 8/8 = 1.000** | MEASURED |
+| **spt_spiking_tracker** | sparse (2-subcarrier) large phase-delta in each of the 4 zones; assert tracked zone == planted zone | **zone-localization 4/4 = 1.000** | MEASURED ‡ |
+| **sig_optimal_transport** | sustained large frame-to-frame amplitude-distribution change vs stationary | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
+| **sig_mincut_person_match** | 2 persons with distinct stable per-region variance signatures over 40 frames | **person ids assigned, 0 id-swaps / 40 frames** | MEASURED |
+| **lrn_dtw_gesture_learn** | stillness → 3 identical gesture rehearsals → enrollment | **template enrolled (templates=1)** | MEASURED (enroll) § |
+| **sig_sparse_recovery** | 30 clean frames to init, then 8/32 (25%) nulled subcarriers | **dropout-detect + recovery-trigger = PASS** | MEASURED (trigger) ¶ |
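
The `acc` / `prec` / `recall` figures in the table follow from the TP/FP/TN/FN counts by the standard definitions. The sketch below recomputes two rows; `confusion_metrics` is a hypothetical helper written for this note, not a function from the harness.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (accuracy, precision, recall); undefined ratios are NaN."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else math.nan
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    return accuracy, precision, recall


print(confusion_metrics(5, 0, 5, 0))   # vital_trend row
print(confusion_metrics(1, 0, 2, 0))   # exo_time_crystal row
```

With counts this small a single miss moves the score a lot: `confusion_metrics(4, 0, 5, 1)` gives an accuracy of 0.9, which is one reason these rows are graded as synthetic checks rather than field accuracy.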
+
+### Caveats on individual results
+
+† **exo_time_crystal — honest discriminative limit.** A *pure* periodic signal
+already has autocorrelation peaks at lag L **and** 2L (natural harmonics), so this
+"period-doubling" detector cannot separate a true period-2 sub-harmonic from a
+plain periodic signal — an earlier plant using a clean sine produced a *false
+positive* (recorded during development). The construct it **can** discriminate
+with known ground truth is **periodic-coordination vs aperiodic** (noise/flat),
+which is what is measured (1.000). The original "sub-harmonic vs clean period"
+claim is **NOT** validatable with this algorithm.
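
The harmonic effect behind this caveat is easy to see on a clean sine. The sketch below computes a mean-removed, biased autocorrelation at the half period, the period L, and 2L. It is a standalone illustration, not the crate's detector; the period of 16 frames and the 256-sample buffer are assumptions chosen for the example.

```python
import math


def autocorr(x, lag):
    """Biased, mean-removed autocorrelation at one lag, normalized by lag-0 energy."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    energy = sum(v * v for v in centered)
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy


PERIOD = 16  # hypothetical period L, in frames
signal = [math.sin(2.0 * math.pi * i / PERIOD) for i in range(256)]

for lag in (PERIOD // 2, PERIOD, 2 * PERIOD):
    print(lag, round(autocorr(signal, lag), 3))
```

The correlation is strongly negative at L/2 and strongly positive at both L and 2L, so a strong peak at 2L by itself does not single out a period-doubled signal.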
+
+‡ **spt_spiking_tracker — plant must be sparse.** With weights init'd home=1.0 /
+cross=0.25, firing all 8 inputs in a zone (8×0.25=2.0 > threshold 1.0) overdrives
+*every* output neuron and the tracker collapses to zone 0 (measured 1/4 during
+development). Firing only 2 inputs (home 2.0 fires, cross 0.5 silent) yields clean
+4/4 zone localization. The validatable claim is *single-zone* localization.
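
The overdrive arithmetic can be reproduced with a one-step weighted-sum model. This is a deliberate simplification of the tracker (no membrane dynamics, no learning), assuming each active input contributes its weight once and an output fires when the sum exceeds the threshold; the function and constant names are hypothetical.

```python
HOME_WEIGHT = 1.0    # initial weight from a zone's own inputs
CROSS_WEIGHT = 0.25  # initial weight from another zone's inputs
THRESHOLD = 1.0


def firing_zones(planted_zone: int, active_inputs: int, n_zones: int = 4):
    """Return the zones whose summed input drive exceeds the threshold."""
    fired = []
    for zone in range(n_zones):
        weight = HOME_WEIGHT if zone == planted_zone else CROSS_WEIGHT
        if active_inputs * weight > THRESHOLD:
            fired.append(zone)
    return fired


print(firing_zones(planted_zone=2, active_inputs=8))  # dense plant
print(firing_zones(planted_zone=2, active_inputs=2))  # sparse plant
```

With 8 active inputs the cross-zone drive is 8 x 0.25 = 2.0, so every zone fires and the plant is ambiguous; with 2 active inputs only the home zone (2.0) clears the threshold while the others stay at 0.5.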
+
+§ **lrn_dtw_gesture_learn — enrollment validated; replay-match NOT.** The
+deterministic, constructible part (stillness → 3 identical rehearsals → a template
+is enrolled) is MEASURED. The DTW *replay match* (731) did **not** fire on the
+identical replay in this run (`match_same=false`) — replay-recognition accuracy is
+**reported, not asserted**, and is not claimed as validated.
+
+¶ **sig_sparse_recovery — trigger validated; recovery accuracy is NEGATIVE.**
+The dropout-detection + ISTA-recovery *trigger* pipeline fires correctly on >10%
+planted nulls (asserted). But the **measured recovery accuracy is NOT a win**:
+recovered RMSE **1.0045** vs unrecovered-null RMSE **0.9830** (**−2.2%**, i.e.
+slightly *worse* than leaving the nulls at zero) on a neighbor-correlated signal.
+The tridiagonal correlation model's fixed point does not equal the planted truth.
+**The recovery's reconstruction quality is therefore NOT validated as effective on
+synthetic data** — only its detection/trigger path is. Reported honestly; no
+positive number claimed.
+
+---
+
+## DATA-GATED — NOT validatable on synthetic data
+
+Planting a "seizure-like" / "weapon-like" / "happy-like" synthetic signal and
+claiming the detector "works" validates **nothing real** and is exactly the
+AI-slop this project fights. These skills run real DSP (per ADR-160, 0 stubs) and
+keep their ADR-160 disclaimers, but get **no accuracy number** here. Each needs
+the specific real, labelled data listed:
+
+| Skill | Why not constructible on synthetic | Real data required |
+|-------|------------------------------------|--------------------|
+| `med_seizure_detect` | "seizure-like" motion is not a seizure; no ground-truth signature exists synthetically | Clinical EEG-/video-labelled tonic-clonic seizure CSI from instrumented patients |
+| `med_sleep_apnea` | a planted breathing-pause is not clinical apnea (AHI scoring, hypopnea, desaturation) | Polysomnography-labelled (PSG) overnight CSI with scored apnea/hypopnea events |
+| `med_cardiac_arrhythmia` | a synthetic HR sequence cannot encode true arrhythmia morphology | ECG-labelled CSI (AFib/PVC/etc.) from clinical monitoring |
+| `med_respiratory_distress` | distress is a clinical gestalt, not a plantable rate | Clinician-labelled respiratory-distress CSI episodes |
+| `med_gait_analysis` | clinical gait metrics need a reference motion-capture standard | Mocap-/force-plate-labelled gait CSI |
+| `sec_weapon_detect` | a high variance ratio is RF reflectivity, **not** weapon discrimination (ADR-160 §A3 already renamed the event to `HIGH_METAL_REFLECTIVITY`) | Labelled metal-object-vs-no-object CSI with controlled object classes |
+| `exo_emotion_detect` | affect is not recoverable from a planted heuristic; outputs are proxies (ADR-160 §A2) | Validated affect-labelled CSI (self-report / physiological ground truth) |
+| `exo_happiness_score` | "happiness" is a gait-energy proxy, not a measured affect (ADR-160 §A2) | Validated affect/valence-labelled CSI |
+| `exo_dream_stage` | sleep staging needs PSG reference (EEG/EOG/EMG) | PSG-staged overnight CSI |
+| `exo_gesture_language` | coarse gesture clusters ≠ true sign language (ADR-160 §A4) | Labelled ASL letter/word CSI dataset |
+
+> The above are **not failures** — they are the honest boundary. A smaller set of
+> genuinely-measured skills plus this explicit gated list is the deliverable, per
+> the prove-everything directive.
+
+---
+
+## Skills not in either list
+
+The remaining edge skills (smart-building / retail / industrial occupancy-style,
+the other `sig_*`/`lrn_*`/`spt_*`/`tmp_*`/`qnt_*`/`aut_*`/`ais_*` algorithm-named
+modules) are **wired and exercised live** in the unified pipeline integration test
+(`tests/pipeline_all.rs`, all 59 default / 64 medical skills run without panic over
+300 synthetic frames) but were **not** given an individual planted-ground-truth
+accuracy number here. They are honest REAL-DSP modules (ADR-160) whose physical
+observable could be planted with more harness work; that is deferred, not claimed.
+
+## Test counts (full crate suite)
+
+```
+DEFAULT  (--features std):                     631 passed, 0 failed
+  (lib 504; budget 25; honest_labeling 10; pipeline_all 4; synthetic_validation 12; bench 1; vendor 75)
+MEDICAL  (--features std,medical-experimental): 669 passed, 0 failed
+  (lib 542; +16 same new tests; med_* stay DATA-GATED, not validated)
+```
+
+(M6 baseline was 615 / 653; the new pipeline_all (4) + synthetic_validation (12)
+tests add 16 to each tier.)
@@ -0,0 +1,26 @@
+# Upstream clone (WiFlow-STD, DY2434) -- never commit third-party code/weights
+upstream/
+
+# Local python env
+.venv/
+
+# Downloaded data / artifacts
+data/
+downloads/
+*.pth
+*.pt
+*.npy
+*.npz
+*.zip
+*.mat
+*.safetensors
+results/parity_fixture.json
+__pycache__/
+*.onnx
+
+# Committed ground truth: corruption masks for the pristine Kaggle download.
+# remote/clean_v2.py zeroes the corrupted source windows IN PLACE, so these
+# masks CANNOT be regenerated from a cleaned copy (generate_corruption_masks.py
+# documents the criteria and reproduces them only from a fresh download).
+!results/nan_windows_mask.npy
+!results/big_windows_mask.npy
@@ -0,0 +1,486 @@
+# WiFlow-STD (DY2434) Benchmark Results — ADR-152 §2.2
+
+Upstream: <https://github.com/DY2434/WiFlow-WiFi-Pose-Estimation-with-Spatio-Temporal-Decoupling>
+pinned at `06899d29` (2026-04-05), Apache-2.0. Dataset: Kaggle `kaka2434/wiflow-dataset`
+(12.8 GB archive → 15.5 GB extracted; 360,000 windows of 540×20 CSI + 15-keypoint 2D labels).
+
+Published claims (README "Setting 1"): PCK@20 97.25%, PCK@30 98.63%, PCK@40 99.16%,
+PCK@50 99.48%, MPJPE 0.007 m, 2.23M params, 0.07 GFLOPs.
+
+## Measurement (a): their model on their data
+
+### Artifact verification (MEASURED, 2026-06-10, this repo `eval_repro.py`)
+
+| Check | Result |
+|---|---|
+| Parameter count | **2,225,042 (2.23M) — matches claim** |
+| FLOPs (torch profiler, batch 1) | ~0.055 GFLOPs, consistent with the 0.07 GFLOPs claim |
+| CPU latency (Windows box, torch 2.12 CPU) | 13.2 ms/window @ batch 1 (76/s); 2.48 ms/sample @ batch 64 (403/s) |
+| Checkpoint load | `weights_only=True` (no pickle code execution) |
+
+### Released checkpoint does NOT reproduce the claims — REFUTED as shipped
+
+Running the released `best_pose_model.pth` through the released code on the released
+dataset with the released split procedure (seed-42 file-level 70/15/15; 54,000 test
+samples) yields:
+
+| Metric | Published | Measured (shipped checkpoint) |
+|---|---|---|
+| PCK@20 | 97.25% | **0.08%** |
+| PCK@30 | 98.63% | 0.78% |
+| PCK@40 | 99.16% | 5.53% |
+| PCK@50 | 99.48% | 15.42% |
+| MPJPE | 0.007 | **NaN** (dataset contains NaN CSI windows) |
+
+Raw output: `results/repro_a.json`.
+
+Diagnostics (on 2,000 NaN-free windows from the first files of the dataset, i.e.
+mostly would-be *training* data — so this is not a split mismatch):
+
+- Predictions correlate with targets (Pearson r ≈ 0.76) — the checkpoint is a trained
+  model, but in a **different keypoint normalization/order** than the released data.
+- Best-case post-hoc global per-axis affine correction: PCK@20 ≈ 20%.
+- Best-case per-keypoint affine correction (15×2 fitted transforms — generous
+  cheating): PCK@20 ≈ 72%, still far below 97.25%.
+- Pred↔target keypoint correspondence matrix is degenerate (multiple predicted
+  keypoints best-match the same target joint) — keypoint convention mismatch.
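The "best-case per-keypoint affine correction" diagnostic above can be sketched as follows. This is a minimal illustration, assuming one scale and one offset per keypoint and axis (the 15×2 transforms mentioned above), fitted by least squares on the evaluation data itself, which is what makes it generous. The function name and the toy data are hypothetical, not taken from `eval_repro.py`.

```python
import numpy as np


def per_keypoint_axis_affine(pred, gt):
    """Fit gt ~ scale * pred + offset separately per keypoint and axis.

    pred, gt: arrays of shape (n, K, 2). Returns corrected predictions.
    Fitting on the evaluated samples is deliberately optimistic.
    """
    corrected = np.empty_like(pred, dtype=np.float64)
    for j in range(pred.shape[1]):
        for a in range(pred.shape[2]):
            # np.polyfit with deg=1 returns [slope, intercept].
            scale, offset = np.polyfit(pred[:, j, a], gt[:, j, a], deg=1)
            corrected[:, j, a] = scale * pred[:, j, a] + offset
    return corrected


# Toy data: predictions that differ from targets only by a per-joint
# scale/offset convention, which this correction can fully undo.
rng = np.random.default_rng(0)
gt = rng.random((200, 15, 2))
scale = rng.uniform(0.5, 2.0, size=(15, 2))
offset = rng.uniform(-0.3, 0.3, size=(15, 2))
pred = (gt - offset) / scale
fixed = per_keypoint_axis_affine(pred, gt)
```

In the toy case the correction recovers the targets exactly. A correction of this kind that still leaves PCK@20 far below the published value suggests a mismatch beyond a simple rescaling.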
+
+### Reproducibility defects in the released artifacts
+
+1. `models/__init__.py` imports `TemporalConvNet`, which `models/tcn.py` does not
+   define — **the published code does not import/run as-is**.
+2. The released root checkpoint uses pre-rename module names (`att.*`, `final_conv.*`)
+   vs the published code (`attention.*`, `decoder.*`) — same shapes/param count, but
+   confirms the checkpoint predates the published code.
+3. The second shipped checkpoint (`cross_dataset_test/WiFlow/best_pose_model.pth`) is
+   a **different architecture** (342-channel input = MM-Fi layout, 3 TCN layers,
+   3-channel/3D decoder) — not usable on their own dataset.
+4. `run.py` ignores `--data_dir` and hardcodes `../preprocessed_csi_data`.
+5. The released dataset's final 13 files (indices 487–499; 9,072 windows, 2.52%)
+   are corrupted: NaN values plus garbage amplitudes up to 3.4e38 (float32 max) in
+   data that is otherwise [0,1]-normalized. Upstream code has no NaN/inf handling;
+   training as published on this download diverges — the first corrupted batch
+   overflows fp16 autocast and permanently poisons BatchNorm running statistics
+   (GradScaler step-skipping does not protect BN). The authors' training curves
+   show normal convergence, so their local data evidently differed from the
+   Kaggle upload. Window masks: `results/nan_windows_mask.npy`,
+   `results/big_windows_mask.npy`.
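The BatchNorm poisoning described in defect 5 can be illustrated with a minimal sketch. It assumes the standard exponential-moving-average update for running statistics with momentum 0.1 (the PyTorch default); the function name and the toy batch means are illustrative and not upstream code.

```python
import math


def update_running_mean(running, batch_mean, momentum=0.1):
    """EMA update of a BatchNorm-style running mean."""
    return (1.0 - momentum) * running + momentum * batch_mean


# Toy per-batch means: the third batch stands in for a corrupted window.
batch_means = [0.48, 0.52, float("nan"), 0.50, 0.49, 0.51]

running = 0.0
history = []
for mean in batch_means:
    running = update_running_mean(running, mean)
    history.append(running)

# Clean batches after the corrupted one cannot repair the statistic:
# 0.9 * NaN + 0.1 * x is still NaN.
poisoned_from = next(i for i, v in enumerate(history) if math.isnan(v))
```

The running statistic is updated in the forward pass, so skipping the optimizer step afterwards does not undo it.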
+
+### Reproducing the corruption masks
+
+The two mask files (9,070 NaN/Inf windows, 9,072 with |amplitude| > 1.5;
+union 9,072, all in dataset files 487–499) are **committed ground truth**
+(gitignore-negated, ~352 KB each). They can only be regenerated from a
+**pristine** Kaggle download: `remote/clean_v2.py` repairs the dataset by
+zeroing the corrupted windows in place, after which the corruption evidence
+is gone and a rescan returns all-False. `generate_corruption_masks.py`
+re-derives them (chunked scan, criteria: any non-finite value OR
+max |finite| > 1.5 per 540×20 window) and refuses to write all-False masks,
+which indicate a cleaned copy. Verified 2026-06-11: a regeneration from the
+local pristine download is bit-identical to the committed masks.
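A minimal sketch of the two mask criteria and the all-False guard described above. It is not `generate_corruption_masks.py`: the function names are hypothetical, and the scan is done in one pass rather than chunked, for brevity.

```python
import numpy as np


def corruption_masks(windows, amp_limit=1.5):
    """Return (nan_mask, big_mask), one bool per window.

    windows: float array of shape (n_windows, 540, 20).
    nan_mask: the window contains any non-finite value (NaN or +/-Inf).
    big_mask: the largest finite |value| in the window exceeds amp_limit.
    """
    flat = windows.reshape(windows.shape[0], -1)
    finite = np.isfinite(flat)
    nan_mask = ~finite.all(axis=1)
    # Ignore non-finite entries when taking the per-window maximum.
    finite_abs = np.where(finite, np.abs(flat), 0.0)
    big_mask = finite_abs.max(axis=1) > amp_limit
    return nan_mask, big_mask


def check_not_cleaned(nan_mask, big_mask):
    """Refuse all-False masks, which indicate an already-cleaned copy."""
    if not (nan_mask.any() or big_mask.any()):
        raise ValueError("all-False masks: dataset looks cleaned, not pristine")


rng = np.random.default_rng(0)
demo = rng.random((4, 540, 20), dtype=np.float32)  # [0, 1) like clean data
demo[1, 0, 0] = np.nan                             # NaN window
demo[2, 3, 5] = 3.4e38                             # garbage amplitude
nan_mask, big_mask = corruption_masks(demo)
check_not_cleaned(nan_mask, big_mask)
```

A real scan over the multi-gigabyte array would apply the same per-window test chunk by chunk on a memory-mapped file.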
+
+### Retraining result (MEASURED, 2026-06-10): claims APPROXIMATELY REPRODUCED
+
+Since the shipped checkpoint is unusable, measurement (a) fell back to retraining
+with upstream code + defaults (seed 42, batch 64, early-stopped at epoch 41 of 50,
+best epoch 36, ~75 s/epoch) on ruvultra (RTX 5080). Deviations, all forced and
+documented: one-line fix for defect (1); torch 2.x+cu128 instead of pinned 2.3.1
+(Blackwell sm_120 unsupported); the 9,072 corrupted windows (defect 5) zeroed
+entirely — without this the published pipeline produces NaN from epoch 1 (observed).
+Scripts mirrored in `remote/`; raw metrics in `results/eval_retrained.json`.
+
+| Metric | Published | Retrained (full test, 54,000) | Retrained (corruption-free, 52,560) |
+|---|---|---|---|
+| PCK@20 | 97.25% | **96.09%** | **96.61%** |
+| PCK@30 | 98.63% | 97.89% | 98.23% |
+| PCK@40 | 99.16% | 98.58% | 98.79% |
+| PCK@50 | 99.48% | 98.99% | 99.11% |
+| MPJPE | 0.007 | 0.0098 | 0.0094 |
+
+Within ~0.4–1.2 PCK points of every published figure across both test columns (single run, corrupted train
+windows zeroed, different torch/GPU). **Verdict: the accuracy claims are credible
+and approximately reproducible — but only after repairing the released dataset and
+code.** Val best: PCK@20 96.99%, MPJPE 0.0086 (epoch 36).
+
+One more defect found during the run:
+
+6. `train.py` calls `plot_training_history`, which is not defined anywhere — the
+   built-in post-training test evaluation is unreachable as published (crashes
+   with NameError after training completes).
+
+## ADR-152 §2.2 citation rule
+
+Evidence grade for the WiFlow-STD accuracy claims after measurement (a):
+**MEASURED-EQUIVALENT (96.1–96.6% PCK@20 reproduced by retraining; shipped
+checkpoint REFUTED; dataset/code require repairs)**. RuView docs may cite
+"~96% PCK@20 (our reproduction)" — still **not comparable** to our 17-keypoint
+ESP32 numbers (different hardware, 5 subjects, in-domain random split,
+15 keypoints).
+
+## Edge optimization (measured)
+
+ADR-152 "optimize beyond SOTA" track, 2026-06-10, this Windows box (Windows 11,
+16 torch threads, torch 2.12.0+cpu, onnxruntime 1.26.0). Subject: the retrained
+checkpoint `results/retrained_best_pose_model.pth` (2,225,042 fp32 params).
+Scripts: `quantize_bench.py`, `onnx_bench.py`, `eval_ort_accuracy.py`.
+Raw numbers: `results/edge_optimization.json`.
+
+Accuracy is on a **10,000-window seed-42 random subset** of the corruption-free
+test split (same seed-42 file-level 70/15/15 split as `eval_repro.py`; 54,000
+test windows, 1,440 corrupted excluded via the union of
+`results/nan_windows_mask.npy` and `results/big_windows_mask.npy`, leaving
+52,560; subset drawn with `np.random.default_rng(42)`). The fp32 subset PCK@20
+(96.68%) is within 0.07 pt of the full clean-test figure (96.61%), so the
+subset is representative.
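The subset selection can be sketched as below. Only the seed and `np.random.default_rng` come from the text; sampling without replacement, sorting the result, and the function name are assumptions made for illustration.

```python
import numpy as np


def draw_clean_subset(test_idx, nan_mask, big_mask, n=10_000, seed=42):
    """Pick n test windows that are flagged by neither corruption mask."""
    corrupted = nan_mask | big_mask            # union of both masks
    clean = test_idx[~corrupted[test_idx]]     # drop flagged test windows
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(clean, size=n, replace=False))


# Toy stand-in: 1,000 windows, the last 200 form the "test split".
nan_mask = np.zeros(1000, dtype=bool)
big_mask = np.zeros(1000, dtype=bool)
nan_mask[990:] = True
big_mask[985:] = True
test_idx = np.arange(800, 1000)
subset = draw_clean_subset(test_idx, nan_mask, big_mask, n=50)
```

Because the generator is seeded, every variant in the table is scored on the same windows as long as the split and masks are unchanged.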
+
+Latency is CPU ms/window, reported as the median of repeated runs with
+3 interleaved repetitions per variant (medians below; run-to-run spread on
+this box is large, roughly ±20–40% at batch 1; individual reps are in the JSON).
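One way to structure interleaved timing is sketched here. It assumes each repetition takes the median of a fixed number of runs and that the reported figure is the median across repetitions; the benchmark scripts may nest these differently. The function name and toy workloads are hypothetical.

```python
import statistics
import time


def bench_interleaved(variants, reps=3, runs_per_rep=20):
    """Time each variant once per repetition, cycling through all variants.

    variants: dict name -> zero-argument callable (one inference call).
    Returns dict name -> median over repetitions of the per-rep median (ms).
    """
    per_rep = {name: [] for name in variants}
    for _ in range(reps):
        for name, fn in variants.items():      # interleave: A, B, A, B, ...
            samples = []
            for _ in range(runs_per_rep):
                t0 = time.perf_counter()
                fn()
                samples.append((time.perf_counter() - t0) * 1e3)
            per_rep[name].append(statistics.median(samples))
    return {name: statistics.median(v) for name, v in per_rep.items()}


# Toy workloads standing in for model variants.
result = bench_interleaved({
    "small": lambda: sum(range(1_000)),
    "large": lambda: sum(range(200_000)),
})
```

Interleaving spreads slow drift (thermal throttling, background load) across all variants instead of charging it to whichever ran last.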
+
+| Variant | Disk size | Batch 1 (ms/win) | Batch 64 (ms/win) | PCK@20 | PCK@50 | MPJPE |
+|---|---|---|---|---|---|---|
+| torch fp32 (baseline) | 9.07 MB | 11.0 | 2.27 | 96.68% | 99.15% | 0.00936 |
+| torch fp16 (`.half()`) | **4.58 MB** | 24.3 | 2.42 | 96.68% | 99.15% | 0.00946 |
+| torch int8 dynamic | 9.07 MB (unchanged) | 15.6 | 2.06 | 96.68% (identical) | 99.15% | 0.00936 |
+| ONNX fp32 (onnxruntime) | 8.97 MB | **3.2** | **2.0** | 96.68% | 99.15% | 0.00936 |
+| ONNX int8 (ORT dynamic, supplementary) | **2.44 MB** | 6.5 | 5.8 | 96.52% | 99.15% | 0.01108 |
+
+Findings:
+
+- **torch dynamic INT8 quantizes nothing on this model.** The architecture has
+  **zero `nn.Linear` layers** — it is entirely Conv1d (21) + Conv2d (22) +
+  BatchNorm. `torch.ao.quantization.quantize_dynamic` (requested over
+  `{Linear, Conv1d, Conv2d}`) converted **0 modules / 0.0% of params**: dynamic
+  quantization only has kernels for Linear/RNN-family modules and silently
+  skips convolutions. The "int8" model is bit-identical to fp32 (same outputs,
+  same 9.07 MB). Conv quantization would require static (PTQ) quantization
+  with calibration — out of scope here; the ORT dynamic path below is the
+  honest int8 datapoint.
+- **fp16 halves size for free accuracy-wise** (PCK@20 −0.005 pt, MPJPE
+  +0.0001) but is *slower* on CPU at batch 1 (~2.2×) — torch CPU fp16 conv
+  kernels are emulated. fp16 is a storage/transport format here, not a CPU
+  runtime win.
+- **ONNX Runtime is the real batch-1 latency win: ~3.4× faster than torch**
+  (3.2 vs 11.0 ms/window) at identical accuracy (parity 2.4e-7).
+
+### Verdict on the paper's "~2.2 MB int8" claim
+
+**Plausible but not free, and unreachable by the obvious PyTorch route.**
+2,225,042 params × 1 byte ≈ 2.2 MB assumes *every* parameter quantizes.
+PyTorch dynamic quantization — the one-liner most readers would reach for —
+yields **9.07 MB (0% quantized)** because the model has no Linear layers.
+ONNX Runtime dynamic quantization, which does have int8 conv weight support,
+gets **2.44 MB** (close to the claim; the overhead is BatchNorm params/buffers
+and quantization scales kept in fp32) at a measurable accuracy cost:
+PCK@20 96.68 → 96.52% (−0.16 pt) and MPJPE 0.00936 → 0.01108 (+18%), and
+~2× slower inference than ONNX fp32 (ConvInteger kernels). The paper does not
+state a method or an int8 accuracy; treat "2.2 MB" as a weight-arithmetic
+estimate, achievable in practice only via conv-capable quantization toolchains
+and with a small accuracy penalty.
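The weight arithmetic behind the "~2.2 MB" figure, as a small sketch. It assumes decimal megabytes and counts raw weights only; the measured file size is the 2.44 MB reported above.

```python
PARAMS = 2_225_042          # parameter count verified above
MB = 1_000_000              # decimal megabytes, an assumption about the units


def weights_mb(n_params, bytes_per_param):
    """Raw weight storage, ignoring file-format and scale overhead."""
    return n_params * bytes_per_param / MB


fp32_mb = weights_mb(PARAMS, 4)   # about 8.90
fp16_mb = weights_mb(PARAMS, 2)   # about 4.45
int8_mb = weights_mb(PARAMS, 1)   # about 2.23, the "2.2 MB" figure

# Measured ORT dynamic int8 file vs the ideal: the gap is what stays in fp32
# (BatchNorm params/buffers, quantization scales) plus container overhead.
measured_int8_mb = 2.44
overhead_mb = measured_int8_mb - int8_mb
```

The ideal figure is reached only if every parameter is stored in one byte, which is why the measured file lands roughly 0.2 MB above it.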
+
+### ONNX export status
+
+**Works.** Exported via the TorchScript exporter (`dynamo=False`), opset 17,
+with a dynamic batch axis — `results/retrained_fp32_dynamic.onnx` (8.97 MB),
+verified to run at batch 1/2/64. The axial attention's
+`view(N*W, C, H)` reshape traced correctly (sizes recorded as graph ops, not
+baked constants). The dynamo exporter also captures the graph but crashed on
+this box writing a ✅ to a cp1252 console (cosmetic Windows encoding issue, not
+a model blocker). Parity vs torch on the stored fixture
+(`results/parity_fixture.npz`, batch 2, seed 42): **max abs diff 2.4e-7 —
+PASS** (< 1e-4). ORT-quantized int8 model: `results/retrained_int8_ort_dynamic.onnx`.
+
+### Static PTQ (calibrated) — follow-up
+
+Follow-up to the dynamic-int8 row above (2026-06-10, same box, onnxruntime
+1.26.0): ONNX Runtime **static** post-training quantization
+(`quantize_static`, QDQ format, per-channel int8 weights + int8 activations)
+of the same fp32 export, calibrated on **corruption-free TRAINING-split
+windows only** (seed-42 file-level split, same masks; 1,000 windows for
+MinMax, 512 for the histogram calibrators; never test windows). Two scopes
+were tested: "conv-only" (`op_types_to_quantize=["Conv"]`) and "all-ops". The
+attention path exports as Einsum/Softmax, which ORT never quantizes anyway, so
+"all-ops" additionally quantizes only the elementwise
+Mul/Sigmoid/Add/AveragePool glue. Accuracy is measured on the
+identical 10k-window seed-42 corruption-free test subset; latency median of
+3 interleaved reps (fp32/dynamic re-benched in-session as references).
+Script: `static_ptq_bench.py`; raw: `results/edge_optimization.json`
+(`onnx_static_ptq`).
+
+| Variant | Disk size | Batch 1 (ms/win) | Batch 64 (ms/win) | PCK@20 | PCK@50 | MPJPE |
+|---|---|---|---|---|---|---|
+| ONNX fp32 (reference) | 8.97 MB | 2.5 | 1.9 | 96.68% | 99.15% | 0.00936 |
+| ORT dynamic int8 (baseline) | **2.44 MB** | 5.7 | 4.6 | 96.52% | 99.15% | 0.01108 |
+| static QDQ **Percentile(99.99) conv-only** | 2.53 MB | 5.3 | 4.7 | 96.61% | 99.16% | **0.01031** |
+| static QDQ MinMax conv-only | 2.53 MB | 5.2 | 3.3 | **96.63%** | 99.19% | 0.01084 |
+| static QDQ Entropy conv-only | 2.53 MB | 5.2 | 3.1 | 96.60% | 99.19% | 0.01078 |
+| static QDQ MinMax all-ops | 2.60 MB | 6.5 | 3.9 | 95.45% | 99.14% | 0.01486 |
+| static QDQ Entropy all-ops | 2.60 MB | 5.7 | 4.1 | 95.30% | 99.13% | 0.01510 |
+| static QDQ Percentile all-ops | 2.60 MB | 5.3 | 4.3 | 96.39% | 99.17% | 0.01218 |
+
+**Verdict: static PTQ (conv-only) is the new best int8 point on accuracy —
+but only modestly, and it does not fix int8's latency penalty.**
+
+- **Accuracy: beats dynamic.** All three conv-only calibrations land at
+  PCK@20 96.60–96.63% (vs dynamic 96.52%, fp32 96.68% — recovers ~⅔ of the
+  dynamic gap) and MPJPE 0.0103–0.0108 (vs dynamic 0.01108). Best MPJPE:
+  Percentile conv-only, +10% over fp32 instead of dynamic's +18%.
+- **Size: slightly worse.** 2.53 MB vs 2.44 MB (+3.6%) — QDQ nodes and
+  per-channel scales cost a little; BatchNorm stays fp32 in both (the 12 BNs
+  follow Slice/Einsum/Reshape, never Conv, so they cannot be folded).
+- **Latency: a wash vs dynamic, still ~2× slower than ONNX fp32 at batch 1.**
+  Batch-1 medians 5.2–5.3 vs dynamic 5.7 ms/win in-session — within this
+  box's ±20–40% noise. Batch 64 leans static (3.1–3.3 for MinMax/Entropy
+  conv-only vs 4.6), same caveat.
+- **All-ops QDQ is strictly worse**: up to −1.4 pt PCK@20 and +60% MPJPE for
+  zero size/latency benefit — int8 activations through the elementwise glue
+  around the attention blocks is where the damage is. Conv-only is the right
+  scope.
+- Negative result worth recording: **Entropy calibration is a no-op here** —
+  on an identical calibration set it selects full-range thresholds
+  bit-identical to MinMax (all 247 scales equal; verified on a 64-window
+  smoke set). Also, ORT 1.26's `CalibMaxIntermediateOutputs` raises a
+  spurious "No data is collected" when the batch count divides the chunk
+  size (worked around in the script).
+
+Deployment guidance: need speed → ONNX fp32 (3.2 ms b1). Need int8 weights
+for size → static QDQ conv-only (Percentile or MinMax,
+`results/retrained_int8_static_percentile_conv.onnx`), which strictly
+dominates dynamic int8 on accuracy at ~equal latency and +0.09 MB.
+
+## Efficiency sweep (MEASURED, overnight 2026-06-10/11)
+
+ADR-152 beyond-SOTA track: compact purpose-built variants of the WiFlow-STD
+architecture, trained from scratch on the same cleaned dataset, identical
+seed-42 file-level split, loss and protocol as the measurement-(a) reference
+(fp32, batch 64, ≤50 epochs, patience 5; RTX 5080, ~22–29 min/variant).
+Variant transforms are pure channel/group/stride scalings of an
+architecture-exact parameterized model (validated: reproduces 2,225,042 params
+at the reference config). Scripts: `remote/sweep/`; raw:
+`results/efficiency_sweep.jsonl`; checkpoints `results/{half,quarter,tiny}_best.pth`
+(gitignored).
+
+| Variant | Params | vs 2.23M | Clean-test PCK@20 | PCK@50 | MPJPE | Best epoch |
+|---|---|---|---|---|---|---|
+| full (reference, meas. a) | 2,225,042 | 1× | 96.61% | 99.11% | 0.0094 | 36 |
+| **half** | **843,834** | **0.38×** | **96.62%** | **99.47%** | **0.00898** | 23 |
+| quarter | 338,600 | 0.15× | 96.05% | 99.43% | 0.00928 | 50 |
+| tiny | 56,290 | 0.025× | 94.11% | 99.36% | 0.0125 | 47 |
+
+Findings:
+
+- **The half model (843k params) strictly dominates the full reference** on
+  this dataset — equal PCK@20, better PCK@50 and MPJPE, converges in fewer
+  epochs. The published 2.23M architecture is over-parameterized for its own
+  benchmark.
+- **tiny (56k params, 1/39.5) holds 94.11% PCK@20** — a ~220 KB fp32 /
+  ~60 KB int8-class model in reach of severely constrained edge targets,
+  at −2.5 pt from the full reference.
+- Caveats: in-domain (5-subject random-file split) like every number on this
+  dataset; single run per variant; corruption-free test subset (52,560).
+  Cross-domain behavior of compact variants is untested — ADR-150's evidence
+  says capacity *hurts* cross-subject, so the compact end may generalize no
+  worse, but that is a hypothesis, not a measurement.
+
+### Compact-variant edge artifacts (MEASURED, 2026-06-11)
+
+Edge pipeline for the **tiny** checkpoint (56,290 params), same machinery and
+protocol as the full-model edge rows above (this Windows box, torch
+2.12.0+cpu, onnxruntime 1.26.0; dynamic-batch opset-17 TorchScript export;
+static QDQ **Percentile(99.99) conv-only** int8 calibrated on **512**
+corruption-free TRAIN-split windows; accuracy on the identical 10k-window
+seed-42 clean test subset; latency = median ms/window over 3 interleaved
+reps, with the full-model fp32/int8 sessions interleaved as same-session
+references). Script: `tiny_edge_bench.py`; raw:
+`results/edge_optimization.json` (`tiny_variant`). Torch-vs-ORT parity on the
+stored fixture input: **max abs diff 1.5e-7 — PASS** (< 1e-4). The tiny fp32
+subset PCK@20 (94.11%) matches the full clean-test sweep figure (94.11%)
+exactly, so the subset remains representative.
+
+Two forced deviations, both recorded in the JSON:
+
+1. **Adaptive-pool export rewrite.** tiny's derived stride schedule
+   `[2,1,1,1]` leaves feature width 16, and the TorchScript exporter rejects
+   `AdaptiveAvgPool2d((15,1))` when 15 is not a factor of the input height
+   (the full model never hit this — its width was exactly 15). Since the
+   pool over a fixed-size map is a fixed linear operator, the export wrapper
+   replaces it with `mean(-1)` (W axis, a factor) + a constant averaging
+   matmul using PyTorch's exact bin rule; the parity check (vs the original
+   torch model with the real pool) proves exactness.
+2. **Calibration count 512, not "~500"**: ORT 1.26's histogram collector
+   `np.asarray()`'s the per-batch maxima, so the calibration count must be a
+   multiple of the 64-window calibration batch or the ragged last batch
+   crashes it (the earlier static-PTQ run dodged this by using exactly 512).
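The idea behind deviation 1 (adaptive pooling over a fixed-size map is a constant linear operator) can be sketched in NumPy. This is an illustration, not the export wrapper: the function name is hypothetical, and the bin rule used is the usual adaptive-pooling one, where output `i` averages inputs in `[floor(i*in/out), ceil((i+1)*in/out))`.

```python
import numpy as np


def adaptive_avg_matrix(in_size, out_size):
    """Matrix A (out_size x in_size) with A @ x equal to adaptive avg pooling.

    Output i averages the inputs in
    [floor(i * in / out), ceil((i + 1) * in / out)).
    """
    mat = np.zeros((out_size, in_size), dtype=np.float64)
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = ((i + 1) * in_size + out_size - 1) // out_size  # integer ceil
        mat[i, start:end] = 1.0 / (end - start)
    return mat


A = adaptive_avg_matrix(16, 15)      # 16 inputs -> 15 outputs, bins overlap
x = np.arange(16, dtype=np.float64)  # toy feature column
pooled = A @ x
```

With 16 inputs and 15 outputs every bin covers two neighbouring inputs, so the matrix is banded. When input and output sizes match it reduces to the identity, consistent with the full model not needing the rewrite.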
+
+| Variant | Disk size | Batch 1 (ms/win) | Batch 64 (ms/win) | PCK@20 | PCK@50 | MPJPE |
+|---|---|---|---|---|---|---|
+| full ONNX fp32 (same-session ref) | 8.97 MB | 2.27 | 1.42 | 96.68% | 99.15% | 0.00936 |
+| full static QDQ Percentile conv-only (same-session ref) | 2.53 MB | 5.53 | 3.82 | 96.61% | 99.16% | 0.01031 |
+| **tiny ONNX fp32** | **0.295 MB** | **0.66** | **0.24** | **94.11%** | 99.37% | 0.01253 |
+| tiny static QDQ Percentile conv-only | 0.248 MB | 0.85 | 1.03 | 92.68% | 99.33% | 0.01491 |
+
+(tiny torch `.pth` checkpoint for reference: 0.34 MB on disk; 56,290 fp32
+params ≈ 225 KB of weights.)
+
+Findings:
+
+- **The smallest deployable WiFlow-class model is the tiny ONNX fp32
+  artifact: ~295 KB on disk, 0.66 ms/window batch-1 CPU (~1,500 windows/s),
+  94.1% PCK@20** — 30× smaller and ~3.4× faster (in-session) than the full
+  ONNX fp32 model for −2.6 pt PCK@20.
+- **int8 is a bad trade at this scale.** Static QDQ conv-only — the recipe
+  that cost the full model only 0.07 pt — costs tiny **−1.43 pt** PCK@20
+  (94.11 → 92.68%) and +19% MPJPE, saves only 47 KB (−16%; QDQ scales and
+  the fp32 BN/attention glue are proportionally larger in a small graph),
+  and is *slower* than tiny fp32 (0.85 vs 0.66 ms b1; 1.03 vs 0.24 ms b64 —
+  QDQ kernel overhead dominates when the convs are this small). A 56k-param
+  model has little redundancy left to absorb weight+activation rounding.
+- Deployment guidance, compact edition: ship tiny as **ONNX fp32** — at
+  295 KB the int8 size saving solves no real constraint and costs accuracy
+  and speed. If ~250 KB vs ~295 KB ever matters, weight-only quantization
+  would be the thing to try next, not QDQ.
+
+## Measurement (b): BLOCKED-ON-DATA (attempted 2026-06-10)
+
+The fine-tune-on-ESP32 measurement stopped at dataset characterization, per the
+pre-registered stop rule (<2,000 paired windows). Findings (MEASURED):
+
+- **Only one trainable paired dataset exists**: `ruvultra:~/work/cog-pose-train/paired.jsonl`
+  — 1,077 windows (one subject, one room, one 29.9-min session, single node;
+  CSI [56, 20]; 17 COCO keypoints, MediaPipe confidence mean 0.44 — only 264
+  windows pass ADR-079's own conf>0.5 training filter). Prior measured attempts
+  on this exact set: 0–3% torso-PCK@20 (temporal splits, three independent
+  pipelines). Fine-tuning a 2.23M-param model on ~860 train windows would
+  measure memorization, not transfer.
+- **The April session behind the old "92.9% PCK@20" claim is lost** (345
+  samples, 35 subcarriers; raw CSI gone from ruvzen/ruvultra/cognitum-v0; only
+  a 69-sample predictions+GT holdout survives at `models/wiflow-real/eval-holdout.jsonl`).
+- **Forensic recheck of that holdout RETRACTS the 92.9% figure**: the trainer's
+  `pck()` used an absolute 0.2 image-unit threshold (not torso-normalized) and
+  the model output a **constant pose** (pred std 0.0000 across 69 near-static
+  frames; a mean predictor scores 100% under the same protocol). The
+  torso-normalized PCK@20 on the same holdout is 19.1%. This corroborates the
+  2026-05-11 audit retraction (CHANGELOG, PR #535); stale doc citations were
+  removed 2026-06-10 (user-guide, readme-details, ADR-152 §2.1.3). The §2.2
+  no-citation rule now applies to ADR-079 accuracy claims.
+
+Unblock criteria: a paired collection session of ≥2k windows (≈35+ min at the
+observed stride; multi-pose, conf>0.5, ideally with the §2.1.3 two-checkerboard
+calibration), plus a re-baselined our-pipeline number under torso-PCK@20 on the
+same split. WiFlow-STD assets stand ready on ruvultra (`~/wiflow-std-bench/`).
+Also worth investigating: ADR-079's protocol predicts ~9k windows per 30 min;
+the May session under-delivered ~8× (aligner drop rate?).
+
+## Measurement (b) (MEASURED 2026-06-10/11)
+
+The data blocker was lifted: the 2026-06-10 22:10–22:40 collection session produced
+**2,046 paired windows** (`ruvultra:~/wiflow-std-bench/paired-20260610.jsonl`; ONE
+subject, ONE room, ONE ESP32 node, varied poses: walk/raise/squat/kick/wave/turn/
+jump/sit; aligner `scripts/align-ground-truth.js`, non-overlapping 20-frame windows
+~0.42 s; 17 COCO keypoints in normalized [0,1] camera coords; MediaPipe confidence
+mean 0.802, min 0.692 — all windows pass the conf>0.5 filter). The −4 h timestamp
+bug and the empty-frame confidence-dilution aligner findings are recorded
+separately; results only here. Trained on ruvultra (RTX 5080, torch 2.11+cu128,
+fp32, batch 32, GPU shared with the efficiency sweep). Scripts mirrored in
+`remote/measb/`; raw metrics + full training curves in `results/measurement_b.json`.
+
+### Two new aligner/dataset findings (forced deviations, MEASURED)
+
+1. **`csi_shape` is heterogeneous, not [70, 20]**: 1,347× [70,20], 284× [134,20],
+   243× [26,20], 130× [12,20], 42× [20,20]. The ESP32 stream emits mixed frame
+   types and `extractCsiMatrix` stamps each window's subcarrier count from
+   `window[0].subcarriers`, zero-padding/truncating the other frames — even
+   native-70 windows contain ~20.4% internally zero-padded short frames
+   (subcarriers 40–69 all-zero). Handling: the primary suite ("all 2,046")
+   linearly resamples every frame's subcarrier axis to 70 bins (identity for
+   native-70 frames) so the pre-registered n and split sizes hold; a secondary
+   suite restricts to the 1,347 native [70,20] windows as a homogeneity check.
+2. **Aligner layout bug**: `extractCsiMatrix` fills `matrix[f * nSc + s]`
+   (frame-major) but declares `shape: [nSc, nFrames]` — the stored shape label is
+   transposed relative to the data. Confirmed by coherent per-frame zero-tails;
+   corrected on load (`reshape(nFrames, nSc).T`).
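Both findings can be sketched on a toy window. The reshape follows the correction quoted in finding 2. The resampling uses endpoint-aligned linear interpolation per frame, which is an assumption about how "linearly resamples" is implemented, and the helper names are hypothetical.

```python
import numpy as np


def load_window(flat, n_sc, n_frames):
    """Undo the transposed shape label: the data is frame-major."""
    # flat[f * n_sc + s] holds subcarrier s of frame f.
    return np.asarray(flat).reshape(n_frames, n_sc).T   # -> [n_sc, n_frames]


def resample_subcarriers(window, target=70):
    """Linearly resample the subcarrier axis (axis 0) to `target` bins."""
    n_sc, n_frames = window.shape
    if n_sc == target:
        return window.copy()                            # identity for native-70
    src = np.linspace(0.0, 1.0, n_sc)
    dst = np.linspace(0.0, 1.0, target)
    return np.stack(
        [np.interp(dst, src, window[:, f]) for f in range(n_frames)], axis=1
    )


# Toy window: 3 frames x 4 subcarriers, value = 10 * frame + subcarrier.
n_frames, n_sc = 3, 4
flat = [10 * f + s for f in range(n_frames) for s in range(n_sc)]
win = load_window(flat, n_sc, n_frames)
wrong = np.asarray(flat).reshape(n_sc, n_frames)        # trusting the label
up = resample_subcarriers(win, target=7)
```

Trusting the stored label (`wrong`) silently mixes frames and subcarriers without raising an error, which is why the bug had to be confirmed from the zero-tail pattern.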
+
+### Protocol (pre-registered, followed)
+
+Temporal split, no shuffling across time: first 70% train (1,432), next 15% val
+(307), last 15% test (307); seed 42 elsewhere. Model: learned 1×1 Conv1d 70→540
+adapter prepended to the upstream WiFlow-STD trunk; K=17 via the parameter-free
+adaptive pool (`AdaptiveAvgPool2d((17,1))` — pretrained weights load strict for
+any K). CSI normalized by the TRAIN-split p99 amplitude (129.7 all / 130.9
+native-70), clipped to [0,1]. Three runs, ≤60 epochs, early-stop patience 8 on
+val MPJPE, AdamW (adapter lr 1e-4; pretrained trunk lr 1e-5, 10× lower; scratch
+all 1e-4), fp32. Pretrained init = the measurement-(a) **retrained** checkpoint
+(`upstream/test/best_pose_model.pth`, ~96% PCK@20 on WiFlow data; the
+`att.`/`final_conv.` key remap from `eval_repro.py` applied defensively — a no-op,
+that checkpoint already uses post-rename keys). Frozen-trunk run: trunk
+`requires_grad=False` **and** held in `.eval()` so BatchNorm running stats cannot
+drift — a pure transfer probe; only the 70→540 adapter (38,340 params) trains.
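The split sizes and the adapter parameter count quoted above can be checked with a few lines. The rounding rule (round train and val, give the remainder to test) is an assumption that happens to reproduce the reported sizes; the training scripts may compute them differently.

```python
def temporal_split_sizes(n, train_frac=0.70, val_frac=0.15):
    """Contiguous train/val/test sizes; test takes the remainder."""
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return n_train, n_val, n - n_train - n_val


def conv1d_params(c_in, c_out, kernel=1, bias=True):
    """Parameter count of a Conv1d layer (weights plus optional bias)."""
    return c_in * c_out * kernel + (c_out if bias else 0)


sizes_all = temporal_split_sizes(2046)      # primary suite
sizes_native = temporal_split_sizes(1347)   # native [70,20] suite
adapter = conv1d_params(70, 540)            # 1x1 adapter, 70 -> 540
```

The slices are taken in time order (first `n_train` windows, then `n_val`, then the rest), so no test window precedes a training window.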
+
+PCK is torso-normalized with **torso = ‖l_shoulder(5) − l_hip(11)‖**. The math
+is upstream's `calculate_pck` (per-frame norm clamped at 0.01, mean over
+keypoints × frames), but upstream's `NECK_IDX/PELVIS_IDX = 2, 12` is a
+15-keypoint convention; on 17-kp COCO those indices are right_eye/right_hip, so
+the indices were replaced, not the math. MPJPE is in normalized image units
+(not meters).
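A minimal sketch of the metric as described, assuming the torso length is taken from the ground-truth keypoints and that a keypoint counts as correct when its error is at most `threshold × torso`. It is not upstream's `calculate_pck`; the function names are illustrative.

```python
import numpy as np

L_SHOULDER, L_HIP = 5, 11   # COCO-17 indices used for the torso length


def torso_pck(pred, gt, threshold=0.2, min_torso=0.01):
    """Fraction of keypoints within threshold * torso length of the target.

    pred, gt: arrays of shape (frames, 17, 2) in normalized image coords.
    """
    torso = np.linalg.norm(gt[:, L_SHOULDER] - gt[:, L_HIP], axis=-1)
    torso = np.maximum(torso, min_torso)             # clamp per frame
    dist = np.linalg.norm(pred - gt, axis=-1)        # (frames, 17)
    return float((dist / torso[:, None] <= threshold).mean())


def mpjpe(pred, gt):
    """Mean per-joint position error in the same units as the inputs."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


# Toy check: torso length 0.25, so PCK@20 accepts errors up to 0.05.
gt = np.zeros((1, 17, 2))
gt[0, L_HIP] = [0.0, 0.25]
pred = gt.copy()
pred[0, 0] += [0.04, 0.0]    # inside the radius
pred[0, 1] += [0.06, 0.0]    # outside the radius
score = torso_pck(pred, gt)
error = mpjpe(pred, gt)
```

In the toy frame 16 of 17 keypoints fall inside the radius. An absolute threshold of 0.2 image units would accept all 17, which is the kind of inflation torso normalization avoids.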
+
+### Results — primary suite, all 2,046 windows (test = last 307)
+
+| Run | PCK@10 | PCK@20 | PCK@30 | PCK@40 | PCK@50 | MPJPE | pred std | best ep |
+|---|---|---|---|---|---|---|---|---|
+| **mean-pose baseline** (honesty bar) | **73.1%** | **95.9%** | **98.7%** | 99.3% | 99.3% | **0.0148** | 0 (by constr.) | — |
+| (i) pretrained-init, full fine-tune | 26.0% | 65.0% | 88.0% | 96.4% | 98.9% | 0.0313 | 0.0113 | 58/60 |
+| (ii) scratch | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.2554 | 0.0002 | 4 (stop @13) |
+| (iii) frozen-trunk (adapter only) | 0.0% | 0.0% | 0.2% | 3.2% | 14.4% | 0.1260 | 0.0073 | 59/60 |
+
+Secondary suite (native [70,20] windows only, n=1,347, test=202) reproduces the
+same ordering: mean-baseline 96.0% / pretrained 67.1% / scratch 0.0% /
+frozen-trunk 0.0% PCK@20 (MPJPE 0.0153 / 0.0318 / 0.2236 / 0.1343) — the
+subcarrier-resampling choice does not change any conclusion.
+
+### Interpretation
+
+- **Did pretraining-transfer happen? Partially — as optimization transfer, not
+  feature transfer, and not past the honesty bar.**
+  - *Pretrained vs scratch*: dramatic (65.0% vs 0.0% PCK@20). The pretrained init
+    is the only configuration that trains at all under the pre-registered budget.
+  - *Frozen-trunk*: near-zero (0.0% PCK@20, 14.4% @50). WiFlow-STD's frozen
+    features do **not** transfer to our ESP32 domain through a linear subcarrier
+    adapter — the pretrained benefit is a well-conditioned initialization (incl.
+    calibrated BN/output scales), not reusable CSI→pose features.
+  - *Everything vs mean-pose baseline*: **no run beats it.** A constant
+    train-mean pose scores 95.9% torso-PCK@20 / 0.0148 MPJPE on this test split,
+    because a single subject in one camera frame barely moves in normalized
+    coordinates. The fine-tuned model is a real, non-constant model
+    (pred std 0.0113 > 0 — passes the constant-pose detector that retracted the
+    old 92.9% figure) but its deviations from the mean hurt: it fits train-period
+    temporal dynamics that do not generalize across the temporal split.
+- **Verdict for ADR-152 §2.2(b): fine-tuning WiFlow-STD on this dataset does not
+  demonstrate CSI→pose signal beyond the mean pose.** Until a model beats the
+  mean-pose baseline on a temporal split, no PCK number from this line may be
+  cited as pose-estimation capability.
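The honesty bar and the constant-pose check can be sketched as follows. The toy data is synthetic and only mimics a low-variance subject, so its numbers say nothing about the real dataset. The function names, and the exact definition of "pred std" as the mean per-coordinate standard deviation across frames, are assumptions.

```python
import numpy as np


def mean_pose_baseline(train_gt, n_test):
    """Predict the train-split mean pose for every test frame."""
    mean_pose = train_gt.mean(axis=0)                 # (17, 2)
    return np.broadcast_to(mean_pose, (n_test,) + mean_pose.shape).copy()


def pred_std(pred):
    """Spread of predictions across frames; ~0 means a constant pose."""
    return float(pred.std(axis=0).mean())


rng = np.random.default_rng(42)
base = rng.random((17, 2))
# Toy low-variance subject: small jitter around one pose.
train_gt = base + 0.01 * rng.standard_normal((200, 17, 2))
test_gt = base + 0.01 * rng.standard_normal((50, 17, 2))

baseline = mean_pose_baseline(train_gt, len(test_gt))
baseline_err = float(np.linalg.norm(baseline - test_gt, axis=-1).mean())
```

Any candidate model would be scored with the same metric on the same test frames. It clears the bar only if it beats `baseline` while keeping a clearly non-zero `pred_std`.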
+
+### Caveats (honest, pre-registered)
+
+- Single subject, single room, single session (30 min), single ESP32 node —
+  in-domain temporal split only; nothing here speaks to cross-room or
+  cross-subject generalization.
+- 2k windows vs the 360k-window WiFlow-STD corpus — **NOT comparable** to the
+  ~96% in-domain measurement-(a) number, and the published 97.25% even less so.
+- The scratch run's total collapse (it cannot even reach the mean pose; its
+  output BatchNorm/SiLU head must learn output scale from random init at lr 1e-4)
+  is an optimization outcome under the fixed budget, not proof the architecture
+  cannot learn from scratch — the pretrained-vs-scratch gap partially reflects
+  this conditioning advantage.
+- Mixed-subcarrier frames (finding 1) mean even the "clean" windows carry ~20%
+  zero-padded frames; collection-side frame-type filtering should precede the
+  next session.
+- Mean-baseline PCK is inflated by low pose variance relative to torso size
+  (~0.2–0.3 image units); PCK@10 (73.1%) shows the same ceiling effect at a
+  stricter threshold — the bar is the bar, but a livelier dataset would lower it.
+
+## Pending
+
+- (b) fine-tune on our ESP32 17-keypoint eval set — **MEASURED 2026-06-10/11**,
+  see above: no run beats the mean-pose baseline; pretraining transfers as
+  optimization aid only.
+- (c) our internal WiFlow on their dataset (15-keypoint subset mapping) — also
+  affected: there is currently no validated internal pose model to compare
+  (the 92.9% artifact is retracted; the MM-Fi SOTA models in ADR-150 §3 are a
+  different input domain).
@@ -0,0 +1,200 @@
+"""Shared infrastructure for the LOCAL wiflow-std benchmark scripts (ADR-152).
+
+This module is the single canonical implementation of the helpers that were
+previously copy-pasted across eval_repro.py / quantize_bench.py /
+onnx_bench.py / eval_ort_accuracy.py / export_to_safetensors.py:
+
+  - ``import_upstream()``  -- sys.path setup + the models-package stub that
+    works around the upstream import bug, plus the >1GB np.load mmap patch
+  - ``install_np_load_mmap_patch()`` -- the mmap patch on its own
+  - ``remap_legacy_keys()`` / ``load_remapped_state()`` -- checkpoint
+    key remap for the pre-rename released checkpoint
+  - ``load_wiflow_model()`` -- WiFlowPoseModel from a checkpoint, eval mode
+  - ``set_seed()`` -- mirrors upstream run.py seeding exactly
+  - ``evaluate()`` -- THE canonical batch-weighted PCK/MPJPE evaluation loop
+    (thresholds 0.1-0.5, upstream utils/metrics.py math); accepts either a
+    torch nn.Module or an onnxruntime InferenceSession
+
+The scripts under remote/ deploy to ruvultra as standalone single files and
+therefore intentionally inline private copies of these helpers; when editing
+them, treat this module as the reference implementation and keep the copies
+in sync.
+"""
+
+import os
+import random
+import sys
+import time
+import types
+
+import numpy as np
+import torch
+
+HERE = os.path.dirname(os.path.abspath(__file__))
+UPSTREAM = os.path.join(HERE, "upstream")
+RESULTS = os.path.join(HERE, "results")
+
+DEFAULT_THRESHOLDS = (0.1, 0.2, 0.3, 0.4, 0.5)
+
+# ---------------------------------------------------------------------------
+# >1GB np.load mmap patch
+# ---------------------------------------------------------------------------
+
+# csi_windows.npy is ~13 GB; mmap large arrays instead of loading into RAM
+# (loading it eagerly needs ~15 GB).
+_np_load = np.load
+
+
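+def _np_load_mmap(path, *a, **kw):
+    # Accept str and os.PathLike (e.g. pathlib.Path); file objects pass through.
+    if (isinstance(path, (str, os.PathLike))
+            and os.fsdecode(path).endswith(".npy")
+            and os.path.getsize(path) > 1 << 30 and "mmap_mode" not in kw):
+        kw["mmap_mode"] = "r"
+    return _np_load(path, *a, **kw)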
+
+
+def install_np_load_mmap_patch():
+    """Globally patch np.load so .npy files >1GB are mmap'd read-only.
+
+    Idempotent. Patching the numpy module attribute is equivalent to the
+    historical ``upstream_dataset.np.load = _np_load_mmap`` (dataset.np IS
+    the numpy module), but works regardless of import order.
+    """
+    np.load = _np_load_mmap
+
+
+# ---------------------------------------------------------------------------
+# upstream import shim
+# ---------------------------------------------------------------------------
+
+def import_upstream(mmap_patch=True):
+    """Make the upstream WiFlow-STD clone importable; returns its path.
+
+    Upstream bug: models/__init__.py imports TemporalConvNet, which
+    models/tcn.py does not define -- the package fails to import as
+    published. Register a stub package so the broken __init__ never
+    executes; submodules (models.pose_model etc.) still resolve via
+    __path__. Idempotent.
+    """
+    if UPSTREAM not in sys.path:
+        sys.path.insert(0, UPSTREAM)
+    if "models" not in sys.modules:
+        _models_pkg = types.ModuleType("models")
+        _models_pkg.__path__ = [os.path.join(UPSTREAM, "models")]
+        sys.modules["models"] = _models_pkg
+    if mmap_patch:
+        install_np_load_mmap_patch()
+    return UPSTREAM
+
+
+# ---------------------------------------------------------------------------
+# checkpoint loading
+# ---------------------------------------------------------------------------
+
+# The released checkpoint predates the published code: modules were renamed
+# att -> attention, final_conv -> decoder (param count identical, 2.23M).
+LEGACY_RENAMES = {"att.": "attention.", "final_conv.": "decoder."}
+
+
+def remap_legacy_keys(state):
+    """Remap pre-rename state_dict keys; no-op for already-new-style keys."""
+    def _remap(key):
+        # First matching legacy prefix wins; unmatched keys pass through.
+        for old, new in LEGACY_RENAMES.items():
+            if key.startswith(old):
+                return new + key[len(old):]
+        return key
+    return {_remap(k): v for k, v in state.items()}
+
+
+def load_remapped_state(path, map_location="cpu"):
+    """torch.load (weights_only) + legacy key remap."""
+    state = torch.load(path, map_location=map_location, weights_only=True)
+    return remap_legacy_keys(state)
+
+
+def load_wiflow_model(checkpoint, map_location="cpu", dropout=0.5):
+    """Full-size WiFlowPoseModel from a checkpoint, strict load, eval mode."""
+    import_upstream()
+    from models.pose_model import WiFlowPoseModel
+    model = WiFlowPoseModel(dropout=dropout)
+    model.load_state_dict(load_remapped_state(checkpoint, map_location),
+                          strict=True)
+    model.eval()
+    return model
+
+
+# ---------------------------------------------------------------------------
+# seeding
+# ---------------------------------------------------------------------------
+
+def set_seed(seed=42):
+    # mirror upstream run.py exactly
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+    if torch.cuda.is_available():
+        torch.cuda.manual_seed(seed)
+        torch.cuda.manual_seed_all(seed)
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+
+
+# ---------------------------------------------------------------------------
+# THE canonical evaluation loop
+# ---------------------------------------------------------------------------
+
+def evaluate(model, loader, device=None, dtype=None, label="",
+             thresholds=DEFAULT_THRESHOLDS, progress_every=50):
+    """Batch-weighted PCK/MPJPE over a DataLoader (upstream metrics math).
+
+    ``model`` may be a torch nn.Module (optionally evaluated on ``device``
+    with inputs cast to ``dtype``) or an onnxruntime InferenceSession.
+    Per-threshold PCK values are independent in upstream calculate_pck, so
+    evaluating a superset of thresholds never changes any individual value.
+
+    Returns {"samples", "mpjpe", "pck@10".."pck@50", "wall_seconds"}.
+    """
+    import_upstream()
+    from utils.metrics import calculate_mpjpe, calculate_pck
+
+    is_ort = hasattr(model, "get_inputs")  # onnxruntime InferenceSession
+    if is_ort:
+        inp = model.get_inputs()[0].name
+
+        def forward(bx):
+            return torch.from_numpy(model.run(None, {inp: bx.numpy()})[0])
+    else:
+        model.eval()
+
+        def forward(bx):
+            if device is not None:
+                bx = bx.to(device)
+            if dtype is not None:
+                bx = bx.to(dtype)
+            return model(bx).float()
+
+    thresholds = list(thresholds)
+    totals = {t: 0.0 for t in thresholds}
+    total_mpe, n = 0.0, 0
+    t0 = time.time()
+    with torch.no_grad():
+        for batch_idx, (bx, by) in enumerate(loader):
+            out = forward(bx)
+            if device is not None and not is_ort:
+                by = by.to(device)
+            mpe = calculate_mpjpe(out, by)
+            pck = calculate_pck(out, by, thresholds=thresholds)
+            bs = by.size(0)
+            total_mpe += mpe * bs
+            for t in totals:
+                totals[t] += pck[t] * bs
+            n += bs
+            if batch_idx % progress_every == 0:
+                tag = f"[{label}] " if label else ""
+                pck20 = totals.get(0.2)
+                pck20_str = f"pck20={pck20 / n:.4f} " if pck20 is not None else ""
+                print(f"  {tag}batch {batch_idx}: n={n} {pck20_str}"
+                      f"mpjpe={total_mpe / n:.4f} ({time.time() - t0:.0f}s)",
+                      flush=True)
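+    if n == 0:
+        # e.g. drop_last=True on a dataset smaller than one batch
+        raise ValueError("evaluate(): loader yielded no samples")
+    return {
+        "samples": n,
+        "mpjpe": total_mpe / n,
+        # round(), not int(): 0.29 * 100 == 28.999... would truncate to 28
+        **{f"pck@{round(t * 100)}": totals[t] / n for t in thresholds},
+        "wall_seconds": time.time() - t0,
+    }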
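
The batch weighting in `evaluate()` above matters whenever the last batch is smaller than the others (the `drop_last=False` case). The sketch below uses made-up per-batch values, not measured results, and `batch_weighted_mean` is a hypothetical helper name. It only illustrates why each batch metric is multiplied by its batch size before dividing by the total sample count.

```python
def batch_weighted_mean(batch_means, batch_sizes):
    """Sample-weighted mean of per-batch metrics (same accumulation as evaluate())."""
    total = sum(m * bs for m, bs in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Two full batches of 64 and a short final batch of 8 (illustrative numbers only).
batch_means = [0.90, 0.90, 0.50]
batch_sizes = [64, 64, 8]

weighted = batch_weighted_mean(batch_means, batch_sizes)
unweighted = sum(batch_means) / len(batch_means)
print(f"weighted={weighted:.4f} unweighted={unweighted:.4f}")
```

The unweighted mean lets the 8-sample batch count as much as a 64-sample one. The weighted form gives every sample equal influence, so the result does not depend on how the loader happens to batch the data.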
@@ -0,0 +1,67 @@
+"""ADR-152 edge optimization: accuracy of the ONNX fp32 and ORT-dynamic-int8
+models on the same corruption-free 10k test subset used by quantize_bench.py.
+
+The torch dynamic-int8 path quantizes nothing (no nn.Linear in the model), so
+the only real int8 datapoint for the paper's "~2.2 MB int8" claim is the
+onnxruntime dynamically quantized model -- this script measures what that
+quantization costs in PCK/MPJPE.
+
+Usage:
+  .venv/Scripts/python.exe eval_ort_accuracy.py \
+      --data-dir <preprocessed_csi_data> [--subset 10000]
+
+Writes/merges into results/edge_optimization.json under key "onnx_accuracy".
+"""
+
+import argparse
+import json
+import os
+import sys
+
+HERE = os.path.dirname(os.path.abspath(__file__))
+sys.path.insert(0, HERE)
+
+from _bench_common import RESULTS, evaluate  # noqa: E402
+from quantize_bench import build_test_subset  # noqa: E402  (sets up upstream imports)
+
+
+def evaluate_ort(sess, loader, label):
+    """ORT-session evaluation via the canonical _bench_common.evaluate loop."""
+    return evaluate(sess, loader, label=label)
+
+
+def main():
+    import onnxruntime as ort
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data-dir", default=os.path.join(
+        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
+        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
+    parser.add_argument("--subset", type=int, default=10000)
+    parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
+    args = parser.parse_args()
+
+    loader, _n_clean = build_test_subset(args.data_dir, args.subset)
+    results = {}
+    for label, fname in (("onnx_fp32", "retrained_fp32_dynamic.onnx"),
+                         ("onnx_int8_ort_dynamic", "retrained_int8_ort_dynamic.onnx")):
+        path = os.path.join(RESULTS, fname)
+        if not os.path.exists(path):
+            results[label] = {"error": f"{fname} not found; run onnx_bench.py first"}
+            continue
+        sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
+        print(f"=== accuracy: {label} ({fname}) ===")
+        results[label] = evaluate_ort(sess, loader, label)
+        print(json.dumps(results[label], indent=2))
+
+    merged = {}
+    if os.path.exists(args.out):
+        with open(args.out) as f:
+            merged = json.load(f)
+    merged["onnx_accuracy"] = results
+    with open(args.out, "w") as f:
+        json.dump(merged, f, indent=2)
+    print(f"wrote {args.out}")
+
+
+if __name__ == "__main__":
+    main()
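
The read-update-write pattern this script uses for `results/edge_optimization.json` lets several benchmark scripts share one results file without overwriting each other's keys. Here is a minimal sketch of that pattern against a temporary file. `merge_json_key` is a hypothetical helper name and the stored values are placeholders, not benchmark output.

```python
import json
import os
import tempfile


def merge_json_key(path, key, value):
    """Load an existing JSON object if present, replace one top-level key, write back."""
    merged = {}
    if os.path.exists(path):
        with open(path) as f:
            merged = json.load(f)
    merged[key] = value
    with open(path, "w") as f:
        json.dump(merged, f, indent=2)
    return merged


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "edge_optimization.json")
    # First writer (placeholder payload standing in for the "onnx" section).
    merge_json_key(out, "onnx", {"mode": "dynamic-batch"})
    # Second writer adds its own section and leaves the first one intact.
    final = merge_json_key(out, "onnx_accuracy", {"onnx_fp32": {"samples": 10000}})
    print(sorted(final))
```

The write is not atomic, so two scripts running concurrently could still lose an update. Running them one after another, as the usage notes imply, avoids that.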
@@ -0,0 +1,102 @@
+"""ADR-152 §2.2 measurement (a): reproduce WiFlow-STD (DY2434) published test metrics.
+
+Runs the released pretrained checkpoint (upstream/best_pose_model.pth) against the
+released Kaggle dataset (kaka2434/wiflow-dataset) using the upstream code path:
+identical dataset class, identical file-level 70/15/15 split at seed 42, identical
+PCK/MPJPE implementations (utils/metrics.py).
+
+Published claims (README, "Setting 1 random split"):
+  PCK@20 97.25% | PCK@30 98.63% | PCK@40 99.16% | PCK@50 99.48% | MPJPE 0.007 m
+
+Usage:
+  .venv/Scripts/python.exe eval_repro.py --data-dir <dir containing csi_windows.npy>
+"""
+
+import argparse
+import json
+import os
+import sys
+
+import torch
+from torch.utils.data import DataLoader
+
+from _bench_common import (UPSTREAM, evaluate, import_upstream,
+                           load_remapped_state, set_seed)
+
+import_upstream()  # sys.path + models stub + >1GB np.load mmap patch
+
+from dataset import PreprocessedCSIKeypointsDataset, create_preprocessed_train_val_test_loaders  # noqa: E402
+from models.pose_model import WiFlowPoseModel  # noqa: E402
+
+
+def find_data_dir(root):
+    for dirpath, _dirnames, filenames in os.walk(root):
+        if "csi_windows.npy" in filenames:
+            return dirpath
+    return None
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data-dir", required=True,
+                        help="Directory containing csi_windows.npy (searched recursively)")
+    parser.add_argument("--checkpoint", default=os.path.join(UPSTREAM, "best_pose_model.pth"))
+    parser.add_argument("--batch-size", type=int, default=64)
+    parser.add_argument("--out", default=os.path.join(os.path.dirname(os.path.abspath(__file__)),
+                                                      "results", "repro_a.json"))
+    args = parser.parse_args()
+
+    data_dir = args.data_dir
+    if not os.path.exists(os.path.join(data_dir, "csi_windows.npy")):
+        located = find_data_dir(data_dir)
+        if located is None:
+            sys.exit(f"csi_windows.npy not found under {data_dir}")
+        data_dir = located
+    print(f"data dir: {data_dir}")
+
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    print(f"device: {device}, torch {torch.__version__}")
+
+    set_seed(42)
+
+    dataset = PreprocessedCSIKeypointsDataset(
+        data_dir=data_dir, keypoint_scale=1000.0, enable_temporal_clean=True)
+
+    # split must match upstream: file-level shuffle at random_seed=42, 70/15/15
+    _train_loader, _val_loader, test_loader = create_preprocessed_train_val_test_loaders(
+        dataset=dataset, batch_size=args.batch_size, num_workers=0, random_seed=42)
+
+    model = WiFlowPoseModel(dropout=0.5).to(device)
+    # released checkpoint predates the published code: modules were renamed
+    # att -> attention, final_conv -> decoder (param count identical, 2.23M)
+    state = load_remapped_state(args.checkpoint, map_location=device)
+    model.load_state_dict(state, strict=True)
+    n_params = sum(p.numel() for p in model.parameters())
+    print(f"checkpoint: {args.checkpoint} ({n_params/1e6:.2f}M params)")
+
+    # upstream evaluates with drop_last=True; we report the full test set
+    # (drop_last=False) as well as the drop_last variant for exact comparability
+    results = {"published": {"pck@20": 0.9725, "pck@30": 0.9863, "pck@40": 0.9916,
+                             "pck@50": 0.9948, "mpjpe": 0.007},
+               "params_millions": n_params / 1e6,
+               "data_dir": data_dir,
+               "device": str(device)}
+
+    print("=== test set (full, drop_last=False) ===")
+    results["test_full"] = evaluate(model, test_loader, device=device)
+    print(json.dumps(results["test_full"], indent=2))
+
+    test_loader_dl = DataLoader(test_loader.dataset, batch_size=args.batch_size,
+                                shuffle=False, drop_last=True)
+    print("=== test set (drop_last=True, as upstream train.py) ===")
+    results["test_drop_last"] = evaluate(model, test_loader_dl, device=device)
+    print(json.dumps(results["test_drop_last"], indent=2))
+
+    os.makedirs(os.path.dirname(args.out), exist_ok=True)
+    with open(args.out, "w") as f:
+        json.dump(results, f, indent=2)
+    print(f"wrote {args.out}")
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,174 @@
+"""ADR-152 §2.2: export the retrained WiFlow-STD PyTorch checkpoint to
+safetensors with tch-rs (VarStore) variable names, plus a numerical-parity
+fixture for the Rust port.
+
+Outputs (all under results/, gitignored):
+  retrained_wiflow_std.safetensors  -- 248 f32 tensors named exactly as the
+                                       Rust WiFlowStdModel VarStore expects
+                                       (see wiflow_std/model.rs
+                                       `dump_variable_names` for the
+                                       authoritative name dump)
+  parity_fixture.npz                -- deterministic input (seed 42,
+                                       shape (2, 540, 20), uniform [0,1]) and
+                                       the Python model's eval-mode output
+  parity_fixture.json               -- same data as flattened f32 lists, for
+                                       the dependency-free Rust test
+                                       (tests/test_wiflow_std_parity.rs)
+
+PyTorch -> tch key mapping (derived from the VarStore dump, not guessed):
+
+  tcn.network.{i}.conv1_group.weight        -> tcn{i}.conv1_group.weight
+  tcn.network.{i}.bn*_{group,pw}.<leaf>     -> tcn{i}.bn*_{group,pw}.<leaf>
+  tcn.network.{i}.downsample.0.weight       -> tcn{i}.ds_conv.weight
+  tcn.network.{i}.downsample.1.<leaf>       -> tcn{i}.ds_bn.<leaf>
+  up.block.{0,1,4,5,8,9}.<leaf>             -> conv_in.{conv1,bn1,conv2,bn2,conv3,bn3}.<leaf>
+  up.downsample.{0,1}.<leaf>                -> conv_in.{ds_conv,ds_bn}.<leaf>
+  residual_blocks.{i}.block.{...}.<leaf>    -> conv{i}.{conv1..bn3}.<leaf>
+  residual_blocks.{i}.downsample.{0,1}      -> conv{i}.{ds_conv,ds_bn}
+  attention.{width,height}_axis.qkv_transform.weight
+                                            -> attention.{width,height}.qkv.weight
+  attention.{width,height}_axis.bn_*        -> attention.{width,height}.bn_*
+  decoder.{0,1,3,4}.<leaf>                  -> {dec_conv1,dec_bn1,dec_conv2,dec_bn2}.<leaf>
+  *.num_batches_tracked                     -> dropped (tch BatchNorm has no such buffer)
+
+Legacy upstream names (att. -> attention., final_conv. -> decoder.) are
+remapped first, exactly as eval_repro.py does for the released checkpoint.
+
+Usage:
+  .venv/Scripts/python.exe export_to_safetensors.py
+"""
+
+import json
+import os
+import re
+
+import numpy as np
+import torch
+from safetensors.torch import save_file
+
+from _bench_common import RESULTS, import_upstream, remap_legacy_keys
+
+import_upstream()  # sys.path + models stub (+ np.load mmap patch, on by default)
+
+from models.pose_model import WiFlowPoseModel  # noqa: E402
+
+CHECKPOINT = os.path.join(RESULTS, "retrained_best_pose_model.pth")
+
+# Sequential index -> tch sub-name inside one ConvBlock1/AsymmetricConvBlock:
+# [Conv2d(0), BN(1), SiLU(2), Dropout2d(3), Conv2d(4), BN(5), SiLU(6),
+#  Dropout2d(7), Conv2d(8), BN(9)]
+_BLOCK_IDX = {"0": "conv1", "1": "bn1", "4": "conv2", "5": "bn2",
+              "8": "conv3", "9": "bn3"}
+_DS_IDX = {"0": "ds_conv", "1": "ds_bn"}
+_DECODER_IDX = {"0": "dec_conv1", "1": "dec_bn1", "3": "dec_conv2",
+                "4": "dec_bn2"}
+
+
+def _conv_block(new_prefix: str, rest: str) -> str:
+    m = re.fullmatch(r"block\.(\d+)\.(.+)", rest)
+    if m:
+        return f"{new_prefix}.{_BLOCK_IDX[m.group(1)]}.{m.group(2)}"
+    m = re.fullmatch(r"downsample\.(\d+)\.(.+)", rest)
+    if m:
+        return f"{new_prefix}.{_DS_IDX[m.group(1)]}.{m.group(2)}"
+    raise KeyError(f"unmapped conv-block key: {new_prefix} / {rest}")
+
+
+def map_key(key: str) -> str:
+    """Map one PyTorch state_dict key to the tch VarStore name."""
+    m = re.fullmatch(r"tcn\.network\.(\d+)\.(.+)", key)
+    if m:
+        i, rest = m.groups()
+        rest = (rest.replace("downsample.0.", "ds_conv.")
+                    .replace("downsample.1.", "ds_bn."))
+        return f"tcn{i}.{rest}"
+
+    m = re.fullmatch(r"up\.(.+)", key)
+    if m:
+        return _conv_block("conv_in", m.group(1))
+
+    m = re.fullmatch(r"residual_blocks\.(\d+)\.(.+)", key)
+    if m:
+        return _conv_block(f"conv{m.group(1)}", m.group(2))
+
+    m = re.fullmatch(r"attention\.(width|height)_axis\.(.+)", key)
+    if m:
+        axis, rest = m.groups()
+        rest = rest.replace("qkv_transform.", "qkv.")
+        return f"attention.{axis}.{rest}"
+
+    m = re.fullmatch(r"decoder\.(\d+)\.(.+)", key)
+    if m:
+        return f"{_DECODER_IDX[m.group(1)]}.{m.group(2)}"
+
+    raise KeyError(f"unmapped checkpoint key: {key}")
+
+
+def main():
+    state = torch.load(CHECKPOINT, map_location="cpu", weights_only=True)
+    # Tolerate trainer wrappers like {"model_state_dict": ...}: unwrap when the
+    # first TCN weight is not a top-level key. (The legacy att./final_conv.
+    # renames never touch tcn.* keys, so the probe needs no remap.)
+    if isinstance(state, dict) and "tcn.network.0.conv1_group.weight" not in state:
+        for wrapper in ("model_state_dict", "state_dict", "model"):
+            if wrapper in state:
+                state = state[wrapper]
+                break
+
+    # Legacy upstream names predate the published code (_bench_common).
+    state = remap_legacy_keys(state)
+
+    mapped = {}
+    dropped = 0
+    for k, v in state.items():
+        if k.endswith("num_batches_tracked"):
+            dropped += 1
+            continue
+        tch_key = map_key(k)
+        if tch_key in mapped:
+            raise KeyError(f"duplicate mapped key: {k} -> {tch_key}")
+        mapped[tch_key] = v.detach().to(torch.float32).contiguous()
+
+    n_params = sum(v.numel() for k, v in mapped.items()
+                   if "running_" not in k)
+    print(f"checkpoint tensors: {len(state)} "
+          f"(dropped {dropped} num_batches_tracked)")
+    print(f"mapped tensors: {len(mapped)}, "
+          f"non-buffer params: {n_params/1e6:.6f}M")
+    assert len(mapped) == 248, f"expected 248 tch variables, got {len(mapped)}"
+    assert n_params == 2_225_042, f"param count mismatch: {n_params}"
+
+    st_path = os.path.join(RESULTS, "retrained_wiflow_std.safetensors")
+    save_file(mapped, st_path)
+    print(f"wrote {st_path}")
+
+    # ---- parity fixture --------------------------------------------------
+    model = WiFlowPoseModel(dropout=0.5)
+    model.load_state_dict(state, strict=True)
+    model.eval()
+
+    gen = torch.Generator().manual_seed(42)
+    x = torch.rand(2, 540, 20, generator=gen, dtype=torch.float32)
+    with torch.no_grad():
+        y = model(x)
+    print(f"fixture input {tuple(x.shape)} -> output {tuple(y.shape)}, "
+          f"output range [{y.min().item():.6f}, {y.max().item():.6f}]")
+
+    np.savez(os.path.join(RESULTS, "parity_fixture.npz"),
+             input=x.numpy(), output=y.numpy())
+    fixture = {
+        "seed": 42,
+        "input_shape": list(x.shape),
+        "input": x.flatten().tolist(),
+        "output_shape": list(y.shape),
+        "output": y.flatten().tolist(),
+    }
+    json_path = os.path.join(RESULTS, "parity_fixture.json")
+    with open(json_path, "w") as f:
+        json.dump(fixture, f)
+    print(f"wrote {os.path.join(RESULTS, 'parity_fixture.npz')}")
+    print(f"wrote {json_path}")
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,148 @@
+"""Regenerate results/nan_windows_mask.npy + results/big_windows_mask.npy by
+scanning a PRISTINE kagglehub download of the WiFlow-STD dataset
+(kaka2434/wiflow-dataset v1, csi_windows.npy, 360,000 windows of 540x20).
+
+============================ READ THIS FIRST ===============================
+This script MUST be run against an UNCLEANED copy of the dataset.
+
+remote/clean_v2.py (and its predecessor clean_nan.py) repair the dataset by
+zeroing the corrupted windows IN PLACE, with no backup. A cleaned copy
+contains no non-finite values and no out-of-range amplitudes, so on a cleaned
+copy this scan produces ALL-FALSE masks -- silently wrong ground truth. The
+script errors out loudly in that case (see the sanity check in main()).
+
+That irreversibility is exactly why the two committed mask files under
+results/ (gitignore-negated) are the canonical ground truth: once a download
+has been cleaned, the masks can NEVER be regenerated from it. Only run this
+on a fresh `kagglehub.dataset_download("kaka2434/wiflow-dataset")`.
+============================================================================
+
+Criteria (per window; mirrors the original 2026-06-10 scan and the
+remote/clean_v2.py repair criteria):
+
+  nan mask: any non-finite value (NaN/Inf) anywhere in the 540x20 window
+  big mask: max |finite value| > 1.5 (the data is otherwise [0,1]-normalized;
+            the corrupted files contain garbage up to 3.4e38, float32 max)
+
+Expected result on the pristine Kaggle download (RESULTS.md defect 5):
+  nan: 9,070 True | big: 9,072 True | union: 9,072 -- all marked windows lie in
+  dataset files 487-499 (the final 13 files), within window indices
+  350,922-359,999.
+
+Usage:
+  PYTHONUTF8=1 .venv/Scripts/python.exe generate_corruption_masks.py \
+      [--data-dir <dir containing csi_windows.npy>] [--out-dir results]
+"""
+
+import argparse
+import os
+import sys
+
+import numpy as np
+
+HERE = os.path.dirname(os.path.abspath(__file__))
+RESULTS = os.path.join(HERE, "results")
+
+EXPECTED = {"nan": 9070, "big": 9072, "union": 9072,
+            "files": (487, 499), "windows": (350922, 359999)}
+
+
+def scan(csi_path, chunk=4000):
+    """Chunked scan of the (mmap'd) windows array; returns (nan_mask, big_mask)."""
+    csi = np.load(csi_path, mmap_mode="r")
+    n = len(csi)
+    nan_mask = np.zeros(n, dtype=bool)
+    big_mask = np.zeros(n, dtype=bool)
+    for i in range(0, n, chunk):
+        block = np.asarray(csi[i:i + chunk])
+        finite = np.isfinite(block)
+        nan_mask[i:i + chunk] = (~finite).any(axis=(1, 2))
+        big_mask[i:i + chunk] = (
+            np.abs(np.where(finite, block, 0)).max(axis=(1, 2)) > 1.5)
+        if (i // chunk) % 10 == 0:
+            print(f"  scanned {min(i + chunk, n):,}/{n:,} windows "
+                  f"(nan={int(nan_mask.sum()):,} big={int(big_mask.sum()):,})",
+                  flush=True)
+    return nan_mask, big_mask
+
+
+def describe_files(data_dir, mask):
+    """Map marked windows to dataset file indices via window_info.npz."""
+    info = os.path.join(data_dir, "window_info.npz")
+    if not os.path.exists(info):
+        return None
+    w2f = np.load(info)["window_to_file"]
+    return np.unique(w2f[mask])
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Regenerate the corruption masks from a PRISTINE "
+                    "(uncleaned) kagglehub download. See module docstring.")
+    parser.add_argument("--data-dir", default=os.path.join(
+        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
+        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"),
+        help="Directory containing csi_windows.npy (PRISTINE copy)")
+    parser.add_argument("--out-dir", default=RESULTS,
+                        help="Where to write the two .npy masks")
+    parser.add_argument("--chunk", type=int, default=4000,
+                        help="Windows per scan chunk (memory/speed tradeoff)")
+    args = parser.parse_args()
+
+    csi_path = os.path.join(args.data_dir, "csi_windows.npy")
+    if not os.path.exists(csi_path):
+        sys.exit(f"csi_windows.npy not found in {args.data_dir}")
+
+    print(f"scanning {csi_path} (chunk={args.chunk}) ...")
+    nan_mask, big_mask = scan(csi_path, args.chunk)
+    union = nan_mask | big_mask
+    print(f"nan: {int(nan_mask.sum()):,} | big: {int(big_mask.sum()):,} | "
+          f"union: {int(union.sum()):,} of {len(union):,} windows")
+
+    # ---- sanity check: an all-False result means a CLEANED copy ------------
+    if not union.any():
+        sys.exit(
+            "ERROR: scan found ZERO corrupted windows.\n"
+            "\n"
+            "The pristine Kaggle download (kaka2434/wiflow-dataset v1) is "
+            "known to contain\n"
+            "9,072 corrupted windows (NaN/Inf + amplitudes up to 3.4e38) in "
+            "dataset files\n"
+            "487-499 (RESULTS.md, reproducibility defect 5). Finding none "
+            "means this copy\n"
+            "has almost certainly already been repaired by remote/clean_v2.py "
+            "(or clean_nan.py),\n"
+            "which zeroes the corrupted windows IN PLACE -- after that the "
+            "corruption evidence\n"
+            "is gone and the masks CANNOT be regenerated from this copy.\n"
+            "\n"
+            "Refusing to overwrite the committed ground-truth masks with "
+            "all-False ones.\n"
+            "Re-download the dataset (kagglehub.dataset_download("
+            "'kaka2434/wiflow-dataset'))\n"
+            "and point --data-dir at the fresh, uncleaned copy.")
+
+    files = describe_files(args.data_dir, union)
+    if files is not None:
+        print(f"marked windows span dataset files {files.min()}-{files.max()}: "
+              f"{files.tolist()}")
+        lo, hi = EXPECTED["files"]
+        if files.min() != lo or files.max() != hi:
+            print(f"WARNING: expected marked files exactly {lo}-{hi} "
+                  f"(the pristine v1 download); got {files.min()}-{files.max()}. "
+                  f"Different dataset version, or a partially cleaned copy?")
+    for name, mask, exp in (("nan", nan_mask, EXPECTED["nan"]),
+                            ("big", big_mask, EXPECTED["big"]),
+                            ("union", union, EXPECTED["union"])):
+        if int(mask.sum()) != exp:
+            print(f"WARNING: {name} mask has {int(mask.sum()):,} True windows; "
+                  f"the pristine v1 download yields {exp:,}.")
+
+    os.makedirs(args.out_dir, exist_ok=True)
+    for name, mask in (("nan_windows_mask.npy", nan_mask),
+                       ("big_windows_mask.npy", big_mask)):
+        out = os.path.join(args.out_dir, name)
+        np.save(out, mask)
+        print(f"wrote {out} ({int(mask.sum()):,} True)")
+
+
+if __name__ == "__main__":
+    main()
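
The two mask criteria in `scan()` above can be checked on a tiny synthetic array. The sketch reuses the same NumPy expressions on four hand-built windows. `window_masks` is a hypothetical name and the planted values are illustrative, not taken from the dataset.

```python
import numpy as np


def window_masks(block, big_threshold=1.5):
    """Per-window flags: any non-finite value, and max |finite value| > threshold."""
    finite = np.isfinite(block)
    nan_mask = (~finite).any(axis=(1, 2))
    # Non-finite entries are replaced by 0 so they cannot trip the amplitude test.
    big_mask = np.abs(np.where(finite, block, 0)).max(axis=(1, 2)) > big_threshold
    return nan_mask, big_mask


windows = np.full((4, 540, 20), 0.5, dtype=np.float32)  # window 0: clean
windows[1, 0, 0] = np.nan      # window 1: non-finite only
windows[2, 3, 4] = 3.0e38      # window 2: huge but finite
windows[3, 0, 0] = np.inf      # window 3: non-finite ...
windows[3, 1, 1] = 2.0         # ... and an out-of-range finite value

nan_mask, big_mask = window_masks(windows)
print(nan_mask.tolist(), big_mask.tolist())
```

Because the amplitude test looks only at finite values, a window can be "big" without being "nan". That is consistent with the expected counts in the docstring, where the big mask (9,072) has two more windows than the nan mask (9,070) and equals the union.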
@@ -0,0 +1,220 @@
+"""ADR-152 edge optimization: ONNX export + onnxruntime CPU benchmark for the
+retrained WiFlow-STD checkpoint.
+
+- Exports fp32 to ONNX. The axial attention reshapes with python ints taken
+  from tensor.size() (view(N*W, C, H)), so a traced graph bakes the batch
+  size; we first try a dynamic-batch export and verify it actually works at
+  batch sizes 1/2/64 -- if not, we fall back to fixed-batch exports.
+- Verifies output parity vs torch on the stored fixture
+  (results/parity_fixture.npz, batch 2, seed 42): max abs diff < 1e-4.
+- Measures onnxruntime CPU latency at batch 1 and 64 (median of N runs).
+- Supplementary: onnxruntime dynamic int8 quantization of the exported model
+  (weight size datapoint for the paper's "~2.2 MB int8" claim).
+
+Usage:
+  .venv/Scripts/python.exe onnx_bench.py
+
+Writes/merges into results/edge_optimization.json under key "onnx".
+"""
+
+import json
+import os
+import platform
+import statistics
+import time
+import traceback
+
+import numpy as np
+import torch
+
+from _bench_common import RESULTS, import_upstream, load_wiflow_model
+
+import_upstream()  # sys.path + models stub + >1GB np.load mmap patch
+
+CHECKPOINT = os.path.join(RESULTS, "retrained_best_pose_model.pth")
+OUT_JSON = os.path.join(RESULTS, "edge_optimization.json")
+
+
+def load_fp32_model():
+    return load_wiflow_model(CHECKPOINT)
+
+
+def try_export(model, path, batch, dynamic, opset=17):
+    """Returns (ok, exporter_used, error)."""
+    x = torch.rand(batch, 540, 20)
+    attempts = []
+    if dynamic:
+        attempts.append(("dynamo", dict(dynamo=True,
+                                        dynamic_shapes={"x": {0: "batch"}})))
+        attempts.append(("torchscript", dict(dynamo=False,
+                                             dynamic_axes={"input": {0: "batch"},
+                                                           "output": {0: "batch"}})))
+    else:
+        attempts.append(("torchscript", dict(dynamo=False)))
+        attempts.append(("dynamo", dict(dynamo=True)))
+    last_err = None
+    for name, kw in attempts:
+        try:
+            with torch.no_grad():
+                torch.onnx.export(model, (x,), path, opset_version=opset,
+                                  input_names=["input"], output_names=["output"],
+                                  **kw)
+            return True, name, None
+        except Exception as e:  # noqa: BLE001
+            last_err = f"{name}: {type(e).__name__}: {e}"
+            traceback.print_exc()
+    return False, None, last_err
+
+
+def ort_session(path):
+    import onnxruntime as ort
+    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
+
+
+def ort_run(sess, x):
+    inp = sess.get_inputs()[0].name
+    return sess.run(None, {inp: x})[0]
+
+
+def bench_ort(sess, batch, n_runs):
+    rng = np.random.default_rng(123)
+    x = rng.random((batch, 540, 20), dtype=np.float32)
+    for _ in range(max(5, n_runs // 10)):
+        ort_run(sess, x)
+    times = []
+    for _ in range(n_runs):
+        t0 = time.perf_counter()
+        ort_run(sess, x)
+        times.append(time.perf_counter() - t0)
+    med = statistics.median(times)
+    return {
+        "batch_size": batch,
+        "runs": n_runs,
+        "median_ms_per_batch": med * 1e3,
+        "median_ms_per_window": med * 1e3 / batch,
+        "windows_per_second": batch / med,
+    }
+
+
+def main():
+    import argparse
+    parser = argparse.ArgumentParser(
+        description="ONNX export + onnxruntime CPU benchmark for the "
+                    "retrained WiFlow-STD checkpoint (no options; see "
+                    "module docstring). NB: the published "
+                    "retrained_fp32_dynamic.onnx came from the TorchScript "
+                    "exporter; on newer torch the dynamo attempt may succeed "
+                    "first and produce a different (external-data) artifact.")
+    parser.parse_args()
+
+    import onnxruntime
+    model = load_fp32_model()
+    results = {
+        "env": {
+            "torch": torch.__version__,
+            "onnxruntime": onnxruntime.__version__,
+            "platform": platform.platform(),
+        },
+    }
+
+    fixture = np.load(os.path.join(RESULTS, "parity_fixture.npz"))
+    fx, fy = fixture["input"], fixture["output"]  # (2,540,20) -> (2,15,2)
+
+    # ---- export: dynamic batch first, fall back to fixed --------------------
+    dyn_path = os.path.join(RESULTS, "retrained_fp32_dynamic.onnx")
+    ok, exporter, err = try_export(model, dyn_path, batch=2, dynamic=True)
+    dynamic_works = False
+    if ok:
+        # verify the dynamic graph really runs at other batch sizes
+        try:
+            sess = ort_session(dyn_path)
+            for b in (1, 2, 64):
+                y = ort_run(sess, np.zeros((b, 540, 20), dtype=np.float32))
+                assert y.shape == (b, 15, 2), y.shape
+            dynamic_works = True
+        except Exception as e:  # noqa: BLE001
+            print(err := f"dynamic-batch model does not generalize: {e}")
+
+    sessions = {}
+    if dynamic_works:
+        results["export"] = {"mode": "dynamic-batch", "exporter": exporter,
+                             "file": os.path.basename(dyn_path),
+                             "size_mb": os.path.getsize(dyn_path) / 1e6}
+        sess = ort_session(dyn_path)
+        sessions = {1: sess, 2: sess, 64: sess}
+        print(f"dynamic-batch export OK via {exporter}")
+    else:
+        results["export"] = {"mode": "fixed-batch", "fallback_reason": err,
+                             "files": {}}
+        for b in (1, 2, 64):
+            p = os.path.join(RESULTS, f"retrained_fp32_b{b}.onnx")
+            ok, exporter, err = try_export(model, p, batch=b, dynamic=False)
+            if not ok:
+                results["export"]["files"][str(b)] = {"error": err}
+                print(f"EXPORT FAILED at batch {b}: {err}")
+                continue
+            results["export"]["files"][str(b)] = {
+                "exporter": exporter, "file": os.path.basename(p),
+                "size_mb": os.path.getsize(p) / 1e6}
+            sessions[b] = ort_session(p)
+            print(f"fixed-batch {b} export OK via {exporter}")
+
+    # ---- parity vs torch on the fixture -------------------------------------
+    if 2 in sessions:
+        y_ort = ort_run(sessions[2], fx)
+        with torch.no_grad():
+            y_torch = model(torch.from_numpy(fx)).numpy()
+        results["parity"] = {
+            "fixture": "results/parity_fixture.npz (batch 2, seed 42)",
+            "max_abs_diff_vs_stored_fixture": float(np.abs(y_ort - fy).max()),
+            "max_abs_diff_vs_torch_now": float(np.abs(y_ort - y_torch).max()),
+            "pass_lt_1e-4": bool(np.abs(y_ort - y_torch).max() < 1e-4),
+        }
+        print("parity:", json.dumps(results["parity"], indent=2))
+
+    # ---- latency -------------------------------------------------------------
+    results["latency"] = {}
+    if 1 in sessions:
+        results["latency"]["batch1"] = bench_ort(sessions[1], 1, 100)
+        print(f"ORT batch 1:  {results['latency']['batch1']['median_ms_per_window']:.2f} ms/window")
+    if 64 in sessions:
+        results["latency"]["batch64"] = bench_ort(sessions[64], 64, 30)
+        print(f"ORT batch 64: {results['latency']['batch64']['median_ms_per_window']:.3f} ms/window")
+
+    # ---- supplementary: ORT dynamic int8 (size datapoint for the 2.2MB claim)
+    src = (dyn_path if dynamic_works
+           else os.path.join(RESULTS, "retrained_fp32_b1.onnx"))
+    if os.path.exists(src):
+        try:
+            from onnxruntime.quantization import QuantType, quantize_dynamic
+            q_path = os.path.join(RESULTS, "retrained_int8_ort_dynamic.onnx")
+            quantize_dynamic(src, q_path, weight_type=QuantType.QInt8)
+            entry = {"file": os.path.basename(q_path),
+                     "size_mb": os.path.getsize(q_path) / 1e6}
+            try:
+                qs = ort_session(q_path)
+                yq = ort_run(qs, fx[:1] if not dynamic_works else fx)
+                ref = fy[:1] if not dynamic_works else fy
+                entry["runs"] = True
+                entry["max_abs_diff_vs_fp32_fixture"] = float(np.abs(yq - ref).max())
+            except Exception as e:  # noqa: BLE001
+                entry["runs"] = False
+                entry["run_error"] = f"{type(e).__name__}: {e}"
+            results["ort_int8_dynamic_supplementary"] = entry
+            print("ORT int8:", json.dumps(entry, indent=2))
+        except Exception as e:  # noqa: BLE001
+            results["ort_int8_dynamic_supplementary"] = {
+                "error": f"{type(e).__name__}: {e}"}
+
+    merged = {}
+    if os.path.exists(OUT_JSON):
+        with open(OUT_JSON) as f:
+            merged = json.load(f)
+    merged["onnx"] = results
+    with open(OUT_JSON, "w") as f:
+        json.dump(merged, f, indent=2)
+    print(f"wrote {OUT_JSON}")
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,228 @@
+"""ADR-152 "optimize beyond SOTA": edge-optimization benchmark for the
+retrained WiFlow-STD checkpoint (results/retrained_best_pose_model.pth,
+~96% PCK@20, fp32 params 2,225,042).
+
+Measures, for fp32 / fp16 / dynamic-int8 torch variants:
+  (a) serialized state_dict size on disk,
+  (b) CPU inference latency per window at batch 1 and batch 64
+      (median of repeated runs, this Windows box),
+  (c) accuracy (PCK@20/50 + MPJPE, upstream metrics) on a corruption-free
+      random subset of the seed-42 file-level 70/15/15 test split
+      (same split as eval_repro.py; windows from corrupted files 487-499
+      excluded via results/nan_windows_mask.npy | results/big_windows_mask.npy).
+
+Also verifies the paper's "~2.2 MB int8" size claim: reports which layer
+types torch dynamic quantization actually converts (the model contains NO
+nn.Linear -- it is Conv1d/Conv2d/BatchNorm only) and the real on-disk size.
+
+Usage:
+  .venv/Scripts/python.exe quantize_bench.py \
+      --data-dir C:/Users/ruv/.cache/kagglehub/datasets/kaka2434/wiflow-dataset/versions/1/preprocessed_csi_data \
+      [--subset 10000] [--skip-accuracy]
+
+Writes/merges into results/edge_optimization.json under key "torch".
+"""
+
+import argparse
+import json
+import os
+import platform
+import statistics
+import time
+
+import numpy as np
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader
+
+from _bench_common import HERE, RESULTS, evaluate, import_upstream, load_wiflow_model
+
+import_upstream()  # sys.path + models stub + >1GB np.load mmap patch
+
+from dataset import (  # noqa: E402
+    PreprocessedCSIKeypointsDataset,
+    create_preprocessed_train_val_test_loaders,
+)
+
+CHECKPOINT = os.path.join(RESULTS, "retrained_best_pose_model.pth")
+
+
+def load_fp32_model():
+    # legacy upstream key remap inside is a harmless no-op on this checkpoint
+    return load_wiflow_model(CHECKPOINT)
+
+
+def state_dict_size_bytes(model, path):
+    torch.save(model.state_dict(), path)
+    return os.path.getsize(path)
+
+
+def bench_latency(model, batch_size, n_runs, dtype=torch.float32):
+    gen = torch.Generator().manual_seed(123)
+    x = torch.rand(batch_size, 540, 20, generator=gen).to(dtype)
+    with torch.no_grad():
+        for _ in range(max(5, n_runs // 10)):  # warmup
+            model(x)
+        times = []
+        for _ in range(n_runs):
+            t0 = time.perf_counter()
+            model(x)
+            times.append(time.perf_counter() - t0)
+    med = statistics.median(times)
+    return {
+        "batch_size": batch_size,
+        "runs": n_runs,
+        "median_ms_per_batch": med * 1e3,
+        "median_ms_per_window": med * 1e3 / batch_size,
+        "windows_per_second": batch_size / med,
+    }
+
+
+def build_test_subset(data_dir, subset_size, batch_size=64):
+    """Seed-42 file-level 70/15/15 test split (exactly as eval_repro.py),
+    minus corrupted windows, then a seed-42 random subset."""
+    dataset = PreprocessedCSIKeypointsDataset(
+        data_dir=data_dir, keypoint_scale=1000.0, enable_temporal_clean=True)
+    _tr, _va, test_loader = create_preprocessed_train_val_test_loaders(
+        dataset=dataset, batch_size=batch_size, num_workers=0, random_seed=42)
+    test_indices = np.asarray(test_loader.dataset.indices)
+
+    corrupted = (np.load(os.path.join(RESULTS, "nan_windows_mask.npy"))
+                 | np.load(os.path.join(RESULTS, "big_windows_mask.npy")))
+    clean = test_indices[~corrupted[test_indices]]
+    print(f"test split: {len(test_indices)} windows, "
+          f"{len(test_indices) - len(clean)} corrupted excluded, "
+          f"{len(clean)} clean")
+
+    if subset_size and subset_size < len(clean):
+        rng = np.random.default_rng(42)
+        clean = np.sort(rng.choice(clean, size=subset_size, replace=False))
+    subset = torch.utils.data.Subset(dataset, clean.tolist())
+    loader = DataLoader(subset, batch_size=batch_size, shuffle=False,
+                        num_workers=0)
+    return loader, int((~corrupted[test_indices]).sum())  # clean total, pre-subset
+
+
+def quantize_int8_dynamic(fp32_model):
+    """torch.ao.quantization.quantize_dynamic on Linear/Conv where supported.
+    Returns (model, report) where report documents what actually quantized."""
+    qmodel = torch.ao.quantization.quantize_dynamic(
+        fp32_model, {nn.Linear, nn.Conv1d, nn.Conv2d}, dtype=torch.qint8)
+
+    quantized, total_params, quant_params = [], 0, 0
+    for name, mod in qmodel.named_modules():
+        cls = type(mod).__module__ + "." + type(mod).__name__
+        if "quantized" in cls:
+            w = mod.weight() if callable(getattr(mod, "weight", None)) else None
+            numel = w.numel() if w is not None else 0
+            quant_params += numel
+            quantized.append({"module": name, "class": cls, "params": numel})
+    for p in fp32_model.parameters():
+        total_params += p.numel()
+
+    n_linear = sum(isinstance(m, nn.Linear) for m in fp32_model.modules())
+    n_conv1d = sum(isinstance(m, nn.Conv1d) for m in fp32_model.modules())
+    n_conv2d = sum(isinstance(m, nn.Conv2d) for m in fp32_model.modules())
+    report = {
+        "eligible_module_counts": {
+            "nn.Linear": n_linear, "nn.Conv1d": n_conv1d, "nn.Conv2d": n_conv2d},
+        "modules_actually_quantized": quantized,
+        "n_modules_quantized": len(quantized),
+        "params_total": total_params,
+        "params_quantized": quant_params,
+        "params_quantized_fraction": quant_params / total_params,
+    }
+    return qmodel, report
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data-dir", default=os.path.join(
+        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
+        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
+    parser.add_argument("--subset", type=int, default=10000)
+    parser.add_argument("--runs-b1", type=int, default=100)
+    parser.add_argument("--runs-b64", type=int, default=30)
+    parser.add_argument("--skip-accuracy", action="store_true")
+    parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
+    args = parser.parse_args()
+
+    torch.manual_seed(42)
+    results = {
+        "env": {
+            "torch": torch.__version__,
+            "platform": platform.platform(),
+            "processor": platform.processor(),
+            "num_threads": torch.get_num_threads(),
+            "checkpoint": os.path.relpath(CHECKPOINT, HERE),
+        },
+        "variants": {},
+    }
+
+    # ---- build variants ---------------------------------------------------
+    fp32 = load_fp32_model()
+    n_params = sum(p.numel() for p in fp32.parameters())
+    results["env"]["params"] = n_params
+    print(f"fp32 model: {n_params:,} params")
+
+    fp16 = load_fp32_model().half()
+
+    int8, q_report = quantize_int8_dynamic(load_fp32_model())
+    results["int8_dynamic_quant_report"] = q_report
+    print(f"int8 dynamic: {q_report['n_modules_quantized']} modules quantized, "
+          f"{q_report['params_quantized_fraction']*100:.1f}% of params")
+
+    variants = {
+        "fp32": (fp32, torch.float32, "retrained_fp32_resaved.pth"),
+        "fp16": (fp16, torch.float16, "retrained_fp16.pth"),
+        "int8_dynamic": (int8, torch.float32, "retrained_int8_dynamic.pth"),
+    }
+
+    # ---- (a) size + (b) latency -------------------------------------------
+    for name, (model, dtype, fname) in variants.items():
+        path = os.path.join(RESULTS, fname)
+        size = state_dict_size_bytes(model, path)
+        print(f"\n=== {name}: {size/1e6:.3f} MB on disk ({fname}) ===")
+        lat1 = bench_latency(model, 1, args.runs_b1, dtype)
+        lat64 = bench_latency(model, 64, args.runs_b64, dtype)
+        print(f"  batch 1:  {lat1['median_ms_per_window']:.2f} ms/window "
+              f"({lat1['windows_per_second']:.0f}/s)")
+        print(f"  batch 64: {lat64['median_ms_per_window']:.3f} ms/window "
+              f"({lat64['windows_per_second']:.0f}/s)")
+        results["variants"][name] = {
+            "file": fname,
+            "size_bytes": size,
+            "size_mb": size / 1e6,
+            "latency_batch1": lat1,
+            "latency_batch64": lat64,
+        }
+
+    # ---- (c) accuracy ------------------------------------------------------
+    if not args.skip_accuracy:
+        loader, n_clean = build_test_subset(args.data_dir, args.subset)
+        results["accuracy_subset"] = {
+            "description": "seed-42 file-level 70/15/15 test split, corrupted "
+                           "windows (files 487-499) excluded, seed-42 random "
+                           "subset",
+            "subset_size": min(args.subset, n_clean) if args.subset else n_clean,
+            "clean_test_total": n_clean,
+        }
+        for name, (model, dtype, _f) in variants.items():
+            print(f"\n=== accuracy: {name} ===")
+            results["variants"][name]["accuracy"] = evaluate(
+                model, loader, dtype=dtype, label=name)
+            print(json.dumps(results["variants"][name]["accuracy"], indent=2))
+
+    # ---- merge into edge_optimization.json ---------------------------------
+    merged = {}
+    if os.path.exists(args.out):
+        with open(args.out) as f:
+            merged = json.load(f)
+    merged["torch"] = results
+    with open(args.out, "w") as f:
+        json.dump(merged, f, indent=2)
+    print(f"\nwrote {args.out}")
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,14 @@
+import numpy as np, os
+d = os.path.expanduser('~/wiflow-std-bench/preprocessed_csi_data')
+csi = np.load(os.path.join(d, 'csi_windows.npy'), mmap_mode='r+')
+zeroed = 0
+chunk = 4000
+for i in range(0, len(csi), chunk):
+    block = csi[i:i+chunk]
+    finite = np.isfinite(block)
+    bad = (~finite).any(axis=(1, 2)) | (np.abs(np.where(finite, block, 0)).max(axis=(1, 2)) > 1.5)
+    if bad.any():
+        block[bad] = 0.0
+        zeroed += int(bad.sum())
+csi.flush()
+print(f'zeroed {zeroed} corrupted windows entirely')
@@ -0,0 +1,112 @@
+"""Evaluate the retrained WiFlow-STD checkpoint (ADR-152 §2.2a fallback).
+
+Scores the model produced by run.py (train_output/best_pose_model.pth or similar)
+on the seed-42 test split: full test set AND NaN-free subset (excluding windows
+that were zero-filled by clean_nan.py — file indices 487-499).
+
+NOTE: deployed to ruvultra (~/wiflow-std-bench) as a standalone single file,
+so it deliberately inlines its helpers. The reference implementations (upstream
+import shim, >1GB np.load mmap patch, key-remap loader, canonical evaluate
+loop) live in benchmarks/wiflow-std/_bench_common.py — keep copies in sync.
+"""
+import json, os, random, sys
+
+import numpy as np
+import torch
+from torch.utils.data import DataLoader, Subset
+
+# csi_windows.npy is ~13 GB; mmap large arrays instead of eagerly loading
+# ~15 GB into RAM (same patch as _bench_common._np_load_mmap).
+_np_load = np.load
+
+
+def _np_load_mmap(path, *a, **kw):
+    if (isinstance(path, str) and path.endswith('.npy')
+            and os.path.getsize(path) > 1 << 30 and 'mmap_mode' not in kw):
+        kw['mmap_mode'] = 'r'
+    return _np_load(path, *a, **kw)
+
+
+np.load = _np_load_mmap
+
+sys.path.insert(0, os.path.expanduser('~/wiflow-std-bench/upstream'))
+from dataset import PreprocessedCSIKeypointsDataset, create_preprocessed_train_val_test_loaders
+from models.pose_model import WiFlowPoseModel
+from utils.metrics import calculate_pck, calculate_mpjpe
+
+
+def find_checkpoint():
+    cands = []
+    for root, _, files in os.walk(os.path.expanduser('~/wiflow-std-bench/train_output')):
+        for f in files:
+            if f.endswith('.pth'):
+                cands.append(os.path.join(root, f))
+    # also upstream/test default output dir
+    for root, _, files in os.walk(os.path.expanduser('~/wiflow-std-bench/upstream')):
+        for f in files:
+            if f.endswith('.pth') and 'best' in f and 'cross_dataset' not in root:
+                p = os.path.join(root, f)
+                if os.path.getmtime(p) > os.path.getmtime(os.path.expanduser('~/wiflow-std-bench/train.log')) - 86400 * 2:
+                    cands.append(p)
+    cands = [c for c in cands if not c.endswith('upstream/best_pose_model.pth')]
+    if not cands:
+        sys.exit('no retrained checkpoint found')
+    return max(cands, key=os.path.getmtime)
+
+
+def evaluate(model, loader, device):
+    model.eval()
+    totals = {t: 0.0 for t in (0.1, 0.2, 0.3, 0.4, 0.5)}
+    total_mpe, n = 0.0, 0
+    with torch.no_grad():
+        for bx, by in loader:
+            bx, by = bx.to(device), by.to(device)
+            out = model(bx)
+            bs = by.size(0)
+            total_mpe += calculate_mpjpe(out, by) * bs
+            pck = calculate_pck(out, by, thresholds=list(totals))
+            for t in totals:
+                totals[t] += pck[t] * bs
+            n += bs
+    return {'samples': n, 'mpjpe': total_mpe / n,
+            **{f'pck@{int(t*100)}': totals[t] / n for t in totals}}
+
+
+random.seed(42); np.random.seed(42); torch.manual_seed(42)
+torch.cuda.manual_seed_all(42)
+torch.backends.cudnn.deterministic = True
+
+d = os.path.expanduser('~/wiflow-std-bench/preprocessed_csi_data')
+dataset = PreprocessedCSIKeypointsDataset(data_dir=d, keypoint_scale=1000.0,
+                                          enable_temporal_clean=True)
+_, _, test_loader = create_preprocessed_train_val_test_loaders(
+    dataset=dataset, batch_size=256, num_workers=2, random_seed=42)
+
+device = torch.device('cuda')
+ckpt = find_checkpoint()
+print('checkpoint:', ckpt)
+model = WiFlowPoseModel(dropout=0.5).to(device)
+state = torch.load(ckpt, map_location=device, weights_only=True)
+renames = {'att.': 'attention.', 'final_conv.': 'decoder.'}
+state = {next((new + k[len(old):] for old, new in renames.items()
+               if k.startswith(old)), k): v for k, v in state.items()}
+model.load_state_dict(state, strict=True)
+
+results = {'checkpoint': ckpt}
+print('=== full test set ===')
+results['test_full'] = evaluate(model, test_loader, device)
+print(json.dumps(results['test_full'], indent=2))
+
+# NaN-free subset: exclude windows from corrupted files 487-499
+test_subset = test_loader.dataset            # Subset(dataset, test_indices)
+w2f = dataset.window_to_file
+clean_idx = [i for i in test_subset.indices if w2f[i] < 487]
+print(f'=== NaN-free test subset ({len(clean_idx)} of {len(test_subset.indices)}) ===')
+clean_loader = DataLoader(Subset(dataset, clean_idx), batch_size=256, shuffle=False)
+results['test_clean'] = evaluate(model, clean_loader, device)
+print(json.dumps(results['test_clean'], indent=2))
+
+out = os.path.expanduser('~/wiflow-std-bench/eval_retrained.json')
+with open(out, 'w') as f:
+    json.dump(results, f, indent=2)
+print('wrote', out)
@@ -0,0 +1,374 @@
+"""ADR-152 SS2.2 measurement (b): WiFlow-STD fine-tuned on our fresh ESP32 paired dataset.
+
+Dataset: ~/wiflow-std-bench/paired-20260610.jsonl -- 2,046 paired windows collected
+2026-06-10 22:10-22:40 (ONE subject, ONE room, ONE ESP32 node, varied poses).
+Per record: csi = flat float32 list, csi_shape, kp = 17 COCO [x, y] normalized [0,1]
+camera coords, conf (MediaPipe mean confidence, all > 0.5 in this set), ts_start/ts_end.
+Aligner: scripts/align-ground-truth.js, non-overlapping 20-frame windows (~0.42 s each).
+
+Dataset findings (MEASURED on this file, 2026-06-10):
+  - csi_shape is HETEROGENEOUS, not uniformly [70, 20]: 1,347x [70,20], 284x [134,20],
+    243x [26,20], 130x [12,20], 42x [20,20]. The ESP32 stream emits mixed frame types
+    and the aligner stamps each window's subcarrier count from frame[0]
+    (extractCsiMatrix: nSc = window[0].subcarriers), zero-padding/truncating the rest.
+    Even native-70 windows contain ~20.4% internally zero-padded short frames
+    (subcarriers 40..69 all-zero for those frames).
+  - LAYOUT BUG: the aligner fills matrix[f * nSc + s] (frame-major) but declares
+    shape [nSc, nFrames]. The true layout is (frame, subcarrier); we reshape
+    (nFrames, nSc) and transpose. Confirmed by coherent per-frame zero-tails.
+  - Handling here (primary suite, "all2046"): every frame's subcarrier axis is
+    linearly resampled to 70 bins (np.interp over a normalized index domain;
+    identity for native-70 frames) so the pre-registered n=2,046 and split sizes
+    hold. Secondary suite ("native70") restricts to the 1,347 native [70,20]
+    windows (temporal 70/15/15 of those) as a homogeneity robustness check.
+
+Pre-registered protocol (followed exactly):
+  1. TEMPORAL split (records are time-sorted; asserted): first 70% train (1,432),
+     next 15% val (307), last 15% test (307). No shuffling across time. Seed 42
+     for everything else.
+  2. Model: upstream WiFlow-STD trunk (WiFlowPoseModel) with a learned 1x1 Conv1d
+     projection 70->540 prepended, and K=17 via the parameter-free adaptive pool
+     (AdaptiveAvgPool2d((17, 1)) instead of (15, 1)) -- pretrained weights load
+     for any K. CSI normalization: divide by the TRAIN-split 99th-percentile
+     amplitude, clip to [0, 1] (documented in output JSON).
+  3. Three runs, <=60 epochs, early-stop patience 8 on val MPJPE, batch 32,
+     AdamW, fp32 (no autocast):
+       (i)   pretrained-init: trunk init from upstream/test/best_pose_model.pth
+             (the measurement-(a) retrained checkpoint, ~96% PCK@20 on WiFlow data;
+             key remap att.->attention. / final_conv.->decoder. applied defensively
+             as in eval_repro.py -- a no-op for this checkpoint, which already uses
+             the new names). Discriminative lr: adapter 1e-4, trunk 1e-5.
+       (ii)  scratch: same architecture, random init, all params lr 1e-4.
+       (iii) frozen-trunk: pretrained trunk frozen (requires_grad=False AND held in
+             .eval() so BatchNorm running stats cannot drift -- pure transfer probe);
+             only the 70->540 adapter trains, lr 1e-4.
+  4. Metrics on the temporal TEST split: torso-normalized PCK@10/20/30/40/50 and
+     MPJPE. Upstream utils/metrics.py calculate_pck(use_torso_norm=True) hardcodes
+     NECK_IDX/PELVIS_IDX = 2, 12 -- a 15-keypoint convention that is WRONG for our
+     17 COCO keypoints (2 = right_eye, 12 = right_hip). We therefore reimplement the
+     identical math (per-frame norm distance, clamp min 0.01, mean over all
+     keypoints x frames) with torso = ||l_shoulder(5) - l_hip(11)||.
+     Also reported: prediction std across test frames (constant-pose detector;
+     must be > 0) and the mean-pose-predictor baseline (train-split mean pose
+     evaluated on test -- the honesty bar).
+
+Usage (on ruvultra):
+  nice -n 10 nohup ~/wiflow-std-bench/venv/bin/python train_measb.py > train_measb.log 2>&1 &
+
+NOTE: deployed to ruvultra as a standalone single file, so it deliberately
+inlines its helpers. The reference implementations (upstream import shim,
+np.load mmap patch, key-remap loader, canonical evaluate loop) live in
+benchmarks/wiflow-std/_bench_common.py — keep copies in sync.
+"""
+
+import json
+import os
+import random
+import sys
+import time
+
+import numpy as np
+import torch
+import torch.nn as nn
+
+BENCH = os.path.expanduser("~/wiflow-std-bench")
+UPSTREAM = os.path.join(BENCH, "upstream")
+MEASB = os.path.join(BENCH, "measb")
+DATA = os.path.join(BENCH, "paired-20260610.jsonl")
+CHECKPOINT = os.path.join(UPSTREAM, "test", "best_pose_model.pth")
+
+sys.path.insert(0, UPSTREAM)
+
+# Upstream defect (1): models/__init__.py imports a name tcn.py does not define.
+# Register a stub package so the broken __init__ never executes (as eval_repro.py).
+import types  # noqa: E402
+
+_models_pkg = types.ModuleType("models")
+_models_pkg.__path__ = [os.path.join(UPSTREAM, "models")]
+sys.modules["models"] = _models_pkg
+
+from models.pose_model import WiFlowPoseModel  # noqa: E402
+
+SEED = 42
+K = 17
+N_SUBC = 70
+TRUNK_IN = 540
+BATCH = 32          # <= 64 per protocol (GPU shared with the efficiency sweep)
+MAX_EPOCHS = 60
+PATIENCE = 8
+LR_ADAPTER = 1e-4
+LR_TRUNK_FT = 1e-5  # 10x lower for the pretrained trunk vs the fresh adapter
+L_SHOULDER, L_HIP = 5, 11
+THRESHOLDS = (0.1, 0.2, 0.3, 0.4, 0.5)
+
+
+def set_seed(seed=SEED):
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+    if torch.cuda.is_available():
+        torch.cuda.manual_seed_all(seed)
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+
+
+def resample_subcarriers(frame_major, n_out=N_SUBC):
+    """(nFrames, nSc) -> (nFrames, n_out) by per-frame linear interpolation.
+
+    Identity for nSc == n_out. Normalized index domain [0, 1] on both sides.
+    """
+    nf, nsc = frame_major.shape
+    if nsc == n_out:
+        return frame_major
+    xi = np.linspace(0.0, 1.0, nsc)
+    xo = np.linspace(0.0, 1.0, n_out)
+    return np.stack([np.interp(xo, xi, frame_major[f]) for f in range(nf)]).astype(np.float32)
+
+
+def load_dataset():
+    csi, kps, confs, ts, native70 = [], [], [], [], []
+    shape_counts = {}
+    with open(DATA) as f:
+        for line in f:
+            r = json.loads(line)
+            nsc, nf = r["csi_shape"]
+            shape_counts[f"{nsc}x{nf}"] = shape_counts.get(f"{nsc}x{nf}", 0) + 1
+            assert nf == 20, r["csi_shape"]
+            # Aligner layout bug: data is frame-major despite the declared
+            # [nSc, nFrames] shape -- reshape (nFrames, nSc), then resample the
+            # subcarrier axis to 70 and transpose to (70 subcarriers, 20 frames).
+            fm = np.asarray(r["csi"], dtype=np.float32).reshape(nf, nsc)
+            csi.append(resample_subcarriers(fm).T)
+            kp = np.asarray(r["kp"], dtype=np.float32)
+            assert kp.shape == (K, 2), kp.shape
+            kps.append(kp)
+            confs.append(r["conf"])
+            ts.append(r["ts_start"])
+            native70.append(nsc == N_SUBC)
+    assert all(ts[i] <= ts[i + 1] for i in range(len(ts) - 1)), "records not time-sorted"
+    return (np.stack(csi), np.stack(kps), np.asarray(confs, dtype=np.float32),
+            np.asarray(native70), shape_counts, ts[0], ts[-1])
+
+
+def temporal_split(n):
+    n_train = int(round(n * 0.70))
+    n_val = int(round(n * 0.15))
+    return slice(0, n_train), slice(n_train, n_train + n_val), slice(n_train + n_val, n)
+
+
+class AdaptedWiFlow(nn.Module):
+    """1x1 Conv1d adapter 70->540 + upstream WiFlow-STD trunk with K=17 pool head."""
+
+    def __init__(self, k=K, dropout=0.5):
+        super().__init__()
+        self.adapter = nn.Conv1d(N_SUBC, TRUNK_IN, kernel_size=1)
+        nn.init.kaiming_normal_(self.adapter.weight, mode="fan_out", nonlinearity="relu")
+        nn.init.constant_(self.adapter.bias, 0)
+        self.trunk = WiFlowPoseModel(dropout=dropout)
+        # K=17 via the parameter-free adaptive pool: decoder emits [B, 2, 15, 20]
+        # spatial maps; pooling H->17 instead of 15 yields [B, 17, 2] with no new
+        # parameters, so the pretrained state_dict loads strict=True for any K.
+        self.trunk.avg_pool = nn.AdaptiveAvgPool2d((k, 1))
+
+    def forward(self, x):
+        return self.trunk(self.adapter(x))
+
+
+def load_pretrained_trunk(trunk, path):
+    state = torch.load(path, map_location="cpu", weights_only=True)
+    # Defensive remap as in eval_repro.py (no-op for the retrained checkpoint).
+    renames = {"att.": "attention.", "final_conv.": "decoder."}
+    state = {next((new + k[len(old):] for old, new in renames.items()
+                   if k.startswith(old)), k): v
+             for k, v in state.items()}
+    trunk.load_state_dict(state, strict=True)
+
+
+def pck_torso(pred, target, thresholds=THRESHOLDS):
+    """Upstream calculate_pck math, torso = l_shoulder(5)<->l_hip(11) for 17-kp COCO."""
+    norm = torch.sqrt(((target[:, L_SHOULDER] - target[:, L_HIP]) ** 2).sum(dim=1))
+    norm = torch.clamp(norm, min=0.01)
+    dist = torch.sqrt(((pred - target) ** 2).sum(dim=2)) / norm.unsqueeze(1)
+    return {f"pck@{int(t * 100)}": (dist <= t).float().mean().item() for t in thresholds}
+
+
+def mpjpe(pred, target):
+    return torch.sqrt(((pred - target) ** 2).sum(dim=2)).mean().item()
+
+
+@torch.no_grad()
+def predict(model, x, batch=256):
+    model.eval()
+    return torch.cat([model(x[i:i + batch]) for i in range(0, len(x), batch)])
+
+
+def eval_preds(pred, target):
+    out = pck_torso(pred, target)
+    out["mpjpe"] = mpjpe(pred, target)
+    # Constant-pose detector: std across test frames per coordinate, mean over
+    # the 17x2 coordinates. 0.0 == degenerate constant predictor.
+    out["pred_std"] = pred.std(dim=0).mean().item()
+    return out
+
+
+def train_run(name, x_tr, y_tr, x_va, y_va, device, pretrained, freeze_trunk,
+              lr_trunk):
+    set_seed(SEED)
+    model = AdaptedWiFlow().to(device)
+    if pretrained:
+        load_pretrained_trunk(model.trunk, CHECKPOINT)
+    if freeze_trunk:
+        for p in model.trunk.parameters():
+            p.requires_grad = False
+        groups = [{"params": model.adapter.parameters(), "lr": LR_ADAPTER}]
+    else:
+        groups = [{"params": model.adapter.parameters(), "lr": LR_ADAPTER},
+                  {"params": model.trunk.parameters(), "lr": lr_trunk}]
+    opt = torch.optim.AdamW(groups)
+    loss_fn = nn.MSELoss()
+
+    n = len(x_tr)
+    best_val, best_state, best_epoch, bad = float("inf"), None, -1, 0
+    history = []
+    t0 = time.time()
+    for epoch in range(MAX_EPOCHS):
+        model.train()
+        if freeze_trunk:
+            model.trunk.eval()  # keep BatchNorm running stats fixed: pure transfer
+        perm = torch.randperm(n, device=device)
+        ep_loss = 0.0
+        for i in range(0, n, BATCH):
+            idx = perm[i:i + BATCH]
+            opt.zero_grad()
+            loss = loss_fn(model(x_tr[idx]), y_tr[idx])
+            loss.backward()
+            opt.step()
+            ep_loss += loss.item() * len(idx)
+        val_mpjpe = mpjpe(predict(model, x_va), y_va)
+        history.append({"epoch": epoch, "train_mse": ep_loss / n, "val_mpjpe": val_mpjpe})
+        marker = ""
+        if val_mpjpe < best_val:
+            best_val, best_epoch, bad = val_mpjpe, epoch, 0
+            best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
+            marker = " *"
+        else:
+            bad += 1
+        print(f"[{name}] epoch {epoch:02d} train_mse {ep_loss / n:.6f} "
+              f"val_mpjpe {val_mpjpe:.5f}{marker}", flush=True)
+        if bad >= PATIENCE:
+            print(f"[{name}] early stop at epoch {epoch} (best {best_epoch})", flush=True)
+            break
+    model.load_state_dict(best_state)
+    torch.save(best_state, os.path.join(MEASB, f"{name}_best.pth"))
+    return model, {"best_epoch": best_epoch, "best_val_mpjpe": best_val,
+                   "epochs_run": len(history), "wall_seconds": round(time.time() - t0, 1),
+                   "history": history}
+
+
+def run_suite(tag, csi, kps, device):
+    """Temporal 70/15/15 split, mean-pose baseline, three training runs."""
+    n = len(csi)
+    tr, va, te = temporal_split(n)
+    print(f"=== suite {tag}: n={n} train={tr.stop} val={va.stop - va.start} "
+          f"test={te.stop - te.start} ===", flush=True)
+
+    # CSI normalization constant from TRAIN split only.
+    train_p99 = float(np.percentile(csi[tr], 99))
+    train_max = float(csi[tr].max())
+    print(f"[{tag}] train p99={train_p99:.3f} max={train_max:.3f} -> /p99, clip [0,1]",
+          flush=True)
+    csi_n = np.clip(csi / train_p99, 0.0, 1.0).astype(np.float32)
+
+    x = torch.from_numpy(csi_n).to(device)
+    y = torch.from_numpy(kps).to(device)
+    x_tr, y_tr = x[tr], y[tr]
+    x_va, y_va = x[va], y[va]
+    x_te, y_te = x[te], y[te]
+
+    suite = {
+        "n_windows": n,
+        "split": {"n_train": int(tr.stop), "n_val": int(va.stop - va.start),
+                  "n_test": int(te.stop - te.start)},
+        "csi_norm": {"method": "divide by train-split p99 amplitude, clip [0,1]",
+                     "train_p99": train_p99, "train_max": train_max},
+        "runs": {},
+    }
+
+    # Honesty bar: mean-pose predictor fit on TRAIN, evaluated on TEST.
+    mean_pose = y_tr.mean(dim=0, keepdim=True).expand(len(y_te), -1, -1)
+    suite["mean_pose_baseline"] = eval_preds(mean_pose, y_te)
+    suite["mean_pose_baseline"]["note"] = "train-split mean pose; pred_std 0 by construction"
+    print(f"[{tag}] mean-pose baseline:", json.dumps(suite["mean_pose_baseline"]),
+          flush=True)
+
+    configs = [
+        ("pretrained", dict(pretrained=True, freeze_trunk=False, lr_trunk=LR_TRUNK_FT)),
+        ("scratch", dict(pretrained=False, freeze_trunk=False, lr_trunk=LR_ADAPTER)),
+        ("frozen_trunk", dict(pretrained=True, freeze_trunk=True, lr_trunk=0.0)),
+    ]
+    for name, cfg in configs:
+        print(f"=== run: {tag}/{name} {cfg} ===", flush=True)
+        model, train_info = train_run(f"{tag}_{name}", x_tr, y_tr, x_va, y_va,
+                                      device, **cfg)
+        test_metrics = eval_preds(predict(model, x_te), y_te)
+        n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
+        suite["runs"][name] = {"config": cfg, "trainable_params": n_trainable,
+                               "train": {k: v for k, v in train_info.items()
+                                         if k != "history"},
+                               "history": train_info["history"],
+                               "test": test_metrics}
+        print(f"[{tag}/{name}] TEST:", json.dumps(test_metrics), flush=True)
+    return suite
+
+
+def main():
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    print(f"device {device}, torch {torch.__version__}", flush=True)
+    set_seed(SEED)
+
+    csi, kps, confs, native70, shape_counts, ts_first, ts_last = load_dataset()
+    print(f"shape distribution: {shape_counts}", flush=True)
+
+    results = {
+        "protocol": {
+            "dataset": DATA, "n_windows": len(csi),
+            "ts_first": ts_first, "ts_last": ts_last,
+            "conf_mean": float(confs.mean()), "conf_min": float(confs.min()),
+            "csi_shape_distribution": shape_counts,
+            "csi_layout_note": "aligner stores frame-major data under a transposed "
+                               "[nSc, nFrames] shape label; corrected on load",
+            "csi_resample": "per-frame linear interp of subcarrier axis to 70 bins "
+                            "(identity for native-70 frames); native-70 windows still "
+                            "contain ~20.4% internally zero-padded short frames",
+            "split": "temporal 70/15/15 (no shuffle across time)",
+            "model": "1x1 Conv1d 70->540 adapter + WiFlowPoseModel trunk, "
+                     "AdaptiveAvgPool2d((17,1)) head (parameter-free K=17)",
+            "checkpoint": CHECKPOINT,
+            "checkpoint_note": "measurement-(a) retrained checkpoint (~96% PCK@20 on "
+                               "WiFlow data); att./final_conv. remap applied "
+                               "defensively (no-op, already new-style keys)",
+            "optimizer": f"AdamW, adapter lr {LR_ADAPTER}, fine-tuned trunk lr "
+                         f"{LR_TRUNK_FT} (10x lower), scratch all {LR_ADAPTER}",
+            "batch": BATCH, "max_epochs": MAX_EPOCHS, "patience": PATIENCE,
+            "precision": "fp32", "seed": SEED,
+            "pck": "torso-normalized, torso = ||l_shoulder(5) - l_hip(11)||, "
+                   "clamp min 0.01, mean over keypoints x frames "
+                   "(upstream math; upstream 2/12 indices are a 15-kp convention)",
+        },
+        # Primary: all 2,046 windows (pre-registered n), subcarrier axis resampled.
+        "all2046": None,
+        # Secondary robustness check: the 1,347 native [70,20] windows only.
+        "native70": None,
+    }
+
+    results["all2046"] = run_suite("all2046", csi, kps, device)
+    results["native70"] = run_suite("native70", csi[native70], kps[native70], device)
+
+    out = os.path.join(MEASB, "measurement_b.json")
+    with open(out, "w") as f:
+        json.dump(results, f, indent=2)
+    print(f"wrote {out}", flush=True)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,33 @@
+#!/bin/bash
+set -ex
+cd ~/wiflow-std-bench
+
+# 1. clone upstream at the pinned commit
+if [ ! -d upstream ]; then
+  git clone https://github.com/DY2434/WiFlow-WiFi-Pose-Estimation-with-Spatio-Temporal-Decoupling upstream
+fi
+cd upstream && git checkout 06899d294a0f44709d601a53e91dbf24759daefb && cd ..
+
+# 2. documented deviation: fix upstream import bug (TemporalConvNet does not exist)
+sed -i 's/from .tcn import TemporalConvNet/from .tcn import TemporalBlock/; s/'"'"'TemporalConvNet'"'"'/'"'"'TemporalBlock'"'"'/' upstream/models/__init__.py
+
+# 3. venv: torch cu128 (RTX 5080 = sm_120 needs >=2.7; their pin 2.3.1 predates Blackwell)
+if [ ! -d venv ]; then
+  python3 -m venv venv
+  ./venv/bin/pip install -q --upgrade pip
+  ./venv/bin/pip install -q torch --index-url https://download.pytorch.org/whl/cu128
+  ./venv/bin/pip install -q numpy pandas matplotlib seaborn scikit-learn opencv-python-headless scipy tqdm psutil kagglehub
+fi
+./venv/bin/python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
+
+# 4. dataset via kagglehub (anonymous, public dataset)
+DS=$(./venv/bin/python -c "import kagglehub; print(kagglehub.dataset_download('kaka2434/wiflow-dataset'))")
+echo "dataset at: $DS"
+
+# 5. run.py hardcodes ../preprocessed_csi_data relative to upstream/
+ln -sfn "$DS/preprocessed_csi_data" ~/wiflow-std-bench/preprocessed_csi_data
+
+# 6. clean NaNs (clean_nan.py lives in the bench root), then train with upstream defaults (seed 42 set inside run.py)
+./venv/bin/python clean_nan.py
+cd upstream
+../venv/bin/python run.py --gpu 0 --batch_size 64 --epochs 50 --output_dir ../train_output
@@ -0,0 +1,332 @@
+"""Configurable compact variants of the WiFlow-STD pose model (ADR-152 efficiency sweep).
+
+This is a parameterized copy of upstream models/{pose_model,tcn,convnet,attention}.py
+(DY2434/WiFlow @ 06899d29, Apache-2.0). upstream/ is NOT modified. Deviations from
+upstream, all forced by shrinking channels and documented per variant in run_sweep.py:
+
+1. TCN grouped-conv groups: upstream hardcodes groups=20, which does not divide
+   the compact channel counts (e.g. 270, 135, 85). Rule here:
+   - groups_mode='gcd20': per-conv groups = gcd(channels, 20)  (== 20 wherever
+     upstream's choice is valid, incl. the 540-ch input conv; falls back to the
+     largest common divisor with 20 otherwise).
+   - groups_mode='depthwise': groups = channels (tiny variant only).
+2. Conv2d downsampling strides: upstream uses 4 stride-(1,2) blocks because
+   240/2^4 = 15 == n_keypoints. With smaller TCN output widths that would leave
+   <15 rows and AdaptiveAvgPool2d((15,1)) would duplicate rows across keypoints.
+   Rule: halve the width only while the result stays >= 15 (stride-2 blocks
+   first, stride-1 after). Full model: 240 -> 4 halvings = upstream exactly.
+3. input_pw_groups (tiny only): the dense 540->c pointwise + residual downsample
+   in TCN block 1 cost 2*540*c params (a ~117k floor that alone exceeds the
+   tiny <100k budget). tiny groups these two convs (groups=4; 4 | gcd(540, 68)).
+4. Decoder mid-channels: upstream 64->32; here c_last -> max(c_last // 2, 4).
+"""
+import math
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+def tcn_groups(channels: int, mode: str) -> int:
+    if mode == 'depthwise':
+        return channels
+    if mode == 'gcd20':
+        return math.gcd(channels, 20)
+    raise ValueError(f"unknown groups_mode: {mode!r} (expected 'gcd20' or 'depthwise')")
+
+
+# ---------------------------------------------------------------- TCN (copy of tcn.py)
+class Chomp1d(nn.Module):
+    def __init__(self, chomp_size):
+        super().__init__()
+        self.chomp_size = chomp_size
+
+    def forward(self, x):
+        return x[:, :, :-self.chomp_size].contiguous()
+
+
+class CompactGroupedTemporalBlock(nn.Module):
+    """Upstream InnerGroupedTemporalBlock with parameterized groups."""
+
+    def __init__(self, n_inputs, n_outputs, kernel_size, stride, dilation, padding,
+                 dropout=0.2, groups_mode='gcd20', pw_groups=1):
+        super().__init__()
+        g_in = tcn_groups(n_inputs, groups_mode)
+        g_out = tcn_groups(n_outputs, groups_mode)
+        self.groups = (g_in, g_out)
+        self.pw_groups = pw_groups
+
+        self.conv1_group = nn.Conv1d(n_inputs, n_inputs, kernel_size, stride=stride,
+                                     padding=padding, dilation=dilation,
+                                     groups=g_in, bias=False)
+        self.chomp1 = Chomp1d(padding) if padding > 0 else nn.Identity()
+        self.bn1_group = nn.BatchNorm1d(n_inputs)
+        self.relu1_group = nn.SiLU(inplace=True)
+
+        self.conv1_pw = nn.Conv1d(n_inputs, n_outputs, 1, groups=pw_groups, bias=False)
+        self.bn1_pw = nn.BatchNorm1d(n_outputs)
+        self.relu1_pw = nn.SiLU(inplace=True)
+        self.dropout1 = nn.Dropout(dropout)
+
+        self.conv2_group = nn.Conv1d(n_outputs, n_outputs, kernel_size, stride=1,
+                                     padding=padding, dilation=dilation,
+                                     groups=g_out, bias=False)
+        self.chomp2 = Chomp1d(padding) if padding > 0 else nn.Identity()
+        self.bn2_group = nn.BatchNorm1d(n_outputs)
+        self.relu2_group = nn.SiLU(inplace=True)
+
+        self.conv2_pw = nn.Conv1d(n_outputs, n_outputs, 1, bias=False)
+        self.bn2_pw = nn.BatchNorm1d(n_outputs)
+        self.relu2_pw = nn.SiLU(inplace=True)
+        self.dropout2 = nn.Dropout(dropout)
+
+        self.downsample = nn.Sequential(
+            nn.Conv1d(n_inputs, n_outputs, 1, groups=pw_groups, bias=False),
+            nn.BatchNorm1d(n_outputs)
+        ) if n_inputs != n_outputs else nn.Identity()
+
+    def forward(self, x):
+        res = self.downsample(x)
+        out = self.conv1_group(x)
+        out = self.chomp1(out)
+        out = self.bn1_group(out)
+        out = self.relu1_group(out)
+        out = self.conv1_pw(out)
+        out = self.bn1_pw(out)
+        out = self.relu1_pw(out)
+        out = self.dropout1(out)
+        out = self.conv2_group(out)
+        out = self.chomp2(out)
+        out = self.bn2_group(out)
+        out = self.relu2_group(out)
+        out = self.conv2_pw(out)
+        out = self.bn2_pw(out)
+        out = self.relu2_pw(out)
+        out = self.dropout2(out)
+        return F.silu(out + res)
+
+
+class CompactTemporalBlock(nn.Module):
+    def __init__(self, num_inputs, num_channels, kernel_size=3, dropout=0.2,
+                 groups_mode='gcd20', input_pw_groups=1):
+        super().__init__()
+        layers = []
+        for i, out_channels in enumerate(num_channels):
+            dilation_size = 2 ** i
+            in_channels = num_inputs if i == 0 else num_channels[i - 1]
+            layers.append(CompactGroupedTemporalBlock(
+                in_channels, out_channels, kernel_size, stride=1,
+                dilation=dilation_size, padding=(kernel_size - 1) * dilation_size,
+                dropout=dropout, groups_mode=groups_mode,
+                pw_groups=input_pw_groups if i == 0 else 1))
+        self.network = nn.Sequential(*layers)
+
+    def forward(self, x):
+        return self.network(x)
+
+
+# ------------------------------------------------------- Conv2d path (copy of convnet.py)
+class AsymmetricConvBlock(nn.Module):
+    """Upstream block with parameterized width stride (upstream: always (1,2))."""
+
+    def __init__(self, in_channels, out_channels, dropout=0.3, stride_w=2):
+        super().__init__()
+        self.block = nn.Sequential(
+            nn.Conv2d(in_channels, out_channels, kernel_size=(1, 3),
+                      stride=(1, stride_w), padding=(0, 1)),
+            nn.BatchNorm2d(out_channels),
+            nn.SiLU(inplace=True),
+            nn.Dropout2d(dropout),
+            nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
+            nn.BatchNorm2d(out_channels),
+            nn.SiLU(inplace=True),
+            nn.Dropout2d(dropout),
+            nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
+            nn.BatchNorm2d(out_channels)
+        )
+        self.downsample = nn.Sequential(
+            nn.Conv2d(in_channels, out_channels, kernel_size=1,
+                      stride=(1, stride_w), bias=False),
+            nn.BatchNorm2d(out_channels)
+        )
+        self.activation = nn.SiLU(inplace=True)
+
+    def forward(self, x):
+        return self.activation(self.block(x) + self.downsample(x))
+
+
+class ConvBlock1(nn.Module):
+    def __init__(self, in_channels, out_channels, dropout=0.3):
+        super().__init__()
+        self.block = nn.Sequential(
+            nn.Conv2d(in_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
+            nn.BatchNorm2d(out_channels),
+            nn.SiLU(inplace=True),
+            nn.Dropout2d(dropout),
+            nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
+            nn.BatchNorm2d(out_channels),
+            nn.SiLU(inplace=True),
+            nn.Dropout2d(dropout),
+            nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
+            nn.BatchNorm2d(out_channels)
+        )
+        self.downsample = nn.Sequential(
+            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False),
+            nn.BatchNorm2d(out_channels)
+        )
+        self.activation = nn.SiLU(inplace=True)
+
+    def forward(self, x):
+        return self.activation(self.block(x) + self.downsample(x))
+
+
+# ----------------------------------------------------- attention (verbatim attention.py)
+class AxialAttention(nn.Module):
+    def __init__(self, in_planes, out_planes, groups=8, stride=1, bias=False, width=False):
+        assert (in_planes % groups == 0) and (out_planes % groups == 0)
+        super().__init__()
+        self.in_planes = in_planes
+        self.out_planes = out_planes
+        self.groups = groups
+        self.group_planes = out_planes // groups
+        self.stride = stride
+        self.bias = bias
+        self.width = width
+        self.qkv_transform = nn.Conv1d(in_planes, out_planes * 3, kernel_size=1,
+                                       stride=1, padding=0, bias=False)
+        self.bn_qkv = nn.BatchNorm1d(out_planes * 3)
+        self.bn_similarity = nn.BatchNorm2d(groups)
+        self.bn_output = nn.BatchNorm1d(out_planes)
+        if stride > 1:
+            self.pooling = nn.AvgPool2d(stride, stride=stride)
+        nn.init.normal_(self.qkv_transform.weight.data, 0, math.sqrt(1. / self.in_planes))
+
+    def forward(self, x):
+        if self.width:
+            x = x.permute(0, 2, 1, 3)
+        else:
+            x = x.permute(0, 3, 1, 2)
+        N, W, C, H = x.shape
+        x = x.contiguous().view(N * W, C, H)
+        qkv = self.bn_qkv(self.qkv_transform(x))
+        qkv = qkv.reshape(N * W, 3, self.out_planes, H).permute(1, 0, 2, 3)
+        q, k, v = qkv[0], qkv[1], qkv[2]
+        q = q.reshape(N * W, self.groups, self.group_planes, H)
+        k = k.reshape(N * W, self.groups, self.group_planes, H)
+        v = v.reshape(N * W, self.groups, self.group_planes, H)
+        qk = torch.einsum('bgci, bgcj->bgij', q, k)
+        qk = self.bn_similarity(qk)
+        similarity = F.softmax(qk, dim=-1)
+        sv = torch.einsum('bgij,bgcj->bgci', similarity, v)
+        sv = sv.reshape(N * W, self.out_planes, H)
+        out = self.bn_output(sv)
+        out = out.view(N, W, self.out_planes, H)
+        if self.width:
+            out = out.permute(0, 2, 1, 3)
+        else:
+            out = out.permute(0, 2, 3, 1)
+        if self.stride > 1:
+            out = self.pooling(out)
+        return out
+
+
+class DualAxialAttention(nn.Module):
+    def __init__(self, in_planes, out_planes, groups=8, stride=1, bias=False):
+        super().__init__()
+        self.width_axis = AxialAttention(in_planes, out_planes, groups, stride, bias, width=True)
+        self.height_axis = AxialAttention(out_planes, out_planes, groups, stride, bias, width=False)
+
+    def forward(self, x):
+        return self.height_axis(self.width_axis(x))
+
+
+# --------------------------------------------------------------- full model
+def compute_strides(width: int, n_blocks: int, target: int = 15):
+    """Halve width while result stays >= target (upstream: 240 -> 4 halvings -> 15)."""
+    strides = []
+    for _ in range(n_blocks):
+        nxt = (width + 1) // 2  # conv k=3 s=2 p=1: out = ceil(in/2)
+        if nxt >= target:
+            strides.append(2)
+            width = nxt
+        else:
+            strides.append(1)
+    return strides, width
+
+
+class CompactWiFlowPoseModel(nn.Module):
+    """Parameterized upstream WiFlowPoseModel.
+
+    Upstream config == tcn_channels=[540,440,340,240], conv_channels=[8,16,32,64],
+    attn_groups=8, groups_mode='gcd20' (gcd(c,20)==20 for all upstream channels),
+    input_pw_groups=1 -> identical architecture, 2,225,042 params.
+    """
+
+    def __init__(self, tcn_channels, conv_channels, attn_groups,
+                 groups_mode='gcd20', input_pw_groups=1, dropout=0.3,
+                 num_subcarriers=540, num_keypoints=15):
+        super().__init__()
+        self.tcn = CompactTemporalBlock(
+            num_inputs=num_subcarriers, num_channels=tcn_channels, kernel_size=3,
+            dropout=dropout, groups_mode=groups_mode, input_pw_groups=input_pw_groups)
+
+        self.up = ConvBlock1(1, conv_channels[0])
+
+        strides, self.final_width = compute_strides(
+            tcn_channels[-1], len(conv_channels), target=num_keypoints)
+        self.conv_strides = strides
+        self.residual_blocks = nn.ModuleList()
+        in_channels = conv_channels[0]
+        for out_channels, s in zip(conv_channels, strides):
+            self.residual_blocks.append(
+                AsymmetricConvBlock(in_channels, out_channels, stride_w=s))
+            in_channels = out_channels
+
+        c_last = conv_channels[-1]
+        self.attention = DualAxialAttention(c_last, c_last, groups=attn_groups)
+
+        c_mid = max(c_last // 2, 4)
+        self.decoder = nn.Sequential(
+            nn.Conv2d(c_last, c_mid, kernel_size=3, padding=1),
+            nn.BatchNorm2d(c_mid),
+            nn.SiLU(inplace=True),
+            nn.Conv2d(c_mid, 2, kernel_size=1),
+            nn.BatchNorm2d(2),
+            nn.SiLU(inplace=True)
+        )
+        self.avg_pool = nn.AdaptiveAvgPool2d((num_keypoints, 1))
+        self._initialize_weights()
+
+    def _initialize_weights(self):
+        for m in self.modules():
+            if isinstance(m, nn.Conv1d):
+                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
+                if m.bias is not None:
+                    nn.init.constant_(m.bias, 0)
+            elif isinstance(m, (nn.BatchNorm1d, nn.LayerNorm)):
+                nn.init.constant_(m.weight, 1)
+                nn.init.constant_(m.bias, 0)
+            elif isinstance(m, nn.Linear):
+                nn.init.xavier_normal_(m.weight)
+                if m.bias is not None:
+                    nn.init.constant_(m.bias, 0)
+
+    def forward(self, x):
+        # [B, 540, 20]
+        x = self.tcn(x)                          # [B, C_tcn, 20]
+        x = x.transpose(1, 2).unsqueeze(1)       # [B, 1, 20, C_tcn]
+        x = self.up(x)
+        for block in self.residual_blocks:
+            x = block(x)                         # [B, C_conv, 20, W']
+        x = x.permute(0, 1, 3, 2)                # [B, C_conv, W', 20]
+        x = self.attention(x)
+        x = self.decoder(x)                      # [B, 2, W', 20]
+        x = self.avg_pool(x).squeeze(-1)         # [B, 2, 15]
+        return x.transpose(1, 2)                 # [B, 15, 2]
+
+
+def describe(model: 'CompactWiFlowPoseModel'):
+    params = sum(p.numel() for p in model.parameters())
+    tcn_g = [blk.groups for blk in model.tcn.network]
+    return {'params': params, 'tcn_groups_per_block': tcn_g,
+            'conv_strides': model.conv_strides, 'final_width': model.final_width}
@@ -0,0 +1,278 @@
+"""WiFlow-STD compact-variant efficiency sweep (ADR-152) — sequential overnight runner.
+
+Trains compact variants of the upstream WiFlow-STD architecture on the same
+data/split as the full-size reference retraining (seed 42, file-level 70/15/15,
+upstream dataset.py) and evaluates PCK@10..50 + MPJPE on the full test split and
+the corruption-free test subset (file indices < 487).
+
+Training mirrors upstream run.py/train.py defaults except:
+- fp32 only (no fp16 autocast / GradScaler — avoids the BN-poisoning trap
+  documented in RESULTS.md defect 5; data on disk is already cleaned).
+- batch 64 (kept modest: another GPU job may share the 16 GB card tonight).
+- scheduler + early stopping keyed on val MPJPE (upstream early-stops on val MPE
+  with patience 5; same here).
+
+Usage:
+  venv/bin/python sweep/run_sweep.py --dry-run    # param counts only
+  nohup venv/bin/python sweep/run_sweep.py > sweep/sweep.log 2>&1 &
+
+Idempotent: variants already present in sweep/results.jsonl are skipped.
+
+NOTE: deployed to ruvultra (~/wiflow-std-bench/sweep) as a standalone file, so
+it deliberately inlines its helpers. The reference implementations (upstream
+import shim, >1GB np.load mmap patch, key-remap loader, canonical evaluate
+loop) live in benchmarks/wiflow-std/_bench_common.py — keep copies in sync.
+"""
+import argparse
+import copy
+import json
+import os
+import random
+import sys
+import time
+
+import numpy as np
+import torch
+from torch.utils.data import DataLoader, Subset
+
+# csi_windows.npy is ~13 GB; mmap large arrays instead of eagerly loading
+# ~15 GB into RAM (same patch as _bench_common._np_load_mmap).
+_np_load = np.load
+
+
+def _np_load_mmap(path, *a, **kw):
+    if (isinstance(path, str) and path.endswith('.npy')
+            and os.path.getsize(path) > 1 << 30 and 'mmap_mode' not in kw):
+        kw['mmap_mode'] = 'r'
+    return _np_load(path, *a, **kw)
+
+
+np.load = _np_load_mmap
+
+BENCH = os.path.expanduser('~/wiflow-std-bench')
+SWEEP = os.path.join(BENCH, 'sweep')
+sys.path.insert(0, os.path.join(BENCH, 'upstream'))
+sys.path.insert(0, SWEEP)
+
+from dataset import PreprocessedCSIKeypointsDataset, create_preprocessed_train_val_test_loaders  # noqa: E402
+from losses.pose_loss import PoseLoss          # noqa: E402
+from utils.metrics import calculate_pck, calculate_mpjpe  # noqa: E402
+from model_compact import CompactWiFlowPoseModel, describe  # noqa: E402
+
+VARIANTS = [
+    # name, tcn_channels, conv_channels, attn_groups, groups_mode, input_pw_groups
+    dict(name='half',    tcn=[270, 220, 170, 120], conv=[4, 8, 16, 32], attn_groups=4,
+         groups_mode='gcd20', input_pw_groups=1),
+    dict(name='quarter', tcn=[135, 110, 85, 60],   conv=[2, 4, 8, 16],  attn_groups=2,
+         groups_mode='gcd20', input_pw_groups=1),
+    dict(name='tiny',    tcn=[68, 56, 44, 32],     conv=[2, 4, 8, 16],  attn_groups=2,
+         groups_mode='depthwise', input_pw_groups=4),
+]
+
+BATCH = 64
+EPOCHS = 50
+PATIENCE = 5
+LR = 1e-4
+WEIGHT_DECAY = 5e-5
+SEED = 42
+CORRUPT_FILE_START = 487  # files 487-499 were zero-filled by clean_nan.py
+
+
+def set_seed(seed=SEED):
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+    torch.cuda.manual_seed_all(seed)
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+
+
+def build_model(v, dropout=0.5):
+    return CompactWiFlowPoseModel(
+        tcn_channels=v['tcn'], conv_channels=v['conv'], attn_groups=v['attn_groups'],
+        groups_mode=v['groups_mode'], input_pw_groups=v['input_pw_groups'],
+        dropout=dropout)
+
+
+@torch.no_grad()
+def evaluate(model, loader, device):
+    model.eval()
+    totals = {t: 0.0 for t in (0.1, 0.2, 0.3, 0.4, 0.5)}
+    total_mpe, n = 0.0, 0
+    for bx, by in loader:
+        bx, by = bx.to(device), by.to(device)
+        out = model(bx)
+        bs = by.size(0)
+        total_mpe += calculate_mpjpe(out, by) * bs
+        pck = calculate_pck(out, by, thresholds=list(totals))
+        for t in totals:
+            totals[t] += pck[t] * bs
+        n += bs
+    return {'samples': n, 'mpjpe': total_mpe / n,
+            **{f'pck@{round(t * 100)}': totals[t] / n for t in totals}}
+
+
+def train_variant(v, dataset, device):
+    set_seed(SEED)
+    train_loader, val_loader, test_loader = create_preprocessed_train_val_test_loaders(
+        dataset=dataset, batch_size=BATCH, num_workers=2, random_seed=SEED)
+
+    set_seed(SEED)  # re-seed after split so init is split-independent
+    model = build_model(v).to(device)
+    info = describe(model)
+    print(f"[{v['name']}] params={info['params']:,} tcn_groups={info['tcn_groups_per_block']} "
+          f"conv_strides={info['conv_strides']} final_width={info['final_width']}", flush=True)
+
+    criterion = PoseLoss(position_weight=1.0, bone_weight=0.2, loss_type='smooth_l1')
+    optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY,
+                                  betas=(0.9, 0.999))
+    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
+        optimizer, mode='min', factor=0.5, patience=3, min_lr=LR / 1000,
+        cooldown=1, threshold=1e-4)
+
+    best_val_mpe = float('inf')
+    best_val_pck20 = 0.0
+    best_epoch = 0
+    best_state = None
+    patience_counter = 0
+    t0 = time.time()
+    error = None
+    epochs_run = 0
+
+    for epoch in range(1, EPOCHS + 1):
+        model.train()
+        ep_loss, nb = 0.0, 0
+        te = time.time()
+        for i, (bx, by) in enumerate(train_loader):
+            bx = bx.to(device, non_blocking=True)
+            by = by.to(device, non_blocking=True)
+            optimizer.zero_grad(set_to_none=True)
+            out = model(bx)
+            loss, _parts = criterion(out, by)
+            if not torch.isfinite(loss):
+                error = f'non-finite loss at epoch {epoch} step {i}'
+                break
+            loss.backward()
+            optimizer.step()
+            ep_loss += loss.item()
+            nb += 1
+            if epoch == 1 and i % 500 == 0:
+                print(f"[{v['name']}] e1 step {i}/{len(train_loader)} loss={loss.item():.5f}",
+                      flush=True)
+        if error:
+            break
+        epochs_run = epoch
+
+        val = evaluate(model, val_loader, device)
+        scheduler.step(val['mpjpe'])
+        lr_now = optimizer.param_groups[0]['lr']
+        print(f"[{v['name']}] epoch {epoch}/{EPOCHS} train_loss={ep_loss / max(nb, 1):.5f} "
+              f"val_mpjpe={val['mpjpe']:.5f} val_pck20={val['pck@20'] * 100:.2f}% "
+              f"lr={lr_now:.2e} ({time.time() - te:.0f}s)", flush=True)
+
+        if val['mpjpe'] < best_val_mpe:
+            best_val_mpe = val['mpjpe']
+            best_val_pck20 = val['pck@20']
+            best_epoch = epoch
+            best_state = copy.deepcopy(model.state_dict())
+            patience_counter = 0
+        else:
+            patience_counter += 1
+            if patience_counter >= PATIENCE:
+                print(f"[{v['name']}] early stop at epoch {epoch} (best {best_epoch})", flush=True)
+                break
+
+    train_seconds = time.time() - t0
+    result = {
+        'variant': v['name'], 'params': info['params'],
+        'tcn_channels': v['tcn'], 'conv_channels': v['conv'],
+        'attn_groups': v['attn_groups'], 'groups_mode': v['groups_mode'],
+        'input_pw_groups': v['input_pw_groups'],
+        'tcn_groups_per_block': info['tcn_groups_per_block'],
+        'conv_strides': info['conv_strides'], 'final_width': info['final_width'],
+        'batch_size': BATCH, 'max_epochs': EPOCHS, 'patience': PATIENCE,
+        'lr': LR, 'weight_decay': WEIGHT_DECAY, 'seed': SEED, 'precision': 'fp32',
+        'epochs_run': epochs_run, 'best_epoch': best_epoch,
+        'best_val_mpjpe': best_val_mpe if best_state else None,
+        'best_val_pck20': best_val_pck20 if best_state else None,
+        'train_seconds': round(train_seconds, 1),
+        'torch': torch.__version__, 'error': error,
+        'finished_utc': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
+    }
+
+    if best_state is not None:
+        ckpt = os.path.join(SWEEP, f"{v['name']}_best.pth")
+        torch.save(best_state, ckpt)
+        result['checkpoint'] = ckpt
+        model.load_state_dict(best_state)
+
+        eval_loader = DataLoader(test_loader.dataset, batch_size=256, shuffle=False,
+                                 num_workers=2)
+        result['test_full'] = evaluate(model, eval_loader, device)
+
+        w2f = dataset.window_to_file
+        clean_idx = [i for i in test_loader.dataset.indices if w2f[i] < CORRUPT_FILE_START]
+        clean_loader = DataLoader(Subset(dataset, clean_idx), batch_size=256,
+                                  shuffle=False, num_workers=2)
+        result['test_clean'] = evaluate(model, clean_loader, device)
+        print(f"[{v['name']}] TEST clean: pck20={result['test_clean']['pck@20'] * 100:.2f}% "
+              f"mpjpe={result['test_clean']['mpjpe']:.5f} | full: "
+              f"pck20={result['test_full']['pck@20'] * 100:.2f}%", flush=True)
+    return result
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument('--dry-run', action='store_true', help='print param counts and exit')
+    args = ap.parse_args()
+
+    if args.dry_run:
+        for v in VARIANTS:
+            m = build_model(v)
+            info = describe(m)
+            x = torch.randn(2, 540, 20)
+            m.eval()
+            y = m(x)
+            print(f"{v['name']:8s} params={info['params']:>9,} "
+                  f"tcn={v['tcn']} conv={v['conv']} attn_g={v['attn_groups']} "
+                  f"mode={v['groups_mode']} pw_g={v['input_pw_groups']} "
+                  f"tcn_groups={info['tcn_groups_per_block']} strides={info['conv_strides']} "
+                  f"W'={info['final_width']} out={tuple(y.shape)}")
+        return
+
+    results_path = os.path.join(SWEEP, 'results.jsonl')
+    done = set()
+    if os.path.exists(results_path):
+        with open(results_path) as f:
+            for line in f:
+                try:
+                    done.add(json.loads(line)['variant'])
+                except (json.JSONDecodeError, KeyError, TypeError):
+                    pass  # blank or malformed line: that variant is simply rerun
+
+    device = torch.device('cuda')
+    print(f"torch {torch.__version__} on {torch.cuda.get_device_name(0)}", flush=True)
+    data_dir = os.path.join(BENCH, 'preprocessed_csi_data')
+    dataset = PreprocessedCSIKeypointsDataset(data_dir=data_dir, keypoint_scale=1000.0,
+                                              enable_temporal_clean=True)
+
+    for v in VARIANTS:
+        if v['name'] in done:
+            print(f"[{v['name']}] already in results.jsonl — skipping", flush=True)
+            continue
+        print(f"\n===== variant: {v['name']} =====", flush=True)
+        try:
+            result = train_variant(v, dataset, device)
+        except Exception as e:  # record and move on to next variant
+            import traceback
+            traceback.print_exc()
+            result = {'variant': v['name'], 'error': repr(e),
+                      'finished_utc': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())}
+        with open(results_path, 'a') as f:
+            f.write(json.dumps(result) + '\n')
+            f.flush()
+    print('\nSWEEP COMPLETE', flush=True)
+
+
+if __name__ == '__main__':
+    main()
@@ -0,0 +1,772 @@
+{
+  "torch": {
+    "env": {
+      "torch": "2.12.0+cpu",
+      "platform": "Windows-11-10.0.26200-SP0",
+      "processor": "Intel64 Family 6 Model 197 Stepping 2, GenuineIntel",
+      "num_threads": 16,
+      "checkpoint": "results\\retrained_best_pose_model.pth",
+      "params": 2225042
+    },
+    "variants": {
+      "fp32": {
+        "file": "retrained_fp32_resaved.pth",
+        "size_bytes": 9068948,
+        "size_mb": 9.068948,
+        "latency_batch1": {
+          "batch_size": 1,
+          "runs": 100,
+          "median_ms_per_batch": 24.903650000851485,
+          "median_ms_per_window": 24.903650000851485,
+          "windows_per_second": 40.15475642991324
+        },
+        "latency_batch64": {
+          "batch_size": 64,
+          "runs": 30,
+          "median_ms_per_batch": 184.02919999789447,
+          "median_ms_per_window": 2.875456249967101,
+          "windows_per_second": 347.77089723115813
+        },
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9668200004577636,
+          "pck@50": 0.9915333324432373,
+          "mpjpe": 0.00936222033649683,
+          "wall_seconds": 37.85407733917236
+        }
+      },
+      "fp16": {
+        "file": "retrained_fp16.pth",
+        "size_bytes": 4580332,
+        "size_mb": 4.580332,
+        "latency_batch1": {
+          "batch_size": 1,
+          "runs": 100,
+          "median_ms_per_batch": 23.936699999467237,
+          "median_ms_per_window": 23.936699999467237,
+          "windows_per_second": 41.776853117691964
+        },
+        "latency_batch64": {
+          "batch_size": 64,
+          "runs": 30,
+          "median_ms_per_batch": 102.32584999903338,
+          "median_ms_per_window": 1.5988414062348966,
+          "windows_per_second": 625.4529036465817
+        },
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.966773332977295,
+          "pck@50": 0.9915066654205322,
+          "mpjpe": 0.009460017587244511,
+          "wall_seconds": 21.632277250289917
+        }
+      },
+      "int8_dynamic": {
+        "file": "retrained_int8_dynamic.pth",
+        "size_bytes": 9068948,
+        "size_mb": 9.068948,
+        "latency_batch1": {
+          "batch_size": 1,
+          "runs": 100,
+          "median_ms_per_batch": 18.105350000041653,
+          "median_ms_per_window": 18.105350000041653,
+          "windows_per_second": 55.23229321707117
+        },
+        "latency_batch64": {
+          "batch_size": 64,
+          "runs": 30,
+          "median_ms_per_batch": 168.77549999844632,
+          "median_ms_per_window": 2.6371171874757238,
+          "windows_per_second": 379.20195763359703
+        },
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9668200004577636,
+          "pck@50": 0.9915333324432373,
+          "mpjpe": 0.00936222033649683,
+          "wall_seconds": 45.35376596450806
+        }
+      }
+    },
+    "int8_dynamic_quant_report": {
+      "eligible_module_counts": {
+        "nn.Linear": 0,
+        "nn.Conv1d": 21,
+        "nn.Conv2d": 22
+      },
+      "modules_actually_quantized": [],
+      "n_modules_quantized": 0,
+      "params_total": 2225042,
+      "params_quantized": 0,
+      "params_quantized_fraction": 0.0
+    },
+    "accuracy_subset": {
+      "description": "seed-42 file-level 70/15/15 test split, corrupted windows (files 487-499) excluded, seed-42 random subset",
+      "subset_size": 10000,
+      "clean_test_total": 10000
+    }
+  },
+  "onnx": {
+    "env": {
+      "torch": "2.12.0+cpu",
+      "onnxruntime": "1.26.0",
+      "platform": "Windows-11-10.0.26200-SP0"
+    },
+    "export": {
+      "mode": "dynamic-batch",
+      "exporter": "torchscript",
+      "file": "retrained_fp32_dynamic.onnx",
+      "size_mb": 8.971781
+    },
+    "parity": {
+      "fixture": "results/parity_fixture.npz (batch 2, seed 42)",
+      "max_abs_diff_vs_stored_fixture": 2.384185791015625e-07,
+      "max_abs_diff_vs_torch_now": 2.384185791015625e-07,
+      "pass_lt_1e-4": true
+    },
+    "latency": {
+      "batch1": {
+        "batch_size": 1,
+        "runs": 100,
+        "median_ms_per_batch": 2.5410999987798277,
+        "median_ms_per_window": 2.5410999987798277,
+        "windows_per_second": 393.5303610563043
+      },
+      "batch64": {
+        "batch_size": 64,
+        "runs": 30,
+        "median_ms_per_batch": 181.95204999938142,
+        "median_ms_per_window": 2.8430007812403346,
+        "windows_per_second": 351.7410218803118
+      }
+    },
+    "ort_int8_dynamic_supplementary": {
+      "file": "retrained_int8_ort_dynamic.onnx",
+      "size_mb": 2.438794,
+      "runs": true,
+      "max_abs_diff_vs_fp32_fixture": 0.00827130675315857
+    }
+  },
+  "onnx_accuracy": {
+    "onnx_fp32": {
+      "samples": 10000,
+      "pck@20": 0.9668200004577636,
+      "pck@50": 0.9915333324432373,
+      "mpjpe": 0.00936222568154335,
+      "wall_seconds": 22.34790802001953
+    },
+    "onnx_int8_ort_dynamic": {
+      "samples": 10000,
+      "pck@20": 0.965240001964569,
+      "pck@50": 0.9915466655731201,
+      "mpjpe": 0.01108054072111845,
+      "wall_seconds": 55.742953062057495
+    }
+  },
+  "latency_controlled_rerun": {
+    "note": "3 interleaved repetitions per variant, median ms/window; quiet box",
+    "fp32": {
+      "batch1_ms_per_window_median": 10.969150001983508,
+      "batch1_reps": [
+        10.969150001983508,
+        12.646450000829645,
+        10.49820000116597
+      ],
+      "batch64_ms_per_window_median": 2.2734187500077496,
+      "batch64_reps": [
+        2.377234374989712,
+        2.124126562478068,
+        2.2734187500077496
+      ]
+    },
+    "fp16": {
+      "batch1_ms_per_window_median": 24.313550000442774,
+      "batch1_reps": [
+        25.1078499986761,
+        21.856999999727122,
+        24.313550000442774
+      ],
+      "batch64_ms_per_window_median": 2.414695312495496,
+      "batch64_reps": [
+        2.5705156249955508,
+        1.7137437499741281,
+        2.414695312495496
+      ]
+    },
+    "int8_dynamic": {
+      "batch1_ms_per_window_median": 15.627150000000256,
+      "batch1_reps": [
+        17.67525000104797,
+        14.627999998992891,
+        15.627150000000256
+      ],
+      "batch64_ms_per_window_median": 2.0546906250160646,
+      "batch64_reps": [
+        2.0546906250160646,
+        2.03407343752815,
+        2.9325796875241394
+      ]
+    },
+    "onnx_fp32": {
+      "batch1_ms_per_window_median": 3.186650001225644,
+      "batch1_reps": [
+        2.7332500012562377,
+        3.1995500012271805,
+        3.186650001225644
+      ],
+      "batch64_ms_per_window_median": 1.9893374999924163,
+      "batch64_reps": [
+        1.5590843750032946,
+        1.9893374999924163,
+        2.2144343749914697
+      ]
+    },
+    "onnx_int8_ort_dynamic": {
+      "batch1_ms_per_window_median": 6.50984999811044,
+      "batch1_reps": [
+        6.50984999811044,
+        6.455249998907675,
+        6.789299999581999
+      ],
+      "batch64_ms_per_window_median": 5.770093750015803,
+      "batch64_reps": [
+        5.770093750015803,
+        3.912374999970325,
+        7.8067296875019565
+      ]
+    }
+  },
+  "onnx_static_ptq": {
+    "env": {
+      "onnxruntime": "1.26.0",
+      "torch": "2.12.0+cpu",
+      "platform": "Windows-11-10.0.26200-SP0",
+      "source_model": "retrained_fp32_dynamic.onnx",
+      "preprocessed_model": {
+        "file": "retrained_fp32_preproc.onnx",
+        "size_mb": 8.981529
+      }
+    },
+    "variants": {
+      "minmax_all": {
+        "file": "retrained_int8_static_minmax_all.onnx",
+        "size_bytes": 2604286,
+        "size_mb": 2.604286,
+        "calibration": {
+          "method": "minmax",
+          "windows": 1000,
+          "percentile": null,
+          "seconds": 5.052440166473389
+        },
+        "scope": "all",
+        "per_channel": true,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+        "node_counts": {
+          "Add": 9,
+          "AveragePool": 1,
+          "BatchNormalization": 12,
+          "Concat": 10,
+          "Conv": 43,
+          "DequantizeLinear": 283,
+          "Einsum": 4,
+          "Gather": 16,
+          "Mul": 39,
+          "QuantizeLinear": 181,
+          "Reshape": 14,
+          "Shape": 2,
+          "Sigmoid": 37,
+          "Slice": 8,
+          "Softmax": 2,
+          "Squeeze": 1,
+          "Transpose": 7,
+          "Unsqueeze": 11
+        },
+        "max_abs_diff_vs_fp32_fixture": 0.015945255756378174,
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9545266661643982,
+          "pck@50": 0.9913666645050049,
+          "mpjpe": 0.014860070134699345,
+          "wall_seconds": 43.455235958099365
+        }
+      },
+      "minmax_conv": {
+        "file": "retrained_int8_static_minmax_conv.onnx",
+        "size_bytes": 2527421,
+        "size_mb": 2.527421,
+        "calibration": {
+          "method": "minmax",
+          "windows": 1000,
+          "percentile": null,
+          "seconds": 4.380746126174927
+        },
+        "scope": "conv",
+        "per_channel": true,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+        "node_counts": {
+          "Add": 9,
+          "AveragePool": 1,
+          "BatchNormalization": 12,
+          "Concat": 10,
+          "Conv": 43,
+          "DequantizeLinear": 156,
+          "Einsum": 4,
+          "Gather": 16,
+          "Mul": 39,
+          "QuantizeLinear": 78,
+          "Reshape": 14,
+          "Shape": 2,
+          "Sigmoid": 37,
+          "Slice": 8,
+          "Softmax": 2,
+          "Squeeze": 1,
+          "Transpose": 7,
+          "Unsqueeze": 11
+        },
+        "max_abs_diff_vs_fp32_fixture": 0.010693132877349854,
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9663399996757507,
+          "pck@50": 0.9918666641235352,
+          "mpjpe": 0.01084446222037077,
+          "wall_seconds": 35.937947034835815
+        }
+      },
+      "entropy_all": {
+        "file": "retrained_int8_static_entropy_all.onnx",
+        "size_bytes": 2604268,
+        "size_mb": 2.604268,
+        "calibration": {
+          "method": "entropy",
+          "windows": 512,
+          "percentile": null,
+          "seconds": 23.835066318511963
+        },
+        "scope": "all",
+        "per_channel": true,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+        "node_counts": {
+          "Add": 9,
+          "AveragePool": 1,
+          "BatchNormalization": 12,
+          "Concat": 10,
+          "Conv": 43,
+          "DequantizeLinear": 283,
+          "Einsum": 4,
+          "Gather": 16,
+          "Mul": 39,
+          "QuantizeLinear": 181,
+          "Reshape": 14,
+          "Shape": 2,
+          "Sigmoid": 37,
+          "Slice": 8,
+          "Softmax": 2,
+          "Squeeze": 1,
+          "Transpose": 7,
+          "Unsqueeze": 11
+        },
+        "max_abs_diff_vs_fp32_fixture": 0.015280365943908691,
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9530466662406921,
+          "pck@50": 0.9912600006103516,
+          "mpjpe": 0.015098519864678382,
+          "wall_seconds": 51.514281034469604
+        }
+      },
+      "entropy_conv": {
+        "file": "retrained_int8_static_entropy_conv.onnx",
+        "size_bytes": 2527403,
+        "size_mb": 2.527403,
+        "calibration": {
+          "method": "entropy",
+          "windows": 512,
+          "percentile": null,
+          "seconds": 9.634419918060303
+        },
+        "scope": "conv",
+        "per_channel": true,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+        "node_counts": {
+          "Add": 9,
+          "AveragePool": 1,
+          "BatchNormalization": 12,
+          "Concat": 10,
+          "Conv": 43,
+          "DequantizeLinear": 156,
+          "Einsum": 4,
+          "Gather": 16,
+          "Mul": 39,
+          "QuantizeLinear": 78,
+          "Reshape": 14,
+          "Shape": 2,
+          "Sigmoid": 37,
+          "Slice": 8,
+          "Softmax": 2,
+          "Squeeze": 1,
+          "Transpose": 7,
+          "Unsqueeze": 11
+        },
+        "max_abs_diff_vs_fp32_fixture": 0.012535125017166138,
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9659599989891052,
+          "pck@50": 0.9918666648864746,
+          "mpjpe": 0.010778637571632861,
+          "wall_seconds": 41.01180171966553
+        }
+      },
+      "percentile_all": {
+        "file": "retrained_int8_static_percentile_all.onnx",
+        "size_bytes": 2604052,
+        "size_mb": 2.604052,
+        "calibration": {
+          "method": "percentile",
+          "windows": 512,
+          "percentile": 99.99,
+          "seconds": 20.221954584121704
+        },
+        "scope": "all",
+        "per_channel": true,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+        "node_counts": {
+          "Add": 9,
+          "AveragePool": 1,
+          "BatchNormalization": 12,
+          "Concat": 10,
+          "Conv": 43,
+          "DequantizeLinear": 283,
+          "Einsum": 4,
+          "Gather": 16,
+          "Mul": 39,
+          "QuantizeLinear": 181,
+          "Reshape": 14,
+          "Shape": 2,
+          "Sigmoid": 37,
+          "Slice": 8,
+          "Softmax": 2,
+          "Squeeze": 1,
+          "Transpose": 7,
+          "Unsqueeze": 11
+        },
+        "max_abs_diff_vs_fp32_fixture": 0.017689883708953857,
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9639333323478698,
+          "pck@50": 0.9916799991607667,
+          "mpjpe": 0.012176512064039708,
+          "wall_seconds": 49.365190744400024
+        }
+      },
+      "percentile_conv": {
+        "file": "retrained_int8_static_percentile_conv.onnx",
+        "size_bytes": 2527241,
+        "size_mb": 2.527241,
+        "calibration": {
+          "method": "percentile",
+          "windows": 512,
+          "percentile": 99.99,
+          "seconds": 8.223475694656372
+        },
+        "scope": "conv",
+        "per_channel": true,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+        "node_counts": {
+          "Add": 9,
+          "AveragePool": 1,
+          "BatchNormalization": 12,
+          "Concat": 10,
+          "Conv": 43,
+          "DequantizeLinear": 156,
+          "Einsum": 4,
+          "Gather": 16,
+          "Mul": 39,
+          "QuantizeLinear": 78,
+          "Reshape": 14,
+          "Shape": 2,
+          "Sigmoid": 37,
+          "Slice": 8,
+          "Softmax": 2,
+          "Squeeze": 1,
+          "Transpose": 7,
+          "Unsqueeze": 11
+        },
+        "max_abs_diff_vs_fp32_fixture": 0.014725983142852783,
+        "accuracy": {
+          "samples": 10000,
+          "pck@20": 0.9660599988937378,
+          "pck@50": 0.9916066654205322,
+          "mpjpe": 0.010310938355326652,
+          "wall_seconds": 36.89548587799072
+        }
+      }
+    },
+    "latency": {
+      "note": "3 interleaved repetitions per variant, median ms/window; onnx_fp32 / onnx_int8_ort_dynamic are same-session references",
+      "onnx_fp32": {
+        "batch1_reps": [
+          4.5327999996516155,
+          2.535649999117595,
+          2.167549997466267
+        ],
+        "batch64_reps": [
+          1.9354515624740998,
+          2.4948054687854437,
+          1.9334703125082342
+        ],
+        "batch1_ms_per_window_median": 2.535649999117595,
+        "batch64_ms_per_window_median": 1.9354515624740998
+      },
+      "onnx_int8_ort_dynamic": {
+        "batch1_reps": [
+          5.698599999959697,
+          5.721350000385428,
+          4.805099997611251
+        ],
+        "batch64_reps": [
+          4.096601562508795,
+          4.857628124995017,
+          4.583800000006022
+        ],
+        "batch1_ms_per_window_median": 5.698599999959697,
+        "batch64_ms_per_window_median": 4.583800000006022
+      },
+      "entropy_all": {
+        "batch1_reps": [
+          6.444149999879301,
+          5.038299999796436,
+          5.713200000172947
+        ],
+        "batch64_reps": [
+          4.149468750028973,
+          3.437125000004926,
+          4.410960937491382
+        ],
+        "batch1_ms_per_window_median": 5.713200000172947,
+        "batch64_ms_per_window_median": 4.149468750028973
+      },
+      "entropy_conv": {
+        "batch1_reps": [
+          4.874750000453787,
+          5.169099998965976,
+          5.236699998931726
+        ],
+        "batch64_reps": [
+          3.010160156236452,
+          3.1175546875203963,
+          3.516850781238645
+        ],
+        "batch1_ms_per_window_median": 5.169099998965976,
+        "batch64_ms_per_window_median": 3.1175546875203963
+      },
+      "percentile_all": {
+        "batch1_reps": [
+          5.184749999898486,
+          5.2898499998264015,
+          5.916899999647285
+        ],
+        "batch64_reps": [
+          4.305105468745296,
+          4.460741406262514,
+          4.184502343747454
+        ],
+        "batch1_ms_per_window_median": 5.2898499998264015,
+        "batch64_ms_per_window_median": 4.305105468745296
+      },
+      "percentile_conv": {
+        "batch1_reps": [
+          4.916449999655015,
+          7.150899999032845,
+          5.284949998895172
+        ],
+        "batch64_reps": [
+          3.855813281262499,
+          4.688969531230214,
+          5.220103124997877
+        ],
+        "batch1_ms_per_window_median": 5.284949998895172,
+        "batch64_ms_per_window_median": 4.688969531230214
+      },
+      "minmax_all": {
+        "batch1_reps": [
+          6.463300000177696,
+          7.149449998905766,
+          5.3209000016067876
+        ],
+        "batch64_reps": [
+          3.9251343750095202,
+          4.033442187505898,
+          3.428199218745931
+        ],
+        "batch1_ms_per_window_median": 6.463300000177696,
+        "batch64_ms_per_window_median": 3.9251343750095202
+      },
+      "minmax_conv": {
+        "batch1_reps": [
+          5.9961499991914025,
+          5.236549999608542,
+          4.854399998293957
+        ],
+        "batch64_reps": [
+          4.368359375007458,
+          3.249617187492504,
+          3.0238906249735464
+        ],
+        "batch1_ms_per_window_median": 5.236549999608542,
+        "batch64_ms_per_window_median": 3.249617187492504
+      }
+    },
+    "accuracy_subset": {
+      "description": "seed-42 file-level 70/15/15 test split, corrupted windows excluded, seed-42 random subset (same as quantize_bench/eval_ort_accuracy)",
+      "subset_size": 10000
+    }
+  },
+  "tiny_variant": {
+    "env": {
+      "torch": "2.12.0+cpu",
+      "onnxruntime": "1.26.0",
+      "platform": "Windows-11-10.0.26200-SP0",
+      "num_threads": 16,
+      "checkpoint": "results\\tiny_best.pth",
+      "checkpoint_size_bytes": 340555,
+      "params": 56290,
+      "variant_config": {
+        "tcn": [
+          68,
+          56,
+          44,
+          32
+        ],
+        "conv": [
+          2,
+          4,
+          8,
+          16
+        ],
+        "attn_groups": 2,
+        "groups_mode": "depthwise",
+        "input_pw_groups": 4
+      }
+    },
+    "export": {
+      "mode": "dynamic-batch",
+      "exporter": "torchscript",
+      "opset": 17,
+      "file": "tiny_fp32_dynamic.onnx",
+      "size_bytes": 295279,
+      "size_mb": 0.295279,
+      "verified_batches": [
+        1,
+        2,
+        64
+      ],
+      "note": "AdaptiveAvgPool2d((15,1)) replaced at export by an exact mean(-1) + constant averaging matmul (final_width 16 is not a multiple of 15, which the TorchScript exporter rejects); exactness proven by the parity check vs the original torch model"
+    },
+    "parity": {
+      "fixture": "results/parity_fixture.npz input (batch 2, seed 42); reference output recomputed with the tiny torch model",
+      "max_abs_diff_vs_torch": 1.4901161193847656e-07,
+      "pass_lt_1e-4": true
+    },
+    "int8_static_percentile_conv": {
+      "file": "tiny_int8_static_percentile_conv.onnx",
+      "size_bytes": 248278,
+      "size_mb": 0.248278,
+      "calibration": {
+        "method": "percentile",
+        "percentile": 99.99,
+        "windows": 512,
+        "scope": "conv-only TRAIN-split corruption-free",
+        "seconds": 1.5347836017608643
+      },
+      "per_channel": true,
+      "activation_type": "QInt8",
+      "weight_type": "QInt8",
+      "max_abs_diff_vs_fp32_fixture": 0.018491357564926147
+    },
+    "latency": {
+      "note": "3 interleaved repetitions per variant, median ms/window; full-model sessions are same-session references",
+      "tiny_onnx_fp32": {
+        "batch1_reps": [
+          0.6312500008789357,
+          0.6834500018157996,
+          0.6595999984710943
+        ],
+        "batch64_reps": [
+          0.37747578119251557,
+          0.24196640623586063,
+          0.2314671875183194
+        ],
+        "batch1_ms_per_window_median": 0.6595999984710943,
+        "batch64_ms_per_window_median": 0.24196640623586063
+      },
+      "tiny_onnx_int8_static_percentile_conv": {
+        "batch1_reps": [
+          0.7988500001374632,
+          0.9382499993080273,
+          0.8451000030618161
+        ],
+        "batch64_reps": [
+          0.9211476562995813,
+          1.3045390625165965,
+          1.026230468767153
+        ],
+        "batch1_ms_per_window_median": 0.8451000030618161,
+        "batch64_ms_per_window_median": 1.026230468767153
+      },
+      "full_onnx_fp32_reference": {
+        "batch1_reps": [
+          2.267249998112675,
+          2.80170000041835,
+          2.132149998942623
+        ],
+        "batch64_reps": [
+          1.3050578124875756,
+          1.4244992187855132,
+          1.8014164062947202
+        ],
+        "batch1_ms_per_window_median": 2.267249998112675,
+        "batch64_ms_per_window_median": 1.4244992187855132
+      },
+      "full_onnx_int8_static_percentile_conv_reference": {
+        "batch1_reps": [
+          5.529599999135826,
+          4.768399998283712,
+          6.215800000063609
+        ],
+        "batch64_reps": [
+          3.815724218725336,
+          3.1025562500417436,
+          4.333318749957016
+        ],
+        "batch1_ms_per_window_median": 5.529599999135826,
+        "batch64_ms_per_window_median": 3.815724218725336
+      }
+    },
+    "accuracy_subset": {
+      "description": "seed-42 file-level 70/15/15 test split, corrupted windows excluded, seed-42 random subset (same as quantize_bench/eval_ort_accuracy/static_ptq_bench)",
+      "subset_size": 10000
+    },
+    "accuracy": {
+      "tiny_onnx_fp32": {
+        "samples": 10000,
+        "pck@20": 0.941106667804718,
+        "pck@50": 0.99369333152771,
+        "mpjpe": 0.012527281279861927,
+        "wall_seconds": 10.927234888076782
+      },
+      "tiny_onnx_int8_static_percentile_conv": {
+        "samples": 10000,
+        "pck@20": 0.9268133331298828,
+        "pck@50": 0.9932933319091797,
+        "mpjpe": 0.014906252065300942,
+        "wall_seconds": 12.320892333984375
+      }
+    }
+  }
+}
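The latency entries above store three interleaved repetitions per variant and report their median, and the earlier `windows_per_second` fields equal 1000 divided by the median ms/window. A minimal sketch of that reduction follows, using the `fp32` and `onnx_fp32` batch-1 repetitions from `latency_controlled_rerun`, rounded to three decimals. The helper name `summarize_reps` is hypothetical and not part of the benchmark scripts.

```python
import statistics


def summarize_reps(reps_ms):
    """Return (median ms/window, windows/second) for a list of repetitions."""
    median_ms = statistics.median(reps_ms)
    return median_ms, 1000.0 / median_ms


# Batch-1 repetitions copied from latency_controlled_rerun (rounded).
reps = {
    "fp32": [10.969, 12.646, 10.498],
    "onnx_fp32": [2.733, 3.200, 3.187],
}

for name, values in reps.items():
    median_ms, wps = summarize_reps(values)
    print(f"{name:10s} median={median_ms:.3f} ms/window  ~{wps:.0f} windows/s")
```

With only three repetitions the median is simply the middle value, which keeps one noisy run from skewing the reported figure.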
@@ -0,0 +1,3 @@
+{"variant": "half", "params": 843834, "tcn_channels": [270, 220, 170, 120], "conv_channels": [4, 8, 16, 32], "attn_groups": 4, "groups_mode": "gcd20", "input_pw_groups": 1, "tcn_groups_per_block": [[20, 10], [10, 20], [20, 10], [10, 20]], "conv_strides": [2, 2, 2, 1], "final_width": 15, "batch_size": 64, "max_epochs": 50, "patience": 5, "lr": 0.0001, "weight_decay": 5e-05, "seed": 42, "precision": "fp32", "epochs_run": 28, "best_epoch": 23, "best_val_mpjpe": 0.008576328293592842, "best_val_pck20": 0.9690593021534107, "train_seconds": 1346.4, "torch": "2.11.0+cu128", "error": null, "finished_utc": "2026-06-11T03:09:47Z", "checkpoint": "/home/ruvultra/wiflow-std-bench/sweep/half_best.pth", "test_full": {"samples": 54000, "mpjpe": 0.009419974447676428, "pck@10": 0.8740543655289544, "pck@20": 0.9610469643628156, "pck@30": 0.9813556064146537, "pck@40": 0.9896086878246731, "pck@50": 0.9934827546013726}, "test_clean": {"samples": 52560, "mpjpe": 0.008980081718602137, "pck@10": 0.8840944136840205, "pck@20": 0.9662253179869514, "pck@30": 0.9847971080282144, "pck@40": 0.9917795997050618, "pck@50": 0.9946956242600532}}
+{"variant": "quarter", "params": 338600, "tcn_channels": [135, 110, 85, 60], "conv_channels": [2, 4, 8, 16], "attn_groups": 2, "groups_mode": "gcd20", "input_pw_groups": 1, "tcn_groups_per_block": [[20, 5], [5, 10], [10, 5], [5, 20]], "conv_strides": [2, 2, 1, 1], "final_width": 15, "batch_size": 64, "max_epochs": 50, "patience": 5, "lr": 0.0001, "weight_decay": 5e-05, "seed": 42, "precision": "fp32", "epochs_run": 50, "best_epoch": 50, "best_val_mpjpe": 0.008780752391864856, "best_val_pck20": 0.9672531302240159, "train_seconds": 1754.4, "torch": "2.11.0+cu128", "error": null, "finished_utc": "2026-06-11T03:39:06Z", "checkpoint": "/home/ruvultra/wiflow-std-bench/sweep/quarter_best.pth", "test_full": {"samples": 54000, "mpjpe": 0.009705399298005634, "pck@10": 0.8646123917014511, "pck@20": 0.9553815319449813, "pck@30": 0.979827209190086, "pck@40": 0.9887037501511751, "pck@50": 0.9931309027671814}, "test_clean": {"samples": 52560, "mpjpe": 0.009279253277105465, "pck@10": 0.8742288637923323, "pck@20": 0.9605315079427745, "pck@30": 0.9833016723076865, "pck@40": 0.9908206971631566, "pck@50": 0.9942719799017071}}
+{"variant": "tiny", "params": 56290, "tcn_channels": [68, 56, 44, 32], "conv_channels": [2, 4, 8, 16], "attn_groups": 2, "groups_mode": "depthwise", "input_pw_groups": 4, "tcn_groups_per_block": [[540, 68], [68, 56], [56, 44], [44, 32]], "conv_strides": [2, 1, 1, 1], "final_width": 16, "batch_size": 64, "max_epochs": 50, "patience": 5, "lr": 0.0001, "weight_decay": 5e-05, "seed": 42, "precision": "fp32", "epochs_run": 50, "best_epoch": 47, "best_val_mpjpe": 0.012602971208592256, "best_val_pck20": 0.9397210340146666, "train_seconds": 1540.1, "torch": "2.11.0+cu128", "error": null, "finished_utc": "2026-06-11T04:04:50Z", "checkpoint": "/home/ruvultra/wiflow-std-bench/sweep/tiny_best.pth", "test_full": {"samples": 54000, "mpjpe": 0.012859782406853305, "pck@10": 0.7640358444319831, "pck@20": 0.9364815320968628, "pck@30": 0.9731568422317505, "pck@40": 0.9866444962642811, "pck@50": 0.992488939108672}, "test_clean": {"samples": 52560, "mpjpe": 0.012502924276904246, "pck@10": 0.770895526488985, "pck@20": 0.9411073559313967, "pck@30": 0.9764840687790962, "pck@40": 0.9886695077067278, "pck@50": 0.9936238432039409}}
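Each line of the sweep log is one JSON object, so it can be summarised line by line. The sketch below assumes the records keep the `variant`, `params`, `error`, and `test_clean` keys shown above. The inline sample abbreviates the three records and rounds the metrics to four decimals, and `load_results` is a hypothetical helper name.

```python
import io
import json

# Abbreviated copies of the three records above (metrics rounded to 4 d.p.).
SAMPLE = io.StringIO(
    '{"variant": "half", "params": 843834, "error": null, '
    '"test_clean": {"pck@20": 0.9662, "mpjpe": 0.0090}}\n'
    '{"variant": "quarter", "params": 338600, "error": null, '
    '"test_clean": {"pck@20": 0.9605, "mpjpe": 0.0093}}\n'
    '{"variant": "tiny", "params": 56290, "error": null, '
    '"test_clean": {"pck@20": 0.9411, "mpjpe": 0.0125}}\n'
)


def load_results(fh):
    """Parse one JSON object per non-empty line, skipping failed variants."""
    rows = []
    for line in fh:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("error") is None and "test_clean" in rec:
            rows.append(rec)
    return rows


rows = load_results(SAMPLE)
for rec in sorted(rows, key=lambda r: r["params"]):
    clean = rec["test_clean"]
    print(f"{rec['variant']:8s} params={rec['params']:>9,} "
          f"pck@20={clean['pck@20'] * 100:.2f}% mpjpe={clean['mpjpe']:.4f}")
```

Sorting by `params` puts the size/accuracy trade-off in one column order, from `tiny` up to `half`.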
@@ -0,0 +1,21 @@
+{
+  "checkpoint": "/home/ruvultra/wiflow-std-bench/upstream/test/best_pose_model.pth",
+  "test_full": {
+    "samples": 54000,
+    "mpjpe": 0.009834060806367133,
+    "pck@10": 0.8686346120127925,
+    "pck@20": 0.9608815324571398,
+    "pck@30": 0.9789111610695168,
+    "pck@40": 0.9857975759682832,
+    "pck@50": 0.9898827553325229
+  },
+  "test_clean": {
+    "samples": 52560,
+    "mpjpe": 0.009432755044379373,
+    "pck@10": 0.876996495807189,
+    "pck@20": 0.9661454100405608,
+    "pck@30": 0.9823453060205306,
+    "pck@40": 0.987909734176537,
+    "pck@50": 0.9911238361167036
+  }
+}
@@ -0,0 +1,32 @@
+{
+  "published": {
+    "pck@20": 0.9725,
+    "pck@30": 0.9863,
+    "pck@40": 0.9916,
+    "pck@50": 0.9948,
+    "mpjpe": 0.007
+  },
+  "params_millions": 2.225042,
+  "data_dir": "C:\\Users\\ruv\\.cache\\kagglehub\\datasets\\kaka2434\\wiflow-dataset\\versions\\1\\preprocessed_csi_data",
+  "device": "cpu",
+  "test_full": {
+    "samples": 54000,
+    "mpjpe": NaN,
+    "pck@10": 5.6790124349020145e-05,
+    "pck@20": 0.0007876543271596785,
+    "pck@30": 0.007780246982971827,
+    "pck@40": 0.05529259262923841,
+    "pck@50": 0.1542370371548114,
+    "wall_seconds": 118.03756999969482
+  },
+  "test_drop_last": {
+    "samples": 53952,
+    "mpjpe": NaN,
+    "pck@10": 5.6840649370682976e-05,
+    "pck@20": 0.0007883550872372227,
+    "pck@30": 0.007787168910892621,
+    "pck@40": 0.055318307667895535,
+    "pck@50": 0.15425316342412276,
+    "wall_seconds": 120.87458372116089
+  }
+}
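The bare `NaN` values for `mpjpe` above are what Python's `json` module writes for a float NaN by default. They are not valid under the JSON specification, so stricter parsers may reject this file. A sketch of how a consumer might normalise them to `null` follows; the helper name `nan_to_none` is hypothetical.

```python
import json
import math


def nan_to_none(obj):
    """Recursively replace non-finite floats with None (JSON null)."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: nan_to_none(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [nan_to_none(v) for v in obj]
    return obj


# Python's parser accepts the NaN literal that json.dumps emits by default.
raw = '{"test_full": {"samples": 54000, "mpjpe": NaN}}'
parsed = json.loads(raw)

# allow_nan=False makes dumps raise ValueError if any NaN/inf slipped through.
strict = json.dumps(nan_to_none(parsed), allow_nan=False)
print(strict)
```

Passing `allow_nan=False` at write time is the stricter alternative: the benchmark would then fail loudly instead of recording a non-standard token.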
@@ -0,0 +1,333 @@
+"""ADR-152 edge optimization follow-up: ONNX Runtime STATIC post-training
+quantization (calibration-based QDQ) of the retrained WiFlow-STD model, to
+improve on the dynamic-int8 result (2.44 MB, PCK@20 96.52%, 6.5 ms/win b1).
+
+Static PTQ pre-computes activation ranges from calibration data, so inference
+uses QLinearConv/QDQ kernels instead of dynamic ConvInteger -- typically both
+faster and (with good calibration) closer to fp32 accuracy.
+
+Method:
+  - Calibration set: corruption-free windows drawn ONLY from the seed-42
+    file-level TRAINING split (same split as eval_repro.py; corrupted windows
+    excluded via results/nan_windows_mask.npy | big_windows_mask.npy), chosen
+    with np.random.default_rng(42). Never test windows.
+  - quantize_static, QuantFormat.QDQ, per-channel int8 weights, int8
+    activations; calibration methods MinMax / Entropy / Percentile(99.99);
+    scopes "all" (ORT default op set) vs "conv" (op_types_to_quantize=
+    ["Conv"] -- leaves the attention path, which exports as Einsum/Softmax
+    and elementwise ops, in fp32).
+  - Model is pre-processed first (quant_pre_process: symbolic shape
+    inference + ORT graph optimization, folds BatchNormalization into Conv).
+  - Accuracy: identical protocol to eval_ort_accuracy.py -- the 10,000-window
+    seed-42 subset of the corruption-free test split (PCK@20/50, MPJPE).
+  - Latency: median ms/window at batch 1 (100 runs) and batch 64 (30 runs),
+    3 interleaved repetitions across all variants (fp32 and dynamic-int8
+    sessions included as same-session reference points).
+
+Usage:
+  PYTHONUTF8=1 .venv/Scripts/python.exe static_ptq_bench.py \
+      [--data-dir <preprocessed_csi_data>] [--subset 10000] \
+      [--calib-minmax 1000] [--calib-hist 512] [--skip-accuracy]
+
+Writes/merges into results/edge_optimization.json under key "onnx_static_ptq".
+"""
+
+import argparse
+import collections
+import json
+import os
+import platform
+import statistics
+import sys
+import time
+
+import numpy as np
+import torch
+
+HERE = os.path.dirname(os.path.abspath(__file__))
+sys.path.insert(0, HERE)
+
+from _bench_common import RESULTS  # noqa: E402
+# quantize_bench sets up upstream imports + the np.load mmap patch
+# (both via _bench_common.import_upstream)
+from quantize_bench import build_test_subset  # noqa: E402
+import quantize_bench as qb  # noqa: E402
+from eval_ort_accuracy import evaluate_ort  # noqa: E402
+
+FP32_ONNX = os.path.join(RESULTS, "retrained_fp32_dynamic.onnx")
+DYN_INT8_ONNX = os.path.join(RESULTS, "retrained_int8_ort_dynamic.onnx")
+PREPROC_ONNX = os.path.join(RESULTS, "retrained_fp32_preproc.onnx")
+
+
+# ---------------------------------------------------------------------------
+# calibration data: corruption-free TRAINING-split windows only
+# ---------------------------------------------------------------------------
+
+def build_calibration_windows(data_dir, n_windows):
+    """Seed-42 file-level 70/15/15 TRAIN split (exactly as eval_repro.py),
+    minus corrupted windows, then a seed-42 random draw of n_windows."""
+    dataset = qb.PreprocessedCSIKeypointsDataset(
+        data_dir=data_dir, keypoint_scale=1000.0, enable_temporal_clean=True)
+    train_loader, _va, _te = qb.create_preprocessed_train_val_test_loaders(
+        dataset=dataset, batch_size=64, num_workers=0, random_seed=42)
+    train_indices = np.asarray(train_loader.dataset.indices)
+
+    corrupted = (np.load(os.path.join(RESULTS, "nan_windows_mask.npy"))
+                 | np.load(os.path.join(RESULTS, "big_windows_mask.npy")))
+    clean = train_indices[~corrupted[train_indices]]
+    print(f"train split: {len(train_indices)} windows, "
+          f"{len(train_indices) - len(clean)} corrupted excluded, "
+          f"{len(clean)} clean")
+
+    rng = np.random.default_rng(42)
+    sel = np.sort(rng.choice(clean, size=n_windows, replace=False))
+    xs = np.stack([dataset[int(i)][0].numpy() for i in sel]).astype(np.float32)
+    print(f"calibration tensor: {xs.shape} from {n_windows} clean TRAIN windows")
+    return xs
+
+
+def make_reader(windows, batch_size=64):
+    from onnxruntime.quantization import CalibrationDataReader
+
+    class WindowReader(CalibrationDataReader):
+        def __init__(self):
+            self._batches = [windows[i:i + batch_size]
+                             for i in range(0, len(windows), batch_size)]
+            self._it = iter(self._batches)
+
+        def get_next(self):
+            b = next(self._it, None)
+            return None if b is None else {"input": b}
+
+        def rewind(self):
+            self._it = iter(self._batches)
+
+        def __len__(self):
+            return len(self._batches)
+
+    return WindowReader()
+
+
+# ---------------------------------------------------------------------------
+# quantization variants
+# ---------------------------------------------------------------------------
+
+def preprocess_model():
+    from onnxruntime.quantization.shape_inference import quant_pre_process
+    quant_pre_process(FP32_ONNX, PREPROC_ONNX)
+    return PREPROC_ONNX
+
+
+def quantize_variant(src, dst, method, scope, calib_windows):
+    from onnxruntime.quantization import (CalibrationMethod, QuantFormat,
+                                          QuantType, quantize_static)
+    methods = {
+        "minmax": CalibrationMethod.MinMax,
+        "entropy": CalibrationMethod.Entropy,
+        "percentile": CalibrationMethod.Percentile,
+    }
+    # NB: do NOT pass CalibMaxIntermediateOutputs -- in ORT 1.26 the MinMax
+    # calibrater clears its buffer every N batches and then raises
+    # "No data is collected" if the batch count is divisible by N.
+    extra = {}
+    if method == "percentile":
+        extra["CalibPercentile"] = 99.99
+    op_types = ["Conv"] if scope == "conv" else None
+
+    t0 = time.time()
+    quantize_static(
+        src, dst, make_reader(calib_windows),
+        quant_format=QuantFormat.QDQ,
+        op_types_to_quantize=op_types,
+        per_channel=True,
+        activation_type=QuantType.QInt8,
+        weight_type=QuantType.QInt8,
+        calibrate_method=methods[method],
+        extra_options=extra,
+    )
+    secs = time.time() - t0
+
+    import onnx
+    ops = collections.Counter(n.op_type for n in onnx.load(dst).graph.node)
+    return {
+        "file": os.path.basename(dst),
+        "size_bytes": os.path.getsize(dst),
+        "size_mb": os.path.getsize(dst) / 1e6,
+        "calibration": {"method": method,
+                        "windows": int(len(calib_windows)),
+                        "percentile": extra.get("CalibPercentile"),
+                        "seconds": secs},
+        "scope": scope,
+        "per_channel": True,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+        "node_counts": {k: v for k, v in sorted(ops.items())},
+    }
+
+
+# ---------------------------------------------------------------------------
+# latency (3 interleaved reps, like the latency_controlled_rerun)
+# ---------------------------------------------------------------------------
+
+def ort_session(path):
+    import onnxruntime as ort
+    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
+
+
+def bench_ort(sess, batch, n_runs):
+    rng = np.random.default_rng(123)
+    x = rng.random((batch, 540, 20), dtype=np.float32)
+    inp = sess.get_inputs()[0].name
+    for _ in range(max(5, n_runs // 10)):
+        sess.run(None, {inp: x})
+    times = []
+    for _ in range(n_runs):
+        t0 = time.perf_counter()
+        sess.run(None, {inp: x})
+        times.append(time.perf_counter() - t0)
+    return statistics.median(times) * 1e3 / batch  # ms/window
+
+
+def interleaved_latency(sessions, reps=3, runs_b1=100, runs_b64=30):
+    lat = {name: {"batch1_reps": [], "batch64_reps": []} for name in sessions}
+    for rep in range(reps):
+        for name, sess in sessions.items():
+            lat[name]["batch1_reps"].append(bench_ort(sess, 1, runs_b1))
+            lat[name]["batch64_reps"].append(bench_ort(sess, 64, runs_b64))
+            print(f"  rep {rep + 1}/{reps} {name}: "
+                  f"b1={lat[name]['batch1_reps'][-1]:.2f} "
+                  f"b64={lat[name]['batch64_reps'][-1]:.3f} ms/win", flush=True)
+    for name in lat:
+        lat[name]["batch1_ms_per_window_median"] = statistics.median(
+            lat[name]["batch1_reps"])
+        lat[name]["batch64_ms_per_window_median"] = statistics.median(
+            lat[name]["batch64_reps"])
+    return lat
+
+
+# ---------------------------------------------------------------------------
+
+def main():
+    import onnxruntime
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data-dir", default=os.path.join(
+        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
+        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
+    parser.add_argument("--subset", type=int, default=10000)
+    parser.add_argument("--calib-minmax", type=int, default=1000)
+    parser.add_argument("--calib-hist", type=int, default=512,
+                        help="calibration windows for Entropy/Percentile "
+                             "(histogram calibraters hold all intermediate "
+                             "activations in RAM)")
+    parser.add_argument("--skip-accuracy", action="store_true")
+    parser.add_argument("--methods", default="minmax,entropy,percentile",
+                        help="comma list of calibration methods to (re)run; "
+                             "results merge into existing onnx_static_ptq")
+    parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
+    args = parser.parse_args()
+
+    results = {
+        "env": {
+            "onnxruntime": onnxruntime.__version__,
+            "torch": torch.__version__,
+            "platform": platform.platform(),
+            "source_model": os.path.basename(FP32_ONNX),
+        },
+        "variants": {},
+    }
+
+    # ---- calibration data (TRAIN split only) -------------------------------
+    calib_mm = build_calibration_windows(args.data_dir, args.calib_minmax)
+    calib_hist = calib_mm[:args.calib_hist]
+
+    # ---- preprocess + quantize ---------------------------------------------
+    print("\n=== quant_pre_process (shape inference + graph optimization) ===")
+    src = preprocess_model()
+    results["env"]["preprocessed_model"] = {
+        "file": os.path.basename(src),
+        "size_mb": os.path.getsize(src) / 1e6,
+    }
+
+    matrix = [(m, s) for m in args.methods.split(",")
+              for s in ("all", "conv")]
+    for method, scope in matrix:
+        name = f"{method}_{scope}"
+        dst = os.path.join(RESULTS, f"retrained_int8_static_{name}.onnx")
+        calib = calib_mm if method == "minmax" else calib_hist
+        print(f"\n=== quantize_static: {name} "
+              f"({len(calib)} calib windows) ===", flush=True)
+        try:
+            results["variants"][name] = quantize_variant(
+                src, dst, method, scope, calib)
+            print(f"  {results['variants'][name]['size_mb']:.3f} MB")
+        except Exception as e:  # noqa: BLE001
+            results["variants"][name] = {"error": f"{type(e).__name__}: {e}"}
+            print(f"  FAILED: {e}")
+
+    # ---- fixture parity (sanity, batch 2) ----------------------------------
+    fixture = np.load(os.path.join(RESULTS, "parity_fixture.npz"))
+    fx, fy = fixture["input"], fixture["output"]
+    sessions = {}
+    for name, info in results["variants"].items():
+        if "error" in info:
+            continue
+        path = os.path.join(RESULTS, info["file"])
+        try:
+            sess = ort_session(path)
+            yq = sess.run(None, {sess.get_inputs()[0].name: fx})[0]
+            info["max_abs_diff_vs_fp32_fixture"] = float(np.abs(yq - fy).max())
+            sessions[name] = sess
+        except Exception as e:  # noqa: BLE001
+            info["run_error"] = f"{type(e).__name__}: {e}"
+    print("\nfixture max-abs-diff vs fp32:",
+          {n: round(results["variants"][n].get("max_abs_diff_vs_fp32_fixture",
+                                               float("nan")), 5)
+           for n in results["variants"]})
+
+    # ---- latency: 3 interleaved reps incl. fp32 + dynamic-int8 reference ----
+    print("\n=== latency (3 interleaved reps) ===")
+    lat_sessions = {"onnx_fp32": ort_session(FP32_ONNX),
+                    "onnx_int8_ort_dynamic": ort_session(DYN_INT8_ONNX)}
+    lat_sessions.update(sessions)
+    results["latency"] = {
+        "note": "3 interleaved repetitions per variant, median ms/window; "
+                "onnx_fp32 / onnx_int8_ort_dynamic are same-session references",
+        **interleaved_latency(lat_sessions),
+    }
+
+    # ---- accuracy on the standard 10k corruption-free test subset ----------
+    if not args.skip_accuracy:
+        loader, n_clean = build_test_subset(args.data_dir, args.subset)
+        results["accuracy_subset"] = {
+            "description": "seed-42 file-level 70/15/15 test split, corrupted "
+                           "windows excluded, seed-42 random subset (same as "
+                           "quantize_bench/eval_ort_accuracy)",
+            "subset_size": min(args.subset, n_clean) if args.subset else n_clean,
+        }
+        for name, sess in sessions.items():
+            print(f"\n=== accuracy: {name} ===")
+            results["variants"][name]["accuracy"] = evaluate_ort(
+                sess, loader, name)
+            print(json.dumps(results["variants"][name]["accuracy"], indent=2))
+
+    # ---- merge into edge_optimization.json ----------------------------------
+    merged = {}
+    if os.path.exists(args.out):
+        with open(args.out) as f:
+            merged = json.load(f)
+    prev = merged.get("onnx_static_ptq")
+    if prev:  # nested merge so partial --methods reruns don't clobber
+        prev["env"] = results["env"]
+        prev["variants"].update(results["variants"])
+        prev.setdefault("latency", {}).update(results["latency"])
+        if "accuracy_subset" in results:
+            prev["accuracy_subset"] = results["accuracy_subset"]
+    else:
+        merged["onnx_static_ptq"] = results
+    with open(args.out, "w") as f:
+        json.dump(merged, f, indent=2)
+    print(f"\nwrote {args.out}")
+
+
+if __name__ == "__main__":
+    main()
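The interleaved latency protocol in the file above (every repetition visits each variant once before the next repetition starts, and the per-variant median over repetitions is reported) can be sketched without ONNX Runtime. This is a minimal illustration under stated assumptions, not the benchmark itself: `time_once_ms`, `interleaved_median_ms`, and the dummy workloads are hypothetical names, and plain callables stand in for `InferenceSession.run`.

```python
import statistics
import time


def time_once_ms(fn, n_runs):
    """Median wall time of fn() over n_runs calls, in milliseconds."""
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times) * 1e3


def interleaved_median_ms(workloads, reps=3, n_runs=20):
    """Visit every workload once per repetition, so slow drift in machine
    state (thermal throttling, background load) is spread across all
    variants instead of penalising whichever one happens to run last."""
    per_rep = {name: [] for name in workloads}
    for _ in range(reps):
        for name, fn in workloads.items():
            per_rep[name].append(time_once_ms(fn, n_runs))
    return {name: {"reps": vals, "median_ms": statistics.median(vals)}
            for name, vals in per_rep.items()}


# Hypothetical stand-ins for two model sessions (assumption, not real models).
workloads = {
    "small": lambda: sum(range(1_000)),
    "large": lambda: sum(range(50_000)),
}
report = interleaved_median_ms(workloads)
```

Running all repetitions of one variant back to back would be simpler, but it would let a mid-run change in machine load bias a single variant, which is presumably why the script interleaves.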
@@ -0,0 +1,313 @@
+"""ADR-152 efficiency-sweep follow-up: edge pipeline for the TINY compact
+WiFlow-STD variant (56,290 params, results/tiny_best.pth, trained overnight
+2026-06-10/11 -- see RESULTS.md "Efficiency sweep").
+
+Headline question: what does the smallest deployable WiFlow-class model look
+like (KB + ms + PCK)? Reuses the onnx_bench.py / static_ptq_bench.py
+machinery on the tiny checkpoint:
+
+  1. Load tiny_best.pth with remote/sweep/model_compact.py
+     (depthwise TCN groups, input_pw_groups=4, conv [2,4,8,16], attn groups 2).
+  2. Export ONNX: dynamic batch, opset 17, TorchScript exporter (dynamo=False)
+     -- same recipe that worked for the full model; verified at batch 1/2/64.
+     One forced deviation: tiny's stride schedule [2,1,1,1] leaves final_width
+     16, and the TorchScript exporter cannot export AdaptiveAvgPool2d((15,1))
+     when 15 is not a factor of the input height (the full model never hit
+     this -- its width was exactly 15). The adaptive pool over a fixed-size
+     feature map is a fixed linear map, so the export wrapper replaces it with
+     an exact matmul equivalent (PyTorch adaptive-pool bin semantics:
+     bin i averages rows floor(i*H/K)..ceil((i+1)*H/K)); the W axis (20->1,
+     a factor) becomes mean(-1). Exactness is proven by the parity check
+     below, which compares against the ORIGINAL torch model with the real
+     AdaptiveAvgPool2d.
+  3. Torch-vs-ORT parity on the stored fixture input
+     (results/parity_fixture.npz, batch 2, seed 42 -- same 540x20 input layout;
+     reference output recomputed with the tiny torch model). PASS < 1e-4.
+  4. Static QDQ conv-only int8 (quant_pre_process + quantize_static,
+     per-channel QInt8 weights+activations, Percentile(99.99) calibration on
+     512 corruption-free TRAIN-split windows -- the winning recipe and
+     calibration count from static_ptq_bench.py. 512, not "about 500":
+     ORT 1.26's histogram collector np.asarray()'s the per-batch maxima, so
+     the calibration count must be a multiple of the batch size 64 or the
+     ragged last batch crashes it).
+  5. Disk size + CPU latency b1/b64 (3 interleaved reps, median ms/window)
+     for tiny fp32 + tiny int8, with the full-model ONNX fp32 + static-int8
+     sessions interleaved as same-session references.
+  6. Accuracy (PCK@20/50 + MPJPE) on the identical 10k-window seed-42
+     corruption-free test subset for tiny fp32 + tiny int8.
+
+Usage:
+  PYTHONUTF8=1 .venv/Scripts/python.exe tiny_edge_bench.py \
+      [--data-dir <preprocessed_csi_data>] [--subset 10000] [--calib 512]
+  (--calib must be a multiple of 64; see step 4 above)
+
+Writes/merges into results/edge_optimization.json under key "tiny_variant".
+"""
+
+import argparse
+import json
+import os
+import platform
+import sys
+import time
+
+import numpy as np
+import torch
+
+HERE = os.path.dirname(os.path.abspath(__file__))
+RESULTS = os.path.join(HERE, "results")
+sys.path.insert(0, HERE)
+sys.path.insert(0, os.path.join(HERE, "remote", "sweep"))
+
+# quantize_bench sets up upstream imports + the np.load mmap patch
+from quantize_bench import build_test_subset  # noqa: E402
+from eval_ort_accuracy import evaluate_ort  # noqa: E402
+from static_ptq_bench import (  # noqa: E402
+    build_calibration_windows,
+    interleaved_latency,
+    make_reader,
+    ort_session,
+)
+from model_compact import CompactWiFlowPoseModel, describe  # noqa: E402
+
+TINY_CKPT = os.path.join(RESULTS, "tiny_best.pth")
+TINY_FP32_ONNX = os.path.join(RESULTS, "tiny_fp32_dynamic.onnx")
+TINY_PREPROC_ONNX = os.path.join(RESULTS, "tiny_fp32_preproc.onnx")
+TINY_INT8_ONNX = os.path.join(RESULTS, "tiny_int8_static_percentile_conv.onnx")
+FULL_FP32_ONNX = os.path.join(RESULTS, "retrained_fp32_dynamic.onnx")
+FULL_INT8_ONNX = os.path.join(RESULTS, "retrained_int8_static_percentile_conv.onnx")
+
+# Exact tiny config from remote/sweep/run_sweep.py VARIANTS (measured 56,290
+# params, clean-test PCK@20 94.11% -- results/efficiency_sweep.jsonl).
+TINY = dict(tcn=[68, 56, 44, 32], conv=[2, 4, 8, 16], attn_groups=2,
+            groups_mode="depthwise", input_pw_groups=4)
+
+
+def load_tiny_model():
+    model = CompactWiFlowPoseModel(
+        tcn_channels=TINY["tcn"], conv_channels=TINY["conv"],
+        attn_groups=TINY["attn_groups"], groups_mode=TINY["groups_mode"],
+        input_pw_groups=TINY["input_pw_groups"], dropout=0.5)
+    state = torch.load(TINY_CKPT, map_location="cpu", weights_only=True)
+    model.load_state_dict(state, strict=True)
+    model.eval()
+    return model
+
+
+def adaptive_pool_matrix(h_in, h_out):
+    """Exact AdaptiveAvgPool1d as a (h_out, h_in) averaging matrix, using
+    PyTorch's bin rule: bin i covers rows floor(i*h_in/h_out) up to, but
+    not including, ceil((i+1)*h_in/h_out)."""
+    w = torch.zeros(h_out, h_in)
+    for i in range(h_out):
+        s = (i * h_in) // h_out
+        e = -((-(i + 1) * h_in) // h_out)  # ceil division
+        w[i, s:e] = 1.0 / (e - s)
+    return w
+
+
+class ExportWrapper(torch.nn.Module):
+    """CompactWiFlowPoseModel forward with the AdaptiveAvgPool2d((K,1))
+    replaced by an exact fixed linear map (mean over the factor W axis, then
+    a constant averaging matmul over the non-factor H axis) so the
+    TorchScript ONNX exporter accepts it. Numerically equivalent up to
+    float round-off; confirmed by the parity check against the original model."""
+
+    def __init__(self, m, num_keypoints=15):
+        super().__init__()
+        self.m = m
+        self.register_buffer(
+            "pool_w_t", adaptive_pool_matrix(m.final_width, num_keypoints).t())
+
+    def forward(self, x):
+        m = self.m
+        x = m.tcn(x)
+        x = x.transpose(1, 2).unsqueeze(1)
+        x = m.up(x)
+        for block in m.residual_blocks:
+            x = block(x)
+        x = x.permute(0, 1, 3, 2)
+        x = m.attention(x)
+        x = m.decoder(x)                  # [B, 2, H=final_width, T=20]
+        x = x.mean(-1)                    # W-axis pool (20 -> 1, a factor)
+        x = x.matmul(self.pool_w_t)       # exact adaptive H pool: [B, 2, K]
+        return x.transpose(1, 2)          # [B, K, 2]
+
+
+def export_onnx(model):
+    """Dynamic-batch TorchScript export (the recipe that worked for the full
+    model in onnx_bench.py), verified at batch 1/2/64. Uses ExportWrapper
+    (see docstring) because final_width 16 is not a multiple of 15."""
+    wrapper = ExportWrapper(model).eval()
+    x = torch.rand(2, 540, 20)
+    with torch.no_grad():
+        torch.onnx.export(
+            wrapper, (x,), TINY_FP32_ONNX, opset_version=17,
+            input_names=["input"], output_names=["output"], dynamo=False,
+            dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}})
+    sess = ort_session(TINY_FP32_ONNX)
+    inp = sess.get_inputs()[0].name
+    for b in (1, 2, 64):
+        y = sess.run(None, {inp: np.zeros((b, 540, 20), dtype=np.float32)})[0]
+        assert y.shape == (b, 15, 2), y.shape
+    return {
+        "mode": "dynamic-batch", "exporter": "torchscript", "opset": 17,
+        "file": os.path.basename(TINY_FP32_ONNX),
+        "size_bytes": os.path.getsize(TINY_FP32_ONNX),
+        "size_mb": os.path.getsize(TINY_FP32_ONNX) / 1e6,
+        "verified_batches": [1, 2, 64],
+        "note": "AdaptiveAvgPool2d((15,1)) replaced at export by an exact "
+                "mean(-1) + constant averaging matmul (final_width 16 is not "
+                "a multiple of 15, which the TorchScript exporter rejects); "
+                "exactness proven by the parity check vs the original torch "
+                "model",
+    }
+
+
+def quantize_tiny(calib_windows):
+    """quant_pre_process + static QDQ conv-only Percentile(99.99) int8 --
+    the winning recipe from static_ptq_bench.py."""
+    from onnxruntime.quantization import (CalibrationMethod, QuantFormat,
+                                          QuantType, quantize_static)
+    from onnxruntime.quantization.shape_inference import quant_pre_process
+
+    quant_pre_process(TINY_FP32_ONNX, TINY_PREPROC_ONNX)
+    t0 = time.time()
+    quantize_static(
+        TINY_PREPROC_ONNX, TINY_INT8_ONNX, make_reader(calib_windows),
+        quant_format=QuantFormat.QDQ,
+        op_types_to_quantize=["Conv"],
+        per_channel=True,
+        activation_type=QuantType.QInt8,
+        weight_type=QuantType.QInt8,
+        calibrate_method=CalibrationMethod.Percentile,
+        extra_options={"CalibPercentile": 99.99},
+    )
+    return {
+        "file": os.path.basename(TINY_INT8_ONNX),
+        "size_bytes": os.path.getsize(TINY_INT8_ONNX),
+        "size_mb": os.path.getsize(TINY_INT8_ONNX) / 1e6,
+        "calibration": {"method": "percentile", "percentile": 99.99,
+                        "windows": int(len(calib_windows)),
+                        "scope": "conv-only TRAIN-split corruption-free",
+                        "seconds": time.time() - t0},
+        "per_channel": True,
+        "activation_type": "QInt8",
+        "weight_type": "QInt8",
+    }
+
+
+def main():
+    import onnxruntime
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data-dir", default=os.path.join(
+        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
+        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
+    parser.add_argument("--subset", type=int, default=10000)
+    parser.add_argument("--calib", type=int, default=512,
+                        help="calibration windows; must be a multiple of the "
+                             "64-window calibration batch (ORT histogram "
+                             "collector rejects ragged batches)")
+    parser.add_argument("--skip-accuracy", action="store_true")
+    parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
+    args = parser.parse_args()
+
+    if args.calib % 64 != 0:
+        parser.error(
+            f"--calib must be a multiple of 64 (got {args.calib}): ORT 1.26's "
+            f"histogram calibration collector np.asarray()'s the per-batch "
+            f"maxima and crashes on a ragged final batch (calibration batch "
+            f"size is 64)")
+
+    model = load_tiny_model()
+    info = describe(model)
+    print(f"tiny model: {info['params']:,} params, tcn_groups={info['tcn_groups_per_block']}, "
+          f"strides={info['conv_strides']}, final_width={info['final_width']}")
+    assert info["params"] == 56290, info["params"]
+
+    results = {
+        "env": {
+            "torch": torch.__version__,
+            "onnxruntime": onnxruntime.__version__,
+            "platform": platform.platform(),
+            "num_threads": torch.get_num_threads(),
+            "checkpoint": os.path.relpath(TINY_CKPT, HERE),
+            "checkpoint_size_bytes": os.path.getsize(TINY_CKPT),
+            "params": info["params"],
+            "variant_config": TINY,
+        },
+    }
+
+    # ---- export + parity ----------------------------------------------------
+    print("\n=== ONNX export (dynamic batch, opset 17, torchscript) ===")
+    results["export"] = export_onnx(model)
+    print(f"  {results['export']['size_mb']:.3f} MB, batches {results['export']['verified_batches']} OK")
+
+    fixture = np.load(os.path.join(RESULTS, "parity_fixture.npz"))
+    fx = fixture["input"]  # (2, 540, 20), seed 42 -- same input layout as full model
+    sess_fp32 = ort_session(TINY_FP32_ONNX)
+    y_ort = sess_fp32.run(None, {sess_fp32.get_inputs()[0].name: fx})[0]
+    with torch.no_grad():
+        y_torch = model(torch.from_numpy(fx)).numpy()
+    results["parity"] = {
+        "fixture": "results/parity_fixture.npz input (batch 2, seed 42); "
+                   "reference output recomputed with the tiny torch model",
+        "max_abs_diff_vs_torch": float(np.abs(y_ort - y_torch).max()),
+        "pass_lt_1e-4": bool(np.abs(y_ort - y_torch).max() < 1e-4),
+    }
+    print("parity:", json.dumps(results["parity"], indent=2))
+    assert results["parity"]["pass_lt_1e-4"], "torch-vs-ORT parity FAILED"
+
+    # ---- static PTQ int8 ------------------------------------------------------
+    print(f"\n=== static QDQ int8 (Percentile conv-only, {args.calib} calib windows) ===")
+    calib = build_calibration_windows(args.data_dir, args.calib)
+    results["int8_static_percentile_conv"] = quantize_tiny(calib)
+    print(f"  {results['int8_static_percentile_conv']['size_mb']:.3f} MB")
+    sess_int8 = ort_session(TINY_INT8_ONNX)
+    yq = sess_int8.run(None, {sess_int8.get_inputs()[0].name: fx})[0]
+    results["int8_static_percentile_conv"]["max_abs_diff_vs_fp32_fixture"] = float(
+        np.abs(yq - y_torch).max())
+
+    # ---- latency (3 interleaved reps, full-model sessions as references) -----
+    print("\n=== latency (3 interleaved reps) ===")
+    lat_sessions = {
+        "tiny_onnx_fp32": sess_fp32,
+        "tiny_onnx_int8_static_percentile_conv": sess_int8,
+        "full_onnx_fp32_reference": ort_session(FULL_FP32_ONNX),
+        "full_onnx_int8_static_percentile_conv_reference": ort_session(FULL_INT8_ONNX),
+    }
+    results["latency"] = {
+        "note": "3 interleaved repetitions per variant, median ms/window; "
+                "full-model sessions are same-session references",
+        **interleaved_latency(lat_sessions),
+    }
+
+    # ---- accuracy on the standard 10k corruption-free test subset ------------
+    if not args.skip_accuracy:
+        loader, n_clean = build_test_subset(args.data_dir, args.subset)
+        results["accuracy_subset"] = {
+            "description": "seed-42 file-level 70/15/15 test split, corrupted "
+                           "windows excluded, seed-42 random subset (same as "
+                           "quantize_bench/eval_ort_accuracy/static_ptq_bench)",
+            "subset_size": min(args.subset, n_clean) if args.subset else n_clean,
+        }
+        results["accuracy"] = {}
+        for name, sess in (("tiny_onnx_fp32", sess_fp32),
+                           ("tiny_onnx_int8_static_percentile_conv", sess_int8)):
+            print(f"\n=== accuracy: {name} ===")
+            results["accuracy"][name] = evaluate_ort(sess, loader, name)
+            print(json.dumps(results["accuracy"][name], indent=2))
+
+    # ---- merge into edge_optimization.json -----------------------------------
+    merged = {}
+    if os.path.exists(args.out):
+        with open(args.out) as f:
+            merged = json.load(f)
+    merged["tiny_variant"] = results
+    with open(args.out, "w") as f:
+        json.dump(merged, f, indent=2)
+    print(f"\nwrote {args.out}")
+
+
+if __name__ == "__main__":
+    main()
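The adaptive-pool replacement described in the file above rests on one claim: averaging over a fixed-size axis is a constant linear map. The sketch below checks that claim in plain Python, assuming the same bin rule as `adaptive_pool_matrix` (bin i covers the half-open row range from floor(i*h_in/h_out) to ceil((i+1)*h_in/h_out)). `apply_matrix` and `direct_adaptive_avg` are hypothetical helpers introduced only for this illustration, and nested lists stand in for torch tensors.

```python
def adaptive_pool_matrix(h_in, h_out):
    """Rows of an (h_out x h_in) averaging matrix. Bin i covers the
    half-open row range [floor(i*h_in/h_out), ceil((i+1)*h_in/h_out))."""
    rows = []
    for i in range(h_out):
        start = (i * h_in) // h_out
        end = -((-(i + 1) * h_in) // h_out)  # ceil division
        width = end - start
        rows.append([1.0 / width if start <= j < end else 0.0
                     for j in range(h_in)])
    return rows


def apply_matrix(matrix, column):
    """Matrix-vector product on plain lists."""
    return [sum(w * v for w, v in zip(row, column)) for row in matrix]


def direct_adaptive_avg(column, h_out):
    """Reference: average each bin directly, without building a matrix."""
    h_in = len(column)
    out = []
    for i in range(h_out):
        start = (i * h_in) // h_out
        end = -((-(i + 1) * h_in) // h_out)
        out.append(sum(column[start:end]) / (end - start))
    return out


column = [float(v * v) for v in range(16)]  # arbitrary 16-row input
pool = adaptive_pool_matrix(16, 15)
pooled = apply_matrix(pool, column)
reference = direct_adaptive_avg(column, 15)
```

For the 16 to 15 case each bin averages two adjacent rows and neighbouring bins overlap by one row, so the map cannot be written as a strided average pool; a constant matrix expresses it directly.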
@@ -14,6 +14,13 @@ COPY v2/crates/ ./crates/
 # Copy vendored RuVector crates
 COPY vendor/ruvector/ /build/vendor/ruvector/

+# Copy vendored RuField submodule — the `wifi-densepose-rufield` bridge crate
+# (ADR-262) path-deps `../../../vendor/rufield/crates/*`, which from the Docker
+# build layout (v2/ collapsed into /build) resolves to /vendor/rufield. Copy the
+# whole tree so the rufield workspace Cargo.toml (workspace-dep inheritance) and
+# the four bridged crates (rufield-core/-provenance/-privacy/-fusion) all resolve.
+COPY vendor/rufield/ /vendor/rufield/
+
 # Build release binaries:
 #   - sensing-server with `mqtt` feature so the HA-DISCO MQTT publisher
 #     (ADR-115) is wired in (auto-discovery topics flow to Home Assistant)
@@ -24,10 +24,13 @@ services:
    environment:
      - RUST_LOG=info
      # CSI_SOURCE controls the data source for the sensing server.
-      # Options: auto (default) — probe for ESP32 UDP then fall back to simulation
+      # Options: auto (default) — probe for ESP32 UDP then host WiFi; **fail
+      #                           hard with exit 78 if neither is detected**.
+      #                           Synthetic data is no longer a silent fallback
+      #                           (issue #937 fix) — operators must opt in.
      #          esp32          — receive real CSI frames from an ESP32 on UDP port 5005
      #          wifi           — use host Wi-Fi RSSI/scan data (Windows netsh)
-      #          simulated      — generate synthetic CSI data (no hardware required)
+      #          simulated      — explicitly generate synthetic CSI for demo mode
      - CSI_SOURCE=${CSI_SOURCE:-auto}
      # MODELS_DIR controls where the server scans for .rvf model files.
      # Mount a host directory and set this to make models visible:
@@ -11,10 +11,65 @@
 #      docker run ruvnet/wifi-densepose:latest --model /app/models/my.rvf
 #
 # Environment variables:
-#   CSI_SOURCE   — data source: auto (default), esp32, wifi, simulated
+#   CSI_SOURCE   — data source. Valid values:
+#                    auto       — try ESP32 then Windows WiFi, **fail-loud if no
+#                                 real hardware is detected** (issue #937 fix:
+#                                 the server no longer silently falls back to
+#                                 synthetic data — that's now opt-in only).
+#                    esp32      — listen for UDP CSI on the configured port.
+#                    wifi       — Windows-native WiFi capture.
+#                    simulated  — explicit demo mode with synthetic CSI.
+#                  Default is `auto`. Set CSI_SOURCE=simulated when you want
+#                  fake data tagged as such; never set it implicitly.
 #   MODELS_DIR   — directory to scan for .rvf model files (default: data/models)
 set -e

+# ── Issue #864: fail-closed on default posture ───────────────────────────────
+# The pre-fix default was: empty RUVIEW_API_TOKEN (auth off) + --bind-addr
+# 0.0.0.0 + docker-compose publishing :3000/:3001/:5005 → an unauthenticated
+# attacker on any reachable network segment could read /api/v1/sensing/latest
+# and the /ws/sensing live stream. That posture is unsafe on guest WiFi,
+# untrusted LANs, accidentally-port-forwarded hosts, or any reverse-proxied
+# deployment. Refuse to start with this combination.
+#
+# Escape hatches (operator must opt in explicitly):
+#   * Set RUVIEW_API_TOKEN to a strong secret → auth enabled on /api/v1/*.
+#   * Set RUVIEW_ALLOW_UNAUTHENTICATED=1 → preserves the pre-fix behaviour;
+#     only safe on an isolated trust boundary.
+#   * Set RUVIEW_BIND_ADDR to a loopback address (127.*, localhost, ::1) →
+#     unauth is fine when the socket isn't reachable. Any other bind needs one of the above.
+#
+# This check runs only for the default sensing-server path (no args + flag-only
+# args). The `cog-ha-matter` / `homecore` routes below are excluded because
+# they own their own auth lifecycle.
+case "${1:-}" in
+    cog-ha-matter|ha-matter|homecore|homecore-server) ;;
+    *)
+        if [ -z "${RUVIEW_API_TOKEN:-}" ] && [ "${RUVIEW_ALLOW_UNAUTHENTICATED:-}" != "1" ]; then
+            # If the operator hasn't overridden the bind, refuse outright on
+            # the default 0.0.0.0. If they've nailed it to loopback, let it
+            # run; any other address (even a private one) is refused too.
+            __bind_default="${RUVIEW_BIND_ADDR:-0.0.0.0}"
+            case "$__bind_default" in
+                127.*|localhost|::1)
+                    : ;;  # loopback bind is safe even without a token
+                *)
+                    echo "[entrypoint] ERROR: refusing to start sensing-server with default" >&2
+                    echo "[entrypoint]        posture: RUVIEW_API_TOKEN is unset AND bind is" >&2
+                    echo "[entrypoint]        ${__bind_default}. /ws/sensing streams live sensing" >&2
+                    echo "[entrypoint]        frames; that data would be readable by anyone who" >&2
+                    echo "[entrypoint]        can reach this host. Pick one:" >&2
+                    echo "[entrypoint]          docker run -e RUVIEW_API_TOKEN=\$(openssl rand -hex 32) ..." >&2
+                    echo "[entrypoint]          docker run -e RUVIEW_BIND_ADDR=127.0.0.1 ..." >&2
+                    echo "[entrypoint]          docker run -e RUVIEW_ALLOW_UNAUTHENTICATED=1 ...   # only on trusted network" >&2
+                    echo "[entrypoint]        See https://github.com/ruvnet/RuView/issues/864" >&2
+                    exit 64
+                    ;;
+            esac
+        fi
+        ;;
+esac
+
 # Route to cog-ha-matter (ADR-116) when invoked as:
 #   docker run <image> cog-ha-matter [--flags]
 # or via the short alias `ha-matter`. Strips the keyword and execs the
@@ -48,7 +103,7 @@ if [ "${1#-}" != "$1" ] || [ -z "$1" ]; then
        --ui-path /app/ui \
        --http-port 3000 \
        --ws-port 3001 \
-        --bind-addr 0.0.0.0 \
+        --bind-addr "${RUVIEW_BIND_ADDR:-0.0.0.0}" \
        "$@"
 fi

@@ -57,7 +57,7 @@ This witness separates what was **empirically observed on real silicon today** f

 | # | Claim | Why it's not verified |
 |---|---|---|
-| **B1** | "Wi-Fi 6 HE-LTF: 242 subcarriers per HE20 frame" | The only AP in range (`ruv.net`) is 11n-only. Every captured frame is 128 bytes = 64 subcarriers (HT-LTF, `ppdu_type=0`). No HE-SU/HE-MU/HE-TB observed. Even if an 11ax AP were available, **whether ESP-IDF v5.4's CSI callback exposes HE-LTF subcarriers via `wifi_csi_info_t.buf` is an open question** — the public API was designed for HT-LTF, and the driver may quietly downconvert. **Validate by capturing CSI against an 11ax AP and comparing `info->len` between HT and HE frames.** |
+| **B1** | "Wi-Fi 6 HE-LTF: 242 subcarriers per HE20 frame" | The only AP in range (`ruv.net`) is 11n-only. Every captured frame is 128 bytes = 64 subcarriers (HT-LTF, `ppdu_type=0`). No HE-SU/HE-MU/HE-TB observed. Even if an 11ax AP were available, **whether ESP-IDF v5.4's CSI callback exposes HE-LTF subcarriers via `wifi_csi_info_t.buf` is an open question** — the public API was designed for HT-LTF, and the driver may quietly downconvert. **Validate by capturing CSI against an 11ax AP and comparing `info->len` between HT and HE frames.**<br><br>**RESOLVED WITH MEASUREMENT (2026-06-11, external — issue #1005, production deployment by @stuinfla):** the open question is answered in both directions. **IDF v5.4's driver blob downconverts** (148 B / 64-subcarrier HT frames, PPDU byte 0x00, on a confirmed-HE link); **IDF v5.5.2 delivers true HE-LTF** — 532 B frames = 256 bins (242 active HE20 tones), PPDU byte 0x01 (HE-SU), ~90% of frames, same board/AP/link. Setup: XIAO ESP32-C6 → hostapd on Intel AX210, 2.4 GHz ch 6, `ieee80211ax=1`. No firmware change required (`acquire_csi_su=1` was already set); the gate was purely the IDF driver version. Three C6 nodes ran this mode simultaneously with ADR-110 ESP-NOW sync. Requires the issue-#1005 version-guard fix in `c6_sync_espnow.c` to build on v5.5.x.<br><br>**REPLICATED IN-HOUSE (2026-06-11):** same source + fix, fresh IDF v5.5.2 toolchain, original COM12 board (`20:6e:f1:17:00:84`), AP `ruv.net` (11ax 2.4 GHz): **84% of 1,525 captured frames at 532 B / PPDU 0x01 (HE-SU)**, HT minority 148 B / 0x00. Evidence grade: MEASURED (two independent rigs). |
 | **B2** | "TWT-bounded deterministic CSI cadence (10 ms wake)" | No 11ax AP in range. The TWT setup *call* was exercised live and the graceful fallback path is now correct (A9), but the agreement itself was never accepted. **Validate by associating with an 11ax AP that has TWT Responder=1, then capturing the timestamped CSI cadence vs the wall clock.** |
 | **B3** | "±100 µs cross-node alignment over 802.15.4" | 3 boards initialized their radios with correct EUIs (A4/A5), but **none stepped down from candidate-leader to follower** during repeated 35-second multi-board captures. <br><br>**Coex hypothesis REJECTED**: rebuilt + reflashed all 3 boards with `CONFIG_C6_TIMESYNC_CHANNEL=26` (2480 MHz, non-overlapping with WiFi ch 5 at 2432 MHz). Result identical: 3× candidate, 0× "stepping down". So 2.4 GHz radio coex was NOT the cause. <br><br>**Current leading hypothesis**: OpenThread (CONFIG_OPENTHREAD_ENABLED=y) owns the 802.15.4 radio when its stack is initialized — our weak-symbol overrides of `esp_ieee802154_receive_done` / `_transmit_done` may never be called because OpenThread registers strong handlers. Validation in progress: rebuilding with `CONFIG_OPENTHREAD_ENABLED=n` (raw 802.15.4 only, our beacon protocol is private — no need for the Thread stack). If leader election fires under raw-15.4-only, hypothesis confirmed. <br><br>If raw-only also fails, next move is to dump the actual PHY frame bytes via the IEEE 802.15.4 sniffer mode on a 4th board and diagnose at the frame level. |
 | **B4** | "~5 µA hibernation for battery seed nodes" | No INA / Joulescope current measurement available on this bench. The shipped code uses `esp_deep_sleep_enable_gpio_wakeup` (ext1 path, ESP-IDF default ~10 µA), not a true LP-core polling program. The 5 µA number is the C6 datasheet figure for ULP-level hibernation, not a measured value. **Validate by hooking an INA219/INA226 between the dev board's 3V3 rail and the regulator output, then averaging current over a 60-second cycle with the LP-core armed.** |
@@ -1081,6 +1081,23 @@ The `wifi-densepose-vitals` crate (ESP32 CSI-grade vital signs) has not yet been
 - SONA-based environment adaptation
 - VitalSignStore with tiered temporal compression

+## Implementation Notes
+
+### 2026-06 — ESP32 edge vitals: person-count over-count + presence flicker (#998, #996)
+
+Two robustness bugs were fixed in the on-device edge path (`firmware/esp32-csi-node/main/edge_processing.c`, the ADR-039 packet `0xC5110002`). These touch the *boolean/count emission logic*, not the underlying CSI signal-processing math, and do **not** constitute a validated-accuracy claim — true occupancy-count and presence accuracy vs labelled ground truth remain hardware/data-gated (COM9 ESP32-S3 + labelled capture).
+
+- **#998 `n_persons` over-count (reported 4 for one person).** `update_multi_person_vitals()` divided the top-K subcarriers into `top_k_count/2` groups and marked *every* group `active`, so one body's multipath always read the full `EDGE_MAX_PERSONS`. Added an energy gate (`EDGE_PERSON_MIN_ENERGY_RATIO`), spatial dedup (`EDGE_PERSON_MIN_SC_SEP`), and a persistence debounce (`EDGE_PERSON_PERSIST_FRAMES`) via two pure functions `count_distinct_persons()` / `person_count_debounce()`.
+- **#996 presence flag flicker at ~50 cm.** Single-threshold compare on a noisy `presence_score` chattered at the boundary. Replaced with a Schmitt trigger + clear-debounce (`presence_flag_update()`, constants `EDGE_PRESENCE_HYST_RATIO` / `EDGE_PRESENCE_CLEAR_FRAMES`); `presence_score` is unchanged and still emitted for consumer-side thresholding.
+
+Both are pinned by host-buildable C99 tests in `firmware/esp32-csi-node/test/test_vitals_count_presence.c` (`make run_vitals`). The exact thresholds are documented constants pending on-device calibration against ground truth.
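
The Schmitt-trigger plus clear-debounce behaviour described for #996 can be sketched as follows. This is a minimal Rust illustration, not the firmware's C code: the `PresenceFlag` type, its field names, and the way the lower threshold is derived from a hysteresis ratio are all assumptions made for the example, and the numeric values a caller passes in are placeholders rather than the `EDGE_PRESENCE_*` constants.

```rust
/// Hysteresis presence detector: Schmitt trigger plus clear-debounce.
/// Hypothetical type for illustration; thresholds are NOT the firmware's.
pub struct PresenceFlag {
    on_threshold: f32,  // score must reach this to assert presence
    off_threshold: f32, // score must fall below this to count toward clearing
    clear_frames: u32,  // consecutive low frames required before clearing
    low_run: u32,       // current run of consecutive low frames
    present: bool,
}

impl PresenceFlag {
    /// `hyst_ratio` (assumed to be in 0..1) places the lower threshold
    /// as a fraction of the upper one.
    pub fn new(on_threshold: f32, hyst_ratio: f32, clear_frames: u32) -> Self {
        Self {
            on_threshold,
            off_threshold: on_threshold * hyst_ratio,
            clear_frames,
            low_run: 0,
            present: false,
        }
    }

    /// Feed one presence score and get the debounced flag back.
    pub fn update(&mut self, score: f32) -> bool {
        if !self.present {
            // Rising edge: only the upper threshold can set the flag.
            if score >= self.on_threshold {
                self.present = true;
                self.low_run = 0;
            }
        } else if score < self.off_threshold {
            // Falling side: require a sustained run below the lower threshold.
            self.low_run += 1;
            if self.low_run >= self.clear_frames {
                self.present = false;
                self.low_run = 0;
            }
        } else {
            // Anywhere at or above the lower threshold resets the clear run.
            self.low_run = 0;
        }
        self.present
    }
}
```

A score that hovers between the two thresholds leaves the flag where it was, which is the property that removes boundary chatter; the raw score can still be forwarded unchanged for consumers that want their own threshold.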
+
+### 2026-06 — Rust `wifi-densepose-vitals`: IIR filter NaN/inf self-heal (ADR-158 §A1)
+
+A correctness/safety review of the Rust extraction crate found a real bug parallel to the firmware robustness class above. The 2nd-order resonator `bandpass_filter` in both `breathing.rs` and `heartrate.rs` latches each output `y[n]` into its filter state (`y1`/`y2`). A single non-finite amplitude residual from a corrupt CSI frame produced a NaN `output` that was written into the state; the existing `extract()` `is_finite()` guard dropped that one sample from the history buffer **but never sanitized the poisoned filter state**, so every later output stayed NaN, was rejected too, and the sliding-window history never refilled — breathing **and** heart-rate extraction went silently dead (returning `None` forever) until `reset()`. On the alert path this is a safety-relevant denial of service (one bad frame stops vitals monitoring with no error surfaced).
+
+Fix: when `bandpass_filter` computes a non-finite `output`, it resets the IIR state to default and returns `0.0`, so the resonator self-heals on the next clean frame (the `0.0` is still dropped by the caller's finite-check, so no spurious sample enters history). Same shape as the calibration NaN bug (ADR-154 §3) — the prior hardening guarded the *history boundary* but not the *filter-state boundary*. Pinned by `breathing::tests::nan_frame_does_not_permanently_poison_filter`, `breathing::tests::inf_mid_stream_does_not_freeze_history`, and `heartrate::tests::nan_frame_does_not_permanently_poison_filter` (all FAIL pre-fix, verified by reverting). The review also de-magicked the HR physiological plausibility band into named `HR_PLAUSIBLE_MIN_BPM`/`HR_PLAUSIBLE_MAX_BPM` consts (value-identical 40/180 BPM) and added a fabricated-vital negative (`pure_noise_is_never_reported_valid` — broadband noise never yields a clinically `Valid` HR; the extractor honestly returns low-confidence `Unreliable`). Clean dimensions confirmed with evidence: flat/silent input → `None`; pure noise → low-confidence `Unreliable`, never `Valid`; harmonic-rich breathing with no cardiac component → low-confidence, not a confident false HR; out-of-band BPM rejected by the plausibility clamp.
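
The self-heal rule above (reset the IIR state and return `0.0` when the output is non-finite) can be illustrated with a small two-pole resonator. This is a sketch under assumptions: the `Resonator` type, its coefficient formula, and the `step` method are hypothetical stand-ins, not the crate's `bandpass_filter`, and only the output-feedback state (`y1`/`y2`) named in the text is modelled.

```rust
use std::f64::consts::PI;

/// Hypothetical 2nd-order resonator used only to illustrate the self-heal rule.
pub struct Resonator {
    b0: f64, // input gain
    a1: f64, // feedback coefficient on y[n-1]
    a2: f64, // feedback coefficient on y[n-2]
    y1: f64, // latched y[n-1]
    y2: f64, // latched y[n-2]
}

impl Resonator {
    /// Poles at radius `r` (assumed 0 < r < 1) and angle 2*pi*center/sample.
    pub fn new(center_hz: f64, sample_hz: f64, r: f64) -> Self {
        let w = 2.0 * PI * center_hz / sample_hz;
        Self { b0: 1.0 - r, a1: 2.0 * r * w.cos(), a2: -(r * r), y1: 0.0, y2: 0.0 }
    }

    /// Process one sample. A non-finite output never reaches the state:
    /// the state is cleared and 0.0 is returned, so the next clean sample
    /// starts from a healthy filter instead of a poisoned one.
    pub fn step(&mut self, x: f64) -> f64 {
        let y = self.b0 * x + self.a1 * self.y1 + self.a2 * self.y2;
        if !y.is_finite() {
            self.y1 = 0.0;
            self.y2 = 0.0;
            return 0.0;
        }
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}
```

Without the `is_finite` branch, a single NaN would be latched into `y1` and every later output would stay NaN, which is the failure mode the note describes.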
+
 ## References

 - Ramsauer et al. (2020). "Hopfield Networks is All You Need." ICLR 2021. (ModernHopfield formulation)
@@ -5,7 +5,7 @@
 | Status | Proposed |
 | Date | 2026-03-06 |
 | Deciders | ruv |
-| Depends on | ADR-012 (ESP32 CSI Mesh), ADR-039 (Edge Intelligence), ADR-040 (WASM Programmable Sensing), ADR-044 (Provisioning Enhancements), ADR-050 (Security Hardening), ADR-051 (Server Decomposition) |
+| Depends on | ADR-012 (ESP32 CSI Mesh), ADR-039 (Edge Intelligence), ADR-040 (WASM Programmable Sensing), ADR-044 (Provisioning Enhancements), ADR-166 (Security Hardening, renumbered from ADR-050), ADR-051 (Server Decomposition) |
 | Issue | [#177](https://github.com/ruvnet/RuView/issues/177) |

 ## Context
@@ -211,7 +211,7 @@ pub struct FlashProgress {
 // commands/ota.rs

 /// Push firmware to a node via HTTP OTA (port 8032).
-/// Includes PSK authentication per ADR-050.
+/// Includes PSK authentication per ADR-166.
 #[tauri::command]
 async fn ota_update(
    node_ip: String,
@@ -801,7 +801,7 @@ Total estimated effort: ~11 weeks for a single developer.
 - ADR-039: ESP32 Edge Intelligence
 - ADR-040: WASM Programmable Sensing
 - ADR-044: Provisioning Tool Enhancements
-- ADR-050: Quality Engineering — Security Hardening
+- ADR-166: Quality Engineering — Security Hardening (renumbered from ADR-050)
 - ADR-051: Sensing Server Decomposition
 - `firmware/esp32-csi-node/` — ESP32 firmware source
 - `firmware/esp32-csi-node/provision.py` — Current provisioning script
@@ -1,6 +1,6 @@
 # ADR-080: QE Analysis Remediation Plan

-- **Status:** Proposed
+- **Status:** Proposed — P0 security findings #1–#3 **RESOLVED** on the shipped Rust sensing-server boundary (2026-06-13; closes ADR-164 G11)
 - **Date:** 2026-04-06
 - **Source:** [QE Analysis Gist (2026-04-05)](https://gist.github.com/proffesor-for-testing/a6b84d7a4e26b7bbef0cf12f932925b7)
 - **Full Reports:** [proffesor-for-testing/RuView `qe-reports` branch](https://github.com/proffesor-for-testing/RuView/tree/qe-reports/docs/qe-reports)
@@ -13,25 +13,38 @@ An 8-agent QE swarm analyzed ~305K lines across Rust, Python, C firmware, and Ty

 Address the 15 prioritized issues from the QE analysis in three waves: P0 (immediate), P1 (this sprint), P2 (this quarter).

+## Security P0 closure note (2026-06-13) — Rust sensing-server boundary
+
+The three P0 security findings below were logged against the **Python v1** API
+(`archive/v1/src/…`). ADR-164 G11 re-scoped them to the *shipped* boundary:
+`wifi-densepose-sensing-server` (Rust). They were verified against the current
+Rust crate and closed on branch `fix/adr-080-sensing-server-security`. Each fix
+(or already-fixed finding) is pinned by a test that fails on the old behavior.
+**The Python v1 paths remain as-is** — v1 is archived and not the shipped
+surface; this closure governs the live Rust server only.
+
 ## P0 — Fix Immediately

-### 1. Rate Limiter Bypass (Security HIGH)
+### 1. Rate Limiter Bypass / XFF spoofing (Security HIGH) — **RESOLVED (verified absent on Rust boundary)**

-- **Location:** `archive/v1/src/middleware/rate_limit.py:200-206`
+- **Original location (v1):** `archive/v1/src/middleware/rate_limit.py:200-206`
 - **Problem:** Trusts `X-Forwarded-For` without validation. Any client bypasses rate limits via header spoofing.
-- **Fix:** Validate forwarded headers against trusted proxy list, or use connection IP directly.
+- **Rust verification (2026-06-13):** The Rust sensing-server has **no XFF-trusting control to bypass** — there is no IP-based rate-limiter and no IP-allowlist, and neither security middleware reads a forwarded header. `bearer_auth.rs` authenticates on the token alone (`require_bearer` inspects only the `AUTHORIZATION` header); `host_validation.rs` decides on the `Host` header only. A repo-wide grep for `x-forwarded-for|forwarded|peer_addr|client_ip|real-ip` over `wifi-densepose-sensing-server` returns nothing. The only "rate limiter" is the MQTT *sample-rate* gate (`mqtt/state.rs`), a per-entity publish throttle with no IP/header input.
+- **Resolution:** No code change needed (no vulnerable surface). Regression tests pin the immunity: `bearer_auth::tests::xff_header_never_affects_auth_decision` (spoofed XFF never flips a 401↔200 decision) and `host_validation::tests::forwarded_headers_never_bypass_host_allowlist` (spoofed `X-Forwarded-Host: localhost` never lets a foreign `Host: evil.com` past the allowlist). Residual: if an IP-based control is ever added, it must derive the peer from the socket (`ConnectInfo<SocketAddr>`) and only honor XFF from an explicit `--trusted-proxy` CIDR — captured as guidance in the test docstrings.

-### 2. Exception Details Leaked in Responses (Security HIGH)
+### 2. Exception Details Leaked in Responses (Security HIGH, CWE-209) — **RESOLVED**

-- **Location:** `archive/v1/src/api/routers/pose.py:140`, `stream.py:297`, +5 endpoints
-- **Problem:** Stack traces visible regardless of environment.
-- **Fix:** Wrap with generic error responses in production; log details server-side only.
+- **Original location (v1):** `archive/v1/src/api/routers/pose.py:140`, `stream.py:297`, +5 endpoints
+- **Problem:** Internal error/stack-trace detail serialized into client responses.
+- **Rust finding (2026-06-13):** Six handlers in `wifi-densepose-sensing-server/src/main.rs` serialized the internal error `Display` into the JSON body: `edge_registry_endpoint` returned a panicked `spawn_blocking` `JoinError` (`"task … panicked"`) in a `500` and the raw upstream error in a `503`; `delete_model`/`delete_recording`/`start_recording` returned `std::io::Error` strings (OS detail / path); `calibration_start`/`calibration_stop` returned the `FieldModel` error chain.
+- **Fix:** New `src/error_response.rs` module — `internal_error` / `internal_error_json` / `upstream_unavailable` log the full detail **server-side only** (tagged with a correlation id) and return a generic body (`{"error":"internal_error","correlation_id":…}`) with no `panicked`, no file paths, no Debug chain. All six call-sites rewired. Pinned by `error_response::tests::internal_error_body_does_not_leak_detail` (leak-substring guard, verified to fail on the reverted old body) + 4 sibling tests.

-### 3. WebSocket JWT in URL (Security HIGH, CWE-598)
+### 3. WebSocket JWT in URL (Security HIGH, CWE-598) — **RESOLVED (verified absent on Rust boundary)**

-- **Location:** `archive/v1/src/api/routers/stream.py:74`, `archive/v1/src/middleware/auth.py:243`
+- **Original location (v1):** `archive/v1/src/api/routers/stream.py:74`, `archive/v1/src/middleware/auth.py:243`
 - **Problem:** Tokens in query strings visible in logs/proxies/browser history.
-- **Fix:** Use WebSocket subprotocol or first-message auth pattern.
+- **Rust verification (2026-06-13):** The Rust sensing-server never reads a token from the URL. `require_bearer` (`bearer_auth.rs`) inspects only the `Authorization` header; the WebSocket handlers (`ws_sensing_handler`/`ws_introspection_handler`/`ws_pose_handler`) take a bare `WebSocketUpgrade` with no `Query` extractor; the single `Query` in the crate (`EdgeRegistryParams`) is a non-secret `refresh` flag.
+- **Resolution:** No code change needed (no query-token path exists). Regression test `bearer_auth::tests::query_string_token_is_never_accepted` proves `?token=`/`?access_token=` in the URL never authenticates (stays `401`) while the same token in the header succeeds (`200`) — verified to fail if a query-token path is re-introduced.

 ### 4. Rust Tests Not in CI

@@ -259,14 +259,75 @@ Validation runs against:
 - **ADR-083** (Proposed) — Per-cluster Pi compute hop. Defines the
  device class that hosts the sketch bank.

+## Pass 2 — randomized rotation + multi-bit (ADR-156 §8, landed 2026-06)
+
+The "Open question" below ("does `BinaryQuantized` need a randomized
+rotation pre-pass?") is now **answered with measured numbers** via
+ADR-156 §10. Summary:
+
+- **Pass 2 (randomized rotation) is implemented** —
+  `crates/wifi-densepose-ruvector/src/rotation.rs`: a deterministic
+  `R = H·D` (Fast Hadamard Transform + seeded ±1 sign flips), `O(d log d)`
+  / `O(d)`, norm-preserving, reproducible from a stored `u64` seed. Opt-in
+  via `Sketch::from_embedding_rotated` / `SketchBank::with_rotation`;
+  Pass-1 API and wire format unchanged.
+- **Measured top-K coverage** (anisotropic planted-cluster fixture,
+  cosine ground truth, dim=128 N=2048 K=8): rotation lifts coverage
+  **36.13% → 46.39%** at the strict `candidate_k = K` bar, and Pass-2
+  reaches the **≥90% acceptance bar at candidate_k = 24 (~3× over-fetch)**.
+  Multi-bit (≤4-bit) reaches 74% at the strict bar. **Honest verdict:
+  neither rotation nor ≤4-bit multi-bit clears the strict-K 90% bar on
+  this distribution; the bar is met via the over-fetch "candidate set"
+  pattern this ADR specifies** (Decision §"the canonical pattern" — sketch
+  picks the candidate set, full precision refines). Full numbers and
+  reproduce commands in ADR-156 §10.
+- **Pre-existing `SketchBank::topk` bug fixed** — the `n > k` heap path
+  returned the k *farthest* sketches (min-heap mistaken for max-heap);
+  only the `n ≤ k` fast path had test coverage. Fixed + regression-pinned
+  (`topk_heap_path_returns_nearest`,
+  `tight_clusters_give_high_coverage_with_overfetch`). Every top-K
+  acceptance number in this ADR therefore depends on the fixed path; the
+  ≥90% coverage criterion is only meaningful post-fix.
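
The heap-direction bug in the last bullet is easy to reproduce in miniature. The sketch below shows one correct way to keep the k nearest codes with a bounded heap; `hamming` and `topk_nearest` are hypothetical helpers written for this example, not the `SketchBank::topk` code. The point it illustrates is that Rust's `BinaryHeap` is a max-heap, so popping when the heap grows past `k` evicts the current farthest candidate, which is what a nearest-neighbour search needs.

```rust
use std::collections::BinaryHeap;

/// Hamming distance between two packed sign codes of equal length.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Indices of the `k` nearest codes in `bank`, nearest first.
/// Hypothetical helper; ties are broken by lower index.
pub fn topk_nearest(query: &[u64], bank: &[Vec<u64>], k: usize) -> Vec<usize> {
    // Max-heap keyed on distance: the root is always the farthest kept item.
    let mut heap: BinaryHeap<(u32, usize)> = BinaryHeap::with_capacity(k + 1);
    for (i, code) in bank.iter().enumerate() {
        heap.push((hamming(query, code), i));
        if heap.len() > k {
            heap.pop(); // drop the farthest, keep the k nearest so far
        }
    }
    // Ascending order by (distance, index), i.e. nearest first.
    heap.into_sorted_vec().into_iter().map(|(_, i)| i).collect()
}
```

Wrapping the key in `std::cmp::Reverse` here would flip the eviction direction and return the farthest items instead, which matches the symptom reported for the `n > k` path.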
+
+## Pass 2b — RaBitQ unbiased distance estimator (ADR-156 §11, landed 2026-06)
+
+The **real** RaBitQ contribution (Gao & Long, SIGMOD 2024) — an
+**unbiased estimator of the inner product / distance** from the 1-bit
+code + per-vector side info, not just sign bits — is now implemented and
+**MEASURED against this ADR's ≥90% strict-K bar**:
+
+- **Implemented** — `crates/wifi-densepose-ruvector/src/estimator.rs`:
+  `EstimatorSketch` (Pass-2 sign code + 8 B/vec side info:
+  `residual_norm` + `x_dot_o = ⟨x̄, o'⟩`), `DistanceEstimator`
+  (`⟨o',q'⟩ ≈ ⟨x̄,q'⟩ / x_dot_o`, the paper's unbiased rescale), and
+  `EstimatorBank` reranking candidates by the estimate instead of raw
+  Hamming. **Zero-centroid simplification** (`c = 0`) documented;
+  paper-faithful centroid path also built (`with_centroid`). Additive —
+  Pass-1/Pass-2 and the wire format are unchanged.
+- **MEASURED strict-K coverage** (same fixture as §"Pass 2", cosine
+  ground truth): the estimator lifts the strict `candidate_k = K` bar
+  **46.39% (Pass-2 sign) → 49.71% (estimator, cosine rerank)** — a real
+  **+3.3 pp** lift, but **still ~40 pp short of the ≥90% strict bar.**
+  At over-fetch the estimator does better than sign (95.12% vs 91.60% at
+  candidate_k = 24). **Honest verdict: the unbiased estimator does NOT
+  clear the strict-K 90% bar on this distribution** — the binding
+  constraint is the 1-bit code's information ceiling, not estimator
+  variance. The ≥90% acceptance bar is still met only via the over-fetch
+  "candidate set" pattern this ADR's Decision specifies; the estimator
+  **reduces the over-fetch factor** needed but does not remove it. This
+  is a **published negative**, reported as such. Full numbers + reproduce
+  commands in ADR-156 §11.
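
The rescale `⟨o',q'⟩ ≈ ⟨x̄,q'⟩ / x_dot_o` quoted above can be written out in a few lines. The sketch below is a simplified illustration under stated assumptions: it uses the zero-centroid case, omits the randomized rotation entirely, stores signs as `Vec<bool>` rather than packed bits, and leaves out `residual_norm`. The names `EstimatorSketch`, `encode`, and `estimate_cosine` are chosen for the example and need not match the signatures in `estimator.rs`.

```rust
/// 1-bit sign code plus the side info needed for the rescaled estimate.
/// Simplified stand-in; not the crate's actual layout.
pub struct EstimatorSketch {
    signs: Vec<bool>, // true = non-negative coordinate
    x_dot_o: f32,     // <x_bar, o'>, computed once at encode time
}

/// Each coordinate of the unit-norm sign vector x_bar is +/- 1/sqrt(d).
fn inv_sqrt_dim(d: usize) -> f32 {
    1.0 / (d as f32).sqrt()
}

/// Encode a raw vector (zero-centroid assumption: no centroid subtraction).
pub fn encode(v: &[f32]) -> EstimatorSketch {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    let signs: Vec<bool> = v.iter().map(|&x| x >= 0.0).collect();
    // <x_bar, o'> with o' = v / |v| reduces to sum(|v_i|) / (sqrt(d) * |v|).
    let x_dot_o = if norm > 0.0 {
        v.iter().map(|x| x.abs()).sum::<f32>() * inv_sqrt_dim(v.len()) / norm
    } else {
        0.0
    };
    EstimatorSketch { signs, x_dot_o }
}

/// Estimate <o', q'> for a unit-norm query from the sign code alone.
pub fn estimate_cosine(sketch: &EstimatorSketch, q_unit: &[f32]) -> f32 {
    let x_dot_q = sketch
        .signs
        .iter()
        .zip(q_unit)
        .map(|(&s, &q)| if s { q } else { -q })
        .sum::<f32>()
        * inv_sqrt_dim(sketch.signs.len());
    if sketch.x_dot_o > 0.0 { x_dot_q / sketch.x_dot_o } else { 0.0 }
}
```

When the query equals the encoded vector's direction the estimate is exactly 1, because numerator and denominator are the same inner product. For other queries the 1-bit code limits how much the estimate can say, consistent with the information-ceiling verdict above.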
+
 ## Open questions

 - **Does `BinaryQuantized` need a randomized rotation pre-pass for
-  RuView's embedding distributions?** Pure sign quantization assumes
-  zero-centered, isotropic embeddings. If AETHER / spectrogram
-  distributions are skewed (likely for spectrogram), add a
-  `randomized_rotation` pre-pass following the original RaBitQ paper
-  (Gao & Long, SIGMOD 2024). Decided after pass-1 benchmark.
+  RuView's embedding distributions?** **ANSWERED (ADR-156 §10):** rotation
+  is built and measured — it helps (+10pp at strict K) but is not
+  sufficient alone for strict-K 90% on the tested anisotropic
+  distribution; the over-fetch candidate-set pattern meets the bar.
+  Pure sign quantization assumes zero-centered, isotropic embeddings; the
+  rotation decorrelates anisotropic coords as the RaBitQ paper
+  (Gao & Long, SIGMOD 2024) prescribes.
 - **Sketch dimension target.** Default to the embedding's native
  dimension (128 for AETHER, 256 for spectrogram). Higher-dimensional
  sketches (Johnson-Lindenstrauss-projected to 512) trade compute for
@@ -19,7 +19,7 @@ The production CSI node firmware (`firmware/esp32-csi-node`) was built around th

 | C6 capability | What it enables for sensing | Why we can't get it on S3 |
 |---|---|---|
-| **802.11ax (Wi-Fi 6) HE-LTF CSI** | 242 subcarriers per HE20 frame (vs 52 for HT-LTF), HE-MU/HE-TB PPDU types, OFDMA-aware channel sounding | S3 radio is HT-only (n) |
+| **802.11ax (Wi-Fi 6) HE-LTF CSI** | 242 subcarriers per HE20 frame (vs 52 for HT-LTF), HE-MU/HE-TB PPDU types, OFDMA-aware channel sounding. **Hardware-confirmed 2026-06-11** (issue #1005, external production deployment): requires **ESP-IDF ≥ 5.5** — the v5.4 driver blob silently downconverts to 64-subcarrier HT even on a confirmed-HE link; v5.5.2 delivers 532 B frames = 256 bins (242 active tones), PPDU 0x01 (HE-SU). See WITNESS-LOG-110 §B1 (resolved). | S3 radio is HT-only (n) |
 | **802.15.4 (Thread / Zigbee)** | Cross-node time-sync over a separate radio — frees Wi-Fi airtime for CSI, ±100 µs alignment possible without coordination traffic on the sensing channel | S3 has no 802.15.4 |
 | **TWT (Target Wake Time)** | Sensor negotiates a deterministic wake slot with the AP; CSI cadence becomes scheduler-bounded instead of opportunistic | Requires 802.11ax — S3 can't speak it |
 | **LP-core + hibernation (~5 µA)** | Always-on motion gate runs on a separate RISC-V LP core in deep sleep; HP core stays off until a real event | S3 ULP is FSM-only, ~10 µA floor |
@@ -104,6 +104,57 @@ Ranked by build cost × user impact:
 | **P9** | HACS integration repo (`hass-wifi-densepose`) for HA-side install path | pending |
 | **P10** | Witness bundle + CSA-style spec compliance check | pending |

+## 4.1 Crypto/security review notes (§2.2 witness chain — ADR-262 P2 prerequisite)
+
+Beyond-SOTA crypto+security review of the SHA-256 + Ed25519 witness chain
+(`witness.rs` / `witness_signing.rs`) and the manifest signature surface
+(`manifest.rs`), because ADR-262 P2 proposes to **reuse this exact signing
+chain**. Top priority was the sibling `wifi-densepose-engine` bug class —
+unframed boundary-to-boundary concatenation of operator-influenceable strings
+into a signed/hashed digest.
+
+- **Engine bug class ABSENT (good result, reported with byte evidence).**
+  `canonical_bytes` is `DOMAIN_TAG ‖ prev_hash[32] ‖ seq:u64-be ‖ ts:u64-be ‖
+  kind_len:u32-be ‖ kind ‖ payload_len:u32-be ‖ payload`. The two
+  variable-length operator-influenceable fields (`kind`, `payload`) are
+  **length-prefixed**; the fixed-width fields are self-delimiting → the
+  encoding is injective (no two distinct event tuples share a preimage). The
+  Ed25519 signature signs the **identical** bytes the SHA-256 chain commits to.
+  No separate unframed concatenation exists; the manifest `binary_signature`
+  is signed at build time (Makefile) over a single fixed-length `binary_sha256`
+  hex value, not in-crate.
+
+- **CHM-WIT-01 (FIXED) — domain-separation tag added.** The engine fix
+  prescribed *domain-tag + length-prefix*; length-prefix was present, the
+  domain tag was not. Added a versioned, NUL-terminated
+  `WITNESS_DOMAIN_TAG = b"cog-ha-matter/witness-event/v1\x00"` prefix so the
+  witness message can never be replayed as a message for another Ed25519
+  context that shares key infrastructure (notably the manifest signature).
+  **Witness bytes change by design** (prior on-disk hashes/signatures
+  invalidated, as with the engine fix); verified safe because no in-repo crate
+  consumes cog-ha-matter witness bytes programmatically (doc-mentions only).
+
+- **CHM-WIT-02 (HARDENED) — `verify_signature` now uses `verify_strict`.** For
+  an audit chain the signature is the attestation, so non-canonical encodings
+  and small-order keys are rejected (RFC 8032 strict), giving the "one
+  canonical signature per event" property. Not a forgery fix — the verifying
+  key is caller-pinned, never read from the event.
+
+- **Confirmed clean (with evidence):** verify-before-trust + key-pinning
+  (`verify_signature` takes the verifying key as a parameter; `read_jsonl`
+  re-derives every hash and chain-verifies); key handling (the crate never
+  generates/stores/logs/serializes a signing key — only a documented test-only
+  fixed seed; production keys come from the Seed secure store, out of scope);
+  determinism (positional bytes, deterministic Ed25519, alphabetically-locked
+  JSONL field order, sorted TXT records — no HashMap/float nondeterminism feeds
+  any digest); fail-closed parsing (structured errors, no panics; `main.rs`
+  reads no untrusted files/paths).
+
+Tests: `cog-ha-matter --no-default-features` 64 → **68**, 0 failed (CHM-WIT-01
+pinned by 4 fails-on-old tests across `witness.rs`/`witness_signing.rs`;
+CHM-WIT-02 guarded by a key-pinning test). Python deterministic proof
+unchanged (cog-ha-matter is off the signal proof path).
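
The framing described in the first two bullets (domain tag, fixed-width fields, then length-prefixed `kind` and `payload`) can be sketched as below. The tag bytes and field order are taken from the text; the function signature, the `Option` return for oversized fields, and the choice of `&str` for `kind` are assumptions for illustration and may differ from `witness.rs`.

```rust
use std::convert::TryFrom;

/// Domain-separation tag as quoted in the review notes (NUL-terminated).
pub const WITNESS_DOMAIN_TAG: &[u8] = b"cog-ha-matter/witness-event/v1\x00";

/// Build the byte string that is both hashed into the chain and signed.
/// Hypothetical signature; returns None if a field exceeds the u32 length prefix.
pub fn canonical_bytes(
    prev_hash: &[u8; 32],
    seq: u64,
    ts: u64,
    kind: &str,
    payload: &[u8],
) -> Option<Vec<u8>> {
    let kind_len = u32::try_from(kind.len()).ok()?;
    let payload_len = u32::try_from(payload.len()).ok()?;

    let mut out = Vec::with_capacity(
        WITNESS_DOMAIN_TAG.len() + 32 + 8 + 8 + 4 + kind.len() + 4 + payload.len(),
    );
    out.extend_from_slice(WITNESS_DOMAIN_TAG); // context separation
    out.extend_from_slice(prev_hash);          // fixed width, self-delimiting
    out.extend_from_slice(&seq.to_be_bytes()); // u64 big-endian
    out.extend_from_slice(&ts.to_be_bytes());  // u64 big-endian
    out.extend_from_slice(&kind_len.to_be_bytes());
    out.extend_from_slice(kind.as_bytes());    // length-prefixed
    out.extend_from_slice(&payload_len.to_be_bytes());
    out.extend_from_slice(payload);            // length-prefixed
    Some(out)
}
```

The length prefixes are what make the encoding injective: `("ab", "c")` and `("a", "bc")` concatenate to the same raw bytes but produce different framed bytes, so an operator-controlled `kind` cannot be shifted into `payload` without changing the digest.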
+
 ## 5. References

 - ADR-101 — `cog-pose-estimation` packaging precedent (signed binaries on GCS, .cog manifest)
@@ -190,4 +190,78 @@ The entity registry is a `RwLock<HashMap<EntityId, EntityEntry>>` backed by an a

 - `v2/crates/wifi-densepose-sensing-server/src/main.rs` — Axum + Tokio architecture pattern used throughout the existing server stack
 - `docs/adr/ADR-126-ruview-native-ha-port-master.md` — HOMECORE master; §5.5 crate naming; §6 compatibility contract; §5.1 RUVIEW-POLICY
+
+---
+
+## 9. Security & concurrency review (P1 core, beyond-SOTA sweep)
+
+Foundational review of the `homecore` crate — the state store + event bus +
+service/entity registries every other HOMECORE module trusts. Same rigor as
+the ADR-129/130/132/133/161 sibling reviews. **Three real fixes (one
+concurrency, two hardening), each pinned by a fails-on-old test; the bus-lag
+and lock-discipline dimensions confirmed clean with evidence.**
+
+- **HC-RACE-01 (state-set TOCTOU — lost / reordered `state_changed`, the
+  crux). FIXED.** `StateMachine::set` did `get()` (releasing the DashMap
+  shard lock) → compute the next snapshot + the no-op / `last_changed`
+  decision → `insert()` (re-acquiring the lock) → `send()`. The
+  read-modify-write was **not atomic** w.r.t. a concurrent writer on the
+  same entity, contradicting §2.1's promise that "the writer atomically
+  replaces the map entry." A writer that read a stale `old` could
+  mis-classify a genuine transition as a no-op and **drop its
+  `state_changed` event** (a missed automation trigger) or fire an event
+  whose `new_state` duplicated the previously delivered one (a spurious
+  trigger for any automation keyed on `old_state != new_state`). **Fix:**
+  hold the shard write-lock across the entire read→decide→insert→fire
+  sequence via `entry()`/`insert_entry()`; `tx.send` is non-blocking,
+  non-async, and never re-enters the map, so firing under the shard lock
+  cannot deadlock and keeps global event order in lock-step with global
+  commit order. Pinned by `concurrent_set_fires_no_duplicate_adjacent_events`
+  (4 writers toggling one entity A/B; asserts no two consecutive fired
+  events carry an identical `new_state` — impossible under correct
+  serialisation; a probe observed ~93k such duplicate-adjacent events across
+  200 trials on the racy code, zero on the fix).
+- **HC-EID-LEN-01 (unbounded `entity_id` — memory-DoS at the REST boundary).
+  FIXED.** `homecore-api/src/rest.rs` parses untrusted path segments
+  straight through `EntityId::parse`; with no length cap, an
+  otherwise-valid id (`a.` + many MB of `[a-z0-9_]`) was accepted and a
+  `POST /api/states/<giant>` would persist it into the DashMap state store
+  (permanent growth across distinct ids). **Fix:** reject ids longer than
+  `MAX_ENTITY_ID_LEN` (255, HA-compatible) up front in `parse()`, before any
+  per-char scan, with a new `EntityIdError::TooLong`; fail-closed at the
+  boundary type protects every caller. Pinned by `entity_id_length_boundary`
+  (exactly-MAX accepted, MAX+1 and a 4 MiB id rejected — fails on old code).
+- **HC-SVC-PANIC-01 (service-handler panic not isolated). HARDENED.**
+  `ServiceRegistry::call` already ran handlers outside the registry lock (no
+  `RwLock` poisoning, no blocking of other callers — clean), but a
+  panicking handler unwound through `call()` into the caller's task. **Fix:**
+  wrap the handler future in `AssertUnwindSafe` + `catch_unwind`, converting
+  a panic to `ServiceError::HandlerPanicked`; the registry stays fully
+  usable. Pinned by `panicking_handler_is_isolated_and_registry_survives`.
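
The length cap described under HC-EID-LEN-01 above can be sketched as a parse function that checks the byte length before doing any per-character work. This is an illustrative stand-in, not the crate's `EntityId::parse`: the free function `parse_entity_id`, the `Malformed` variant, and the exact character rule are assumptions; only `MAX_ENTITY_ID_LEN = 255` and the `TooLong` error come from the text.

```rust
/// Cap quoted in the review notes (HA-compatible).
pub const MAX_ENTITY_ID_LEN: usize = 255;

#[derive(Debug, PartialEq)]
pub enum EntityIdError {
    TooLong,
    Malformed, // hypothetical catch-all for this sketch
}

/// Split `domain.object_id`, rejecting oversized input before scanning it.
pub fn parse_entity_id(raw: &str) -> Result<(&str, &str), EntityIdError> {
    // O(1) length check first: a multi-megabyte id is refused without a scan.
    if raw.len() > MAX_ENTITY_ID_LEN {
        return Err(EntityIdError::TooLong);
    }
    let (domain, object) = raw.split_once('.').ok_or(EntityIdError::Malformed)?;
    // Assumed character rule for the example: non-empty [a-z0-9_] segments.
    let valid = |s: &str| {
        !s.is_empty()
            && s.bytes()
                .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_')
    };
    if valid(domain) && valid(object) {
        Ok((domain, object))
    } else {
        Err(EntityIdError::Malformed)
    }
}
```

Putting the check in the boundary type rather than in each REST handler means every caller inherits it, which is the fail-closed property the finding relies on.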
+
+**Dimensions confirmed clean (with evidence):**
+
+- **Event-bus bounds / lag (same class as the homecore-api WS lag-DoS).**
+  Both `StateMachine` and `EventBus` use bounded `tokio::sync::broadcast`
+  (capacity 4,096). A slow subscriber gets a recoverable `Lagged(n)`
+  (drop-oldest + re-sync); `fire_*` is non-blocking and **never waits on
+  slow receivers**, so a lagging subscriber cannot block the publisher, grow
+  the channel without bound, or take down a fast subscriber. Evidenced by
+  `slow_subscriber_does_not_block_publisher_or_kill_the_bus` (fire 3×
+  capacity at an idle subscriber; publisher unblocked, bus stays live).
+- **Lock ordering / lock-across-await (deadlock).** No code path holds two
+  of `{state DashMap, registry RwLock, service RwLock}` simultaneously, so
+  no inconsistent-ordering deadlock can exist. Every `tokio::sync::RwLock`
+  guard in `registry.rs`/`service.rs` is used in a single synchronous
+  statement and dropped before any `.await`; `call` explicitly scopes the
+  read guard out before awaiting the handler. The only guard held across a
+  send is the DashMap shard lock in `set`, across a synchronous
+  (non-await) broadcast send — safe.
+- **Panic-on-input.** No reachable `unwrap`/`expect`/index in non-test code
+  beyond the safe `send().unwrap_or(0)` and the dead-but-harmless
+  `split_once(...).unwrap_or(...)` fallbacks on already-validated ids.
+
+`cargo test -p homecore --no-default-features`: **20 → 24 passed, 0 failed**
+(+4 pins). Workspace green; Python deterministic proof unchanged
+(`f8e76f21…46f7a`, bit-exact — `homecore` is off the signal proof path).
 - `docs/adr/ADR-028-esp32-capability-audit.md` — witness chain pattern (Ed25519 per state transition)
@@ -190,6 +190,23 @@ This is the same Wasmtime host already used for integration plugins (ADR-128)

 ---

+## 8a. Security review (beyond-SOTA sweep, post ADR-154–159)
+
+A focused security review of `homecore-automation` (the execution/eval surface — triggers → conditions → actions, with templates) was run after the ADR-154–159 sweep, applying the same rigor that the sibling engine/bfld/calibration/vitals/geo reviews used. **Two real DoS findings, each pinned by a fails-on-old test; the condition-bypass, fail-closed-parsing, and action-authorization dimensions were probed and found clean.**
+
+- **HC-SEC-01 (template-injection / unbounded-expansion DoS, HIGH) — FIXED.** A `template:` condition / `value_template` is user automation config, and was rendered with MiniJinja's defaults: **no instruction budget, no output cap**. A single condition such as `{% for i in range(5000) %}{% for j in range(5000) %}xxxx{% endfor %}{% endfor %}` rendered a **100 MB string over ~11 s on one render call** (measured) — a CPU/memory denial of service (the bfld-class "unbounded expansion"; MiniJinja's per-call `range()` 10k cap does **not** stop nested loops). **Fix:** enable MiniJinja's `fuel` feature and set a per-render budget (`set_fuel(Some(1_000_000))`) so a nested loop burns one unit per iteration — the attack now fails fast (~90 ms) with "engine ran out of fuel"; plus a 64 KiB source-length cap rejecting pathological sources before compilation. Legitimate HA templates (a few dozen instructions) are unaffected. Pinned by `nested_loop_template_is_bounded_not_unbounded_dos`, `single_huge_repeat_template_is_bounded`, `oversized_template_source_is_rejected` (all fail-on-old: unbounded render / no rejection), and `legitimate_template_still_renders_within_fuel` (no regression).
+- **HC-SEC-02 (panic-on-config DoS, MEDIUM) — FIXED.** `Action::Delay { seconds }` and `Action::WaitForTrigger { timeout_seconds }` fed the user-supplied float straight into `Duration::from_secs_f64`, which **panics** on negative, NaN, infinite, or overflowing inputs — all reachable from a crafted (or typo'd) YAML (`delay: {seconds: -1}`, `.nan`, `.inf`, `1e308`). One hostile config aborts the spawned automation run task with a panic (measured: "cannot convert float seconds to Duration: value is negative"). **Fix:** a `safe_duration_from_secs` guard that saturates instead of panicking (NaN/±inf/negative → `Duration::ZERO`, matching HA's lenient "non-positive delay = no delay"; absurdly large → clamped to ~100 years). Pinned by `delay_negative_seconds_does_not_panic`, `delay_nan_seconds_does_not_panic`, `delay_infinite_seconds_does_not_panic`, `wait_for_trigger_negative_timeout_does_not_panic`, `safe_duration_saturates_hostile_values` (incl. overflow clamp).
+
+**Dimensions confirmed clean (with evidence):**
+- **Condition bypass / fail-closed eval** — a `Condition::Template` whose render returns an error evaluates to `false` (`condition.rs` `Err(_) => false`), and a `Choose` branch condition that fails to deserialize is treated as **non-matching** (the branch is skipped), not silently passing (`action.rs` `ChoiceBranch::matches` `Err(_) => return false`). Both fail **closed** (do-not-run), confirmed by the existing `choose_*` tests and the template-false-blocks-action behavioral test. No true-by-default-on-parse-error path was found.
+- **Re-entrancy / livelock (DoS)** — run-mode machinery is bounded and tested: `Single`/`IgnoreFirst` re-entrancy guard, `Restart` cancel-and-replace, `Queued` FIFO serialization, and `max: N` semaphore cap (ADR-162; `restart_mode_cancels_prior_run`, `queued_mode_runs_sequentially_not_concurrently`, `max_two_caps_concurrency_at_two`, `single_mode_does_not_double_fire_on_rapid_triggers`). A self-triggering automation does not livelock the engine — each fire is bounded by its run-mode.
+- **Action authorization** — templates are read-only sandboxed (`states`/`state_attr`/`is_state`/`now` globals; no service-call or state-set global is exposed to template scope), so a template cannot escalate into an action. Service authorization itself is enforced at the `homecore` service-registry boundary (out of this crate's scope); no gap found in what the automation crate enforces.
+- **Panic-on-config (parse)** — `serde_yaml`/`serde_json` deserialization returns structured `AutomationError` (no `unwrap`/`expect`/index reachable from a crafted config in the eval/exec path); the only remaining panic surface was the `from_secs_f64` path fixed as HC-SEC-02.
+
+Validation: `cargo test -p homecore-automation --no-default-features` → 54 passed / 0 failed (+14 over baseline). Python deterministic proof unchanged (homecore-automation is off the signal-processing proof path).
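+
+The HC-SEC-02 guard can be sketched as follows. This is a minimal illustration of the saturating behaviour described above, not the crate's actual code: the function name comes from the finding, but the signature and the exact `MAX_DELAY_SECS` constant are assumptions (the review only says "~100 years").
+
+```rust
+use std::time::Duration;
+
+/// Upper clamp of roughly 100 years. The exact constant is an assumption.
+const MAX_DELAY_SECS: f64 = 100.0 * 365.0 * 24.0 * 3600.0;
+
+/// Converts user-supplied seconds to a `Duration` without panicking.
+fn safe_duration_from_secs(seconds: f64) -> Duration {
+    // NaN, +inf, -inf, zero, and negatives all mean "no delay".
+    if !seconds.is_finite() || seconds <= 0.0 {
+        return Duration::ZERO;
+    }
+    // Finite and positive: clamp before the conversion that could panic.
+    Duration::from_secs_f64(seconds.min(MAX_DELAY_SECS))
+}
+```
+
+Checking `is_finite()` first matters because every comparison against NaN is false, so a plain `seconds <= 0.0` test alone would let NaN through to `Duration::from_secs_f64`.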
+
+---
+
 ## 9. References

 ### HA upstream
@@ -0,0 +1,444 @@
+# ADR-131: HOMECORE-UI — Operational dashboard for the two-tier Cognitum stack
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted — UI implemented (§10); full backend wiring specified (§11–§12) |
+| **Date** | 2026-06-14 |
+| **Deciders** | ruv |
+| **Codename** | **HOMECORE-UI** — first-class operator dashboard inside the Cognitum Appliance shell |
+| **Relates to** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (HOMECORE master), [ADR-127](ADR-127-homecore-state-machine-rust.md) (HOMECORE-CORE state machine), [ADR-128](ADR-128-homecore-integration-plugin-system.md) (HOMECORE-PLUGINS), [ADR-129](ADR-129-homecore-automation-engine.md) (automation engine), [ADR-130](ADR-130-homecore-rest-websocket-api.md) (HOMECORE-API), [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) (recorder/semantic search), [ADR-151](ADR-151-room-calibration-specialist-training.md) (room calibration HTTP API), [ADR-100](ADR-100-cog-packaging-specification.md) (Cog packaging), [ADR-116](ADR-116-cog-ha-matter-seed.md) (cog-ha-matter), [ADR-069](ADR-069-cognitum-seed-csi-pipeline.md) (SEED RVF ingest), [ADR-105](ADR-105-federated-csi-training.md) (federated CSI training) |
+| **Tracking issue** | TBD |
+| **Parent** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (sub-ADR, HOMECORE-127…134 family) |
+
+---
+
+## 1. Context
+
+HOMECORE (ADR-126 through ADR-134) is the native Rust + WASM + TypeScript port of Home Assistant running as the hub on the Cognitum v0 Appliance. As of P2, the state machine ([ADR-127](ADR-127-homecore-state-machine-rust.md)), API ([ADR-130](ADR-130-homecore-rest-websocket-api.md)), and COG runtime ([ADR-128](ADR-128-homecore-integration-plugin-system.md)) are in place. What is missing is a first-class dashboard UI that operators, integrators, and residents can use to manage the full two-tier hardware stack that HOMECORE coordinates.
+
+### 1.1 The two-tier hardware model this UI must represent
+
+This is the most important architectural constraint the UI must carry through every panel:
+
+- **Cognitum SEED** — a Pi Zero 2 W-based edge node. It has its own RVF vector store (8-dim, content-addressed, with kNN queries), Ed25519 witness chain, SHA-256 ingest audit trail, onboard environmental sensors (BME280 temperature/humidity/pressure, PIR motion, reed switch, ADS1115 4-channel ADC, vibration), 13 drift detectors, an MCP proxy (114 tools, JSON-RPC 2.0, default-deny policy), 98 HTTPS API endpoints, and epoch-based swarm sync for multi-SEED deployments. SEEDs sit close to the ESP32 sensing nodes and receive feature vectors from them at 1 Hz. Multiple SEEDs can form a peer mesh. **This is the sensing and memory tier.**
+- **Cognitum v0 Appliance** — a Pi 5 + Hailo-10H hub, running at `:9000`. It hosts the COG runtime (`/var/lib/cognitum/apps/`), the HOMECORE state machine and event bus, the calibration service, `ruview-mcp-brain:9876`, `cognitum-rvf-agent:9004`, `ruvector-hailo-worker:50051`, and acts as the fleet coordinator for multi-room correlation and federated training. The Appliance is where HOMECORE runs, and it is what the dashboard user is sitting in front of. **This is the computation and orchestration tier.**
+
+SEEDs are **subordinate nodes that the Appliance supervises** — they are not peers. The UI navigation hierarchy must reflect this: the Appliance is the root, SEEDs are children, ESP32 nodes are leaves.
+
+### 1.2 What the UI is not
+
+HOMECORE-UI is **not** a re-skin of the existing Cognitum Cog Store. It is a full operational dashboard that **extends** the Cognitum platform's shell — the Cog Store, API Explorer, and Guide already exist and must remain intact, with the HOMECORE dashboard added as a first-class navigation section alongside them.
+
+---
+
+## 2. Decision
+
+Build HOMECORE-UI as a **complete** TypeScript + Rust→WASM frontend (per this ADR's §3 and the HOMECORE-127…134 family) that:
+
+1. Lives at `http://cognitum-v0:9000/homecore` (or as a dedicated nav item in the Cognitum Appliance shell).
+2. Is visually and stylistically seamless with the existing Cognitum platform — same dark theme, same design tokens, same component patterns as `https://seed.cognitum.one/store`.
+3. Drives the HOMECORE REST + WebSocket API ([ADR-130](ADR-130-homecore-rest-websocket-api.md)) and the calibration HTTP API ([ADR-151](ADR-151-room-calibration-specialist-training.md)) for all data.
+4. Updates in real time via the homecore `subscribe_events` WebSocket channel. **The UI must never poll for entity state.**
+
+**This is a decision to deliver the complete operational dashboard — every panel in §4.1 through §4.10, every navigation section in §5, fully wired to live data — not a design-system scaffold or a partial first cut.** A static layout shell with placeholder data is explicitly **out of scope as a deliverable**: the design system (§3) is a means to the complete UI, not an end in itself. The acceptance bar for this ADR is that an operator can drive the full two-tier stack — fleet, entities, rooms, COGs, calibration, events, audit, and settings — from the dashboard, against real APIs, with no panel left as a stub.
+
+### 2.1 `homecore-server` is the single backend-for-frontend (BFF) gateway
+
+The data the dashboard needs is spread across **three backend tiers that are not one process**: (a) `homecore-api` (`/api/*` REST + `/api/websocket`, mounted in `homecore-server`); (b) the **calibration API** (`/api/v1/*`, served by a *separate* binary — `wifi-densepose calibrate-serve` / `wifi-densepose-sensing-server`); and (c) the **SEED device tier + appliance daemons** (RVF vector store, witness chain, onboard sensors, reflex rules, COG supervisor, federation), which are physically separate HTTPS services on the SEED nodes and the appliance.
+
+The browser must talk to **exactly one origin.** Therefore `homecore-server` is promoted to the **single BFF / API gateway** for HOMECORE-UI: it serves the static assets at `/homecore`, serves `homecore-api` at `/api/*`, and **adds a new `/api/homecore/*` namespace** that proxies and aggregates the calibration API and the SEED/appliance tiers server-side. The UI only ever issues same-origin requests; cross-service auth (SEED bearer tokens, calibration tokens) is held by the gateway and **never exposed to the browser**. This collapses the CORS/multi-port problem and gives one place to enforce the long-lived-access-token auth (§4.10).
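+
+The single-origin routing described above could be sketched like this. It is a hedged illustration of prefix dispatch only: `Upstream` and `route` are hypothetical names, and the real gateway would mount these paths in its HTTP router rather than match strings by hand.
+
+```rust
+/// Which backend tier the gateway would hand a request to.
+#[derive(Debug, PartialEq)]
+enum Upstream {
+    /// Dashboard bundle served at /homecore.
+    StaticAssets,
+    /// /api/homecore/*: calibration and SEED/appliance tiers, proxied server-side.
+    Aggregated,
+    /// /api/* REST and /api/websocket from homecore-api.
+    HomecoreApi,
+    NotFound,
+}
+
+fn route(path: &str) -> Upstream {
+    // Most specific prefix first: /api/homecore/ must win over /api/.
+    if path == "/api/homecore" || path.starts_with("/api/homecore/") {
+        Upstream::Aggregated
+    } else if path.starts_with("/api/") {
+        Upstream::HomecoreApi
+    } else if path == "/homecore" || path.starts_with("/homecore/") {
+        Upstream::StaticAssets
+    } else {
+        Upstream::NotFound
+    }
+}
+```
+
+Because only the `Aggregated` arm ever talks to the SEED and calibration services, that arm is the one place where their bearer tokens would need to be attached, which is what keeps them out of the browser.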
+
+### 2.2 No mock data in production
+
+The in-browser mock layer that the first UI cut shipped behind DEMO banners (§7.1, prior revision) is **demoted to a dev-only fixture** gated behind an explicit `?demo=1` / `HOMECORE_UI_DEMO=1` flag. The production build wires **every** panel to a real gateway endpoint. The full endpoint contract and the backend work each panel needs are specified in **§11**; the staged path to get there is **§12**. A panel may show an empty/typed-error state when its upstream is down, but it must never silently render fabricated data.
+
+---
+
+## 3. Design system — Cognitum platform conventions
+
+The implementor **must study `https://seed.cognitum.one/store` as the definitive design reference before writing a single line of CSS.** The existing platform's design tokens, extracted from production, are:
+
+### 3.1 Colour palette (CSS custom properties)
+
+| Token | Value | Role |
+|---|---|---|
+| `--bg` | `#0a0e1a` | page background (very dark navy) |
+| `--bg2` | `#111627` | secondary background / nav strip |
+| `--card` | `#171d30` | card / panel surface |
+| `--card-h` | `#1e2540` | card hover state |
+| `--border` | `#252d45` | all border strokes (≈0.67px, subtle) |
+| `--t1` | `#e0e4f0` | primary text (near-white) |
+| `--t2` | `#8890a8` | secondary / muted text |
+| `--t3` | `#505872` | tertiary / disabled text |
+| `--cyan` | `#4ecdc4` | primary action colour (Install buttons, live indicators, accents) |
+| `--cyan-d` | `rgba(78,205,196,0.15)` | cyan tint background for status badges |
+| `--green` | `#6bcb77` | success / online / healthy states |
+| `--green-d` | `rgba(107,203,119,0.15)` | green tint background |
+| `--amber` | `#d4a574` | warning / stale / degraded states |
+| `--amber-d` | `rgba(212,165,116,0.15)` | amber tint background |
+| `--red` | `#e06060` | error / offline / veto states |
+| `--red-d` | `rgba(224,96,96,0.15)` | red tint background |
+| `--purple` | `#a78bfa` | informational / epoch / chain indicators |
+| `--purple-d` | `rgba(167,139,250,0.15)` | purple tint background |
+| `--r` | `10px` | standard border radius on all cards and panels |
+
+### 3.2 Typography
+
+- `--font`: `'Segoe UI', system-ui, -apple-system, sans-serif` — all body and heading text.
+- `--mono`: `'Cascadia Code', 'Fira Code', Consolas, monospace` — all entity IDs, API endpoints, hex values, JSON payloads, COG binary hashes.
+
+### 3.3 Component patterns (from the live Cog Store and API Explorer)
+
+- **Cards**: `background: var(--card)`, `border: 0.67px solid var(--border)`, `border-radius: var(--r)`, `padding: 24px`.
+- **Category pills / status badges**: small `border-radius: 4–6px`, uppercase text, coloured background tint (e.g. `background: var(--cyan-d); color: var(--cyan)` for `RUNNING`; `background: var(--amber-d); color: var(--amber)` for `STALE`).
+- **Primary action buttons**: `background: var(--cyan)`, `color: var(--bg)`, no border — matching the existing "Install" button style exactly.
+- **Secondary / ghost buttons**: transparent background, `border: 1px solid var(--border)`, `color: var(--t1)` — matching the existing "Details" button style.
+- **Nav strip**: `background: var(--bg2)`, text items in `--t2`, active item highlighted in `--cyan` with a bottom underline.
+- **Featured card gradient borders**: top-edge linear gradient from `var(--cyan)` to `var(--purple)` — replicate for HOMECORE section headers.
+- **Live metric cards** (API Explorer status page): icon + large numeric value in `--cyan` or `--green`, label in `--t2` below, on a `var(--card)` background.
+- **Method badge pills** on the API Explorer (`GET` in green, `POST` in amber, `AUTH` in purple) — reuse this same pill system for COG status indicators.
+
+The implementor **must not introduce new colours, typefaces, or border radii.** Every component should feel like it was built by the same team that built the Cog Store and the API Explorer. A user navigating from the Cog Store into the HOMECORE dashboard should not notice a visual seam.
+
+---
+
+## 4. UI sections — required panels
+
+### 4.1 System Dashboard (the "home screen")
+
+The always-visible overview panel, modelled on the API Explorer's live metric cards. All values update in real time.
+
+- **v0 Appliance health strip** — reuse the exact metric-card pattern from `seed.cognitum.one/status`: one card each for CPU %, RAM usage, Hailo-10H inference load (% utilisation), Hailo temperature, uptime, and the running services (`ruview-mcp-brain:9876`, `cognitum-rvf-agent:9004`, `ruvector-hailo-worker:50051`). Values in `--cyan`, labels in `--t2`. This strip is always at the top — it represents the machine the user is looking at.
+- **SEED Fleet overview** — a grid of SEED node cards (one per paired SEED) on the `var(--card)` surface with `var(--border)`. Each card shows: online/offline status pill (green/red), firmware version, epoch number, current vector count, last ingest timestamp, and witness-chain validity badge. A collapsed row shows the SEED's 5 onboard sensors in summary (PIR: yes/no, door: open/closed, temperature from BME280). Offline SEEDs render the entire card with a `--red-d` background tint. Clicking a SEED card navigates to the SEED Detail view (§4.2).
+- **ESP32 Node summary** — count of active ESP32 nodes per SEED, current frame rate (target: 100 Hz CSI + 1 Hz feature vectors), and a compact warning list for nodes with known issues (presence_score normalisation anomaly, stale firmware version).
+- **COG Runtime status row** — a horizontal strip of status pills for each installed COG on the v0 Appliance. Pill colours follow the existing badge convention: `--green-d`/`--green` for running, `--red-d`/`--red` for failed, `--t3`/`--t2` for stopped. COG name in `--mono`. Clicking a pill navigates to COG Management (§4.6).
+- **Event Bus activity indicator** — a small real-time sparkline showing the homecore broadcast channel event rate (events/sec). Indicate channel lag if a subscriber is falling behind the 4,096-event capacity.
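+
+The COG status pill convention above maps each status to a background token and a text token. A minimal sketch, assuming a hypothetical `CogStatus` enum and helper names (the token names are the ones from §3.1):
+
+```rust
+#[derive(Debug, Clone, Copy, PartialEq)]
+enum CogStatus {
+    Running,
+    Failed,
+    Stopped,
+}
+
+/// Returns (background token, text token) as CSS custom-property names.
+fn pill_tokens(status: CogStatus) -> (&'static str, &'static str) {
+    match status {
+        CogStatus::Running => ("--green-d", "--green"),
+        CogStatus::Failed => ("--red-d", "--red"),
+        CogStatus::Stopped => ("--t3", "--t2"),
+    }
+}
+
+/// Inline style for one pill, built only from existing design tokens.
+fn pill_style(status: CogStatus) -> String {
+    let (bg, fg) = pill_tokens(status);
+    format!("background: var({bg}); color: var({fg})")
+}
+```
+
+Keeping the mapping in one exhaustive `match` means a new status cannot be added without the compiler asking for its colours, which helps enforce the "no new colours" rule in §3.3.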
+
+### 4.2 SEED Detail View (per-SEED drill-down)
+
+Accessible from the fleet grid. Full-page panel for a single SEED node, using the card + section-header pattern from the Cog Store's detail views.
+
+- **SEED identity header** — `device_id` in `--mono`, firmware version, paired status in green, USB vs WiFi connection mode. A section-header gradient border (cyan → purple, matching the featured card style) visually separates this from Appliance content.
+- **Vector Store panel** — current vector count, dimension (8), last kNN query latency, current epoch number, a small sparkline of ingest rate over the last hour, and a storage budget bar showing usage against the 100K working-set target. A "Compact now" button (`POST /api/v1/store/compact`) in ghost style. When usage exceeds 80%, the bar renders in `--amber`.
+- **Witness Chain panel** — chain length (SHA-256 entries), last verification timestamp, a one-click "Verify chain" button (`POST /api/v1/witness/verify`), and an "Export attestation bundle" button for regulated deployments. The Ed25519 custody attestation (device-bound keypair, epoch + vector count + witness head) renders here. Chain length in `--purple`, following the existing epoch/chain colour convention.
+- **Onboard Sensors panel** — live readings from all 5 sensors in individual sub-cards: BME280 (temperature °C, humidity %, pressure hPa), PIR (motion boolean with last-triggered timestamp), reed switch (open/closed with last-changed timestamp), ADS1115 (4 analog channels with configurable labels), vibration (boolean with last-triggered). These are ground-truth validators against CSI readings and are critical for diagnosing false positives in the mixture-of-specialists. Sensor values in `--cyan`; sensor names in `--t2`.
+- **Reflex Rules panel** — the 3 pre-configured rules with current state: `fragility_alarm` (threshold 0.3 → relay actuator), `drift_cutoff` (threshold 1.0), `hd_anomaly_indicator` (threshold 200 → PWM brightness). Show last-fired time for each. The `fragility_alarm` threshold is the most commonly adjusted field and should be editable inline. Rules that have recently fired render with a `--amber-d` background tint.
+- **Cognitive Analysis panel** — boundary fragility score (0.0–1.0, from Stoer-Wagner min-cut on the kNN graph) rendered as a progress bar: green below 0.3, amber 0.3–0.6, red above 0.6. High fragility (>0.3) indicates a regime change in the environment and should be visually prominent. Temporal coherence phase boundaries shown as a labelled timeline of detected environment state transitions. kNN graph rebuild cadence indicator (every 10 s).
+- **Ingest pipeline status** — which ESP32 nodes feed this SEED, the packet type each is sending (`0xC5110003` native feature vectors vs `0xC5110002` vitals fallback path — distinguished visually since native is preferred), current ingest batch size, flush interval, and bridge path topology (direct vs host-laptop hop). The bridge-hop warning (known architectural limitation) renders in `--amber` since it adds a network hop.
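+
+The two colour thresholds in this view (fragility bar and storage budget bar) can be expressed as small pure functions. This is a sketch under stated assumptions: the function names are hypothetical, and "100K" is read here as 100,000 vectors.
+
+```rust
+/// Token for the boundary-fragility bar: green below 0.3, amber 0.3 to 0.6,
+/// red above 0.6. NaN fails both comparisons and falls through to red, so a
+/// bad reading is never shown as healthy.
+fn fragility_token(score: f64) -> &'static str {
+    if score < 0.3 {
+        "--green"
+    } else if score <= 0.6 {
+        "--amber"
+    } else {
+        "--red"
+    }
+}
+
+/// Working-set target, assuming "100K" means 100,000 vectors.
+const WORKING_SET_TARGET: u64 = 100_000;
+
+/// True when usage exceeds 80% of the target (integer math, no rounding).
+fn storage_bar_is_amber(vector_count: u64) -> bool {
+    vector_count * 100 > WORKING_SET_TARGET * 80
+}
+```
+
+Exactly 0.3 is treated as amber here, matching the "amber 0.3–0.6" band; whether the `fragility_alarm` reflex itself fires at or above its threshold is not specified in this ADR.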
+
+### 4.3 SEED Fleet Map (multi-SEED topology)
+
+For deployments with more than one SEED, a topology view showing the mesh:
+
+- **Node hierarchy diagram** — v0 Appliance at root, SEEDs as second tier (grouped by room/zone), ESP32 nodes as leaves under each SEED. Lines represent active data flows. ESP-NOW mesh sync links between SEEDs shown as dashed lines. Connection health shown via line colour (green/amber/red). All labels in `--mono`.
+- **Cross-SEED event deduplication indicator** — for events that span multiple SEEDs (one fall detected by two rooms; one occupant tracked through room A → hallway → room B), show a fusion badge indicating how many SEEDs contributed to the composite event.
+- **Federation config** ([ADR-105](ADR-105-federated-csi-training.md)) — federated-learning round coordinator role (which SEED is the round coordinator), current round number, K healthy nodes selected, delta exchange status. **Model deltas only — never raw CSI** is a design invariant that must be labelled explicitly in the UI.
+
+### 4.4 Entity & State Browser
+
+The homecore state machine (`DashMap<EntityId, Arc<State>>`) is the authoritative source of truth. Every COG running on the v0 Appliance contributes entities.
+
+- **Entity list by domain** — grouped by the `domain.` prefix of `EntityId`, using collapsible section headers. The 21 entities per ESP32 node (11 raw + 10 semantic primitives from `cog-ha-matter`) are the most important set. For each entity: current state string (in `--t1`), last-changed timestamp (in `--t3`), attribute map as collapsible JSON in `--mono`, and the Context (`user_id` + `parent_id` causality chain, critical for care/audit deployments). Entity IDs always in `--mono`.
+- **SEED provenance badge** — each entity carries a small badge showing its data lineage: which ESP32 node → which SEED → which COG → homecore state machine. This trace is invaluable for debugging false positives and is a **first-class UI element, not a collapsed detail.**
+- **Domain filter + semantic search** — filter by domain prefix and, once [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) (homecore-recorder) lands, ruvector-backed semantic search: "when did the living room anomaly score last correlate with a door-open event?" A keyword filter across entity IDs and attribute keys ships in the initial release regardless of [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) status, given entity density; the semantic search layers on top once the recorder lands.
+- **Real-time WebSocket feed** — entity states update live via the homecore `subscribe_events` WebSocket command ([ADR-130](ADR-130-homecore-rest-websocket-api.md)). The UI must never poll. Show a broadcast-channel lag indicator; warn visually if the subscriber is falling behind the 4,096-event channel capacity.
+- **StateChanged detail panel** — clicking any entity opens a slide-over panel showing the full `StateChangedEvent`: `old_state`, `new_state`, `context.id`, `context.user_id`, and the `context.parent_id` chain rendered as a breadcrumb trail.
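+
+Grouping the entity list by the `domain.` prefix is a small transformation. A minimal sketch, assuming entity IDs arrive as plain strings and using a hypothetical `group_by_domain` helper; a `BTreeMap` is chosen here only so the section headers come out in a stable alphabetical order.
+
+```rust
+use std::collections::BTreeMap;
+
+/// Groups entity IDs by the text before the first '.'.
+/// An ID without a '.' is filed under itself rather than dropped.
+fn group_by_domain<'a>(entity_ids: &[&'a str]) -> BTreeMap<&'a str, Vec<&'a str>> {
+    let mut groups: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
+    for &id in entity_ids {
+        let domain = id.split_once('.').map_or(id, |(domain, _)| domain);
+        groups.entry(domain).or_default().push(id);
+    }
+    groups
+}
+```
+
+The entity IDs used to exercise this (for example `sensor.kitchen_presence`) are made up for illustration; the real set is whatever the COGs register in the state machine.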
+
+### 4.5 RoomState / Sensing Panel
+
+Surfaces the mixture-of-specialists output from the calibration service — the highest-level per-room sensing result. Data comes from `GET /api/v1/room/state?bank=<room_id>` on the v0 Appliance.
+
+- **Per-room cards** — one card per `room_id` on the `var(--card)` surface. Each card shows live `RoomState` JSON fields as sub-rows: presence (occupied/absent chip in green/red with confidence bar), posture (standing/sitting/lying chip with confidence), breathing BPM (numeric in `--cyan` with range indicator 6–30), heart rate BPM (numeric in `--cyan` with range indicator 40–120), restlessness score (0–1 progress bar), and anomaly score (0–1 with normal/anomalous label, bar turns red above a configurable threshold).
+- **STALE warning** — when `stale: true` (the specialist bank was trained against a different baseline), render the entire room card with a `--amber-d` background tint and a prominent amber banner reading "Bank stale — baseline has changed" with a direct "Recalibrate room" link into the calibration wizard (§4.7). This is the most common real-world failure mode and **must never be subtle.**
+- **VETO indicator** — when `vetoed: true` (anomaly veto suppressed vitals/posture because the window was physically implausible), render the affected specialist slots in `--red` with a "Veto active" label. Values suppressed by veto **must not render as zeros** — they must render as explicitly withheld.
+- **Null specialist placeholders** — specialists not yet trained (`null` in the specialist bank) render as "Not trained" placeholders in `--t3` with a small "Calibrate to enable" prompt in ghost style. They are **not** errors.
+- **Confidence bars** — each specialist output has a confidence float, shown as a small inline bar (`--cyan` fill) next to the reading. Low confidence (< 0.4) renders the bar in `--amber`.
+- **Multi-SEED fusion indicator** — for rooms served by multiple SEEDs, show a small badge indicating how many SEED nodes contributed to the `MultiNodeMixture` for this room's reading.
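+
+The rendering rules above (untrained is not an error, vetoed is not zero, low confidence turns amber) fit naturally into one enum. The sketch below is illustrative: `SlotView` and `view_slot` are hypothetical names, and modelling a specialist reading as an `Option<(value, confidence)>` is an assumption about the `RoomState` shape, not its actual schema.
+
+```rust
+/// What one specialist slot on a room card should display.
+#[derive(Debug, PartialEq)]
+enum SlotView {
+    /// Specialist is `null` in the bank: "Not trained" placeholder, not an error.
+    NotTrained,
+    /// Veto suppressed the value: shown as withheld, never as zero.
+    Withheld,
+    /// Normal reading plus the token for its confidence bar.
+    Reading { value: f64, bar_token: &'static str },
+}
+
+/// `reading` is `(value, confidence)`, or `None` for an untrained specialist.
+fn view_slot(reading: Option<(f64, f64)>, vetoed: bool) -> SlotView {
+    match reading {
+        None => SlotView::NotTrained,
+        Some(_) if vetoed => SlotView::Withheld,
+        Some((value, confidence)) => SlotView::Reading {
+            value,
+            bar_token: if confidence < 0.4 { "--amber" } else { "--cyan" },
+        },
+    }
+}
+```
+
+Because `Withheld` carries no number at all, the view layer has nothing it could accidentally print as `0`.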
+
+### 4.6 v0 Appliance COG Management
+
+The v0 Appliance hosts COGs at `/var/lib/cognitum/apps/`. This panel is the operational companion to the existing Cog Store (`seed.cognitum.one/store`). It must match the Cog Store's visual conventions precisely — same card layout, same category pills, same install/detail button pair — because operators will move between the two surfaces.
+
+- **Installed COGs list** — for each COG: `id` and `version` in `--mono`, architecture badge (`arm`/`hailo10` etc., category-pill pattern), status pill (running/stopped/failed/updating in green/grey/red/amber), `binary_sha256` verified badge (Ed25519 signature verification shown as a shield icon in `--green` or `--red`), and PID from the pid file. Actions: start, stop, restart (ghost style), and view `output.log` / `error.log` in a monospace drawer using `--mono`. Edit `config.json` inline with syntax highlighting.
+- **COG Store / App Registry** — browsable `app-registry.json` listing. This panel should visually mirror `seed.cognitum.one/store` as closely as possible — same featured-card hero layout, same icon + title + description + category pill + action button structure. One-click install downloads the binary from GCS, verifies `binary_sha256` + `binary_signature`, writes the manifest, and starts the COG. Show which new homecore entities will appear in the state machine after install, as a preview list before confirming.
+- **OTA Updates** — a badge count on installed COGs with available updates, matching the "Installed (N)" tab badge convention from the existing Cog Store. Show a diff panel (version change, new entities, config schema changes) before confirming the update.
+- **Hailo HEF status** — for COGs with `arch: hailo10`: loaded HEF files on the Hailo-10H, current inference throughput, and `ruvector-hailo-worker:50051` connection status. The RF Foundation Encoder ([ADR-150](ADR-150-rf-foundation-encoder.md)) and neural pose head display here once available.
+
+### 4.7 Calibration Wizard
+
+The full baseline → enroll → train → verify pipeline runs via HTTP against the v0 Appliance ([ADR-151](ADR-151-room-calibration-specialist-training.md)). This is a multi-step guided flow — not a raw API panel. Use a stepped wizard layout with a progress indicator at the top (steps 1–5 as numbered pills, active step in `--cyan`, completed in `--green`, pending in `--t3`).
+
+- **Step 1 — Select room and SEED** — enter a `room_id` name (validated against `[A-Za-z0-9_-]{1,64}`) and select which SEED(s) and ESP32 nodes serve this room from a dropdown populated from the live fleet. Show current CSI ingest health for the selected nodes inline. If frames are not arriving at the expected rate, display an amber warning **before** allowing the operator to proceed, because a broken ingest pipeline otherwise causes calibration to fail silently.
+- **Step 2 — Baseline capture** — `POST /api/v1/calibration/start`. A large full-width animated progress bar (cyan fill) reads from `GET /api/v1/calibration/status`: frames recorded vs target, ETA in seconds, `z_median` value. If `motion_flagged` is true, overlay an amber banner: "Room must be empty — movement detected." The baseline UUID produced here is the anchor for all future STALE detection for this room — display it in `--mono` once complete so operators can record it.
+- **Step 3 — Anchor enrollment** — the 8 anchor labels in enforced order: `empty`, `stand_still`, `sit`, `lie_down`, `breathe_slow`, `breathe_normal`, `small_move`, `sleep_posture`. For each: a human-readable instruction with an illustration, a countdown timer rendered as a circular progress ring in `--cyan`, and an immediate quality-gate result (accepted in green, retry in amber with a reason string). Drive via `POST /api/v1/enroll/anchor` + `GET /api/v1/enroll/status`. After each accepted anchor, show the extracted feature values (mean, variance, breathing_score, heart_score) in a small `--mono` data row so operators can sanity-check the capture. Show overall progress as "N / 8 anchors accepted."
+- **Step 4 — Train** — a single `POST /api/v1/room/train` call. Show the 6 specialist results as a checklist: presence (threshold + occupied_var), posture (prototype count), breathing (min_score), heartbeat (min_score), restlessness (calm/active motion values), anomaly (prototype count + scale). Specialists that returned non-null render in `--green`. Null specialists (insufficient anchor data) render in `--amber` with a "Re-enroll missing anchors" prompt linking back to Step 3 for the specific missing labels.
+- **Step 5 — Verify live** — display the live `RoomState` for the just-trained room using the same per-room card layout as §4.5. Prompt the operator to stand in the room and verify presence is detected, try sitting/lying to confirm posture, and breathe normally to confirm vitals are in plausible range. A "Confirm and save" button (cyan, primary) closes the wizard; a "Something's wrong — re-enroll" button (ghost) loops back to Step 3.
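+
+Two of the wizard's client-side checks are easy to state in code: the `room_id` pattern from Step 1 and the enforced anchor order from Step 3. A minimal sketch without a regex dependency; the function names are hypothetical, and the server is still expected to validate independently.
+
+```rust
+/// Equivalent to matching the whole string against `[A-Za-z0-9_-]{1,64}`.
+/// Byte length is safe to use because any non-ASCII byte fails the check.
+fn is_valid_room_id(room_id: &str) -> bool {
+    (1..=64).contains(&room_id.len())
+        && room_id
+            .bytes()
+            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
+}
+
+/// The 8 anchor labels in their enforced enrollment order.
+const ANCHOR_ORDER: [&str; 8] = [
+    "empty",
+    "stand_still",
+    "sit",
+    "lie_down",
+    "breathe_slow",
+    "breathe_normal",
+    "small_move",
+    "sleep_posture",
+];
+
+/// Next label to enroll given how many anchors were accepted so far,
+/// or `None` once all 8 are done.
+fn next_anchor(accepted: usize) -> Option<&'static str> {
+    ANCHOR_ORDER.get(accepted).copied()
+}
+```
+
+The same `accepted` count can drive the "N / 8 anchors accepted" progress text.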
+
+### 4.8 Event Bus & Automation Feed
+
+- **Live event stream panel** — a virtualized scrolling list of `SystemEvent` variants (`StateChanged`, `EntityRegistered`, `ConfigReloaded`) and notable `DomainEvent`s from the homecore Tokio broadcast channel. Each row shows: event-type pill (coloured by variant), `entity_id` in `--mono`, old state → new state arrow, timestamp, and `context.user_id`. The stream is filterable by entity domain, event type, or source SEED/COG. The filter bar uses the same search-input style as the Cog Store's search field.
+- **Context causality breadcrumb** — expanding any event row shows the full Context chain (`context.id` → `parent_id` → `grandparent_id`) as a breadcrumb trail in `--mono`. This is how automation loops become visible without any separate debugging tool.
+- **Automation builder** ([ADR-129](ADR-129-homecore-automation-engine.md) scope) — a trigger → condition → action editor on the card surface. The most important RuView-specific trigger types to support are: `state_changed` on `RoomState` entities with a threshold expression (e.g. `anomaly.value > 0.8`), SEED reflex-rule firing events (`fragility_alarm`, `hd_anomaly_indicator`), and custom `domain_event` topics. Actions include calling services in the homecore service registry and firing domain events. The condition expression editor uses `--mono`.
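+
+The causality breadcrumb amounts to walking `parent_id` links until the chain ends. The sketch below assumes the UI has a lookup from a context ID to its parent's ID (a hypothetical `parents` map built from received events); the cycle guard is a defensive addition for malformed data, not something this ADR specifies.
+
+```rust
+use std::collections::{HashMap, HashSet};
+
+/// Returns the chain starting at `start` and following parent links:
+/// [id, parent_id, grandparent_id, ...].
+fn breadcrumb<'a>(start: &'a str, parents: &HashMap<&'a str, &'a str>) -> Vec<&'a str> {
+    let mut trail = Vec::new();
+    let mut seen = HashSet::new();
+    let mut current = Some(start);
+    while let Some(id) = current {
+        // Stop if an ID repeats, so a corrupt chain cannot loop forever.
+        if !seen.insert(id) {
+            break;
+        }
+        trail.push(id);
+        current = parents.get(id).copied();
+    }
+    trail
+}
+```
+
+A long trail, or one where the same automation's contexts keep reappearing, is the visual cue for an automation loop that the breadcrumb is meant to expose.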
+
+### 4.9 Witness / Audit Log
+
+- **Unified witness timeline** — a chronological merged view of events from both tiers: the SEED's SHA-256 ingest chain (every RVF store write attested) and homecore's Ed25519 state-transition chain (biometric crossings, BFLD identity-risk elevations). Each row: `entity_id` in `--mono`, old/new state, timestamp, source SEED `device_id`, signing key fingerprint (first 8 chars in `--mono`). Pagination uses the same "Showing X–Y of Z" convention from the Cog Store's cog grid.
+- **Privacy mode banner** — a persistent top-of-panel banner showing current privacy mode: `--green-d`/green text for full-publish mode; `--amber-d`/amber text for audit-only mode (SHA-256 digests on-SEED only, no MQTT state messages). Show the per-SEED privacy mode state, since SEEDs can be individually configured. Toggling privacy mode is a high-stakes action — require an explicit "Confirm" step with a summary of what will change.
+- **Export bundle** — an "Export attestation bundle" button (ghost) that packages the SEED witness chain + homecore Ed25519 chain as a downloadable archive for regulated-deployment (care home, hotel, shared office) compliance handoff.
+
+### 4.10 Settings & Integration Config
+
+- **SEED fleet management** — add, remove, and reprovision SEEDs. Show the USB-only pairing requirement prominently (the pairing window only opens via `169.254.42.1`, not WiFi — a security invariant). Per-SEED: `device_id` in `--mono`, firmware version, bearer token status, and a "Rotate token" action (ghost) that walks the operator through the secure token rotation flow.
+- **ESP32 node provisioning** — per-node NVS config display (target IP, target port, node_id), last-seen firmware version, and a link to the provisioning script. The `node_id` → room/zone assignment is editable here and persists to the room calibration system's `room_id` mapping.
+- **MQTT / cog-ha-matter config** ([ADR-116](ADR-116-cog-ha-matter-seed.md)) — broker URL, credentials (masked), MQTT topic prefix, mDNS advertisement status (`_ruview-ha._tcp`), and a live connection indicator (green dot for connected, red for unreachable). The 21 HA-DISCO entities per node are listed here with their `via_device` assignments showing which SEED they belong to in HA's device registry.
+- **Long-lived access tokens** — for homecore-api companion-app connections (HA 2025.1 wire-compat, [ADR-130](ADR-130-homecore-rest-websocket-api.md)). Token creation, last-used timestamp, and revocation. The HA companion-app pairing QR-code flow surfaces here.
+- **Federation config** — for multi-SEED deployments: ESP-NOW mesh sync status, cross-SEED epoch alignment values, and federated-learning round settings (coordinator SEED, round cadence, Krum aggregation parameters per [ADR-105](ADR-105-federated-csi-training.md)). The design invariant **"model deltas only, never raw CSI"** must be labelled explicitly in this panel.
+
+---
+
+## 5. Navigation structure
+
+HOMECORE-UI must integrate into the existing Cognitum Appliance nav shell. The top nav should read:
+
+```
+Framework | Guide | Cog Store | HOMECORE | Status
+```
+
+This inserts **HOMECORE** as a first-class nav item between the existing "Cog Store" and "Status" entries, using the same nav-item style (text in `--t2`, active state in `--cyan` with a bottom underline).
+
+Within the HOMECORE section, a left sidebar (or top sub-nav on narrow viewports) provides section navigation:
+
+```
+Dashboard | SEED Fleet | Entities | Rooms | COGs | Calibration | Events | Audit | Settings
+```
+
+The COG Store panel within HOMECORE (§4.6) links out to `seed.cognitum.one/store` for the full catalog view, ensuring the existing Cog Store remains the canonical browsing experience.
+
+---
+
+## 6. Key UX invariants
+
+These must be maintained across every panel:
+
+1. **Always make the tier origin of any data explicit.** A `RoomState` reading traces to an ESP32 node → SEED → COG → v0 Appliance state machine. The provenance badge (§4.4) must appear wherever entity states are displayed.
+2. **The `stale` and `vetoed` flags from `RoomState` and the kNN fragility score from SEED cognitive analysis are meaningful diagnostic signals** — they must never be silently hidden, styled grey-on-grey, or collapsed behind an expand toggle. They represent system health operators need to act on.
+3. **Values that are `null` because a specialist has not been trained must be visually distinct from values that are unavailable due to an error.** The distinction is operationally important: `null` means "calibrate to enable," unavailable means "investigate."
+4. **All entity IDs, hashes, API endpoints, binary signatures, device UUIDs, and JSON payloads must use `--mono` font.** This is already the convention in the API Explorer and must be consistent throughout HOMECORE-UI.
+5. **The v0 Appliance Hailo HAT is a separate subsystem from the SEED's edge compute.** Inference results tagged as Hailo-sourced (COGs with `arch: hailo10`) must be visually distinguished from results from CPU-only COGs (`arch: arm`) so operators can triage hardware-specific failures.
+
+---
+
+## 7. Scope — complete UI delivery
+
+The deliverable is the **entire** dashboard. Every panel below ships fully implemented and wired to its live data source — there is no scaffold-only milestone and no panel left as a placeholder. The table records each panel's authoritative backing API so the build can proceed in whatever order best fits the dependency graph; it is a dependency map, **not** a sequence of partial releases.
+
+| Panel | Section | Backing API / source |
+|---|---|---|
+| System Dashboard | §4.1 | [ADR-130](ADR-130-homecore-rest-websocket-api.md) WebSocket + appliance health endpoints |
+| SEED Detail View | §4.2 | SEED HTTPS API (vector store, witness, sensors, reflex, cognitive analysis) |
+| SEED Fleet Map | §4.3 | fleet topology + federation ([ADR-105](ADR-105-federated-csi-training.md)) |
+| Entity & State Browser | §4.4 | [ADR-127](ADR-127-homecore-state-machine-rust.md) state machine via [ADR-130](ADR-130-homecore-rest-websocket-api.md) `subscribe_events`; semantic search via [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) |
+| RoomState / Sensing | §4.5 | [ADR-151](ADR-151-room-calibration-specialist-training.md) `GET /api/v1/room/state` |
+| COG Management | §4.6 | [ADR-128](ADR-128-homecore-integration-plugin-system.md) plugin runtime + [ADR-100](ADR-100-cog-packaging-specification.md) app registry |
+| Calibration Wizard | §4.7 | [ADR-151](ADR-151-room-calibration-specialist-training.md) calibration HTTP API |
+| Event Bus & Automation | §4.8 | [ADR-130](ADR-130-homecore-rest-websocket-api.md) broadcast channel + [ADR-129](ADR-129-homecore-automation-engine.md) automation engine |
+| Witness / Audit Log | §4.9 | SEED SHA-256 ingest chain + homecore Ed25519 chain |
+| Settings & Integration | §4.10 | SEED provisioning, [ADR-116](ADR-116-cog-ha-matter-seed.md) MQTT/Matter, LLAT, federation |
+
+### 7.1 Build sequencing within the complete deliverable
+
+The complete UI depends on backing services that mature on their own timelines. Each panel is built against the **real gateway endpoint** defined in §11; where the upstream is not yet available the panel renders a typed empty/error state, **not** fabricated data (the dev-only `?demo=1` fixture of §2.2 exists for offline development only and is never the shipped behaviour). Concretely, the hard contract dependencies are: [ADR-130](ADR-130-homecore-rest-websocket-api.md) (REST + WebSocket), [ADR-127](ADR-127-homecore-state-machine-rust.md) (state machine), [ADR-151](ADR-151-room-calibration-specialist-training.md) (calibration), [ADR-128](ADR-128-homecore-integration-plugin-system.md) (plugin runtime), [ADR-129](ADR-129-homecore-automation-engine.md) (automation), [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) (event history + semantic search), [ADR-116](ADR-116-cog-ha-matter-seed.md) (SEED/Matter), [ADR-069](ADR-069-cognitum-seed-csi-pipeline.md) (SEED ingest), and [ADR-105](ADR-105-federated-csi-training.md) (federation). The keyword entity filter (§4.4) ships immediately; semantic search layers on once [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) lands. The exact panel→endpoint→upstream map and the new gateway code each panel requires are in §11; the staged delivery is in §12.
+
+---
+
+## 8. Consequences
+
+### 8.1 Positive
+
+- Operators, integrators, and residents get a single coherent surface for the full two-tier stack, replacing the need to SSH into SEEDs or hand-craft API calls.
+- The dashboard reuses the proven Cognitum design tokens and component patterns verbatim, so it ships visually consistent with no separate design effort and no perceptible seam between surfaces.
+- Diagnostic signals that today are invisible (`stale`/`vetoed` flags, kNN fragility, provenance lineage, channel lag) become first-class, surfacing the system's most common real-world failure modes directly to operators.
+
+### 8.2 Negative / risks
+
+- The UI hard-depends on the wire-compat guarantees of ADR-130 and the calibration contract of ADR-151; schema drift in either breaks panels silently. Integration tests against every backing contract in §7 are required.
+- Committing to the complete UI in one deliverable is a larger up-front effort and couples the UI's readiness to the maturity of multiple backing services (§7.1, §11). The mitigation is the BFF gateway (§2.1): each panel targets one same-origin endpoint, and the gateway absorbs upstream churn behind a stable contract.
+- Promoting `homecore-server` to a gateway means it now **proxies cross-tier traffic** (calibration API, SEED HTTPS, appliance daemons). This adds a network hop, a place for upstream timeouts/partial failures to surface, and a server-side store of SEED bearer tokens that must be protected (§11.10). Each proxied route needs an explicit timeout + typed error mapping so one slow SEED cannot stall the dashboard.
+- Several panels depend on data that only exists on **real hardware or new daemons** (SEED device tier, appliance host metrics, COG supervisor). Until those upstreams exist the corresponding gateway routes return `503 upstream_unavailable`; this is honest but means the dashboard is only as "live" as the tiers behind it (§11 classifies every endpoint by what it depends on).
+- Faithfully mirroring `seed.cognitum.one/store` couples HOMECORE-UI to the external Cog Store's evolving design; token drift there must be tracked and re-synced.
+- The two-tier mental model (Appliance root, SEED children, ESP32 leaves) must be enforced consistently; any panel that flattens or peers the tiers undermines the core architectural constraint.
+
+---
+
+## 9. References
+
+- `https://seed.cognitum.one/store` — primary design reference for all visual conventions.
+- `https://seed.cognitum.one/status` — reference for live metric-card layout.
+- [ADR-126](ADR-126-ruview-native-ha-port-master.md) — HOMECORE master ADR.
+- [ADR-127](ADR-127-homecore-state-machine-rust.md) — HOMECORE-CORE state machine and entity registry.
+- [ADR-128](ADR-128-homecore-integration-plugin-system.md) — HOMECORE-PLUGINS WASM COG substrate.
+- [ADR-129](ADR-129-homecore-automation-engine.md) — HOMECORE automation engine.
+- [ADR-130](ADR-130-homecore-rest-websocket-api.md) — HOMECORE-API REST + WebSocket wire-compat.
+- [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) — homecore-recorder, history + semantic search.
+- [ADR-100](ADR-100-cog-packaging-specification.md) — Cognitum Cog packaging specification (manifest.json, status values, on-device layout).
+- [ADR-116](ADR-116-cog-ha-matter-seed.md) — cog-ha-matter (SEED cog, HA-DISCO entity surface, mDNS).
+- [ADR-069](ADR-069-cognitum-seed-csi-pipeline.md) — ESP32 CSI → Cognitum SEED RVF ingest pipeline (SEED architecture detail).
+- [ADR-105](ADR-105-federated-csi-training.md) — Federated CSI training (multi-SEED federation).
+- [ADR-151](ADR-151-room-calibration-specialist-training.md) — Per-room calibration specialist training (calibration HTTP API).
+- `v2/crates/homecore/src/` — state machine, entity, event, registry source.
+- `docs/integration/calibration-appliance-integration.md` — calibration API contract and RoomState schema.
+
+---
+
+## 10. Implementation status
+
+HOMECORE-UI is implemented as a zero-dependency, no-build-step vanilla TS/JS + CSS frontend served by `homecore-server` at `/homecore` (the `rufield-viewer` "Axum + vanilla-JS" pattern). It is the complete deliverable per §2/§7: all ten panels are fully rendered and wired to live data where the backing service exists. Where it does not, the earlier cut fell back to a contract-conformant, DEMO-flagged mock layer; this revision replaces that production fallback with typed empty/error states (§7.1), as detailed under "Honest scope" below.
+
+**Location:** `v2/crates/homecore-server/ui/` — `css/tokens.css` (the §3.1 palette, verbatim) + `css/app.css` (§3.3 components); `js/{ui,api,ws,mock,app}.js` (shared helpers, REST client, `subscribe_events` WS client, mock layer, shell+router); `js/panels/*.js` (one module per §4 panel). Mounted via `tower-http` `ServeDir` in `homecore-server::build_app`, gated by `--ui-dir`/`HOMECORE_UI_DIR`.
+
+**Verification:**
+- **Rust** — `#[cfg(test)] mod ui_tests` in `homecore-server/src/main.rs`: 5 integration tests (`tower::oneshot`) covering index, design tokens, all ten panel modules served, API coexistence, and mount-disable. *Written but not compiled in the authoring environment (no Rust toolchain present); run `cargo test -p homecore-server` on a Rust host before merge.*
+- **Frontend** — `ui/` test suite under plain `node` (no npm install): `npm test` → import/export graph verifier (15 modules) + render-smoke (executes every panel against a DOM shim; 21 checks) + interaction suite (live WS patch, ws.js handshake/parse, calibration contract; 3 checks). **24/24 green.**
+- **Benchmark** — `npm run bench`: total bundle **136.8 KB** uncompressed (**~37× smaller** than HA's ~5 MB Lit bundle, the ADR-126 §1.1 foil); slowest panel **1.5 ms/cold-render**.
+
+**Honest scope — current vs. target.** *Earlier cut:* the front-end was complete but only §4.4 Entities was wired to a real backend; the rest rendered from an in-browser mock. *This revision implements the §11 wiring:*
+
+- **Front-end (§11.11) — DONE and verified.** `api.js` rewritten: all data accessors are async and call the §11.2 gateway routes; the mock layer is demoted to a dev-only fixture reachable **only** under `?demo=1` / `HOMECORE_UI_DEMO` (§2.2); every panel `await`s and renders a typed empty/error state on failure (no mock fallback in production). All ten panels converted (3 by hand, 7 via parallel agents). Verified under Node: 5 test files green — import graph, boot, render-smoke (22), interaction (3), **and a new prod-errors suite (13) that runs with demo OFF + gateway unreachable and asserts every panel renders an error state, never mock, never throws** (it caught and fixed a real unhandled-rejection in the events panel).
+- **Gateway (§11.1–§11.6): IMPLEMENTED, COMPILED, TESTED, RUN.**
+  - **Code:** new `homecore-server/src/gateway.rs` (+`reqwest` dep, +CLI/env flags `--calibration-url`/`--calibration-token`/`--apps-dir`/`--gateway-timeout-ms`, merged into `build_app` via `gateway_router`).
+  - **Real handlers:** `/api/cal/*` reverse-proxy (W2), `GET /api/homecore/rooms` with the §11.3 RoomState adapter (W2), `GET /api/homecore/cogs` supervisor over the apps dir (W4), `GET /api/homecore/appliance` from `/proc` + port probes (W6).
+  - **Pending upstreams:** SEED-device/appliance-daemon routes (seeds, federation, witness, privacy, settings, automations, events-history, hailo, tokens; W3/W5) return a typed `503 upstream_unavailable` per §11.2.
+  - **Verified on Rust 1.89:** `cargo test -p homecore-server --no-default-features` = 12/12 pass (6 gateway + 6 UI mount).
+  - **Run live:** `GET /api/homecore/appliance` returns real `/proc` metrics + TCP service probes; unauth → `401`; `cogs` → `[]` with no apps dir; SEED-tier → typed `503`. Against a mock calibration upstream, the `/api/cal/*` proxy passes through (`200`) and `GET /api/homecore/rooms` correctly adapts `RoomState` to the UI shape (`breathing`→`breathing_bpm`, `heartbeat:null`→`heart_bpm:null`, injected `anomaly.threshold`/`room_id`, `stale` passthrough).
+  - **Bug found by live testing:** one real bug was caught and fixed, a double-`v1` path in the `/api/cal/*` proxy URL.
+
+The endpoint-by-endpoint contract is in **§11**; the staged plan, including which endpoints depend on real SEED/appliance hardware versus pure software, is in **§12**.
+
+---
+
+## 11. Backend wiring — making every panel real
+
+This section is the authoritative contract for full functionality. It removes the mock layer from the production path (§2.2) by routing every panel through the `homecore-server` BFF gateway (§2.1). Each endpoint is classified by what it depends on:
+
+- **EXISTS** — backend code already in this repo; gateway only proxies/adapts.
+- **NEW-GW** — pure software the gateway itself implements (filesystem, `/proc`, process control, recorder query) — no new external service.
+- **NEW-API** — a small HTTP wrapper to add to an existing in-repo crate (`homecore-api`, `homecore-automation`).
+- **SEED-DEV** — depends on a SEED node's on-device HTTPS API (separate hardware/firmware).
+- **APPLIANCE** — depends on an appliance daemon / accelerator stat source.
+
+### 11.1 Gateway shape
+
+`homecore-server` already mounts `homecore-api` at `/api/*` and the UI at `/homecore`. It gains a new **`/api/homecore/*`** namespace (the dashboard-specific aggregation surface) plus a **`/api/cal/*`** reverse-proxy to the calibration service. The browser issues only same-origin requests; the gateway fans out server-side, holding all upstream credentials (§11.10). Every proxied route has an explicit timeout and maps upstream failure to a typed body (`503 upstream_unavailable`, `504 upstream_timeout`) so one slow tier never stalls the dashboard.
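+
+The typed-error rule above can be sketched as a small mapping from failure class to status and error code. This is a minimal illustration, not the code in `gateway.rs`: the `UpstreamFailure` enum, the function names, and the JSON field names are all assumptions made for the example; only the `503 upstream_unavailable` / `504 upstream_timeout` pairs come from this section.
+
```rust
use std::time::Duration;

/// Hypothetical failure classes for one proxied upstream call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpstreamFailure {
    /// Connection refused, DNS failure, or upstream not configured.
    Unavailable,
    /// The upstream did not answer within the per-proxy timeout.
    TimedOut(Duration),
}

/// Maps a failure to the HTTP status and typed error code from §11.1.
pub fn typed_error(failure: UpstreamFailure) -> (u16, &'static str) {
    match failure {
        UpstreamFailure::Unavailable => (503, "upstream_unavailable"),
        UpstreamFailure::TimedOut(_) => (504, "upstream_timeout"),
    }
}

/// Renders an error body for the browser. The field names are assumptions,
/// and `route` is assumed to contain no characters that need JSON escaping.
pub fn error_body(route: &str, failure: UpstreamFailure) -> String {
    let (status, code) = typed_error(failure);
    format!(r#"{{"error":"{code}","status":{status},"route":"{route}"}}"#)
}
```
+
+Keeping the mapping in one function means every proxied route reports failures the same way, so a panel can branch on the `error` code rather than parsing free-form upstream messages.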
+
+### 11.2 Master endpoint contract (panel → gateway route → upstream → status)
+
+| Panel | UI method (`api.js`) | Gateway route | Upstream / source | Class |
+|---|---|---|---|---|
+| §4.4 Entities | `states()` | `GET /api/states` | `homecore` state machine | **EXISTS** ✅ wired |
+| §4.4/§4.8 live feed | WS | `GET /api/websocket` (`subscribe_events`) | `homecore` event bus | **EXISTS** ✅ wired |
+| §4.8 Event history | `eventHistory(q)` | `GET /api/events?since=…` | `homecore-recorder` ([ADR-132](ADR-132-homecore-recorder-history-semantic-search.md)) | **NEW-API** |
+| §4.8 Automations | `automations()` / `saveAutomation()` | `GET/POST/DELETE /api/homecore/automations` | `homecore-automation` ([ADR-129](ADR-129-homecore-automation-engine.md)) | **NEW-API** |
+| §4.5 Rooms | `roomStates()` | `GET /api/homecore/rooms` → per-room `GET /api/cal/v1/room/state?bank=` | `calibrate-serve` ([ADR-151](ADR-151-room-calibration-specialist-training.md)) | **EXISTS** (proxy + adapter) |
+| §4.7 Calibration | `calibration.*` | `POST /api/cal/v1/calibration/{start,stop}`, `GET …/status`, `POST …/enroll/anchor`, `GET …/enroll/status`, `POST …/room/train` | `calibrate-serve` | **EXISTS** (proxy) |
+| §4.6 COGs | `cogs()` / `cogAction()` / `cogLogs()` | `GET /api/homecore/cogs`, `POST …/cogs/:id/{start,stop,restart}`, `GET …/cogs/:id/logs`, `GET/PUT …/cogs/:id/config` | COG supervisor over `/var/lib/cognitum/apps/` ([ADR-100](ADR-100-cog-packaging-specification.md)/[ADR-128](ADR-128-homecore-integration-plugin-system.md)) | **NEW-GW** |
+| §4.6 Hailo HEF | `hailo()` | `GET /api/homecore/hailo` | `ruvector-hailo-worker:50051` | **APPLIANCE** |
+| §4.1 Appliance health | `appliance()` | `GET /api/homecore/appliance` | host `/proc` + Hailo stats + service probes | **NEW-GW** (+APPLIANCE for Hailo) |
+| §4.1/§4.2 Fleet + SEED detail | `seeds()` / `seed(id)` | `GET /api/homecore/seeds`, `GET …/seeds/:id` | SEED device HTTPS API ([ADR-069](ADR-069-cognitum-seed-csi-pipeline.md)) via registry | **SEED-DEV** |
+| §4.2 SEED actions | `seedCompact()` / `seedVerify()` | `POST …/seeds/:id/{compact,witness/verify}` | SEED device API | **SEED-DEV** |
+| §4.3 Federation | `federation()` | `GET /api/homecore/federation` | federation coordinator ([ADR-105](ADR-105-federated-csi-training.md)) | **SEED-DEV/APPLIANCE** |
+| §4.9 Witness/Audit | `witnessLog(p,s)` | `GET /api/homecore/witness?page=…` | merge: `homecore` Ed25519 chain + per-SEED SHA-256 chains | **NEW-API + SEED-DEV** |
+| §4.9 Privacy mode | `privacyModes()` / `setPrivacy()` | `GET/POST /api/homecore/privacy` | SEED privacy control plane ([ADR-141](ADR-141-bfld-privacy-control-plane-modes-attestation.md)) + cog-ha-matter | **SEED-DEV** |
+| §4.9 Export bundle | `exportAttestation()` | `GET /api/homecore/witness/export` | gateway packages both chains | **NEW-GW** |
+| §4.10 Tokens (LLAT) | `tokens()` / `createToken()` / `revokeToken()` | `GET/POST/DELETE /api/homecore/tokens` | `homecore-api` `LongLivedTokenStore` | **NEW-API** |
+| §4.10 MQTT/Matter | `mqttConfig()` | `GET /api/homecore/integrations/mqtt` | cog-ha-matter config ([ADR-116](ADR-116-cog-ha-matter-seed.md)) | **NEW-GW/SEED-DEV** |
+| §4.10 ESP32 provisioning | `nodes()` / `assignRoom()` | `GET/PUT /api/homecore/nodes` | SEED ingest config ([ADR-069](ADR-069-cognitum-seed-csi-pipeline.md)) | **SEED-DEV** |
+| §4.10 SEED mgmt | `pairSeed()` / `rotateToken()` | `POST /api/homecore/seeds/{pair,:id/rotate-token}` | SEED pairing (USB `169.254.42.1`) | **SEED-DEV** |
+
+### 11.3 Calibration proxy + RoomState adapter
+
+The calibration service is real but on a different binary/port; the gateway reverse-proxies it under `/api/cal/*` (upstream base from `HOMECORE_CALIBRATION_URL`). Its `RoomState` (`wifi-densepose-calibration/src/runtime.rs`) does **not** match the UI's shape, so the gateway adapts it in `GET /api/homecore/rooms`:
+
+| Real field (`RoomState`) | UI field | Adapter rule |
+|---|---|---|
+| `breathing: Option<SpecialistReading>` | `breathing_bpm: {value,confidence}\|null` | rename; `value`=`reading.value`, `confidence`=`reading.confidence`; `None`→`null` (preserves "not trained") |
+| `heartbeat: Option<…>` | `heart_bpm: {…}\|null` | rename `heartbeat`→`heart_bpm` |
+| `presence/posture/restlessness` | same names `{value,confidence}\|null` | `posture.value`=`reading.label` (class), else numeric |
+| `anomaly: Option<…>` | `anomaly: {value,confidence,threshold}` | `threshold` = the real upstream `MixtureOfSpecialists.veto_threshold`, or `null` when the upstream does not report it; never a hardcoded `0.5` (§12.2 finding 3) |
+| `vetoed` / `stale` | `vetoed` / `stale` | pass through (drives the §4.5/§6 banners) |
+| *(absent)* | `room_id`, `seeds[]` | injected by the gateway from the **room registry** |
+
+A **room registry** (either configured or derived from `GET /api/cal/v1/calibration/baselines`) maps each `room_id` → bank name + serving SEED ids, so `GET /api/homecore/rooms` returns one adapted record per room. Mapping `Option::None` → JSON `null` keeps the "not trained" vs. "unavailable" distinction (§6 invariant 3) intact end-to-end.
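+
+A minimal sketch of the rename and null-preservation rules, using only the standard library. The structs below are simplified stand-ins (assumptions) for the real types in `wifi-densepose-calibration/src/runtime.rs`, only two specialists are shown, and the JSON is hand-built for brevity where a real adapter would more likely use a serializer.
+
```rust
/// Simplified stand-in for the calibration crate's specialist reading.
#[derive(Debug, Clone, PartialEq)]
pub struct SpecialistReading {
    pub value: f64,
    pub confidence: f64,
}

/// Subset of the upstream `RoomState` (only the fields this sketch adapts).
#[derive(Debug, Clone, PartialEq)]
pub struct RoomState {
    pub breathing: Option<SpecialistReading>,
    pub heartbeat: Option<SpecialistReading>,
    pub vetoed: bool,
    pub stale: bool,
}

/// `None` (specialist not trained) becomes JSON `null`, never a default number.
/// Non-finite floats are not handled in this sketch.
fn reading_json(reading: &Option<SpecialistReading>) -> String {
    match reading {
        Some(r) => format!(r#"{{"value":{},"confidence":{}}}"#, r.value, r.confidence),
        None => "null".to_string(),
    }
}

/// Adapts one upstream record to the UI shape and injects `room_id`.
pub fn adapt_room(room_id: &str, state: &RoomState) -> String {
    format!(
        r#"{{"room_id":"{}","breathing_bpm":{},"heart_bpm":{},"vetoed":{},"stale":{}}}"#,
        room_id,
        reading_json(&state.breathing),
        reading_json(&state.heartbeat),
        state.vetoed,
        state.stale
    )
}
```
+
+With `breathing` trained and `heartbeat` untrained, the output carries `"heart_bpm":null` while `stale` and `vetoed` pass through unchanged, which is the property the adapter contract test should pin.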
+
+### 11.4 SEED registry & device-API proxy
+
+The gateway holds a **SEED registry** (`device_id` → base URL + bearer token + zone), populated by pairing (§4.10) and persisted server-side. `GET /api/homecore/seeds[/:id]` fans out to each SEED's on-device API and shapes the result to the §4.2 card/detail model. Expected SEED-side endpoints (the contract the SEED firmware must satisfy — a subset of its 98 endpoints): health; vector-store stats (`vector_count`, `dim`, `epoch`, `knn_latency_ms`, ingest rate); witness (`len`, `last_verify`, `valid`) + `POST verify`; onboard sensors (BME280/PIR/reed/ADS1115/vibration); reflex rules + thresholds; cognitive analysis (fragility, coherence phases); ingest feeders (ESP32 node ids + packet type `0xC5110003`/`0xC5110002` + rate). Offline/unreachable SEEDs surface as `online:false` (drives the §4.1 red tint) rather than failing the whole list.
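+
+The "offline SEED does not fail the list" behaviour can be sketched as follows. `SeedEntry`, `SeedCard`, and `fan_out` are hypothetical names, and the `fetch` closure stands in for the real HTTPS call to the device API; the sketch is sequential, whereas a real gateway would likely issue the calls concurrently.
+
```rust
/// Hypothetical registry entry: `device_id` plus the SEED's base URL.
#[derive(Debug, Clone)]
pub struct SeedEntry {
    pub device_id: String,
    pub base_url: String,
}

/// Card model for §4.2; only two illustrative fields are included.
#[derive(Debug, Clone, PartialEq)]
pub struct SeedCard {
    pub device_id: String,
    pub online: bool,
    pub vector_count: Option<u64>,
}

/// Builds one card per registered SEED. `fetch` stands in for the HTTPS
/// call to the device API and returns `Err` when the SEED is unreachable.
pub fn fan_out<F>(registry: &[SeedEntry], fetch: F) -> Vec<SeedCard>
where
    F: Fn(&SeedEntry) -> Result<u64, String>,
{
    registry
        .iter()
        .map(|seed| match fetch(seed) {
            Ok(count) => SeedCard {
                device_id: seed.device_id.clone(),
                online: true,
                vector_count: Some(count),
            },
            // An unreachable SEED degrades to an offline card, not a failed list.
            Err(_) => SeedCard {
                device_id: seed.device_id.clone(),
                online: false,
                vector_count: None,
            },
        })
        .collect()
}
```
+
+Note that the offline card reports `vector_count: None` rather than a stale or zero count, in line with the no-fabrication rule of §6.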
+
+### 11.5 Appliance metrics collector (§4.1)
+
+`GET /api/homecore/appliance`, implemented in the gateway: CPU/RAM/uptime from `/proc`; Hailo load + temperature from the Hailo runtime/sysfs (or `ruvector-hailo-worker` stats); service health by probing `ruview-mcp-brain:9876`, `cognitum-rvf-agent:9004`, `ruvector-hailo-worker:50051`; event-bus rate from the `homecore` broadcast channel + its lag counter (already exposed for §4.1/§4.4).
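+
+As an illustration of the `/proc` part, the sketch below derives a used-RAM percentage from `/proc/meminfo` text. The function names are hypothetical and the choice of `MemTotal` minus `MemAvailable` is an assumption; the section does not say which fields the real collector reads.
+
```rust
/// Extracts a `kB` field such as `MemTotal` from `/proc/meminfo` text.
pub fn meminfo_kb(meminfo: &str, key: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let (name, rest) = line.split_once(':')?;
        if name.trim() != key {
            return None;
        }
        // Lines look like "MemTotal:        8000 kB"; take the number.
        rest.split_whitespace().next()?.parse::<u64>().ok()
    })
}

/// Used-RAM percentage, or `None` when a field is missing or the total is zero.
/// Returning `None` lets the UI show a withheld value instead of a fake one.
pub fn ram_used_percent(meminfo: &str) -> Option<f64> {
    let total = meminfo_kb(meminfo, "MemTotal")?;
    let available = meminfo_kb(meminfo, "MemAvailable")?;
    if total == 0 {
        return None;
    }
    Some(100.0 * total.saturating_sub(available) as f64 / total as f64)
}
```
+
+On a Linux host the input would come from `std::fs::read_to_string("/proc/meminfo")`; taking the text as a parameter keeps the parser testable on any platform.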
+
+### 11.6 COG supervisor (§4.6)
+
+`GET /api/homecore/cogs`: read each `/var/lib/cognitum/apps/*/manifest.json` ([ADR-100](ADR-100-cog-packaging-specification.md)), the pid file, and verify `binary_sha256` + `binary_signature` (Ed25519) → status/shield. `POST …/cogs/:id/{start,stop,restart}` performs supervised process control; `GET …/cogs/:id/logs` tails `output.log`/`error.log`; `GET/PUT …/cogs/:id/config` reads/writes `config.json`. Hailo-arch COGs join the §11.5 Hailo stats. The Cog Store/App-Registry **browsing** panel was removed per product decision; this is operational management only.
+
+### 11.7 Witness aggregation + privacy (§4.9)
+
+`GET /api/homecore/witness` merges two chains chronologically: the `homecore` Ed25519 state-transition chain (exposed by a small `homecore-api` route over its witness log) and each paired SEED's SHA-256 ingest chain (proxied via the registry), paginated server-side. `GET/POST /api/homecore/privacy` reads/sets per-SEED privacy mode via the SEED privacy control plane ([ADR-141](ADR-141-bfld-privacy-control-plane-modes-attestation.md)) — the POST is the high-stakes confirmed toggle (§4.9). `GET /api/homecore/witness/export` packages both chains into the downloadable attestation bundle.
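+
+The chronological merge and server-side pagination can be sketched as below. `WitnessRow` and both function names are hypothetical, the sketch assumes each chain is already sorted by ascending timestamp, and it merges only two chains; with several paired SEEDs a k-way merge (or a sort of the concatenation) would be needed.
+
```rust
/// One row of the unified timeline; `source` names the originating chain.
#[derive(Debug, Clone, PartialEq)]
pub struct WitnessRow {
    pub timestamp_ms: u64,
    pub source: String,
    pub entity_id: String,
}

/// Merges two chains that are each already sorted by ascending timestamp.
pub fn merge_chains(homecore: &[WitnessRow], seed: &[WitnessRow]) -> Vec<WitnessRow> {
    let mut merged = Vec::with_capacity(homecore.len() + seed.len());
    let (mut i, mut j) = (0, 0);
    while i < homecore.len() && j < seed.len() {
        // `<=` puts homecore rows first on ties, so the order is deterministic.
        if homecore[i].timestamp_ms <= seed[j].timestamp_ms {
            merged.push(homecore[i].clone());
            i += 1;
        } else {
            merged.push(seed[j].clone());
            j += 1;
        }
    }
    merged.extend_from_slice(&homecore[i..]);
    merged.extend_from_slice(&seed[j..]);
    merged
}

/// Returns the zero-based `page` of `page_size` rows (empty past the end).
pub fn paginate(rows: &[WitnessRow], page: usize, page_size: usize) -> &[WitnessRow] {
    let start = page.saturating_mul(page_size).min(rows.len());
    let end = start.saturating_add(page_size).min(rows.len());
    &rows[start..end]
}
```
+
+A deterministic tie-break matters here because the §4.9 panel pages through the result: without it, rows with equal timestamps could move between pages across requests.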
+
+### 11.8 Event history + automation CRUD (§4.8)
+
+`homecore-api` adds `GET /api/events?since=…` backed by `homecore-recorder` ([ADR-132](ADR-132-homecore-recorder-history-semantic-search.md)) for history (live updates continue over the existing WS). The automation builder persists through `GET/POST/DELETE /api/homecore/automations`, a thin HTTP wrapper over the `homecore-automation` engine's register/list/remove ([ADR-129](ADR-129-homecore-automation-engine.md)). RuView-specific triggers (RoomState thresholds, SEED reflex events) map onto the engine's trigger types.
+
+### 11.9 Entity provenance convention (§4.4/§6)
+
+The first-class provenance badge requires each entity to carry its lineage. Convention: every integration writes `attributes.source` (and, where known, `attributes.seed` / `attributes.cog`) when it sets state; `cog-ha-matter` ([ADR-116](ADR-116-cog-ha-matter-seed.md)) populates these from the ESP32 node → SEED → COG path and HA `via_device`. The gateway/UI resolves node→seed→cog from these attributes (no fabrication; missing lineage renders as "unknown", not invented).
+
+### 11.10 Auth, credentials, config
+
+- **Browser → gateway:** one long-lived access token (the §4.10 LLAT), sent as `Authorization: Bearer`; validated by `homecore-api`'s `LongLivedTokenStore`. The dev default (`allow_any_non_empty`) stays for local runs; production provisions `HOMECORE_TOKENS`.
+- **Gateway → upstreams:** SEED bearer tokens and the calibration token live **only** server-side (SEED registry + `HOMECORE_CALIBRATION_TOKEN`); never sent to the browser. This is the reason the gateway exists.
+- **Config:** `HOMECORE_CALIBRATION_URL`, SEED registry store path, per-proxy timeout (default 2 s), `HOMECORE_UI_DEMO` (dev fixture). No browser CORS needed (same origin); gateway→upstream is server-to-server.
+
+### 11.11 Front-end changes
+
+`api.js`: drop the mock fallback from the production path — methods call the §11.2 gateway routes; `this.base` stays same-origin; the mock layer is reachable only under `?demo=1`/`HOMECORE_UI_DEMO`. Every panel renders a **typed empty/error state** (not mock) when its route returns `503/504`. `mock.js` moves to a dev fixture (kept for the offline test harness, excluded from the production bundle). The §10 frontend tests are re-pointed at the gateway contract (and gain contract tests per §11.2 route).
+
+---
+
+## 12. Delivery plan to full functionality
+
+The plan is staged so that each wave is independently shippable behind the gateway, lands real data for a coherent set of panels, and has an explicit acceptance gate. "Class" reuses §11's tags.
+
+| Wave | Scope | Class | Acceptance gate |
+|---|---|---|---|
+| **W1 — Gateway foundation** | `/api/homecore/*` scaffold in `homecore-server`; auth passthrough; per-proxy timeout + typed errors; `api.js` base + remove prod mock (`?demo=1` only); panels get typed empty/error states | NEW-GW | Entities + live WS still green; with no upstreams, every other panel shows "upstream unavailable", **never** mock (unless `?demo=1`); Rust + JS suites pass |
+| **W2 — Rooms + Calibration** | `/api/cal/*` reverse-proxy; `GET /api/homecore/rooms` with the §11.3 RoomState adapter + room registry; wire §4.5 + the §4.7 wizard to real endpoints; delete the in-browser calibration stub | EXISTS (proxy+adapter) | Against a running `calibrate-serve` (replayed CSI), the wizard drives a real baseline→enroll→train→verify and §4.5 shows real `RoomState` with correct stale/veto/null mapping; contract test on the adapter |
+| **W3 — Events + Automations** | `GET /api/events` over `homecore-recorder`; `/api/homecore/automations` over `homecore-automation` | NEW-API | §4.8 history loads from recorder; an automation created in the UI persists and fires via the engine |
+| **W4 — COG management** | `/api/homecore/cogs*` supervisor over `/var/lib/cognitum/apps/` (manifest + pid + sig verify + logs + config) | NEW-GW | §4.6 lists real installed COGs; start/stop/restart works; sha256/signature shield reflects real verification; logs tail |
+| **W5 — SEED tier** | SEED registry + pairing; `/api/homecore/seeds*` device proxy; witness merge + privacy control; ESP32 provisioning | SEED-DEV | Against a real or emulated SEED API, §4.2/§4.3/§4.9/§4.10 show real vector-store/witness/sensor/reflex/cognition data; SEED tokens stay server-side; offline SEED → red tint, not a failed page |
+| **W6 — Appliance + federation + Hailo** | `/api/homecore/appliance` (host metrics + service probes); `/api/homecore/hailo`; `/api/homecore/federation` ([ADR-105](ADR-105-federated-csi-training.md)) | NEW-GW + APPLIANCE | §4.1 health is real; §4.6 Hailo HEF/throughput real; §4.3 federation round/coordinator/Krum real |
+
+**Definition of done (full functionality):** with W1–W6 merged and the upstream tiers running, loading `/homecore` with **no** `?demo=1` flag shows live data on all ten panels, `api.anyDemo()` is false, and no panel renders fabricated values. Panels whose tier is offline show typed empty/error states. The mock layer is reachable only as the `?demo=1` developer fixture.
+
+### 12.1 Wave status (this revision)
+
+| Wave | Status |
+|---|---|
+| **W1 — Gateway foundation** | ✅ DONE — `gateway.rs`, auth passthrough, typed `503/504`, merged into `build_app`; front-end mock removed from prod path + `?demo=1` fixture; typed error states. **Compiled + 12/12 Rust tests + JS suite green + run live.** |
+| **W2 — Rooms + Calibration** | ✅ DONE — `/api/cal/*` reverse-proxy + `GET /api/homecore/rooms` RoomState adapter; front-end calibration stub deleted (now proxies the real API). **Proven live against a calibration upstream** (proxy 200 + adapted shape); null-preservation unit-tested. |
+| **W3 — Events + Automations** | ⏳ gateway returns typed `503` (recorder/automation HTTP wrappers pending); front-end handles it gracefully (history note, builder still usable). |
+| **W4 — COG management** | ✅ supervisor DONE — lists `/var/lib/cognitum/apps/` manifests + pid liveness (returns `[]` live with no apps dir); start/stop/log/config control is the remaining follow-up. |
+| **W5 — SEED tier** | ⏳ gateway returns typed `503` (SEED registry + device proxy pending real/emulated SEED hardware). |
+| **W6 — Appliance + federation + Hailo** | ◑ appliance host metrics from `/proc` + port probes DONE (live `/proc` data verified); Hailo stats + federation remain `503` (need the accelerator stat source / coordinator). |
+
+**Status:** the gateway is **compiled and tested on Rust 1.89** (`cargo test -p homecore-server` = 12/12) and was **run live** (curl proof in §10). The one remaining caveat is intrinsic, not an environment limit: **W3/W5/W6-Hailo/federation depend on services/hardware that are not in this repo** (recorder/automation HTTP wrappers, real SEED nodes, the Hailo stat source), so they return honest typed `503`s and the UI shows error states — exactly as §2.2/§11.2 prescribe. W1/W2/W4/W6-appliance are functional now.
+
+### 12.2 Security review (PR #1082)
+
+A high-effort public-PR review of the merged gateway + front-end surfaced the following, all fixed and pinned by tests (`cargo test -p homecore-server` is now **18/18**):
+
+| # | Severity | Finding | Fix |
+|---|---|---|---|
+| 1 | **HIGH** | **Path-traversal / confused-deputy SSRF** in the `/api/cal/*` reverse-proxy. The wildcard path was interpolated into the upstream URL while `proxy()` attaches the privileged server-side calibration bearer, so `/api/cal/v1/../../x` (or `..%2f`, `%2e%2e`, leading `/`, `\`, double-encoded `%252e`) could escape the `…/api/` scope **with the token**. | `validate_proxy_path()` decode-then-checks and rejects absolute / backslash / dot-segment / encoded-traversal paths with a typed **400 before the URL is built** (GET **and** POST); legit `v1/...` paths still pass. |
+| 2 | Correctness | **CORS + tracing didn't cover gateway routes** — `/api/homecore/*` + `/api/cal/*` were `.merge()`d outside `homecore-api::router()`'s layers. | The audited HC-05 `build_cors_layer()` + `TraceLayer` are now applied to the whole merged app in `main.rs`. |
+| 3 | Honesty (§6) | **Fabricated data** — hardcoded `anomaly.threshold: 0.5` in the adapter; dashboard rendered `"null%"`/`"null°C"`; COG Hailo pill hardcoded `"connected"`; `rooms.js` defaulted a null threshold to `0.8`. | Threshold passes through the real upstream value or emits `null` (withheld); dashboard renders `—`; the Hailo pill reflects the real appliance probe; the UI treats a null threshold as withheld. |
+| 4 | Robustness | A string `hef` (forwarded verbatim) threw on `.forEach`/`.join`; `frames/target` could be `NaN%`/`Infinity%`; calibration Restart leaked the baseline `setTimeout` poll. | `asArray()` coercion; `target > 0` guard; cancellable poll cleared on Restart / panel teardown. |
+| 5 | Perf | Sequential per-bank RoomState fetches; blocking `std::net::TcpStream::connect_timeout` probes on an async handler; `mock.js` statically bundled. | Concurrent `futures::join_all`; async `tokio::net::TcpStream` + `timeout`; demo-only dynamic `import()` of `mock.js`. |
+
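The decode-then-check rule in finding 1 can be sketched as follows. This is a minimal, std-only illustration and not the gateway's actual `validate_proxy_path()`: the error enum, the three-round decode limit, and the choice to reject any leftover `%` are assumptions made for the example.

```rust
/// Hypothetical error type; the real gateway maps rejection to a typed 400.
#[derive(Debug, PartialEq)]
pub enum ProxyPathError {
    Absolute,
    Backslash,
    DotSegment,
    BadEncoding,
}

/// Decode a single layer of `%XX` escapes. Returns `None` on a malformed escape.
fn percent_decode_once(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hex = bytes.get(i + 1..i + 3)?;
            if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            let hex = std::str::from_utf8(hex).ok()?;
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

/// Validate the wildcard tail of a proxied path BEFORE it is spliced into
/// the upstream URL. Assumption: at most three decode rounds are attempted.
pub fn validate_proxy_path(raw: &str) -> Result<(), ProxyPathError> {
    // Decode repeatedly so double-encoded input (`%252e`) is checked in its final form.
    let mut decoded = raw.to_string();
    let mut rounds = 0;
    while decoded.contains('%') {
        if rounds == 3 {
            return Err(ProxyPathError::BadEncoding);
        }
        decoded = percent_decode_once(&decoded).ok_or(ProxyPathError::BadEncoding)?;
        rounds += 1;
    }
    if decoded.starts_with('/') {
        return Err(ProxyPathError::Absolute);
    }
    if decoded.contains('\\') {
        return Err(ProxyPathError::Backslash);
    }
    if decoded.split('/').any(|seg| seg == "." || seg == "..") {
        return Err(ProxyPathError::DotSegment);
    }
    Ok(())
}
```

The checks run on the fully decoded string, so an encoded separator or dot segment is treated exactly like its literal form. A legitimate `v1/...` path passes unchanged.
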
+**Known limitations carried forward (not regressions):**
+- **`reqwest` rustls-only is a workspace-wide concern.** `homecore-server` opts into `rustls-tls` only, but cargo feature-unification means any sibling crate enabling the default `native-tls` re-introduces OpenSSL into the final binary. A true "no OpenSSL on the appliance" guarantee requires aligning **every** reqwest-pulling crate on rustls-only — out of scope for this PR; documented at the dependency in `Cargo.toml`.
+- **DEV-mode auth.** When `HOMECORE_TOKENS` is unset, the token store falls back to `allow_any_non_empty()` (any non-empty bearer accepted) on `0.0.0.0`. This is pre-existing and intentionally **unchanged** here; the loud boot `warn!` is retained. Provision real tokens (`HOMECORE_TOKENS=…`) before exposing the server to a network.
@@ -0,0 +1,166 @@
+# ADR-132: HOMECORE-RECORDER — State History + Semantic Search
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted |
+| **Date** | 2026-05-25 |
+| **Deciders** | ruv |
+| **Codename** | **HOMECORE-RECORDER** |
+| **Crate** | `v2/crates/homecore-recorder` |
+| **Relates to** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (HOMECORE master — series map row ADR-132), [ADR-127](ADR-127-homecore-state-machine-rust.md) (HOMECORE-CORE state machine), [ADR-124](ADR-124-rvagent-mcp-ruvector-npm-integration.md) (ruvector/SENSE-BRIDGE), [ADR-130](ADR-130-homecore-rest-websocket-api.md) (HOMECORE-API query surface, downstream) |
+| **Tracking issue** | [#800](https://github.com/ruvnet/RuView/pull/800) (HOMECORE intake) |
+
+> **Documented retroactively (2026-06-12).** The `homecore-recorder` crate shipped under
+> the ADR-126 series map (which planned an "ADR-132 HOMECORE-RECORDER") but the standalone
+> ADR file was never written; the crate's `Cargo.toml`, `README.md`, `lib.rs`, `schema.rs`,
+> and `semantic.rs` all cite "ADR-132". This ADR reverse-documents the decision that the
+> shipped, tested code already embodies (ADR-164 Gap G3 / Coverage-Gaps Lens §A). It does
+> **not** introduce new design; it records what is built. Date reflects the crate's intake
+> era (first commit `e96ebaea8`, 2026-05-25); real-impl pass landed in `7c8071145`
+> (2026-06-11).
+
+---
+
+## 1. Context
+
+ADR-126 (the HOMECORE master) decided to reimplement Home Assistant (HA) natively in Rust.
+HA persists every state change to a SQLite *recorder* database; downstream features
+(history graphs, the logbook, long-term statistics, automation conditions that reference
+past state) all read that store. HOMECORE therefore needs a durable state-history backbone.
+
+Two forces shape the decision:
+
+1. **Migration / coexistence.** Users adopting HOMECORE will have an existing HA
+   `recorder` database. Reusing HA's on-disk schema (rather than inventing a new one) lets
+   HOMECORE read an existing HA `home-assistant_v2.db` directly and lets HA-aware tooling
+   read HOMECORE's store. This is the same trust boundary that `homecore-migrate`
+   (ADR-165) handles for `.storage/*.json`.
+2. **Semantic queries.** HA history is queried with SQL `BETWEEN`/`WHERE` clauses. The
+   HOMECORE platform already carries ruvector (ADR-124) for vector search, so the recorder
+   can additionally embed state changes and answer natural-language queries
+   ("which kitchen devices were warm at 3 PM?") via k-NN — a capability HA does not have.
+
+The recorder is the **durable-state surface**: if it is wrong, history, logbook, and
+historical-condition automations are all wrong. ADR-164 flagged it as a CRITICAL coverage
+gap precisely because such a load-bearing crate had no governing ADR.
+
+## 2. Decision
+
+Ship `homecore-recorder` as a SQLite state-history recorder with an HA-compatible schema
+and an optional ruvector-backed semantic index, in three phases. P1 and P2 are built and
+tested; P3 is planned.
+
+### 2.1 Storage — SQLite with the HA recorder schema (P1, shipped)
+
+- Persist via `sqlx` with the SQLite backend only (no Postgres, no TLS feature set).
+- Mirror HA recorder **schema v48** so the store is bidirectionally readable
+  (`src/schema.rs`):
+  - `state_attributes` — shared attribute JSON blobs, deduped by an FNV-1a 64-bit hash
+    stored as a signed `i64` (matches HA's dedup key);
+  - `states` — one row per state write (`entity_id`, `state`, `attributes_id` FK,
+    `last_changed_ts`/`last_updated_ts` as REAL Unix seconds, `context_id` UUID);
+  - `events` — domain events (`event_type`, `event_data` JSON, `time_fired_ts`);
+  - `recorder_runs` — boot/shutdown bookends for history-gap detection.
+- All DDL uses `CREATE TABLE IF NOT EXISTS`, so schema application is idempotent and safe
+  on every startup.
+- Default persistence path `.homecore/home.db` (configurable).
+
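The `state_attributes` dedup key described above is an FNV-1a 64-bit hash reinterpreted as a signed `i64`. A minimal sketch of that computation follows; the function names are illustrative, and exactly which serialized bytes the crate hashes is an assumption here.

```rust
/// Standard FNV-1a 64-bit parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: XOR each byte into the hash, then multiply by the prime (wrapping).
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical helper: reinterpret the 64-bit hash as a signed integer so it
/// fits a SQLite INTEGER column. The cast keeps all 64 bits; only the sign
/// interpretation changes.
pub fn attributes_hash_key(attributes_json: &str) -> i64 {
    fnv1a_64(attributes_json.as_bytes()) as i64
}
```

Because the cast is a bit-for-bit reinterpretation, roughly half of all keys are negative. That is expected and does not affect equality lookups.
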
+### 2.2 Capture — listener on the HOMECORE event bus (P1, shipped)
+
+- `RecorderListener` subscribes to the HOMECORE event bus (ADR-127) and captures
+  `StateChanged` events, writing snapshots through `Recorder` (`src/listener.rs`,
+  `src/db.rs`).
+- A `DedupEngine` (`src/dedup.rs`) skips redundant writes when the state hash is unchanged,
+  matching HA's stateful-listener behaviour.
+
+### 2.3 Semantic search — ruvector HNSW (P2, shipped, feature-gated)
+
+- Behind the `ruvector` Cargo feature, the `Recorder` additionally calls a `SemanticIndex`
+  implementation (`src/semantic.rs`) that embeds state attributes and stores vectors in a
+  `ruvector-core` HNSW index for k-NN search.
+- P2 embeddings are **hash-based** (sha2) — a deliberate, honest placeholder. They give a
+  working HNSW surface without claiming sentence-level semantic quality.
+- When the feature is off, `NullSemanticIndex` satisfies the `SemanticIndex` trait bound
+  with no allocation, so the structural recorder ships independently of ruvector.
+
+### 2.4 Real sentence embeddings (P3, planned — not yet built)
+
+- Replace the hash embeddings with ruvector-attention sentence embeddings (dim → 384). Not
+  implemented; tracked as a follow-up. The README and `Cargo.toml` label this P3 explicitly.
+
+### 2.5 Test evidence (as shipped)
+
+- P1: 14 tests (`cargo test -p homecore-recorder --no-default-features`).
+- P2: 20 tests (`cargo test -p homecore-recorder --features ruvector`).
+
+## 3. Consequences
+
+**Positive.**
+
+- HA-schema compatibility makes migration (ADR-165) and coexistence cheap: HOMECORE can
+  read an existing HA `home-assistant_v2.db`, and any SQLite tool can read HOMECORE's history.
+- The semantic index is **additive** and feature-gated: the durable structural recorder has
+  no hard dependency on ruvector, so the storage backbone ships first.
+- Standard SQLite means no proprietary export format; history is directly queryable.
+
+**Negative / honest limits.**
+
+- P2 semantic search uses **hash embeddings**, not real sentence embeddings — query quality
+  is limited until P3. This is disclosed in the crate docs and here; it must not be cited as
+  semantic-quality-validated.
+- No per-crate benchmarks exist yet; the latency figures in the README
+  (state-write p50 < 2 ms, semantic search < 10 ms on 1 M records) are design targets /
+  estimates and still **need verification** against a criterion baseline.
+- Pinning to HA schema v48 couples HOMECORE to a specific HA recorder schema generation;
+  future HA schema bumps require an explicit migration step.
+
+**Neutral.**
+
+- This ADR governs the recorder crate only. The query/REST surface over recorder data is
+  HOMECORE-API (ADR-130, P3); automation conditions on historical state are
+  HOMECORE-automation (ADR-129, P3).
+
+## 3a. Security review (2026-06, post-ADR-154–159 sweep)
+
+A beyond-SOTA security review of `homecore-recorder` covered SQL injection, retention/purge
+correctness, fail-closed write integrity, semantic-store NaN poisoning, and PII exposure.
+
+**Confirmed clean (with evidence):**
+
+- **SQL injection — clean.** Every query in `db.rs` uses bound `?` parameters; no user- or
+  entity-influenceable value is interpolated into SQL via `format!`/concatenation. The only
+  `format!` builds the `LIKE` *pattern* string, which is itself **bound** as a parameter with
+  `ESCAPE '\\'` and `% _ \` escaping — so a metacharacter payload is matched literally. Pinned
+  by `malicious_entity_id_is_stored_literally_not_executed` (a `'; DROP TABLE states; --` state
+  value leaves the table intact and round-trips verbatim) and
+  `like_metacharacters_in_query_are_literal_not_wildcards`.
+- **NaN-index poisoning — structurally impossible.** Embeddings are SHA-256 → `i32` →
+  `f32`; an `i32`→`f32` cast is always finite (never NaN/Inf), and an all-zero-digest is
+  guarded by the `norm > 1e-10` check. Empty-index search, empty-string query, and `k=0` were
+  probed and all return `Ok(0)` with no panic. (Unlike the calibration/vitals/geo paths, no raw
+  sensor float ever reaches the index.)
+- **Fail-closed writes.** A removal event returns `Ok(None)`; semantic-index failure is logged,
+  not propagated, so it never blocks the durable SQLite write; `EntityId` parse failure falls
+  back to a sentinel rather than panicking.
+
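The `LIKE` handling noted under "SQL injection" can be sketched as below. This is an illustration under assumptions, not the code in `db.rs`: the helper names are hypothetical, and the pattern is assumed to be bound as a `?` parameter to a clause of the form `LIKE ? ESCAPE '\'`.

```rust
/// Escape the three characters that are special inside a LIKE pattern when
/// backslash is declared as the ESCAPE character.
pub fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build a "contains" pattern. Only the surrounding `%` act as wildcards;
/// everything from the caller is matched literally. The result is meant to be
/// passed as a bound parameter, never interpolated into the SQL text.
pub fn contains_pattern(query: &str) -> String {
    format!("%{}%", escape_like(query))
}
```

Quotes and semicolons need no escaping here because the pattern never becomes part of the SQL text. Only the `LIKE` metacharacters need care.
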
+**Fixed (real bounding bugs):**
+
+- **Memory-DoS — `get_state_history` was unbounded.** No `LIMIT`, so a wide time window over a
+  high-frequency entity loaded an unbounded row set into memory. Now capped at
+  `MAX_HISTORY_ROWS` (1,000,000); sibling search paths were already `k`-bounded.
+- **Disk-DoS / documented-but-missing `purge`.** The README advertised `Recorder::purge`, but
+  no retention path existed → unbounded disk growth. Added a **transactional** `purge(older_than)`
+  with an **exclusive** cutoff (idempotent, no off-by-one) that deletes old `states`/`events` and
+  GCs orphaned `state_attributes` blobs (dedup-shared blobs kept until their last referrer is gone).
+
+`homecore-recorder` tests: 19 → 25 (`--no-default-features`) / 25 → 31 (`--features ruvector`),
+0 failed. Python deterministic proof unchanged (recorder is off the signal proof path).
+
+## 4. Links
+
+- Crate: `v2/crates/homecore-recorder/` — `Cargo.toml`, `README.md`, `src/lib.rs`,
+  `src/db.rs`, `src/schema.rs`, `src/dedup.rs`, `src/listener.rs`, `src/semantic.rs`.
+- [ADR-126](ADR-126-ruview-native-ha-port-master.md) — HOMECORE master (series map: ADR-132 = HOMECORE-RECORDER).
+- [ADR-165](ADR-165-homecore-migrate-from-home-assistant.md) — HOMECORE-MIGRATE (reads HA `.storage`; P2 exports a side-by-side recorder DB).
+- [ADR-164](ADR-164-adr-corpus-gap-analysis.md) — gap analysis that surfaced this missing ADR (Gap G3).
+- [Home Assistant Recorder integration](https://www.home-assistant.io/integrations/recorder/).
@@ -174,3 +174,71 @@ vs. an in-memory array at compile time), which intersects with ADR-084 (RabitQ)
 | **P1** (this ADR) | `intent`, `recognizer` (regex), `handler` (5 built-ins), `runner` (trait + noop), `pipeline` (end-to-end wiring), 10–15 tests |
 | **P2** | Real `tokio::process::Child` runner with Windows-safe teardown; `SemanticIntentRecognizer` with ruvector HNSW |
 | **P3** | STT/TTS bridge, satellite protocol, cloud fallback |
+
+---
+
+## 6. Security review (beyond-SOTA, untrusted-input → action path)
+
+A focused security review of the Assist pipeline — `utterance → recognizer →
+intent → handler → action`, plus `RufloRunner` — treating the utterance as
+untrusted input (voice transcripts, the WebSocket `assist` command). This
+surface was not covered by the ADR-154–159 sweep.
+
+### 6.1 Finding fixed — HC-ASSIST-01 (unbounded-utterance DoS, LOW)
+
+Both `RegexIntentRecognizer::recognize` and the semantic `recognize_scored`
+accepted utterances of **unbounded length** and ran `to_lowercase()` (a full
+clone) + a per-registered-pattern scan (and, in the semantic path, full
+tokenisation + feature-hash embedding) before any bound — an allocation/CPU
+amplification on attacker-controlled input. The `regex` crate is **linear-time**
+(RE2-style finite automaton, no catastrophic backtracking), so this was a
+throughput/memory DoS, not a hang.
+
+**Fix:** `MAX_UTTERANCE_BYTES = 4096` (far above any real spoken command),
+checked at **both** recognizer boundaries *before* any allocation/scan. An
+over-length utterance **fails closed** to `Ok(None)` — no intent, no action,
+identical to an unrecognised phrase — so it can never be coerced into firing a
+handler. Pinned by `over_length_utterance_fails_closed` (an over-length
+utterance that *contains* a valid command resolves to `None`; the old code
+would have matched it) and `over_length_utterance_fails_closed_semantic`.
+
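The fail-closed length bound from §6.1 can be sketched as follows. This is an illustration only: the `Intent` and `RecognizeError` types are hypothetical, and a plain prefix match stands in for the registered regex patterns.

```rust
pub const MAX_UTTERANCE_BYTES: usize = 4096;

/// Hypothetical intent shape for the sketch.
#[derive(Debug, PartialEq)]
pub struct Intent {
    pub name: &'static str,
    pub entity: String,
}

/// Hypothetical error type; this sketch never constructs it.
#[derive(Debug, PartialEq)]
pub struct RecognizeError;

/// The length check comes first, before `to_lowercase()` allocates a copy
/// and before any pattern is scanned.
pub fn recognize(utterance: &str) -> Result<Option<Intent>, RecognizeError> {
    if utterance.len() > MAX_UTTERANCE_BYTES {
        // Fail closed: indistinguishable from an unrecognised phrase.
        return Ok(None);
    }
    let lowered = utterance.to_lowercase();
    // Stand-in for the regex scan: a single "turn on <entity>" pattern.
    Ok(lowered.strip_prefix("turn on ").map(|rest| Intent {
        name: "turn_on",
        entity: rest.trim().to_string(),
    }))
}
```

An over-length utterance that begins with a valid command still resolves to `Ok(None)`, which is the property the pinned tests describe.
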
+### 6.2 Dimensions confirmed clean (with evidence)
+
+- **Command / argument injection — NO SUBPROCESS SURFACE.** The `RufloRunner`
+  has exactly two impls: `NoopRunner` (no process) and `LocalRunner` (runs the
+  local recognizer, no process). There is **no** `std::process` / `tokio::process`
+  / `Command` / process `.spawn()` anywhere in the crate — the trait `spawn` is
+  only a `started: bool` lifecycle flag — and `RufloRunnerOpts.{script_path,env}`
+  are **inert data, never consumed**. The live `node ruflo-agent.js` runner is
+  genuinely data-gated/future (P2). Defence-in-depth: the `entity_id` capture
+  class `[a-z_][a-z0-9_ .]*` **excludes every shell/SQL metacharacter**, so even
+  when an injection-shaped utterance resolves (the regex is not exact-anchored),
+  the captured slot is a clean token — sanitisation by construction. Pins:
+  `shell_metachars_never_survive_into_a_resolved_slot`,
+  `runner_opts_are_inert_no_process_spawned`,
+  `pipeline_injection_shaped_utterance_carries_no_metachars_to_service`.
+- **ReDoS — STRUCTURALLY IMPOSSIBLE.** `regex 1.12.3` (no `fancy-regex` in the
+  dependency tree) is linear-time; a classic `(a+)+$` shape on adversarial input
+  completes in bounded time. Pin:
+  `pathological_backtracking_pattern_completes_in_bounded_time`. Patterns are
+  operator-registered, not user-supplied, in any case.
+- **NaN-poisoning — EMBEDDINGS STRUCTURALLY FINITE.** The embedding path takes
+  only `&str` and produces values via FNV feature-hashing + a guarded L2
+  normalise (`norm > 1e-12`); no external float input, no unguarded division, so
+  a crafted utterance cannot inject NaN/Inf to poison the cosine k-NN. Cosine
+  against the zero vector is a finite `0.0`; an empty index `max_by` returns
+  `None` (no panic); the NaN-safe `partial_cmp().unwrap_or(Equal)` is already in
+  place. Pins: `embeddings_are_structurally_finite`,
+  `cosine_with_zero_vector_is_finite_not_nan`,
+  `empty_utterance_against_empty_index_no_panic_no_match`.
+- **Intent confusion / fail-closed.** An unrecognised utterance → `not_understood()`
+  (no service call); a recognised intent with no registered handler →
+  `not_understood()`; semantic below-threshold / empty-index → regex fallback.
+  No default high-privilege intent, no fail-open path.
+- **Panic-on-input.** No `unwrap`/`expect`/index reachable from a crafted
+  utterance; the one `exemplars[id]` index uses an `id` from `enumerate()` over
+  the append-only exemplar `Vec` (no remove API), so it is always in bounds.
+
+`cargo test -p homecore-assist --no-default-features`: **29→36, 0 failed** (+7);
+default/`semantic`: **39→48, 0 failed** (+9). Python deterministic proof
+unchanged (homecore-assist is off the signal proof path).
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see §8 Implementation Status, commit `11f89727f`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-core` (`types.rs`: `CsiFrame`/`CsiMetadata`); `wifi-densepose-signal/src/ruvsense/mod.rs` (`RuvSensePipeline`, six-stage flow); `v2/Cargo.toml` (workspace topology) |
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see Implementation Status, commit `4fa3847ac`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-signal` (`ruvsense/multistatic.rs` — `fuse`, `attention_weighted_fusion`); `wifi-densepose-ruvector` (`viewpoint/fusion.rs` — `MultistaticArray`); `wifi-densepose-bfld` (`event.rs`) |
@@ -495,3 +495,34 @@ Rejected. `ViewpointFusionEvent` (viewpoint/fusion.rs lines 183–219) is an int
 **Integration glue -- not yet on the live path:** emission of `CalibrationIdMismatch` / `DriftProfileConflict` / `PhaseAlignmentFailed` once `calibration_id` propagation and the phase-align convergence signal are threaded onto frames; the BFLD witness record emitted on privacy demotion.

 **Trust contribution:** sensor *agreement made explicit* -- fusion records the evidence it relied on, and any disagreement automatically tightens the downstream privacy class.
+
+---
+
+## Witness Integrity Review (2026-06-14) — domain-separation fix
+
+A beyond-SOTA security review of `wifi-densepose-engine` (the composition root
+that builds the §2.7 trust witness in `witness_of`) found a real **witness
+domain-separation gap**, now fixed.
+
+**Finding (witness-gap, HIGH).** `witness_of` concatenated `model_version`,
+`calibration_version`, and `privacy_decision` boundary-to-boundary, and the
+variable-length `evidence` list carried no explicit count. A string straddling a
+field boundary therefore collided with a *different* trust decision —
+e.g. a per-room adapter id (ADR-150 §3.4, operator-influenceable) that absorbs
+the leading bytes of the calibration epoch (`model="…cal:00a"`, `cal="b"`)
+produces the **same** witness as `model="…"`, `cal="cal:00ab"`. Two distinct
+privacy-relevant input tuples mapping to one witness defeats the "any
+privacy-relevant delta → different witness" guarantee that this ADR's §2.7
+witness exists to provide.
+
+**Fix.** The witness now (a) prepends a domain tag `ruview.engine.witness.v1`,
+(b) writes an explicit 8-byte evidence count, and (c) **length-prefixes every
+field** (8-byte LE length ‖ bytes), so field framing is unambiguous regardless
+of contents. This is a witness-layout change (all prior witness bytes are
+invalidated by design); downstream consumers only assert witness *relationships*
+(`assert_ne`/`assert_eq` across runs), not absolute bytes, so nothing breaks.
+
+Pinned by `witness_distinguishes_model_calibration_boundary` and
+`witness_distinguishes_evidence_model_boundary` (both fail on the old
+concatenation). Witness **determinism** was reviewed and confirmed clean: no
+HashMap iteration and no float formatting feed the hash (floats appear only in
+the `SemanticState` statement, which is outside the witness).
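
The collision and the framing fix can be illustrated on the hash preimage alone, without choosing a hash function. The sketch below is not `witness_of`: the field order, the little-endian encoding of the evidence count, and the function names are assumptions.

```rust
const DOMAIN_TAG: &[u8] = b"ruview.engine.witness.v1";

/// Append one field as: 8-byte little-endian length, then the bytes.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
    buf.extend_from_slice(field);
}

/// Old behaviour: fields joined boundary-to-boundary, no count, no lengths.
pub fn naive_preimage(evidence: &[&str], model: &str, cal: &str, privacy: &str) -> Vec<u8> {
    let mut buf = Vec::new();
    for e in evidence {
        buf.extend_from_slice(e.as_bytes());
    }
    buf.extend_from_slice(model.as_bytes());
    buf.extend_from_slice(cal.as_bytes());
    buf.extend_from_slice(privacy.as_bytes());
    buf
}

/// Fixed behaviour: domain tag, explicit evidence count, length-prefixed fields.
pub fn framed_preimage(evidence: &[&str], model: &str, cal: &str, privacy: &str) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(DOMAIN_TAG);
    buf.extend_from_slice(&(evidence.len() as u64).to_le_bytes());
    for e in evidence {
        push_field(&mut buf, e.as_bytes());
    }
    push_field(&mut buf, model.as_bytes());
    push_field(&mut buf, cal.as_bytes());
    push_field(&mut buf, privacy.as_bytes());
    buf
}
```

With the naive layout, `model="mcal:00a", cal="b"` and `model="m", cal="cal:00ab"` produce identical bytes. With the framed layout the length prefixes differ, so any hash over the preimage sees two distinct inputs.
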
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see Implementation Status, commit `fc7674bde`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-signal` (`ruvsense/multiband.rs`, `ruvsense/multistatic.rs`); `wifi-densepose-ruvector` (`viewpoint/geometry.rs`, `viewpoint/coherence.rs`, `viewpoint/attention.rs`, `viewpoint/fusion.rs`) |
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see Implementation Status, commit `521a012d8`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | New module/crate `wifi-densepose-worldgraph` alongside `v2/crates/wifi-densepose-geo` and `v2/crates/homecore`; petgraph bridge pattern from `v2/crates/ruv-neural/ruv-neural-graph/src/petgraph_bridge.rs`; integrates `homecore/src/registry.rs` `area_id` and `wifi-densepose-mat/src/domain/scan_zone.rs` |
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see Implementation Status, commit `169a355bd`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-sensing-server/src/semantic/` (`bus.rs`, `common.rs`); `homecore/src/state.rs` + `event.rs`; `homecore-assist` |
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see Implementation Status, commit `7d88eb84c`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-bfld` (new module `mode.rs` + `attestation.rs`; extends `lib.rs` `PrivacyClass`, `sink.rs`, `privacy_gate.rs`, `identity_risk.rs`, `emitter.rs`, `ha_discovery.rs`) |
@@ -599,3 +599,53 @@ Per ADR-028/ADR-010, three rows are added to the witness log:
 **Integration glue -- not yet on the live path:** wiring the registry into `PrivacyGate` class transitions, the MQTT discovery payload, and a read-only Home Assistant diagnostic entity exposing the active mode + proof hash.

 **Trust contribution:** the *policy spine* -- privacy posture is a tamper-evident, auditable chain rather than a checkbox; an operator's mode choice actively governs whether identity data may even exist.
+
+---
+
+## Privacy Monotonicity Review (2026-06-14) — confirmed clean
+
+A beyond-SOTA security review of the governed-trust cycle
+(`wifi-densepose-engine::StreamingEngine::process_cycle_calibrated`) examined
+the privacy-demotion path this ADR governs. **The monotonicity invariant holds:
+demotion only ever makes the emitted class more restrictive, never less.**
+
+Verification (no behaviour change, the result is a clean bill with evidence):
+
+- Each cycle computes `effective_class` fresh from the active mode's
+  `target_class()` (the floor) and applies at most a **single-step** demotion
+  (`demote_one`, clamped at `Restricted`). There is no cross-cycle state that
+  could let a permissive class overwrite a restrictive one.
+- A forced contradiction (calibration mismatch / array-geometry insufficiency /
+  mesh partition risk, ADR-032) raises the class byte; a clean cycle emits
+  exactly the base class.
+- Pinned by `forced_contradiction_never_relaxes_class`, a property test over
+  **all five** `PrivacyMode`s asserting `effective_class.as_u8() >=
+  base_class.as_u8()` (strictly greater unless already clamped at `Restricted`)
+  under a forced contradiction, and `== base` on a clean cycle.
+
+Fail-closed boundaries were also pinned: an empty cycle errors (no degenerate
+over-permissive output, `empty_cycle_fails_closed`) and the single-node boundary
+is characterized as a valid non-demoting mode (`single_node_cycle_is_well_formed`).
+
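The single-step, clamped demotion can be sketched as below. This is an illustrative model rather than the engine code: the `Raw = 0` discriminant and the `effective_class` helper signature are assumptions; the other class bytes follow the values quoted in this ADR.

```rust
/// Higher byte = more restrictive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
pub enum PrivacyClass {
    Raw = 0, // assumed discriminant
    Derived = 1,
    Anonymous = 2,
    Restricted = 3,
}

impl PrivacyClass {
    pub fn as_u8(self) -> u8 {
        self as u8
    }

    /// Move exactly one step toward `Restricted`, clamping at the top.
    pub fn demote_one(self) -> Self {
        match self {
            PrivacyClass::Raw => PrivacyClass::Derived,
            PrivacyClass::Derived => PrivacyClass::Anonymous,
            PrivacyClass::Anonymous | PrivacyClass::Restricted => PrivacyClass::Restricted,
        }
    }
}

/// Computed fresh each cycle from the mode's base class; no state is carried
/// over, so an earlier permissive result cannot leak into a later cycle.
pub fn effective_class(base: PrivacyClass, contradiction: bool) -> PrivacyClass {
    if contradiction {
        base.demote_one()
    } else {
        base
    }
}
```

Since `demote_one` has no arm that lowers the byte, the inequality `effective_class.as_u8() >= base_class.as_u8()` holds for every input by construction.
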
+The related witness domain-separation fix from the same review is recorded in
+ADR-137 (the witness folds `effective_class`, so the demotion is auditable).
+
+## Security & Privacy Review (2026-06-14)
+
+Beyond-SOTA privacy+security review of `wifi-densepose-bfld` (the crate was not in the ADR-154–159 sweep). Two real bugs fixed (each pinned by a fails-on-old test), several dimensions confirmed clean.
+
+### Findings
+
+| # | Severity | Site | Issue | Fix | Pinned by |
+|---|----------|------|-------|-----|-----------|
+| 1 | **privacy-bypass (HIGH)** | `pipeline.rs::process_to_frame` | The documented wire-bytes production path stamped the frame header with the active `PrivacyClass` but serialized the caller's `BfldPayload` **unchanged** via `BfldFrame::from_payload` — never routing through `PrivacyGate::demote`. A frame labeled `Anonymous`(2)/`Restricted`(3) carried the full `compressed_angle_matrix` (identity surface) + amplitude/phase + `csi_delta`. A `NetworkSink` accepts class ≥ `Derived`(1), so the identity surface could cross the node boundary despite the restrictive class byte — the byte lied about content. | Apply `PrivacyGate::demote(frame, active_class)` after construction: a same-class transition that strips the sections the class forbids; `Raw`/`Derived` keep the full payload. | `tests/pipeline_to_frame.rs::process_to_frame_at_anonymous_strips_identity_leaky_sections`, `…_in_privacy_mode_strips_amplitude_and_phase` (both FAILED pre-fix); `…_at_derived_preserves_full_payload` (over-strip guard) |
+| 2 | **PII/injection (MEDIUM)** | `mqtt_topics.rs::render_events` | `zone_activity` payload built as `format!("\"{zone}\"")` with no JSON escaping (while `ha_discovery.rs` already escapes). A zone name with `"`/`\` produced malformed/injectable JSON on the HA state topic. | `json_string_literal()` escaper mirroring `ha_discovery::push_str_field`. Value-identical for normal zone names. | `tests/mqtt_topic_routing.rs::zone_payload_escapes_json_metacharacters` (FAILED pre-fix) |
+
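A JSON string escaper of the kind named in finding 2 can be sketched as follows. This is a std-only illustration, not the crate's `json_string_literal()`; the exact set of characters the real helper escapes is an assumption.

```rust
/// Render `value` as a quoted JSON string literal, escaping the quote, the
/// backslash, and control characters so the output is always valid JSON.
pub fn json_string_literal(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

For zone names without metacharacters the output equals the old `format!("\"{zone}\"")` result, so existing consumers see no change.
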
+### Dimensions confirmed clean (with evidence)
+
+- **Event-field privacy gating** — `BfldEvent::apply_privacy_gating` nulls `identity_risk_score` + `rf_signature_hash` at `Restricted`, and `serde(skip_serializing_if = "Option::is_none")` omits them entirely. `render_events`/`render_discovery_payloads` refuse class < `Anonymous` (stricter than the `sink.rs` `NetworkKind` `MIN_CLASS = Derived` — defense in depth toward less leakage). Covered by `event_privacy_gating.rs`, `mqtt_topic_routing.rs`, `ha_discovery.rs`.
+- **Witness/hash framing (the engine `witness_of` bug class)** — CLEAN. `SignatureHasher::compute` prefixes a **fixed 4-byte** `day_epoch` then a **fixed-width canonical-f32** feature block (`IdentityFeatures`: Embedding = `EMBEDDING_DIM*4`, RiskFactors = 16 B). `PrivacyAttestationProof::compute` hashes a fixed 32-byte `prev_hash` + three fixed 1-byte values. No variable-length operator-influenceable string is concatenated into any digest — no length-prefix-framing collision is possible.
+- **Fail-closed** — `payload.rs::from_bytes` rejects truncated/overflowing/trailing-byte sections (`checked_add`, bounds checks); `frame.rs::from_bytes` validates magic/version/length/CRC; `PrivacyClass::try_from` rejects unknown bytes; `identity_risk::score` maps NaN/degenerate factors → 0.0 (privacy-conservative). The `from_score(NaN) → Accept` choice is a documented, deliberate publish-aggregate-only fallback (NaN never reaches it from `score()`); risk-driven NaN cannot leak identity because identity gating is class-byte-driven, not risk-driven.
+
+### Observation (not a bug)
+
+The ADR-141 control plane (`PrivacyMode`/`PrivacyModeRegistry`) is **not yet wired into the emit path** — the emitter/pipeline enforce the raw `PrivacyClass` directly; the registry is exported + unit-tested but advisory. This matches the "Integration glue — not yet on the live path" status above. The class-byte enforcement (emitter + event + renderers + the now-fixed `process_to_frame`) is the live guarantee. Wiring the registry is the documented next step.
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see Implementation Status, commit `1f8e180d6`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-signal` (`ruvsense/longitudinal.rs`, `ruvsense/attractor_drift.rs`, `ruvsense/calibration.rs`, `ruvsense/field_model.rs`, `ruvsense/tomography.rs`); `wifi-densepose-bfld` (`privacy_gate.rs`) |
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block, v1 fixed-map default; v2 dataset-gated — see Implementation Status, commit `2d4f3dea5`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-signal` (`ruvsense/field_model.rs`, new `ruvsense/rf_slam.rs`); `wifi-densepose-mat` (`tracking/kalman.rs`, `localization/triangulation.rs`); `wifi-densepose-geo`; `wifi-densepose-ruvector` (`mat/triangulation.rs`) |
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; no UWB radio in fleet — see Implementation Status, commit `b10bc2e9a`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-hardware` (new UWB driver/parser/auto-detect in `src/`); `wifi-densepose-signal` (`ruvsense/pose_tracker.rs` constraint-aware Kalman update); `wifi-densepose-mat` (`localization/fusion.rs` constraint integration) |
@@ -2,7 +2,7 @@

 | Field | Value |
 |-------|-------|
-| **Status** | Proposed |
+| **Status** | Accepted — partial (built + tested building block; integration glue pending — see Implementation Status, commit `0f336b7d3`) |
 | **Date** | 2026-05-28 |
 | **Deciders** | ruv |
 | **Codebase target** | `wifi-densepose-train` (`src/eval.rs`, `src/metrics.rs`, `src/ruview_metrics.rs`, `src/proof.rs`); `wifi-densepose-signal` (`src/bin/*_proof_runner.rs`); `wifi-densepose-cli` |
@@ -9,8 +9,10 @@
 | Relates to | ADR-134, ADR-136, ADR-139, ADR-140, ADR-143, ADR-144, ADR-146, ADR-147                |

 > **Scope note:** ADR-147 deferred Cosmos WFM to "ADR-148" as an offline data generator.
-> That item is promoted to ADR-149. This ADR takes 148 to address the broader drone swarm
-> control architecture, which is the first consumer of ADR-147's OccWorld occupancy output.
+> That item is promoted to ADR-171 (the swarm-benchmarking/evaluation companion to this ADR;
+> renumbered from ADR-149 to resolve the ADR-149 duplicate-number collision). This ADR takes
+> 148 to address the broader drone swarm control architecture, which is the first consumer of
+> ADR-147's OccWorld occupancy output.

 ---

@@ -874,9 +876,9 @@ validated; ITAR/EAR classification completed by export counsel.
 | GPS spoofing of full swarm simultaneously | Medium | Low | UWB mesh cross-check among all nodes; ≥ 3 nodes must agree on position to confirm |
 | 1000-UAV scale claims (not validated) | Low | High | SWARM+ demonstrated in simulation only; scale claims capped at 50 for production targets |

-### 12.3 Open Issues (Forward to ADR-149)
+### 12.3 Open Issues (Forward to ADR-171)

- Cosmos WFM offline training data generation (deferred from ADR-147) — ADR-149
+- Cosmos WFM offline training data generation (deferred from ADR-147) — ADR-171
 - Fixed-wing hybrid platform support (endurance missions) — future ADR
 - Underwater-aerial cross-domain handoff protocol — future ADR
 - Quantum-enhanced task assignment (E6) — future ADR when hardware matures
@@ -998,4 +1000,4 @@ Implementation tracked at: https://github.com/ruvnet/RuView/issues/861

 *ADR authored with research support from `ruflo-goals:deep-researcher` (2026-05-30).
 Implementation progress tracked by `ruflo-goals:horizon-tracker`.
- OccWorld integration basis: ADR-147. Next: ADR-149 (Cosmos WFM offline data generation).*
+ OccWorld integration basis: ADR-147. Next: ADR-171 (Cosmos WFM offline data generation; renumbered from ADR-149).*
@@ -0,0 +1,308 @@
+# ADR-151: RuView Per-Room Calibration & Specialized Model Training System
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted — Stages 1–5 implemented (statistical specialists); HF-backbone distillation pending |
+| **Date** | 2026-06-09 |
+| **Deciders** | ruv |
+| **Codebase target** | New `wifi-densepose-calibration` crate (orchestration); `wifi-densepose-train` (`rapid_adapt.rs`, `signal_features.rs`, `trainer.rs`); `wifi-densepose-ruvector` (RVF specialist storage); `wifi-densepose-signal/ruvsense/*` (feature extractors); `wifi-densepose-cli` (`enroll`, `train-room`, `room-status` subcommands) |
+| **Relates to** | ADR-135 (Empty-Room Baseline Calibration), ADR-030 (Persistent Field Model), ADR-134 (CIR), ADR-024 (Contrastive CSI Embedding / AETHER), ADR-027 (Cross-Environment Domain Generalization / MERIDIAN), ADR-070 (Self-Supervised Pretraining), ADR-105 (Federated CSI Training), ADR-149 (AetherArena / Hugging Face), ADR-150 (RF Foundation Encoder) |
+
+---
+
+## 1. Context
+
+### 1.1 The thesis — teach the room before you teach the model
+
+RuView's deployment frontier is not a better generic model. ADR-150 documents the wall directly: an MM-Fi pose head scores **81.63% torso-PCK@20 in-domain but ~11.6% leakage-free cross-subject**, and bigger capacity *hurts* cross-subject (transformer 24.8% < conv 27.3%). A single oversized model that "understands the world" overfits the rooms and bodies it has seen. The lever is the opposite of scale: **a small model that understands *one* room and *one* person**, calibrated in minutes, run locally, and specialised per biological signal.
+
+This positions RuView between the two incumbents in ambient sensing:
+
+- **Wearables** — high fidelity, but people forget to wear them, and they only measure the wearer.
+- **Cameras** — powerful, but invasive, store identifiable video, and fail in the dark / under covers.
+
+RuView sits in the middle: it learns the *space*, learns the *person*, and tracks biological rhythm (breathing, heartbeat, restlessness, posture, presence) without seeing skin or storing video. Heartbeat and breathing are not visual problems — they are tiny, repeating disturbances in the RF field. Capturing them well is a *calibration* problem, not a *model-size* problem.
+
+### 1.2 What already exists (and what is missing)
+
+The pieces of a calibration→training pipeline exist as disconnected modules. There is no system that runs them end to end and emits a per-room model bank.
+
+| Capability | Status today | Gap |
+|------------|--------------|-----|
+| Empty-room baseline (environmental fingerprint) | ADR-135 `BaselineCalibration` (Proposed): per-subcarrier amplitude + circular-phase stats, `ruvcal` NVS namespace | Captures the *room*, but there is no step that captures *guided human anchors* on top of it |
+| Field eigenstructure | ADR-030 `field_model.rs` (SVD room eigenmodes) | Consumes calibration; not wired to a training trigger |
+| Shared invariant backbone | ADR-150 RF Foundation Encoder (pose-preserving, subject/room/device-invariant) | Defined as a *foundation* embedding; nothing distills it into per-room specialists |
+| Few-shot adaptation | `train/src/rapid_adapt.rs` — test-time training → LoRA weight deltas (MERIDIAN P5) | Produces a *single* pose-adaptation delta, not a bank of per-modality specialists |
+| Feature extractors | `ruvsense/{bvp,longitudinal,intention,gesture,pose_tracker,adversarial}.rs`, `train/src/signal_features.rs` | Each emits a signal; none is packaged as a labelled training source for enrollment |
+| Small-model storage | `wifi-densepose-ruvector` (RVF cognitive containers, HNSW, sketch) | No schema for "a bank of specialist models scoped to a room_id" |
+| HF publishing | ADR-149 AetherArena (Hugging Face Space + signed scorer), `sensing-server` `from_pretrained` path | Publishes/evaluates a *global* model; no notion of a published *base* + private *local* heads |
+
+**The missing system is the connective tissue**: a guided enrollment protocol, a feature-extraction-to-label bridge, a specialist-bank trainer that reuses the frozen HF backbone, and a runtime that fuses the specialists with confidence gating. This ADR defines that system.
+
+### 1.3 The four-step user model (and where each step lands)
+
+The system is deliberately presented to operators as four plain steps. Each maps to existing or new code:
+
+1. **Capture a quiet baseline** — no people, just room/router/reflections/noise/drift → the *environmental fingerprint*. → **Reuse ADR-135** `BaselineCalibration` + **ADR-030** field eigenmodes. No new capture code; the calibration crate calls it.
+2. **Capture guided samples** — stand, sit, lie down, slow vs normal breathing, small movement, sleep posture. Clean anchors, not hours of data. → **NEW** `EnrollmentProtocol` (Section 2.2).
+3. **Extract the useful signal** — CSI phase, amplitude, Doppler shift, micro-motion, periodicity, variance, timing. → **Reuse** `signal_features.rs` + ruvsense extractors, packaged as labelled `AnchorFeature` records (Section 2.3).
+4. **Compress patterns into small ruVector models** — *specialised* per signal: breathing, heartbeat, sleep restlessness, posture, presence, anomaly. → **NEW** `SpecialistBank` trained via `rapid_adapt` LoRA heads over the frozen ADR-150 backbone, stored as RVF (Section 2.4).
+
+---
+
+## 2. Decision
+
+**Build the RuView Per-Room Calibration & Specialized Model Training System: a four-stage, local-first pipeline (`baseline → enroll → extract → train`) that produces a versioned *bank of small specialised ruVector models* scoped to one `room_id`, each a lightweight head distilled/adapted from the frozen, Hugging-Face-published RF Foundation Encoder (ADR-150).** Big model understands the world; small ruVector models understand *your room*.
+
+Two invariants govern every design choice below:
+
+> **(A) Specialisation over scale.** One small model per biological signal, not one large model for all of them. Each specialist is faster, cheaper, more private, and — because it is calibrated to the room's actual fingerprint — often *more accurate* than a general model.
+>
+> **(B) Local-first, base-shared.** The frozen room/subject/device-invariant backbone is the only artifact published to Hugging Face. Per-room baselines and per-specialist heads never leave the device unless the operator opts into federation (ADR-105).
+
+### 2.1 System architecture
+
+```
+                       HUGGING FACE HUB (public, room-agnostic)
+                       ┌───────────────────────────────────────┐
+                       │  RF Foundation Encoder (ADR-150)       │
+                       │  pose-preserving · subject/room/device │
+                       │  -invariant · frozen · safetensors     │
+                       └───────────────┬───────────────────────┘
+                                       │  from_pretrained() once, cached on device
+                                       ▼
+  STAGE 1 baseline        STAGE 2 enroll        STAGE 3 extract         STAGE 4 train (per room_id)
+  ┌──────────────┐        ┌──────────────┐      ┌────────────────┐      ┌─────────────────────────┐
+  │ ADR-135      │        │ Enrollment   │      │ signal_features│      │ SpecialistBank          │
+  │ Baseline-    │──fp──► │ Protocol     │─clip►│ + ruvsense     │─AF──►│  frozen backbone        │
+  │ Calibration  │        │ guided       │      │ extractors     │      │   │  ┌────────────────┐  │
+  │ (env finger- │        │ anchors:     │      │ → AnchorFeature│      │   ├─►│ breathing head │  │
+  │  print)      │        │ stand/sit/   │      │ (phase, amp,   │      │   ├─►│ heartbeat head │  │
+  │ ADR-030      │        │ lie/breathe/ │      │  doppler,      │      │   ├─►│ restless head  │  │
+  │ field eigen  │        │ move/sleep   │      │  micromotion,  │      │   ├─►│ posture head   │  │
+  └──────────────┘        └──────────────┘      │  periodicity,  │      │   ├─►│ presence head  │  │
+        │                                        │  variance,     │      │   └─►│ anomaly head   │  │
+        │  baseline drift > τ → invalidate bank  │  timing)       │      │     (LoRA / ruVector    │
+        └───────────────────────────────────────┴────────────────┴──────┤      small models)      │
+                                                                          └───────────┬─────────────┘
+                                                                                      │ RVF container
+                                                                                      ▼
+                                                              RUNTIME: Mixture-of-Specialists
+                                                              each head emits {value, confidence};
+                                                              coherence_gate (ADR-135) + anomaly
+                                                              head veto → fused RoomState
+```
+
+The shared backbone is loaded **once per device** and frozen. Every specialist is a small head over its embedding — so the marginal cost of a sixth specialist is kilobytes of LoRA weights, not another full model.
+
+### 2.2 Stage 2 — the guided enrollment protocol (NEW)
+
+`EnrollmentProtocol` is a CLI-driven state machine that walks the operator through a fixed sequence of labelled **anchors**. The design rule from the user vision is explicit: *clean anchors, not hours of data.* Each anchor is a short (default 20 s @ 20 Hz = 400 frames) labelled clip captured against the already-recorded baseline.
+
+| Anchor | Label | Duration | Primary signal taught | Feature emphasis |
+|--------|-------|----------|-----------------------|------------------|
+| `empty` | presence=0 | (reuse ADR-135 baseline) | absence reference | amplitude variance floor |
+| `stand_still` | posture=standing, presence=1 | 20 s | static human load | amplitude mean shift, eigenmode delta |
+| `sit` | posture=sitting | 20 s | lower static load | amplitude profile |
+| `lie_down` | posture=lying | 20 s | sleep-position load | amplitude profile, low Doppler |
+| `breathe_slow` | resp≈0.1–0.15 Hz | 30 s | slow respiration | periodicity, micro-Doppler |
+| `breathe_normal` | resp≈0.2–0.3 Hz | 30 s | normal respiration | periodicity, BVP phase |
+| `small_move` | motion=1 | 20 s | limb micro-motion | Doppler spread, variance |
+| `sleep_posture` | posture=lying, restless=0 | 30 s | quiescent sleep baseline | long-window variance, timing |
+
+The protocol is **adaptive**: an anchor is only accepted when its captured features pass a quality gate (coherence ≥ threshold from `coherence_gate.rs`, sufficient SNR vs baseline, no saturation). A failed anchor is re-prompted rather than silently kept, because bad anchors poison small models far more than large ones. Total guided enrollment is ~4 minutes of wall-clock time, producing 8 clean anchors. This is intentionally far below the "hours of data" that a from-scratch model needs, because the backbone already carries world knowledge; enrollment only teaches *this* room's offsets.
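The accept-or-re-prompt rule above can be sketched as a small pure function. This is a minimal illustration, not the crate's code: `AnchorQuality`, `GateDecision`, `quality_gate`, and both threshold constants are hypothetical names and values, since the ADR names the three checks but not their types or numbers.

```rust
/// Hypothetical per-clip quality measurements (field names are illustrative).
#[derive(Debug, Clone, Copy)]
pub struct AnchorQuality {
    /// Coherence score in [0, 1], assumed to come from the coherence gate.
    pub coherence: f32,
    /// SNR of the clip relative to the empty-room baseline, in dB.
    pub snr_db: f32,
    /// True if any sample in the clip clipped.
    pub saturated: bool,
}

/// Outcome of the gate: keep the anchor, or re-prompt with a reason.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GateDecision {
    Accept,
    Retry(&'static str),
}

// Assumed thresholds for illustration; the ADR does not state real values.
pub const MIN_COHERENCE: f32 = 0.8;
pub const MIN_SNR_DB: f32 = 6.0;

/// Apply the three checks in order: saturation, coherence, SNR.
pub fn quality_gate(q: AnchorQuality) -> GateDecision {
    if q.saturated {
        return GateDecision::Retry("saturation");
    }
    // Written as a negated `>=` so that a NaN measurement fails closed.
    if !(q.coherence >= MIN_COHERENCE) {
        return GateDecision::Retry("low coherence");
    }
    if !(q.snr_db >= MIN_SNR_DB) {
        return GateDecision::Retry("low SNR");
    }
    GateDecision::Accept
}
```

Returning a reason string with `Retry` lets the CLI print why the operator is being asked to repeat an anchor, which matches the "low SNR, retrying" style of prompt shown in the CLI flow.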
+
+Anchors are persisted as an append-only `EnrollmentSession` (event-sourced, per CLAUDE.md state rules) under `room_id`, so re-enrollment is incremental and auditable.
+
+### 2.3 Stage 3 — feature extraction to labelled records (REUSE + bridge)
+
+Each accepted anchor clip is run through the existing extractor stack, baseline-subtracted per ADR-135, and packaged into an `AnchorFeature` record. No new DSP is invented — this stage is a *bridge*, not a new algorithm.
+
+| Feature group | Source module | Used by specialists |
+|---------------|---------------|---------------------|
+| CSI amplitude mean/variance | ADR-135 baseline subtraction + `signal_features.rs` | presence, posture |
+| CSI phase (sanitised, LO-aligned) | `phase_sanitizer` → `phase_align` | posture, heartbeat |
+| Doppler shift / micro-Doppler | `ruvsense/bvp.rs`, `breathing` path | breathing, small-move |
+| Micro-motion / intention lead | `ruvsense/intention.rs` | restlessness, anomaly |
+| Periodicity / spectral peaks | `bvp.rs` autocorrelation + FFT | breathing, heartbeat |
+| Long-window variance / drift | `ruvsense/longitudinal.rs` (Welford) | restlessness, presence |
+| Timing / inter-frame epoch | `c6_timesync` epoch, frame Δt | all (rhythm alignment) |
+| Field eigenmode coefficients | ADR-030 `field_model.rs` | posture, presence |
+
+`AnchorFeature` = `{ room_id, anchor_label, t_epoch_us, embedding: [f32; D] (backbone output), aux: { resp_hz?, doppler_spread, variance, periodicity_score, eigen_coeffs } }`. The backbone embedding is the *shared* representation; `aux` carries the cheap hand-features that let small heads specialise without re-learning DSP.
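One way the record above might look as Rust types is sketched here. The field names follow the schema in the text; everything else is an assumption for illustration: the embedding width `D = 64`, the use of `String` for ids and labels, `Vec<f32>` for the eigenmode coefficients, and the `is_finite` helper.

```rust
/// Assumed embedding width; the ADR leaves `D` unspecified.
pub const D: usize = 64;

/// Cheap hand-crafted features carried alongside the backbone embedding.
#[derive(Debug, Clone, PartialEq)]
pub struct AuxFeatures {
    /// Respiration rate in Hz; optional because only breathing anchors define it.
    pub resp_hz: Option<f32>,
    pub doppler_spread: f32,
    pub variance: f32,
    pub periodicity_score: f32,
    pub eigen_coeffs: Vec<f32>,
}

/// One labelled record produced from an accepted anchor clip.
#[derive(Debug, Clone, PartialEq)]
pub struct AnchorFeature {
    pub room_id: String,
    pub anchor_label: String,
    pub t_epoch_us: u64,
    /// Backbone output: the representation shared by all specialists.
    pub embedding: [f32; D],
    pub aux: AuxFeatures,
}

impl AnchorFeature {
    /// Hypothetical helper: true only if every numeric field is finite.
    pub fn is_finite(&self) -> bool {
        self.embedding.iter().all(|x| x.is_finite())
            && self.aux.resp_hz.map_or(true, |hz| hz.is_finite())
            && self.aux.doppler_spread.is_finite()
            && self.aux.variance.is_finite()
            && self.aux.periodicity_score.is_finite()
            && self.aux.eigen_coeffs.iter().all(|x| x.is_finite())
    }
}
```

Modelling `resp_hz` as an `Option` mirrors the `resp_hz?` marker in the schema: posture and presence anchors carry no respiration label, so the absence is explicit rather than encoded as a sentinel value.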
+
+### 2.4 Stage 4 — the specialist bank (NEW, the core contribution)
+
+A **`SpecialistBank`** is a versioned collection of small models scoped to one `room_id`, persisted as a single RVF cognitive container (`wifi-densepose-ruvector`). Each specialist is a *head* over the frozen backbone embedding, trained from the labelled `AnchorFeature` records via the existing `rapid_adapt.rs` LoRA machinery (test-time/few-shot training, contrastive + entropy losses), **not** a from-scratch network.
+
+| Specialist | Model type | Params (typ.) | Label source | Output |
+|------------|-----------|---------------|--------------|--------|
+| **breathing** | 1-D temporal head + periodicity regressor | ~8 KB LoRA + aux | `breathe_slow`/`breathe_normal` | resp rate (Hz) + confidence |
+| **heartbeat** | narrowband phase head (harmonic-aware) | ~12 KB | quiescent anchors + periodicity | HR (bpm) + confidence |
+| **sleep restlessness** | variance/drift classifier | ~4 KB | `sleep_posture` vs `small_move` | restlessness score [0,1] |
+| **posture** | k-way prototype classifier (HNSW NN) | prototypes only | `stand/sit/lie` anchors | posture class + margin |
+| **presence** | binary energy/eigenmode gate | ~2 KB | `empty` vs occupied anchors | presence prob |
+| **anomaly** | one-class / physically-impossible detector (`adversarial.rs`) | ~6 KB | baseline + all anchors (novelty) | anomaly score + veto flag |
+
+Design properties that follow from invariant (A):
+
+- **Independently versioned & swappable.** Re-enrolling breathing does not retrain posture. A specialist carries its own `{trained_at, anchor_set_hash, baseline_hash, backbone_rev}`.
+- **HNSW prototype storage for the classifiers.** Posture and presence are nearest-prototype lookups in the RVF index — no inference engine, microsecond latency, and new postures are added by inserting a prototype, not retraining.
+- **SONA online adaptation.** Each specialist may carry a SONA/MicroLoRA online-adaptation slot (`ruvllm_sona_*` / `microlora` primitives) so it tracks slow drift (furniture moved, seasonal RF change) between full re-enrollments, gated by ADR-135 baseline drift.
+- **Teacher–student distillation (optional, offline).** Where a labelled public corpus exists (MM-Fi, Wi-Pose), the ADR-150 backbone acts as teacher to pre-shape a head before per-room fine-tuning, improving cold-start. The *teacher* is global/HF; the *student head* is local.
+
+**Invalidation contract.** The bank stores the `baseline_id` (the baseline UUID) it was trained against. **As implemented**, the runtime marks the bank `STALE` whenever the *current* baseline id differs from the trained one — a conservative trigger that catches re-calibration (room rearranged, AP moved, band changed) because any of those produces a new baseline. A finer **drift-threshold** trigger (mark STALE when ADR-135's per-subcarrier deviation exceeds τ *without* a full re-baseline) is a planned refinement (P6). Either way the runtime prompts re-enrollment rather than emitting silently wrong vitals — the calibration analogue of the #954 `DEGRADED` honesty rule: never report confident numbers from an invalid model.
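The as-implemented trigger described above reduces to an id comparison. The sketch below assumes a string baseline id and hypothetical names (`BankStatus`, `status`, `gated`); it shows only the conservative id-mismatch rule, not the planned drift-threshold refinement.

```rust
/// Freshness of a bank relative to the baseline currently in force.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BankStatus {
    Fresh,
    Stale,
}

/// Minimal stand-in for the bank: only the fields the contract needs.
#[derive(Debug, Clone)]
pub struct SpecialistBank {
    pub room_id: String,
    /// Id of the baseline the bank was trained against.
    pub baseline_id: String,
}

impl SpecialistBank {
    /// Conservative rule: any baseline other than the trained one is STALE.
    pub fn status(&self, current_baseline_id: &str) -> BankStatus {
        if self.baseline_id == current_baseline_id {
            BankStatus::Fresh
        } else {
            BankStatus::Stale
        }
    }

    /// Hypothetical gate: pass a reading through only while the bank is fresh,
    /// so a stale bank yields `None` instead of a confident number.
    pub fn gated<T>(&self, current_baseline_id: &str, reading: T) -> Option<T> {
        match self.status(current_baseline_id) {
            BankStatus::Fresh => Some(reading),
            BankStatus::Stale => None,
        }
    }
}
```

Routing every reading through one gate keeps the honesty rule in a single place: a caller cannot obtain a value from a stale bank without first handling the `None` case.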
+
+### 2.5 Runtime — mixture of specialists with confidence gating
+
+At inference, the frozen backbone embeds each CSI window once; every specialist consumes that shared embedding and emits `{value, confidence}`. Fusion rules:
+
+- The **anomaly** specialist holds a **veto**: a high anomaly score (physically-impossible signal per `adversarial.rs`, or a coherence-gate `Reject`) suppresses positive vitals/posture output and raises a flag, rather than propagating a hallucinated reading.
+- **presence=0** short-circuits breathing/heartbeat/posture to `null` (you cannot have a respiration rate in an empty room).
+- Each emitted reading is tagged with the specialist's confidence and the `baseline_hash`/`backbone_rev` provenance, so downstream consumers (sensing-server, MQTT, Home Assistant) can gate on quality — consistent with ADR-135 coherence-gate semantics.
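The first two fusion rules above can be sketched as one function over the specialists' `{value, confidence}` outputs. All names and both thresholds are assumptions for illustration; posture and the provenance tags are left out to keep the sketch short, and reporting presence as `None` under a veto is this sketch's choice, not something the ADR specifies.

```rust
/// One specialist output: a value plus the specialist's confidence in it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Reading {
    pub value: f32,
    pub confidence: f32,
}

/// Raw per-window outputs of four of the six specialists.
#[derive(Debug, Clone, Copy)]
pub struct SpecialistOutputs {
    pub presence: Reading,     // value = presence probability
    pub breathing_hz: Reading, // value = respiration rate in Hz
    pub heart_bpm: Reading,    // value = heart rate in bpm
    pub anomaly: Reading,      // value = anomaly score
}

/// Fused state; `None` means "no trustworthy reading", not zero.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RoomState {
    pub present: Option<bool>,
    pub breathing_hz: Option<Reading>,
    pub heart_bpm: Option<Reading>,
    pub anomaly_flag: bool,
}

// Assumed thresholds for illustration only.
pub const ANOMALY_VETO: f32 = 0.8;
pub const PRESENCE_MIN: f32 = 0.5;

pub fn fuse(out: &SpecialistOutputs) -> RoomState {
    // Rule 1: anomaly veto. The negated `<` also vetoes a NaN score.
    if !(out.anomaly.value < ANOMALY_VETO) {
        return RoomState { present: None, breathing_hz: None, heart_bpm: None, anomaly_flag: true };
    }
    // Rule 2: an empty room short-circuits the vitals to `None`.
    let present = out.presence.value >= PRESENCE_MIN;
    RoomState {
        present: Some(present),
        breathing_hz: if present { Some(out.breathing_hz) } else { None },
        heart_bpm: if present { Some(out.heart_bpm) } else { None },
        anomaly_flag: false,
    }
}
```

Because each surviving `Reading` keeps its confidence, a downstream consumer can still apply its own quality threshold after fusion.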
+
+### 2.6 Crate & module layout
+
+New bounded-context crate `wifi-densepose-calibration` (orchestration only; files < 500 lines, typed public APIs, event-sourced sessions — per CLAUDE.md):
+
+```
+wifi-densepose-calibration/
+  src/
+    lib.rs                 # public API: CalibrationSystem facade
+    enrollment.rs          # EnrollmentProtocol state machine (Stage 2)
+    anchor.rs              # Anchor, EnrollmentSession (event-sourced)
+    extract.rs             # AnchorFeature bridge over signal_features + ruvsense (Stage 3)
+    specialist.rs          # Specialist trait, SpecialistKind enum
+    bank.rs                # SpecialistBank (RVF container, versioning, invalidation)
+    runtime.rs             # MixtureOfSpecialists fusion + veto (Stage 5)
+    backbone.rs            # frozen ADR-150 encoder loader (hf_hub from_pretrained, cached)
+    error.rs
+```
+
+Dependencies (no duplication — orchestrates existing crates): `wifi-densepose-signal` (ruvsense extractors, ADR-135 baseline), `wifi-densepose-train` (`rapid_adapt`, `signal_features`, `trainer`), `wifi-densepose-ruvector` (RVF, HNSW), `wifi-densepose-nn` (backbone inference). The `wifi-densepose-cli` gains `enroll`, `train-room`, and `room-status` subcommands, sequenced after the existing ADR-135 `calibrate`.
+
+### 2.7 CLI flow (operator-facing)
+
+```bash
+# Stage 1 — environmental fingerprint (ADR-135, existing)
+wifi-densepose calibrate --room living-room --duration 60s     # empty room
+
+# Stage 2+3 — guided enrollment (NEW); prompts through 8 anchors, ~4 min
+wifi-densepose enroll --room living-room
+#   → "Stand still in view of the sensor…"  [✓ anchor accepted: coherence 0.91]
+#   → "Sit down…"                            [✗ low SNR, retrying]
+#   ...
+
+# Stage 4 — train the specialist bank (NEW); reuses cached HF backbone
+wifi-densepose train-room --room living-room \
+    --specialists breathing,heartbeat,restlessness,posture,presence,anomaly
+
+# Status / invalidation
+wifi-densepose room-status --room living-room
+#   baseline: fresh (drift 0.04 < 0.20) · backbone: rf-foundation@1.2.0
+#   breathing  ✓ trained 2026-06-09  conf p50 0.88
+#   heartbeat  ✓ trained 2026-06-09  conf p50 0.71
+#   posture    ✓ 3 prototypes (stand/sit/lie)
+#   anomaly    ✓  · presence ✓  · restlessness ✓
+```
+
+---
+
+## 3. Consequences
+
+### 3.1 Positive
+
+- **Fidelity through specialisation.** Six small calibrated heads beat one oversized general model on the cross-room/cross-subject frontier that ADR-150 quantified — and each runs in microseconds-to-milliseconds, on-device.
+- **Privacy by construction.** Only the room-agnostic backbone is public (HF). The environmental fingerprint and the person-specific heads stay local; no video, no skin, no cloud round-trip. This is the core differentiator vs cameras and the convenience differentiator vs wearables.
+- **Minutes, not hours.** Because the backbone carries world knowledge, ~4 minutes of clean anchors calibrates a room. Re-enrollment is incremental.
+- **Honest degradation.** The `baseline_hash` invalidation + anomaly veto mean an out-of-calibration room reports `STALE`/flagged rather than confidently wrong — the same honesty principle as the firmware `DEGRADED` flag.
+- **Composable & cheap to extend.** A new biological signal = a new small head over the same embedding, not a new model.
+
+### 3.2 Negative / risks
+
+- **Backbone dependency.** Every specialist rides on ADR-150's encoder; its quality and revision compatibility (`backbone_rev`) are a single point of leverage. Mitigation: pin `backbone_rev` in each specialist; distillation cold-start reduces sensitivity.
+- **Enrollment burden.** 4 minutes is small but non-zero, and anchor quality depends on the operator following prompts. Mitigation: adaptive re-prompting + quality gates; ship sane defaults so a partial bank (presence+posture) works after just the static anchors.
+- **Heartbeat is hard.** Sub-mm chest displacement at HR frequencies is near the ESP32-S3 noise floor; the heartbeat specialist will have lower and more variable confidence than breathing. The confidence-gated runtime surfaces this rather than faking it.
+- **Per-room storage proliferation.** A bank per room per person; needs a clear RVF lifecycle (list/prune/export) — handled by `bank.rs` versioning and the `room-status` CLI.
+
+### 3.3 Alternatives considered
+
+| Alternative | Verdict | Reason |
+|-------------|---------|--------|
+| One large general model for all signals | **Rejected** | The ADR-150 evidence: scale overfits rooms/subjects and collapses cross-domain; also slower, costlier, less private. Directly contradicts invariant (A). |
+| Cloud training of per-room models | **Rejected** | Violates invariant (B): would ship raw CSI of a person's home/sleep to a server. Local-first is the privacy promise. Federation (ADR-105) is the *opt-in* path for shared improvement, exchanging gradients/deltas, never raw CSI. |
+| Skip the backbone; train each specialist from scratch | **Rejected** | Reintroduces the "hours of data" requirement the user vision explicitly rejects, and loses cross-room priors. |
+| Fold this into ADR-135 | **Rejected** | ADR-135 is *room* calibration (no humans). This ADR is *human-anchor* enrollment + model training on top of it. Distinct lifecycles, distinct invalidation; kept as separate bounded contexts. |
+
+---
+
+## 4. Implementation phases
+
+| Phase | Scope | Exit criterion | Status |
+|-------|-------|----------------|--------|
+| **P1** | Scaffold `wifi-densepose-calibration` crate; `AnchorFeature` schema; (backbone via `hf_hub` deferred) | Crate + schema; unit tests | ✅ Done (crate + Stage-1 baseline via `calibrate`/`calibrate-serve`; HF backbone deferred) |
+| **P2** | `EnrollmentProtocol` + `anchor.rs` (event-sourced sessions) + CLI `enroll` with quality gates | 8-anchor enrollment; bad anchors re-prompt | ✅ Done (`anchor.rs`, `enrollment.rs`, CLI `enroll`) |
+| **P3** | `extract.rs` bridge → labelled records; baseline subtraction (ADR-135) | `AnchorFeature` records persisted per `room_id` | ✅ Done (`extract.rs`; autocorr periodicity + variance/motion) |
+| **P4** | `SpecialistBank` + presence/posture (prototype) + breathing (periodicity); persistence + versioning | `train-room` produces a bank; `room-status` reads it back | ✅ Done (`specialist.rs`, `bank.rs`, CLI `train-room`/`room-status`; JSON persistence — RVF/HNSW = future) |
+| **P5** | heartbeat + restlessness + anomaly specialists; `runtime.rs` mixture + veto + confidence gating | End-to-end RoomState on hardware; anomaly veto verified | ✅ Done (`runtime.rs`, CLI `room-watch`; breathing read live on COM8 ESP32) |
+| **P6** | Baseline-drift `STALE` invalidation; SONA online adaptation; optional ADR-105 federation; HF teacher–student distillation | Drift marks bank STALE; AetherArena entry | ◐ Partial (STALE done; SONA/federation/HF-backbone = follow-ups) |
+
+**Current status (2026-06-10):** Stages 1–5 implemented with *statistical* specialists (threshold/prototype/autocorrelation). 55 tests (35 unit incl. multistatic + 1 full-loop integration + 19 CLI), all passing under qemu-aarch64. **Validation scope is precise:** baseline capture + HTTP API + auth are proven on real CSI (Pi-5 nexmon, 6,813 frames; and an ESP32-S3). The complete `baseline → enroll → train-room → infer` loop is now **proven in-process** on deterministic synthetic CSI (`tests/full_loop.rs`: clean baseline with zero motion flags, 8/8 anchors through the quality gate, 6 specialists trained, JSON bank round-trip, trained-bank inference 18±2 BPM positive / absent negative / foreign-baseline STALE; seed-robust). The one live runtime signal (breathing ~16–31 BPM via `room-watch`) used the *stateless* breathing head, **not** a trained bank; the clean empty-room loop has **not** yet run on-target — the remaining gap is strictly the hardware session (empty room + operator anchors). The four behavioral findings from the full-loop test (z-band squeeze, variance-only presence, ungated hz embedding, heart-band lag-floor leakage) are FIXED and regression-guarded — see the integration doc §7. SOTA-intake decisions affecting this system (geometry conditioning, checkerboard alignment) are recorded in ADR-152. Open refinements: `--source-format adr018v6` (drive from the Pi's own nexmon), phase-based breathing carrier, RVF/HNSW storage, and the ADR-150 frozen HF backbone the specialists would distill from.
+
+Validation per CLAUDE.md: `cargo test --workspace --no-default-features` green; hardware verification on the ESP32-S3 (currently COM8) before any release; witness bundle regenerated if the proof surface changes.
+
+---
+
+## 6. Review notes
+
+### 6.1 Correctness + security review (2026-06-14)
+
+Beyond-SOTA correctness+security review of `wifi-densepose-calibration` (this
+ADR's pipeline), which the ADR-154–159 sweep did not cover.
+
+**Finding (FIXED) — NaN-poisoning of the feature path (numerical / fail-closed).**
+`Features::from_series` — the carrier for both live inference and training-anchor
+extraction — computed `mean`/`variance`/`motion` over the raw scalar series with
+no non-finite guard. A single `NaN`/`±inf` sample (corrupt CSI frame) yielded
+`mean=NaN, variance=NaN` and an all-`NaN` prototype embedding. Persisted into a
+`PresenceSpecialist::threshold`/`empty_mean` at train time, the `NaN` **silently
+disabled presence detection** for the bank's lifetime (every `>` / `|·|`
+comparison against `NaN` is false → always reads *absent*, confidence 0), with no
+error, in contrast to the rigorously NaN-guarded `geometry_embedding`.
+Fixed at the production boundary: non-finite samples are dropped (a corrupt frame
+counts as no frame), an all-non-finite series degrades to `Features::ZERO` like
+the empty series. Value-identical for all-finite input (full-loop + extract tests
+unchanged); pinned by `non_finite_samples_do_not_poison_features` and
+`all_non_finite_series_is_zero` (both fail on the old code).
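The fix described above amounts to filtering the series before any statistic is computed. The sketch below is a simplified stand-in, not the crate's `Features` type: it keeps only `mean` and `variance` (the real struct also carries `motion` and an embedding) and assumes population variance.

```rust
/// Simplified stand-in for the crate's feature record.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Features {
    pub mean: f32,
    pub variance: f32,
}

impl Features {
    /// Result for an empty (or all-non-finite) series.
    pub const ZERO: Features = Features { mean: 0.0, variance: 0.0 };

    /// Compute mean and population variance over the finite samples only.
    /// A NaN or infinite sample is dropped, i.e. a corrupt frame counts as
    /// no frame, so it cannot poison the result.
    pub fn from_series(series: &[f32]) -> Features {
        let finite: Vec<f32> = series.iter().copied().filter(|x| x.is_finite()).collect();
        if finite.is_empty() {
            return Features::ZERO;
        }
        let n = finite.len() as f32;
        let mean = finite.iter().sum::<f32>() / n;
        let variance = finite.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
        Features { mean, variance }
    }
}
```

Without the filter, a single `NaN` would propagate into `mean` and `variance`, and any later `value > threshold` test against a `NaN` threshold would evaluate to false, which is the silent always-absent failure the review describes.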
+
+**Clean dimensions (evidence, no invented issues).**
+- *File/path handling:* the crate performs **zero** file/path I/O (no
+  `std::fs`/`Path`/`File`/`read`/`write` in `src/`; only in-memory `serde_json`).
+  Path-traversal / unbounded-read / artifact-path handling live entirely in the
+  `wifi-densepose-cli` consumer (`room.rs`), outside this crate's boundary.
+- *Untrusted-load:* `SpecialistBank::from_json` shape-validates via serde
+  (malformed → `CalibrationError::Serde`); banks are local-first (invariant B),
+  never network-received. A well-formed bank with adversarial numerics is trusted
+  as-is — acceptable under the local-first threat model; a validate-on-load
+  defense-in-depth pass is a possible future hardening, not a present bug.
+- *Receipt/hash integrity:* the crate emits no hash/receipt/witness/signature, so
+  the unframed-concatenation bug class (cf. the engine `witness_of` fix) is
+  structurally absent.
+- *Other numerical paths:* `geometry_embedding` sanitizes every input and sweeps
+  to finite; presence/restlessness/anomaly divisions are `.max(1e-3)`-guarded;
+  `autocorr_dominant` guards `r0`, short signals, and empty bands; `train` rejects
+  empty anchors; anomaly requires ≥2 anchors.
+
+The bare specialist threshold literals (breathing/heartbeat default min-scores,
+anomaly outlier-spread multiple + label cutoff) were replaced with named,
+documented consts, value-identical and pinned by const-equality tests. Tests
+**58→62 unit + 1 integration, 0 failed**; Python deterministic proof unchanged
+(off the signal proof path).
+
+---
+
+## 5. Summary
+
+> Big models understand the world. Small ruVector models understand *your room*.
+
+ADR-151 makes that operational: a local-first `baseline → enroll → extract → train` pipeline that turns ~4 minutes of clean human anchors — layered on ADR-135's empty-room fingerprint and ADR-150's Hugging-Face-published invariant backbone — into a versioned bank of tiny, specialised, privacy-preserving models for breathing, heartbeat, restlessness, posture, presence, and anomaly. Specialisation over scale; local heads over a shared base; honest `STALE` degradation over confident error.
@@ -0,0 +1,125 @@
+# ADR-152: WiFi-Pose SOTA 2026 Intake — Geometry-Conditioned Calibration, External Benchmarks, and the Foundation-Encoder Training Recipe
+
+| Field | Value |
+|-------|-------|
+| **Status** | Proposed |
+| **Date** | 2026-06-10 |
+| **Deciders** | ruv |
+| **Codebase target** | `wifi-densepose-calibration` (geometry conditioning, ADR-151 Stage 2), `wifi-densepose-train` (camera-supervised path, MAE recipe), `wifi-densepose-cli` (benchmark harness), docs |
+| **Relates to** | ADR-151 (Per-Room Calibration), ADR-150 (RF Foundation Encoder), ADR-135 (Empty-Room Baseline), ADR-079 (Camera-Supervised Pose), ADR-027 (MERIDIAN), ADR-024 (AETHER), ADR-149 (AetherArena), ADR-029 (Multistatic) |
+| **Research provenance** | Deep-research run 2026-06-10: 22 sources fetched, 110 claims extracted, 25 adversarially verified (3-vote), 24 confirmed / 1 refuted. Evidence grades per source below. |
+
+---
+
+## 1. Context
+
+A structured survey of the 2025–2026 WiFi human-sensing state of the art was run on 2026-06-10 to answer: *what should RuView integrate next, and does anything published invalidate our current direction?* Every claim below was verified against the primary source by independent adversarial reviewers; **evidence grades distinguish what the papers measured from what they merely claim**. Almost all performance numbers are author-self-reported preprint results — treated here as CLAIMED until reproduced on our hardware.
+
+### 1.1 The five verified findings
+
+**(F1) "Coordinate overfitting" is a named, diagnosed failure mode of camera-supervised WiFi pose — and our ADR-079 pipeline has the exact shape of it.**
+PerceptAlign (arXiv [2601.12252](https://arxiv.org/abs/2601.12252), accepted ACM MobiCom 2026) shows that models regressing CSI directly to camera-frame coordinates memorize the deployment-specific transceiver layout; SOTA baselines degrade to >600 mm MPJPE in unseen scenes. Their fix is cheap: a <5-minute calibration using two checkerboards and a few photos to align WiFi and vision in one shared 3D frame, plus **fusing transceiver-position embeddings with CSI features**. Claimed: −12.3% in-domain error, −60%+ cross-domain error. They release the claimed-largest cross-domain 3D WiFi pose dataset (21 subjects, 5 scenes, 18 actions, **7 device layouts**). *Evidence: improvements CLAIMED (preprint w/ MobiCom acceptance); the failure mode itself is corroborated across the cross-domain literature — and independently by our own ADR-150 data (81.63% in-domain vs ~11.6% leakage-free cross-subject torso-PCK).*
+
+**(F2) An external model named "WiFlow" claims 97.25% PCK@20 with 2.23M params and ships everything.**
+arXiv [2602.08661](https://arxiv.org/abs/2602.08661) (Apr 2026) — spatio-temporal-decoupled CSI pose, 97.25% PCK@20 / 99.48% PCK@50 / 0.007 m MPJPE, 2.23M parameters (~2.2 MB int8). Code, pretrained weights, and a 360k-sample CSI-pose dataset are public under Apache-2.0 ([repo](https://github.com/DY2434/WiFlow-WiFi-Pose-Estimation-with-Spatio-Temporal-Decoupling), Kaggle dataset). *Evidence: artifact availability MEASURED (verified by direct repo inspection); PCK numbers CLAIMED (5-subject, in-domain, self-collected dataset; hardware unspecified; 15 keypoints vs our 17).* ⚠️ **Name collision:** this is unrelated to RuView's internal WiFlow model. In all RuView docs the external model is referred to as **WiFlow-STD (DY2434)**.
+
+**(F3) For CSI foundation encoders, data scale — not model capacity — is the bottleneck, and the tokenization recipe is now known.**
+UNSW's MAE pretraining study (arXiv [2511.18792](https://arxiv.org/abs/2511.18792), Nov 2025) — the largest heterogeneous CSI pretraining run to date (1,320,892 samples, 14 public datasets incl. MM-Fi, Widar 3.0, Person-in-WiFi 3D; 4 devices; 2.4/5/6 GHz; 20–160 MHz) — reports zero-shot cross-domain gains of 2.2–15.7% over supervised baselines, with unseen-domain performance scaling **log-linearly with pretraining data, unsaturated at 1.3M samples**, while ViT-Base adds only 0.4–0.9% over ViT-Small. Optimal recipe: **80% masking ratio, small (30,3) patches** (+4.7% over (40,5) by preserving fine temporal dynamics). *Evidence: MEASURED within-study (ablations verified in body text) but preprint; downstream tasks are classification, NOT pose — pose transfer is a hypothesis. Independently corroborates ADR-150's finding that capacity hurts cross-subject.*
+
+**(F4) Hardware/standards: 802.11bf is finished; Espressif ships official sensing; Wi-Fi 6 AP CSI is reachable.**
+- **IEEE 802.11bf-2025** published **2025-09-26** (verified against the IEEE SA record) — sensing standardization is complete for both sub-7 GHz and >45 GHz, with formal sensing setup/feedback procedures. No ESP32 silicon implements it yet. *Evidence: MEASURED (standards-body record).*
+- **Espressif `esp_wifi_sensing`** (Apache-2.0, v0.1.x, ESP Component Registry): official CSI presence/motion FSM; esp-csi actively maintained (commit 2026-04-22, verified), CSI confirmed across ESP32/S2/C3/S3/C5/C6/C61. *Evidence: MEASURED (vendor pages + commit log).* ⚠️ A stronger "drop-in compatible with RuView nodes" claim was **REFUTED 0-3** — WiFi-6 parts use a different CSI acquisition config struct.
+- **ZTECSITool** (arXiv [2506.16957](https://arxiv.org/abs/2506.16957), [code](https://github.com/WiFiZTE2025/ZTE_WiFi_Sensing)): CSI from commercial Wi-Fi 6 APs at up to 160 MHz / 512 subcarriers (~5–10× ESP32 subcarrier count; the gain is aperture, not per-Hz granularity). Firmware is gated behind a ZTE serial-number approval. *Evidence: capability CLAIMED by the vendor-authored tool paper; code artifact MEASURED.*
+
+**(F5) Nothing in 2025–2026 does full DensePose UV regression from commodity WiFi.** Keypoint pose remains the field's frontier. Three "wireless foundation model" papers were screened out by full-text inspection (HeterCSI = simulated cellular channels only; the NeurIPS-2025 FMCW pilot = mmWave radar, presence-only; arXiv 2509.15258 = survey, no artifacts). *Evidence: MEASURED (absence verified by full-text inspection of the candidates that surfaced; absence of evidence across the whole literature is necessarily weaker).*
+
+### 1.2 What this means for the ADR-151 calibration system
+
+ADR-151's enrollment protocol captures guided human anchors but does **not** record or condition on transceiver geometry. F1 says that omission is precisely the thing that makes camera-supervised (and, plausibly, anchor-supervised) heads layout-brittle. ADR-151's per-room thesis ("teach the room before you teach the model") is *strengthened* by F1 — PerceptAlign is independent evidence that layout must be modeled explicitly — and the fix composes naturally with our Stage-2 enrollment.
+
+ADR-150's masked-CSI-encoder design is *validated* by F3, which also hands us the hyperparameters and the priority call: **collect/aggregate more heterogeneous CSI before scaling the encoder.**
+
+## 2. Decision
+
+Adopt four changes, ordered by effort-vs-gain:
+
+### 2.1 Geometry-condition the calibration system (extends ADR-151 Stage 2) — ACCEPTED
+
+1. **Record transceiver geometry at enrollment.** `EnrollmentProtocol` gains an optional `NodeGeometry` record per node (position estimate, antenna orientation, inter-node distances where known). Stored alongside the room baseline in the bank; schema-versioned so existing banks remain readable.
+2. **Fuse geometry embeddings into specialist training.** Where a specialist head consumes the (future, ADR-150) backbone embedding, concatenate a small learned embedding of `NodeGeometry` — the PerceptAlign mechanism, transplanted to our per-room banks. Statistical specialists (current) ignore it; LoRA heads (ADR-151 P6) consume it.
+3. **Adopt the two-checkerboard alignment for the camera-supervised path (ADR-079).** When MediaPipe supervision is used, calibrate camera↔WiFi into one shared 3D frame before regression (<5 min, two checkerboards, a few photos). This is the direct defense against F1 for our camera-supervised pipeline. *The ~~92.9% PCK@20~~ figure previously attached to that pipeline was retracted during measurement (b) (2026-06-10): the surviving holdout shows a constant-output model scored under an absolute (non-torso) threshold on 69 near-static frames, and a mean predictor scores 100% under the same protocol. The §2.2 no-citation rule now applies to it.*
+4. **Evaluate on the PerceptAlign cross-domain dataset** (21 subjects / 7 layouts) as the MERIDIAN cross-layout benchmark — *gated on confirming its license and downloadability* (open question; repo per paper: github.com/Trymore-lab/PerceptAlign).
+   > **Gate resolved (2026-06-10, MEASURED by repo inspection):** repo exists, **MIT license**, dataset downloadable from HuggingFace (5 per-scene repos, raw CSI + separate vision keypoints; Intel 5300, 1TX×3RX×3 ant, 57 subcarriers — same order as ESP32 subcarrier counts; Scene3 ships 3 distinct layouts). Code present, no pretrained weights. Benchmark adoption unblocked; dataset-side license terms inherit HF dataset terms (not separately stated — check at download time).
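+
+Items 1 and 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the `EnrollmentProtocol` API: the `NodeGeometry` fields, `GEOMETRY_DIM`, and both helper functions are hypothetical, and the raw concatenation stands in for the *learned* embedding that a LoRA head would actually consume.
+
+```rust
+/// Hypothetical per-node geometry record (field names are assumptions).
+#[derive(Debug, Clone, PartialEq)]
+pub struct NodeGeometry {
+    /// Estimated node position in the shared room frame, in metres.
+    pub position_m: [f32; 3],
+    /// Antenna yaw in radians.
+    pub antenna_yaw_rad: f32,
+}
+
+/// Width of the geometry feature vector appended to the CSI features.
+pub const GEOMETRY_DIM: usize = 4;
+
+/// Flatten an optional geometry record. Banks enrolled before geometry
+/// was recorded yield zeros, so older banks stay usable.
+pub fn geometry_features(g: Option<&NodeGeometry>) -> [f32; GEOMETRY_DIM] {
+    match g {
+        Some(g) => [
+            g.position_m[0],
+            g.position_m[1],
+            g.position_m[2],
+            g.antenna_yaw_rad,
+        ],
+        None => [0.0; GEOMETRY_DIM],
+    }
+}
+
+/// Concatenate CSI features with geometry features (stand-in for fusing
+/// a learned geometry embedding with the backbone embedding).
+pub fn condition(csi: &[f32], g: Option<&NodeGeometry>) -> Vec<f32> {
+    let mut out = Vec::with_capacity(csi.len() + GEOMETRY_DIM);
+    out.extend_from_slice(csi);
+    out.extend_from_slice(&geometry_features(g));
+    out
+}
+```
+
+Keeping the record optional means a specialist that ignores geometry sees the same input width whether or not a bank carries it.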
+
+### 2.2 Benchmark against WiFlow-STD (DY2434) — ACCEPTED
+
+Pull the Apache-2.0 weights + 360k-sample dataset; run three measurements: (a) their model on their data (reproduce 97.25% claim), (b) their model fine-tuned on our ESP32 17-keypoint eval set, (c) our internal WiFlow on their dataset (15-keypoint subset mapping). Until (a)–(c) are measured, **no RuView doc may cite 97.25% as a comparable number** — different dataset, subjects, keypoints.
+
+> **Status (2026-06-10, measurement (a) complete — `benchmarks/wiflow-std/RESULTS.md`):** shipped checkpoint REFUTED (0.08% PCK@20 — wrong keypoint normalization, predates published code); released code does not run as published (6 defects, incl. broken package import and an unreachable test phase); released dataset's last 13 files are corrupted (9,072 windows: NaN + float32-max garbage, diverges fp16 training via BatchNorm poisoning). After repairing both, retraining with upstream defaults reproduced **96.09% PCK@20 full-test / 96.61% corruption-free / MPJPE 0.0094–0.0098** (published: 97.25% / 0.007) on an RTX 5080. Accuracy claims graded MEASURED-EQUIVALENT; params (2.23M) and FLOPs (~0.055G) verified. (b)/(c) remain open.
+
+### 2.3 Apply the UNSW recipe to the ADR-150 encoder — ACCEPTED (amends ADR-150 §2.3)
+
+- Pretraining corpus: start from the same 14 public datasets (1.3M samples) + our home/MM-Fi frames; data aggregation takes priority over architecture work.
+- Tokenization: 80% masking, (30,3)-class small patches; encoder stays ViT-Small-class (~15M params) — F3 and our own DANN/transformer results agree that capacity does not pay.
+- The published log-linear scaling (unsaturated) sets the expectation: more heterogeneous CSI in, better zero-shot out.
+
+### 2.4 Hardware watch items — ACCEPTED (no code now)
+
+- **802.11bf**: track silicon/certification; OTA binding remains deferred until commodity chipsets expose standardized sensing measurements. **Amended by ADR-153** (2026-06-10): implement a pure Rust forward-compatibility protocol layer now — typed procedure models, a deterministic session FSM, a transport abstraction, simulation tests, and an `OpportunisticCsiBridge` that maps today's ESP32 CSI batches into standardized sensing-report shape.
+- **esp_wifi_sensing**: benchmark our presence pipeline against the vendor FSM (one afternoon; useful external baseline). Do **not** treat as drop-in (refuted claim).
+- **ZTECSITool AP**: optional high-resolution anchor node for the ADR-029 multistatic mesh — procurement-gated; only pursue if a 160 MHz anchor materially helps tomography.
+
+### 2.5 Explicitly NOT adopted
+
+- No pivot toward "wireless foundation model" papers that don't ship WiFi-CSI artifacts (HeterCSI, FMCW pilot, surveys).
+- No DensePose-UV work item: the field has not demonstrated UV regression from commodity WiFi; keypoints remain our supervised target (F5).
+
+### 2.6 RuVector vendor sync + integration opportunities (added 2026-06-10)
+
+**Vendor sync record.** `vendor/ruvector` moved from pin `e38347601` (2026-05-07) to `a083bd77f` (origin/main, 3 commits past tag `ruvector-v0.2.28`; vendored workspace version 2.2.3). 111 commits in the range, roughly half NAPI-binary/lint chores. Substantive: graph condensation + differentiable min-cut (#547), core HNSW correctness fixes v2.2.3 (#502), RUSTSEC/clippy hardening (#504), ONNX embedder API-contract fix (#523/#525 — npm/TypeScript package only), dead parallel-worker import removal (#532). *Evidence: MEASURED (git range + commit-stat inspection).*
+
+**Opportunity table.** Workspace policy is crates.io versions only, so unpublished crates are WATCH by definition regardless of fit.
+
+| Crate | What it offers | wifi-densepose target | crates.io | Verdict |
+|---|---|---|---|---|
+| `ruvector-graph-condense` (new, #547) | Training-free min-cut graph condensation + **differentiable normalized-cut loss** (`DiffCutCondenser`, analytic MinCutPool-style gradients, gradient-checked tests; provenance-retaining super-nodes) | `subcarrier_selection.rs` (condense 114 subcarriers into cut-preserving regions instead of raw min-cut); auxiliary clustering regularizer for `wifi-densepose-train`; `DynamicPersonMatcher` region structure | **Not published** | **WATCH** — strongest technical fit in the sync; adopt when published. README's "no published method uses graph-cut condensation" is CLAIMED; the diffcut implementation + tests are MEASURED |
+| `ruvector-attention` 2.1.0 | #304 SOTA modules: MLA, KV-cache, SSM, sparse/MoE, hybrid search, Graph RAG (publish date 2026-03-27 matches the #304 commit — MEASURED) | Supersedes pinned 2.0.4 used by `model.rs` spatial attention + `bvp.rs`; SSM/MLA are candidate pure-Rust edge-inference primitives for the ADR-150 encoder | 2.1.0 (pinned **2.0.4**) | **ADOPT** (minor bump; API-compat check first) |
+| `ruvector-gnn` 2.2.0 | panic→`Result` constructors, gradient clipping, MSE/CE/BCE losses, seeded-RNG layer init (#495 is post-2.2.0) | `wifi-densepose-train` GNN path (pinned 2.0.5, `default-features = false`) | 2.2.0 (pinned **2.0.5**) | **ADOPT** (bump) |
+| `ruvector-mincut` / `ruvector-solver` 2.0.6 | Patch-level fixes (workspace republish 2026-03-25) | `metrics.rs` DynamicPersonMatcher, subcarrier interpolation, triangulation | 2.0.6 (pinned **2.0.4** each) | **ADOPT** (routine patch bump) |
+| `ruvector-core` 2.2.3 (vendor) | HNSW correctness: k=0 guard, sorted results, flat-index fixes, cross-integration helpers (#502 — MEASURED, `index/hnsw.rs` + new integration tests) | `homecore-recorder` `RuvectorSemanticIndex` (real HNSW consumer); `sketch.rs` quantization unaffected | **2.2.0 = latest published**; 2.2.3 unpublished | **WATCH** — bump the moment 2.2.3 publishes |
+| `ruvector-cnn` 2.0.6 | Pure-Rust SIMD conv kernels (AVX2/NEON/WASM), MobileNetV3, INT8 quantization, contrastive losses (InfoNCE/triplet, #252) | **Not** the WiFlow-STD training port — `wiflow_std/model.rs` is tch/libtorch (MEASURED). Relevant to the *edge inference* path of the trained ~2.2 MB int8 model, and InfoNCE/triplet overlaps AETHER (ADR-024) | 2.0.6 | **EVALUATE** — only if/when we commit to a no-libtorch edge runtime for WiFlow-STD-class models |
+| `ruvector-acorn` (new-ish) | ACORN predicate-agnostic filtered HNSW (SIGMOD'24 algorithm; γ·M denser graphs for low-selectivity filters) | Metadata-filtered pattern search over ADR-151 calibration banks — speculative; bank sizes are far below where filtered-ANN recall collapse matters | **Not published** | **WATCH** |
+| `ruvector-cluster` 2.0.6 | Distributed sharding, gossip discovery, DAG consensus | No current need; ADR-029 mesh coordination is ESP32-side, not vector-DB-side | 2.0.6 | **WATCH** |
+| ONNX embedder fix (#523/#525) | API-contract + packaging fixes in `npm/packages/ruvector` (TypeScript) | None — `wifi-densepose-nn`'s ONNX backend is Rust (ort/tract), untouched by this change (MEASURED: commit touches npm/ only) | n/a | No action |
+| `ruvector-perception` (new, #547) | "Physical perception substrate" (hypothesis/topology/witness modules) — agent-perception oriented, not RF | None identified | Not published | WATCH (name-overlap only) |
+
+**Security note (RUSTSEC #504).** The substantive fixes target `ruvllm`, `ruvector-dag`, `prime-radiant`, `rvagent-*`, and the `ruvector-server` HTTP endpoint (NaN-safe `partial_cmp`, input-validation guards, env-allowlisted exec) — **none of which we pin**. The commit states `cargo audit` returns clean across the workspace. *Evidence: MEASURED (commit message + file list). Conclusion: no pinned version has an outstanding advisory; no urgent bump required.* The NaN-sort hardening is panic-robustness hygiene our pinned 2.0.4-era crates predate, which is one more reason for the routine bumps below.
+
+**Version-bump recommendations (follow-up PR — no Cargo.toml change in this ADR):** `ruvector-mincut` 2.0.4→2.0.6, `ruvector-solver` 2.0.4→2.0.6, `ruvector-attention` 2.0.4→2.1.0, `ruvector-gnn` 2.0.5→2.2.0. Current: `ruvector-core` 2.2.0, `ruvector-attn-mincut` 2.0.4, `ruvector-temporal-tensor` 2.0.6, `ruvector-crv` 0.1.1 — all at latest published. Nothing in the sync changes §2.1.2 geometry conditioning (our `viewpoint/attention.rs` `GeometricBias` already implements the fusion mechanism) or the ADR-150 MAE recipe (training stays in tch).
+
+## 3. Consequences
+
+**Positive:** the calibration system gains the one mechanism (geometry conditioning) the 2026 literature identifies as the difference between layout-brittle and layout-robust supervised WiFi pose; ADR-150 gets a measured training recipe instead of a guessed one; we acquire two external benchmarks (WiFlow-STD, PerceptAlign dataset) to keep our claims honest.
+
+**Negative / risks:** geometry records add schema surface to banks (mitigated: optional + versioned); every adopted number is preprint-grade until our own benchmark runs land (mitigated by §2.2's no-citation rule); PerceptAlign dataset-side license terms are not separately stated (the repo is MIT per the resolved §2.1.4 gate; check the HF dataset terms at download time); name collision risk in docs (mitigated: "WiFlow-STD (DY2434)" naming rule).
+
+**Re-check by 2026-12:** 802.11bf silicon, esp_wifi_sensing maturity (v0.1.x today), and the preprint field (newest source Apr 2026).
+
+## 4. Open questions (carried from the research run)
+
+1. Does WiFlow-STD retain accuracy when fine-tuned on ESP32-S3/C6 CSI (fewer subcarriers, lower SNR), scored on our 17-keypoint set? (§2.2 measurement (b) is designed to answer this.)
+   > **Partial answer (MEASURED 2026-06-11, measurement (b) on 2,046 single-room windows — `benchmarks/wiflow-std/RESULTS.md`):** pretrained init shows strong *optimization* transfer (65% PCK@20 vs scratch's 0% collapse under the same budget) but **no feature transfer** (frozen-trunk + linear adapter ≈ 0%). And no run beat the mean-pose baseline (95.9% PCK@20 — single subject, near-static normalized coords), so no CSI→pose capability is citable from this data. A definitive answer needs multi-subject/multi-position data where the mean pose is weak.
+2. Is the PerceptAlign dataset downloadable under a usable license, and does the two-checkerboard procedure work with ESP32 transceiver geometry? (§2.1.4 gate: the license/download half was resolved on 2026-06-10; the ESP32-geometry half remains open.)
+3. Will esp_wifi_sensing evolve toward 802.11bf compliance, replacing opportunistic CSI extraction?
+
+## 5. Source register (evidence-graded)
+
+| Source | Type | Used for | Grade |
+|---|---|---|---|
+| arXiv 2601.12252 (PerceptAlign, MobiCom'26) | preprint+acceptance | F1, §2.1 | CLAIMED numbers; failure mode corroborated |
+| arXiv 2602.08661 + DY2434 repo (WiFlow-STD) | preprint + code | F2, §2.2 | numbers CLAIMED; artifacts MEASURED |
+| arXiv 2511.18792 (UNSW MAE) | preprint | F3, §2.3 | ablations MEASURED in-study; pose transfer hypothesis |
+| IEEE SA 802.11bf-2025 record | standards body | F4, §2.4 | MEASURED |
+| Espressif component registry + esp-csi repo | vendor | F4, §2.4 | MEASURED; "drop-in" REFUTED 0-3 |
+| arXiv 2506.16957 + ZTE repo (ZTECSITool) | vendor preprint + code | F4, §2.4 | capability CLAIMED; code MEASURED |
+| arXiv 2601.18200 (HeterCSI), OpenReview LMufK3vzE5 (FMCW pilot), arXiv 2509.15258 (survey) | preprints | F5, §2.5 (screened out) | MEASURED (full-text inspection) |
@@ -0,0 +1,168 @@
+# ADR-153: IEEE 802.11bf-2025 Forward-Compatibility Protocol Model for wifi-densepose-hardware
+
+- **Status**: accepted
+- **Date**: 2026-06-10
+- **Deciders**: ruv
+- **Tags**: hardware, protocol, sensing, 802.11bf, forward-compatibility
+
+## Context
+
+IEEE 802.11bf-2025 (WLAN Sensing) is an **Active Standard**: board approval
+2025-05-28, published 2025-09-26 (verified against the IEEE SA record,
+<https://standards.ieee.org/ieee/802.11bf/11574/>). Its scope modifies the
+MAC, HE and EHT PHY service interfaces, plus DMG and EDMG PHYs, for WLAN
+sensing in **1–7.125 GHz** and **above 45 GHz** bands, with formal sensing
+measurement setup, measurement instance, feedback/reporting, and
+sensing-by-proxy (SBP) procedures (ADR-152 F4, evidence grade MEASURED).
+
+No commodity silicon implements the standard yet — ESP32 parts included.
+ADR-152 §2.4 therefore decided "track silicon; no code now", with RuView's
+opportunistic CSI extraction remaining the mechanism. That left a gap: when
+silicon does land, RuView would have no typed model of the standard's
+procedures to bind to, and the integration would start from zero.
+
+This ADR amends ADR-152 §2.4's "no code now" clause. OTA binding remains
+deferred, but a pure Rust protocol model, session FSM, transport seam, and
+opportunistic CSI bridge will be implemented now so RuView consumers can
+target a stable standardized sensing interface before silicon arrives.
+
+The user directed (2026-06-10) that this **forward-compatibility protocol
+model** — a protocol surface, not a conformance implementation — be built
+now.
+
+## Decision
+
+Implement an `ieee80211bf` **forward-compatibility protocol model** in
+`wifi-densepose-hardware` (pure Rust, no internal deps, simulation-testable,
+no OTA path):
+
+> This module is not a certified 802.11bf implementation. It models the
+> public procedure shape needed by RuView and RuvSense, while intentionally
+> avoiding OTA frame binding until chipset support and vendor APIs exist.
+
+1. **`types.rs`** — typed structures for the standard's sensing procedures
+   (sub-7 GHz focus; DMG stubbed): Sensing Measurement Setup (setup ID,
+   initiator/responder and transmitter/receiver roles, bandwidth,
+   periodicity, threshold-based reporting parameters), Sensing Measurement
+   Instance, Sensing Measurement Report (CSI-variant payload), SBP
+   request/response, termination. Two future-proofing requirements, plus a
+   governance requirement:
+
+   - **Version gates** — every negotiated surface is tagged with a spec
+     profile, because vendors will expose partial or renamed capabilities
+     first:
+
+     ```rust
+     pub enum SpecProfile {
+         DraftCompatible,
+         Ieee80211Bf2025,
+         VendorExtension(String),
+     }
+     ```
+
+   - **Capability negotiation** — no hardcoded ESP32 assumptions in the
+     future-silicon path:
+
+     ```rust
+     pub struct SensingCapabilities {
+         pub sub_7_ghz: bool,
+         pub dmg: bool,
+         pub edmg: bool,
+         pub csi_report: bool,
+         pub threshold_reporting: bool,
+         pub sensing_by_proxy: bool,
+         pub max_bandwidth_mhz: u16,
+         pub max_period_ms: u32,
+         pub max_active_setups: u16,
+     }
+     ```
+
+   - **Privacy and governance fields** — sensing is presence inference, not
+     just radio telemetry. Every `SensingMeasurementSetup` carries policy
+     metadata (required, not optional), for enterprise, elderly-care,
+     retail, workplace, and municipal deployments:
+
+     ```rust
+     pub enum ConsentMode {
+         LabOnly,
+         ExplicitConsent,
+         ManagedEnterprisePolicy,
+         Disabled,
+     }
+     ```
+
+2. **`session.rs`** — deterministic event-driven session state machine:
+   `Idle → SetupNegotiating → Active → Terminating → Idle`, with explicit
+   rejection paths (unsupported parameters, setup-ID collision) and timeout
+   handling.
+3. **`transport.rs`** — a `SensingTransport` trait abstracting frame
+   exchange; a `SimTransport` test double; and an `OpportunisticCsiBridge`
+   adapter mapping today's ESP32 CSI extraction onto the report path
+   (measurement instances ≈ CSI frame batches), so current hardware sits
+   behind the standardized interface. **Replaceability benchmark
+   (acceptance test):** RuvSense must consume either ESP32 opportunistic CSI
+   or future 802.11bf chipset reports through the same `SensingTransport`
+   and `SensingMeasurementReport` path, with no consumer-side rewrite — a
+   future chipset adapter replaces `OpportunisticCsiBridge` without changing
+   consumers.
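+
+The session state machine in item 2 can be sketched as a pure transition function. This is a minimal illustration, not the module's code: the event and error names, the single-session simplification, and the `ids_in_use` constructor argument (setup IDs already held elsewhere) are assumptions made for the example.
+
+```rust
+/// Session states named in this ADR.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum SessionState { Idle, SetupNegotiating, Active, Terminating }
+
+/// Hypothetical events driving the session.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum SessionEvent {
+    SetupRequested { setup_id: u8, bandwidth_mhz: u16 },
+    SetupAccepted,
+    NegotiationTimedOut,
+    TerminateRequested,
+    TerminateConfirmed,
+}
+
+/// Typed rejections: the FSM returns errors instead of panicking.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum SessionError {
+    UnsupportedBandwidth(u16),
+    DuplicateSetupId(u8),
+    NegotiationTimeout,
+    InvalidTransition,
+}
+
+pub struct Session {
+    state: SessionState,
+    max_bandwidth_mhz: u16,
+    ids_in_use: Vec<u8>,
+}
+
+impl Session {
+    pub fn new(max_bandwidth_mhz: u16, ids_in_use: Vec<u8>) -> Self {
+        Session { state: SessionState::Idle, max_bandwidth_mhz, ids_in_use }
+    }
+
+    pub fn state(&self) -> SessionState { self.state }
+
+    /// Apply one event. A rejected setup leaves the state untouched;
+    /// a negotiation timeout resets to Idle and reports a typed error.
+    pub fn handle(&mut self, event: SessionEvent) -> Result<SessionState, SessionError> {
+        use SessionEvent::*;
+        use SessionState::*;
+        match (self.state, event) {
+            (Idle, SetupRequested { setup_id, bandwidth_mhz }) => {
+                if bandwidth_mhz == 0 || bandwidth_mhz > self.max_bandwidth_mhz {
+                    return Err(SessionError::UnsupportedBandwidth(bandwidth_mhz));
+                }
+                if self.ids_in_use.contains(&setup_id) {
+                    return Err(SessionError::DuplicateSetupId(setup_id));
+                }
+                self.state = SetupNegotiating;
+            }
+            (SetupNegotiating, SetupAccepted) => self.state = Active,
+            (SetupNegotiating, NegotiationTimedOut) => {
+                self.state = Idle;
+                return Err(SessionError::NegotiationTimeout);
+            }
+            (Active, TerminateRequested) => self.state = Terminating,
+            (Terminating, TerminateConfirmed) => self.state = Idle,
+            _ => return Err(SessionError::InvalidTransition),
+        }
+        Ok(self.state)
+    }
+}
+```
+
+Because `handle` is deterministic and free of I/O, the rejection and timeout rows of the acceptance checklist can be exercised in plain unit tests without a transport.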
+
+Constraints: input validation at boundaries (typed errors, no panics on
+adversarial input), files under 500 lines, all protocol tests runnable
+without hardware.
+
+### Acceptance checklist
+
+| Area            | Acceptance test                                                      |
+| --------------- | -------------------------------------------------------------------- |
+| Types           | Serde round trip for setup, instance, report, SBP, termination       |
+| FSM             | Idle → setup → active → terminating → idle                           |
+| Rejection       | Unsupported bandwidth, invalid period, duplicate setup ID            |
+| Timeout         | Negotiation timeout returns typed error and resets to Idle           |
+| Threshold       | Report emitted only when threshold condition is crossed              |
+| SBP             | Proxy request maps to responder path without direct sensor coupling  |
+| Bridge          | ESP32 CSI batch becomes standardized measurement report              |
+| Safety          | No panics on malformed inputs                                        |
+| CI              | All protocol tests run without hardware                              |
+| Maintainability | Each file under 500 lines                                            |
+
+### Non-Goals
+
+This ADR does not claim IEEE 802.11bf conformance, certification, or OTA
+interoperability. It creates a typed protocol compatibility layer so RuView
+can consume standardized sensing reports when commodity silicon exposes
+them. Vendor-specific frame exchange, firmware hooks, trigger-frame
+sounding, and certification test vectors remain future ADRs.
+
+## Consequences
+
+### Positive
+- RuView can adopt standardized WLAN sensing the day any chipset exposes
+  802.11bf measurements — the data model, session FSM, and transport seam
+  already exist and are tested.
+- The `OpportunisticCsiBridge` gives current ESP32 nodes a standardized-shape
+  interface now, decoupling RuvSense consumers from the extraction mechanism.
+- Simulation transport enables protocol-level tests in CI without hardware.
+- `SpecProfile` + `SensingCapabilities` give a clean escape hatch for the
+  partial/renamed vendor capabilities that will certainly arrive first.
+- Consent/policy metadata is structural from day one, not retrofitted.
+
+### Negative
+- Code written against a standard with zero silicon risks drift: vendor
+  implementations may interpret parameters differently; the layer may need
+  rework at first real binding (drift risk scored 7/10 at acceptance).
+- Adds maintenance surface to wifi-densepose-hardware before any
+  user-visible benefit (maintenance cost scored 3/10 — small without OTA).
+
+### Neutral
+- ADR-152 §2.4's "watch item" remains: revisit when silicon/certification
+  appears (re-check by 2026-12). This ADR changes only the "no code now"
+  clause.
+
+## Links
+
+- ADR-152 — WiFi-Pose SOTA 2026 Intake (F4, §2.4 — amended by this ADR)
+- ADR-028 — ESP32 capability audit (opportunistic CSI extraction baseline)
+- ADR-029 — RuvSense multistatic sensing mode (consumer of sensing reports)
+- IEEE 802.11bf-2025 — Active Standard, board approval 2025-05-28, published
+  2025-09-26: <https://standards.ieee.org/ieee/802.11bf/11574/>
@@ -0,0 +1,242 @@
+# ADR-154: Signal/DSP Beyond-SOTA Sweep — Milestone 0 (Correctness, Provable Perf, and the SOTA Landscape)
+
+| Field | Value |
+|-------|-------|
+| **Status** | Proposed |
+| **Date** | 2026-06-11 |
+| **Deciders** | ruv |
+| **Codebase target** | `wifi-densepose-signal` (`ruvsense/`, `features.rs`, `csi_processor.rs`, `spectrogram.rs`, `bvp.rs`), benches, docs |
+| **Relates to** | ADR-134 (CIR sparse recovery), ADR-135 (Empty-Room Baseline), ADR-029/030/032 (Multistatic mesh + security), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-153 (802.11bf forward-compat) |
+| **Scope** | Milestone 0 of the beyond-SOTA signal/DSP sweep: high-leverage **correctness/security fixes**, two **measured** perf wins, the per-module SOTA landscape with evidence grades, and a prioritized roadmap. **45 review findings were explicitly deferred** (§7 backlog) — **now all addressed across Milestones 0–3** (§7.4 backlog cleared 2026-06-13); nothing was silently dropped. |
+
+---
+
+## 0. PROOF discipline (this ADR's contract)
+
+This project has been publicly accused of "AI slop." This ADR answers that with **evidence, not adjectives**:
+
+- Every claimed code improvement ships with a **committed regression test** (correctness) or a **committed criterion bench** (performance).
+- Every perf number below is **MEASURED before/after** with the exact reproduce command. A perf claim without a measured before/after is **UNPROVEN** and is not made here.
+- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **THEORETICAL**, distinguishing what a paper *measured* from what it *asserts* and from what is merely *plausible*.
+- The headline finding — a **dead CIR coherence gate that silently fell back in production for every canonical frame** — is disclosed in full (§2), not buried.
+
+Test machine for the perf numbers: Windows 11, `cargo bench` (the bench profile is optimized by default), criterion 0.5. Numbers are wall-clock medians on this box; read them as **ratios** (before/after), which transfer across machines far better than absolute ns.
+
+---
+
+## 1. Context
+
+The RuvSense signal stack (16 `ruvsense/` modules + the classic `features.rs`/`csi_processor.rs`/`spectrogram.rs`/`bvp.rs` pipeline) grew quickly across ADR-014/029/030/134/135. A beyond-SOTA review surfaced ~50 findings ranging from two **critical correctness/security defects** to micro-optimizations and SOTA-gap research items. Milestone 0 closes the **provable, high-leverage subset**: the two criticals, a divide-by-zero trio, two measured perf wins, and the research landscape. The remaining ~45 are catalogued in §7 so the backlog is explicit and auditable.
+
+---
+
+## 2. The headline finding — the ADR-134 CIR coherence gate was DEAD in production (CRITICAL, FIXED)
+
+### 2.1 What was wrong
+
+`MultistaticFuser` fuses **canonical CSI frames**: `hardware_norm.rs` resamples every chipset onto a uniform **56-tone canonical grid** before fusion (`HardwareNormalizer`, default `canonical_subcarriers = 56`). The ADR-134 CIR coherence gate (`cir_gate_coherence`, multistatic.rs) is supposed to blend a CIR dominant-tap ratio into the cross-node coherence — `coherence = 0.7·freq + 0.3·dominant_tap_ratio`.
+
+But the gate was wired to `CirEstimator::new(CirConfig::ht20())` (`with_cir_ht20`), and `ht20()` expects **64 FFT bins or 52 active tones**. A canonical-56 frame matches *neither*, so every call returned `CirError::SubcarrierMismatch` and `cir_gate_coherence` hit its **silent `Err(_) => freq_coherence` fallback** (multistatic.rs). Net effect: **the CIR gate never ran on a single production frame** — `use_cir_gate = true` was indistinguishable from `false`. This is the exact shape of "AI slop": a feature that compiles, has tests on the *estimator*, and is dead at the *integration seam*.
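+
+The failure mode can be reproduced in miniature. The sketch below is illustrative only: `ToyCirEstimator` and `gate_coherence` are hypothetical stand-ins rather than the crate's types, and the "dominant tap ratio" is a placeholder statistic. Only the tone-count check and the `Err(_) => freq` arm mirror the behaviour described above.
+
+```rust
+/// Error returned when a frame's tone count does not match the estimator.
+#[derive(Debug, PartialEq)]
+pub enum CirError {
+    SubcarrierMismatch { expected: usize, got: usize },
+}
+
+/// Toy stand-in for a CIR estimator (hypothetical, not the crate's type).
+pub struct ToyCirEstimator {
+    pub num_active: usize,
+}
+
+impl ToyCirEstimator {
+    /// Placeholder statistic: peak share of total magnitude. A real
+    /// estimator would recover delay taps first; only the length check
+    /// matters for this illustration.
+    pub fn dominant_tap_ratio(&self, frame: &[f64]) -> Result<f64, CirError> {
+        if frame.len() != self.num_active {
+            return Err(CirError::SubcarrierMismatch {
+                expected: self.num_active,
+                got: frame.len(),
+            });
+        }
+        let total: f64 = frame.iter().map(|v| v.abs()).sum();
+        let peak = frame.iter().map(|v| v.abs()).fold(0.0, f64::max);
+        Ok(if total > 0.0 { peak / total } else { 0.0 })
+    }
+}
+
+/// The pre-fix shape: any estimator error silently degrades to `freq`,
+/// so a misconfigured gate is indistinguishable from a disabled one.
+pub fn gate_coherence(est: &ToyCirEstimator, frame: &[f64], freq: f64) -> f64 {
+    match est.dominant_tap_ratio(frame) {
+        Ok(ratio) => 0.7 * freq + 0.3 * ratio,
+        Err(_) => freq,
+    }
+}
+```
+
+With a 56-tone frame, an estimator configured for 52 tones returns `freq` unchanged on every call, while one configured for 56 tones applies the 0.7/0.3 blend.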
+
+### 2.2 The fix (the gate now actually runs)
+
+- New `CirConfig::canonical56()` (cir.rs): 64-bin HT20 framing, **56 active tones**, 168 delay taps, Φ built over a contiguous −28..+28 active-tone grid (also the native Atheros-56 layout). `bandwidth_hz`/`tap_spacing` stay physically correct for a 20 MHz HT20 channel; only the active-tone count differs from `ht20()`.
+- New `MultistaticFuser::with_cir_canonical56()` — the **correct default** for the RuvSense pipeline. `with_cir_ht20()` is retained for genuine raw-64/52 feeds and now carries a loud doc-warning.
+- `active_indices()` handles `(64, 56)` explicitly and the fallback now selects the slice whose length matches `num_active` (so Φ's column count is always self-consistent — no silent fall-through to the 52-index slice).
+- The remaining silent fallback is made **LOUD**: a `SubcarrierMismatch` inside `cir_gate_coherence` now fires a `debug_assert!` naming the misconfiguration ("CIR gate DEAD … build it with `CirConfig::canonical56()`"). A *config* error can no longer hide as a graceful runtime degrade.
+- `cir_estimate_first()` exposes the raw `estimate()` verdict so a test can **count Ok vs Err** on a canonical-56 stream.
+
+### 2.3 The PROOF (committed regression tests, `ruvsense::multistatic::tests`)
+
+| Test | Asserts | Result |
+|------|---------|--------|
+| `cir_gate_ht20_is_dead_on_canonical56` | old ht20 estimator on 8 canonical-56 frames → **0 Ok, 8 `SubcarrierMismatch`** | the dead gate, measured |
+| `cir_gate_canonical56_is_alive` | new canonical56 estimator on the same 8 frames → **8 Ok, 0 Err** | the gate runs |
+| `cir_gate_on_changes_coherence_vs_off` | `coherence(gate on)` ≠ `coherence(gate off)` (\|Δ\| > 1e-6) | the CIR term is actually applied |
+| `cir_gate_dead_ht20_equals_gate_off` (release-only) | dead-ht20 coherence == gate-off coherence (\|Δ\| < 1e-9) | confirms the silent degradation the fix removes |
+
+**Reproduce:**
+```bash
+cd v2 && cargo test -p wifi-densepose-signal --no-default-features --lib \
+  ruvsense::multistatic::tests::cir
+# 3 passed (the 4th is #[cfg(not(debug_assertions))], add --release to run it)
+```
+
+**Resolution: FIXED** (not merely loud-fail-documented). On the regression stream the gate now decodes 8 of 8 canonical-56 frames where it previously decoded 0 of 8.
+
+---
+
+## 3. The second critical — NaN/inf adversarial-detector bypass (CRITICAL, FIXED)
+
+### 3.1 What was wrong
+
+`AdversarialDetector::check` (adversarial.rs) takes per-link `link_energies: &[f64]`. A single **NaN/inf** entry bypassed the whole detector: every `e > threshold` test is `false` on NaN, the Gini sort used `partial_cmp().unwrap_or(Equal)`, and the final `anomaly_score.clamp(0,1)` returns NaN on a NaN input. A real RF link can never have NaN/inf energy, so a non-finite input is *itself* the strongest possible spoof — yet it could slip through as "clean."
+
+### 3.2 The fix
+
+Finite-validate at the boundary: the first non-finite `link_energies` entry now **short-circuits to a definite anomaly** (`anomaly_detected = true`, `anomaly_score = 1.0`, `affected_links = [bad_idx]`, `FieldModelViolation`), and the poisoned frame is **not** seeded into the temporal-continuity state.
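+
+The NaN behaviour from §3.1 and the boundary check described above can be sketched in a few lines. This is a minimal illustration, not the code in `adversarial.rs`: `Verdict`, `first_non_finite` and `check_boundary` are hypothetical names, and only standard-library `f64` semantics are relied on.
+
+```rust
+/// Simplified stand-in for the detector result (hypothetical, illustration only).
+#[derive(Debug, PartialEq)]
+pub struct Verdict {
+    pub anomaly_detected: bool,
+    pub anomaly_score: f64,
+    pub affected_links: Vec<usize>,
+}
+
+/// Index of the first NaN/inf entry, if any.
+pub fn first_non_finite(link_energies: &[f64]) -> Option<usize> {
+    link_energies.iter().position(|e| !e.is_finite())
+}
+
+/// Boundary check: a non-finite energy short-circuits to a definite anomaly
+/// before any `e > threshold` comparison can silently evaluate to `false`.
+/// Returns `None` when every entry is finite and normal checks should run.
+pub fn check_boundary(link_energies: &[f64]) -> Option<Verdict> {
+    first_non_finite(link_energies).map(|bad_idx| Verdict {
+        anomaly_detected: true,
+        anomaly_score: 1.0,
+        affected_links: vec![bad_idx],
+    })
+}
+```
+
+Returning early keeps the poisoned value away from the comparisons, the sort and the final `clamp`, each of which would otherwise pass a NaN through.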
+
+### 3.3 The PROOF
+
+| Test | Asserts |
+|------|---------|
+| `nan_link_energy_flags_anomaly` | a NaN link energy → `anomaly_detected`, score 1.0, affected link reported, `anomaly_count == 1` |
+| `inf_link_energy_flags_anomaly` | both `+inf` and `−inf` → anomaly, score 1.0 |
+
+```bash
+cd v2 && cargo test -p wifi-densepose-signal --no-default-features --lib \
+  ruvsense::adversarial::tests::nan_link ruvsense::adversarial::tests::inf_link
+```
+
+---
+
+## 4. Divide-by-(n−1) window trio (CORRECTNESS, FIXED)
+
+Three windowing helpers divided by `(n − 1)` with no small-`n` guard:
+
+| Site | Bug | Fix |
+|------|-----|-----|
+| `csi_processor.rs` `CsiPreprocessor::hamming_window(n)` | `n=0` underflowed `0usize − 1`; `n=1` divided by 0 → all-NaN window | `match n { 0 => [], 1 => [1.0], _ => … }` |
+| `bvp.rs` Hann window | `window_size=1` divided by 0 → NaN BVP | length-1 guard → constant `[1.0]` |
+| `spectrogram.rs` `make_window` | `size=1` divided by 0 for Hann/Hamming/Blackman | `size <= 1` short-circuit → `vec![1.0; size]` |
+
+The standard convention for a length-1 window is the constant `1.0`; length-0 is empty.
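+
+A minimal sketch of the guarded shape from the table, assuming a symmetric Hamming window with the textbook 0.54/0.46 coefficients (the coefficients and the free-function form are assumptions of this sketch, not taken from `csi_processor.rs`):
+
+```rust
+use std::f64::consts::PI;
+
+/// Symmetric Hamming window with the degenerate sizes handled explicitly.
+pub fn hamming_window(n: usize) -> Vec<f64> {
+    match n {
+        0 => Vec::new(), // empty window: nothing to weight
+        1 => vec![1.0],  // length-1 convention: the constant 1.0
+        _ => {
+            // Safe here: n >= 2, so `n - 1` neither underflows nor is zero.
+            let denom = (n - 1) as f64;
+            (0..n)
+                .map(|i| 0.54 - 0.46 * (2.0 * PI * i as f64 / denom).cos())
+                .collect()
+        }
+    }
+}
+```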
+
+**PROOF:** `test_hamming_window_degenerate_sizes` (csi_processor), `bvp_window_size_one_is_finite` (bvp), `make_window_size_0_and_1_are_safe` (spectrogram) — each asserts finiteness at sizes 0/1/2.
+
+The Python deterministic proof (`archive/v1/data/proof/verify.py`) still prints **VERDICT: PASS** with the **same** pipeline hash `f8e76f21…46f7a` — the reference path uses `n ≥ 2`, so the guard is bit-transparent there.
+
+---
+
+## 5. Measured performance wins (MEASURED before/after; benches committed)
+
+Both changes are **bit-equivalent** (each asserted by a committed test); they only remove wasted work. The new criterion benches live in `benches/features_bench.rs` (registered in `Cargo.toml`).
+
+**Reproduce both:**
+```bash
+cd v2 && cargo bench -p wifi-densepose-signal --no-default-features --bench features_bench
+# compile-only: append --no-run
+```
+
+### 5.1 FFT-planner caching for PSD (features.rs)
+
+`PowerSpectralDensity::from_csi_data` constructed a fresh `FftPlanner` and re-planned the FFT **on every frame** — and `FeatureExtractor::extract` calls it per frame on the hot path. New `from_csi_data_with_fft(csi, fft_size, &Arc<dyn Fft>)` reuses a plan cached in `FeatureExtractor` (built once in `new()`). Output is **bit-identical** (`psd_cached_fft_bit_identical_to_fresh` compares `f64::to_bits` of values + all summary stats across 6 FFT sizes).
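+
+The caching pattern can be sketched without any FFT crate. In the sketch below the "plan" is only a stand-in (a precomputed twiddle table driving a naive DFT), and `Plan`, `build_plan`, `psd_fresh` and `Extractor` are hypothetical names. It illustrates the build-once, reuse-per-frame structure and the `to_bits` equality check, not the real planner.
+
+```rust
+use std::f64::consts::PI;
+use std::sync::Arc;
+
+/// Stand-in for an FFT plan: precomputed (cos, sin) twiddles for size `n`.
+pub type Plan = Arc<Vec<(f64, f64)>>;
+
+pub fn build_plan(n: usize) -> Plan {
+    Arc::new(
+        (0..n)
+            .map(|k| {
+                let angle = 2.0 * PI * k as f64 / n as f64;
+                (angle.cos(), angle.sin())
+            })
+            .collect(),
+    )
+}
+
+/// Power spectrum via a naive DFT that reads its twiddles from `plan`.
+pub fn power_spectrum(signal: &[f64], plan: &Plan) -> Vec<f64> {
+    let n = plan.len();
+    (0..n)
+        .map(|k| {
+            let (mut re, mut im) = (0.0_f64, 0.0_f64);
+            for (t, &x) in signal.iter().take(n).enumerate() {
+                let (c, s) = plan[(k * t) % n];
+                re += x * c;
+                im -= x * s;
+            }
+            (re * re + im * im) / n as f64
+        })
+        .collect()
+}
+
+/// "Before": rebuilds the plan on every frame.
+pub fn psd_fresh(signal: &[f64], n: usize) -> Vec<f64> {
+    power_spectrum(signal, &build_plan(n))
+}
+
+/// "After": the plan is built once in `new()` and reused for every frame.
+pub struct Extractor {
+    plan: Plan,
+}
+
+impl Extractor {
+    pub fn new(n: usize) -> Self {
+        Self { plan: build_plan(n) }
+    }
+
+    pub fn psd(&self, signal: &[f64]) -> Vec<f64> {
+        power_spectrum(signal, &self.plan)
+    }
+}
+```
+
+Because both paths run the same arithmetic on the same twiddles, their outputs can be compared with `f64::to_bits` rather than a tolerance.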
+
+Bench group `psd_fft_planner` — `fresh_planner` (before) vs `cached_planner` (after), per frame:
+
+| fft_size | before (fresh plan), median | after (cached), median | speedup |
+|----------|------------------------------|-------------------------|---------|
+| 64  | 5.84 µs/frame | 1.89 µs/frame | **3.09×** |
+| 128 | 9.31 µs/frame | 3.61 µs/frame | **2.58×** |
+| 256 | 13.77 µs/frame | 6.73 µs/frame | **2.04×** |
+
+Medians from criterion (warm-up 1 s, 20 samples). Raw three-point estimates (low/median/high), per frame:
+`fresh/64 [5.27, 5.84, 6.34] µs` vs `cached/64 [1.76, 1.89, 2.03] µs`;
+`fresh/256 [13.29, 13.77, 14.32] µs` vs `cached/256 [6.26, 6.73, 7.43] µs`.
+The saving is the per-frame `FftPlanner` construction and planning that the cache hoists out of the loop. The *relative* win is largest at small FFT sizes, where planning is a larger fraction of a cheap transform (3.09× at 64), and levels off at about 2× at 256.
+
+### 5.2 DTW Sakoe-Chiba band honored (gesture.rs)
+
+`dtw_distance` computed the band bounds `j_start/j_end` but still iterated the **full** `1..=m` row, `continue`-ing on out-of-band cells — so the band constrained the *path* but not the *work* (still O(n·m)). The fix iterates only `j_start..=j_end` (O(n·band)), resetting just the two boundary-guard cells the recurrence can read, and computes the endpoint reachability (`|n−m| ≤ band`) at the return site. Result is **bit-identical** to the full-row version across 12 shapes × 8 band widths (`dtw_banded_bit_identical_to_fullrow`).
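+
+A minimal sketch of the banded iteration, under simplifying assumptions that differ from `gesture.rs`: scalar sequences, `|a_i − b_j|` as the local cost, two rolling rows, and the reachability test done up front rather than at the return site. `dtw_banded` is a hypothetical name.
+
+```rust
+/// Banded DTW over two scalar sequences with |a_i - b_j| as the local cost.
+pub fn dtw_banded(a: &[f64], b: &[f64], band: usize) -> f64 {
+    let (n, m) = (a.len(), b.len());
+    // The end cell (n, m) lies inside the band only when |n - m| <= band.
+    if n == 0 || m == 0 || n.abs_diff(m) > band {
+        return f64::INFINITY;
+    }
+    let mut prev = vec![f64::INFINITY; m + 1];
+    let mut curr = vec![f64::INFINITY; m + 1];
+    prev[0] = 0.0;
+    for i in 1..=n {
+        let j_start = i.saturating_sub(band).max(1);
+        let j_end = (i + band).min(m);
+        // Reset the two guard cells just outside the band: the left one is
+        // read by this row, the right one by the next row. Without this,
+        // stale values from two rows ago could leak into the recurrence.
+        curr[j_start - 1] = f64::INFINITY;
+        if j_end < m {
+            curr[j_end + 1] = f64::INFINITY;
+        }
+        // Only in-band cells are visited: O(band) work per row.
+        for j in j_start..=j_end {
+            let cost = (a[i - 1] - b[j - 1]).abs();
+            let best = prev[j].min(curr[j - 1]).min(prev[j - 1]);
+            curr[j] = cost + best;
+        }
+        std::mem::swap(&mut prev, &mut curr);
+    }
+    prev[m]
+}
+```
+
+With `band = 0` only the diagonal is walked; once `band` covers the whole matrix the result matches unconstrained DTW.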
+
+Bench group `dtw_sakoe_chiba` — `full_row` (before) vs `banded` (after):
+
+| case | before (full row), median | after (banded), median | speedup |
+|------|-----------------------------|--------------------------|---------|
+| n=m=100, band=5  | 33.45 µs | 13.77 µs | **2.43×** |
+| n=m=200, band=5  | 122.32 µs | 29.55 µs | **4.14×** |
+| n=m=200, band=10 | 159.98 µs | 60.19 µs | **2.66×** |
+
+Medians from criterion (warm-up 1 s, 20 samples). Raw (low/median/high):
+`full_row n200_band5 [107.6, 122.3, 146.5] µs` vs `banded n200_band5 [26.4, 29.5, 33.1] µs`.
+The speedup tracks the inner-loop cell-count ratio `m / (2·band+1)`. For n=m=200, band=5 that is 200/11 ≈ 18× fewer cells, but Euclidean-distance cost and loop overhead dominate at these sizes, so the wall-clock win is about 4×. That is still the **largest at the longest sequence / narrowest band**, as the algorithm predicts. The speedup shrinks toward 1× as the band widens to cover the whole matrix (n=200: 4.14× at band=5 → 2.66× at band=10) and grows with sequence length (band=5: 2.43× at n=100 → 4.14× at n=200).
+
+> **Note on the other re-plan sites.** `spectrogram.rs`/`bvp.rs` plan their FFT **once per call** and reuse it across all frames/subcarriers (already amortized), so caching there is marginal — deferred (§7). The PSD site was the only one re-planning *per frame*.
+
+---
+
+## 6. Per-module SOTA landscape (evidence-graded)
+
+Grades: **MEASURED** (the source measured it, ideally with public method/code), **CLAIMED** (asserted, no reproducible artifact), **THEORETICAL** (plausible, no published target).
+
+### 6.1 CSI → CIR (cir.rs — our ISTA/L1 sparse recovery)
+
+- **Deep-unfolded ISTA / LISTA for CSI→CIR — MEASURED.** Learned ISTA unrolling reports ~**3 dB NMSE** improvement over classical OMP/FISTA for channel/CIR estimation (arXiv [2211.15440](https://arxiv.org/abs/2211.15440); survey [2502.05952](https://arxiv.org/abs/2502.05952)). Public methods; numbers measured in-paper. **This is our #1 future item (§7) — our `cir.rs` already builds the sub-DFT Φ that LISTA would make trainable.**
+- **Diffusion CIR prior — MEASURED (artifact).** [github.com/benediktfesl/Diffusion_channel_est](https://github.com/benediktfesl/Diffusion_channel_est) ships **public weights** for a diffusion-model channel-estimation prior. Heavier than our edge budget; tracked, not adopted.
+- **Coherence gating (the §2 gate) — THEORETICAL.** Our 0.7/0.3 freq/CIR blend is an engineering heuristic with no published accuracy target; now that it *runs*, it can finally be A/B-measured.
+
+### 6.2 Adversarial robustness (adversarial.rs)
+
+- **Adversarial-robustness eval for WiFi sensing — MEASURED.** arXiv [2511.20456](https://arxiv.org/abs/2511.20456) + the **Wi-Spoof** benchmark provide a measured evaluation protocol for spoofed/injected CSI. Our detector's physical-plausibility checks (consistency/Gini/temporal/energy) are in the same spirit; adopting Wi-Spoof as an external benchmark is a §7 item. (The §3 NaN fix is a precondition: a detector that NaN-bypasses can't be benchmarked honestly.)
+
+### 6.3 Multi-AP / multistatic fusion (multistatic.rs)
+
+- **Bayesian multi-AP fusion — CLAIMED.** arXiv [2512.02462](https://arxiv.org/abs/2512.02462) proposes a Bayesian fusion across APs; **no code released**, numbers self-reported. Our attention-weighted fusion is a different (cheaper) mechanism; tracked as a comparison target, not adopted.
+
+### 6.4 RF intention-lead / pre-movement (intention.rs) — THEORETICAL
+
+The 200–500 ms pre-movement "lead signal" framing has **no published commodity-WiFi target** we can grade. Honestly THEORETICAL; no work item.
+
+---
+
+## 7. Decision, roadmap, and the deferred-findings backlog
+
+### 7.1 Accepted now (this milestone)
+
+The §2–§5 fixes are **ACCEPTED and committed**: dead CIR gate fixed, NaN bypass fixed, window trio fixed, misleading calibration dead branch removed, two measured perf wins. `cargo test -p wifi-densepose-signal --no-default-features` passes, with and without `--features cir`; Python proof PASS.
+
+### 7.2 Top accepted-future item — LISTA-for-CIR (NOT implemented here)
+
+**Unroll the existing ISTA in `cir.rs` into trainable layers (LISTA).** Effort: **M**. The sensing matrix Φ and the ISTA recurrence already exist; LISTA replaces the fixed step size / threshold with per-layer learned parameters over a fixed unroll depth. Measured target to beat: **~3 dB NMSE over OMP/FISTA** (arXiv 2211.15440 — MEASURED). Proposed, not built in Milestone 0.
+
+### 7.3 Other graded-future items
+
+- Adopt **Wi-Spoof** (arXiv 2511.20456, MEASURED) as the external adversarial benchmark for `adversarial.rs`.
+- Evaluate the **diffusion CIR prior** (public weights, MEASURED) as an offline quality ceiling — *not* an edge target.
+- Bayesian multi-AP fusion (2512.02462, CLAIMED) — comparison only, pending released code.
+
+### 7.4 Deferred Milestone-0 review findings (explicit backlog)
+
+Catalogued so nothing is silently dropped. Priority: **P1** correctness-adjacent, **P2** perf, **P3** clarity/style.
+
+**Milestone-1 update (2026-06-13):** the **four P1 backlog items** (#1, #9, #10, #13) are now cleared — #1 and #10 **RESOLVED (MEASURED)**, #9 and #13 **RESOLVED-PARTIAL (DATA-GATED:** de-magicked + boundary-tested, operating values unchanged**)**. Each fix is pinned by a regression test that fails on the old behaviour (commits `fd32f094a`, `4a9f2bcf4`, `d672fa602`, `5193f6369`); workspace `--no-default-features` green, Python proof unchanged (bit-exact).
+
+**Milestone-2 update (2026-06-13):** the **bench-first P2 perf subset** (#5, #6, #7, #8, #20) and the **three missing boundary tests** (#14, #16, #19) are now cleared — ~36 P2/P3 items remained deferred *(now cleared — see the Milestone-3 update)*. PROOF discipline (§0): every perf item was **benched before being touched** — committed in `benches/dsp_perf_bench.rs` (criterion, this Windows box). Only **#20** proved hot and was optimized; **#5/#6/#7** are committed **MEASURED-NULLs** (benched, not hot, left as-is for clarity — exactly the §5.1 "already amortized" pattern); **#8** is **MEASUREMENT-ONLY** but its `eigenvalue`/BLAS backend won't build on this Windows host, so its µs cost must come from a Linux/BLAS box (recorded, not fabricated). Commits `e839fa8f1` (#20 fix), `02e5dd13a` (#14/#16/#19 tests), `aad9464f0` (benches). Workspace `--no-default-features` green; Python proof unchanged (#20 is bit-identical, off the proof path).
+
+**Milestone-3 update (2026-06-13):** the lumped **row #21–45** P3 backlog (*"remaining clarity/doc/magic-constant/missing-boundary-test findings across `ruvsense/*`, `features.rs`, `motion.rs`"*) is now **cleared, and with it the residual P3 items #2/#12/#17/#18.** Honest enumeration first (`grep`, not the ADR's "21–45" estimate; that was a count, not 25 distinct findings): after M0–M2 the genuinely-bare in-function literals resolved to **22 de-magicked constants across 11 modules** (each → a named, documented **EMPIRICAL-DEFAULT** const that **equals the prior literal exactly**), **6 added boundary/characterization tests**, **~4 doc-only fixes** (no-behaviour-change), and **a handful of agent-flagged "findings" that were NOT real** and are reported as skipped (below). **No operating value or behaviour changed**: every module carries a `*_consts_unchanged_from_literals` pin test and every boundary test pins *current* behaviour, so a future retune is a visible, tested change.
+
+Resolution by module:
+
+- `motion.rs` (**#18**): fusion weights / Doppler+variance+phase scales / confidence weights / adaptive-threshold clamp; 5 tests.
+- `gesture.rs` (**#12**): `euclidean_distance` length-mismatch `debug_assert` documenting the silent-`zip`-truncation caller contract, behaviour-preserving in release; + confidence epsilon; + DTW n=0/m=0 boundary.
+- `longitudinal.rs`: 7-day/2σ/3-day/7-day drift thresholds + EMA-α + cosine epsilon; day-6/7 + zero-vector boundaries; the duplicated `>=7` deduped.
+- `cross_room.rs`/`multiband.rs`/`intention.rs`/`hampel.rs` (**#17**): division-guard epsilons `1e-9`/`1e-12`/`1e-10`/`1e-15` + zero-norm/zero-variance/zero-MAD boundaries + the previously-untested `hampel half_window==0` error path + `# Errors` doc.
+- `rf_slam.rs`: `NS_PER_DAY` + `MIGRATION_MIN_SPAN_DAYS` + fixed-map defaults; single-sighting zero-span guard.
+- `attractor_drift.rs`: `METRIC_BUFFER_CAPACITY`/`STABLE_CENTER_WINDOW`; **documented** the implicit `recent.len()>=1` divide-safety; `min_observations` off-by-one boundary.
+- `coherence.rs` (**#9 completion**): the residual bare `1e-6` variance-floor ×4 + default `0.95` decay; floor-effect test.
+- `calibration.rs` (**#2 completion**): `DEFAULT_MIN_FRAMES` deduped across all 4 tier constructors + `AMP_STD_FLOOR`/`MOTION_AMP_Z_THRESHOLD`/`MOTION_PHASE_DRIFT_THRESHOLD`/`SUBTRACT_MIN_NORM`.
+- `fusion_quality.rs`: `CONTRADICTION_PENALTY` 0.8 / bound-halfwidth 0.1; n=0 identity boundary.
+- `temporal_gesture.rs`: confidence epsilon + L2-norm quantization scale.
+
+**NOT-REAL / skipped (reported honestly, no churn manufactured):**
+
+- An agent-flagged `attractor_drift.rs:301` "divide-by-zero" is **unreachable**: the `count < min_observations` guard guarantees `recent.len()>=1` before the `PointAttractor` branch (documented + boundary-tested, **not** guarded, per the no-behaviour-change rule).
+- Agent-flagged `gesture.rs` `2.0`/`π·6` motion thresholds **do not exist** in that file (a confusion with `calibration.rs::deviation`).
+- **`features.rs` was deliberately left untouched**: it is on the deterministic Python-proof PSD/Doppler path, and its `1e-10` guards already exist and are already correct; doc-only-skipped to protect the bit-exact hash.
+
+Commits: `c794d1a0c` (motion #18), `adf9ed8e4` (gesture #12), `19f5b6335` (longitudinal), `19e0373c8` (epsilon helpers #17), `c6a09b69a` (rf_slam + attractor_drift), `5a1839f33` (coherence #9 completion), `df25a303e` (calibration #2 completion), `0f931ff2f` (fusion_quality + temporal_gesture).
+
+Test results: signal crate lib `--no-default-features` **476 passed / 0 failed / 1 ignored**; `--no-default-features --features cir` **476 / 0**; workspace `--no-default-features` **3,275 / 0 failed** (single clean run); Python proof **VERDICT: PASS**, hash `f8e76f21…46f7a` **UNCHANGED (bit-exact)**.
+
+**§7.4 backlog is now fully cleared: ADR-154's deferred findings are addressed across M0–M3 with nothing silently dropped.**
+
+| # | Module | Finding | Pri | Why deferred |
+|---|--------|---------|-----|--------------|
+| 1 | cir.rs ~937 | `phase_variance` uses **linear** variance on **wrapped** angles (doc says "variance of phase angles") — spuriously inflates near ±π | P1 | **RESOLVED (`fd32f094a`) — metric MEASURED, threshold DATA-GATED.** Replaced with Mardia's circular variance V = 1 − R̄ ∈ **[0,1]**, invariant to the cluster's position on the circle (branch-cut artefact gone). Guard re-derived against the bounded metric via named const `GHOST_TAP_CIRCULAR_VARIANCE_MAX = 0.99` (fires only when R̄ ≤ 0.01 — essentially uniform phase). The **threshold value is DATA-GATED**: a clean single-path ramp also sweeps the circle, so V alone can't separate clean from unsanitized without labelled frames — the default is deliberately conservative (strictly more permissive at the wrap boundary than the buggy linear guard). Fails-on-old: `phase_variance_circular_not_fooled_by_branch_cut` (old linear variance > TAU on wrap-straddling phases while circular V≈0, guard no longer trips), `phase_variance_circular_is_bounded_and_extremal`. |
+| 2 | calibration.rs ~311 | `subtract_in_place` had a vacuous `if active_input {ki} else {ki}` branch implying a full-FFT→bin remap that didn't exist | P3 | **Resolved (M0 + M3 `df25a303e`).** Branch removed in M0 (sequential-convention documented). M3 completed the de-magic: `DEFAULT_MIN_FRAMES=600` deduped across all four tier constructors, plus `AMP_STD_FLOOR`/`MOTION_AMP_Z_THRESHOLD`/`MOTION_PHASE_DRIFT_THRESHOLD`/`SUBTRACT_MIN_NORM` named + `calibration_consts_unchanged_from_literals`. Behaviour unchanged. |
+| 3 | spectrogram.rs / bvp.rs | FFT planner built once-per-call (already amortized across frames) | P2 | Marginal vs the per-frame PSD site; cache if these become hot. |
+| 4 | features.rs ~347 | Doppler FFT planner planned once per call, reused across subcarriers | P2 | Already amortized within the call. |
+| 5 | multistatic.rs | `node_attention_weights` recomputes consensus/softmax each call; no SIMD | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** `multistatic_attention/weights`: **181 ns** (2 nodes) … **848 ns** (8 nodes) @ 56 subcarriers — sub-µs, no hot-path allocation. A precompute/SIMD rewrite buys nothing measurable at the realistic 2–8 node fan-in; the cosine/softmax cost is dwarfed by the surrounding fusion + per-frame FFT. Bench `multistatic_attention` in `dsp_perf_bench.rs`. |
+| 6 | tomography.rs | ISTA L1 solver re-allocates voxel buffers per solve | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** A full 50-iteration `reconstruct` (256 voxels): **47.5 µs** (16 links) / **60.4 µs** (32 links). The two voxel buffers (`x`, `gradient`; ~4 KB) are already allocated *once* per `reconstruct()` and `.fill`-reused across iterations — the per-solve alloc is a negligible fraction of the O(iters·links·voxels) inner product. Reusing scratch across *calls* would force `reconstruct(&self)`→`&mut self` (API break) for no measurable gain. Bench `tomography_reconstruct`. |
+| 7 | pose_tracker.rs | Kalman gain matrices reallocated per update | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** A Kalman predict+update cycle: **150 ns** (17 keypoints) / **2.82 µs** (170). The "gain matrices" (`s:[f32;3]`, `k:[[f32;3];6]`) are fixed-size **stack** arrays, *not* heap — there is no per-update allocation to reuse; the compiler keeps them in registers/stack. Bench `pose_kalman_update`. |
+| 8 | field_model.rs | SVD recomputed on every perturbation extract | P2 | **MEASUREMENT-ONLY (`aad9464f0`) — BLAS-gated, not measurable on this host.** Correction: `extract_perturbation` does **not** recompute the SVD — it projects against the cached `modes` from `finalize_calibration`. The real per-call eigendecomposition is in the `eigenvalue`-feature `estimate_occupancy` (`cov.eigh()` on a 56×56 covariance, an O(n³)≈175k-flop symmetric eigensolve + O(n²·frames) covariance build, run per call). The bench (`dsp_perf_bench`'s `eig` module) is committed, but `openblas-src` **fails to build on this Windows box** ("Non-vcpkg builds are not supported on Windows" — the very reason the project gate runs `--no-default-features`), so a measured µs number must come from a Linux/BLAS host; **not estimated/fabricated here.** Incremental SVD remains a sized future project, not a micro-fix. |
+| 9 | coherence.rs / coherence_gate.rs | Z-score thresholds are magic constants, untested at boundaries | P1 | **RESOLVED-PARTIAL (`5193f6369`) — DATA-GATED.** De-magicked `classify_drift` (`DRIFT_STABLE_SCORE=0.85`, `DRIFT_STEP_CHANGE_MAX_STALE=10`) and the `coherence_gate.rs` defaults (`DEFAULT_ACCEPT_THRESHOLD`/`…REJECT…`/`…MAX_STALE_FRAMES`/`…PREDICT_ONLY_NOISE`) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (`classify_drift_*_boundary`) + `*_consts_unchanged_from_literals`. **Operating values explicitly NOT changed** — defensible values still need labelled stable/drifting traces. The gate already exposed these via `GatePolicyConfig` (config seam). |
+| 10 | longitudinal.rs | Welford update not numerically guarded for n=0 | P1 | **RESOLVED (`4a9f2bcf4`) — MEASURED.** The shared `WelfordStats` (`field_model.rs`, consumed by longitudinal.rs) `count < 2` guards already prevent the n=0 NaN / n=1 div0 / `(count−1)` underflow, but the boundary was untested. Added `welford_finite_at_n0_and_n1` (finite + documented 0.0 sentinel at n=0/n=1). Fails-on-old proof: removing the `sample_variance` guard makes the test panic with "attempt to subtract with overflow" at the `(count − 1)` underflow. |
+| 11 | cross_room.rs | Fingerprint hash collisions unhandled | P2 | Low collision prob; needs design. |
+| 12 | gesture.rs | `euclidean_distance` no length-mismatch guard | P3 | **RESOLVED (M3 `adf9ed8e4`).** Added a `debug_assert_eq!` on the two slice lengths + a doc block stating the same-`feature_dim` caller contract and that `zip()` silently truncates on a mismatch. Behaviour-preserving (no-op in release, the operating path). Also de-magicked the confidence `1e-10` epsilon and pinned the DTW `n=0`/`m=0` boundary (`dtw_empty_sequence_is_infinite`). |
+| 13 | adversarial.rs | Gini/consistency thresholds are magic constants | P1 | **RESOLVED-PARTIAL (`d672fa602`) — DATA-GATED.** Lifted the bare literals in `check`/`check_consistency` (`FIELD_MODEL_GINI_VIOLATION=0.8`, `ENERGY_RATIO_HIGH_VIOLATION=2.0`, `ENERGY_RATIO_LOW_VIOLATION=0.1`, `CONSISTENCY_ACTIVE_FRACTION_OF_MEAN=0.1`, `SCORE_W_*`) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (`energy_ratio_high_boundary`, `energy_ratio_low_boundary`, `field_model_gini_boundary`, `consistency_active_fraction_boundary`) + `tuning_consts_unchanged_from_literals`. **Operating values explicitly NOT changed** — defensible values still need labelled spoofed/clean CSI (Wi-Spoof, §6.2/§7.3). Bumping a const fails a boundary test (verified). |
+| 14 | cir.rs | `fft_operator` path changes the witness hash (documented) — no test that it's *numerically close* to dense | P2 | **RESOLVED (`02e5dd13a`) — tolerance test added.** `fft_operator_within_tolerance_of_dense_canonical56` pins the **full `Cir` output** of the FFT path within a *documented* relative tolerance of the dense path on the production **canonical-56** config across τ ∈ {20,50,90} ns: every tap within `1e-2·|dominant|`, identical `dominant_tap_idx`, `active_tap_count`, `ranging_valid`, `dominant_tap_ratio` within `1e-2`, `rms_delay_spread` within `1e-2` rel. A regression that lets the FFT path drift (scaling/Φ-column bug) now fails here instead of silently corrupting a downstream witness. Extends the existing HT20/single-τ `fft_estimate_matches_dense_dominant_tap`. |
+| 15 | multistatic.rs | `cir_gate_coherence` only estimates the **first** node/channel; multi-node CIR consensus unused | P2 | Design item (which node's CIR is authoritative?). |
+| 16 | phase_align.rs | Iterative LO offset estimation has no convergence cap test | P2 | **RESOLVED (`02e5dd13a`) — cap test added.** `refinement_terminates_at_iteration_cap_when_not_converging` forces non-convergence (`tolerance = 0.0`, unreachable since `max_update ≥ 0`) and asserts the loop runs **exactly `max_iterations`** then returns — proving the cap (not convergence) bounds the loop, so a non-converging input can never spin forever. Companion `refinement_converges_before_cap_on_easy_input` proves the cap is an upper bound, not the only exit. Internal-only refactor: `estimate_phase_offsets` still returns the identical offset vector; a `…_counted` core surfaces the iteration count for the test. |
+| 17 | hampel.rs | Window edge handling at series boundaries | P3 | **RESOLVED (M3 `19e0373c8`).** De-magicked the zero-MAD `1e-15` epsilon (`ZERO_MAD_EPSILON`), documented `hampel_filter`'s `# Errors`, and added the previously-untested `half_window == 0` error-path boundary (`test_zero_half_window_error`) + a zero-MAD constant-window characterization (`test_zero_mad_constant_window`). Window-edge handling itself is correct (`saturating_sub`/`.min(n)`); it is now pinned. |
+| 18 | motion.rs | Threshold constants undocumented | P3 | **RESOLVED (M3 `c794d1a0c`).** Lifted the fusion weights, Doppler/variance/phase full-scale divisors, confidence-indicator weights, and adaptive-threshold clamp into named, documented EMPIRICAL-DEFAULT consts (`motion_tuning_consts_unchanged_from_literals` pins them) + small-`n` boundary tests (correlation `n<2`, temporal-variance `len<2`, adaptive-threshold history 9-vs-10, Doppler full-scale saturation). Doc-only-plus: values unchanged. |
+| 19 | csi_ratio.rs | Division guard relies on `1e-12` epsilon; no test | P2 | **RESOLVED (`02e5dd13a`) — boundary test added.** Finding clarification: `csi_ratio.rs` implements the CSI *ratio model* as the **conjugate product** `H_i·conj(H_j)` (SpotFi/IndoTrack) — there is **no division**, hence no literal `1e-12` epsilon; the classic `H_i/H_j` ratio (which a `1e-12` guard protects) is deliberately avoided. `ratio_finite_at_and_below_1e_12_epsilon` pins the property the finding cares about: at and below the `1e-12` target magnitude (and at exact zero — where a division ratio is ±inf/NaN) the conjugate-product output is **finite**, exactly the conjugate product (bit-exact), collapses toward zero (the physically correct "no path" answer), and stays finite through `ratio_to_amplitude_phase`. |
+| 20 | spectrogram.rs | `compute_multi_subcarrier_spectrogram` re-plans per subcarrier via `compute_spectrogram` | P2 | **MEASURED-HOT (`e839fa8f1`) — optimized, bit-identical.** Hoisted the FFT plan + window out of the per-subcarrier loop (new `compute_spectrogram_with_plan` core). **56-subcarrier** multi-spectrogram: **467.88 µs → 254.75 µs = 1.84×** (window 128); **627.27 µs → 448.39 µs = 1.40×** (window 256). The removed cost is the per-subcarrier `FftPlanner` re-plan (~1.86 µs/plan @ w128 × 56). Bit-identical (`multi_subcarrier_hoisted_plan_bit_identical`, `f64::to_bits` across all 4 windows × {power,magnitude}). The most likely real win predicted by the §7.4 intro — confirmed. (Relates to #3, which stays deferred: `spectrogram.rs`/`bvp.rs` single-signal callers already plan once-per-call.) |
+| 21–45 | (assorted) | Remaining clarity/doc/magic-constant/missing-boundary-test findings across `ruvsense/*`, `features.rs`, `motion.rs` | P3 | **RESOLVED (Milestone-3, 2026-06-13).** Enumerated honestly (the "21–45" was an estimate, not 25 distinct findings): **22 bare in-function literals de-magicked → named EMPIRICAL-DEFAULT consts (each == prior literal, pinned)**, **6 boundary/characterization tests added**, **~4 doc-only fixes**, across 11 modules (`motion`, `gesture`, `longitudinal`, `cross_room`, `multiband`, `intention`, `hampel`, `rf_slam`, `attractor_drift`, `coherence`, `calibration`, `fusion_quality`, `temporal_gesture`). **No operating value changed.** **Skipped-as-not-real (reported, no churn):** `attractor_drift.rs:301` "divide-by-zero" is unreachable (guarded by `count < min_observations`) → documented + boundary-tested, not guarded; agent-flagged `gesture.rs` `2.0`/`π·6` motion thresholds don't exist there (confusion with `calibration::deviation`); **`features.rs` left untouched** (on the deterministic Python-proof path; its `1e-10` guards already exist & are correct — doc-only-skipped to keep the `f8e76f21…` hash bit-exact). See the Milestone-3 update note above and the per-row #2/#12/#17/#18 entries. |
+
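+Row #1 replaces a linear variance on wrapped angles with Mardia's circular variance V = 1 − R̄. A minimal sketch of why that matters at the ±π branch cut follows; the function names and the empty-slice sentinel are assumptions of this sketch, not the `cir.rs` implementation.
+
+```rust
+/// Mardia circular variance V = 1 - R̄ of a set of angles in radians.
+/// R̄ is the length of the mean unit vector, so V lies in [0, 1].
+/// Returns 0.0 for an empty slice (a sentinel chosen for this sketch).
+pub fn circular_variance(phases: &[f64]) -> f64 {
+    if phases.is_empty() {
+        return 0.0;
+    }
+    let n = phases.len() as f64;
+    let (sum_sin, sum_cos) = phases
+        .iter()
+        .fold((0.0_f64, 0.0_f64), |(s, c), &p| (s + p.sin(), c + p.cos()));
+    let r_bar = (sum_sin / n).hypot(sum_cos / n);
+    (1.0 - r_bar).clamp(0.0, 1.0)
+}
+
+/// Linear (population) variance of the raw angle values, for contrast only.
+pub fn linear_variance(xs: &[f64]) -> f64 {
+    if xs.is_empty() {
+        return 0.0;
+    }
+    let n = xs.len() as f64;
+    let mean = xs.iter().sum::<f64>() / n;
+    xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n
+}
+```
+
+A tight cluster that straddles ±π has a large linear variance (the raw values sit near +π and −π) but a circular variance close to zero, because the unit vectors all point the same way.
+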
+> **Horizon-ledger one-liner.** Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n−1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). **Milestone-1 DONE (2026-06-13): all four P1 backlog items cleared — circular phase variance #1 (RESOLVED/MEASURED metric, DATA-GATED threshold), Welford n=0 guard #10 (RESOLVED/MEASURED), threshold magic-constants #9 & #13 (RESOLVED-PARTIAL/DATA-GATED — de-magicked + boundary-tested, values unchanged).** **Milestone-2 DONE (2026-06-13): bench-first P2 perf subset + missing boundary tests cleared — spectrogram per-subcarrier FFT re-plan #20 (MEASURED-HOT, 1.40–1.84×, bit-identical); attention/tomography/Kalman #5/#6/#7 (MEASURED-NULL — benched, not hot, left as-is); field_model eigendecompose #8 (MEASUREMENT-ONLY, BLAS un-buildable on this Windows host, number deferred to a BLAS box, NOT fabricated); fft_operator tolerance #14, phase-align convergence-cap #16, csi-ratio epsilon #19 (RESOLVED, tests added).** **Milestone-3 DONE (2026-06-13): the lumped §7.4 row #21–45 P3 backlog cleared, and with it residual P3 items #2/#12/#17/#18 — 22 magic constants de-magicked into named EMPIRICAL-DEFAULT consts (each pinned == prior literal) + 6 boundary/characterization tests across 11 modules; ~4 doc-only; not-real findings (unreachable attractor_drift div0, non-existent gesture thresholds, proof-path features.rs) reported + skipped, no churn; no operating value changed; workspace 3,275/0, Python proof bit-exact `f8e76f21…`.** **§7.4 deferred backlog is now FULLY CLEARED across M0–M3 — nothing silently dropped.**
+
+> **Sibling-crate sweep extension (2026-06-14) — `wifi-densepose-geo` + `wifi-densepose-pointcloud`.** The ADR-154-class numerical-robustness sweep (non-finite-input-poisons-persistent-state + divide-by-zero / asin-domain / degenerate-geometry) was extended to two crates *outside* this ADR's signal scope. **Two real `geo` bugs FIXED, each fails-on-old-pinned:** `terrain.rs::parse_hgt` usize-underflow panic on empty/sub-2x2 SRTM data (`1.0/(side-1)` → panic in debug / inf `cell_size_deg` poisoning `ElevationGrid::get` in release — a truncated download / 404 HTML body reaches it; now `bail!`s when `side < 2`); `coord.rs::haversine` `asin(>1)→NaN` for near-antipodal points (`h` rounds to `1.0+4e-16`; clamped to `[0,1]`). The ±90° pole `cos(lat)=0` ENU singularity is pinned no-panic without changing the transform. **`pointcloud` is confirmed-robust (no manufactured finding):** its only persistent auto-accumulating state (`occupancy` EMA + vitals) is fed solely by the integer-rssi/`sqrt`/`atan2` parser (always finite) and is provably self-healing even under an adversarial NaN/inf `CsiFrame` (`motion_score=(NaN/100).min(1.0)→1.0`; breathing `→0→clamp(5,40)→5.0`) — pinned by `nonfinite_frame_does_not_poison_persistent_state` + degenerate-voxel-fusion no-panic tests. `geo` 9→15 lib / 8 integration; `pointcloud` 18→22; 0 failed; workspace green; Python proof bit-exact `f8e76f21…`. See CHANGELOG `[Unreleased] → Fixed`.
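+
+The `haversine` clamp mentioned above can be illustrated as follows. This is a sketch of the general technique, not the `coord.rs` code: `haversine_m`, its degree-based signature and the `EARTH_RADIUS_M` value are assumptions.
+
+```rust
+/// Mean Earth radius in metres (assumed value for this sketch).
+pub const EARTH_RADIUS_M: f64 = 6_371_000.0;
+
+/// Great-circle distance in metres; inputs are in degrees.
+pub fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
+    let (p1, p2) = (lat1.to_radians(), lat2.to_radians());
+    let dphi = p2 - p1;
+    let dlam = (lon2 - lon1).to_radians();
+    let h = (dphi / 2.0).sin().powi(2)
+        + p1.cos() * p2.cos() * (dlam / 2.0).sin().powi(2);
+    // Rounding can push `h` a few ULPs above 1.0 for near-antipodal points,
+    // and asin of anything above 1.0 is NaN. Clamp to the valid domain first.
+    let h = h.clamp(0.0, 1.0);
+    2.0 * EARTH_RADIUS_M * h.sqrt().asin()
+}
+```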
+
+---
+
+## 8. Consequences
+
+- **Positive:** the ADR-134 CIR gate is alive for the first time in production; the adversarial detector can no longer be NaN-bypassed; three latent divide-by-zero NaN sources are gone; the per-frame PSD path and gesture DTW are measurably faster with bit-identical output; the SOTA landscape and a concrete LISTA-for-CIR roadmap are graded and recorded.
+- **Negative / honest limits:** `canonical56()` models the canonical grid as a contiguous 56-tone band — a reasonable physical interpretation of a *resampled* grid, but not a literal hardware tone map; the CIR gate still uses only the first node's CIR (#15). The `phase_variance` **metric** is now correct (Mardia circular variance, Milestone-1 #1), so the branch-cut false-trip is gone — but its ghost-tap **threshold** (`GHOST_TAP_CIRCULAR_VARIANCE_MAX = 0.99`) is a conservative DATA-GATED default, not a calibrated operating point, and still awaits labelled sanitized/unsanitized frames to tune. Likewise the de-magicked coherence/adversarial thresholds (#9/#13) keep their pre-existing empirical values pending labelled calibration.
+- **Neutral:** no public API removed; `with_cir_ht20()` kept (warned); files stay scoped; new bench is additive.
@@ -0,0 +1,259 @@
+# ADR-155: NN / Training Beyond-SOTA Sweep — Milestone 1 (Claim Integrity, Honest Validation, the Unified Metric, and the SOTA Landscape)
+
+| Field | Value |
+|-------|-------|
+| **Status** | Proposed |
+| **Date** | 2026-06-11 |
+| **Deciders** | ruv |
+| **Codebase target** | `wifi-densepose-train` (`metrics.rs`, `dataset.rs`, `proof.rs`, `rapid_adapt.rs`, `ruview_metrics.rs`, `config.rs`, `ablation.rs`, `subcarrier.rs`, `bin/train.rs`, `bin/verify_training.rs`), `wifi-densepose-nn` (`tensor.rs`, `translator.rs`, `onnx.rs`), benches, docs |
+| **Relates to** | ADR-154 (Signal/DSP sweep, Milestone 0), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-150 (RF Foundation Encoder), ADR-079 (Camera-Supervised Pose), ADR-027 (MERIDIAN), ADR-024 (AETHER) |
+| **Scope** | Milestone 1 of the beyond-SOTA NN/training sweep: the **integrity-critical** fixes that let the training/metrics subsystem substantiate a clean accuracy claim (the unified metric, leak-free validation, honest TTA, rigorous proof), a focused set of **correctness/security** fixes, two **measured** perf wins, the NN SOTA landscape with evidence grades, and a prioritized backlog. **~45 review findings are explicitly deferred (§8)** — nothing is silently dropped. |
+
+---
+
+## 0. PROOF discipline (this ADR's contract)
+
+This project has been publicly accused of "AI slop." Milestone 1 is the **most integrity-critical** of the sweep because a gap review found the training/metrics subsystem **could not substantiate a clean accuracy claim**: there were four divergent PCK implementations and three divergent OKS implementations, a model trained on real data was validated against a *synthetic* set, the dataset had no leak-free split, the test-time-adaptation path descended a *fake* gradient, and the deterministic proof self-certified on any loss decrease (including float noise) with no committed baseline.
+
+We answer that with **evidence, not adjectives**:
+
+- Every integrity fix ships with a **committed regression test that would have caught the bug**.
+- Every perf number is **MEASURED before/after** with the exact reproduce command. A perf claim without a measured before/after is **UNPROVEN** and is not made here.
+- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **THEORETICAL**.
+- We disclose, in full, what the proof does **not** prove and what remains unmeasured.
+
+### Build/test constraint (disclosed)
+
+The reportable-metric code (`metrics.rs`, `trainer.rs`, `proof.rs`, `model.rs`, `losses.rs`) is gated behind the `tch-backend` Cargo feature (libtorch FFI). libtorch is **not installed on the development host**, so the project's standard gate is `cargo test --workspace --no-default-features` (no tch). The canonical-metric *logic* is therefore validated two ways: (1) the non-tch reachable surface (`compute_pck`/`compute_oks` free functions, `dataset.rs` split, `rapid_adapt.rs`, `ruview_metrics.rs`) runs under the workspace test suite with new regression tests; (2) the `tch`-gated accumulator/trainer/proof changes are routed through those same canonical functions, so the metric definition is identical whether or not tch is present. This limitation is disclosed rather than hidden.
+
+---
+
+## 1. Context — the seven divergent metric definitions
+
+The gap review found **four** PCK and **three** OKS implementations that disagreed on normalization, on the zero-visible-joint case, and on the OKS scale:
+
+| # | Location | Normalizer | Zero-visible PCK | OKS scale |
+|---|----------|-----------|------------------|-----------|
+| PCK-1 | `metrics.rs` `MetricsAccumulator` (the trainer's) | bbox **diagonal** | **1.0** (false-perfect bug) | normalized-coord diag² |
+| PCK-2 | `metrics.rs` `compute_pck` | torso **hip↔shoulder** | 0.0 | — |
+| PCK-3 | `metrics.rs` `compute_pck_v2` | torso **hip↔hip** (pixel) | 0.0 | — |
+| PCK-4 | `training_bench.rs` | **raw threshold** (no torso) | 0.0 | — |
+| OKS-1 | `metrics.rs:443` `compute_oks` | — | — | caller `s` (`1.0` ⇒ fake Gold) |
+| OKS-2 | `metrics.rs:994` `compute_oks_v2` | — | — | `sqrt(area)` (could be 0) |
+| OKS-3 | `ruview_metrics.rs:642` | — | — | caller `s` (`1.0` ⇒ fake Gold) |
+
+Two of these are not merely inconsistent, they are **wrong in a claim-inflating direction**:
+
+- **The `MetricsAccumulator` zero-visible-joint bug** scored a sample with *no visible joints* as PCK = 1.0 ("no errors to measure"). An empty or garbage prediction could thus *inflate* the reported metric.
+- **The OKS `s = 1.0`-on-normalized-coordinates bug** ("fake Gold tier"): with keypoints in `[0,1]` and the scale fixed at `1.0`, every squared distance is ≈0 and the exponential kernel returns ≈1.0 for *any* pose. OKS looked near-perfect regardless of prediction quality.
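
The fake-Gold effect can be reproduced with a few lines of arithmetic. The sketch below assumes the COCO-style kernel `exp(-d² / (2·s²·k²))` with a single constant `k` for every joint, and derives the scale from the ground-truth bounding-box area. Both helper names are hypothetical, and the real `oks_canonical` may derive its scale differently.

```rust
/// Mean OKS kernel over all keypoints: exp(-d^2 / (2 * s^2 * k^2)).
/// Assumption: one `k` for every joint (COCO uses a per-joint constant).
pub fn oks(pred: &[[f64; 2]], gt: &[[f64; 2]], s: f64, k: f64) -> f64 {
    if pred.is_empty() || pred.len() != gt.len() || !(s > 0.0) {
        return 0.0; // nothing measurable, or no usable scale
    }
    let sum: f64 = pred
        .iter()
        .zip(gt)
        .map(|(p, g)| {
            let d2 = (p[0] - g[0]).powi(2) + (p[1] - g[1]).powi(2);
            (-d2 / (2.0 * s * s * k * k)).exp()
        })
        .sum();
    sum / pred.len() as f64
}

/// Scale from the ground-truth bounding box: s = sqrt(width * height).
/// Returns 0.0 for an empty or zero-extent pose.
pub fn scale_from_extent(gt: &[[f64; 2]]) -> f64 {
    let (mut min_x, mut max_x) = (f64::INFINITY, f64::NEG_INFINITY);
    let (mut min_y, mut max_y) = (f64::INFINITY, f64::NEG_INFINITY);
    for p in gt {
        min_x = min_x.min(p[0]);
        max_x = max_x.max(p[0]);
        min_y = min_y.min(p[1]);
        max_y = max_y.max(p[1]);
    }
    let area = (max_x - min_x) * (max_y - min_y);
    if area.is_finite() && area > 0.0 { area.sqrt() } else { 0.0 }
}
```

For a pose spanning a 0.1 × 0.2 box in normalized coordinates, with every predicted joint shifted by 0.05 and `k = 0.2`, the fixed scale `s = 1.0` gives an OKS of roughly 0.97, while the extent-derived scale gives roughly 0.21 for the same error.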
+
+This is the same metric-bug class ADR-152 flagged. Milestone 1 closes it for real.
+
+---
+
+## 2. Decision — TIER 1: CLAIM INTEGRITY (the "prove everything" core)
+
+### 2.1 Unify the metrics — ONE canonical definition — ACCEPTED & IMPLEMENTED
+
+There is now exactly **one** PCK and one OKS that may be used for any *reported* number, in the `canonical` region of `metrics.rs`:
+
+- **`pck_canonical(pred, gt, vis, k)` — torso-normalized PCK@k.** A keypoint `j` is correct iff `‖pred_j − gt_j‖₂ ≤ k · torso`, where `torso = ‖left_hip(11) − right_hip(12)‖₂` in the keypoint coordinate space, with a **bounding-box-diagonal fallback** when the hips are not both visible. This is the COCO / ADR-152 convention validated in `benchmarks/wiflow-std/RESULTS.md` (the ~96% PCK@20 reproduction — hip↔hip torso, COCO Setting). **Zero visible joints ⇒ `(0, 0, 0.0)`** — a sample with no measurable evidence scores 0, never 1.
+- **`oks_canonical(pred, gt, vis)` — COCO OKS.** `s = sqrt(area)` is derived from the **GT pose extent** (the canonical torso size serves as a robust scale proxy that is positive for any non-degenerate pose), never a fixed `1.0`. There is no escape hatch that makes OKS ≈ 1.0 for any pose; a degenerate (zero-extent) pose returns 0.0.
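
The PCK rule above can be sketched as below. This is an illustration of the stated definition, not the contents of `metrics.rs`: the name `pck_sketch`, the boolean visibility mask, the `(correct, visible, pck)` reading of the returned tuple, and the choice to score a zero-scale pose as 0 are all assumptions.

```rust
const LEFT_HIP: usize = 11;
const RIGHT_HIP: usize = 12;

fn dist(a: [f64; 2], b: [f64; 2]) -> f64 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Diagonal of the bounding box around the visible ground-truth joints.
fn bbox_diag(gt: &[[f64; 2]], visible: &[usize]) -> f64 {
    let (mut min_x, mut max_x) = (f64::INFINITY, f64::NEG_INFINITY);
    let (mut min_y, mut max_y) = (f64::INFINITY, f64::NEG_INFINITY);
    for &j in visible {
        min_x = min_x.min(gt[j][0]);
        max_x = max_x.max(gt[j][0]);
        min_y = min_y.min(gt[j][1]);
        max_y = max_y.max(gt[j][1]);
    }
    dist([min_x, min_y], [max_x, max_y])
}

/// Torso-normalized PCK@k. Returns (correct, visible, pck).
pub fn pck_sketch(
    pred: &[[f64; 2]],
    gt: &[[f64; 2]],
    vis: &[bool],
    k: f64,
) -> (usize, usize, f64) {
    let n = pred.len().min(gt.len()).min(vis.len());
    let visible: Vec<usize> = (0..n).filter(|&j| vis[j]).collect();
    if visible.is_empty() {
        return (0, 0, 0.0); // no measurable evidence scores 0, never 1
    }
    // Hip-to-hip torso when both hips are visible, else bbox diagonal.
    let hips_visible = n > RIGHT_HIP && vis[LEFT_HIP] && vis[RIGHT_HIP];
    let scale = if hips_visible {
        dist(gt[LEFT_HIP], gt[RIGHT_HIP])
    } else {
        bbox_diag(gt, &visible)
    };
    if !(scale > 0.0) {
        // Assumption: a zero-extent pose is unscoreable and counts as 0.
        return (0, visible.len(), 0.0);
    }
    let correct = visible
        .iter()
        .filter(|&&j| dist(pred[j], gt[j]) <= k * scale)
        .count();
    (correct, visible.len(), correct as f64 / visible.len() as f64)
}
```

Returning the raw counts alongside the ratio lets an accumulator sum `correct` and `visible` across samples instead of averaging per-sample ratios.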
+
+**Single source of truth, enforced.** `MetricsAccumulator::update` (the trainer's), `compute_pck`, `compute_per_joint_pck`, `compute_oks`, `aggregate_metrics`, and the deprecated `compute_pck_v2`/`compute_oks_v2`/`MetricsAccumulatorV2` **all route through** `pck_canonical`/`oks_canonical`. So `Trainer::evaluate()` → `MetricsAccumulator` → canonical; the WiFlow-STD bench definition (RESULTS.md) is the reference the canonical *matches*. `eval.rs` reports MPJPE (a distinct, non-divergent error metric, unchanged). The `v2` functions and the `training_bench.rs` raw-threshold kernel are annotated **`#[deprecated]` / "DO NOT USE for reported metrics"**.
+
+**The two claim-inflating bugs are fixed and pinned by regression tests:**
+
+- `canonical_pck_zero_visible_is_zero_not_one` — no-visible ⇒ PCK 0.0 (was 1.0).
+- `canonical_oks_not_one_for_wrong_pose_on_normalized_coords` — a pose off by 3× the torso on `[0,1]` coords yields OKS < 0.2 (the old `s=1.0` path returned ≈1.0).
+- `canonical_pck_uses_hip_to_hip_torso`, `canonical_torso_falls_back_to_bbox_when_hips_hidden` — pin the normalizer.
+- `all_invisible_gives_zero_pck` (renamed from `all_invisible_gives_trivial_pck`, comment cites this ADR) — the trainer accumulator now scores no-visible as 0.
+
+**Legitimately changed test expectations** (each updated with a comment citing this finding): the historical "perfect on an all-coincident pose" fixtures used keypoints at a single point, which is *correctly unscoreable* under canonical (zero extent ⇒ no scale). Test fixtures were given a real ±0.05 hip span so the canonical normalizer is positive; `all_invisible_*` flipped from 1.0 → 0.0.
+
+### 2.2 Honest validation — leak-free split + synthetic-val disclosure — ACCEPTED & IMPLEMENTED
+
+**The leak.** MM-Fi windows are extracted with **stride 1** (`MmFiEntry::num_windows = num_frames − window_frames + 1`), so adjacent windows overlap by `window_frames − 1` frames (~99% at the default 100-frame window). And `bin/train.rs` validated a *real* MM-Fi training run against a **synthetic** val set "for pipeline verification" — any PCK it printed was meaningless on two counts.
+
+**The fix (mirroring the leak-free discipline of `occupancy_bench::EvalSplit`):**
+
+- `MmFiDataset::subject_disjoint_split(test_subject_fraction, seed) → (train_view, test_view)` partitions **whole subjects** to one side. Because every window of a subject travels with that subject, the two views share **no subject and no window** — leak-free by construction, deterministic per seed. Returns `DatasetError::InvalidSplit` on <2 subjects, bad fraction, or an empty side.
+- `assert_split_leak_free(train, test)` independently verifies subject-disjointness **and** window-index-disjointness, and is called inside the split so a leaky split can never be handed out.
+- `bin/train.rs` now **prefers the real split**; the synthetic path is reachable only as a labelled fallback (single-subject data) and is routed through a new `run_smoke_test` that prefixes every metric `[SMOKE-TEST] (DO NOT REPORT)`. `--dry-run` is likewise relabelled. A synthetic-val PCK can no longer be mistaken for a measurement.
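
A subject-disjoint partition of this kind can be sketched over a flat list of per-window subject IDs. Everything here is illustrative: the free function, the `SplitError` variants, and the splitmix64 shuffle are assumptions, and the real `MmFiDataset::subject_disjoint_split` returns dataset views rather than index vectors.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum SplitError {
    TooFewSubjects,
    BadFraction,
    EmptySide,
}

/// splitmix64 step: a small deterministic PRNG so the split depends only on `seed`.
fn next_u64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// `window_subjects[i]` is the subject that window `i` belongs to.
/// Returns (train window indices, test window indices).
pub fn subject_disjoint_split(
    window_subjects: &[u32],
    test_fraction: f64,
    seed: u64,
) -> Result<(Vec<usize>, Vec<usize>), SplitError> {
    if !(test_fraction > 0.0 && test_fraction < 1.0) {
        return Err(SplitError::BadFraction);
    }
    // Sorted unique subjects, so the starting order is input-independent.
    let unique: BTreeSet<u32> = window_subjects.iter().copied().collect();
    let mut subjects: Vec<u32> = unique.into_iter().collect();
    if subjects.len() < 2 {
        return Err(SplitError::TooFewSubjects);
    }
    // Fisher-Yates shuffle (modulo bias is ignored in this sketch).
    let mut state = seed;
    for i in (1..subjects.len()).rev() {
        let j = (next_u64(&mut state) % (i as u64 + 1)) as usize;
        subjects.swap(i, j);
    }
    let n_test = (subjects.len() as f64 * test_fraction).round() as usize;
    if n_test == 0 || n_test == subjects.len() {
        return Err(SplitError::EmptySide);
    }
    let test_subjects: BTreeSet<u32> = subjects[..n_test].iter().copied().collect();
    // Every window follows its subject, so no window can land on both sides.
    let (mut train, mut test) = (Vec::new(), Vec::new());
    for (i, s) in window_subjects.iter().enumerate() {
        if test_subjects.contains(s) {
            test.push(i);
        } else {
            train.push(i);
        }
    }
    Ok((train, test))
}
```

Because the partition is decided per subject and windows only follow that decision, the stride-1 window overlap cannot cross the train/test boundary.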
+
+**Leak-free proof (tests):** `subject_split_is_subject_and_window_disjoint` (no shared subject, no shared window index, partition covers every window once), `subject_split_is_deterministic_for_seed`, `subject_split_rejects_single_subject`, `subject_split_rejects_bad_fraction`, `assert_leak_free_detects_injected_subject_leak` (the validator catches a deliberately-injected subject overlap — a guard against future partitioner bugs).
+
+### 2.3 rapid_adapt honesty — real gradients, scoped claim — ACCEPTED & IMPLEMENTED
+
+`rapid_adapt.rs`'s `contrastive_step`/`entropy_step` wrote a **fake gradient** (`grad += v * 0.01`) unrelated to the stated triplet / entropy objective — so any "TTA improves the metric" was unsupported by the code.
+
+**Resolution: real gradients (not removal).** The two `*_loss` functions are now **pure evaluators** of the real objective; `RapidAdaptation::adapt` descends them with a **central finite-difference gradient** of that exact loss (`∂L/∂wᵢ ≈ (L(w+εeᵢ) − L(w−εeᵢ))/2ε`). Finite differences genuinely minimize the stated objective (to O(ε²) truncation), so "the adaptation loss decreases" is now a **real, reproducible** measurement rather than an artefact of a hand-tuned step. The returned `final_loss` is the *actual* objective at the produced weights.
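
The central-difference update can be sketched generically. The function names and the plain gradient-descent loop are assumptions for illustration. `RapidAdaptation::adapt` operates on LoRA weights and its own loss functions, which are not reproduced here.

```rust
/// Central-difference gradient: dL/dw_i ~ (L(w + eps*e_i) - L(w - eps*e_i)) / (2*eps).
pub fn central_diff_grad<F: Fn(&[f64]) -> f64>(loss: F, w: &[f64], eps: f64) -> Vec<f64> {
    let mut probe = w.to_vec();
    let mut grad = vec![0.0; w.len()];
    for i in 0..w.len() {
        let orig = probe[i];
        probe[i] = orig + eps;
        let up = loss(&probe);
        probe[i] = orig - eps;
        let down = loss(&probe);
        probe[i] = orig; // restore before moving to the next coordinate
        grad[i] = (up - down) / (2.0 * eps);
    }
    grad
}

/// Plain gradient descent on `loss`. The returned loss is recomputed at the
/// final weights, so it is the actual objective value, not a running estimate.
pub fn descend<F: Fn(&[f64]) -> f64>(
    loss: F,
    w0: &[f64],
    lr: f64,
    steps: usize,
    eps: f64,
) -> (Vec<f64>, f64) {
    let mut w = w0.to_vec();
    for _ in 0..steps {
        let g = central_diff_grad(&loss, &w, eps);
        for (wi, gi) in w.iter_mut().zip(&g) {
            *wi -= lr * gi;
        }
    }
    let final_loss = loss(&w);
    (w, final_loss)
}
```

The cost is two loss evaluations per weight per step, which is only practical because the adapted parameter set is small.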
+
+**Honest scope caveat (recorded in the module and here):** this minimizes a *self-supervised proxy* (temporal-contrastive + prediction entropy) over a tiny LoRA bottleneck on raw CSI. It is **NOT** wired to the pose model, and **there is no measured end-to-end PCK gain on WiFi pose from this path.** TTA-on-pose is a future, **not-yet-measured** capability — no PCK improvement may be cited from this module.
+
+**Tests:** `contrastive_loss_decreases` and `entropy_loss_decreases` (20/30 real gradient steps do not increase the loss vs 0 steps), `reported_loss_is_the_real_objective_not_a_placeholder` (the returned `final_loss` equals an independent recomputation of the objective at the output weights — i.e. it is the real loss, not a fabricated number).
+
+### 2.4 proof.rs rigor — margin + committed-hash requirement — ACCEPTED & IMPLEMENTED
+
+The deterministic proof self-certified: `generate_expected_hash` blessed whatever the pipeline emitted, PASS counted *any* loss decrease (including 1e-9 float noise), and a *missing* expected hash defaulted to PASS.
+
+**Two hardenings:**
+
+1. **Minimum-decrease margin.** `MIN_LOSS_DECREASE = 1e-4`. A run counts as "learning" only when `initial − final ≥ MIN_LOSS_DECREASE` — well above float noise, far below a real step's decrease. A pipeline that only wanders by noise now **FAILS**.
+2. **No-hash is a SKIP, never a PASS.** `ProofResult::is_pass()` requires `hash_matches == Some(true)` (a *committed* `expected_proof.sha256`). An absent baseline yields SKIP (exit 2). The `verify-training` binary additionally **fails fast** on a sub-margin loss *before* the hash comparison, so a missing baseline can never downgrade a non-learning pipeline to SKIP.
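
Taken together, the two hardenings amount to a three-way verdict. The sketch below is an assumed shape: the `Verdict` enum and free function are illustrative, and treating a hash mismatch as FAIL is an assumption. Only `MIN_LOSS_DECREASE` and the `Option<bool>` reading of `hash_matches` come from the text above.

```rust
pub const MIN_LOSS_DECREASE: f64 = 1e-4;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
    Skip,
}

/// `hash_matches` is `None` when no expected hash has been committed.
pub fn verdict(initial_loss: f64, final_loss: f64, hash_matches: Option<bool>) -> Verdict {
    // The margin check runs first, so a missing baseline can never turn a
    // non-learning run into a mere SKIP. NaN losses also fail this check.
    let decrease = initial_loss - final_loss;
    if !(decrease >= MIN_LOSS_DECREASE) {
        return Verdict::Fail;
    }
    match hash_matches {
        Some(true) => Verdict::Pass,
        Some(false) => Verdict::Fail, // assumption: a mismatch is a failure
        None => Verdict::Skip,        // no baseline: never reported as green
    }
}
```

Ordering the margin check before the hash lookup is what keeps the two rules from interacting: SKIP is reachable only by a run that learned but has nothing to be compared against.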
+
+**What this proves — and what it does NOT (disclosed):** the proof certifies **reproducibility and determinism** (same seed ⇒ same weights ⇒ same hash) and that the optimiser *measurably* reduces a loss. It runs on a deterministic *synthetic* dataset by construction, so it does **not** prove the shipped weights came from real MM-Fi data, nor that any accuracy claim is met. Accuracy is substantiated separately (`benchmarks/wiflow-std/RESULTS.md`). There is currently **no committed `expected_proof.sha256` for the Rust proof**, so it is honestly in the SKIP state until a baseline is committed on a libtorch-enabled host — and SKIP is now reported as SKIP, not green.
+
+**Tests:** `no_committed_hash_is_skip_not_pass`, `submargin_loss_change_fails_even_without_hash`, `committed_matching_hash_with_real_decrease_passes`.
+
+---
+
+## 3. Decision — TIER 2: CORRECTNESS / SECURITY
+
+Each fix ships a test that would have caught the bug (all in the non-tch, workspace-tested surface).
+
+| Finding | File | Fix | Test |
+|---------|------|-----|------|
+| `softmax(axis)` ignored the axis (whole-tensor normalize — breaks densepose per-pixel probs) | `nn/tensor.rs` | softmax along the given axis per lane; out-of-range axis ⇒ `NnError` (no panic) | (tier-2 suite) |
+| `apply_attention` identity/uniform stub (any "with attention" ablation == without) | `nn/translator.rs` | **implemented real single-head scaled-dot-product attention** (`softmax(QKᵀ/√d)V` with Q/K/V/output projections); mis-shaped checkpoint projections rejected so a bad checkpoint can't silently become a no-op | `test_attention_is_not_uniform_stub`, `test_attention_rejects_wrong_weight_shape` |
+| `config.validate()` had no UPPER bounds (config-OOM class still open) | `train/config.rs` | upper bounds on `window_frames`/subcarriers/`backbone_channels`/`heatmap_size`/keypoints/parts/`batch_size`; reject negative `gpu_device_id` | rejection tests; defaults+presets still validate |
+| `subcarrier.rs` panic on non-contiguous input | `train/subcarrier.rs` | graceful path / typed error on strided input | non-contiguous-input test |
+| `ablation.rs` `latency_percentiles` `partial_cmp().unwrap()` NaN panic | `train/ablation.rs` | `total_cmp` / NaN-guarded compare | NaN-input no-panic test |
+| `onnx.rs` unchecked `-1` dim cast | `nn/onnx.rs` | reject negative/zero output dims with `NnError` | guarded-dim test |
+| `ruview_metrics` `compute_single_oks` `s=1.0` fake-Gold + unguarded `[j]<17` | `train/ruview_metrics.rs` | derive scale from GT extent when none supplied; reject `s≤0`; bound the loop to array extents | `oks_rejects_nonpositive_scale`, `oks_does_not_panic_on_short_arrays`, `oks_not_perfect_for_wrong_pose_with_derived_scale` |
+
+`rf_encoder.rs` was inspected and found to contain **no checkpoint-deserialization assert**: its `assert_eq!`s in `LinearHead::new` / `ContrastiveBatcher::new` are documented construction-time API contracts on *programmer-supplied* vector lengths, not adversarial-input panics — the described bug does not exist there. Any genuine checkpoint-load assert lives in the tch-gated `proof.rs`/`trainer.rs` path and is deferred (§8) as unverifiable without libtorch. Test pass counts: nn `--no-default-features` **35 passed**, nn `--features onnx onnx::tests` **3 passed**, train `--no-default-features` lib **176 passed**.
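
To illustrate the first row of the table, a per-lane softmax over a row-major 2-D buffer might look like the sketch below. The function name, the flat-slice layout, and the error enum are assumptions. The real `nn/tensor.rs` works on its own tensor type and returns `NnError`.

```rust
#[derive(Debug, PartialEq)]
pub enum SoftmaxError {
    AxisOutOfRange,
    ShapeMismatch,
}

/// Softmax along `axis` of a row-major `rows x cols` buffer.
/// axis = 1 normalizes each row; axis = 0 normalizes each column.
pub fn softmax_2d(
    data: &[f64],
    rows: usize,
    cols: usize,
    axis: usize,
) -> Result<Vec<f64>, SoftmaxError> {
    if axis > 1 {
        return Err(SoftmaxError::AxisOutOfRange); // typed error, no panic
    }
    if data.len() != rows * cols {
        return Err(SoftmaxError::ShapeMismatch);
    }
    let mut out = vec![0.0; data.len()];
    // A lane is one slice along `axis`: where it starts and how far apart
    // its elements sit in the flat buffer.
    let (n_lanes, lane_len, lane_stride, step) = if axis == 1 {
        (rows, cols, cols, 1)
    } else {
        (cols, rows, 1, cols)
    };
    for lane in 0..n_lanes {
        let idx = |i: usize| lane * lane_stride + i * step;
        // Subtract the lane maximum for numerical stability.
        let max = (0..lane_len)
            .map(|i| data[idx(i)])
            .fold(f64::NEG_INFINITY, f64::max);
        let sum: f64 = (0..lane_len).map(|i| (data[idx(i)] - max).exp()).sum();
        for i in 0..lane_len {
            out[idx(i)] = (data[idx(i)] - max).exp() / sum;
        }
    }
    Ok(out)
}
```

A whole-tensor normalization would make all entries sum to 1 together. Here each lane sums to 1 on its own, which is what per-pixel class probabilities need.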
+
+---
+
+## 4. Decision — TIER 3: MEASURED perf wins (new criterion benches)
+
+All numbers MEASURED on the Windows dev host with the `onnx` feature (`ort 2.0.0-rc.11`, runtime auto-downloaded), committed in `nn/benches/onnx_bench.rs`.
+
+### 4.1 Zero-copy ORT input — LANDED, MEASURED
+
+`onnx.rs` built the ORT input via `arr.iter().cloned().collect::<Vec<f32>>()` — a full element-wise copy. Replaced with a contiguous fast path (`arr.as_slice() ⇒ single memcpy`, iterator fallback only for strided views).
+
+- **Reproduce:** `cargo bench -p wifi-densepose-nn --no-default-features --features onnx --bench onnx_bench -- onnx_input_copy`
+- **Measured** (input `[1,256,64,64]` = 1.05M f32): **1.972 ms → 1.336 ms (~1.48× faster)**, 532 → 785 Melem/s. Strided fallback unchanged (within noise), correctness preserved. End-to-end real-model inference: ~45.9 µs.
+
+### 4.2 ONNX per-inference write-lock — DIAGNOSED, NOT LANDABLE (honest)
+
+`OnnxBackend::run` takes a `parking_lot::RwLock` **write** lock per inference, serializing concurrency. The intended fix was a read-lock. **It is not landable on `ort 2.0.0-rc.11`:** the safe `Session::run` is `&mut self` (verified against the vendored source) — there is no `&self` run path, so a read-lock fails the borrow checker. The underlying C++ `OrtSession::Run` is thread-safe, but exploiting that would require an `unsafe` interior-mutability bypass; we did **not** introduce that soundness risk. The write lock was kept, with a doc comment recording the upgrade path (a future `ort` with `&self` run ⇒ flip to `read()`).
+
+- **Harness landed anyway**, empirically proving the serialization: `cargo bench -p wifi-densepose-nn --no-default-features --features onnx --bench onnx_bench -- onnx_concurrency` → throughput **drops** with more threads (1 thr 19.4 Kelem/s → 2 thr 16.9K → 4 thr 14.0K → 8 thr 14.3K). When `ort` exposes `&self` run, the one-line lock change will show the speedup on this same bench.
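
The borrow-checker constraint behind this decision can be shown with a stand-in. The sketch uses `std::sync::RwLock` instead of `parking_lot`, and `Session` and `Backend` are hypothetical types that only mimic the `&mut self` signature described above. Neither is the `ort` API.

```rust
use std::sync::RwLock;

/// Stand-in for a session whose `run` needs exclusive access.
pub struct Session {
    runs: u64,
}

impl Session {
    pub fn run(&mut self, input: &[f32]) -> f32 {
        self.runs += 1;
        input.iter().sum()
    }
}

pub struct Backend {
    session: RwLock<Session>,
}

impl Backend {
    pub fn new() -> Self {
        Backend { session: RwLock::new(Session { runs: 0 }) }
    }

    pub fn infer(&self, input: &[f32]) -> f32 {
        // `run` takes `&mut self`, so only a write guard can call it.
        // `self.session.read().unwrap().run(input)` does not compile:
        // a read guard only hands out a shared reference.
        let mut guard = self.session.write().unwrap();
        guard.run(input)
    }

    pub fn runs(&self) -> u64 {
        // Read-only access is fine under the shared lock.
        self.session.read().unwrap().runs
    }
}
```

If `run` took `&self`, `infer` could switch to `read()` with no other change, which is the upgrade path recorded in the doc comment.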
+
+The native-conv naive-loop rewrite was **deferred** (§8) as out of scope for a measured milestone.
+
+---
+
+## 5. The NN / training SOTA landscape (graded)
+
+| Candidate | What | Grade | Verdict |
+|-----------|------|-------|---------|
+| **GraphPose-Fi** (arXiv 2511.19105, code github.com/Cirrick/GraphPose-Fi) | Graph/skeleton pose **decoder** for cross-environment WiFi pose; MM-Fi, 17 joints — matches our setup. ADR-150 §2.2 named a graph decoder but never built it. | **CLAIMED** (preprint; cross-env gains author-reported) | **Top beyond-SOTA candidate. Propose as ACCEPTED-future — NOT built here.** Best fit because the decoder is a drop-in on our 17-joint MM-Fi backbone and directly targets the cross-environment brittleness ADR-150/ADR-027 fight. |
+| **ONNX INT4** | Extend our **measured** INT8 ONNX quantization to INT4 for edge. | **THEORETICAL** for our pipeline (INT8 is MEASURED; INT4 untested here) | #2 priority — natural extension of a measured capability. |
+| **CSI-JEPA vs MAE A/B** | Joint-embedding predictive pretraining vs the ADR-152 §2.3 MAE recipe. | **CLAIMED** (JEPA strong elsewhere) — **honest caveat: no JEPA *or* MAE result exists on WiFi POSE yet** (ADR-152 F3: UNSW MAE downstream tasks are classification, not pose). | #3 — run as a measured A/B, do not pre-announce a winner. |
+| **"Mamba-CSI-pose"** | A state-space-model CSI pose backbone. | — | **Does NOT exist. Do not propose it.** No such artifact in the 2025–2026 literature; naming it would be exactly the kind of unfounded claim this sweep exists to prevent. |
+
+---
+
+## 6. Validation
+
+- `cargo test --workspace --no-default-features` — green (the metric unification legitimately changed a handful of test expectations; each was updated with a comment citing the finding, and the trainer/eval/proof now all route through the one canonical metric).
+- `python archive/v1/data/proof/verify.py` — `VERDICT: PASS` (Python pipeline proof, independent of the Rust changes).
+- New criterion benches compile and run under the `onnx` feature.
+
+---
+
+## 7. What changed, file by file
+
+- `metrics.rs` — `canonical_torso_size`, `pck_canonical`, `oks_canonical` (single source of truth); `MetricsAccumulator`/`compute_pck`/`compute_per_joint_pck`/`compute_oks`/`aggregate_metrics` route through them; `compute_pck_v2`/`compute_oks_v2`/`MetricsAccumulatorV2` deprecated → canonical; zero-visible and `s=1.0` bugs fixed; canonical bug-catching tests.
+- `dataset.rs` — `subject_disjoint_split`, `MmFiSplitView`, `assert_split_leak_free`; leak-free split tests.
+- `error.rs` — `DatasetError::InvalidSplit`.
+- `bin/train.rs` — prefer real subject-disjoint split; synthetic path relabelled `run_smoke_test` ("DO NOT REPORT").
+- `proof.rs` + `bin/verify_training.rs` — `MIN_LOSS_DECREASE` margin; no-hash ⇒ SKIP-not-PASS; sub-margin ⇒ FAIL-not-SKIP; new tests.
+- `rapid_adapt.rs` — fake gradient removed; finite-difference gradient of the real objective; honesty docs + tests.
+- `ruview_metrics.rs` — OKS scale derived from GT extent (no `s=1.0`); `s≤0` rejected; OKS loop bounded; tests.
+- `config.rs` / `ablation.rs` / `subcarrier.rs` / `nn/tensor.rs` / `nn/translator.rs` / `nn/onnx.rs` — Tier-2 fixes (§3) + Tier-3 perf (§4).
+- `training_bench.rs`, `sensing-server/training_api.rs` — divergent local PCK kernels annotated "DO NOT USE for reported metrics"; the sensing-server torso-height PCK unification is a **deferred** backlog item (separate service + tch boundary).
+
+---
+
+## 8. Deferred backlog (NOT silently dropped)
+
+The gap review surfaced ~60 findings; this milestone was scoped to the provable integrity-critical subset plus two measured perf wins. The remainder is tracked here for a future ADR-155 milestone:
+
+- **GraphPose-Fi graph decoder** — build the §5 top candidate (ACCEPTED-future, not built).
+- **ONNX INT4** quantization; **CSI-JEPA vs MAE** A/B; the rest of the §5 roadmap.
+- **ONNX read-lock concurrency win** — blocked on an `ort` release exposing `&self` `Session::run` (§4.2); harness already committed.
+- ~~**native-conv naive-loop** perf rewrite (§4).~~ — **RESOLVED in Milestone-2 (see §8.2): bench-first → MEASURED-INCONCLUSIVE, no perf change shipped.**
+- ~~**`rf_encoder.rs` `assert_eq!`-on-checkpoint**~~ — **RESOLVED in Milestone-2 (see §8.2): a pure-Rust fallible `LinearHead::try_new` guard was added.** Any genuine **tch-gated** panic-on-input sites remain deferred — they require a libtorch host to compile/verify (`model.rs` `amp_fc1` unbounded alloc is *indirectly* guarded by the new `config.validate()` upper bounds, but a direct guard + test is deferred).
+- ~~**`sensing-server/training_api.rs` PCK**~~ — **RESOLVED in Milestone-1b (see §8.1, Goal C).** Relabelled (not unified) — and the audit found the *real* live divergence is in `trainer.rs`, not the orphaned `training_api.rs`.
+- ~~**`test_metrics.rs` reference kernels**~~ — **RESOLVED in Milestone-1b (see §8.1, Goal B).** Canonical core hoisted to an un-gated module; the integration test now validates the production functions against hand-computed fixtures + a differential cross-check.
+- **`metrics.rs` `compute_pck_v2`/`compute_oks_v2`/`MetricsAccumulatorV2`/`evaluate_dataset_v2`/`hungarian_assignment_v2`** — confirmed to have **zero external callers** (only `evaluate_dataset_v2`→`MetricsAccumulatorV2` internally). They are already `#[deprecated]` and route through canonical, so they are not a *divergent-definition* risk, only dead weight. Left in place this pass (public API in a tch-gated module; deleting needs a deprecation-cycle + tch host to verify) — flagged here for a future cleanup, NOT deleted silently.
+- **`sensing-server/trainer.rs` `pck_at_threshold` (raw) + `oks_map(area=1.0)` and the `training_bench.rs` raw kernel** — relabelled in Milestone-1b (§8.1); true unification onto `pck_canonical`/`oks_canonical` (needs a torso scale + the train crate as a sensing-server dep) remains deferred.
+- ~~The remaining ~40 lower-severity review findings (style, micro-opt, doc).~~ — **RESOLVED in Milestone-2 (§8.2): the host-verifiable subset is cleared.** The "~40" was an estimate; the actual host-verifiable (non-tch) train/nn surface is smaller. Enumerated resolution below.
+
+### 8.2 Milestone-2 — host-verifiable §8 P3 backlog clearance — RESOLVED
+
+Mirroring the ADR-154 M3 cleanup discipline, M2 closed the **host-verifiable (non-tch) subset** of the §8 backlog in `wifi-densepose-train` (+ the pure-Rust `rf_encoder.rs`/`densepose.rs` in `wifi-densepose-nn` that the §3/§4 items named). Everything behind `#[cfg(feature = "tch-backend")]` (`metrics.rs`, `model.rs`, `losses.rs`, `proof.rs`, `trainer.rs`, `wiflow_std/{layers,model}.rs`) is **out of host-verifiable scope** — it cannot be compiled/verified without libtorch and stays genuinely deferred (not dropped).
+
+**PROOF discipline held:** every de-magicked constant is pinned `== prior literal` by a `*_consts_unchanged_from_literals` test; every boundary test characterizes CURRENT behaviour; no operating-value or behaviour change; the Python proof stays bit-exact at `f8e76f21…46f7a` (the metrics path is off the signal proof path — asserted, not assumed). A smaller-but-true count was reported rather than inventing 40 fixes.
+
+**Enumerated finding → resolution (real counts):**
+
+| # | Finding (location) | Action | Pin/characterization test |
+|---|---|---|---|
+| 1 | `metrics_core.rs` — `0.5` vis / `1e-6` extent / `0.07` OKS-fallback sigma | de-magic → `VISIBILITY_THRESHOLD` / `MIN_REFERENCE_EXTENT` / `OKS_FALLBACK_SIGMA` | `metrics_core_consts_unchanged_from_literals`; `visibility_threshold_boundary_is_inclusive`; `degenerate_extent_below_floor_is_unscoreable` |
+| 2 | `ruview_metrics.rs` — `17` / `0.5` / `0.2` / `1e-3` / `1e-6` | de-magic → `NUM_KEYPOINTS` / `VISIBILITY_THRESHOLD` / `PCK_THRESHOLD` / `MIN_BBOX_DIAG` / `MIN_DURATION_MINUTES` | `ruview_metrics_consts_unchanged_from_literals`; `tracking_zero_duration_does_not_divide_by_zero`; `oks_short_array_is_bounded_at_keypoint_count` |
+| 3 | `subcarrier.rs` — sparse-interp `0.15`/`1e-4`/`0.1`/`1e-8`/`1e-5`/`500` | de-magic → 6 `SPARSE_*` consts | `sparse_interp_consts_unchanged_from_literals`; `compute_interp_weights_single_target_is_index_zero`; `sparse_interp_single_target_is_finite` |
+| 4 | `eval.rs` — `1e-10` division guard (×3) | de-magic → `MIN_POSITIVE_MPJPE` | `eval_min_positive_mpjpe_unchanged_from_literal`; `domain_gap_infinite_when_in_domain_perfect_but_cross_nonzero`; `domain_gap_unity_when_everything_perfect` |
+| 5 | `domain.rs` — `1e-5` LayerNorm eps | de-magic → `LAYER_NORM_EPS` | `layer_norm_eps_unchanged_from_literal` (n=0/zero-var boundary already covered) |
+| 6 | `virtual_aug.rs` — `1e-10` Box-Muller / room-scale guards | de-magic → `BOX_MULLER_U1_FLOOR` / `MIN_ROOM_SCALE` | `virtual_aug_guard_consts_unchanged_from_literals`; `augment_frame_zero_room_scale_passes_amplitude_finite` |
+| 7 | `rf_encoder.rs` — `20.0` softplus overflow threshold | de-magic → `SOFTPLUS_LINEAR_THRESHOLD` | `softplus_threshold_unchanged_from_literal` |
+| 8 | `rf_encoder.rs` — panic-only `LinearHead::new` for untrusted weights (§3) | add pure-Rust fallible `try_new` → typed `RfHeadError` (additive; `new` unchanged) | `try_new_accepts_valid_and_rejects_each_bad_shape` |
+| 9 | `densepose.rs::apply_conv_layer` naive-loop (§4) | **bench-first → MEASURED-INCONCLUSIVE**, no perf change shipped; committed bench + characterization anchor | `native_conv_matches_reference` + `benches/native_conv_bench.rs` |
+| 10 | `rapid_adapt.rs` module-doc "O(ε)" inconsistency | doc-only fix → "O(ε²)" (central differences) | n/a (doc) |
+| 11 | `geometry.rs` `DeepSets::encode` missing `# Panics` | doc-only fix (documents existing `assert!`) | n/a (doc) |
+
+**Tally:** **7 de-magicked (const + pin test)**, **9 new boundary/characterization tests**, **1 added input guard (`try_new`) + test**, **2 doc-only fixes**, **1 perf item bench-first MEASURED-INCONCLUSIVE (not shipped, deferred)**. New tests: train `--no-default-features` **303** (was 288, +15); nn `--no-default-features` lib **38** (was 35, +3).
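
Row 8's fallible constructor follows a common pattern: validate untrusted shapes and return a typed error instead of asserting. The sketch below assumes the struct fields, the argument order, and the `RfHeadError` variants. Only the names `LinearHead`, `try_new`, and `RfHeadError` come from the table.

```rust
#[derive(Debug, PartialEq)]
pub enum RfHeadError {
    ZeroDimension,
    WeightLen { expected: usize, got: usize },
    BiasLen { expected: usize, got: usize },
}

#[derive(Debug)]
pub struct LinearHead {
    pub in_dim: usize,
    pub out_dim: usize,
    pub weights: Vec<f32>, // row-major, out_dim x in_dim (assumed layout)
    pub bias: Vec<f32>,
}

impl LinearHead {
    /// Fallible constructor for weights from an untrusted source, such as a
    /// checkpoint file. It never panics on a bad shape.
    pub fn try_new(
        in_dim: usize,
        out_dim: usize,
        weights: Vec<f32>,
        bias: Vec<f32>,
    ) -> Result<Self, RfHeadError> {
        if in_dim == 0 || out_dim == 0 {
            return Err(RfHeadError::ZeroDimension);
        }
        // checked_mul guards against overflow on absurd dimensions.
        let expected = in_dim.checked_mul(out_dim).ok_or(RfHeadError::ZeroDimension)?;
        if weights.len() != expected {
            return Err(RfHeadError::WeightLen { expected, got: weights.len() });
        }
        if bias.len() != out_dim {
            return Err(RfHeadError::BiasLen { expected: out_dim, got: bias.len() });
        }
        Ok(LinearHead { in_dim, out_dim, weights, bias })
    }
}
```

Keeping the panicking `new` for programmer-supplied vectors and adding `try_new` for loaded data leaves existing callers untouched, which matches the "additive" note in the table.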
+
+**Skipped honestly (flagged-but-not-real):** `ablation.rs` (NaN sort + boundary already fixed/tested in M1 — clean), `signal_features.rs` (consts already named, n=0 boundary already tested), `mae.rs` (no bare guard literals found), `metrics_core` already had thorough zero-visible/hip-normalizer coverage from M1. No churn was manufactured to hit a count.
+
+**Genuinely data-gated / tch-gated — remaining backlog (blocked, not dropped):** GraphPose-Fi graph decoder, ONNX INT4, CSI-JEPA vs MAE A/B (all **data/model-gated** — need a training run + datasets); ONNX read-lock concurrency win (**upstream-gated** on `ort`); the tch-gated panic-on-input sites in `proof.rs`/`trainer.rs`/`model.rs` and the `metrics.rs` `*_v2` dead-code deletion (**tch-gated** — need a libtorch host to compile/verify). **The non-tch-verifiable subset of §8 is now cleared.**
+
+### 8.1 Milestone-1b — metric-definition unification (the §8 metric subset) — RESOLVED
+
+This milestone closed the two metric-integrity items above. The work is pinned by tests, graded MEASURED, and surfaced findings the §1 table missed.
+
+**The complete, honest PCK / OKS audit map (every definition in `v2/`):**
+
+| Definition (file) | Normalization basis | Threshold convention | Status |
+|---|---|---|---|
+| `metrics_core.rs` `pck_canonical` (was `metrics.rs`) | **hip↔hip torso WIDTH** (bbox-diag fallback), `[0,1]` coords | `k·torso` | **CANONICAL** |
+| `metrics_core.rs` `oks_canonical` | `s=sqrt(area)` from GT pose extent | COCO kernel | **CANONICAL** |
+| `metrics.rs` `compute_pck` / `compute_per_joint_pck` / `compute_oks` | — (thin wrappers) | — | route to canonical |
+| `metrics.rs` `aggregate_metrics` / `MetricsAccumulator` | — | — | route to canonical |
+| `metrics.rs` `compute_pck_v2` / `compute_oks_v2` / `MetricsAccumulatorV2` | hip↔hip (folded) | — | **legacy-redundant, deprecated, NO callers** — route to canonical |
+| `tests/test_metrics.rs` local `compute_pck`/`compute_oks` (removed) | raw-threshold reimpl | raw | **was independent reimpl** → now validate canonical + 1 differential kernel |
+| `benches/training_bench.rs` `compute_pck` | raw-threshold | raw | distinct-by-design (bench-only), annotated DO-NOT-REPORT |
+| `sensing-server/training_api.rs` `compute_pck` | **torso-HEIGHT** (nose→hip), **pixel-space** | `ratio·torso_h`, 50px floor | **distinct-by-design** — and **ORPHAN file (not `mod`-declared, does not compile)**; relabelled `compute_pck_torso_height` |
+| `sensing-server/trainer.rs` `pck_at_threshold` | **RAW (no normalization)** | raw `thr` | **distinct, LIVE** (drives `best_pck`); **MISSED by §1 table**; relabelled `pck_raw@0.2` |
+| `sensing-server/trainer.rs` `oks_map`→`oks_single(area=1.0)` | `area=1.0` | COCO kernel | **fake-Gold, LIVE** (drives `best_oks`); **MISSED by §1 table**; relabelled `oks_map(area=1.0 proxy)` |
+
+**Findings the §1 seven-definition table under-counted (honest correction):** the live sensing-server claim surface is `trainer.rs` (in `lib.rs`), **not** the named `training_api.rs` — which is an **orphan file, never `mod`-declared, so it does not compile into the crate**. The live `best_pck` is a **raw, unnormalized** PCK and the live `best_oks` still uses the **`area=1.0` fake-Gold** path ADR-155 §2.1 reported as closed elsewhere. So the true metric landscape is **messier than §1 documented**: ≥3 PCK and ≥1 OKS live in `sensing-server`, two of them on the inflating side, and the file the ADR named for the fix was dead code. This is a finding, not a failure — recorded here rather than hidden.
+
+**Goal B (`test_metrics.rs`) — RESOLVED, MEASURED.** The canonical core (`pck_canonical`/`oks_canonical`/`canonical_torso_size`/sigmas/`bounding_box_diagonal`) was hoisted into a new **un-gated** `metrics_core` module (the full `metrics` module is `tch-backend`-gated, so the canonical definition was previously unreachable from the workspace test gate; `metrics` now re-exports it → still ONE implementation). `tests/test_metrics.rs` now asserts the **production** functions against hand-computed fixtures — `canonical_pck_matches_hand_computed_fixture` (3/4 correct ⇒ 0.75, hand-derived), zero-visible⇒0.0, hip↔hip normalizer pin, OKS perfect⇒1.0, the fake-Gold pin — plus `test_kernel_agrees_with_canonical`, a differential test where an independent raw-threshold reference must AGREE with canonical in the torso=1.0 regime. (10→12 tests.)
+
+**Goal C (`training_api.rs` PCK) — RESOLVED by RELABEL, MEASURED.** Torso-height is **load-bearing** (pixel-space, vertical nose→hip scale, `[17×3]` layout, no `ndarray`/train dep), so unifying would silently change the live numbers' meaning — exactly what to avoid. Resolution: relabel everywhere the metric surfaces so it is never read as canonical, in both the named `training_api.rs` (now `compute_pck_torso_height`, struct/JSON-field docs, `pck_torso_h@0.2` logs) **and** — the real fix — the LIVE `trainer.rs` path (`pck_at_threshold` documented raw-unnormalized; `oks_map` `area=1.0` flagged fake-Gold; `main.rs` prints `pck_raw@0.2` / `oks_map(area=1.0 proxy)`). No wire-format field or `pub`-fn renames (no silent API break). Pinned by `torso_pck_is_labelled_distinctly_from_canonical` (training_api) and `pck_at_threshold_is_raw_unnormalized_not_canonical` (the live kernel). True unification (route the live server through `pck_canonical`/`oks_canonical`) remains a deferred §8 item — it needs a torso scale on the live data and the train crate as a dep.
+
+---
+
+## 9. Consequences
+
+**Positive.** The training/metrics subsystem can now substantiate a clean accuracy claim: one documented metric used everywhere, a leak-free split, an honest TTA path, a proof that fails on noise and refuses to bless an unbaselined run, and two of the most claim-inflating bugs (false-perfect PCK, fake-Gold OKS) closed and pinned by regression tests. The unmeasured/unprovable parts are **disclosed**, not hidden.
+
+**Negative / honest.** The reportable-metric tch-gated code cannot be compiled on the dev host (libtorch absent), so its validation rests on routing through the workspace-tested canonical functions plus review; the Rust deterministic proof is in SKIP until a baseline is committed on a tch host; the ONNX concurrency win is blocked upstream; and ~45 findings are deferred. None of these is presented as done.
+
+**Picture changed by Milestone-1b (§8.1) — corrected, not hidden.** The §1 "seven divergent metrics" count was an **under-count**. The metric-unification audit (Goal A) found the live `wifi-densepose-sensing-server` carries additional, divergent definitions the §1 table omitted: a **raw, unnormalized** `pck_at_threshold` and an **`area=1.0` fake-Gold** `oks_map` in `trainer.rs` — and these, not the orphaned `training_api.rs` the backlog named, are what actually drive the live-reported `best_pck`/`best_oks`. Milestone-1b **relabelled** them (load-bearing math on different data; relabel beats false unification) and pinned the divergence with tests; full unification onto the canonical definition stays deferred. So the canonical *train/nn* metric is unified and test-validated end-to-end, but the *sensing-server* still computes (now clearly-labelled, non-canonical) progress proxies — disclosed here as the honest current state.
@@ -0,0 +1,265 @@
+# ADR-156: RuVector / Cross-Viewpoint Fusion Beyond-SOTA Sweep — Milestone 2 (Correctness Integrity, an Honest GDOP, Crafted-Input Safety, a Measured Hot-Path Win, and the ANN/Fusion SOTA Landscape)
+
+| Field | Value |
+|-------|-------|
+| **Status** | Proposed |
+| **Date** | 2026-06-11 |
+| **Deciders** | ruv |
+| **Codebase target** | `wifi-densepose-ruvector` — `viewpoint/` (`attention.rs`, `geometry.rs`, `fusion.rs`, `coherence.rs`), `mat/` (`triangulation.rs`, `heartbeat.rs`), `sketch.rs`, benches, docs |
+| **Relates to** | ADR-031 (RuView sensing-first RF mode), ADR-016/017 (RuVector integration), ADR-024 (AETHER re-ID), ADR-027 (MERIDIAN cross-env), ADR-084 (RaBitQ similarity sensor), ADR-138 (ClockQualityGate), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-154 (Signal/DSP sweep M0), ADR-155 (NN/Training sweep M1) |
+| **Scope** | Milestone 2 of the beyond-SOTA sweep: four **correctness/integrity/security** fixes on the cross-viewpoint fusion path (each pinned by a regression test that fails on the old code), one **measured** hot-path perf win + a new criterion bench, the ANN/fusion SOTA landscape graded MEASURED/CLAIMED/data-gated, and a prioritized deferred backlog. **Nothing is silently dropped.** |
+
+---
+
+## 0. PROOF discipline (this ADR's contract)
+
+This project has been publicly accused of "AI slop." Milestone 2 answers with **evidence, not adjectives** — the same contract as ADR-154/155:
+
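+- Every behaviour-changing correctness/integrity fix ships a **committed regression test that fails on the old code and passes on the new**. We verified each by reverting the fix and observing the test fail (recorded in §6). The one exception is the §2.1 numeric no-op disclosed below, whose tests pin the documented contract rather than a change in output.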
+- Every perf number is **MEASURED before/after** with the exact reproduce command and a committed criterion bench. A perf claim without a measured before/after is **UNPROVEN** and is not made here.
+- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **DATA-GATED**, distinguishing what a paper *measured* from what it *asserts* from what our own prior measurement (ADR-152) says is **not currently the bottleneck**.
+- We disclose, in full, the **one staged finding that turned out to be a numeric no-op** (§2.1): the geometric-bias "angular wrap bug" is real as a *contract* violation but, because the bias kernel is `cos()` (even and 2π-periodic), it changes **no output value** under the current kernel. We land the fix anyway (it matches the documented contract and reuses the canonical helper) but we **do not claim a behaviour change** — that would be exactly the kind of inflation this sweep exists to prevent.
+
+Test machine for the perf numbers: Windows 11, `cargo bench` (optimized `bench` profile by default), criterion 0.5. Numbers are wall-clock medians on this box; the **ratio** (before/after) is the claim, not the absolute ns.
+
+Build/test gate: `cargo test --workspace --no-default-features` (the project's standard gate — no `crv`/GPU features). All fixes in this milestone are on the **default, non-feature-gated surface**, so they are fully exercised by the standard gate.
+
+---
+
+## 1. Context
+
+The cross-viewpoint fusion stack (`viewpoint/` — ADR-031) combines per-viewpoint AETHER embeddings into one fused embedding via geometric-bias attention, gated by phase coherence, with array-geometry quality scored by a Geometric Diversity Index and a Cramér-Rao bound. The `mat/` survivor-localisation helpers (`triangulation.rs`, `heartbeat.rs`) share the same crate. A beyond-SOTA review surfaced findings spanning a **mislabeled metric**, an **angular-distance contract violation**, **crafted-input panics on a network-reachable path**, and a **redundant clone in the fusion hot path**, plus an ANN/fusion SOTA-research gap. Milestone 2 closes the provable subset and grades the research landscape.
+
+---
+
+## 2. Decision — CORRECTNESS / INTEGRITY FIXES
+
+Each fix ships a regression test (all on the non-feature-gated, workspace-tested surface).
+
+### 2.1 GeometricBias angular separation — use the canonical *wrapped* distance — ACCEPTED & IMPLEMENTED (honest: numeric no-op under the current cos kernel)
+
+**The finding.** `attention::GeometricBias::build_matrix` computed the pairwise angular separation as the **raw** `|azimuth_i − azimuth_j|`. That can exceed π and mis-states the separation across the 0/2π seam (350° and 10° are 20° apart, but raw `|Δ|` = 340°). The module already had a correct wrapped helper, `geometry::angular_distance` (returns `[0, π]`), but it was **private** and `GeometricBias` did not use it.
+
+**The honest correction (disclosed, not hidden).** The bias kernel is `w_angle·cos(theta_ij)`. Because `cos` is **even and 2π-periodic**, `cos(raw) == cos(wrapped)` for every pair (verified numerically: max abs diff `1.1e-16` across seam-crossing test cases). So under the *current* kernel this "bug" produces **identical bias values** — it is a **contract violation, not a behaviour bug**. We say so plainly rather than dressing a no-op as a fix.
+
+**Why land it anyway.** (1) It makes the code satisfy its own documented contract (`theta_ij`: "angular separation in radians", which must be `[0, π]`). (2) It reuses the **single canonical** `angular_distance` helper (now made `pub`), eliminating a divergent angle computation — the same single-source-of-truth discipline ADR-155 applied to metrics. (3) It is **correct by construction** for any future non-even angular kernel (e.g. a linear `w_angle·theta_ij` penalty), which the raw-diff form would silently break.
+
+**Tests:** `geometric_bias_angular_separation_uses_wrapped_distance` (pins that a seam-crossing pair's wrapped distance is 20° while its raw `|Δ|` exceeds π, and that `build_matrix` is symmetric across the seam) and `geometric_bias_linear_angular_kernel_would_catch_raw_diff` (pins the wrapped value ∈ `[0, π]` — the invariant a future linear kernel relies on; the raw-diff form gives 190° where the wrapped form gives 170°).
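+
A minimal sketch of the §2.1 contract, assuming azimuths are `f32` radians. `angular_distance` here is a hypothetical stand-in for the crate's helper (its real signature is not shown in this ADR), and `raw_separation` mirrors the old raw-difference form. It is meant only to illustrate why the two forms agree under `cos` yet differ as angles.

```rust
use std::f32::consts::{PI, TAU};

/// Wrapped angular separation between two azimuths (radians), in [0, PI].
/// Hypothetical stand-in for `geometry::angular_distance`.
pub fn angular_distance(a: f32, b: f32) -> f32 {
    // rem_euclid folds the raw difference into [0, TAU].
    let d = (a - b).rem_euclid(TAU);
    // Take the shorter way around the circle.
    if d > PI {
        TAU - d
    } else {
        d
    }
}

/// Raw |a - b|, the form the old code used; it can exceed PI across the 0/2π seam.
pub fn raw_separation(a: f32, b: f32) -> f32 {
    (a - b).abs()
}
```

For the 350°/10° pair the wrapped form returns 20° and the raw form 340°, yet both have the same cosine up to rounding, which is the no-op described above. A linear kernel would see the two values differ.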
+
+### 2.2 Crafted-input panics on the fusion/localisation path — typed `None` instead of panic — ACCEPTED & IMPLEMENTED (the security item)
+
+**The finding (DoS).** Two functions on a path that can carry **network-sourced multistatic frames** panicked on crafted input:
+
+- `mat::triangulation::solve_triangulation` indexed `ap_positions[0]` (panics on an empty AP table) and `ap_positions[i]` / `ap_positions[j]` (panics when a TDoA measurement references an **out-of-range AP index**). A remote peer supplying a TDoA tuple `(i=99, …)` with only 3 APs triggers an out-of-bounds panic — a remotely-triggerable denial of service.
+- `mat::heartbeat::CompressedHeartbeatSpectrogram::band_power` computed `self.n_freq_bins - 1`, which **underflows** (usize `0 − 1`) for a zero-bin spectrogram: a panic in debug builds, and in release builds a wrap to `usize::MAX` followed by an out-of-range index.
+
+**The fix.** `solve_triangulation` uses `ap_positions.first()?` and `ap_positions.get(i)?` / `.get(j)?` — any empty table or out-of-range index returns `None`, never panics. `band_power` guards `n_freq_bins == 0` up front and **clamps both bounds** into `[0, last]`, returning `0.0` for empty/inverted ranges. No out-of-range index, no subtraction overflow, on any input.
+
+**Tests:** `triangulation_out_of_range_index_returns_none_no_panic`, `triangulation_empty_ap_positions_returns_none_no_panic`, `heartbeat_band_power_zero_bins_no_panic`, `heartbeat_band_power_out_of_range_bounds_no_panic`. Each **panics on the old code** (verified by reverting — §6) and returns a clean `None`/`0.0` on the new.
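+
The two guards in §2.2 can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `Tdoa`, `resolve_aps`, and the free-function `band_power` over a plain slice are hypothetical shapes chosen to show the `first()?` / `get(i)?` pattern and the guard-then-clamp pattern in isolation.

```rust
/// 2-D access-point position (hypothetical representation).
pub type Point = (f64, f64);

/// Illustrative TDoA measurement: two indices into the AP table plus a time delta.
pub struct Tdoa {
    pub i: usize,
    pub j: usize,
    pub dt_ns: f64,
}

/// Resolve the reference AP and the measurement's AP pair without indexing.
/// Returns None for an empty table or any out-of-range index; never panics.
pub fn resolve_aps(ap_positions: &[Point], m: &Tdoa) -> Option<(Point, Point, Point)> {
    let reference = *ap_positions.first()?;
    let a = *ap_positions.get(m.i)?;
    let b = *ap_positions.get(m.j)?;
    Some((reference, a, b))
}

/// Sum power over bins [lo, hi], clamping both bounds into [0, last].
pub fn band_power(bins: &[f32], lo: usize, hi: usize) -> f32 {
    if bins.is_empty() {
        return 0.0; // guard first: `len - 1` would underflow for zero bins
    }
    let last = bins.len() - 1;
    let (lo, hi) = (lo.min(last), hi.min(last));
    if lo > hi {
        return 0.0; // inverted range
    }
    bins[lo..=hi].iter().sum()
}
```

The design point is that every untrusted index goes through `get`, so a crafted value becomes an ordinary `None` at the call site instead of a process-level panic.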
+
+### 2.3 GDOP mislabel — compute a real, dimensionless GDOP — ACCEPTED & IMPLEMENTED
+
+**The finding.** `geometry::CramerRaoBound` exposed a field named `gdop` ("Geometric Dilution of Precision") that was computed as `(crb_x + crb_y).sqrt()` — **identical to `rmse_lower_bound`**. That is the RMSE (metres, noise-dependent), **not** a GDOP. GDOP is a *dimensionless geometry factor* independent of the noise level; the name was a lie about the quantity.
+
+**The fix (honest rename was the fallback; real GDOP was cheap, so we computed it).** True GDOP `= sqrt(trace(G⁻¹))` where `G` is the **unit-variance** bearing-geometry matrix (the Fisher matrix with every `1/σ²` set to 1). It depends only on the array/target geometry and relates noise to position error as `rmse ≈ GDOP·σ`. We accumulate `G` alongside the FIM in both `estimate` and `estimate_regularised` (cheap 2×2), and report `INFINITY` (not NaN/panic) for a degenerate collinear geometry. The doc comment now states exactly what the field is and what it used to (wrongly) be.
+
+**Test:** `gdop_is_dimensionless_and_noise_independent` — scales every sensor's noise by 10× and asserts GDOP is unchanged while RMSE scales ~10×, and that `rmse ≈ GDOP·σ` at both noise levels. The old `gdop = sqrt(crb_x + crb_y)` **fails** this (it scaled with noise, proving it was RMSE) — verified by reverting (§6).
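+
A minimal sketch of the distinction in §2.3, assuming a bearing-only measurement model in which each sensor contributes the outer product of the bearing gradient. `Sensor`, `gdop_and_rmse`, and `sqrt_trace_inverse` are hypothetical names, and the singularity threshold is an arbitrary choice for illustration; the crate's `estimate` may differ in all of these.

```rust
/// Bearing-only sensor: position in metres, bearing noise sigma in radians (assumed > 0).
#[derive(Clone, Copy)]
pub struct Sensor {
    pub x: f64,
    pub y: f64,
    pub sigma: f64,
}

/// sqrt(trace(M^-1)) for a symmetric 2x2 matrix [[a, b], [b, c]];
/// INFINITY when the matrix is numerically singular.
fn sqrt_trace_inverse(a: f64, b: f64, c: f64) -> f64 {
    let det = a * c - b * b;
    let scale = (a + c) * (a + c);
    if scale == 0.0 || det <= 1e-12 * scale {
        return f64::INFINITY;
    }
    // For a 2x2 matrix, trace(M^-1) = (a + c) / det.
    ((a + c) / det).sqrt()
}

/// Returns (gdop, rmse_lower_bound) for a target at `target`.
pub fn gdop_and_rmse(sensors: &[Sensor], target: (f64, f64)) -> (f64, f64) {
    let (mut g, mut f) = ([0.0f64; 3], [0.0f64; 3]);
    for s in sensors {
        let (dx, dy) = (target.0 - s.x, target.1 - s.y);
        let r2 = dx * dx + dy * dy;
        if r2 == 0.0 {
            continue; // a sensor on top of the target carries no bearing information
        }
        // Gradient of the bearing atan2(dy, dx) with respect to the target position.
        let (jx, jy) = (-dy / r2, dx / r2);
        let w = 1.0 / (s.sigma * s.sigma);
        let outer = [jx * jx, jx * jy, jy * jy];
        for k in 0..3 {
            g[k] += outer[k]; // unit-variance geometry matrix G
            f[k] += w * outer[k]; // Fisher information matrix
        }
    }
    (
        sqrt_trace_inverse(g[0], g[1], g[2]),
        sqrt_trace_inverse(f[0], f[1], f[2]),
    )
}
```

Scaling every `sigma` by 10 leaves the first return value untouched and multiplies the second by 10, which is the property the regression test asserts.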
+
+### 2.4 `fuse()` double-clone in the aggregation hot path — eliminate the redundant clone — ACCEPTED & IMPLEMENTED (MEASURED — §4)
+
+**The finding.** `MultistaticArray::fuse` (and `fuse_ungated`) cloned every viewpoint embedding **twice** per fusion: once into the `extracted` tuple vector (`v.embedding.clone()`), then **again** when building the attention input (`extracted.iter().map(|(_, e, _, _)| e.clone())`). At the AETHER dimension (128 f32 = 512 B) over up to 8 viewpoints, that is a wholly redundant second heap allocation + memcpy per viewpoint, every TDM cycle.
+
+**The fix.** Build `extracted` once (the unavoidable clone out of the borrowed `self.viewpoints`), then **consume** `extracted` by value and **move** each embedding into the attention input (`embeddings.push(emb)`), capturing geometry/ids by `Copy` in the same pass. One clone per viewpoint instead of two. Measured win in §4.
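+
The before/after shape of the §2.4 marshalling step can be sketched like this. `Viewpoint`, `Geometry`, and both function names are hypothetical simplifications (the real `fuse` carries more fields); the point is only the ownership pattern, where iterating `extracted` by value lets each `Vec<f32>` be moved rather than cloned a second time.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Geometry {
    pub azimuth: f32,
    pub distance: f32,
}

pub struct Viewpoint {
    pub id: u32,
    pub embedding: Vec<f32>,
    pub geometry: Geometry,
}

type Marshalled = (Vec<Vec<f32>>, Vec<(u32, Geometry)>);

/// Old shape: clone into `extracted`, then clone again for the attention input.
pub fn marshal_double_clone(vps: &[Viewpoint]) -> Marshalled {
    let extracted: Vec<(u32, Vec<f32>, Geometry)> = vps
        .iter()
        .map(|v| (v.id, v.embedding.clone(), v.geometry))
        .collect();
    let embeddings: Vec<Vec<f32>> = extracted.iter().map(|(_, e, _)| e.clone()).collect();
    let meta: Vec<(u32, Geometry)> = extracted.iter().map(|(id, _, g)| (*id, *g)).collect();
    (embeddings, meta)
}

/// New shape: one clone out of the borrowed slice, then move each embedding.
pub fn marshal_single_clone(vps: &[Viewpoint]) -> Marshalled {
    let extracted: Vec<(u32, Vec<f32>, Geometry)> = vps
        .iter()
        .map(|v| (v.id, v.embedding.clone(), v.geometry))
        .collect();
    let mut embeddings = Vec::with_capacity(extracted.len());
    let mut meta = Vec::with_capacity(extracted.len());
    for (id, emb, geom) in extracted {
        embeddings.push(emb); // moved, no second allocation or memcpy
        meta.push((id, geom)); // `Copy` data captured in the same pass
    }
    (embeddings, meta)
}
```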
+
+---
+
+## 3. Security review (touched files)
+
+The §2.2 crafted-input panics **are** the security item: a DoS via out-of-range indices / zero-bin underflow on a fusion/localisation path that may be driven by network-sourced multistatic frames. Beyond those, the touched files were swept for further panic-on-untrusted-input / unbounded-alloc sites:
+
+- `attention.rs` — all indexing is over internally-sized `n × n` / `d` loops bounded by validated input lengths (`DimensionMismatch` is returned for ragged embeddings); softmax denominators are floored with `f32::EPSILON`. No unbounded alloc (sizes derive from caller-supplied vector lengths already validated against `d_in`). **No further action.**
+- `geometry.rs` — `det`/`det_g` are floored before division; degenerate geometry yields `None`/`INFINITY`, never NaN-panic. **No further action.**
+- `fusion.rs` — embedding dimension is validated in `submit_viewpoint`; the event log is bounded (`max_events`, oldest-half drain). **No further action.**
+- `coherence.rs` — circular buffer is fixed-capacity; gate thresholds are clamped. **No further action.**
+
+No `unsafe`, no `unwrap()` on external input, and no unbounded allocation remain on the touched paths after §2.2.
+
+---
+
+## 4. MEASURED perf win (new criterion bench)
+
+A new bench, `crates/wifi-densepose-ruvector/benches/fusion_bench.rs`, covers the fusion hot path. It has two groups: `fusion_pipeline` (end-to-end `MultistaticArray::fuse_ungated()` at 2/4/8 viewpoints, dim 128) and an isolated A/B of the §2.4 marshalling step (`embedding_extract/before_double_clone` vs `after_single_clone`).
+
+- **Reproduce:** `cargo bench -p wifi-densepose-ruvector --bench fusion_bench`
+- **Measured (`embedding_extract`, 8 viewpoints × 128-d), medians:** `before_double_clone` **1.0029 µs** → `after_single_clone` **461.6 ns** — **~2.17× faster** on the marshalling step. The result is what theory predicts (two embedding clones collapse to one), confirming the redundant clone was the cost, not noise.
+- **End-to-end `fusion_pipeline` (medians):** 2 vp = 56.3 µs, 4 vp = 99.5 µs, 8 vp = 202.1 µs. The marshalling (~0.5–1 µs) is **well under 1%** of total fusion cost (dominated by the `n×n` attention), so the **end-to-end** effect is modest by construction; the `embedding_extract` A/B isolates and proves the clone-elimination itself. We report this honestly rather than attributing the full 2.17× to the pipeline.
+
+The double-clone elimination is also correctness-neutral: all 100 `viewpoint`/`mat` lib tests pass unchanged.
+
+---
+
+## 5. The ANN / cross-viewpoint-fusion SOTA landscape (graded)
+
+| # | Candidate | What | Grade | Verdict |
+|---|-----------|------|-------|---------|
+| **1** | **SymphonyQG** (SIGMOD 2025, public code) | Unified quantization + graph ANN; source reports **3.5–17× QPS over HNSW at equal recall**, pure-CPU / edge-portable. | **MEASURED-direction-tested** (was CLAIMED) — **[ADR-261](ADR-261-ruvector-graph-ann-index.md)** built the missing HNSW baseline + a SymphonyQG-style 1-bit quantized-traversal variant and **measured** the ratio on our hardware. | **DONE — direction REFUTED at our scale (honest negative).** ADR-261 built the real HNSW baseline (**~25× QPS over linear scan at recall ≥0.99**, the substrate this row wanted) and a quantized variant. At N=10k the 1-bit Hamming traversal is **too coarse** — its best recall is 0.738, never reaching the ≥0.90 equal-recall point, so **no QPS win over float HNSW** (the SymphonyQG 3.5–17× is *not* reproduced by our 1-bit construction here). Caveat: **our HNSW + our 1-bit quant, not SymphonyQG's system**; expected crossover at large N + a multi-bit code. We did **not** tune to manufacture a speedup. |
+| **2** | **Multi-bit / Extended RaBitQ + unbiased estimator** | Extends our existing **1-bit** `sketch.rs` (ADR-084): Pass-2 rotation, multi-bit Pass-3, and the **real RaBitQ unbiased distance estimator** (Gao & Long SIGMOD 2024) reranking the candidate set from the 1-bit code + 8 B/vec side info (§11). | **MEASURED-on-our-hardware** (was CLAIMED) — rotation (§10), multi-bit (§10), and the estimator (§11) all implemented + benchmarked. Rotation lifts strict-K 36%→46%; multi-bit (≤4-bit) reaches 74% strict; **the estimator reaches 49.71% strict (cosine rerank), still short of 90%.** All clear 90% only with over-fetch (estimator improves the factor: 95% at candidate_k=24 vs sign 91.6%). | **DONE — RESOLVED-PARTIAL / NEGATIVE.** Rotation (§10) + estimator (§11) built and MEASURED. The honest negative (no strict-bar 90% from rotation, ≤4-bit, **or the unbiased estimator**) is recorded, not hidden. Over-fetch + Pass-2 is the path that meets the bar (ADR-084's "candidate set" pattern); the estimator lowers the over-fetch factor needed. |
+| **3** | **GraphPose-Fi-style learned antenna-attention + ChebGConv fusion head** | Would replace the current **untrained identity-projection + mean-pool** "attention" (the `CrossViewpointAttention` default is `ProjectionWeights::identity` — not a *learned* attention) with a learned graph fusion head. | **DATA-GATED** (per ADR-152 measurement (b): architecture is **NOT** the current bottleneck — **data is**) | **ACCEPTED-future, data-gated. Do NOT build now.** ADR-152's measured lesson was that swapping architecture without more/better paired data does not move PCK. Building a learned fusion head before the data exists would repeat the mistake ADR-155 §5 also flagged for GraphPose-Fi. |
+| — | **Cramér-Rao / sensor-placement** (`geometry.rs` CRB) | Investigated for a 2026 advance beating the textbook Fisher-information CRB already implemented. | **Investigated — NO ACTION** | **Cleared honestly.** No 2026 method beats the closed-form Fisher-information CRB for this 2-D bearing problem; our implementation is already correct SOTA. (Recording a negative result is a deliberate anti-slop signal.) The only CRB change this milestone is the §2.3 *GDOP* honesty fix, which is a labelling/quantity correction, not an algorithmic one. |
+
+---
+
+## 6. Validation
+
+- **Bug-catching tests verified to bite.** Each behaviour-changing fix (§2.2, §2.3) was reverted and the corresponding test observed to **fail on the old code**, then restored:
+  - `triangulation_out_of_range_index_returns_none_no_panic` / `triangulation_empty_ap_positions_returns_none_no_panic` — **panic** (index out of bounds) on old code.
+  - `heartbeat_band_power_zero_bins_no_panic` — **panic** ("attempt to subtract with overflow") on old code.
+  - `gdop_is_dimensionless_and_noise_independent` — **assertion failure** (GDOP scaled with noise) on old code.
+  - §2.1 (angular wrap) is the **disclosed no-op**: its tests pin the *contract* (wrapped value ∈ `[0, π]`), since the cos kernel makes the bias value numerically identical with or without the fix. We do not claim a behaviour change.
+- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **100 passed / 0 failed** (was 93; +7 new tests).
+- **`cd v2 && cargo test --workspace --no-default-features`** — **3050 passed / 0 failed** (full-workspace aggregate across all crates and test binaries; the +7 new `wifi-densepose-ruvector` tests are included and green).
+- **`python archive/v1/data/proof/verify.py`** — **`VERDICT: PASS`** (the Python pipeline proof is independent of these Rust changes — confirmed unaffected).
+- New `fusion_bench` compiles and runs under the default feature set.
+
+---
+
+## 7. What changed, file by file
+
+- `viewpoint/geometry.rs` — `angular_distance` made `pub` (single canonical wrapped-angle helper); real dimensionless GDOP (`sqrt(trace(G⁻¹))`) in `estimate`/`estimate_regularised` (was RMSE mislabelled); `gdop` doc states the quantity and the prior bug; `gdop_is_dimensionless_and_noise_independent` test.
+- `viewpoint/attention.rs` — `GeometricBias::build_matrix` uses the canonical wrapped `angular_distance` (contract fix; numeric no-op under cos — disclosed); two contract-pinning tests.
+- `viewpoint/fusion.rs` — `fuse`/`fuse_ungated` move embeddings out of `extracted` (single clone, not double); existing tests unchanged and green.
+- `mat/triangulation.rs` — `first()?` / `get(i)?` / `get(j)?` guards (no panic on empty table / crafted indices); two no-panic tests.
+- `mat/heartbeat.rs` — `band_power` zero-bin guard + bounds clamp (no underflow / out-of-range index); two no-panic tests.
+- `benches/fusion_bench.rs` (new) + `Cargo.toml` `[[bench]]` — fusion hot-path bench + the double-clone A/B.
+
+---
+
+## 8. Deferred backlog (NOT silently dropped)
+
+The review surfaced more than this milestone scoped. Tracked here for a future ADR-156 milestone:
+
+- **SymphonyQG reproduction** (§5 #1) — **RESOLVED-DIRECTION-TESTED** (see [ADR-261](ADR-261-ruvector-graph-ann-index.md)). The missing HNSW baseline + a SymphonyQG-style 1-bit quantized-traversal variant were built and **MEASURED**: float HNSW is ~25× over linear scan at recall ≥0.99 (the baseline this gap needed), but our 1-bit quantized traversal is **too coarse to beat float HNSW at equal recall at N=10k** (best recall 0.738) — the 3.5–17× is **not reproduced** by our construction. Honest negative recorded; expected crossover is large N + a multi-bit traversal code. (Caveat: our HNSW + our 1-bit quant, not SymphonyQG's exact system.)
+- **Multi-bit / Extended RaBitQ** (§5 #2) — **RESOLVED-PARTIAL** (see §10). Pass-2 randomized rotation (FHT + seeded ±1 sign flips, `src/rotation.rs`) and a multi-bit Pass-3 experiment landed and were MEASURED against the ADR-084 ≥90% bar. **Honest result: rotation helps (+10pp at the strict bar) and Pass-2 reaches 90% with ~3× over-fetch, but NEITHER rotation nor multi-bit (up to 4-bit) clears the strict candidate_k==K 90% bar on the tested anisotropic distribution.** The original `1-bit sign quantization ships first; rotation/more-bits later if benchmark-measured top-K coverage drops below 90%` deferral is therefore retired: the rotation is built, the bar is characterised, and the residual gap is documented rather than deferred.
+- **Learned cross-viewpoint fusion head** (§5 #3, GraphPose-Fi-style) — **data-gated**: blocked on the paired multi-room data ADR-152 measurement (b) identified as the real bottleneck; do not build the architecture first.
+- **`CrossViewpointAttention` learned projections** — the default `ProjectionWeights::identity` + mean-pool is honest but unlearned; wiring real learned Q/K/V projections is part of the data-gated item above (no learned weights ⇒ the "attention" is currently a geometric-bias-weighted average, which the code/docs should keep stating plainly).
+- **`coherence.rs` / `fusion.rs` micro-opts and the remaining lower-severity review findings** (style, doc, further hot-path tuning) from the fusion gap review.
+
+---
+
+## 9. Consequences
+
+**Positive.** The fusion path now: uses one canonical wrapped angular-distance helper; reports a **real** dimensionless GDOP instead of a mislabeled RMSE; cannot be panicked by crafted multistatic indices or a zero-bin spectrogram (DoS closed); and does one embedding clone per viewpoint instead of two (measured). Every behaviour-changing fix (§2.2, §2.3) is pinned by a test that fails on the old code, the §2.1 contract fix is pinned by contract tests, and the §2.4 clone removal is covered by the unchanged existing suite plus the measured A/B bench. The ANN/fusion SOTA landscape is graded so the near-term (multi-bit RaBitQ) and the data-gated (learned fusion) are not confused.
+
+**Negative / honest.** The headline angular-wrap fix is a **numeric no-op** under the current cos kernel: we land it for contract/maintainability, not because it changes an output, and we say so. Neither of the two strongest external candidates is built in this milestone: SymphonyQG was CLAIMED-pending-reproduction here and was later direction-tested in ADR-261 (§5 #1, an honest negative at our scale), and learned fusion is data-gated by a prior measurement. The perf win is a **local hot-path** improvement, modest in the end-to-end pipeline (attention dominates). None of these is presented as more than it is.
+
+---
+
+## 10. RaBitQ Pass-2 / multi-bit — IMPLEMENTED & MEASURED (§8 backlog item #2)
+
+Milestone-1 of the §8 backlog. Status: **RESOLVED-PARTIAL** — built, measured, honest negative on the strict bar.
+
+### 10.1 What landed
+
+- **`crates/wifi-densepose-ruvector/src/rotation.rs`** (new) — `Rotation`, a deterministic randomized orthogonal rotation `R = H·D`: a **Fast Hadamard Transform** (`O(d log d)`, in-place butterfly, `1/√m` normalized so it is norm-preserving) composed with a diagonal of **seeded ±1 sign flips** (SplitMix64 from a stored `u64` seed). Chosen over a dense `d×d` matrix because that is `O(d²)` memory/time and infeasible at the 65,535-d the wire format provisions for; FHT is the standard fast-orthogonal (randomized-Hadamard / fast-JL) construction. Non-power-of-two `d` zero-pads to `next_pow2(d)` and reads back the first `d` coords.
+- **`sketch.rs`** — additive Pass-2 API: `Sketch::from_embedding_rotated`, `SketchBank::with_rotation` + `insert_embedding` / `topk_embedding` / `novelty_embedding`. **Pass 1 (`from_embedding`) is byte-for-byte unchanged**; a Pass-2 sketch has identical `embedding_dim` / packed-byte length / wire shape, so `WireSketch` and existing callers (`event_log.rs`, `signal/longitudinal.rs`) are untouched. Default behaviour preserved.
+- **`coverage.rs`** (new) — single-source-of-truth top-K coverage harness on a deterministic **anisotropic planted-cluster** fixture (cosine ground truth, the metric a sign sketch approximates). Backs both the `pass2_coverage_report` unit test and the `sketch_bench` coverage table.
+- **Multi-bit Pass-3 experiment** — `coverage::measure_multibit`: rotate, then `b`-bit uniform scalar-quantize each coord, rank by L1 over codes. Measures the bit/coverage tradeoff.
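+
The `R = H·D` construction from the first bullet above can be sketched as below. This is a hedged illustration, not the contents of `rotation.rs`: the function names, the use of the low bit of each SplitMix64 output as the sign, and the order in which signs are drawn are all assumptions made for this example.

```rust
/// SplitMix64 step: advances `state` and returns the next 64-bit output.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// In-place, unnormalized Fast Hadamard Transform. `v.len()` must be a power of two.
fn fht(v: &mut [f32]) {
    let n = v.len();
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (v[i], v[i + h]);
                v[i] = a + b; // butterfly: sum
                v[i + h] = a - b; // butterfly: difference
            }
        }
        h *= 2;
    }
}

/// Apply seeded sign flips, zero-pad to a power of two, run the FHT,
/// scale by 1/sqrt(m), and keep the first `d` coordinates.
pub fn rotate(x: &[f32], seed: u64) -> Vec<f32> {
    let d = x.len();
    let m = d.next_power_of_two();
    let mut state = seed;
    let mut buf = vec![0.0f32; m];
    for (slot, &xi) in buf.iter_mut().zip(x) {
        // Assumed convention: low bit of the PRNG output picks the sign.
        let sign = if splitmix64(&mut state) & 1 == 0 { 1.0 } else { -1.0 };
        *slot = sign * xi;
    }
    fht(&mut buf);
    let scale = 1.0 / (m as f32).sqrt();
    for v in buf.iter_mut() {
        *v *= scale;
    }
    buf.truncate(d);
    buf
}
```

When `d` is already a power of two nothing is truncated, so the output norm equals the input norm up to rounding. With zero-padding, the kept `d` coordinates are a projection of the rotated vector, so exact norm preservation holds only over the full padded length.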
+
+### 10.2 Pre-existing bug found and fixed (disclosed)
+
+Building the coverage harness surfaced a **pre-existing correctness bug in `SketchBank::topk`** (shipped in ADR-084): the `n > k` heap path used `BinaryHeap<Reverse<(dist,id)>>` (a *min*-heap) but its comment/logic treated the peek as the max, so it evicted the *nearest* and returned the **k farthest** sketches as "nearest." The shipped unit tests only exercised the `n ≤ k` fast path (≤ 3 entries), so it was never caught. Fixed to a plain max-heap. Pinned by **`topk_heap_path_returns_nearest`** (fails on the old heap when entries are inserted farthest-first) and **`tight_clusters_give_high_coverage_with_overfetch`** (measured **0.072** coverage on the old code — random — vs **>0.99** fixed). This is a real, measured behaviour fix, not a no-op.
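+
A minimal sketch of the corrected heap path, assuming entries are `(distance, id)` pairs of `u32`. `topk_nearest` is a hypothetical free function, not the `SketchBank::topk` signature. It keeps a plain max-heap of the best `k` seen so far, so `peek()` really is the current worst and is the one evicted.

```rust
use std::collections::BinaryHeap;

/// Return the `k` entries with the smallest distance, sorted nearest-first.
/// `entries` are (distance, id) pairs, e.g. a Hamming distance and a sketch id.
pub fn topk_nearest(entries: &[(u32, u32)], k: usize) -> Vec<(u32, u32)> {
    if k == 0 {
        return Vec::new();
    }
    // Plain max-heap: peek() is the farthest of the current best k.
    let mut heap: BinaryHeap<(u32, u32)> = BinaryHeap::with_capacity(k);
    for &e in entries {
        if heap.len() < k {
            heap.push(e);
        } else if let Some(&worst) = heap.peek() {
            if e < worst {
                heap.pop(); // evict the farthest, never the nearest
                heap.push(e);
            }
        }
    }
    heap.into_sorted_vec() // ascending order: nearest first
}
```

Feeding the entries farthest-first, as the regression test does, is what exposes the inverted variant: a min-heap with the same eviction logic would discard each new nearest candidate.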
+
+### 10.3 MEASURED top-K coverage
+
+Test machine: Windows 11, `cargo bench` (optimized `bench` profile by default) / `cargo test`. Fixture: **dim=128, N=2048, K=8, 64 planted clusters, intra-cluster noise=0.35, 128 queries, master_seed=0xAD000084, rotation_seed=0x5EEDC0DE12345678**, ground-truth metric = cosine. Reproduce: `cargo test -p wifi-densepose-ruvector --no-default-features pass2_coverage_report -- --nocapture` or `cargo bench -p wifi-densepose-ruvector --bench sketch_bench -- pass2_coverage`.
+
+**Coverage vs over-fetch (`coverage = |sketch_topK ∩ float_cosine_topK| / K`):**
+
+| candidate_k | Pass-1 (1-bit, no rot) | Pass-2 (1-bit, rot) | vs 90% bar |
+|---|---|---|---|
+| **8 (= K, strict bar)** | **36.13%** | **46.39%** | both **BELOW** |
+| 16 | 62.79% | 75.59% | below |
+| 24 | 83.89% | **91.60%** | **Pass-2 clears** |
+| 32 | 100.00% | 100.00% | clears |
+| 64 | 100.00% | 100.00% | clears |
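+
The coverage ratio used in the table above can be sketched as follows. This assumes over-fetch is scored as the fraction of the exact float-cosine top-K that appears anywhere in the (possibly longer) sketch candidate list. The function name and `u32` ids are hypothetical, and the harness in `coverage.rs` may average or tie-break differently.

```rust
use std::collections::HashSet;

/// coverage = |sketch_candidates ∩ truth_topk| / K, with K = truth_topk.len().
/// `sketch_candidates` may hold more than K ids when over-fetching.
pub fn coverage(sketch_candidates: &[u32], truth_topk: &[u32]) -> f64 {
    if truth_topk.is_empty() {
        return 0.0;
    }
    let candidates: HashSet<u32> = sketch_candidates.iter().copied().collect();
    let hits = truth_topk
        .iter()
        .filter(|id| candidates.contains(*id))
        .count();
    hits as f64 / truth_topk.len() as f64
}
```

Under this definition a larger `candidate_k` can only keep or raise coverage for a fixed query, which matches the monotone columns in the table.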
+
+**Multi-bit Pass-3 at the strict bar (candidate_k = K = 8):**
+
+| Variant | Coverage | Memory |
+|---|---|---|
+| Pass-1 (1-bit, no rot) | 36.13% | 16 B/vec |
+| Pass-2 (1-bit, rot) | 46.39% | 16 B/vec |
+| Pass-3 (rot, 2-bit) | 54.39% | 32 B/vec |
+| Pass-3 (rot, 3-bit) | 66.70% | 48 B/vec |
+| Pass-3 (rot, 4-bit) | 74.22% | 64 B/vec |
+
+### 10.4 Honest verdict
+
+- **Rotation consistently helps** — +10.3 pp at the strict bar (36.13%→46.39%) and a uniform lift at every over-fetch level. The FHT construction is verified norm-preserving and deterministic.
+- **Neither rotation nor multi-bit (≤4-bit) clears the strict candidate_k==K 90% bar** on this anisotropic distribution. 1-bit sign quantization simply cannot resolve 8-of-2048 from sign bits alone; even 4× memory (4-bit) reaches only 74%.
+- **Pass-2 reaches the 90% bar at candidate_k=24 (~3× over-fetch)**: fetch ≥24 sketch candidates, then refine to K with full-float scoring. This is exactly the "candidate set, then full refinement" deployment pattern ADR-084 specifies, so the bar is met *in the deployment the sensor is designed for*, just not at the strict `candidate_k == K` setting.
+- **This is a measured, partial win, reported as such.** No benchmark was tuned to manufacture a pass. The strict-bar gap (and the multi-bit tradeoff that doesn't close it) is documented rather than spun.
+
+### 10.5 Deferred sub-items (graded, not dropped)
+
+- **Strict-bar 90% from a richer code** — neither rotation nor uniform multi-bit closes it here. A learned/asymmetric quantizer or the full RaBitQ residual-distance estimator (not just a uniform scalar code) might. **RESOLVED-NEGATIVE (§11): the estimator is now built and MEASURED — it lifts strict-K 46.39%→49.71% but does NOT clear the 90% strict bar.** The residual strict-bar gap is a published negative, not a deferral.
+- **Distribution sensitivity** — the result is for one synthetic anisotropic distribution; on real AETHER traces the strict-bar number may differ. Re-measuring on recorded embeddings is deferred to the ADR-084 post-merge soak.
+- **Promoting a `MultiBitSketch` type** — the multi-bit code lives in the measurement harness, not as a shipped sketch type. Building the production type is gated on a use site actually needing strict-K (vs over-fetch), which the measurement says is not required today.
+
+---
+
+## 11. RaBitQ unbiased distance estimator — IMPLEMENTED & MEASURED (Milestone-2, §8 backlog item #2 / §10.5 strict-bar item)
+
+Milestone-2 of the §8 backlog. Status: **RESOLVED-NEGATIVE** — the estimator is built, measured, and lifts strict-K coverage, but the honest result is that it does **not** clear the ADR-084 ≥90% strict-K bar on this distribution. The negative is reported as such, exactly like the Pass-2 rotation result.
+
+### 11.1 What landed
+
+- **`crates/wifi-densepose-ruvector/src/estimator.rs`** (new) — the real Gao & Long (SIGMOD 2024) contribution: an **unbiased estimator of the inner product / squared distance** recovered from the 1-bit code plus per-vector side info, on top of the Pass-2 rotation. Pass-1/Pass-2 ranked candidates by raw Hamming over sign bits — a coarse proxy. This module reranks by the unbiased estimate.
+  - `EstimatorSketch`: the Pass-2 sign code, taken over the **padded** FHT length `D = next_pow2(dim)` (the frame in which `x̄` has unit norm), **plus** the side info.
+  - `SideInfo` = `{ residual_norm: f32, x_dot_o: f32 }` = **8 bytes/vector** (2× f32).
+  - `EstimatorQuery` — query rotated once, reused across all candidates.
+  - `DistanceEstimator` — `estimate_inner_product`, `estimate_sq_distance`, `ranking_key` (euclidean), `cosine_ranking_key` (the correct key vs a cosine ground truth — needs only the code + `x_dot_o`).
+  - `EstimatorBank` — `topk_estimated` (euclidean) / `topk_estimated_cosine`; optional `with_centroid` (the paper's centroid path).
+- **`coverage.rs`** — `measure_estimator` (cosine rerank) + `measure_estimator_euclidean`, on the **bit-identical** fixture / cluster centres / query stream / cosine ground truth as `measure_pass1`/`measure_pass2`. Single source of truth for the §11.3 table; backs both `estimator_coverage_report` and the `sketch_bench` coverage table.
+- **Additive + backward-compatible.** New types only; Pass-1 `Sketch` / Pass-2 `SketchBank` / `WireSketch` wire format are untouched. All external callers (`event_log.rs`, `signal/longitudinal.rs`, `sensing-server`) use Pass-1 `from_embedding` and are unaffected.
+
+### 11.2 The estimator formula (and the zero-centroid simplification, stated honestly)
+
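+Let `P` be the Pass-2 orthogonal rotation (the `R = H·D` of §10.1, where that `D` is the diagonal sign-flip matrix). From here on, `D` denotes the padded dimension `next_pow2(dim)`, and primes denote rotated vectors (`q' = P·q_r`, with `q_r = q_raw − c`). For data `o_raw`, query `q_raw`, centroid `c`: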
+
+1. **Centroid — SIMPLIFIED to zero/global `c = 0`.** The paper centres on a per-cluster centroid (`o_r = o_raw − c`); we use `c = 0` (`o_r = o_raw`), because the current sketch path has no IVF/k-means cluster structure. This costs accuracy when the data is far off-origin. **We document it, do not hide it,** and built the paper-faithful centroid path (`from_embedding_centred` / `EstimatorBank::with_centroid`) so the simplification is a measured choice, not an assumption. (We do **not** report a centroid coverage number against the *cosine* ground truth: centroid-subtraction changes the metric — cosine-of-residual ≠ cosine-of-raw — so a centroid number vs raw-cosine truth would be a metric mismatch, itself dishonest. Zero-centroid is the correct match for this raw-cosine harness.)
+2. **Unit residual + 1-bit code.** `o = o_r/‖o_r‖`, `o' = P·o`, code `x̄_i = sign(o'_i)·(1/√D)` — a unit vector at the nearest hypercube corner.
+3. **Side info:** `residual_norm = ‖o_r‖` and `x_dot_o = ⟨x̄, o'⟩ ∈ (0,1]` (the paper's `⟨x̄, o⟩`).
+4. **Unbiased estimator** (paper Eq.): `⟨o', q'⟩ ≈ ⟨x̄, q'⟩ / ⟨x̄, o'⟩ = ⟨x̄, q'⟩ / x_dot_o`. The random rotation makes the code's quantization error orthogonal **in expectation** to `q'`, so the rescale is unbiased (paper's `O(1/√D)` bound). Per candidate: one length-`D` signed sum (`x̄ ∈ {±1/√D}`), as cheap as Hamming + a multiply.
+5. **Distance / cosine.** `⟨o_r,q_r⟩ = ‖o_r‖·(⟨x̄,q'⟩/x_dot_o)`; `‖q_r−o_r‖² = ‖q_r‖²+‖o_r‖²−2⟨o_r,q_r⟩`. For a **cosine** ground truth (AETHER / this harness), rank by `−⟨o,q_r⟩ = −(⟨x̄,q'⟩/x_dot_o)` (needs only the code + `x_dot_o`).
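+
+Steps 2–5 above can be sketched as follows. This is a minimal illustration, assuming the rotation `P` has already been applied (the inputs are `o'` and `q'`); the names `encode_one_bit` and `estimate_unit_ip` are hypothetical and are not the crate's API:
+
+```rust
+/// Encode a rotated data vector `o'`: one sign bit per coordinate plus the
+/// side-info scalar x_dot_o = <x̄, o'>, where x̄_i = sign(o'_i) / sqrt(D).
+/// Returns None for a zero or non-finite input (no direction to encode).
+fn encode_one_bit(o_rot: &[f32]) -> Option<(Vec<bool>, f32)> {
+    let norm = o_rot.iter().map(|v| v * v).sum::<f32>().sqrt();
+    if norm == 0.0 || !norm.is_finite() {
+        return None;
+    }
+    let inv_sqrt_d = 1.0 / (o_rot.len() as f32).sqrt();
+    let signs: Vec<bool> = o_rot.iter().map(|v| *v >= 0.0).collect();
+    // For the unit vector o'/||o'||, <x̄, o'> = sum(|o'_i|) / (||o'|| * sqrt(D)).
+    let x_dot_o = o_rot.iter().map(|v| v.abs()).sum::<f32>() / norm * inv_sqrt_d;
+    Some((signs, x_dot_o))
+}
+
+/// Estimate <o, q'> for the unit data vector o: (<x̄, q'>) / x_dot_o.
+/// One length-D signed sum per candidate, then a single divide.
+fn estimate_unit_ip(signs: &[bool], x_dot_o: f32, q_rot: &[f32]) -> f32 {
+    let inv_sqrt_d = 1.0 / (signs.len() as f32).sqrt();
+    let signed_sum: f32 = signs
+        .iter()
+        .zip(q_rot)
+        .map(|(s, q)| if *s { *q } else { -*q })
+        .sum();
+    (signed_sum * inv_sqrt_d) / x_dot_o
+}
+```
+
+For a cosine ground truth the ranking key would be the negated estimate; multiplying by `residual_norm` recovers the un-normalized inner product of step 5.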
+
+**Unbiasedness is pinned** (`estimator_unbiased_on_fixture`): averaging the estimate of `⟨o_r,q_r⟩` over 4000 random rotation seeds converges to the true inner product within ~6% of the `‖o‖‖q‖` envelope — a biased estimator (or sign-only proxy) would be systematically off.
+
+### 11.3 MEASURED strict-K coverage
+
+Same fixture/seeds as §10 (dim=128, N=2048, K=8, 64 clusters, noise=0.35, 128 queries, `master_seed=0xAD000084`, `rotation_seed=0x5EEDC0DE12345678`), cosine ground truth. Reproduce: `cargo test -p wifi-densepose-ruvector --no-default-features estimator_coverage_report -- --nocapture` or `cargo bench -p wifi-densepose-ruvector --bench sketch_bench -- pass2_coverage`.
+
+| candidate_k | Pass-1 (sign) | Pass-2 (sign) | **Pass-2 + estimator (cosine)** | Pass-2 + estimator (euclid) | vs 90% bar |
+|---|---|---|---|---|---|
+| **8 (= K, strict bar)** | 36.13% | 46.39% | **49.71%** | 49.02% | **all BELOW** |
+| 16 | 62.79% | 75.59% | 79.20% | 77.93% | below |
+| 24 | 83.89% | 91.60% | **95.12%** | 93.65% | estimator clears |
+| 32 | 100.00% | 100.00% | 100.00% | 100.00% | clears |
+| 64 | 100.00% | 100.00% | 100.00% | 100.00% | clears |
+
+Side-info memory overhead: **8 bytes/vector** (2× f32) on top of the 16 B/vec 1-bit sketch.
+
+### 11.4 Honest verdict
+
+- **The estimator helps, and the cosine key beats the euclidean key** (49.71% vs 49.02% at strict-K; cosine is the apples-to-apples match for the cosine ground truth — both it and sign-Hamming are angular). The unbiased rescale is a real, consistent lift at every candidate_k below saturation (8: 46.39%→49.71%; 16: 75.59%→79.20%; 24: 91.60%→95.12%); at 32 and 64 every method already reaches 100%, so there is nothing left to gain.
+- **It does NOT clear the strict candidate_k==K 90% bar.** Strict-K goes 36.13% (Pass-1) → 46.39% (Pass-2-sign) → **49.71% (Pass-2 + estimator)** — a **+3.3 pp** improvement over sign-only, **still ~40 pp short of 90%**. This is a **published negative**, the same class of honest result as the Pass-2 rotation (§10).
+- **Why the strict-K gain is modest:** the binding constraint at strict K is the **1-bit code's information ceiling** (resolving 8-of-2048 from a single sign bit per coordinate), not the *estimator's variance* — the estimator sharpens the ranking but cannot add information the 1-bit code never captured. The estimator's larger wins are at over-fetch, where there is room to re-rank a wider candidate pool.
+- **The bar is still met the way ADR-084 deploys the sensor:** at candidate_k=24 (~3× over-fetch) the estimator reaches **95.12%** (vs Pass-2-sign 91.60%) — the "candidate set, then full refinement" pattern. The estimator **improves the over-fetch factor needed** but does not eliminate it.
+- **No benchmark was tuned to manufacture a pass.** The strict-bar gap is documented, not spun.
+
+### 11.5 Pinning tests
+
+- `estimator::estimator_is_deterministic` — fixed seed ⇒ identical estimate + identical bank top-K.
+- `estimator::estimator_unbiased_on_fixture` — Monte-Carlo mean over 4000 seeds converges to the true inner product within tolerance (the unbiasedness claim).
+- `coverage::estimator_rerank_not_worse_than_sign` — estimator-reranked coverage ≥ sign-only Pass-2 on a fixed fixture (must not regress).
+- Plus: `estimator_self_distance_is_small`, `x_dot_o_in_unit_range`, `zero_input_does_not_panic`, `bank_self_query_ranks_self_first`, `centroid_path_self_query_ranks_self_first`, `centroid_zero_matches_default`, `estimator_coverage_is_deterministic`.
@@ -0,0 +1,193 @@
+# ADR-157: Hardware / Sensing-Acquisition Layer Beyond-SOTA Sweep — Milestone 3 (An Already-Hardened Layer, Three Small Real Fixes, an Honestly-Null Perf Win, and a Mostly-NO-ACTION SOTA Landscape)
+
+| Field | Value |
+|-------|-------|
+| **Status** | Proposed |
+| **Date** | 2026-06-11 |
+| **Deciders** | ruv |
+| **Codebase target** | `wifi-densepose-vitals` (`heartrate.rs`, `breathing.rs`, `anomaly.rs`, `store.rs`), `wifi-densepose-wifiscan` (`pipeline/breathing_extractor.rs`, `pipeline/correlator.rs`, `adapter/netsh_scanner.rs`), `wifi-densepose-hardware` (`esp32_parser.rs`, `sync_packet.rs`, `esp32/secure_tdm.rs`, `ieee80211bf/*`), `wifi-densepose-calibration` (`geometry_embedding.rs`), benches, docs |
+| **Relates to** | ADR-021 (ESP32 CSI vitals), ADR-022 (multi-BSSID WiFi sensing), ADR-028 (ESP32 capability audit + witness), ADR-032 (multistatic mesh security), ADR-110 (HE PPDU bandwidth), ADR-151 (per-room calibration), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-153 (802.11bf forward-compat), ADR-154 (Signal/DSP sweep M0), ADR-155 (NN/Training sweep M1), ADR-156 (RuVector/Fusion sweep M2) |
+| **Scope** | Milestone 3 of the beyond-SOTA sweep across the four hardware/sensing-acquisition crates. The honest headline: **this layer is already well-hardened** — the real work is small. Three correctness/stability fixes (each pinned by a test that fails on the old code), one algorithmic perf change whose end-to-end win is **null at realistic window sizes** (disclosed, not inflated) with a committed bench, one defense-in-depth hardening on an unreachable path, a **MEASURED negative-results section** (the centerpiece — what was investigated and found already-correct), a graded SOTA landscape that is **mostly NO-ACTION**, and a deferred backlog. **Nothing is silently dropped.** |
+
+---
+
+## 0. PROOF discipline (this ADR's contract)
+
+This project has been publicly accused of "AI slop." Milestone 3 answers with **evidence, not adjectives** — the same contract as ADR-154/155/156:
+
+- Every correctness/stability fix ships a **committed regression test that fails on the old code and passes on the new**. Each was verified by reverting the fix and observing the test fail (recorded in §6).
+- Every perf number is **MEASURED before/after** with the exact reproduce command and a committed criterion bench. Where the win is below noise, we **say so and claim nothing** — see §4, which is a deliberately-disclosed near-null result.
+- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **DATA-GATED**, and where the right answer is "do nothing," we record the negative result explicitly (§5) — a stronger anti-slop signal than a fix.
+- The headline of this milestone is itself a negative result: **the acquisition layer was already hardened.** We disclose what we *checked and did not change* (§3) in as much detail as what we changed (§2), because "investigated, already correct, no action" is the most honest thing a sweep can report when it is true.
+
+Test machine for the perf numbers: Windows 11, `cargo bench` (Cargo's optimized `bench` profile), criterion 0.5. Numbers are wall-clock medians on this box; the **ratio** (before/after) is the claim, not the absolute ns.
+
+Build/test gate: `cargo test --workspace --no-default-features` (the project's standard gate — no GPU/`crv` features). All fixes in this milestone are on the **default, non-feature-gated surface**, so they are fully exercised by the standard gate. The serde-validated `ieee80211bf` types are additionally verifiable with `--features serde`; the live-QUIC path in `secure_tdm` is structurally tested (HMAC/replay/tamper) but not live-socket-tested in CI.
+
+---
+
+## 1. Context
+
+The hardware/sensing-acquisition layer is the bottom of the stack: it turns raw RF (ESP32 CSI frames, multi-BSSID netsh scans, 802.11bf measurement reports) into typed, validated domain objects that the signal/fusion/NN layers above consume. A beyond-SOTA review of the four crates surfaced far **fewer** real defects than the signal (ADR-154) or fusion (ADR-156) sweeps — because this layer was written defensively from the start: length-gated parsers, `Option`-returning helpers, `#[serde(try_from)]` validate-on-deserialize, FSMs that return `Result` instead of panicking, and HMAC-authenticated + replay-protected TDM beacons.
+
+The genuine findings are three: an **O(n²) sliding-window data-structure choice** in the vital-sign extractors (perf, latent), a **partial-weights scale-mixing bug** in breathing fusion (correctness), and an **IIR resonator that can diverge at pathologically low sample rates** (stability). Everything else the review flagged turned out to be already-safe — documented in §3 as MEASURED negative results.
+
+---
+
+## 2. Decision — the fixes that landed
+
+Each correctness/stability fix ships a regression test on the non-feature-gated, workspace-tested surface.
+
+### 2.1 §A1 — `Vec::remove(0)` O(n²) sliding windows → `VecDeque` (PERF, latent; MEASURED via bench — near-null at realistic sizes, disclosed)
+
+**The finding.** Every fixed-length sliding window in the extractors was a `Vec<f64>`/`Vec<f32>` whose oldest-sample eviction used `Vec::remove(0)` — an **O(n) shift of the whole buffer on every sample**, making a full-window `extract()` sweep O(n²). Six sites:
+
+| File | Site | Buffer |
+|------|------|--------|
+| `vitals/heartrate.rs` | `extract` history window | `Vec<f64>` → `VecDeque<f64>` |
+| `vitals/breathing.rs` | `extract` history window | `Vec<f64>` → `VecDeque<f64>` |
+| `vitals/anomaly.rs` | `rr_history` / `hr_history` | `Vec<f64>` → `VecDeque<f64>` (×2) |
+| `vitals/store.rs` | `readings` ring buffer | `Vec<VitalReading>` → `VecDeque<VitalReading>` |
+| `wifiscan/pipeline/breathing_extractor.rs` | filtered history | `Vec<f32>` → `VecDeque<f32>` |
+| `wifiscan/pipeline/correlator.rs` | per-BSSID histories | `Vec<Vec<f32>>` → `Vec<VecDeque<f32>>` |
+
+**The fix.** Swap to `VecDeque` with `push_back` + `pop_front` (O(1) eviction). Where the autocorrelation / zero-crossing / Pearson loop needs a contiguous slice, call `make_contiguous()` (or `as_slices().0` after it) **once per `extract()`**. This matches the idiom already used correctly in `wifiscan/pipeline/orchestrator.rs`. **Output is bit-identical** — no behavior test bites; the change is bench-gated.
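+
+A minimal sketch of the idiom described above, assuming a fixed capacity greater than zero. The `SlidingWindow` type is hypothetical and only illustrates the `push_back` / `pop_front` / `make_contiguous` pattern, not the extractors' actual fields:
+
+```rust
+use std::collections::VecDeque;
+
+/// Fixed-length sliding window with O(1) eviction of the oldest sample.
+struct SlidingWindow {
+    buf: VecDeque<f64>,
+    capacity: usize,
+}
+
+impl SlidingWindow {
+    fn new(capacity: usize) -> Self {
+        Self { buf: VecDeque::with_capacity(capacity), capacity }
+    }
+
+    /// Evict the oldest sample once the window is full, then append.
+    /// `pop_front` is O(1); `Vec::remove(0)` would shift the whole buffer.
+    fn push(&mut self, sample: f64) {
+        if self.buf.len() == self.capacity {
+            self.buf.pop_front();
+        }
+        self.buf.push_back(sample);
+    }
+
+    /// Hand the DSP loop one contiguous slice, oldest sample first.
+    /// Call once per `extract()`, not once per sample.
+    fn as_contiguous(&mut self) -> &[f64] {
+        self.buf.make_contiguous()
+    }
+}
+```
+
+Because `make_contiguous` needs `&mut self`, any accessor that returns the slice has to take a mutable borrow as well.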
+
+**The honest measurement (§4).** In **isolation**, the eviction cost collapses from O(n²) to O(n): a microbenchmark of pure eviction shows **34.6× at window=3000 and 3158× at window=100000**. But in the **full `extract()` path at realistic ESP32 window sizes** (heartrate ~1500, breathing ~3000), the per-frame DSP (autocorrelation is O(window·lags); zero-crossing is O(window)) **dominates the eviction entirely**, so there is **no end-to-end speedup**: measured `heartrate` 42.8 ms (before) vs 44.4 ms (after), `breathing` 7.95 ms vs 7.86 ms. The breathing intervals overlap; the heartrate intervals are adjacent rather than overlapping (43.46 ms upper bound before vs 43.55 ms lower bound after), with the new code marginally slower (≈3.6%). We land A1 because it is the correct data structure and removes a latent O(n²) that *would* bite at higher sample rates or longer windows — **not** because it speeds up the current hot path, which it does not measurably. Claiming an end-to-end speedup here would be exactly the inflation this sweep exists to prevent (the same discipline ADR-156 §2.1 applied to its cos no-op).
+
+### 2.2 §A2 — `breathing.rs` partial-weights scale-mixing (CORRECTNESS, real)
+
+**The finding.** `BreathingExtractor::extract` fused per-subcarrier residuals as `Σ residuals[i]·w[i]` where `w[i] = weights.get(i).unwrap_or(1/n)`. The result was **never normalized**. When `weights` was supplied **shorter than** `n`, the supplied entries (e.g. attention weights ~10.0) were used **raw** while the missing tail defaulted to `uniform_w = 1/n` (~0.125) — two scales summed with no renormalization, **silently mis-scaling the breathing signal** by a factor that depends on `weights.len()`. A caller passing 2 high attention weights for an 8-subcarrier frame got a fused value ~20× too large.
+
+**The fix.** Extracted the fusion into `fuse_weighted_residuals(residuals, weights, n)` and normalized by `Σ(effective weights)` — `weighted_sum / weight_total` — mirroring the **already-correct** pattern in `heartrate::compute_phase_coherence_signal`. A partial weight slice now produces a true weighted average in the residual range, independent of `weights.len()`.
+
+**Tests (fail on old code, verified by reverting — §6):**
+- `partial_weights_are_renormalized_not_scale_mixed` — `residuals=[1.0;8]`, `weights=[10.0,10.0]` → fused value `1.0` (the renormalized weighted mean), and explicitly **not** the old scale-mixed sum `2·10 + 6·0.125 = 20.75`.
+- `partial_weights_fusion_is_weighted_average` — differing residuals → a proper weighted average within `[0, 2]`, which the old un-normalized sum is not.
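+
+The renormalization can be sketched as below. This is an illustration under assumptions: the real `fuse_weighted_residuals` takes an extra `n` argument and its return type is not shown in this ADR, so the `Option<f64>` return and the two-argument signature here are hypothetical:
+
+```rust
+/// Weighted average of per-subcarrier residuals.
+/// Missing weights default to 1/n, and the sum is divided by the total of the
+/// weights actually used, so the result does not depend on `weights.len()`.
+fn fuse_weighted_residuals(residuals: &[f64], weights: &[f64]) -> Option<f64> {
+    let n = residuals.len();
+    if n == 0 {
+        return None;
+    }
+    let uniform_w = 1.0 / n as f64;
+    let (mut weighted_sum, mut weight_total) = (0.0, 0.0);
+    for (i, r) in residuals.iter().enumerate() {
+        let w = weights.get(i).copied().unwrap_or(uniform_w);
+        weighted_sum += r * w;
+        weight_total += w;
+    }
+    // Guard the divide; a zero total carries no usable weighting.
+    if weight_total.abs() < f64::EPSILON {
+        return None;
+    }
+    Some(weighted_sum / weight_total)
+}
+```
+
+With `residuals = [1.0; 8]` and `weights = [10.0, 10.0]` both the weighted sum and the weight total are `20.75`, so the fused value is `1.0` rather than the old un-normalized `20.75`.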
+
+### 2.3 §A3 — IIR resonator divergence at pathologically low sample rate (STABILITY, real)
+
+**The finding.** Both extractors' `bandpass_filter` set the resonator pole radius `r = 1 - bw/2` with `bw = 2π(f_high − f_low)/fs`. The **research report's stated trigger ("`fs` below ~4 Hz") is incorrect**, and we say so: the resonator pole *magnitude* is `|r|`, and the filter is stable for any `|r| < 1` — a merely-**negative** `r` is still stable. Divergence requires `|r| ≥ 1`, i.e. `bw ≥ 4`, i.e. `fs` very low **relative to the band width** (e.g. `fs = 0.5` Hz with a 0.1–0.9 Hz band → `bw = 10.05`, `r = −4.03`, `|r| = 4.03 > 1`). When that holds, the filter **diverges exponentially**: a unit-step input reaches `~10^183` within 300 frames and **overflows f64 to ±inf within ~600 frames**. Once one inf enters `filtered_history`, the autocorrelation `acf0`/zero-crossing path produces NaN and the extractor is **permanently dead** (silent stall until `reset()`).
+
+**The fix.** Two layers of defense-in-depth:
+1. **Clamp** `r` to a stable range: `r = (1.0 - bw/2.0).clamp(0.0, 0.9999)` — keeps the pole inside the unit circle for **any** sample-rate / band-edge configuration. (We document honestly that the divergence condition is `|r| ≥ 1`, not "`r` negative.")
+2. **Finite-guard** before the history push: `if !filtered.is_finite() { return None; }` — mirrors the NaN-bypass guard in ADR-154 §3, so even a future divergence cannot poison the buffer.
+
+Applied to **both** `heartrate.rs` and `breathing.rs` (identical resonator block).
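+
+A minimal sketch of the two guards, using the `bw` and `r` definitions from the finding above. The function names are hypothetical and the resonator recurrence itself is omitted, since this ADR does not spell it out:
+
+```rust
+use std::f64::consts::PI;
+
+/// Unclamped pole radius: r = 1 - bw/2 with bw = 2π(f_high − f_low)/fs.
+fn raw_pole_radius(f_low: f64, f_high: f64, fs: f64) -> f64 {
+    let bw = 2.0 * PI * (f_high - f_low) / fs;
+    1.0 - bw / 2.0
+}
+
+/// Layer 1: keep the pole strictly inside the unit circle.
+fn clamped_pole_radius(f_low: f64, f_high: f64, fs: f64) -> f64 {
+    raw_pole_radius(f_low, f_high, fs).clamp(0.0, 0.9999)
+}
+
+/// Layer 2: never let a non-finite sample into the history buffer.
+fn push_if_finite(history: &mut Vec<f64>, filtered: f64) -> Option<()> {
+    if !filtered.is_finite() {
+        return None;
+    }
+    history.push(filtered);
+    Some(())
+}
+```
+
+At `fs = 0.5` with a 0.1–0.9 Hz band the unclamped radius is about −4.03, which the clamp maps to `0.0`; at ordinary sample rates the clamp leaves the radius unchanged.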
+
+**Tests (fail on old code, verified by reverting — §6):** `heartrate::low_sample_rate_filter_stays_finite` and `breathing::low_sample_rate_filter_stays_finite` — construct at `fs=0.5` with a 0.1–0.9 Hz band, feed a unit step for 600 frames, assert **every** `filtered_history` sample is finite. On the old code these **panic** (a `filtered_history[i]` is inf/NaN); on the new code all samples are finite.
+
+### 2.4 §D1 — new `vitals/benches/vitals_bench.rs` (MEASURED)
+
+A new criterion bench (`harness = false`, registered in `Cargo.toml`) drives each extractor from empty to a full window (`heartrate` 1500 samples, `breathing` 3000) so the A1 sliding-window bookkeeping is exercised across the whole buffer. Follows the criterion style of the existing `hardware/benches/transport_bench.rs` and ADR-156's `fusion_bench`. Numbers and the honest interpretation are in §4.
+
+### 2.5 §B1 — `ieee80211bf/transport.rs` drop-instead-of-truncate (HARDENING, unreachable path — disclosed)
+
+`OpportunisticCsiBridge::ingest` built `CsiReportPayload { n_subcarriers: self.amp_accum.len() as u16, … }`. The `as u16` would silently wrap a count above 65 535. **This is unreachable in practice**: `ingest` gates `frame.subcarrier_count() > MAX_REPORT_SUBCARRIERS` (484) at entry and returns `None`, and `report.validate()` independently rejects oversized counts downstream. We replaced the cast with `u16::try_from(self.amp_accum.len()).ok()?` (drop-instead-of-truncate) so the construction is **correct-by-construction** rather than relying on the upstream gate. We disclose this as **defense-in-depth on an unreachable path, not a live bug** — no behavior change, no new test (the gate already prevents the input that would exercise it).
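+
+The difference between the two conversions, as a small standalone sketch (the helper names are hypothetical; only `as u16` and `u16::try_from` come from the change described above):
+
+```rust
+/// Old behavior: `as` silently wraps modulo 65 536.
+fn truncating_count(len: usize) -> u16 {
+    len as u16
+}
+
+/// New behavior: an out-of-range count becomes `None`, so a caller using `?`
+/// drops the report instead of emitting a wrong subcarrier count.
+fn checked_count(len: usize) -> Option<u16> {
+    u16::try_from(len).ok()
+}
+```
+
+For example, a length of 70 000 wraps to 4 464 under the cast but yields `None` under `try_from`.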
+
+### 2.6 §B4 — constant-time HMAC tag compare: **RESOLVED — no-dependency hand-rolled constant-time compare (Milestone-1)**
+
+`secure_tdm.rs` compared the 8-byte HMAC tag with `self.hmac_tag == expected` (data-dependent, non-constant-time: short-circuits on the first differing byte, leaking through verification latency how many leading bytes a forged tag matched — a byte-by-byte tag-recovery oracle). Milestone-3 deferred this **only** to avoid adding the `subtle` crate as a direct dependency. Milestone-1 resolves it **without any dependency**: a hand-rolled `constant_time_tag_eq(a, b)` that XOR-accumulates every byte difference into a single `u8` with **no early exit**, then compares the accumulator to zero exactly once. `#[inline(never)]` + `core::hint::black_box(diff)` stop the optimizer from reintroducing a short-circuit or lowering the loop into a non-constant-time `memcmp`; a length mismatch returns `false` without inspecting contents. The former `==` verify site now calls this helper.
+
+**Test (fails on old code, the hard gate):** `tag_compare_is_constant_time_shape` — asserts correct accept/reject for equal, first-byte-differ, last-byte-differ, all-byte-differ, and length-mismatch tags, plus an end-to-end `verify()` last-byte-only tamper. Verified to **bite**: introducing a classic constant-time bug (loop `take(LEN-1)`, skipping the last byte) makes it fail on `last-byte-differ must reject`. A coarse timing-invariance smoke check `tag_compare_timing_invariance_smoke` exists but is `#[ignore]`d (noisy host — not a CI gate). **Grade MEASURED** (constant-time *construction*; micro-timing on a noisy host is only a smoke check, disclosed honestly). Tracked RESOLVED in §8.
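+
+A minimal sketch of the XOR-accumulate compare described in this section, assuming byte-slice inputs. It illustrates the construction only and is not a copy of the helper in `secure_tdm.rs`:
+
+```rust
+/// Compare two tags without an early exit on the first differing byte.
+/// A length mismatch returns false without inspecting contents.
+#[inline(never)]
+fn constant_time_tag_eq(a: &[u8], b: &[u8]) -> bool {
+    if a.len() != b.len() {
+        return false;
+    }
+    // OR together every byte difference; any mismatch leaves a nonzero bit.
+    let mut diff: u8 = 0;
+    for (x, y) in a.iter().zip(b.iter()) {
+        diff |= x ^ y;
+    }
+    // black_box discourages the optimizer from turning the loop back into a
+    // short-circuiting comparison.
+    core::hint::black_box(diff) == 0
+}
+```
+
+This shows a constant-time *construction*; it does not by itself prove timing invariance on any particular host or compiler.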
+
+---
+
+## 3. The MEASURED negative-results section (the centerpiece — what was investigated and found already-correct)
+
+This is the core of ADR-157. The acquisition layer was hardened before this sweep; the strongest anti-slop evidence is an honest accounting of what we **checked and did not need to change**. Each is verified against the live code with a file:line citation.
+
+| Area | Claim verified | Evidence (file:line) | Verdict |
+|------|----------------|----------------------|---------|
+| **ESP32 parser subcarrier index math** | A crafted CSI frame cannot panic via the subcarrier-index arithmetic. The total-frame-size length gate (`data.len() < HEADER_SIZE + n_antennas·n_subcarriers·2 → Err`) dominates **every** subsequent `data[byte_offset]`/`[+1]` access; `n_subcarriers ≤ 256`, `n_antennas ≤ 4` are header-bounded, and the `index` math is pure i16 arithmetic with no indexing. | `esp32_parser.rs:211` (length gate) guards the loop at `:224–242` | **Already safe — NO ACTION** |
+| **`sync_packet.rs` `try_into().unwrap()`** | The four `try_into().unwrap()` calls are **infallible**: each slices a fixed-width sub-range (`[0..4]`, `[8..16]`, `[16..24]`, `[24..28]`) of a buffer already guaranteed `len() >= SYNC_PACKET_SIZE` (32) by the early `return Err(InsufficientData)`. | `sync_packet.rs:88` (length gate) → `:94,102,103,104` (fixed-width slices) | **Already safe — NO ACTION** |
+| **The entire `ieee80211bf/` 802.11bf model** | Validate-on-deserialize and no-panic-by-construction throughout. `MeasurementSetupId` is `#[serde(try_from = "u8")]` rejecting `> MAX_SETUP_ID` (127); `ThresholdParams` is `#[serde(try_from = "RawThresholdParams")]` routing every deserialize through `ThresholdParams::new`; the session FSM `handle()` returns `Result<Vec<Action>, BfError>` (never panics) and enforces **single-role** (`self.role != Initiator/Responder → Err`) on every transition; the SBP request is validated through the **same** single `evaluate_setup` chain as a direct setup (no SBP-only policy bypass). | `types.rs:160–161` (setup-id try_from), `:225–226` (threshold try_from), `:165` (range check); `session.rs:118` (`handle` → Result), `:130/143/166/182` (single-role), `messages.rs:130–147` (SBP single-evaluate) | **Already SOTA-shaped — NO ACTION** |
+| **`secure_tdm.rs` HMAC + replay** | Beacon authentication (HMAC-SHA256, 8-byte tag), tamper rejection, and replay-window protection are correct and tested. (The non-constant-time compare at `:284` was the only nit; it was initially deferred and is now resolved by the hand-rolled constant-time compare in §2.6.) | `secure_tdm.rs:279` (`verify`), `:284` (compare), tests `:614–673` (replay), `:728` (tamper) | **Correct — NO ACTION on HMAC/replay (B4 resolved, §2.6)** |
+| **`netsh_scanner.rs` command + parse** | No shell-injection surface: the scanner uses a **fixed argv** (`Command::new("netsh").args(["wlan","show","networks","mode=bssid"])`) — no shell, no interpolation. Parsing is **`Option`-based** (`try_parse_ssid_line`/`try_parse_bssid_line`/`try_parse_signal_line` → `Option`, with `.unwrap_or(default)`), so hostile/garbled netsh output is silently skipped, never panicked. | `netsh_scanner.rs:50–51` (fixed argv), `:96–102` (`unwrap_or` defaults), `:242/257/270` (`Option` parsers) | **Already safe — NO ACTION** |
+| **`calibration/geometry_embedding.rs` overflow guard** | The geometry embedding clamps every position/std-dev component into `±MAX_COORD_M` (1000 m) via `clamp_m`, explicitly to stop adversarial coordinates from overflowing the covariance accumulation into `inf`; the documented invariant ("every value is finite, never NaN/inf") holds. | `geometry_embedding.rs:55` (`MAX_COORD_M`), `:145/150` (`clamp_m` on centroid + std-dev) | **Already safe — NO ACTION** |
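+
+The "length gate, then infallible fixed-width slices" pattern from the `sync_packet.rs` row can be sketched as follows. The field names, the little-endian byte order, and the `ParseError` type are assumptions for illustration; only `SYNC_PACKET_SIZE = 32` and the slice ranges come from the table:
+
+```rust
+const SYNC_PACKET_SIZE: usize = 32;
+
+#[derive(Debug, PartialEq)]
+enum ParseError {
+    InsufficientData,
+}
+
+/// Parse two fixed-width fields from a sync packet.
+fn parse_sync_fields(buf: &[u8]) -> Result<(u32, u64), ParseError> {
+    // The early return guarantees every slice below is in bounds.
+    if buf.len() < SYNC_PACKET_SIZE {
+        return Err(ParseError::InsufficientData);
+    }
+    // A 4-byte slice always converts to [u8; 4], so this unwrap cannot fail.
+    let sequence = u32::from_le_bytes(buf[0..4].try_into().unwrap());
+    // Likewise an 8-byte slice always converts to [u8; 8].
+    let timestamp = u64::from_le_bytes(buf[8..16].try_into().unwrap());
+    Ok((sequence, timestamp))
+}
+```
+
+The `unwrap()` calls are safe only because the slice width is a compile-time constant that matches the target array; the length gate protects the slicing, not the conversion.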
+
+---
+
+## 4. The §D1 perf measurement (MEASURED — honestly near-null end-to-end)
+
+New bench: `crates/wifi-densepose-vitals/benches/vitals_bench.rs`, two functions covering a full-window fill of each extractor.
+
+- **Reproduce:** `cargo bench -p wifi-densepose-vitals --bench vitals_bench`
+  (compile-only: append `--no-run`; the medians below used `-- --warm-up-time 1 --measurement-time 3 --sample-size 20`).
+
+**End-to-end `extract()` full-window fill, medians:**
+
+| Bench | Before (`Vec::remove(0)`) | After (`VecDeque`) | Verdict |
+|-------|---------------------------|--------------------|---------|
+| `heartrate_extract_full_window_1500` | 42.81 ms `[42.19, 42.81, 43.46]` | 44.37 ms `[43.55, 44.37, 45.19]` | **no measurable speedup** (after ≈3.6% slower; intervals adjacent, 43.46 vs 43.55, not overlapping) |
+| `breathing_extract_full_window_3000` | 7.95 ms `[7.86, 7.95, 8.05]` | 7.86 ms `[7.66, 7.86, 8.04]` | **no measurable change** (intervals overlap) |
+
+The end-to-end effect is **null within noise** because the per-frame DSP dominates: heartrate runs an O(window·lags) autocorrelation every frame (≈1500·125 multiply-adds), which utterly swamps the O(window) eviction the A1 change improves; breathing's O(window) zero-crossing and the `make_contiguous` rotation are the same order as the old `remove(0)` memmove at these sizes.
+
+**Where the win actually lives (isolated eviction-only microbench, supporting evidence — not in the committed bench):**
+
+| Window | `Vec::remove(0)` (eviction only) | `VecDeque` | Speedup |
+|--------|----------------------------------|------------|---------|
+| 3 000 | 1.00 ms | 0.029 ms | **34.6×** |
+| 20 000 | 94.5 ms | 0.122 ms | **773×** |
+| 100 000 | 3 139 ms | 0.994 ms | **3 158×** |
+
+So A1 is **algorithmically correct and removes a real latent O(n²)** that would bite at higher sample rates or longer analysis windows — but at the **current** ESP32 window sizes the end-to-end win is below noise, and we claim nothing more. This is the §0 contract in action: a perf claim without a measured before/after improvement is **not made**.
+
+---
+
+## 5. The hardware/sensing SOTA landscape (graded — mostly NO-ACTION, honest)
+
+Grades: **MEASURED** (source measured it, ideally public method/code), **CLAIMED** (asserted, no reproducible artifact), **DATA-GATED** (blocked on data we don't have, per a prior ADR-152 measurement).
+
+| # | Area | Candidate / question | Grade | Verdict |
+|---|------|----------------------|-------|---------|
+| 1 | **CSI vital signs (HR/BR)** | Deep-CSI vital-sign models report **MAE ~2–3 BPM** vs our classical IIR-bandpass + autocorrelation/zero-crossing. | **DATA-GATED + CLAIMED** | **NO ACTION on method.** A deep model needs **paired PPG/ECG ground truth** we do not have, and no public ESP32 artifact reproduces the cited MAE on commodity CSI. Our classical method is the honest commodity baseline; the real wins this milestone are the A1/A3 robustness fixes, not a new model. |
+| 2 | **802.11bf-2025 conformance** | Adopt a conformance test-vector suite for the `ieee80211bf/` forward-compat model. | **CLAIMED (not public)** | **NO ACTION.** No commodity silicon ships a conformant 802.11bf interface as of 2026, and the conformance suites are **WBA / Wi-Fi Alliance pre-certification** material, **not public**. Our model's "no OTA encoding until silicon exists" posture (ADR-153) is the correct one. Tracked in §8: *add SBP conformance vectors when the WFA publishes a test plan* — we will **not invent vectors**. |
+| 3 | **Per-room calibration (ADR-151)** | Bank-of-specialists + drift-veto vs a 2026 calibration SOTA. | **CLAIMED on numbers, DATA-GATED on a head-to-head** | **NO ACTION on architecture.** The bank-of-specialists + drift-veto design is SOTA-shaped, but we have **no head-to-head PCK** against a published method (no paired multi-room data). The geometry-conditioned LoRA head is **built-but-unconsumed** and data-gated → **ACCEPTED-FUTURE** (§8), not built now. |
+| 4 | **Multi-BSSID throughput (wifiscan)** | The module docs assert a native `wlanapi.dll` FFI 10–20 Hz path; the current `WlanApiScanner` wraps `netsh` (~2 Hz). | **MEASURED (Milestone-1)** | **IMPLEMENTED + MEASURED — real positive win.** Status corrected: the native FFI is **fully implemented and wired live** (`wlanapi_native::scan_native` calls `WlanOpenHandle`/`WlanEnumInterfaces`/`WlanGetNetworkBssList`/`WlanFreeMemory`/`WlanCloseHandle`; `WlanApiScanner::scan_instrumented` runs it native-first with a netsh fallback). Milestone-1 **measured both paths on this box** (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13) over an identical 10 s wall-clock window via a new `benchmark_backend`: **native 21.42 Hz vs netsh 3.84 Hz = 5.57× MEASURED** (mean 5.0 BSSIDs/scan each; native-only run 18.0 Hz). Native genuinely beats netsh — a real measured multiple, **not** a fabricated 10×; the achieved 21.4 Hz is well above the ~2 Hz netsh regime and sits marginally above the top of the asserted 10–20 Hz range, while the native-only run at 18.0 Hz falls inside it. 50 back-to-back native scans = 50/50 OK, no handle leak. → §8 MEASURED. |
+
+---
+
+## 6. Validation
+
+- **Bug-catching tests verified to bite.** Each §A2/§A3 fix was reverted and the corresponding test observed to fail on the old code, then restored:
+  - `partial_weights_are_renormalized_not_scale_mixed`, `partial_weights_fusion_is_weighted_average` — **assertion failure** (returned the old un-normalized scale-mixed sum) on old code.
+  - `heartrate::low_sample_rate_filter_stays_finite`, `breathing::low_sample_rate_filter_stays_finite` — **panic** (a `filtered_history[i]` is inf/NaN) on old code.
+  - §A1 is the **disclosed bit-identical change**: no behavior test bites (correctly — output is unchanged); the bench (§4) is the gate, and it shows **no measurable end-to-end change**, which we report honestly.
+  - §B1 is on an **unreachable path** (gated upstream), so it carries no new test — disclosed as defense-in-depth, not a live bug.
+- **`cd v2 && cargo test -p wifi-densepose-vitals -p wifi-densepose-hardware -p wifi-densepose-wifiscan -p wifi-densepose-calibration --no-default-features`** — all green. Lib-test counts: `wifi-densepose-vitals` **55** (was 51; +4 net new bug-catching tests — two §A2, two §A3), `wifi-densepose-hardware` **163**, `wifi-densepose-wifiscan` **87**, `wifi-densepose-calibration` **58**. 0 failures across all four.
+- **`cd v2 && cargo test --workspace --no-default-features`** — **3054 passed / 0 failed** (M2 left the workspace at 3050; the +4 net new bug-catching tests are included and green).
+- **`python archive/v1/data/proof/verify.py`** — **`VERDICT: PASS`**, pipeline hash unchanged `f8e76f21…46f7a` (these are Rust-only changes; the Python pipeline proof is independent and confirmed unaffected).
+- New `vitals_bench` compiles and runs under the default feature set.
+- **Disclosed validation limits:** the live-QUIC transport in `secure_tdm` is **structurally** tested (HMAC compute/verify, tamper, replay-window) but **not live-socket-tested** in CI; the serde-gated `ieee80211bf` types are additionally verifiable with `--features serde`. Clippy is not installed in the local 1.89 toolchain, so the per-crate lint pass was not run locally (the project gate is `cargo test`).
+
+---
+
+## 7. What changed, file by file
+
+- `vitals/heartrate.rs` — `filtered_history: Vec<f64>` → `VecDeque<f64>` (`push_back`/`pop_front`, `make_contiguous` once per `extract`); resonator `r` clamped to `[0, 0.9999]`; finite-guard before history push; corrected divergence-condition doc (`|r| ≥ 1`, not "`r` negative"); `low_sample_rate_filter_stays_finite` test.
+- `vitals/breathing.rs` — same `VecDeque` + clamp + finite-guard changes; weighted fusion extracted to `fuse_weighted_residuals` and **normalized by Σ(effective weights)** (the §A2 fix); three new tests (two A2, one A3).
+- `vitals/anomaly.rs`, `vitals/store.rs` — sliding/ring buffers → `VecDeque` (O(1) eviction); `store::history` takes `&mut self` to hand back a contiguous slice via `make_contiguous` (no external callers; observable contents unchanged).
+- `wifiscan/pipeline/breathing_extractor.rs` — `VecDeque<f32>` + `make_contiguous`.
+- `wifiscan/pipeline/correlator.rs` — per-BSSID histories → `Vec<VecDeque<f32>>`; contiguous-ize each touched buffer once before the Pearson pass.
+- `hardware/ieee80211bf/transport.rs` — `n_subcarriers: … as u16` → `u16::try_from(…).ok()?` (§B1 drop-instead-of-truncate, unreachable-path hardening).
+- `vitals/Cargo.toml` + `vitals/benches/vitals_bench.rs` (new) — criterion dev-dep, `[[bench]]`, the §D1 full-window benches.
+
+---
+
+## 8. Deferred backlog (NOT silently dropped)
+
+- **§B4 constant-time HMAC compare** — **RESOLVED (Milestone-1).** Replaced the short-circuiting `==` on the 8-byte tag with a hand-rolled branch-free `constant_time_tag_eq` (XOR-accumulate, no early exit, `#[inline(never)]` + `black_box`). **No new dependency** — the `subtle` crate was the only reason this was deferred, and a fixed 8-byte compare needs none. Pinned by `tag_compare_is_constant_time_shape` (proven to fail on a last-byte-skipping bug). Grade MEASURED (constant-time construction). See §2.6.
+- **802.11bf SBP conformance vectors** (§5 #2) — add real conformance test vectors to the `ieee80211bf/` model **when the Wi-Fi Alliance / WBA publishes a public test plan**. Do not invent vectors before then.
+- **Geometry-conditioned LoRA calibration head** (§5 #3) — built-but-unconsumed and **data-gated** on paired multi-room PCK data (ADR-152 measurement (b): data, not architecture, is the bottleneck). ACCEPTED-FUTURE.
+- **Native `wlanapi.dll` FFI multi-BSSID fast path** (§5 #4) — **RESOLVED + MEASURED (Milestone-1).** The native FFI is implemented and wired live (native-first, netsh fallback). Measured on this box (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13): **native 21.42 Hz vs netsh 3.84 Hz = 5.57×**, mean 5.0 BSSIDs/scan, 50/50 native scans with no handle leak. Real positive result — no fabricated 10×. See §5 #4. (Note: a prior sweep recorded 9.74 Hz on a different/older adapter; the per-adapter number varies, the ratio over netsh is the claim.)
+- **Deep-CSI vital-sign model** (§5 #1) — DATA-GATED on paired PPG/ECG ground truth. No public ESP32 artifact reproduces the cited ~2–3 BPM MAE. Not on the near-term path.
+
+---
+
+## 9. Consequences
+
+**Positive.** The vital-sign extractors now use the correct O(1)-eviction data structure (no latent O(n²)), cannot mis-scale a breathing estimate from a partial attention-weight slice, and cannot be silently killed by a diverging IIR filter at a pathological sample rate. The 802.11bf construction site drops-instead-of-truncates on an (already-gated) oversized count. Most importantly, the layer's existing hardening — length-gated parsers, infallible fixed-width slices, validate-on-deserialize, no-panic FSMs, fixed-argv scanning, HMAC+replay TDM, overflow-clamped geometry embeddings — is now **documented as MEASURED negative results** with file:line evidence, so a reader can verify the "already safe" claims rather than take them on faith.
+
+**Negative / honest limits.** The §A1 perf change is **null end-to-end** at realistic window sizes — we land it for correctness, not speed, and the committed bench proves the null rather than hiding it. The research report's stated §A3 divergence trigger ("`fs` below ~4 Hz") was **physically inaccurate** (divergence needs `|r| ≥ 1` ⇒ `bw ≥ 4`, a far lower `fs`); we corrected it in the code comments and the test parameters and disclose the correction here. The strongest external SOTA candidates (deep-CSI vitals, learned calibration, native FFI scanning) are **all NO-ACTION or ACCEPTED-FUTURE** — data-gated, unmeasured, or blocked on a non-public conformance suite — and **none is presented as more than it is.** §B4 is consciously deferred. Nothing in this milestone is inflated beyond what a reverting reviewer can reproduce.
@@ -0,0 +1,212 @@
+# ADR-158: MAT / World-Model Cluster — Beyond-SOTA Sweep, Anti-"AI-Slop" Hardening
+
+- **Status**: accepted
+- **Date**: 2026-06-11
+- **Deciders**: ruv
+- **Tags**: mat, life-safety, localization, triage, worldmodel, worldgraph, geo, engine, prove-everything
+
+## Context
+
+This ADR records the beyond-SOTA sweep over the MAT / world-model cluster
+(`wifi-densepose-mat`, `-worldmodel`, `-worldgraph`, `-geo`, `-engine`), executed
+under the project's **prove-everything / anti-"AI-slop"** directive: every stub is
+either implemented with real logic or replaced by an honest typed error; no
+fake/always-empty/random outputs; tests pass on real behaviour; results are graded
+**MEASURED** (reproduced here with the command recorded), **CLAIMED**,
+**DATA-GATED** (real code path present, needs hardware/data we lack), or
+**NO-ACTION** (already-SOTA — cited as a positive).
+
+The Mass Casualty Assessment Tool touches life-safety. A triage metric that is
+disconnected from the decision it gates, or a survivor count that inflates, is the
+worst class of slop: it produces confident, wrong rescue prioritisation. An audit
+against live code found six concrete defects, four of which were silent
+correctness bugs (not missing features) in the triage → gate → record path and in
+the localization/dedup path.
+
+Grading vocabulary follows ADR-152 (F-evidence grades) and the sweep convention:
+- **MEASURED** — reproduced in this worktree, command recorded below.
+- **DATA-GATED** — real code path implemented; returns a typed error / honest
+  provenance flag where hardware or labelled data is genuinely absent.
+- **NO-ACTION (already-SOTA)** — audited, found correct, cited as a positive.
+- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped.
+
+## Graded SOTA Landscape
+
+| Capability | Grade | Note |
+|------------|-------|------|
+| RF-through-rubble survivor detection | **DATA-GATED** | Real detection + triage + localization code paths run end-to-end on real CSI bytes; field detection *accuracy* is unproven without instrumented rubble trials and is **not fabricated** here. |
+| OccWorld occupancy architecture (`-worldmodel`) | **NO-ACTION (already-SOTA)** | `occupancy.rs` voxel mapping is clamp-proven bounds-safe; converts WorldGraph person positions to a 200×200×16 grid with no out-of-bounds path. |
+| WorldGraph provenance / privacy / pruning (`-worldgraph`) | **NO-ACTION (already-SOTA)** | `graph.rs` implements append-with-provenance (`DerivedFrom`), deterministic LRU pruning, and a privacy rollup (`PrivacyLimitedBy`). Cited as a positive; no changes needed. |
+| Point-cloud parser bounds-safety (`-pointcloud`) | **NO-ACTION (already-SOTA)** | Another agent's crate; cited only — its parser is bounds-checked. Out of scope for this ADR's edits. |
+| Learned multi-person counter | **DATA-GATED** | Deferred; requires labelled multi-occupant CSI. The zone+vitals-signature dedup (below) is the honest non-learned stand-in. |
+| RF point-cloud generation | **ACCEPTED-FUTURE** | Not dropped; tracked as future work. |
+
+## Decision — Fixes Landed (MEASURED)
+
+### §1 Unify the two divergent triage engines (CRITICAL)
+
+**Was:** `EnsembleClassifier::determine_triage` (ensemble gate) and
+`TriageCalculator::calculate` (survivor record) were two different START-protocol
+approximations with different rate bands and movement handling. The pipeline
+gated on the ensemble's confidence (`lib.rs:489`), discarded the ensemble triage
+(`lib.rs:524`, `_ensemble`), and recomputed via `TriageCalculator` in
+`Survivor::new` (`survivor.rs:194`). A survivor could be admitted at one priority
+and recorded at another.
+
+**Now:** `determine_triage` delegates to `TriageCalculator` — the **single source
+of truth** used by both the gate and the survivor record. The only ensemble-
+specific behaviour retained is the confidence gate (low confidence → `Unknown`,
+except `Immediate`, which is never suppressed — a missed survivor in distress is
+costlier than a false positive). Rate bands follow START (<10 / >30 bpm →
+Immediate).
+
+**Failing-on-old test:** `detection::ensemble::tests::test_divergent_boundary_28bpm_tremor_gate_equals_survivor`
+— 28 bpm Normal + Tremor. Old gate → Delayed, old survivor record → Immediate
+(divergent). Unified result: gate == survivor == **Immediate**. Companion tests
+(`test_no_vitals_is_unknown_canonical`, `test_normal_breathing_no_movement_is_immediate_canonical`,
+the updated `integration_adr001::test_ensemble_classifier_triage_logic`) assert
+gate-vs-record equality on every boundary.
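
The delegation described in §1 can be sketched as below. This is an illustration under stated assumptions: `Triage`, `calculate`, and the `min_confidence` parameter are hypothetical names, and the stand-in calculator implements only the START rate band quoted above. The real `TriageCalculator` also weighs movement, which this sketch omits, so its in-band `Delayed` branch is a placeholder.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Triage {
    Immediate,
    Delayed,
    Unknown,
}

/// Stand-in for the canonical calculator (assumption: rate band only).
pub fn calculate(breaths_per_min: Option<f32>) -> Triage {
    match breaths_per_min {
        None => Triage::Unknown,
        // START rate band from the ADR: <10 or >30 bpm is Immediate.
        Some(rate) if rate < 10.0 || rate > 30.0 => Triage::Immediate,
        // Placeholder: the real calculator also considers movement here.
        Some(_) => Triage::Delayed,
    }
}

/// Ensemble gate: delegates to `calculate`, then applies the confidence gate.
pub fn determine_triage(
    breaths_per_min: Option<f32>,
    confidence: f32,
    min_confidence: f32,
) -> Triage {
    let canonical = calculate(breaths_per_min);
    // Low confidence suppresses everything except Immediate.
    if confidence < min_confidence && canonical != Triage::Immediate {
        Triage::Unknown
    } else {
        canonical
    }
}
```

Because the gate only ever returns the canonical value or `Unknown`, a confident gate result always equals what the record path computes from the same vitals.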
+
+### §2 Real RSSI/ToA localization + kill count-inflation (HIGH)
+
+**Was:** `fusion.rs:79 simulate_rssi_measurements` always returned `vec![]`, so
+every survivor got `location: None`, so spatial dedup (`disaster_event.rs:285`,
+which only fired on `Some` location) was disabled. One trapped person re-detected
+across N scan cycles became **N survivors** — a fabricated mass-casualty count.
+
+**Now, two real mechanisms:**
+1. **Real RSSI source:** `SensorPosition` gains an optional `last_rssi`
+   (populated by the hardware layer from actual signal-strength readings).
+   `collect_rssi_measurements` reads only real per-sensor RSSI and feeds the
+   existing triangulator; it **never fabricates** a value. With `< min_sensors`
+   real readings, `estimate_position` returns `None` (honest).
+2. **Zone + vitals-signature dedup:** when no usable location exists,
+   `record_detection` matches an existing *active, un-located* survivor in the
+   same zone whose latest vital signature (breathing presence + START rate band,
+   heartbeat presence, movement class) is compatible — collapsing repeat
+   detections of one person while keeping genuinely distinct survivors separate.
+
+**MEASURED:** `test_identical_vitals_no_location_dedup_to_one` — 3× identical-vitals
+/ `None`-location → **1 survivor** (old code: 3). `test_distinct_vitals_no_location_stay_separate`
+keeps two distinct survivors at 2 (no under-count). `test_estimate_position_uses_real_rssi`
+yields a position from 3 real-RSSI sensors; `test_estimate_position_none_without_real_rssi`
+yields `None` (no fabrication).
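
The zone + vitals-signature dedup in §2 can be sketched as follows. All type and field names here are hypothetical, and "compatible" is modelled as exact equality of the signature, which is a simplifying assumption; the real matcher may be more lenient.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RateBand {
    Low,
    Normal,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Movement {
    Still,
    Tremor,
    Gross,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VitalSignature {
    /// `None` means no breathing was detected.
    pub breathing_band: Option<RateBand>,
    pub heartbeat_present: bool,
    pub movement: Movement,
}

#[derive(Debug, Clone)]
pub struct Survivor {
    pub zone: u32,
    pub active: bool,
    pub location: Option<(f64, f64)>,
    pub signature: VitalSignature,
}

#[derive(Debug, Default)]
pub struct DisasterEvent {
    pub survivors: Vec<Survivor>,
}

impl DisasterEvent {
    /// Records an un-located detection and returns the survivor index it maps to.
    pub fn record_detection(&mut self, zone: u32, signature: VitalSignature) -> usize {
        // Reuse an active, un-located survivor in the same zone with equal vitals.
        let existing = self.survivors.iter().position(|s| {
            s.active && s.location.is_none() && s.zone == zone && s.signature == signature
        });
        match existing {
            Some(index) => index,
            None => {
                self.survivors.push(Survivor { zone, active: true, location: None, signature });
                self.survivors.len() - 1
            }
        }
    }
}
```

Three identical detections in one zone collapse to a single entry, while a different signature or a different zone still creates a new survivor.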
+
+### §3 Real ESP32/UDP/PCAP CSI ingest; honest typed errors elsewhere (HIGH)
+
+**Was:** `hardware_adapter.rs read_esp32_csi` / `read_udp_csi` / `read_pcap_csi`
+returned "not yet implemented" — even though `csi_receiver.rs` already contained a
+working `CsiParser` (ESP32 CSV, JSON, Intel5300/Atheros/Nexmon byte decoders) and a
+real `PcapCsiReader`.
+
+**Now:**
+- **UDP** — binds, receives one datagram, parses (auto-detect) → `CsiReadings`.
+  End-to-end test sends a real JSON datagram on the wire.
+- **PCAP** — `load` + `read_next` + parse. End-to-end test writes a real
+  little-endian `.pcap` with one record and reads it back.
+- **ESP32** — parses `CSI_DATA` CSV via the real parser. Live serial byte I/O is
+  behind an optional `serial` cargo feature (native `serialport` kept off the
+  default / aarch64 appliance build); with the feature off, live reads return a
+  typed `UnsupportedAdapter` while the byte parser still works.
+- **Intel 5300 / Atheros / PicoScenes** — return typed
+  `AdapterError::HardwareUnavailable` / `UnsupportedAdapter` (no device, no
+  driver, or no validatable format here). **Never fake CSI.** New error variants
+  added to make the gating typed rather than a `String` "Hardware" soup.
+
+**MEASURED:** `test_esp32_bytes_parse_end_to_end`, `test_udp_read_end_to_end`,
+`test_pcap_read_end_to_end`, `test_intel_and_atheros_are_honestly_unavailable`.
+
+### §4 Real parabolic peak interpolation in `find_dominant_frequency` (MED)
+
+**Was:** `breathing.rs:243` comment claimed interpolation but returned the bin
+center, capping breathing-rate resolution at ±half a bin.
+
+**Now:** 3-point parabolic (quadratic) peak interpolation,
+`δ = 0.5·(yL − yR)/(yL − 2y0 + yR)`, clamped to `[-0.5, 0.5]`, with an edge
+fallback to bin center.
+
+**MEASURED:** `test_find_dominant_frequency_parabolic_interpolation` — for a
+parabola-shaped peak at true bin 10.4 the recovery is exact (δ = 0.4); the test
+asserts the result lands within half a bin of truth and strictly beats the
+old bin-center estimate.
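
A minimal sketch of the 3-point parabolic interpolation from §4, using the same δ formula, clamp, and edge fallback. The function name and the `f64` magnitude slice are assumptions for illustration; the real `find_dominant_frequency` signature is not shown in this ADR.

```rust
/// Returns the fractional index of the spectral peak, or `None` for an empty input.
pub fn interpolated_peak_bin(mags: &[f64]) -> Option<f64> {
    // Locate the bin with the largest magnitude.
    let (k, _) = mags
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))?;
    // Edge fallback: no neighbour on one side, so return the bin center.
    if k == 0 || k + 1 == mags.len() {
        return Some(k as f64);
    }
    let (yl, y0, yr) = (mags[k - 1], mags[k], mags[k + 1]);
    let denom = yl - 2.0 * y0 + yr;
    // A flat top has no curvature to fit; fall back to the bin center.
    if denom.abs() < f64::EPSILON {
        return Some(k as f64);
    }
    // delta = 0.5 * (yL - yR) / (yL - 2*y0 + yR), clamped to half a bin.
    let delta = (0.5 * (yl - yr) / denom).clamp(-0.5, 0.5);
    Some(k as f64 + delta)
}
```

For a parabola-shaped peak, the three samples around the maximum determine the vertex exactly, which is why the recovery in the test case above is exact rather than approximate.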
+
+### §5 GDOP honesty (LOW)
+
+**Was:** `triangulation.rs:248 estimate_gdop` returned an ad-hoc average-pair-angle
+factor *labelled* GDOP (the same defect class ADR-156 §2.3 fixed elsewhere).
+
+**Now:** real, dimensionless **GDOP = √(trace((HᵀH)⁻¹))** from the range-measurement
+Jacobian `H` (unit target→sensor bearings), returning `None` for singular
+(collinear) geometry, which the caller treats as factor 1.0 (no fabrication).
+
+**MEASURED:** `test_gdop_is_real_dilution` — a well-spread array gives a lower GDOP
+than a near-collinear one, cross-checked against the closed form;
+`test_gdop_singular_collinear_is_none` confirms singular geometry returns `None`.
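
The GDOP definition in §5 can be worked through in two dimensions as follows. This sketch assumes a planar geometry and a hypothetical `gdop_2d` signature. With unit bearings as the rows of `H`, `HᵀH` is the 2×2 matrix `[[a, b], [b, d]]`, and the trace of its inverse is `(a + d) / det`.

```rust
/// 2-D GDOP = sqrt(trace((H^T H)^-1)), with H rows = unit target-to-sensor bearings.
/// Returns `None` when the geometry is singular (for example, collinear sensors).
pub fn gdop_2d(target: (f64, f64), sensors: &[(f64, f64)]) -> Option<f64> {
    let (mut a, mut b, mut d) = (0.0, 0.0, 0.0);
    for &(sx, sy) in sensors {
        let (dx, dy) = (sx - target.0, sy - target.1);
        let range = dx.hypot(dy);
        if range == 0.0 {
            return None; // sensor on top of the target: bearing undefined
        }
        let (ux, uy) = (dx / range, dy / range);
        // Accumulate the entries of H^T H.
        a += ux * ux;
        b += ux * uy;
        d += uy * uy;
    }
    let det = a * d - b * b;
    if det.abs() < 1e-9 {
        return None; // singular geometry: no honest GDOP exists
    }
    Some(((a + d) / det).sqrt())
}
```

Three sensors spaced 120° apart around the target give `a = d = 1.5` and `b = 0`, so the closed form is `sqrt(3 / 2.25) = 2/√3`. A caller can map `None` to whatever neutral factor it prefers; the ADR states the existing caller uses 1.0.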
+
+### §6 OccWorld trajectory-prior consumer honesty (fail-safe)
+
+**Finding:** `wifi-densepose-mat` does **not** consume OccWorld trajectory priors
+and has no `-worldmodel`/`-worldgraph`/occworld dependency (grep-verified: zero
+hits across `crates/wifi-densepose-mat/`). There is therefore no random-derived
+prior being consumed. **No code change** is warranted; the fail-safe (ignore
+priors until a typed `weights_complete`/`stubbed` flag exists) is already the
+status quo by absence. Recorded here so a future consumer wires the flag rather
+than re-introducing the risk.
+
+## Negative Results (Confirmed — NO-ACTION)
+
+These were audited and found genuinely correct; they are cited as positives, not
+edited:
+
+- **`worldgraph` provenance / privacy / pruning** (`graph.rs`) — append-with-
+  provenance (`add_semantic_state` + `DerivedFrom`), deterministic LRU pruning
+  (`prune_semantic_states`, with `prune_is_deterministic_for_equal_timestamps`),
+  and a privacy rollup (`apply_privacy_mode` → `PrivacyLimitedBy`). Already-SOTA.
+- **`worldmodel` occupancy clamp** (`occupancy.rs:74–125`) — `to_voxel_xy` /
+  `to_voxel_z` `.clamp()` voxel indices into `[0, GRID-1]`; the flat index is
+  always in-bounds. No out-of-bounds / fabrication path.
+- **`pointcloud` parser bounds-safety** — another agent's crate; cited only, its
+  parser is bounds-checked.
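
The occupancy clamp cited above can be illustrated with a short sketch. The 200×200×16 grid comes from this ADR; the cell size, the origin at zero, the flat-index layout, and the function names are assumptions made for the example and may not match `occupancy.rs`.

```rust
pub const GRID_XY: usize = 200;
pub const GRID_Z: usize = 16;
/// Assumed cell edge length in metres (not taken from the source).
pub const CELL_M: f32 = 0.1;

/// Maps a metric coordinate to a voxel index clamped into `[0, grid_len - 1]`.
pub fn to_voxel(coord_m: f32, grid_len: usize) -> usize {
    let cell = (coord_m / CELL_M).floor();
    // f32::clamp passes NaN through; the `as usize` cast then saturates NaN to 0.
    cell.clamp(0.0, (grid_len - 1) as f32) as usize
}

/// Flat index into a z-major grid; in-bounds for any input, including NaN.
pub fn flat_index(x_m: f32, y_m: f32, z_m: f32) -> usize {
    let x = to_voxel(x_m, GRID_XY);
    let y = to_voxel(y_m, GRID_XY);
    let z = to_voxel(z_m, GRID_Z);
    (z * GRID_XY + y) * GRID_XY + x
}
```

Since each axis index is bounded before the multiplication, the flat index can never exceed `GRID_XY * GRID_XY * GRID_Z - 1`, whatever the input position.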
+
+## Deferred Backlog (Nothing Dropped)
+
+- **Learned multi-person counter** — DATA-GATED on labelled multi-occupant CSI.
+  The zone+vitals-signature dedup (§2) is the honest non-learned stand-in until
+  then.
+- **RF point-cloud generation** — ACCEPTED-FUTURE.
+- **PicoScenes container decode** — DATA-GATED; needs matching NIC/plugin to
+  validate against. Returns `UnsupportedAdapter` today.
+- **Intel 5300 / Atheros live capture** — DATA-GATED on patched drivers; byte
+  parsers exist and are exercised on supplied bytes.
+
+## Consequences
+
+- Triage is now a single auditable function; gate and survivor record can never
+  diverge.
+- Survivor counts cannot inflate from repeat detection of one un-located person.
+- The CSI ingest layer either produces real data or fails with a typed error that
+  names *why* — no path silently substitutes simulated/fabricated CSI.
+- `SensorPosition` grows an optional `last_rssi` field (serde-`default`, non-
+  breaking for deserialisation; 7 constructors updated).
+- A new optional `serial` feature isolates the native `serialport` dependency from
+  the default / appliance builds.
+
+## Reproduction (MEASURED)
+
+```bash
+cd v2
+# MAT — default features (181 unit + 6 + 3[3 ignored] integration)
+cargo test -p wifi-densepose-mat
+# MAT — all features (same counts; exercises ruvector + api + serde paths)
+cargo test -p wifi-densepose-mat --all-features
+# MAT — serial feature compiles (native serialport path)
+cargo check -p wifi-densepose-mat --features serial
+# Sibling crates (cited NO-ACTION; confirmed green)
+cargo test -p wifi-densepose-worldmodel   # 12 + 1
+cargo test -p wifi-densepose-worldgraph   # 9
+cargo test -p wifi-densepose-geo          # 9 + 8
+cargo test -p wifi-densepose-engine       # 27
+```
+
+Result at time of writing: MAT **181 passed; 0 failed** (default and all-features);
+worldmodel **13**, worldgraph **9**, geo **17**, engine **27** — all 0 failed.
@@ -0,0 +1,242 @@
+# ADR-159: Cognitum Appliance Cluster — Beyond-SOTA Sweep, Anti-"AI-Slop" Hardening
+
+- **Status**: accepted
+- **Date**: 2026-06-11
+- **Deciders**: ruv
+- **Tags**: cognitum, cogs, person-count, pose-estimation, ha-matter, drone-swarm, remote-id, manifest, prove-everything
+
+## Context
+
+This ADR records the beyond-SOTA sweep over the Cognitum appliance cluster
+(`cog-person-count`, `cog-pose-estimation`, `cog-ha-matter`, `ruview-swarm`),
+executed under the project's **prove-everything / anti-"AI-slop"** directive: the
+claim surface every cog presents (manifests, descriptions, runtime events,
+broadcast fields) must match what the code and the shipped weights actually do.
+
+### Headline — the "never identified anyone" accusation is REFUTED
+
+A read-only audit raised the worst-class accusation: that these cogs are slop that
+"never identified anyone." That accusation is **refuted by byte-level evidence**:
+
+- `cog-pose-estimation` and `cog-person-count` ship **real, trained Candle models**
+  (`pose_v1.safetensors`, `count_v1.safetensors`), not placeholders. The forward
+  passes (`PoseNet`, `CountNet`) mirror the training scripts exactly and run on
+  real CSI bytes.
+- The artifacts are **SHA-pinned and Ed25519-signed**: the on-disk
+  `manifests/x86_64/manifest.json` carries a real `binary_sha256`
+  (`051614ce…388b3` for person-count, `a434739a…71fa` for pose), a real
+  `weights_sha256`, and a `binary_signature` over `sig_algo: Ed25519`.
+- The manifests are **brutally honest about accuracy**: person-count's
+  `build_metadata` ships `training_class1_accuracy = 0.343` and a candid
+  `training_caveat`; pose ships `training_pck20 = 3.0` / `training_pck50 = 18.5`.
+  Nothing is inflated. That honesty *is* the anti-slop win — the models are weak
+  in the field, and the manifests say so.
+
+So the cogs **do** run real trained inference and **do** disclose how weak it is.
+What the audit correctly found were not fabrications but **claim-surface
+overclaims** — four places where the surface said more than the weights deliver.
+This ADR tightens those four (A1–A4), softens one over-claiming crate
+description (A5), and cites the already-correct subsystems as NO-ACTION positives.
+
+Grading vocabulary follows ADR-152 / ADR-158:
+- **MEASURED** — reproduced in this worktree, command + failing-on-old test recorded.
+- **DATA-GATED** — real code path present; honestly flagged where data/hardware is absent.
+- **NO-ACTION (already-SOTA)** — audited, found correct, cited as a positive.
+- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped.
+
+## Graded SOTA Landscape
+
+| Capability | Grade | Note |
+|------------|-------|------|
+| CSI person counting (`cog-person-count`) | **DATA-GATED** | Real Candle count head + Bayesian fusion; weights trained only on classes 0/1 (presence). Multi-occupant accuracy is genuinely unproven and is **not fabricated** — counts above the trained range are now flagged `low_confidence` and clamped. |
+| CSI pose estimation (`cog-pose-estimation`) | **DATA-GATED** | Real Candle encoder + 17-keypoint head; field accuracy honestly weak (PCK@50 = 18.5%, disclosed in the manifest). The default-install gate bug (A1) is fixed so it actually emits frames. |
+| Signed cog manifests (Ed25519 + SHA-256) | **NO-ACTION (already-SOTA)** | On-disk manifests are real, signed, SHA-pinned, and honest about accuracy. The CLI now emits them verbatim (A4). |
+| HA bridge (`cog-ha-matter`) MQTT + witness | **NO-ACTION (already-SOTA)** | Real Ed25519 hash-chain witness, mDNS, embedded broker. Matter commissioning is honestly deferred to v0.8 (TLS off, LAN-only) — description softened to stop claiming Matter (honest-absence). |
+| Drone-swarm MARL (`ruview-swarm`) | **DATA-GATED / honest** | `candle_ppo.rs` is real autodiff PPO; it is **untrained at runtime** (random init) by design — the swarm must be trained before deploy, which the code does not hide. |
+| ASTM F3411 Remote ID | **MEASURED (A3)** | Basic ID message is real; the Location/Vector message is honestly *not* implemented (NED metres are no longer mislabelled as WGS84 lat/lon). |
+
+## Decision — Fixes Landed (MEASURED)
+
+### §A1 Pose runtime emitted ZERO frames under default config (HIGH)
+
+**Overclaim (silent correctness bug):** `inference.rs` hardcoded
+`confidence: 0.185` for every inference, `config.rs default_min_confidence()`
+returned `0.3`, and `runtime.rs` gated emission on `confidence >= min_confidence`.
+A default install therefore **never emitted a single `pose.frame`** while
+`health` reported healthy — the cog *claimed* to be a running pose estimator but
+silently produced nothing.
+
+**Real fix:** `pose_v1` has **no confidence head** (the head emits 34 keypoint
+coordinates only), so a real per-frame confidence is genuinely unavailable.
+Rather than silently lowering the threshold, we took the disclosed route:
+- Introduced `inference::MODEL_TYPICAL_CONFIDENCE = 0.185` (the validation PCK@50)
+  as the single published per-frame confidence, used by both `infer()` and the
+  config default.
+- Pinned `default_min_confidence()` to `MODEL_TYPICAL_CONFIDENCE` so a default
+  install clears its own gate and emits.
+- Documented the trade-off in the config field doc, the JSON schema
+  (`default` 0.3 → 0.185, with a description), **and** added a `run.started`
+  warning in `main.rs` that fires when an operator raises `min_confidence` above
+  the model's typical confidence — so a deliberately-high threshold is loud, not
+  silent.
+
+**Failing-on-old test:** `cog_pose_estimation` smoke
+`default_config_emits_frames_with_real_model` — parses a default config and
+asserts `min_confidence <= MODEL_TYPICAL_CONFIDENCE` (and, with the real model
+loaded, that `infer().confidence >= min_confidence`). **Proven to fail** on the
+old `default_min_confidence()=0.3`:
+`default min_confidence 0.3 exceeds model typical confidence 0.185 — a default
+install would emit zero pose.frame events`.
+
+**Grade: MEASURED.**
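
The shared-constant arrangement from §A1 can be sketched as below. `MODEL_TYPICAL_CONFIDENCE` and `default_min_confidence` are named in this ADR; `should_emit` and `threshold_warning` are hypothetical helpers, and the warning text is illustrative only.

```rust
/// Validation PCK@50 published as the single per-frame confidence.
pub const MODEL_TYPICAL_CONFIDENCE: f32 = 0.185;

/// The default threshold is pinned to the model's typical confidence.
pub fn default_min_confidence() -> f32 {
    MODEL_TYPICAL_CONFIDENCE
}

/// Emission gate: a frame is emitted only when it clears the threshold.
pub fn should_emit(confidence: f32, min_confidence: f32) -> bool {
    confidence >= min_confidence
}

/// Hypothetical startup check: returns a warning when the operator's threshold
/// is above anything the model can report, so the dropout is loud, not silent.
pub fn threshold_warning(min_confidence: f32) -> Option<String> {
    if min_confidence > MODEL_TYPICAL_CONFIDENCE {
        Some(format!(
            "min_confidence {min_confidence} exceeds model typical confidence \
             {MODEL_TYPICAL_CONFIDENCE}; no pose.frame events will be emitted"
        ))
    } else {
        None
    }
}
```

Deriving the default from the same constant that `infer()` reports means the two values cannot drift apart again, which is the failure mode the old `0.3` default had.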
+
+### §A2 8-class count head on a 2-class-trained model (MEDIUM)
+
+**Overclaim:** `inference.rs COUNT_CLASSES = 8` with argmax over {0..7}, but
+`count_train_results.json` has support only for classes 0 and 1 (`per_class_accuracy`
+keys `"0"`/`"1"`). The model is a **presence detector**, not a calibrated
+multi-occupant counter; an argmax on classes 2..=7 is out-of-distribution, yet the
+cog would emit it as a confident headcount. The Cargo.toml billed it as a
+"learned multi-person counter."
+
+**Real fix (no network change — DATA-GATED, accuracy not fabricated):**
+- Added `inference::MAX_TRAINED_CLASS = 1`, plus `CountPrediction::is_low_confidence()`
+  (argmax beyond the trained ceiling) and `clamped_count()` (report clamped to the
+  trained range, raw argmax kept for audit).
+- `person.count` events now carry `low_confidence` + `raw_count`, and downgrade to
+  `level: "warn"` when out-of-distribution; the reported `count` is clamped so we
+  never emit a fabricated headcount the weights can't back.
+- `run.started` discloses `count_max_trained_class` and `count_classes`.
+- Cargo.toml description changed from "learned multi-person counter" to
+  "presence detector + (data-gated) person count".
+
+**Failing-on-old test:** `cog_person_count` smoke
+`untrained_class_argmax_is_flagged_low_confidence` — a prediction whose argmax is
+class 5 is asserted `is_low_confidence() == true` and `clamped_count() ==
+MAX_TRAINED_CLASS`; a class-1 prediction is asserted *not* flagged. Fails on old
+code (no such methods/flag existed).
+
+**Grade: MEASURED (mechanism); multi-occupant accuracy DATA-GATED.**
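
The flag-and-clamp mechanism from §A2 can be sketched as follows. The constants and method names follow this ADR; the `probs` field, the `raw_count` helper, and the argmax implementation are assumptions for illustration.

```rust
pub const COUNT_CLASSES: usize = 8;
/// Highest class with training support (classes 0 and 1 only).
pub const MAX_TRAINED_CLASS: usize = 1;

pub struct CountPrediction {
    /// Per-class scores from the count head (hypothetical field name).
    pub probs: [f32; COUNT_CLASSES],
}

impl CountPrediction {
    /// Raw argmax over all classes, kept for audit.
    pub fn raw_count(&self) -> usize {
        self.probs
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(class, _)| class)
            .unwrap_or(0)
    }

    /// True when the argmax lies beyond the trained range (out-of-distribution).
    pub fn is_low_confidence(&self) -> bool {
        self.raw_count() > MAX_TRAINED_CLASS
    }

    /// Count reported to consumers, never above what the weights can back.
    pub fn clamped_count(&self) -> usize {
        self.raw_count().min(MAX_TRAINED_CLASS)
    }
}
```

Keeping `raw_count` alongside the clamped value preserves the model's actual output for later analysis while the published `count` stays inside the trained range.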
+
+### §A3 Remote ID broadcast NED metres as WGS84 lat/lon (MEDIUM — safety/compliance)
+
+**Overclaim (compliance hazard):** `security/remote_id.rs update()` stored
+`state.position.x/.y` (NED **metres**) into `drone_lat`/`drone_lon`, so the Remote
+ID broadcast would carry physically-impossible coordinates (e.g. "latitude =
+37.5 m"). The module doc claimed a "Basic ID + Location/Vector message," but only
+`encode_basic_id()` exists.
+
+**Real fix (honest naming — never broadcast impossible coordinates):**
+- Renamed `drone_lat`/`drone_lon` → `drone_north_m`/`drone_east_m` (NED metres
+  relative to the operator/takeoff datum), with field docs stating they are *not*
+  geodetic. `operator_lat`/`operator_lon` remain true WGS84 (from the operator's
+  GNSS).
+- Corrected the module doc to claim **Basic ID only**; the Location/Vector encoder
+  is explicitly deferred until a datum-anchored NED→WGS84 transform lands
+  (ACCEPTED-FUTURE). No real feature is removed, because that encoder never existed.
+
+**Failing-on-old test:** `security::remote_id::tests::test_ned_offset_stored_as_metres_not_latlon`
+— a 37.5 m north / −12.0 m east NED offset is asserted to land in
+`drone_north_m`/`drone_east_m`; the operator's real WGS84 fix stays in range. Fails
+on old code, where these values were stored into `drone_lat`/`drone_lon`.
+
+**Grade: MEASURED.**
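
A small sketch of the §A3 field split, keeping NED metres and WGS84 degrees in separately named fields. The four field names follow this ADR; the struct name, constructor, `update` signature, and range check are hypothetical, and the operator coordinates used in any example are arbitrary.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct RemoteIdState {
    /// Drone offset north of the takeoff datum, in metres (NOT geodetic).
    pub drone_north_m: f64,
    /// Drone offset east of the takeoff datum, in metres (NOT geodetic).
    pub drone_east_m: f64,
    /// Operator latitude in WGS84 degrees.
    pub operator_lat: f64,
    /// Operator longitude in WGS84 degrees.
    pub operator_lon: f64,
}

impl RemoteIdState {
    pub fn new(operator_lat: f64, operator_lon: f64) -> Self {
        Self { drone_north_m: 0.0, drone_east_m: 0.0, operator_lat, operator_lon }
    }

    /// Stores the NED offset as metres; it never touches the geodetic fields.
    pub fn update(&mut self, north_m: f64, east_m: f64) {
        self.drone_north_m = north_m;
        self.drone_east_m = east_m;
    }

    /// True when the operator fix is a physically possible WGS84 coordinate.
    pub fn operator_fix_is_valid(&self) -> bool {
        (-90.0..=90.0).contains(&self.operator_lat)
            && (-180.0..=180.0).contains(&self.operator_lon)
    }
}
```

With the unit carried in the field name, a 37.5 m offset can no longer be mistaken for a latitude by a downstream encoder.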
+
+### §A4 Hollow CLI manifest (LOW)
+
+**Overclaim:** `cog-person-count main.rs cmd_manifest` emitted a null skeleton
+(`binary_sha256: null`, no training metadata), making the CLI look unsigned even
+though the **real signed manifest** existed at
+`cog/artifacts/manifests/x86_64/manifest.json`.
+
+**Real fix:** new `cog_person_count::manifest` module `include_str!`-embeds the
+real signed manifests (x86_64 + arm), selected by build target arch.
+`cmd_manifest` now parses-then-emits the embedded signed manifest — exactly the
+pattern `cog-pose-estimation`'s `manifest_roundtrips` test demonstrates. The CLI
+now reports the real `binary_sha256`, `weights_sha256`, Ed25519 signature, and
+honest `build_metadata` (`training_class1_accuracy = 0.343`).
+
+**Failing-on-old test:** `manifest::tests::embedded_manifest_has_non_null_binary_sha256`
+asserts a 64-hex-char `binary_sha256`; companions assert the embedded manifest is
+signed (`sig_algo == Ed25519`) and `id == COG_ID`. End-to-end verified:
+`cog-person-count manifest` prints `binary_sha256:
+051614ce6ba63df704fae848a67ad095df4bb88862fdff05ef3c0419cc8388b3`.
+
+**Grade: MEASURED.**
+
+### §A5 cog-ha-matter description claimed Matter before it exists (LOW — honest-labeling)
+
+**Overclaim:** the Cargo.toml description said "Home Assistant + Matter
+integration," but Matter commissioning is deferred to v0.8 (`TlsConfig::Off`,
+LAN-only, asserted by `runtime.rs tls_defaults_to_off_for_v1_lan_only`).
+
+**Real fix (no code change):** softened the description to "Home Assistant (MQTT)
+integration … LAN-only (no TLS); Matter Bridge commissioning is deferred to v0.8
+and not yet implemented." Mirrors ADR-158 §6 honest-absence: state what isn't
+there rather than implying it is.
+
+**Grade: MEASURED (label).**
+
+## Negative Results (Confirmed — NO-ACTION positives)
+
+Audited and found genuinely correct; cited as positives, not edited:
+
+- **`cog-ha-matter` witness chain** (`witness.rs` / `witness_signing.rs`) — real
+  Ed25519 hash-chained witness log. Already-SOTA.
+- **`cog-person-count` fusion** (`fusion.rs`) — real Bayesian product-of-experts
+  multi-node fusion (Stoer-Wagner-bounded clip), not a heuristic. Already-SOTA.
+- **`ruview-swarm` PPO** (`marl/candle_ppo.rs`) — real Candle autodiff PPO with a
+  genuine policy-gradient update; its `randn` uses (init, action sampling,
+  exploration) are all legitimate, not fake-output substitutes. Untrained at
+  runtime by design (the swarm must be trained before deploy), which the code
+  does not hide. Already-SOTA / honest.
+
+## Deferred Backlog (Nothing Dropped)
+
+- **Multi-occupant count accuracy** — DATA-GATED on labelled multi-occupant CSI.
+  The `low_confidence` flag + clamp (§A2) is the honest stand-in until then.
+- **Remote ID Location/Vector message** — ACCEPTED-FUTURE; requires a
+  datum-anchored local-tangent-plane NED→WGS84 transform with an operator datum.
+  Basic ID ships today.
+- **Matter Bridge commissioning** — ACCEPTED-FUTURE (v0.8); LAN-only MQTT ships today.
+- **Criterion benches** for cog inference latency and `mesh_guard` — ACCEPTED-FUTURE
+  (cold-start timings are recorded in the manifests' `build_metadata`, not yet a
+  regression bench).
+- **`wasm-edge` skill accuracy** — unvalidated; **now honestly labelled, not
+  claimed** (done in ADR-160: medical/affect/security/exotic claim surfaces
+  disclaimed, renamed, and feature-gated; per-skill accuracy remains DATA-GATED).
+
+## Consequences
+
+- A default pose-estimation install now actually emits `pose.frame` events;
+  raising the threshold above the model's reach is a loud `run.started` warning,
+  not a silent dropout.
+- A person-count reading on an untrained class is flagged `low_confidence`,
+  clamped, and downgraded to `warn` — no fabricated headcounts.
+- The Remote ID broadcast can never carry physically-impossible coordinates; NED
+  metres live in honestly-named metre fields.
+- `cog-person-count manifest` now reports the real signed manifest instead of a
+  hollow null skeleton.
+- No cog Cargo.toml description claims a capability (multi-person counting, Matter)
+  the code/weights don't yet deliver.
+
+## Reproduction (MEASURED)
+
+```bash
+cd v2
+cargo test -p cog-person-count -p cog-pose-estimation -p cog-ha-matter -p ruview-swarm \
+  --no-default-features
+# ruview-swarm train path compiles (PPO autodiff)
+cargo check -p ruview-swarm --features train
+# A4 end-to-end — real signed manifest, non-null binary_sha256
+cargo run -q -p cog-person-count --no-default-features -- manifest
+```
+
+Result at time of writing (all 0 failed):
+- `cog-person-count` — **19 passed** (lib 10 incl. 3 manifest; smoke 9)
+- `cog-pose-estimation` — **8 passed** (smoke)
+- `cog-ha-matter` — **64 passed** (unchanged; description-only edit)
+- `ruview-swarm` — **117 passed** (default features); `--features train` compiles clean.
+
+Scope was limited to the four named crates. NO-ACTION positives (witness chain,
+fusion, PPO + randn audit) were verified by inspection and left untouched.
@@ -0,0 +1,257 @@
+# ADR-160: Edge Skill Library (`wifi-densepose-wasm-edge`) — Honest Labeling & Soundness Cleanup
+
+- **Status**: accepted
+- **Date**: 2026-06-11
+- **Deciders**: ruv
+- **Tags**: wasm-edge, esp32, edge-skills, claim-surface, medical-overclaim, affect, prove-everything, soundness, static-mut
+- **Amends**: ADR-159 (deferred-backlog line for wasm-edge now TRUE)
+
+## Context
+
+Beyond-SOTA sweep Milestone 6, over `v2/crates/wifi-densepose-wasm-edge` only,
+executed under the project's **prove-everything / anti-"AI-slop"** directive.
+
+### Headline — 0 stubs, 0 theater, all real DSP (REFUTES the slop accusation)
+
+A read-only audit found this crate has **zero stubs and zero fake-output theater:
+every one of the ~70 edge skills runs real DSP** (Welford statistics,
+autocorrelation, DTW, sliced-Wasserstein, ISTA-style recovery, Kalman/HNSW, etc.).
+The forward paths are genuine signal processing on real CSI-derived inputs. That
+is the anti-slop win and it is cited here as a positive, not a fabrication.
+
+What the audit correctly found was **not fake code but an over-confident claim
+surface**: skill *names* and doc-comments asserting clinical/affective/security
+capabilities that the **unvalidated** code cannot back, concentrated in the
+medical (`med_*`) and affect (`exo_happiness`/`exo_emotion`) skills. The fix is
+**honest labeling — making the labels TRUE — NOT making the claimed capability
+real.** You cannot validate seizure detection, affect inference, or weapon
+discrimination without clinical/labelled data and reference standards; this ADR
+does not pretend to. It disclaims, renames, softens, and feature-gates so the
+surface matches what the DSP actually delivers.
+
+Grading vocabulary follows ADR-152 / ADR-158 / ADR-159:
+- **MEASURED** — reproduced in this worktree, command + failing-on-old test recorded.
+- **DATA-GATED** — real code path present; honestly flagged where data is absent.
+- **NO-ACTION (already-honest)** — audited, found correct, cited as a positive.
+- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped.
+
+## Per-prefix classification
+
+| Prefix | Class | Note |
+|--------|-------|------|
+| `sig_*` (signal intelligence) | **REAL-DSP, honest** | Algorithm-named (flash-attention, sparse-recovery, optimal-transport, temporal-compress, mincut). Names describe the math, not an overclaimed outcome. NO-ACTION on labels; A5 soundness applied. |
+| `lrn_*` (adaptive learning) | **REAL-DSP, honest** | DTW/EWC/meta-adapt/attractor — algorithm-named. NO-ACTION on labels; A5 applied. |
+| `spt_*` / `tmp_*` | **REAL-DSP, honest** | PageRank/HNSW/spiking-tracker; LTL-guard/GOAP/pattern-sequence. Algorithm-named. NO-ACTION on labels; A5 applied. |
+| `qnt_*` | **REAL-DSP, honest (disclosed analogy)** | "quantum-**inspired**" / Grover-**inspired** are already disclosed analogies. NO-ACTION (DO-NOT-touch); A5 applied (mechanical, no label/behavior change). |
+| `bld_*` / `ret_*` / `ind_*` / `occupancy`/`intrusion` | **REAL-DSP, honest** | Occupancy/queue/forklift/clean-room etc. describe physical observables. NO-ACTION on labels; A5 applied. |
+| `sec_weapon_detect` | **REAL-DSP, overclaiming NAME** → fixed (A3) | Variance-ratio reflectivity renamed off "weapon". |
+| `med_*` (5) | **REAL-DSP, overclaiming NAME/DOC** → fixed (A1) | Clinical detection asserted as fact; now disclaimed + softened + feature-gated. |
+| `exo_happiness` / `exo_emotion` | **REAL-DSP, overclaiming NAME/DOC** → fixed (A2) | Affect outputs reframed as proxies; uncited stat removed. |
+| `exo_dream_stage` / `exo_gesture_language` | **REAL-DSP, quasi-medical/over-named** → fixed (A4) | Disclaimers added; Research tag promoted to header. |
+| `exo_time_crystal` / `exo_ghost_hunter` | **REAL-DSP, honest novelty** | Disclosed exploratory/novelty skills. NO-ACTION (DO-NOT-touch); A5 applied. |
+| `nvsim` | out of scope | Disclaimer gold standard; copied its tone. |
+
+## Decision — Fixes Landed
+
+### §A1 Medical overclaim (HIGH) — MEASURED
+
+The five `med_*` modules (`med_seizure_detect`, `med_cardiac_arrhythmia`,
+`med_respiratory_distress`, `med_sleep_apnea`, `med_gait_analysis`) stated clinical
+detection as fact with no disclaimer ("Detects tonic-clonic seizures…").
+
+**Real fix (honest labeling — the DSP is kept, untouched):**
+- **(a)** Every module's `//!` header now carries a mandatory disclaimer block,
+  modelled on `sec_weapon_detect.rs` and `nvsim/src/lib.rs`: *"EXPERIMENTAL
+  RESEARCH MODULE — NOT VALIDATED AGAINST CLINICAL DATA. NOT A MEDICAL DEVICE.
+  Flags candidate <X>-like signatures only,"* citing ADR-160.
+- **(b)** Doc verbs softened: *"Detects tonic-clonic seizures"* →
+  *"Flags candidate tonic-clonic-seizure-like motion signatures (experimental)"*;
+  similarly for cardiac/respiratory/apnea/gait.
+- **(c)** All five gated behind a new **non-default** cargo feature
+  `medical-experimental` (`#[cfg(feature = "medical-experimental")]` in `lib.rs`,
+  `medical-experimental = []` in `Cargo.toml`, **not** in `default`) so they cannot
+  be silently built into a shipping artifact.
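+
+The gate in (c) can be sketched as follows. This is a minimal illustration, not
+the crate's source: the inline module `med_demo` and the helper
+`registered_skills` are hypothetical stand-ins, and only the feature name
+`medical-experimental` and the `#[cfg(feature = ...)]` attribute come from the
+description above.
+
+```rust
+// Stand-in for a `med_*` module. It only exists when the non-default feature
+// is switched on, so a default build cannot link it.
+#[cfg(feature = "medical-experimental")]
+pub mod med_demo {
+    pub const NAME: &str = "med_demo";
+}
+
+/// Hypothetical registry helper: lists the skills compiled into this build.
+#[allow(unused_mut)]
+pub fn registered_skills() -> Vec<&'static str> {
+    let mut skills = vec!["occupancy", "intrusion"];
+    // Compiled out entirely unless `medical-experimental` is enabled.
+    #[cfg(feature = "medical-experimental")]
+    skills.push(med_demo::NAME);
+    skills
+}
+```
+
+In a default build the gated module and the `push` are both removed at compile
+time, so the medical skill cannot be reached by accident.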
+
+**Failing-on-old tests** (`tests/honest_labeling.rs`):
+`a1_med_modules_have_clinical_disclaimer`,
+`a1_med_modules_gated_behind_medical_experimental`,
+`a1_seizure_verbs_softened`. All fail on the old, undisclaimed, ungated source.
+**Grade: MEASURED (label); per-skill clinical accuracy DATA-GATED.**
+
+### §A2 Affect overclaim (HIGH) — MEASURED
+
+`exo_happiness_score.rs` carried an **uncited** "Happy people walk ~12% faster"
+statistic and emits `HAPPINESS_SCORE`; `exo_emotion_detect.rs` emits
+`STRESS_INDEX`/`CALM_DETECTED`/`AGITATION_DETECTED`.
+
+**Real fix (honest labeling — math kept):**
+- Deleted the uncited "12% faster" / "~12% above" / "Happy people walk" statements.
+- Added a prominent *"speculative, unvalidated affect heuristic; outputs are NOT
+  measurements of emotion"* disclaimer to both `//!` headers, citing ADR-160.
+- Reframed `HAPPINESS_SCORE` in the docs as a **"gait-energy proxy, not a validated
+  affect measure."**
+
+**Failing-on-old tests:** `a2_affect_modules_have_unvalidated_disclaimer`,
+`a2_uncited_12_percent_stat_removed`, `a2_happiness_reframed_as_proxy`.
+**Grade: MEASURED (label); affect validity DATA-GATED.**
+
+### §A3 Security event-name overclaim (MEDIUM) — MEASURED
+
+`sec_weapon_detect.rs`'s module doc was already honest (research-grade,
+calibration-required), but the event/const names claimed weapon-grade
+discrimination a variance ratio cannot deliver.
+
+**Real fix (honest physical-quantity naming — behavior unchanged):**
+- `EVENT_WEAPON_ALERT` → `EVENT_HIGH_METAL_REFLECTIVITY` (event id 221 unchanged).
+- `WEAPON_RATIO_THRESH` → `HIGH_REFLECTIVITY_THRESH`.
+- Internal fields/consts renamed (`weapon_run`→`high_refl_run`,
+  `cd_weapon`→`cd_high_refl`, `WEAPON_DEBOUNCE`→`HIGH_REFLECTIVITY_DEBOUNCE`).
+- `lib.rs` `event_types` registry: `WEAPON_ALERT` → `HIGH_METAL_REFLECTIVITY`.
+- A reflectivity-vs-weapons honest-naming note added to the header.
+
+The detector still flags a high amplitude-variance/phase-variance ratio (real RF
+reflectivity); it simply no longer *names* that ratio "weapon".
+
+**Failing-on-old tests:** `a3_weapon_names_renamed_to_reflectivity`,
+`a3_registry_no_longer_exports_weapon_alert` (registry no longer exports a
+`WEAPON_ALERT` name). **Grade: MEASURED.**
+
+### §A4 Quasi-medical / sign-language exotic modules (MEDIUM) — MEASURED
+
+`exo_dream_stage.rs` ("sleep stage classification", quasi-medical) and
+`exo_gesture_language.rs` ("sign language letter recognition") both described
+a capability in their docs without any validation disclaimer.
+
+**Real fix (honest labeling — DSP kept):** added an experimental "NOT VALIDATED"
+disclaimer to each `//!` header (citing ADR-160) and promoted the
+**Exotic/Research** registry tag into the header where a reader sees it.
+`exo_gesture_language` additionally states it is a coarse gesture-cluster
+classifier that **does not recognize true sign language** (never evaluated on a
+labelled ASL set).
+
+**Failing-on-old test:** `a4_exotic_modules_have_experimental_disclaimer`.
+**Grade: MEASURED (label); accuracy DATA-GATED.**
+
+### §A5 `static mut` event-buffer soundness (MEDIUM) — the one real code fix — MEASURED
+
+~61 per-call event scratch buffers across the crate used a module-level
+`static mut EVENTS: [(i32,f32); N]` (a handful named `EV`/`TE`/`EMPTY`) and returned
+`&EVENTS[..n]`. On a `cdylib`+`rlib` linkable into multithreaded/reentrant host
+code this is latent aliasing UB, and `static_mut_refs` is deny-by-default in the
+Rust 2024 edition (warn-by-default in earlier editions).
+
+**Real fix (mechanical, behavior-preserving):** moved each scratch buffer off
+`static mut` into an **owned per-instance field** (`events: [(i32,f32); N]` on the
+detector struct, written via `&mut self` and returned as `&self.events[..n]`). The
+public `-> &[(i32, f32)]` signature is **unchanged**, so no caller (in-module
+tests, `ghost_hunter` bin, `budget_compliance`) needed editing. Two helper methods
+that built events under `&self` (`spt_pagerank_influence::build_events`,
+`spt_spiking_tracker::build_events`) and `sig_temporal_compress::on_timer` were
+promoted to `&mut self`. Leftover now-redundant `unsafe { }` wrappers were removed.
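+
+The shape of the change can be sketched like this. It is a minimal illustration
+under stated assumptions: `Detector`, `MAX_EVENTS`, the event id `100`, and the
+threshold logic are all hypothetical; only the owned `events` field, the
+`&mut self` write, and the unchanged `-> &[(i32, f32)]` return type reflect the
+fix described above.
+
+```rust
+/// Hypothetical capacity; each real module picks its own `N`.
+const MAX_EVENTS: usize = 4;
+
+/// Illustrative detector (not one of the crate's modules).
+/// Old pattern: a module-level `static mut EVENTS` shared by every caller.
+/// New pattern: the scratch buffer is owned by the instance.
+pub struct Detector {
+    events: [(i32, f32); MAX_EVENTS],
+    threshold: f32,
+}
+
+impl Detector {
+    pub const fn new(threshold: f32) -> Self {
+        Self { events: [(0, 0.0); MAX_EVENTS], threshold }
+    }
+
+    /// Writes events through `&mut self` and returns a slice borrowed from
+    /// `self`, so the borrow checker rules out aliasing without `unsafe`.
+    pub fn process_frame(&mut self, amplitudes: &[f32]) -> &[(i32, f32)] {
+        let mut n = 0;
+        for &a in amplitudes {
+            if a > self.threshold && n < MAX_EVENTS {
+                self.events[n] = (100, a); // 100 is a made-up event id
+                n += 1;
+            }
+        }
+        &self.events[..n]
+    }
+}
+```
+
+Because the returned slice borrows from `self`, a second `process_frame` call
+cannot overwrite events a caller is still reading, which is the guarantee the
+`static mut` version could not give.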
+
+**Count: 61 scratch buffers across 60 module files fixed.** The only `static mut`
+items left in `src/` are the two **legitimate WASM module singletons**,
+`lib.rs STATE` and `bin/ghost_hunter.rs DETECTOR`: both are
+`#[cfg(target_arch="wasm32")]`, `#[no_mangle]`, accessed via
+`core::ptr::addr_of_mut!`, and single-threaded by the wasm runtime contract.
+These are *not* the aliasing-UB scratch pattern and are left as-is.
+
+**Verification:** the full host build (`--features std` and
+`std,medical-experimental`) compiles with **0 warnings** — there is no longer any
+`static mut <name>` + `&<name>` source for `static_mut_refs` to fire on in the 60
+fixed modules. (The pure-`wasm32-unknown-unknown` build, where the lint is
+deny-by-default, could not be run in this worktree because the `wasm32` target is
+not installed on the build toolchain; the source-level elimination is the
+evidence, asserted per-module by `a5_claim_bearing_modules_have_no_static_mut_event_buffer`.)
+**Grade: MEASURED (source-eliminated; residual = 2 legitimate singletons).**
+
+## Negative Results (NO-ACTION positives — cited, not edited for labels)
+
+Audited and found genuinely honest; cited as positives:
+- **`qnt_quantum_coherence.rs`** — discloses "quantum-**inspired**" analogy.
+- **`exo_time_crystal.rs`**, **`exo_ghost_hunter.rs`** — disclosed exploratory/novelty.
+- **`qnt_interference_search.rs`** — disclosed "Grover-**inspired**".
+- **`sig_*` / `lrn_*`** algorithm-named skills — names describe the DSP, not an outcome.
+- **`nvsim`** — out of scope; the project's disclaimer gold standard (its tone was
+  copied into the A1/A2/A4 disclaimers).
+
+(These were A5-soundness-fixed mechanically where they used `static mut`, with no
+label or behavior change, consistent with leaving their claim surface intact.)
+
+## Deferred Backlog (Nothing Dropped)
+
+- **Per-skill accuracy validation** — **PARTIALLY MEASURED-on-synthetic**
+  (2026-06-13). For the subset of skills whose detection target is *constructible*
+  with known ground truth, a synthetic-ground-truth harness
+  (`tests/synthetic_validation.rs`, 12 tests) plants signals with known answers,
+  runs the real detector, and **measures** detection accuracy / rate-error:
+  `vital_trend`, `exo_time_crystal` (periodic-vs-aperiodic — its sub-harmonic-vs-
+  clean-period claim is NOT separable, recorded honestly), `exo_ghost_hunter`
+  (hidden breathing), `occupancy`, `intrusion`, `exo_rain_detect`,
+  `sig_flash_attention` (8/8 peak localization), `spt_spiking_tracker` (4/4 zone
+  localization, sparse plant), `sig_optimal_transport`, `sig_mincut_person_match`
+  (0 id-swaps), `lrn_dtw_gesture_learn` (enrollment) — all 1.000 where claimed;
+  `sig_sparse_recovery`'s recovery accuracy is reported **negative** (−2.2% vs
+  unrecovered baseline) — only its trigger path is validated. Full numbers +
+  reproduce commands in `benchmarks/edge-skills/RESULTS.md`.
+  The **med_*/affect/sign-language/weapon** claims remain **DATA-GATED**:
+  validating them requires labelled clinical/affective/ASL/metal-object data and
+  reference standards that do not exist in this repo. Planting a "seizure-/weapon-/
+  happy-like" synthetic signal validates nothing real and is explicitly refused;
+  RESULTS.md lists each with the real data it needs. The disclaimers + feature gate
+  are the honest stand-in. Nothing is claimed that is not measured.
+- **Unified edge pipeline** — **MEASURED** (2026-06-13). `src/pipeline_all.rs`
+  (`EdgePipeline`) + `src/skill_registry.rs` register **every** runtime skill
+  behind one uniform `EdgeSkill` trait and run them all per CSI frame; `med_*` are
+  registered only under `--features medical-experimental` (preserves the §A1 gate).
+  `tests/pipeline_all.rs` (4 tests) proves all 59 default / 64 medical skills run
+  without panic over 300 synthetic frames with a well-formed aggregated event
+  stream. `examples/run_all_skills.rs` is a runnable demo. No skill DSP changed.
+- **Criterion benches for `process_frame` budget claims** — **DONE (host)**
+  (ADR-163, 2026-06-12). `benches/process_frame_bench.rs` benches the heaviest
+  hot paths (`exo_time_crystal` 256×128 autocorrelation, `exo_ghost_hunter`
+  periodicity, `sec_weapon_detect` per-subcarrier Welford, `med_seizure_detect`
+  clonic rhythm) and reports committed **host** medians
+  (`benchmarks/edge-latency/RESULTS.md`). `tests/budget_compliance.rs` continues
+  to assert the L/S/H tier wall-clock budgets (25 tests, passing). **ESP32-on-
+  hardware (Xtensa/WASM3) latency remains PENDING** — the host bench is an
+  upper-bound algorithm-cost proxy, NOT the ESP32 figure (needs hardware).
+- **`wasm32-unknown-unknown` `static_mut_refs` confirmation** — **ACCEPTED-FUTURE**
+  (toolchain): the source pattern is eliminated; a CI job on the wasm target should
+  assert zero `static_mut_refs` once the target is added to the build image.
+- **The 2 residual `static mut` singletons** (`lib.rs STATE`, `ghost_hunter DETECTOR`)
+  — **ACCEPTED-FUTURE**: these are the canonical wasm module-state pattern; migrating
+  them to a safe cell is a separate, larger change with no current UB (single-threaded
+  wasm runtime, `addr_of_mut!` access).
+
+## Reproduction (MEASURED)
+
+```bash
+cd v2/crates/wifi-densepose-wasm-edge   # excluded from the v2 workspace; build here
+cargo test --features std                          # default
+cargo test --features std,medical-experimental     # med_* skills enabled
+cargo test --no-default-features --features std     # no default-pipeline
+cargo test --features std --test honest_labeling   # A1–A5 label invariants
+```
+
+(`std` is required for host tests: the crate is `no_std` for `wasm32`, so a pure
+`--no-default-features` build works only on `wasm32-unknown-unknown`; on the
+host it intentionally has no panic handler.)
+
+Result at time of writing (all 0 failed):
+- **DEFAULT** (`--features std`) — **615 passed** (lib 504; budget 25; honest_labeling 10; bench 1; vendor 75)
+- **MEDICAL** (`--features std,medical-experimental`) — **653 passed** (lib 542; +38 med_* tests; others unchanged)
+- **NO-DEFAULT** (`--no-default-features --features std`) — **615 passed**
+- Full host build emits **0 warnings**; **61** `static mut` scratch buffers eliminated, **2** legitimate wasm singletons remain.
+
+## Consequences
+
+- No edge skill's name or doc-comment claims a clinical, affective, security, or
+  sign-language capability the unvalidated DSP cannot back.
+- The five medical skills cannot be silently compiled into a shipping artifact
+  (non-default `medical-experimental` gate).
+- The security skill can never emit a "weapon alert" — it reports
+  `HIGH_METAL_REFLECTIVITY`, the physical quantity it actually measures.
+- The latent `static mut` aliasing-UB / `static_mut_refs` exposure is removed from
+  60 modules; the public API and all runtime behavior are unchanged (615/653 tests
+  prove behavior preservation).
+- ADR-159's deferred-backlog statement *"wasm-edge … honestly labelled, not
+  claimed"* is now actually TRUE.
@@ -0,0 +1,338 @@
+# ADR-161: HOMECORE Server Layer — WebSocket Auth Bypass, Reply-Theater & Documented-but-No-Op Automation (Security & Honest Labeling)
+
+- **Status**: accepted
+- **Date**: 2026-06-12
+- **Deciders**: ruv
+- **Tags**: homecore, http-ws-boundary, websocket-auth-bypass, security, automation-engine, documented-no-op, prove-everything, soundness, honest-labeling
+- **Amends**: ADR-130 (HOMECORE-API WS protocol), ADR-129 (HOMECORE-AUTO automation engine), ADR-128 (plugin manifest)
+
+## Context
+
+Beyond-SOTA sweep **Milestone 7**, over the HOMECORE **server/network layer**
+crates only — `homecore-api`, `homecore-server`, `homecore-automation`,
+`homecore-hap`, `homecore-plugins` — executed under the project's
+**prove-everything / anti-"AI-slop"** directive.
+
+### Headline — the library cores are real, but the network boundary was unsound
+
+The same audit pattern as ADR-160 held for the *library logic*: the automation
+trigger/condition/template/action evaluators, the REST handlers, the HAP
+mapping, and the plugin manifest parser are **real, tested code** — not stubs.
+That is the anti-slop positive and it is cited here as such.
+
+What the audit found was **not fake business logic but an unsound trust
+boundary plus documented-but-no-op features**:
+
+1. A **CRITICAL WebSocket authentication bypass** — the WS handshake accepted
+   any non-empty token, ignoring the provisioned token whitelist the REST path
+   enforces.
+2. **Reply-theater** — WS command responses were computed, then logged and
+   **discarded**; no `result`/`pong`/`event` ever reached the client.
+3. **Documented-but-idle automation** — the engine was constructed and dropped
+   (never started); time triggers, `RunMode`, `Choose` branches, and template
+   conditions were each **documented as working but were no-ops in the live
+   path**.
+
+This is a worse class than ADR-160's over-naming: here the **doc claimed a
+capability the code did not deliver** (auth enforcement, reply transport,
+running automations). The fix is **implement where feasible, honestly relabel
+where not — never leave a false doc.** Every fix is pinned by a test that
+**fails on the old code**.
+
+Grading vocabulary (ADR-152 / ADR-158 / ADR-160):
+- **MEASURED** — reproduced in this worktree, command + failing-on-old test recorded.
+- **NO-ACTION (already-honest/already-hardened)** — audited, found correct, cited as a positive.
+- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped.
+
+## Decision — Fixes Landed
+
+### §A1 — WebSocket auth bypass (CRITICAL, security) — MEASURED
+
+`homecore-api/src/ws.rs` handshake checked only `token.trim().is_empty()` and
+sent `auth_ok` for **any** non-empty token. It never called
+`state.tokens().is_valid()` — the check the REST path uses via
+`auth::BearerAuth`. With a provisioned `HOMECORE_TOKENS` whitelist, **any
+attacker-chosen non-empty token got full WS access** (read all states, call any
+service, subscribe to all events).
+
+**Real fix:** the handshake now calls
+`state.tokens().is_valid(&token).await` (the *same* store + method as REST).
+A wrong token receives `auth_invalid` and the socket closes. DEV (`allow_any`)
+mode still accepts any non-empty bearer, logging a warning, so smoke tests keep
+working; the empty token is rejected inside `is_valid`.
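+
+The difference between the old and new handshake checks can be sketched as
+follows. This is a synchronous simplification under stated assumptions: the
+real `is_valid` is awaited, and `TokenStore`, its fields, its constructors, and
+both `*_handshake` functions are hypothetical names; only the `auth_ok` /
+`auth_invalid` replies and the whitelist-versus-DEV behavior come from the text.
+
+```rust
+use std::collections::HashSet;
+
+/// Stand-in for the real token store; names and fields are assumptions.
+pub struct TokenStore {
+    tokens: HashSet<String>,
+    allow_any: bool,
+}
+
+impl TokenStore {
+    /// A provisioned (non-dev) store holding an explicit whitelist.
+    pub fn provisioned(tokens: &[&str]) -> Self {
+        Self { tokens: tokens.iter().map(|t| t.to_string()).collect(), allow_any: false }
+    }
+
+    /// DEV mode: any non-empty token is accepted.
+    pub fn dev() -> Self {
+        Self { tokens: HashSet::new(), allow_any: true }
+    }
+
+    /// The empty token is always rejected; otherwise DEV mode or whitelist.
+    pub fn is_valid(&self, token: &str) -> bool {
+        if token.trim().is_empty() {
+            return false;
+        }
+        self.allow_any || self.tokens.contains(token)
+    }
+}
+
+/// Old behavior: only emptiness was checked, so any token passed.
+pub fn old_handshake(token: &str) -> &'static str {
+    if token.trim().is_empty() { "auth_invalid" } else { "auth_ok" }
+}
+
+/// New behavior: the same store and method the REST path uses.
+pub fn handshake(store: &TokenStore, token: &str) -> &'static str {
+    if store.is_valid(token) { "auth_ok" } else { "auth_invalid" }
+}
+```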
+
+**Failing-on-old test** (`tests/ws_handshake.rs`):
+`wrong_token_is_rejected` — provisions a real (non-dev) store with one good
+token, sends a DIFFERENT non-empty token over the WS handshake, asserts
+`auth_invalid`. On the old source the client received
+`{"type":"auth_ok",…}` (verified: the test panics on old `ws.rs` with
+`left: "auth_ok", right: "auth_invalid"`). Companion: `correct_token_is_accepted`.
+**Grade: MEASURED. This is the milestone headline.**
+
+### §A2 — WS replies never transmitted (HIGH, functional) — MEASURED
+
+`ws.rs::Connection::run` moved the socket into a recv-only task; the only
+consumer of the response mpsc just did `debug!("ws emit: {msg}")` and dropped
+every message. No command reply ever reached the wire.
+
+**Real fix:** the socket is split with `futures_util::StreamExt::split`. A
+dedicated **writer task** drains the response channel onto `sink.send(...)`
+(text frames; a `__pong:<n>` sentinel maps to a Pong control frame); the reader
+task parses commands concurrently. On reader exit the senders drop and the
+writer task ends cleanly.
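+
+The reader/writer split can be illustrated with a standard-library analogy.
+This sketch deliberately swaps tokio tasks and the WebSocket sink for OS threads
+and a `Vec` standing in for the wire; `run_connection` and the reply strings are
+hypothetical. It shows only the structural point: replies flow through a
+channel to a dedicated writer, and the writer ends once the reader drops its
+sender.
+
+```rust
+use std::sync::mpsc;
+use std::thread;
+
+/// Runs a mock connection and returns what the writer put "on the wire".
+pub fn run_connection(commands: Vec<String>) -> Vec<String> {
+    let (tx, rx) = mpsc::channel::<String>();
+
+    // Writer: drains the response channel onto the wire (a Vec here).
+    // The loop ends when every sender has been dropped.
+    let writer = thread::spawn(move || {
+        let mut wire = Vec::new();
+        for msg in rx {
+            wire.push(msg);
+        }
+        wire
+    });
+
+    // Reader: parses commands and queues replies instead of discarding them.
+    let reader = thread::spawn(move || {
+        for cmd in commands {
+            let reply = match cmd.as_str() {
+                "ping" => "pong".to_string(),
+                other => format!("result:{other}"),
+            };
+            if tx.send(reply).is_err() {
+                break; // writer is gone; stop reading
+            }
+        }
+        // `tx` is dropped here, which lets the writer finish cleanly.
+    });
+
+    reader.join().expect("reader panicked");
+    writer.join().expect("writer panicked")
+}
+```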
+
+**Failing-on-old tests:** `result_reply_is_received` (connect → auth →
+`get_states` → assert a `result` reply is RECEIVED within 5s) and
+`ping_pong_reply_is_received`. Both time out on the old source (verified:
+`Elapsed` panic). **Grade: MEASURED.**
+
+### §A8 — `homecore-api` bin: no env-token path, network-exposed (HIGH, security) — MEASURED
+
+`homecore-api/src/bin/server.rs` bound `0.0.0.0:8123` with
+`SharedState::new()` → `allow_any_non_empty()` and **no** `HOMECORE_TOKENS`
+path (unlike `homecore-server`), so a provisioned operator had no way to lock
+it down.
+
+**Real fix:** the bin now mirrors `homecore-server`'s provisioning — prefer the
+`HOMECORE_TOKENS` whitelist (`LongLivedTokenStore::from_env()`), fall back to an
+**explicitly warn-logged** DEV mode only when unset. It also defaults the bind
+address to **`127.0.0.1`** (loopback) so a bare `cargo run` is not
+network-exposed, with `HOMECORE_BIND` to opt into LAN.
+
+**Failing-on-old test** (`tests/server_bin_auth.rs`):
+`provisioned_bin_rejects_wrong_bearer` reproduces the bin's exact provisioning
+path (a populated, non-dev store) and asserts a wrong bearer → 401;
+`from_env_path_enforces_whitelist` proves `from_env()` is not dev mode and
+enforces the list. The old bin's `allow_any_non_empty()` accepted the wrong
+bearer. **Grade: MEASURED.**
+
+### §A3 — Automation engine never started (HIGH) — MEASURED
+
+`homecore-server/src/main.rs` did `let _automation_engine = AutomationEngine::new(...)`
+then dropped it immediately, while the header doc claimed "Automation engine
+subscribed to the state machine."
+
+**Real fix:** the engine is now built into a long-lived binding and `.start()`
+is called, spawning the event loop + timer task; the header/log lines state it
+is started with N automations and which trigger classes are active. (With A4–A7
+the running engine is genuinely functional, not theater.)
+
+**Evidence:** the engine-behavior tests below run against the same
+`AutomationEngine::start()` path now wired into the bin. **Grade: MEASURED.**
+
+### §A4 — `Trigger::Time` hard-coded `false`, no timer (HIGH) — MEASURED
+
+`trigger.rs::matches_sync` returned `false` for `Time` and there was **no timer
+task** anywhere, so time automations could never fire.
+
+**Real fix:** `AutomationEngine::start_timer` — a 1 Hz tokio interval that
+compares each `time:` automation's `at` (`HH:MM` or `HH:MM:SS`) against the
+local wall-clock second and fires it once per match (conditions still gate it).
+`matches_sync` returning `false` for `Time` is now **correct and documented**
+(it is a wall-clock trigger with no state-change context); a public
+`fire_time_for_test` exposes the same path deterministically.
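+
+A minimal sketch of the `at` comparison, assuming an `HH:MM` value means second
+`0` of that minute (the text above does not say this explicitly). `parse_at` and
+`at_matches` are hypothetical helper names, and the current time is passed in as
+a plain `(hour, minute, second)` tuple rather than read from a clock.
+
+```rust
+/// Parses `HH:MM` or `HH:MM:SS`; a missing seconds field is treated as 0.
+pub fn parse_at(at: &str) -> Option<(u32, u32, u32)> {
+    let mut parts = at.split(':');
+    let hour: u32 = parts.next()?.parse().ok()?;
+    let minute: u32 = parts.next()?.parse().ok()?;
+    let second: u32 = match parts.next() {
+        Some(s) => s.parse().ok()?,
+        None => 0,
+    };
+    // Reject extra fields and out-of-range values.
+    if parts.next().is_some() || hour > 23 || minute > 59 || second > 59 {
+        return None;
+    }
+    Some((hour, minute, second))
+}
+
+/// True only on the exact wall-clock second named by `at`.
+pub fn at_matches(at: &str, now: (u32, u32, u32)) -> bool {
+    parse_at(at) == Some(now)
+}
+```
+
+Matching on the exact second is what lets a once-per-second tick fire an
+automation a single time per match, provided the tick neither skips nor repeats
+a second.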
+
+**Failing-on-old test** (`tests/engine_behaviors.rs`):
+`time_trigger_fires_via_timer_path` (+ unit `time_at_matches_handles_hh_mm_and_hh_mm_ss`).
+The method does not exist on the old engine. **Grade: MEASURED.**
+
+### §A5 — `RunMode` documented as AtomicBool-enforced but unbounded-parallel (HIGH) — MEASURED
+
+`engine.rs` doc claimed "RunMode::Single is enforced via a per-automation
+AtomicBool" — but no such code existed and **every** trigger spawned an
+unbounded parallel task regardless of `mode`.
+
+**Real fix:** each registered automation carries a `running: Arc<AtomicBool>`.
+`Single`/`IgnoreFirst` modes `compare_exchange` the flag before spawning and
+**skip** the trigger if a run is already in flight, clearing it on completion;
+`Parallel` (and, for now, `Restart`/`Queued`) spawn on every trigger.
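+
+The in-flight guard can be sketched with the standard library alone. This is an
+illustration, not the engine's code: `RunGuard` and `try_start` are hypothetical
+names, and the enum lists only two of the modes. The `Arc<AtomicBool>` flag and
+the `compare_exchange` before spawning are the parts taken from the fix.
+
+```rust
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::sync::Arc;
+
+/// Reduced stand-in for the engine's run modes.
+#[derive(Clone, Copy)]
+pub enum RunMode {
+    Single,
+    Parallel,
+}
+
+/// Clears the in-flight flag when the run finishes (or panics).
+pub struct RunGuard {
+    flag: Option<Arc<AtomicBool>>,
+}
+
+impl Drop for RunGuard {
+    fn drop(&mut self) {
+        if let Some(flag) = &self.flag {
+            flag.store(false, Ordering::Release);
+        }
+    }
+}
+
+/// Returns a guard if the trigger may run, or `None` if it must be skipped.
+pub fn try_start(mode: RunMode, running: &Arc<AtomicBool>) -> Option<RunGuard> {
+    match mode {
+        // Parallel never consults the flag.
+        RunMode::Parallel => Some(RunGuard { flag: None }),
+        // Single claims the flag atomically; a failed exchange means a run
+        // is already in flight, so this trigger is skipped.
+        RunMode::Single => running
+            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
+            .ok()
+            .map(|_| RunGuard { flag: Some(Arc::clone(running)) }),
+    }
+}
+```
+
+Tying the reset to `Drop` is one way to make sure the flag is cleared on every
+exit path; the text above only states that it is cleared on completion.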
+
+**Failing-on-old tests** (`tests/engine_behaviors.rs`):
+`single_mode_does_not_double_fire_on_rapid_triggers` (two rapid triggers while
+the first run sleeps → exactly **1** run; old code fired **2**, verified) and
+`parallel_mode_does_fire_concurrently` (→ 2). **Grade: MEASURED (Single/Parallel
+honored; bounded `Queued`/`Restart`/`max` ordering → ACCEPTED-FUTURE, see below).**
+
+### §A6 — `Action::Choose` ignored branches (HIGH) — MEASURED
+
+`action.rs` discarded `choices` and always ran `default`.
+
+**Real fix:** `ChoiceBranch::matches` deserialises each branch's
+`serde_yaml::Value` conditions into `Condition` and evaluates them (AND
+semantics, against an `EvalContext` now carried on `ExecutionContext`). `Choose`
+runs the **first matching branch's** sequence and falls to `default` only if
+none match.
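+
+The selection rule can be sketched as below. The types are simplified
+stand-ins: this `Condition` has two made-up numeric variants, branches hold
+already-parsed conditions rather than `serde_yaml::Value`, and a single `f32`
+replaces the `EvalContext`. Only the AND semantics, the first-match rule, and
+the fall-through to `default` come from the fix.
+
+```rust
+/// Simplified stand-in for the engine's `Condition`; variants are assumptions.
+pub enum Condition {
+    Above(f32),
+    Below(f32),
+}
+
+impl Condition {
+    pub fn eval(&self, value: f32) -> bool {
+        match self {
+            Condition::Above(limit) => value > *limit,
+            Condition::Below(limit) => value < *limit,
+        }
+    }
+}
+
+pub struct ChoiceBranch {
+    pub conditions: Vec<Condition>,
+    pub sequence: &'static str,
+}
+
+impl ChoiceBranch {
+    /// AND semantics: every condition on the branch must hold.
+    pub fn matches(&self, value: f32) -> bool {
+        self.conditions.iter().all(|c| c.eval(value))
+    }
+}
+
+/// Runs the first matching branch; falls back to `default` if none match.
+pub fn choose(choices: &[ChoiceBranch], default: &'static str, value: f32) -> &'static str {
+    choices
+        .iter()
+        .find(|branch| branch.matches(value))
+        .map(|branch| branch.sequence)
+        .unwrap_or(default)
+}
+```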
+
+**Failing-on-old tests** (`action.rs` inline):
+`choose_runs_matching_branch_not_default` (matching branch runs, default does
+NOT — old code ran default, verified) and
+`choose_falls_to_default_when_no_branch_matches`. **Grade: MEASURED.**
+
+### §A7 — Template conditions always false in the live engine (MEDIUM) — MEASURED
+
+`condition.rs` returned `false` for `Template` whenever `template_env` was
+`None`, and the engine built every `EvalContext` with `template_env: None`
+(`EvalContext::new`), so `template:` conditions could never be true in
+production — only in unit tests that hand-built a template env.
+
+**Real fix:** the engine constructs one `TemplateEnvironment` over the state
+machine and threads it into every `EvalContext` via
+`EvalContext::with_templates` (event loop, timer task, and
+`ExecutionContext` for `Choose` branches).
+
+**Failing-on-old tests** (`tests/engine_behaviors.rs`):
+`template_condition_evaluates_true_in_engine` (a `{{ is_state(...) }}` condition
+gates an action true) and `template_condition_evaluates_false_blocks_action`.
+On the old engine the action never ran (template always false, verified).
+**Grade: MEASURED.**
+
+### §B5 — Plugin manifest sig/hash "verified before execution" doc was false (LOW, honesty) — relabeled
+
+`homecore-plugins/src/manifest.rs` documented `wasm_module_hash` as "verified
+before execution" and carried `wasm_module_sig` / `publisher_key`, but these
+fields are **never read** for verification (only ever set to `None` in tests).
+
+**Fix (honest labeling — no false capability claimed):** the three fields are
+re-doc'd **"(P4 — not yet enforced, ADR-161/B5)"** — parsed and round-tripped,
+but no integrity/signature check happens before a plugin runs. No verification
+code was added (that is P4); the doc now matches the code.
+**Grade: doc-honesty (no behavior change).** *(Superseded by ADR-162 §P4:
+the hash/signature gate is now implemented and enforced.)*
+
+## Negative Results (NO-ACTION positives — audited, found correct, cited not edited)
+
+These were checked and are genuinely sound/honest; cited as positives, **not**
+touched:
+- **CSPRNG correctness** — all IDs are `uuid::v4`; the rng/`randn` suspicion was
+  **REFUTED**. No weak-randomness issue exists.
+- **CORS allowlist** (`app.rs`) — already hardened (explicit `AllowOrigin::list`,
+  no `permissive()`, `allow_credentials(false)`, env override). NO-ACTION.
+- **No path traversal in `homecore-migrate`** — audited, clean.
+- **No secrets in logs** — audited, clean.
+- **HAP pairing stub** — honestly disclaimed as a surface stub; not over-claimed.
+- **`InProcessRuntime` "no sandbox" disclaimer** — honest; left as-is.
+
+## Deferred Backlog (Nothing Dropped)
+
+- **Plugin authority-isolation (P5)** — ~~`homecore_permissions` claims are parsed
+  but not enforced at the host-call boundary.~~ **DONE — ADR-162 §P5.**
+  `hc_state_set` now consults a `PermissionSet` distilled from the manifest;
+  an undeclared write returns a typed `-3` to the guest.
+- **Plugin signature/hash verification (P4)** — ~~implement the
+  `wasm_module_hash`/`wasm_module_sig`/`publisher_key` gate that B5 now honestly
+  says is absent.~~ **DONE — ADR-162 §P4.** `WasmtimeRuntime::load_plugin` now
+  SHA-256-checks the module, Ed25519-verifies the signature against
+  `publisher_key`, and enforces a `PluginPolicy` trust allowlist
+  (secure-default rejects unsigned/untrusted/tampered modules).
+- **HAP real pairing (P2)** — SRP/HKDF pairing + encrypted sessions; current
+  bridge is an accessory-mapping surface. **ACCEPTED-FUTURE (honestly stubbed).**
+- **`RunMode::Queued`/`Restart`/`max` ordering** — ~~`Single`/`Parallel` are
+  honored; bounded queueing, restart-kill, and `max` concurrency are not yet
+  wired (every non-Single mode is parallel).~~ **DONE — ADR-162 §A5.** Restart
+  aborts the in-flight task, Queued serializes via a per-automation async mutex,
+  and `max: N` caps concurrency via a per-automation semaphore.
+- **Automation YAML load-at-boot** — the engine starts empty; a YAML loader is
+  P-next. The bin log states "0 automations registered" honestly.
+
+## Reproduction (MEASURED)
+
+```bash
+cd v2
+cargo test -p homecore-api -p homecore-server -p homecore-automation -p homecore-hap --no-default-features
+cargo test -p homecore-plugins --features wasmtime
+cargo build --workspace --no-default-features
+```
+
+Result at time of writing (all 0 failed):
+- **homecore-api** — **25 passed** (lib 18; `server_bin_auth` 3; `ws_handshake` 4)
+- **homecore-automation** — **42 passed** (lib 37; `engine_behaviors` 5)
+- **homecore-hap** — **17 passed**
+- **homecore-server** — bin, **0 tests**
+- (**homecore-plugins** — **15 passed**: lib 12; integration 3)
+- Full workspace `cargo build --workspace --no-default-features` succeeds.
+
+## Consequences
+
+- The WebSocket path can no longer be entered with a forged token — it enforces
+  the same `LongLivedTokenStore` whitelist as REST (A1).
+- WS clients now actually receive `result`/`pong`/`event` frames (A2).
+- The `homecore-api` dev bin defaults to loopback and honors `HOMECORE_TOKENS`
+  (A8); it is no longer an open `0.0.0.0` accept-any endpoint by default.
+- The automation engine is started for real and its time triggers, `Single`
+  run-mode, `Choose` branches, and `template:` conditions all function — no doc
+  claims a capability the code lacks (A3–A7).
+- The plugin manifest no longer claims signature verification it does not
+  perform (B5).
+- Files kept under the 500-line guideline (`engine.rs` 462; behavioral tests
+  moved to `tests/engine_behaviors.rs`).
+
+## Addendum — `homecore-api` follow-up security review (beyond-SOTA pass)
+
+A later network-facing review of `homecore-api` (the remote REST + WS attack
+surface) — independent of the ADR-154–159 sweep — found and fixed two real
+issues the original M7 pass (which focused on the WS auth bypass HC-WS-01, the
+reply-theater HC-WS-02, and the bin token provisioning HC-WS-08) did not catch.
+Both are LOW severity and reported at true severity.
+
+### HC-API-AUTH-01 — `GET /api/` was unauthenticated (FIXED)
+
+`rest::api_root` took no headers and unconditionally returned
+`200 {"message":"API running."}`, while every sibling route gates on
+`BearerAuth::from_headers`. HA's `APIStatusView` inherits `requires_auth = True`,
+so `/api/` must return **401** for a missing/wrong bearer. HA clients use the
+status route as a token-validation probe; a 200 told a bad-token client its
+token was valid and let an unauthenticated party confirm a live endpoint.
+LOW severity (the body is a static string; no entity/state data leaks).
+
+**Fix:** `api_root(headers, State)` now validates the bearer like `get_config`.
+**Pinned by** (fail-on-old, `tests/server_bin_auth.rs`):
+`api_root_rejects_missing_bearer`, `api_root_rejects_wrong_bearer` (both 200→401),
+guarded by `api_root_accepts_correct_bearer` (still 200 with a valid token).
+
+### HC-WS-LAG-01 — `subscribe_events` killed the stream on a broadcast lag (FIXED)
+
+The per-subscription task matched `Err(_) => break` on both broadcast
+`recv()` arms. `RecvError::Lagged(n)` (a slow consumer falling more than
+`EVENT_CHANNEL_CAPACITY` = 4,096 events behind) is **recoverable**: the bus
+doc says "Lagged receivers must re-sync" and HA keeps the subscription alive
+across a lag. The old code treated the first lag as fatal, so after an event
+burst the client's stream went permanently silent with no error frame — a
+self-inflicted event-delivery DoS under load.
+
+**Fix:** `Lagged(_) => continue` (skip the dropped window, re-sync),
+`Closed => break`, on both the system and domain arms of the `select!`.
+**Pinned by** `subscription_survives_broadcast_lag` (`tests/ws_handshake.rs`):
+subscribes to a filtered event type, floods 6,000 unrelated events past the
+4,096 capacity to force a `Lagged`, then asserts a subsequent subscribed event
+is still delivered (old code: 5s-timeout panic).
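+
+The corrected match arms can be illustrated without an async runtime. In this
+sketch a local `RecvError` enum mirrors the two variants of tokio's broadcast
+`RecvError`, and a plain iterator of results replaces the `select!` loop;
+`forward_events` is a hypothetical name.
+
+```rust
+/// Local stand-in mirroring the variants of tokio's broadcast `RecvError`.
+pub enum RecvError {
+    Lagged(u64),
+    Closed,
+}
+
+/// Forwards events to the client, surviving lags and stopping on close.
+pub fn forward_events<I>(received: I) -> Vec<String>
+where
+    I: IntoIterator<Item = Result<String, RecvError>>,
+{
+    let mut delivered = Vec::new();
+    for item in received {
+        match item {
+            Ok(event) => delivered.push(event),
+            // Recoverable: skip the dropped window and keep the stream alive.
+            Err(RecvError::Lagged(_)) => continue,
+            // Fatal: the channel is gone, so the subscription ends.
+            Err(RecvError::Closed) => break,
+        }
+    }
+    delivered
+}
+```
+
+With the old `Err(_) => break` arm, the same input would stop at the first
+`Lagged` and nothing after it would be delivered.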
+
+### Dimensions confirmed clean (with evidence)
+
+- **AuthN/AuthZ** — all 7 other REST handlers gate on `BearerAuth::from_headers`
+  → `LongLivedTokenStore::is_valid` before any work; the WS handshake validates
+  the `auth` token against the same store before the command loop, and
+  privileged commands are unreachable pre-`auth_ok`. Token compare is
+  `HashSet::contains` (content-independent timing — not the byte-`==` oracle of
+  ADR-157 §B4), so no timing-oracle finding. No route skips the gate; no
+  result-ignored check; no default/empty token accepted.
+- **Path traversal** — no route maps user input to a filesystem path (state is an
+  in-memory `DashMap`); `:entity_id` passes through `EntityId::parse`, a strict
+  `[a-z0-9_]+\.[a-z0-9_]+` ASCII allowlist that rejects `..`, `/`, `\`, and
+  absolute paths. No traversal surface.
+- **Injection** — no SQL, no shell/subprocess, no `format!`-into-response;
+  service/state bodies are typed `serde_json::Value` handed to the in-process
+  registry (HA-equivalent).
+- **Info-leak** — `ApiError` maps to fixed status + a typed `{message}`;
+  `ServiceError::HandlerFailed(String)` is integration-controlled (HA surfaces
+  the handler error too), never framework internals/paths/stack-traces — no
+  ADR-080-class leak.
+- **CORS** — explicit allowlist with `allow_credentials(false)` (HC-05),
+  not `permissive()`.
+- **De-magic** — no bare security-relevant literals in the crate worth
+  extracting (`EVENT_CHANNEL_CAPACITY` is already named in `homecore`; CORS
+  dev-default ports are documented).
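+
+The `[a-z0-9_]+\.[a-z0-9_]+` allowlist cited in the path-traversal item can be
+sketched without a regex crate. `is_valid_entity_id` is a hypothetical
+free function written for illustration; it is not the body of
+`EntityId::parse`, only an equivalent check for the pattern as quoted.
+
+```rust
+/// Accepts exactly `<domain>.<object>` where both parts are non-empty and
+/// contain only ASCII lowercase letters, digits, or underscores.
+pub fn is_valid_entity_id(s: &str) -> bool {
+    fn part_ok(part: &str) -> bool {
+        !part.is_empty()
+            && part
+                .bytes()
+                .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_')
+    }
+
+    // Split at the first dot; a second dot lands in `object` and is rejected
+    // there, as are `/`, `\`, and any uppercase or non-ASCII byte.
+    match s.split_once('.') {
+        Some((domain, object)) => part_ok(domain) && part_ok(object),
+        None => false,
+    }
+}
+```
+
+Since `.` is only legal as the single separator, inputs such as `..` or
+`../etc/passwd` fail on an empty or ill-formed part.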
+
+**Tests:** `homecore-api --no-default-features` **25 → 29** (+2 api-root auth, +1
+api-root accept-guard, +1 WS lag-survival), 0 failed. Workspace green.
+Python deterministic proof unchanged (homecore-api is off the signal proof
+path).
@@ -0,0 +1,186 @@
+# ADR-162: HOMECORE Plugin Security (Signature + Capability Isolation) & Bounded Automation RunModes — Making ADR-161's Deferred Claims TRUE
+
+- **Status**: accepted
+- **Date**: 2026-06-12
+- **Deciders**: ruv
+- **Tags**: homecore, homecore-plugins, homecore-automation, plugin-security, wasm-signature-verification, ed25519, capability-isolation, runmode, prove-everything, soundness, honest-labeling
+- **Amends**: ADR-161 (relabelled P4/P5 + §A5 deferrals → now enforced), ADR-128 (plugin manifest), ADR-129 (automation engine)
+
+## Context
+
+Beyond-SOTA sweep **Milestone 8**, scoped to `homecore-plugins` and
+`homecore-automation` only, under the project's **prove-everything /
+anti-"AI-slop"** directive.
+
+ADR-161 (Milestone 7) did the honest thing with three plugin/automation
+items it could not finish in that window: rather than fake them, it **relabelled
+them as deferred** —
+
+- **P4** (plugin signature verification): the manifest's `wasm_module_hash` /
+  `wasm_module_sig` / `publisher_key` were re-doc'd "(P4 — not yet enforced,
+  ADR-161/B5)" — parsed and round-tripped, but **never checked** before a
+  plugin runs.
+- **P5** (plugin authority isolation): `homecore_permissions` claims were
+  parsed but **never consulted**; `hc_state_set` let any plugin write any
+  entity, including `lock.*` / `alarm_control_panel.*`.
+- **§A5** (`RunMode`): `Single`/`Parallel` were honored; `Restart`/`Queued`/
+  `max: N` were honestly documented as still **unbounded-parallel**.
+
+### Headline — the deferred security items are now ENFORCED + TESTED
+
+M8 turns those honest deferrals into real, tested behavior. The plugin trust
+boundary is now sound (a tampered module, an untrusted publisher, or an
+unsigned module is rejected by the secure default), an over-privileged plugin
+write is denied with a typed error, and the bounded run-modes actually bound.
+**Every fix is pinned by a test that FAILS on the pre-M8 code** — each of the
+three RunMode tests was additionally run against a simulated unbounded-parallel
+dispatch and confirmed to panic.
+
+The Ed25519 crypto reuses the in-repo `cog-ha-matter::witness_signing` pattern
+(same `ed25519-dalek` 2.x API, same deterministic-test-key convention). SHA-256
+matches the `sha256:` prefix the manifest already declared and the
+`cog-ha-matter` cog manifest's `binary_sha256` hex convention. No new external
+dependency tree was introduced — `ed25519-dalek` / `sha2` / `hex` / `base64`
+were already in the workspace `Cargo.lock` (cog-ha-matter / bfld pull them in);
+only new dependency *edges* were added to `homecore-plugins`.
+
+Grading vocabulary (ADR-152 / ADR-158 / ADR-160 / ADR-161):
+- **MEASURED** — reproduced in this worktree, command + failing-on-old test recorded.
+- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped.
+
+## Decision — Fixes Landed
+
+### §P4 — Plugin signature & integrity verification (SECURITY) — MEASURED
+
+`homecore-plugins/src/manifest.rs` declared `wasm_module_hash` /
+`wasm_module_sig` / `publisher_key` but they were **never read** for
+verification; the load path (`wasmtime_runtime.rs`) instantiated any `.wasm`
+bytes handed to it.
+
+**Real fix** (`src/verify.rs`, wired into `WasmtimeRuntime::load_plugin`):
+before instantiation the runtime now —
+
+1. computes the **SHA-256** of the actual `.wasm` bytes and rejects if it ≠ the
+   manifest's `wasm_module_hash` (`sha256:<hex>`) — tamper detection;
+2. verifies the **Ed25519** `wasm_module_sig` (`ed25519:<base64>`, 64-byte raw)
+   over the 32-byte digest against `publisher_key` (`ed25519:<base64>`, 32-byte
+   raw) and rejects on failure;
+3. enforces a configurable **trust policy** — `PluginPolicy::trusted(&[keys])`
+   is an allowlist of publisher verifying keys; `PluginPolicy::AllowUnsigned`
+   is an explicit dev escape hatch that LOGS a loud `warn` on every load it
+   waves through. The **secure default rejects unsigned and unknown-publisher
+   modules.** `PluginPolicy::deny_all()` trusts no publisher.
+
+A typed `PluginError::SignatureRejected` is returned (no host panic). The
+legacy permission-free `load_wasm` is retained for first-party/trusted/test
+modules; production loading goes through `load_plugin`.
+
+**Failing-on-old tests** (`tests/integration.rs`, `--features wasmtime`) — all
+drive `load_plugin`, which **did not exist** on the old code (so the gate is
+genuinely new):
+- `p4_tampered_module_is_rejected` — a byte-flipped `.wasm` → hash mismatch → rejected.
+- `p4_valid_sig_from_trusted_key_loads` — a valid sig from an allowlisted key loads.
+- `p4_valid_sig_from_untrusted_key_is_rejected` — a correctly-signed module from a key NOT on the allowlist is rejected.
+- `p4_unsigned_module_rejected_by_default_loads_only_under_allow_unsigned` — unsigned rejected under `deny_all`, loads (with warn) only under `AllowUnsigned`.
+- Unit (`src/verify.rs`): `valid_sig_from_trusted_key_passes`, `tampered_module_is_rejected`, `valid_sig_from_untrusted_key_is_rejected`, `forged_signature_is_rejected`, `unsigned_module_rejected_under_default_policy`.
+
+A real deterministic keypair signs real `.wasm` bytes in the tests.
+The manifest doc now reads **"(P4 — ENFORCED, ADR-162)"**. **Grade: MEASURED. Milestone headline.**
+
+### §P5 — Plugin authority / capability isolation (SECURITY) — MEASURED
+
+`wasmtime_runtime.rs::hc_state_set` applied any write a plugin requested,
+ignoring the manifest's `homecore_permissions`.
+
+**Real fix** (`src/permissions.rs` + `hc_state_set`): the manifest's
+`homecore_permissions` (the `state:write:<glob>` form, or a bare entity glob
+like `light.*`) are distilled into a `PermissionSet` installed in the plugin's
+Wasmtime store. The `hc_state_set` host import consults
+`permissions.may_write(entity_id)` before applying a write and returns a typed
+`-3` (permission denied) to the guest on a violation, and **the host does not
+panic.** Wasmtime already gives memory isolation; this adds **authority**
+isolation. A plugin with **no** write grants can write nothing (secure default).
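+
+A minimal sketch of the grant check, assuming a simple `*` glob and treating
+any other `scope:` prefix (such as a read grant) as conferring no write
+authority. `PermissionSet::from_manifest`, `glob_match`, and
+`hc_state_set_guard` are hypothetical names used for illustration; only
+`may_write` and the `-3` return code come from the text above, and the real
+`src/permissions.rs` may differ:
+
+```rust
+/// Hypothetical stand-in for the crate's `PermissionSet`. The parsing rules
+/// are assumptions drawn from the prose, not the real source.
+#[derive(Debug, Default)]
+pub struct PermissionSet {
+    write_globs: Vec<String>,
+}
+
+impl PermissionSet {
+    /// Distil manifest permission strings into write globs.
+    pub fn from_manifest(perms: &[&str]) -> Self {
+        let mut write_globs = Vec::new();
+        for p in perms {
+            if let Some(glob) = p.strip_prefix("state:write:") {
+                // Explicit form: `state:write:<glob>`.
+                write_globs.push(glob.to_string());
+            } else if !p.contains(':') {
+                // Bare entity glob such as `light.*`.
+                write_globs.push(p.to_string());
+            }
+            // Any other scoped grant (e.g. `state:read:*`) adds no write right.
+        }
+        Self { write_globs }
+    }
+
+    /// True only if some write grant matches the entity id.
+    pub fn may_write(&self, entity_id: &str) -> bool {
+        self.write_globs.iter().any(|g| glob_match(g, entity_id))
+    }
+}
+
+/// Match `text` against `pattern`, where `*` matches any run of characters.
+fn glob_match(pattern: &str, text: &str) -> bool {
+    match pattern.split_once('*') {
+        None => pattern == text,
+        Some((prefix, rest)) => {
+            let Some(tail) = text.strip_prefix(prefix) else {
+                return false;
+            };
+            // Try every split point for the text consumed by `*`.
+            (0..=tail.len())
+                .filter(|&i| tail.is_char_boundary(i))
+                .any(|i| glob_match(rest, &tail[i..]))
+        }
+    }
+}
+
+/// Return code handed to the guest on a denied write (value from the text).
+pub const HC_ERR_PERMISSION_DENIED: i32 = -3;
+
+/// Guard a host import would run before applying a write: 0 = allowed.
+pub fn hc_state_set_guard(perms: &PermissionSet, entity_id: &str) -> i32 {
+    if perms.may_write(entity_id) {
+        0
+    } else {
+        HC_ERR_PERMISSION_DENIED
+    }
+}
+```
+
+Under these assumptions a `light.*` grant allows `light.kitchen` and refuses
+`lock.front_door`, and an empty grant list refuses every write.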
+
+**Failing-on-old tests** (`tests/integration.rs`, `--features wasmtime`):
+- `p5_declared_light_plugin_may_write_light_but_not_lock` — a `light.*` plugin writes `light.kitchen` (succeeds) but is REJECTED (`-3`, and the entity is not written) when it tries `lock.front_door`.
+- `p5_plugin_with_no_permissions_can_write_nothing` — a plugin with empty `homecore_permissions` cannot write `light.kitchen`.
+- Unit (`src/permissions.rs`): domain-glob, exact-grant, wildcard, read-grants-don't-confer-write, no-permissions, and explicit `state:write:` form.
+
+The manifest doc now reads **"(P5 — ENFORCED, ADR-162)"**. **Grade: MEASURED.**
+
+### §A5 — Bounded automation RunModes (Restart / Queued / max) — MEASURED
+
+`homecore-automation/src/engine.rs` (per ADR-161) honored `Single`/`Parallel`
+but spawned an unbounded parallel task for `Restart`/`Queued`/`max`.
+
+**Real fix** (`src/runmode.rs`, a per-automation `RunState` the engine owns and
+dispatches through at all three trigger sites — event loop, timer, test hook):
+- **Restart** — aborts the in-flight action task via `tokio::task::AbortHandle`, then starts a fresh one.
+- **Queued** — serializes runs in arrival order via a per-automation async `Mutex`: sequential, never concurrent, nothing dropped.
+- **max: N** — caps concurrency at N via a per-automation `Semaphore`; triggers beyond N **queue** (await a permit) rather than running concurrently. (HA bounded `parallel`/`queued` semantics — chosen and documented as *queue beyond N*, not drop.)
+- `Single`/`IgnoreFirst` re-entrancy guard and `Parallel` preserved.
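+
+The "queue beyond N, not drop" behavior can be sketched with plain threads.
+This is not the engine's code: it swaps the per-automation async `Semaphore`
+for a hypothetical `Permits` type built on `Mutex` + `Condvar`, and
+`run_with_max` is an illustrative helper. It shows only the concurrency bound;
+a `Condvar` does not guarantee the arrival ordering that `Queued` provides.
+
+```rust
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::{Arc, Condvar, Mutex};
+use std::thread;
+use std::time::Duration;
+
+/// Counting semaphore; a blocking stand-in for the async one in the prose.
+struct Permits {
+    free: Mutex<usize>,
+    cv: Condvar,
+}
+
+impl Permits {
+    fn new(n: usize) -> Self {
+        Self { free: Mutex::new(n), cv: Condvar::new() }
+    }
+
+    /// Block until a permit is free, then take it.
+    fn acquire(&self) {
+        let mut free = self.free.lock().unwrap();
+        while *free == 0 {
+            free = self.cv.wait(free).unwrap();
+        }
+        *free -= 1;
+    }
+
+    /// Return a permit and wake one waiter.
+    fn release(&self) {
+        *self.free.lock().unwrap() += 1;
+        self.cv.notify_one();
+    }
+}
+
+/// Fire `triggers` runs under `max: n`; returns (completed, peak concurrency).
+pub fn run_with_max(n: usize, triggers: usize) -> (usize, usize) {
+    let permits = Arc::new(Permits::new(n));
+    let active = Arc::new(AtomicUsize::new(0));
+    let peak = Arc::new(AtomicUsize::new(0));
+    let done = Arc::new(AtomicUsize::new(0));
+
+    let handles: Vec<_> = (0..triggers)
+        .map(|_| {
+            let permits = Arc::clone(&permits);
+            let active = Arc::clone(&active);
+            let peak = Arc::clone(&peak);
+            let done = Arc::clone(&done);
+            thread::spawn(move || {
+                permits.acquire(); // beyond N: wait for a permit, never drop
+                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
+                peak.fetch_max(now, Ordering::SeqCst);
+                thread::sleep(Duration::from_millis(20)); // simulated action
+                active.fetch_sub(1, Ordering::SeqCst);
+                done.fetch_add(1, Ordering::SeqCst);
+                permits.release();
+            })
+        })
+        .collect();
+
+    for h in handles {
+        h.join().unwrap();
+    }
+    (done.load(Ordering::SeqCst), peak.load(Ordering::SeqCst))
+}
+```
+
+Calling `run_with_max(2, 4)` mirrors the shape of `max_two_caps_concurrency_at_two`:
+all four runs complete and the observed peak never exceeds two. With `n = 1`
+the runs are strictly sequential, which is the bound `Queued` also enforces.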
+
+`engine.rs` trimmed to **433 lines**; the run-mode machinery lives in the new
+`runmode.rs` (153 lines) to keep both under the 500-line guideline.
+
+**Failing-on-old tests** (`tests/engine_behaviors.rs`) — each was run against a
+simulated unbounded-parallel dispatch and confirmed to panic:
+- `restart_mode_cancels_prior_run` — prior run is aborted: exactly **1** completion (old: both ran → 2).
+- `queued_mode_runs_sequentially_not_concurrently` — 3 rapid triggers all run, **max observed concurrency = 1** (old: 3).
+- `max_two_caps_concurrency_at_two` — 4 rapid triggers all run, **max observed concurrency ≤ 2** (old: 4).
+
+**Grade: MEASURED. Restart, Queued, and `max: N` all implemented — no remaining RunMode deferral.**
+
+## Threat model closed
+
+| Threat | Before (ADR-161) | After (ADR-162) |
+|--------|------------------|-----------------|
+| **Tampered module** — attacker swaps `.wasm` bytes after signing | loaded unconditionally (hash never checked) | rejected: SHA-256 mismatch |
+| **Untrusted publisher** — valid sig from a key the host doesn't trust | loaded (sig/key never read) | rejected: publisher_key not on allowlist |
+| **Unsigned module** — no integrity material at all | loaded | rejected by secure default; loads only under explicit `AllowUnsigned` (loud warn) |
+| **Over-privileged plugin write** — a `light.*` plugin writes `lock.front_door` / `alarm_control_panel.*` | applied (permissions never consulted) | denied: typed `-3` to guest, write not applied |
+| **Run-mode resource exhaustion** — `max`/`Queued` spawn unbounded tasks | unbounded parallel | bounded: Restart cancels, Queued serializes, `max: N` caps at N |
+
+## Remaining honest deferral (Nothing Dropped)
+
+- **Plugin-key provisioning / rotation** — the host's trust allowlist
+  (`PluginPolicy::trusted`) is supplied by the caller; sourcing it from the
+  Cognitum control-plane key store (as `cog-ha-matter` does for Seed keys) and
+  key rotation are **ACCEPTED-FUTURE** (out of M8 scope — same boundary
+  `witness_signing` draws).
+- **`InProcessRuntime` (native first-party plugins)** — has no `.wasm` bytes to
+  hash, so P4/P5 apply only to the WASM (`wasmtime`) path; native plugins remain
+  trusted-by-compilation. Honestly noted, not over-claimed.
+- **HAP real pairing (P2)** — unchanged from ADR-161; out of M8 scope.
+
+## Reproduction (MEASURED)
+
+```bash
+cd v2
+# P4/P5 (wasmtime feature needs rustc 1.91+; workspace pins 1.89 for the rest):
+cargo +1.91.1 test -p homecore-plugins --features wasmtime
+# Bounded RunModes:
+cargo test -p homecore-automation --no-default-features
+# Full workspace still builds (1.89 toolchain, no wasmtime):
+cargo build --workspace --no-default-features
+```
+
+Result at time of writing (all 0 failed):
+- **homecore-plugins** `--features wasmtime` — **32 passed** (lib 23; integration 9). (ADR-161 baseline was 15.)
+- **homecore-automation** `--no-default-features` — **45 passed** (lib 37; `engine_behaviors` 8). (ADR-161 baseline was 42.)
+- Full workspace `cargo build --workspace --no-default-features` succeeds.
+
+## Consequences
+
+- A HOMECORE WASM plugin can no longer be loaded with a tampered binary, an
+  untrusted publisher, or (by default) no signature at all — the trust boundary
+  ADR-161/B5 honestly said was absent is now real (P4).
+- A plugin can no longer write entities outside its declared
+  `homecore_permissions`; the lock/alarm escalation path is closed (P5).
+- The automation engine's `Restart`, `Queued`, and `max: N` run-modes are now
+  bounded as documented — no run-mode claims a capability the code lacks.
+- No new external dependency tree (reuses the cog-ha-matter Ed25519 stack
+  already in the lock); source files kept under the 500-line guideline
+  (`engine.rs` 433, `runmode.rs` 153, `verify.rs` 397, `permissions.rs` 168;
+  `wasmtime_runtime.rs` non-test source < 500, inline WAT tests as ADR-161 left
+  them).
@@ -0,0 +1,123 @@
+# ADR-163: Edge-Latency Measurement — CLAIMED budgets → MEASURED-on-host
+
+- **Status**: accepted
+- **Date**: 2026-06-12
+- **Deciders**: ruv
+- **Tags**: edge-latency, wasm-edge, esp32, cog-inference, criterion, prove-everything, measurement-debt
+- **Amends**: ADR-160 (deferred "criterion benches for process_frame budget claims" line now DONE-on-host); ADR-159 (cog inference latency)
+
+## Context — Milestone 9 of the beyond-SOTA sweep
+
+Prior milestones (M5/M6, ADR-159/ADR-160) flagged **measurement debt**: edge
+latency budgets asserted in doc-comments and manifests but **never reproduced by
+a committed benchmark**. Specifically:
+
+- Many `wifi-densepose-wasm-edge` skill modules document a timing budget *"on
+  ESP32-S3 WASM3"* (e.g. `exo_time_crystal`: "H (heavy, <10 ms)"). These were
+  **CLAIMED**, not benchmarked. ADR-160's deferred backlog named exactly this:
+  *"Criterion benches for `process_frame` budget claims — ACCEPTED-FUTURE."*
+- `cog-pose-estimation`'s manifest cites `cold_start_ms_avg: 5.4`, but neither
+  cog had a `benches/` directory or any committed inference-latency number.
+
+Under the project's **prove-everything / anti-"AI-slop"** directive, a CLAIMED
+latency budget that a skeptic cannot reproduce is debt. M9 pays it down — benches
+and docs only, **no production-code behavior change** (so nothing republishes).
+
+## Headline
+
+**Converted the CLAIMED edge-latency budgets into MEASURED-on-host numbers, with
+the honest host-vs-ESP32 caveat stated everywhere.** Added committed criterion
+benches over the heaviest hot paths and a results file a skeptic can re-run. The
+ESP32-on-hardware figure remains explicitly **UNMEASURED** — this milestone does
+not pretend a laptop reproduces an Xtensa/WASM3 budget.
+
+## Decision — benches landed
+
+### T1 — wasm-edge `process_frame` budget benches
+
+`v2/crates/wifi-densepose-wasm-edge/benches/process_frame_bench.rs` (criterion,
+`harness = false`, `required-features = ["std"]`). The crate is **excluded from
+the v2 workspace**, so it runs from the crate dir. Benches the M6-audit-named
+heaviest hot paths over a **fixed synthetic CSI frame**, each driven through the
+public `process_frame` after warming the relevant ring/phase buffers so the
+expensive path actually executes:
+
+- `exo_time_crystal::process_frame` — full 256-pt × 128-lag autocorrelation.
+- `exo_ghost_hunter::process_frame` — empty-room periodicity / hidden-breathing.
+- `sec_weapon_detect::process_frame` — per-subcarrier (MAX_SC=32) Welford.
+- `med_seizure_detect::process_frame` — clonic-rhythm path (`#[cfg(feature =
+  "medical-experimental")]`, only built/run with that gate).
+
+The lib's `bench = false` was set so the libtest harness does not intercept
+criterion CLI flags; the `ghost_hunter` bin is already `standalone-bin`-gated and
+not built under `--features std`.
+
+**Measured host medians** (Intel Core Ultra 9 285H, native `--release`):
+`exo_time_crystal` **17.3 µs** · `exo_ghost_hunter` **1.44 µs** ·
+`sec_weapon_detect` **0.42 µs** · `med_seizure_detect` **0.10 µs**.
+
+### T2 — cog inference latency benches
+
+`v2/crates/cog-person-count/benches/infer_bench.rs` and
+`v2/crates/cog-pose-estimation/benches/infer_bench.rs` (criterion,
+`harness = false`). Each loads the **real** shipped weights from the in-repo
+`cog/artifacts/`, asserts the Candle CPU backend (so the stub can never be
+silently benched), warms one forward, then times steady-state
+`InferenceEngine::infer` over a fixed CSI window on `Device::Cpu`.
+
+**Measured host medians:** cog-person-count **305 µs** · cog-pose-estimation
+**305 µs** (steady-state, CPU, real weights).
+
+### T3 — results file
+
+`benchmarks/edge-latency/RESULTS.md`, in the `benchmarks/wiflow-std/RESULTS.md`
+style: each number with its exact reproduce command, the machine, the
+MEASURED-on-host grade, and the honest caveat.
+
+## The honest caveat (recorded, non-negotiable)
+
+1. **Host ≠ ESP32.** The wasm-edge benches run native x86_64, not Xtensa/WASM3.
+   A host median is a **lower bound on on-device latency** (an algorithm-cost
+   proxy), not the ESP32 number; WASM3 interpretation on a ~240 MHz core is 1–2
+   orders of magnitude slower than native `-O`. A host median under budget does
+   **not** prove the ESP32 meets it.
+   **The ESP32 figure is NOT reproduced here — it needs hardware.**
+2. **Bench ≠ the doc-claimed measurement.** The cogs' manifest cites a
+   **cold-start** number (weight-load included); these benches measure
+   **steady-state** per-frame `infer`. We report both, labelled, and do not
+   conflate them. Empirically, pose steady-state (305 µs host) is ~18× under the
+   5.4 ms cold-start — the expected shape, and exactly why conflating would lie.
+
+## Deferred / still-pending (nothing dropped)
+
+- **ESP32-on-hardware `process_frame` latency** — **PENDING (hardware)**. Needs
+  the `wasm32-unknown-unknown` target built + flashed to an ESP32-S3 and timed
+  under WASM3. The host bench is the algorithm-cost proxy until then.
+- **Per-skill *accuracy*** remains **DATA-GATED** (unchanged from ADR-160) —
+  this ADR measures latency only, never claims detection accuracy.
+
+## Reproduction (MEASURED)
+
+```bash
+# T1 — wasm-edge (workspace-excluded → run from the crate dir)
+cd v2/crates/wifi-densepose-wasm-edge
+cargo bench --features std -- --warm-up-time 1 --measurement-time 2
+cargo bench --features std,medical-experimental -- --warm-up-time 1 --measurement-time 2 med_seizure
+
+# T2 — cogs (workspace members)
+cd v2
+cargo bench -p cog-person-count   --no-default-features --bench infer_bench
+cargo bench -p cog-pose-estimation --no-default-features --bench infer_bench
+
+# existing tests still green (behavior unchanged)
+cargo test -p cog-person-count -p cog-pose-estimation --no-default-features
+```
+
+## Consequences
+
+- ADR-160's deferred *"Criterion benches for `process_frame` budget claims"* line
+  is now **DONE (host)**; the ESP32-on-hardware confirmation is explicitly the
+  one remaining pending item.
+- The cogs now ship committed, reproducible steady-state inference-latency
+  numbers, cleanly distinguished from the manifest's cold-start claim.
+- No runtime behavior changed; no crate republishes. `PROOF.md`'s performance
+  table and `scripts/prove.sh`'s gated section reference the new benches.
@@ -0,0 +1,125 @@
+# ADR-164: ADR Corpus Gap Analysis & Remediation Backlog
+
+- **Status:** proposed
+- **Date:** 2026-06-12
+- **Deciders:** ruv
+- **Tags:** governance, meta
+
+## Context
+
+The corpus has grown to **162 ADR entries across 156 distinct files** (ADR-001 through ADR-171; the 5 duplicate-number collisions / 6 displaced files originally noted here were RESOLVED by renumbering the displaced files to ADR-166…171 — see Gap Register G1). It now spans nine subsystems — signal/DSP, NN/training, ESP32 firmware, RuvSense multistatic, RuView desktop, Cognitum cogs, HOMECORE (HA reimplementation), BFLD privacy, and the streaming engine — written over roughly a year by many agent-driven sessions.
+
+Two forces motivate a corpus-wide gap analysis *now*:
+
+1. **The beyond-SOTA / anti-AI-slop sweep (ADR-154–163) just landed.** That sweep is itself a structured retraction layer: each ADR exists *because* an earlier accepted-or-shipped claim was found false (a dead CIR coherence gate, a fake-gradient TTA path, a self-certifying proof, a WebSocket auth bypass, an inflated survivor count). The sweep hardened five subsystems but was narrowly scoped — it never touched the two largest capability gaps (camera-teacher training validation; federation/BFLD privacy chains). A ledger is needed to record what the sweep retracted and what it left open.
+2. **The status field can no longer be trusted as a source of truth.** A five-lens audit (status-distribution, supersession-chains, contradictions, coverage-gaps, data-hardware-gated) found ~24 ADRs mislabeled `Proposed` while their own commit-pinned Implementation-Status notes report them built and tested; 6 ADR numbers collide; 3 files have no Status header at all. An auditor reading headers would conclude "not built" for landed code, and "built/Accepted" for unvalidated capability.
+
+The detailed lens outputs and the full per-ADR census live in `docs/adr/gap-analysis/` (`lens-findings.md`, `census.md`). This ADR is the authoritative summary and remediation backlog.
+
+## Decision
+
+**This ADR is the authoritative gap ledger and remediation backlog for the ADR corpus as of 2026-06-12.** It does not change any subsystem behavior. It records, with cited ADR ids:
+
+- the status/impl distribution and the bookkeeping-drift problem;
+- a prioritized Gap Register with a recommended action per gap;
+- supersession-integrity defects;
+- the contradiction/retraction list (the anti-slop centerpiece);
+- shipped capabilities with no governing ADR;
+- the genuinely open data/hardware-gated backlog.
+
+Until the Gap Register items are worked, **treat the ADR Status header as advisory, not authoritative**, and treat any accuracy number authored before ADR-155 landed as CLAIMED (not MEASURED) until re-derived through the post-155 leak-free validation split.
+
+## Status Distribution
+
+Counts are approximate (`~`) where a status string is non-canonical or dual-valued; the per-ADR breakdown is in `census.md`.
+
+| Status bucket | Count | impl_state | Count |
+|---|---|---|---|
+| Accepted (incl. partial/in-progress/Phase-1 variants) | ~56 | implemented | ~36 |
+| Proposed (incl. conditional/research-only) | ~88 | partial | ~50 |
+| Superseded | 1 (ADR-002) | proposed-only | ~64 |
+| Rejected | 1 (ADR-098) | stale-or-contradicted | 3 (029/030/031) |
+| Missing / no Status header | 3 (ADR-168-proof [was 147], ADR-167-ddd [was 052], ADR-134) | unknown | 5 (034/044/167-ddd/168-proof/…) |
+| Mixed/dual status in one ADR | 3 (115, 149×2, 133) | superseded | 1 (ADR-002) |
+
+**Headline:** ~114 of 162 ADRs (≈70%) are decisions that never fully landed (proposed-only ~64 + partial ~50; counting the 3 stale-or-contradicted and 5 unknown entries as well gives ~122, ≈75%). The dominant failure mode is **stale Status headers**, not abandoned work.
+
+## Gap Register
+
+Severity: CRITICAL (corpus integrity / tooling-breaking / life-safety / security) · HIGH · MEDIUM · LOW. Action vocabulary: *implement · supersede · mark-stale · write-missing-ADR · close-as-gated · renumber · reconcile-docs*.
+
+| ID | Gap | Severity | Affected ADRs | Recommended action |
+|----|-----|----------|---------------|--------------------|
+| G1 | ~~6 duplicate ADR numbers (two ADRs answer to one number; breaks index/`/adr` tooling)~~ **RESOLVED (duplicate-number item)** | CRITICAL | 050×2, 052×2, 147×3, 148×2, 149×2; 134 (identity split, separate) | ~~renumber 2-of-3 at 147, 1 each at 050/148/149; demote 052-ddd to appendix; resolve 134 identity~~ **DONE: displaced files renumbered to the next free numbers (166–171), keepers = first-committed file per number (date ties broken by inbound-ref count / parent-appendix relationship): 050 keeps provisioning-tool-enhancements → quality-engineering-security-hardening = ADR-166; 052 keeps tauri-desktop-frontend → ddd-bounded-contexts appendix = ADR-167 (still linked to parent 052); 147 keeps nvidia-cosmos/OccWorld → benchmark-proof = ADR-168, adam-mode-light-theme = ADR-169; 148 keeps drone-swarm-control-system → yoga-mode-pose-system = ADR-170; 149 keeps public-community-leaderboard-huggingface → swarm-benchmarking-evaluation-methodology = ADR-171. In-file headers, intra-file self-refs, all inbound cross-references (README index, census, lens-findings, user-guide, CHANGELOG, proof-of-capabilities, research docs), and this register updated. `ls docs/adr/ADR-*.md | … | uniq -d` is now EMPTY. The ADR-134 identity split is NOT a filename collision; resolved separately under G3 (→ ADR-165).** |
+| G2 | 3 files with no Status header (cannot triage) — **INVESTIGATED in `docs/adr-gap-remediation-1`: only 2 genuinely lack one, both owner-gated** | CRITICAL | ADR-168-benchmark-proof (was 147), ADR-167-ddd-appendix (was 052), ~~134-CIR~~ | add canonical `## Status`; relocate ADR-168-proof to `benchmarks/`; label ADR-167-ddd as appendix — **NOTE: ADR-134-CIR DOES have a Status (`\| Status \| Proposed \|` in its header table) — mislabeled here. The two real misses (ADR-168-benchmark-proof [was 147], ADR-167-ddd [was 052]) were inside the owner-gated duplicate-number collisions (147×3, 052×2); those collisions are now resolved (G1) but the missing Status headers themselves remain owner-gated, so left untouched pending owner. The early ADRs (048/049/068/070 etc.) use `\| Status \|` not `\| **Status** \|` — different-format-but-present, not missing. Net: 0 headers added.** |
+| G3 | ~~Shipped crates cite a non-existent or wrong-identity governing ADR~~ **RESOLVED in `docs/adr-gap-remediation-1`** | CRITICAL | homecore-recorder→"ADR-132" (no file); homecore-migrate→"ADR-134" (file is CIR) | ~~write-missing-ADR (HOMECORE-RECORDER, HOMECORE-MIGRATE)~~ DONE: wrote ADR-132 (recorder, Accepted) + ADR-165 (migrate, Accepted — P1 scaffold); repointed migrate's ADR-134 refs → ADR-165 |
+| G4 | Anti-slop retractions: accuracy/security/function provably false until sweep landed | CRITICAL | 155, 154, 079, 161 (see Contradictions) | already fixed in-code by 154/155/161/162; this ledger records the retraction |
+| G5 | ~~10 streaming-engine ADRs marked `Proposed` while §Impl-Status reports Built + commits + tests~~ **RESOLVED in `docs/adr-gap-remediation-1`** | HIGH | 136–145 | ~~mark-stale → "Accepted — partial (integration glue pending)" (one batch)~~ DONE: all 10 (136–145) flipped to "Accepted — partial"; each retains its commit-pinned Implementation-Status note. NB: notes describe *building blocks built + tested*, **not** live-path integration — "partial" is the honest label, not full "Accepted" |
+| G6 | Stale `Proposed` headers on built+published code | HIGH | 029/030/031, 095/096, 152, 154–157, 024/027/072, 150 | mark-stale; reconcile with downstream/CLAUDE.md evidence |
+| G7 | Status-graph inversion: Accepted ADR depends on Proposed parent | HIGH | 032→029/030/031; 053→052; 048→045; 077→075/076; 104→103 | promote parents to match built reality, or downgrade dependents |
+| G8 | ADR-002 supersession not reciprocated by successors; 5 children stranded | HIGH | 002→016/017; children 003/007/008/009/010 | reconcile-docs (add reciprocal language or downgrade); split 002 to "partially superseded" |
+| G9 | Streaming-engine integrator crate has no governing ADR (composition/back-pressure/live-path seam) | HIGH | wifi-densepose-engine (composes 135–146) | write-missing-ADR |
+| G10 | CLAUDE.md doc-vs-header drift (doc says one status, header another) | HIGH | 017, 024, 027, 072, 152 | reconcile-docs |
+| G11 | ~~Open security HIGH findings, gate FAILED, never marked done~~ **RESOLVED (2026-06-13, branch `fix/adr-080-sensing-server-security`)** | HIGH | 080 (XFF bypass, leaked stack traces, JWT-in-URL CWE-598) | ~~implement (sensing-server boundary — NOT covered by HOMECORE sweep 161/162)~~ DONE: verified all three against the *current Rust* `wifi-densepose-sensing-server`. **#2 leaked errors** was the one live exposure — 6 `main.rs` handlers serialized internal `Display`/`JoinError` into response bodies; fixed via a new `error_response` module (generic body + correlation id, detail logged server-side only). **#1 XFF** and **#3 JWT-in-URL** were verified *absent* on the Rust boundary (no IP-rate-limit/allowlist reads XFF; token is header-only, WS handlers take no query token) and pinned with regression tests that fail if either is re-introduced. ADR-080 P0 §1–3 marked RESOLVED. |
+| G12 | ADR-052→054 edge unacknowledged by successor; likely mis-modeled (impl, not replacement) | MEDIUM | 052-tauri, 054 | reconcile-docs (054 is the impl plan *for* 052, not a replacement) |
+| G13 | Capability governed only by remediation/deploy ADR, no creation/architecture ADR | MEDIUM | wasm-edge (only 160/163); occworld-candle (147 blessed Python path only); pointcloud (094 = viewer deploy only) | write-missing-ADR (taxonomy/ABI for wasm-edge; Candle backend swap; pointcloud data contract) |
+| G14 | Conflicting decisions on one topic, none superseding the others | MEDIUM | person-count 037/075/103; PQ-sign 007/109; fed key-exchange 107/108; provisioning 050/060/052; audit 010/028; RVF-WASM 009-vs-shipped | reconcile (pick one, supersede the rest) |
+| G15 | ~50 Proposed-forever chains pollute every gap analysis | MEDIUM | 003/007–010, 105–109, 118–125, HOMECORE 124–133, 033/046/049/067/074/085 | close-as-gated or mark Deferred/Rejected + open tracking issues |
+| G16 | De-facto supersessions never recorded (lifecycle graph incomplete) | MEDIUM | 098/099, 063/064, 042/153, 050/060, 035/023, 100/109, 117 retracts PyPI v1.1.0 | reconcile (add supersedes/superseded_by fields) |
+| G17 | Accepted but no implementation evidence ("unverified done") | MEDIUM | 034 (FieldView app — no crate); 044 (wifi-densepose-geo — bare Accepted, no Date/Deciders) | implement or downgrade to Proposed |
+| G18 | Workspace has ~38 crates; CLAUDE.md publishing list (12-step) and crate table (15) are stale | MEDIUM | corpus-wide (crate-graph topology) | write-missing-ADR (crate-graph / publish boundaries) + reconcile CLAUDE.md |
+
+## Supersession Integrity
+
+Only **3 formal supersession edges** exist; all three are defective (see G8/G12; full detail in `lens-findings.md` Lens 2):
+
+- **ADR-002 → ADR-016 / ADR-017** is one-directional. ADR-016 never mentions ADR-002 (its References list only 014/015); ADR-017 only *corrects* ADR-002's "fictional crate names" and never says "supersede." The census `supersedes:["ADR-002"]` on 016/017 is **file-unsupported** — the superseded ADR points up at two successors that do not point back.
+- **ADR-002 is an umbrella** whose children 003/007/008/009/010 are still `Proposed`. ADR-016/017 realize only the training/signal/MAT integration points; the RVF-container (003), PQ-crypto (007), Raft (008), WASM-edge-runtime (009), and witness-chains (010) decisions are **neither implemented nor formally superseded**. Marking the parent fully "Superseded" silently buries 5 live-but-abandoned child decisions. Recommended: split ADR-002 to "partially superseded."
+- **ADR-052-tauri → ADR-054** is declared by the predecessor but ADR-054 contains zero references to ADR-052. ADR-054 ("Full Implementation", in progress) is the impl plan *for* 052, not a replacement — likely a mis-modeled edge.
+- **No cycles** detected. The graph is clean structurally; the defect is missing reciprocity and ~7 unrecorded de-facto supersessions (G16).
+
+## Contradictions & Retractions (anti-slop centerpiece)
+
+The four CRITICAL items are the corpus's load-bearing AI-slop admissions — each an accepted-or-shipped surface whose stated accuracy/security/function was provably false until the sweep landed. **Every accuracy number predating ADR-155 should be treated as CLAIMED until re-derived through the post-155 leak-free split.** Source-cited evidence is in `lens-findings.md` Lens 3.
+
+- **[CRITICAL] ADR-155** retracts every prior NN accuracy/TTA/proof claim: real MM-Fi training validated against a *synthetic* val set with stride-1 (~99%) window leakage (§2.2); a *fake gradient* `grad += v*0.01` in the TTA path (§2.3); a *self-certifying* proof that blessed whatever the pipeline emitted and PASSed on 1e-9 float noise (§2.4).
+- **[CRITICAL] ADR-154** proves the ADR-134 CIR coherence gate was **dead in production for every canonical 56-tone frame** (`SubcarrierMismatch`, 0 Ok / 8 mismatch), silently degrading coherence to freq-only. Any "CIR-enhanced coherence/ToF" claim before this fix overstated reality.
+- **[CRITICAL] ADR-079** carries three mutually inconsistent values for its own central metric: proxy PCK@20 = 2.5% (prose) vs 35.3% (baseline table — equal to the *target*) vs 0% upper-body joints; #640 measured 0% on real local data. An Accepted ADR whose headline 10–20x improvement is self-refuting.
+- **[CRITICAL] ADR-161** fixes a HOMECORE WebSocket **auth bypass** (any non-empty token accepted) + reply-theater + no-op automation; **ADR-162** then enforces plugin Ed25519 signature verification, capability isolation, and bounded RunModes — retracting ADR-128/129/130's implied security guarantees.
+- **[HIGH]** ADR-152 self-refutes 1 of 25 claims (ESP WiFi-6 "drop-in" REFUTED 0-3); CLAUDE.md's "WiFlow-STD MEASURED-EQUIVALENT ~96% PCK" contradicts §F1's own gating (97.25% is CLAIMED until measurements (a)–(c) run). ADR-150 retracts the implied cross-subject capability (81.63% in-domain vs ~11.6% leakage-free cross-subject; DANN ~0 gain). ADR-159 ships real models but discloses person-count `training_class1_accuracy = 0.343` and renames "learned multi-person counter" → "presence detector," gutting ADR-103/104's claim.
+- **[MEDIUM]** ADR-163 leaves the ESP32/Xtensa on-hardware latency figure UNMEASURED; ADR-098↔099 partial reversal on midstream; ADR-147 self-retracts Cosmos for OccWorld.
+
+## Coverage Gaps (shipped capability, no/broken governing ADR)
+
+- ~~**CRITICAL — `homecore-recorder`** (SQLite state history + semantic search) cites "ADR-132", which **does not exist**. The durable-state backbone is ungoverned. → write HOMECORE-RECORDER ADR.~~ **RESOLVED in `docs/adr-gap-remediation-1`:** ADR-132 written (`ADR-132-homecore-recorder-history-semantic-search.md`, Status: Accepted — reverse-documented from the shipped crate).
+- ~~**CRITICAL — `homecore-migrate`** (reads untrusted Python-HA `.storage/*.json`) cites "ADR-134", but on-disk ADR-134 is CIR. A data-integrity-sensitive importer governed by a phantom identity. → resolve 134 collision + write HOMECORE-MIGRATE ADR (trust boundary).~~ **RESOLVED in `docs/adr-gap-remediation-1`:** ADR-165 written (`ADR-165-homecore-migrate-from-home-assistant.md`, Status: Accepted — P1 scaffold); crate's `ADR-134` refs repointed → ADR-165; on-disk ADR-134 (CIR) left intact. ADR-126's series-map row (which labels the *role* "ADR-134 HOMECORE-MIGRATE") is owner-gated and unchanged.
+- **HIGH — `wifi-densepose-engine`** composes ADR-135..146 onto the live 20 Hz path but **no ADR governs the integrator contract** (ordering, back-pressure, "one pipeline cycle" boundary).
+- **MEDIUM — `wasm-edge`** (~70 skills) governed only by remediation ADRs 160/163 — no creation/taxonomy/ABI ADR. **`occworld-candle`** is a Rust-native backend swap ADR-147 explicitly deferred. **`pointcloud`** has only a viewer-deploy ADR (094), no data-format contract.
+- **MEDIUM — workspace topology:** ~38 crates exist; the CLAUDE.md 15-crate table and 12-step publishing order are stale, and no ADR governs crate-graph/publish boundaries at this scale.
+- Verified-governed (scoped out): worldmodel→147, worldgraph→139, cog-*→101/103/116, ruview-swarm→148, nvsim→089/092, bfld→118-123/141, calibration→151, homecore-hap→125, geo→044, desktop→052/054.
+
+## Open / Gated Backlog (genuinely unresolved, honestly labeled)
+
+The ADR-154–163 sweep was narrowly scoped. The two largest **capability** gaps it did not touch:
+
+- **CRITICAL — Camera-teacher training validation (ADR-079 / 072 / 150).** P7–P9 Pending; blocker is a real synchronized camera+ESP32 paired-capture session + GPU training on the fleet (ruvultra RTX 5080). Cross-subject collapse (11.6%) is data-gated on a heterogeneous multi-subject CSI dataset, per ADR-150 §F3 / ADR-152 F3 (the lever is *more data*, not capacity). Accepted-on-paper, not proven.
+- **HIGH — Federation + BFLD privacy chains (ADR-105–109, 118–125).** All Proposed-only, ACs unchecked. Blockers: KIT BFId dataset (121), Pi5/Nexmon CBFR capture hardware (123 — ESP32 structurally cannot sniff CBFR), Soul-Signature + cog-ha-matter (122/125). The privacy control *plane* (ADR-141) is built; the *capture/scoring* chain it gates is not.
+- ~~**HIGH — Sensing-server security (ADR-080).** Distinct from the HOMECORE boundary the sweep fixed; XFF bypass / stack-trace leakage / JWT-in-URL remain open.~~ **RESOLVED (2026-06-13, G11):** verified against the current Rust sensing-server — stack-trace leakage was the one live finding (fixed via `error_response` generic bodies); XFF bypass and JWT-in-URL were verified absent and regression-pinned. See ADR-080 P0 §1–3.
+- **MEDIUM — gold-standard deferrals (model to follow):** ADR-163 (ESP32 on-hardware latency UNMEASURED), ADR-160 (medical/affect/weapon NOT validated, relabelled), ADR-158 (RF-through-rubble + learned counter DATA-GATED). Code is real, the claim is withheld pending absent hardware/labelled data — labels are honest.
+- **MEDIUM — purely hardware/data-gated Proposed decisions (no overreach):** ADR-023, 027, 042, 063/064, 065/066, 070, 073/078, 083, 086, 091, 103, 110 (HE-CSI needs ESP-IDF ≥5.5), 113, 114, 134/135, 143-v2, 144. *needs verification* where flags rely on downstream prose rather than direct file inspection.
+
+## Consequences
+
+**Positive.** One authoritative ledger replaces scattered, drifting status fields. The anti-slop retractions are recorded in a citable place, so the "AI slop" accusation is met with a structured admission + fix-trail rather than denial. The Gap Register is a concrete, severity-ordered work queue. Batch-fixing G5 (10 streaming-engine headers) and G1/G2 (numbering + missing headers) is high-ROI and unblocks ADR tooling.
+
+**Negative.** This ADR is a snapshot; it goes stale the moment the next ADR lands. Counts marked `~` are approximate and a few impl_state values are *needs verification* (downstream-prose-derived, not file-confirmed). Acting on the register (renumbering, status flips, supersession edits) touches ~30 files and risks transient cross-reference breakage if not done atomically.
+
+**Neutral.** No subsystem behavior changes. Renumbering decisions (which of the colliding files keeps each number) are deferred to the follow-up remediation PR — this ADR records the collision, not the resolution. Whether to close abandoned chains as `Rejected` vs `Deferred` is a judgment call left to the deciders per chain.
+
+## Links
+
+- `docs/adr/gap-analysis/census.md` — full per-ADR census (162 entries).
+- `docs/adr/gap-analysis/lens-findings.md` — five-lens findings (status-distribution, supersession-chains, contradictions, coverage-gaps, data-hardware-gated), verbatim.
+- Anti-slop sweep: ADR-154, ADR-155, ADR-156, ADR-157, ADR-158, ADR-159, ADR-160, ADR-161, ADR-162, ADR-163.
+- Most-cited defects: ADR-079, ADR-134, ADR-002, ADR-136–145, ADR-152.
+- Governance: CLAUDE.md (crate table + publishing order — stale per G18); ADR-038 (prior roadmap census, now stale).
@@ -0,0 +1,148 @@
+# ADR-165: HOMECORE-MIGRATE — Migration Tooling from Python Home Assistant
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted — P1 scaffold (full conversion deferred to P2) |
+| **Date** | 2026-05-25 |
+| **Deciders** | ruv |
+| **Codename** | **HOMECORE-MIGRATE** |
+| **Crate** | `v2/crates/homecore-migrate` |
+| **Relates to** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (HOMECORE master — series map row "ADR-134 HOMECORE-MIGRATE"), [ADR-127](ADR-127-homecore-state-machine-rust.md) (HOMECORE-CORE), [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) (HOMECORE-RECORDER — P2 side-by-side export target) |
+| **Tracking issue** | [#800](https://github.com/ruvnet/RuView/pull/800) (HOMECORE intake) |
+
+> **Number-collision resolution (2026-06-12).** The HOMECORE series in ADR-126 §4 planned
+> "ADR-134 = HOMECORE-MIGRATE", and the `homecore-migrate` crate cites "ADR-134" throughout.
+> But the on-disk `ADR-134-csi-to-cir-time-domain-multipath.md` is a **different, unrelated
+> decision** (First-Class CIR Support, a signal-processing tier). The migrate crate was
+> therefore governed by a phantom identity (ADR-164 Gap G3 / Coverage-Gaps Lens §A). This
+> ADR takes the next free number (**165**) and becomes the real governing record for
+> HOMECORE-MIGRATE; the `ADR-134` references inside `v2/crates/homecore-migrate/` are
+> repointed to ADR-165. The real ADR-134 (CIR) is untouched. ADR-126's series-map row still
+> labels the *role* "ADR-134 HOMECORE-MIGRATE" for historical traceability; that registry
+> renumber is owner-gated and left for the follow-up. This ADR reverse-documents the shipped
+> P1 scaffold; it introduces no new design.
+
+---
+
+## 1. Context
+
+ADR-126 decided to reimplement Home Assistant (HA) natively in Rust. A user adopting
+HOMECORE has an existing HA install whose configuration lives in two places on disk:
+
+- `.storage/*.json` — versioned JSON envelopes (`{ version, minor_version, data }`) holding
+  the entity registry, device registry, and config entries;
+- top-level YAML — `secrets.yaml`, `automations.yaml`.
+
+To migrate, HOMECORE must read this foreign, **untrusted** on-disk state. It is untrusted in
+the security sense: the schema can drift between HA releases, and silently mis-parsing a
+registry would corrupt the imported home. ADR-164 flagged this as a CRITICAL coverage gap —
+a data-integrity-sensitive importer governed by a non-existent ADR identity.
+
+The decision an ADR must pin here is the **trust boundary and import contract**: which HA
+files are read, how schema versions are validated, and what happens on an unknown version.
+
+## 2. Decision
+
+Ship `homecore-migrate` as a CLI + library that reads an existing HA filesystem and imports
+its configuration into HOMECORE. P1 is a **scaffold**: it parses and inspects everything and
+converts the entity registry; full conversion of the remaining artifacts is deferred to P2.
+
+### 2.1 Storage reader + versioned format gate (P1, shipped)
+
+- `HaStorageDir` / `HaStorageEnvelope` read HA's `.storage/` directory; `read_envelope(path)`
+  deserializes a `.storage/*.json` envelope (`src/storage.rs`).
+- Versioned parsers live under `storage_format::v<N>` (e.g. `v13` for the entity registry)
+  (`src/storage_format/`).
+- **Schema-version validation is the load-bearing safety rule (§6 Q5 of this ADR):** an
+  unknown `minor_version` is a **hard error** (`MigrateError::UnsupportedSchemaVersion`),
+  never a silent best-effort parse. Better to refuse than to corrupt.
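+
+The fail-closed rule above can be sketched in a few lines. This is an illustration only: the shipped crate is Rust, and the JavaScript below (including `gateEnvelope`, `UnsupportedSchemaVersionError`, and the contents of the `SUPPORTED` allow-list) is hypothetical and chosen purely to show the shape of the check, assuming an envelope with `version`, `minor_version`, and `data` fields.
+
+```js
+// Illustrative sketch, not the crate's API. All names here are hypothetical.
+class UnsupportedSchemaVersionError extends Error {
+    constructor(path, version, minorVersion) {
+        super(`${path}: unsupported schema version ${version}.${minorVersion}`);
+        this.name = 'UnsupportedSchemaVersionError';
+        this.path = path;
+    }
+}
+
+// Assumed allow-list of "version.minor_version" pairs that have a parser.
+const SUPPORTED = new Set(['1.13']);
+
+// Returns the envelope payload only when the schema version is known;
+// anything else throws instead of attempting a best-effort parse.
+function gateEnvelope(path, envelope) {
+    const key = `${envelope.version}.${envelope.minor_version}`;
+    if (!SUPPORTED.has(key)) {
+        throw new UnsupportedSchemaVersionError(path, envelope.version, envelope.minor_version);
+    }
+    return envelope.data;
+}
+```
+
+The point of the sketch is the ordering: the version check runs before any field of `data` is touched, so a drifted schema can never be partially imported.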
+
+### 2.2 Per-artifact parsers (P1, shipped)
+
+- `entity_registry::load()` — `core.entity_registry` → `Vec<homecore::EntityEntry>`
+  (ready for import).
+- `device_registry::load()` — `core.device_registry` → `Vec<DeviceImport>` (P1 diagnostic;
+  full conversion P2).
+- `config_entries::load()` — `core.config_entries` → domain counts + integration names
+  (the format is undocumented per §6 Q5; treated diagnostically).
+- `secrets::load_secrets()` — `secrets.yaml` → `HashMap<String, String>` (resolution P2).
+- `automations::load()` — `automations.yaml` → count + ID/alias list (conversion P2).
+
+### 2.3 CLI (P1, shipped)
+
+- `homecore-migrate inspect <ha-dir>` previews what will be migrated (entity/device/config
+  counts, redacted secret/automation lists) (`src/cli.rs`, `src/main.rs`).
+- `import-entities` and `export-for-sidecar` are declared but their full behaviour is P2.
+
+### 2.4 Structured errors (P1, shipped)
+
+- `MigrateError` carries context (`path`, line/field) for I/O, JSON, YAML, missing-field,
+  unsupported-schema-version, and entity-id parse failures (`src/lib.rs`).
+- **Secret-leak hardening (security review, 2026-06).** `secrets.yaml` parse failures must
+  NOT use the generic `MigrateError::YamlParse { source }` variant: `serde_yaml`'s message
+  for a typed-tag coercion error (e.g. `port: !!int <value>`) embeds the offending scalar
+  verbatim (`invalid value: string "<the-secret-value>"`), and that error propagates through
+  the `InspectSecrets` CLI path to stderr — leaking a secret value despite the CLI's
+  deliberate `<redacted>` design. `read_secrets` now maps such failures to a dedicated
+  redacting variant `MigrateError::SecretsParse { path, line, column }` that carries only the
+  file path and a coarse location (`serde_yaml::Error::location()`), never the scalar content.
+  Pinned by `secrets::tests::malformed_secrets_error_never_contains_secret_value` (asserts the
+  rendered error **and its full `#[source]` chain** never contain the secret value).
+  **Review dimensions confirmed clean with evidence:** source is never mutated (no
+  `fs::write`/`remove`/`create` anywhere — P1 reads source, writes nothing); paths are
+  user-supplied dirs joined with fixed filenames (no `..`/absolute traversal beyond the
+  user's own privileges); malformed/typed/truncated `.storage` JSON and YAML **error, never
+  panic** (every `unwrap`/`expect` is confined to test code); unknown schema `minor_version`
+  hard-errors fail-closed; no SQL/shell/path injection surface (the tool emits diagnostics
+  only, persists nothing in P1).
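+
+The redaction idea in the secret-leak bullet can be illustrated as follows. This is a sketch under stated assumptions, not the crate's code: the crate is Rust, and `SecretsParseError`, `parseSecretsLine`, and `renderChain` are hypothetical JavaScript stand-ins. The toy parser accepts only `key: value` lines so that a malformed line triggers the error path.
+
+```js
+// Illustrative sketch only; every name below is hypothetical.
+class SecretsParseError extends Error {
+    constructor(path, line, column) {
+        // Deliberately no `cause` and no scalar text: only path + coarse location.
+        super(`failed to parse ${path} at line ${line}, column ${column}`);
+        this.name = 'SecretsParseError';
+    }
+}
+
+// Toy parser for a single `key: value` line.
+function parseSecretsLine(path, lineNo, text) {
+    const m = /^([A-Za-z0-9_]+):\s*(.+)$/.exec(text);
+    if (!m) {
+        // Never interpolate `text` here; it may contain the secret value.
+        throw new SecretsParseError(path, lineNo, 1);
+    }
+    return [m[1], m[2]];
+}
+
+// Joins the message of an error and of every error in its `cause` chain,
+// mirroring the "rendered error and full source chain" check described above.
+function renderChain(err) {
+    const parts = [];
+    for (let e = err; e; e = e.cause) parts.push(String(e.message));
+    return parts.join(' <- ');
+}
+```
+
+A test in the spirit of the pinned one would feed a malformed line containing a known marker string and assert that `renderChain` of the thrown error does not contain it.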
+
+### 2.5 Deferred to P2+ (NOT built — honestly labelled)
+
+- Convert `config_entries` → HOMECORE plugin manifests.
+- Convert `automations.yaml` → `homecore-automation` YAML.
+- Side-by-side runtime mode (requires `homecore-recorder`, ADR-132; behind the `recorder`
+  Cargo feature, currently a no-op stub).
+- `!secret` reference resolution in non-secrets YAML files.
+
+### 2.6 Test evidence (as shipped)
+
+- 21 tests (`cargo test -p homecore-migrate`) — 19 as originally shipped plus 2 added by the
+  2026-06 security review (`secrets::tests::malformed_secrets_error_never_contains_secret_value`,
+  `malformed_secrets_error_reports_location`).
+
+## 3. Consequences
+
+**Positive.**
+
+- The trust boundary is explicit: unknown HA schema versions are rejected, not guessed, so a
+  schema drift fails loudly instead of corrupting an imported home.
+- Reusing HA's own `.storage` and YAML formats means no intermediate export step; the tool
+  reads a live HA install directly.
+- P1 `inspect` gives users a no-risk dry run before any write.
+
+**Negative / honest limits.**
+
+- P1 is a **scaffold**: only the entity registry is conversion-ready. Device registry,
+  config-entry→plugin, automation, and secret-resolution conversions are P2 and **not yet
+  built** — the Status field and crate docs say so.
+- The side-by-side recorder export depends on ADR-132 and is currently a feature-gated
+  no-op.
+- Performance figures in the README (envelope parse < 5 ms, 1 000-entity load < 50 ms) are
+  estimates, **needs verification** with a benchmark.
+
+**Neutral.**
+
+- This resolves only the *identity* of the migrate decision (134→165). The broader 6-way
+  duplicate-number cleanup (incl. ADR-126's series-map registry row) is owner-gated.
+
+## 4. Links
+
+- Crate: `v2/crates/homecore-migrate/` — `Cargo.toml`, `README.md`, `src/lib.rs`,
+  `src/storage.rs`, `src/storage_format/`, `src/entity_registry.rs`,
+  `src/device_registry.rs`, `src/config_entries.rs`, `src/secrets.rs`,
+  `src/automations.rs`, `src/cli.rs`, `src/main.rs`.
+- [ADR-126](ADR-126-ruview-native-ha-port-master.md) — HOMECORE master (series map: HOMECORE-MIGRATE).
+- [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) — HOMECORE-RECORDER (P2 side-by-side export target).
+- [ADR-134](ADR-134-csi-to-cir-time-domain-multipath.md) — First-Class CIR Support (the *unrelated* decision the crate was mistakenly citing).
+- [ADR-164](ADR-164-adr-corpus-gap-analysis.md) — gap analysis that surfaced this collision (Gap G3).
+- [Home Assistant `.storage` format](https://developers.home-assistant.io/docs/storage/).
@@ -1,4 +1,4 @@
-# ADR-050: Quality Engineering Response — Security Hardening & Code Quality
+# ADR-166: Quality Engineering Response — Security Hardening & Code Quality

 | Field | Value |
 |-------|-------|
@@ -1,4 +1,8 @@
-# ADR-052 Appendix: DDD Bounded Contexts — Tauri Desktop Frontend
+# ADR-167 Appendix: DDD Bounded Contexts — Tauri Desktop Frontend
+
+> Appendix to [ADR-052](ADR-052-tauri-desktop-frontend.md). Renumbered from ADR-052
+> to ADR-167 to resolve the ADR-052 duplicate-number collision (per ADR-164 Gap Register
+> G1); the parent decision remains ADR-052.

 This document maps out the domain model for the RuView Tauri desktop application
 described in ADR-052. It defines bounded contexts, their aggregates, entities,
@@ -158,7 +162,7 @@ Represents an over-the-air firmware update to a running node.
 | `target_node` | `MacAddress` | Target node MAC |
 | `target_ip` | `IpAddr` | Target node IP |
 | `firmware` | `FirmwareBinary` | The binary being pushed |
-| `psk` | `Option<SecureString>` | PSK for authentication (ADR-050) |
+| `psk` | `Option<SecureString>` | PSK for authentication (ADR-166) |
 | `phase` | `OtaPhase` | Uploading / Rebooting / Verifying / Done / Failed |
 | `progress` | `Progress` | Upload progress |

@@ -1,4 +1,4 @@
-# ADR-147 Benchmark Proof — OccWorld on RTX 5080
+# ADR-168 Benchmark Proof — OccWorld on RTX 5080
 Date: 2026-05-29
 Hardware: NVIDIA GeForce RTX 5080 (15.47 GB VRAM), CUDA 12.8
 Model: OccWorld TransVQVAE (random weights — pre-domain-fine-tuning baseline)
@@ -0,0 +1,226 @@
+# ADR-169: adam-mode — light theme toggle for the three.js realtime demo
+
+| Field | Value |
+|-------|-------|
+| **Status** | Proposed |
+| **Date** | 2026-06-02 |
+| **Deciders** | ruv |
+| **Codename** | **adam-mode** |
+| **Scope** | `examples/three.js/demos/05-skinned-realtime.html` (primary), demos 01–04 (follow-on) |
+| **Relates to** | ADR-019 (sensing-only UI), ADR-035 (live sensing UI accuracy) |
+| **Tracking issue** | none yet |
+
+---
+
+## 1. Context
+
+`examples/three.js/demos/05-skinned-realtime.html` (build stamp `2026-05-15-fps-tune`) is the live MediaPipe → Mixamo retargeting + ESP32 CSI overlay demo. It currently ships a single, opinionated **dark theme**:
+
+- Body `--bg: #050507` (near-black), `--text: #d8c69a` (warm beige).
+- Amber accents (`--amber: #ffb840`, `--amber-hot: #ffe09f`) on panels and controls.
+- Two full-screen overlays: a radial-vignette `.overlay-frame` and a 50%-opacity CRT-style `.scanlines` layer.
+- Three.js scene matches: `scene.background = new THREE.Color(0x050507)` and `scene.fog = new THREE.FogExp2(0x050507, 0.06)` (lines 269–270).
+
+The dark/amber CRT aesthetic is intentional for screen-recording and "command-centre" feel, but it has real failure modes:
+
+1. **Daylight visibility** — The live capture is unreadable when demoed on a laptop in a sunlit room; glare washes out the dark background and the amber-on-dark contrast disappears.
+2. **Recording for embedded/print contexts** — When the demo's screen is captured for documentation, blog posts, or HA blueprints, the dark theme bleeds into surrounding white content and looks heavy.
+3. **Accessibility** — Some users report that the high-contrast amber-on-near-black combination strains their eyes; for them a light theme with dark text is easier to read.
+4. **Operator pairing with a light-mode IDE** — Many operators run a light-mode browser alongside a dark-mode IDE and want the demo to match the browser, not the IDE.
+
+A toggle is the right answer because none of these reasons are universal — some sessions and some users want each mode.
+
+### 1.1 What this ADR is *not*
+
+- Not a redesign. The amber accent stays; only the surface colours and overlays swap. The information density, panel layout, and three.js scene geometry are unchanged.
+- Not a multi-theme system. We add exactly two themes: the existing dark (default, unnamed) and **adam-mode** (light). Future themes would need a new ADR.
+- Not a backend / data-model change. Pure presentation.
+- Not yet propagated to demos 01–04. Those follow-on after adam-mode lands on demo 05 and is validated.
+
+## 2. Decision
+
+Add a **client-side theme toggle** to `05-skinned-realtime.html` that switches between the existing dark theme and a new light theme called **adam-mode**, driven by a `data-theme="adam"` attribute on the root `<html>` element (`document.documentElement`) plus a sibling `:root[data-theme="adam"]` CSS block that re-defines the existing custom properties. A new toggle checkbox in the existing `#helpers` panel switches between modes and persists the choice in `localStorage` under the key `ruview.theme`.
+
+### 2.1 CSS — the colour swap
+
+Add immediately after the existing `:root { ... }` block in `<style>`:
+
+```css
+:root[data-theme="adam"] {
+    --bg: #f6f2ea;
+    --bg-panel: rgba(252, 250, 246, 0.92);
+    --amber: #b8741a;        /* deeper amber, readable on cream */
+    --amber-hot: #8a5612;    /* deepest amber for emphasis text */
+    --cyan: #1a6f8a;         /* slate cyan */
+    --magenta: #a8348a;      /* slate magenta */
+    --text: #2a241c;         /* near-black warm */
+    --text-mute: #7a6f5d;    /* warm grey */
+    --green: #1f7a32;        /* forest green */
+    --red: #b03a1a;          /* burnt sienna */
+    --border: rgba(184, 116, 26, 0.28);
+}
+```
+
+Every existing element already reads from these custom properties, so the swap is automatic for panels, text, borders, and bar fills. No per-element CSS rewrites required.
+
+### 2.2 Overlay handling
+
+The vignette and scanlines are dark-theme aesthetics. In adam-mode they would muddy the cream background. Two new rules:
+
+```css
+:root[data-theme="adam"] .overlay-frame {
+    background:
+        radial-gradient(ellipse at center, transparent 70%, rgba(184,116,26,0.10) 100%),
+        linear-gradient(180deg, rgba(184,116,26,0.06) 0%, transparent 18%, transparent 82%, rgba(184,116,26,0.08) 100%);
+}
+:root[data-theme="adam"] .scanlines {
+    opacity: 0.15;
+    mix-blend-mode: multiply;
+}
+```
+
+The vignette is preserved but inverted in colour and lightened; scanlines drop to 15 % opacity and switch from `overlay` to `multiply` blend so they read as faint paper texture rather than CRT lines.
+
+### 2.3 Three.js scene reactivity
+
+Two scene colours are hard-coded at construction (lines 269–270). Replace them with a function call that reads the current theme:
+
+```js
+function themeSceneColors(theme) {
+    return theme === 'adam'
+        ? { bg: 0xf6f2ea, fogDensity: 0.025 }
+        : { bg: 0x050507, fogDensity: 0.06 };
+}
+function applySceneTheme(theme) {
+    const c = themeSceneColors(theme);
+    scene.background = new THREE.Color(c.bg);
+    scene.fog = new THREE.FogExp2(c.bg, c.fogDensity);
+    renderer.setClearColor(c.bg, 1.0);
+}
+```
+
+Called once after `renderer` is constructed, then again from the toggle handler.
+
+`scene.fog` density drops in adam-mode because exponential fog on a light background reads as "haze" much more strongly than on dark — 0.06 → 0.025 keeps the falloff visible without losing the figure into the background.
+
+### 2.4 UI toggle
+
+Add to the `#helpers` panel (top of its labels list):
+
+```html
+<label class="theme-toggle">
+    <input type="checkbox" id="adam-mode-toggle">
+    <span>adam-mode (light)</span>
+    <span class="swatch" style="background: var(--amber)"></span>
+</label>
+```
+
+Handler:
+
+```js
+const THEME_KEY = 'ruview.theme';
+const root = document.documentElement;
+const toggle = document.getElementById('adam-mode-toggle');
+
+function applyTheme(theme) {
+    if (theme === 'adam') {
+        root.setAttribute('data-theme', 'adam');
+        toggle.checked = true;
+    } else {
+        root.removeAttribute('data-theme');
+        toggle.checked = false;
+    }
+    applySceneTheme(theme);
+    try { localStorage.setItem(THEME_KEY, theme); } catch (_) {}
+}
+
+const initialTheme = (() => {
+    try { return localStorage.getItem(THEME_KEY) || 'dark'; }
+    catch (_) { return 'dark'; }
+})();
+applyTheme(initialTheme);
+
+toggle.addEventListener('change', e => {
+    applyTheme(e.target.checked ? 'adam' : 'dark');
+});
+```
+
+### 2.5 Why "adam-mode" as the codename
+
+The user picked the name. It is a project-specific brand — distinct from the generic "light mode" terminology that other modes (`--theme=high-contrast`, `--theme=print`) may eventually need. Keeping a codename makes the toggle searchable in the codebase, keeps the `localStorage` key portable across the demo set, and avoids ambiguity if dark itself is later renamed.
+
+The string `"adam"` is the only value the `data-theme` attribute ever takes; dark is the implicit default (no attribute). The `localStorage` key stores either `"adam"` or `"dark"`, as written by the handler in §2.4, and a missing or unreadable entry is treated as dark.
+
+### 2.6 Rejected alternatives
+
+| Alternative | Rejected because |
+|---|---|
+| Use `prefers-color-scheme: light` only, no toggle | Operators frequently want the opposite of their OS preference for screen-recording or daylight desk use. Auto-only frustrates the actual use case. |
+| Ship two separate HTML files (`05-…-dark.html`, `05-…-light.html`) | Doubles maintenance for every future demo edit. No path to per-session toggle. |
+| Build a full multi-theme system with a runtime registry | Premature. Two themes don't need a registry; the `data-theme="adam"` attribute is the registry. |
+| Use Tailwind / DaisyUI / a CSS framework | Demos are intentionally stand-alone single-file HTML for portability. No build step exists; adding one for theming is wrong shape. |
+| Adopt the cognitum-v0 / HOMECORE design tokens (`--hc-*` from `examples/frontend/`) | That design system is dark-only by intent (ADR-131). adam-mode is the light counterpart needed in *demo* contexts, not HA dashboard contexts. |
+| Make adam-mode the default | Breaks the dark-aesthetic recording context this demo was originally built for. Default stays dark; toggle stays opt-in. |
+
+## 3. Consequences
+
+### 3.1 Positive
+
+- Demo is usable in daylight, in printed documentation, on light-mode browsers, and by users who find the dark-amber combination fatiguing.
+- Toggle persists across reloads via `localStorage` — set once, sticks.
+- No structural change to information density, panel layout, or three.js scene geometry. Operators familiar with the dark theme can switch and still find every readout in the same place.
+- Implementation is contained — a single `<style>` block addition, a single checkbox, a ~25-line JS handler, and a swap of two scene-construction lines.
+
+### 3.2 Negative
+
+- Two themes to maintain. Any future colour change requires updating both `:root` blocks. Mitigated by keeping the existing custom-property names — adam-mode's values are the only edits.
+- The vignette + scanlines lose some of the CRT charm in adam-mode. Tradeoff accepted by design.
+- One additional `localStorage` slot consumed per origin (`ruview.theme`).
+- The amber accent in adam-mode (`#b8741a`) is visibly different from the dark-mode amber (`#ffb840`) — they share the same CSS variable name but a screenshot from each mode is not pixel-comparable. This is the correct call for accessibility (the bright amber is unreadable on cream) but does mean side-by-side comparisons need both screenshots labelled.
+
+### 3.3 Risks
+
+| Risk | Likelihood | Mitigation |
+|---|---|---|
+| Future demo edits update one `:root` block and forget the other | Medium | A lint script in `scripts/` could grep both blocks for matching key sets; documented as P2 follow-up. |
+| `localStorage` blocked by privacy settings | Low | All accesses are wrapped in try/catch; falls back to dark. |
+| Three.js fog density of 0.025 still washes out the model on adam-mode | Low | Empirically tuned during implementation; if it does, drop to 0.015 or remove fog entirely in adam-mode. |
+| User on a high-DPI display sees scanlines as visible paper texture even at 15 % opacity | Low | If reported, drop to 8 % or hide scanlines entirely in adam-mode. |
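+
+The lint mitigation in the first row could look roughly like the sketch below. It is a hypothetical helper, not an existing script in `scripts/`: `customPropertyNames` and `themeKeyDrift` are illustrative names, and the parsing assumes each theme block is written as `selector {` with no nested braces, as in §2.1.
+
+```js
+// Hypothetical lint helper; not part of the repository.
+// Collects the `--name` custom properties declared inside `selector { ... }`.
+function customPropertyNames(css, selector) {
+    const start = css.indexOf(selector + ' {');
+    if (start === -1) return new Set();
+    const open = css.indexOf('{', start);
+    const close = css.indexOf('}', open);
+    const body = css.slice(open + 1, close);
+    return new Set([...body.matchAll(/(--[\w-]+)\s*:/g)].map(m => m[1]));
+}
+
+// Reports custom properties declared in one theme block but not the other.
+function themeKeyDrift(css) {
+    const dark = customPropertyNames(css, ':root');
+    const adam = customPropertyNames(css, ':root[data-theme="adam"]');
+    return {
+        missingInAdam: [...dark].filter(k => !adam.has(k)).sort(),
+        missingInDark: [...adam].filter(k => !dark.has(k)).sort(),
+    };
+}
+```
+
+A wrapper script would read the demo's `<style>` contents, call `themeKeyDrift`, and exit non-zero when either list is non-empty.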
+
+## 4. Implementation plan
+
+Tiny scope — single file. No swarm needed.
+
+1. Add `:root[data-theme="adam"]` CSS block and the two overlay overrides.
+2. Refactor scene background + fog into the two helper functions `themeSceneColors()` and `applySceneTheme()`.
+3. Add `<label>` markup and handler script.
+4. Verify in a browser at http://127.0.0.1:8765/examples/three.js/demos/05-skinned-realtime.html — toggle on, reload, confirm adam-mode persists; toggle off, reload, confirm dark persists.
+5. Smoke-screenshot both modes; commit.
+
+Acceptance criteria:
+
+- Toggle checkbox visible in `#helpers` panel.
+- Clicking the toggle swaps colours within one frame.
+- Reload preserves last choice.
+- Three.js scene background follows the toggle (no dark frame visible behind a light HUD or vice-versa).
+- Existing dark-theme appearance is pixel-identical when toggle is off.
+
+## 5. Test plan
+
+- Manual visual check in two themes (no automated visual regression — demos aren't in the CI test loop today).
+- `view-source` confirms the new CSS block, the toggle markup, and the handler are present.
+- DevTools `localStorage` shows `ruview.theme` after a toggle.
+- Three.js inspector (or a `console.log(scene.background.getHexString())`) confirms scene colour swap.
+
+## 6. Follow-on work (out of scope for this ADR)
+
+- Roll adam-mode into demos 01–04. Each demo has its own `<style>` block; the same `data-theme="adam"` selector and the same JS handler can be copied.
+- Honor `prefers-color-scheme: light` on first load *if* `localStorage` has no stored choice. Trivial three-line addition.
+- Add a high-contrast theme for accessibility (separate ADR).
+- Lint script that asserts both `:root` blocks declare the same custom-property names.
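+
+As a rough sketch of the `prefers-color-scheme` follow-on: the decision can be isolated in a small pure function so that an explicit stored choice always wins and the OS preference is consulted only when nothing usable is stored. `resolveInitialTheme` and `initialThemeFromEnvironment` are hypothetical names, not existing code in the demo.
+
+```js
+// Hypothetical helper for the follow-on; not present in the demo today.
+function resolveInitialTheme(stored, prefersLight) {
+    // An explicit, recognised stored choice always wins.
+    if (stored === 'adam' || stored === 'dark') return stored;
+    // Otherwise follow the OS preference, defaulting to dark.
+    return prefersLight ? 'adam' : 'dark';
+}
+
+// Browser wiring, guarded so storage or matchMedia failures fall back to dark.
+function initialThemeFromEnvironment() {
+    let stored = null;
+    try { stored = localStorage.getItem('ruview.theme'); } catch (_) {}
+    const prefersLight = typeof window !== 'undefined'
+        && typeof window.matchMedia === 'function'
+        && window.matchMedia('(prefers-color-scheme: light)').matches;
+    return resolveInitialTheme(stored, prefersLight);
+}
+```
+
+In the handler from §2.4, `initialThemeFromEnvironment()` would take the place of the inline `initialTheme` expression.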
+
+## 7. Related ADRs
+
+- [ADR-019](ADR-019-sensing-only-ui-mode.md) — sensing-only UI mode (Gaussian splats viewer)
+- [ADR-035](ADR-035-live-sensing-ui-accuracy.md) — live sensing UI accuracy norms (which this demo follows)
+- [ADR-131](docs/adr/ADR-131-...) — HOMECORE / cognitum-v0 design tokens (dark-only, separate context)