Compare commits

1 Commit

Author SHA1 Message Date
github-actions[bot] c436b55ff4 chore: update vendor submodules to latest upstream 2026-05-31 12:37:39 +00:00
740 changed files with 28400 additions and 73926 deletions
-119
@@ -1,119 +0,0 @@
{
"id": "aether-arena-aa",
"name": "AetherArena (AA) — Official Spatial-Intelligence Benchmark",
"adr": "ADR-149",
"adrPath": "docs/adr/ADR-149-public-community-leaderboard-huggingface.md",
"status": "Accepted",
"initializedDate": "2026-05-30",
"targetDate": "2026-08-31",
"exitCriteria": "Benchmark INFRASTRUCTURE done, tested, CI-gated, deploy-ready: aa_score_runner.rs passes deterministic fixture test; CI harness-gate green on every PR; aether-arena repo scaffold committed (README four-part framing + aa-submission.toml schema + VERIFY.md); public smoke split committed; HF Space lifecycle skeleton deployed; signed Parquet ledger functional; RuView baseline PCK@20 ~2.5% entered; ADR-149 §7 acceptance test (five-step stranger test) passes. NOTE: ML SOTA (MM-Fi PCK@20 ~72%) is a separate long-running stretch goal blocked on ADR-079 camera-ground-truth — it is NOT an infra exit criterion.",
"baselineState": {
"adrStatus": "Accepted, committed 2026-05-30",
"scorerCode": "ruview_metrics.rs + ablation.rs + proof.rs exist in wifi-densepose-train; aa_score_runner.rs not yet created",
"aetherArenaRepo": "does not exist yet — needs user authorization to create ruvnet/aether-arena public repo",
"hfSpace": "does not exist yet — needs HF_TOKEN and user authorization to deploy ruvnet/aether-arena HF Space",
"smokeDataset": "not committed",
"resultsLedger": "not created",
"ruviewBaseline": "PCK@20 ~2.5% self-reported, not formally entered",
"ciGate": "not added to workflow"
},
"milestones": {
"m1": {
"name": "ADR-149 Accepted + committed",
"status": "DONE",
"completedDate": "2026-05-30",
"completionCriteria": "ADR-149 file committed to docs/adr/ with status Accepted",
"notes": "Done this session. File at docs/adr/ADR-149-public-community-leaderboard-huggingface.md"
},
"m2": {
"name": "Deterministic scorer runner bin (aa_score_runner.rs)",
"status": "NOT_STARTED",
"completionCriteria": "aa_score_runner.rs compiles, runs ruview_metrics on a committed fixture, emits RuViewTier + SHA-256 proof hash, mirrors existing *_proof_runner.rs pattern; cargo test passes",
"estimatedEffort": "3-5 days",
"owner": "wifi-densepose-train crate or new aa-scorer crate"
},
"m3": {
"name": "CI harness-gate: GitHub Actions workflow",
"status": "NOT_STARTED",
"completionCriteria": "A GitHub Actions workflow runs aa_score_runner on every PR as a build gate; PR fails if scorer fails determinism check; workflow committed and green",
"estimatedEffort": "2-3 days",
"dependency": "M2 must be done first"
},
"m4": {
"name": "aether-arena repo scaffold",
"status": "NOT_STARTED",
"completionCriteria": "ruvnet/aether-arena repo created with: README (four-part framing: Public leaderboard / Private eval split / Open scorer / Signed results); aa-submission.toml manifest schema; VERIFY.md (ADR-149 §7 stranger acceptance test); neutrality/governance section (§2.8); contribution guide",
"estimatedEffort": "3-5 days",
"blockers": ["Needs user authorization to create public ruvnet/aether-arena repo on GitHub"]
},
"m5": {
"name": "Public smoke split committed + private MM-Fi held-out split prep",
"status": "NOT_STARTED",
"completionCriteria": "Public smoke split committed to aether-arena repo (stranger can score locally); private MM-Fi held-out split prepared under non-public path with CC BY-NC 4.0 attribution; Wi-Pose explicitly excluded from v0",
"estimatedEffort": "5-7 days",
"riskNotes": "MM-Fi CC BY-NC 4.0: AA must remain non-commercial and carry MM-Fi attribution; raw frames stay in private split; only derived CSI features + scores may be exposed"
},
"m6": {
"name": "HF Space (Gradio) skeleton",
"status": "BLOCKED",
"completionCriteria": "HF Space deployed at ruvnet/aether-arena with submission lifecycle (submitted->validated->quarantined->smoke_scored->full_scored->published/rejected); sandboxed scorer container wired; basic leaderboard table rendered",
"estimatedEffort": "7-10 days",
"blockers": [
"Needs HF_TOKEN — check .env for HF_TOKEN or HUGGINGFACE_TOKEN",
"Needs user authorization to create/deploy ruvnet/aether-arena HF Space (outward-facing public deployment)"
]
},
"m7": {
"name": "Signed append-only Parquet results ledger",
"status": "NOT_STARTED",
"completionCriteria": "HF dataset ruvnet/aether-arena-results created; append-only Parquet ledger with signed rows; determinism_gate enforced; no row can be silently edited",
"estimatedEffort": "3-5 days",
"ledgerSchema": "submitter, model_ref, category, feature_set, tier, pck20, oks, mota, vitals_bpm_err, latency_p50, latency_p95, privacy_leakage, cross_room_deg, proof_sha256, scored_at, harness_version",
"dependency": "M6 must be scaffolded first"
},
"m8": {
"name": "RuView baseline entry + public launch",
"status": "NOT_STARTED",
"completionCriteria": "RuView wifi-densepose-pretrained baseline entered (honest PCK@20 ~2.5%); ADR-149 §7 five-step stranger acceptance test passes; v0 live with Presence + Pose + Edge-latency + Determinism categories active; Privacy and Cross-room shown as gated/coming-soon",
"estimatedEffort": "3-5 days",
"dependency": "M4+M5+M6+M7 complete",
"notes": "ML SOTA improvement (PCK@20 ~72%) is a SEPARATE stretch goal blocked on ADR-079 P7-P9 camera ground truth. NOT a blocker for infra launch."
}
},
"activeMilestone": "m2",
"completedMilestones": ["m1"],
"knownRisks": [
"HF_TOKEN not confirmed present in .env — check before M6 work begins",
"ruvnet/aether-arena public repo creation is outward-facing — needs explicit user authorization",
"MM-Fi CC BY-NC 4.0: AA must stay legally non-commercial and brand-distinct from commercial RuView product; or seek MM-Fi commercial grant before any paid tier",
"Wi-Pose has research-use-only terms (no redistribution grant) — excluded from v0; revisit only if terms are clarified with authors",
"HF Space free CPU tier may be too slow for Candle/tch inference pipeline — may need ZeroGPU or self-hosted scorer on cognitum-20260110 GCloud A100/L4",
"ADR-079 camera-ground-truth (PCK@20 SOTA) is P7-P9 pending — NOT an infra blocker; must not be conflated with AA infra completion",
"Neutrality/governance risk: RuView seeded the scorer — must be demonstrably scored through the same public pipeline as any other entrant (§2.8 controls)"
],
"driftSignals": {
"timeline": "GREEN — just initialized, no timeline pressure yet",
"scope": "GREEN — scope locked at four-part structure per ADR-149 §2 decision",
"approach": "GREEN — reuse pattern (existing ruview_metrics + proof.rs) confirmed in ADR-149",
"dependency": "YELLOW — HF_TOKEN and ruvnet/aether-arena repo authorization are external blockers with unknown ETA",
"priority": "GREEN — active feature branch feat/adr-136-146-streaming-engine in progress; AA infra can proceed in parallel on its own branch"
},
"stretchGoals": {
"sotaML": "MM-Fi PCK@20 SOTA ~72% — separate ML effort blocked on ADR-079 P7-P9 camera-ground-truth data collection; NOT an infra exit criterion",
"privacyAxis": "ADR-145 §10 membership-inference attacker — activate Privacy leaderboard axis once attacker is implemented and published",
"crossRoom": "Multi-room held-out split — activate Cross-room generalization axis",
"multiOrgSteering": "Invite co-maintainers from other projects once >=N external entries land"
},
"sessionHistory": [
{
"date": "2026-05-30",
"type": "initialization",
"accomplished": [
"ADR-149 Accepted and committed to docs/adr/",
"Horizon record initialized in .claude-flow/horizons/aether-arena-aa.json",
"Memory stored in horizons namespace under key horizon-aether-arena-aa",
"Session check-in record stored in horizon-sessions namespace"
]
}
]
}
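Milestone m7 above asks for an append-only ledger in which no row can be silently edited. One common way to get that property is a hash chain, sketched below. This is a minimal illustration only: the field names `prev_hash` and `row_hash` are borrowed from the harness workflow's ledger step, while `GENESIS`, the `payload` wrapper, and the JSON serialization are assumptions. The real ledger is described as signed Parquet, and signing is left out here.

```python
import hashlib
import json

# Assumed sentinel used as prev_hash for the very first row.
GENESIS = "0" * 64


def row_hash(payload: dict, prev_hash: str) -> str:
    """Hash a row's payload together with the previous row's hash."""
    # Sorted keys and fixed separators give a stable byte representation.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_row(ledger: list, payload: dict) -> dict:
    """Append a row that links back to the current tail of the ledger."""
    prev = ledger[-1]["row_hash"] if ledger else GENESIS
    row = {"payload": payload, "prev_hash": prev, "row_hash": row_hash(payload, prev)}
    ledger.append(row)
    return row


def verify_chain(ledger: list) -> bool:
    """Return True only if every prev_hash link and row_hash is intact."""
    prev = GENESIS
    for row in ledger:
        if row["prev_hash"] != prev:
            return False
        if row["row_hash"] != row_hash(row["payload"], prev):
            return False
        prev = row["row_hash"]
    return True
```

Because each row's hash covers the previous row's hash, editing any earlier payload invalidates that row and breaks the link to every row after it, which is what makes a silent edit detectable.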
@@ -1,96 +0,0 @@
name: AetherArena harness gate (ADR-149)
# Runs the AetherArena scoring harness as a PR build gate. Every PR that touches
# the scorer, the metrics, or the benchmark scaffold must keep the deterministic
# score hash stable (ADR-149 §2.5 determinism_gate). If the scoring maths changes,
# the hash moves and this gate fails until `expected_score.sha256` is regenerated
# and reviewed — so scorer drift can never land silently.
#
# This is the "a PR that runs the harness as part of the build process" requirement.
on:
  pull_request:
    paths:
      - 'v2/crates/wifi-densepose-train/src/ruview_metrics.rs'
      - 'v2/crates/wifi-densepose-train/src/ablation.rs'
      - 'v2/crates/wifi-densepose-train/src/bin/aa_score_runner.rs'
      - 'aether-arena/**'
      - '.github/workflows/aether-arena-harness.yml'
  push:
    branches: ['feat/adr-149-aether-arena']
  workflow_dispatch:
permissions:
  contents: read
  pull-requests: write
jobs:
  harness-gate:
    name: Run AA scorer harness (determinism gate)
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: v2
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Install Rust toolchain
        run: rustup show && rustc --version
      - name: Cache cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            v2/target
          key: aa-harness-${{ runner.os }}-${{ hashFiles('v2/Cargo.lock') }}
      # 1. Build the pure-Rust scorer (no torch / no GPU → fast PR gate).
      - name: Build AA score runner
        run: cargo build -p wifi-densepose-train --bin aa_score_runner --no-default-features
      # 2. Determinism gate: the committed expected hash must still match. A
      # non-zero exit here fails the PR.
      - name: Run determinism gate
        run: cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features
      # 3. Repeatability analysis (witness chain): the harness must produce one
      # identical proof hash across many runs — any nondeterminism fails here.
      - name: Repeatability analysis (16 runs)
        run: cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --repeat 16
      # 4. Real-scoring smoke: score a sample prediction against the public smoke
      # split, exercising the actual model-scoring path (not just the fixture).
      - name: Real-scoring smoke test
        run: |
          cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- \
            --split ../aether-arena/fixtures/smoke_split.json \
            --pred ../aether-arena/fixtures/smoke_pred.json --json
      # 5. Witness ledger chain integrity: the append-only results ledger must
      # verify (every prev_hash link + row_hash intact = no silent edits).
      - name: Verify witness ledger chain
        working-directory: aether-arena/ledger
        run: python3 ledger_tools.py verify
      # 6. Emit the witness row + repeatability into the PR run summary.
      - name: Witness row → job summary
        if: always()
        run: |
          ROW=$(cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --json)
          REP=$(cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --repeat 16)
          {
            echo "## AetherArena harness gate (witness chain)"
            echo ""
            echo "Deterministic witness (ADR-149 §2.2 / proof + repeatability):"
            echo '```json'
            echo "$ROW"
            echo "$REP"
            echo '```'
            echo ""
            echo "If the determinism gate failed, the scoring maths changed: regenerate with"
            echo '`cargo run -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --generate-hash > aether-arena/fixtures/expected_score.sha256` and review the diff.'
          } >> "$GITHUB_STEP_SUMMARY"
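The determinism gate and the 16-run repeatability step in this workflow come down to the same check: hash the scorer output in a canonical form and require it to equal a committed hash on every run. A rough Python sketch of that idea follows. It does not reproduce `aa_score_runner`; `score_hash`, `determinism_gate`, and `toy_scorer` are hypothetical names, and the canonical JSON encoding is an assumption about how a stable hash could be derived.

```python
import hashlib
import json
from typing import Callable


def score_hash(scores: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the score record."""
    canonical = json.dumps(scores, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def determinism_gate(score_fn: Callable[[], dict], expected_hash: str, repeat: int = 1) -> bool:
    """Run the scorer `repeat` times; pass only if every run matches expected_hash."""
    hashes = {score_hash(score_fn()) for _ in range(repeat)}
    # Exactly one distinct hash, and it must be the committed one.
    return hashes == {expected_hash}


def toy_scorer() -> dict:
    """Stand-in for the real scorer (illustrative values only)."""
    return {"metric": "pck20", "value": 0.025}
```

With `repeat=16` this mirrors the repeatability step: a scorer whose output varies between runs produces more than one hash and fails, and a scorer whose maths changed produces a single hash that no longer equals the committed one.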
-199
@@ -1,199 +0,0 @@
name: Bench Regression Guard
# Sub-deliverable 8.3 of the benchmark/optimization milestone.
#
# HONEST SCOPE (read this before assuming this gates on timing):
# * The `bench-compile` job is a REAL, HARD-FAILING regression gate. It runs
# `cargo bench --no-default-features --no-run`, which type-checks and links
# EVERY criterion bench in the v2/ workspace without running a single
# measurement. Benches are not part of `cargo test`, so they silently
# bit-rot when a public API they call changes — this job catches that the
# moment it happens. This is the part of this workflow that can fail a PR.
#
# * The `bench-fast-run` job runs a small, curated subset of pure-CPU benches
# in criterion "quick mode" (short warm-up / measurement / 10 samples) and
# is INFORMATIONAL ONLY (`continue-on-error: true`). It does NOT gate on
# timing. Wall-clock timings on shared GitHub-hosted runners vary by
# 2-3x run-to-run (noisy neighbours, CPU throttling, no pinned frequency),
# so a hard ">X ms" threshold here would flake constantly and teach
# everyone to ignore it. We deliberately do not pretend to do timing
# regression-gating we cannot deliver reliably. The numbers are surfaced in
# the job log + uploaded as an artifact for humans to eyeball trends.
#
# WHY NO criterion --baseline COMPARE GATE:
# criterion's `--save-baseline` / `--baseline` compare is the textbook
# regression mechanism, but it only produces a trustworthy verdict when the
# baseline and the candidate were measured on the SAME hardware under the SAME
# conditions. GitHub-hosted runners give neither (the baseline commit and the
# PR commit land on different physical machines). Committing a baseline JSON
# measured on one runner and comparing a different runner against it would
# manufacture false regressions. If/when these benches run on a dedicated,
# frequency-pinned self-hosted runner, a `--baseline` compare with a generous
# (>2x) noise floor becomes honest and can be added then. Until then,
# compile-verify + informational-run is the honest gate.
on:
  push:
    branches: [ main, develop, 'feat/*' ]
    paths:
      - 'v2/crates/**/benches/**'
      - 'v2/crates/**/Cargo.toml'
      - 'v2/crates/**/src/**'
      - 'v2/Cargo.toml'
      - 'v2/Cargo.lock'
      - '.github/workflows/bench-regression.yml'
  pull_request:
    paths:
      - 'v2/crates/**/benches/**'
      - 'v2/crates/**/Cargo.toml'
      - 'v2/crates/**/src/**'
      - 'v2/Cargo.toml'
      - 'v2/Cargo.lock'
      - '.github/workflows/bench-regression.yml'
  workflow_dispatch:
permissions:
  contents: read
env:
  CARGO_TERM_COLOR: always
  # Debuginfo is useless in CI and the 38-crate workspace target dir otherwise
  # exhausts the runner disk (mirrors ci.yml's rust-tests job). The bench
  # profile inherits release + debug = true (v2/Cargo.toml [profile.bench]);
  # force it off so the link step does not run out of space.
  CARGO_PROFILE_BENCH_DEBUG: "0"
  CARGO_PROFILE_RELEASE_DEBUG: "0"
jobs:
  # ── HARD GATE: every bench must still compile + link ─────────────────────
  bench-compile:
    name: bench compile-verify (--no-run)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout (recursive — wifi-densepose-rufield path-deps vendor/rufield)
        uses: actions/checkout@v4
        with:
          # The workspace includes `wifi-densepose-rufield`, which path-deps the
          # `vendor/rufield` submodule crates. Without a recursive checkout the
          # whole workspace fails to resolve before any bench is built.
          submodules: recursive
      # The workspace pulls in `wifi-densepose-desktop` (Tauri v2) whose -sys
      # crates need the GTK/WebKit/serial dev libraries via pkg-config, exactly
      # as ci.yml's rust-tests job documents. A `--workspace` bench build links
      # the whole graph, so these are required here too.
      - name: Install Tauri / GTK / serial system dev libraries
        run: |
          sudo apt-get update
          sudo apt-get install -y --no-install-recommends \
            libglib2.0-dev \
            libgtk-3-dev \
            libsoup-3.0-dev \
            libjavascriptcoregtk-4.1-dev \
            libwebkit2gtk-4.1-dev \
            libayatana-appindicator3-dev \
            librsvg2-dev \
            libxdo-dev \
            libudev-dev \
            libdbus-1-dev \
            libssl-dev \
            pkg-config
      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
      - name: Cache cargo (Swatinem/rust-cache)
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: v2
          # Distinct cache scope from ci.yml's rust-tests so the bench profile
          # artifacts (release+opt) do not evict the test profile cache.
          key: bench-regression
      # The core regression guard. `--no-run` compiles + links every bench
      # target in the workspace's DEFAULT feature set but runs no measurement,
      # so it is deterministic and fast-ish (build only). A bench that no longer
      # compiles — because a type/signature it calls changed and nobody updated
      # the bench — fails the build here. `--no-default-features` is the
      # workspace's standard gate flag (openblas/tch/ort/onnx stay opt-out).
      - name: Compile all workspace benches (default features)
        working-directory: v2
        run: cargo bench --workspace --no-default-features --no-run
      # Feature-gated benches are skipped by the default build above because
      # their `[[bench]]` entries carry `required-features`. Compile the ones we
      # can guard so they are also covered against bit-rot.
      # * cir → wifi-densepose-signal/benches/cir_bench.rs (ADR-134). The
      # `cir` feature is pure-Rust (`cir = []`), so it builds on the stock
      # runner and is a real, hard-failing guard like the step above.
      #
      # NOT guarded here (honest scope):
      # * crv → wifi-densepose-ruvector/benches/crv_bench.rs. The `crv` feature
      # pulls the crates.io dependency `ruvector-crv 0.1.1`, which currently
      # FAILS to compile on stable (E0308 type mismatch in its own
      # `stage_iii.rs` — an UPSTREAM bug, unrelated to bench bit-rot).
      # Adding a hard `--features crv` compile step would make this workflow
      # red for a reason this gate is not meant to police. Re-add this step
      # once `ruvector-crv` ships a fixed release. (mqtt/onnx benches are
      # likewise left to their own crate workflows.)
      - name: Compile feature-gated benches (cir)
        working-directory: v2
        run: cargo bench -p wifi-densepose-signal --no-default-features --features cir --bench cir_bench --no-run
  # ── INFORMATIONAL: run a curated fast subset (never gates) ───────────────
  bench-fast-run:
    name: bench fast-run (informational, non-gating)
    runs-on: ubuntu-latest
    # NEVER fail the workflow on this job — timings are noise-prone on shared
    # runners (see header). It exists to surface trends for humans, not to gate.
    continue-on-error: true
    needs: [bench-compile]
    steps:
      - name: Checkout (recursive)
        uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
      - name: Cache cargo (Swatinem/rust-cache)
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: v2
          key: bench-regression
      # Curated subset = pure-CPU, fast, dependency-light criterion benches that
      # finish in seconds under quick-mode flags. Each is targeted by `--bench`
      # (NOT a bare `cargo bench -p`) because the crates' lib targets use the
      # libtest harness, which rejects criterion's CLI flags (--warm-up-time
      # etc.) and aborts the run. Quick-mode: 1s warm-up, 2s measure, 10 samples.
      - name: nvsim pipeline_throughput (quick)
        working-directory: v2
        run: |
          mkdir -p ../bench-out
          cargo bench -p nvsim --no-default-features --bench pipeline_throughput -- \
            --warm-up-time 1 --measurement-time 2 --sample-size 10 \
            | tee ../bench-out/nvsim_pipeline_throughput.txt
      - name: ruvector sketch_bench (quick)
        working-directory: v2
        run: |
          cargo bench -p wifi-densepose-ruvector --no-default-features --bench sketch_bench -- \
            --warm-up-time 1 --measurement-time 2 --sample-size 10 \
            | tee ../bench-out/ruvector_sketch_bench.txt
      - name: ruvector fusion_bench (quick)
        working-directory: v2
        run: |
          cargo bench -p wifi-densepose-ruvector --no-default-features --bench fusion_bench -- \
            --warm-up-time 1 --measurement-time 2 --sample-size 10 \
            | tee ../bench-out/ruvector_fusion_bench.txt
      - name: Upload informational bench logs
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: bench-fast-run-logs
          path: bench-out/
          if-no-files-found: warn
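The header of this workflow argues that a timing gate only becomes honest with a generous (>2x) noise floor on pinned hardware. The sketch below shows what such a ratio check could look like. It is illustrative only: `is_regression` and its parameters are hypothetical, nothing like it exists in the workflow, and it assumes both timings were measured on the same machine.

```python
def is_regression(baseline_ns: float, candidate_ns: float, noise_floor: float = 2.0) -> bool:
    """Flag a regression only when the slowdown ratio exceeds the noise floor.

    Assumes baseline_ns and candidate_ns come from the same hardware under
    the same conditions; on shared runners the ratio itself is noise.
    """
    if baseline_ns <= 0:
        raise ValueError("baseline_ns must be positive")
    return candidate_ns / baseline_ns > noise_floor
```

With the default floor, a 1.8x slowdown stays below the threshold and is not reported, which matches the 2-3x run-to-run variance the header cites as the reason a tighter bound would flake on shared runners.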
@@ -53,8 +53,6 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v4
with:
submodules: recursive
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable
-6
@@ -42,8 +42,6 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
- name: Determine deployment environment
id: determine-env
@@ -88,8 +86,6 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up kubectl
uses: azure/setup-kubectl@v3
@@ -136,8 +132,6 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up kubectl
uses: azure/setup-kubectl@v3
+18 -74
@@ -29,7 +29,6 @@ jobs:
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
fetch-depth: 0
- name: Set up Python
@@ -83,13 +82,6 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
# ADR-262 P1: `wifi-densepose-rufield` path-deps the `vendor/rufield`
# submodule. Without a recursive checkout the workspace build fails to
# resolve those path deps in CI even though it passes locally.
with:
submodules: recursive
# `wifi-densepose-desktop` is a Tauri v2 app — `glib-sys`, `gtk-sys`,
# `webkit2gtk-sys`, etc. need the Linux dev libraries via pkg-config or the
@@ -116,36 +108,23 @@ jobs:
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable
# Swatinem/rust-cache replaces a naive `actions/cache` of the whole
# `v2/target`. That manual cache of a 38-crate target dir (multi-GB) was an
# intermittent failure source — several CI runs this cycle died at the
# cache/setup step (after toolchain install, before "Run Rust tests"),
# needing a rerun. rust-cache is purpose-built for Rust: it caches the
# registry + git + a pruned target, evicts stale deps, and restores far more
# reliably (and faster) on large workspaces. `workspaces: v2` points it at
# the v2/ cargo workspace (keys on v2/Cargo.lock, caches v2/target).
- name: Cache cargo (Swatinem/rust-cache)
uses: Swatinem/rust-cache@v2
- name: Cache cargo
uses: actions/cache@v4
with:
workspaces: v2
path: |
~/.cargo/registry
~/.cargo/git
v2/target
key: ${{ runner.os }}-cargo-${{ hashFiles('v2/Cargo.lock') }}
restore-keys: |
${{ runner.os }}-cargo-
# The 38-crate workspace debug build exhausts the runner's disk when built
# with full debuginfo (observed: "final link failed: No space left on
# device" once the engine/benchmark crates landed; the same tree's local
# debug target measured 151 GB). Debuginfo is useless in CI — tests either
# pass or print their failure — so build without it; target shrinks ~5-10x.
- name: Run Rust tests
working-directory: v2
env:
CARGO_PROFILE_DEV_DEBUG: "0"
CARGO_PROFILE_TEST_DEBUG: "0"
run: cargo test --workspace --no-default-features
- name: Run ADR-147 worldmodel tests
working-directory: v2
env:
CARGO_PROFILE_DEV_DEBUG: "0"
CARGO_PROFILE_TEST_DEBUG: "0"
run: cargo test -p wifi-densepose-worldmodel --no-default-features
# ADR-134 CIR tests are behind the `cir` feature so the bench dependency
@@ -210,8 +189,6 @@ jobs:
- name: Checkout code
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Python ${{ matrix.python-version }}
continue-on-error: true
@@ -277,8 +254,6 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Python
uses: actions/setup-python@v6
@@ -290,45 +265,23 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest # the perf suite is pytest, not locust
pip install locust
# No "Start application" step: the gated test (test_frame_budget.py) drives
# the CSIProcessor pipeline in-process and makes no HTTP calls, so the old
# uvicorn server + `sleep 10` were dead weight — they only existed for the
# now-excluded api_throughput/inference_speed tests, and on every run dumped
# ~50 misleading "router requires hardware setup" ERROR lines for a server
# no test touched. MOCK_POSE_DATA is server-only and unused here.
- name: Run performance tests
- name: Start application
working-directory: archive/v1
run: |
# Gate only on the genuine, deterministic perf guard:
# test_frame_budget.py times the *real* CSIProcessor pipeline against
# the ADR 50 ms per-frame budget (single-frame, p95 over 100 frames,
# +Doppler) — a true regression signal.
#
# test_api_throughput.py / test_inference_speed.py are excluded: every
# test there is a TDD red-phase stub (suffix `_should_fail_initially`)
# that times a *mock that sleeps* — meaningless as a perf signal, with
# machine-dependent wall-clock asserts (e.g. `actual_rps >= 40`,
# `batch_time < individual_time`) that are inherently flaky on shared
# CI runners, plus a cross-class fixture-scope bug. Forcing them green
# would be manufacturing a false signal; they stay in-repo for local
# TDD but do not gate CI until the underlying features are implemented.
#
# `python -m pytest` (not the bare `pytest` script) puts the cwd
# (archive/v1) on sys.path so `from src.core...` resolves — the bare
# script omits cwd and raises ModuleNotFoundError: No module named 'src'.
# -o addopts="" drops the root pyproject's --cov/--cov-fail-under=100.
python -m pytest tests/performance/test_frame_budget.py \
-o addopts="" -v --junitxml=perf-junit.xml
uvicorn src.api.main:app --host 0.0.0.0 --port 8000 &
sleep 10
- name: Run performance tests
run: |
locust -f tests/performance/locustfile.py --headless --users 50 --spawn-rate 5 --run-time 60s --host http://localhost:8000
- name: Upload performance results
if: always()
uses: actions/upload-artifact@v4
with:
name: performance-results
path: archive/v1/perf-junit.xml
path: locust_report.html
# Docker Build and Test
# NOTE: the canonical Docker build for the sensing-server is now
@@ -347,8 +300,6 @@ jobs:
- name: Checkout code
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Docker Buildx
continue-on-error: true
@@ -416,13 +367,9 @@ jobs:
runs-on: ubuntu-latest
needs: [docker-build]
if: github.ref == 'refs/heads/main'
permissions:
contents: write # gh-pages deploy needs write (GITHUB_TOKEN is read-only by default -> 403)
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Python
uses: actions/setup-python@v6
@@ -437,8 +384,6 @@ jobs:
- name: Generate OpenAPI spec
working-directory: archive/v1
env:
MOCK_POSE_DATA: "true" # no CSI hardware in CI
run: |
python -c "
from src.api.main import app
@@ -449,7 +394,6 @@ jobs:
- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v4
continue-on-error: true # openapi generation above is the real validation; deploy is best-effort (Pages may be disabled)
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./docs
-2
@@ -35,8 +35,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Fetch /traffic/clones + /traffic/views from GitHub
env:
@@ -28,8 +28,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Setup Rust
uses: dtolnay/rust-toolchain@stable
@@ -80,8 +78,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Setup Rust
uses: dtolnay/rust-toolchain@stable
@@ -149,8 +145,6 @@ jobs:
vars.HAS_GCP_CREDENTIALS == 'true'
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Download x86_64 artifact
uses: actions/download-artifact@v4
-2
@@ -20,8 +20,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: dtolnay/rust-toolchain@stable
with: { targets: wasm32-unknown-unknown }
-2
@@ -26,8 +26,6 @@ jobs:
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
submodules: recursive
- name: Install Rust + wasm32 target
uses: dtolnay/rust-toolchain@stable
-6
@@ -28,8 +28,6 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v4
with:
submodules: recursive
- name: Setup Node.js
uses: actions/setup-node@v6
@@ -85,8 +83,6 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v4
with:
submodules: recursive
- name: Setup Node.js
uses: actions/setup-node@v6
@@ -135,8 +131,6 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v4
with:
submodules: recursive
- name: Download all artifacts
uses: actions/download-artifact@v4
-4
@@ -22,8 +22,6 @@ jobs:
if: github.ref_type == 'tag'
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Check firmware version.txt == tag
run: |
# Tag form: vX.Y.Z-esp32 → expect version.txt to contain X.Y.Z
@@ -73,8 +71,6 @@ jobs:
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Build firmware (${{ matrix.variant }})
working-directory: firmware/esp32-csi-node
-8
@@ -100,8 +100,6 @@ jobs:
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Download QEMU artifact
uses: actions/download-artifact@v4
@@ -216,8 +214,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Install clang
run: |
@@ -267,8 +263,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Install NVS generator
run: pip install esp-idf-nvs-partition-gen
@@ -323,8 +317,6 @@ jobs:
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Download QEMU artifact
uses: actions/download-artifact@v4
@@ -22,8 +22,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: actions/setup-python@v6
with:
-2
@@ -41,8 +41,6 @@ jobs:
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Install mosquitto + clients and start with allow_anonymous
run: |
@@ -26,8 +26,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: docker/setup-buildx-action@v3
-6
@@ -76,8 +76,6 @@ jobs:
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
# Linux aarch64 needs QEMU for cross-build on x86_64 runners.
- name: Set up QEMU
@@ -123,8 +121,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Install maturin
run: pip install maturin>=1.7
- name: Build sdist
@@ -148,8 +144,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: actions/setup-python@v5
with:
python-version: '3.12'
-2
@@ -29,8 +29,6 @@ jobs:
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
submodules: recursive
- name: Stage viewer for Pages
run: |
-14
@@ -40,8 +40,6 @@ jobs:
- { label: 'full+train', flags: '--features full,train' }
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: dtolnay/rust-toolchain@stable
- name: Cache cargo
uses: actions/cache@v4
@@ -62,16 +60,8 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
# v2/rust-toolchain.toml pins channel "1.89" with profile "minimal" (no
# clippy). dtolnay@stable installs clippy on the floating "stable"
# toolchain, but the override makes cargo use the separate "1.89"
# toolchain — so `cargo clippy` errors "cargo-clippy is not installed for
# 1.89". Install clippy on the pinned toolchain that cargo actually uses.
- uses: dtolnay/rust-toolchain@stable
with:
toolchain: "1.89"
components: clippy
- name: Cache cargo
uses: actions/cache@v4
@@ -97,8 +87,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: dtolnay/rust-toolchain@stable
- name: Cache cargo
uses: actions/cache@v4
@@ -133,8 +121,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: publish = false is present (no accidental crates.io publish)
run: |
CARGO=v2/crates/ruview-swarm/Cargo.toml
+16 -29
@@ -28,7 +28,6 @@ jobs:
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
fetch-depth: 0
- name: Set up Python
@@ -47,10 +46,7 @@ jobs:
- name: Run Bandit security scan
run: |
# The Python codebase lives under archive/v1/src (it moved there when
# the runtime was rewritten in Rust). Scanning `src/` matched nothing,
# so this SAST step was a silent no-op.
bandit -r archive/v1/src/ -f sarif -o bandit-results.sarif
bandit -r src/ -f sarif -o bandit-results.sarif
continue-on-error: true
- name: Upload Bandit results to GitHub Security
@@ -61,20 +57,22 @@ jobs:
sarif_file: bandit-results.sarif
category: bandit
# Removed the deprecated `returntocorp/semgrep-action@v1` step: it was
# redundant (the pip `semgrep --sarif` below is what feeds GitHub Security;
# the action only pushed to the Semgrep cloud app via SEMGREP_APP_TOKEN) and
# it pulled `returntocorp/semgrep-agent:v1` from Docker Hub on every run,
# which intermittently timed out and turned this check red. The pip semgrep
# (installed above) needs no Docker pull. The action's `p/docker` +
# `p/kubernetes` rulesets are folded into the command below so coverage is
# preserved.
- name: Run Semgrep + generate SARIF
- name: Run Semgrep security scan
continue-on-error: true
uses: returntocorp/semgrep-action@v1
with:
config: >-
p/security-audit
p/secrets
p/python
p/docker
p/kubernetes
env:
SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
- name: Generate Semgrep SARIF
run: |
semgrep \
--config=p/security-audit --config=p/secrets --config=p/python \
--config=p/docker --config=p/kubernetes \
--sarif --output=semgrep.sarif archive/v1/src/
semgrep --config=p/security-audit --config=p/secrets --config=p/python --sarif --output=semgrep.sarif src/
continue-on-error: true
- name: Upload Semgrep results to GitHub Security
@@ -98,8 +96,6 @@ jobs:
- name: Checkout code
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Python
continue-on-error: true
@@ -167,8 +163,6 @@ jobs:
- name: Checkout code
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Docker Buildx
continue-on-error: true
@@ -250,8 +244,6 @@ jobs:
- name: Checkout code
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
- name: Run Checkov IaC scan
continue-on-error: true
@@ -314,7 +306,6 @@ jobs:
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
fetch-depth: 0
- name: Run TruffleHog secret scan
@@ -349,8 +340,6 @@ jobs:
- name: Checkout code
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Python
continue-on-error: true
@@ -388,8 +377,6 @@ jobs:
- name: Checkout code
continue-on-error: true
uses: actions/checkout@v4
with:
submodules: recursive
- name: Check security policy files
continue-on-error: true
-2
@@ -30,8 +30,6 @@ jobs:
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
submodules: recursive
- name: Stage demos for Pages
run: |
-4
@@ -7,7 +7,6 @@ on:
- 'archive/v1/src/core/**'
- 'archive/v1/src/hardware/**'
- 'archive/v1/data/proof/**'
- 'archive/v1/requirements-lock.txt'
- '.github/workflows/verify-pipeline.yml'
pull_request:
branches: [ main, master ]
@@ -15,7 +14,6 @@ on:
- 'archive/v1/src/core/**'
- 'archive/v1/src/hardware/**'
- 'archive/v1/data/proof/**'
- 'archive/v1/requirements-lock.txt'
- '.github/workflows/verify-pipeline.yml'
workflow_dispatch:
@@ -30,8 +28,6 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v6
-16
@@ -16,15 +16,6 @@ firmware/esp32-csi-node/sdkconfig.defaults.bak
# ESP-IDF set-target backup (local only)
firmware/esp32-hello-world/sdkconfig.old
# Host-built firmware test binaries (compiled from test/*.c, not source)
firmware/esp32-csi-node/test/test_adr110
firmware/esp32-csi-node/test/test_vitals
firmware/esp32-csi-node/test/fuzz_serialize
firmware/esp32-csi-node/test/fuzz_edge
firmware/esp32-csi-node/test/fuzz_nvs
firmware/esp32-csi-node/test/*.exe
firmware/esp32-csi-node/test/*.obj
# Claude Flow swarm runtime state
.swarm/
@@ -270,10 +261,3 @@ v2/crates/rvcsi-node/*.node
v2/crates/rvcsi-node/binding.js
v2/crates/rvcsi-node/binding.d.ts
v2/crates/rvcsi-node/npm/
# AetherArena private optimization staging — never published until reviewed
aether-arena/staging/
# MM-Fi benchmark dataset archives — large data, fetch separately, never commit
assets/MM-Fi/E0*.zip
assets/MM-Fi/*.zip
-7
@@ -14,10 +14,3 @@
path = vendor/rvcsi
url = https://github.com/ruvnet/rvcsi
branch = main
[submodule "v2/crates/ruv-neural"]
path = v2/crates/ruv-neural
url = https://github.com/ruvnet/ruv-neural.git
branch = main
[submodule "vendor/rufield"]
path = vendor/rufield
url = https://github.com/ruvnet/rufield
+2 -141
File diff suppressed because one or more lines are too long
+3 -8
@@ -10,20 +10,17 @@ Dual codebase: Python v1 (`v1/`) and Rust port (`v2/`).
| `wifi-densepose-core` | Core types, traits, error types, CSI frame primitives |
| `wifi-densepose-signal` | SOTA signal processing + RuvSense multistatic sensing (16 modules) |
| `wifi-densepose-nn` | Neural network inference (ONNX, PyTorch, Candle backends) |
| `wifi-densepose-train` | Training pipeline with ruvector integration + ruview_metrics; MAE pretraining recipe (`mae.rs`, ADR-152 §2.3) + WiFlow-STD port (`wiflow_std/`, tch-gated) |
| `wifi-densepose-train` | Training pipeline with ruvector integration + ruview_metrics |
| `wifi-densepose-mat` | Mass Casualty Assessment Tool — disaster survivor detection |
| `wifi-densepose-hardware` | ESP32 aggregator, TDM protocol, channel hopping firmware; `ieee80211bf/` 802.11bf forward-compat protocol model (ADR-153) |
| `wifi-densepose-hardware` | ESP32 aggregator, TDM protocol, channel hopping firmware |
| `wifi-densepose-ruvector` | RuVector v2.0.4 integration + cross-viewpoint fusion (5 modules) |
| `wifi-densepose-wasm` | WebAssembly bindings for browser deployment |
| `wifi-densepose-cli` | CLI tool (`wifi-densepose` binary): `calibrate`/`calibrate-serve`/`enroll`/`train-room`/`room-watch` + MAT (MAT gated behind the `mat` feature; build `--no-default-features` for the aarch64/appliance calibration binary) |
| `wifi-densepose-calibration` | ADR-151 per-room calibration & specialist training — `baseline → enroll → extract → train` → bank of small specialists (presence/posture/breathing/heartbeat/restlessness/anomaly) + multistatic fusion; pure Rust, edge-deployable |
| `wifi-densepose-cli` | CLI tool (`wifi-densepose` binary) |
| `wifi-densepose-sensing-server` | Lightweight Axum server for WiFi sensing UI |
| `wifi-densepose-wifiscan` | Multi-BSSID WiFi scanning (ADR-022) |
| `wifi-densepose-vitals` | ESP32 CSI-grade vital sign extraction (ADR-021) |
| `nvsim` | Deterministic NV-diamond magnetometer pipeline simulator (ADR-089) — standalone leaf, WASM-ready |
| `vendor/rvcsi` (submodule) | **rvCSI** — edge RF sensing runtime (ADR-095/096): 9 crates (`rvcsi-core`/`-dsp`/`-events`/`-adapter-file`/`-adapter-nexmon`/`-ruvector`/`-runtime`/`-node`/`-cli`). Lives in its own repo ([github.com/ruvnet/rvcsi](https://github.com/ruvnet/rvcsi)), vendored here under `vendor/rvcsi`, published to crates.io as `rvcsi-* 0.3.x` and to npm as `@ruv/rvcsi`. Not a `v2/` workspace member — depend on the published crates (or the submodule's `crates/rvcsi-*` paths). Normalized `CsiFrame`/`CsiWindow`/`CsiEvent` schema, validate-before-FFI, reusable DSP, typed confidence-scored events, the napi-c Nexmon shim (real nexmon_csi `.pcap` from a Raspberry Pi 5 / 4 / 3B+ — BCM43455c0), the napi-rs SDK, the `rvcsi` CLI, a Claude Code plugin. |
| `vendor/rufield` (submodule) | **RuField MFS** — the open spec for camera-free multimodal field sensing (ADR-260). A common `FieldEvent`/`FieldTensor`/`FusionGraph`/`PrivacyClass`/`ProvenanceReceipt` model *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and quantum sensors. Lives in its own repo ([github.com/ruvnet/rufield](https://github.com/ruvnet/rufield)), vendored here under `vendor/rufield`. Not a `v2/` workspace member. v0.1 reference stack = 7 crates (`rufield-core`/`-provenance`/`-privacy`/`-adapters`/`-fusion`/`-bench`/`-viewer`), 72 tests/0 failed; `rufield-viewer` is an Axum + vanilla-JS read-only dashboard (`cargo run -p rufield-viewer`) completing ADR-260 §27.9. The WiFi-CSI modality is now **real-replay-backed** via `CsiReplayAdapter` (ingests real captured `.csi.jsonl` → fused presence/breathing inferences; replay-from-file, unlabeled CSI-variance proxy, not validated accuracy); mmWave/thermal + all synthetic-bench F1 numbers remain **SYNTHETIC** (no live hardware — live streaming + labeled accuracy are roadmap). |
| `wifi-densepose-rufield` | ADR-262 P1 **anti-corruption bridge** — converts RuView WiFi-CSI sensing output (`SensingSnapshot` mirroring `SensingUpdate` + `TrustedOutput`, owned primitives, no dep on `wifi-densepose-sensing-server`) into **signed RuField `FieldEvent`s** (`Modality::WifiCsi`, real `timestamp_ns`, sha256 + ed25519 provenance, `synthetic=false`). The single coupling point between RuView and the standalone RuField MFS spec (§5.4); path-deps the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion`). **Critical §3.3 privacy mapping** (`map_privacy`): maps RuView class → RuField P0–P5 by **information content, never byte value**, fail-closed (`Derived → P4/P5`, never P1; `demoted` floors to ≥ P2). 15 tests / 0 failed (round-trip / `is_fusable` / fusion-ingest / privacy-safety / determinism). P1 plumbing — not wired into the live server (P3), no accuracy claim. |
| `ruview-swarm` | Drone swarm control system (ADR-148) — hierarchical-mesh topology, Raft consensus, MARL, CSI sensing payload, MAVLink/PX4 compat, Ruflo AI-agent integration |
### RuvSense Modules (`signal/src/ruvsense/`)
@@ -75,8 +72,6 @@ All 5 ruvector crates integrated in workspace:
- ADR-031: RuView sensing-first RF mode (Proposed)
- ADR-032: Multistatic mesh security hardening (Proposed)
- ADR-148: Drone swarm control system / `ruview-swarm` (In Progress)
- ADR-152: WiFi-Pose SOTA 2026 intake — geometry conditioning, WiFlow-STD benchmark (measurement (a) complete: claims MEASURED-EQUIVALENT at ~96% PCK@20), MAE recipe (Proposed; §2.1–2.3, 2.6 implemented)
- ADR-153: IEEE 802.11bf-2025 forward-compatibility protocol model (Accepted — amends ADR-152 §2.4)
### Supported Hardware
-78
@@ -1,78 +0,0 @@
# PROOF — reproduce every claim, or find the one we can't yet
This project (RuView / wifi-densepose) has been publicly called "AI slop" and
"fake." This document is the answer: **a skeptic can clone the repo, run one
script, and have every headline claim either verified on their own machine or
shown — explicitly — as "CLAIMED, not yet reproduced (here's exactly what it
needs)."** Nothing below is asserted without a command you can run.
```bash
git clone https://github.com/ruvnet/RuView && cd RuView
bash scripts/prove.sh # core gate + the anti-slop assertion tests
bash scripts/prove.sh --full # also attempt the feature-gated subset
```
`prove.sh` exits 0 only if every **non-gated** claim passes. Gated claims never
fail the run; they print the prerequisite (a GPU, a dataset, real hardware, a
trained checkpoint) so you can reproduce them yourself.
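The exit rule described above can be sketched in Python. This is only an illustration of the stated behaviour, not the contents of `scripts/prove.sh`; the `Claim` type, its fields, and `exit_code` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    """Hypothetical record for one claim checked by a prove-style script."""
    name: str
    gated: bool               # True if it needs a GPU, dataset, hardware, or checkpoint
    passed: Optional[bool]    # None when the claim was not attempted
    prerequisite: str = ""    # what a gated claim needs in order to be reproduced


def exit_code(claims: List[Claim]) -> int:
    """Return 0 only if every non-gated claim passed; gated claims never fail the run."""
    code = 0
    for claim in claims:
        if claim.gated:
            # A gated claim that did not pass only reports what it needs.
            if claim.passed is not True:
                print(f"GATED {claim.name}: needs {claim.prerequisite}")
            continue
        if not claim.passed:
            print(f"FAIL  {claim.name}")
            code = 1
    return code
```

Under this sketch a failing or skipped gated claim only prints its prerequisite, while a single failing non-gated claim is enough to make the run exit non-zero.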
## Grading
- **MEASURED** — reproduced on our hardware, with the exact command recorded, and
pinned by a test that *fails on the pre-fix code*. `prove.sh` re-runs these.
- **CLAIMED** — cited from a source, or measured by the source, but not
reproduced in this repo's automated harness.
- **DATA-GATED / HARDWARE-GATED** — the *code path* is real and tested, but the
*accuracy/throughput claim* needs data or hardware we don't ship. We never
fabricate the number; the code carries a typed error or a `weights_trained`/
provenance flag instead.
## The hard gate (run on any machine with Rust + Python)
| Claim | Grade | Reproduce |
|---|---|---|
| Rust workspace: 3,128 tests, 0 failed | **MEASURED** | `cd v2 && cargo test --workspace --no-default-features` |
| Deterministic CSI pipeline proof (bit-exact SHA-256) | **MEASURED** | `python archive/v1/data/proof/verify.py` → `VERDICT: PASS` |
## Anti-slop assertion tests (each fails on the pre-fix code)
| Claim | Grade | Test (run via `cargo test -p <crate> <name>`) |
|---|---|---|
| Fusion crafted-input DoS panics are closed (ADR-156 §2.2) | **MEASURED** | `wifi-densepose-ruvector :: triangulation_out_of_range_index_returns_none_no_panic` |
| **The "Soul Signature" identity claim, honestly bounded:** on WiFi-only cardiac+respiratory channels two people are **not separable** (gap ≈ 0.0005) | **MEASURED** | `wifi-densepose-bfld :: cardiac_alone_cannot_separate_identity_matches_audit` |
| OccWorld `predict()` is real (input-dependent), not random noise | **MEASURED** | `wifi-densepose-occworld-candle :: predict_is_deterministic_for_same_input` |
| Pose runtime emits frames under its own default config (ADR-159 A1) | **MEASURED** | `cog-pose-estimation :: default_config_emits_frames_with_real_model` |
| Person-count flags untrained classes — no count inflation (ADR-159 A2) | **MEASURED** | `cog-person-count :: untrained_class_argmax_is_flagged_low_confidence` |
| Medical edge skills carry a "not a medical device" disclaimer (ADR-160 A1) | **MEASURED** | `wifi-densepose-wasm-edge :: a1_med_modules_have_clinical_disclaimer` (`--features std`) |
| Survivor dedup 3→1, count-inflation killed (ADR-158 §2) | **MEASURED** | `wifi-densepose-mat :: test_identical_vitals_no_location_dedup_to_one` (`--features mat`) |
## Measured performance (criterion; reproduce on your machine)
| Claim | Grade | Reproduce |
|---|---|---|
| PSD FFT-planner cache 2.0–3.1×, DTW band 2.4–4.1× (ADR-154) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-signal` |
| fuse() double-clone removed ~2.17× marshalling (ADR-156) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-ruvector --bench fusion_bench` |
| zero-copy ORT input ~1.48× (ADR-155) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-nn --features onnx --bench onnx_bench` |
| pointcloud splats 9→2 passes ~1.24× (ADR-160 research) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-pointcloud --bench splats_bench` |
| native wlanapi multi-BSSID scan 9.74 Hz (vs netsh ~2 Hz) | **MEASURED (Windows)** | `cd v2 && cargo test -p wifi-densepose-wifiscan -- --ignored measure_native_scan_rate` |
| wasm-edge `process_frame` hot-path latency (host proxy, ADR-163) | **MEASURED-on-host** (NOT the ESP32/WASM3 budget — needs hardware) | `cd v2/crates/wifi-densepose-wasm-edge && cargo bench --features std` |
| cog steady-state CPU infer latency ~305 µs (ADR-163; NOT the manifest cold-start) | **MEASURED-on-host** | `cd v2 && cargo bench -p cog-person-count -p cog-pose-estimation --no-default-features --bench infer_bench` |
## What we do NOT claim (the honest negatives — the strongest anti-slop signal)
| Capability | Status |
|---|---|
| **Named person-identity from WiFi** | **NOT achieved, and measured why.** The §3.6 matcher is real, but identity does not lock on WiFi-only channels (gap 0.0005). DATA-GATED on a real enrollment feeding the AETHER/body-resonance channel — never done. No named-identity claim is made. |
| WiFlow-STD ~96% PCK@20 | **CLAIMED-reproduced** on our RTX 5080 (`benchmarks/wiflow-std/RESULTS.md`); HARDWARE-GATED for you (needs an NVIDIA GPU + the MM-Fi dataset). The upstream *shipped checkpoint* was **REFUTED** (0.08% PCK) — we publish that. |
| OccWorld trajectory accuracy | DATA-GATED on a trained checkpoint; `predict()` carries `weights_trained=false` until one is loaded — never silently faked. |
| Edge-skill detection accuracy (seizure, weapon, affect, …) | UNVALIDATED — every such module is now disclaimer-gated as experimental/research; the DSP is real, the accuracy is not claimed. |
| 802.11bf-2025 OTA conformance | No commodity silicon ships a conformant interface as of 2026; ours is a simulation-tested forward-compat protocol model, not a certified implementation. |
## Provenance
Every claim above traces to a committed ADR (`docs/adr/ADR-154`–`ADR-163`), a
test, a criterion bench, `benchmarks/wiflow-std/RESULTS.md`, or
`benchmarks/edge-latency/RESULTS.md`. The history
includes published **retractions** (the 92.9% PCK retraction; the WiFlow-STD
shipped-checkpoint refutation; the NV-diamond BOM reality check) — a faker hides
failures; we commit them.
+8 -28
@@ -36,7 +36,7 @@ Built on [RuVector](https://github.com/ruvnet/ruvector/) and [Cognitum Seed](htt
The system learns each environment locally using spiking neural networks that adapt in under 30 seconds, with multi-frequency mesh scanning across 6 WiFi channels that uses your neighbors' routers as free radar illuminators. Every measurement is cryptographically attested via an Ed25519 witness chain.
RuView turns ordinary WiFi into a contactless sensor. A $9 ESP32 board reads the radio reflections off the people in a room, and a small pretrained model — published on Hugging Face at [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained) — tells you who's there, how they're breathing, and how their heart rate is trending. The model fits in 8 KB (4-bit quantized) and runs in microseconds on a Raspberry Pi. (The [v2 encoder](https://huggingface.co/ruvnet/wifi-densepose-pretrained) reports an honest, label-free held-out **temporal-triplet accuracy of 82.3%** — up from 66.4% raw; the older "100% presence" figure was measured on a single-class recording and has been retracted in favor of this.) No cameras, no wearables, no app on the user's phone.
RuView turns ordinary WiFi into a contactless sensor. A $9 ESP32 board reads the radio reflections off the people in a room, and a small pretrained model — published on Hugging Face at [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained) — tells you who's there, how they're breathing, and how their heart rate is trending. The model fits in 8 KB (4-bit quantized), runs in microseconds on a Raspberry Pi, and reports 100% presence accuracy on the validation set. No cameras, no wearables, no app on the user's phone.
### Built for low-power edge applications
@@ -56,9 +56,9 @@ RuView turns ordinary WiFi into a contactless sensor. A $9 ESP32 board reads the
> |------|-----|---------------|
> | 🫁 **Breathing rate** | Bandpass 0.1–0.5 Hz on wrapped phase, circular variance, zero-crossing BPM ([#593](https://github.com/ruvnet/RuView/issues/593)) | 6–30 BPM, real-time |
> | 💓 **Heart rate** | Bandpass 0.8–2.0 Hz, zero-crossing BPM | 40–120 BPM, real-time |
> | 👤 **Presence detection** | Trained head on Hugging Face ([`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained); v2 encoder = 82.3% held-out temporal-triplet acc, honestly re-benchmarked) + a phase-variance fallback that needs no model | < 1 ms, ~30 s ambient calibration |
> | 👤 **Presence detection** | Trained head on Hugging Face ([`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained), 100% validation accuracy) + a phase-variance fallback that needs no model | < 1 ms, ~30 s ambient calibration |
> | 🧬 **CSI embeddings** | 128-dim contrastive encoder shipped on Hugging Face, 4-bit quantised variant fits in 8 KB | **164,183 emb/s** on M4 Pro |
> | 🦴 **17-keypoint pose estimation** | `cog-pose-estimation` Cog v0.0.1 — signed aarch64 + x86_64 binaries on GCS, loads `pose_v1.safetensors` via Candle. Train your own from paired data in 2.1 s on an RTX 5080 ([ADR-101](docs/adr/ADR-101-pose-estimation-cog.md), [benchmarks](docs/benchmarks/pose-estimation-cog.md)). **SOTA on MM-Fi:** [`ruvnet/wifi-densepose-mmfi-pose`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose) hits **82.69% torso-PCK@20** (ensemble 83.59%), beating MultiFormer (72.25%) and CSI2Pose (68.41%) on the matched MM-Fi `random_split` protocol — self-corrected and auditable on [AetherArena](https://huggingface.co/spaces/ruvnet/aether-arena) | 8.4 ms cold-start on a Pi 5 |
> | 🦴 **17-keypoint pose estimation** | `cog-pose-estimation` Cog v0.0.1 — signed aarch64 + x86_64 binaries on GCS, loads `pose_v1.safetensors` via Candle. Train your own from paired data in 2.1 s on an RTX 5080 ([ADR-101](docs/adr/ADR-101-pose-estimation-cog.md), [benchmarks](docs/benchmarks/pose-estimation-cog.md)) | 8.4 ms cold-start on a Pi 5 |
> | 🚶 **Motion / activity** | Motion-band power + phase acceleration | Real-time |
> | 🤸 **Fall detection** | Phase-acceleration threshold + 3-frame debounce + 5 s cooldown ([#263](https://github.com/ruvnet/RuView/issues/263)) | < 200 ms |
> | 🧮 **Multi-person count** | Adaptive P95 normalisation + runtime-tunable dedup factor (`/api/v1/config/dedup-factor`, [#491](https://github.com/ruvnet/RuView/pull/491)). Six specialised learned counters available as Cogs: `occupancy-zones`, `elevator-count`, `queue-length`, `customer-flow`, `clean-room`, `person-matching` | Real-time, self-calibrating |
@@ -162,7 +162,7 @@ pip install "ruview[client]" # or: pip install "wifi-densepose[clie
## 🤗 Pretrained model on Hugging Face
Pretrained CSI weights live at [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained) — 12.2M training steps on 60K frames / 610K contrastive triplets, **82.3% held-out temporal-triplet accuracy** (up from 66.4% raw; the older "100% presence" figure was measured on a single-class recording and has been retracted), 4-bit quantized variant fits in 8 KB. The release includes a contrastive **CSI encoder** producing 128-dim embeddings (164,183 emb/s on M4 Pro) and a **presence-detection head**. Per-node LoRA adapters are included for environment-specific fine-tuning.
Pretrained CSI weights live at [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained) — 12.2M training steps on 60K frames / 610K contrastive triplets, **100% presence accuracy** on the validation set, 4-bit quantized variant fits in 8 KB. The release includes a contrastive **CSI encoder** producing 128-dim embeddings (164,183 emb/s on M4 Pro) and a **presence-detection head**. Per-node LoRA adapters are included for environment-specific fine-tuning.
```bash
# Download the model bundle
@@ -182,27 +182,7 @@ huggingface-cli download ruvnet/wifi-densepose-pretrained --local-dir models/wif
**Quantization choices** (all in the HF repo): `model-q2.bin` (4 KB) · `model-q4.bin` ⭐ recommended (8 KB) · `model-q8.bin` (16 KB) · `model.safetensors` full (48 KB)
The separate **17-keypoint pose-estimation model** is now published at [`ruvnet/wifi-densepose-mmfi-pose`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose) — **82.69% torso-PCK@20** on MM-Fi (single model) / **83.59%** (3-model ensemble + TTA), beating the prior published SOTA MultiFormer (72.25%) and CSI2Pose (68.41%) on the matched `random_split` protocol. See **Results & proof** below.
### Results & proof
| What | Where | Numbers |
|------|-------|---------|
| **MM-Fi pose model (SOTA)** | [`ruvnet/wifi-densepose-mmfi-pose`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose) | 82.69% torso-PCK@20 (single) · 83.59% (ensemble+TTA) · 75K-param micro variant 74.30% |
| **AetherArena benchmark Space** | [`ruvnet/aether-arena`](https://huggingface.co/spaces/ruvnet/aether-arena) | self-correcting, auditable MM-Fi leaderboard |
| **Full MM-Fi study (honest picture)** | [`docs/benchmarks/mmfi-wifi-sensing-study.md`](docs/benchmarks/mmfi-wifi-sensing-study.md) | pose + action; zero-shot cross-subject ~64%, +~30 s in-room calibration → 72.2% |
| **Efficiency frontier** | [`docs/benchmarks/wifi-pose-efficiency-frontier.md`](docs/benchmarks/wifi-pose-efficiency-frontier.md) | SOTA-beating WiFi pose in a 20 KB int4 edge model |
| **Pretrained encoder** | [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained) | 82.3% held-out temporal-triplet, 8 KB int4 |
| **Reproducible proof (Trust Kill Switch)** | [`archive/v1/data/proof/verify.py`](archive/v1/data/proof/verify.py) + [`expected_features.sha256`](archive/v1/data/proof/expected_features.sha256) | one-command deterministic pipeline replay (SHA-256 of output vs published hash) |
| **Benchmark-proof ADR** | [ADR-168](docs/adr/ADR-168-benchmark-proof.md) | how the numbers are produced and verified |
| **Witness attestation** | [`docs/WITNESS-LOG-028.md`](docs/WITNESS-LOG-028.md) | 33-row capability attestation matrix with per-claim evidence |
```bash
# Reproduce the deterministic pipeline proof yourself (must print VERDICT: PASS):
python archive/v1/data/proof/verify.py
```
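The idea behind the deterministic proof (hash the pipeline output and compare it with the published hash) can be sketched as follows. This is not the code of `verify.py`; `feature_digest` and `verdict` are hypothetical names, the `VERDICT: FAIL` string is assumed by analogy with `VERDICT: PASS`, and how the real script serializes features to bytes is not shown here.

```python
import hashlib


def feature_digest(features: bytes) -> str:
    """SHA-256 hex digest of the serialized pipeline output."""
    return hashlib.sha256(features).hexdigest()


def verdict(features: bytes, expected_hex: str) -> str:
    """Compare the digest of the output with a published hex digest."""
    actual = feature_digest(features)
    # Normalise whitespace and case so a trailing newline in the hash file is harmless.
    if actual == expected_hex.strip().lower():
        return "VERDICT: PASS"
    return "VERDICT: FAIL"
```

Because the comparison is on a cryptographic digest of the bytes, a change of a single byte in the output flips the verdict, which is why the replay has to be bit-exact.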
Tracked in [#509](https://github.com/ruvnet/RuView/issues/509); see [ADR-079](docs/adr/ADR-079-camera-supervised-pose-finetune.md) phases P7–P9 for the camera-supervised fine-tune path.
The separate **17-keypoint pose-estimation model** is not in this release — pipeline is implemented but keypoint weights are still pending. Tracked in [#509](https://github.com/ruvnet/RuView/issues/509); see [ADR-079](docs/adr/ADR-079-camera-supervised-pose-finetune.md) phases P7–P9.
## 🧩 Edge Module Catalog
@@ -501,7 +481,7 @@ Every WiFi signal that passes through a room creates a unique fingerprint of tha
**What it does in plain terms:**
- Turns any WiFi signal into a 128-number "fingerprint" that uniquely describes what's happening in a room
- Learns entirely on its own from raw WiFi data — no cameras, no labeling, no human supervision needed
- Recognizes rooms, detects intruders, and classifies activities using only WiFi (named person-identity is an experimental, data-gated research capability — see below, not a shipped feature)
- Recognizes rooms, detects intruders, identifies people, and classifies activities using only WiFi
- Runs on an $8 ESP32 chip (the entire model fits in 55 KB of memory)
- Produces both body pose tracking AND environment fingerprints in a single computation
@@ -512,7 +492,7 @@ Every WiFi signal that passes through a room creates a unique fingerprint of tha
| **Self-supervised learning** | The model watches WiFi signals and teaches itself what "similar" and "different" look like, without any human-labeled data | Deploy anywhere — just plug in a WiFi sensor and wait 10 minutes |
| **Room identification** | Each room produces a distinct WiFi fingerprint pattern | Know which room someone is in without GPS or beacons |
| **Anomaly detection** | An unexpected person or event creates a fingerprint that doesn't match anything seen before | Automatic intrusion and fall detection as a free byproduct |
| **Person re-identification** *(experimental, research)* | A real per-channel similarity matcher (Soul Signature §3.6, `wifi-densepose-bfld`); **measured** result: on WiFi-only cardiac+respiratory channels alone two people are *not* separable (gap ~0.0005) | Honest research capability — **named identity is not claimed** and is data-gated on enrollment with the decisive AETHER/body-resonance channel. See [#1021](https://github.com/ruvnet/RuView/issues/1021) |
| **Person re-identification** | Each person disturbs WiFi in a slightly different way, creating a personal signature | Track individuals across sessions without cameras |
| **Environment adaptation** | MicroLoRA adapters (1,792 parameters per room) fine-tune the model for each new space | Adapts to a new room with minimal data — 93% less than retraining from scratch |
| **Memory preservation** | EWC++ regularization remembers what was learned during pretraining | Switching to a new task doesn't erase prior knowledge |
| **Hard-negative mining** | Training focuses on the most confusing examples to learn faster | Better accuracy with the same amount of training data |
@@ -610,7 +590,7 @@ Verify the plugin structure: `bash plugins/ruview/scripts/smoke.sh`. Full detail
| [User Guide](docs/user-guide.md) | Step-by-step guide: installation, first run, API usage, hardware setup, training |
| [Build Guide](docs/build-guide.md) | Building from source (Rust and Python) |
| [**Home Assistant + Matter Integration**](docs/integrations/home-assistant.md) | **Works with Home Assistant** via MQTT auto-discovery + **Works with Matter** (Apple Home / Google Home / Alexa / SmartThings) — full entity catalog, 3 starter blueprints, Lovelace dashboards, privacy mode, threshold tuning ([ADR-115](docs/adr/ADR-115-home-assistant-integration.md)). |
| [**BFLD — Beamforming Feedback Layer for Detection**](v2/crates/wifi-densepose-bfld/README.md) | New privacy-gated WiFi sensing layer that measures + structurally prevents identity leakage from 802.11ac/ax Beamforming Feedback Information. Three type-enforced invariants (raw BFI never exits node, identity embedding is in-RAM-only, cross-site correlation cryptographically impossible via per-site BLAKE3 keyed hash + daily rotation). Ships full operator surface (`BfldPipeline`, `BfldPipelineHandle`, the Soul Signature §3.6 per-channel matcher `EnrolledMatcher`/`SoulMatchOracle` — experimental; named identity is data-gated, **measured** as not-separable on WiFi-only channels alone), MQTT topic router + HA-DISCO + availability + LWT, 3 operator HA blueprints, two runnable examples, eclipse-mosquitto:2 CI service container. 327+ tests. [ADR-118](docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md) umbrella + sub-ADRs [119](docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md)/[120](docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md)/[121](docs/adr/ADR-121-bfld-identity-risk-scoring.md)/[122](docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md)/[123](docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md). Research dossier: [`docs/research/BFLD/`](docs/research/BFLD/) (11 files, 13,544 words). |
| [**BFLD — Beamforming Feedback Layer for Detection**](v2/crates/wifi-densepose-bfld/README.md) | New privacy-gated WiFi sensing layer that measures + structurally prevents identity leakage from 802.11ac/ax Beamforming Feedback Information. Three type-enforced invariants (raw BFI never exits node, identity embedding is in-RAM-only, cross-site correlation cryptographically impossible via per-site BLAKE3 keyed hash + daily rotation). Ships full operator surface (`BfldPipeline`, `BfldPipelineHandle`, Soul Signature `SoulMatchOracle` integration), MQTT topic router + HA-DISCO + availability + LWT, 3 operator HA blueprints, two runnable examples, eclipse-mosquitto:2 CI service container. 327+ tests. [ADR-118](docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md) umbrella + sub-ADRs [119](docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md)/[120](docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md)/[121](docs/adr/ADR-121-bfld-identity-risk-scoring.md)/[122](docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md)/[123](docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md). Research dossier: [`docs/research/BFLD/`](docs/research/BFLD/) (11 files, 13,544 words). |
| [**SENSE-BRIDGE — rvagent MCP server**](tools/ruview-mcp/README.md) | Dual-transport MCP server (`@ruvnet/rvagent`) bridging the RuView sensing stack to AI agents (Claude Code, Cursor, ruflo swarms). 6 tools wired: `ruview.presence.now`, `ruview.vitals.get_{breathing,heart_rate,all}`, `ruview.bfld.last_scan`, `ruview.bfld.subscribe`. stdio + Streamable HTTP (`POST /mcp`, Origin-validated, bearer-token auth, `127.0.0.1` bind). Full 20-tool Zod schema barrel + 5 RUVIEW-POLICY governance tools. 93 tests. [ADR-124](docs/adr/ADR-124-rvagent-mcp-ruvector-npm-integration.md). Try: `npx @ruvnet/rvagent stdio`. |
| [Semantic Primitives — Precision/Recall](docs/integrations/semantic-primitives-metrics.md) | Per-primitive F1 on the held-out paired-capture set: someone-sleeping, possible-distress, room-active, elderly-inactivity-anomaly, meeting, bathroom, fall-risk, bed-exit, no-movement, multi-room. |
| [Claude Code / Codex Plugin](plugins/ruview/README.md) | The `ruview` plugin + marketplace — skills, `/ruview-*` commands, agents, and the Codex prompt mirror |
-50
@@ -1,50 +0,0 @@
# AetherArena ("AA") — The Official Spatial-Intelligence Benchmark
> **Public leaderboard. Private evaluation split. Open scorer. Signed results.**
AetherArena is a **standalone, project-agnostic benchmark** for camera-free **spatial intelligence** — pose, presence, occupancy, tracking, and vitals from RF/WiFi (and, over time, mmWave / UWB / radar / lidar / multimodal). It is **not** a single-vendor leaderboard: any team, framework, or sensing modality can enter, and every entrant — including the RuView baseline that donated the seed scorer — is scored by the identical, open, pinned harness.
Specified in [ADR-149](../docs/adr/ADR-149-public-community-leaderboard-huggingface.md) (Accepted).
Canonical home: **`ruvnet/aether-arena`** + a Hugging Face Space (deploy pending — see `STATUS`).
---
## Why
WiFi/RF spatial sensing has no shared yardstick — papers self-report against inconsistent splits and metrics, with **no accounting for latency, reproducibility, or privacy leakage**. AA fixes the *measurement*, not just the models: a single deterministic scorer, a private held-out split nobody can train on, and a signed result ledger that can't be silently edited.
## What gets measured (v0)
| Category | Metric | Status |
|----------|--------|--------|
| **Pose** | PCK@0.2 (all / torso), OKS | Ranked |
| **Presence** | accuracy, FP/FN | Ranked |
| **Edge latency** | p50 / p95 / p99 ms | Ranked |
| **Determinism** | proof-hash pass/fail | Ranked (gate) |
| Tracking (MOTA) | — | activates when multi-person clips land |
| Vitals (BPM err) | — | activates when paired vitals ground truth lands |
| **Privacy leakage** | membership-inference ∈ [0,1] | **gated — not ranked** until the attacker ships |
| Cross-room | degradation ratio | coming soon |
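As an illustration of the Pose row, PCK@0.2 counts a keypoint as correct when it lands within 0.2 of a reference length from the ground truth. The sketch below is a minimal pure-Python version; the `ref_len` normaliser and the function name are assumptions for illustration, and the authoritative definition is the one in the `ruview_metrics` harness.

```python
import math


def pck(pred, gt, ref_len, alpha=0.2):
    """Fraction of keypoints within alpha * ref_len of the ground truth.

    pred, gt: list of frames, each a list of (x, y) keypoints.
    ref_len:  one reference length per frame (hypothetical normaliser,
              e.g. a torso or bounding-box size).
    """
    hits = 0
    total = 0
    for p_frame, g_frame, ref in zip(pred, gt, ref_len):
        for (px, py), (gx, gy) in zip(p_frame, g_frame):
            if math.hypot(px - gx, py - gy) <= alpha * ref:
                hits += 1
            total += 1
    return hits / total if total else 0.0


# Toy frame: four keypoints at distances 0, 0.1, 0.3 and 0.19 from the truth.
gt = [[(0.0, 0.0)] * 4]
pred = [[(0.0, 0.0), (0.1, 0.0), (0.3, 0.0), (0.0, 0.19)]]
score = pck(pred, gt, ref_len=[1.0])
```

Three of the four toy keypoints fall inside the 0.2 radius, so `score` is 0.75.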
The headline rank is the **category metric**; an optional `arena_score = quality × latency_factor × privacy_factor × determinism_gate` is exposed alongside (never instead) so accuracy can't win at any cost. See ADR-149 §2.5.
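The composite can be read as a plain product with a hard gate. The sketch below assumes every factor is already normalised to [0, 1] and that the determinism gate is binary; how each factor is derived is defined in ADR-149 §2.5, not here, and the function name is hypothetical.

```python
def arena_score(quality, latency_factor, privacy_factor, determinism_pass):
    """quality x latency_factor x privacy_factor x determinism_gate.

    Assumption: all three factors lie in [0, 1]; the gate is 1 on a
    determinism pass and 0 on a fail, so a failed proof zeroes the score.
    """
    factors = {
        "quality": quality,
        "latency_factor": latency_factor,
        "privacy_factor": privacy_factor,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    gate = 1.0 if determinism_pass else 0.0
    return quality * latency_factor * privacy_factor * gate


fast_and_good = arena_score(0.8, 0.5, 1.0, True)
not_reproducible = arena_score(0.8, 0.5, 1.0, False)
```

Because the terms multiply, a model cannot buy rank with accuracy alone: a slow or non-reproducible entry is pulled down regardless of its quality term.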
## How scoring works
The scorer is RuView's **already-published** `wifi-densepose-train` acceptance harness (`ruview_metrics` + ADR-145 `ablation`), run in a pinned sandbox. **You submit a model, not predictions** — predictions on data you hold prove nothing. Your model is scored against a **private** MM-Fi held-out split (CC BY-NC 4.0; Wi-Pose excluded for redistribution reasons), and one **signed, append-only** row is written to the results ledger with a determinism proof hash.
Submission lifecycle: `submitted → validated → quarantined → smoke_scored → full_scored → published` (or `rejected` with a reason). The model only ever runs inside a no-network, read-only-FS sandbox.
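The lifecycle above is a linear state machine with a single escape hatch. A minimal sketch follows, using the stage names from the text; the `advance` helper and its signature are hypothetical and not part of the Space's code.

```python
from enum import Enum


class Stage(str, Enum):
    SUBMITTED = "submitted"
    VALIDATED = "validated"
    QUARANTINED = "quarantined"
    SMOKE_SCORED = "smoke_scored"
    FULL_SCORED = "full_scored"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Happy-path order; REJECTED is reachable from any non-terminal stage.
_ORDER = [Stage.SUBMITTED, Stage.VALIDATED, Stage.QUARANTINED,
          Stage.SMOKE_SCORED, Stage.FULL_SCORED, Stage.PUBLISHED]


def advance(stage, ok=True, reason=None):
    """Return (next_stage, reason). A rejection must carry a reason."""
    if stage in (Stage.PUBLISHED, Stage.REJECTED):
        raise ValueError(f"{stage.value} is terminal")
    if not ok:
        if not reason:
            raise ValueError("a rejection needs a reason")
        return Stage.REJECTED, reason
    return _ORDER[_ORDER.index(stage) + 1], None


stage = Stage.SUBMITTED
while stage is not Stage.PUBLISHED:
    stage, _ = advance(stage)
```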
## Submit (when the Space is live)
1. Write a manifest: [`schema/aa-submission.toml`](schema/aa-submission.toml).
2. Push your model artifact (`.safetensors` / `.rvf` / LoRA adapter) + manifest to the Space.
3. Watch it move through the lifecycle; your signed row appears on the board.
## Verify it's fair (you don't have to trust us)
See [`VERIFY.md`](VERIFY.md) — run the **open scorer** locally on the **public smoke split**, reproduce the determinism hash, and confirm RuView's own entries were scored by the identical path. That five-step check is the launch gate (ADR-149 §7).
## Neutrality
AA is a neutral commons. The scorer is open and versioned; any metric change is a public `harness_version` bump that **re-scores all entries**. RuView donated the seed harness and enters as one baseline — it gets no special treatment (ADR-149 §2.8).
-30
View File
@@ -1,30 +0,0 @@
# AetherArena — Build Status
Tracks ADR-149 implementation milestones. "Complete" = benchmark **infrastructure** done,
tested, CI-gated, deploy-ready, RuView baseline entered, §7 acceptance test passing.
Model **SOTA** (e.g. MM-Fi PCK@20 ~72%) is a separate long-running ML effort, blocked on
ADR-079 camera-ground-truth collection — *not* an infra-completion blocker.
| # | Milestone | Status |
|---|-----------|--------|
| M1 | ADR-149 Accepted + committed | ✅ done |
| M2 | Scorer runner (`aa_score_runner`) — **real model scoring** + witness (proof+inputs hash) + **repeatability analysis** | ✅ done — builds `--no-default-features`, determinism gate PASS, repeatable 16/16 |
| M3 | CI harness-gate workflow (PR runs scorer + repeatability + real-scoring smoke + ledger verify) | ✅ done — `.github/workflows/aether-arena-harness.yml` |
| M4 | Scaffold: README + submission schema + VERIFY (acceptance test) | ✅ done |
| M5 | Public smoke split (committed) + private MM-Fi held-out split prep | 🟡 smoke split done (`fixtures/smoke_*.json`); private MM-Fi prep pending |
| M6 | HF Space (Gradio) — leaderboard + ledger integrity + submit/verify/about | ✅ deployed → https://huggingface.co/spaces/ruvnet/aether-arena (sandboxed scorer container = later hardening) |
| M7 | **Witness ledger chain** — append-only, hash-chained, tamper-evident | ✅ done — `ledger/ledger_tools.py` (seed/append/verify); tamper test fails as designed |
| M8 | Public launch | ✅ Space **LIVE** (gradio 5.9.1, serving 200) — **board empty, awaiting first real harness score** (benchmark-first: no seeded numbers) |
## v0 infrastructure: COMPLETE
Implement ✅ · Test ✅ · Deploy to HF ✅ (https://huggingface.co/spaces/ruvnet/aether-arena) · Instructions+Verification ✅ · PR runs the harness ✅ (PR #874, AA harness gate **passed**).
Remaining = data + hardening, not infra: private MM-Fi held-out split (M5), sandboxed scorer container (M6), privacy-leakage attacker (gated category), and **model SOTA** (separate ML effort, blocked on ADR-079 — explicitly not an infra exit).
## Benchmark-first posture (per user direction)
- **No placeholder numbers on the board.** The ledger seeds to genesis only; every result is a real scoring-pipeline witness. RuView gets no seeded baseline.
- **Witness chain** = `inputs_sha256` (binds witness to exact inputs) + `proof_sha256` (cross-platform-stable score hash) + the append-only hash-chained ledger. Repeatability analysis (`--repeat N`) proves the proof hash is identical across runs.
## Blockers / decisions needed
- **HF deploy (M6)** — token is in GCP Secret Manager (`HUGGINGFACE_API_KEY`); creating the public `ruvnet/aether-arena` Space still wants explicit go.
- **MM-Fi is CC BY-NC** → AA must stay non-commercial / legally distinct from the commercial RuView product.
- **Private MM-Fi split (M5)** — needs the dataset pulled + a held-out split assembled before real public scoring replaces the smoke fixture.
-78
View File
@@ -1,78 +0,0 @@
# Verifying AetherArena (you don't have to trust us)
AA's credibility rests on a stranger being able to reproduce a score and see that the rules are fair. This is the **launch gate** (ADR-149 §7): v0 does not ship until all five checks below pass for someone with no insider access.
> **Wider context:** this page covers the *leaderboard scorer*. For the whole-platform answer to
> "is this real / does it actually work?" — including the deterministic pipeline proof, the
> published models + public-benchmark numbers, and the built-in-public development trail — see
> [`docs/proof-of-capabilities.md`](../docs/proof-of-capabilities.md).
## The open scorer
The scoring engine is a pure-Rust, GPU-free binary: `aa_score_runner` in `wifi-densepose-train`. It runs the real `ruview_metrics` pose-acceptance harness on a fixed fixture and emits a cross-platform-stable SHA-256 **determinism proof**.
### Reproduce the determinism hash locally
```bash
cd v2
# Verify the committed expected hash still matches (this is the CI gate):
cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features
# → prints the witness (inputs_sha256 + proof_sha256) and "VERDICT: PASS"
# See the witness row as JSON:
cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --json
```
### Witness chain — proof + repeatability analysis
Every score is a **witness**: `inputs_sha256` (binds it to the exact inputs scored)
+ `proof_sha256` (cross-platform-stable hash of the quantised score) + `harness_version`.
Witnesses are recorded in an **append-only, hash-chained ledger** (each row references
the previous row's hash), so a silent edit to any past row breaks the chain.
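To make the tamper-evidence property concrete, here is a minimal hash-chain sketch. The row layout (`prev_hash`, `witness`, `row_hash`) and the all-zero genesis value are assumptions for illustration; the real format is whatever `ledger/ledger_tools.py` writes.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical genesis value


def _row_hash(body):
    # Canonical JSON so the same content always hashes the same way.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(ledger, witness):
    """Append a witness row that commits to the previous row's hash."""
    prev = ledger[-1]["row_hash"] if ledger else GENESIS
    body = {"prev_hash": prev, "witness": witness}
    ledger.append({**body, "row_hash": _row_hash(body)})


def verify(ledger):
    """Return the index of the first broken row, or None if the chain is intact."""
    prev = GENESIS
    for i, row in enumerate(ledger):
        body = {"prev_hash": row["prev_hash"], "witness": row["witness"]}
        if row["prev_hash"] != prev or row["row_hash"] != _row_hash(body):
            return i
        prev = row["row_hash"]
    return None


ledger = []
for n in range(3):
    append(ledger, {"inputs_sha256": f"in{n}", "proof_sha256": f"proof{n}",
                    "harness_version": "v0"})
intact = verify(ledger)

ledger[1]["witness"]["proof_sha256"] = "edited"  # simulate a silent edit
broken_at = verify(ledger)
```

Editing row 1 changes its recomputed hash, so verification stops there; recomputing that row's hash to hide the edit would instead break the link stored in row 2.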
```bash
# Repeatability: run the scorer K times, confirm ONE identical proof hash:
cd v2
cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --repeat 16
# → {"repeatability":{"runs":16,"unique_proof_hashes":1,"repeatable":true,...}}
# Real model scoring (score predictions against an eval split):
cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- \
--split ../aether-arena/fixtures/smoke_split.json \
--pred ../aether-arena/fixtures/smoke_pred.json --json
# Verify the witness ledger chain is intact (tamper-evident):
cd ../aether-arena/ledger && python3 ledger_tools.py verify
# → "OK: N rows, chain intact" (edit any row and it reports the broken link)
```
The expected hash is committed at [`fixtures/expected_score.sha256`](fixtures/expected_score.sha256). Same harness version + same fixture → same hash on glibc / MSVC / Apple. If your local run prints `VERDICT: PASS`, you have reproduced the scorer.
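The idea behind a cross-platform-stable proof hash can be sketched as: quantise the scores to integers first, then hash a canonical text form, so last-bit floating-point differences between platforms do not move the hash. The rounding precision and the serialisation below are assumptions for illustration; the real scheme is the Rust one in `aa_score_runner`.

```python
import hashlib


def proof_sha256(scores, decimals=6):
    """Hash scores after fixed-point quantisation (hypothetical scheme)."""
    parts = []
    for name, value in sorted(scores.items()):
        quantised = round(value * 10 ** decimals)  # integer, platform-independent
        parts.append(f"{name}={quantised:d}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def gate(scores, expected_hash):
    """Mirror of the CI gate: PASS only if the recomputed hash matches."""
    return "PASS" if proof_sha256(scores) == expected_hash else "FAIL"


expected = proof_sha256({"pck": 0.1234561, "oks": 0.5})
same_after_noise = gate({"pck": 0.1234562, "oks": 0.5}, expected)
after_metric_change = gate({"pck": 0.2, "oks": 0.5}, expected)
```

Noise below the quantisation step leaves the verdict at PASS, while any real change in the score moves the hash and flips it to FAIL.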
### What happens if the scoring maths changes
Any edit to `ruview_metrics.rs`, `ablation.rs`, or `aa_score_runner.rs` moves the hash and **fails the CI gate** (`.github/workflows/aether-arena-harness.yml`) until the maintainer regenerates and reviews:
```bash
cargo run -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --generate-hash \
> aether-arena/fixtures/expected_score.sha256
```
So a scorer change is always a reviewed, public diff — never silent. That's `harness_version` pinning + `determinism_gate` in action (ADR-149 §2.4–§2.5).
## The five-step acceptance test (v0 launch gate)
A stranger must be able to:
1. **Submit** a model (artifact + `schema/aa-submission.toml`) with no insider help.
2. **Get a deterministic score** — same model + same `harness_version` → same numbers.
3. **See the signed row** appended to the public results ledger.
4. **Rerun the scorer locally** on the public smoke split and reproduce the logic (the command above).
5. **Understand why the rank is fair** — private split, open scorer, pinned version, proof hash — from these docs alone.
If any step fails, v0 is not ready.
## Current status
- ✅ Step 4 (rerun the open scorer locally, reproduce the hash) — **works today** via `aa_score_runner`.
- ✅ CI harness gate runs the scorer on every PR.
- ⏳ Steps 1–3 and 5 (HF Space submission flow + signed ledger): in progress; they require the HF Space deploy (needs an HF token / maintainer authorization).
-87
View File
@@ -1,87 +0,0 @@
# RuView Calibration Service (reference implementation)
Turn a **shared WiFi-CSI pose base model** into a room-specific one with a **30-second labeled
calibration** and a **~11 KB per-room LoRA adapter**. This is the deployable resolution of the
cross-subject / cross-environment generalization problem (full study: [ADR-150 §3.3–3.6](../../docs/adr/ADR-150-rf-foundation-encoder.md)).
## Why
Zero-shot WiFi pose generalizes poorly to a **new room or new person** — an unseen room can drop a
strong model to near-random. But that gap is **not** algorithmically closeable (CORAL, DANN,
instance-norm, contrastive foundation-pretraining all failed) and **not** closeable by collecting
more subjects (saturates ~64%). It **is** closeable, cheaply, at deployment time: a handful of
labeled frames from the actual room pin down its multipath instantly.
| Deployment case | Zero-shot | + in-room calibration |
|-----------------|----------:|----------------------:|
| Same room, new person (cross-subject) | 64% | **76%** (200 samples) |
| **New room + new person (cross-environment)** | **~10%** | **60% @ 5 samples → 73% @ 200** |
**Verified demo (this code, source-only base on an unseen MM-Fi room E04):**
`zero-shot 3.09% → after 200-sample calibration 74.29%` (+71 pts).
## How it works
A frozen shared **base** (transformer + temporal attention pool + skeleton-graph head, the published
[`ruvnet/wifi-densepose-mmfi-pose`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose)) plus a
tiny **LoRA adapter** (rank 8 on the input projection + pose head — **11,200 params ≈ 11 KB int8 /
22 KB fp16**) fitted per room. Thousands of room-adapters hang off one base.
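The 11,200-parameter figure can be checked by hand: a rank-r LoRA on a Linear layer with `in` inputs and `out` outputs adds `in*r + r*out` parameters. The sketch below assumes the layer shapes of the `PoseNet` defaults in `model.py` (input projection 3×114→256, pose head 256→256 and 256→34).

```python
def lora_params(in_features, out_features, rank):
    """A is [in, r] and B is [r, out], so the adapter adds in*r + r*out params."""
    return in_features * rank + rank * out_features


RANK = 8
# Layer shapes assumed from model.py's PoseNet defaults.
layers = {
    "proj": (3 * 114, 256),   # input projection
    "head.0": (256, 256),     # first pose-head Linear
    "head.3": (256, 34),      # final pose-head Linear
}

total = sum(lora_params(i, o, RANK) for i, o in layers.values())
int8_kb = total * 1 / 1000   # one byte per parameter
fp16_kb = total * 2 / 1000   # two bytes per parameter
```

This gives 4,784 + 4,096 + 2,320 = 11,200 parameters, i.e. about 11 KB at one byte each and about 22 KB at two.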
## Usage
```bash
# 1) Capture a short labeled clip in the deployment room -> calib.npz {X:[N,3,114,10], Y:[N,17,2]}
#    (~100–200 samples recommended; below ~20 the adapter can underperform zero-shot)
# 2) Fit the per-room adapter (~11 KB):
python calibrate.py --base pose_mmfi_best.pt --data calib.npz --out room.adapter.npz
# 3) Run calibrated inference (base + room adapter):
python infer.py --base pose_mmfi_best.pt --adapter room.adapter.npz --data frames.npz --out kp.npy
# omit --adapter to run the uncalibrated (zero-shot) base
```
`X` is CSI amplitude `[N, 3 antennas, 114 subcarriers, 10 frames]` (per-sample standardization is
applied internally). `Y` is `[N,17,2]` COCO keypoints in `[0,1]`.
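A small helper for writing a well-formed `calib.npz` might look like the sketch below. The function name and the validation rules are illustrative assumptions; only the array names, shapes, and value range come from the description above.

```python
import numpy as np


def save_calibration(path, X, Y):
    """Validate shapes and ranges, then write X and Y into an .npz file.

    Hypothetical helper: accepts Y only in the [N,17,2] layout.
    """
    X = np.asarray(X, dtype=np.float32)
    Y = np.asarray(Y, dtype=np.float32)
    if X.ndim != 4 or len(X) == 0 or X.shape[1:] != (3, 114, 10):
        raise ValueError(f"X must be [N,3,114,10] with N >= 1, got {X.shape}")
    if Y.shape != (len(X), 17, 2):
        raise ValueError(f"Y must be [{len(X)},17,2], got {Y.shape}")
    if Y.min() < 0.0 or Y.max() > 1.0:
        raise ValueError("Y keypoints must be normalised to [0,1]")
    np.savez(path, X=X, Y=Y)
    return len(X)
```

Failing early on a wrong shape is cheaper than discovering it inside `calibrate.py` after the base checkpoint has loaded.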
## Calibration budget (measured, rank-8 LoRA, 3 seeds — ADR-150 §3.5)
| Labeled samples/room | cross-subject | cross-environment |
|---------------------:|--------------:|------------------:|
| 0 (zero-shot) | 64% | ~10% |
| 5 | — | 60% |
| 20 | 66% | 66% |
| 50 | 70% | 70% |
| 200 | 72% | 73% |
Knee at ~50 samples (~70%); **below ~20 samples the adapter can hurt** (too few to fit reliably).
## Two models, two producers (not interchangeable)
Adapters are **model-specific**. There are two calibration producers here:
| Producer | Target model | Input | Adapter format | Consumer |
|----------|--------------|-------|----------------|----------|
| `calibrate.py` | MM-Fi **transformer** (`pose_mmfi_best.pt`, 3×114×10) | `[N,3,114,10]` | `.npz` (`proj`/`head` LoRA) | this Python `infer.py` |
| `cog_calibrate.py` | cog **conv+MLP** (`pose_v1.safetensors`, 56×20) | `[N,56,20]` | `.safetensors` (`fc1.a`/`fc1.b`/`fc2.a`/`fc2.b`) | Rust `cog-pose-estimation run --adapter` |
```bash
# Produce a cog-format per-room adapter for the deployed Rust pose engine:
python cog_calibrate.py --base pose_v1.safetensors --data calib.npz --out room.safetensors
# then in the cog runtime:
cog-pose-estimation run --config <cfg> --adapter room.safetensors
```
Same LoRA *mechanism* (ADR-150 §3.5), different architecture and key layout — an adapter from one
producer will not load into the other model.
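Because the two formats differ in their key names, a loader can tell them apart before attempting to load. The sketch below is a hypothetical guard, based only on the key layouts named in the table above (`.A`/`.B` suffixes for the transformer adapter, `fc1.a`/`fc1.b`/`fc2.a`/`fc2.b` for the cog adapter).

```python
COG_KEYS = {"fc1.a", "fc1.b", "fc2.a", "fc2.b"}


def adapter_kind(keys):
    """Classify an adapter by its tensor names: 'cog', 'mmfi', or 'unknown'."""
    keys = set(keys)
    if COG_KEYS <= keys:
        return "cog"
    # Case-sensitive on purpose: the transformer adapter uses upper-case A/B.
    if any(k.endswith((".A", ".B")) for k in keys):
        return "mmfi"
    return "unknown"


kind_npz = adapter_kind(["proj.A", "proj.B", "head.0.A", "head.0.B", "_meta"])
kind_cog = adapter_kind(["fc1.a", "fc1.b", "fc2.a", "fc2.b"])
```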
## Notes
- **Calibration only helps when the base hasn't already seen the room.** The published flagship was
trained on MM-Fi `random_split`, so calibrating it on an MM-Fi subject is a near-no-op (it already
saw them); for a genuinely new real-world room it is zero-shot and calibration applies. To
*reproduce the demo* on a held-out MM-Fi room, train a source-only base (exclude the target
environment) — see `ADR-150 §3.6` and the few-shot harness in `aether-arena/staging/`.
- Adapter is saved fp16 (~22 KB); quantize to int8 for the ~11 KB on-device form.
- Inference is real-time on CPU (the 75 K-param `micro` variant runs in 0.135 ms single-thread x86;
see [`docs/benchmarks/wifi-pose-efficiency-frontier.md`](../../docs/benchmarks/wifi-pose-efficiency-frontier.md)).
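The fp16-to-int8 step mentioned in the notes is not specified further. One common choice is symmetric per-tensor quantisation, sketched below under that assumption; the on-device format may differ.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor int8 quantisation: w is approximated by q * scale."""
    w = np.asarray(w, dtype=np.float32)
    max_abs = float(np.abs(w).max()) if w.size else 0.0
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Recover a float32 approximation of the original tensor."""
    return q.astype(np.float32) * scale


w = np.linspace(-1.0, 1.0, 11, dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = float(np.abs(w - w_hat).max())
```

Each tensor then costs one byte per parameter plus a single float scale, which is where the halving relative to fp16 comes from; the reconstruction error is bounded by about half a quantisation step.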
-71
View File
@@ -1,71 +0,0 @@
"""RuView per-room calibration — fit a ~11 KB LoRA adapter from a short labeled in-room capture.
    python calibrate.py --base pose_mmfi_best.pt --data room_calib.npz --out room_A.adapter.npz

`room_calib.npz` must contain `X` [N,3,114,10] CSI amplitude and `Y` [N,17,2] (or [N,34]) keypoints
in [0,1]: the labeled calibration samples from the deployment room (~100–200 recommended; ≥20).
Outputs a tiny adapter (.npz, ~22 KB as saved in fp16, ~11 KB once quantized to int8) that, loaded
over the shared base at inference, recovers SOTA-level pose for that room/person (ADR-150 §3.5–3.6).
"""
import argparse

import numpy as np
import torch
import torch.nn as nn

from model import PoseNet, standardize


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--base", required=True, help="base checkpoint (pose_mmfi_best.pt)")
    ap.add_argument("--data", required=True, help="labeled calibration .npz with X and Y")
    ap.add_argument("--out", required=True, help="output adapter .npz")
    ap.add_argument("--rank", type=int, default=8)
    ap.add_argument("--iters", type=int, default=600)
    ap.add_argument("--lr", type=float, default=8e-4)
    ap.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu")
    a = ap.parse_args()

    z = np.load(a.data)
    X = torch.tensor(z["X"].astype(np.float32))
    Y = torch.tensor(z["Y"].reshape(len(z["Y"]), 34).astype(np.float32))
    n = len(X)
    if n < 20:
        print(f"WARNING: only {n} calibration samples; below ~20 the adapter may underperform "
              f"zero-shot (ADR-150 §3.5). Recommend ~100–200.")

    dev = a.device
    net = PoseNet().to(dev)
    net.load_state_dict(torch.load(a.base, map_location=dev), strict=False)
    net.add_lora(r=a.rank).to(dev)
    # Train only the LoRA A/B matrices; everything else stays frozen.
    for k, p in net.named_parameters():
        p.requires_grad = k.endswith(".A") or k.endswith(".B")
    trainable = [p for p in net.parameters() if p.requires_grad]
    n_tr = sum(p.numel() for p in trainable)

    Xs = standardize(X.to(dev))
    Yt = Y.to(dev)
    opt = torch.optim.AdamW(trainable, lr=a.lr, weight_decay=0.0)
    lossf = nn.SmoothL1Loss(beta=0.1)
    bs = min(128, n)
    net.train()
    for it in range(a.iters):
        bi = torch.randint(0, n, (bs,), device=dev)
        xb = Xs[bi]
        # light augmentation: drop whole antenna streams (dim 1) with p=0.15, then add
        # per-sample-scaled Gaussian noise; matches training-time regularization
        m = (torch.rand(xb.shape[0], xb.shape[1], 1, 1, device=dev) > 0.15).float()
        xb = xb * m + 0.03 * torch.randn_like(xb) * torch.rand(xb.shape[0], 1, 1, 1, device=dev)
        opt.zero_grad()
        lossf(net(xb), Yt[bi]).backward()
        opt.step()

    adapter = net.lora_state()
    nbytes = sum(v.astype(np.float16).nbytes for v in adapter.values())
    np.savez(a.out, **{k: v.astype(np.float16) for k, v in adapter.items()},
             _meta=np.array([a.rank, n, n_tr], dtype=np.int64))
    print(f"saved {a.out} | rank {a.rank} | {n_tr:,} params | ~{nbytes/1024:.1f} KB fp16 | "
          f"from {n} labeled samples")


if __name__ == "__main__":
    main()
-120
View File
@@ -1,120 +0,0 @@
"""Per-room calibration producer for the cog-pose-estimation **conv+MLP** model
(`pose_v1.safetensors`, 56 subcarriers x 20 frames). Companion to `calibrate.py`
(which targets the MM-Fi *transformer* model): different model, different adapter
key layout, NOT interchangeable (ADR-150 §3.5).

Fits a rank-r LoRA on the pose head (fc1, fc2) from a short labeled in-room capture and
writes a **safetensors** adapter with keys `fc1.a`/`fc1.b`/`fc2.a`/`fc2.b` (scale baked
into `b`), exactly what `cog-pose-estimation run --adapter <file>` consumes.

    python cog_calibrate.py --base pose_v1.safetensors --data calib.npz --out room.safetensors

`calib.npz`: `X` [N,56,20] CSI window + `Y` [N,17,2] (or [N,34]) keypoints in [0,1].
"""
import argparse

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class CogPose(nn.Module):
    """Mirrors cog-pose-estimation's PoseNet (Candle) exactly, with the same safetensors keys."""

    def __init__(self):
        super().__init__()
        self.enc = nn.ModuleDict({
            "c1": nn.Conv1d(56, 64, 3, padding=1, dilation=1),
            "c2": nn.Conv1d(64, 128, 3, padding=2, dilation=2),
            "c3": nn.Conv1d(128, 128, 3, padding=4, dilation=4),
        })
        self.head = nn.ModuleDict({"fc1": nn.Linear(128, 256), "fc2": nn.Linear(256, 34)})
        self.fc1_lora = None
        self.fc2_lora = None

    def _lora(self, slot, x, y):
        # No adapter attached: pass the base output through unchanged.
        if slot is None:
            return y
        a, b = slot
        return y + (x @ a) @ b

    def forward(self, x):  # x: [B, 56, 20]
        h = F.relu(self.enc["c1"](x))
        h = F.relu(self.enc["c2"](h))
        h = F.relu(self.enc["c3"](h))
        h = h.mean(2)  # [B, 128]
        z1 = self.head["fc1"](h)
        z1 = self._lora(self.fc1_lora, h, z1)
        h1 = F.relu(z1)
        z2 = self.head["fc2"](h1)
        z2 = self._lora(self.fc2_lora, h1, z2)
        return torch.sigmoid(z2)  # [B, 34]

    def add_lora(self, r=4):
        self.fc1_lora = (nn.Parameter(torch.randn(128, r) * 0.02), nn.Parameter(torch.zeros(r, 256)))
        self.fc2_lora = (nn.Parameter(torch.randn(256, r) * 0.02), nn.Parameter(torch.zeros(r, 34)))
        for p in (*self.fc1_lora, *self.fc2_lora):
            self.register_parameter(f"lora_{id(p)}", p)
        return self


def load_base(net: CogPose, path: str):
    from safetensors.torch import load_file
    # Checkpoint keys ("enc.c1.weight", "head.fc1.weight", ...) already match the
    # ModuleDict layout above, so they load without remapping.
    net.load_state_dict(load_file(path), strict=False)
    return net


def fit(base: str, data: str, out: str, rank: int = 4, iters: int = 400, lr: float = 1e-3):
    z = np.load(data)
    X = torch.tensor(z["X"].astype(np.float32))  # [N,56,20]
    Y = torch.tensor(z["Y"].reshape(len(z["Y"]), 34).astype(np.float32))
    n = len(X)

    net = CogPose()
    load_base(net, base)
    net.add_lora(rank)
    # Freeze everything, then re-enable gradients for the LoRA tensors only.
    for p in net.parameters():
        p.requires_grad = False
    lora = [*net.fc1_lora, *net.fc2_lora]
    for p in lora:
        p.requires_grad = True

    opt = torch.optim.AdamW(lora, lr=lr, weight_decay=0.0)
    lossf = nn.SmoothL1Loss(beta=0.1)
    bs = min(64, n)
    net.train()
    for _ in range(iters):
        bi = torch.randint(0, n, (bs,))
        opt.zero_grad()
        lossf(net(X[bi]), Y[bi]).backward()
        opt.step()

    alpha = 16.0
    scale = alpha / rank
    a1, b1 = net.fc1_lora
    a2, b2 = net.fc2_lora
    tensors = {
        "fc1.a": a1.detach().contiguous(),
        "fc1.b": (b1.detach() * scale).contiguous(),  # bake scale into b
        "fc2.a": a2.detach().contiguous(),
        "fc2.b": (b2.detach() * scale).contiguous(),
    }
    from safetensors.torch import save_file
    save_file(tensors, out)
    return out, sum(p.numel() for p in lora), n


if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--base", required=True)
    ap.add_argument("--data", required=True)
    ap.add_argument("--out", required=True)
    ap.add_argument("--rank", type=int, default=4)
    ap.add_argument("--iters", type=int, default=400)
    a = ap.parse_args()
    out, np_, n = fit(a.base, a.data, a.out, a.rank, a.iters)
    print(f"saved {out} | {np_} LoRA params from {n} samples "
          f"(keys fc1.a/fc1.b/fc2.a/fc2.b; load with cog-pose-estimation run --adapter)")
-49
View File
@@ -1,49 +0,0 @@
"""Run calibrated WiFi-CSI pose inference: shared base + a per-room LoRA adapter.
    python infer.py --base pose_mmfi_best.pt --adapter room_A.adapter.npz --data frames.npz

`frames.npz` contains `X` [N,3,114,10] CSI amplitude. Prints/saves [N,17,2] keypoints in [0,1].
Omit --adapter to run the uncalibrated (zero-shot) base. With a room adapter, expect SOTA-level
accuracy in that room/person; without one, zero-shot degrades in unseen rooms (ADR-150 §3.6).
"""
import argparse

import numpy as np
import torch

from model import PoseNet, standardize


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--base", required=True)
    ap.add_argument("--adapter", default=None, help="per-room .adapter.npz (omit for zero-shot)")
    ap.add_argument("--data", required=True, help=".npz with X [N,3,114,10]")
    ap.add_argument("--out", default=None, help="optional .npy to save [N,17,2] keypoints")
    ap.add_argument("--rank", type=int, default=8)
    ap.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu")
    a = ap.parse_args()

    dev = a.device
    net = PoseNet().to(dev)
    net.load_state_dict(torch.load(a.base, map_location=dev), strict=False)
    if a.adapter:
        net.add_lora(r=a.rank).to(dev)
        z = np.load(a.adapter)
        net.load_lora({k: z[k].astype(np.float32) for k in z.files if k.endswith(".A") or k.endswith(".B")})
    net.eval()

    X = torch.tensor(np.load(a.data)["X"].astype(np.float32)).to(dev)
    Xs = standardize(X)
    out = []
    with torch.no_grad():
        # Run in chunks of 4096 frames to bound memory use.
        for i in range(0, len(Xs), 4096):
            out.append(net(Xs[i:i + 4096]).cpu().numpy())
    kp = np.concatenate(out).reshape(-1, 17, 2)
    print(f"inferred {len(kp)} frames | adapter={'yes' if a.adapter else 'NONE (zero-shot)'}")
    if a.out:
        np.save(a.out, kp)
        print(f"saved keypoints -> {a.out}")


if __name__ == "__main__":
    main()
-107
View File
@@ -1,107 +0,0 @@
"""WiFi-CSI pose model + LoRA adapter for the RuView calibration service.
Architecture matches the published flagship checkpoint
[`ruvnet/wifi-densepose-mmfi-pose`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose)
(`pose_mmfi_best.pt`): transformer encoder + temporal attention pooling + skeleton-graph head.

The calibration service freezes this base and fits a tiny per-room **LoRA adapter** (rank 8 on the
input projection + pose head ≈ 11 KB) from ~100–200 labeled in-room samples. Empirically that lifts
cross-subject 64→72% and cross-environment 11→73% (ADR-150 §3.3–3.6).
"""
import numpy as np
import torch
import torch.nn as nn

# COCO-17 skeleton edges for the graph-refinement head.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),
         (5, 11), (6, 12), (11, 12), (11, 13), (13, 15), (12, 14), (14, 16)]
# Row-normalized adjacency matrix with self-loops.
_A = np.eye(17, dtype=np.float32)
for _i, _j in EDGES:
    _A[_i, _j] = _A[_j, _i] = 1.0
_A = _A / _A.sum(1, keepdims=True)


class LoRA(nn.Module):
    """Low-rank adapter wrapping a frozen Linear: y = W·x + (x·A·B)·(alpha/r)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        # A gets a small random init and B starts at zero, so the adapter is a no-op until trained.
        self.A = nn.Parameter(torch.zeros(base.in_features, r))
        self.B = nn.Parameter(torch.zeros(r, base.out_features))
        nn.init.normal_(self.A, std=0.02)
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale


class GR(nn.Module):
    """Skeleton-graph refinement: nudges joints toward anatomically consistent positions."""

    def __init__(self, d=256, h=96):
        super().__init__()
        self.je = nn.Parameter(torch.randn(17, 32) * 0.02)
        self.inp = nn.Linear(d + 34, h)
        self.g1 = nn.Linear(h, h)
        self.g2 = nn.Linear(h, h)
        self.out = nn.Linear(h, 2)
        self.register_buffer("A", torch.tensor(_A))

    def forward(self, z, kp0):
        B = z.shape[0]
        f = torch.relu(self.inp(torch.cat(
            [z.unsqueeze(1).expand(-1, 17, -1), self.je.unsqueeze(0).expand(B, -1, -1), kp0], -1)))
        f = torch.relu(self.g1(torch.einsum('ij,bjh->bih', self.A, f)))
        f = torch.relu(self.g2(torch.einsum('ij,bjh->bih', self.A, f)))
        return kp0 + 0.3 * torch.tanh(self.out(f))


class PoseNet(nn.Module):
    """Flagship pose model. Input [B,3,114,10] CSI amplitude (per-sample standardized) -> [B,34]."""

    def __init__(self, na=3, nsc=114, nt=10, d=256, L=4, H=8):
        super().__init__()
        self.proj = nn.Linear(na * nsc, d)
        self.pos = nn.Parameter(torch.randn(1, nt, d) * 0.02)
        enc = nn.TransformerEncoderLayer(d, H, d * 2, dropout=0.2, batch_first=True, activation='gelu')
        self.tf = nn.TransformerEncoder(enc, L)
        self.att = nn.Linear(d, 1)
        self.head = nn.Sequential(nn.Linear(d, 256), nn.GELU(), nn.Dropout(0.3), nn.Linear(256, 34))
        self.gr = GR(d)
        self.na, self.nsc, self.nt = na, nsc, nt

    def forward(self, x):
        B = x.shape[0]
        # [B, na, nsc, nt] -> [B, nt, na*nsc]: one token per time frame.
        t = x.permute(0, 3, 1, 2).reshape(B, self.nt, self.na * self.nsc)
        h = self.tf(self.proj(t) + self.pos)
        w = torch.softmax(self.att(h), 1)  # temporal attention weights
        z = (h * w).sum(1)
        kp0 = torch.sigmoid(self.head(z)).reshape(B, 17, 2)
        return self.gr(z, kp0).reshape(B, 34)

    def add_lora(self, r=8, alpha=16):
        """Wrap the input projection + pose head with LoRA adapters (the ~11 KB calibration set)."""
        self.proj = LoRA(self.proj, r, alpha)
        self.head[0] = LoRA(self.head[0], r, alpha)
        self.head[3] = LoRA(self.head[3], r, alpha)
        return self

    def lora_state(self) -> dict:
        """Extract just the LoRA A/B tensors (the per-room adapter to save)."""
        return {k: v.detach().cpu().numpy() for k, v in self.state_dict().items()
                if k.endswith(".A") or k.endswith(".B")}

    def load_lora(self, adapter: dict):
        sd = self.state_dict()
        for k, v in adapter.items():
            sd[k] = torch.tensor(v)
        self.load_state_dict(sd)
        return self


def standardize(x: torch.Tensor) -> torch.Tensor:
    """Per-sample standardization used in training/inference."""
    return (x - x.mean((1, 2, 3), keepdim=True)) / (x.std((1, 2, 3), keepdim=True) + 1e-6)
@@ -1,103 +0,0 @@
"""Self-contained regression test for the RuView calibration service.
Exercises the committed CLI end-to-end on synthetic data (CPU, no GPU, no real checkpoint):
build a base -> calibrate.py fits an adapter -> infer.py runs base+adapter -> assert the
adapter is small, inference is shape-correct and finite, and the adapter actually changes output.

Run: python test_calibration.py (or via pytest)
"""
import json
import subprocess
import sys
import tempfile
from pathlib import Path

import numpy as np
import torch

HERE = Path(__file__).parent
sys.path.insert(0, str(HERE))
from model import PoseNet, standardize  # noqa: E402


def _make_base(path: Path):
    torch.manual_seed(0)
    net = PoseNet()
    # Save without the deterministic gr.A buffer (mirrors the published checkpoint;
    # calibrate.py/infer.py load with strict=False).
    sd = {k: v for k, v in net.state_dict().items() if k != "gr.A"}
    torch.save(sd, path)


def _make_data(path: Path, n: int, seed: int):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 3, 114, 10)).astype(np.float32)
    Y = rng.random((n, 17, 2)).astype(np.float32)  # keypoints in [0,1]
    np.savez(path, X=X, Y=Y)


def _run(*args):
    r = subprocess.run(
        [sys.executable, str(HERE / args[0]), *map(str, args[1:])],
        capture_output=True, text=True,
    )
    assert r.returncode == 0, f"{args[0]} failed:\n{r.stdout}\n{r.stderr}"
    return r.stdout


def test_calibration_end_to_end():
    with tempfile.TemporaryDirectory() as d:
        d = Path(d)
        base = d / "base.pt"
        calib = d / "calib.npz"
        frames = d / "frames.npz"
        adapter = d / "room.adapter.npz"
        kp = d / "kp.npy"
        _make_base(base)
        _make_data(calib, n=40, seed=1)  # ≥20 → no underfit warning
        _make_data(frames, n=16, seed=2)

        # 1) calibrate -> adapter
        out = _run("calibrate.py", "--base", base, "--data", calib, "--out", adapter,
                   "--iters", "50", "--device", "cpu")
        assert adapter.exists(), "adapter not written"
        assert "saved" in out.lower()
        sz = adapter.stat().st_size
        assert sz < 200_000, f"adapter unexpectedly large ({sz} bytes)"
        # adapter contains the expected LoRA tensors (materialize + close so the
        # Windows tempdir can be cleaned up; np.load keeps a lazy file handle).
        with np.load(adapter) as z:
            keys = [k for k in z.files if k.endswith(".A") or k.endswith(".B")]
            assert keys, f"adapter has no LoRA tensors: {z.files}"
            lora = {k: z[k].astype(np.float32) for k in keys}

        # 2) infer with adapter -> keypoints
        _run("infer.py", "--base", base, "--adapter", adapter, "--data", frames,
             "--out", kp, "--device", "cpu")
        out_kp = np.load(kp)
        assert out_kp.shape == (16, 17, 2), f"bad keypoint shape {out_kp.shape}"
        assert np.isfinite(out_kp).all(), "non-finite keypoints"
        assert (out_kp >= 0).all() and (out_kp <= 1).all(), "keypoints out of [0,1]"

        # 3) adapter must actually change the output vs the zero-shot base
        with np.load(frames) as fz:
            frames_x = fz["X"][:]
        net = PoseNet()
        net.load_state_dict(torch.load(base, map_location="cpu"), strict=False)
        net.eval()
        x = standardize(torch.tensor(frames_x))
        with torch.no_grad():
            base_kp = net(x).reshape(16, 17, 2).numpy()
        net.add_lora()
        net.load_lora(lora)
        net.eval()
        with torch.no_grad():
            cal_kp = net(x).reshape(16, 17, 2).numpy()
        assert np.abs(base_kp - cal_kp).sum() > 1e-4, "adapter did not change output"


if __name__ == "__main__":
    test_calibration_end_to_end()
    print("PASS: calibration service end-to-end (calibrate -> adapter -> infer)")
@@ -1,75 +0,0 @@
"""Regression test for the cog-pose adapter producer (cog_calibrate.py).
Uses the in-repo `pose_v1.safetensors` (skips if absent). Verifies the produced adapter:
- has the exact keys/shapes the Rust `cog-pose-estimation --adapter` loader expects,
- reduces calibration fit error,
- actually changes inference output,
- is tiny.
Run: python test_cog_calibration.py (or via pytest)
"""
import os
import sys
import tempfile
from pathlib import Path

import numpy as np
import torch
import torch.nn.functional as F

HERE = Path(__file__).parent
sys.path.insert(0, str(HERE))
import cog_calibrate as C  # noqa: E402

BASE = HERE / "../../v2/crates/cog-pose-estimation/cog/artifacts/pose_v1.safetensors"


def test_cog_adapter_producer():
    if not BASE.exists():
        print(f"(skip — {BASE} not present)")
        return
    from safetensors.torch import load_file

    rng = np.random.default_rng(0)
    n = 120
    X = rng.standard_normal((n, 56, 20)).astype("float32")
    Y = (0.5 + 0.1 * X[:, :34, 0].reshape(n, 34)).clip(0, 1).astype("float32")

    with tempfile.TemporaryDirectory() as d:
        calib = os.path.join(d, "calib.npz")
        adapter = os.path.join(d, "room.safetensors")
        np.savez(calib, X=X, Y=Y)

        net0 = C.CogPose()
        C.load_base(net0, str(BASE))
        net0.eval()
        with torch.no_grad():
            base_err = F.smooth_l1_loss(net0(torch.tensor(X)), torch.tensor(Y)).item()

        _, nparam, _ = C.fit(str(BASE), calib, adapter, rank=4, iters=400)

        t = load_file(adapter)
        # exact Rust loader contract: a:[in,r], b:[r,out]
        assert tuple(t["fc1.a"].shape) == (128, 4)
        assert tuple(t["fc1.b"].shape) == (4, 256)
        assert tuple(t["fc2.a"].shape) == (256, 4)
        assert tuple(t["fc2.b"].shape) == (4, 34)

        net = C.CogPose()
        C.load_base(net, str(BASE))
        net.add_lora(4)
        with torch.no_grad():
            net.fc1_lora[0].copy_(t["fc1.a"]); net.fc1_lora[1].copy_(t["fc1.b"] / (16 / 4))
            net.fc2_lora[0].copy_(t["fc2.a"]); net.fc2_lora[1].copy_(t["fc2.b"] / (16 / 4))
        net.eval()
        with torch.no_grad():
            cal_err = F.smooth_l1_loss(net(torch.tensor(X)), torch.tensor(Y)).item()
            changed = (net0(torch.tensor(X[:8])) - net(torch.tensor(X[:8]))).abs().sum().item()

        assert cal_err < base_err, f"calibration did not reduce error ({base_err} -> {cal_err})"
        assert changed > 1e-3, "adapter inert"
        assert nparam < 5000, f"adapter unexpectedly large ({nparam} params)"


if __name__ == "__main__":
    test_cog_adapter_producer()
    print("PASS: cog adapter producer (Rust-loadable format, reduces error, active)")
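The shape assertions in the test above pin the adapter layout to `a: [in, r]` and `b: [r, out]` at rank 4. As a rough illustration of why such an adapter stays small, the NumPy sketch below applies a low-rank update to a dense layer and counts the adapter parameters. The `lora_forward` and `adapter_param_count` helpers and the random weights are illustrative assumptions, not the `cog_calibrate` implementation; the `16 / 4` factor only mirrors the scale the test divides out of the saved `b` matrices.

```python
import numpy as np

RANK = 4
SCALE = 16 / RANK  # same 16 / 4 factor the test divides out of the saved b matrices

# (in_features, out_features) per adapted layer, taken from the shape assertions
LAYERS = {"fc1": (128, 256), "fc2": (256, 34)}


def lora_forward(x, w, a, b, scale=SCALE):
    """Hypothetical helper: dense layer plus the low-rank correction (x @ a) @ b."""
    return x @ w + scale * (x @ a) @ b


def adapter_param_count(layers=LAYERS, rank=RANK):
    """Number of values stored in all a:[in, r] and b:[r, out] matrices."""
    return sum(i * rank + rank * o for i, o in layers.values())


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 128))
w = rng.standard_normal((128, 256))
a = rng.standard_normal((128, RANK))
b_zero = np.zeros((RANK, 256))
b_live = rng.standard_normal((RANK, 256))

# A zero b leaves the base layer untouched; a non-zero b changes the output.
inert = np.allclose(lora_forward(x, w, a, b_zero), x @ w)
active = not np.allclose(lora_forward(x, w, a, b_live), x @ w)
total = adapter_param_count()
print(inert, active, total)
```

With these shapes the four matrices hold 128·4 + 4·256 + 256·4 + 4·34 = 2,696 values, which sits under the 5,000 bound the test asserts, assuming `nparam` counts exactly these four matrices.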
@@ -1 +0,0 @@
9c35e541d51f00998691b98948887ebca09b907d8eb29a113f97e792340456ba
-1
@@ -1 +0,0 @@
{"frames": [{"pred": [[0.4003, 0.2734], [0.5038, 0.4197], [0.2053, 0.4438], [0.4397, 0.685], [0.5796, 0.7645], [0.8001, 0.2195], [0.2789, 0.2833], [0.314, 0.5439], [0.511, 0.2259], [0.6008, 0.46], [0.4837, 0.3879], [0.3475, 0.5597], [0.6569, 0.3575], [0.437, 0.6539], [0.2341, 0.6038], [0.7331, 0.392], [0.5615, 0.4915]]}, {"pred": [[0.4669, 0.6066], [0.6012, 0.7873], [0.4124, 0.5997], [0.2832, 0.281], [0.2732, 0.3635], [0.2503, 0.4848], [0.6827, 0.715], [0.4336, 0.7165], [0.295, 0.3386], [0.5337, 0.3544], [0.4397, 0.5474], [0.5163, 0.5528], [0.7547, 0.6799], [0.4195, 0.4448], [0.2257, 0.2269], [0.384, 0.2176], [0.2419, 0.4332]]}, {"pred": [[0.5585, 0.283], [0.4325, 0.2934], [0.463, 0.4744], [0.4188, 0.3454], [0.215, 0.7565], [0.527, 0.2353], [0.7084, 0.6124], [0.3015, 0.6744], [0.4103, 0.3532], [0.7243, 0.6932], [0.3302, 0.4918], [0.2072, 0.3754], [0.7914, 0.4878], [0.7618, 0.4079], [0.323, 0.3386], [0.7104, 0.4997], [0.2673, 0.6077]]}, {"pred": [[0.6372, 0.4984], [0.4184, 0.6763], [0.4498, 0.7549], [0.2924, 0.303], [0.3069, 0.7022], [0.3954, 0.5098], [0.7836, 0.6071], [0.4733, 0.7114], [0.3407, 0.3793], [0.3408, 0.4678], [0.4156, 0.4911], [0.4525, 0.7519], [0.5117, 0.1985], [0.1893, 0.6784], [0.6281, 0.5346], [0.5175, 0.673], [0.36, 0.3665]]}, {"pred": [[0.5535, 0.6537], [0.568, 0.511], [0.4705, 0.5377], [0.6372, 0.7163], [0.5493, 0.7515], [0.2559, 0.4549], [0.2553, 0.6176], [0.2991, 0.6154], [0.7185, 0.7986], [0.4586, 0.5057], [0.2975, 0.4525], [0.3263, 0.3719], [0.5131, 0.4576], [0.557, 0.5268], [0.6572, 0.7736], [0.2146, 0.6526], [0.4662, 0.7371]]}, {"pred": [[0.2924, 0.7595], [0.2612, 0.2315], [0.2488, 0.7751], [0.2329, 0.7282], [0.4744, 0.4206], [0.3618, 0.267], [0.2477, 0.285], [0.3976, 0.3746], [0.494, 0.2874], [0.3596, 0.2112], [0.3311, 0.4692], [0.6912, 0.4727], [0.4434, 0.5233], [0.4139, 0.7048], [0.425, 0.3937], [0.2326, 0.631], [0.2655, 0.7116]]}, {"pred": [[0.3609, 0.3437], [0.285, 0.486], [0.7734, 0.5468], [0.3657, 0.4093], [0.4728, 0.5019], [0.1866, 
0.3545], [0.2172, 0.2028], [0.5613, 0.5238], [0.6252, 0.7205], [0.7998, 0.2954], [0.242, 0.7063], [0.6259, 0.6883], [0.5148, 0.7141], [0.5577, 0.7434], [0.3233, 0.2131], [0.2652, 0.7066], [0.5753, 0.5885]]}, {"pred": [[0.6787, 0.6504], [0.6051, 0.2297], [0.2539, 0.3475], [0.6437, 0.7807], [0.4981, 0.6149], [0.5716, 0.2367], [0.6486, 0.3632], [0.2433, 0.369], [0.6061, 0.3731], [0.4955, 0.2591], [0.7676, 0.7602], [0.6899, 0.7716], [0.3143, 0.7707], [0.3031, 0.4997], [0.7076, 0.5133], [0.3382, 0.7196], [0.2002, 0.4871]]}]}
-1
@@ -1 +0,0 @@
{"frames": [{"gt": [[0.3943, 0.2905], [0.5215, 0.4194], [0.2225, 0.4602], [0.4547, 0.6961], [0.5765, 0.7686], [0.7858, 0.2279], [0.2866, 0.2707], [0.3084, 0.549], [0.5286, 0.2377], [0.6082, 0.4566], [0.4719, 0.3799], [0.3465, 0.5447], [0.6377, 0.3728], [0.4509, 0.6543], [0.2235, 0.6009], [0.7253, 0.3882], [0.5479, 0.4737]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}, {"gt": [[0.4845, 0.5985], [0.5883, 0.7959], [0.4315, 0.6012], [0.3008, 0.2703], [0.2776, 0.3486], [0.2483, 0.4695], [0.6916, 0.7184], [0.4153, 0.7305], [0.3057, 0.3392], [0.5535, 0.3576], [0.4216, 0.5398], [0.5093, 0.5706], [0.7397, 0.668], [0.4354, 0.4394], [0.2373, 0.2404], [0.404, 0.2315], [0.2609, 0.4182]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}, {"gt": [[0.5684, 0.2891], [0.4185, 0.2737], [0.4796, 0.4903], [0.4056, 0.3589], [0.2139, 0.7706], [0.5259, 0.2162], [0.718, 0.6177], [0.3002, 0.6632], [0.3978, 0.3338], [0.7116, 0.6836], [0.336, 0.5106], [0.2168, 0.3677], [0.7739, 0.4683], [0.773, 0.4188], [0.318, 0.3226], [0.7043, 0.4877], [0.2509, 0.5964]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}, {"gt": [[0.6501, 0.4868], [0.3995, 0.6805], [0.4408, 0.7681], [0.2762, 0.2907], [0.2877, 0.6959], [0.4102, 0.5292], [0.7825, 0.5898], [0.4603, 0.723], [0.3511, 0.3758], [0.3556, 0.4514], [0.4123, 0.4749], [0.4524, 0.7506], [0.5141, 0.2112], [0.2024, 0.6795], [0.6351, 0.5339], [0.5333, 0.6706], [0.3491, 0.3662]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}, {"gt": [[0.537, 0.656], [0.5675, 0.5033], [0.4714, 0.52], [0.6195, 0.7259], [0.5357, 0.766], [0.273, 0.4653], [0.2439, 0.6017], [0.2927, 0.6297], [0.7297, 0.7805], [0.439, 0.4924], [0.2969, 0.4589], [0.3174, 0.3911], [0.5324, 0.4643], [0.5744, 0.5074], [0.673, 0.783], [0.2238, 0.6674], [0.4534, 
0.7468]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}, {"gt": [[0.2896, 0.7515], [0.2537, 0.2345], [0.2434, 0.763], [0.2502, 0.7137], [0.4723, 0.4035], [0.3607, 0.2775], [0.2657, 0.2969], [0.3872, 0.383], [0.5001, 0.3067], [0.3503, 0.2092], [0.3137, 0.4849], [0.6914, 0.4593], [0.4359, 0.504], [0.4056, 0.6994], [0.4428, 0.4085], [0.2424, 0.6445], [0.2507, 0.7048]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}, {"gt": [[0.3692, 0.3453], [0.2945, 0.4675], [0.7836, 0.5282], [0.3857, 0.414], [0.4848, 0.5017], [0.203, 0.3585], [0.225, 0.2135], [0.5513, 0.5175], [0.6296, 0.7275], [0.7908, 0.2897], [0.2263, 0.7012], [0.6403, 0.6873], [0.5026, 0.701], [0.5504, 0.7357], [0.338, 0.2187], [0.2629, 0.7015], [0.5757, 0.6084]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}, {"gt": [[0.6786, 0.649], [0.5956, 0.2396], [0.2447, 0.3593], [0.6439, 0.7854], [0.4874, 0.6102], [0.5857, 0.2465], [0.6459, 0.3827], [0.2364, 0.3613], [0.6054, 0.3745], [0.4798, 0.2711], [0.7869, 0.7618], [0.6919, 0.7809], [0.3259, 0.7674], [0.285, 0.5144], [0.6921, 0.5052], [0.3388, 0.7386], [0.2022, 0.495]], "vis": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "scale": 1.0}]}
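The two fixture files above pair per-frame `pred` keypoints with `gt`, `vis`, and `scale`. As a hedged illustration of how such a pair could be scored, the sketch below computes a PCK@20-style percentage normalised by torso length, taken here as the right-shoulder to left-hip distance that the ledger's `metric` field names. The function names, the toy frame, and the choice to merge `pred` and `gt` into one dict per frame are assumptions for illustration; the index constants follow the standard COCO 17-keypoint order. The actual harness may normalise by the `scale` field instead or differ in other details.

```python
import math

RIGHT_SHOULDER, LEFT_HIP = 6, 11  # standard COCO 17-keypoint order
THRESHOLD = 0.20  # the "@20" in PCK@20


def torso_length(gt):
    """Distance between right shoulder and left hip in the ground truth."""
    return math.dist(gt[RIGHT_SHOULDER], gt[LEFT_HIP])


def torso_pck(frames, threshold=THRESHOLD):
    """Percentage of visible keypoints within threshold * torso length of gt."""
    correct = total = 0
    for frame in frames:
        limit = threshold * torso_length(frame["gt"])
        for p, g, v in zip(frame["pred"], frame["gt"], frame["vis"]):
            if v <= 0:
                continue  # skip keypoints marked invisible
            total += 1
            correct += math.dist(p, g) <= limit
    return 100.0 * correct / total if total else 0.0


# Toy frame: 17 keypoints on a diagonal, so the torso length is known.
gt = [[0.05 * i, 0.05 * i] for i in range(17)]
pred = [list(p) for p in gt]
pred[0] = [0.5, 0.5]  # one keypoint placed far from its target
frame = {"pred": pred, "gt": gt, "vis": [1.0] * 17}
print(f"{torso_pck([frame]):.2f}")
```

In the toy frame 16 of the 17 keypoints match exactly and one misses, so the score is 16/17 of 100.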
-5
@@ -1,5 +0,0 @@
{"benchmark": "AetherArena", "created": "2026-05-30", "kind": "genesis", "note": "Official Spatial-Intelligence Benchmark \u2014 append-only signed ledger. Entries are real harness scores only; no seeded numbers.", "prev_hash": "0000000000000000000000000000000000000000000000000000000000000000", "row_hash": "940bdc6f0f5dd00f4d89e13a8fa843bab3c9ddf1b8051f426a1701e730249231", "seq": 0, "spec": "ADR-149"}
{"abs_gain": "+9.38", "benchmark": "MM-Fi", "category": "pose", "caveat": "Protocol-matched MM-Fi random_split result; NOT solved real-world generalization. Random split has temporal/subject-adjacency effects common to this benchmark family. Leakage-free cross-subject is far lower (~11-27%) and is the real deployment frontier.", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20 (||right_shoulder-left_hip|| norm, 17 COCO kpts)", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer (4L/8H ~2M params, temporal-attention)", "prev_hash": "940bdc6f0f5dd00f4d89e13a8fa843bab3c9ddf1b8051f426a1701e730249231", "protocol": "random_split (ratio=0.8, seed=0)", "rel_gain": "+13.0%", "reproduce": "download MM-Fi -> parse_mmfi_zips.py -> train_tf_torso.py X.npy Y.npy split_random.npy (seed 0)", "row_hash": "76598d8e1320d5248f8cd854a8ffa22a99bd2a2f0e0e7f2d2b1df79af16001d5", "score_pct": 81.63, "scored_at": "2026-05-30", "seq": 1, "sota_ref": "MultiFormer 72.25 (CSI2Pose 68.41)", "submitter": "ruvnet", "tier": "Gold"}
{"abs_gain": "+11.34", "benchmark": "MM-Fi", "category": "pose", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer + skeleton-graph head + 3-ensemble + TTA", "note": "Best in-domain. Stacks attention-pooling + transformer + skeleton-graph refine + warmup + TTA + 3-model ensemble. Supersedes the 81.63 single-model entry.", "prev_hash": "76598d8e1320d5248f8cd854a8ffa22a99bd2a2f0e0e7f2d2b1df79af16001d5", "protocol": "random_split (0.8, seed 0)", "row_hash": "5780a4bc3e98eb0e30c1ecfa9091e57b280444fa1f21cd5146797e408580e4ab", "score_pct": 83.59, "scored_at": "2026-05-30", "seq": 2, "sota_ref": "MultiFormer 72.25 (CSI2Pose 68.41)", "submitter": "ruvnet", "tier": "Gold"}
{"benchmark": "MM-Fi", "category": "pose", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer", "note": "Leakage-free generalization to unseen people, shared rooms. Honest deployment-relevant number.", "prev_hash": "5780a4bc3e98eb0e30c1ecfa9091e57b280444fa1f21cd5146797e408580e4ab", "protocol": "cross_subject (official, val=S05,S10,..,S40)", "row_hash": "d989e4e1dbc0182610305fdfbde8b094413b87c913283a46bf41f4afba7a06fd", "score_pct": 64.04, "scored_at": "2026-05-30", "seq": 3, "sota_ref": "(no matched public ref)", "submitter": "ruvnet", "tier": "Silver"}
{"benchmark": "MM-Fi", "category": "pose", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer + CORAL domain alignment", "note": "The real deployment frontier (new room). CORAL transductive DG (+30% rel over control). Data-bound: MM-Fi has only 3 source rooms.", "prev_hash": "d989e4e1dbc0182610305fdfbde8b094413b87c913283a46bf41f4afba7a06fd", "protocol": "cross_environment (train E01-03 -> test E04, new room)", "row_hash": "bf370487bde88e198c13877956dab3c83766a6a24afef0b78b6ac7aa130bb207", "score_pct": 17.51, "scored_at": "2026-05-30", "seq": 4, "sota_ref": "(hard frontier; control 13.52)", "submitter": "ruvnet", "tier": "Bronze"}
-100
@@ -1,100 +0,0 @@
#!/usr/bin/env python3
"""AetherArena append-only, tamper-evident results ledger (ADR-149 §2.3/§2.4).

Each row is hash-chained to the previous one: ``row_hash = sha256(canonical_row
+ prev_hash)``. Any silent edit to an earlier row breaks every subsequent
``prev_hash`` link, so the ledger is append-only and verifiable by anyone — no
trust in the maintainer required. (Ed25519 row signing is the next hardening;
the chain already makes tampering detectable.)

Usage:
  python ledger_tools.py seed            # (re)build ledger.jsonl with genesis + baseline
  python ledger_tools.py verify          # verify the whole chain -> exit 0 / 1
  python ledger_tools.py append '<json-row>'   # append one scored row
"""
import hashlib
import json
import sys
from pathlib import Path

LEDGER = Path(__file__).parent / "ledger.jsonl"
GENESIS_PREV = "0" * 64


def canonical(row: dict) -> bytes:
    # Stable key order, no whitespace -> deterministic bytes for hashing.
    body = {k: row[k] for k in sorted(row) if k != "row_hash"}
    return json.dumps(body, separators=(",", ":"), sort_keys=True).encode()


def row_hash(row: dict) -> str:
    return hashlib.sha256(canonical(row)).hexdigest()


def read_rows() -> list[dict]:
    if not LEDGER.exists():
        return []
    return [json.loads(l) for l in LEDGER.read_text().splitlines() if l.strip()]


def append(entry: dict) -> dict:
    rows = read_rows()
    prev = rows[-1]["row_hash"] if rows else GENESIS_PREV
    entry = dict(entry)
    entry["seq"] = len(rows)
    entry["prev_hash"] = prev
    entry["row_hash"] = row_hash(entry)
    with LEDGER.open("a") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def verify() -> bool:
    rows = read_rows()
    prev = GENESIS_PREV
    for i, r in enumerate(rows):
        if r.get("seq") != i:
            print(f"FAIL: row {i} seq mismatch ({r.get('seq')})")
            return False
        if r.get("prev_hash") != prev:
            print(f"FAIL: row {i} prev_hash broken — ledger was edited")
            return False
        if r.get("row_hash") != row_hash(r):
            print(f"FAIL: row {i} row_hash mismatch — row was tampered")
            return False
        prev = r["row_hash"]
    print(f"OK: {len(rows)} rows, chain intact")
    return True


def seed():
    """Rebuild with the genesis row only — an EMPTY board.

    Benchmark-first: no placeholder/hand-entered numbers ever sit on the
    leaderboard. Every result row is produced by the real scoring pipeline
    (load model -> run inference -> score against the private eval split ->
    proof hash). The board starts empty and awaits the first real harness score,
    including RuView's own — which gets no special seeding.
    """
    if LEDGER.exists():
        LEDGER.unlink()
    append({
        "kind": "genesis",
        "benchmark": "AetherArena",
        "spec": "ADR-149",
        "note": "Official Spatial-Intelligence Benchmark — append-only signed ledger. "
                "Entries are real harness scores only; no seeded numbers.",
        "created": "2026-05-30",
    })


if __name__ == "__main__":
    cmd = sys.argv[1] if len(sys.argv) > 1 else "verify"
    if cmd == "seed":
        seed(); verify()
    elif cmd == "verify":
        sys.exit(0 if verify() else 1)
    elif cmd == "append":
        print(json.dumps(append(json.loads(sys.argv[2])), indent=2))
    else:
        print(__doc__); sys.exit(2)
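To make the tamper-evidence property of the ledger script above concrete, here is a minimal in-memory sketch of the same chaining idea. It keeps the canonical-JSON hashing scheme but works on a plain list instead of `ledger.jsonl`; the functional `append(rows, entry)` signature, the `first_broken` helper, and the toy `score_pct` rows are assumptions made for illustration, not part of the script.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64


def row_hash(row):
    """SHA-256 over the canonical JSON of every field except row_hash."""
    body = {k: v for k, v in row.items() if k != "row_hash"}
    data = json.dumps(body, separators=(",", ":"), sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()


def append(rows, entry):
    """Return a new list with entry chained onto the last row."""
    row = dict(entry)
    row["seq"] = len(rows)
    row["prev_hash"] = rows[-1]["row_hash"] if rows else GENESIS_PREV
    row["row_hash"] = row_hash(row)
    return rows + [row]


def first_broken(rows):
    """Index of the first row that fails verification, or None if intact."""
    prev = GENESIS_PREV
    for i, row in enumerate(rows):
        if row.get("seq") != i or row.get("prev_hash") != prev:
            return i
        if row.get("row_hash") != row_hash(row):
            return i
        prev = row["row_hash"]
    return None


chain = []
for score in (10.0, 20.0, 30.0):
    chain = append(chain, {"kind": "result", "score_pct": score})

intact = first_broken(chain)

# Edit row 1 without recomputing its hash: the mismatch is caught at row 1.
edited = [dict(r) for r in chain]
edited[1]["score_pct"] = 99.0
naive_edit = first_broken(edited)

# Recompute row 1's hash after the edit: row 2's prev_hash no longer matches.
edited[1]["row_hash"] = row_hash(edited[1])
rehashed_edit = first_broken(edited)
print(intact, naive_edit, rehashed_edit)
```

Note that `prev_hash` enters the digest as a field of the canonical row rather than being concatenated afterwards. The chain alone detects partial edits, but a party who recomputes every later row can still produce a self-consistent file, which is the gap the row signing mentioned in the docstring is meant to close.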
-41
@@ -1,41 +0,0 @@
# AetherArena submission manifest (ADR-149 §2.2).
# Accompanies a model artifact pushed to the AA Hugging Face Space.
# This file is the contract the Space validates before quarantine + scoring.
[submission]
# Free-form display name shown on the leaderboard.
name = "my-spatial-model"
# Hugging Face repo or URL of the model artifact (.safetensors / .rvf / LoRA adapter).
model_ref = "hf://your-org/your-model"
# Submitter handle (HF username / org). Used to sign the ledger row.
submitter = "your-hf-username"
# SPDX license of the submitted model.
license = "Apache-2.0"
[category]
# One of: pose | presence | tracking | vitals | multi-task
# v0 ranks: pose, presence (tracking/vitals activate when ground truth lands).
primary = "pose"
[input]
# Which ADR-145 FeatureSet the model consumes. v0 input is RF/WiFi CSI.
# F0 = CSI amplitude/phase F1 = +CIR F2 = +Doppler F3 = +BFLD
feature_set = "F0"
# Tensor I/O contract so the scorer can feed the model correctly.
input_shape = [114, 2] # subcarriers × {amp, phase} (example)
output_shape = [17, 2] # 17 keypoints × {x, y} normalised [0,1]
# Normalisation expected on the input ("none" | "zscore" | "minmax").
normalization = "zscore"
[runtime]
# Inference entrypoint inside the artifact (framework-specific).
framework = "candle" # candle | onnx | torch
# Optional: target the edge-latency category with a declared device class.
device_class = "cpu" # cpu | pi5 | gpu
# Notes:
# - You submit a MODEL, never predictions on data you hold.
# - Scoring runs against a PRIVATE MM-Fi held-out split in a no-network,
# read-only sandbox. You cannot see the eval data.
# - The resulting score is a signed, append-only ledger row carrying a
# determinism proof hash and the pinned harness_version.
-37
@@ -1,37 +0,0 @@
---
title: AetherArena — Spatial-Intelligence Benchmark
emoji: 📡
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
python_version: "3.12"
app_file: app.py
pinned: true
license: cc-by-nc-4.0
tags:
- benchmark
- leaderboard
- wifi-sensing
- spatial-intelligence
- pose-estimation
---
# AetherArena ("AA") — The Official Spatial-Intelligence Benchmark
> Public leaderboard. Private evaluation split. Open scorer. Signed results.
The field's standard yardstick for camera-free **spatial intelligence** (pose, presence,
occupancy, tracking, vitals) from RF/WiFi and, over time, mmWave / UWB / multimodal.
- **Project-agnostic** — any team, framework, or modality enters; RuView donated the seed
scorer and is scored like everyone else.
- **Benchmark-first** — the board starts empty; every row is a real scoring-pipeline
**witness** (`inputs_sha256` + `proof_sha256` + `harness_version`) in an append-only,
hash-chained, tamper-evident ledger.
- **Reproducible** — the scorer is open; reproduce any proof hash + repeatability locally.
Spec: [ADR-149](https://github.com/ruvnet/RuView/blob/main/docs/adr/ADR-149-public-community-leaderboard-huggingface.md).
Source + open scorer: https://github.com/ruvnet/RuView/tree/main/aether-arena
Non-commercial (CC BY-NC 4.0): the v0 eval split derives from MM-Fi (CC BY-NC); AA is operated non-commercially.
-161
@@ -1,161 +0,0 @@
"""AetherArena ("AA") — The Official Spatial-Intelligence Benchmark.
Hugging Face Space (Gradio) — the public face of the benchmark (ADR-149).
This Space is the presentation + submission layer; the heavy scoring runs in the
pinned RuView harness (CI / scorer container), and results land in the append-only,
hash-chained **witness ledger** shown here.
Benchmark-first: the board starts EMPTY. No seeded or hand-entered numbers — every
row is a real scoring-pipeline witness (inputs_sha256 + proof_sha256 + harness_version).
"""
import hashlib
import json
from pathlib import Path

import gradio as gr

LEDGER = Path(__file__).parent / "ledger.jsonl"
GENESIS_PREV = "0" * 64


def _rows():
    if not LEDGER.exists():
        return []
    return [json.loads(l) for l in LEDGER.read_text().splitlines() if l.strip()]


def _canon(row: dict) -> bytes:
    body = {k: row[k] for k in sorted(row) if k != "row_hash"}
    return json.dumps(body, separators=(",", ":"), sort_keys=True).encode()


def verify_chain():
    rows, prev = _rows(), GENESIS_PREV
    for i, r in enumerate(rows):
        if r.get("prev_hash") != prev or r.get("row_hash") != hashlib.sha256(_canon(r)).hexdigest():
            return f"❌ Ledger chain BROKEN at row {i} — tampering detected."
        prev = r["row_hash"]
    return f"✅ Witness ledger chain intact — {len(rows)} row(s), append-only."


def leaderboard(category: str):
    results = [r for r in _rows() if r.get("kind") == "result" and (category == "all" or r.get("category") == category)]
    if not results:
        return [["— no entries yet —", "", "", "", "", ""]]
    results.sort(key=lambda r: r.get("score_pct") or 0, reverse=True)
    return [[
        r.get("submitter", "?"),
        r.get("model_ref", "?"),
        f"{r.get('benchmark','?')} / {r.get('protocol','?')}",
        r.get("metric", "?"),
        f"{r.get('score_pct', 0):.2f}%",
        f"{r.get('tier','?')} (vs {r.get('sota_ref','?')})",
    ] for r in results]
FOUR_PART = "### Public leaderboard. Private evaluation split. Open scorer. Signed results."
ABOUT = """
**AetherArena** is the official, project-agnostic **Spatial-Intelligence Benchmark** —
camera-free pose, presence, occupancy, tracking, and vitals from RF/WiFi (and, over
time, mmWave / UWB / radar / multimodal). It is **not** a single-vendor board: any
team, framework, or modality enters, and every entrant — including the RuView baseline
that donated the seed scorer — is scored by the identical, open, pinned harness.
The scorer reuses RuView's released `wifi-densepose-train` acceptance harness
(`ruview_metrics` + ablation). You submit a **model, not predictions**; it is scored
against a **private** MM-Fi held-out split; one **witness** row (inputs hash + proof
hash + harness version) is appended to a **hash-chained, tamper-evident ledger**.
**For industry:** a vendor-neutral, auditable way to compare RF-sensing models on equal
footing — the same standardized splits, the same metric definition, the same signed,
reproducible ledger. No more "trust our number on our split." Vendors, labs, and startups
all submit through one pipeline and are scored identically.
**Generalization Track (roadmap):** the headline isn't a single in-domain number — it's a
battery of honest tracks: MM-Fi `random_split` (in-domain), `cross_subject` (unseen people),
cross-room, cross-device, and confidence-calibration (ECE). Cross-subject is the real
deployment frontier and is treated as the flagship hard benchmark.
Spec: ADR-149. v0 ranks **pose, presence, edge-latency, determinism**. Tracking &
vitals activate when their ground truth lands; **privacy-leakage** is gated until the
membership-inference attacker ships. Source + the open scorer:
https://github.com/ruvnet/RuView/tree/main/aether-arena
"""
SUBMIT = """
### Submit a model
1. Write a manifest — [`schema/aa-submission.toml`](https://github.com/ruvnet/RuView/blob/main/aether-arena/schema/aa-submission.toml):
declare your model ref, category, the ADR-145 feature set (F0 CSI … F3 BFLD), and the tensor I/O contract.
2. Provide your model artifact (`.safetensors` / `.rvf` / LoRA adapter).
3. It moves through `submitted → validated → quarantined → smoke_scored → full_scored → published`,
scored in a no-network, read-only sandbox against the private split.
4. Your signed witness row appears on the leaderboard.
**You submit a model, never predictions** — predictions on data you hold prove nothing.
"""
VERIFY = """
### Verify it's fair (you don't have to trust us)
The scorer is open and reproducible. Reproduce the determinism proof + repeatability locally:
```bash
git clone https://github.com/ruvnet/RuView && cd RuView/v2
# determinism gate (same as CI):
cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features
# repeatability — N runs, one identical proof hash:
cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --repeat 16
# verify the append-only witness ledger chain:
cd ../aether-arena/ledger && python3 ledger_tools.py verify
```
A stranger must be able to: submit → get a deterministic score → see the signed row →
rerun the scorer locally → understand why the rank is fair. That is the launch gate (ADR-149 §7).
"""
with gr.Blocks(title="AetherArena — Spatial-Intelligence Benchmark") as demo:
    gr.Markdown("# 📡 AetherArena (AA)\n## The Official, Vendor-Neutral Benchmark for WiFi / RF Spatial Sensing")
    gr.Markdown(FOUR_PART)
    gr.Markdown(
        "**An open industry benchmark — for everyone, not any one vendor.** Submit any model, any framework, "
        "any modality. Every entrant — academic, startup, or incumbent — is scored *identically*: standardized "
        "protocols (MM-Fi `random_split` / `cross_subject`), matched metrics (torso-PCK@20, the published "
        "definition), and an auditable, hash-chained **witness ledger** anyone can verify and reproduce.\n\n"
        "**Why it exists:** WiFi/RF-sensing results are reported with inconsistent splits, metrics, and no "
        "auditability — so numbers aren't comparable. AetherArena fixes the *measurement*: one protocol, one "
        "metric, one signed ledger, one-command reproduction. The benchmark is the product; the leaderboard is "
        "just the scoreboard. (Reference implementation seeded by RuView, ADR-149.)"
    )
    chain = gr.Markdown(verify_chain())

    with gr.Tab("🏆 Leaderboard"):
        gr.Markdown(
            "### Current standings — MM-Fi WiFi-CSI 2D pose, torso-PCK@20\n"
            "Ranked, protocol- & metric-matched results. Each row carries its own caveats in the ledger "
            "(e.g. `random_split` has temporal-adjacency leakage that inflates *all* methods equally — the "
            "leakage-free `cross_subject` track is the real deployment frontier). **Submit yours — top the board.**"
        )
        cat = gr.Dropdown(["all", "pose", "presence"], value="all", label="Category")
        tbl = gr.Dataframe(
            headers=["Submitter", "Model", "Benchmark / Protocol", "Metric", "Score", "Tier (vs prior SOTA)"],
            value=leaderboard("all"), interactive=False, wrap=True,
        )
        cat.change(leaderboard, cat, tbl)
        gr.Markdown(
            "*Vendor-neutral & benchmark-first: every row is a real, metric- and protocol-matched result — "
            "no seeded or vendor-favored numbers. Integrity is enforced, not promised: the current top entry's "
            "score was self-corrected down from an inflated metric (91.86% bbox → 81.63% torso) before it could "
            "be published. The same scorer and ledger apply to every submitter.*"
        )

    with gr.Tab("📤 Submit"):
        gr.Markdown(SUBMIT)

    with gr.Tab("🔬 Verify"):
        gr.Markdown(VERIFY)

    with gr.Tab("ℹ️ About"):
        gr.Markdown(ABOUT)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
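The `SUBMIT` text in the app above names a fixed lifecycle, `submitted → validated → quarantined → smoke_scored → full_scored → published`. One way to picture that lifecycle is as a forward-only state machine, sketched below. The `Submission` class and its `advance` method are hypothetical; the Space's real lifecycle code is not part of this diff and may include rejection or retry states that this sketch leaves out.

```python
from dataclasses import dataclass, field

# Stage names as listed in the SUBMIT text, in order.
STAGES = (
    "submitted",
    "validated",
    "quarantined",
    "smoke_scored",
    "full_scored",
    "published",
)


@dataclass
class Submission:
    """Hypothetical record that only moves one stage forward at a time."""

    name: str
    stage: str = STAGES[0]
    history: list = field(default_factory=lambda: [STAGES[0]])

    def advance(self, to):
        nxt = STAGES.index(self.stage) + 1
        if nxt >= len(STAGES) or STAGES[nxt] != to:
            raise ValueError(f"illegal transition {self.stage} -> {to}")
        self.stage = to
        self.history.append(to)


sub = Submission("my-spatial-model")
for stage in STAGES[1:]:
    sub.advance(stage)
print(sub.stage, len(sub.history))

# Skipping a stage (for example, publishing straight after submission) is rejected.
skipper = Submission("impatient-model")
try:
    skipper.advance("published")
    skipped = True
except ValueError:
    skipped = False
```

Keeping the full `history` on the record mirrors the append-only spirit of the ledger: the path a submission took stays inspectable.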
-5
@@ -1,5 +0,0 @@
{"benchmark": "AetherArena", "created": "2026-05-30", "kind": "genesis", "note": "Official Spatial-Intelligence Benchmark \u2014 append-only signed ledger. Entries are real harness scores only; no seeded numbers.", "prev_hash": "0000000000000000000000000000000000000000000000000000000000000000", "row_hash": "940bdc6f0f5dd00f4d89e13a8fa843bab3c9ddf1b8051f426a1701e730249231", "seq": 0, "spec": "ADR-149"}
{"abs_gain": "+9.38", "benchmark": "MM-Fi", "category": "pose", "caveat": "Protocol-matched MM-Fi random_split result; NOT solved real-world generalization. Random split has temporal/subject-adjacency effects common to this benchmark family. Leakage-free cross-subject is far lower (~11-27%) and is the real deployment frontier.", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20 (||right_shoulder-left_hip|| norm, 17 COCO kpts)", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer (4L/8H ~2M params, temporal-attention)", "prev_hash": "940bdc6f0f5dd00f4d89e13a8fa843bab3c9ddf1b8051f426a1701e730249231", "protocol": "random_split (ratio=0.8, seed=0)", "rel_gain": "+13.0%", "reproduce": "download MM-Fi -> parse_mmfi_zips.py -> train_tf_torso.py X.npy Y.npy split_random.npy (seed 0)", "row_hash": "76598d8e1320d5248f8cd854a8ffa22a99bd2a2f0e0e7f2d2b1df79af16001d5", "score_pct": 81.63, "scored_at": "2026-05-30", "seq": 1, "sota_ref": "MultiFormer 72.25 (CSI2Pose 68.41)", "submitter": "ruvnet", "tier": "Gold"}
{"abs_gain": "+11.34", "benchmark": "MM-Fi", "category": "pose", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer + skeleton-graph head + 3-ensemble + TTA", "note": "Best in-domain. Stacks attention-pooling + transformer + skeleton-graph refine + warmup + TTA + 3-model ensemble. Supersedes the 81.63 single-model entry.", "prev_hash": "76598d8e1320d5248f8cd854a8ffa22a99bd2a2f0e0e7f2d2b1df79af16001d5", "protocol": "random_split (0.8, seed 0)", "row_hash": "5780a4bc3e98eb0e30c1ecfa9091e57b280444fa1f21cd5146797e408580e4ab", "score_pct": 83.59, "scored_at": "2026-05-30", "seq": 2, "sota_ref": "MultiFormer 72.25 (CSI2Pose 68.41)", "submitter": "ruvnet", "tier": "Gold"}
{"benchmark": "MM-Fi", "category": "pose", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer", "note": "Leakage-free generalization to unseen people, shared rooms. Honest deployment-relevant number.", "prev_hash": "5780a4bc3e98eb0e30c1ecfa9091e57b280444fa1f21cd5146797e408580e4ab", "protocol": "cross_subject (official, val=S05,S10,..,S40)", "row_hash": "d989e4e1dbc0182610305fdfbde8b094413b87c913283a46bf41f4afba7a06fd", "score_pct": 64.04, "scored_at": "2026-05-30", "seq": 3, "sota_ref": "(no matched public ref)", "submitter": "ruvnet", "tier": "Silver"}
{"benchmark": "MM-Fi", "category": "pose", "harness_version": 1, "kind": "result", "metric": "torso-PCK@20", "modality": "wifi-csi", "model_ref": "RuView CSI-Transformer + CORAL domain alignment", "note": "The real deployment frontier (new room). CORAL transductive DG (+30% rel over control). Data-bound: MM-Fi has only 3 source rooms.", "prev_hash": "d989e4e1dbc0182610305fdfbde8b094413b87c913283a46bf41f4afba7a06fd", "protocol": "cross_environment (train E01-03 -> test E04, new room)", "row_hash": "bf370487bde88e198c13877956dab3c83766a6a24afef0b78b6ac7aa130bb207", "score_pct": 17.51, "scored_at": "2026-05-30", "seq": 4, "sota_ref": "(hard frontier; control 13.52)", "submitter": "ruvnet", "tier": "Bronze"}
-1
@@ -1 +0,0 @@
gradio==5.9.1
@@ -1 +1 @@
304d54690af468dc6cbf0f2a1332f109cf187d5e2eab454efd8554cebc45bdeb
120bd7b1f549f57f3773971a389c48c2bdd99b4ab1f205935867a16e95583995
@@ -1 +1 @@
f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a
ca58956c1bbee8c46f1798b3d6b6f1f829aa5db90bba53e07177830eca429199
+16 -148
@@ -185,14 +185,7 @@ def frame_to_csi_data(frame, signal_meta):
# observed pipeline-amplified ULP drift and is still far below any meaningful
# signal change (CSI phase precision is ~1e-3 rad; PSD bins differ by orders
# of magnitude). Round to this precision, then hash.
#
# NOTE: 6 decimals collapses the divergence *across Linux microarchitectures*
# but NOT Windows-vs-Linux, where the pocketfft/BLAS difference exceeds 1e-6 on
# a few elements that then straddle the 6th-decimal rounding boundary. The
# precision is overridable via PROOF_HASH_DECIMALS so it can be coarsened to a
# value that is boundary-stable across *all* platforms (Windows + Linux + macOS)
# while staying far below any signal-meaningful change.
HASH_QUANTIZATION_DECIMALS = int(os.environ.get("PROOF_HASH_DECIMALS", "6"))
HASH_QUANTIZATION_DECIMALS = 6
def features_to_bytes(features):
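The comments in the hunk above describe rounding features to a fixed number of decimals before hashing, and note that values straddling a rounding boundary can still diverge. A small sketch of that idea follows, assuming a `quantized_digest` helper that is not part of the script. Normalising `-0.0` to `0.0` is an extra assumption added here, and `tobytes()` uses the platform's native byte order.

```python
import hashlib

import numpy as np

DECIMALS = 6


def quantized_digest(arrays, decimals=DECIMALS):
    """SHA-256 over float64 arrays rounded to a fixed number of decimals."""
    hasher = hashlib.sha256()
    for array in arrays:
        flat = np.asarray(array, dtype=np.float64).ravel()
        # Adding 0.0 turns -0.0 into 0.0 so the sign bit cannot change the bytes.
        rounded = np.round(flat, decimals) + 0.0
        hasher.update(rounded.tobytes())
    return hasher.hexdigest()


base = [np.array([0.1234561, 2.5]), np.array([[1.0, -1e-9]])]
drift = [np.array([0.1234562, 2.5]), np.array([[1.0, 1e-9]])]  # drift below 1e-6
change = [np.array([0.1244561, 2.5]), np.array([[1.0, -1e-9]])]  # 1e-3 change

same = quantized_digest(base) == quantized_digest(drift)
different = quantized_digest(base) != quantized_digest(change)

# Values that straddle a rounding boundary still diverge, however close they are.
low, high = np.array([0.12345649]), np.array([0.12345651])
straddle = quantized_digest([low]) != quantized_digest([high])
print(same, different, straddle)
```

The last case is the limitation the removed note points at: coarsening the grid moves the boundaries but cannot remove them, so no fixed number of decimals is safe for every input.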
@@ -212,20 +205,13 @@ def features_to_bytes(features):
"""
parts = []
# Serialize each feature array in declaration order.
# doppler_shift is INTENTIONALLY excluded: it is peak-normalized
# (`spectrum / max(spectrum)` in csi_processor._extract_doppler_features),
# and when the raw spectrum has near-tied peaks the argmax flips under
# cross-microarchitecture FP reordering, renormalizing the whole array
# (O(1) divergence — not absorbable by any tolerance). The remaining five
# features, including the FFT-based PSD, reproduce deterministically and
# provide the proof. (The underlying doppler instability is a production
# reproducibility bug tracked separately.)
# Serialize each feature array in declaration order
for array in [
features.amplitude_mean,
features.amplitude_variance,
features.phase_difference,
features.correlation_matrix,
features.doppler_shift,
features.power_spectral_density,
]:
flat = np.asarray(array, dtype=np.float64).ravel()
@@ -239,45 +225,6 @@ def features_to_bytes(features):
return b"".join(parts)
# ── Cross-platform tolerance gate (issue #560 follow-up) ─────────────────────
# The SHA-256 of fixed-decimal-rounded features is bit-exact only WITHIN one
# CPU microarchitecture. The pocketfft / BLAS kernels in the manylinux
# numpy/scipy wheels reorder floating-point reductions differently across
# microarchs (e.g. a GitHub Azure runner vs a developer box vs another Linux
# host), and the resulting ~1e-6 *relative* drift lands on large-magnitude PSD
# bins as an absolute difference too large for ANY fixed-decimal grid to absorb
# (empirically the hash diverges across microarchs even at 2 decimals). So:
# • the hash is the strong, bit-exact, SAME-platform proof, and
# • a relative tolerance against a committed reference vector is the
# platform-INDEPENDENT proof.
# A run PASSES if either matches. Tolerances sit ~100x over the observed
# microarch drift and ~10x under any signal-meaningful change (CSI phase
# precision ~1e-3 rad), so real pipeline regressions still fail.
TOLERANCE_RTOL = 1e-4
TOLERANCE_ATOL = 1e-6
REFERENCE_VECTOR_FILENAME = "expected_features_reference.npz"
def features_to_vector(features):
"""Concatenate a frame's feature arrays as raw float64 (no rounding).
Mirrors ``features_to_bytes`` ordering but keeps full precision, for the
tolerance-based cross-platform comparison.
"""
# doppler_shift excluded — see features_to_bytes for the rationale
# (peak-normalization argmax instability across CPU microarchitectures).
arrays = [
features.amplitude_mean,
features.amplitude_variance,
features.phase_difference,
features.correlation_matrix,
features.power_spectral_density,
]
return np.concatenate(
[np.asarray(a, dtype=np.float64).ravel() for a in arrays]
)
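The tolerance gate described in the removed block above accepts a run when either the bit-exact hash or a relative-tolerance comparison against a reference vector matches. A hedged sketch of that two-tier decision is shown below. `proof_passes` and its return labels are hypothetical, and `np.allclose` is chosen here as one reasonable way to apply `TOLERANCE_RTOL` and `TOLERANCE_ATOL`; the original comparison code is truncated in this diff and may compute deviations differently.

```python
import numpy as np

RTOL, ATOL = 1e-4, 1e-6  # TOLERANCE_RTOL / TOLERANCE_ATOL from the removed block


def proof_passes(computed_hash, expected_hash, computed_vec, reference_vec):
    """Pass on a bit-exact hash match, else fall back to a tolerance match."""
    if computed_hash == expected_hash:
        return True, "hash"
    computed_vec = np.asarray(computed_vec, dtype=np.float64)
    reference_vec = np.asarray(reference_vec, dtype=np.float64)
    if computed_vec.shape != reference_vec.shape:
        return False, "shape"
    # np.allclose checks |computed - reference| <= ATOL + RTOL * |reference|.
    if np.allclose(computed_vec, reference_vec, rtol=RTOL, atol=ATOL):
        return True, "tolerance"
    return False, "mismatch"


reference = np.array([1000.0, 0.5, 0.0])
drifted = reference * (1 + 1e-6)   # roughly the relative drift the comments describe
regressed = reference * (1 + 1e-2)  # a change well above the tolerance

exact = proof_passes("abc", "abc", reference, reference)
tolerated = proof_passes("abc", "xyz", drifted, reference)
rejected = proof_passes("abc", "xyz", regressed, reference)
truncated = proof_passes("abc", "xyz", reference[:2], reference)
print(exact, tolerated, rejected, truncated)
```

Checking the shape before the tolerance comparison keeps a truncated or reordered feature vector from passing through broadcasting.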
def compute_pipeline_hash(data_path, verbose=False):
"""Run the full pipeline and compute the SHA-256 hash of all features.
@@ -320,7 +267,6 @@ def compute_pipeline_hash(data_path, verbose=False):
features_count = 0
total_feature_bytes = 0
last_features = None
feature_vectors = []
doppler_nonzero_count = 0
doppler_shape = None
psd_shape = None
@@ -337,7 +283,6 @@ def compute_pipeline_hash(data_path, verbose=False):
if features is not None:
feature_bytes = features_to_bytes(features)
hasher.update(feature_bytes)
feature_vectors.append(features_to_vector(features))
features_count += 1
total_feature_bytes += len(feature_bytes)
last_features = features
@@ -406,11 +351,7 @@ def compute_pipeline_hash(data_path, verbose=False):
"psd_shape": psd_shape,
}
reference_vector = (
np.concatenate(feature_vectors) if feature_vectors else np.array([], dtype=np.float64)
)
return hasher.hexdigest(), reference_vector, stats
return hasher.hexdigest(), stats
def audit_codebase(base_dir=None):
@@ -526,7 +467,7 @@ def main():
print(" This runs the SAME CSIProcessor.preprocess_csi_data() and")
print(" CSIProcessor.extract_features() used in production.")
print()
computed_hash, computed_vector, stats = compute_pipeline_hash(data_path, verbose=args.verbose)
computed_hash, stats = compute_pipeline_hash(data_path, verbose=args.verbose)
# ---------------------------------------------------------------
# Step 3: Hash comparison
@@ -538,11 +479,8 @@ def main():
with open(hash_path, "w") as f:
f.write(computed_hash + "\n")
print(f" Wrote expected hash to {hash_path}")
ref_path = os.path.join(SCRIPT_DIR, REFERENCE_VECTOR_FILENAME)
np.savez_compressed(ref_path, features=computed_vector)
print(f" Wrote reference vector ({computed_vector.size} values) to {ref_path}")
print()
print(" HASH + REFERENCE GENERATED -- run without --generate-hash to verify.")
print(" HASH GENERATED -- run without --generate-hash to verify.")
print("=" * 72)
return
@@ -561,70 +499,13 @@ def main():
print(f" Expected: {expected_hash}")
hash_match = computed_hash == expected_hash
# Cross-platform fallback: if the bit-exact hash differs (different CPU
# microarchitecture reorders the pocketfft/BLAS reductions), accept the run
# when the raw feature vector matches the committed reference within a
# relative tolerance — platform-independent where the hash is not (#560).
tolerance_match = False
max_abs_dev = None
max_rel_dev = None
ref_path = os.path.join(SCRIPT_DIR, REFERENCE_VECTOR_FILENAME)
if not hash_match and os.path.exists(ref_path):
ref_vec = np.load(ref_path)["features"]
if ref_vec.shape == computed_vector.shape:
tolerance_match = bool(
np.allclose(
computed_vector, ref_vec, rtol=TOLERANCE_RTOL, atol=TOLERANCE_ATOL
)
)
diff = np.abs(computed_vector - ref_vec)
max_abs_dev = float(np.max(diff)) if diff.size else 0.0
max_rel_dev = (
float(np.max(diff / np.maximum(np.abs(ref_vec), 1e-12)))
if diff.size
else 0.0
)
if hash_match:
match_status = "MATCH (bit-exact)"
elif tolerance_match:
match_status = f"TOLERANCE MATCH (max rel dev {max_rel_dev:.2e})"
if computed_hash == expected_hash:
match_status = "MATCH"
else:
match_status = "MISMATCH"
print(f" Status: {match_status}")
print()
if not hash_match and max_abs_dev is not None:
block_sizes = [56, 56, 55, 9, 128] # per-frame feature layout (doppler excluded)
block_names = ["amp_mean", "amp_var", "phase_diff", "corr", "psd"]
frame_len = sum(block_sizes)
tol = TOLERANCE_ATOL + TOLERANCE_RTOL * np.abs(ref_vec)
outside = diff > tol
n_out = int(outside.sum())
print(
f" DIVERGENCE: {n_out}/{computed_vector.size} outside tol "
f"({100.0 * n_out / computed_vector.size:.4f}%) "
f"max|d|={max_abs_dev:.3e} maxrel={max_rel_dev:.3e}"
)
if n_out:
wf = np.where(outside)[0] % frame_len
bounds = np.cumsum([0] + block_sizes)
parts = []
for bi, name in enumerate(block_names):
c = int(((wf >= bounds[bi]) & (wf < bounds[bi + 1])).sum())
if c:
parts.append(f"{name}={c}")
print(f" by feature: {', '.join(parts)}")
for w in np.argsort(diff)[::-1][:4]:
b = int(np.searchsorted(bounds, int(w) % frame_len, side="right")) - 1
print(
f" worst idx {int(w)} ({block_names[b]}): "
f"ref={ref_vec[int(w)]:.6g} got={computed_vector[int(w)]:.6g}"
)
print()
# ---------------------------------------------------------------
# Step 4: Audit (if requested or always in full mode)
# ---------------------------------------------------------------
@@ -647,22 +528,14 @@ def main():
# Final verdict
# ---------------------------------------------------------------
print("=" * 72)
if hash_match or tolerance_match:
if computed_hash == expected_hash:
print(" VERDICT: PASS")
print()
if hash_match:
print(" The pipeline produced a SHA-256 hash that matches the published")
print(" expected hash (bit-exact). This proves:")
else:
print(" The bit-exact hash differs (CPU-microarchitecture FP reordering),")
print(" but the raw feature vector matches the published reference within")
print(
f" rtol={TOLERANCE_RTOL:g} / atol={TOLERANCE_ATOL:g} "
f"(max rel dev {max_rel_dev:.2e}). This proves:"
)
print(" The pipeline produced a SHA-256 hash that matches the published")
print(" expected hash. This proves:")
print(" 1. The SAME signal processing code ran on the reference signal")
print(" 2. The output is DETERMINISTIC (same input -> same output)")
print(" 3. No randomness was introduced")
print(" 3. No randomness was introduced (hash would differ)")
print(" 4. The code path includes: noise removal, Hamming windowing,")
print(" amplitude normalization, FFT-based Doppler extraction,")
print(" and power spectral density computation")
@@ -673,19 +546,14 @@ def main():
else:
print(" VERDICT: FAIL")
print()
print(" The pipeline output does NOT match the expected hash OR the")
print(" reference feature vector within tolerance.")
if max_rel_dev is not None:
print(
f" max abs dev: {max_abs_dev:.3e} max rel dev: {max_rel_dev:.3e}"
f" (rtol={TOLERANCE_RTOL:g}, atol={TOLERANCE_ATOL:g})"
)
print(" The pipeline output does NOT match the expected hash.")
print()
print(" Possible causes:")
print(" - Numpy/scipy version mismatch (check requirements)")
print(" - Code change in CSI processor that alters numerical output")
print(" - A real (non-microarch) numerical regression")
print(" - Platform floating-point differences (unlikely for IEEE 754)")
print()
print(" To update after an intentional change:")
print(" To update the expected hash after intentional changes:")
print(" python verify.py --generate-hash")
print("=" * 72)
sys.exit(1)
+2 -8
@@ -6,14 +6,8 @@
#
# To update: change versions, run `python v1/data/proof/verify.py --generate-hash`,
# then commit the new expected_features.sha256.
#
# numpy/scipy track the versions the *published* expected hash
# (expected_features.sha256 = ca58956c…) was generated with — modern numpy 2.x,
# i.e. what a fresh `pip install numpy` and the proof-of-capabilities.md skeptic
# path produce today. The old 1.26.4 pin no longer matched that hash and made
# the determinism gate fail against its own published proof.
numpy==2.4.2
scipy==1.17.1
numpy==1.26.4
scipy==1.14.1
pydantic==2.10.4
pydantic-settings==2.7.1
+3 -7
@@ -221,15 +221,11 @@ class ESP32BinaryParser:
snr = float(rssi - noise_floor)
frequency = float(freq_mhz) * 1e6
bandwidth = 20e6 # default; could infer from n_subcarriers
# Bandwidth inference (issue #1005): HE-LTF uses a 4x denser tone
# grid than HT-LTF on the same channel width — an HE-SU frame with
# 256 bins (242 active HE20 tones) is a *20 MHz* capture, not 160.
if ppdu_byte in (1, 2, 3): # HE-SU / HE-MU / HE-TB
bandwidth = 40e6 if (flags_byte & 0x01) or n_subcarriers > 256 else 20e6
elif n_subcarriers <= 64: # ESP32 HT20 delivers the full 64-bin FFT
if n_subcarriers <= 56:
bandwidth = 20e6
elif n_subcarriers <= 128:
elif n_subcarriers <= 114:
bandwidth = 40e6
elif n_subcarriers <= 242:
bandwidth = 80e6
+3 -12
@@ -107,25 +107,16 @@ class PoseService:
async def _initialize_models(self):
"""Initialize neural network models."""
try:
# Initialize DensePose model. DensePoseHead requires a config
# dict — input_channels matches the modality translator's output
# (256), with the standard DensePose 24 body parts and 2 (U,V)
# coordinates. (Previously called with no args → TypeError at
# startup, which broke the API service.)
densepose_config = {
'input_channels': 256,
'num_body_parts': 24,
'num_uv_coordinates': 2,
}
# Initialize DensePose model
if self.settings.pose_model_path:
self.densepose_model = DensePoseHead(densepose_config)
self.densepose_model = DensePoseHead()
# Load model weights if path is provided
# model_state = torch.load(self.settings.pose_model_path)
# self.densepose_model.load_state_dict(model_state)
self.logger.info("DensePose model loaded")
else:
self.logger.warning("No pose model path provided, using default model")
self.densepose_model = DensePoseHead(densepose_config)
self.densepose_model = DensePoseHead()
# Initialize modality translation
config = {
-137
@@ -1,137 +0,0 @@
# Edge-Latency Benchmark Results — ADR-163
Converting **CLAIMED** edge latency budgets into **MEASURED-on-host** numbers,
closing the measurement debt flagged by Milestones 5/6 (ADR-159 / ADR-160).
Benches + docs only — **no production-code behavior changed**.
## The honest caveat, up front (read before citing any number)
Two distinct gaps separate every number below from the figure it is converting:
1. **Host ≠ ESP32.** The wasm-edge skill modules document budgets *"on ESP32-S3
WASM3"* (e.g. `exo_time_crystal`: "H (<10 ms)"). These benches run **native
x86_64 on a development laptop**, not the Xtensa/WASM3 target. A native host
median is an **upper bound on the algorithm's work**, not the ESP32 number.
WASM3 interpretation on a ~240 MHz Xtensa core is typically 1-2 orders of
magnitude slower than native `-O` host code, so a host median far under the
budget **does NOT prove the ESP32 meets it.** *The ESP32 figure is NOT
reproduced here — it needs hardware.*
2. **Bench ≠ the doc-claimed measurement.** For the cogs, the manifest cites a
**cold-start** number (`cold_start_ms_avg`, weight-load included); these
benches measure **steady-state** per-frame `infer` (warm, weights resident).
Different measurements; we report both, labelled.
Grades (per `benchmarks/wiflow-std/RESULTS.md` / ADR-152 vocabulary):
- **MEASURED-on-host** — reproduced in this repo on the machine below, exact
command recorded. NOT the ESP32 / NOT the cold-start figure.
- **CLAIMED (ESP32)** — the doc budget; UNMEASURED on hardware here.
## Machine
| | |
|---|---|
| Host | `ruvzen` (Windows 11, this dev box) |
| CPU | Intel Core Ultra 9 285H |
| Toolchain | `cargo 1.91.1`, `--release` (opt-level per crate profile) |
| Bench harness | criterion 0.5 (`time: [low **median** high]` reported below) |
| Date | 2026-06-12 |
Run-to-run spread on this box is non-trivial (criterion's low/high bracket the
median by a few %); the medians below are single-session captures with the smoke
settings `--warm-up-time 1 --measurement-time 2` (wasm-edge) / `3` (cogs). Re-run
for your own machine — the absolute numbers are host-specific.
---
## T1 — wasm-edge `process_frame` hot paths (ADR-160 deferred item → DONE host)
The crate is **excluded from the v2 workspace**; bench from the crate dir.
```bash
cd v2/crates/wifi-densepose-wasm-edge
cargo bench --features std -- --warm-up-time 1 --measurement-time 2
# med_seizure_detect is medical-experimental-gated:
cargo bench --features std,medical-experimental -- --warm-up-time 1 --measurement-time 2 med_seizure
```
| Hot path (M6-audit-named) | Bench id | Host median | Grade | Doc budget (CLAIMED, ESP32) |
|---|---|---|---|---|
| `exo_time_crystal` 256-pt × 128-lag autocorrelation (full buffer) | `exo_time_crystal::process_frame[autocorr_256x128]` | **17.3 µs** | MEASURED-on-host | "H (<10 ms) on ESP32-S3 WASM3" — **NOT reproduced here (needs hardware)** |
| `exo_ghost_hunter` empty-room periodicity + hidden-breathing | `exo_ghost_hunter::process_frame[empty_room_periodicity]` | **1.44 µs** | MEASURED-on-host | research/exotic; no firm ESP32 figure — host proxy only |
| `sec_weapon_detect` per-subcarrier Welford (MAX_SC=32) | `sec_weapon_detect::process_frame[per_sc_welford]` | **0.42 µs** (420 ns) | MEASURED-on-host | research-grade; calibration-gated — host proxy only |
| `med_seizure_detect` clonic-phase rhythm path (steady-state frame) | `med_seizure_detect::process_frame[clonic_rhythm]` | **0.10 µs** (105 ns) | MEASURED-on-host (feature-gated) | doc budget "S (<5 ms) on ESP32"; **NOT reproduced here** |
Reading these honestly:
- `exo_time_crystal` at **17.3 µs host** is the only one whose host cost is even
in the same *thousandths* of its 10 ms ESP32 budget — it does the most work
(~32K MACs/frame). 17.3 µs native says the algorithm is cheap; it says
**nothing** about whether WASM3-on-Xtensa lands under 10 ms. A naïve
host→ESP32 extrapolation (assume 100× interpreter+clock penalty) would put it
near ~1.7 ms, comfortably under — **but that is an extrapolation, not a
measurement**, and is recorded here only to show the host number is not
obviously in tension with the budget. ESP32 figure: **UNMEASURED**.
- `med_seizure_detect`'s 105 ns is the **steady-state** per-frame cost; the
expensive clonic autocorrelation only fires when the state machine is in the
clonic phase, so this is a lower-bound on the heavy path, not the worst case.
It is still a real, committed host datapoint.
- The pre-existing `tests/budget_compliance.rs` already asserts the L/S/H
wall-clock tiers (25 passing tests); these criterion benches add the
regression-grade, reproducible median that ADR-160 deferred.
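The host→ESP32 extrapolation in the first bullet is plain arithmetic. A minimal sketch, assuming an illustrative slowdown factor (the penalties below are guesses, not measurements, and `extrapolate_ms` is a hypothetical helper that exists nowhere in the crate):

```python
def extrapolate_ms(host_median_us: float, penalty: float) -> float:
    """Scale a native host median (in µs) by an assumed slowdown; returns ms."""
    return host_median_us * penalty / 1000.0


HOST_MEDIAN_US = 17.3  # exo_time_crystal host median from the table above
BUDGET_MS = 10.0       # documented ESP32-S3 WASM3 budget (CLAIMED, not measured)

# Assumed interpreter + clock penalties; none of these is a measured figure.
for penalty in (10, 100, 1000):
    estimate = extrapolate_ms(HOST_MEDIAN_US, penalty)
    verdict = "under" if estimate < BUDGET_MS else "over"
    print(f"penalty {penalty:>4}x -> {estimate:6.2f} ms ({verdict} budget)")
```

The verdict flips between the assumed 100× and 1000× penalties, which is why only a run on hardware settles the ESP32 figure.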
---
## T2 — cog steady-state inference latency (ADR-159/160 deferred item → DONE)
Cog crates are normal workspace members; bench from `v2/`. Real weights
(`count_v1.safetensors` / `pose_v1.safetensors`) ship in-repo under each cog's
`cog/artifacts/`, so the bench measures the **real Candle CPU forward**, not the
stub (the bench `assert!`s `backend().starts_with("candle-")`).
```bash
cd v2
cargo bench -p cog-person-count --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3
cargo bench -p cog-pose-estimation --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3
```
| Cog | Bench id | Host median (steady-state infer, CPU) | Grade | Manifest cold-start (CLAIMED, different measurement + machine) |
|---|---|---|---|---|
| cog-person-count | `cog_person_count::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | — (person-count manifest carries comparable provenance) |
| cog-pose-estimation | `cog_pose_estimation::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | `cold_start_ms_avg: 5.4` (30 invocations, **ruvultra/RTX 5080 host**, candle 0.9 cpu) — **cold-start, NOT steady-state; NOT this machine** |
> Spread caveat (observed, honest): both medians above were captured with the box
> otherwise idle. A re-run of the validate-form command *while a second cargo job
> was loading the same cores* gave 385 µs (person-count) / 973 µs (pose) —
> the criterion low/high bracket widens to ~0.34-1.18 ms under contention. The
> 305 µs figures are the idle-box datapoints; the absolute number is host- and
> load-dependent (the ~3× pose swing, 305 µs to 973 µs, is core contention, not a code change).
Reading these honestly:
- **Steady-state ≠ cold-start.** The pose manifest's `5.4 ms` folds in one-time
weight load / mmap / first-forward allocation. This bench warms the engine
first and times only the recurring per-frame forward, on a *different
machine*. The two numbers are not comparable and we do not claim this bench
reproduces the 5.4 ms manifest figure.
- Both cogs share the same conv encoder; person-count adds a count head +
confidence head, pose adds a 256-wide MLP head. The host steady-state cost is
dominated by the three dilated Conv1d layers (56→64→128→128) shared by both —
which is why both land at ~305 µs.
- **Empirical confirmation of the steady-state/cold-start gap:** pose
steady-state (305 µs host) is ~18× *under* the manifest's 5.4 ms cold-start.
Even accounting for the different machine, this is the expected shape — the
bulk of cold-start is one-time setup, not the forward pass — and it is exactly
why conflating the two would be dishonest.
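The "~18×" figure in the last bullet is the ratio of the two numbers after unit conversion. A minimal sketch using only the values quoted above (the variable names are illustrative):

```python
COLD_START_MS = 5.4      # pose manifest cold_start_ms_avg (different machine)
STEADY_STATE_US = 305.0  # host steady-state median from the table above

# Convert ms to µs before dividing so both sides share a unit.
ratio = COLD_START_MS * 1000.0 / STEADY_STATE_US
print(f"cold-start / steady-state = {ratio:.1f}x")
```

The ratio compares two different measurements taken on two different machines, so it indicates the shape of the gap rather than a speedup.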
---
## Status vs the deferred items
| Deferred item | Was | Now |
|---|---|---|
| ADR-160 "Criterion benches for `process_frame` budget claims" | ACCEPTED-FUTURE | **DONE (host)**; ESP32-on-hardware still **PENDING** (needs the wasm32 target + a flashed ESP32-S3) |
| ADR-159/160 cog inference latency (`cold_start_ms_avg` uncommitted-benched) | CLAIMED | **MEASURED-on-host (steady-state)**; cold-start-on-ruvultra remains the manifest's separate claim |
Nothing here changes runtime behavior — these are benches + this results file
only. No crate needs republishing.
-132
@@ -1,132 +0,0 @@
# Edge-Skill Synthetic-Ground-Truth Validation — RESULTS
**Crate:** `v2/crates/wifi-densepose-wasm-edge` (workspace-EXCLUDED — build from its own dir)
**Branch:** `feat/edge-skills-synthetic-validation`
**ADR:** [ADR-160](../../docs/adr/ADR-160-edge-skill-library-honest-labeling.md)
**Date:** 2026-06-13
**Harness:** `tests/synthetic_validation.rs`
> **HONESTY BOUNDARY — read first.** Everything below is **synthetic-ground-truth
> validation**: a signal is *planted* with a known answer, the **real** detector
> is run, and detection accuracy / precision / recall / rate-error is **measured**.
> This is **NOT field accuracy.** A skill that recovers a planted sinusoid here is
> proven to do the math it claims on a *constructed* signal; it is **NOT** proven
> to work on real CSI in a real room. Skills whose detection target cannot be
> honestly planted (clinical, weapon, affect, sleep-stage, sign-language) are
> **NOT** given a number — they are listed under **DATA-GATED** with the real
> data each would require.
## Reproduce
```bash
cd v2/crates/wifi-densepose-wasm-edge # workspace-excluded; build here
cargo test --features std --test synthetic_validation -- --nocapture
# also runs under the medical tier (med_* skills stay DATA-GATED, not validated):
cargo test --features std,medical-experimental --test synthetic_validation -- --nocapture
```
Each `MEASURED-on-synthetic | …` line printed by the harness is the source of the
table below. Numbers are deterministic (no RNG; pseudo-noise uses a fixed LCG seed).
---
## MEASURED-on-synthetic (constructible skills)
| Skill | What was planted (ground truth) | Result | Grade |
|-------|----------------------------------|--------|-------|
| **vital_trend** | BPM held N≥6 calls at each threshold band (brady/tachy-pnea <12 / >25, brady/tachy-cardia <50 / >120, apnea breathing<1.0 for ≥20) vs normal | **acc 1.000, prec 1.000, recall 1.000** (TP5 FP0 TN5 FN0) | MEASURED |
| **exo_time_crystal** | period-2 coordinated motion vs pseudo-noise + flat | **acc 1.000** (TP1 FP0 TN2 FN0) | MEASURED † |
| **exo_ghost_hunter** (hidden breathing) | phase sinusoid at lag-8 (breathing band 5-15) in an empty room vs flat phase | **acc 1.000**; planted score **1.000**, flat **0.000** | MEASURED |
| **occupancy** | 220-frame flat-amplitude calibration, then strong per-zone amplitude variance vs flat | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
| **intrusion** | calibrate→arm (330 quiet frames), then per-subcarrier Δphase>1.5 + Δamp≫3σ vs quiet | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
| **exo_rain_detect** | empty room, 60-frame baseline, then broadband variance (8/8 groups, ratio≫2.5) for ≥10 frames vs stable-low | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
| **sig_flash_attention** | sustained high phase+amplitude in each of the 8 subcarrier groups; assert reported attention peak == planted group | **peak-localization 8/8 = 1.000** | MEASURED |
| **spt_spiking_tracker** | sparse (2-subcarrier) large phase-delta in each of the 4 zones; assert tracked zone == planted zone | **zone-localization 4/4 = 1.000** | MEASURED ‡ |
| **sig_optimal_transport** | sustained large frame-to-frame amplitude-distribution change vs stationary | **acc 1.000** (TP1 FP0 TN1 FN0) | MEASURED |
| **sig_mincut_person_match** | 2 persons with distinct stable per-region variance signatures over 40 frames | **person ids assigned, 0 id-swaps / 40 frames** | MEASURED |
| **lrn_dtw_gesture_learn** | stillness → 3 identical gesture rehearsals → enrollment | **template enrolled (templates=1)** | MEASURED (enroll) § |
| **sig_sparse_recovery** | 30 clean frames to init, then 8/32 (25%) nulled subcarriers | **dropout-detect + recovery-trigger = PASS** | MEASURED (trigger) ¶ |
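The accuracy, precision and recall figures in the table follow from the TP/FP/TN/FN counts. A minimal sketch, assuming the standard confusion-matrix definitions (the harness itself is Rust, in `tests/synthetic_validation.rs`, and is not reproduced here; `confusion_metrics` is a hypothetical helper):

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard the denominators so an all-negative run does not divide by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# vital_trend counts from the table: TP5 FP0 TN5 FN0
acc, prec, rec = confusion_metrics(tp=5, fp=0, tn=5, fn=0)
print(f"acc {acc:.3f}, prec {prec:.3f}, recall {rec:.3f}")
```

With only one or two planted cases per class, as in most rows, a perfect score means only that the constructed cases were separated. It is not an estimate of field accuracy.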
### Caveats on individual results
† **exo_time_crystal — honest discriminative limit.** A *pure* periodic signal
already has autocorrelation peaks at lag L **and** 2L (natural harmonics), so this
"period-doubling" detector cannot separate a true period-2 sub-harmonic from a
plain periodic signal — an earlier plant using a clean sine produced a *false
positive* (recorded during development). The construct it **can** discriminate
with known ground truth is **periodic-coordination vs aperiodic** (noise/flat),
which is what is measured (1.000). The original "sub-harmonic vs clean period"
claim is **NOT** validatable with this algorithm.
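The harmonic effect described above is easy to reproduce. A minimal sketch, assuming a plain normalized autocorrelation (this is not the detector's own code, and the period and buffer length are illustrative):

```python
import math


def autocorr(x, lag):
    """Normalized autocorrelation of a mean-removed sequence at one lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    denom = sum(v * v for v in centered)
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom if denom else 0.0


PERIOD = 16  # illustrative period L, in frames
signal = [math.sin(2 * math.pi * i / PERIOD) for i in range(256)]

# A clean sine has NO period-2 sub-harmonic, yet it peaks at L and at 2L.
for lag in (PERIOD // 2, PERIOD, 2 * PERIOD):
    print(f"lag {lag:>2}: {autocorr(signal, lag):+.3f}")
```

Both L and 2L come out strongly positive for the clean sine, so a peak at 2L alone cannot serve as evidence of period doubling.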
‡ **spt_spiking_tracker — plant must be sparse.** With weights init'd home=1.0 /
cross=0.25, firing all 8 inputs in a zone (8×0.25=2.0 > threshold 1.0) overdrives
*every* output neuron and the tracker collapses to zone 0 (measured 1/4 during
development). Firing only 2 inputs (home 2.0 fires, cross 0.5 silent) yields clean
4/4 zone localization. The validatable claim is *single-zone* localization.
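The overdrive arithmetic can be checked directly. A minimal sketch, assuming each active input contributes its weight once and a neuron fires when the summed drive exceeds the threshold (a simplification, since the tracker's actual neuron dynamics are not shown here; `drives` is a hypothetical helper):

```python
HOME_W = 1.0     # initial weight from a zone's inputs to its own output neuron
CROSS_W = 0.25   # initial weight from a zone's inputs to other zones' neurons
THRESHOLD = 1.0  # firing threshold quoted above


def drives(n_active_inputs: int):
    """Return (home_drive, cross_drive) for n simultaneously active inputs."""
    return n_active_inputs * HOME_W, n_active_inputs * CROSS_W


for n in (8, 2):
    home, cross = drives(n)
    print(
        f"{n} inputs: home={home:.2f} fires={home > THRESHOLD}, "
        f"cross={cross:.2f} fires={cross > THRESHOLD}"
    )
```

With 8 active inputs the cross drive (2.0) clears the threshold, so every zone fires. With 2 inputs it stays at 0.5 and only the home zone fires.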
§ **lrn_dtw_gesture_learn — enrollment validated; replay-match NOT.** The
deterministic, constructible part (stillness → 3 identical rehearsals → a template
is enrolled) is MEASURED. The DTW *replay match* (731) did **not** fire on the
identical replay in this run (`match_same=false`) — replay-recognition accuracy is
**reported, not asserted**, and is not claimed as validated.
¶ **sig_sparse_recovery — trigger validated; recovery accuracy is NEGATIVE.**
The dropout-detection + ISTA-recovery *trigger* pipeline fires correctly on >10%
planted nulls (asserted). But the **measured recovery accuracy is NOT a win**:
recovered RMSE **1.0045** vs unrecovered-null RMSE **0.9830** (**2.2% higher**, i.e.
slightly *worse* than leaving the nulls at zero) on a neighbor-correlated signal.
The tridiagonal correlation model's fixed point does not equal the planted truth.
**The recovery's reconstruction quality is therefore NOT validated as effective on
synthetic data** — only its detection/trigger path is. Reported honestly; no
positive number claimed.
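The comparison above is a relative RMSE change. A minimal sketch, assuming the usual RMSE definition and the two values quoted above (`rmse` and `relative_change_pct` are hypothetical helpers, not part of the harness):

```python
import math


def rmse(estimate, truth):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)
    )


def relative_change_pct(candidate: float, baseline: float) -> float:
    """Positive means the candidate error is LARGER (worse) than the baseline."""
    return (candidate - baseline) / baseline * 100.0


RECOVERED_RMSE = 1.0045    # after ISTA recovery
UNRECOVERED_RMSE = 0.9830  # nulls left at zero

change = relative_change_pct(RECOVERED_RMSE, UNRECOVERED_RMSE)
print(f"recovery changes RMSE by {change:+.1f}%")
```

A positive change means the recovery step increased the error relative to leaving the nulls at zero.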
---
## DATA-GATED — NOT validatable on synthetic data
Planting a "seizure-like" / "weapon-like" / "happy-like" synthetic signal and
claiming the detector "works" validates **nothing real** and is exactly the
AI-slop this project fights. These skills run real DSP (per ADR-160, 0 stubs) and
keep their ADR-160 disclaimers, but get **no accuracy number** here. Each needs
the specific real, labelled data listed:
| Skill | Why not constructible on synthetic | Real data required |
|-------|------------------------------------|--------------------|
| `med_seizure_detect` | "seizure-like" motion is not a seizure; no ground-truth signature exists synthetically | Clinical EEG-/video-labelled tonic-clonic seizure CSI from instrumented patients |
| `med_sleep_apnea` | a planted breathing-pause is not clinical apnea (AHI scoring, hypopnea, desaturation) | Polysomnography-labelled (PSG) overnight CSI with scored apnea/hypopnea events |
| `med_cardiac_arrhythmia` | a synthetic HR sequence cannot encode true arrhythmia morphology | ECG-labelled CSI (AFib/PVC/etc.) from clinical monitoring |
| `med_respiratory_distress` | distress is a clinical gestalt, not a plantable rate | Clinician-labelled respiratory-distress CSI episodes |
| `med_gait_analysis` | clinical gait metrics need a reference motion-capture standard | Mocap-/force-plate-labelled gait CSI |
| `sec_weapon_detect` | a high variance ratio is RF reflectivity, **not** weapon discrimination (ADR-160 §A3 already renamed the event to `HIGH_METAL_REFLECTIVITY`) | Labelled metal-object-vs-no-object CSI with controlled object classes |
| `exo_emotion_detect` | affect is not recoverable from a planted heuristic; outputs are proxies (ADR-160 §A2) | Validated affect-labelled CSI (self-report / physiological ground truth) |
| `exo_happiness_score` | "happiness" is a gait-energy proxy, not a measured affect (ADR-160 §A2) | Validated affect/valence-labelled CSI |
| `exo_dream_stage` | sleep staging needs PSG reference (EEG/EOG/EMG) | PSG-staged overnight CSI |
| `exo_gesture_language` | coarse gesture clusters ≠ true sign language (ADR-160 §A4) | Labelled ASL letter/word CSI dataset |
> The above are **not failures** — they are the honest boundary. A smaller set of
> genuinely-measured skills plus this explicit gated list is the deliverable, per
> the prove-everything directive.
---
## Skills not in either list
The remaining edge skills (smart-building / retail / industrial occupancy-style,
the other `sig_*`/`lrn_*`/`spt_*`/`tmp_*`/`qnt_*`/`aut_*`/`ais_*` algorithm-named
modules) are **wired and exercised live** in the unified pipeline integration test
(`tests/pipeline_all.rs`, all 59 default / 64 medical skills run without panic over
300 synthetic frames) but were **not** given an individual planted-ground-truth
accuracy number here. They are honest REAL-DSP modules (ADR-160) whose physical
observable could be planted with more harness work; that is deferred, not claimed.
## Test counts (full crate suite)
```
DEFAULT (--features std): 631 passed, 0 failed
(lib 504; budget 25; honest_labeling 10; pipeline_all 4; synthetic_validation 12; bench 1; vendor 75)
MEDICAL (--features std,medical-experimental): 669 passed, 0 failed
(lib 542; +16 same new tests; med_* stay DATA-GATED, not validated)
```
(M6 baseline was 615 / 653; the new pipeline_all (4) + synthetic_validation (12)
tests add 16 to each tier.)
-26
@@ -1,26 +0,0 @@
# Upstream clone (WiFlow-STD, DY2434) -- never commit third-party code/weights
upstream/
# Local python env
.venv/
# Downloaded data / artifacts
data/
downloads/
*.pth
*.pt
*.npy
*.npz
*.zip
*.mat
*.safetensors
results/parity_fixture.json
__pycache__/
*.onnx
# Committed ground truth: corruption masks for the pristine Kaggle download.
# remote/clean_v2.py zeroes the corrupted source windows IN PLACE, so these
# masks CANNOT be regenerated from a cleaned copy (generate_corruption_masks.py
# documents the criteria and reproduces them only from a fresh download).
!results/nan_windows_mask.npy
!results/big_windows_mask.npy
-486
@@ -1,486 +0,0 @@
# WiFlow-STD (DY2434) Benchmark Results — ADR-152 §2.2
Upstream: <https://github.com/DY2434/WiFlow-WiFi-Pose-Estimation-with-Spatio-Temporal-Decoupling>
pinned at `06899d29` (2026-04-05), Apache-2.0. Dataset: Kaggle `kaka2434/wiflow-dataset`
(12.8 GB archive → 15.5 GB extracted; 360,000 windows of 540×20 CSI + 15-keypoint 2D labels).
Published claims (README "Setting 1"): PCK@20 97.25%, PCK@30 98.63%, PCK@40 99.16%,
PCK@50 99.48%, MPJPE 0.007 m, 2.23M params, 0.07 GFLOPs.
## Measurement (a): their model on their data
### Artifact verification (MEASURED, 2026-06-10, this repo `eval_repro.py`)
| Check | Result |
|---|---|
| Parameter count | **2,225,042 (2.23M) — matches claim** |
| FLOPs (torch profiler, batch 1) | ~0.055 GFLOPs — consistent with 0.07B claim |
| CPU latency (Windows box, torch 2.12 CPU) | 13.2 ms/window @ batch 1 (76/s); 2.48 ms/sample @ batch 64 (403/s) |
| Checkpoint load | `weights_only=True` (no pickle code execution) |
### Released checkpoint does NOT reproduce the claims — REFUTED as shipped
Running the released `best_pose_model.pth` through the released code on the released
dataset with the released split procedure (seed-42 file-level 70/15/15; 54,000 test
samples) yields:
| Metric | Published | Measured (shipped checkpoint) |
|---|---|---|
| PCK@20 | 97.25% | **0.08%** |
| PCK@30 | 98.63% | 0.78% |
| PCK@40 | 99.16% | 5.53% |
| PCK@50 | 99.48% | 15.42% |
| MPJPE | 0.007 | **NaN** (dataset contains NaN CSI windows) |
Raw output: `results/repro_a.json`.
Diagnostics (on 2,000 NaN-free windows from the first files of the dataset, i.e.
mostly would-be *training* data — so this is not a split mismatch):
- Predictions correlate with targets (Pearson r ≈ 0.76) — the checkpoint is a trained
model, but in a **different keypoint normalization/order** than the released data.
- Best-case post-hoc global per-axis affine correction: PCK@20 ≈ 20%.
- Best-case per-keypoint affine correction (15×2 fitted transforms — generous
cheating): PCK@20 ≈ 72%, still far below 97.25%.
- Pred↔target keypoint correspondence matrix is degenerate (multiple predicted
keypoints best-match the same target joint) — keypoint convention mismatch.
### Reproducibility defects in the released artifacts
1. `models/__init__.py` imports `TemporalConvNet`, which `models/tcn.py` does not
define — **the published code does not import/run as-is**.
2. The released root checkpoint uses pre-rename module names (`att.*`, `final_conv.*`)
vs the published code (`attention.*`, `decoder.*`) — same shapes/param count, but
confirms the checkpoint predates the published code.
3. The second shipped checkpoint (`cross_dataset_test/WiFlow/best_pose_model.pth`) is
a **different architecture** (342-channel input = MM-Fi layout, 3 TCN layers,
3-channel/3D decoder) — not usable on their own dataset.
4. `run.py` ignores `--data_dir` and hardcodes `../preprocessed_csi_data`.
5. The released dataset's final 13 files (indices 487-499; 9,072 windows, 2.52%)
are corrupted: NaN values plus garbage amplitudes up to 3.4e38 (float32 max) in
data that is otherwise [0,1]-normalized. Upstream code has no NaN/inf handling;
training as published on this download diverges — the first corrupted batch
overflows fp16 autocast and permanently poisons BatchNorm running statistics
(GradScaler step-skipping does not protect BN). The authors' training curves
show normal convergence, so their local data evidently differed from the
Kaggle upload. Window masks: `results/nan_windows_mask.npy`,
`results/big_windows_mask.npy`.
### Reproducing the corruption masks
The two mask files (9,070 NaN/Inf windows, 9,072 with |amplitude| > 1.5;
union 9,072, all in dataset files 487-499) are **committed ground truth**
(gitignore-negated, ~352 KB each). They can only be regenerated from a
**pristine** Kaggle download: `remote/clean_v2.py` repairs the dataset by
zeroing the corrupted windows in place, after which the corruption evidence
is gone and a rescan returns all-False. `generate_corruption_masks.py`
re-derives them (chunked scan, criteria: any non-finite value OR
max |finite| > 1.5 per 540×20 window) and refuses to write all-False masks,
which indicate a cleaned copy. Verified 2026-06-11: a regeneration from the
local pristine download is bit-identical to the committed masks.
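The two criteria can be sketched as follows. This is a minimal illustration, assuming the windows are already in memory as an `(n, 540, 20)` float array. It is not the chunked scan in `generate_corruption_masks.py`, and the function names are hypothetical:

```python
import numpy as np

AMP_LIMIT = 1.5  # data is otherwise [0, 1]-normalized


def corruption_masks(windows):
    """Return (nan_mask, big_mask), one boolean per 540x20 window."""
    flat = windows.reshape(windows.shape[0], -1)
    finite = np.isfinite(flat)
    nan_mask = ~finite.all(axis=1)  # any NaN/Inf value in the window
    # Largest finite magnitude per window; non-finite entries count as 0.
    max_finite = np.where(finite, np.abs(flat), 0.0).max(axis=1)
    big_mask = max_finite > AMP_LIMIT
    return nan_mask, big_mask


def require_evidence(nan_mask, big_mask):
    """Refuse all-False masks, which indicate an already-cleaned copy."""
    if not (nan_mask | big_mask).any():
        raise ValueError("all-False masks: input looks like a cleaned copy")


# Tiny synthetic demo (NOT the Kaggle data): 4 windows, 2 planted corruptions.
demo = np.random.default_rng(0).random((4, 540, 20), dtype=np.float32)
demo[1, 0, 0] = np.nan
demo[2, 5, 5] = 3.4e38
nan_mask, big_mask = corruption_masks(demo)
require_evidence(nan_mask, big_mask)
print(nan_mask.tolist(), big_mask.tolist())
```

Keeping the two masks separate mirrors the two committed files, and the union of the two gives the windows to exclude.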
### Retraining result (MEASURED, 2026-06-10): claims APPROXIMATELY REPRODUCED
Since the shipped checkpoint is unusable, measurement (a) fell back to retraining
with upstream code + defaults (seed 42, batch 64, early-stopped at epoch 41 of 50,
best epoch 36, ~75 s/epoch) on ruvultra (RTX 5080). Deviations, all forced and
documented: one-line fix for defect (1); torch 2.x+cu128 instead of pinned 2.3.1
(Blackwell sm_120 unsupported); the 9,072 corrupted windows (defect 5) zeroed
entirely — without this the published pipeline produces NaN from epoch 1 (observed).
Scripts mirrored in `remote/`; raw metrics in `results/eval_retrained.json`.
| Metric | Published | Retrained (full test, 54,000) | Retrained (corruption-free, 52,560) |
|---|---|---|---|
| PCK@20 | 97.25% | **96.09%** | **96.61%** |
| PCK@30 | 98.63% | 97.89% | 98.23% |
| PCK@40 | 99.16% | 98.58% | 98.79% |
| PCK@50 | 99.48% | 98.99% | 99.11% |
| MPJPE | 0.007 | 0.0098 | 0.0094 |
Within ~0.6-1.2 PCK points of every published figure (single run, corrupted train
windows zeroed, different torch/GPU). **Verdict: the accuracy claims are credible
and approximately reproducible — but only after repairing the released dataset and
code.** Val best: PCK@20 96.99%, MPJPE 0.0086 (epoch 36).
One more defect found during the run:
6. `train.py` calls `plot_training_history`, which is not defined anywhere — the
built-in post-training test evaluation is unreachable as published (crashes
with NameError after training completes).
## ADR-152 §2.2 citation rule
Evidence grade for the WiFlow-STD accuracy claims after measurement (a):
**MEASURED-EQUIVALENT (96.1–96.6% PCK@20 reproduced by retraining; shipped
checkpoint REFUTED; dataset/code require repairs)**. RuView docs may cite
"~96% PCK@20 (our reproduction)" — still **not comparable** to our 17-keypoint
ESP32 numbers (different hardware, 5 subjects, in-domain random split,
15 keypoints).
## Edge optimization (measured)
ADR-152 "optimize beyond SOTA" track, 2026-06-10, this Windows box (Windows 11,
16 torch threads, torch 2.12.0+cpu, onnxruntime 1.26.0). Subject: the retrained
checkpoint `results/retrained_best_pose_model.pth` (2,225,042 fp32 params).
Scripts: `quantize_bench.py`, `onnx_bench.py`, `eval_ort_accuracy.py`.
Raw numbers: `results/edge_optimization.json`.
Accuracy is on a **10,000-window seed-42 random subset** of the corruption-free
test split (same seed-42 file-level 70/15/15 split as `eval_repro.py`; 54,000
test windows, 1,440 corrupted excluded via `results/nan_windows_mask.npy` |
`results/big_windows_mask.npy`, leaving 52,560; subset drawn with
`np.random.default_rng(42)`). The fp32 subset PCK@20 (96.68%) matches the full
clean-test figure (96.61%), so the subset is representative.
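
A minimal sketch of the subset draw described above, assuming both masks are boolean arrays aligned with the test windows. The helper name `draw_clean_subset`, sampling without replacement, and the final sort are assumptions for illustration, not confirmed details of `quantize_bench.py`.

```python
import numpy as np


def draw_clean_subset(nan_mask, big_mask, subset_size=10_000, seed=42):
    """Sorted indices of a random subset of windows flagged by neither mask."""
    corrupted = nan_mask | big_mask            # same OR as the two mask files
    clean_idx = np.flatnonzero(~corrupted)
    rng = np.random.default_rng(seed)
    size = min(subset_size, clean_idx.size)
    return np.sort(rng.choice(clean_idx, size=size, replace=False))
```

A fixed seed makes the draw repeatable, which is what lets every variant be scored on the identical windows.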
Latency is CPU ms/window, median of repeated runs, 3 interleaved repetitions
per variant (medians below; run-to-run spread on this box is large, roughly
±20-40% at batch 1 — reps are in the JSON).
| Variant | Disk size | Batch 1 (ms/win) | Batch 64 (ms/win) | PCK@20 | PCK@50 | MPJPE |
|---|---|---|---|---|---|---|
| torch fp32 (baseline) | 9.07 MB | 11.0 | 2.27 | 96.68% | 99.15% | 0.00936 |
| torch fp16 (`.half()`) | **4.58 MB** | 24.3 | 2.42 | 96.68% | 99.15% | 0.00946 |
| torch int8 dynamic | 9.07 MB (unchanged) | 15.6 | 2.06 | 96.68% (identical) | 99.15% | 0.00936 |
| ONNX fp32 (onnxruntime) | 8.97 MB | **3.2** | **2.0** | 96.68% | 99.15% | 0.00936 |
| ONNX int8 (ORT dynamic, supplementary) | **2.44 MB** | 6.5 | 5.8 | 96.52% | 99.15% | 0.01108 |
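
The latency protocol described above (median of repeated runs, three interleaved repetitions per variant) can be sketched with the standard library alone. The function name, the `runs_per_rep` default, and the absence of a warm-up phase are assumptions; the actual benchmark scripts may differ.

```python
import statistics
import time


def interleaved_latency_ms(variants, windows_per_call, reps=3, runs_per_rep=20):
    """Median ms/window per variant, interleaving variants within each repetition.

    variants: dict mapping a name to a zero-argument callable that runs one batch.
    Returns a dict mapping each name to (median over reps, list of per-rep medians).
    """
    per_rep = {name: [] for name in variants}
    for _ in range(reps):
        for name, run in variants.items():     # interleave: A, B, C, A, B, C, ...
            samples = []
            for _ in range(runs_per_rep):
                t0 = time.perf_counter()
                run()
                elapsed_ms = (time.perf_counter() - t0) * 1e3
                samples.append(elapsed_ms / windows_per_call)
            per_rep[name].append(statistics.median(samples))
    return {name: (statistics.median(v), v) for name, v in per_rep.items()}
```

Interleaving spreads slow drift (thermal throttling, background load) across all variants instead of letting it penalize whichever variant happens to run last.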
Findings:
- **torch dynamic INT8 quantizes nothing on this model.** The architecture has
**zero `nn.Linear` layers** — it is entirely Conv1d (21) + Conv2d (22) +
BatchNorm. `torch.ao.quantization.quantize_dynamic` (requested over
`{Linear, Conv1d, Conv2d}`) converted **0 modules / 0.0% of params**: dynamic
quantization only has kernels for Linear/RNN-family modules and silently
skips convolutions. The "int8" model is bit-identical to fp32 (same outputs,
same 9.07 MB). Conv quantization would require static (PTQ) quantization
with calibration — out of scope here; the ORT dynamic path below is the
honest int8 datapoint.
- **fp16 halves size for free accuracy-wise** (PCK@20 −0.005 pt, MPJPE
+0.0001) but is *slower* on CPU at batch 1 (~2.2×) — torch CPU fp16 conv
kernels are emulated. fp16 is a storage/transport format here, not a CPU
runtime win.
- **ONNX Runtime is the real batch-1 latency win: ~3.4× faster than torch**
(3.2 vs 11.0 ms/window) at identical accuracy (parity 2.4e-7).
### Verdict on the paper's "~2.2 MB int8" claim
**Plausible but not free, and unreachable by the obvious PyTorch route.**
2,225,042 params × 1 byte ≈ 2.2 MB assumes *every* parameter quantizes.
PyTorch dynamic quantization — the one-liner most readers would reach for —
yields **9.07 MB (0% quantized)** because the model has no Linear layers.
ONNX Runtime dynamic quantization, which does have int8 conv weight support,
gets **2.44 MB** (close to the claim; the overhead is BatchNorm params/buffers
and quantization scales kept in fp32) at a measurable accuracy cost:
PCK@20 96.68 → 96.52% (−0.16 pt) and MPJPE 0.00936 → 0.01108 (+18%), and
~2× slower inference than ONNX fp32 (ConvInteger kernels). The paper does not
state a method or an int8 accuracy; treat "2.2 MB" as a weight-arithmetic
estimate, achievable in practice only via conv-capable quantization toolchains
and with a small accuracy penalty.
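
The weight arithmetic behind the estimate is easy to reproduce. This sketch only multiplies the reported parameter count by bytes per parameter; it assumes MB means 10^6 bytes and ignores buffers, scales, and file-format overhead, which is presumably why the measured files are somewhat larger.

```python
PARAMS = 2_225_042  # parameter count reported above


def weight_megabytes(n_params, bytes_per_param):
    """Raw weight storage in MB (10**6 bytes), ignoring container overhead."""
    return n_params * bytes_per_param / 1e6


fp32_mb = weight_megabytes(PARAMS, 4)   # every parameter stored as fp32
fp16_mb = weight_megabytes(PARAMS, 2)   # every parameter stored as fp16
int8_mb = weight_megabytes(PARAMS, 1)   # every parameter stored as int8
print(f"fp32 {fp32_mb:.2f} MB, fp16 {fp16_mb:.2f} MB, int8 {int8_mb:.2f} MB")
```

The int8 line is the lower bound the "2.2 MB" figure corresponds to: it is reached only if no parameter stays in a wider format.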
### ONNX export status
**Works.** Exported via the TorchScript exporter (`dynamo=False`), opset 17,
with a dynamic batch axis — `results/retrained_fp32_dynamic.onnx` (8.97 MB),
verified to run at batch 1/2/64. The axial attention's
`view(N*W, C, H)` reshape traced correctly (sizes recorded as graph ops, not
baked constants). The dynamo exporter also captures the graph but crashed on
this box writing a ✅ to a cp1252 console (cosmetic Windows encoding issue, not
a model blocker). Parity vs torch on the stored fixture
(`results/parity_fixture.npz`, batch 2, seed 42): **max abs diff 2.4e-7 —
PASS** (< 1e-4). ORT-quantized int8 model: `results/retrained_int8_ort_dynamic.onnx`.
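
The parity criterion (max absolute difference below 1e-4) can be expressed as a small check. This is a sketch with a hypothetical helper name, `parity_report`; it assumes the torch and ONNX Runtime outputs have already been converted to NumPy arrays of the same shape.

```python
import numpy as np

PARITY_TOL = 1e-4  # pass threshold quoted above


def parity_report(reference, candidate, tol=PARITY_TOL):
    """Max absolute difference between two output arrays, plus a pass flag."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    return {"max_abs_diff": max_abs_diff, "pass": max_abs_diff < tol}
```

Storing the fixture input on disk, as done here, keeps the comparison repeatable across export attempts.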
### Static PTQ (calibrated) — follow-up
Follow-up to the dynamic-int8 row above (2026-06-10, same box, onnxruntime
1.26.0): ONNX Runtime **static** post-training quantization
(`quantize_static`, QDQ format, per-channel int8 weights + int8 activations)
of the same fp32 export, calibrated on **corruption-free TRAINING-split
windows only** (seed-42 file-level split, same masks; 1,000 windows for
MinMax, 512 for the histogram calibrators; never test windows). Scopes:
"conv-only" (`op_types_to_quantize=["Conv"]`) and "all-ops". The attention path
exports as Einsum/Softmax, which ORT never quantizes anyway, so "all-ops" only
adds the elementwise Mul/Sigmoid/Add/AveragePool glue. Accuracy on the
identical 10k-window seed-42 corruption-free test subset; latency median of
3 interleaved reps (fp32/dynamic re-benched in-session as references).
Script: `static_ptq_bench.py`; raw: `results/edge_optimization.json`
(`onnx_static_ptq`).
| Variant | Disk size | Batch 1 (ms/win) | Batch 64 (ms/win) | PCK@20 | PCK@50 | MPJPE |
|---|---|---|---|---|---|---|
| ONNX fp32 (reference) | 8.97 MB | 2.5 | 1.9 | 96.68% | 99.15% | 0.00936 |
| ORT dynamic int8 (baseline) | **2.44 MB** | 5.7 | 4.6 | 96.52% | 99.15% | 0.01108 |
| static QDQ **Percentile(99.99) conv-only** | 2.53 MB | 5.3 | 4.7 | 96.61% | 99.16% | **0.01031** |
| static QDQ MinMax conv-only | 2.53 MB | 5.2 | 3.3 | **96.63%** | 99.19% | 0.01084 |
| static QDQ Entropy conv-only | 2.53 MB | 5.2 | 3.1 | 96.60% | 99.19% | 0.01078 |
| static QDQ MinMax all-ops | 2.60 MB | 6.5 | 3.9 | 95.45% | 99.14% | 0.01486 |
| static QDQ Entropy all-ops | 2.60 MB | 5.7 | 4.1 | 95.30% | 99.13% | 0.01510 |
| static QDQ Percentile all-ops | 2.60 MB | 5.3 | 4.3 | 96.39% | 99.17% | 0.01218 |
**Verdict: static PTQ (conv-only) is the new best int8 point on accuracy —
but only modestly, and it does not fix int8's latency penalty.**
- **Accuracy: beats dynamic.** All three conv-only calibrations land at
PCK@20 96.60–96.63% (vs dynamic 96.52%, fp32 96.68%; this recovers ~⅔ of the
dynamic gap) and MPJPE 0.0103–0.0108 (vs dynamic 0.01108). Best MPJPE:
Percentile conv-only, +10% over fp32 instead of dynamic's +18%.
- **Size: slightly worse.** 2.53 MB vs 2.44 MB (+3.6%) — QDQ nodes and
per-channel scales cost a little; BatchNorm stays fp32 in both (the 12 BNs
follow Slice/Einsum/Reshape, never Conv, so they cannot be folded).
- **Latency: a wash vs dynamic, still ~2× slower than ONNX fp32 at batch 1.**
Batch-1 medians 5.2–5.3 vs dynamic 5.7 ms/win in-session, within this
box's ±20–40% noise. Batch 64 leans static (3.1–3.3 for MinMax/Entropy
conv-only vs 4.6), same caveat.
- **All-ops QDQ is strictly worse**: up to −1.4 pt PCK@20 and +60% MPJPE for
zero size/latency benefit — int8 activations through the elementwise glue
around the attention blocks is where the damage is. Conv-only is the right
scope.
- Negative result worth recording: **Entropy calibration is a no-op here**;
on an identical calibration set it selects full-range thresholds
bit-identical to MinMax (all 247 scales equal; verified on a 64-window
smoke set). Also, ORT 1.26's `CalibMaxIntermediateOutputs` raises a
spurious "No data is collected" when the batch count divides the chunk
size (worked around in the script).
Deployment guidance: need speed → ONNX fp32 (3.2 ms b1). Need int8 weights
for size → static QDQ conv-only (Percentile or MinMax,
`results/retrained_int8_static_percentile_conv.onnx`), which strictly
dominates dynamic int8 on accuracy at ~equal latency and +0.09 MB.
## Efficiency sweep (MEASURED, overnight 2026-06-10/11)
ADR-152 beyond-SOTA track: compact purpose-built variants of the WiFlow-STD
architecture, trained from scratch on the same cleaned dataset, identical
seed-42 file-level split, loss and protocol as the measurement-(a) reference
(fp32, batch 64, ≤50 epochs, patience 5; RTX 5080, ~22–29 min/variant).
Variant transforms are pure channel/group/stride scalings of an
architecture-exact parameterized model (validated: reproduces 2,225,042 params
at the reference config). Scripts: `remote/sweep/`; raw:
`results/efficiency_sweep.jsonl`; checkpoints `results/{half,quarter,tiny}_best.pth`
(gitignored).
| Variant | Params | vs 2.23M | Clean-test PCK@20 | PCK@50 | MPJPE | Best epoch |
|---|---|---|---|---|---|---|
| full (reference, meas. a) | 2,225,042 | 1× | 96.61% | 99.11% | 0.0094 | 36 |
| **half** | **843,834** | **0.38×** | **96.62%** | **99.47%** | **0.00898** | 23 |
| quarter | 338,600 | 0.15× | 96.05% | 99.43% | 0.00928 | 50 |
| tiny | 56,290 | 0.025× | 94.11% | 99.36% | 0.0125 | 47 |
Findings:
- **The half model (843k params) strictly dominates the full reference** on
this dataset — equal PCK@20, better PCK@50 and MPJPE, converges in fewer
epochs. The published 2.23M architecture is over-parameterized for its own
benchmark.
- **tiny (56k params, 1/39.5) holds 94.11% PCK@20** — a ~220 KB fp32 /
~60 KB int8-class model in reach of severely constrained edge targets,
2.5 pt below the full reference.
- Caveats: in-domain (5-subject random-file split) like every number on this
dataset; single run per variant; corruption-free test subset (52,560).
Cross-domain behavior of compact variants is untested — ADR-150's evidence
says capacity *hurts* cross-subject, so the compact end may generalize no
worse, but that is a hypothesis, not a measurement.
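
The size ratios quoted in the sweep follow directly from the parameter counts in the table. The sketch below recomputes them; the KiB figures cover weights only, and the int8 figure assumes every weight quantizes, which the edge-optimization results show is optimistic. `variant_summary` is an illustrative name.

```python
REFERENCE_PARAMS = 2_225_042
VARIANTS = {"half": 843_834, "quarter": 338_600, "tiny": 56_290}


def variant_summary(params, reference=REFERENCE_PARAMS):
    """Ratio to the reference model plus weight-only size estimates."""
    ratio = params / reference
    return {
        "ratio": ratio,                  # e.g. 0.38 for the half model
        "shrink": 1 / ratio,             # e.g. about 39.5 for tiny
        "fp32_kib": params * 4 / 1024,   # fp32 weights only
        "int8_kib": params / 1024,       # if every weight were stored as int8
    }


for name, params in VARIANTS.items():
    s = variant_summary(params)
    print(f"{name:8s} {s['ratio']:.3f}x (1/{s['shrink']:.1f}), "
          f"~{s['fp32_kib']:.0f} KiB fp32, ~{s['int8_kib']:.0f} KiB int8")
```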
### Compact-variant edge artifacts (MEASURED, 2026-06-11)
Edge pipeline for the **tiny** checkpoint (56,290 params), same machinery and
protocol as the full-model edge rows above (this Windows box, torch
2.12.0+cpu, onnxruntime 1.26.0; dynamic-batch opset-17 TorchScript export;
static QDQ **Percentile(99.99) conv-only** int8 calibrated on **512**
corruption-free TRAIN-split windows; accuracy on the identical 10k-window
seed-42 clean test subset; latency = median ms/window over 3 interleaved
reps, with the full-model fp32/int8 sessions interleaved as same-session
references). Script: `tiny_edge_bench.py`; raw:
`results/edge_optimization.json` (`tiny_variant`). Torch-vs-ORT parity on the
stored fixture input: **max abs diff 1.5e-7 — PASS** (< 1e-4). The tiny fp32
subset PCK@20 (94.11%) matches the full clean-test sweep figure (94.11%)
exactly, so the subset remains representative.
Two forced deviations, both recorded in the JSON:
1. **Adaptive-pool export rewrite.** tiny's derived stride schedule
`[2,1,1,1]` leaves feature width 16, and the TorchScript exporter rejects
`AdaptiveAvgPool2d((15,1))` when 15 is not a factor of the input height
(the full model never hit this — its width was exactly 15). Since the
pool over a fixed-size map is a fixed linear operator, the export wrapper
replaces it with `mean(-1)` (W axis, a factor) + a constant averaging
matmul using PyTorch's exact bin rule; the parity check (vs the original
torch model with the real pool) proves exactness.
2. **Calibration count 512, not "~500"**: ORT 1.26's histogram collector
`np.asarray()`'s the per-batch maxima, so the calibration count must be a
multiple of the 64-window calibration batch or the ragged last batch
crashes it (the earlier static-PTQ run dodged this by using exactly 512).
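
The first deviation relies on adaptive average pooling over a fixed-size axis being a fixed linear map. A minimal NumPy sketch of that matrix follows, using the bin rule start = floor(i·in/out), end = ceil((i+1)·in/out). The function name and the 16 → 15 example sizes are assumptions chosen to match the description; the actual export wrapper is not reproduced here.

```python
import numpy as np


def adaptive_avg_matrix(in_size, out_size):
    """(out_size, in_size) matrix equal to 1-D adaptive average pooling.

    Bin i averages input indices [floor(i*in/out), ceil((i+1)*in/out)).
    """
    m = np.zeros((out_size, in_size), dtype=np.float64)
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = ((i + 1) * in_size + out_size - 1) // out_size   # integer ceil
        m[i, start:end] = 1.0 / (end - start)
    return m


pool = adaptive_avg_matrix(16, 15)    # 16 input positions -> 15 output positions
x = np.arange(16, dtype=np.float64)
y = pool @ x                          # matmul replaces the pooling op
```

Because the matrix is a constant, it exports as an ordinary MatMul, and a parity check against the original model confirms the rewrite changes nothing numerically.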
| Variant | Disk size | Batch 1 (ms/win) | Batch 64 (ms/win) | PCK@20 | PCK@50 | MPJPE |
|---|---|---|---|---|---|---|
| full ONNX fp32 (same-session ref) | 8.97 MB | 2.27 | 1.42 | 96.68% | 99.15% | 0.00936 |
| full static QDQ Percentile conv-only (same-session ref) | 2.53 MB | 5.53 | 3.82 | 96.61% | 99.16% | 0.01031 |
| **tiny ONNX fp32** | **0.295 MB** | **0.66** | **0.24** | **94.11%** | 99.37% | 0.01253 |
| tiny static QDQ Percentile conv-only | 0.248 MB | 0.85 | 1.03 | 92.68% | 99.33% | 0.01491 |
(tiny torch `.pth` checkpoint for reference: 0.34 MB on disk; 56,290 fp32
params ≈ 225 KB of weights.)
Findings:
- **The smallest deployable WiFlow-class model is the tiny ONNX fp32
artifact: ~295 KB on disk, 0.66 ms/window batch-1 CPU (~1,500 windows/s),
94.1% PCK@20** — 30× smaller and ~3.4× faster (in-session) than the full
ONNX fp32 model, at a cost of 2.6 pt PCK@20.
- **int8 is a bad trade at this scale.** Static QDQ conv-only — the recipe
that cost the full model only 0.07 pt — costs tiny **1.43 pt** PCK@20
(94.11 → 92.68%) and +19% MPJPE, saves only 47 KB (16%; QDQ scales and
the fp32 BN/attention glue are proportionally larger in a small graph),
and is *slower* than tiny fp32 (0.85 vs 0.66 ms b1; 1.03 vs 0.24 ms b64 —
QDQ kernel overhead dominates when the convs are this small). A 56k-param
model has little redundancy left to absorb weight+activation rounding.
- Deployment guidance, compact edition: ship tiny as **ONNX fp32** — at
295 KB the int8 size saving solves no real constraint and costs accuracy
and speed. If ~250 KB vs ~295 KB ever matters, weight-only quantization
would be the thing to try next, not QDQ.
## Measurement (b): BLOCKED-ON-DATA (attempted 2026-06-10)
The fine-tune-on-ESP32 measurement stopped at dataset characterization, per the
pre-registered stop rule (<2,000 paired windows). Findings (MEASURED):
- **Only one trainable paired dataset exists**: `ruvultra:~/work/cog-pose-train/paired.jsonl`
— 1,077 windows (one subject, one room, one 29.9-min session, single node;
CSI [56, 20]; 17 COCO keypoints, MediaPipe confidence mean 0.44 — only 264
windows pass ADR-079's own conf>0.5 training filter). Prior measured attempts
on this exact set: 0–3% torso-PCK@20 (temporal splits, three independent
pipelines). Fine-tuning a 2.23M-param model on ~860 train windows would
measure memorization, not transfer.
- **The April session behind the old "92.9% PCK@20" claim is lost** (345
samples, 35 subcarriers; raw CSI gone from ruvzen/ruvultra/cognitum-v0; only
a 69-sample predictions+GT holdout survives at `models/wiflow-real/eval-holdout.jsonl`).
- **Forensic recheck of that holdout RETRACTS the 92.9% figure**: the trainer's
`pck()` used an absolute 0.2 image-unit threshold (not torso-normalized) and
the model output a **constant pose** (pred std 0.0000 across 69 near-static
frames; a mean predictor scores 100% under the same protocol). The
torso-normalized PCK@20 on the same holdout is 19.1%. This corroborates the
2026-05-11 audit retraction (CHANGELOG, PR #535); stale doc citations were
removed 2026-06-10 (user-guide, readme-details, ADR-152 §2.1.3). The §2.2
no-citation rule now applies to ADR-079 accuracy claims.
Unblock criteria: a paired collection session of ≥2k windows (≈35+ min at the
observed stride; multi-pose, conf>0.5, ideally with the §2.1.3 two-checkerboard
calibration), plus a re-baselined our-pipeline number under torso-PCK@20 on the
same split. WiFlow-STD assets stand ready on ruvultra (`~/wiflow-std-bench/`).
Also worth investigating: ADR-079's protocol predicts ~9k windows per 30 min;
the May session under-delivered ~8× (aligner drop rate?).
## Measurement (b) (MEASURED 2026-06-10/11)
The data blocker is resolved: the 2026-06-10 22:10–22:40 collection session produced
**2,046 paired windows** (`ruvultra:~/wiflow-std-bench/paired-20260610.jsonl`; ONE
subject, ONE room, ONE ESP32 node, varied poses: walk/raise/squat/kick/wave/turn/
jump/sit; aligner `scripts/align-ground-truth.js`, non-overlapping 20-frame windows
~0.42 s; 17 COCO keypoints in normalized [0,1] camera coords; MediaPipe confidence
mean 0.802, min 0.692 — all windows pass the conf>0.5 filter). The 4 h timestamp
bug and the empty-frame confidence-dilution aligner findings are recorded
separately; results only here. Trained on ruvultra (RTX 5080, torch 2.11+cu128,
fp32, batch 32, GPU shared with the efficiency sweep). Scripts mirrored in
`remote/measb/`; raw metrics + full training curves in `results/measurement_b.json`.
### Two new aligner/dataset findings (forced deviations, MEASURED)
1. **`csi_shape` is heterogeneous, not [70, 20]**: 1,347× [70,20], 284× [134,20],
243× [26,20], 130× [12,20], 42× [20,20]. The ESP32 stream emits mixed frame
types and `extractCsiMatrix` stamps each window's subcarrier count from
`window[0].subcarriers`, zero-padding/truncating the other frames — even
native-70 windows contain ~20.4% internally zero-padded short frames
(subcarriers 40–69 all-zero). Handling: the primary suite ("all 2,046")
linearly resamples every frame's subcarrier axis to 70 bins (identity for
native-70 frames) so the pre-registered n and split sizes hold; a secondary
suite restricts to the 1,347 native [70,20] windows as a homogeneity check.
2. **Aligner layout bug**: `extractCsiMatrix` fills `matrix[f * nSc + s]`
(frame-major) but declares `shape: [nSc, nFrames]` — the stored shape label is
transposed relative to the data. Confirmed by coherent per-frame zero-tails;
corrected on load (`reshape(nFrames, nSc).T`).
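
Both findings translate into a short loading step. The sketch below applies the `reshape(nFrames, nSc).T` correction and then linearly resamples the subcarrier axis to 70 bins. The helper names, the endpoint-aligned interpolation grid, and resampling per window rather than per native frame type are simplifying assumptions, not the measured pipeline.

```python
import numpy as np

TARGET_SUBCARRIERS = 70


def load_window(flat, n_sc, n_frames):
    """Undo the transposed shape label: the data is frame-major (f * n_sc + s)."""
    flat = np.asarray(flat, dtype=np.float32)
    return flat.reshape(n_frames, n_sc).T            # -> (n_sc, n_frames)


def resample_subcarriers(window, target=TARGET_SUBCARRIERS):
    """Linearly resample the subcarrier axis of a (n_sc, n_frames) window."""
    n_sc, n_frames = window.shape
    if n_sc == target:
        return window                                 # identity for native-70 windows
    src = np.linspace(0.0, 1.0, n_sc)
    dst = np.linspace(0.0, 1.0, target)
    columns = [np.interp(dst, src, window[:, f]) for f in range(n_frames)]
    return np.stack(columns, axis=1).astype(window.dtype)
```

Reshaping with the declared `[nSc, nFrames]` shape instead would interleave subcarriers and frames, which is why the per-frame zero tails only look coherent after the transpose.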
### Protocol (pre-registered, followed)
Temporal split, no shuffling across time: first 70% train (1,432), next 15% val
(307), last 15% test (307); seed 42 elsewhere. Model: learned 1×1 Conv1d 70→540
adapter prepended to the upstream WiFlow-STD trunk; K=17 via the parameter-free
adaptive pool (`AdaptiveAvgPool2d((17,1))` — pretrained weights load strict for
any K). CSI normalized by the TRAIN-split p99 amplitude (129.7 all / 130.9
native-70), clipped to [0,1]. Three runs, ≤60 epochs, early-stop patience 8 on
val MPJPE, AdamW (adapter lr 1e-4; pretrained trunk lr 1e-5, 10× lower; scratch
all 1e-4), fp32. Pretrained init = the measurement-(a) **retrained** checkpoint
(`upstream/test/best_pose_model.pth`, ~96% PCK@20 on WiFlow data; the
`att.`/`final_conv.` key remap from `eval_repro.py` applied defensively — a no-op,
that checkpoint already uses post-rename keys). Frozen-trunk run: trunk
`requires_grad=False` **and** held in `.eval()` so BatchNorm running stats cannot
drift — a pure transfer probe; only the 70→540 adapter (38,340 params) trains.
PCK is torso-normalized with **torso = ‖l_shoulder(5) − l_hip(11)‖** (upstream
`calculate_pck` math — per-frame norm clamped at 0.01, mean over keypoints ×
frames — but upstream's `NECK_IDX/PELVIS_IDX = 2, 12` is a 15-keypoint
convention; on 17-kp COCO those indices are right_eye/right_hip, so the indices
were replaced, not the math). MPJPE is in normalized image units (not meters).
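
The metric definition above can be sketched in NumPy. This is an illustration, not upstream `calculate_pck`: taking the torso length from the ground-truth pose and using `<=` at the threshold are assumptions, and the function names are hypothetical.

```python
import numpy as np

L_SHOULDER, L_HIP = 5, 11      # COCO-17 indices used for the torso length
MIN_TORSO = 0.01               # per-frame clamp on the normalizer


def torso_pck(pred, target, threshold=0.2):
    """Fraction of keypoints within threshold * torso length of the target.

    pred, target: arrays of shape (frames, 17, 2) in normalized image coords.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    torso = np.linalg.norm(target[:, L_SHOULDER] - target[:, L_HIP], axis=-1)
    torso = np.maximum(torso, MIN_TORSO)                  # shape (frames,)
    dist = np.linalg.norm(pred - target, axis=-1)         # shape (frames, 17)
    return float((dist / torso[:, None] <= threshold).mean())


def mpjpe(pred, target):
    """Mean per-joint position error, in the same units as the inputs."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.linalg.norm(diff, axis=-1).mean())
```

The clamp matters for frames where shoulder and hip nearly coincide in image space; without it a tiny torso length would make every keypoint fail.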
### Results — primary suite, all 2,046 windows (test = last 307)
| Run | PCK@10 | PCK@20 | PCK@30 | PCK@40 | PCK@50 | MPJPE | pred std | best ep |
|---|---|---|---|---|---|---|---|---|
| **mean-pose baseline** (honesty bar) | **73.1%** | **95.9%** | **98.7%** | 99.3% | 99.3% | **0.0148** | 0 (by constr.) | — |
| (i) pretrained-init, full fine-tune | 26.0% | 65.0% | 88.0% | 96.4% | 98.9% | 0.0313 | 0.0113 | 58/60 |
| (ii) scratch | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.2554 | 0.0002 | 4 (stop @13) |
| (iii) frozen-trunk (adapter only) | 0.0% | 0.0% | 0.2% | 3.2% | 14.4% | 0.1260 | 0.0073 | 59/60 |
Secondary suite (native [70,20] windows only, n=1,347, test=202) reproduces the
same ordering: mean-baseline 96.0% / pretrained 67.1% / scratch 0.0% /
frozen-trunk 0.0% PCK@20 (MPJPE 0.0153 / 0.0318 / 0.2236 / 0.1343) — the
subcarrier-resampling choice does not change any conclusion.
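
The temporal split and the mean-pose honesty bar are both simple to state in code. In this sketch the rounding rule is an assumption chosen because it reproduces the reported split sizes (1,432/307/307 and a 202-window test set for n=1,347); the function names are illustrative.

```python
import numpy as np


def temporal_split(n, train_frac=0.70, val_frac=0.15):
    """Contiguous index ranges in time order: train, then val, then test."""
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    idx = np.arange(n)                     # no shuffling across time
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def mean_pose_predictions(train_poses, n_test):
    """Constant predictor: the train-split mean pose repeated for every test frame."""
    mean_pose = np.asarray(train_poses, dtype=np.float64).mean(axis=0)   # (17, 2)
    return np.broadcast_to(mean_pose, (n_test,) + mean_pose.shape).copy()
```

Scoring these constant predictions with the same PCK and MPJPE functions as the trained models gives the baseline row; any model that cannot beat it has not shown that it uses the CSI input.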
### Interpretation
- **Did pretraining-transfer happen? Partially — as optimization transfer, not
feature transfer, and not past the honesty bar.**
- *Pretrained vs scratch*: dramatic (65.0% vs 0.0% PCK@20). The pretrained init
is the only configuration that trains at all under the pre-registered budget.
- *Frozen-trunk*: near-zero (0.0% PCK@20, 14.4% @50). WiFlow-STD's frozen
features do **not** transfer to our ESP32 domain through a linear subcarrier
adapter — the pretrained benefit is a well-conditioned initialization (incl.
calibrated BN/output scales), not reusable CSI→pose features.
- *Everything vs mean-pose baseline*: **no run beats it.** A constant
train-mean pose scores 95.9% torso-PCK@20 / 0.0148 MPJPE on this test split,
because a single subject in one camera frame barely moves in normalized
coordinates. The fine-tuned model is a real, non-constant model
(pred std 0.0113 > 0 — passes the constant-pose detector that retracted the
old 92.9% figure) but its deviations from the mean hurt: it fits train-period
temporal dynamics that do not generalize across the temporal split.
- **Verdict for ADR-152 §2.2(b): fine-tuning WiFlow-STD on this dataset does not
demonstrate CSI→pose signal beyond the mean pose.** Until a model beats the
mean-pose baseline on a temporal split, no PCK number from this line may be
cited as pose-estimation capability.
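
The constant-pose detector mentioned above amounts to checking the spread of predictions across frames. The exact definition of "pred std" used in the results is not given, so this sketch assumes the per-coordinate standard deviation across frames, averaged over keypoints; the `eps` cutoff and the function names are likewise illustrative.

```python
import numpy as np


def prediction_std(preds):
    """Mean over keypoint coordinates of the per-coordinate std across frames."""
    preds = np.asarray(preds, dtype=np.float64)      # (frames, keypoints, 2)
    return float(preds.std(axis=0).mean())


def is_constant_pose(preds, eps=1e-4):
    """True when the model emits (nearly) the same pose for every frame."""
    return prediction_std(preds) < eps
```

A model that fails this check can still score high PCK on a near-static test set, which is why the check is paired with the mean-pose baseline rather than used alone.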
### Caveats (honest, pre-registered)
- Single subject, single room, single session (30 min), single ESP32 node —
in-domain temporal split only; nothing here speaks to cross-room or
cross-subject generalization.
- 2k windows vs the 360k-window WiFlow-STD corpus — **NOT comparable** to the
~96% in-domain measurement-(a) number, and the published 97.25% even less so.
- The scratch run's total collapse (it cannot even reach the mean pose; its
output BatchNorm/SiLU head must learn output scale from random init at lr 1e-4)
is an optimization outcome under the fixed budget, not proof the architecture
cannot learn from scratch — the pretrained-vs-scratch gap partially reflects
this conditioning advantage.
- Mixed-subcarrier frames (finding 1) mean even the "clean" windows carry ~20%
zero-padded frames; collection-side frame-type filtering should precede the
next session.
- Mean-baseline PCK is inflated by low pose variance relative to torso size
(~0.2–0.3 image units); PCK@10 (73.1%) shows the same ceiling effect at a
stricter threshold — the bar is the bar, but a livelier dataset would lower it.
## Pending
- (b) fine-tune on our ESP32 17-keypoint eval set — **MEASURED 2026-06-10/11**,
see above: no run beats the mean-pose baseline; pretraining transfers as
optimization aid only.
- (c) our internal WiFlow on their dataset (15-keypoint subset mapping) — also
affected: there is currently no validated internal pose model to compare
(the 92.9% artifact is retracted; the MM-Fi SOTA models in ADR-150 §3 are a
different input domain).
@@ -1,200 +0,0 @@
"""Shared infrastructure for the LOCAL wiflow-std benchmark scripts (ADR-152).

This module is the single canonical implementation of the helpers that were
previously copy-pasted across eval_repro.py / quantize_bench.py /
onnx_bench.py / eval_ort_accuracy.py / export_to_safetensors.py:

- ``import_upstream()`` -- sys.path setup + the models-package stub that
  works around the upstream import bug, plus the >1GB np.load mmap patch
- ``install_np_load_mmap_patch()`` -- the mmap patch on its own
- ``remap_legacy_keys()`` / ``load_remapped_state()`` -- checkpoint
  key remap for the pre-rename released checkpoint
- ``load_wiflow_model()`` -- WiFlowPoseModel from a checkpoint, eval mode
- ``set_seed()`` -- mirrors upstream run.py seeding exactly
- ``evaluate()`` -- THE canonical batch-weighted PCK/MPJPE evaluation loop
  (thresholds 0.1-0.5, upstream utils/metrics.py math); accepts either a
  torch nn.Module or an onnxruntime InferenceSession

The scripts under remote/ deploy to ruvultra as standalone single files and
therefore intentionally inline private copies of these helpers; when editing
them, treat this module as the reference implementation and keep the copies
in sync.
"""
import os
import random
import sys
import time
import types

import numpy as np
import torch

HERE = os.path.dirname(os.path.abspath(__file__))
UPSTREAM = os.path.join(HERE, "upstream")
RESULTS = os.path.join(HERE, "results")
DEFAULT_THRESHOLDS = (0.1, 0.2, 0.3, 0.4, 0.5)

# ---------------------------------------------------------------------------
# >1GB np.load mmap patch
# ---------------------------------------------------------------------------
# csi_windows.npy is ~13 GB; mmap large arrays instead of loading into RAM
# (loading it eagerly needs ~15 GB).
_np_load = np.load


def _np_load_mmap(path, *a, **kw):
    if (isinstance(path, str) and path.endswith(".npy")
            and os.path.getsize(path) > 1 << 30 and "mmap_mode" not in kw):
        kw["mmap_mode"] = "r"
    return _np_load(path, *a, **kw)


def install_np_load_mmap_patch():
    """Globally patch np.load so .npy files >1GB are mmap'd read-only.

    Idempotent. Patching the numpy module attribute is equivalent to the
    historical ``upstream_dataset.np.load = _np_load_mmap`` (dataset.np IS
    the numpy module), but works regardless of import order.
    """
    np.load = _np_load_mmap


# ---------------------------------------------------------------------------
# upstream import shim
# ---------------------------------------------------------------------------
def import_upstream(mmap_patch=True):
    """Make the upstream WiFlow-STD clone importable; returns its path.

    Upstream bug: models/__init__.py imports TemporalConvNet, which
    models/tcn.py does not define -- the package fails to import as
    published. Register a stub package so the broken __init__ never
    executes; submodules (models.pose_model etc.) still resolve via
    __path__. Idempotent.
    """
    if UPSTREAM not in sys.path:
        sys.path.insert(0, UPSTREAM)
    if "models" not in sys.modules:
        _models_pkg = types.ModuleType("models")
        _models_pkg.__path__ = [os.path.join(UPSTREAM, "models")]
        sys.modules["models"] = _models_pkg
    if mmap_patch:
        install_np_load_mmap_patch()
    return UPSTREAM


# ---------------------------------------------------------------------------
# checkpoint loading
# ---------------------------------------------------------------------------
# The released checkpoint predates the published code: modules were renamed
# att -> attention, final_conv -> decoder (param count identical, 2.23M).
LEGACY_RENAMES = {"att.": "attention.", "final_conv.": "decoder."}


def remap_legacy_keys(state):
    """Remap pre-rename state_dict keys; no-op for already-new-style keys."""
    return {next((new + k[len(old):] for old, new in LEGACY_RENAMES.items()
                  if k.startswith(old)), k): v
            for k, v in state.items()}


def load_remapped_state(path, map_location="cpu"):
    """torch.load (weights_only) + legacy key remap."""
    state = torch.load(path, map_location=map_location, weights_only=True)
    return remap_legacy_keys(state)


def load_wiflow_model(checkpoint, map_location="cpu", dropout=0.5):
    """Full-size WiFlowPoseModel from a checkpoint, strict load, eval mode."""
    import_upstream()
    from models.pose_model import WiFlowPoseModel
    model = WiFlowPoseModel(dropout=dropout)
    model.load_state_dict(load_remapped_state(checkpoint, map_location),
                          strict=True)
    model.eval()
    return model


# ---------------------------------------------------------------------------
# seeding
# ---------------------------------------------------------------------------
def set_seed(seed=42):
    # mirror upstream run.py exactly
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# ---------------------------------------------------------------------------
# THE canonical evaluation loop
# ---------------------------------------------------------------------------
def evaluate(model, loader, device=None, dtype=None, label="",
             thresholds=DEFAULT_THRESHOLDS, progress_every=50):
    """Batch-weighted PCK/MPJPE over a DataLoader (upstream metrics math).

    ``model`` may be a torch nn.Module (optionally evaluated on ``device``
    with inputs cast to ``dtype``) or an onnxruntime InferenceSession.
    Per-threshold PCK values are independent in upstream calculate_pck, so
    evaluating a superset of thresholds never changes any individual value.

    Returns {"samples", "mpjpe", "pck@10".."pck@50", "wall_seconds"}.
    """
    import_upstream()
    from utils.metrics import calculate_mpjpe, calculate_pck

    is_ort = hasattr(model, "get_inputs")  # onnxruntime InferenceSession
    if is_ort:
        inp = model.get_inputs()[0].name

        def forward(bx):
            return torch.from_numpy(model.run(None, {inp: bx.numpy()})[0])
    else:
        model.eval()

        def forward(bx):
            if device is not None:
                bx = bx.to(device)
            if dtype is not None:
                bx = bx.to(dtype)
            return model(bx).float()

    thresholds = list(thresholds)
    totals = {t: 0.0 for t in thresholds}
    total_mpe, n = 0.0, 0
    t0 = time.time()
    with torch.no_grad():
        for batch_idx, (bx, by) in enumerate(loader):
            out = forward(bx)
            if device is not None and not is_ort:
                by = by.to(device)
            mpe = calculate_mpjpe(out, by)
            pck = calculate_pck(out, by, thresholds=thresholds)
            bs = by.size(0)
            total_mpe += mpe * bs
            for t in totals:
                totals[t] += pck[t] * bs
            n += bs
            if batch_idx % progress_every == 0:
                tag = f"[{label}] " if label else ""
                pck20 = totals.get(0.2)
                pck20_str = f"pck20={pck20 / n:.4f} " if pck20 is not None else ""
                print(f" {tag}batch {batch_idx}: n={n} {pck20_str}"
                      f"mpjpe={total_mpe / n:.4f} ({time.time() - t0:.0f}s)",
                      flush=True)
    return {
        "samples": n,
        "mpjpe": total_mpe / n,
        **{f"pck@{int(t * 100)}": totals[t] / n for t in thresholds},
        "wall_seconds": time.time() - t0,
    }
@@ -1,67 +0,0 @@
"""ADR-152 edge optimization: accuracy of the ONNX fp32 and ORT-dynamic-int8
models on the same corruption-free 10k test subset used by quantize_bench.py.

The torch dynamic-int8 path quantizes nothing (no nn.Linear in the model), so
the only real int8 datapoint for the paper's "~2.2 MB int8" claim is the
onnxruntime dynamically quantized model -- this script measures what that
quantization costs in PCK/MPJPE.

Usage:
    .venv/Scripts/python.exe eval_ort_accuracy.py \
        --data-dir <preprocessed_csi_data> [--subset 10000]

Writes/merges into results/edge_optimization.json under key "onnx_accuracy".
"""
import argparse
import json
import os
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, HERE)

from _bench_common import RESULTS, evaluate  # noqa: E402
from quantize_bench import build_test_subset  # noqa: E402 (sets up upstream imports)


def evaluate_ort(sess, loader, label):
    """ORT-session evaluation via the canonical _bench_common.evaluate loop."""
    return evaluate(sess, loader, label=label)


def main():
    import onnxruntime as ort

    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", default=os.path.join(
        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
    parser.add_argument("--subset", type=int, default=10000)
    parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
    args = parser.parse_args()

    loader, _n_clean = build_test_subset(args.data_dir, args.subset)
    results = {}
    for label, fname in (("onnx_fp32", "retrained_fp32_dynamic.onnx"),
                         ("onnx_int8_ort_dynamic", "retrained_int8_ort_dynamic.onnx")):
        path = os.path.join(RESULTS, fname)
        if not os.path.exists(path):
            results[label] = {"error": f"{fname} not found; run onnx_bench.py first"}
            continue
        sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
        print(f"=== accuracy: {label} ({fname}) ===")
        results[label] = evaluate_ort(sess, loader, label)
        print(json.dumps(results[label], indent=2))

    merged = {}
    if os.path.exists(args.out):
        with open(args.out) as f:
            merged = json.load(f)
    merged["onnx_accuracy"] = results
    with open(args.out, "w") as f:
        json.dump(merged, f, indent=2)
    print(f"wrote {args.out}")


if __name__ == "__main__":
    main()
@@ -1,102 +0,0 @@
"""ADR-152 §2.2 measurement (a): reproduce WiFlow-STD (DY2434) published test metrics.
Runs the released pretrained checkpoint (upstream/best_pose_model.pth) against the
released Kaggle dataset (kaka2434/wiflow-dataset) using the upstream code path:
identical dataset class, identical file-level 70/15/15 split at seed 42, identical
PCK/MPJPE implementations (utils/metrics.py).
Published claims (README, "Setting 1 random split"):
PCK@20 97.25% | PCK@30 98.63% | PCK@40 99.16% | PCK@50 99.48% | MPJPE 0.007 m
Usage:
.venv/Scripts/python.exe eval_repro.py --data-dir <dir containing csi_windows.npy>
"""
import argparse
import json
import os
import sys
import torch
from torch.utils.data import DataLoader
from _bench_common import (UPSTREAM, evaluate, import_upstream,
load_remapped_state, set_seed)
import_upstream() # sys.path + models stub + >1GB np.load mmap patch
from dataset import PreprocessedCSIKeypointsDataset, create_preprocessed_train_val_test_loaders # noqa: E402
from models.pose_model import WiFlowPoseModel # noqa: E402
def find_data_dir(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        if "csi_windows.npy" in filenames:
            return dirpath
    return None
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--data-dir", required=True,
help="Directory containing csi_windows.npy (searched recursively)")
parser.add_argument("--checkpoint", default=os.path.join(UPSTREAM, "best_pose_model.pth"))
parser.add_argument("--batch-size", type=int, default=64)
parser.add_argument("--out", default=os.path.join(os.path.dirname(os.path.abspath(__file__)),
"results", "repro_a.json"))
args = parser.parse_args()
data_dir = args.data_dir
if not os.path.exists(os.path.join(data_dir, "csi_windows.npy")):
located = find_data_dir(data_dir)
if located is None:
sys.exit(f"csi_windows.npy not found under {data_dir}")
data_dir = located
print(f"data dir: {data_dir}")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}, torch {torch.__version__}")
set_seed(42)
dataset = PreprocessedCSIKeypointsDataset(
data_dir=data_dir, keypoint_scale=1000.0, enable_temporal_clean=True)
# split must match upstream: file-level shuffle at random_seed=42, 70/15/15
_train_loader, _val_loader, test_loader = create_preprocessed_train_val_test_loaders(
dataset=dataset, batch_size=args.batch_size, num_workers=0, random_seed=42)
model = WiFlowPoseModel(dropout=0.5).to(device)
# released checkpoint predates the published code: modules were renamed
# att -> attention, final_conv -> decoder (param count identical, 2.23M)
state = load_remapped_state(args.checkpoint, map_location=device)
model.load_state_dict(state, strict=True)
n_params = sum(p.numel() for p in model.parameters())
print(f"checkpoint: {args.checkpoint} ({n_params/1e6:.2f}M params)")
# upstream train.py evaluates with drop_last=True; we report both the full
# test set (drop_last=False) and the drop_last=True variant for exact comparability
results = {"published": {"pck@20": 0.9725, "pck@30": 0.9863, "pck@40": 0.9916,
"pck@50": 0.9948, "mpjpe": 0.007},
"params_millions": n_params / 1e6,
"data_dir": data_dir,
"device": str(device)}
print("=== test set (full, drop_last=False) ===")
results["test_full"] = evaluate(model, test_loader, device=device)
print(json.dumps(results["test_full"], indent=2))
test_loader_dl = DataLoader(test_loader.dataset, batch_size=args.batch_size,
shuffle=False, drop_last=True)
print("=== test set (drop_last=True, as upstream train.py) ===")
results["test_drop_last"] = evaluate(model, test_loader_dl, device=device)
print(json.dumps(results["test_drop_last"], indent=2))
os.makedirs(os.path.dirname(args.out), exist_ok=True)
with open(args.out, "w") as f:
json.dump(results, f, indent=2)
print(f"wrote {args.out}")
if __name__ == "__main__":
main()
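The two test-set passes in eval_repro.py differ only in whether the final partial batch is scored. A minimal sketch of that arithmetic follows; the helper name `evaluated_samples` and the test-set size are hypothetical, chosen only for illustration, and the real test-split size is not stated here.

```python
def evaluated_samples(n_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of samples a DataLoader-style batcher actually yields."""
    if drop_last:
        # The trailing partial batch is discarded entirely.
        return (n_samples // batch_size) * batch_size
    return n_samples


n_test = 54_000  # hypothetical test-set size, not taken from the dataset
for drop_last in (False, True):
    n_eval = evaluated_samples(n_test, 64, drop_last)
    print(f"drop_last={drop_last}: {n_eval} scored, {n_test - n_eval} skipped")
```

With drop_last=True at most batch_size - 1 samples are skipped, so the two variants should agree closely unless the skipped tail happens to be unusual.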
@@ -1,174 +0,0 @@
"""ADR-152 §2.2: export the retrained WiFlow-STD PyTorch checkpoint to
safetensors with tch-rs (VarStore) variable names, plus a numerical-parity
fixture for the Rust port.
Outputs (all under results/, gitignored):
retrained_wiflow_std.safetensors -- 248 f32 tensors named exactly as the
Rust WiFlowStdModel VarStore expects
(see wiflow_std/model.rs
`dump_variable_names` for the
authoritative name dump)
parity_fixture.npz -- deterministic input (seed 42,
shape (2, 540, 20), uniform [0,1]) and
the Python model's eval-mode output
parity_fixture.json -- same data as flattened f32 lists, for
the dependency-free Rust test
(tests/test_wiflow_std_parity.rs)
PyTorch -> tch key mapping (derived from the VarStore dump, not guessed):
tcn.network.{i}.conv1_group.weight -> tcn{i}.conv1_group.weight
tcn.network.{i}.bn*_{group,pw}.<leaf> -> tcn{i}.bn*_{group,pw}.<leaf>
tcn.network.{i}.downsample.0.weight -> tcn{i}.ds_conv.weight
tcn.network.{i}.downsample.1.<leaf> -> tcn{i}.ds_bn.<leaf>
up.block.{0,1,4,5,8,9}.<leaf> -> conv_in.{conv1,bn1,conv2,bn2,conv3,bn3}.<leaf>
up.downsample.{0,1}.<leaf> -> conv_in.{ds_conv,ds_bn}.<leaf>
residual_blocks.{i}.block.{...}.<leaf> -> conv{i}.{conv1..bn3}.<leaf>
residual_blocks.{i}.downsample.{0,1} -> conv{i}.{ds_conv,ds_bn}
attention.{width,height}_axis.qkv_transform.weight
-> attention.{width,height}.qkv.weight
attention.{width,height}_axis.bn_* -> attention.{width,height}.bn_*
decoder.{0,1,3,4}.<leaf> -> {dec_conv1,dec_bn1,dec_conv2,dec_bn2}.<leaf>
*.num_batches_tracked -> dropped (tch BatchNorm has no such buffer)
Legacy upstream names (att. -> attention., final_conv. -> decoder.) are
remapped first, exactly as eval_repro.py does for the released checkpoint.
Usage:
.venv/Scripts/python.exe export_to_safetensors.py
"""
import json
import os
import re
import numpy as np
import torch
from safetensors.torch import save_file
from _bench_common import RESULTS, import_upstream, remap_legacy_keys
import_upstream() # sys.path + models stub
from models.pose_model import WiFlowPoseModel # noqa: E402
CHECKPOINT = os.path.join(RESULTS, "retrained_best_pose_model.pth")
# Sequential index -> tch sub-name inside one ConvBlock1/AsymmetricConvBlock:
# [Conv2d(0), BN(1), SiLU(2), Dropout2d(3), Conv2d(4), BN(5), SiLU(6),
# Dropout2d(7), Conv2d(8), BN(9)]
_BLOCK_IDX = {"0": "conv1", "1": "bn1", "4": "conv2", "5": "bn2",
"8": "conv3", "9": "bn3"}
_DS_IDX = {"0": "ds_conv", "1": "ds_bn"}
_DECODER_IDX = {"0": "dec_conv1", "1": "dec_bn1", "3": "dec_conv2",
"4": "dec_bn2"}
def _conv_block(new_prefix: str, rest: str) -> str:
    m = re.fullmatch(r"block\.(\d+)\.(.+)", rest)
    if m:
        return f"{new_prefix}.{_BLOCK_IDX[m.group(1)]}.{m.group(2)}"
    m = re.fullmatch(r"downsample\.(\d+)\.(.+)", rest)
    if m:
        return f"{new_prefix}.{_DS_IDX[m.group(1)]}.{m.group(2)}"
    raise KeyError(f"unmapped conv-block key: {new_prefix} / {rest}")
def map_key(key: str) -> str:
    """Map one PyTorch state_dict key to the tch VarStore name."""
    m = re.fullmatch(r"tcn\.network\.(\d+)\.(.+)", key)
    if m:
        i, rest = m.groups()
        rest = (rest.replace("downsample.0.", "ds_conv.")
                    .replace("downsample.1.", "ds_bn."))
        return f"tcn{i}.{rest}"
    m = re.fullmatch(r"up\.(.+)", key)
    if m:
        return _conv_block("conv_in", m.group(1))
    m = re.fullmatch(r"residual_blocks\.(\d+)\.(.+)", key)
    if m:
        return _conv_block(f"conv{m.group(1)}", m.group(2))
    m = re.fullmatch(r"attention\.(width|height)_axis\.(.+)", key)
    if m:
        axis, rest = m.groups()
        rest = rest.replace("qkv_transform.", "qkv.")
        return f"attention.{axis}.{rest}"
    m = re.fullmatch(r"decoder\.(\d+)\.(.+)", key)
    if m:
        return f"{_DECODER_IDX[m.group(1)]}.{m.group(2)}"
    raise KeyError(f"unmapped checkpoint key: {key}")
def main():
state = torch.load(CHECKPOINT, map_location="cpu", weights_only=True)
if not isinstance(state, dict) or "tcn.network.0.conv1_group.weight" not in {
k for k in state
} | {k.replace("att.", "attention.") for k in state}:
# tolerate trainer wrappers like {"model_state_dict": ...}
for wrapper in ("model_state_dict", "state_dict", "model"):
if isinstance(state, dict) and wrapper in state:
state = state[wrapper]
break
# Legacy upstream names predate the published code (_bench_common).
state = remap_legacy_keys(state)
mapped = {}
dropped = 0
for k, v in state.items():
if k.endswith("num_batches_tracked"):
dropped += 1
continue
tch_key = map_key(k)
if tch_key in mapped:
raise KeyError(f"duplicate mapped key: {k} -> {tch_key}")
mapped[tch_key] = v.detach().to(torch.float32).contiguous()
n_params = sum(v.numel() for k, v in mapped.items()
if "running_" not in k)
print(f"checkpoint tensors: {len(state)} "
f"(dropped {dropped} num_batches_tracked)")
print(f"mapped tensors: {len(mapped)}, "
f"non-buffer params: {n_params/1e6:.6f}M")
assert len(mapped) == 248, f"expected 248 tch variables, got {len(mapped)}"
assert n_params == 2_225_042, f"param count mismatch: {n_params}"
st_path = os.path.join(RESULTS, "retrained_wiflow_std.safetensors")
save_file(mapped, st_path)
print(f"wrote {st_path}")
# ---- parity fixture --------------------------------------------------
model = WiFlowPoseModel(dropout=0.5)
model.load_state_dict(state, strict=True)
model.eval()
gen = torch.Generator().manual_seed(42)
x = torch.rand(2, 540, 20, generator=gen, dtype=torch.float32)
with torch.no_grad():
y = model(x)
print(f"fixture input {tuple(x.shape)} -> output {tuple(y.shape)}, "
f"output range [{y.min().item():.6f}, {y.max().item():.6f}]")
np.savez(os.path.join(RESULTS, "parity_fixture.npz"),
input=x.numpy(), output=y.numpy())
fixture = {
"seed": 42,
"input_shape": list(x.shape),
"input": x.flatten().tolist(),
"output_shape": list(y.shape),
"output": y.flatten().tolist(),
}
json_path = os.path.join(RESULTS, "parity_fixture.json")
with open(json_path, "w") as f:
json.dump(fixture, f)
print(f"wrote {os.path.join(RESULTS, 'parity_fixture.npz')}")
print(f"wrote {json_path}")
if __name__ == "__main__":
main()
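The PyTorch -> tch key mapping in export_to_safetensors.py can be checked by hand on a few keys. The sketch below is a reduced, hypothetical re-statement (`map_key_sketch`) covering only the tcn, attention and decoder rules from the docstring table; it is not the script's full `map_key`, and the sample keys are illustrative rather than read from a real checkpoint.

```python
import re

# Decoder Sequential index -> tch name, as listed in the docstring table.
_DECODER_IDX = {"0": "dec_conv1", "1": "dec_bn1", "3": "dec_conv2", "4": "dec_bn2"}


def map_key_sketch(key: str) -> str:
    """Reduced mapping: tcn.*, attention.* and decoder.* keys only."""
    m = re.fullmatch(r"tcn\.network\.(\d+)\.(.+)", key)
    if m:
        i, rest = m.groups()
        rest = (rest.replace("downsample.0.", "ds_conv.")
                    .replace("downsample.1.", "ds_bn."))
        return f"tcn{i}.{rest}"
    m = re.fullmatch(r"attention\.(width|height)_axis\.(.+)", key)
    if m:
        axis, rest = m.groups()
        rest = rest.replace("qkv_transform.", "qkv.")
        return f"attention.{axis}.{rest}"
    m = re.fullmatch(r"decoder\.(\d+)\.(.+)", key)
    if m:
        return f"{_DECODER_IDX[m.group(1)]}.{m.group(2)}"
    # Unknown keys fail loudly instead of being silently passed through.
    raise KeyError(f"unmapped checkpoint key: {key}")


for k in ("tcn.network.3.downsample.0.weight",
          "attention.width_axis.qkv_transform.weight",
          "decoder.3.weight"):
    print(f"{k} -> {map_key_sketch(k)}")
```

Raising on any unmatched key, together with the script's duplicate-target check, is what lets the 248-tensor assertion act as a real guard rather than a formality.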
@@ -1,148 +0,0 @@
"""Regenerate results/nan_windows_mask.npy + results/big_windows_mask.npy by
scanning a PRISTINE kagglehub download of the WiFlow-STD dataset
(kaka2434/wiflow-dataset v1, csi_windows.npy, 360,000 windows of 540x20).
============================ READ THIS FIRST ===============================
This script MUST be run against an UNCLEANED copy of the dataset.
remote/clean_v2.py (and its predecessor clean_nan.py) repair the dataset by
zeroing the corrupted windows IN PLACE, with no backup. A cleaned copy
contains no non-finite values and no out-of-range amplitudes, so on a cleaned
copy this scan produces ALL-FALSE masks -- silently wrong ground truth. The
script errors out loudly in that case (see the sanity check in main()).
That irreversibility is exactly why the two committed mask files under
results/ (gitignore-negated) are the canonical ground truth: once a download
has been cleaned, the masks can NEVER be regenerated from it. Only run this
on a fresh `kagglehub.dataset_download("kaka2434/wiflow-dataset")`.
============================================================================
Criteria (per window; mirrors the original 2026-06-10 scan and the
remote/clean_v2.py repair criteria):
nan mask: any non-finite value (NaN/Inf) anywhere in the 540x20 window
big mask: max |finite value| > 1.5 (the data is otherwise [0,1]-normalized;
the corrupted files contain garbage up to 3.4e38, float32 max)
Expected result on the pristine Kaggle download (RESULTS.md defect 5):
nan: 9,070 True | big: 9,072 True | union: 9,072 -- all windows in dataset
files 487-499 (the final 13 files), window indices 350,922-359,999.
Usage:
PYTHONUTF8=1 .venv/Scripts/python.exe generate_corruption_masks.py \
[--data-dir <dir containing csi_windows.npy>] [--out-dir results]
"""
import argparse
import os
import sys
import numpy as np
HERE = os.path.dirname(os.path.abspath(__file__))
RESULTS = os.path.join(HERE, "results")
EXPECTED = {"nan": 9070, "big": 9072, "union": 9072,
"files": (487, 499), "windows": (350922, 359999)}
def scan(csi_path, chunk=4000):
    """Chunked scan of the (mmap'd) windows array; returns (nan_mask, big_mask)."""
    csi = np.load(csi_path, mmap_mode="r")
    n = len(csi)
    nan_mask = np.zeros(n, dtype=bool)
    big_mask = np.zeros(n, dtype=bool)
    for i in range(0, n, chunk):
        block = np.asarray(csi[i:i + chunk])
        finite = np.isfinite(block)
        nan_mask[i:i + chunk] = (~finite).any(axis=(1, 2))
        big_mask[i:i + chunk] = (
            np.abs(np.where(finite, block, 0)).max(axis=(1, 2)) > 1.5)
        if (i // chunk) % 10 == 0:
            print(f" scanned {min(i + chunk, n):,}/{n:,} windows "
                  f"(nan={int(nan_mask.sum()):,} big={int(big_mask.sum()):,})",
                  flush=True)
    return nan_mask, big_mask
def describe_files(data_dir, mask):
    """Map marked windows to dataset file indices via window_info.npz."""
    info = os.path.join(data_dir, "window_info.npz")
    if not os.path.exists(info):
        return None
    w2f = np.load(info)["window_to_file"]
    return np.unique(w2f[mask])
def main():
parser = argparse.ArgumentParser(
description="Regenerate the corruption masks from a PRISTINE "
"(uncleaned) kagglehub download. See module docstring.")
parser.add_argument("--data-dir", default=os.path.join(
os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
"wiflow-dataset", "versions", "1", "preprocessed_csi_data"),
help="Directory containing csi_windows.npy (PRISTINE copy)")
parser.add_argument("--out-dir", default=RESULTS,
help="Where to write the two .npy masks")
parser.add_argument("--chunk", type=int, default=4000,
help="Windows per scan chunk (memory/speed tradeoff)")
args = parser.parse_args()
csi_path = os.path.join(args.data_dir, "csi_windows.npy")
if not os.path.exists(csi_path):
sys.exit(f"csi_windows.npy not found in {args.data_dir}")
print(f"scanning {csi_path} (chunk={args.chunk}) ...")
nan_mask, big_mask = scan(csi_path, args.chunk)
union = nan_mask | big_mask
print(f"nan: {int(nan_mask.sum()):,} | big: {int(big_mask.sum()):,} | "
f"union: {int(union.sum()):,} of {len(union):,} windows")
# ---- sanity check: an all-False result means a CLEANED copy ------------
if not union.any():
sys.exit(
"ERROR: scan found ZERO corrupted windows.\n"
"\n"
"The pristine Kaggle download (kaka2434/wiflow-dataset v1) is "
"known to contain\n"
"9,072 corrupted windows (NaN/Inf + amplitudes up to 3.4e38) in "
"dataset files\n"
"487-499 (RESULTS.md, reproducibility defect 5). Finding none "
"means this copy\n"
"has almost certainly already been repaired by remote/clean_v2.py "
"(or clean_nan.py),\n"
"which zeroes the corrupted windows IN PLACE -- after that the "
"corruption evidence\n"
"is gone and the masks CANNOT be regenerated from this copy.\n"
"\n"
"Refusing to overwrite the committed ground-truth masks with "
"all-False ones.\n"
"Re-download the dataset (kagglehub.dataset_download("
"'kaka2434/wiflow-dataset'))\n"
"and point --data-dir at the fresh, uncleaned copy.")
files = describe_files(args.data_dir, union)
if files is not None:
print(f"marked windows span dataset files {files.min()}-{files.max()}: "
f"{files.tolist()}")
lo, hi = EXPECTED["files"]
if files.min() != lo or files.max() != hi:
print(f"WARNING: expected marked files exactly {lo}-{hi} "
f"(the pristine v1 download); got {files.min()}-{files.max()}. "
f"Different dataset version, or a partially cleaned copy?")
for name, mask, exp in (("nan", nan_mask, EXPECTED["nan"]),
("big", big_mask, EXPECTED["big"])):
if int(mask.sum()) != exp:
print(f"WARNING: {name} mask has {int(mask.sum()):,} True windows; "
f"the pristine v1 download yields {exp:,}.")
os.makedirs(args.out_dir, exist_ok=True)
for name, mask in (("nan_windows_mask.npy", nan_mask),
("big_windows_mask.npy", big_mask)):
out = os.path.join(args.out_dir, name)
np.save(out, mask)
print(f"wrote {out} ({int(mask.sum()):,} True)")
if __name__ == "__main__":
main()
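The two mask criteria in generate_corruption_masks.py are easy to confuse, so here is a dependency-free sketch of them on toy windows. `classify_window` is a hypothetical helper written for illustration; the script itself applies the same tests with NumPy over 540x20 windows.

```python
import math


def classify_window(window, big_threshold=1.5):
    """Return (is_nan, is_big) for one window given as nested lists of floats."""
    values = [v for row in window for v in row]
    # nan mask: any non-finite value (NaN or +/-Inf) anywhere in the window.
    is_nan = any(not math.isfinite(v) for v in values)
    # big mask: largest |finite value| above the threshold; non-finite
    # entries are ignored here, mirroring np.where(finite, block, 0).
    finite_abs = [abs(v) for v in values if math.isfinite(v)]
    is_big = bool(finite_abs) and max(finite_abs) > big_threshold
    return is_nan, is_big


toy_windows = {
    "clean": [[0.1, 0.9], [0.5, 0.0]],
    "nan_only": [[float("nan"), 0.2], [0.3, 0.4]],
    "big_only": [[2.0, 0.1], [0.2, 0.3]],
    "garbage": [[3.4e38, 0.2], [float("inf"), 0.1]],
}
for name, window in toy_windows.items():
    print(name, classify_window(window))
```

The expected counts (nan 9,070, big 9,072, union 9,072) imply that every nan window is also a big window, and that two windows exceed the amplitude threshold without containing any non-finite value, like the "big_only" toy case.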
-220
@@ -1,220 +0,0 @@
"""ADR-152 edge optimization: ONNX export + onnxruntime CPU benchmark for the
retrained WiFlow-STD checkpoint.
- Exports fp32 to ONNX. The axial attention reshapes with python ints taken
from tensor.size() (view(N*W, C, H)), so a traced graph bakes the batch
size; we first try a dynamic-batch export and verify it actually works at
batch sizes 1/2/64 -- if not, we fall back to fixed-batch exports.
- Verifies output parity vs torch on the stored fixture
(results/parity_fixture.npz, batch 2, seed 42): max abs diff < 1e-4.
- Measures onnxruntime CPU latency at batch 1 and 64 (median of N runs).
- Supplementary: onnxruntime dynamic int8 quantization of the exported model
(weight size datapoint for the paper's "~2.2 MB int8" claim).
Usage:
.venv/Scripts/python.exe onnx_bench.py
Writes/merges into results/edge_optimization.json under key "onnx".
"""
import json
import os
import platform
import statistics
import time
import traceback
import numpy as np
import torch
from _bench_common import RESULTS, import_upstream, load_wiflow_model
import_upstream() # sys.path + models stub + >1GB np.load mmap patch
CHECKPOINT = os.path.join(RESULTS, "retrained_best_pose_model.pth")
OUT_JSON = os.path.join(RESULTS, "edge_optimization.json")
def load_fp32_model():
    return load_wiflow_model(CHECKPOINT)
def try_export(model, path, batch, dynamic, opset=17):
    """Returns (ok, exporter_used, error)."""
    x = torch.rand(batch, 540, 20)
    attempts = []
    if dynamic:
        attempts.append(("dynamo", dict(dynamo=True,
                                        dynamic_shapes={"x": {0: "batch"}})))
        attempts.append(("torchscript", dict(dynamo=False,
                                             dynamic_axes={"input": {0: "batch"},
                                                           "output": {0: "batch"}})))
    else:
        attempts.append(("torchscript", dict(dynamo=False)))
        attempts.append(("dynamo", dict(dynamo=True)))
    last_err = None
    for name, kw in attempts:
        try:
            with torch.no_grad():
                torch.onnx.export(model, (x,), path, opset_version=opset,
                                  input_names=["input"], output_names=["output"],
                                  **kw)
            return True, name, None
        except Exception as e:  # noqa: BLE001
            last_err = f"{name}: {type(e).__name__}: {e}"
            traceback.print_exc()
    return False, None, last_err
def ort_session(path):
    import onnxruntime as ort
    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
def ort_run(sess, x):
    inp = sess.get_inputs()[0].name
    return sess.run(None, {inp: x})[0]
def bench_ort(sess, batch, n_runs):
    rng = np.random.default_rng(123)
    x = rng.random((batch, 540, 20), dtype=np.float32)
    for _ in range(max(5, n_runs // 10)):  # warmup
        ort_run(sess, x)
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        ort_run(sess, x)
        times.append(time.perf_counter() - t0)
    med = statistics.median(times)
    return {
        "batch_size": batch,
        "runs": n_runs,
        "median_ms_per_batch": med * 1e3,
        "median_ms_per_window": med * 1e3 / batch,
        "windows_per_second": batch / med,
    }
def main():
import argparse
parser = argparse.ArgumentParser(
description="ONNX export + onnxruntime CPU benchmark for the "
"retrained WiFlow-STD checkpoint (no options; see "
"module docstring). NB: the published "
"retrained_fp32_dynamic.onnx came from the TorchScript "
"exporter; on newer torch the dynamo attempt may succeed "
"first and produce a different (external-data) artifact.")
parser.parse_args()
import onnxruntime
model = load_fp32_model()
results = {
"env": {
"torch": torch.__version__,
"onnxruntime": onnxruntime.__version__,
"platform": platform.platform(),
},
}
fixture = np.load(os.path.join(RESULTS, "parity_fixture.npz"))
fx, fy = fixture["input"], fixture["output"] # (2,540,20) -> (2,15,2)
# ---- export: dynamic batch first, fall back to fixed --------------------
dyn_path = os.path.join(RESULTS, "retrained_fp32_dynamic.onnx")
ok, exporter, err = try_export(model, dyn_path, batch=2, dynamic=True)
dynamic_works = False
if ok:
# verify the dynamic graph really runs at other batch sizes
try:
sess = ort_session(dyn_path)
for b in (1, 2, 64):
y = ort_run(sess, np.zeros((b, 540, 20), dtype=np.float32))
assert y.shape == (b, 15, 2), y.shape
dynamic_works = True
except Exception as e: # noqa: BLE001
print(f"dynamic-batch model does not generalize: {e}")
sessions = {}
if dynamic_works:
results["export"] = {"mode": "dynamic-batch", "exporter": exporter,
"file": os.path.basename(dyn_path),
"size_mb": os.path.getsize(dyn_path) / 1e6}
sess = ort_session(dyn_path)
sessions = {1: sess, 2: sess, 64: sess}
print(f"dynamic-batch export OK via {exporter}")
else:
results["export"] = {"mode": "fixed-batch", "fallback_reason": err,
"files": {}}
for b in (1, 2, 64):
p = os.path.join(RESULTS, f"retrained_fp32_b{b}.onnx")
ok, exporter, err = try_export(model, p, batch=b, dynamic=False)
if not ok:
results["export"]["files"][str(b)] = {"error": err}
print(f"EXPORT FAILED at batch {b}: {err}")
continue
results["export"]["files"][str(b)] = {
"exporter": exporter, "file": os.path.basename(p),
"size_mb": os.path.getsize(p) / 1e6}
sessions[b] = ort_session(p)
print(f"fixed-batch {b} export OK via {exporter}")
# ---- parity vs torch on the fixture -------------------------------------
if 2 in sessions:
y_ort = ort_run(sessions[2], fx)
with torch.no_grad():
y_torch = model(torch.from_numpy(fx)).numpy()
results["parity"] = {
"fixture": "results/parity_fixture.npz (batch 2, seed 42)",
"max_abs_diff_vs_stored_fixture": float(np.abs(y_ort - fy).max()),
"max_abs_diff_vs_torch_now": float(np.abs(y_ort - y_torch).max()),
"pass_lt_1e-4": bool(np.abs(y_ort - y_torch).max() < 1e-4),
}
print("parity:", json.dumps(results["parity"], indent=2))
# ---- latency -------------------------------------------------------------
results["latency"] = {}
if 1 in sessions:
results["latency"]["batch1"] = bench_ort(sessions[1], 1, 100)
print(f"ORT batch 1: {results['latency']['batch1']['median_ms_per_window']:.2f} ms/window")
if 64 in sessions:
results["latency"]["batch64"] = bench_ort(sessions[64], 64, 30)
print(f"ORT batch 64: {results['latency']['batch64']['median_ms_per_window']:.3f} ms/window")
# ---- supplementary: ORT dynamic int8 (size datapoint for the 2.2MB claim)
src = (dyn_path if dynamic_works
else os.path.join(RESULTS, "retrained_fp32_b1.onnx"))
if os.path.exists(src):
try:
from onnxruntime.quantization import QuantType, quantize_dynamic
q_path = os.path.join(RESULTS, "retrained_int8_ort_dynamic.onnx")
quantize_dynamic(src, q_path, weight_type=QuantType.QInt8)
entry = {"file": os.path.basename(q_path),
"size_mb": os.path.getsize(q_path) / 1e6}
try:
qs = ort_session(q_path)
yq = ort_run(qs, fx[:1] if not dynamic_works else fx)
ref = fy[:1] if not dynamic_works else fy
entry["runs"] = True
entry["max_abs_diff_vs_fp32_fixture"] = float(np.abs(yq - ref).max())
except Exception as e: # noqa: BLE001
entry["runs"] = False
entry["run_error"] = f"{type(e).__name__}: {e}"
results["ort_int8_dynamic_supplementary"] = entry
print("ORT int8:", json.dumps(entry, indent=2))
except Exception as e: # noqa: BLE001
results["ort_int8_dynamic_supplementary"] = {
"error": f"{type(e).__name__}: {e}"}
merged = {}
if os.path.exists(OUT_JSON):
with open(OUT_JSON) as f:
merged = json.load(f)
merged["onnx"] = results
with open(OUT_JSON, "w") as f:
json.dump(merged, f, indent=2)
print(f"wrote {OUT_JSON}")
if __name__ == "__main__":
main()
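The latency numbers from onnx_bench.py come from a warmup-then-median timing loop. The sketch below reproduces that pattern with the standard library only; `bench` and `dummy_workload` are hypothetical stand-ins for `bench_ort` and an ONNX Runtime session call, so the timings it prints say nothing about the model.

```python
import statistics
import time


def bench(fn, batch, n_runs):
    """Time fn() n_runs times after a warmup; report median-based statistics."""
    for _ in range(max(5, n_runs // 10)):  # warmup runs are not timed
        fn()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    med = statistics.median(times)  # the median resists occasional slow outliers
    return {
        "batch_size": batch,
        "runs": n_runs,
        "median_ms_per_batch": med * 1e3,
        "median_ms_per_window": med * 1e3 / batch,
        "windows_per_second": batch / med,
    }


def dummy_workload():
    # Stand-in for one inference call on a batch of 64 windows.
    sum(i * i for i in range(20_000))


stats = bench(dummy_workload, batch=64, n_runs=30)
print(f"{stats['median_ms_per_window']:.4f} ms/window, "
      f"{stats['windows_per_second']:.0f} windows/s")
```

The median is used rather than the mean so that a single scheduler hiccup does not skew the result. Note that the per-window figure at batch 64 is a throughput number, not a single-window latency.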
-228
@@ -1,228 +0,0 @@
"""ADR-152 "optimize beyond SOTA": edge-optimization benchmark for the
retrained WiFlow-STD checkpoint (results/retrained_best_pose_model.pth,
~96% PCK@20, fp32 params 2,225,042).
Measures, for fp32 / fp16 / dynamic-int8 torch variants:
(a) serialized state_dict size on disk,
(b) CPU inference latency per window at batch 1 and batch 64
(median of repeated runs, this Windows box),
(c) accuracy (PCK@20/50 + MPJPE, upstream metrics) on a corruption-free
random subset of the seed-42 file-level 70/15/15 test split
(same split as eval_repro.py; corrupted windows in dataset files 487-499 excluded via
results/nan_windows_mask.npy | results/big_windows_mask.npy).
Also verifies the paper's "~2.2 MB int8" size claim: reports which layer
types torch dynamic quantization actually converts (the model contains NO
nn.Linear -- it is Conv1d/Conv2d/BatchNorm only) and the real on-disk size.
Usage:
.venv/Scripts/python.exe quantize_bench.py \
--data-dir C:/Users/ruv/.cache/kagglehub/datasets/kaka2434/wiflow-dataset/versions/1/preprocessed_csi_data \
[--subset 10000] [--skip-accuracy]
Writes/merges into results/edge_optimization.json under key "torch".
"""
import argparse
import json
import os
import platform
import statistics
import time
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from _bench_common import HERE, RESULTS, evaluate, import_upstream, load_wiflow_model
import_upstream() # sys.path + models stub + >1GB np.load mmap patch
from dataset import ( # noqa: E402
PreprocessedCSIKeypointsDataset,
create_preprocessed_train_val_test_loaders,
)
CHECKPOINT = os.path.join(RESULTS, "retrained_best_pose_model.pth")
def load_fp32_model():
    # legacy upstream key remap inside is a harmless no-op on this checkpoint
    return load_wiflow_model(CHECKPOINT)
def state_dict_size_bytes(model, path):
    torch.save(model.state_dict(), path)
    return os.path.getsize(path)
def bench_latency(model, batch_size, n_runs, dtype=torch.float32):
    gen = torch.Generator().manual_seed(123)
    x = torch.rand(batch_size, 540, 20, generator=gen).to(dtype)
    with torch.no_grad():
        for _ in range(max(5, n_runs // 10)):  # warmup
            model(x)
        times = []
        for _ in range(n_runs):
            t0 = time.perf_counter()
            model(x)
            times.append(time.perf_counter() - t0)
    med = statistics.median(times)
    return {
        "batch_size": batch_size,
        "runs": n_runs,
        "median_ms_per_batch": med * 1e3,
        "median_ms_per_window": med * 1e3 / batch_size,
        "windows_per_second": batch_size / med,
    }
def build_test_subset(data_dir, subset_size, batch_size=64):
    """Seed-42 file-level 70/15/15 test split (exactly as eval_repro.py),
    minus corrupted windows, then a seed-42 random subset.
    Returns (loader, n_clean), where n_clean is the clean test-window count
    before subsetting."""
    dataset = PreprocessedCSIKeypointsDataset(
        data_dir=data_dir, keypoint_scale=1000.0, enable_temporal_clean=True)
    _tr, _va, test_loader = create_preprocessed_train_val_test_loaders(
        dataset=dataset, batch_size=batch_size, num_workers=0, random_seed=42)
    test_indices = np.asarray(test_loader.dataset.indices)
    corrupted = (np.load(os.path.join(RESULTS, "nan_windows_mask.npy"))
                 | np.load(os.path.join(RESULTS, "big_windows_mask.npy")))
    clean = test_indices[~corrupted[test_indices]]
    # Record the clean total before subsetting; returning len(clean) after the
    # subset step would report the subset size as "clean_test_total".
    n_clean = len(clean)
    print(f"test split: {len(test_indices)} windows, "
          f"{len(test_indices) - n_clean} corrupted excluded, "
          f"{n_clean} clean")
    if subset_size and subset_size < n_clean:
        rng = np.random.default_rng(42)
        clean = np.sort(rng.choice(clean, size=subset_size, replace=False))
    subset = torch.utils.data.Subset(dataset, clean.tolist())
    loader = DataLoader(subset, batch_size=batch_size, shuffle=False,
                        num_workers=0)
    return loader, n_clean
def quantize_int8_dynamic(fp32_model):
    """torch.ao.quantization.quantize_dynamic on Linear/Conv where supported.
    Returns (model, report) where report documents what actually quantized."""
    qmodel = torch.ao.quantization.quantize_dynamic(
        fp32_model, {nn.Linear, nn.Conv1d, nn.Conv2d}, dtype=torch.qint8)
    quantized, total_params, quant_params = [], 0, 0
    for name, mod in qmodel.named_modules():
        cls = type(mod).__module__ + "." + type(mod).__name__
        if "quantized" in cls:
            w = mod.weight() if callable(getattr(mod, "weight", None)) else None
            numel = w.numel() if w is not None else 0
            quant_params += numel
            quantized.append({"module": name, "class": cls, "params": numel})
    for p in fp32_model.parameters():
        total_params += p.numel()
    n_linear = sum(isinstance(m, nn.Linear) for m in fp32_model.modules())
    n_conv1d = sum(isinstance(m, nn.Conv1d) for m in fp32_model.modules())
    n_conv2d = sum(isinstance(m, nn.Conv2d) for m in fp32_model.modules())
    report = {
        "eligible_module_counts": {
            "nn.Linear": n_linear, "nn.Conv1d": n_conv1d, "nn.Conv2d": n_conv2d},
        "modules_actually_quantized": quantized,
        "n_modules_quantized": len(quantized),
        "params_total": total_params,
        "params_quantized": quant_params,
        "params_quantized_fraction": quant_params / total_params,
    }
    return qmodel, report
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--data-dir", default=os.path.join(
os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
"wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
parser.add_argument("--subset", type=int, default=10000)
parser.add_argument("--runs-b1", type=int, default=100)
parser.add_argument("--runs-b64", type=int, default=30)
parser.add_argument("--skip-accuracy", action="store_true")
parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
args = parser.parse_args()
torch.manual_seed(42)
results = {
"env": {
"torch": torch.__version__,
"platform": platform.platform(),
"processor": platform.processor(),
"num_threads": torch.get_num_threads(),
"checkpoint": os.path.relpath(CHECKPOINT, HERE),
},
"variants": {},
}
# ---- build variants ---------------------------------------------------
fp32 = load_fp32_model()
n_params = sum(p.numel() for p in fp32.parameters())
results["env"]["params"] = n_params
print(f"fp32 model: {n_params:,} params")
fp16 = load_fp32_model().half()
int8, q_report = quantize_int8_dynamic(load_fp32_model())
results["int8_dynamic_quant_report"] = q_report
print(f"int8 dynamic: {q_report['n_modules_quantized']} modules quantized, "
f"{q_report['params_quantized_fraction']*100:.1f}% of params")
variants = {
"fp32": (fp32, torch.float32, "retrained_fp32_resaved.pth"),
"fp16": (fp16, torch.float16, "retrained_fp16.pth"),
"int8_dynamic": (int8, torch.float32, "retrained_int8_dynamic.pth"),
}
# ---- (a) size + (b) latency -------------------------------------------
for name, (model, dtype, fname) in variants.items():
path = os.path.join(RESULTS, fname)
size = state_dict_size_bytes(model, path)
print(f"\n=== {name}: {size/1e6:.3f} MB on disk ({fname}) ===")
lat1 = bench_latency(model, 1, args.runs_b1, dtype)
lat64 = bench_latency(model, 64, args.runs_b64, dtype)
print(f" batch 1: {lat1['median_ms_per_window']:.2f} ms/window "
f"({lat1['windows_per_second']:.0f}/s)")
print(f" batch 64: {lat64['median_ms_per_window']:.3f} ms/window "
f"({lat64['windows_per_second']:.0f}/s)")
results["variants"][name] = {
"file": fname,
"size_bytes": size,
"size_mb": size / 1e6,
"latency_batch1": lat1,
"latency_batch64": lat64,
}
# ---- (c) accuracy ------------------------------------------------------
if not args.skip_accuracy:
loader, n_clean = build_test_subset(args.data_dir, args.subset)
results["accuracy_subset"] = {
"description": "seed-42 file-level 70/15/15 test split, corrupted "
"windows (files 487-499) excluded, seed-42 random "
"subset",
"subset_size": min(args.subset, n_clean) if args.subset else n_clean,
"clean_test_total": n_clean,
}
for name, (model, dtype, _f) in variants.items():
print(f"\n=== accuracy: {name} ===")
results["variants"][name]["accuracy"] = evaluate(
model, loader, dtype=dtype, label=name)
print(json.dumps(results["variants"][name]["accuracy"], indent=2))
# ---- merge into edge_optimization.json ---------------------------------
merged = {}
if os.path.exists(args.out):
with open(args.out) as f:
merged = json.load(f)
merged["torch"] = results
with open(args.out, "w") as f:
json.dump(merged, f, indent=2)
print(f"\nwrote {args.out}")
if __name__ == "__main__":
main()
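The "~2.2 MB int8" claim that quantize_bench.py checks can be sanity-tested with simple arithmetic on the parameter count from the docstring (2,225,042). The sketch below computes raw weight bytes only, under the assumption that every parameter is stored at the given width. `raw_weight_mb` is a hypothetical helper, and the real on-disk sizes measured by the script will differ because of BatchNorm buffers, serialization overhead, and which layers dynamic quantization actually converts.

```python
N_PARAMS = 2_225_042  # fp32 parameter count stated in the module docstring

# Assumed storage width per parameter, in bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def raw_weight_mb(n_params: int, bytes_per_param: int) -> float:
    """Raw weight payload in MB (1 MB = 1e6 bytes, as in the script's size_mb)."""
    return n_params * bytes_per_param / 1e6


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {raw_weight_mb(N_PARAMS, width):.3f} MB")
```

Under this assumption the int8 figure is about 2.2 MB only if every weight is quantized, which is why the script also reports which module types were really converted.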
-14
@@ -1,14 +0,0 @@
import os
import numpy as np
# WARNING: repairs the dataset IN PLACE (mmap_mode='r+'), with no backup.
d = os.path.expanduser('~/wiflow-std-bench/preprocessed_csi_data')
csi = np.load(os.path.join(d, 'csi_windows.npy'), mmap_mode='r+')
zeroed = 0
chunk = 4000
for i in range(0, len(csi), chunk):
    block = csi[i:i+chunk]  # memmap view: writes go straight to the file
    finite = np.isfinite(block)
    # corrupted = any non-finite value, or max |finite value| > 1.5
    bad = (~finite).any(axis=(1, 2)) | (np.abs(np.where(finite, block, 0)).max(axis=(1, 2)) > 1.5)
    if bad.any():
        block[bad] = 0.0
        zeroed += int(bad.sum())
csi.flush()
print(f'zeroed {zeroed} corrupted windows entirely')
@@ -1,112 +0,0 @@
"""Evaluate the retrained WiFlow-STD checkpoint (ADR-152 §2.2a fallback).
Scores the model produced by run.py (train_output/best_pose_model.pth or similar)
on the seed-42 test split: full test set AND NaN-free subset (excluding windows
that were zero-filled by clean_nan.py — file indices 487-499).
NOTE: deployed to ruvultra (~/wiflow-std-bench) as a standalone single file,
so it deliberately inlines its helpers. The reference implementations (upstream
import shim, >1GB np.load mmap patch, key-remap loader, canonical evaluate
loop) live in benchmarks/wiflow-std/_bench_common.py — keep copies in sync.
"""
import json, os, random, sys
import numpy as np
import torch
from torch.utils.data import DataLoader, Subset
# csi_windows.npy is ~15 GB; mmap large arrays instead of eagerly loading
# them into RAM (same patch as _bench_common._np_load_mmap).
_np_load = np.load
def _np_load_mmap(path, *a, **kw):
    if (isinstance(path, str) and path.endswith('.npy')
            and os.path.getsize(path) > 1 << 30 and 'mmap_mode' not in kw):
        kw['mmap_mode'] = 'r'
    return _np_load(path, *a, **kw)
np.load = _np_load_mmap
sys.path.insert(0, os.path.expanduser('~/wiflow-std-bench/upstream'))
from dataset import PreprocessedCSIKeypointsDataset, create_preprocessed_train_val_test_loaders
from models.pose_model import WiFlowPoseModel
from utils.metrics import calculate_pck, calculate_mpjpe
def find_checkpoint():
cands = []
for root, _, files in os.walk(os.path.expanduser('~/wiflow-std-bench/train_output')):
for f in files:
if f.endswith('.pth'):
cands.append(os.path.join(root, f))
# also upstream/test default output dir
for root, _, files in os.walk(os.path.expanduser('~/wiflow-std-bench/upstream')):
for f in files:
if f.endswith('.pth') and 'best' in f and 'cross_dataset' not in root:
p = os.path.join(root, f)
if os.path.getmtime(p) > os.path.getmtime(os.path.expanduser('~/wiflow-std-bench/train.log')) - 86400 * 2:
cands.append(p)
cands = [c for c in cands if not c.endswith('upstream/best_pose_model.pth')]
if not cands:
sys.exit('no retrained checkpoint found')
return max(cands, key=os.path.getmtime)
def evaluate(model, loader, device):
    """Return sample-weighted MPJPE and PCK@10..50 over every batch in `loader`."""
    model.eval()
    totals = {t: 0.0 for t in (0.1, 0.2, 0.3, 0.4, 0.5)}
    total_mpe, n = 0.0, 0
    with torch.no_grad():
        for bx, by in loader:
            bx, by = bx.to(device), by.to(device)
            out = model(bx)
            bs = by.size(0)
            # The upstream metrics are treated as per-batch means, so weight by
            # batch size to keep a short final batch from skewing the average.
            total_mpe += calculate_mpjpe(out, by) * bs
            pck = calculate_pck(out, by, thresholds=list(totals))
            for t in totals:
                totals[t] += pck[t] * bs
            n += bs
    return {'samples': n, 'mpjpe': total_mpe / n,
            **{f'pck@{int(t*100)}': totals[t] / n for t in totals}}
random.seed(42); np.random.seed(42); torch.manual_seed(42)
torch.cuda.manual_seed_all(42)
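torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False  # match set_seed() in train_measb.py: no autotuned kernel selection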
d = os.path.expanduser('~/wiflow-std-bench/preprocessed_csi_data')
dataset = PreprocessedCSIKeypointsDataset(data_dir=d, keypoint_scale=1000.0,
enable_temporal_clean=True)
_, _, test_loader = create_preprocessed_train_val_test_loaders(
dataset=dataset, batch_size=256, num_workers=2, random_seed=42)
device = torch.device('cuda')
ckpt = find_checkpoint()
print('checkpoint:', ckpt)
model = WiFlowPoseModel(dropout=0.5).to(device)
state = torch.load(ckpt, map_location=device, weights_only=True)
renames = {'att.': 'attention.', 'final_conv.': 'decoder.'}
state = {next((new + k[len(old):] for old, new in renames.items()
if k.startswith(old)), k): v for k, v in state.items()}
model.load_state_dict(state, strict=True)
results = {'checkpoint': ckpt}
print('=== full test set ===')
results['test_full'] = evaluate(model, test_loader, device)
print(json.dumps(results['test_full'], indent=2))
# NaN-free subset: exclude windows from corrupted files 487-499
test_subset = test_loader.dataset # Subset(dataset, test_indices)
w2f = dataset.window_to_file
clean_idx = [i for i in test_subset.indices if w2f[i] < 487]
print(f'=== NaN-free test subset ({len(clean_idx)} of {len(test_subset.indices)}) ===')
clean_loader = DataLoader(Subset(dataset, clean_idx), batch_size=256, shuffle=False)
results['test_clean'] = evaluate(model, clean_loader, device)
print(json.dumps(results['test_clean'], indent=2))
out = os.path.expanduser('~/wiflow-std-bench/eval_retrained.json')
with open(out, 'w') as f:
json.dump(results, f, indent=2)
print('wrote', out)
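The key-remap comprehension used when loading the checkpoint is dense, so here is a stdlib-only sketch of what it does to a toy dictionary. The key names in `old_style` are hypothetical stand-ins chosen for illustration; real state_dict values would be tensors, and `remap_keys` is a name introduced here, not a helper from the script.

```python
RENAMES = {'att.': 'attention.', 'final_conv.': 'decoder.'}


def remap_keys(state, renames=RENAMES):
    """Rewrite the first matching old prefix of each key; leave other keys alone."""
    return {
        next((new + k[len(old):] for old, new in renames.items()
              if k.startswith(old)), k): v
        for k, v in state.items()
    }


# Toy stand-in for an old-style state_dict (hypothetical keys, integer values).
old_style = {
    'att.width_axis.bn_qkv.weight': 1,
    'final_conv.0.bias': 2,
    'tcn.network.0.conv1_pw.weight': 3,
}
new_style = remap_keys(old_style)

# Applying the remap to already-renamed keys changes nothing, because
# 'attention.' does not start with 'att.' and 'decoder.' is not a listed prefix.
remapped_again = remap_keys(new_style)
```

This idempotence is why the remap can be applied defensively to a checkpoint that may already use the new names.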
@@ -1,374 +0,0 @@
"""ADR-152 SS2.2 measurement (b): WiFlow-STD fine-tuned on our fresh ESP32 paired dataset.
Dataset: ~/wiflow-std-bench/paired-20260610.jsonl -- 2,046 paired windows collected
2026-06-10 22:10-22:40 (ONE subject, ONE room, ONE ESP32 node, varied poses).
Per record: csi = flat float32 list, csi_shape, kp = 17 COCO [x, y] normalized [0,1]
camera coords, conf (MediaPipe mean confidence, all > 0.5 in this set), ts_start/ts_end.
Aligner: scripts/align-ground-truth.js, non-overlapping 20-frame windows (~0.42 s each).
Dataset findings (MEASURED on this file, 2026-06-10):
- csi_shape is HETEROGENEOUS, not uniformly [70, 20]: 1,347x [70,20], 284x [134,20],
243x [26,20], 130x [12,20], 42x [20,20]. The ESP32 stream emits mixed frame types
and the aligner stamps each window's subcarrier count from frame[0]
(extractCsiMatrix: nSc = window[0].subcarriers), zero-padding/truncating the rest.
Even native-70 windows contain ~20.4% internally zero-padded short frames
(subcarriers 40..69 all-zero for those frames).
- LAYOUT BUG: the aligner fills matrix[f * nSc + s] (frame-major) but declares
shape [nSc, nFrames]. The true layout is (frame, subcarrier); we reshape
(nFrames, nSc) and transpose. Confirmed by coherent per-frame zero-tails.
- Handling here (primary suite, "all2046"): every frame's subcarrier axis is
linearly resampled to 70 bins (np.interp over a normalized index domain;
identity for native-70 frames) so the pre-registered n=2,046 and split sizes
hold. Secondary suite ("native70") restricts to the 1,347 native [70,20]
windows (temporal 70/15/15 of those) as a homogeneity robustness check.
Pre-registered protocol (followed exactly):
1. TEMPORAL split (records are time-sorted; asserted): first 70% train (1,432),
next 15% val (307), last 15% test (307). No shuffling across time. Seed 42
for everything else.
2. Model: upstream WiFlow-STD trunk (WiFlowPoseModel) with a learned 1x1 Conv1d
projection 70->540 prepended, and K=17 via the parameter-free adaptive pool
(AdaptiveAvgPool2d((17, 1)) instead of (15, 1)) -- pretrained weights load
for any K. CSI normalization: divide by the TRAIN-split 99th-percentile
amplitude, clip to [0, 1] (documented in output JSON).
3. Three runs, <=60 epochs, early-stop patience 8 on val MPJPE, batch 32,
AdamW, fp32 (no autocast):
(i) pretrained-init: trunk init from upstream/test/best_pose_model.pth
(the measurement-(a) retrained checkpoint, ~96% PCK@20 on WiFlow data;
key remap att.->attention. / final_conv.->decoder. applied defensively
as in eval_repro.py -- a no-op for this checkpoint, which already uses
the new names). Discriminative lr: adapter 1e-4, trunk 1e-5.
(ii) scratch: same architecture, random init, all params lr 1e-4.
(iii) frozen-trunk: pretrained trunk frozen (requires_grad=False AND held in
.eval() so BatchNorm running stats cannot drift -- pure transfer probe);
only the 70->540 adapter trains, lr 1e-4.
4. Metrics on the temporal TEST split: torso-normalized PCK@10/20/30/40/50 and
MPJPE. Upstream utils/metrics.py calculate_pck(use_torso_norm=True) hardcodes
NECK_IDX/PELVIS_IDX = 2, 12 -- a 15-keypoint convention that is WRONG for our
17 COCO keypoints (2 = right_eye, 12 = right_hip). We therefore reimplement the
identical math (per-frame norm distance, clamp min 0.01, mean over all
keypoints x frames) with torso = ||l_shoulder(5) - l_hip(11)||.
Also reported: prediction std across test frames (constant-pose detector;
must be > 0) and the mean-pose-predictor baseline (train-split mean pose
evaluated on test -- the honesty bar).
Usage (on ruvultra):
nice -n 10 nohup ~/wiflow-std-bench/venv/bin/python train_measb.py > train_measb.log 2>&1 &
NOTE: deployed to ruvultra as a standalone single file, so it deliberately
inlines its helpers. The reference implementations (upstream import shim,
np.load mmap patch, key-remap loader, canonical evaluate loop) live in
benchmarks/wiflow-std/_bench_common.py — keep copies in sync.
"""
import json
import os
import random
import sys
import time
import numpy as np
import torch
import torch.nn as nn
BENCH = os.path.expanduser("~/wiflow-std-bench")
UPSTREAM = os.path.join(BENCH, "upstream")
MEASB = os.path.join(BENCH, "measb")
DATA = os.path.join(BENCH, "paired-20260610.jsonl")
CHECKPOINT = os.path.join(UPSTREAM, "test", "best_pose_model.pth")
sys.path.insert(0, UPSTREAM)
# Upstream defect (1): models/__init__.py imports a name tcn.py does not define.
# Register a stub package so the broken __init__ never executes (as eval_repro.py).
import types # noqa: E402
_models_pkg = types.ModuleType("models")
_models_pkg.__path__ = [os.path.join(UPSTREAM, "models")]
sys.modules["models"] = _models_pkg
from models.pose_model import WiFlowPoseModel # noqa: E402
SEED = 42
K = 17
N_SUBC = 70
TRUNK_IN = 540
BATCH = 32 # <= 64 per protocol (GPU shared with the efficiency sweep)
MAX_EPOCHS = 60
PATIENCE = 8
LR_ADAPTER = 1e-4
LR_TRUNK_FT = 1e-5 # 10x lower for the pretrained trunk vs the fresh adapter
L_SHOULDER, L_HIP = 5, 11
THRESHOLDS = (0.1, 0.2, 0.3, 0.4, 0.5)
def set_seed(seed=SEED):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
def resample_subcarriers(frame_major, n_out=N_SUBC):
"""(nFrames, nSc) -> (nFrames, n_out) by per-frame linear interpolation.
Identity for nSc == n_out. Normalized index domain [0, 1] on both sides.
"""
nf, nsc = frame_major.shape
if nsc == n_out:
return frame_major
xi = np.linspace(0.0, 1.0, nsc)
xo = np.linspace(0.0, 1.0, n_out)
return np.stack([np.interp(xo, xi, frame_major[f]) for f in range(nf)]).astype(np.float32)
def load_dataset():
csi, kps, confs, ts, native70 = [], [], [], [], []
shape_counts = {}
with open(DATA) as f:
for line in f:
r = json.loads(line)
nsc, nf = r["csi_shape"]
shape_counts[f"{nsc}x{nf}"] = shape_counts.get(f"{nsc}x{nf}", 0) + 1
assert nf == 20, r["csi_shape"]
# Aligner layout bug: data is frame-major despite the declared
# [nSc, nFrames] shape -- reshape (nFrames, nSc), then resample the
# subcarrier axis to 70 and transpose to (70 subcarriers, 20 frames).
fm = np.asarray(r["csi"], dtype=np.float32).reshape(nf, nsc)
csi.append(resample_subcarriers(fm).T)
kp = np.asarray(r["kp"], dtype=np.float32)
assert kp.shape == (K, 2), kp.shape
kps.append(kp)
confs.append(r["conf"])
ts.append(r["ts_start"])
native70.append(nsc == N_SUBC)
assert all(ts[i] <= ts[i + 1] for i in range(len(ts) - 1)), "records not time-sorted"
return (np.stack(csi), np.stack(kps), np.asarray(confs, dtype=np.float32),
np.asarray(native70), shape_counts, ts[0], ts[-1])
def temporal_split(n):
n_train = int(round(n * 0.70))
n_val = int(round(n * 0.15))
return slice(0, n_train), slice(n_train, n_train + n_val), slice(n_train + n_val, n)
class AdaptedWiFlow(nn.Module):
"""1x1 Conv1d adapter 70->540 + upstream WiFlow-STD trunk with K=17 pool head."""
def __init__(self, k=K, dropout=0.5):
super().__init__()
self.adapter = nn.Conv1d(N_SUBC, TRUNK_IN, kernel_size=1)
nn.init.kaiming_normal_(self.adapter.weight, mode="fan_out", nonlinearity="relu")
nn.init.constant_(self.adapter.bias, 0)
self.trunk = WiFlowPoseModel(dropout=dropout)
# K=17 via the parameter-free adaptive pool: decoder emits [B, 2, 15, 20]
# spatial maps; pooling H->17 instead of 15 yields [B, 17, 2] with no new
# parameters, so the pretrained state_dict loads strict=True for any K.
self.trunk.avg_pool = nn.AdaptiveAvgPool2d((k, 1))
def forward(self, x):
return self.trunk(self.adapter(x))
def load_pretrained_trunk(trunk, path):
state = torch.load(path, map_location="cpu", weights_only=True)
# Defensive remap as in eval_repro.py (no-op for the retrained checkpoint).
renames = {"att.": "attention.", "final_conv.": "decoder."}
state = {next((new + k[len(old):] for old, new in renames.items()
if k.startswith(old)), k): v
for k, v in state.items()}
trunk.load_state_dict(state, strict=True)
def pck_torso(pred, target, thresholds=THRESHOLDS):
"""Upstream calculate_pck math, torso = l_shoulder(5)<->l_hip(11) for 17-kp COCO."""
norm = torch.sqrt(((target[:, L_SHOULDER] - target[:, L_HIP]) ** 2).sum(dim=1))
norm = torch.clamp(norm, min=0.01)
dist = torch.sqrt(((pred - target) ** 2).sum(dim=2)) / norm.unsqueeze(1)
return {f"pck@{int(t * 100)}": (dist <= t).float().mean().item() for t in thresholds}
def mpjpe(pred, target):
return torch.sqrt(((pred - target) ** 2).sum(dim=2)).mean().item()
@torch.no_grad()
def predict(model, x, batch=256):
model.eval()
return torch.cat([model(x[i:i + batch]) for i in range(0, len(x), batch)])
def eval_preds(pred, target):
out = pck_torso(pred, target)
out["mpjpe"] = mpjpe(pred, target)
# Constant-pose detector: std across test frames per coordinate, mean over
# the 17x2 coordinates. 0.0 == degenerate constant predictor.
out["pred_std"] = pred.std(dim=0).mean().item()
return out
def train_run(name, x_tr, y_tr, x_va, y_va, device, pretrained, freeze_trunk,
lr_trunk):
set_seed(SEED)
model = AdaptedWiFlow().to(device)
if pretrained:
load_pretrained_trunk(model.trunk, CHECKPOINT)
if freeze_trunk:
for p in model.trunk.parameters():
p.requires_grad = False
groups = [{"params": model.adapter.parameters(), "lr": LR_ADAPTER}]
else:
groups = [{"params": model.adapter.parameters(), "lr": LR_ADAPTER},
{"params": model.trunk.parameters(), "lr": lr_trunk}]
opt = torch.optim.AdamW(groups)
loss_fn = nn.MSELoss()
n = len(x_tr)
best_val, best_state, best_epoch, bad = float("inf"), None, -1, 0
history = []
t0 = time.time()
for epoch in range(MAX_EPOCHS):
model.train()
if freeze_trunk:
model.trunk.eval() # keep BatchNorm running stats fixed: pure transfer
perm = torch.randperm(n, device=device)
ep_loss = 0.0
for i in range(0, n, BATCH):
idx = perm[i:i + BATCH]
opt.zero_grad()
loss = loss_fn(model(x_tr[idx]), y_tr[idx])
loss.backward()
opt.step()
ep_loss += loss.item() * len(idx)
val_mpjpe = mpjpe(predict(model, x_va), y_va)
history.append({"epoch": epoch, "train_mse": ep_loss / n, "val_mpjpe": val_mpjpe})
marker = ""
if val_mpjpe < best_val:
best_val, best_epoch, bad = val_mpjpe, epoch, 0
best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
marker = " *"
else:
bad += 1
print(f"[{name}] epoch {epoch:02d} train_mse {ep_loss / n:.6f} "
f"val_mpjpe {val_mpjpe:.5f}{marker}", flush=True)
if bad >= PATIENCE:
print(f"[{name}] early stop at epoch {epoch} (best {best_epoch})", flush=True)
break
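    if best_state is None:
        # val MPJPE never dropped below inf (e.g. NaN every epoch): nothing to restore.
        raise RuntimeError(f"[{name}] no finite validation MPJPE; no best checkpoint to load")
    model.load_state_dict(best_state)
    torch.save(best_state, os.path.join(MEASB, f"{name}_best.pth"))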
return model, {"best_epoch": best_epoch, "best_val_mpjpe": best_val,
"epochs_run": len(history), "wall_seconds": round(time.time() - t0, 1),
"history": history}
def run_suite(tag, csi, kps, device):
"""Temporal 70/15/15 split, mean-pose baseline, three training runs."""
n = len(csi)
tr, va, te = temporal_split(n)
print(f"=== suite {tag}: n={n} train={tr.stop} val={va.stop - va.start} "
f"test={te.stop - te.start} ===", flush=True)
# CSI normalization constant from TRAIN split only.
train_p99 = float(np.percentile(csi[tr], 99))
train_max = float(csi[tr].max())
print(f"[{tag}] train p99={train_p99:.3f} max={train_max:.3f} -> /p99, clip [0,1]",
flush=True)
csi_n = np.clip(csi / train_p99, 0.0, 1.0).astype(np.float32)
x = torch.from_numpy(csi_n).to(device)
y = torch.from_numpy(kps).to(device)
x_tr, y_tr = x[tr], y[tr]
x_va, y_va = x[va], y[va]
x_te, y_te = x[te], y[te]
suite = {
"n_windows": n,
"split": {"n_train": int(tr.stop), "n_val": int(va.stop - va.start),
"n_test": int(te.stop - te.start)},
"csi_norm": {"method": "divide by train-split p99 amplitude, clip [0,1]",
"train_p99": train_p99, "train_max": train_max},
"runs": {},
}
# Honesty bar: mean-pose predictor fit on TRAIN, evaluated on TEST.
mean_pose = y_tr.mean(dim=0, keepdim=True).expand(len(y_te), -1, -1)
suite["mean_pose_baseline"] = eval_preds(mean_pose, y_te)
suite["mean_pose_baseline"]["note"] = "train-split mean pose; pred_std 0 by construction"
print(f"[{tag}] mean-pose baseline:", json.dumps(suite["mean_pose_baseline"]),
flush=True)
configs = [
("pretrained", dict(pretrained=True, freeze_trunk=False, lr_trunk=LR_TRUNK_FT)),
("scratch", dict(pretrained=False, freeze_trunk=False, lr_trunk=LR_ADAPTER)),
("frozen_trunk", dict(pretrained=True, freeze_trunk=True, lr_trunk=0.0)),
]
for name, cfg in configs:
print(f"=== run: {tag}/{name} {cfg} ===", flush=True)
model, train_info = train_run(f"{tag}_{name}", x_tr, y_tr, x_va, y_va,
device, **cfg)
test_metrics = eval_preds(predict(model, x_te), y_te)
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
suite["runs"][name] = {"config": cfg, "trainable_params": n_trainable,
"train": {k: v for k, v in train_info.items()
if k != "history"},
"history": train_info["history"],
"test": test_metrics}
print(f"[{tag}/{name}] TEST:", json.dumps(test_metrics), flush=True)
return suite
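def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"device {device}, torch {torch.__version__}", flush=True)
    # Checkpoints and measurement_b.json are written under MEASB; create it up front.
    os.makedirs(MEASB, exist_ok=True)
    set_seed(SEED)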
csi, kps, confs, native70, shape_counts, ts_first, ts_last = load_dataset()
print(f"shape distribution: {shape_counts}", flush=True)
results = {
"protocol": {
"dataset": DATA, "n_windows": len(csi),
"ts_first": ts_first, "ts_last": ts_last,
"conf_mean": float(confs.mean()), "conf_min": float(confs.min()),
"csi_shape_distribution": shape_counts,
"csi_layout_note": "aligner stores frame-major data under a transposed "
"[nSc, nFrames] shape label; corrected on load",
"csi_resample": "per-frame linear interp of subcarrier axis to 70 bins "
"(identity for native-70 frames); native-70 windows still "
"contain ~20.4% internally zero-padded short frames",
"split": "temporal 70/15/15 (no shuffle across time)",
"model": "1x1 Conv1d 70->540 adapter + WiFlowPoseModel trunk, "
"AdaptiveAvgPool2d((17,1)) head (parameter-free K=17)",
"checkpoint": CHECKPOINT,
"checkpoint_note": "measurement-(a) retrained checkpoint (~96% PCK@20 on "
"WiFlow data); att./final_conv. remap applied "
"defensively (no-op, already new-style keys)",
"optimizer": f"AdamW, adapter lr {LR_ADAPTER}, fine-tuned trunk lr "
f"{LR_TRUNK_FT} (10x lower), scratch all {LR_ADAPTER}",
"batch": BATCH, "max_epochs": MAX_EPOCHS, "patience": PATIENCE,
"precision": "fp32", "seed": SEED,
"pck": "torso-normalized, torso = ||l_shoulder(5) - l_hip(11)||, "
"clamp min 0.01, mean over keypoints x frames "
"(upstream math; upstream 2/12 indices are a 15-kp convention)",
},
# Primary: all 2,046 windows (pre-registered n), subcarrier axis resampled.
"all2046": None,
# Secondary robustness check: the 1,347 native [70,20] windows only.
"native70": None,
}
results["all2046"] = run_suite("all2046", csi, kps, device)
results["native70"] = run_suite("native70", csi[native70], kps[native70], device)
out = os.path.join(MEASB, "measurement_b.json")
with open(out, "w") as f:
json.dump(results, f, indent=2)
print(f"wrote {out}", flush=True)
if __name__ == "__main__":
main()
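Two claims in the module docstring above are easy to check by hand: the aligner's frame-major layout fix and the pre-registered split sizes. The sketch below reproduces both with plain lists. `temporal_split` is copied from the script; `unflatten_frame_major`, `split_sizes`, and the toy 3x2 window are hypothetical names and data added for illustration.

```python
def temporal_split(n):
    n_train = int(round(n * 0.70))
    n_val = int(round(n * 0.15))
    return slice(0, n_train), slice(n_train, n_train + n_val), slice(n_train + n_val, n)


def split_sizes(n):
    """Lengths of the train/val/test slices for n time-sorted windows."""
    tr, va, te = temporal_split(n)
    return tr.stop - tr.start, va.stop - va.start, te.stop - te.start


def unflatten_frame_major(flat, n_sc, n_frames):
    """Recover an (n_sc, n_frames) matrix from data written as flat[f * n_sc + s]."""
    assert len(flat) == n_sc * n_frames
    # Step 1: reshape as (n_frames, n_sc), the layout the data is really in.
    frames = [flat[f * n_sc:(f + 1) * n_sc] for f in range(n_frames)]
    # Step 2: transpose to (n_sc, n_frames).
    return [[frames[f][s] for f in range(n_frames)] for s in range(n_sc)]


# Toy window: 3 subcarriers x 2 frames, value = 10 * frame + subcarrier.
n_sc, n_frames = 3, 2
flat = [10 * f + s for f in range(n_frames) for s in range(n_sc)]

matrix = unflatten_frame_major(flat, n_sc, n_frames)
# Trusting the declared [n_sc, n_frames] shape instead mixes frames within a row.
naive = [flat[s * n_frames:(s + 1) * n_frames] for s in range(n_sc)]

sizes_all = split_sizes(2046)     # primary suite
sizes_native = split_sizes(1347)  # native-70 suite
```

In `matrix`, row `s` holds subcarrier `s` across frames, which is what the model expects. In `naive`, the second row already straddles two frames.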
@@ -1,33 +0,0 @@
#!/bin/bash
set -ex
cd ~/wiflow-std-bench
# 1. clone upstream at the pinned commit
if [ ! -d upstream ]; then
git clone https://github.com/DY2434/WiFlow-WiFi-Pose-Estimation-with-Spatio-Temporal-Decoupling upstream
fi
cd upstream && git checkout 06899d294a0f44709d601a53e91dbf24759daefb && cd ..
# 2. documented deviation: fix upstream import bug (TemporalConvNet does not exist)
sed -i 's/from .tcn import TemporalConvNet/from .tcn import TemporalBlock/; s/'"'"'TemporalConvNet'"'"'/'"'"'TemporalBlock'"'"'/' upstream/models/__init__.py
# 3. venv: torch cu128 (RTX 5080 = sm_120 needs >=2.7; their pin 2.3.1 predates Blackwell)
if [ ! -d venv ]; then
python3 -m venv venv
./venv/bin/pip install -q --upgrade pip
./venv/bin/pip install -q torch --index-url https://download.pytorch.org/whl/cu128
./venv/bin/pip install -q numpy pandas matplotlib seaborn scikit-learn opencv-python-headless scipy tqdm psutil kagglehub
fi
./venv/bin/python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# 4. dataset via kagglehub (anonymous, public dataset)
DS=$(./venv/bin/python -c "import kagglehub; print(kagglehub.dataset_download('kaka2434/wiflow-dataset'))")
echo "dataset at: $DS"
# 5. run.py hardcodes ../preprocessed_csi_data relative to upstream/
ln -sfn "$DS/preprocessed_csi_data" ~/wiflow-std-bench/preprocessed_csi_data
# 6. zero-fill corrupted windows in place (cwd is ~/wiflow-std-bench at this point)
./venv/bin/python clean_nan.py
# 7. train with upstream defaults (seed 42 set inside run.py)
cd upstream
../venv/bin/python run.py --gpu 0 --batch_size 64 --epochs 50 --output_dir ../train_output
@@ -1,332 +0,0 @@
"""Configurable compact variants of the WiFlow-STD pose model (ADR-152 efficiency sweep).
This is a parameterized copy of upstream models/{pose_model,tcn,convnet,attention}.py
(DY2434/WiFlow @ 06899d29, Apache-2.0). upstream/ is NOT modified. Deviations from
upstream, all forced by shrinking channels and documented per variant in run_sweep.py:
1. TCN grouped-conv groups: upstream hardcodes groups=20, which does not divide
the compact channel counts (e.g. 270, 135, 85). Rule here:
- groups_mode='gcd20': per-conv groups = gcd(channels, 20) (== 20 wherever
upstream's choice is valid, incl. the 540-ch input conv; falls back to the
largest common divisor with 20 otherwise).
- groups_mode='depthwise': groups = channels (tiny variant only).
2. Conv2d downsampling strides: upstream uses 4 stride-(1,2) blocks because
240/2^4 = 15 == n_keypoints. With smaller TCN output widths that would leave
<15 rows and AdaptiveAvgPool2d((15,1)) would duplicate rows across keypoints.
Rule: halve the width only while the result stays >= 15 (stride-2 blocks
first, stride-1 after). Full model: 240 -> 4 halvings = upstream exactly.
3. input_pw_groups (tiny only): the dense 540->c pointwise + residual downsample
in TCN block 1 cost 2*540*c params (a ~117k floor that alone exceeds the
tiny <100k budget). tiny groups these two convs (groups=4; 4 | gcd(540, 68)).
4. Decoder mid-channels: upstream 64->32; here c_last -> max(c_last // 2, 4).
"""
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
def tcn_groups(channels: int, mode: str) -> int:
if mode == 'depthwise':
return channels
if mode == 'gcd20':
return math.gcd(channels, 20)
raise ValueError(mode)
# ---------------------------------------------------------------- TCN (copy of tcn.py)
class Chomp1d(nn.Module):
def __init__(self, chomp_size):
super().__init__()
self.chomp_size = chomp_size
def forward(self, x):
return x[:, :, :-self.chomp_size].contiguous()
class CompactGroupedTemporalBlock(nn.Module):
"""Upstream InnerGroupedTemporalBlock with parameterized groups."""
def __init__(self, n_inputs, n_outputs, kernel_size, stride, dilation, padding,
dropout=0.2, groups_mode='gcd20', pw_groups=1):
super().__init__()
g_in = tcn_groups(n_inputs, groups_mode)
g_out = tcn_groups(n_outputs, groups_mode)
self.groups = (g_in, g_out)
self.pw_groups = pw_groups
self.conv1_group = nn.Conv1d(n_inputs, n_inputs, kernel_size, stride=stride,
padding=padding, dilation=dilation,
groups=g_in, bias=False)
self.chomp1 = Chomp1d(padding) if padding > 0 else nn.Identity()
self.bn1_group = nn.BatchNorm1d(n_inputs)
self.relu1_group = nn.SiLU(inplace=True)
self.conv1_pw = nn.Conv1d(n_inputs, n_outputs, 1, groups=pw_groups, bias=False)
self.bn1_pw = nn.BatchNorm1d(n_outputs)
self.relu1_pw = nn.SiLU(inplace=True)
self.dropout1 = nn.Dropout(dropout)
self.conv2_group = nn.Conv1d(n_outputs, n_outputs, kernel_size, stride=1,
padding=padding, dilation=dilation,
groups=g_out, bias=False)
self.chomp2 = Chomp1d(padding) if padding > 0 else nn.Identity()
self.bn2_group = nn.BatchNorm1d(n_outputs)
self.relu2_group = nn.SiLU(inplace=True)
self.conv2_pw = nn.Conv1d(n_outputs, n_outputs, 1, bias=False)
self.bn2_pw = nn.BatchNorm1d(n_outputs)
self.relu2_pw = nn.SiLU(inplace=True)
self.dropout2 = nn.Dropout(dropout)
self.downsample = nn.Sequential(
nn.Conv1d(n_inputs, n_outputs, 1, groups=pw_groups, bias=False),
nn.BatchNorm1d(n_outputs)
) if n_inputs != n_outputs else nn.Identity()
def forward(self, x):
res = self.downsample(x)
out = self.conv1_group(x)
out = self.chomp1(out)
out = self.bn1_group(out)
out = self.relu1_group(out)
out = self.conv1_pw(out)
out = self.bn1_pw(out)
out = self.relu1_pw(out)
out = self.dropout1(out)
out = self.conv2_group(out)
out = self.chomp2(out)
out = self.bn2_group(out)
out = self.relu2_group(out)
out = self.conv2_pw(out)
out = self.bn2_pw(out)
out = self.relu2_pw(out)
out = self.dropout2(out)
return F.silu(out + res)
class CompactTemporalBlock(nn.Module):
def __init__(self, num_inputs, num_channels, kernel_size=3, dropout=0.2,
groups_mode='gcd20', input_pw_groups=1):
super().__init__()
layers = []
for i, out_channels in enumerate(num_channels):
dilation_size = 2 ** i
in_channels = num_inputs if i == 0 else num_channels[i - 1]
layers.append(CompactGroupedTemporalBlock(
in_channels, out_channels, kernel_size, stride=1,
dilation=dilation_size, padding=(kernel_size - 1) * dilation_size,
dropout=dropout, groups_mode=groups_mode,
pw_groups=input_pw_groups if i == 0 else 1))
self.network = nn.Sequential(*layers)
def forward(self, x):
return self.network(x)
# ------------------------------------------------------- Conv2d path (copy of convnet.py)
class AsymmetricConvBlock(nn.Module):
"""Upstream block with parameterized width stride (upstream: always (1,2))."""
def __init__(self, in_channels, out_channels, dropout=0.3, stride_w=2):
super().__init__()
self.block = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=(1, 3),
stride=(1, stride_w), padding=(0, 1)),
nn.BatchNorm2d(out_channels),
nn.SiLU(inplace=True),
nn.Dropout2d(dropout),
nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
nn.BatchNorm2d(out_channels),
nn.SiLU(inplace=True),
nn.Dropout2d(dropout),
nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
nn.BatchNorm2d(out_channels)
)
self.downsample = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=1,
stride=(1, stride_w), bias=False),
nn.BatchNorm2d(out_channels)
)
self.activation = nn.SiLU(inplace=True)
def forward(self, x):
return self.activation(self.block(x) + self.downsample(x))
class ConvBlock1(nn.Module):
def __init__(self, in_channels, out_channels, dropout=0.3):
super().__init__()
self.block = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
nn.BatchNorm2d(out_channels),
nn.SiLU(inplace=True),
nn.Dropout2d(dropout),
nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
nn.BatchNorm2d(out_channels),
nn.SiLU(inplace=True),
nn.Dropout2d(dropout),
nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
nn.BatchNorm2d(out_channels)
)
self.downsample = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False),
nn.BatchNorm2d(out_channels)
)
self.activation = nn.SiLU(inplace=True)
def forward(self, x):
return self.activation(self.block(x) + self.downsample(x))
# ----------------------------------------------------- attention (verbatim attention.py)
class AxialAttention(nn.Module):
def __init__(self, in_planes, out_planes, groups=8, stride=1, bias=False, width=False):
assert (in_planes % groups == 0) and (out_planes % groups == 0)
super().__init__()
self.in_planes = in_planes
self.out_planes = out_planes
self.groups = groups
self.group_planes = out_planes // groups
self.stride = stride
self.bias = bias
self.width = width
self.qkv_transform = nn.Conv1d(in_planes, out_planes * 3, kernel_size=1,
stride=1, padding=0, bias=False)
self.bn_qkv = nn.BatchNorm1d(out_planes * 3)
self.bn_similarity = nn.BatchNorm2d(groups)
self.bn_output = nn.BatchNorm1d(out_planes)
if stride > 1:
self.pooling = nn.AvgPool2d(stride, stride=stride)
nn.init.normal_(self.qkv_transform.weight.data, 0, math.sqrt(1. / self.in_planes))
def forward(self, x):
if self.width:
x = x.permute(0, 2, 1, 3)
else:
x = x.permute(0, 3, 1, 2)
N, W, C, H = x.shape
x = x.contiguous().view(N * W, C, H)
qkv = self.bn_qkv(self.qkv_transform(x))
qkv = qkv.reshape(N * W, 3, self.out_planes, H).permute(1, 0, 2, 3)
q, k, v = qkv[0], qkv[1], qkv[2]
q = q.reshape(N * W, self.groups, self.group_planes, H)
k = k.reshape(N * W, self.groups, self.group_planes, H)
v = v.reshape(N * W, self.groups, self.group_planes, H)
qk = torch.einsum('bgci, bgcj->bgij', q, k)
qk = self.bn_similarity(qk)
similarity = F.softmax(qk, dim=-1)
sv = torch.einsum('bgij,bgcj->bgci', similarity, v)
sv = sv.reshape(N * W, self.out_planes, H)
out = self.bn_output(sv)
out = out.view(N, W, self.out_planes, H)
if self.width:
out = out.permute(0, 2, 1, 3)
else:
out = out.permute(0, 2, 3, 1)
if self.stride > 1:
out = self.pooling(out)
return out
class DualAxialAttention(nn.Module):
def __init__(self, in_planes, out_planes, groups=8, stride=1, bias=False):
super().__init__()
self.width_axis = AxialAttention(in_planes, out_planes, groups, stride, bias, width=True)
self.height_axis = AxialAttention(out_planes, out_planes, groups, stride, bias, width=False)
def forward(self, x):
return self.height_axis(self.width_axis(x))
# --------------------------------------------------------------- full model
def compute_strides(width: int, n_blocks: int, target: int = 15):
"""Halve width while result stays >= target (upstream: 240 -> 4 halvings -> 15)."""
strides = []
for _ in range(n_blocks):
nxt = (width + 1) // 2 # conv k=3 s=2 p=1: out = ceil(in/2)
if nxt >= target:
strides.append(2)
width = nxt
else:
strides.append(1)
return strides, width
class CompactWiFlowPoseModel(nn.Module):
"""Parameterized upstream WiFlowPoseModel.
Upstream config == tcn_channels=[540,440,340,240], conv_channels=[8,16,32,64],
attn_groups=8, groups_mode='gcd20' (gcd(c,20)==20 for all upstream channels),
input_pw_groups=1 -> identical architecture, 2,225,042 params.
"""
def __init__(self, tcn_channels, conv_channels, attn_groups,
groups_mode='gcd20', input_pw_groups=1, dropout=0.3,
num_subcarriers=540, num_keypoints=15):
super().__init__()
self.tcn = CompactTemporalBlock(
num_inputs=num_subcarriers, num_channels=tcn_channels, kernel_size=3,
dropout=dropout, groups_mode=groups_mode, input_pw_groups=input_pw_groups)
self.up = ConvBlock1(1, conv_channels[0])
strides, self.final_width = compute_strides(
tcn_channels[-1], len(conv_channels), target=num_keypoints)
self.conv_strides = strides
self.residual_blocks = nn.ModuleList()
in_channels = conv_channels[0]
for out_channels, s in zip(conv_channels, strides):
self.residual_blocks.append(
AsymmetricConvBlock(in_channels, out_channels, stride_w=s))
in_channels = out_channels
c_last = conv_channels[-1]
self.attention = DualAxialAttention(c_last, c_last, groups=attn_groups)
c_mid = max(c_last // 2, 4)
self.decoder = nn.Sequential(
nn.Conv2d(c_last, c_mid, kernel_size=3, padding=1),
nn.BatchNorm2d(c_mid),
nn.SiLU(inplace=True),
nn.Conv2d(c_mid, 2, kernel_size=1),
nn.BatchNorm2d(2),
nn.SiLU(inplace=True)
)
self.avg_pool = nn.AdaptiveAvgPool2d((num_keypoints, 1))
self._initialize_weights()
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, (nn.BatchNorm1d, nn.LayerNorm)):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.xavier_normal_(m.weight)
if m.bias is not None:
nn.init.constant_(m.bias, 0)
    def forward(self, x):
        # x: [B, num_subcarriers, 20]
        x = self.tcn(x)  # [B, C_tcn, 20]
        x = x.transpose(1, 2).unsqueeze(1)  # [B, 1, 20, C_tcn]
        x = self.up(x)  # [B, conv_channels[0], 20, C_tcn]
        for block in self.residual_blocks:
            x = block(x)  # [B, C_conv, 20, W']
        x = x.permute(0, 1, 3, 2)  # [B, C_conv, W', 20]
        x = self.attention(x)
        x = self.decoder(x)  # [B, 2, W', 20]
        x = self.avg_pool(x).squeeze(-1)  # [B, 2, num_keypoints]
        return x.transpose(1, 2)  # [B, num_keypoints, 2]
def describe(model: 'CompactWiFlowPoseModel'):
params = sum(p.numel() for p in model.parameters())
tcn_g = [blk.groups for blk in model.tcn.network]
return {'params': params, 'tcn_groups_per_block': tcn_g,
'conv_strides': model.conv_strides, 'final_width': model.final_width}
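The two sizing rules from the module docstring (per-conv groups = gcd(channels, 20), and halving the width only while it stays at or above the keypoint count) can be worked through without torch. The sketch below copies `tcn_groups` and `compute_strides` from this file and applies them to the upstream width (240) and to the final TCN widths of the half and quarter variants (120 and 60, taken from the sweep config in run_sweep.py). The variable names at the bottom are introduced here for illustration.

```python
import math


def tcn_groups(channels, mode):
    if mode == 'depthwise':
        return channels
    if mode == 'gcd20':
        return math.gcd(channels, 20)
    raise ValueError(mode)


def compute_strides(width, n_blocks, target=15):
    """Halve width while the result stays >= target; use stride 1 afterwards."""
    strides = []
    for _ in range(n_blocks):
        nxt = (width + 1) // 2  # conv k=3 s=2 p=1: out = ceil(in / 2)
        if nxt >= target:
            strides.append(2)
            width = nxt
        else:
            strides.append(1)
    return strides, width


full = compute_strides(240, 4)     # 240 -> 120 -> 60 -> 30 -> 15
half = compute_strides(120, 4)     # 120 -> 60 -> 30 -> 15, then stride 1
quarter = compute_strides(60, 4)   # 60 -> 30 -> 15, then stride 1 twice

# gcd20 reproduces upstream's hardcoded groups=20 for every upstream channel count
# and falls back to a smaller common divisor for the compact ones.
groups_upstream = [tcn_groups(c, 'gcd20') for c in (540, 440, 340, 240)]
groups_compact = [tcn_groups(c, 'gcd20') for c in (270, 135, 85)]
```

All three widths end at exactly 15 rows, so the adaptive pool to 15 keypoints never has to duplicate rows for these variants.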
@@ -1,278 +0,0 @@
"""WiFlow-STD compact-variant efficiency sweep (ADR-152) — sequential overnight runner.
Trains compact variants of the upstream WiFlow-STD architecture on the same
data/split as the full-size reference retraining (seed 42, file-level 70/15/15,
upstream dataset.py) and evaluates PCK@10..50 + MPJPE on the full test split and
the corruption-free test subset (file indices < 487).
Training mirrors upstream run.py/train.py defaults except:
- fp32 only (no fp16 autocast / GradScaler — avoids the BN-poisoning trap
documented in RESULTS.md defect 5; data on disk is already cleaned).
- batch 64 (kept modest: another GPU job may share the 16 GB card tonight).
- scheduler + early stopping keyed on val MPJPE (upstream early-stops on val MPE
with patience 5; same here).
Usage:
venv/bin/python sweep/run_sweep.py --dry-run # param counts only
nohup venv/bin/python sweep/run_sweep.py > sweep/sweep.log 2>&1 &
Idempotent: variants already present in sweep/results.jsonl are skipped.
NOTE: deployed to ruvultra (~/wiflow-std-bench/sweep) as a standalone file, so
it deliberately inlines its helpers. The reference implementations (upstream
import shim, >1GB np.load mmap patch, key-remap loader, canonical evaluate
loop) live in benchmarks/wiflow-std/_bench_common.py — keep copies in sync.
"""
import argparse
import copy
import json
import os
import random
import sys
import time
import numpy as np
import torch
from torch.utils.data import DataLoader, Subset
# csi_windows.npy is ~13 GB; mmap large arrays instead of eagerly loading
# ~15 GB into RAM (same patch as _bench_common._np_load_mmap).
_np_load = np.load


def _np_load_mmap(path, *a, **kw):
    if (isinstance(path, str) and path.endswith('.npy')
            and os.path.getsize(path) > 1 << 30 and 'mmap_mode' not in kw):
        kw['mmap_mode'] = 'r'
    return _np_load(path, *a, **kw)


np.load = _np_load_mmap
BENCH = os.path.expanduser('~/wiflow-std-bench')
SWEEP = os.path.join(BENCH, 'sweep')
sys.path.insert(0, os.path.join(BENCH, 'upstream'))
sys.path.insert(0, SWEEP)
from dataset import PreprocessedCSIKeypointsDataset, create_preprocessed_train_val_test_loaders # noqa: E402
from losses.pose_loss import PoseLoss # noqa: E402
from utils.metrics import calculate_pck, calculate_mpjpe # noqa: E402
from model_compact import CompactWiFlowPoseModel, describe # noqa: E402
VARIANTS = [
    # name, tcn_channels, conv_channels, attn_groups, groups_mode, input_pw_groups
    dict(name='half', tcn=[270, 220, 170, 120], conv=[4, 8, 16, 32], attn_groups=4,
         groups_mode='gcd20', input_pw_groups=1),
    dict(name='quarter', tcn=[135, 110, 85, 60], conv=[2, 4, 8, 16], attn_groups=2,
         groups_mode='gcd20', input_pw_groups=1),
    dict(name='tiny', tcn=[68, 56, 44, 32], conv=[2, 4, 8, 16], attn_groups=2,
         groups_mode='depthwise', input_pw_groups=4),
]
BATCH = 64
EPOCHS = 50
PATIENCE = 5
LR = 1e-4
WEIGHT_DECAY = 5e-5
SEED = 42
CORRUPT_FILE_START = 487 # files 487-499 were zero-filled by clean_nan.py


def set_seed(seed=SEED):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def build_model(v, dropout=0.5):
    return CompactWiFlowPoseModel(
        tcn_channels=v['tcn'], conv_channels=v['conv'], attn_groups=v['attn_groups'],
        groups_mode=v['groups_mode'], input_pw_groups=v['input_pw_groups'],
        dropout=dropout)


@torch.no_grad()
def evaluate(model, loader, device):
    model.eval()
    totals = {t: 0.0 for t in (0.1, 0.2, 0.3, 0.4, 0.5)}
    total_mpe, n = 0.0, 0
    for bx, by in loader:
        bx, by = bx.to(device), by.to(device)
        out = model(bx)
        bs = by.size(0)
        # Weight each batch metric by its size so a short last batch is not over-counted.
        total_mpe += calculate_mpjpe(out, by) * bs
        pck = calculate_pck(out, by, thresholds=list(totals))
        for t in totals:
            totals[t] += pck[t] * bs
        n += bs
    return {'samples': n, 'mpjpe': total_mpe / n,
            **{f'pck@{int(t * 100)}': totals[t] / n for t in totals}}


def train_variant(v, dataset, device):
    set_seed(SEED)
    train_loader, val_loader, test_loader = create_preprocessed_train_val_test_loaders(
        dataset=dataset, batch_size=BATCH, num_workers=2, random_seed=SEED)
    set_seed(SEED)  # re-seed after split so init is split-independent
    model = build_model(v).to(device)
    info = describe(model)
    print(f"[{v['name']}] params={info['params']:,} tcn_groups={info['tcn_groups_per_block']} "
          f"conv_strides={info['conv_strides']} final_width={info['final_width']}", flush=True)
    criterion = PoseLoss(position_weight=1.0, bone_weight=0.2, loss_type='smooth_l1')
    optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY,
                                  betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=3, min_lr=LR / 1000,
        cooldown=1, threshold=1e-4)
    best_val_mpe = float('inf')
    best_val_pck20 = 0.0
    best_epoch = 0
    best_state = None
    patience_counter = 0
    t0 = time.time()
    error = None
    epochs_run = 0
    for epoch in range(1, EPOCHS + 1):
        model.train()
        ep_loss, nb = 0.0, 0
        te = time.time()
        for i, (bx, by) in enumerate(train_loader):
            bx = bx.to(device, non_blocking=True)
            by = by.to(device, non_blocking=True)
            optimizer.zero_grad(set_to_none=True)
            out = model(bx)
            loss, _parts = criterion(out, by)
            if not torch.isfinite(loss):
                # Abort this variant instead of stepping on a NaN/inf loss.
                error = f'non-finite loss at epoch {epoch} step {i}'
                break
            loss.backward()
            optimizer.step()
            ep_loss += loss.item()
            nb += 1
            if epoch == 1 and i % 500 == 0:
                print(f"[{v['name']}] e1 step {i}/{len(train_loader)} loss={loss.item():.5f}",
                      flush=True)
        if error:
            break
        epochs_run = epoch
        val = evaluate(model, val_loader, device)
        scheduler.step(val['mpjpe'])
        lr_now = optimizer.param_groups[0]['lr']
        print(f"[{v['name']}] epoch {epoch}/{EPOCHS} train_loss={ep_loss / max(nb, 1):.5f} "
              f"val_mpjpe={val['mpjpe']:.5f} val_pck20={val['pck@20'] * 100:.2f}% "
              f"lr={lr_now:.2e} ({time.time() - te:.0f}s)", flush=True)
        if val['mpjpe'] < best_val_mpe:
            best_val_mpe = val['mpjpe']
            best_val_pck20 = val['pck@20']
            best_epoch = epoch
            best_state = copy.deepcopy(model.state_dict())
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= PATIENCE:
                print(f"[{v['name']}] early stop at epoch {epoch} (best {best_epoch})", flush=True)
                break
    train_seconds = time.time() - t0
    result = {
        'variant': v['name'], 'params': info['params'],
        'tcn_channels': v['tcn'], 'conv_channels': v['conv'],
        'attn_groups': v['attn_groups'], 'groups_mode': v['groups_mode'],
        'input_pw_groups': v['input_pw_groups'],
        'tcn_groups_per_block': info['tcn_groups_per_block'],
        'conv_strides': info['conv_strides'], 'final_width': info['final_width'],
        'batch_size': BATCH, 'max_epochs': EPOCHS, 'patience': PATIENCE,
        'lr': LR, 'weight_decay': WEIGHT_DECAY, 'seed': SEED, 'precision': 'fp32',
        'epochs_run': epochs_run, 'best_epoch': best_epoch,
        # Test for None explicitly rather than relying on dict truthiness.
        'best_val_mpjpe': best_val_mpe if best_state is not None else None,
        'best_val_pck20': best_val_pck20 if best_state is not None else None,
        'train_seconds': round(train_seconds, 1),
        'torch': torch.__version__, 'error': error,
        'finished_utc': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
    }
    if best_state is not None:
        ckpt = os.path.join(SWEEP, f"{v['name']}_best.pth")
        torch.save(best_state, ckpt)
        result['checkpoint'] = ckpt
        model.load_state_dict(best_state)
        eval_loader = DataLoader(test_loader.dataset, batch_size=256, shuffle=False,
                                 num_workers=2)
        result['test_full'] = evaluate(model, eval_loader, device)
        w2f = dataset.window_to_file
        clean_idx = [i for i in test_loader.dataset.indices if w2f[i] < CORRUPT_FILE_START]
        clean_loader = DataLoader(Subset(dataset, clean_idx), batch_size=256,
                                  shuffle=False, num_workers=2)
        result['test_clean'] = evaluate(model, clean_loader, device)
        print(f"[{v['name']}] TEST clean: pck20={result['test_clean']['pck@20'] * 100:.2f}% "
              f"mpjpe={result['test_clean']['mpjpe']:.5f} | full: "
              f"pck20={result['test_full']['pck@20'] * 100:.2f}%", flush=True)
    return result


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--dry-run', action='store_true', help='print param counts and exit')
    args = ap.parse_args()
    if args.dry_run:
        for v in VARIANTS:
            m = build_model(v)
            info = describe(m)
            x = torch.randn(2, 540, 20)
            m.eval()
            with torch.no_grad():  # shape check only, no autograd graph needed
                y = m(x)
            print(f"{v['name']:8s} params={info['params']:>9,} "
                  f"tcn={v['tcn']} conv={v['conv']} attn_g={v['attn_groups']} "
                  f"mode={v['groups_mode']} pw_g={v['input_pw_groups']} "
                  f"tcn_groups={info['tcn_groups_per_block']} strides={info['conv_strides']} "
                  f"W'={info['final_width']} out={tuple(y.shape)}")
        return
    results_path = os.path.join(SWEEP, 'results.jsonl')
    done = set()
    if os.path.exists(results_path):
        with open(results_path) as f:
            for line in f:
                try:
                    done.add(json.loads(line)['variant'])
                except (ValueError, KeyError, TypeError):
                    # Skip blank, truncated or malformed lines.
                    pass
    device = torch.device('cuda')
    print(f"torch {torch.__version__} on {torch.cuda.get_device_name(0)}", flush=True)
    data_dir = os.path.join(BENCH, 'preprocessed_csi_data')
    dataset = PreprocessedCSIKeypointsDataset(data_dir=data_dir, keypoint_scale=1000.0,
                                              enable_temporal_clean=True)
    for v in VARIANTS:
        if v['name'] in done:
            print(f"[{v['name']}] already in results.jsonl -- skipping", flush=True)
            continue
        print(f"\n===== variant: {v['name']} =====", flush=True)
        try:
            result = train_variant(v, dataset, device)
        except Exception as e:  # record and move on to next variant
            import traceback
            traceback.print_exc()
            result = {'variant': v['name'], 'error': repr(e),
                      'finished_utc': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())}
        with open(results_path, 'a') as f:
            f.write(json.dumps(result) + '\n')
            f.flush()
    print('\nSWEEP COMPLETE', flush=True)


if __name__ == '__main__':
    main()
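The skip-if-already-recorded behaviour described in the runner's docstring can be illustrated in isolation. The following is a minimal sketch rather than the runner itself: `load_done`, `append_result` and `run_pending` are hypothetical helper names, the "training" step is replaced by writing a stub record, and a temporary file stands in for sweep/results.jsonl.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of variant names already recorded in a JSONL file."""
    done = set()
    if not os.path.exists(path):
        return done
    with open(path) as f:
        for line in f:
            try:
                done.add(json.loads(line)['variant'])
            except (ValueError, KeyError, TypeError):
                continue  # tolerate a blank, truncated or malformed line
    return done


def append_result(path, result):
    """Append one result record as a single JSON line."""
    with open(path, 'a') as f:
        f.write(json.dumps(result) + '\n')


def run_pending(path, variants):
    """Record every variant that is not yet in the file; return those that ran."""
    done = load_done(path)
    ran = []
    for name in variants:
        if name in done:
            continue
        # Stand-in for the real training call (assumption for illustration).
        append_result(path, {'variant': name, 'error': None})
        ran.append(name)
    return ran


tmp_dir = tempfile.mkdtemp()
results_path = os.path.join(tmp_dir, 'results.jsonl')
first = run_pending(results_path, ['half', 'quarter', 'tiny'])
second = run_pending(results_path, ['half', 'quarter', 'tiny'])
print(first, second)
```

As in the runner, a record written after a failure still carries a `variant` key and therefore counts as done, so a failed variant is retried only once its line has been removed from the file.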
Binary file not shown.
@@ -1,772 +0,0 @@
{
"torch": {
"env": {
"torch": "2.12.0+cpu",
"platform": "Windows-11-10.0.26200-SP0",
"processor": "Intel64 Family 6 Model 197 Stepping 2, GenuineIntel",
"num_threads": 16,
"checkpoint": "results\\retrained_best_pose_model.pth",
"params": 2225042
},
"variants": {
"fp32": {
"file": "retrained_fp32_resaved.pth",
"size_bytes": 9068948,
"size_mb": 9.068948,
"latency_batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 24.903650000851485,
"median_ms_per_window": 24.903650000851485,
"windows_per_second": 40.15475642991324
},
"latency_batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 184.02919999789447,
"median_ms_per_window": 2.875456249967101,
"windows_per_second": 347.77089723115813
},
"accuracy": {
"samples": 10000,
"pck@20": 0.9668200004577636,
"pck@50": 0.9915333324432373,
"mpjpe": 0.00936222033649683,
"wall_seconds": 37.85407733917236
}
},
"fp16": {
"file": "retrained_fp16.pth",
"size_bytes": 4580332,
"size_mb": 4.580332,
"latency_batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 23.936699999467237,
"median_ms_per_window": 23.936699999467237,
"windows_per_second": 41.776853117691964
},
"latency_batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 102.32584999903338,
"median_ms_per_window": 1.5988414062348966,
"windows_per_second": 625.4529036465817
},
"accuracy": {
"samples": 10000,
"pck@20": 0.966773332977295,
"pck@50": 0.9915066654205322,
"mpjpe": 0.009460017587244511,
"wall_seconds": 21.632277250289917
}
},
"int8_dynamic": {
"file": "retrained_int8_dynamic.pth",
"size_bytes": 9068948,
"size_mb": 9.068948,
"latency_batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 18.105350000041653,
"median_ms_per_window": 18.105350000041653,
"windows_per_second": 55.23229321707117
},
"latency_batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 168.77549999844632,
"median_ms_per_window": 2.6371171874757238,
"windows_per_second": 379.20195763359703
},
"accuracy": {
"samples": 10000,
"pck@20": 0.9668200004577636,
"pck@50": 0.9915333324432373,
"mpjpe": 0.00936222033649683,
"wall_seconds": 45.35376596450806
}
}
},
"int8_dynamic_quant_report": {
"eligible_module_counts": {
"nn.Linear": 0,
"nn.Conv1d": 21,
"nn.Conv2d": 22
},
"modules_actually_quantized": [],
"n_modules_quantized": 0,
"params_total": 2225042,
"params_quantized": 0,
"params_quantized_fraction": 0.0
},
"accuracy_subset": {
"description": "seed-42 file-level 70/15/15 test split, corrupted windows (files 487-499) excluded, seed-42 random subset",
"subset_size": 10000,
"clean_test_total": 10000
}
},
"onnx": {
"env": {
"torch": "2.12.0+cpu",
"onnxruntime": "1.26.0",
"platform": "Windows-11-10.0.26200-SP0"
},
"export": {
"mode": "dynamic-batch",
"exporter": "torchscript",
"file": "retrained_fp32_dynamic.onnx",
"size_mb": 8.971781
},
"parity": {
"fixture": "results/parity_fixture.npz (batch 2, seed 42)",
"max_abs_diff_vs_stored_fixture": 2.384185791015625e-07,
"max_abs_diff_vs_torch_now": 2.384185791015625e-07,
"pass_lt_1e-4": true
},
"latency": {
"batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 2.5410999987798277,
"median_ms_per_window": 2.5410999987798277,
"windows_per_second": 393.5303610563043
},
"batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 181.95204999938142,
"median_ms_per_window": 2.8430007812403346,
"windows_per_second": 351.7410218803118
}
},
"ort_int8_dynamic_supplementary": {
"file": "retrained_int8_ort_dynamic.onnx",
"size_mb": 2.438794,
"runs": true,
"max_abs_diff_vs_fp32_fixture": 0.00827130675315857
}
},
"onnx_accuracy": {
"onnx_fp32": {
"samples": 10000,
"pck@20": 0.9668200004577636,
"pck@50": 0.9915333324432373,
"mpjpe": 0.00936222568154335,
"wall_seconds": 22.34790802001953
},
"onnx_int8_ort_dynamic": {
"samples": 10000,
"pck@20": 0.965240001964569,
"pck@50": 0.9915466655731201,
"mpjpe": 0.01108054072111845,
"wall_seconds": 55.742953062057495
}
},
"latency_controlled_rerun": {
"note": "3 interleaved repetitions per variant, median ms/window; quiet box",
"fp32": {
"batch1_ms_per_window_median": 10.969150001983508,
"batch1_reps": [
10.969150001983508,
12.646450000829645,
10.49820000116597
],
"batch64_ms_per_window_median": 2.2734187500077496,
"batch64_reps": [
2.377234374989712,
2.124126562478068,
2.2734187500077496
]
},
"fp16": {
"batch1_ms_per_window_median": 24.313550000442774,
"batch1_reps": [
25.1078499986761,
21.856999999727122,
24.313550000442774
],
"batch64_ms_per_window_median": 2.414695312495496,
"batch64_reps": [
2.5705156249955508,
1.7137437499741281,
2.414695312495496
]
},
"int8_dynamic": {
"batch1_ms_per_window_median": 15.627150000000256,
"batch1_reps": [
17.67525000104797,
14.627999998992891,
15.627150000000256
],
"batch64_ms_per_window_median": 2.0546906250160646,
"batch64_reps": [
2.0546906250160646,
2.03407343752815,
2.9325796875241394
]
},
"onnx_fp32": {
"batch1_ms_per_window_median": 3.186650001225644,
"batch1_reps": [
2.7332500012562377,
3.1995500012271805,
3.186650001225644
],
"batch64_ms_per_window_median": 1.9893374999924163,
"batch64_reps": [
1.5590843750032946,
1.9893374999924163,
2.2144343749914697
]
},
"onnx_int8_ort_dynamic": {
"batch1_ms_per_window_median": 6.50984999811044,
"batch1_reps": [
6.50984999811044,
6.455249998907675,
6.789299999581999
],
"batch64_ms_per_window_median": 5.770093750015803,
"batch64_reps": [
5.770093750015803,
3.912374999970325,
7.8067296875019565
]
}
},
"onnx_static_ptq": {
"env": {
"onnxruntime": "1.26.0",
"torch": "2.12.0+cpu",
"platform": "Windows-11-10.0.26200-SP0",
"source_model": "retrained_fp32_dynamic.onnx",
"preprocessed_model": {
"file": "retrained_fp32_preproc.onnx",
"size_mb": 8.981529
}
},
"variants": {
"minmax_all": {
"file": "retrained_int8_static_minmax_all.onnx",
"size_bytes": 2604286,
"size_mb": 2.604286,
"calibration": {
"method": "minmax",
"windows": 1000,
"percentile": null,
"seconds": 5.052440166473389
},
"scope": "all",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 283,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 181,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.015945255756378174,
"accuracy": {
"samples": 10000,
"pck@20": 0.9545266661643982,
"pck@50": 0.9913666645050049,
"mpjpe": 0.014860070134699345,
"wall_seconds": 43.455235958099365
}
},
"minmax_conv": {
"file": "retrained_int8_static_minmax_conv.onnx",
"size_bytes": 2527421,
"size_mb": 2.527421,
"calibration": {
"method": "minmax",
"windows": 1000,
"percentile": null,
"seconds": 4.380746126174927
},
"scope": "conv",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 156,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 78,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.010693132877349854,
"accuracy": {
"samples": 10000,
"pck@20": 0.9663399996757507,
"pck@50": 0.9918666641235352,
"mpjpe": 0.01084446222037077,
"wall_seconds": 35.937947034835815
}
},
"entropy_all": {
"file": "retrained_int8_static_entropy_all.onnx",
"size_bytes": 2604268,
"size_mb": 2.604268,
"calibration": {
"method": "entropy",
"windows": 512,
"percentile": null,
"seconds": 23.835066318511963
},
"scope": "all",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 283,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 181,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.015280365943908691,
"accuracy": {
"samples": 10000,
"pck@20": 0.9530466662406921,
"pck@50": 0.9912600006103516,
"mpjpe": 0.015098519864678382,
"wall_seconds": 51.514281034469604
}
},
"entropy_conv": {
"file": "retrained_int8_static_entropy_conv.onnx",
"size_bytes": 2527403,
"size_mb": 2.527403,
"calibration": {
"method": "entropy",
"windows": 512,
"percentile": null,
"seconds": 9.634419918060303
},
"scope": "conv",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 156,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 78,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.012535125017166138,
"accuracy": {
"samples": 10000,
"pck@20": 0.9659599989891052,
"pck@50": 0.9918666648864746,
"mpjpe": 0.010778637571632861,
"wall_seconds": 41.01180171966553
}
},
"percentile_all": {
"file": "retrained_int8_static_percentile_all.onnx",
"size_bytes": 2604052,
"size_mb": 2.604052,
"calibration": {
"method": "percentile",
"windows": 512,
"percentile": 99.99,
"seconds": 20.221954584121704
},
"scope": "all",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 283,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 181,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.017689883708953857,
"accuracy": {
"samples": 10000,
"pck@20": 0.9639333323478698,
"pck@50": 0.9916799991607667,
"mpjpe": 0.012176512064039708,
"wall_seconds": 49.365190744400024
}
},
"percentile_conv": {
"file": "retrained_int8_static_percentile_conv.onnx",
"size_bytes": 2527241,
"size_mb": 2.527241,
"calibration": {
"method": "percentile",
"windows": 512,
"percentile": 99.99,
"seconds": 8.223475694656372
},
"scope": "conv",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 156,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 78,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.014725983142852783,
"accuracy": {
"samples": 10000,
"pck@20": 0.9660599988937378,
"pck@50": 0.9916066654205322,
"mpjpe": 0.010310938355326652,
"wall_seconds": 36.89548587799072
}
}
},
"latency": {
"note": "3 interleaved repetitions per variant, median ms/window; onnx_fp32 / onnx_int8_ort_dynamic are same-session references",
"onnx_fp32": {
"batch1_reps": [
4.5327999996516155,
2.535649999117595,
2.167549997466267
],
"batch64_reps": [
1.9354515624740998,
2.4948054687854437,
1.9334703125082342
],
"batch1_ms_per_window_median": 2.535649999117595,
"batch64_ms_per_window_median": 1.9354515624740998
},
"onnx_int8_ort_dynamic": {
"batch1_reps": [
5.698599999959697,
5.721350000385428,
4.805099997611251
],
"batch64_reps": [
4.096601562508795,
4.857628124995017,
4.583800000006022
],
"batch1_ms_per_window_median": 5.698599999959697,
"batch64_ms_per_window_median": 4.583800000006022
},
"entropy_all": {
"batch1_reps": [
6.444149999879301,
5.038299999796436,
5.713200000172947
],
"batch64_reps": [
4.149468750028973,
3.437125000004926,
4.410960937491382
],
"batch1_ms_per_window_median": 5.713200000172947,
"batch64_ms_per_window_median": 4.149468750028973
},
"entropy_conv": {
"batch1_reps": [
4.874750000453787,
5.169099998965976,
5.236699998931726
],
"batch64_reps": [
3.010160156236452,
3.1175546875203963,
3.516850781238645
],
"batch1_ms_per_window_median": 5.169099998965976,
"batch64_ms_per_window_median": 3.1175546875203963
},
"percentile_all": {
"batch1_reps": [
5.184749999898486,
5.2898499998264015,
5.916899999647285
],
"batch64_reps": [
4.305105468745296,
4.460741406262514,
4.184502343747454
],
"batch1_ms_per_window_median": 5.2898499998264015,
"batch64_ms_per_window_median": 4.305105468745296
},
"percentile_conv": {
"batch1_reps": [
4.916449999655015,
7.150899999032845,
5.284949998895172
],
"batch64_reps": [
3.855813281262499,
4.688969531230214,
5.220103124997877
],
"batch1_ms_per_window_median": 5.284949998895172,
"batch64_ms_per_window_median": 4.688969531230214
},
"minmax_all": {
"batch1_reps": [
6.463300000177696,
7.149449998905766,
5.3209000016067876
],
"batch64_reps": [
3.9251343750095202,
4.033442187505898,
3.428199218745931
],
"batch1_ms_per_window_median": 6.463300000177696,
"batch64_ms_per_window_median": 3.9251343750095202
},
"minmax_conv": {
"batch1_reps": [
5.9961499991914025,
5.236549999608542,
4.854399998293957
],
"batch64_reps": [
4.368359375007458,
3.249617187492504,
3.0238906249735464
],
"batch1_ms_per_window_median": 5.236549999608542,
"batch64_ms_per_window_median": 3.249617187492504
}
},
"accuracy_subset": {
"description": "seed-42 file-level 70/15/15 test split, corrupted windows excluded, seed-42 random subset (same as quantize_bench/eval_ort_accuracy)",
"subset_size": 10000
}
},
"tiny_variant": {
"env": {
"torch": "2.12.0+cpu",
"onnxruntime": "1.26.0",
"platform": "Windows-11-10.0.26200-SP0",
"num_threads": 16,
"checkpoint": "results\\tiny_best.pth",
"checkpoint_size_bytes": 340555,
"params": 56290,
"variant_config": {
"tcn": [
68,
56,
44,
32
],
"conv": [
2,
4,
8,
16
],
"attn_groups": 2,
"groups_mode": "depthwise",
"input_pw_groups": 4
}
},
"export": {
"mode": "dynamic-batch",
"exporter": "torchscript",
"opset": 17,
"file": "tiny_fp32_dynamic.onnx",
"size_bytes": 295279,
"size_mb": 0.295279,
"verified_batches": [
1,
2,
64
],
"note": "AdaptiveAvgPool2d((15,1)) replaced at export by an exact mean(-1) + constant averaging matmul (final_width 16 is not a multiple of 15, which the TorchScript exporter rejects); exactness proven by the parity check vs the original torch model"
},
"parity": {
"fixture": "results/parity_fixture.npz input (batch 2, seed 42); reference output recomputed with the tiny torch model",
"max_abs_diff_vs_torch": 1.4901161193847656e-07,
"pass_lt_1e-4": true
},
"int8_static_percentile_conv": {
"file": "tiny_int8_static_percentile_conv.onnx",
"size_bytes": 248278,
"size_mb": 0.248278,
"calibration": {
"method": "percentile",
"percentile": 99.99,
"windows": 512,
"scope": "conv-only TRAIN-split corruption-free",
"seconds": 1.5347836017608643
},
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"max_abs_diff_vs_fp32_fixture": 0.018491357564926147
},
"latency": {
"note": "3 interleaved repetitions per variant, median ms/window; full-model sessions are same-session references",
"tiny_onnx_fp32": {
"batch1_reps": [
0.6312500008789357,
0.6834500018157996,
0.6595999984710943
],
"batch64_reps": [
0.37747578119251557,
0.24196640623586063,
0.2314671875183194
],
"batch1_ms_per_window_median": 0.6595999984710943,
"batch64_ms_per_window_median": 0.24196640623586063
},
"tiny_onnx_int8_static_percentile_conv": {
"batch1_reps": [
0.7988500001374632,
0.9382499993080273,
0.8451000030618161
],
"batch64_reps": [
0.9211476562995813,
1.3045390625165965,
1.026230468767153
],
"batch1_ms_per_window_median": 0.8451000030618161,
"batch64_ms_per_window_median": 1.026230468767153
},
"full_onnx_fp32_reference": {
"batch1_reps": [
2.267249998112675,
2.80170000041835,
2.132149998942623
],
"batch64_reps": [
1.3050578124875756,
1.4244992187855132,
1.8014164062947202
],
"batch1_ms_per_window_median": 2.267249998112675,
"batch64_ms_per_window_median": 1.4244992187855132
},
"full_onnx_int8_static_percentile_conv_reference": {
"batch1_reps": [
5.529599999135826,
4.768399998283712,
6.215800000063609
],
"batch64_reps": [
3.815724218725336,
3.1025562500417436,
4.333318749957016
],
"batch1_ms_per_window_median": 5.529599999135826,
"batch64_ms_per_window_median": 3.815724218725336
}
},
"accuracy_subset": {
"description": "seed-42 file-level 70/15/15 test split, corrupted windows excluded, seed-42 random subset (same as quantize_bench/eval_ort_accuracy/static_ptq_bench)",
"subset_size": 10000
},
"accuracy": {
"tiny_onnx_fp32": {
"samples": 10000,
"pck@20": 0.941106667804718,
"pck@50": 0.99369333152771,
"mpjpe": 0.012527281279861927,
"wall_seconds": 10.927234888076782
},
"tiny_onnx_int8_static_percentile_conv": {
"samples": 10000,
"pck@20": 0.9268133331298828,
"pck@50": 0.9932933319091797,
"mpjpe": 0.014906252065300942,
"wall_seconds": 12.320892333984375
}
}
}
}
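The latency fields in this file are related by simple arithmetic, which the sketch below reproduces from values recorded above. The function names are illustrative and are not taken from the benchmark scripts; only the input numbers come from the file.

```python
import statistics


def per_window_ms(median_ms_per_batch, batch_size):
    """Milliseconds per window, given the median time for one batch."""
    return median_ms_per_batch / batch_size


def windows_per_second(median_ms_per_batch, batch_size):
    """Throughput implied by the same median batch time."""
    return 1000.0 * batch_size / median_ms_per_batch


# torch fp32, latency_batch64: "median_ms_per_batch": 184.02919999789447
ms_win = per_window_ms(184.02919999789447, 64)
wps = windows_per_second(184.02919999789447, 64)

# latency_controlled_rerun, fp32 batch 1: the reported value is the
# median of the three interleaved repetitions.
rerun_median = statistics.median(
    [10.969150001983508, 12.646450000829645, 10.49820000116597])

print(ms_win, wps, rerun_median)
```

With only three repetitions the median is simply the middle value, which makes it robust to a single slow outlier but gives no estimate of spread; the raw `*_reps` arrays are kept in the file for that reason.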
@@ -1,3 +0,0 @@
{"variant": "half", "params": 843834, "tcn_channels": [270, 220, 170, 120], "conv_channels": [4, 8, 16, 32], "attn_groups": 4, "groups_mode": "gcd20", "input_pw_groups": 1, "tcn_groups_per_block": [[20, 10], [10, 20], [20, 10], [10, 20]], "conv_strides": [2, 2, 2, 1], "final_width": 15, "batch_size": 64, "max_epochs": 50, "patience": 5, "lr": 0.0001, "weight_decay": 5e-05, "seed": 42, "precision": "fp32", "epochs_run": 28, "best_epoch": 23, "best_val_mpjpe": 0.008576328293592842, "best_val_pck20": 0.9690593021534107, "train_seconds": 1346.4, "torch": "2.11.0+cu128", "error": null, "finished_utc": "2026-06-11T03:09:47Z", "checkpoint": "/home/ruvultra/wiflow-std-bench/sweep/half_best.pth", "test_full": {"samples": 54000, "mpjpe": 0.009419974447676428, "pck@10": 0.8740543655289544, "pck@20": 0.9610469643628156, "pck@30": 0.9813556064146537, "pck@40": 0.9896086878246731, "pck@50": 0.9934827546013726}, "test_clean": {"samples": 52560, "mpjpe": 0.008980081718602137, "pck@10": 0.8840944136840205, "pck@20": 0.9662253179869514, "pck@30": 0.9847971080282144, "pck@40": 0.9917795997050618, "pck@50": 0.9946956242600532}}
{"variant": "quarter", "params": 338600, "tcn_channels": [135, 110, 85, 60], "conv_channels": [2, 4, 8, 16], "attn_groups": 2, "groups_mode": "gcd20", "input_pw_groups": 1, "tcn_groups_per_block": [[20, 5], [5, 10], [10, 5], [5, 20]], "conv_strides": [2, 2, 1, 1], "final_width": 15, "batch_size": 64, "max_epochs": 50, "patience": 5, "lr": 0.0001, "weight_decay": 5e-05, "seed": 42, "precision": "fp32", "epochs_run": 50, "best_epoch": 50, "best_val_mpjpe": 0.008780752391864856, "best_val_pck20": 0.9672531302240159, "train_seconds": 1754.4, "torch": "2.11.0+cu128", "error": null, "finished_utc": "2026-06-11T03:39:06Z", "checkpoint": "/home/ruvultra/wiflow-std-bench/sweep/quarter_best.pth", "test_full": {"samples": 54000, "mpjpe": 0.009705399298005634, "pck@10": 0.8646123917014511, "pck@20": 0.9553815319449813, "pck@30": 0.979827209190086, "pck@40": 0.9887037501511751, "pck@50": 0.9931309027671814}, "test_clean": {"samples": 52560, "mpjpe": 0.009279253277105465, "pck@10": 0.8742288637923323, "pck@20": 0.9605315079427745, "pck@30": 0.9833016723076865, "pck@40": 0.9908206971631566, "pck@50": 0.9942719799017071}}
{"variant": "tiny", "params": 56290, "tcn_channels": [68, 56, 44, 32], "conv_channels": [2, 4, 8, 16], "attn_groups": 2, "groups_mode": "depthwise", "input_pw_groups": 4, "tcn_groups_per_block": [[540, 68], [68, 56], [56, 44], [44, 32]], "conv_strides": [2, 1, 1, 1], "final_width": 16, "batch_size": 64, "max_epochs": 50, "patience": 5, "lr": 0.0001, "weight_decay": 5e-05, "seed": 42, "precision": "fp32", "epochs_run": 50, "best_epoch": 47, "best_val_mpjpe": 0.012602971208592256, "best_val_pck20": 0.9397210340146666, "train_seconds": 1540.1, "torch": "2.11.0+cu128", "error": null, "finished_utc": "2026-06-11T04:04:50Z", "checkpoint": "/home/ruvultra/wiflow-std-bench/sweep/tiny_best.pth", "test_full": {"samples": 54000, "mpjpe": 0.012859782406853305, "pck@10": 0.7640358444319831, "pck@20": 0.9364815320968628, "pck@30": 0.9731568422317505, "pck@40": 0.9866444962642811, "pck@50": 0.992488939108672}, "test_clean": {"samples": 52560, "mpjpe": 0.012502924276904246, "pck@10": 0.770895526488985, "pck@20": 0.9411073559313967, "pck@30": 0.9764840687790962, "pck@40": 0.9886695077067278, "pck@50": 0.9936238432039409}}
@@ -1,21 +0,0 @@
{
"checkpoint": "/home/ruvultra/wiflow-std-bench/upstream/test/best_pose_model.pth",
"test_full": {
"samples": 54000,
"mpjpe": 0.009834060806367133,
"pck@10": 0.8686346120127925,
"pck@20": 0.9608815324571398,
"pck@30": 0.9789111610695168,
"pck@40": 0.9857975759682832,
"pck@50": 0.9898827553325229
},
"test_clean": {
"samples": 52560,
"mpjpe": 0.009432755044379373,
"pck@10": 0.876996495807189,
"pck@20": 0.9661454100405608,
"pck@30": 0.9823453060205306,
"pck@40": 0.987909734176537,
"pck@50": 0.9911238361167036
}
}
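The size/accuracy trade-off across the sweep can be read off the records above with a few lines of arithmetic. This is a sketch under one stated assumption: that the upstream checkpoint evaluated here is the 2,225,042-parameter full-size model whose parameter count appears in the edge-optimization results. All numbers are copied from the files above; the variable names are illustrative.

```python
# Reference: upstream checkpoint, test_clean pck@20 (assumed to be the
# 2,225,042-parameter full-size model).
FULL_PARAMS = 2225042
FULL_PCK20_CLEAN = 0.9661454100405608

# (params, test_clean pck@20) per compact variant, from results.jsonl.
variants = {
    'half': (843834, 0.9662253179869514),
    'quarter': (338600, 0.9605315079427745),
    'tiny': (56290, 0.9411073559313967),
}

rows = {}
for name, (params, pck20) in variants.items():
    ratio = FULL_PARAMS / params                      # how many times smaller
    delta_pp = (pck20 - FULL_PCK20_CLEAN) * 100.0     # percentage points
    rows[name] = (ratio, delta_pp)
    print(f"{name:8s} {ratio:6.2f}x fewer params, PCK@20 {delta_pp:+.2f} pp")
```

Under that assumption the tiny variant is roughly 40 times smaller at a cost of about 2.5 percentage points of PCK@20 on the corruption-free test subset, while the half variant is on par with the reference.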
File diff suppressed because it is too large
Binary file not shown.
@@ -1,32 +0,0 @@
{
"published": {
"pck@20": 0.9725,
"pck@30": 0.9863,
"pck@40": 0.9916,
"pck@50": 0.9948,
"mpjpe": 0.007
},
"params_millions": 2.225042,
"data_dir": "C:\\Users\\ruv\\.cache\\kagglehub\\datasets\\kaka2434\\wiflow-dataset\\versions\\1\\preprocessed_csi_data",
"device": "cpu",
"test_full": {
"samples": 54000,
"mpjpe": NaN,
"pck@10": 5.6790124349020145e-05,
"pck@20": 0.0007876543271596785,
"pck@30": 0.007780246982971827,
"pck@40": 0.05529259262923841,
"pck@50": 0.1542370371548114,
"wall_seconds": 118.03756999969482
},
"test_drop_last": {
"samples": 53952,
"mpjpe": NaN,
"pck@10": 5.6840649370682976e-05,
"pck@20": 0.0007883550872372227,
"pck@30": 0.007787168910892621,
"pck@40": 0.055318307667895535,
"pck@50": 0.15425316342412276,
"wall_seconds": 120.87458372116089
}
}
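The `NaN` literals in this file are accepted by Python's `json` module but are not valid under the JSON specification, so stricter parsers may reject it. The sketch below shows the behaviour and one way to produce a strict equivalent, assuming that replacing NaN with `null` is acceptable to downstream consumers; `nan_to_none` is a hypothetical helper, not part of the benchmark code.

```python
import json
import math

raw = '{"samples": 54000, "mpjpe": NaN, "pck@20": 0.0007876543271596785}'

# Python's parser accepts the NaN literal by default.
record = json.loads(raw)
assert math.isnan(record['mpjpe'])


def nan_to_none(obj):
    """Recursively replace float NaN with None so the result is strict JSON."""
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {k: nan_to_none(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [nan_to_none(v) for v in obj]
    return obj


# allow_nan=False makes the encoder refuse NaN instead of emitting it.
try:
    json.dumps(record, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

strict_text = json.dumps(nan_to_none(record), allow_nan=False)
print(strict_ok, strict_text)
```

Writing results with `allow_nan=False` would also surface a non-finite metric at the time it is recorded rather than when the file is later consumed.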
Binary file not shown.
-333
@@ -1,333 +0,0 @@
"""ADR-152 edge optimization follow-up: ONNX Runtime STATIC post-training
quantization (calibration-based QDQ) of the retrained WiFlow-STD model, to
improve on the dynamic-int8 result (2.44 MB, PCK@20 96.52%, 6.5 ms/win b1).
Static PTQ pre-computes activation ranges from calibration data, so inference
uses QLinearConv/QDQ kernels instead of dynamic ConvInteger -- typically both
faster and (with good calibration) closer to fp32 accuracy.
Method:
- Calibration set: corruption-free windows drawn ONLY from the seed-42
file-level TRAINING split (same split as eval_repro.py; corrupted windows
excluded via results/nan_windows_mask.npy | big_windows_mask.npy), chosen
with np.random.default_rng(42). Never test windows.
- quantize_static, QuantFormat.QDQ, per-channel int8 weights, int8
activations; calibration methods MinMax / Entropy / Percentile(99.99);
scopes "all" (ORT default op set) vs "conv" (op_types_to_quantize=
["Conv"] -- leaves the attention path, which exports as Einsum/Softmax
and elementwise ops, in fp32).
- Model is pre-processed first (quant_pre_process: symbolic shape
inference + ORT graph optimization, folds BatchNormalization into Conv).
- Accuracy: identical protocol to eval_ort_accuracy.py -- the 10,000-window
seed-42 subset of the corruption-free test split (PCK@20/50, MPJPE).
- Latency: median ms/window at batch 1 (100 runs) and batch 64 (30 runs),
3 interleaved repetitions across all variants (fp32 and dynamic-int8
sessions included as same-session reference points).
Usage:
PYTHONUTF8=1 .venv/Scripts/python.exe static_ptq_bench.py \
[--data-dir <preprocessed_csi_data>] [--subset 10000]
[--calib-minmax 1000] [--calib-hist 512] [--skip-accuracy]
Writes/merges into results/edge_optimization.json under key "onnx_static_ptq".
"""
import argparse
import collections
import json
import os
import platform
import statistics
import sys
import time
import numpy as np
import torch
HERE = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, HERE)
from _bench_common import RESULTS # noqa: E402
# quantize_bench sets up upstream imports + the np.load mmap patch
# (both via _bench_common.import_upstream)
from quantize_bench import build_test_subset # noqa: E402
import quantize_bench as qb # noqa: E402
from eval_ort_accuracy import evaluate_ort # noqa: E402
FP32_ONNX = os.path.join(RESULTS, "retrained_fp32_dynamic.onnx")
DYN_INT8_ONNX = os.path.join(RESULTS, "retrained_int8_ort_dynamic.onnx")
PREPROC_ONNX = os.path.join(RESULTS, "retrained_fp32_preproc.onnx")
# ---------------------------------------------------------------------------
# calibration data: corruption-free TRAINING-split windows only
# ---------------------------------------------------------------------------
def build_calibration_windows(data_dir, n_windows):
    """Seed-42 file-level 70/15/15 TRAIN split (exactly as eval_repro.py),
    minus corrupted windows, then a seed-42 random draw of n_windows."""
    dataset = qb.PreprocessedCSIKeypointsDataset(
        data_dir=data_dir, keypoint_scale=1000.0, enable_temporal_clean=True)
    train_loader, _va, _te = qb.create_preprocessed_train_val_test_loaders(
        dataset=dataset, batch_size=64, num_workers=0, random_seed=42)
    train_indices = np.asarray(train_loader.dataset.indices)
    corrupted = (np.load(os.path.join(RESULTS, "nan_windows_mask.npy"))
                 | np.load(os.path.join(RESULTS, "big_windows_mask.npy")))
    clean = train_indices[~corrupted[train_indices]]
    print(f"train split: {len(train_indices)} windows, "
          f"{len(train_indices) - len(clean)} corrupted excluded, "
          f"{len(clean)} clean")
    rng = np.random.default_rng(42)
    sel = np.sort(rng.choice(clean, size=n_windows, replace=False))
    xs = np.stack([dataset[int(i)][0].numpy() for i in sel]).astype(np.float32)
    print(f"calibration tensor: {xs.shape} from {n_windows} clean TRAIN windows")
    return xs


def make_reader(windows, batch_size=64):
    from onnxruntime.quantization import CalibrationDataReader

    class WindowReader(CalibrationDataReader):
        def __init__(self):
            self._batches = [windows[i:i + batch_size]
                             for i in range(0, len(windows), batch_size)]
            self._it = iter(self._batches)

        def get_next(self):
            b = next(self._it, None)
            return None if b is None else {"input": b}

        def rewind(self):
            self._it = iter(self._batches)

        def __len__(self):
            return len(self._batches)

    return WindowReader()
# ---------------------------------------------------------------------------
# quantization variants
# ---------------------------------------------------------------------------
def preprocess_model():
    from onnxruntime.quantization.shape_inference import quant_pre_process
    quant_pre_process(FP32_ONNX, PREPROC_ONNX)
    return PREPROC_ONNX


def quantize_variant(src, dst, method, scope, calib_windows):
    from onnxruntime.quantization import (CalibrationMethod, QuantFormat,
                                          QuantType, quantize_static)
    methods = {
        "minmax": CalibrationMethod.MinMax,
        "entropy": CalibrationMethod.Entropy,
        "percentile": CalibrationMethod.Percentile,
    }
    # NB: do NOT pass CalibMaxIntermediateOutputs -- in ORT 1.26 the MinMax
    # calibrater clears its buffer every N batches and then raises
    # "No data is collected" if the batch count is divisible by N.
    extra = {}
    if method == "percentile":
        extra["CalibPercentile"] = 99.99
    op_types = ["Conv"] if scope == "conv" else None
    t0 = time.time()
    quantize_static(
        src, dst, make_reader(calib_windows),
        quant_format=QuantFormat.QDQ,
        op_types_to_quantize=op_types,
        per_channel=True,
        activation_type=QuantType.QInt8,
        weight_type=QuantType.QInt8,
        calibrate_method=methods[method],
        extra_options=extra,
    )
    secs = time.time() - t0
    import onnx
    ops = collections.Counter(n.op_type for n in onnx.load(dst).graph.node)
    return {
        "file": os.path.basename(dst),
        "size_bytes": os.path.getsize(dst),
        "size_mb": os.path.getsize(dst) / 1e6,
        "calibration": {"method": method,
                        "windows": int(len(calib_windows)),
                        "percentile": extra.get("CalibPercentile"),
                        "seconds": secs},
        "scope": scope,
        "per_channel": True,
        "activation_type": "QInt8",
        "weight_type": "QInt8",
        "node_counts": {k: v for k, v in sorted(ops.items())},
    }
# ---------------------------------------------------------------------------
# latency (3 interleaved reps, like the latency_controlled_rerun)
# ---------------------------------------------------------------------------
def ort_session(path):
    import onnxruntime as ort
    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])


def bench_ort(sess, batch, n_runs):
    rng = np.random.default_rng(123)
    x = rng.random((batch, 540, 20), dtype=np.float32)
    inp = sess.get_inputs()[0].name
    for _ in range(max(5, n_runs // 10)):
        sess.run(None, {inp: x})
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        sess.run(None, {inp: x})
        times.append(time.perf_counter() - t0)
    return statistics.median(times) * 1e3 / batch  # ms/window


def interleaved_latency(sessions, reps=3, runs_b1=100, runs_b64=30):
    lat = {name: {"batch1_reps": [], "batch64_reps": []} for name in sessions}
    for rep in range(reps):
        for name, sess in sessions.items():
            lat[name]["batch1_reps"].append(bench_ort(sess, 1, runs_b1))
            lat[name]["batch64_reps"].append(bench_ort(sess, 64, runs_b64))
            print(f"  rep {rep + 1}/{reps} {name}: "
                  f"b1={lat[name]['batch1_reps'][-1]:.2f} "
                  f"b64={lat[name]['batch64_reps'][-1]:.3f} ms/win", flush=True)
    for name in lat:
        lat[name]["batch1_ms_per_window_median"] = statistics.median(
            lat[name]["batch1_reps"])
        lat[name]["batch64_ms_per_window_median"] = statistics.median(
            lat[name]["batch64_reps"])
    return lat
# ---------------------------------------------------------------------------
def main():
    import onnxruntime
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", default=os.path.join(
        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
    parser.add_argument("--subset", type=int, default=10000)
    parser.add_argument("--calib-minmax", type=int, default=1000)
    parser.add_argument("--calib-hist", type=int, default=512,
                        help="calibration windows for Entropy/Percentile "
                             "(histogram calibraters hold all intermediate "
                             "activations in RAM)")
    parser.add_argument("--skip-accuracy", action="store_true")
    parser.add_argument("--methods", default="minmax,entropy,percentile",
                        help="comma list of calibration methods to (re)run; "
                             "results merge into existing onnx_static_ptq")
    parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
    args = parser.parse_args()
    results = {
        "env": {
            "onnxruntime": onnxruntime.__version__,
            "torch": torch.__version__,
            "platform": platform.platform(),
            "source_model": os.path.basename(FP32_ONNX),
        },
        "variants": {},
    }
    # ---- calibration data (TRAIN split only) -------------------------------
    calib_mm = build_calibration_windows(args.data_dir, args.calib_minmax)
    calib_hist = calib_mm[:args.calib_hist]
    # ---- preprocess + quantize ---------------------------------------------
    print("\n=== quant_pre_process (shape inference + graph optimization) ===")
    src = preprocess_model()
    results["env"]["preprocessed_model"] = {
        "file": os.path.basename(src),
        "size_mb": os.path.getsize(src) / 1e6,
    }
    matrix = [(m, s) for m in args.methods.split(",")
              for s in ("all", "conv")]
    for method, scope in matrix:
        name = f"{method}_{scope}"
        dst = os.path.join(RESULTS, f"retrained_int8_static_{name}.onnx")
        calib = calib_mm if method == "minmax" else calib_hist
        print(f"\n=== quantize_static: {name} "
              f"({len(calib)} calib windows) ===", flush=True)
        try:
            results["variants"][name] = quantize_variant(
                src, dst, method, scope, calib)
            print(f"  {results['variants'][name]['size_mb']:.3f} MB")
        except Exception as e:  # noqa: BLE001
            results["variants"][name] = {"error": f"{type(e).__name__}: {e}"}
            print(f"  FAILED: {e}")
    # ---- fixture parity (sanity, batch 2) ----------------------------------
    fixture = np.load(os.path.join(RESULTS, "parity_fixture.npz"))
    fx, fy = fixture["input"], fixture["output"]
    sessions = {}
    for name, info in results["variants"].items():
        if "error" in info:
            continue
        path = os.path.join(RESULTS, info["file"])
        try:
            sess = ort_session(path)
            yq = sess.run(None, {sess.get_inputs()[0].name: fx})[0]
            info["max_abs_diff_vs_fp32_fixture"] = float(np.abs(yq - fy).max())
            sessions[name] = sess
        except Exception as e:  # noqa: BLE001
            info["run_error"] = f"{type(e).__name__}: {e}"
    print("\nfixture max-abs-diff vs fp32:",
          {n: round(results["variants"][n].get("max_abs_diff_vs_fp32_fixture",
                                               float("nan")), 5)
           for n in results["variants"]})
    # ---- latency: 3 interleaved reps incl. fp32 + dynamic-int8 reference ----
    print("\n=== latency (3 interleaved reps) ===")
    lat_sessions = {"onnx_fp32": ort_session(FP32_ONNX),
                    "onnx_int8_ort_dynamic": ort_session(DYN_INT8_ONNX)}
    lat_sessions.update(sessions)
    results["latency"] = {
        "note": "3 interleaved repetitions per variant, median ms/window; "
                "onnx_fp32 / onnx_int8_ort_dynamic are same-session references",
        **interleaved_latency(lat_sessions),
    }
    # ---- accuracy on the standard 10k corruption-free test subset ----------
    if not args.skip_accuracy:
        loader, n_clean = build_test_subset(args.data_dir, args.subset)
        results["accuracy_subset"] = {
            "description": "seed-42 file-level 70/15/15 test split, corrupted "
                           "windows excluded, seed-42 random subset (same as "
                           "quantize_bench/eval_ort_accuracy)",
            "subset_size": min(args.subset, n_clean) if args.subset else n_clean,
        }
        for name, sess in sessions.items():
            print(f"\n=== accuracy: {name} ===")
            results["variants"][name]["accuracy"] = evaluate_ort(
                sess, loader, name)
            print(json.dumps(results["variants"][name]["accuracy"], indent=2))
    # ---- merge into edge_optimization.json ----------------------------------
    merged = {}
    if os.path.exists(args.out):
        with open(args.out) as f:
            merged = json.load(f)
    prev = merged.get("onnx_static_ptq")
    if prev:  # nested merge so partial --methods reruns don't clobber
        prev["env"] = results["env"]
        prev["variants"].update(results["variants"])
        prev.setdefault("latency", {}).update(results["latency"])
        if "accuracy_subset" in results:
            prev["accuracy_subset"] = results["accuracy_subset"]
    else:
        merged["onnx_static_ptq"] = results
    with open(args.out, "w") as f:
        json.dump(merged, f, indent=2)
    print(f"\nwrote {args.out}")


if __name__ == "__main__":
    main()
-313
@@ -1,313 +0,0 @@
"""ADR-152 efficiency-sweep follow-up: edge pipeline for the TINY compact
WiFlow-STD variant (56,290 params, results/tiny_best.pth, trained overnight
2026-06-10/11 -- see RESULTS.md "Efficiency sweep").
Headline question: what does the smallest deployable WiFlow-class model look
like (KB + ms + PCK)? Reuses the onnx_bench.py / static_ptq_bench.py
machinery on the tiny checkpoint:
1. Load tiny_best.pth with remote/sweep/model_compact.py
(depthwise TCN groups, input_pw_groups=4, conv [2,4,8,16], attn groups 2).
2. Export ONNX: dynamic batch, opset 17, TorchScript exporter (dynamo=False)
-- same recipe that worked for the full model; verified at batch 1/2/64.
One forced deviation: tiny's stride schedule [2,1,1,1] leaves final_width
16, and the TorchScript exporter cannot export AdaptiveAvgPool2d((15,1))
when 15 is not a factor of the input height (the full model never hit
this -- its width was exactly 15). The adaptive pool over a fixed-size
feature map is a fixed linear map, so the export wrapper replaces it with
an exact matmul equivalent (PyTorch adaptive-pool bin semantics:
bin i averages rows floor(i*H/K)..ceil((i+1)*H/K)); the W axis (20->1,
a factor) becomes mean(-1). Exactness is proven by the parity check
below, which compares against the ORIGINAL torch model with the real
AdaptiveAvgPool2d.
3. Torch-vs-ORT parity on the stored fixture input
(results/parity_fixture.npz, batch 2, seed 42 -- same 540x20 input layout;
reference output recomputed with the tiny torch model). PASS < 1e-4.
4. Static QDQ conv-only int8 (quant_pre_process + quantize_static,
per-channel QInt8 weights+activations, Percentile(99.99) calibration on
512 corruption-free TRAIN-split windows -- the winning recipe and
calibration count from static_ptq_bench.py. 512, not "about 500":
ORT 1.26's histogram collector calls np.asarray() on the per-batch maxima,
so the calibration count must be a multiple of the batch size (64) or the
ragged last batch crashes it).
5. Disk size + CPU latency b1/b64 (3 interleaved reps, median ms/window)
for tiny fp32 + tiny int8, with the full-model ONNX fp32 + static-int8
sessions interleaved as same-session references.
6. Accuracy (PCK@20/50 + MPJPE) on the identical 10k-window seed-42
corruption-free test subset for tiny fp32 + tiny int8.
Usage:
PYTHONUTF8=1 .venv/Scripts/python.exe tiny_edge_bench.py \
[--data-dir <preprocessed_csi_data>] [--subset 10000] [--calib 512]
(--calib must be a multiple of 64; see step 4 above)
Writes/merges into results/edge_optimization.json under key "tiny_variant".
"""
import argparse
import json
import os
import platform
import sys
import time
import numpy as np
import torch
HERE = os.path.dirname(os.path.abspath(__file__))
RESULTS = os.path.join(HERE, "results")
sys.path.insert(0, HERE)
sys.path.insert(0, os.path.join(HERE, "remote", "sweep"))
# quantize_bench sets up upstream imports + the np.load mmap patch
from quantize_bench import build_test_subset # noqa: E402
from eval_ort_accuracy import evaluate_ort # noqa: E402
from static_ptq_bench import ( # noqa: E402
build_calibration_windows,
interleaved_latency,
make_reader,
ort_session,
)
from model_compact import CompactWiFlowPoseModel, describe # noqa: E402
TINY_CKPT = os.path.join(RESULTS, "tiny_best.pth")
TINY_FP32_ONNX = os.path.join(RESULTS, "tiny_fp32_dynamic.onnx")
TINY_PREPROC_ONNX = os.path.join(RESULTS, "tiny_fp32_preproc.onnx")
TINY_INT8_ONNX = os.path.join(RESULTS, "tiny_int8_static_percentile_conv.onnx")
FULL_FP32_ONNX = os.path.join(RESULTS, "retrained_fp32_dynamic.onnx")
FULL_INT8_ONNX = os.path.join(RESULTS, "retrained_int8_static_percentile_conv.onnx")
# Exact tiny config from remote/sweep/run_sweep.py VARIANTS (measured 56,290
# params, clean-test PCK@20 94.11% -- results/efficiency_sweep.jsonl).
TINY = dict(tcn=[68, 56, 44, 32], conv=[2, 4, 8, 16], attn_groups=2,
groups_mode="depthwise", input_pw_groups=4)
def load_tiny_model():
    model = CompactWiFlowPoseModel(
        tcn_channels=TINY["tcn"], conv_channels=TINY["conv"],
        attn_groups=TINY["attn_groups"], groups_mode=TINY["groups_mode"],
        input_pw_groups=TINY["input_pw_groups"], dropout=0.5)
    state = torch.load(TINY_CKPT, map_location="cpu", weights_only=True)
    model.load_state_dict(state, strict=True)
    model.eval()
    return model


def adaptive_pool_matrix(h_in, h_out):
    """Exact AdaptiveAvgPool1d as a (h_out, h_in) averaging matrix, using
    PyTorch's bin rule: bin i covers rows floor(i*h_in/h_out) ..
    ceil((i+1)*h_in/h_out)."""
    w = torch.zeros(h_out, h_in)
    for i in range(h_out):
        s = (i * h_in) // h_out
        e = -((-(i + 1) * h_in) // h_out)  # ceil division
        w[i, s:e] = 1.0 / (e - s)
    return w
class ExportWrapper(torch.nn.Module):
    """CompactWiFlowPoseModel forward with the AdaptiveAvgPool2d((K,1))
    replaced by an exact fixed linear map (mean over the factor W axis, then
    a constant averaging matmul over the non-factor H axis) so the
    TorchScript ONNX exporter accepts it. Numerically equivalent up to float
    round-off; verified by the parity check against the original model."""

    def __init__(self, m, num_keypoints=15):
        super().__init__()
        self.m = m
        self.register_buffer(
            "pool_w_t", adaptive_pool_matrix(m.final_width, num_keypoints).t())

    def forward(self, x):
        m = self.m
        x = m.tcn(x)
        x = x.transpose(1, 2).unsqueeze(1)
        x = m.up(x)
        for block in m.residual_blocks:
            x = block(x)
        x = x.permute(0, 1, 3, 2)
        x = m.attention(x)
        x = m.decoder(x)             # [B, 2, H=final_width, T=20]
        x = x.mean(-1)               # W-axis pool (20 -> 1, a factor)
        x = x.matmul(self.pool_w_t)  # exact adaptive H pool: [B, 2, K]
        return x.transpose(1, 2)     # [B, K, 2]
def export_onnx(model):
    """Dynamic-batch TorchScript export (the recipe that worked for the full
    model in onnx_bench.py), verified at batch 1/2/64. Uses ExportWrapper
    (see docstring) because final_width 16 is not a multiple of 15."""
    wrapper = ExportWrapper(model).eval()
    x = torch.rand(2, 540, 20)
    with torch.no_grad():
        torch.onnx.export(
            wrapper, (x,), TINY_FP32_ONNX, opset_version=17,
            input_names=["input"], output_names=["output"], dynamo=False,
            dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}})
    sess = ort_session(TINY_FP32_ONNX)
    inp = sess.get_inputs()[0].name
    for b in (1, 2, 64):
        y = sess.run(None, {inp: np.zeros((b, 540, 20), dtype=np.float32)})[0]
        assert y.shape == (b, 15, 2), y.shape
    return {
        "mode": "dynamic-batch", "exporter": "torchscript", "opset": 17,
        "file": os.path.basename(TINY_FP32_ONNX),
        "size_bytes": os.path.getsize(TINY_FP32_ONNX),
        "size_mb": os.path.getsize(TINY_FP32_ONNX) / 1e6,
        "verified_batches": [1, 2, 64],
        "note": "AdaptiveAvgPool2d((15,1)) replaced at export by an exact "
                "mean(-1) + constant averaging matmul (final_width 16 is not "
                "a multiple of 15, which the TorchScript exporter rejects); "
                "exactness proven by the parity check vs the original torch "
                "model",
    }
def quantize_tiny(calib_windows):
    """quant_pre_process + static QDQ conv-only Percentile(99.99) int8 --
    the winning recipe from static_ptq_bench.py."""
    from onnxruntime.quantization import (CalibrationMethod, QuantFormat,
                                          QuantType, quantize_static)
    from onnxruntime.quantization.shape_inference import quant_pre_process
    quant_pre_process(TINY_FP32_ONNX, TINY_PREPROC_ONNX)
    t0 = time.time()
    quantize_static(
        TINY_PREPROC_ONNX, TINY_INT8_ONNX, make_reader(calib_windows),
        quant_format=QuantFormat.QDQ,
        op_types_to_quantize=["Conv"],
        per_channel=True,
        activation_type=QuantType.QInt8,
        weight_type=QuantType.QInt8,
        calibrate_method=CalibrationMethod.Percentile,
        extra_options={"CalibPercentile": 99.99},
    )
    # Stop the clock right after quantize_static so the reported time does
    # not include the file-size lookups below (matches quantize_variant).
    secs = time.time() - t0
    return {
        "file": os.path.basename(TINY_INT8_ONNX),
        "size_bytes": os.path.getsize(TINY_INT8_ONNX),
        "size_mb": os.path.getsize(TINY_INT8_ONNX) / 1e6,
        "calibration": {"method": "percentile", "percentile": 99.99,
                        "windows": int(len(calib_windows)),
                        "scope": "conv-only TRAIN-split corruption-free",
                        "seconds": secs},
        "per_channel": True,
        "activation_type": "QInt8",
        "weight_type": "QInt8",
    }
def main():
    import onnxruntime
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", default=os.path.join(
        os.path.expanduser("~"), ".cache", "kagglehub", "datasets", "kaka2434",
        "wiflow-dataset", "versions", "1", "preprocessed_csi_data"))
    parser.add_argument("--subset", type=int, default=10000)
    parser.add_argument("--calib", type=int, default=512,
                        help="calibration windows; must be a multiple of the "
                             "64-window calibration batch (ORT histogram "
                             "collector rejects ragged batches)")
    parser.add_argument("--skip-accuracy", action="store_true")
    parser.add_argument("--out", default=os.path.join(RESULTS, "edge_optimization.json"))
    args = parser.parse_args()
    if args.calib % 64 != 0:
        parser.error(
            f"--calib must be a multiple of 64 (got {args.calib}): ORT 1.26's "
            f"histogram calibration collector np.asarray()'s the per-batch "
            f"maxima and crashes on a ragged final batch (calibration batch "
            f"size is 64)")
    model = load_tiny_model()
    info = describe(model)
    print(f"tiny model: {info['params']:,} params, tcn_groups={info['tcn_groups_per_block']}, "
          f"strides={info['conv_strides']}, final_width={info['final_width']}")
    assert info["params"] == 56290, info["params"]
    results = {
        "env": {
            "torch": torch.__version__,
            "onnxruntime": onnxruntime.__version__,
            "platform": platform.platform(),
            "num_threads": torch.get_num_threads(),
            "checkpoint": os.path.relpath(TINY_CKPT, HERE),
            "checkpoint_size_bytes": os.path.getsize(TINY_CKPT),
            "params": info["params"],
            "variant_config": TINY,
        },
    }
    # ---- export + parity ----------------------------------------------------
    print("\n=== ONNX export (dynamic batch, opset 17, torchscript) ===")
    results["export"] = export_onnx(model)
    print(f"  {results['export']['size_mb']:.3f} MB, batches {results['export']['verified_batches']} OK")
    fixture = np.load(os.path.join(RESULTS, "parity_fixture.npz"))
    fx = fixture["input"]  # (2, 540, 20), seed 42 -- same input layout as full model
    sess_fp32 = ort_session(TINY_FP32_ONNX)
    y_ort = sess_fp32.run(None, {sess_fp32.get_inputs()[0].name: fx})[0]
    with torch.no_grad():
        y_torch = model(torch.from_numpy(fx)).numpy()
    results["parity"] = {
        "fixture": "results/parity_fixture.npz input (batch 2, seed 42); "
                   "reference output recomputed with the tiny torch model",
        "max_abs_diff_vs_torch": float(np.abs(y_ort - y_torch).max()),
        "pass_lt_1e-4": bool(np.abs(y_ort - y_torch).max() < 1e-4),
    }
    print("parity:", json.dumps(results["parity"], indent=2))
    assert results["parity"]["pass_lt_1e-4"], "torch-vs-ORT parity FAILED"
    # ---- static PTQ int8 ------------------------------------------------------
    print(f"\n=== static QDQ int8 (Percentile conv-only, {args.calib} calib windows) ===")
    calib = build_calibration_windows(args.data_dir, args.calib)
    results["int8_static_percentile_conv"] = quantize_tiny(calib)
    print(f"  {results['int8_static_percentile_conv']['size_mb']:.3f} MB")
    sess_int8 = ort_session(TINY_INT8_ONNX)
    yq = sess_int8.run(None, {sess_int8.get_inputs()[0].name: fx})[0]
    results["int8_static_percentile_conv"]["max_abs_diff_vs_fp32_fixture"] = float(
        np.abs(yq - y_torch).max())
    # ---- latency (3 interleaved reps, full-model sessions as references) -----
    print("\n=== latency (3 interleaved reps) ===")
    lat_sessions = {
        "tiny_onnx_fp32": sess_fp32,
        "tiny_onnx_int8_static_percentile_conv": sess_int8,
        "full_onnx_fp32_reference": ort_session(FULL_FP32_ONNX),
        "full_onnx_int8_static_percentile_conv_reference": ort_session(FULL_INT8_ONNX),
    }
    results["latency"] = {
        "note": "3 interleaved repetitions per variant, median ms/window; "
                "full-model sessions are same-session references",
        **interleaved_latency(lat_sessions),
    }
    # ---- accuracy on the standard 10k corruption-free test subset ------------
    if not args.skip_accuracy:
        loader, n_clean = build_test_subset(args.data_dir, args.subset)
        results["accuracy_subset"] = {
            "description": "seed-42 file-level 70/15/15 test split, corrupted "
                           "windows excluded, seed-42 random subset (same as "
                           "quantize_bench/eval_ort_accuracy/static_ptq_bench)",
            "subset_size": min(args.subset, n_clean) if args.subset else n_clean,
        }
        results["accuracy"] = {}
        for name, sess in (("tiny_onnx_fp32", sess_fp32),
                           ("tiny_onnx_int8_static_percentile_conv", sess_int8)):
            print(f"\n=== accuracy: {name} ===")
            results["accuracy"][name] = evaluate_ort(sess, loader, name)
            print(json.dumps(results["accuracy"][name], indent=2))
    # ---- merge into edge_optimization.json -----------------------------------
    merged = {}
    if os.path.exists(args.out):
        with open(args.out) as f:
            merged = json.load(f)
    merged["tiny_variant"] = results
    with open(args.out, "w") as f:
        json.dump(merged, f, indent=2)
    print(f"\nwrote {args.out}")


if __name__ == "__main__":
    main()
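The bin rule that `adaptive_pool_matrix` relies on can be checked without PyTorch. The following is a minimal pure-Python sketch, not part of the benchmark: the helper names `adaptive_pool_bins` and `adaptive_avg_pool_1d` are hypothetical, and it assumes only the rule stated in the docstring (bin i covers rows floor(i*H/K) up to, but not including, ceil((i+1)*H/K)). For H=16 and K=15 every bin spans two adjacent rows, which is why the pool is a fixed linear map.

```python
def adaptive_pool_bins(h_in, h_out):
    """Half-open (start, end) row range for each output bin (hypothetical helper)."""
    bins = []
    for i in range(h_out):
        start = (i * h_in) // h_out                # floor(i * h_in / h_out)
        end = -((-(i + 1) * h_in) // h_out)        # ceil((i + 1) * h_in / h_out)
        bins.append((start, end))
    return bins


def adaptive_avg_pool_1d(values, h_out):
    """Average each bin of `values`; a list-based stand-in for the matmul."""
    return [sum(values[s:e]) / (e - s)
            for s, e in adaptive_pool_bins(len(values), h_out)]


# The tiny model's case: 16 input rows pooled to 15 keypoints.
bins = adaptive_pool_bins(16, 15)
pooled = adaptive_avg_pool_1d([float(v) for v in range(16)], 15)
print(bins[0], bins[-1])
print(pooled[:3])
```

When the output size divides the input size (for example 20 to 1, or 15 to 15), the bins do not overlap, which matches the case the docstring describes as exportable directly.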
-7
@@ -14,13 +14,6 @@ COPY v2/crates/ ./crates/
# Copy vendored RuVector crates
COPY vendor/ruvector/ /build/vendor/ruvector/
# Copy vendored RuField submodule — the `wifi-densepose-rufield` bridge crate
# (ADR-262) path-deps `../../../vendor/rufield/crates/*`, which from the Docker
# build layout (v2/ collapsed into /build) resolves to /vendor/rufield. Copy the
# whole tree so the rufield workspace Cargo.toml (workspace-dep inheritance) and
# the four bridged crates (rufield-core/-provenance/-privacy/-fusion) all resolve.
COPY vendor/rufield/ /vendor/rufield/
# Build release binaries:
# - sensing-server with `mqtt` feature so the HA-DISCO MQTT publisher
# (ADR-115) is wired in (auto-discovery topics flow to Home Assistant)
+2 -5
@@ -24,13 +24,10 @@ services:
environment:
- RUST_LOG=info
# CSI_SOURCE controls the data source for the sensing server.
# Options: auto (default) — probe for ESP32 UDP then host WiFi; **fail
# hard with exit 78 if neither is detected**.
# Synthetic data is no longer a silent fallback
# (issue #937 fix) — operators must opt in.
# Options: auto (default) — probe for ESP32 UDP then fall back to simulation
# esp32 — receive real CSI frames from an ESP32 on UDP port 5005
# wifi — use host Wi-Fi RSSI/scan data (Windows netsh)
# simulated — explicitly generate synthetic CSI for demo mode
# simulated — generate synthetic CSI data (no hardware required)
- CSI_SOURCE=${CSI_SOURCE:-auto}
# MODELS_DIR controls where the server scans for .rvf model files.
# Mount a host directory and set this to make models visible:
+2 -57
@@ -11,65 +11,10 @@
# docker run ruvnet/wifi-densepose:latest --model /app/models/my.rvf
#
# Environment variables:
# CSI_SOURCE — data source. Valid values:
# auto — try ESP32 then Windows WiFi, **fail-loud if no
# real hardware is detected** (issue #937 fix:
# the server no longer silently falls back to
# synthetic data — that's now opt-in only).
# esp32 — listen for UDP CSI on the configured port.
# wifi — Windows-native WiFi capture.
# simulated — explicit demo mode with synthetic CSI.
# Default is `auto`. Set CSI_SOURCE=simulated when you want
# fake data tagged as such; never set it implicitly.
# CSI_SOURCE — data source: auto (default), esp32, wifi, simulated
# MODELS_DIR — directory to scan for .rvf model files (default: data/models)
set -e
# ── Issue #864: fail-closed on default posture ───────────────────────────────
# The pre-fix default was: empty RUVIEW_API_TOKEN (auth off) + --bind-addr
# 0.0.0.0 + docker-compose publishing :3000/:3001/:5005 → an unauthenticated
# attacker on any reachable network segment could read /api/v1/sensing/latest
# and the /ws/sensing live stream. That posture is unsafe on guest WiFi,
# untrusted LANs, accidentally-port-forwarded hosts, or any reverse-proxied
# deployment. Refuse to start with this combination.
#
# Escape hatches (operator must opt in explicitly):
# * Set RUVIEW_API_TOKEN to a strong secret → auth enabled on /api/v1/*.
# * Set RUVIEW_ALLOW_UNAUTHENTICATED=1 → preserves the pre-fix behaviour;
# only safe on an isolated trust boundary.
# * Set RUVIEW_BIND_ADDR to a loopback / private interface → unauth is fine
# when the socket isn't reachable. The auto-bind nudges toward 127.0.0.1.
#
# This check runs only for the default sensing-server path (no args + flag-only
# args). The `cog-ha-matter` / `homecore` routes below are excluded because
# they own their own auth lifecycle.
case "${1:-}" in
    cog-ha-matter|ha-matter|homecore|homecore-server) ;;
    *)
        if [ -z "${RUVIEW_API_TOKEN:-}" ] && [ "${RUVIEW_ALLOW_UNAUTHENTICATED:-}" != "1" ]; then
            # If the operator hasn't overridden the bind, refuse outright on
            # the default 0.0.0.0. If they've nailed it to loopback (or a
            # specific private address they trust), let it run.
            __bind_default="${RUVIEW_BIND_ADDR:-0.0.0.0}"
            case "$__bind_default" in
                127.*|localhost|::1)
                    : ;; # loopback bind is safe even without a token
                *)
                    echo "[entrypoint] ERROR: refusing to start sensing-server with default" >&2
                    echo "[entrypoint] posture: RUVIEW_API_TOKEN is unset AND bind is" >&2
                    echo "[entrypoint] ${__bind_default}. /ws/sensing streams live sensing" >&2
                    echo "[entrypoint] frames; that data would be readable by anyone who" >&2
                    echo "[entrypoint] can reach this host. Pick one:" >&2
                    echo "[entrypoint] docker run -e RUVIEW_API_TOKEN=\$(openssl rand -hex 32) ..." >&2
                    echo "[entrypoint] docker run -e RUVIEW_BIND_ADDR=127.0.0.1 ..." >&2
                    echo "[entrypoint] docker run -e RUVIEW_ALLOW_UNAUTHENTICATED=1 ... # only on trusted network" >&2
                    echo "[entrypoint] See https://github.com/ruvnet/RuView/issues/864" >&2
                    exit 64
                    ;;
            esac
        fi
        ;;
esac
# Route to cog-ha-matter (ADR-116) when invoked as:
# docker run <image> cog-ha-matter [--flags]
# or via the short alias `ha-matter`. Strips the keyword and execs the
@@ -103,7 +48,7 @@ if [ "${1#-}" != "$1" ] || [ -z "$1" ]; then
--ui-path /app/ui \
--http-port 3000 \
--ws-port 3001 \
--bind-addr "${RUVIEW_BIND_ADDR:-0.0.0.0}" \
--bind-addr 0.0.0.0 \
"$@"
fi
+1 -1
@@ -57,7 +57,7 @@ This witness separates what was **empirically observed on real silicon today** f
| # | Claim | Why it's not verified |
|---|---|---|
| **B1** | "Wi-Fi 6 HE-LTF: 242 subcarriers per HE20 frame" | The only AP in range (`ruv.net`) is 11n-only. Every captured frame is 128 bytes = 64 subcarriers (HT-LTF, `ppdu_type=0`). No HE-SU/HE-MU/HE-TB observed. Even if an 11ax AP were available, **whether ESP-IDF v5.4's CSI callback exposes HE-LTF subcarriers via `wifi_csi_info_t.buf` is an open question** — the public API was designed for HT-LTF, and the driver may quietly downconvert. **Validate by capturing CSI against an 11ax AP and comparing `info->len` between HT and HE frames.**<br><br>**RESOLVED WITH MEASUREMENT (2026-06-11, external — issue #1005, production deployment by @stuinfla):** the open question is answered in both directions. **IDF v5.4's driver blob downconverts** (148 B / 64-subcarrier HT frames, PPDU byte 0x00, on a confirmed-HE link); **IDF v5.5.2 delivers true HE-LTF** — 532 B frames = 256 bins (242 active HE20 tones), PPDU byte 0x01 (HE-SU), ~90% of frames, same board/AP/link. Setup: XIAO ESP32-C6 → hostapd on Intel AX210, 2.4 GHz ch 6, `ieee80211ax=1`. No firmware change required (`acquire_csi_su=1` was already set); the gate was purely the IDF driver version. Three C6 nodes ran this mode simultaneously with ADR-110 ESP-NOW sync. Requires the issue-#1005 version-guard fix in `c6_sync_espnow.c` to build on v5.5.x.<br><br>**REPLICATED IN-HOUSE (2026-06-11):** same source + fix, fresh IDF v5.5.2 toolchain, original COM12 board (`20:6e:f1:17:00:84`), AP `ruv.net` (11ax 2.4 GHz): **84% of 1,525 captured frames at 532 B / PPDU 0x01 (HE-SU)**, HT minority 148 B / 0x00. Evidence grade: MEASURED (two independent rigs). |
| **B1** | "Wi-Fi 6 HE-LTF: 242 subcarriers per HE20 frame" | The only AP in range (`ruv.net`) is 11n-only. Every captured frame is 128 bytes = 64 subcarriers (HT-LTF, `ppdu_type=0`). No HE-SU/HE-MU/HE-TB observed. Even if an 11ax AP were available, **whether ESP-IDF v5.4's CSI callback exposes HE-LTF subcarriers via `wifi_csi_info_t.buf` is an open question** — the public API was designed for HT-LTF, and the driver may quietly downconvert. **Validate by capturing CSI against an 11ax AP and comparing `info->len` between HT and HE frames.** |
| **B2** | "TWT-bounded deterministic CSI cadence (10 ms wake)" | No 11ax AP in range. The TWT setup *call* was exercised live and the graceful fallback path is now correct (A9), but the agreement itself was never accepted. **Validate by associating with an 11ax AP that has TWT Responder=1, then capturing the timestamped CSI cadence vs the wall clock.** |
| **B3** | "±100 µs cross-node alignment over 802.15.4" | 3 boards initialized their radios with correct EUIs (A4/A5), but **none stepped down from candidate-leader to follower** during repeated 35-second multi-board captures. <br><br>**Coex hypothesis REJECTED**: rebuilt + reflashed all 3 boards with `CONFIG_C6_TIMESYNC_CHANNEL=26` (2480 MHz, non-overlapping with WiFi ch 5 at 2432 MHz). Result identical: 3× candidate, 0× "stepping down". So 2.4 GHz radio coex was NOT the cause. <br><br>**Current leading hypothesis**: OpenThread (CONFIG_OPENTHREAD_ENABLED=y) owns the 802.15.4 radio when its stack is initialized — our weak-symbol overrides of `esp_ieee802154_receive_done` / `_transmit_done` may never be called because OpenThread registers strong handlers. Validation in progress: rebuilding with `CONFIG_OPENTHREAD_ENABLED=n` (raw 802.15.4 only, our beacon protocol is private — no need for the Thread stack). If leader election fires under raw-15.4-only, hypothesis confirmed. <br><br>If raw-only also fails, next move is to dump the actual PHY frame bytes via the IEEE 802.15.4 sniffer mode on a 4th board and diagnose at the frame level. |
| **B4** | "~5 µA hibernation for battery seed nodes" | No INA / Joulescope current measurement available on this bench. The shipped code uses `esp_deep_sleep_enable_gpio_wakeup` (ext1 path, ESP-IDF default ~10 µA), not a true LP-core polling program. The 5 µA number is the C6 datasheet figure for ULP-level hibernation, not a measured value. **Validate by hooking an INA219/INA226 between the dev board's 3V3 rail and the regulator output, then averaging current over a 60-second cycle with the LP-core armed.** |
@@ -1081,23 +1081,6 @@ The `wifi-densepose-vitals` crate (ESP32 CSI-grade vital signs) has not yet been
- SONA-based environment adaptation
- VitalSignStore with tiered temporal compression
## Implementation Notes
### 2026-06 — ESP32 edge vitals: person-count over-count + presence flicker (#998, #996)
Two robustness bugs were fixed in the on-device edge path (`firmware/esp32-csi-node/main/edge_processing.c`, the ADR-039 packet `0xC5110002`). These touch the *boolean/count emission logic*, not the underlying CSI signal-processing math, and do **not** constitute a validated-accuracy claim — true occupancy-count and presence accuracy vs labelled ground truth remain hardware/data-gated (COM9 ESP32-S3 + labelled capture).
- **#998 `n_persons` over-count (reported 4 for one person).** `update_multi_person_vitals()` divided the top-K subcarriers into `top_k_count/2` groups and marked *every* group `active`, so one body's multipath always read the full `EDGE_MAX_PERSONS`. Added an energy gate (`EDGE_PERSON_MIN_ENERGY_RATIO`), spatial dedup (`EDGE_PERSON_MIN_SC_SEP`), and a persistence debounce (`EDGE_PERSON_PERSIST_FRAMES`) via two pure functions `count_distinct_persons()` / `person_count_debounce()`.
- **#996 presence flag flicker at ~50 cm.** Single-threshold compare on a noisy `presence_score` chattered at the boundary. Replaced with a Schmitt trigger + clear-debounce (`presence_flag_update()`, constants `EDGE_PRESENCE_HYST_RATIO` / `EDGE_PRESENCE_CLEAR_FRAMES`); `presence_score` is unchanged and still emitted for consumer-side thresholding.
Both are pinned by host-buildable C99 tests in `firmware/esp32-csi-node/test/test_vitals_count_presence.c` (`make run_vitals`). The exact thresholds are documented constants pending on-device calibration against ground truth.
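The Schmitt-trigger-plus-clear-debounce idea behind the #996 fix can be sketched as follows. This is a minimal Rust illustration, not the firmware's C implementation of `presence_flag_update()`: the `PresenceGate` type and the two constant values are hypothetical stand-ins for `EDGE_PRESENCE_HYST_RATIO` / `EDGE_PRESENCE_CLEAR_FRAMES`, whose real values are still pending calibration.

```rust
/// Hypothetical values for illustration only; the firmware constants
/// EDGE_PRESENCE_HYST_RATIO / EDGE_PRESENCE_CLEAR_FRAMES may differ.
const HYST_RATIO: f32 = 0.7; // clear threshold = set threshold * ratio
const CLEAR_FRAMES: u32 = 3; // consecutive low frames needed to clear

/// Debounced presence flag with hysteresis (hypothetical type).
#[derive(Debug, Default)]
pub struct PresenceGate {
    present: bool,
    low_streak: u32,
}

impl PresenceGate {
    /// Feed one frame's presence score; returns the debounced flag.
    pub fn update(&mut self, score: f32, set_threshold: f32) -> bool {
        let clear_threshold = set_threshold * HYST_RATIO;
        if !self.present {
            // Rising edge: only the upper threshold can set the flag.
            if score >= set_threshold {
                self.present = true;
                self.low_streak = 0;
            }
        } else if score < clear_threshold {
            // Falling edge: require a run of low frames before clearing.
            self.low_streak += 1;
            if self.low_streak >= CLEAR_FRAMES {
                self.present = false;
                self.low_streak = 0;
            }
        } else {
            // Score is inside or above the hysteresis band: keep the flag.
            self.low_streak = 0;
        }
        self.present
    }
}
```

A score that hovers between the two thresholds leaves the flag unchanged, which is what removes the boundary chatter; the raw score can still be emitted separately for consumer-side thresholding.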
### 2026-06 — Rust `wifi-densepose-vitals`: IIR filter NaN/inf self-heal (ADR-158 §A1)
A correctness/safety review of the Rust extraction crate found a real bug parallel to the firmware robustness class above. The 2nd-order resonator `bandpass_filter` in both `breathing.rs` and `heartrate.rs` latches each output `y[n]` into its filter state (`y1`/`y2`). A single non-finite amplitude residual from a corrupt CSI frame produced a NaN `output` that was written into the state; the existing `extract()` `is_finite()` guard dropped that one sample from the history buffer **but never sanitized the poisoned filter state**, so every later output stayed NaN, was rejected too, and the sliding-window history never refilled — breathing **and** heart-rate extraction went silently dead (returning `None` forever) until `reset()`. On the alert path this is a safety-relevant denial of service (one bad frame stops vitals monitoring with no error surfaced).
Fix: when `bandpass_filter` computes a non-finite `output`, it resets the IIR state to default and returns `0.0`, so the resonator self-heals on the next clean frame (the `0.0` is still dropped by the caller's finite-check, so no spurious sample enters history). Same shape as the calibration NaN bug (ADR-154 §3) — the prior hardening guarded the *history boundary* but not the *filter-state boundary*. Pinned by `breathing::tests::nan_frame_does_not_permanently_poison_filter`, `breathing::tests::inf_mid_stream_does_not_freeze_history`, and `heartrate::tests::nan_frame_does_not_permanently_poison_filter` (all FAIL pre-fix, verified by reverting). The review also de-magicked the HR physiological plausibility band into named `HR_PLAUSIBLE_MIN_BPM`/`HR_PLAUSIBLE_MAX_BPM` consts (value-identical 40/180 BPM) and added a fabricated-vital negative (`pure_noise_is_never_reported_valid` — broadband noise never yields a clinically `Valid` HR; the extractor honestly returns low-confidence `Unreliable`). Clean dimensions confirmed with evidence: flat/silent input → `None`; pure noise → low-confidence `Unreliable`, never `Valid`; harmonic-rich breathing with no cardiac component → low-confidence, not a confident false HR; out-of-band BPM rejected by the plausibility clamp.
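The self-heal behaviour described above can be sketched with a generic second-order IIR section. This is an assumption-laden illustration rather than the crate's actual `bandpass_filter`: the `Resonator` type, its field names, and the coefficient layout are hypothetical; only the "reset state and return `0.0` on a non-finite output" rule comes from the text.

```rust
/// Delay-line state of a second-order IIR section (hypothetical layout).
#[derive(Debug, Default, Clone, Copy)]
struct IirState {
    x1: f64,
    x2: f64,
    y1: f64,
    y2: f64,
}

/// Generic biquad: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
pub struct Resonator {
    b: [f64; 3],
    a: [f64; 2],
    state: IirState,
}

impl Resonator {
    pub fn new(b: [f64; 3], a: [f64; 2]) -> Self {
        Self { b, a, state: IirState::default() }
    }

    /// Process one sample. A non-finite output resets the state instead of
    /// being latched into y1/y2, so one corrupt frame cannot poison the filter.
    pub fn step(&mut self, x: f64) -> f64 {
        let s = self.state;
        let y = self.b[0] * x + self.b[1] * s.x1 + self.b[2] * s.x2
            - self.a[0] * s.y1
            - self.a[1] * s.y2;
        if !y.is_finite() {
            self.state = IirState::default();
            return 0.0;
        }
        self.state = IirState { x1: x, x2: s.x1, y1: y, y2: s.y1 };
        y
    }
}
```

Without the reset branch, a single NaN input would be stored in `y1` and every later output would stay NaN, which is the failure mode the regression tests pin.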
## References
- Ramsauer et al. (2020). "Hopfield Networks is All You Need." ICLR 2021. (ModernHopfield formulation)
@@ -1,4 +1,4 @@
# ADR-166: Quality Engineering Response — Security Hardening & Code Quality
# ADR-050: Quality Engineering Response — Security Hardening & Code Quality
| Field | Value |
|-------|-------|
@@ -1,8 +1,4 @@
# ADR-167 Appendix: DDD Bounded Contexts — Tauri Desktop Frontend
> Appendix to [ADR-052](ADR-052-tauri-desktop-frontend.md). Renumbered from ADR-052
> to ADR-167 to resolve the ADR-052 duplicate-number collision (per ADR-164 Gap Register
> G1); the parent decision remains ADR-052.
# ADR-052 Appendix: DDD Bounded Contexts — Tauri Desktop Frontend
This document maps out the domain model for the RuView Tauri desktop application
described in ADR-052. It defines bounded contexts, their aggregates, entities,
@@ -162,7 +158,7 @@ Represents an over-the-air firmware update to a running node.
| `target_node` | `MacAddress` | Target node MAC |
| `target_ip` | `IpAddr` | Target node IP |
| `firmware` | `FirmwareBinary` | The binary being pushed |
| `psk` | `Option<SecureString>` | PSK for authentication (ADR-166) |
| `psk` | `Option<SecureString>` | PSK for authentication (ADR-050) |
| `phase` | `OtaPhase` | Uploading / Rebooting / Verifying / Done / Failed |
| `progress` | `Progress` | Upload progress |
+3 -3
@@ -5,7 +5,7 @@
| Status | Proposed |
| Date | 2026-03-06 |
| Deciders | ruv |
| Depends on | ADR-012 (ESP32 CSI Mesh), ADR-039 (Edge Intelligence), ADR-040 (WASM Programmable Sensing), ADR-044 (Provisioning Enhancements), ADR-166 (Security Hardening, renumbered from ADR-050), ADR-051 (Server Decomposition) |
| Depends on | ADR-012 (ESP32 CSI Mesh), ADR-039 (Edge Intelligence), ADR-040 (WASM Programmable Sensing), ADR-044 (Provisioning Enhancements), ADR-050 (Security Hardening), ADR-051 (Server Decomposition) |
| Issue | [#177](https://github.com/ruvnet/RuView/issues/177) |
## Context
@@ -211,7 +211,7 @@ pub struct FlashProgress {
// commands/ota.rs
/// Push firmware to a node via HTTP OTA (port 8032).
/// Includes PSK authentication per ADR-166.
/// Includes PSK authentication per ADR-050.
#[tauri::command]
async fn ota_update(
node_ip: String,
@@ -801,7 +801,7 @@ Total estimated effort: ~11 weeks for a single developer.
- ADR-039: ESP32 Edge Intelligence
- ADR-040: WASM Programmable Sensing
- ADR-044: Provisioning Tool Enhancements
- ADR-166: Quality Engineering — Security Hardening (renumbered from ADR-050)
- ADR-050: Quality Engineering — Security Hardening
- ADR-051: Sensing Server Decomposition
- `firmware/esp32-csi-node/` — ESP32 firmware source
- `firmware/esp32-csi-node/provision.py` — Current provisioning script
+11 -24
@@ -1,6 +1,6 @@
# ADR-080: QE Analysis Remediation Plan
- **Status:** Proposed — P0 security findings #1–#3 **RESOLVED** on the shipped Rust sensing-server boundary (2026-06-13; closes ADR-164 G11)
- **Status:** Proposed
- **Date:** 2026-04-06
- **Source:** [QE Analysis Gist (2026-04-05)](https://gist.github.com/proffesor-for-testing/a6b84d7a4e26b7bbef0cf12f932925b7)
- **Full Reports:** [proffesor-for-testing/RuView `qe-reports` branch](https://github.com/proffesor-for-testing/RuView/tree/qe-reports/docs/qe-reports)
@@ -13,38 +13,25 @@ An 8-agent QE swarm analyzed ~305K lines across Rust, Python, C firmware, and Ty
Address the 15 prioritized issues from the QE analysis in three waves: P0 (immediate), P1 (this sprint), P2 (this quarter).
## Security P0 closure note (2026-06-13) — Rust sensing-server boundary
The three P0 security findings below were logged against the **Python v1** API
(`archive/v1/src/…`). ADR-164 G11 re-scoped them to the *shipped* boundary:
`wifi-densepose-sensing-server` (Rust). They were verified against the current
Rust crate and closed on branch `fix/adr-080-sensing-server-security`. Each fix
(or already-fixed finding) is pinned by a test that fails on the old behavior.
**The Python v1 paths remain as-is** — v1 is archived and not the shipped
surface; this closure governs the live Rust server only.
## P0 — Fix Immediately
### 1. Rate Limiter Bypass / XFF spoofing (Security HIGH) — **RESOLVED (verified absent on Rust boundary)**
### 1. Rate Limiter Bypass (Security HIGH)
- **Original location (v1):** `archive/v1/src/middleware/rate_limit.py:200-206`
- **Location:** `archive/v1/src/middleware/rate_limit.py:200-206`
- **Problem:** Trusts `X-Forwarded-For` without validation. Any client bypasses rate limits via header spoofing.
- **Rust verification (2026-06-13):** The Rust sensing-server has **no XFF-trusting control to bypass** — there is no IP-based rate-limiter and no IP-allowlist, and neither security middleware reads a forwarded header. `bearer_auth.rs` authenticates on the token alone (`require_bearer` inspects only the `AUTHORIZATION` header); `host_validation.rs` decides on the `Host` header only. A repo-wide grep for `x-forwarded-for|forwarded|peer_addr|client_ip|real-ip` over `wifi-densepose-sensing-server` returns nothing. The only "rate limiter" is the MQTT *sample-rate* gate (`mqtt/state.rs`), a per-entity publish throttle with no IP/header input.
- **Resolution:** No code change needed (no vulnerable surface). Regression tests pin the immunity: `bearer_auth::tests::xff_header_never_affects_auth_decision` (spoofed XFF never flips a 401↔200 decision) and `host_validation::tests::forwarded_headers_never_bypass_host_allowlist` (spoofed `X-Forwarded-Host: localhost` never lets a foreign `Host: evil.com` past the allowlist). Residual: if an IP-based control is ever added, it must derive the peer from the socket (`ConnectInfo<SocketAddr>`) and only honor XFF from an explicit `--trusted-proxy` CIDR — captured as guidance in the test docstrings.
- **Fix:** Validate forwarded headers against trusted proxy list, or use connection IP directly.
### 2. Exception Details Leaked in Responses (Security HIGH, CWE-209) — **RESOLVED**
### 2. Exception Details Leaked in Responses (Security HIGH)
- **Original location (v1):** `archive/v1/src/api/routers/pose.py:140`, `stream.py:297`, +5 endpoints
- **Problem:** Internal error/stack-trace detail serialized into client responses.
- **Rust finding (2026-06-13):** Six handlers in `wifi-densepose-sensing-server/src/main.rs` serialized the internal error `Display` into the JSON body: `edge_registry_endpoint` returned a panicked `spawn_blocking` `JoinError` (`"task … panicked"`) in a `500` and the raw upstream error in a `503`; `delete_model`/`delete_recording`/`start_recording` returned `std::io::Error` strings (OS detail / path); `calibration_start`/`calibration_stop` returned the `FieldModel` error chain.
- **Fix:** New `src/error_response.rs` module — `internal_error` / `internal_error_json` / `upstream_unavailable` log the full detail **server-side only** (tagged with a correlation id) and return a generic body (`{"error":"internal_error","correlation_id":…}`) with no `panicked`, no file paths, no Debug chain. All six call-sites rewired. Pinned by `error_response::tests::internal_error_body_does_not_leak_detail` (leak-substring guard, verified to fail on the reverted old body) + 4 sibling tests.
- **Location:** `archive/v1/src/api/routers/pose.py:140`, `stream.py:297`, +5 endpoints
- **Problem:** Stack traces visible regardless of environment.
- **Fix:** Wrap with generic error responses in production; log details server-side only.
### 3. WebSocket JWT in URL (Security HIGH, CWE-598) — **RESOLVED (verified absent on Rust boundary)**
### 3. WebSocket JWT in URL (Security HIGH, CWE-598)
- **Original location (v1):** `archive/v1/src/api/routers/stream.py:74`, `archive/v1/src/middleware/auth.py:243`
- **Location:** `archive/v1/src/api/routers/stream.py:74`, `archive/v1/src/middleware/auth.py:243`
- **Problem:** Tokens in query strings visible in logs/proxies/browser history.
- **Rust verification (2026-06-13):** The Rust sensing-server never reads a token from the URL. `require_bearer` (`bearer_auth.rs`) inspects only the `Authorization` header; the WebSocket handlers (`ws_sensing_handler`/`ws_introspection_handler`/`ws_pose_handler`) take a bare `WebSocketUpgrade` with no `Query` extractor; the single `Query` in the crate (`EdgeRegistryParams`) is a non-secret `refresh` flag.
- **Resolution:** No code change needed (no query-token path exists). Regression test `bearer_auth::tests::query_string_token_is_never_accepted` proves `?token=`/`?access_token=` in the URL never authenticates (stays `401`) while the same token in the header succeeds (`200`) — verified to fail if a query-token path is re-introduced.
- **Fix:** Use WebSocket subprotocol or first-message auth pattern.
### 4. Rust Tests Not in CI
+5 -66
@@ -259,75 +259,14 @@ Validation runs against:
- **ADR-083** (Proposed) — Per-cluster Pi compute hop. Defines the
device class that hosts the sketch bank.
## Pass 2 — randomized rotation + multi-bit (ADR-156 §8, landed 2026-06)
The "Open question" below ("does `BinaryQuantized` need a randomized
rotation pre-pass?") is now **answered with measured numbers** via
ADR-156 §10. Summary:
- **Pass 2 (randomized rotation) is implemented** —
`crates/wifi-densepose-ruvector/src/rotation.rs`: a deterministic
`R = H·D` (Fast Hadamard Transform + seeded ±1 sign flips), `O(d log d)`
/ `O(d)`, norm-preserving, reproducible from a stored `u64` seed. Opt-in
via `Sketch::from_embedding_rotated` / `SketchBank::with_rotation`;
Pass-1 API and wire format unchanged.
- **Measured top-K coverage** (anisotropic planted-cluster fixture,
cosine ground truth, dim=128 N=2048 K=8): rotation lifts coverage
**36.13% → 46.39%** at the strict `candidate_k = K` bar, and Pass-2
reaches the **≥90% acceptance bar at candidate_k = 24 (~3× over-fetch)**.
Multi-bit (≤4-bit) reaches 74% at the strict bar. **Honest verdict:
neither rotation nor ≤4-bit multi-bit clears the strict-K 90% bar on
this distribution; the bar is met via the over-fetch "candidate set"
pattern this ADR specifies** (Decision §"the canonical pattern" — sketch
picks the candidate set, full precision refines). Full numbers and
reproduce commands in ADR-156 §10.
- **Pre-existing `SketchBank::topk` bug fixed** — the `n > k` heap path
returned the k *farthest* sketches (min-heap mistaken for max-heap);
only the `n ≤ k` fast path had test coverage. Fixed + regression-pinned
(`topk_heap_path_returns_nearest`,
`tight_clusters_give_high_coverage_with_overfetch`). This makes every
prior top-K acceptance number in this ADR depend on the fixed path; the
≥90% coverage criterion is only meaningful post-fix.
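As a rough illustration of the `R = H·D` rotation summarised above, the sketch below applies seeded ±1 sign flips followed by an in-place Fast Hadamard Transform with `1/sqrt(d)` scaling. It is not the code in `rotation.rs`: the function names are hypothetical, and SplitMix64 is used only as a convenient stand-in for whatever seeded generator the crate actually uses.

```rust
/// SplitMix64 step. A stand-in PRNG chosen for this sketch; the real
/// rotation may derive its sign flips from the seed differently.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// In-place x <- (1/sqrt(d)) * H * D * x, where d = x.len() is a power of two.
pub fn rotate(x: &mut [f32], seed: u64) {
    let d = x.len();
    assert!(d.is_power_of_two(), "FHT needs a power-of-two dimension");

    // D: seeded +/-1 sign flips, O(d).
    let mut s = seed;
    for v in x.iter_mut() {
        if splitmix64(&mut s) & 1 == 1 {
            *v = -*v;
        }
    }

    // H: log2(d) butterfly passes, O(d log d) in total.
    let mut h = 1;
    while h < d {
        for block in (0..d).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (x[i], x[i + h]);
                x[i] = a + b;
                x[i + h] = a - b;
            }
        }
        h *= 2;
    }

    // Scale so the transform is orthonormal, i.e. norm-preserving.
    let scale = 1.0 / (d as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}
```

Because the sign pattern depends only on the seed, storing a single `u64` is enough to reproduce the same rotation at query time.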
## Pass 2b — RaBitQ unbiased distance estimator (ADR-156 §11, landed 2026-06)
The **real** RaBitQ contribution (Gao & Long, SIGMOD 2024) — an
**unbiased estimator of the inner product / distance** from the 1-bit
code + per-vector side info, not just sign bits — is now implemented and
**MEASURED against this ADR's ≥90% strict-K bar**:
- **Implemented** — `crates/wifi-densepose-ruvector/src/estimator.rs`:
`EstimatorSketch` (Pass-2 sign code + 8 B/vec side info:
`residual_norm` + `x_dot_o = ⟨x̄, o'⟩`), `DistanceEstimator`
(`⟨o',q'⟩ ≈ ⟨x̄,q'⟩ / x_dot_o`, the paper's unbiased rescale), and
`EstimatorBank` reranking candidates by the estimate instead of raw
Hamming. **Zero-centroid simplification** (`c = 0`) documented;
paper-faithful centroid path also built (`with_centroid`). Additive —
Pass-1/Pass-2 and the wire format are unchanged.
- **MEASURED strict-K coverage** (same fixture as §"Pass 2", cosine
ground truth): the estimator lifts the strict `candidate_k = K` bar
**46.39% (Pass-2 sign) → 49.71% (estimator, cosine rerank)** — a real
**+3.3 pp** lift, but **still ~40 pp short of the ≥90% strict bar.**
At over-fetch the estimator does better than sign (95.12% vs 91.60% at
candidate_k = 24). **Honest verdict: the unbiased estimator does NOT
clear the strict-K 90% bar on this distribution** — the binding
constraint is the 1-bit code's information ceiling, not estimator
variance. The ≥90% acceptance bar is still met only via the over-fetch
"candidate set" pattern this ADR's Decision specifies; the estimator
**reduces the over-fetch factor** needed but does not remove it. This
is a **published negative**, reported as such. Full numbers + reproduce
commands in ADR-156 §11.
## Open questions
- **Does `BinaryQuantized` need a randomized rotation pre-pass for
RuView's embedding distributions?** **ANSWERED (ADR-156 §10):** rotation
is built and measured — it helps (+10pp at strict K) but is not
sufficient alone for strict-K 90% on the tested anisotropic
distribution; the over-fetch candidate-set pattern meets the bar.
Pure sign quantization assumes zero-centered, isotropic embeddings; the
rotation decorrelates anisotropic coords as the RaBitQ paper
(Gao & Long, SIGMOD 2024) prescribes.
RuView's embedding distributions?** Pure sign quantization assumes
zero-centered, isotropic embeddings. If AETHER / spectrogram
distributions are skewed (likely for spectrogram), add a
`randomized_rotation` pre-pass following the original RaBitQ paper
(Gao & Long, SIGMOD 2024). Decided after pass-1 benchmark.
- **Sketch dimension target.** Default to the embedding's native
dimension (128 for AETHER, 256 for spectrogram). Higher-dimensional
sketches (Johnson-Lindenstrauss-projected to 512) trade compute for
@@ -19,7 +19,7 @@ The production CSI node firmware (`firmware/esp32-csi-node`) was built around th
| C6 capability | What it enables for sensing | Why we can't get it on S3 |
|---|---|---|
| **802.11ax (Wi-Fi 6) HE-LTF CSI** | 242 subcarriers per HE20 frame (vs 52 for HT-LTF), HE-MU/HE-TB PPDU types, OFDMA-aware channel sounding. **Hardware-confirmed 2026-06-11** (issue #1005, external production deployment): requires **ESP-IDF ≥ 5.5** — the v5.4 driver blob silently downconverts to 64-subcarrier HT even on a confirmed-HE link; v5.5.2 delivers 532 B frames = 256 bins (242 active tones), PPDU 0x01 (HE-SU). See WITNESS-LOG-110 §B1 (resolved). | S3 radio is HT-only (n) |
| **802.11ax (Wi-Fi 6) HE-LTF CSI** | 242 subcarriers per HE20 frame (vs 52 for HT-LTF), HE-MU/HE-TB PPDU types, OFDMA-aware channel sounding | S3 radio is HT-only (n) |
| **802.15.4 (Thread / Zigbee)** | Cross-node time-sync over a separate radio — frees Wi-Fi airtime for CSI, ±100 µs alignment possible without coordination traffic on the sensing channel | S3 has no 802.15.4 |
| **TWT (Target Wake Time)** | Sensor negotiates a deterministic wake slot with the AP; CSI cadence becomes scheduler-bounded instead of opportunistic | Requires 802.11ax — S3 can't speak it |
| **LP-core + hibernation (~5 µA)** | Always-on motion gate runs on a separate RISC-V LP core in deep sleep; HP core stays off until a real event | S3 ULP is FSM-only, ~10 µA floor |
-51
@@ -104,57 +104,6 @@ Ranked by build cost × user impact:
| **P9** | HACS integration repo (`hass-wifi-densepose`) for HA-side install path | pending |
| **P10** | Witness bundle + CSA-style spec compliance check | pending |
## 4.1 Crypto/security review notes (§2.2 witness chain — ADR-262 P2 prerequisite)
Beyond-SOTA crypto+security review of the SHA-256 + Ed25519 witness chain
(`witness.rs` / `witness_signing.rs`) and the manifest signature surface
(`manifest.rs`), because ADR-262 P2 proposes to **reuse this exact signing
chain**. Top priority was the sibling `wifi-densepose-engine` bug class —
unframed boundary-to-boundary concatenation of operator-influenceable strings
into a signed/hashed digest.
- **Engine bug class ABSENT (good result, reported with byte evidence).**
`canonical_bytes` is `DOMAIN_TAG ‖ prev_hash[32] ‖ seq:u64-be ‖ ts:u64-be ‖
kind_len:u32-be ‖ kind ‖ payload_len:u32-be ‖ payload`. The two
variable-length operator-influenceable fields (`kind`, `payload`) are
**length-prefixed**; the fixed-width fields are self-delimiting → the
encoding is injective (no two distinct event tuples share a preimage). The
Ed25519 signature signs the **identical** bytes the SHA-256 chain commits to.
No separate unframed concatenation exists; the manifest `binary_signature`
is signed at build time (Makefile) over a single fixed-length `binary_sha256`
hex value, not in-crate.
- **CHM-WIT-01 (FIXED) — domain-separation tag added.** The engine fix
prescribed *domain-tag + length-prefix*; length-prefix was present, the
domain tag was not. Added a versioned, NUL-terminated
`WITNESS_DOMAIN_TAG = b"cog-ha-matter/witness-event/v1\x00"` prefix so the
witness message can never be replayed as a message for another Ed25519
context that shares key infrastructure (notably the manifest signature).
**Witness bytes change by design** (prior on-disk hashes/signatures
invalidated, as with the engine fix); verified safe because no in-repo crate
consumes cog-ha-matter witness bytes programmatically (doc-mentions only).
- **CHM-WIT-02 (HARDENED) — `verify_signature` now uses `verify_strict`.** For
an audit chain the signature is the attestation, so non-canonical encodings
and small-order keys are rejected (RFC 8032 strict), giving the "one
canonical signature per event" property. Not a forgery fix — the verifying
key is caller-pinned, never read from the event.
- **Confirmed clean (with evidence):** verify-before-trust + key-pinning
(`verify_signature` takes the verifying key as a parameter; `read_jsonl`
re-derives every hash and chain-verifies); key handling (the crate never
generates/stores/logs/serializes a signing key — only a documented test-only
fixed seed; production keys come from the Seed secure store, out of scope);
determinism (positional bytes, deterministic Ed25519, alphabetically-locked
JSONL field order, sorted TXT records — no HashMap/float nondeterminism feeds
any digest); fail-closed parsing (structured errors, no panics; `main.rs`
reads no untrusted files/paths).
Tests: `cog-ha-matter --no-default-features` 64 → **68**, 0 failed (CHM-WIT-01
pinned by 4 fails-on-old tests across `witness.rs`/`witness_signing.rs`;
CHM-WIT-02 guarded by a key-pinning test). Python deterministic proof
unchanged (cog-ha-matter is off the signal proof path).
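The framed encoding described in the review notes above can be sketched as below. The field order and the domain tag are taken from the text; the function signature, parameter names, and the panic on an oversized field are assumptions made for illustration and may not match `witness.rs`.

```rust
/// Versioned, NUL-terminated domain-separation tag, as quoted in the notes.
const WITNESS_DOMAIN_TAG: &[u8] = b"cog-ha-matter/witness-event/v1\x00";

/// Build the bytes that are both hashed into the chain and signed
/// (hypothetical signature):
/// TAG || prev_hash[32] || seq:u64-be || ts:u64-be
///     || kind_len:u32-be || kind || payload_len:u32-be || payload
pub fn canonical_bytes(
    prev_hash: &[u8; 32],
    seq: u64,
    ts: u64,
    kind: &str,
    payload: &[u8],
) -> Vec<u8> {
    let mut out = Vec::with_capacity(
        WITNESS_DOMAIN_TAG.len() + 32 + 8 + 8 + 4 + kind.len() + 4 + payload.len(),
    );
    out.extend_from_slice(WITNESS_DOMAIN_TAG);
    out.extend_from_slice(prev_hash);
    out.extend_from_slice(&seq.to_be_bytes());
    out.extend_from_slice(&ts.to_be_bytes());
    // Each variable-length field carries its own length prefix, so the
    // boundary between `kind` and `payload` is never ambiguous.
    for field in [kind.as_bytes(), payload] {
        let len = u32::try_from(field.len()).expect("field longer than u32::MAX bytes");
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(field);
    }
    out
}
```

With unframed concatenation, `("ab", "c")` and `("a", "bc")` would produce the same preimage; with the length prefixes they cannot, which is the injectivity property the review relies on.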
## 5. References
- ADR-101 — `cog-pose-estimation` packaging precedent (signed binaries on GCS, .cog manifest)
@@ -190,78 +190,4 @@ The entity registry is a `RwLock<HashMap<EntityId, EntityEntry>>` backed by an a
- `v2/crates/wifi-densepose-sensing-server/src/main.rs` — Axum + Tokio architecture pattern used throughout the existing server stack
- `docs/adr/ADR-126-ruview-native-ha-port-master.md` — HOMECORE master; §5.5 crate naming; §6 compatibility contract; §5.1 RUVIEW-POLICY
---
## 9. Security & concurrency review (P1 core, beyond-SOTA sweep)
Foundational review of the `homecore` crate — the state store + event bus +
service/entity registries every other HOMECORE module trusts. Same rigor as
the ADR-129/130/132/133/161 sibling reviews. **Three real fixes (one
concurrency, two hardening), each pinned by a fails-on-old test; the bus-lag
and lock-discipline dimensions confirmed clean with evidence.**
- **HC-RACE-01 (state-set TOCTOU — lost / reordered `state_changed`, the
crux). FIXED.** `StateMachine::set` did `get()` (releasing the DashMap
shard lock) → compute the next snapshot + the no-op / `last_changed`
decision → `insert()` (re-acquiring the lock) → `send()`. The
read-modify-write was **not atomic** w.r.t. a concurrent writer on the
same entity, contradicting §2.1's promise that "the writer atomically
replaces the map entry." A writer that read a stale `old` could
mis-classify a genuine transition as a no-op and **drop its
`state_changed` event** (a missed automation trigger) or fire an event
whose `new_state` duplicated the previously delivered one (a spurious
trigger for any automation keyed on `old_state != new_state`). **Fix:**
hold the shard write-lock across the entire read→decide→insert→fire
sequence via `entry()`/`insert_entry()`; `tx.send` is non-blocking,
non-async, and never re-enters the map, so firing under the shard lock
cannot deadlock and keeps global event order in lock-step with global
commit order. Pinned by `concurrent_set_fires_no_duplicate_adjacent_events`
(4 writers toggling one entity A/B; asserts no two consecutive fired
events carry an identical `new_state` — impossible under correct
serialisation; a probe observed ~93k such duplicate-adjacent events across
200 trials on the racy code, zero on the fix).
- **HC-EID-LEN-01 (unbounded `entity_id` — memory-DoS at the REST boundary).
FIXED.** `homecore-api/src/rest.rs` parses untrusted path segments
straight through `EntityId::parse`; with no length cap, an
otherwise-valid id (`a.` + many MB of `[a-z0-9_]`) was accepted and a
`POST /api/states/<giant>` would persist it into the DashMap state store
(permanent growth across distinct ids). **Fix:** reject ids longer than
`MAX_ENTITY_ID_LEN` (255, HA-compatible) up front in `parse()`, before any
per-char scan, with a new `EntityIdError::TooLong`; fail-closed at the
boundary type protects every caller. Pinned by `entity_id_length_boundary`
(exactly-MAX accepted, MAX+1 and a 4 MiB id rejected — fails on old code).
- **HC-SVC-PANIC-01 (service-handler panic not isolated). HARDENED.**
`ServiceRegistry::call` already ran handlers outside the registry lock (no
`RwLock` poisoning, no blocking of other callers — clean), but a
panicking handler unwound through `call()` into the caller's task. **Fix:**
wrap the handler future in `AssertUnwindSafe` + `catch_unwind`, converting
a panic to `ServiceError::HandlerPanicked`; the registry stays fully
usable. Pinned by `panicking_handler_is_isolated_and_registry_survives`.
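The up-front length check described under HC-EID-LEN-01 can be sketched as follows. This is a simplified stand-in for `EntityId::parse`, not the crate's code: the free function, the tuple return type, and the `Malformed` variant are hypothetical, and the character rule is reduced to the `[a-z0-9_]` alphabet mentioned in the finding.

```rust
/// Cap quoted in the finding (255, HA-compatible).
pub const MAX_ENTITY_ID_LEN: usize = 255;

#[derive(Debug, PartialEq, Eq)]
pub enum EntityIdError {
    TooLong,
    /// Hypothetical catch-all for any other validation failure.
    Malformed,
}

/// Parse `domain.object_id`, rejecting oversized input before any per-char scan.
pub fn parse_entity_id(raw: &str) -> Result<(&str, &str), EntityIdError> {
    // O(1) length check first: a multi-megabyte id is refused without being scanned.
    if raw.len() > MAX_ENTITY_ID_LEN {
        return Err(EntityIdError::TooLong);
    }
    let (domain, object) = raw.split_once('.').ok_or(EntityIdError::Malformed)?;
    let valid = |s: &str| {
        !s.is_empty()
            && s.bytes()
                .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_')
    };
    if valid(domain) && valid(object) {
        Ok((domain, object))
    } else {
        Err(EntityIdError::Malformed)
    }
}
```

Putting the cap inside the boundary type's parser, rather than in each REST handler, means every caller inherits the limit.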
**Dimensions confirmed clean (with evidence):**
- **Event-bus bounds / lag (same class as the homecore-api WS lag-DoS).**
Both `StateMachine` and `EventBus` use bounded `tokio::sync::broadcast`
(capacity 4,096). A slow subscriber gets a recoverable `Lagged(n)`
(drop-oldest + re-sync); `fire_*` is non-blocking and **never waits on
slow receivers**, so a lagging subscriber cannot block the publisher, grow
the channel without bound, or take down a fast subscriber. Evidenced by
`slow_subscriber_does_not_block_publisher_or_kill_the_bus` (fire 3×
capacity at an idle subscriber; publisher unblocked, bus stays live).
- **Lock ordering / lock-across-await (deadlock).** No code path holds two
of `{state DashMap, registry RwLock, service RwLock}` simultaneously, so
no inconsistent-ordering deadlock can exist. Every `tokio::sync::RwLock`
guard in `registry.rs`/`service.rs` is used in a single synchronous
statement and dropped before any `.await`; `call` explicitly scopes the
read guard out before awaiting the handler. The only guard held across a
send is the DashMap shard lock in `set`, across a synchronous
(non-await) broadcast send — safe.
- **Panic-on-input.** No reachable `unwrap`/`expect`/index in non-test code
beyond the safe `send().unwrap_or(0)` and the dead-but-harmless
`split_once(...).unwrap_or(...)` fallbacks on already-validated ids.
`cargo test -p homecore --no-default-features`: **20 → 24 passed, 0 failed**
(+4 pins). Workspace green; Python deterministic proof unchanged
(`f8e76f21…46f7a`, bit-exact — `homecore` is off the signal proof path).
- `docs/adr/ADR-028-esp32-capability-audit.md` — witness chain pattern (Ed25519 per state transition)
@@ -190,23 +190,6 @@ This is the same Wasmtime host already used for integration plugins (ADR-128)
---
## 8a. Security review (beyond-SOTA sweep, post ADR-154–159)
A focused security review of `homecore-automation` (the execution/eval surface — triggers → conditions → actions, with templates) was run after the ADR-154–159 sweep, applying the same rigor that the sibling engine/bfld/calibration/vitals/geo reviews used. **Two real DoS findings, each pinned by a fails-on-old test; the condition-bypass, fail-closed-parsing, and action-authorization dimensions were probed and found clean.**
- **HC-SEC-01 (template-injection / unbounded-expansion DoS, HIGH) — FIXED.** A `template:` condition / `value_template` is user automation config, and was rendered with MiniJinja's defaults: **no instruction budget, no output cap**. A single condition such as `{% for i in range(5000) %}{% for j in range(5000) %}xxxx{% endfor %}{% endfor %}` rendered a **100 MB string over ~11 s on one render call** (measured) — a CPU/memory denial of service (the bfld-class "unbounded expansion"; MiniJinja's per-call `range()` 10k cap does **not** stop nested loops). **Fix:** enable MiniJinja's `fuel` feature and set a per-render budget (`set_fuel(Some(1_000_000))`) so a nested loop burns one unit per iteration — the attack now fails fast (~90 ms) with "engine ran out of fuel"; plus a 64 KiB source-length cap rejecting pathological sources before compilation. Legitimate HA templates (a few dozen instructions) are unaffected. Pinned by `nested_loop_template_is_bounded_not_unbounded_dos`, `single_huge_repeat_template_is_bounded`, `oversized_template_source_is_rejected` (all fail-on-old: unbounded render / no rejection), and `legitimate_template_still_renders_within_fuel` (no regression).
- **HC-SEC-02 (panic-on-config DoS, MEDIUM) — FIXED.** `Action::Delay { seconds }` and `Action::WaitForTrigger { timeout_seconds }` fed the user-supplied float straight into `Duration::from_secs_f64`, which **panics** on negative, NaN, infinite, or overflowing inputs — all reachable from a crafted (or typo'd) YAML (`delay: {seconds: -1}`, `.nan`, `.inf`, `1e308`). One hostile config aborts the spawned automation run task with a panic (measured: "cannot convert float seconds to Duration: value is negative"). **Fix:** a `safe_duration_from_secs` guard that saturates instead of panicking (NaN/±inf/negative → `Duration::ZERO`, matching HA's lenient "non-positive delay = no delay"; absurdly large → clamped to ~100 years). Pinned by `delay_negative_seconds_does_not_panic`, `delay_nan_seconds_does_not_panic`, `delay_infinite_seconds_does_not_panic`, `wait_for_trigger_negative_timeout_does_not_panic`, `safe_duration_saturates_hostile_values` (incl. overflow clamp).
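The saturating guard described under HC-SEC-02 can be sketched as below. The mapping of NaN, infinities, and negatives to `Duration::ZERO` follows the text; the exact upper clamp (`MAX_DELAY_SECS`, taken here as 100 years of 365 days) is an assumption, since the source only says "~100 years".

```rust
use std::time::Duration;

/// Assumed cap: 100 years of 365 days. The crate's exact value may differ.
const MAX_DELAY_SECS: f64 = 100.0 * 365.0 * 24.0 * 3600.0;

/// Convert user-supplied seconds into a `Duration` without ever panicking.
/// `Duration::from_secs_f64` panics on negative, NaN, infinite, or
/// overflowing input, so those cases are handled before the call.
pub fn safe_duration_from_secs(secs: f64) -> Duration {
    // NaN, +/-inf, zero, and negatives all mean "no delay".
    if !secs.is_finite() || secs <= 0.0 {
        return Duration::ZERO;
    }
    // Finite and positive: clamp absurdly large values to the cap.
    Duration::from_secs_f64(secs.min(MAX_DELAY_SECS))
}
```

Saturating instead of returning an error keeps a typo'd `delay:` from aborting the whole automation run, at the cost of silently treating invalid values as "no delay".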
**Dimensions confirmed clean (with evidence):**
- **Condition bypass / fail-closed eval** — a `Condition::Template` whose render errors evaluates to `false` (`condition.rs` `Err(_) => false`), and a `Choose` branch condition that fails to deserialize is treated as **non-matching** (the branch is skipped), not silently passing (`action.rs` `ChoiceBranch::matches` `Err(_) => return false`). Both fail **closed** (do-not-run), confirmed by the existing `choose_*` tests and template-false-blocks-action behavioral test. No true-by-default-on-parse-error path found.
- **Re-entrancy / livelock (DoS)** — run-mode machinery is bounded and tested: `Single`/`IgnoreFirst` re-entrancy guard, `Restart` cancel-and-replace, `Queued` FIFO serialization, and `max: N` semaphore cap (ADR-162; `restart_mode_cancels_prior_run`, `queued_mode_runs_sequentially_not_concurrently`, `max_two_caps_concurrency_at_two`, `single_mode_does_not_double_fire_on_rapid_triggers`). A self-triggering automation does not livelock the engine — each fire is bounded by its run-mode.
- **Action authorization** — templates are read-only sandboxed (`states`/`state_attr`/`is_state`/`now` globals; no service-call or state-set global is exposed to template scope), so a template cannot escalate into an action. Service authorization itself is enforced at the `homecore` service-registry boundary (out of this crate's scope); no gap found in what the automation crate enforces.
- **Panic-on-config (parse)** — `serde_yaml`/`serde_json` deserialization returns structured `AutomationError` (no `unwrap`/`expect`/index reachable from a crafted config in the eval/exec path); the only remaining panic surface was the `from_secs_f64` path fixed as HC-SEC-02.
Validation: `cargo test -p homecore-automation --no-default-features` → 54 passed / 0 failed (+14 over baseline). Python deterministic proof unchanged (homecore-automation is off the signal-processing proof path).
---
## 9. References
### HA upstream
@@ -1,444 +0,0 @@
# ADR-131: HOMECORE-UI — Operational dashboard for the two-tier Cognitum stack
| Field | Value |
|-------|-------|
| **Status** | Accepted — UI implemented (§10); full backend wiring specified (§11–§12) |
| **Date** | 2026-06-14 |
| **Deciders** | ruv |
| **Codename** | **HOMECORE-UI** — first-class operator dashboard inside the Cognitum Appliance shell |
| **Relates to** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (HOMECORE master), [ADR-127](ADR-127-homecore-state-machine-rust.md) (HOMECORE-CORE state machine), [ADR-128](ADR-128-homecore-integration-plugin-system.md) (HOMECORE-PLUGINS), [ADR-129](ADR-129-homecore-automation-engine.md) (automation engine), [ADR-130](ADR-130-homecore-rest-websocket-api.md) (HOMECORE-API), [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) (recorder/semantic search), [ADR-151](ADR-151-room-calibration-specialist-training.md) (room calibration HTTP API), [ADR-100](ADR-100-cog-packaging-specification.md) (Cog packaging), [ADR-116](ADR-116-cog-ha-matter-seed.md) (cog-ha-matter), [ADR-069](ADR-069-cognitum-seed-csi-pipeline.md) (SEED RVF ingest), [ADR-105](ADR-105-federated-csi-training.md) (federated CSI training) |
| **Tracking issue** | TBD |
| **Parent** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (sub-ADR, HOMECORE-127…134 family) |
---
## 1. Context
HOMECORE (ADR-126 through ADR-134) is the native Rust + WASM + TypeScript port of Home Assistant running as the hub on the Cognitum v0 Appliance. As of P2, the state machine ([ADR-127](ADR-127-homecore-state-machine-rust.md)), API ([ADR-130](ADR-130-homecore-rest-websocket-api.md)), and COG runtime ([ADR-128](ADR-128-homecore-integration-plugin-system.md)) are in place. What is missing is a first-class dashboard UI that operators, integrators, and residents can use to manage the full two-tier hardware stack that HOMECORE coordinates.
### 1.1 The two-tier hardware model this UI must represent
This is the most important architectural constraint the UI must carry through every panel:
- **Cognitum SEED** — a Pi Zero 2 W-based edge node. It has its own RVF vector store (8-dim, content-addressed, with kNN queries), Ed25519 witness chain, SHA-256 ingest audit trail, onboard environmental sensors (BME280 temperature/humidity/pressure, PIR motion, reed switch, ADS1115 4-channel ADC, vibration), 13 drift detectors, an MCP proxy (114 tools, JSON-RPC 2.0, default-deny policy), 98 HTTPS API endpoints, and epoch-based swarm sync for multi-SEED deployments. SEEDs sit close to the ESP32 sensing nodes and receive feature vectors from them at 1 Hz. Multiple SEEDs can form a peer mesh. **This is the sensing and memory tier.**
- **Cognitum v0 Appliance** — a Pi 5 + Hailo-10H hub, running at `:9000`. It hosts the COG runtime (`/var/lib/cognitum/apps/`), the HOMECORE state machine and event bus, the calibration service, `ruview-mcp-brain:9876`, `cognitum-rvf-agent:9004`, `ruvector-hailo-worker:50051`, and acts as the fleet coordinator for multi-room correlation and federated training. The Appliance is where HOMECORE runs, and it is what the dashboard user is sitting in front of. **This is the computation and orchestration tier.**
SEEDs are **subordinate nodes that the Appliance supervises** — they are not peers. The UI navigation hierarchy must reflect this: the Appliance is the root, SEEDs are children, ESP32 nodes are leaves.
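The hierarchy above can be sketched as a plain tree type. This is a minimal illustration only: the interface and field names (`Appliance`, `SeedNode`, `Esp32Node`, `navRows`) are hypothetical and are not taken from the HOMECORE API.

```typescript
// Hypothetical shapes for the two-tier model; field names are illustrative only.
interface Esp32Node {
  nodeId: string;
}

interface SeedNode {
  deviceId: string;
  online: boolean;
  esp32Nodes: Esp32Node[];
}

interface Appliance {
  host: string;
  seeds: SeedNode[];
}

// Flatten the tree into indented navigation rows:
// Appliance at the root, SEEDs as children, ESP32 nodes as leaves.
function navRows(appliance: Appliance): string[] {
  const rows: string[] = [appliance.host];
  for (const seed of appliance.seeds) {
    rows.push("  " + seed.deviceId);
    for (const node of seed.esp32Nodes) {
      rows.push("    " + node.nodeId);
    }
  }
  return rows;
}
```

Because SEEDs only ever appear nested under the Appliance, a navigation component built on a shape like this cannot accidentally present a SEED as a peer of the hub.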
### 1.2 What the UI is not
HOMECORE-UI is **not** a re-skin of the existing Cognitum Cog Store. It is a full operational dashboard that **extends** the Cognitum platform's shell — the Cog Store, API Explorer, and Guide already exist and must remain intact, with the HOMECORE dashboard added as a first-class navigation section alongside them.
---
## 2. Decision
Build HOMECORE-UI as a **complete** TypeScript + Rust→WASM frontend (per this ADR's §3 and the HOMECORE-127…134 family) that:
1. Lives at `http://cognitum-v0:9000/homecore` (or as a dedicated nav item in the Cognitum Appliance shell).
2. Is visually and stylistically seamless with the existing Cognitum platform — same dark theme, same design tokens, same component patterns as `https://seed.cognitum.one/store`.
3. Drives the HOMECORE REST + WebSocket API ([ADR-130](ADR-130-homecore-rest-websocket-api.md)) and the calibration HTTP API ([ADR-151](ADR-151-room-calibration-specialist-training.md)) for all data.
4. Updates in real-time via the homecore `subscribe_events` WebSocket channel. **The UI must never poll for entity state.**
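A push-driven client along the lines of point 4 might look like the sketch below. It assumes the Home Assistant style WebSocket framing (`subscribe_events` with an `event_type` field) that ADR-130's wire compatibility implies; the `StateChanged` shape and both helper names are assumptions made for illustration, not the confirmed homecore contract.

```typescript
// Assumed event payload, modelled on Home Assistant's state_changed event.
interface StateChanged {
  entity_id: string;
  new_state: { state: string } | null; // null when the entity was removed
}

// Build the subscription command sent once after the socket is authenticated.
function subscribeMessage(id: number): string {
  return JSON.stringify({ id, type: "subscribe_events", event_type: "state_changed" });
}

// Fold one pushed event into the local entity map; no polling is involved.
function applyStateChanged(
  states: Record<string, string>,
  ev: StateChanged
): Record<string, string> {
  const next: Record<string, string> = { ...states };
  if (ev.new_state === null) {
    delete next[ev.entity_id];
  } else {
    next[ev.entity_id] = ev.new_state.state;
  }
  return next;
}
```

Returning a fresh map on every event keeps the reducer easy to plug into whatever reactive store the frontend ends up using.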
**This is a decision to deliver the complete operational dashboard — every panel in §4.1 through §4.10, every navigation section in §5, fully wired to live data — not a design-system scaffold or a partial first cut.** A static layout shell with placeholder data is explicitly **out of scope as a deliverable**: the design system (§3) is a means to the complete UI, not an end in itself. The acceptance bar for this ADR is that an operator can drive the full two-tier stack — fleet, entities, rooms, COGs, calibration, events, audit, and settings — from the dashboard, against real APIs, with no panel left as a stub.
### 2.1 `homecore-server` is the single backend-for-frontend (BFF) gateway
The data the dashboard needs is spread across **three backend tiers that are not one process**: (a) `homecore-api` (`/api/*` REST + `/api/websocket`, mounted in `homecore-server`); (b) the **calibration API** (`/api/v1/*`, served by a *separate* binary — `wifi-densepose calibrate-serve` / `wifi-densepose-sensing-server`); and (c) the **SEED device tier + appliance daemons** (RVF vector store, witness chain, onboard sensors, reflex rules, COG supervisor, federation), which are physically separate HTTPS services on the SEED nodes and the appliance.
The browser must talk to **exactly one origin.** Therefore `homecore-server` is promoted to the **single BFF / API gateway** for HOMECORE-UI: it serves the static assets at `/homecore`, serves `homecore-api` at `/api/*`, and **adds a new `/api/homecore/*` namespace** that proxies and aggregates the calibration API and the SEED/appliance tiers server-side. The UI only ever issues same-origin requests; cross-service auth (SEED bearer tokens, calibration tokens) is held by the gateway and **never exposed to the browser**. This collapses the CORS/multi-port problem and gives one place to enforce the long-lived-access-token auth (§4.10).
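One way to enforce the same-origin rule on the client side is to route every request through a small path builder. The sketch below is illustrative: only the `/api/homecore/` prefix comes from this ADR, while `gatewayPath` and the example resource names are hypothetical.

```typescript
const GATEWAY_PREFIX = "/api/homecore/";

// Build a same-origin gateway path. Absolute and protocol-relative URLs are
// rejected so the browser never addresses a SEED or the calibration binary directly.
function gatewayPath(resource: string): string {
  const isAbsolute = /^[a-z][a-z0-9+.-]*:/i.test(resource);
  if (isAbsolute || resource.startsWith("//")) {
    throw new Error("cross-origin URL not allowed: " + resource);
  }
  // Strip leading slashes so callers may pass "seeds/x" or "/seeds/x".
  return GATEWAY_PREFIX + resource.replace(/^\/+/, "");
}
```

For example, `gatewayPath("seeds/seed-a/sensors")` yields `/api/homecore/seeds/seed-a/sensors` (a made-up resource name), and any attempt to pass a full `https://` URL fails fast instead of leaking a cross-origin call.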
### 2.2 No mock data in production
The in-browser mock layer that the first UI cut shipped behind DEMO banners (§7.1, prior revision) is **demoted to a dev-only fixture** gated behind an explicit `?demo=1` / `HOMECORE_UI_DEMO=1` flag. The production build wires **every** panel to a real gateway endpoint. The full endpoint contract and the backend work each panel needs are specified in **§11**; the staged path to get there is **§12**. A panel may show an empty/typed-error state when its upstream is down, but it must never silently render fabricated data.
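The demo gate can be sketched as a single predicate. The flag names `?demo=1` and `HOMECORE_UI_DEMO=1` come from this section; how the environment value reaches the frontend (passed here as `envFlag`) and the name `isDemoMode` are assumptions.

```typescript
// Returns true only when the demo fixture was requested explicitly.
// `search` is the URL query string (e.g. window.location.search);
// `envFlag` is the value of HOMECORE_UI_DEMO, however the build exposes it.
function isDemoMode(search: string, envFlag?: string): boolean {
  if (envFlag === "1") {
    return true;
  }
  const query = search.startsWith("?") ? search.slice(1) : search;
  // Exact match on the pair, so "demo=10" or "xdemo=1" do not enable the fixture.
  return query.split("&").some((pair) => pair === "demo=1");
}
```

Defaulting to `false` for anything other than an exact match keeps the production path on real gateway endpoints unless someone opts in deliberately.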
---
## 3. Design system — Cognitum platform conventions
The implementor **must study `https://seed.cognitum.one/store` as the definitive design reference before writing a single line of CSS.** The existing platform's design tokens, extracted from production, are:
### 3.1 Colour palette (CSS custom properties)
| Token | Value | Role |
|---|---|---|
| `--bg` | `#0a0e1a` | page background (very dark navy) |
| `--bg2` | `#111627` | secondary background / nav strip |
| `--card` | `#171d30` | card / panel surface |
| `--card-h` | `#1e2540` | card hover state |
| `--border` | `#252d45` | all border strokes (≈0.67px, subtle) |
| `--t1` | `#e0e4f0` | primary text (near-white) |
| `--t2` | `#8890a8` | secondary / muted text |
| `--t3` | `#505872` | tertiary / disabled text |
| `--cyan` | `#4ecdc4` | primary action colour (Install buttons, live indicators, accents) |
| `--cyan-d` | `rgba(78,205,196,0.15)` | cyan tint background for status badges |
| `--green` | `#6bcb77` | success / online / healthy states |
| `--green-d` | `rgba(107,203,119,0.15)` | green tint background |
| `--amber` | `#d4a574` | warning / stale / degraded states |
| `--amber-d` | `rgba(212,165,116,0.15)` | amber tint background |
| `--red` | `#e06060` | error / offline / veto states |
| `--red-d` | `rgba(224,96,96,0.15)` | red tint background |
| `--purple` | `#a78bfa` | informational / epoch / chain indicators |
| `--purple-d` | `rgba(167,139,250,0.15)` | purple tint background |
| `--r` | `10px` | standard border radius on all cards and panels |
### 3.2 Typography
- `--font`: `'Segoe UI', system-ui, -apple-system, sans-serif` — all body and heading text.
- `--mono`: `'Cascadia Code', 'Fira Code', Consolas, monospace` — all entity IDs, API endpoints, hex values, JSON payloads, COG binary hashes.
### 3.3 Component patterns (from the live Cog Store and API Explorer)
- **Cards**: `background: var(--card)`, `border: 0.67px solid var(--border)`, `border-radius: var(--r)`, `padding: 24px`.
- **Category pills / status badges**: small `border-radius: 46px`, uppercase text, coloured background tint (e.g. `background: var(--cyan-d); color: var(--cyan)` for `RUNNING`; `background: var(--amber-d); color: var(--amber)` for `STALE`).
- **Primary action buttons**: `background: var(--cyan)`, `color: var(--bg)`, no border — matching the existing "Install" button style exactly.
- **Secondary / ghost buttons**: transparent background, `border: 1px solid var(--border)`, `color: var(--t1)` — matching the existing "Details" button style.
- **Nav strip**: `background: var(--bg2)`, text items in `--t2`, active item highlighted in `--cyan` with a bottom underline.
- **Featured card gradient borders**: top-edge linear gradient from `var(--cyan)` to `var(--purple)` — replicate for HOMECORE section headers.
- **Live metric cards** (API Explorer status page): icon + large numeric value in `--cyan` or `--green`, label in `--t2` below, on a `var(--card)` background.
- **Method badge pills** on the API Explorer (`GET` in green, `POST` in amber, `AUTH` in purple) — reuse this same pill system for COG status indicators.
The implementor **must not introduce new colours, typefaces, or border radii.** Every component should feel like it was built by the same team that built the Cog Store and the API Explorer. A user navigating from the Cog Store into the HOMECORE dashboard should not notice a visual seam.
---
## 4. UI sections — required panels
### 4.1 System Dashboard (the "home screen")
The always-visible overview panel. Modelled on the API Explorer's live metric cards. All values update in real-time.
- **v0 Appliance health strip** — reuse the exact metric-card pattern from `seed.cognitum.one/status`: one card each for CPU %, RAM usage, Hailo-10H inference load (% utilisation), Hailo temperature, uptime, and the running services (`ruview-mcp-brain:9876`, `cognitum-rvf-agent:9004`, `ruvector-hailo-worker:50051`). Values in `--cyan`, labels in `--t2`. This strip is always at the top — it represents the machine the user is looking at.
- **SEED Fleet overview** — a grid of SEED node cards (one per paired SEED) on the `var(--card)` surface with `var(--border)`. Each card shows: online/offline status pill (green/red), firmware version, epoch number, current vector count, last ingest timestamp, and witness-chain validity badge. A collapsed row shows the SEED's 5 onboard sensors in summary (PIR: yes/no, door: open/closed, temperature from BME280). Offline SEEDs render the entire card with a `--red-d` background tint. Clicking a SEED card navigates to the SEED Detail view (§4.2).
- **ESP32 Node summary** — count of active ESP32 nodes per SEED, current frame rate (target: 100 Hz CSI + 1 Hz feature vectors), and a compact warning list for nodes with known issues (presence_score normalisation anomaly, stale firmware version).
- **COG Runtime status row** — a horizontal strip of status pills for each installed COG on the v0 Appliance. Pill colours follow the existing badge convention: `--green-d`/`--green` for running, `--red-d`/`--red` for failed, `--t3`/`--t2` for stopped. COG name in `--mono`. Clicking a pill navigates to COG Management (§4.6).
- **Event Bus activity indicator** — a small real-time sparkline showing the homecore broadcast channel event rate (events/sec). Indicate channel lag if a subscriber is falling behind the 4,096-event capacity.
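The COG status pill colours described above map naturally onto a small lookup. This is a sketch, not the shipped component: the `CogStatus` union and `cogPillStyle` are hypothetical names, and reading the `--t3`/`--t2` pair as background and text colour is an interpretation of the convention.

```typescript
type CogStatus = "running" | "failed" | "stopped";

interface PillStyle {
  background: string;
  color: string;
}

// Map a COG status onto the existing design tokens; no new colours are introduced.
function cogPillStyle(status: CogStatus): PillStyle {
  switch (status) {
    case "running":
      return { background: "var(--green-d)", color: "var(--green)" };
    case "failed":
      return { background: "var(--red-d)", color: "var(--red)" };
    case "stopped":
      return { background: "var(--t3)", color: "var(--t2)" };
  }
}
```

Keeping the mapping in one function means a later status such as `updating` has exactly one place to be added.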
### 4.2 SEED Detail View (per-SEED drill-down)
Accessible from the fleet grid. Full-page panel for a single SEED node, using the card + section-header pattern from the Cog Store's detail views.
- **SEED identity header** — `device_id` in `--mono`, firmware version, paired status in green, USB vs WiFi connection mode. A section-header gradient border (cyan → purple, matching the featured card style) visually separates this from Appliance content.
- **Vector Store panel** — current vector count, dimension (8), last kNN query latency, current epoch number, a small sparkline of ingest rate over the last hour, and a storage budget bar showing usage against the 100K working-set target. A "Compact now" button (`POST /api/v1/store/compact`) in ghost style. When usage exceeds 80%, the bar renders in `--amber`.
- **Witness Chain panel** — chain length (SHA-256 entries), last verification timestamp, a one-click "Verify chain" button (`POST /api/v1/witness/verify`), and an "Export attestation bundle" button for regulated deployments. The Ed25519 custody attestation (device-bound keypair, epoch + vector count + witness head) renders here. Chain length in `--purple`, following the existing epoch/chain colour convention.
- **Onboard Sensors panel** — live readings from all 5 sensors in individual sub-cards: BME280 (temperature °C, humidity %, pressure hPa), PIR (motion boolean with last-triggered timestamp), reed switch (open/closed with last-changed timestamp), ADS1115 (4 analog channels with configurable labels), vibration (boolean with last-triggered). These are ground-truth validators against CSI readings and are critical for diagnosing false positives in the mixture-of-specialists. Sensor values in `--cyan`; sensor names in `--t2`.
- **Reflex Rules panel** — the 3 pre-configured rules with current state: `fragility_alarm` (threshold 0.3 → relay actuator), `drift_cutoff` (threshold 1.0), `hd_anomaly_indicator` (threshold 200 → PWM brightness). Show last-fired time for each. The `fragility_alarm` threshold is the most commonly adjusted field and should be editable inline. Rules that have recently fired render with a `--amber-d` background tint.
- **Cognitive Analysis panel** — boundary fragility score (0.0–1.0, from Stoer-Wagner min-cut on the kNN graph) rendered as a progress bar: green below 0.3, amber 0.3–0.6, red above 0.6. High fragility (>0.3) indicates a regime change in the environment and should be visually prominent. Temporal coherence phase boundaries shown as a labelled timeline of detected environment state transitions. kNN graph rebuild cadence indicator (every 10 s).
- **Ingest pipeline status** — which ESP32 nodes feed this SEED, the packet type each is sending (`0xC5110003` native feature vectors vs `0xC5110002` vitals fallback path — distinguished visually since native is preferred), current ingest batch size, flush interval, and bridge path topology (direct vs host-laptop hop). The bridge-hop warning (known architectural limitation) renders in `--amber` since it adds a network hop.
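The two threshold rules in this view (fragility colour bands and the storage budget bar) can be sketched as pure functions. The thresholds and the 100K working-set target come from the bullets above; the function names and return shapes are illustrative assumptions, and treating exactly 0.3 and 0.6 as amber is one reading of the band boundaries.

```typescript
type Band = "green" | "amber" | "red";

// Fragility bar colour: green below 0.3, amber from 0.3 to 0.6, red above 0.6.
function fragilityBand(score: number): Band {
  if (score < 0.3) return "green";
  if (score <= 0.6) return "amber";
  return "red";
}

const WORKING_SET_TARGET = 100000; // vectors

// Storage budget bar: fill fraction (capped at 1) and whether to render in amber.
function storageBar(vectorCount: number): { fraction: number; amber: boolean } {
  const usage = vectorCount / WORKING_SET_TARGET;
  return { fraction: Math.min(usage, 1), amber: usage > 0.8 };
}
```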
### 4.3 SEED Fleet Map (multi-SEED topology)
For deployments with more than one SEED, a topology view showing the mesh:
- **Node hierarchy diagram** — v0 Appliance at root, SEEDs as second tier (grouped by room/zone), ESP32 nodes as leaves under each SEED. Lines represent active data flows. ESP-NOW mesh sync links between SEEDs shown as dashed lines. Connection health shown via line colour (green/amber/red). All labels in `--mono`.
- **Cross-SEED event deduplication indicator** — for events that span multiple SEEDs (one fall detected by two rooms; one occupant tracked through room A → hallway → room B), show a fusion badge indicating how many SEEDs contributed to the composite event.
- **Federation config** ([ADR-105](ADR-105-federated-csi-training.md)) — federated-learning round coordinator role (which SEED is the round coordinator), current round number, K healthy nodes selected, delta exchange status. **Model deltas only — never raw CSI** is a design invariant that must be labelled explicitly in the UI.
### 4.4 Entity & State Browser
The homecore state machine (`DashMap<EntityId, Arc<State>>`) is the authoritative source of truth. Every COG running on the v0 Appliance contributes entities.
- **Entity list by domain** — grouped by the `domain.` prefix of `EntityId`, using collapsible section headers. The 21 entities per ESP32 node (11 raw + 10 semantic primitives from `cog-ha-matter`) are the most important set. For each entity: current state string (in `--t1`), last-changed timestamp (in `--t3`), attribute map as collapsible JSON in `--mono`, and the Context (`user_id` + `parent_id` causality chain, critical for care/audit deployments). Entity IDs always in `--mono`.
- **SEED provenance badge** — each entity carries a small badge showing its data lineage: which ESP32 node → which SEED → which COG → homecore state machine. This trace is invaluable for debugging false positives and is a **first-class UI element, not a collapsed detail.**
- **Domain filter + semantic search** — filter by domain prefix and, once [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) (homecore-recorder) lands, ruvector-backed semantic search: "when did the living room anomaly score last correlate with a door-open event?" A keyword filter across entity IDs and attribute keys ships in the initial release regardless of [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) status, given entity density; the semantic search layers on top once the recorder lands.
- **Real-time WebSocket feed** — entity states update live via the homecore `subscribe_events` WebSocket command ([ADR-130](ADR-130-homecore-rest-websocket-api.md)). The UI must never poll. Show a broadcast-channel lag indicator; warn visually if the subscriber is falling behind the 4,096-event channel capacity.
- **StateChanged detail panel** — clicking any entity opens a slide-over panel showing the full `StateChangedEvent`: `old_state`, `new_state`, `context.id`, `context.user_id`, and the `context.parent_id` chain rendered as a breadcrumb trail.
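Grouping by the `domain.` prefix is a simple split on the first dot. The sketch below assumes entity IDs arrive as plain strings; `groupByDomain` and the fallback bucket name are hypothetical.

```typescript
// Group entity IDs by the domain prefix before the first dot.
// IDs without a usable prefix land in a fallback bucket rather than being dropped.
function groupByDomain(entityIds: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const id of entityIds) {
    const dot = id.indexOf(".");
    const domain = dot > 0 ? id.slice(0, dot) : "(no domain)";
    const bucket = groups[domain] ?? [];
    bucket.push(id);
    groups[domain] = bucket;
  }
  return groups;
}
```

Each key of the result corresponds to one collapsible section header in the entity list, with insertion order preserved inside each group.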
### 4.5 RoomState / Sensing Panel
Surfaces the mixture-of-specialists output from the calibration service — the highest-level per-room sensing result. Data comes from `GET /api/v1/room/state?bank=<room_id>` on the v0 Appliance.
- **Per-room cards** — one card per `room_id` on the `var(--card)` surface. Each card shows live `RoomState` JSON fields as sub-rows: presence (occupied/absent chip in green/red with confidence bar), posture (standing/sitting/lying chip with confidence), breathing BPM (numeric in `--cyan` with range indicator 6–30), heart rate BPM (numeric in `--cyan` with range indicator 40–120), restlessness score (0–1 progress bar), and anomaly score (0–1 with normal/anomalous label, bar turns red above a configurable threshold).
- **STALE warning** — when `stale: true` (the specialist bank was trained against a different baseline), render the entire room card with a `--amber-d` background tint and a prominent amber banner reading "Bank stale — baseline has changed" with a direct "Recalibrate room" link into the calibration wizard (§4.7). This is the most common real-world failure mode and **must never be subtle.**
- **VETO indicator** — when `vetoed: true` (anomaly veto suppressed vitals/posture because the window was physically implausible), render the affected specialist slots in `--red` with a "Veto active" label. Values suppressed by veto **must not render as zeros** — they must render as explicitly withheld.
- **Null specialist placeholders** — specialists not yet trained (`null` in the specialist bank) render as "Not trained" placeholders in `--t3` with a small "Calibrate to enable" prompt in ghost style. They are **not** errors.
- **Confidence bars** — each specialist output has a confidence float, shown as a small inline bar (`--cyan` fill) next to the reading. Low confidence (< 0.4) renders the bar in `--amber`.
- **Multi-SEED fusion indicator** — for rooms served by multiple SEEDs, show a small badge indicating how many SEED nodes contributed to the `MultiNodeMixture` for this room's reading.
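The rules for null, vetoed, and low-confidence specialist slots can be captured in one discriminated union so that no state collapses into a bare zero. This is a sketch under assumptions: the `SpecialistReading` shape and the per-slot `vetoed` argument are hypothetical, since the exact `RoomState` JSON layout is defined by ADR-151 rather than here.

```typescript
// Assumed per-specialist payload; the real RoomState schema lives in ADR-151.
interface SpecialistReading {
  value: number;
  confidence: number;
}

type SlotView =
  | { kind: "not_trained"; label: string }
  | { kind: "withheld"; label: string }
  | { kind: "value"; value: number; barColor: "cyan" | "amber" };

// Decide how one specialist slot renders. Untrained and vetoed slots never
// produce a numeric value, so they cannot be mistaken for a reading of zero.
function slotView(reading: SpecialistReading | null, vetoed: boolean): SlotView {
  if (reading === null) {
    return { kind: "not_trained", label: "Not trained" };
  }
  if (vetoed) {
    return { kind: "withheld", label: "Veto active" };
  }
  return {
    kind: "value",
    value: reading.value,
    barColor: reading.confidence < 0.4 ? "amber" : "cyan",
  };
}
```

A renderer that switches on `kind` is forced by the type checker to handle the "Not trained" and "Veto active" cases explicitly.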
### 4.6 v0 Appliance COG Management
The v0 Appliance hosts COGs at `/var/lib/cognitum/apps/`. This panel is the operational companion to the existing Cog Store (`seed.cognitum.one/store`). It must match the Cog Store's visual conventions precisely — same card layout, same category pills, same install/detail button pair — because operators will move between the two surfaces.
- **Installed COGs list** — for each COG: `id` and `version` in `--mono`, architecture badge (`arm`/`hailo10` etc., category-pill pattern), status pill (running/stopped/failed/updating in green/grey/red/amber), `binary_sha256` verified badge (Ed25519 signature verification shown as a shield icon in `--green` or `--red`), and PID from the pid file. Actions: start, stop, restart (ghost style), and view `output.log` / `error.log` in a monospace drawer using `--mono`. Edit `config.json` inline with syntax highlighting.
- **COG Store / App Registry** — browsable `app-registry.json` listing. This panel should visually mirror `seed.cognitum.one/store` as closely as possible — same featured-card hero layout, same icon + title + description + category pill + action button structure. One-click install downloads the binary from GCS, verifies `binary_sha256` + `binary_signature`, writes the manifest, and starts the COG. Show which new homecore entities will appear in the state machine after install, as a preview list before confirming.
- **OTA Updates** — a badge count on installed COGs with available updates, matching the "Installed (N)" tab badge convention from the existing Cog Store. Show a diff panel (version change, new entities, config schema changes) before confirming the update.
- **Hailo HEF status** — for COGs with `arch: hailo10`: loaded HEF files on the Hailo-10H, current inference throughput, and `ruvector-hailo-worker:50051` connection status. The RF Foundation Encoder ([ADR-150](ADR-150-rf-foundation-encoder.md)) and neural pose head display here once available.
### 4.7 Calibration Wizard
The full baseline → enroll → train → verify pipeline runs via HTTP against the v0 Appliance ([ADR-151](ADR-151-room-calibration-specialist-training.md)). This is a multi-step guided flow — not a raw API panel. Use a stepped wizard layout with a progress indicator at the top (steps 1–5 as numbered pills, active step in `--cyan`, completed in `--green`, pending in `--t3`).
- **Step 1 — Select room and SEED** — enter a `room_id` name (validated against `[A-Za-z0-9_-]{1,64}`) and select which SEED(s) and ESP32 nodes serve this room from a dropdown populated from the live fleet. Show current CSI ingest health for the selected nodes inline — if frames are not arriving at the expected rate, display an amber warning **before** allowing the operator to proceed. A broken ingest pipeline will silently fail calibration.
- **Step 2 — Baseline capture** — `POST /api/v1/calibration/start`. A large full-width animated progress bar (cyan fill) reads from `GET /api/v1/calibration/status`: frames recorded vs target, ETA in seconds, `z_median` value. If `motion_flagged` is true, overlay an amber banner: "Room must be empty — movement detected." The baseline UUID produced here is the anchor for all future STALE detection for this room — display it in `--mono` once complete so operators can record it.
- **Step 3 — Anchor enrollment** — the 8 anchor labels in enforced order: `empty`, `stand_still`, `sit`, `lie_down`, `breathe_slow`, `breathe_normal`, `small_move`, `sleep_posture`. For each: a human-readable instruction with an illustration, a countdown timer rendered as a circular progress ring in `--cyan`, and an immediate quality-gate result (accepted in green, retry in amber with a reason string). Drive via `POST /api/v1/enroll/anchor` + `GET /api/v1/enroll/status`. After each accepted anchor, show the extracted feature values (mean, variance, breathing_score, heart_score) in a small `--mono` data row so operators can sanity-check the capture. Show overall progress as "N / 8 anchors accepted."
- **Step 4 — Train** — a single `POST /api/v1/room/train` call. Show the 6 specialist results as a checklist: presence (threshold + occupied_var), posture (prototype count), breathing (min_score), heartbeat (min_score), restlessness (calm/active motion values), anomaly (prototype count + scale). Specialists that returned non-null render in `--green`. Null specialists (insufficient anchor data) render in `--amber` with a "Re-enroll missing anchors" prompt linking back to Step 3 for the specific missing labels.
- **Step 5 — Verify live** — display the live `RoomState` for the just-trained room using the same per-room card layout as §4.5. Prompt the operator to stand in the room and verify presence is detected, try sitting/lying to confirm posture, and breathe normally to confirm vitals are in plausible range. A "Confirm and save" button (cyan, primary) closes the wizard; a "Something's wrong — re-enroll" button (ghost) loops back to Step 3.
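Two of the wizard's client-side checks, `room_id` validation (Step 1) and the enforced anchor order (Step 3), can be sketched without any network calls. The pattern and the eight labels come from the steps above; the helper names are hypothetical.

```typescript
// Anchored form of the documented room_id pattern.
const ROOM_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

// The eight anchor labels in their enforced enrollment order.
const ANCHOR_ORDER = [
  "empty",
  "stand_still",
  "sit",
  "lie_down",
  "breathe_slow",
  "breathe_normal",
  "small_move",
  "sleep_posture",
] as const;

function isValidRoomId(roomId: string): boolean {
  return ROOM_ID_PATTERN.test(roomId);
}

// First label in the enforced order that has not been accepted yet,
// or null once all eight anchors are accepted.
function nextAnchor(accepted: readonly string[]): string | null {
  for (const label of ANCHOR_ORDER) {
    if (accepted.indexOf(label) === -1) {
      return label;
    }
  }
  return null;
}

function progressLabel(accepted: readonly string[]): string {
  return `${accepted.length} / ${ANCHOR_ORDER.length} anchors accepted`;
}
```

Validating on the client only improves feedback latency; the calibration service remains the authority on whether a `room_id` or anchor is accepted.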
### 4.8 Event Bus & Automation Feed
- **Live event stream panel** — a virtualized scrolling list of `SystemEvent` variants (`StateChanged`, `EntityRegistered`, `ConfigReloaded`) and notable `DomainEvent`s from the homecore Tokio broadcast channel. Each row shows: event-type pill (coloured by variant), `entity_id` in `--mono`, old state → new state arrow, timestamp, and `context.user_id`. The stream is filterable by entity domain, event type, or source SEED/COG. The filter bar uses the same search-input style as the Cog Store's search field.
- **Context causality breadcrumb** — expanding any event row shows the full Context chain (`context.id` → `parent_id` → `grandparent_id`) as a breadcrumb trail in `--mono`. This is how automation loops become visible without any separate debugging tool.
- **Automation builder** ([ADR-129](ADR-129-homecore-automation-engine.md) scope) — a trigger → condition → action editor on the card surface. The most important RuView-specific trigger types to support are: `state_changed` on `RoomState` entities with a threshold expression (e.g. `anomaly.value > 0.8`), SEED reflex-rule firing events (`fragility_alarm`, `hd_anomaly_indicator`), and custom `domain_event` topics. Actions include calling services in the homecore service registry and firing domain events. The condition expression editor uses `--mono`.
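The context causality breadcrumb amounts to walking `parent_id` links until the chain ends. The sketch below assumes the UI keeps recently seen contexts in a map keyed by `context.id`; the `ContextRecord` shape and function name are hypothetical. It also stops on a repeated ID, so a malformed cyclic chain cannot hang the panel.

```typescript
interface ContextRecord {
  id: string;
  parentId: string | null;
}

// Walk from a context up through its ancestors, newest first.
// Stops at a missing parent, a null parent, or an ID already visited.
function contextBreadcrumb(
  startId: string,
  byId: Map<string, ContextRecord>
): string[] {
  const trail: string[] = [];
  const seen = new Set<string>();
  let current: string | null = startId;
  while (current !== null && !seen.has(current)) {
    trail.push(current);
    seen.add(current);
    const record: ContextRecord | undefined = byId.get(current);
    current = record ? record.parentId : null;
  }
  return trail;
}
```

Joining the result with an arrow separator gives the trail shown in the expanded event row.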
### 4.9 Witness / Audit Log
- **Unified witness timeline** — a chronological merged view of events from both tiers: the SEED's SHA-256 ingest chain (every RVF store write attested) and homecore's Ed25519 state-transition chain (biometric crossings, BFLD identity-risk elevations). Each row: `entity_id` in `--mono`, old/new state, timestamp, source SEED `device_id`, signing key fingerprint (first 8 chars in `--mono`). Pagination uses the same "Showing X–Y of Z" convention from the Cog Store's cog grid.
- **Privacy mode banner** — a persistent top-of-panel banner showing current privacy mode: `--green-d`/green text for full-publish mode; `--amber-d`/amber text for audit-only mode (SHA-256 digests on-SEED only, no MQTT state messages). Show the per-SEED privacy mode state, since SEEDs can be individually configured. Toggling privacy mode is a high-stakes action — require an explicit "Confirm" step with a summary of what will change.
- **Export bundle** — an "Export attestation bundle" button (ghost) that packages the SEED witness chain + homecore Ed25519 chain as a downloadable archive for regulated-deployment (care home, hotel, shared office) compliance handoff.
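The two small formatting rules in the witness timeline, the pagination label and the truncated key fingerprint, might be implemented as below. This is a sketch: it assumes 1-based page numbers and that the Cog Store label uses an en dash between the bounds, and both function names are hypothetical.

```typescript
// Pagination label in the Cog Store style, with 1-based page numbers.
function pageLabel(page: number, pageSize: number, total: number): string {
  if (total === 0) {
    return "Showing 0 of 0";
  }
  const first = (page - 1) * pageSize + 1;
  const last = Math.min(page * pageSize, total);
  return `Showing ${first}–${last} of ${total}`;
}

// Signing key fingerprint as displayed in a row: the first 8 characters.
function keyFingerprint(hexKey: string): string {
  return hexKey.slice(0, 8);
}
```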
### 4.10 Settings & Integration Config
- **SEED fleet management** — add, remove, and reprovision SEEDs. Show the USB-only pairing requirement prominently (the pairing window only opens via `169.254.42.1`, not WiFi — a security invariant). Per-SEED: `device_id` in `--mono`, firmware version, bearer token status, and a "Rotate token" action (ghost) that walks the operator through the secure token rotation flow.
- **ESP32 node provisioning** — per-node NVS config display (target IP, target port, node_id), last-seen firmware version, and a link to the provisioning script. The `node_id` → room/zone assignment is editable here and persists to the room calibration system's `room_id` mapping.
- **MQTT / cog-ha-matter config** ([ADR-116](ADR-116-cog-ha-matter-seed.md)) — broker URL, credentials (masked), MQTT topic prefix, mDNS advertisement status (`_ruview-ha._tcp`), and a live connection indicator (green dot for connected, red for unreachable). The 21 HA-DISCO entities per node are listed here with their `via_device` assignments showing which SEED they belong to in HA's device registry.
- **Long-lived access tokens** — for homecore-api companion-app connections (HA 2025.1 wire-compat, [ADR-130](ADR-130-homecore-rest-websocket-api.md)). Token creation, last-used timestamp, and revocation. The HA companion-app pairing QR-code flow surfaces here.
- **Federation config** — for multi-SEED deployments: ESP-NOW mesh sync status, cross-SEED epoch alignment values, and federated-learning round settings (coordinator SEED, round cadence, Krum aggregation parameters per [ADR-105](ADR-105-federated-csi-training.md)). The design invariant **"model deltas only, never raw CSI"** must be labelled explicitly in this panel.
---
## 5. Navigation structure
HOMECORE-UI must integrate into the existing Cognitum Appliance nav shell. The top nav should read:
```
Framework | Guide | Cog Store | HOMECORE | Status
```
— inserting **HOMECORE** as a first-class nav item between the existing "Cog Store" and "Status" entries, using the same nav-item style (text in `--t2`, active state in `--cyan` with bottom underline).
Within the HOMECORE section, a left sidebar (or top sub-nav on narrow viewports) provides section navigation:
```
Dashboard | SEED Fleet | Entities | Rooms | COGs | Calibration | Events | Audit | Settings
```
The COG Store panel within HOMECORE (§4.6) links out to `seed.cognitum.one/store` for the full catalog view, ensuring the existing Cog Store remains the canonical browsing experience.
---
## 6. Key UX invariants
These must be maintained across every panel:
1. **Always make the tier origin of any data explicit.** A `RoomState` reading traces to an ESP32 node → SEED → COG → v0 Appliance state machine. The provenance badge (§4.4) must appear wherever entity states are displayed.
2. **The `stale` and `vetoed` flags from `RoomState` and the kNN fragility score from SEED cognitive analysis are meaningful diagnostic signals** — they must never be silently hidden, styled grey-on-grey, or collapsed behind an expand toggle. They represent system health operators need to act on.
3. **Values that are `null` because a specialist has not been trained must be visually distinct from values that are unavailable due to an error.** The distinction is operationally important: `null` means "calibrate to enable," unavailable means "investigate."
4. **All entity IDs, hashes, API endpoints, binary signatures, device UUIDs, and JSON payloads must use `--mono` font.** This is already the convention in the API Explorer and must be consistent throughout HOMECORE-UI.
5. **The v0 Appliance Hailo HAT is a separate subsystem from the SEED's edge compute.** Inference results tagged as Hailo-sourced (COGs with `arch: hailo10`) must be visually distinguished from results from CPU-only COGs (`arch: arm`) so operators can triage hardware-specific failures.
---
## 7. Scope — complete UI delivery
The deliverable is the **entire** dashboard. Every panel below ships fully implemented and wired to its live data source — there is no scaffold-only milestone and no panel left as a placeholder. The table records each panel's authoritative backing API so the build can proceed in whatever order best fits the dependency graph; it is a dependency map, **not** a sequence of partial releases.
| Panel | Section | Backing API / source |
|---|---|---|
| System Dashboard | §4.1 | [ADR-130](ADR-130-homecore-rest-websocket-api.md) WebSocket + appliance health endpoints |
| SEED Detail View | §4.2 | SEED HTTPS API (vector store, witness, sensors, reflex, cognitive analysis) |
| SEED Fleet Map | §4.3 | fleet topology + federation ([ADR-105](ADR-105-federated-csi-training.md)) |
| Entity & State Browser | §4.4 | [ADR-127](ADR-127-homecore-state-machine-rust.md) state machine via [ADR-130](ADR-130-homecore-rest-websocket-api.md) `subscribe_events`; semantic search via [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) |
| RoomState / Sensing | §4.5 | [ADR-151](ADR-151-room-calibration-specialist-training.md) `GET /api/v1/room/state` |
| COG Management | §4.6 | [ADR-128](ADR-128-homecore-integration-plugin-system.md) plugin runtime + [ADR-100](ADR-100-cog-packaging-specification.md) app registry |
| Calibration Wizard | §4.7 | [ADR-151](ADR-151-room-calibration-specialist-training.md) calibration HTTP API |
| Event Bus & Automation | §4.8 | [ADR-130](ADR-130-homecore-rest-websocket-api.md) broadcast channel + [ADR-129](ADR-129-homecore-automation-engine.md) automation engine |
| Witness / Audit Log | §4.9 | SEED SHA-256 ingest chain + homecore Ed25519 chain |
| Settings & Integration | §4.10 | SEED provisioning, [ADR-116](ADR-116-cog-ha-matter-seed.md) MQTT/Matter, LLAT, federation |
### 7.1 Build sequencing within the complete deliverable
The complete UI depends on backing services that mature on their own timelines. Each panel is built against the **real gateway endpoint** defined in §11; where the upstream is not yet available the panel renders a typed empty/error state, **not** fabricated data (the dev-only `?demo=1` fixture of §2.2 exists for offline development only and is never the shipped behaviour). Concretely, the hard contract dependencies are: [ADR-130](ADR-130-homecore-rest-websocket-api.md) (REST + WebSocket), [ADR-127](ADR-127-homecore-state-machine-rust.md) (state machine), [ADR-151](ADR-151-room-calibration-specialist-training.md) (calibration), [ADR-128](ADR-128-homecore-integration-plugin-system.md) (plugin runtime), [ADR-129](ADR-129-homecore-automation-engine.md) (automation), [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) (event history + semantic search), [ADR-116](ADR-116-cog-ha-matter-seed.md) (SEED/Matter), [ADR-069](ADR-069-cognitum-seed-csi-pipeline.md) (SEED ingest), and [ADR-105](ADR-105-federated-csi-training.md) (federation). The keyword entity filter (§4.4) ships immediately; semantic search layers on once [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) lands. The exact panel→endpoint→upstream map and the new gateway code each requires are §11; the staged delivery is §12.
---
## 8. Consequences
### 8.1 Positive
- Operators, integrators, and residents get a single coherent surface for the full two-tier stack, replacing the need to SSH into SEEDs or hand-craft API calls.
- The dashboard reuses the proven Cognitum design tokens and component patterns verbatim, so it ships visually consistent with no separate design effort and no perceptible seam between surfaces.
- Diagnostic signals that today are invisible (`stale`/`vetoed` flags, kNN fragility, provenance lineage, channel lag) become first-class, surfacing the system's most common real-world failure modes directly to operators.
### 8.2 Negative / risks
- The UI hard-depends on the wire-compat guarantees of ADR-130 and the calibration contract of ADR-151; schema drift in either breaks panels silently. Integration tests against every backing contract in §7 are required.
- Committing to the complete UI in one deliverable is a larger up-front effort and couples the UI's readiness to the maturity of multiple backing services (§7.1, §11). The mitigation is the BFF gateway (§2.1): each panel targets one same-origin endpoint, and the gateway absorbs upstream churn behind a stable contract.
- Promoting `homecore-server` to a gateway means it now **proxies cross-tier traffic** (calibration API, SEED HTTPS, appliance daemons). This adds a network hop, a place for upstream timeouts/partial failures to surface, and a server-side store of SEED bearer tokens that must be protected (§11.10). Each proxied route needs an explicit timeout + typed error mapping so one slow SEED cannot stall the dashboard.
- Several panels depend on data that only exists on **real hardware or new daemons** (SEED device tier, appliance host metrics, COG supervisor). Until those upstreams exist the corresponding gateway routes return `503 upstream_unavailable`; this is honest but means the dashboard is only as "live" as the tiers behind it (§11 classifies every endpoint by what it depends on).
- Faithfully mirroring `seed.cognitum.one/store` couples HOMECORE-UI to the external Cog Store's evolving design; token drift there must be tracked and re-synced.
- The two-tier mental model (Appliance root, SEED children, ESP32 leaves) must be enforced consistently; any panel that flattens or peers the tiers undermines the core architectural constraint.
---
## 9. References
- `https://seed.cognitum.one/store` — primary design reference for all visual conventions.
- `https://seed.cognitum.one/status` — reference for live metric-card layout.
- [ADR-126](ADR-126-ruview-native-ha-port-master.md) — HOMECORE master ADR.
- [ADR-127](ADR-127-homecore-state-machine-rust.md) — HOMECORE-CORE state machine and entity registry.
- [ADR-128](ADR-128-homecore-integration-plugin-system.md) — HOMECORE-PLUGINS WASM COG substrate.
- [ADR-129](ADR-129-homecore-automation-engine.md) — HOMECORE automation engine.
- [ADR-130](ADR-130-homecore-rest-websocket-api.md) — HOMECORE-API REST + WebSocket wire-compat.
- [ADR-132](ADR-132-homecore-recorder-history-semantic-search.md) — homecore-recorder, history + semantic search.
- [ADR-100](ADR-100-cog-packaging-specification.md) — Cognitum Cog packaging specification (manifest.json, status values, on-device layout).
- [ADR-116](ADR-116-cog-ha-matter-seed.md) — cog-ha-matter (SEED cog, HA-DISCO entity surface, mDNS).
- [ADR-069](ADR-069-cognitum-seed-csi-pipeline.md) — ESP32 CSI → Cognitum SEED RVF ingest pipeline (SEED architecture detail).
- [ADR-105](ADR-105-federated-csi-training.md) — Federated CSI training (multi-SEED federation).
- [ADR-151](ADR-151-room-calibration-specialist-training.md) — Per-room calibration specialist training (calibration HTTP API).
- `v2/crates/homecore/src/` — state machine, entity, event, registry source.
- `docs/integration/calibration-appliance-integration.md` — calibration API contract and RoomState schema.
---
## 10. Implementation status
Implemented as a zero-dependency, no-build-step vanilla TS/JS + CSS frontend served by `homecore-server` at `/homecore` (the `rufield-viewer` "Axum + vanilla-JS" pattern). This is the complete deliverable per §2/§7: all ten panels, fully rendered, wired to live data where the backing service exists and to a contract-conformant DEMO-flagged mock layer (§7.1) where it does not.
**Location:** `v2/crates/homecore-server/ui/`, containing `css/tokens.css` (the §3.1 palette, verbatim) + `css/app.css` (§3.3 components); `js/{ui,api,ws,mock,app}.js` (shared helpers, REST client, `subscribe_events` WS client, mock layer, shell+router); `js/panels/*.js` (one module per §4 panel). Mounted via `tower-http` `ServeDir` in `homecore-server::build_app`, gated by `--ui-dir`/`HOMECORE_UI_DIR`.
**Verification:**
- **Rust** — `#[cfg(test)] mod ui_tests` in `homecore-server/src/main.rs`: 5 integration tests (`tower::oneshot`) covering index, design tokens, all ten panel modules served, API coexistence, and mount-disable. *Written but not compiled in the authoring environment (no Rust toolchain present); run `cargo test -p homecore-server` on a Rust host before merge.*
- **Frontend** — `ui/` test suite under plain `node` (no npm install): `npm test` → import/export graph verifier (15 modules) + render-smoke (executes every panel against a DOM shim; 21 checks) + interaction suite (live WS patch, ws.js handshake/parse, calibration contract; 3 checks). **24/24 green.**
- **Benchmark** — `npm run bench`: total bundle **136.8 KB** uncompressed (**~37× smaller** than HA's ~5 MB Lit bundle, the ADR-126 §1.1 foil); slowest panel **1.5 ms/cold-render**.
**Honest scope — current vs. target.** *Earlier cut:* the front-end was complete but only §4.4 Entities was wired to a real backend; the rest rendered from an in-browser mock. *This revision implements the §11 wiring:*
- **Front-end (§11.11) — DONE and verified.** `api.js` rewritten: all data accessors are async and call the §11.2 gateway routes; the mock layer is demoted to a dev-only fixture reachable **only** under `?demo=1` / `HOMECORE_UI_DEMO` (§2.2); every panel `await`s and renders a typed empty/error state on failure (no mock fallback in production). All ten panels converted (3 by hand, 7 via parallel agents). Verified under Node: 5 test files green — import graph, boot, render-smoke (22), interaction (3), **and a new prod-errors suite (13) that runs with demo OFF + gateway unreachable and asserts every panel renders an error state, never mock, never throws** (it caught and fixed a real unhandled-rejection in the events panel).
- **Gateway (§11.1–§11.6): IMPLEMENTED, COMPILED, TESTED, RUN.** New `homecore-server/src/gateway.rs` (+`reqwest` dep, +CLI/env flags `--calibration-url`/`--calibration-token`/`--apps-dir`/`--gateway-timeout-ms`, merged into `build_app` via `gateway_router`). Real handlers: `/api/cal/*` reverse-proxy (W2), `GET /api/homecore/rooms` with the §11.3 RoomState adapter (W2), `GET /api/homecore/cogs` supervisor over the apps dir (W4), `GET /api/homecore/appliance` from `/proc` + port probes (W6). SEED-device/appliance-daemon routes (seeds, federation, witness, privacy, settings, automations, events-history, hailo, tokens; W3/W5) return a typed `503 upstream_unavailable` per §11.2. **Verified on Rust 1.89: `cargo test -p homecore-server --no-default-features` = 12/12 pass** (6 gateway + 6 UI mount). **Run live:** `GET /api/homecore/appliance` returns real `/proc` metrics + TCP service probes; unauth → `401`; `cogs` → `[]` with no apps dir; SEED-tier → typed `503`; and against a mock calibration upstream the `/api/cal/*` proxy passes through (`200`) and `GET /api/homecore/rooms` correctly adapts `RoomState` to the UI shape (`breathing` → `breathing_bpm`, `heartbeat:null` → `heart_bpm:null`, injected `anomaly.threshold`/`room_id`, `stale` passthrough). **Live testing caught + fixed one real bug:** a double-`v1` path in the `/api/cal/*` proxy URL.
The endpoint-by-endpoint contract is **§11**; the staged plan and which endpoints depend on real SEED/appliance hardware vs. pure software is **§12**.
---
## 11. Backend wiring — making every panel real
This section is the authoritative contract for full functionality. It removes the mock layer from the production path (§2.2) by routing every panel through the `homecore-server` BFF gateway (§2.1). Each endpoint is classified by what it depends on:
- **EXISTS** — backend code already in this repo; gateway only proxies/adapts.
- **NEW-GW** — pure software the gateway itself implements (filesystem, `/proc`, process control, recorder query) — no new external service.
- **NEW-API** — a small HTTP wrapper to add to an existing in-repo crate (`homecore-api`, `homecore-automation`).
- **SEED-DEV** — depends on a SEED node's on-device HTTPS API (separate hardware/firmware).
- **APPLIANCE** — depends on an appliance daemon / accelerator stat source.
### 11.1 Gateway shape
`homecore-server` already mounts `homecore-api` at `/api/*` and the UI at `/homecore`. It gains a new **`/api/homecore/*`** namespace (the dashboard-specific aggregation surface) plus a **`/api/cal/*`** reverse-proxy to the calibration service. The browser issues only same-origin requests; the gateway fans out server-side, holding all upstream credentials (§11.10). Every proxied route has an explicit timeout and maps upstream failure to a typed body (`503 upstream_unavailable`, `504 upstream_timeout`) so one slow tier never stalls the dashboard.
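
The typed-error rule above can be sketched in a few lines. This is a minimal illustration, not the code in `gateway.rs`: the `UpstreamFailure` enum, `map_failure`, `error_body`, and the JSON body shape `{"error": …}` are all hypothetical names chosen for the example; only the two status/code pairs come from the text.

```rust
/// Hypothetical classification of a failed upstream call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpstreamFailure {
    /// Connection refused, or the tier is not deployed at all.
    Unavailable,
    /// The upstream did not answer within the per-proxy timeout.
    TimedOut,
}

/// Status code plus machine-readable error code sent to the browser.
#[derive(Debug, PartialEq, Eq)]
pub struct TypedError {
    pub status: u16,
    pub code: &'static str,
}

/// Map a failure class to the typed response named in this section.
pub fn map_failure(failure: UpstreamFailure) -> TypedError {
    match failure {
        UpstreamFailure::Unavailable => TypedError { status: 503, code: "upstream_unavailable" },
        UpstreamFailure::TimedOut => TypedError { status: 504, code: "upstream_timeout" },
    }
}

/// Assumed body shape; the real gateway may carry more fields.
pub fn error_body(failure: UpstreamFailure) -> String {
    format!("{{\"error\":\"{}\"}}", map_failure(failure).code)
}
```

Keeping the mapping in one function means every proxied route reports failure the same way, so a panel can branch on the `code` string rather than on free-form messages.
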
### 11.2 Master endpoint contract (panel → gateway route → upstream → status)
| Panel | UI method (`api.js`) | Gateway route | Upstream / source | Class |
|---|---|---|---|---|
| §4.4 Entities | `states()` | `GET /api/states` | `homecore` state machine | **EXISTS** ✅ wired |
| §4.4/§4.8 live feed | WS | `GET /api/websocket` (`subscribe_events`) | `homecore` event bus | **EXISTS** ✅ wired |
| §4.8 Event history | `eventHistory(q)` | `GET /api/events?since=…` | `homecore-recorder` ([ADR-132](ADR-132-homecore-recorder-history-semantic-search.md)) | **NEW-API** |
| §4.8 Automations | `automations()` / `saveAutomation()` | `GET/POST/DELETE /api/homecore/automations` | `homecore-automation` ([ADR-129](ADR-129-homecore-automation-engine.md)) | **NEW-API** |
| §4.5 Rooms | `roomStates()` | `GET /api/homecore/rooms` → per-room `GET /api/cal/v1/room/state?bank=` | `calibrate-serve` ([ADR-151](ADR-151-room-calibration-specialist-training.md)) | **EXISTS** (proxy + adapter) |
| §4.7 Calibration | `calibration.*` | `POST /api/cal/v1/calibration/{start,stop}`, `GET …/status`, `POST …/enroll/anchor`, `GET …/enroll/status`, `POST …/room/train` | `calibrate-serve` | **EXISTS** (proxy) |
| §4.6 COGs | `cogs()` / `cogAction()` / `cogLogs()` | `GET /api/homecore/cogs`, `POST …/cogs/:id/{start,stop,restart}`, `GET …/cogs/:id/logs`, `GET/PUT …/cogs/:id/config` | COG supervisor over `/var/lib/cognitum/apps/` ([ADR-100](ADR-100-cog-packaging-specification.md)/[ADR-128](ADR-128-homecore-integration-plugin-system.md)) | **NEW-GW** |
| §4.6 Hailo HEF | `hailo()` | `GET /api/homecore/hailo` | `ruvector-hailo-worker:50051` | **APPLIANCE** |
| §4.1 Appliance health | `appliance()` | `GET /api/homecore/appliance` | host `/proc` + Hailo stats + service probes | **NEW-GW** (+APPLIANCE for Hailo) |
| §4.1/§4.2 Fleet + SEED detail | `seeds()` / `seed(id)` | `GET /api/homecore/seeds`, `GET …/seeds/:id` | SEED device HTTPS API ([ADR-069](ADR-069-cognitum-seed-csi-pipeline.md)) via registry | **SEED-DEV** |
| §4.2 SEED actions | `seedCompact()` / `seedVerify()` | `POST …/seeds/:id/{compact,witness/verify}` | SEED device API | **SEED-DEV** |
| §4.3 Federation | `federation()` | `GET /api/homecore/federation` | federation coordinator ([ADR-105](ADR-105-federated-csi-training.md)) | **SEED-DEV/APPLIANCE** |
| §4.9 Witness/Audit | `witnessLog(p,s)` | `GET /api/homecore/witness?page=…` | merge: `homecore` Ed25519 chain + per-SEED SHA-256 chains | **NEW-API + SEED-DEV** |
| §4.9 Privacy mode | `privacyModes()` / `setPrivacy()` | `GET/POST /api/homecore/privacy` | SEED privacy control plane ([ADR-141](ADR-141-bfld-privacy-control-plane-modes-attestation.md)) + cog-ha-matter | **SEED-DEV** |
| §4.9 Export bundle | `exportAttestation()` | `GET /api/homecore/witness/export` | gateway packages both chains | **NEW-GW** |
| §4.10 Tokens (LLAT) | `tokens()` / `createToken()` / `revokeToken()` | `GET/POST/DELETE /api/homecore/tokens` | `homecore-api` `LongLivedTokenStore` | **NEW-API** |
| §4.10 MQTT/Matter | `mqttConfig()` | `GET /api/homecore/integrations/mqtt` | cog-ha-matter config ([ADR-116](ADR-116-cog-ha-matter-seed.md)) | **NEW-GW/SEED-DEV** |
| §4.10 ESP32 provisioning | `nodes()` / `assignRoom()` | `GET/PUT /api/homecore/nodes` | SEED ingest config ([ADR-069](ADR-069-cognitum-seed-csi-pipeline.md)) | **SEED-DEV** |
| §4.10 SEED mgmt | `pairSeed()` / `rotateToken()` | `POST /api/homecore/seeds/{pair,:id/rotate-token}` | SEED pairing (USB `169.254.42.1`) | **SEED-DEV** |
### 11.3 Calibration proxy + RoomState adapter
The calibration service is real but on a different binary/port; the gateway reverse-proxies it under `/api/cal/*` (upstream base from `HOMECORE_CALIBRATION_URL`). Its `RoomState` (`wifi-densepose-calibration/src/runtime.rs`) does **not** match the UI's shape, so the gateway adapts it in `GET /api/homecore/rooms`:
| Real field (`RoomState`) | UI field | Adapter rule |
|---|---|---|
| `breathing: Option<SpecialistReading>` | `breathing_bpm: {value,confidence}\|null` | rename; `value`=`reading.value`, `confidence`=`reading.confidence`; `None` → `null` (preserves "not trained") |
| `heartbeat: Option<…>` | `heart_bpm: {…}\|null` | rename `heartbeat` → `heart_bpm` |
| `presence/posture/restlessness` | same names `{value,confidence}\|null` | `posture.value`=`reading.label` (class), else numeric |
| `anomaly: Option<…>` | `anomaly: {value,confidence,threshold}` | inject `threshold`=`MixtureOfSpecialists.veto_threshold` (0.5) |
| `vetoed` / `stale` | `vetoed` / `stale` | pass through (drives the §4.5/§6 banners) |
| *(absent)* | `room_id`, `seeds[]` | injected by the gateway from the **room registry** |
A **room registry** (config or derived from `GET /api/cal/v1/calibration/baselines`) maps each `room_id` → bank name + serving SEED ids, so `GET /api/homecore/rooms` returns one adapted record per room. `Option::None` → JSON `null` keeps the null-vs-withheld distinction (§6 invariant 3) intact end-to-end.
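
A reduced sketch of the rename and null-preservation rules may help. It assumes simplified stand-ins for the real types: `RoomState` here carries only four fields, `UiRoom`, `adapt`, and `reading_json` are hypothetical names, and JSON is formatted by hand purely to keep the example dependency-free (a real adapter would more likely use a serializer).

```rust
/// Reduced stand-in for the upstream reading (two fields only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpecialistReading {
    pub value: f64,
    pub confidence: f64,
}

/// Reduced stand-in for the calibration service's `RoomState`.
#[derive(Debug, Clone, PartialEq)]
pub struct RoomState {
    pub breathing: Option<SpecialistReading>,
    pub heartbeat: Option<SpecialistReading>,
    pub vetoed: bool,
    pub stale: bool,
}

/// Hypothetical UI-facing record with the renamed fields.
#[derive(Debug, Clone, PartialEq)]
pub struct UiRoom {
    pub room_id: String,
    pub breathing_bpm: Option<SpecialistReading>,
    pub heart_bpm: Option<SpecialistReading>,
    pub vetoed: bool,
    pub stale: bool,
}

/// Rename fields, inject `room_id`, and pass the flags through untouched.
pub fn adapt(room_id: &str, state: &RoomState) -> UiRoom {
    UiRoom {
        room_id: room_id.to_string(),
        breathing_bpm: state.breathing, // `None` stays `None`
        heart_bpm: state.heartbeat,     // `None` stays `None`
        vetoed: state.vetoed,
        stale: state.stale,
    }
}

/// `None` serializes as JSON `null`, never as a zero or a default value.
pub fn reading_json(reading: Option<SpecialistReading>) -> String {
    match reading {
        Some(r) => format!("{{\"value\":{},\"confidence\":{}}}", r.value, r.confidence),
        None => "null".to_string(),
    }
}
```

The point of carrying `Option` all the way to serialization is that an untrained specialist can never be confused with a reading of zero.
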
### 11.4 SEED registry & device-API proxy
The gateway holds a **SEED registry** (`device_id` → base URL + bearer token + zone), populated by pairing (§4.10) and persisted server-side. `GET /api/homecore/seeds[/:id]` fans out to each SEED's on-device API and shapes the result to the §4.2 card/detail model. Expected SEED-side endpoints (the contract the SEED firmware must satisfy — a subset of its 98 endpoints): health; vector-store stats (`vector_count`, `dim`, `epoch`, `knn_latency_ms`, ingest rate); witness (`len`, `last_verify`, `valid`) + `POST verify`; onboard sensors (BME280/PIR/reed/ADS1115/vibration); reflex rules + thresholds; cognitive analysis (fragility, coherence phases); ingest feeders (ESP32 node ids + packet type `0xC5110003`/`0xC5110002` + rate). Offline/unreachable SEEDs surface as `online:false` (drives the §4.1 red tint) rather than failing the whole list.
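
The "offline SEED becomes `online:false`" behaviour can be sketched as a per-device mapping over fan-out results. The `SeedCard` struct, the `to_cards` function, and the choice of `vector_count` as the only payload field are assumptions made for illustration; the real card model in §4.2 is richer.

```rust
/// Hypothetical, heavily reduced card model for one SEED.
#[derive(Debug, Clone, PartialEq)]
pub struct SeedCard {
    pub device_id: String,
    pub online: bool,
    pub vector_count: Option<u64>,
}

/// Turn per-device fan-out results into cards. A failed device yields an
/// offline card instead of aborting the whole list.
pub fn to_cards(results: Vec<(String, Result<u64, String>)>) -> Vec<SeedCard> {
    results
        .into_iter()
        .map(|(device_id, outcome)| match outcome {
            Ok(vector_count) => SeedCard {
                device_id,
                online: true,
                vector_count: Some(vector_count),
            },
            // The error text is dropped here; a real gateway would log it.
            Err(_) => SeedCard {
                device_id,
                online: false,
                vector_count: None,
            },
        })
        .collect()
}
```

Because each result is handled independently, one unreachable SEED costs at most its own timeout and never hides the healthy devices.
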
### 11.5 Appliance metrics collector (§4.1)
`GET /api/homecore/appliance`, implemented in the gateway: CPU/RAM/uptime from `/proc`; Hailo load + temperature from the Hailo runtime/sysfs (or `ruvector-hailo-worker` stats); service health by probing `ruview-mcp-brain:9876`, `cognitum-rvf-agent:9004`, `ruvector-hailo-worker:50051`; event-bus rate from the `homecore` broadcast channel + its lag counter (already exposed for §4.1/§4.4).
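
As a rough illustration of the `/proc` part, the parsers below take file contents as strings so they can be tested without a Linux host. The function names are hypothetical and the RAM formula (`MemTotal` minus `MemAvailable`) is an assumption about how "RAM" is reported, not something the text specifies.

```rust
/// Parse the first field of `/proc/uptime` (seconds since boot).
pub fn uptime_secs(contents: &str) -> Option<f64> {
    contents.split_whitespace().next()?.parse().ok()
}

/// Look up one `/proc/meminfo` entry, e.g. "MemTotal", in kB.
pub fn meminfo_kb(contents: &str, key: &str) -> Option<u64> {
    contents.lines().find_map(|line| {
        let (name, rest) = line.split_once(':')?;
        if name.trim() != key {
            return None;
        }
        // The value is the first token after the colon; the unit follows it.
        rest.split_whitespace().next()?.parse().ok()
    })
}

/// Used-RAM percentage, assuming used = MemTotal - MemAvailable.
pub fn ram_used_percent(meminfo: &str) -> Option<f64> {
    let total = meminfo_kb(meminfo, "MemTotal")?;
    let available = meminfo_kb(meminfo, "MemAvailable")?;
    if total == 0 {
        return None; // avoid dividing by zero on malformed input
    }
    Some(total.saturating_sub(available) as f64 / total as f64 * 100.0)
}
```

Returning `Option` rather than a default keeps a missing or malformed field visible to the caller, which fits the rule that absent values render as withheld, not as zero.
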
### 11.6 COG supervisor (§4.6)
`GET /api/homecore/cogs`: read each `/var/lib/cognitum/apps/*/manifest.json` ([ADR-100](ADR-100-cog-packaging-specification.md)), the pid file, and verify `binary_sha256` + `binary_signature` (Ed25519) → status/shield. `POST …/cogs/:id/{start,stop,restart}` performs supervised process control; `GET …/cogs/:id/logs` tails `output.log`/`error.log`; `GET/PUT …/cogs/:id/config` reads/writes `config.json`. Hailo-arch COGs join the §11.5 Hailo stats. The Cog Store/App-Registry **browsing** panel was removed per product decision; this is operational management only.
### 11.7 Witness aggregation + privacy (§4.9)
`GET /api/homecore/witness` merges two chains chronologically: the `homecore` Ed25519 state-transition chain (exposed by a small `homecore-api` route over its witness log) and each paired SEED's SHA-256 ingest chain (proxied via the registry), paginated server-side. `GET/POST /api/homecore/privacy` reads/sets per-SEED privacy mode via the SEED privacy control plane ([ADR-141](ADR-141-bfld-privacy-control-plane-modes-attestation.md)) — the POST is the high-stakes confirmed toggle (§4.9). `GET /api/homecore/witness/export` packages both chains into the downloadable attestation bundle.
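
The merge-then-paginate step might look like the sketch below. `WitnessEntry`, its fields, the millisecond timestamp, the oldest-first ordering, and zero-based page numbers are all assumptions for the example; the text only states that the two chains are merged chronologically and paginated server-side.

```rust
/// Hypothetical merged-log entry; `chain` tags which chain it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct WitnessEntry {
    pub ts_ms: u64,
    pub chain: &'static str,
    pub hash: String,
}

/// Merge both chains oldest-first and return one zero-based page.
pub fn merge_page(
    mut homecore: Vec<WitnessEntry>,
    seeds: Vec<WitnessEntry>,
    page: usize,
    page_size: usize,
) -> Vec<WitnessEntry> {
    homecore.extend(seeds);
    // Stable sort: entries with equal timestamps keep their input order.
    homecore.sort_by_key(|entry| entry.ts_ms);
    homecore
        .into_iter()
        .skip(page.saturating_mul(page_size))
        .take(page_size)
        .collect()
}
```

Sorting the full set on every request is only reasonable for short logs; a production route would more plausibly merge two already-ordered streams lazily.
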
### 11.8 Event history + automation CRUD (§4.8)
`homecore-api` adds `GET /api/events?since=…` backed by `homecore-recorder` ([ADR-132](ADR-132-homecore-recorder-history-semantic-search.md)) for history (live updates continue over the existing WS). The automation builder persists through `GET/POST/DELETE /api/homecore/automations`, a thin HTTP wrapper over the `homecore-automation` engine's register/list/remove ([ADR-129](ADR-129-homecore-automation-engine.md)). RuView-specific triggers (RoomState thresholds, SEED reflex events) map onto the engine's trigger types.
### 11.9 Entity provenance convention (§4.4/§6)
The first-class provenance badge requires each entity to carry its lineage. Convention: every integration writes `attributes.source` (and, where known, `attributes.seed` / `attributes.cog`) when it sets state; `cog-ha-matter` ([ADR-116](ADR-116-cog-ha-matter-seed.md)) populates these from the ESP32 node → SEED → COG path and HA `via_device`. The gateway/UI resolves node→seed→cog from these attributes (no fabrication; missing lineage renders as "unknown", not invented).
### 11.10 Auth, credentials, config
- **Browser → gateway:** one long-lived access token (the §4.10 LLAT), sent as `Authorization: Bearer`; validated by `homecore-api`'s `LongLivedTokenStore`. The dev default (`allow_any_non_empty`) stays for local runs; production provisions `HOMECORE_TOKENS`.
- **Gateway → upstreams:** SEED bearer tokens and the calibration token live **only** server-side (SEED registry + `HOMECORE_CALIBRATION_TOKEN`); never sent to the browser. This is the reason the gateway exists.
- **Config:** `HOMECORE_CALIBRATION_URL`, SEED registry store path, per-proxy timeout (default 2 s), `HOMECORE_UI_DEMO` (dev fixture). No browser CORS needed (same origin); gateway→upstream is server-to-server.
### 11.11 Front-end changes
`api.js`: drop the mock fallback from the production path — methods call the §11.2 gateway routes; `this.base` stays same-origin; the mock layer is reachable only under `?demo=1`/`HOMECORE_UI_DEMO`. Every panel renders a **typed empty/error state** (not mock) when its route returns `503/504`. `mock.js` moves to a dev fixture (kept for the offline test harness, excluded from the production bundle). The §10 frontend tests are re-pointed at the gateway contract (and gain contract tests per §11.2 route).
---
## 12. Delivery plan to full functionality
Staged so each wave is independently shippable behind the gateway, lands real data for a coherent set of panels, and has an explicit acceptance gate. "Class" reuses §11's tags.
| Wave | Scope | Class | Acceptance gate |
|---|---|---|---|
| **W1 — Gateway foundation** | `/api/homecore/*` scaffold in `homecore-server`; auth passthrough; per-proxy timeout + typed errors; `api.js` base + remove prod mock (`?demo=1` only); panels get typed empty/error states | NEW-GW | Entities + live WS still green; with no upstreams, every other panel shows "upstream unavailable", **never** mock (unless `?demo=1`); Rust + JS suites pass |
| **W2 — Rooms + Calibration** | `/api/cal/*` reverse-proxy; `GET /api/homecore/rooms` with the §11.3 RoomState adapter + room registry; wire §4.5 + the §4.7 wizard to real endpoints; delete the in-browser calibration stub | EXISTS (proxy+adapter) | Against a running `calibrate-serve` (replayed CSI), the wizard drives a real baseline→enroll→train→verify and §4.5 shows real `RoomState` with correct stale/veto/null mapping; contract test on the adapter |
| **W3 — Events + Automations** | `GET /api/events` over `homecore-recorder`; `/api/homecore/automations` over `homecore-automation` | NEW-API | §4.8 history loads from recorder; an automation created in the UI persists and fires via the engine |
| **W4 — COG management** | `/api/homecore/cogs*` supervisor over `/var/lib/cognitum/apps/` (manifest + pid + sig verify + logs + config) | NEW-GW | §4.6 lists real installed COGs; start/stop/restart works; sha256/signature shield reflects real verification; logs tail |
| **W5 — SEED tier** | SEED registry + pairing; `/api/homecore/seeds*` device proxy; witness merge + privacy control; ESP32 provisioning | SEED-DEV | Against a real or emulated SEED API, §4.2/§4.3/§4.9/§4.10 show real vector-store/witness/sensor/reflex/cognition data; SEED tokens stay server-side; offline SEED → red tint, not a failed page |
| **W6 — Appliance + federation + Hailo** | `/api/homecore/appliance` (host metrics + service probes); `/api/homecore/hailo`; `/api/homecore/federation` ([ADR-105](ADR-105-federated-csi-training.md)) | NEW-GW + APPLIANCE | §4.1 health is real; §4.6 Hailo HEF/throughput real; §4.3 federation round/coordinator/Krum real |
**Definition of done (full functionality):** with W1–W6 merged and the upstream tiers running, loading `/homecore` with **no** `?demo=1` flag shows live data on all ten panels, `api.anyDemo()` is false, and no panel renders fabricated values. Panels whose tier is offline show typed empty/error states. The mock layer is reachable only as the `?demo=1` developer fixture.
### 12.1 Wave status (this revision)
| Wave | Status |
|---|---|
| **W1 — Gateway foundation** | ✅ DONE — `gateway.rs`, auth passthrough, typed `503/504`, merged into `build_app`; front-end mock removed from prod path + `?demo=1` fixture; typed error states. **Compiled + 12/12 Rust tests + JS suite green + run live.** |
| **W2 — Rooms + Calibration** | ✅ DONE — `/api/cal/*` reverse-proxy + `GET /api/homecore/rooms` RoomState adapter; front-end calibration stub deleted (now proxies the real API). **Proven live against a calibration upstream** (proxy 200 + adapted shape); null-preservation unit-tested. |
| **W3 — Events + Automations** | ⏳ gateway returns typed `503` (recorder/automation HTTP wrappers pending); front-end handles it gracefully (history note, builder still usable). |
| **W4 — COG management** | ✅ supervisor DONE — lists `/var/lib/cognitum/apps/` manifests + pid liveness (returns `[]` live with no apps dir); start/stop/log/config control is the remaining follow-up. |
| **W5 — SEED tier** | ⏳ gateway returns typed `503` (SEED registry + device proxy pending real/emulated SEED hardware). |
| **W6 — Appliance + federation + Hailo** | ◑ appliance host metrics from `/proc` + port probes DONE (live `/proc` data verified); Hailo stats + federation remain `503` (need the accelerator stat source / coordinator). |
**Status:** the gateway is **compiled and tested on Rust 1.89** (`cargo test -p homecore-server` = 12/12) and was **run live** (curl proof in §10). The one remaining caveat is intrinsic, not an environment limit: **W3/W5/W6-Hailo/federation depend on services/hardware that are not in this repo** (recorder/automation HTTP wrappers, real SEED nodes, the Hailo stat source), so they return honest typed `503`s and the UI shows error states — exactly as §2.2/§11.2 prescribe. W1/W2/W4/W6-appliance are functional now.
### 12.2 Security review (PR #1082)
A high-effort public-PR review of the merged gateway + front-end surfaced the following, all fixed and pinned by tests (`cargo test -p homecore-server` is now **18/18**):
| # | Severity | Finding | Fix |
|---|---|---|---|
| 1 | **HIGH** | **Path-traversal / confused-deputy SSRF** in the `/api/cal/*` reverse-proxy. The wildcard path was interpolated into the upstream URL while `proxy()` attaches the privileged server-side calibration bearer, so `/api/cal/v1/../../x` (or `..%2f`, `%2e%2e`, leading `/`, `\`, double-encoded `%252e`) could escape the `…/api/` scope **with the token**. | `validate_proxy_path()` decode-then-checks and rejects absolute / backslash / dot-segment / encoded-traversal paths with a typed **400 before the URL is built** (GET **and** POST); legit `v1/...` paths still pass. |
| 2 | Correctness | **CORS + tracing didn't cover gateway routes**: `/api/homecore/*` + `/api/cal/*` were `.merge()`d outside `homecore-api::router()`'s layers. | The audited HC-05 `build_cors_layer()` + `TraceLayer` are now applied to the whole merged app in `main.rs`. |
| 3 | Honesty (§6) | **Fabricated data** — hardcoded `anomaly.threshold: 0.5` in the adapter; dashboard rendered `"null%"`/`"null°C"`; COG Hailo pill hardcoded `"connected"`; `rooms.js` defaulted a null threshold to `0.8`. | Threshold passes through the real upstream value or emits `null` (withheld); dashboard renders `—`; the Hailo pill reflects the real appliance probe; the UI treats a null threshold as withheld. |
| 4 | Robustness | A string `hef` (forwarded verbatim) threw on `.forEach`/`.join`; `frames/target` could be `NaN%`/`Infinity%`; calibration Restart leaked the baseline `setTimeout` poll. | `asArray()` coercion; `target > 0` guard; cancellable poll cleared on Restart / panel teardown. |
| 5 | Perf | Sequential per-bank RoomState fetches; blocking `std::net::TcpStream::connect_timeout` probes on an async handler; `mock.js` statically bundled. | Concurrent `futures::join_all`; async `tokio::net::TcpStream` + `timeout`; demo-only dynamic `import()` of `mock.js`. |
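
The decode-then-check rule behind finding 1 can be sketched with the standard library alone. This is not the code in `gateway.rs`: the signature of `validate_proxy_path`, the helpers `hex_val` and `percent_decode_once`, the four-round decode cap, and the error strings are assumptions made for illustration.

```rust
fn hex_val(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}

/// Decode one layer of %XX escapes; malformed escapes are kept verbatim.
fn percent_decode_once(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'%' && i + 2 < input.len() {
            if let (Some(hi), Some(lo)) = (hex_val(input[i + 1]), hex_val(input[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}

/// Reject paths that could escape the proxied scope, before any URL is built.
pub fn validate_proxy_path(raw: &str) -> Result<(), &'static str> {
    let mut decoded = raw.as_bytes().to_vec();
    // Decode repeatedly so double-encoded input such as `%252e` is unmasked.
    for _ in 0..4 {
        let next = percent_decode_once(&decoded);
        if next == decoded {
            break;
        }
        decoded = next;
    }
    if percent_decode_once(&decoded) != decoded {
        return Err("too many encoding layers");
    }
    if decoded.first() == Some(&b'/') {
        return Err("absolute path");
    }
    if decoded.contains(&b'\\') {
        return Err("backslash");
    }
    if decoded
        .split(|&b| b == b'/')
        .any(|segment| segment == [b'.'] || segment == [b'.', b'.'])
    {
        return Err("dot segment");
    }
    Ok(())
}
```

Checking the fully decoded form is the key design choice: a filter that inspects only the raw string would miss `..%2f` and `%252e%252e`, which is the class of bypass the finding describes.
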
**Known limitations carried forward (not regressions):**
- **`reqwest` rustls-only is a workspace-wide concern.** `homecore-server` opts into `rustls-tls` only, but cargo feature-unification means any sibling crate enabling the default `native-tls` re-introduces OpenSSL into the final binary. A true "no OpenSSL on the appliance" guarantee requires aligning **every** reqwest-pulling crate on rustls-only — out of scope for this PR; documented at the dependency in `Cargo.toml`.
- **DEV-mode auth.** When `HOMECORE_TOKENS` is unset, the token store falls back to `allow_any_non_empty()` (any non-empty bearer accepted) on `0.0.0.0`. This is pre-existing and intentionally **unchanged** here; the loud boot `warn!` is retained. Provision real tokens (`HOMECORE_TOKENS=…`) before exposing the server to a network.
@@ -1,166 +0,0 @@
# ADR-132: HOMECORE-RECORDER — State History + Semantic Search
| Field | Value |
|-------|-------|
| **Status** | Accepted |
| **Date** | 2026-05-25 |
| **Deciders** | ruv |
| **Codename** | **HOMECORE-RECORDER** |
| **Crate** | `v2/crates/homecore-recorder` |
| **Relates to** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (HOMECORE master — series map row ADR-132), [ADR-127](ADR-127-homecore-state-machine-rust.md) (HOMECORE-CORE state machine), [ADR-124](ADR-124-rvagent-mcp-ruvector-npm-integration.md) (ruvector/SENSE-BRIDGE), [ADR-130](ADR-130-homecore-rest-websocket-api.md) (HOMECORE-API query surface, downstream) |
| **Tracking issue** | [#800](https://github.com/ruvnet/RuView/pull/800) (HOMECORE intake) |
> **Documented retroactively (2026-06-12).** The `homecore-recorder` crate shipped under
> the ADR-126 series map (which planned an "ADR-132 HOMECORE-RECORDER") but the standalone
> ADR file was never written; the crate's `Cargo.toml`, `README.md`, `lib.rs`, `schema.rs`,
> and `semantic.rs` all cite "ADR-132". This ADR reverse-documents the decision that the
> shipped, tested code already embodies (ADR-164 Gap G3 / Coverage-Gaps Lens §A). It does
> **not** introduce new design; it records what is built. Date reflects the crate's intake
> era (first commit `e96ebaea8`, 2026-05-25); real-impl pass landed in `7c8071145`
> (2026-06-11).
---
## 1. Context
ADR-126 (the HOMECORE master) decided to reimplement Home Assistant (HA) natively in Rust.
HA persists every state change to a SQLite *recorder* database; downstream features
(history graphs, the logbook, long-term statistics, automation conditions that reference
past state) all read that store. HOMECORE therefore needs a durable state-history backbone.
Two forces shape the decision:
1. **Migration / coexistence.** Users adopting HOMECORE will have an existing HA
`recorder` database. Reusing HA's on-disk schema (rather than inventing a new one) lets
HOMECORE read an existing HA `home-assistant_v2.db` directly and lets HA-aware tooling
read HOMECORE's store. This is the same trust boundary that `homecore-migrate`
(ADR-165) handles for `.storage/*.json`.
2. **Semantic queries.** HA history is queried with SQL `BETWEEN`/`WHERE` clauses. The
HOMECORE platform already carries ruvector (ADR-124) for vector search, so the recorder
can additionally embed state changes and answer natural-language queries
("which kitchen devices were warm at 3 PM?") via k-NN — a capability HA does not have.
The recorder is the **durable-state surface**: if it is wrong, history, logbook, and
historical-condition automations are all wrong. ADR-164 flagged it as a CRITICAL coverage
gap precisely because such a load-bearing crate had no governing ADR.
## 2. Decision
Ship `homecore-recorder` as a SQLite state-history recorder with an HA-compatible schema
and an optional ruvector-backed semantic index, in three phases. P1 and P2 are built and
tested; P3 is planned.
### 2.1 Storage — SQLite with the HA recorder schema (P1, shipped)
- Persist via `sqlx` with the SQLite backend only (no Postgres, no TLS feature set).
- Mirror HA recorder **schema v48** so the store is bidirectionally readable
(`src/schema.rs`):
- `state_attributes` — shared attribute JSON blobs, deduped by an FNV-1a 64-bit hash
stored as a signed `i64` (matches HA's dedup key);
- `states` — one row per state write (`entity_id`, `state`, `attributes_id` FK,
`last_changed_ts`/`last_updated_ts` as REAL Unix seconds, `context_id` UUID);
- `events` — domain events (`event_type`, `event_data` JSON, `time_fired_ts`);
- `recorder_runs` — boot/shutdown bookends for history-gap detection.
- All DDL uses `CREATE TABLE IF NOT EXISTS`, so schema application is idempotent and safe
on every startup.
- Default persistence path `.homecore/home.db` (configurable).
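
The attribute dedup key described above can be sketched as follows. This is a minimal illustration, not the crate's code: the function names `fnv1a_64` and `attributes_hash_key` are hypothetical, and only the FNV-1a constants are standard published values.

```rust
/// FNV-1a 64-bit offset basis and prime (standard published constants).
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hash a byte slice with FNV-1a: XOR each byte in, then multiply by the prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical helper: reinterpret the unsigned hash as a signed value so it
/// fits a SQLite INTEGER column. The cast keeps the bit pattern, so converting
/// back with `as u64` recovers the original hash.
pub fn attributes_hash_key(attributes_json: &str) -> i64 {
    fnv1a_64(attributes_json.as_bytes()) as i64
}
```

Two rows with byte-identical attribute JSON produce the same key, which is what lets them share one `state_attributes` blob.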
### 2.2 Capture — listener on the HOMECORE event bus (P1, shipped)
- `RecorderListener` subscribes to the HOMECORE event bus (ADR-127) and captures
`StateChanged` events, writing snapshots through `Recorder` (`src/listener.rs`,
`src/db.rs`).
- A `DedupEngine` (`src/dedup.rs`) skips redundant writes when the state hash is unchanged,
matching HA's stateful-listener behaviour.
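
A rough sketch of the skip-if-unchanged idea, assuming a per-entity "last hash seen" map. `DedupSketch` and `should_write` are hypothetical names, and the use of the standard library's `DefaultHasher` is an assumption made for the example; the real `DedupEngine` in `src/dedup.rs` may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the dedup engine: remembers the last hash seen
/// per entity and reports whether a new snapshot should be written.
#[derive(Default)]
pub struct DedupSketch {
    last_hash: HashMap<String, u64>,
}

impl DedupSketch {
    /// Returns `true` when the snapshot differs from the last one recorded
    /// for this entity (write it), `false` when it is redundant (skip it).
    pub fn should_write(&mut self, entity_id: &str, state: &str, attributes_json: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        state.hash(&mut hasher);
        attributes_json.hash(&mut hasher);
        let hash = hasher.finish();

        // `insert` returns the previous value, if any, for this entity.
        match self.last_hash.insert(entity_id.to_owned(), hash) {
            Some(previous) => previous != hash,
            None => true,
        }
    }
}
```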
### 2.3 Semantic search — ruvector HNSW (P2, shipped, feature-gated)
- Behind the `ruvector` Cargo feature, the `Recorder` additionally calls a `SemanticIndex`
implementation (`src/semantic.rs`) that embeds state attributes and stores vectors in a
`ruvector-core` HNSW index for k-NN search.
- P2 embeddings are **hash-based** (sha2) — a deliberate, honest placeholder. They give a
working HNSW surface without claiming sentence-level semantic quality.
- When the feature is off, `NullSemanticIndex` satisfies the `SemanticIndex` trait bound
with no allocation, so the structural recorder ships independently of ruvector.
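
The feature-off arrangement can be pictured roughly as below. The method names, error type, and `RecorderSketch` are assumptions made for illustration; only the names `SemanticIndex` and `NullSemanticIndex` come from the text above, and the real trait in `src/semantic.rs` may look different.

```rust
/// Hypothetical shape of the semantic-index abstraction.
pub trait SemanticIndex {
    /// Embed and store one state change.
    fn index(&mut self, entity_id: &str, text: &str) -> Result<(), String>;
    /// Return up to `k` entity ids nearest to the query.
    fn search(&self, query: &str, k: usize) -> Result<Vec<String>, String>;
}

/// Zero-sized no-op used when semantic search is compiled out.
#[derive(Debug, Default, Clone, Copy)]
pub struct NullSemanticIndex;

impl SemanticIndex for NullSemanticIndex {
    fn index(&mut self, _entity_id: &str, _text: &str) -> Result<(), String> {
        Ok(())
    }

    fn search(&self, _query: &str, _k: usize) -> Result<Vec<String>, String> {
        Ok(Vec::new()) // an empty Vec does not allocate
    }
}

/// A recorder generic over the index, so the durable path compiles either way.
pub struct RecorderSketch<I: SemanticIndex> {
    pub index: I,
    pub rows_written: usize,
}

impl<I: SemanticIndex> RecorderSketch<I> {
    pub fn record(&mut self, entity_id: &str, state: &str) {
        self.rows_written += 1; // stands in for the durable SQLite write
        // An index failure is ignored here so it cannot block the write.
        let _ = self.index.index(entity_id, state);
    }
}
```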
### 2.4 Real sentence embeddings (P3, planned — not yet built)
- Replace the hash embeddings with ruvector-attention sentence embeddings (dim → 384). Not
implemented; tracked as a follow-up. The README and `Cargo.toml` label this P3 explicitly.
### 2.5 Test evidence (as shipped)
- P1: 14 tests (`cargo test -p homecore-recorder --no-default-features`).
- P2: 20 tests (`cargo test -p homecore-recorder --features ruvector`).
## 3. Consequences
**Positive.**
- HA-schema compatibility makes migration (ADR-165) and coexistence cheap: HOMECORE can
read an existing HA `home-assistant_v2.db`, and any SQLite tool can read HOMECORE's history.
- The semantic index is **additive** and feature-gated: the durable structural recorder has
no hard dependency on ruvector, so the storage backbone ships first.
- Standard SQLite means no proprietary export format; history is directly queryable.
**Negative / honest limits.**
- P2 semantic search uses **hash embeddings**, not real sentence embeddings — query quality
is limited until P3. This is disclosed in the crate docs and here; it must not be cited as
semantic-quality-validated.
- No per-crate benchmarks exist yet; the latency figures in the README
(state-write p50 < 2 ms, semantic search < 10 ms on 1 M records) are design targets or
estimates that **need verification** against a Criterion baseline.
- Pinning to HA schema v48 couples HOMECORE to a specific HA recorder schema generation;
future HA schema bumps require an explicit migration step.
**Neutral.**
- This ADR governs the recorder crate only. The query/REST surface over recorder data is
HOMECORE-API (ADR-130, P3); automation conditions on historical state are
HOMECORE-automation (ADR-129, P3).
## 3a. Security review (2026-06, post-ADR-154–159 sweep)
A beyond-SOTA security review of `homecore-recorder` covered SQL injection, retention/purge
correctness, fail-closed write integrity, semantic-store NaN poisoning, and PII exposure.
**Confirmed clean (with evidence):**
- **SQL injection — clean.** Every query in `db.rs` uses bound `?` parameters; no user- or
entity-influenceable value is interpolated into SQL via `format!`/concatenation. The only
`format!` builds the `LIKE` *pattern* string, which is itself **bound** as a parameter with
`ESCAPE '\\'` and `% _ \` escaping — so a metacharacter payload is matched literally. Pinned
by `malicious_entity_id_is_stored_literally_not_executed` (a `'; DROP TABLE states; --` state
value leaves the table intact and round-trips verbatim) and
`like_metacharacters_in_query_are_literal_not_wildcards`.
- **NaN-index poisoning — structurally impossible.** Embeddings are SHA-256 → `i32`
→ `f32`; an `i32` → `f32` cast is always finite (never NaN/Inf), and an all-zero digest is
guarded by the `norm > 1e-10` check. Empty-index search, empty-string query, and `k=0` were
probed and all return `Ok(0)` with no panic. (Unlike the calibration/vitals/geo paths, no raw
sensor float ever reaches the index.)
- **Fail-closed writes.** A removal event returns `Ok(None)`; semantic-index failure is logged,
not propagated, so it never blocks the durable SQLite write; `EntityId` parse failure falls
back to a sentinel rather than panicking.
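
To illustrate the `LIKE` handling in the SQL-injection item above, here is a small sketch of escaping `%`, `_`, and `\` before the pattern is bound as a parameter. `escape_like` and `contains_pattern` are hypothetical names, not functions from `db.rs`; the assumption is that the query itself carries `ESCAPE '\'`.

```rust
/// Escape LIKE metacharacters so user text matches literally when the query
/// declares backslash as its escape character.
pub fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build a "contains" pattern. The result is meant to be passed as a bound
/// `?` parameter, never spliced into the SQL text itself.
pub fn contains_pattern(input: &str) -> String {
    format!("%{}%", escape_like(input))
}
```

Because the pattern travels as a bound value, a quote or semicolon in the input is just data; the escaping only stops `%` and `_` from acting as wildcards.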
**Fixed (real bounding bugs):**
- **Memory-DoS — `get_state_history` was unbounded.** No `LIMIT`, so a wide time window over a
high-frequency entity loaded an unbounded row set into memory. Now capped at
`MAX_HISTORY_ROWS` (1,000,000); sibling search paths were already `k`-bounded.
- **Disk-DoS / documented-but-missing `purge`.** The README advertised `Recorder::purge`, but
no retention path existed → unbounded disk growth. Added a **transactional** `purge(older_than)`
with an **exclusive** cutoff (idempotent, no off-by-one) that deletes old `states`/`events` and
GCs orphaned `state_attributes` blobs (dedup-shared blobs kept until their last referrer is gone).
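
The retention semantics can be modelled in memory as below. This is a sketch under assumptions, not the crate's SQL: `StoreSketch` and `StateRow` are hypothetical, and "exclusive cutoff" is read here as "a row stamped exactly at the cutoff is kept".

```rust
use std::collections::{HashMap, HashSet};

/// One row of a hypothetical in-memory `states` table.
pub struct StateRow {
    pub last_updated_ts: f64,
    pub attributes_id: u64,
}

#[derive(Default)]
pub struct StoreSketch {
    pub states: Vec<StateRow>,
    pub state_attributes: HashMap<u64, String>,
}

impl StoreSketch {
    /// Delete rows strictly older than `older_than`, then drop attribute blobs
    /// that no surviving row references. Returns the number of rows deleted.
    pub fn purge(&mut self, older_than: f64) -> usize {
        let before = self.states.len();
        // Exclusive cutoff: a row stamped exactly at `older_than` survives.
        self.states.retain(|row| row.last_updated_ts >= older_than);

        // Garbage-collect blobs whose last referrer is gone.
        let live: HashSet<u64> = self.states.iter().map(|r| r.attributes_id).collect();
        self.state_attributes.retain(|id, _| live.contains(id));

        before - self.states.len()
    }
}
```

Running `purge` twice with the same cutoff deletes nothing the second time, which is the idempotence property the fix relies on.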
`homecore-recorder` tests: 19 → 25 (`--no-default-features`) / 25 → 31 (`--features ruvector`),
0 failed. Python deterministic proof unchanged (recorder is off the signal proof path).
## 4. Links
- Crate: `v2/crates/homecore-recorder/` (`Cargo.toml`, `README.md`, `src/lib.rs`,
`src/db.rs`, `src/schema.rs`, `src/dedup.rs`, `src/listener.rs`, `src/semantic.rs`).
- [ADR-126](ADR-126-ruview-native-ha-port-master.md) — HOMECORE master (series map: ADR-132 = HOMECORE-RECORDER).
- [ADR-165](ADR-165-homecore-migrate-from-home-assistant.md) — HOMECORE-MIGRATE (reads HA `.storage`; P2 exports a side-by-side recorder DB).
- [ADR-164](ADR-164-adr-corpus-gap-analysis.md) — gap analysis that surfaced this missing ADR (Gap G3).
- [Home Assistant Recorder integration](https://www.home-assistant.io/integrations/recorder/).
-68
@@ -174,71 +174,3 @@ vs. an in-memory array at compile time), which intersects with ADR-084 (RabitQ)
| **P1** (this ADR) | `intent`, `recognizer` (regex), `handler` (5 built-ins), `runner` (trait + noop), `pipeline` (end-to-end wiring), 10–15 tests |
| **P2** | Real `tokio::process::Child` runner with Windows-safe teardown; `SemanticIntentRecognizer` with ruvector HNSW |
| **P3** | STT/TTS bridge, satellite protocol, cloud fallback |
---
## 6. Security review (beyond-SOTA, untrusted-input → action path)
This section records a focused security review of the Assist pipeline
(`utterance → recognizer → intent → handler → action`, plus `RufloRunner`). The
review treats the utterance as untrusted input (voice transcripts, the WebSocket
`assist` command). This surface was not covered by the ADR-154–159 sweep.
### 6.1 Finding fixed — HC-ASSIST-01 (unbounded-utterance DoS, LOW)
Both `RegexIntentRecognizer::recognize` and the semantic `recognize_scored`
accepted utterances of **unbounded length** and ran `to_lowercase()` (a full
clone) + a per-registered-pattern scan (and, in the semantic path, full
tokenisation + feature-hash embedding) before any bound — an allocation/CPU
amplification on attacker-controlled input. The `regex` crate is **linear-time**
(RE2-style finite automaton, no catastrophic backtracking), so this was a
throughput/memory DoS, not a hang.
**Fix:** `MAX_UTTERANCE_BYTES = 4096` (far above any real spoken command),
checked at **both** recognizer boundaries *before* any allocation/scan. An
over-length utterance **fails closed** to `Ok(None)` — no intent, no action,
identical to an unrecognised phrase — so it can never be coerced into firing a
handler. Pinned by `over_length_utterance_fails_closed` (an over-length
utterance that *contains* a valid command resolves to `None`, whereas the old
code would have matched it) and `over_length_utterance_fails_closed_semantic`.
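
The shape of that bound can be sketched as follows. The recognizer here is a hypothetical stand-in: a fixed `turn on <entity>` prefix match replaces the registered regex patterns, and treating exactly 4096 bytes as still acceptable is an assumption.

```rust
/// Upper bound quoted in the finding above.
pub const MAX_UTTERANCE_BYTES: usize = 4096;

/// Hypothetical recognizer: returns the entity slot for "turn on <entity>",
/// or `Ok(None)` when nothing matches.
pub fn recognize(utterance: &str) -> Result<Option<String>, String> {
    // Check the length first, before any lowercase clone or pattern scan.
    if utterance.len() > MAX_UTTERANCE_BYTES {
        return Ok(None); // fail closed: same outcome as an unrecognised phrase
    }

    let lowered = utterance.to_lowercase();
    let intent = lowered
        .trim()
        .strip_prefix("turn on ")
        .map(str::to_owned);
    Ok(intent)
}
```

Without the early return, an utterance padded with thousands of trailing spaces would still be lowercased, trimmed, and matched; with it, the same input resolves to no intent at all.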
### 6.2 Dimensions confirmed clean (with evidence)
- **Command / argument injection — NO SUBPROCESS SURFACE.** The `RufloRunner`
has exactly two impls: `NoopRunner` (no process) and `LocalRunner` (runs the
local recognizer, no process). There is **no** `std::process` / `tokio::process`
/ `Command` / process `.spawn()` anywhere in the crate — the trait `spawn` is
only a `started: bool` lifecycle flag — and `RufloRunnerOpts.{script_path,env}`
are **inert data, never consumed**. The live `node ruflo-agent.js` runner is
genuinely data-gated/future (P2). Defence-in-depth: the `entity_id` capture
class `[a-z_][a-z0-9_ .]*` **excludes every shell/SQL metacharacter**, so even
when an injection-shaped utterance resolves (the regex is not exact-anchored),
the captured slot is a clean token — sanitisation by construction. Pins:
`shell_metachars_never_survive_into_a_resolved_slot`,
`runner_opts_are_inert_no_process_spawned`,
`pipeline_injection_shaped_utterance_carries_no_metachars_to_service`.
- **ReDoS — STRUCTURALLY IMPOSSIBLE.** `regex 1.12.3` (no `fancy-regex` in the
dependency tree) is linear-time; a classic `(a+)+$` shape on adversarial input
completes in bounded time. Pin:
`pathological_backtracking_pattern_completes_in_bounded_time`. Patterns are
operator-registered, not user-supplied, in any case.
- **NaN-poisoning — EMBEDDINGS STRUCTURALLY FINITE.** The embedding path takes
only `&str` and produces values via FNV feature-hashing + a guarded L2
normalise (`norm > 1e-12`); no external float input, no unguarded division, so
a crafted utterance cannot inject NaN/Inf to poison the cosine k-NN. Cosine
against the zero vector is a finite `0.0`; on an empty index, `max_by` returns
`None` (no panic); the NaN-safe `partial_cmp().unwrap_or(Equal)` is already in
place. Pins: `embeddings_are_structurally_finite`,
`cosine_with_zero_vector_is_finite_not_nan`,
`empty_utterance_against_empty_index_no_panic_no_match`.
- **Intent confusion / fail-closed.** An unrecognised utterance → `not_understood()`
(no service call); a recognised intent with no registered handler →
`not_understood()`; semantic below-threshold / empty-index → regex fallback.
No default high-privilege intent, no fail-open path.
- **Panic-on-input.** No `unwrap`/`expect`/index reachable from a crafted
utterance; the one `exemplars[id]` index uses an `id` from `enumerate()` over
the append-only exemplar `Vec` (no remove API), so it is always in bounds.
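
The "structurally finite" argument in the NaN-poisoning item can be illustrated with a small feature-hashing sketch. Everything here is an assumption made for the example (the 16-wide dimension, whitespace tokenisation, the function names); only the FNV hashing, the guarded L2 normalise, and the zero-vector behaviour mirror the description above.

```rust
const DIM: usize = 16; // illustrative dimension, not the crate's

/// FNV-1a 64-bit hash with the standard published constants.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Feature-hash whitespace tokens into a fixed-size vector, then L2-normalise.
/// The only input is `&str`, so no external float can reach the vector.
pub fn embed(text: &str) -> [f32; DIM] {
    let mut v = [0.0f32; DIM];
    for token in text.split_whitespace() {
        let bucket = (fnv1a_64(token.as_bytes()) % DIM as u64) as usize;
        v[bucket] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 1e-12 {
        // Guarded division: skipped entirely for the zero vector.
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

/// Cosine similarity that returns 0.0 instead of dividing by a zero norm.
pub fn cosine(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let denom = norm_a * norm_b;
    if denom > 1e-12 { dot / denom } else { 0.0 }
}
```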
`cargo test -p homecore-assist --no-default-features`: **29→36, 0 failed** (+7);
default/`semantic`: **39→48, 0 failed** (+9). Python deterministic proof
unchanged (homecore-assist is off the signal proof path).
@@ -2,7 +2,7 @@
| Field | Value |
|-------|-------|
| **Status** | Accepted — partial (built + tested building block; integration glue pending — see §8 Implementation Status, commit `11f89727f`) |
| **Status** | Proposed |
| **Date** | 2026-05-28 |
| **Deciders** | ruv |
| **Codebase target** | `wifi-densepose-core` (`types.rs`: `CsiFrame`/`CsiMetadata`); `wifi-densepose-signal/src/ruvsense/mod.rs` (`RuvSensePipeline`, six-stage flow); `v2/Cargo.toml` (workspace topology) |

Some files were not shown because too many files have changed in this diff.