Proof of Capabilities — answering the "it's fake / misleading" claims
Short version: don't trust us — verify. Every claim below comes with a command you can run yourself in minutes. Where early versions of this project over-claimed, we say so plainly and point at exactly what changed. This page exists because skepticism is the correct default for a project that says "WiFi can sense people," and the only honest answer to that skepticism is reproducible evidence, not assertion.
1. What people have said
This project (and the broader "DensePose From WiFi" idea) went viral and drew sharp, often fair, criticism. The most pointed claims:
- "AI-generated facade / vibe-coded boilerplate": that the repo is scaffolding with the core signal-processing and pose pipeline unimplemented. (Hacker News, Cybernews)
- "Fake CSI data": that the Python extractor returned random arrays instead of real hardware data (e.g. csi_extractor.py returning random amplitude/phase). (audit fork)
- "No trained models, fabricated metrics": that headline numbers like "94.2% pose accuracy," "96.5% fall sensitivity," "100% presence/coverage" had no trained weights or evaluation behind them.
- "Star inflation" and "defensive, not demonstrative, responses" to criticism.
- "Reads like ad copy": emoji-heavy AI documentation that conveys little.
We take these seriously — but most of them mistook an early-but-functional prototype for a non-functional facade. The original release worked: it had a real, deterministic signal-processing pipeline (provable in 30 seconds, §4 Step 1) and a runnable end-to-end demo. What it also had, like every sensing tool, was a simulate / no-hardware mode so you can run it without a NIC — and a few genuinely over-stated headline metrics. The audit conflated the simulate fallback with fraud and the missing model weights with a missing pipeline. Here is the honest accounting, then the proof.
2. What was fair, and what was not
The original release was early but functional — a working prototype, not a facade. Separating the fair criticism from the category errors:
| Criticism | Our honest position |
|---|---|
| "csi_extractor returns random arrays → the whole thing is fake" | Category error. Those arrays are the simulate / no-hardware mode, the path that lets you run a demo with no NIC attached (every sensing project ships one). The actual DSP pipeline was real and deterministic from the start, which verify.py proves bit-for-bit (§4 Step 1). A pipeline that emitted unseeded random output could not reproduce a published hash. |
| "Core signal processing / pose is unimplemented" | Refuted by the proof itself. verify.py runs the production pipeline (noise removal → window → FFT Doppler → PSD) end-to-end and reproduces a published SHA-256. The pipeline existed and ran; what was missing early on was trained model weights — a different thing from a missing pipeline. |
| "100% presence accuracy" was unsupported | Fair — formally retracted. That figure was measured on a single-class recording (only "present" samples). It's replaced everywhere by an honest 82.3% held-out temporal-triplet accuracy. See the in-place retraction in README.md / docs/user-guide.md. |
| Some headline metrics (94.2% pose, 96.5% fall) lacked published evaluation early on | Fair at the time. Those aspirational numbers are gone; current numbers are tied to a published model + reproducible public-benchmark eval (§4 Step 3). |
| Docs read like AI ad copy | Partly fair. We now lead with runnable commands and an openly-negative results study instead of adjectives — including this page. |
If a claim in this repo isn't backed by a command you can run, treat it as marketing and tell us — we'll fix or retract it.
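Why the retracted "100% presence" figure carried no information can be seen with a toy example. The sketch below is illustrative only: the label lists and the `always_present` predictor are hypothetical and are not taken from this repo's evaluation code. It shows that a detector which ignores its input scores perfectly on a recording that contains only "present" samples, and no better than chance once both classes appear.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def always_present(_frame):
    """Hypothetical degenerate detector: ignores the input and answers 'present' (1)."""
    return 1


# Single-class recording: every sample is labelled "present".
single_class_labels = [1] * 200
# Balanced recording: half "present", half "absent".
balanced_labels = [1] * 100 + [0] * 100

single_class_acc = accuracy(
    single_class_labels, [always_present(None) for _ in single_class_labels]
)
balanced_acc = accuracy(
    balanced_labels, [always_present(None) for _ in balanced_labels]
)

print(single_class_acc)  # 1.0
print(balanced_acc)      # 0.5
```

A held-out evaluation that contains both classes is what makes an accuracy number falsifiable, which is why the single-class figure was withdrawn rather than defended.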
3. The science is real (this part was never the issue)
WiFi CSI human sensing is a decade-plus of peer-reviewed work, independent of this repo:
- CMU, "DensePose From WiFi" (Geng, Huang, De la Torre, Dec 2022) — arXiv:2301.00250.
- MIT CSAIL RF-Pose / RF-Pose3D (Zhao et al.) — through-wall skeletal pose from radio.
- IEEE 802.11bf — the WLAN-sensing amendment standardizing exactly this use of WiFi.
- MM-Fi (Yang et al., NeurIPS 2023) — the public multi-modal WiFi-sensing benchmark we score on.
The legitimate question was never "is WiFi sensing real?" — it's "does this implementation actually do it?" The rest of this page answers that.
4. Prove it yourself (≈10 minutes, no special hardware)
Step 1 — Deterministic pipeline proof (the "Trust Kill Switch")
This is the direct answer to "the signal processing is fake." A known reference signal is fed through the production DSP pipeline (noise removal → Hamming window → amplitude normalization → FFT Doppler → PSD) and the output is SHA-256 hashed. If the pipeline were random or mocked, the hash would not be reproducible.
python archive/v1/data/proof/verify.py
# Expect: VERDICT: PASS
# Pipeline hash: f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a
The published expected hash is committed at archive/v1/data/proof/expected_features.sha256.
Run it on your machine — it reproduces bit-for-bit across platforms (verified identical on
Windows, two independent Linux hosts, and the GitHub Azure CI runner). For the one feature that
isn't bit-stable — the peak-normalized Doppler spectrum, whose argmax flips under
cross-microarchitecture FFT reordering — the proof excludes it from the hash and additionally
checks every other feature against a committed reference vector within a strict relative tolerance
(expected_features_reference.npz), so a genuine regression still fails while CPU-level float
noise does not. Five features (amplitude mean/variance, phase difference, correlation matrix, and
the FFT-based PSD) carry the deterministic proof.
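The two-tier check described above (an exact hash for bit-stable features, a relative tolerance for the rest) can be sketched as follows. This is a minimal illustration and not the contents of verify.py: `toy_pipeline`, `feature_hash`, `within_tolerance`, the signal shape, the use of `default_rng`, and the tolerance value are all assumptions chosen for the example.

```python
import hashlib

import numpy as np


def toy_pipeline(signal: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a DSP chain: mean removal, Hamming window, FFT, PSD."""
    centered = signal - signal.mean()
    windowed = centered * np.hamming(centered.size)
    spectrum = np.fft.rfft(windowed)
    return (np.abs(spectrum) ** 2) / centered.size


def feature_hash(features: np.ndarray) -> str:
    """SHA-256 over the little-endian float64 bytes of the feature vector."""
    raw = np.ascontiguousarray(features, dtype="<f8").tobytes()
    return hashlib.sha256(raw).hexdigest()


def within_tolerance(features: np.ndarray, reference: np.ndarray,
                     rtol: float = 1e-9) -> bool:
    """Relative-tolerance comparison for features that are not bit-stable."""
    return bool(np.allclose(features, reference, rtol=rtol, atol=0.0))


# Seeded synthetic reference signal (the seed value is illustrative).
rng = np.random.default_rng(42)
t = np.arange(256) / 256.0
reference_signal = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(256)

expected = toy_pipeline(reference_signal)
expected_hash = feature_hash(expected)

# 1. A deterministic pipeline reproduces the hash on every run.
rerun_matches = feature_hash(toy_pipeline(reference_signal)) == expected_hash

# 2. A "mocked" pipeline that returns unseeded random output does not.
mocked = np.random.default_rng().standard_normal(expected.size)
mocked_matches = feature_hash(mocked) == expected_hash

# 3. Float-level jitter breaks the hash but passes the tolerance check,
#    while a 1% regression fails the tolerance check.
jittered = expected * (1.0 + 1e-12)
jitter_hash_matches = feature_hash(jittered) == expected_hash
jitter_ok = within_tolerance(jittered, expected)
regression_ok = within_tolerance(expected * 1.01, expected)

print(rerun_matches, mocked_matches, jitter_hash_matches, jitter_ok, regression_ok)
```

The design point is that the hash and the tolerance check fail in different ways: the hash catches any change at all, so it is reserved for features that are bit-stable, while the tolerance check absorbs CPU-level float noise yet still rejects a real change in the numbers.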
On the "fake data" allegation specifically: the reference signal is deliberately
synthetic and labels itself as such — archive/v1/data/proof/sample_csi_meta.json says:
{ "is_synthetic": true, "is_real_capture": false, "numpy_seed": 42, ... }
and generate_reference_signal.py states in its header: "It is NOT a real WiFi capture."
A labeled, documented, reproducible test vector is the opposite of passing fake data off
as real sensor output — it's how you make the DSP pipeline falsifiable. Conflating the two
was the central error in the "fake CSI" audit.
Step 2 — Real code, real tests (the "unimplemented core" claim)
cd v2
cargo test --workspace --no-default-features
The Rust v2 workspace is 38 crates with tests in 490+ files (several thousand test
functions). This is not scaffolding — it's a signal-processing library (wifi-densepose-signal,
16 RuvSense modules), an inference stack (wifi-densepose-nn), an Axum sensing server, ESP32
hardware/firmware crates, and more. The test run is the proof — don't take the count on
faith, run it.
Step 3 — Real trained model, verifiable on a public benchmark
The headline number is not self-reported on a private split — it's on the public MM-Fi benchmark, with the weights published so you can re-run it:
pip install huggingface_hub
huggingface-cli download ruvnet/wifi-densepose-mmfi-pose --local-dir models/mmfi-pose
| Metric (MM-Fi, matched random_split) | Value |
|---|---|
| torso-PCK@20, single model | 82.69% |
| torso-PCK@20, 3-model ensemble + TTA | 83.59% |
| 75K-param micro (edge) variant | 74.30% |
| Prior published SOTA — MultiFormer (2025) | 72.25% |
| Prior — CSI2Pose | 68.41% |
- Model card: ruvnet/wifi-densepose-mmfi-pose
- Self-correcting, auditable leaderboard: AetherArena Space
- Pretrained encoder (82.3% held-out temporal-triplet): ruvnet/wifi-densepose-pretrained
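For readers unfamiliar with the metric in the table above: PCK@20 (Percentage of Correct Keypoints) counts a predicted keypoint as correct when it lies within 20% of a reference length from the ground truth. The sketch below assumes the reference length is a torso size defined by two joints, and uses a made-up four-joint 2D layout. The exact joints, dimensionality, and normalization used in the MM-Fi evaluation should be taken from the model card, not from this example.

```python
import numpy as np


def pck(pred: np.ndarray, gt: np.ndarray, ref_a: int, ref_b: int,
        alpha: float = 0.2) -> float:
    """Fraction of keypoints within alpha * reference length of the ground truth.

    pred, gt: arrays of shape (frames, joints, dims).
    ref_a, ref_b: joint indices whose ground-truth distance defines the
    per-frame reference length (assumed here to span the torso).
    """
    ref_len = np.linalg.norm(gt[:, ref_a] - gt[:, ref_b], axis=-1)  # (frames,)
    dist = np.linalg.norm(pred - gt, axis=-1)                       # (frames, joints)
    correct = dist <= alpha * ref_len[:, None]
    return float(correct.mean())


# One frame, four hypothetical joints in 2D; joints 0 and 1 span a torso of length 1.0.
gt = np.array([[[0.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]])
pred = gt.copy()
pred[0, 2] += [0.1, 0.0]  # 0.1 off: inside the 0.2 threshold
pred[0, 3] += [0.3, 0.0]  # 0.3 off: outside the 0.2 threshold

score = pck(pred, gt, ref_a=0, ref_b=1)
print(score)  # 0.75
```

Because the threshold scales with body size in each frame, the score is comparable across subjects of different heights and distances, which is one reason PCK is a common choice for reporting pose accuracy on a shared benchmark.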
Step 4 — Real CSI from real hardware
A $9 ESP32-S3 produces genuine 802.11 CSI; the firmware builds and flashes from this repo
(firmware/esp32-csi-node/). The data path is ESP-IDF CSI callbacks (or nexmon_csi .pcap on a
Raspberry Pi via the rvCSI runtime) — measured radio
reflections, not synthesized arrays. Build/flash/provision steps are in
docs/user-guide.md and CLAUDE.local.md.
5. Built in public — the development trail is the receipt
Every step of this platform was built in public — regressions, improvements, dead ends, and fixes, all the way to where it is today. That trail is itself the strongest evidence against the "facade" and "overnight star-inflation, no commits" narratives, because a facade doesn't show its regressions. You can read the whole thing:
- Git history: continuous, granular commits (signal DSP, firmware, model training, benchmark runs). Not a README drop followed by silence.
- 96 ADRs (docs/adr/): every architectural decision recorded with its reasoning and its trade-offs, including superseded and reversed ones.
- CHANGELOG: additions, fixes, and reversals dated in place (e.g. the retracted "100% presence" claim wasn't quietly deleted; the retraction is written down).
- Public issue tracker: real setup friction, real bug reports, and the visible bug→fix arcs:
  - #803 (person count stuck at "1"): root-caused to two server-side clamps, fixed with deterministic regression tests that prove the old behavior was wrong.
  - #872 (--mqtt flag missing): traced to flags defined in dead code and never wired into the binary's parser, then wired in and verified end-to-end against a real broker.
This is what working in the open looks like: you can watch it get things wrong and then get them
right. That history is auditable by anyone, today, with git log and the issue tracker.
A facade hides its failures. We document ours in detail:
- Full MM-Fi study — openly reports that WiFi sensing does not generalize zero-shot to new people/rooms (cross-environment accuracy collapses to ~17–64% raw), and that a ~30-second in-room calibration is what fixes it. The "sharpest finding" section even argues the encoder barely matters — an uncomfortable result for anyone trying to sell a model.
- Efficiency frontier — SOTA-beating pose in a 20 KB int4 edge model, with the quantization trade-offs shown.
- Retractions — the "100% presence" figure was withdrawn in-place rather than quietly edited away.
- ADR-168 benchmark proof and WITNESS-LOG-028 — how the numbers are produced and a 33-row per-claim attestation matrix.
6. Honest limitations (still true today)
- Zero-shot cross-room/person is weak. Plan on ~30 s of in-room calibration per deployment.
- Single-node spatial resolution is limited. Use 2+ ESP32 nodes (or add a Cognitum Seed) for multi-person / localization.
- Multi-person counting is hard. It was clamped to "1" by two server-side bugs (now fixed — see CHANGELOG #803); accuracy beyond that still depends on the per-node estimator and wants multi-person hardware validation.
- Camera-free pose trained only on proxy labels is low-accuracy; camera-supervised fine-tuning (ADR-079) is the path to good pose.
- Beta software. APIs and firmware change.
7. Sources
- Carnegie Mellon, "DensePose From WiFi" — https://arxiv.org/abs/2301.00250
- IEEE 802.11bf WLAN Sensing — https://www.ieee802.org/11/Reports/tgbf_update.htm
- MM-Fi benchmark — https://github.com/ybhbingo/MMFi_dataset
- Hacker News discussion — https://news.ycombinator.com/item?id=46388904
- Cybernews coverage — https://cybernews.com/security/viral-github-project-wifi-see-through-walls/
- byteiota, "Real or AI-Generated Hype?" — https://byteiota.com/wifi-densepose-hits-github-2-real-or-ai-generated-hype/
- agentpedia, "RuView and the Reproducibility Question" — https://agentpedia.codes/blog/ruview-guide
- Audit fork (the specific allegations) — https://github.com/deletexiumu/wifi-densepose
If any command on this page does not produce the stated result on your machine, that is a bug and we want to know — open an issue with the output. Reproducibility is the whole point.