ruvnet--RuView/docs/proof-of-capabilities.md
rUv 42dcf49f4d fix(adr): resolve duplicate ADR numbers + close ADR-080 security + ADR-154 M1 signal backlog (#1051)
* fix(signal): circular phase variance for ghost-tap guard (ADR-154 §7.4 #1)

`phase_variance` computed a LINEAR sample variance over phase angles that
wrap at ±π, so a tightly-clustered set straddling the branch cut reported
spuriously HIGH dispersion — false-tripping the `> TAU` ghost-tap guard on
real, tightly-clustered CIR taps.

Replace with Mardia's circular variance V = 1 − R̄, bounded [0,1] and
invariant to where the cluster sits on the circle. Re-derive the guard
against the bounded metric via a named const
`GHOST_TAP_CIRCULAR_VARIANCE_MAX` (the old TAU-scaled threshold is
meaningless on [0,1]).

Grade: metric fix MEASURED; threshold value DATA-GATED. A clean single-path
ramp also sweeps the circle, so V alone cannot separate clean from
unsanitized taps without labelled frames. The conservative default (0.99)
errs toward never false-rejecting and is strictly more permissive at the wrap
boundary than the buggy linear guard.

Fails-on-old test: `phase_variance_circular_not_fooled_by_branch_cut` —
inlines the old linear variance to show it exceeds TAU on wrap-straddling
phases while circular V≈0 and the guard no longer trips. Plus
`phase_variance_circular_is_bounded_and_extremal` (V∈[0,1], V≈0 identical,
V≈1 uniform).
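
The metric change can be reproduced outside the crate. The sketch below is a stdlib-Python illustration, not the crate's Rust code: `linear_variance` mirrors the old LINEAR sample variance, `circular_variance` computes Mardia's V = 1 - R̄, and only the 0.99 default and the `GHOST_TAP_CIRCULAR_VARIANCE_MAX` name come from this commit. The helper `ghost_tap_guard_trips` and the sample phases are hypothetical.

```python
import math

TAU = 2.0 * math.pi
# Default stated in this commit; the Python name mirrors the Rust const.
GHOST_TAP_CIRCULAR_VARIANCE_MAX = 0.99


def linear_variance(phases):
    """Old (buggy) metric: ordinary sample variance over raw angles in (-pi, pi]."""
    n = len(phases)
    mean = sum(phases) / n
    return sum((p - mean) ** 2 for p in phases) / (n - 1)


def circular_variance(phases):
    """Mardia's circular variance V = 1 - R_bar, always in [0, 1]."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    r_bar = math.hypot(c, s)  # mean resultant length of the unit phasors
    return 1.0 - r_bar


def ghost_tap_guard_trips(phases):
    """Hypothetical guard: reject a tap only when its phases look near-uniform."""
    return circular_variance(phases) > GHOST_TAP_CIRCULAR_VARIANCE_MAX


# A tight cluster (0.04 rad wide) that straddles the +/-pi branch cut.
eps = 0.01
straddling = [math.pi - eps, -math.pi + eps, math.pi - 2 * eps, -math.pi + 2 * eps]

print(linear_variance(straddling) > TAU)     # True: the old guard would false-trip
print(circular_variance(straddling) < 1e-3)  # True: the cluster is recognized as tight
print(ghost_tap_guard_trips(straddling))     # False
```

The same boundedness explains why the threshold stays DATA-GATED: evenly spread phases such as `[2 * math.pi * k / 8 for k in range(8)]` give V ≈ 1 whether they come from a clean ramp or from unsanitized data.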

cargo test -p wifi-densepose-signal --no-default-features --features cir --lib
→ 432 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(signal): pin Welford n=0/n=1 finiteness guard (ADR-154 §7.4 #10)

The shared `WelfordStats` (field_model.rs, used by longitudinal.rs and others)
relies on `count < 2` guards in `variance`/`sample_variance`/`std_dev`/
`z_score` to stay finite at the boundaries. The guards existed but the n=0
boundary was UNTESTED — exactly the §4 divide-by-(n−1) family the ADR groups
this with.

Add `welford_finite_at_n0_and_n1` asserting every statistic is finite and
returns the documented sentinel (0.0) at n=0 and n=1, plus load-bearing doc
comments on the two guards.

Fails-on-old proof: with the `sample_variance` guard removed, the test FAILS
with "attempt to subtract with overflow" at the `(self.count - 1)` underflow
(0usize − 1); `variance` would similarly yield 0.0/0.0 = NaN. The guard is
restored; the test pins it so a future regression is caught.
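
For readers without the Rust source at hand, a minimal Python sketch of the guarded pattern follows. It assumes a standard Welford accumulator (`count`, `mean`, `m2`) and assumes `std_dev` is derived from the population `variance`; the class reuses the method names from this commit but is hypothetical, not the `field_model.rs` implementation. Python integers cannot underflow, so the sketch shows the 0.0 sentinel contract rather than the Rust `usize` panic.

```python
import math


class WelfordStats:
    """Hypothetical Python mirror of a Welford accumulator with n < 2 guards."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance; the guard avoids 0.0 / 0.0 = NaN at n = 0.
        if self.count < 2:
            return 0.0
        return self.m2 / self.count

    def sample_variance(self):
        # The guard avoids dividing by (n - 1) == 0 at n = 1
        # (and the (count - 1) usize underflow at n = 0 in Rust).
        if self.count < 2:
            return 0.0
        return self.m2 / (self.count - 1)

    def std_dev(self):
        return math.sqrt(self.variance())

    def z_score(self, x):
        # Sentinel 0.0 when there is no spread to normalise by.
        sd = self.std_dev()
        if self.count < 2 or sd == 0.0:
            return 0.0
        return (x - self.mean) / sd


stats = WelfordStats()
print(stats.sample_variance())  # 0.0 at n = 0
stats.update(1.5)
print(stats.z_score(2.0))       # 0.0 at n = 1
```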

Grade: MEASURED (boundary finiteness is asserted; the guard is the §4-family
fix made testable).

cargo test -p wifi-densepose-signal --no-default-features --lib field_model
→ 22 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* refactor(signal): de-magic adversarial thresholds + boundary tests (ADR-154 §7.4 #13)

Lift the bare numeric literals buried in `check`/`check_consistency` into
named, documented module consts (FIELD_MODEL_GINI_VIOLATION=0.8,
ENERGY_RATIO_HIGH_VIOLATION=2.0, ENERGY_RATIO_LOW_VIOLATION=0.1,
CONSISTENCY_ACTIVE_FRACTION_OF_MEAN=0.1, SCORE_W_* weights). VALUES UNCHANGED —
each const equals the original literal; only names + pinning tests are new.

Grade: DATA-GATED. The operating values stay empirical (defensible values need
labelled spoofed/clean CSI — Wi-Spoof, §6.2/§7.3). The de-magicking +
characterization tests are MEASURED: `tuning_consts_unchanged_from_literals`,
`energy_ratio_high_boundary`, `energy_ratio_low_boundary`,
`field_model_gini_boundary`, `consistency_active_fraction_boundary` pin the
decision boundaries at/just-below/just-above each threshold, so a future
data-driven retune is a visible, tested change.

Fails-on-change proof: bumping ENERGY_RATIO_HIGH_VIOLATION 2.0→3.0 makes
`energy_ratio_high_boundary` FAIL (restored). Operating values explicitly
NOT changed.
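
The at/just-below/just-above characterization pattern can be illustrated in a few lines. This is a hedged Python sketch: the const names and values come from this commit, but the predicate `energy_ratio_violation` and the choice of strict comparisons (`>` for the high bound, `<` for the low bound) are assumptions, since the exact operators used in `check` are not given here.

```python
# Values copied from this commit; the predicate below is hypothetical.
ENERGY_RATIO_HIGH_VIOLATION = 2.0
ENERGY_RATIO_LOW_VIOLATION = 0.1
STEP = 1e-9  # offset used for the "just below / just above" probes


def energy_ratio_violation(ratio):
    """Assumed rule: flag frames whose energy ratio leaves the [LOW, HIGH] band."""
    return ratio > ENERGY_RATIO_HIGH_VIOLATION or ratio < ENERGY_RATIO_LOW_VIOLATION


def check_boundary(threshold, is_upper):
    """Pin the decision at, just below, and just above one threshold."""
    at = energy_ratio_violation(threshold)
    below = energy_ratio_violation(threshold - STEP)
    above = energy_ratio_violation(threshold + STEP)
    if is_upper:
        return (at, below, above) == (False, False, True)
    return (at, below, above) == (False, True, False)


def energy_ratio_high_boundary():
    # The literal 2.0 is written out on purpose: bumping the const to 3.0
    # makes the "just above" probe stop violating, so this returns False.
    return check_boundary(2.0, is_upper=True)


def energy_ratio_low_boundary():
    return check_boundary(0.1, is_upper=False)


print(energy_ratio_high_boundary(), energy_ratio_low_boundary())  # True True
```

Writing the threshold literal into the check, rather than reading the const back, is what turns a silent retune into a visible failure.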

cargo test -p wifi-densepose-signal --no-default-features --lib ruvsense::adversarial
→ 20 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* refactor(signal): de-magic coherence drift/gate thresholds (ADR-154 §7.4 #9)

Lift the bare detection literals in `coherence.rs::classify_drift`
(DRIFT_STABLE_SCORE=0.85, DRIFT_STEP_CHANGE_MAX_STALE=10) and the
`coherence_gate.rs` Default impl (DEFAULT_ACCEPT_THRESHOLD=0.85,
DEFAULT_REJECT_THRESHOLD=0.5, DEFAULT_MAX_STALE_FRAMES=200,
DEFAULT_PREDICT_ONLY_NOISE=3.0) into named, documented consts. VALUES
UNCHANGED. The gate already exposed these via GatePolicyConfig (config seam);
this names + pins the defaults.
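
The config-seam idea is that named consts feed the defaults while callers can still override them. A minimal Python analogue follows; `GatePolicyConfig` is modelled as a dataclass whose field names are hypothetical (the Rust struct's fields are not listed here), and `consts_unchanged_from_literals` plays the role of `drift_consts_unchanged_from_literals` / `gate_default_consts_unchanged_from_literals`.

```python
from dataclasses import dataclass

# Named defaults (values from this commit, unchanged from the old literals).
DRIFT_STABLE_SCORE = 0.85
DRIFT_STEP_CHANGE_MAX_STALE = 10
DEFAULT_ACCEPT_THRESHOLD = 0.85
DEFAULT_REJECT_THRESHOLD = 0.5
DEFAULT_MAX_STALE_FRAMES = 200
DEFAULT_PREDICT_ONLY_NOISE = 3.0


@dataclass
class GatePolicyConfig:
    """Hypothetical field names; defaults come from the consts, not bare literals."""
    accept_threshold: float = DEFAULT_ACCEPT_THRESHOLD
    reject_threshold: float = DEFAULT_REJECT_THRESHOLD
    max_stale_frames: int = DEFAULT_MAX_STALE_FRAMES
    predict_only_noise: float = DEFAULT_PREDICT_ONLY_NOISE


def consts_unchanged_from_literals():
    """Pin every value so a data-driven retune shows up as a failing check."""
    return (
        (DRIFT_STABLE_SCORE, DRIFT_STEP_CHANGE_MAX_STALE) == (0.85, 10)
        and GatePolicyConfig() == GatePolicyConfig(0.85, 0.5, 200, 3.0)
    )


# A deployment can still override one default through the config seam.
strict = GatePolicyConfig(accept_threshold=0.9)
print(consts_unchanged_from_literals(), strict.reject_threshold)  # True 0.5
```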

Grade: DATA-GATED. Operating values stay empirical (defensible Z-score
thresholds need labelled stable/drifting coherence traces). De-magicking +
boundary tests are MEASURED: `classify_drift_stable_score_boundary`,
`classify_drift_stale_count_boundary` pin the at/just-below/just-above
decisions; `drift_consts_unchanged_from_literals` /
`gate_default_consts_unchanged_from_literals` pin the values. Operating values
explicitly NOT changed.

cargo test -p wifi-densepose-signal --no-default-features --lib ruvsense::coherence
→ 40 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-154): mark §7.4 P1 backlog cleared — Milestone-1 (#1,#10 RESOLVED; #9,#13 DATA-GATED)

Update ADR-154 §7.4 backlog rows #1, #9, #10, #13 with commit refs + grades,
the §7.4 intro count (four P1 items cleared, ~41 P2/P3 remain), the
Horizon-ledger one-liner (Milestone-1 DONE), and the §8 honest-limits #1 line
(metric now correct; threshold still DATA-GATED). Add CHANGELOG [Unreleased]
entry.

Grades: #1 RESOLVED (MEASURED metric / DATA-GATED threshold), #10 RESOLVED
(MEASURED), #9 & #13 RESOLVED-PARTIAL (DATA-GATED — de-magicked + boundary
tested, operating values unchanged).

Validation: cargo test --workspace --no-default-features → 2057 passed, 0
failed; wifi-densepose-signal lib → 442 passed (no-default + --features cir);
python archive/v1/data/proof/verify.py → VERDICT: PASS, hash f8e76f21…46f7a
UNCHANGED (CIR ghost-tap guard is not on the deterministic proof path).

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(sensing-server): stop leaking internal errors in HTTP responses (ADR-080 #2)

Six handlers in `main.rs` serialized the internal error `Display` straight
into the JSON response body, leaking server internals to any client (ADR-080
finding #2, CWE-209; reframed onto the Rust boundary by ADR-164 G11):

  - edge_registry_endpoint: a panicked spawn_blocking `JoinError`
    ("task … panicked") in a 500, and the raw upstream error in a 503
  - delete_model / delete_recording / start_recording: std::io::Error
    strings carrying OS detail / filesystem paths
  - calibration_start / calibration_stop: the FieldModel error chain

New `error_response` module: `internal_error` / `internal_error_json` /
`upstream_unavailable` log the full detail server-side only (tagged with a
correlation id) and return a generic body
(`{"error":"internal_error","correlation_id":…}`) — no `panicked`, no file
paths, no Debug chain. The correlation id lets an operator join a client
report to the exact server log line without ever shipping the detail.

Pinned by 5 error_response tests, incl. a leak-substring guard
(internal_error_body_does_not_leak_detail) verified to FAIL on the reverted
old body (returns the panic message / path / "os error"). The HOMECORE sweep
(ADR-161) covered homecore-server, not this crate.
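
The log-detail-server-side, return-a-generic-body pattern is small enough to sketch. The Python below is an illustration only: `internal_error` echoes the module's function name, but its signature, the `uuid4` correlation id, the use of `logging`, the leak markers, and the sample detail string are assumptions, not the crate's Axum code.

```python
import json
import logging
import uuid

log = logging.getLogger("sensing_server")


def internal_error(detail):
    """Log the full detail server-side; return (status, body) with no internals."""
    correlation_id = str(uuid.uuid4())
    # Only the server log sees the panic message, file path, or OS error.
    log.error("internal error [correlation_id=%s]: %s", correlation_id, detail)
    body = json.dumps({"error": "internal_error", "correlation_id": correlation_id})
    return 500, body


# Substrings that must never reach a client (illustrative, not exhaustive).
LEAK_MARKERS = ("panicked", "/var/", "os error")


def body_leaks(body):
    """Leak-substring guard in the spirit of internal_error_body_does_not_leak_detail."""
    return any(marker in body for marker in LEAK_MARKERS)


# Invented detail string standing in for a JoinError / io::Error message.
status, body = internal_error(
    "task 7 panicked at /var/lib/example/model.bin: No such file (os error 2)"
)
print(status, body_leaks(body))  # 500 False
```

The operator greps the server log for the correlation id from a client report; the client never sees anything else.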

Co-Authored-By: claude-flow <ruv@ruv.net>

* test(sensing-server): pin XFF-immunity + no-query-token (ADR-080 #1, #3)

Findings #1 (XFF-spoofing bypass) and #3 (JWT-in-URL, CWE-598) were logged
against the Python v1 API but are VERIFIED ABSENT on the current Rust
sensing-server, so they get regression tests rather than redundant fixes:

  - #1 XFF: there is no IP-based rate-limiter or IP-allowlist to bypass, and
    neither security middleware reads a forwarded header. Added
    bearer_auth::xff_header_never_affects_auth_decision (spoofed
    X-Forwarded-For never flips a 401<->200 decision) and
    host_validation::forwarded_headers_never_bypass_host_allowlist (spoofed
    X-Forwarded-Host: localhost never lets Host: evil.com past the allowlist).

  - #3 JWT-in-URL: require_bearer reads the token only from the Authorization
    header; WS handlers take no query token; the sole Query extractor
    (EdgeRegistryParams) is a non-secret refresh flag. Added
    bearer_auth::query_string_token_is_never_accepted — ?token= / ?access_token=
    in the URL never authenticates (stays 401) while the header path still 200s.
    Verified to FAIL when a query-token path is injected into require_bearer.
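
Both tests pin one property: the auth decision is a function of the Authorization header alone. A hedged Python sketch of such a check follows; the `require_bearer` signature, the lowercase dict-shaped headers and query, and the token value are hypothetical, and `hmac.compare_digest` is used for a constant-time comparison.

```python
import hmac

EXPECTED_TOKEN = "s3cret-test-token"  # hypothetical; never hard-code real tokens


def require_bearer(headers, query):
    """Return 200 or 401 using ONLY the Authorization header.

    `query` (e.g. ?token= / ?access_token=) and forwarded headers such as
    X-Forwarded-For are deliberately never consulted.
    """
    auth = headers.get("authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401
    ok = hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())
    return 200 if ok else 401


good = {"authorization": f"Bearer {EXPECTED_TOKEN}"}
print(require_bearer({}, {"token": EXPECTED_TOKEN}))                 # 401
print(require_bearer({**good, "x-forwarded-for": "127.0.0.1"}, {}))  # 200
```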

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-080): mark P0 security findings #1-#3 RESOLVED; close ADR-164 G11

- ADR-080: Status note + per-finding closure (#1 XFF and #3 JWT-in-URL
  verified absent + regression-pinned; #2 leaked errors fixed via the
  error_response module). Records the v1-vs-Rust boundary distinction
  explicitly: v1 paths remain archived; this closure governs the shipped
  Rust sensing-server.
- ADR-164: Gap Register G11 and the Open/Gated Backlog entry marked
  RESOLVED with the fix + branch reference.
- CHANGELOG: [Unreleased] -> ### Security entry covering all three findings.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): renumber 6 displaced ADRs to resolve duplicate-number collisions (ADR-164 G1)

Resolves the 5 duplicate ADR numbers (6 displaced files) flagged by ADR-164
Gap Register item G1. Canonical keeper per number = first file committed at
that number (date tie-broken by inbound cross-reference count / parent-appendix
relationship). Displaced files renumbered to the next free numbers (166-171):

  050 keeps provisioning-tool-enhancements (5 refs vs 1)
    -> ADR-166-quality-engineering-security-hardening
  052 keeps tauri-desktop-frontend (parent ADR)
    -> ADR-167-ddd-bounded-contexts (its appendix)
  147 keeps nvidia-cosmos/OccWorld (the actual ADR, has Status header)
    -> ADR-168-benchmark-proof (proof companion, no Status)
    -> ADR-169-adam-mode-light-theme (was untracked)
  148 keeps drone-swarm-control-system (committed #862)
    -> ADR-170-yoga-mode-pose-system (was untracked)
  149 keeps public-community-leaderboard-huggingface (committed 16:47 vs 17:38)
    -> ADR-171-swarm-benchmarking-evaluation-methodology
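
The keeper rule reduces to a sort per colliding number. The sketch below is a simplification: it tie-breaks only on commit time and then inbound-reference count (the parent/appendix and Status-header calls listed above were judgement calls it does not model), and the input entries are hypothetical placeholders, not the real ADR files.

```python
from collections import defaultdict


def renumber(adrs, first_free):
    """Keep the earliest-committed file per number; move the rest to free numbers.

    `adrs` is a list of (number, slug, commit_ts, inbound_refs) tuples.
    Ties on commit_ts go to the file with more inbound references.
    Returns {slug: new_number} for displaced files only.
    """
    by_number = defaultdict(list)
    for adr in adrs:
        by_number[adr[0]].append(adr)

    moves, next_free = {}, first_free
    for number in sorted(by_number):
        group = sorted(by_number[number], key=lambda a: (a[2], -a[3]))
        for _, slug, _, _ in group[1:]:  # everything after the keeper is displaced
            moves[slug] = next_free
            next_free += 1
    return moves


# Hypothetical input: collisions at 50 and 52, a unique number at 51.
adrs = [
    (50, "alpha", 100, 5),
    (50, "beta", 100, 1),    # same timestamp, fewer refs -> displaced
    (51, "gamma", 90, 0),
    (52, "delta", 120, 2),   # committed later -> displaced
    (52, "epsilon", 80, 0),  # committed first -> keeper
]
print(renumber(adrs, first_free=166))  # {'beta': 166, 'delta': 167}
```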

Updates in-file `# ADR-NNN` headers and intra-file self-references (yoga-modes

* docs(adr): repoint inbound cross-references to renumbered ADRs (166-171)

Follow-up to the ADR renumbering (ADR-164 G1). Updates every inbound reference
that pointed at a displaced ADR, disambiguating shared numbers by title/slug so
only references to the DISPLACED topic move and keeper references stay put.

ADR-168 (was 147 benchmark-proof): README, CHANGELOG, user-guide,
  proof-of-capabilities, research docs 00/03 — all path/label refs updated.
ADR-169 (was 147 adam-mode) / ADR-170 (was 148 yoga-mode): docs/adr/README index.
ADR-171 (was 149 swarm-benchmarking): all ruview-swarm eval code+docs
  (Cargo.toml, evals/, eval_swarm.rs, metrics/mod/report/runner.rs), research
  doc 03 (every §-ref matched ADR-171 sections, not AetherArena), 00-system-review,
  series README, CHANGELOG, and ADR-148's forward/"open issues" pointers.
ADR-166 (was 050 quality-engineering / security-hardening): disambiguated from the
  ADR-050 provisioning KEEPER by topic. The HMAC/secure_tdm, directory-traversal,
  bind-address, and OTA-PSK-auth references in code comments
  (wifi-densepose-hardware Cargo.toml + secure_tdm.rs, sensing-server main.rs) and
  in ADR-052-tauri / ADR-167 all describe the security-hardening ADR -> ADR-166.
ADR-167 (was 052 ddd-appendix): inbound appendix references.
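
Disambiguating by slug rather than by number is what keeps keeper references in place. A minimal regex sketch follows, assuming references appear as `ADR-NNN-slug` path fragments and that the displaced files kept their slugs when renumbered; the mapping is a subset of the moves above, `keeper-topic` is a made-up slug, and bare-number prose references would still need the by-topic judgement described here.

```python
import re

# (old number, slug) -> new number, for displaced ADRs only (subset of the map).
MOVES = {
    ("147", "benchmark-proof"): "168",
    ("148", "yoga-mode-pose-system"): "170",
    ("149", "swarm-benchmarking-evaluation-methodology"): "171",
}

REF = re.compile(r"ADR-(\d{3})-([a-z0-9-]+)")


def repoint(text):
    """Rewrite ADR-NNN-slug refs whose (number, slug) was displaced; keep keepers."""
    def sub(m):
        new = MOVES.get((m.group(1), m.group(2)))
        return f"ADR-{new}-{m.group(2)}" if new else m.group(0)
    return REF.sub(sub, text)


before = "See docs/adr/ADR-147-benchmark-proof.md and docs/adr/ADR-147-keeper-topic.md."
print(repoint(before))
# See docs/adr/ADR-168-benchmark-proof.md and docs/adr/ADR-147-keeper-topic.md.
```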

Index/registry updates: docs/adr/README.md, gap-analysis/census.md (rows +
header count), gap-analysis/lens-findings.md (collision table marked RESOLVED),
and ADR-164 Gap Register G1 marked RESOLVED with the full renumber map.

Keeper references deliberately untouched: all ADR-147 OccWorld code, all ADR-148
drone-swarm code/docs, all ADR-149 AetherArena refs (incl. ADR-150's SSL/resampling
refs, which ADR-150 explicitly binds to the AetherArena benchmark), ADR-050
provisioning refs, ADR-052 tauri refs. The frozen GitHub blob URLs in
docs/adr/.issue-177-body.md (pinned to an old branch) are left as historical.

Comment-only code edits; no behavior change. wifi-densepose-hardware compiles
clean; the sensing-server build's sole blocker is the pre-existing upstream
midstreamer-temporal-compare@0.2.1 registry crate, unrelated to these edits.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-13 14:31:38 -04:00


Proof of Capabilities — answering the "it's fake / misleading" claims

Short version: don't trust us — verify. Every claim below comes with a command you can run yourself in minutes. Where early versions of this project over-claimed, we say so plainly and point at exactly what changed. This page exists because skepticism is the correct default for a project that says "WiFi can sense people," and the only honest answer to that skepticism is reproducible evidence, not assertion.


1. What people have said

This project (and the broader "DensePose From WiFi" idea) went viral and drew sharp, often fair, criticism. The most pointed claims:

  • "AI-generated facade / vibe-coded boilerplate" — that the repo is scaffolding with the core signal-processing and pose pipeline unimplemented. (Hacker News, Cybernews)
  • "Fake CSI data" — that the Python extractor returned random arrays instead of real hardware data (e.g. csi_extractor.py returning random amplitude/phase). (audit fork)
  • "No trained models, fabricated metrics" — that headline numbers like "94.2% pose accuracy," "96.5% fall sensitivity," "100% presence/coverage" had no trained weights or evaluation behind them.
  • "Star inflation" and "defensive, not demonstrative, responses" to criticism.
  • "Reads like ad copy" — emoji-heavy AI documentation that conveys little.

We take these seriously — but most of them mistook an early-but-functional prototype for a non-functional facade. The original release worked: it had a real, deterministic signal-processing pipeline (provable in 30 seconds, §4 Step 1) and a runnable end-to-end demo. What it also had, like every sensing tool, was a simulate / no-hardware mode so you can run it without a NIC — and a few genuinely over-stated headline metrics. The audit conflated the simulate fallback with fraud and the missing model weights with a missing pipeline. Here is the honest accounting, then the proof.


2. What was fair, and what was not

The original release was early but functional — a working prototype, not a facade. Separating the fair criticism from the category errors:

| Criticism | Our honest position |
|---|---|
| "csi_extractor returns random arrays → the whole thing is fake" | Category error. Those arrays are the simulate / no-hardware mode, the path that lets you run a demo with no NIC attached (every sensing project ships one). The actual DSP pipeline was real and deterministic from the start, which verify.py proves bit-for-bit (§4 Step 1). A reproducible hash is impossible from unseeded random data. |
| "Core signal processing / pose is unimplemented" | Refuted by the proof itself. verify.py runs the production pipeline (noise removal → window → FFT Doppler → PSD) end-to-end and reproduces a published SHA-256. The pipeline existed and ran; what was missing early on was trained model weights, which is a different thing from a missing pipeline. |
| "100% presence accuracy" was unsupported | Fair: formally retracted. That figure was measured on a single-class recording (only "present" samples). It's replaced everywhere by an honest 82.3% held-out temporal-triplet accuracy. See the in-place retraction in README.md / docs/user-guide.md. |
| Some headline metrics (94.2% pose, 96.5% fall) lacked published evaluation early on | Fair at the time. Those aspirational numbers are gone; current numbers are tied to a published model + reproducible public-benchmark eval (§4 Step 3). |
| Docs read like AI ad copy | Partly fair. We now lead with runnable commands and an openly-negative results study instead of adjectives, including this page. |

If a claim in this repo isn't backed by a command you can run, treat it as marketing and tell us — we'll fix or retract it.


3. The science is real (this part was never the issue)

WiFi CSI human sensing is a decade-plus of peer-reviewed work, independent of this repo:

  • CMU, "DensePose From WiFi" (Geng, Huang, De la Torre, Dec 2022) — arXiv:2301.00250.
  • MIT CSAIL RF-Pose / RF-Pose3D (Zhao et al.) — through-wall skeletal pose from radio.
  • IEEE 802.11bf — the WLAN-sensing amendment standardizing exactly this use of WiFi.
  • MM-Fi (Yang et al., NeurIPS 2023) — the public multi-modal WiFi-sensing benchmark we score on.

The legitimate question was never "is WiFi sensing real?" — it's "does this implementation actually do it?" The rest of this page answers that.


4. Prove it yourself (≈10 minutes, no special hardware)

Step 1 — Deterministic pipeline proof (the "Trust Kill Switch")

This is the direct answer to "the signal processing is fake." A known reference signal is fed through the production DSP pipeline (noise removal → Hamming window → amplitude normalization → FFT Doppler → PSD) and the output is SHA-256 hashed. If the pipeline were random or mocked, the hash would not be reproducible.

python archive/v1/data/proof/verify.py
# Expect:  VERDICT: PASS
# Pipeline hash: f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a

The published expected hash is committed at archive/v1/data/proof/expected_features.sha256. Run it on your machine — it reproduces bit-for-bit across platforms (verified identical on Windows, two independent Linux hosts, and the GitHub Azure CI runner). For the one feature that isn't bit-stable — the peak-normalized Doppler spectrum, whose argmax flips under cross-microarchitecture FFT reordering — the proof excludes it from the hash and additionally checks every other feature against a committed reference vector within a strict relative tolerance (expected_features_reference.npz), so a genuine regression still fails while CPU-level float noise does not. Five features (amplitude mean/variance, phase difference, correlation matrix, and the FFT-based PSD) carry the deterministic proof.
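
The two-part check (an exact hash over the bit-stable features, plus a relative-tolerance comparison against a reference vector) can be sketched in stdlib Python. This illustrates the idea and is not verify.py itself: the feature names, the JSON canonicalisation, the rel_tol value and the sample numbers are all assumptions, so it will not reproduce the published hash.

```python
import hashlib
import json
import math


def pipeline_hash(features, exclude=("doppler_spectrum",)):
    """SHA-256 over a canonical serialisation of the bit-stable features only."""
    stable = {k: v for k, v in features.items() if k not in exclude}
    blob = json.dumps(stable, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def within_tolerance(features, reference, rel_tol=1e-9):
    """Every referenced feature must match elementwise within rel_tol."""
    for name, ref in reference.items():
        got = features.get(name)
        if got is None or len(got) != len(ref):
            return False
        if not all(math.isclose(g, r, rel_tol=rel_tol) for g, r in zip(got, ref)):
            return False
    return True


features = {
    "amplitude_mean": [1.0, 2.0],
    "phase_difference": [0.25, -0.5],
    "doppler_spectrum": [0.0, 1.0],  # not bit-stable, so kept out of the hash
}
digest = pipeline_hash(features)
noisy = dict(features, doppler_spectrum=[1.0, 0.0])  # argmax flipped
print(pipeline_hash(noisy) == digest)  # True: the excluded feature cannot break the hash
```

A real regression in a hashed feature changes the digest, while float noise of order 1e-13 in a tolerance-checked feature still passes.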

On the "fake data" allegation specifically: the reference signal is deliberately synthetic and labels itself as such. archive/v1/data/proof/sample_csi_meta.json says:

{ "is_synthetic": true, "is_real_capture": false, "numpy_seed": 42, ... }

and generate_reference_signal.py states in its header: "It is NOT a real WiFi capture." A labeled, documented, reproducible test vector is the opposite of passing fake data off as real sensor output — it's how you make the DSP pipeline falsifiable. Conflating the two was the central error in the "fake CSI" audit.

Step 2 — Real code, real tests (the "unimplemented core" claim)

cd v2
cargo test --workspace --no-default-features

The Rust v2 workspace is 38 crates with tests in 490+ files (several thousand test functions). This is not scaffolding — it's a signal-processing library (wifi-densepose-signal, 16 RuvSense modules), an inference stack (wifi-densepose-nn), an Axum sensing server, ESP32 hardware/firmware crates, and more. The test run is the proof — don't take the count on faith, run it.

Step 3 — Real trained model, verifiable on a public benchmark

The headline number is not self-reported on a private split — it's on the public MM-Fi benchmark, with the weights published so you can re-run it:

pip install huggingface_hub
huggingface-cli download ruvnet/wifi-densepose-mmfi-pose --local-dir models/mmfi-pose

| Metric (MM-Fi, matched random_split) | Value |
|---|---|
| torso-PCK@20, single model | 82.69% |
| torso-PCK@20, 3-model ensemble + TTA | 83.59% |
| 75K-param micro (edge) variant | 74.30% |
| Prior published SOTA: MultiFormer (2025) | 72.25% |
| Prior: CSI2Pose | 68.41% |
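
For readers unfamiliar with the metric: PCK counts a predicted keypoint as correct when it lies within a fraction of a reference body size from the ground truth. The sketch assumes the common torso-normalised form (threshold = 0.20 × the distance between two torso keypoints, here hypothetically indices 0 and 1); the exact torso definition and keypoint layout behind the MM-Fi numbers above are not specified on this page, so treat it as an illustration of the metric, not the evaluation code.

```python
import math


def torso_pck(pred, gt, torso_a=0, torso_b=1, alpha=0.20):
    """Fraction of keypoints within alpha * torso size of the ground truth.

    pred, gt: lists of frames; each frame is a list of (x, y) keypoints.
    torso_a / torso_b: indices of the two keypoints defining torso size
    (hypothetical choice; datasets define the torso differently).
    """
    correct = total = 0
    for p_frame, g_frame in zip(pred, gt):
        torso = math.dist(g_frame[torso_a], g_frame[torso_b])
        for p, g in zip(p_frame, g_frame):
            correct += int(math.dist(p, g) <= alpha * torso)
            total += 1
    return correct / total if total else 0.0


gt = [[(0, 0), (0, 1), (1, 1)]]             # one frame, torso size 1.0
pred = [[(0, 0.1), (0, 1.5), (1, 1.05)]]    # errors 0.1, 0.5, 0.05
print(round(torso_pck(pred, gt), 4))        # 0.6667
```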

Step 4 — Real CSI from real hardware

A $9 ESP32-S3 produces genuine 802.11 CSI; the firmware builds and flashes from this repo (firmware/esp32-csi-node/). The data path is ESP-IDF CSI callbacks (or nexmon_csi .pcap on a Raspberry Pi via the rvCSI runtime) — measured radio reflections, not synthesized arrays. Build/flash/provision steps are in docs/user-guide.md and CLAUDE.local.md.


5. Built in public — the development trail is the receipt

Every step of this platform was built in public — regressions, improvements, dead ends, and fixes, all the way to where it is today. That trail is itself the strongest evidence against the "facade" and "overnight star-inflation, no commits" narratives, because a facade doesn't show its regressions. You can read the whole thing:

  • Git history — continuous, granular commits (signal DSP, firmware, model training, benchmark runs). Not a README drop followed by silence.
  • 96 ADRs (docs/adr/) — every architectural decision recorded with its reasoning and its trade-offs, including superseded and reversed ones.
  • CHANGELOG — additions, fixes, and reversals dated in place (e.g. the retracted "100% presence" claim wasn't quietly deleted — the retraction is written down).
  • Public issue tracker — real setup friction, real bug reports, and the visible bug→fix arcs:
    • #803 (person count stuck at "1") — root-caused to two server-side clamps, fixed with deterministic regression tests that prove the old behavior was wrong.
    • #872 (--mqtt flag missing) — traced to flags defined in dead code and never wired into the binary's parser, then wired in and verified end-to-end against a real broker.

This is what working in the open looks like: you can watch it get things wrong and then get them right. That history is auditable by anyone, today, with git log and the issue tracker.

A facade hides its failures. We document ours in detail:

  • Full MM-Fi study — openly reports that WiFi sensing does not generalize zero-shot to new people/rooms (cross-environment accuracy collapses to ~17–64% raw), and that a ~30-second in-room calibration is what fixes it. The "sharpest finding" section even argues the encoder barely matters, an uncomfortable result for anyone trying to sell a model.
  • Efficiency frontier — SOTA-beating pose in a 20 KB int4 edge model, with the quantization trade-offs shown.
  • Retractions — the "100% presence" figure was withdrawn in-place rather than quietly edited away.
  • ADR-168 benchmark proof and WITNESS-LOG-028 — how the numbers are produced and a 33-row per-claim attestation matrix.

6. Honest limitations (still true today)

  • Zero-shot cross-room/person is weak. Plan on ~30 s of in-room calibration per deployment.
  • Single-node spatial resolution is limited. Use 2+ ESP32 nodes (or add a Cognitum Seed) for multi-person / localization.
  • Multi-person counting is hard. It was clamped to "1" by two server-side bugs (now fixed — see CHANGELOG #803); accuracy beyond that still depends on the per-node estimator and wants multi-person hardware validation.
  • Camera-free pose trained only on proxy labels is low-accuracy; camera-supervised fine-tuning (ADR-079) is the path to good pose.
  • Beta software. APIs and firmware change.

7. Sources


If any command on this page does not produce the stated result on your machine, that is a bug and we want to know — open an issue with the output. Reproducibility is the whole point.