Compare commits

...

5 Commits

Author SHA1 Message Date
rUv f250149e94 feat(ADR-262 P1): wifi-densepose-rufield bridge — RuView sensing → signed RuField FieldEvents (fail-closed privacy map) (#1070)
* feat(rufield): ADR-262 P1 — wifi-densepose-rufield anti-corruption bridge

New v2 workspace member that converts RuView WiFi-CSI sensing output into
signed RuField FieldEvents. Path-deps the vendor/rufield submodule crates
(rufield-core/-provenance/-privacy/-fusion); single coupling point between
RuView and the standalone RuField MFS spec (ADR-262 §5.4).

- SensingSnapshot: owned primitives mirroring SensingUpdate + TrustedOutput
  (no dependency on wifi-densepose-sensing-server).
- snapshot_to_field_event(): builds a WifiCsi FieldTensor + Observation,
  derives a real position from the signal-field peak (never fabricated),
  real sha256 provenance + ed25519 signature (synthetic=false).
- map_privacy() (§3.3 crux): maps by information content, NEVER byte value —
  Derived (byte 1) → P4/P5, never P1; fail-closed demotion floor to P2.

P1 gates (tests/p1_gates.rs): round-trip serde, is_fusable verified receipt,
RuFieldFusion::ingest accept + infer runs, privacy-safety (Derived never P1),
full §3.3 table, fail-closed demotion, determinism, no-fabricated-position.
15 tests pass (5 unit + 9 integration + 1 doc), 0 failed.

Honesty: P1 plumbing (tested conversion + safe privacy mapping), NOT wired
into the live server (P3) and NOT an accuracy claim.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-262): mark P1 implemented + CI submodules:recursive + CHANGELOG/CLAUDE

- ADR-262 Status → "Proposed — P1 implemented"; add §0.1 Implementation
  status (the bridge crate + the five P1 gates that pass; defers the
  provenance-carrier reuse, P3 live wiring, and P4 multi-modality).
- ci.yml: add `submodules: recursive` to the rust-tests checkout so the new
  crate's `vendor/rufield` path-deps resolve in CI (they fail otherwise even
  though the workspace build passes locally with the submodule present).
- CHANGELOG [Unreleased]: P1 bridge entry (kept alongside the upstream
  ADR-262 research entry).
- CLAUDE.md: crate table row for `wifi-densepose-rufield`.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-14 12:46:58 -04:00
rUv faca0530de docs(adr-262): RuField↔RuView integration design (Proposed) (#1069)
Researched integration ADR: thin wifi-densepose-rufield bridge crate
(rvcsi pattern), live SensingServerAdapter emitting signed FieldEvents,
vertical fusion composition (ruvsense within-WiFi → rufield cross-modal),
and ONE canonical privacy/provenance model (RuView effective_class →
RuField P0-P5 at egress; reuse cog-ha-matter SHA-256+Ed25519 receipt).
Key finding: RuView has 2 privacy enums + 3 witness mechanisms; the
Derived(byte=1)<Anonymous(byte=2)-but-carries-identity trap means the
bridge must map by information content, not byte value. Plumbing
architecture, not accuracy (real-CSI is unlabeled replay today).

Co-authored-by: ruv <ruvnet@gmail.com>
2026-06-14 12:03:16 -04:00
rUv 6f6c867629 feat(rufield): CsiReplayAdapter — first real WiFi-CSI adapter (submodule bump) (#1068)
Bumps vendor/rufield to include CsiReplayAdapter: RuField now ingests real
captured WiFi CSI (.csi.jsonl) → FieldTensor → CSI-variance motion/presence
proxy → signed FieldEvents → fusion. Measured on 199 real frames: 182 fused
inferences (115 breathing, 67 person_present) from real signal. Replay-from-file,
unlabeled (proxy not validated accuracy) — live streaming + labeled accuracy
remain roadmap; mmWave/thermal stay synthetic.

Co-authored-by: ruv <ruvnet@gmail.com>
2026-06-14 11:45:50 -04:00
rUv 95a5ecc746 feat(rufield): rufield-viewer dashboard — completes ADR-260 §27.9 (#1067)
Bumps the vendor/rufield submodule to include the new rufield-viewer crate
(Axum + vanilla JS read-only dashboard streaming the deterministic
SyntheticSim→fusion camera-free room-intelligence demo: live room state,
P0–P5 privacy-badged event log, fusion graph, signed-receipt viewer, behind
a permanent SYNTHETIC banner). All ADR-260 §27 criteria 1–10 now PASS.
Read-only demo viewer, not device management (real-adapter milestone later).
rufield repo now 7 crates / 72 tests.

Co-authored-by: ruv <ruvnet@gmail.com>
2026-06-14 11:10:02 -04:00
rUv 1f05456588 feat(ADR-261 M2): multi-bit + large-N ANN scaling study — measured, no crossover (refutes M1 prediction) (#1066)
* feat(ADR-261): multi-bit (b∈{1,2,4}) quantized HNSW traversal + scaling harness

Generalize the SymphonyQG-style quantized-traversal HNSW from 1-bit Hamming to a
b-bit-per-dimension code (b ∈ {1,2,4}), mirroring ADR-156 §10's multi-bit RaBitQ
scheme (rotate via FHT Pass-2, uniform mid-rise scalar quantizer over [-3,3],
ranked by per-dim L1). b=1 is byte-for-byte the original construction (codes in
{0,1} ⇒ L1 == Hamming), pinned by one_bit_build_bits_matches_legacy_build.
Bytes/node scales linearly: 128-d → 16/32/64 B for b=1/2/4.

- hnsw_quantized.rs: QuantizedHnswIndex::build_bits(...,bits,...), bits()/
  bytes_per_node() accessors, code-L1 greedy+beam traversal. build(...) kept as
  the b=1 backward-compatible entry point. +4 tests (multi-bit recall regression,
  bits clamp, bytes/node, legacy parity).
- ann_measure.rs: build_indices_bits / build_quant_bits / run_scaling_study +
  best_float_op / best_quant_op; scaling_report (#[ignore], --release) and a
  CI-safe scaling_study_small_is_consistent.
- ann_bench.rs: 2-bit and 4-bit quant criterion benches over the shared graph.

ruvector lib 151 → 156 passed, 0 failed, 1 ignored (scaling_report).

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-261): record M2 multi-bit scaling study — measured, no crossover (refutes M1 prediction)

Multi-bit (b∈{1,2,4}) quantized HNSW traversal + N∈{10k,100k,250k} scaling study,
measured on this box. No crossover at any (N,b): at 10k more bits help (ratio
0.19→0.48×, b≥2 reaches 0.90 recall) but quant stays slower than float HNSW at
equal recall; at 100k/250k quant recall collapses (b=4: 1.0→0.788→0.624, never
≥0.90) while float holds ≥0.92. The predicted large-N crossover moved the wrong
way. Published negative with the mechanism explained. ADR-261 §11.

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: ruv <ruvnet@gmail.com>
2026-06-14 10:31:00 -04:00
18 changed files with 1683 additions and 90 deletions
+5
@@ -82,6 +82,11 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
# ADR-262 P1: `wifi-densepose-rufield` path-deps the `vendor/rufield`
# submodule. Without a recursive checkout the workspace build fails to
# resolve those path deps in CI even though it passes locally.
with:
submodules: recursive
# `wifi-densepose-desktop` is a Tauri v2 app — `glib-sys`, `gtk-sys`,
# `webkit2gtk-sys`, etc. need the Linux dev libraries via pkg-config or the
+5
@@ -8,7 +8,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
- **ADR-262 P1 — `wifi-densepose-rufield` anti-corruption bridge: RuView WiFi-CSI sensing → signed RuField `FieldEvent`s.** A new v2 workspace member (the *single coupling point* between RuView and the standalone RuField MFS spec, ADR-262 §5.4) that **path-deps** the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion` — pure-Rust, `--no-default-features`-buildable: serde/sha2/ed25519/toml only, no tch/openblas/ndarray/candle) and **no** RuView internal crate. The bridge takes owned primitives — `SensingSnapshot` mirrors the `/ws/sensing` `SensingUpdate` (features + classification + signal_field) joined with the `TrustedOutput` trust state (`trust_class`/`demoted`/`identity_bound`) — and `snapshot_to_field_event()` emits one **signed** `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`): a real `FieldTensor` from the feature scalars with the real `timestamp_ns`; an `Observation` whose `range_m`/`motion_vector`/`space_cell` are derived from the strongest **signal-field peak** when present (else `None` — coordinates are **never fabricated**, per the `field_localize` caveat) and `confidence` from the classification; a real `ProvenanceRef` (sha256 over the tensor bytes, `synthetic=false`) **ed25519-signed** so `rufield_provenance::is_fusable` passes. **The §3.3 privacy mapping is the critical correctness item**, implemented as `map_privacy()` mapping RuView's class onto RuField P0–P5 **by information content, NEVER by byte value** and **fail-closed**: RuView `Derived` (byte `1`, which sorts *below* `Anonymous` byte `2`) carries an identity embedding → maps to **P4** (or **P5** if identity-bound), **never P1** (the single most dangerous mapping mistake); `Raw → P0`, `Anonymous → P2`, `Restricted → P2`; a governed-engine `demoted` cycle floors the egress class to ≥ P2 with raw suppressed.
**P1 acceptance gates (15 tests / 0 failed — 5 unit + 9 integration + 1 doc):** round-trip (`SensingSnapshot → FieldEvent →` serde `→` equal), `is_fusable` (verified ed25519 receipt), `RuFieldFusion::ingest` accept + `infer()` runs, **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy`: `Derived → P4/P5`, never P1; a table test over every RuView class; fail-closed demotion), and determinism (same snapshot + same signer seed → byte-identical event). **Honest scope:** this is **P1 plumbing** — a tested conversion + a safe privacy mapping. It is **not** wired into the live server (that is P3) and makes **no accuracy claim** (RuField v0.1 is synthetic; RuView's single-link CSI carries its own caveats). CI: the `rust-tests` workflow checkout gains `submodules: recursive` so the path-deps resolve. Python deterministic proof unchanged (off the signal proof path).
- **ADR-262 (Proposed): RuField MFS ↔ RuView integration — a live `SensingServerAdapter`, a privacy/provenance bridge, MAPPED not papered-over.** Researched integration design for wiring RuField into RuView. Recommends: a thin **`wifi-densepose-rufield` bridge crate** (anti-corruption layer, path-deps on the `vendor/rufield` submodule — the `vendor/rvcsi` pattern, since rufield crates are unpublished); a **live `SensingServerAdapter`** that taps the real `SensingUpdate` emit site joined with `TrustedOutput` trust state and emits one signed `FieldEvent`/cycle (the file-based `CsiReplayAdapter` stays for offline replay); **vertical fusion composition** (ruvsense fuses *within* WiFi → one `wifi_csi` event → rufield-fusion graph fuses *across* modalities above it); and **one canonical privacy/provenance model** (RuView `effective_class` is source-of-truth, mapped to RuField P0–P5 at egress; reuse the existing `cog-ha-matter` SHA-256+Ed25519 chain for the `ProvenanceReceipt`). **Key honest finding:** RuView has **two privacy enums + three witness mechanisms across two hash algorithms** that do not map 1:1 onto P0–P5, and a real trap — RuView's `Derived` privacy byte (`1`) sorts *below* `Anonymous` (`2`) yet carries identity embeddings, so the bridge must map by **information content** (`Derived → P4/P5`), never by byte value, or it would leak identity as low-privacy P1. 4 independently-shippable phases, each with a test gate (round-trip / `is_fusable` / privacy-monotonicity / ed25519-verify). Honest scope: this is **plumbing architecture, not accuracy** — RuField v0.1 is synthetic and RuView's only real-CSI path is unlabeled replay; the ADR claims only architecture, gated by round-trip/monotonicity/signature tests.
- **RuField `CsiReplayAdapter` — first real (non-synthetic) WiFi-CSI adapter (ADR-260 §17).** RuField now ingests **real captured WiFi CSI** instead of only the synthetic simulator. New `rufield-adapters::csi_replay` parses RuView's `.csi.jsonl` recording format (`{timestamp, subcarriers[]}`), normalizes each frame to a `FieldTensor` (`WifiCsi`, real amplitudes + real `timestamp_ns`), establishes a per-subcarrier Welford **empty-room baseline** via `calibrate()`, derives a **physically-grounded CSI-variance motion/presence proxy** (normalized MAD vs baseline → P2 motion/presence, else P1), and emits `FieldEvent`s with a **real sha256 + ed25519 provenance receipt** (`synthetic=false`). **Measured on 199 real captured frames:** 184 presence-proxy / 69 motion-proxy → fed through `RuFieldFusion` → **182 fused inferences (115 breathing, 67 person_present) from real signal.** 12 tests (9 unit + 3 integration over real-CSI fixtures), deterministic (byte-identical stream per file). **Honest caveats (stated everywhere):** it's **replay from file, not live hardware**; recordings are **unlabeled**, so the motion/presence output is a **proxy, NOT validated accuracy** (no pose, no accuracy numbers); live streaming + labeled validation remain roadmap; mmWave/thermal stay synthetic. The win is "RuField ingests real WiFi CSI and produces fused events from it." [`ruvnet/rufield`](https://github.com/ruvnet/rufield) `crates/rufield-adapters`; `vendor/rufield` submodule bumped.
- **RuField `rufield-viewer` web dashboard — completes ADR-260 §27.9 (all §27 criteria 1–10 now PASS).** A read-only Axum + vanilla-JS dashboard (no build step — `cargo run -p rufield-viewer`) that streams the deterministic SyntheticSim→fusion camera-free room-intelligence demo: live room-state inferences with confidence, a scrolling event log where every event carries its modality + a colour-coded **P0–P5 privacy badge**, the fusion graph (supporting=green / contradicting=red per inference), and a click-to-open **provenance-receipt modal** (sha256 + ed25519 signer + verified ✓ / fusable ✓) — behind a permanent, undismissable `SYNTHETIC — simulated sensors, no hardware` banner. Endpoints `/` · `/app.js` · `/health` · `/api/run` (full deterministic JSON) · `/events` (SSE). 12 new tests. Honest scope: a read-only SYNTHETIC demo viewer, **not** a device-management console — fleet/real-adapter management is a separate later milestone. Lives in [`ruvnet/rufield`](https://github.com/ruvnet/rufield) (`crates/rufield-viewer`, repo now 7 crates / 72 tests); `vendor/rufield` submodule bumped to include it.
- **ADR-261: RuVector graph-ANN index — a real HNSW baseline + a SymphonyQG-style quantized variant, MEASURED (honest negative).** Closes the [ADR-156 §5 #1](docs/adr/ADR-156-ruvector-fusion-beyond-sota.md) gap: the SymphonyQG (SIGMOD 2025) **3.5–17× QPS-over-HNSW** claim was CLAIMED-only because **no HNSW baseline existed to compare against**. This adds one. New pure-Rust, `--no-default-features`-buildable modules in `wifi-densepose-ruvector`: `hnsw.rs` (a correct float HNSW — Malkov & Yashunin: multi-layer NSW graph, `ef_construction`/`ef_search`, Algorithm-4 neighbour selection, **seeded-deterministic** level assignment via SplitMix64, L2 + cosine, full degenerate-case guards), `hnsw_quantized.rs` (the SymphonyQG-style variant — the **same** graph traversed by a cheap **1-bit Hamming** score over the RaBitQ Pass-2 rotated sign code, then **exact-float rerank**), `ann_measure.rs` + `benches/ann_bench.rs` (one shared deterministic planted-cluster fixture; the `ann_bench_report` test is the source of truth). **MEASURED (dim=128, N=10k, K=10, `--release`):** float HNSW = **~25× QPS over linear scan at recall ≥0.99** (the baseline this gap needed; recall@10 correctness gate ≥0.95 holds, L2 + cosine). **Honest negative:** the 1-bit quantized traversal is **too coarse to beat float HNSW at equal recall at this scale** — its best recall is **0.738**, never reaching the ≥0.90 equal-recall point, so there is **no QPS win** over float HNSW; the 3.5–17× is **not reproduced** by our 1-bit construction here. The recall gate also **caught a real index-out-of-bounds bug** in the insert path (disclosed in ADR-261 §4). Caveat: this is **our** HNSW + **our** 1-bit quant, not SymphonyQG's exact system — it tests the *direction* of the claim, with the expected crossover at large N + a multi-bit traversal code. **We did not tune to manufacture a speedup.** +20 tests (ruvector lib 131→151, 0 failed). ADR-156 §5 #1 / §8 backlog: CLAIMED → **MEASURED-direction-tested**.
Python deterministic proof unchanged (off the signal proof path).
- **ADR-261 Milestone-2: multi-bit quantized HNSW traversal + large-N scaling study — MEASURED (honest negative).** Extends ADR-261's quantized index from 1-bit to **`b`-bit-per-dimension** (`b ∈ {1,2,4}`, 16/32/64 B/node) over the Pass-2 rotated coordinates, and runs a deterministic scaling study (N ∈ {10k, 100k, 250k}) to test M1's *prediction* of a large-N crossover. **Result: no crossover at any measured (N, b), and the trend refutes the prediction.** At N=10k more bits lift the equal-recall QPS ratio (0.19×→0.46×→0.48×) and let b≥2 reach the 0.90 recall bar 1-bit missed — but quant stays slower than float HNSW at equal recall; at N=100k/250k quant recall *collapses* (b=4: 1.000→0.788→0.624, never ≥0.90) while float holds ≥0.92 (denser graph → low-bit codes can't separate near-neighbours, beam goes off-path faster than the float-distance saving repays). Caveat: our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph — refutes the *direction* at ≤250k, not their million-scale numbers. ruvector lib **151→156** (+5 tests; `scaling_report` `#[ignore]` produced the table). A published negative with the mechanism explained. ADR-261 §11.
- **ADR-260: RuField MFS — the open specification for camera-free multimodal field sensing.** A common event / tensor / calibration / privacy / provenance model that sits *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and future quantum sensors (each modality emits a normalized `FieldEvent`/`FieldTensor`/`FusionGraph`/`PrivacyClass`/`ProvenanceReceipt`). Published as a **standalone repo** [`ruvnet/rufield`](https://github.com/ruvnet/rufield) and vendored here as the `vendor/rufield` submodule (the `vendor/rvcsi` pattern — not a `v2/` workspace member). The v0.1 reference stack is a self-contained 6-crate Rust workspace (`rufield-core`, `-provenance` [sha256 + ed25519], `-privacy` [P0–P5 guard], `-adapters` [deterministic `SyntheticSim` across wifi_csi/mmwave_radar/infrared_thermal], `-fusion` [graph + TOML weighted-Bayes rules → 7 room-state inferences], `-bench` [deterministic runner + the §31 acceptance test]). **60 tests / 0 failed, clippy-clean.** §27 acceptance criteria 1–8 and 10 PASS; the live dashboard (9) is deferred. **All benchmark metrics are SYNTHETIC** (scored against the simulator's own ground truth — presence/breathing/bed_exit/room_transition F1 = 1.000, nocturnal_scratch 0.923 reported honestly, p95 latency ~0.01 ms, provenance coverage 100%, 0 privacy violations) — they prove the pipeline recovers known truth, **not** field accuracy; real hardware adapters (ESP32 CSI, mmWave, thermal IR) are a documented roadmap item, none validated in v0.1. The Python deterministic proof is unchanged (rufield is off the signal-processing proof path).
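
The per-subcarrier Welford baseline and the deviation proxy mentioned in the `CsiReplayAdapter` entry above can be sketched roughly as follows. This is an illustration only, not the code in `rufield-adapters::csi_replay`: the type `Baseline`, its methods, and the choice to read "normalized MAD" as the mean absolute deviation from the baseline mean divided by the baseline standard deviation are all assumptions, and no decision threshold is shown because the source does not state one.

```rust
/// Hypothetical per-subcarrier running baseline (Welford's online algorithm).
struct Baseline {
    count: u64,
    mean: Vec<f64>, // running mean per subcarrier
    m2: Vec<f64>,   // running sum of squared deviations per subcarrier
}

impl Baseline {
    fn new(subcarriers: usize) -> Self {
        Baseline { count: 0, mean: vec![0.0; subcarriers], m2: vec![0.0; subcarriers] }
    }

    /// Fold one empty-room frame of amplitudes into the baseline.
    fn update(&mut self, frame: &[f64]) {
        assert_eq!(frame.len(), self.mean.len(), "subcarrier count mismatch");
        self.count += 1;
        let n = self.count as f64;
        for (i, &x) in frame.iter().enumerate() {
            let delta = x - self.mean[i];
            self.mean[i] += delta / n;
            // Uses the updated mean, which is what keeps Welford numerically stable.
            self.m2[i] += delta * (x - self.mean[i]);
        }
    }

    /// Sample standard deviation of subcarrier `i` (0.0 until two frames are seen).
    fn std_dev(&self, i: usize) -> f64 {
        if self.count < 2 {
            0.0
        } else {
            (self.m2[i] / (self.count - 1) as f64).sqrt()
        }
    }

    /// Assumed proxy: mean over subcarriers of |x - mean| / (std + eps).
    fn deviation_score(&self, frame: &[f64]) -> f64 {
        const EPS: f64 = 1e-9; // avoids division by zero on a flat baseline
        if frame.is_empty() {
            return 0.0;
        }
        let sum: f64 = frame
            .iter()
            .enumerate()
            .map(|(i, &x)| (x - self.mean[i]).abs() / (self.std_dev(i) + EPS))
            .sum();
        sum / frame.len() as f64
    }
}
```

A caller would feed calibration frames through `update` and then compare `deviation_score` of each live frame against a threshold of its own choosing; the score is a proxy for change relative to the empty room, not a validated accuracy measure.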
### Security
+2 -1
@@ -22,7 +22,8 @@ Dual codebase: Python v1 (`v1/`) and Rust port (`v2/`).
| `wifi-densepose-vitals` | ESP32 CSI-grade vital sign extraction (ADR-021) |
| `nvsim` | Deterministic NV-diamond magnetometer pipeline simulator (ADR-089) — standalone leaf, WASM-ready |
| `vendor/rvcsi` (submodule) | **rvCSI** — edge RF sensing runtime (ADR-095/096): 9 crates (`rvcsi-core`/`-dsp`/`-events`/`-adapter-file`/`-adapter-nexmon`/`-ruvector`/`-runtime`/`-node`/`-cli`). Lives in its own repo ([github.com/ruvnet/rvcsi](https://github.com/ruvnet/rvcsi)), vendored here under `vendor/rvcsi`, published to crates.io as `rvcsi-* 0.3.x` and to npm as `@ruv/rvcsi`. Not a `v2/` workspace member — depend on the published crates (or the submodule's `crates/rvcsi-*` paths). Normalized `CsiFrame`/`CsiWindow`/`CsiEvent` schema, validate-before-FFI, reusable DSP, typed confidence-scored events, the napi-c Nexmon shim (real nexmon_csi `.pcap` from a Raspberry Pi 5 / 4 / 3B+ — BCM43455c0), the napi-rs SDK, the `rvcsi` CLI, a Claude Code plugin. |
| `vendor/rufield` (submodule) | **RuField MFS** — the open spec for camera-free multimodal field sensing (ADR-260). A common `FieldEvent`/`FieldTensor`/`FusionGraph`/`PrivacyClass`/`ProvenanceReceipt` model *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and quantum sensors. Lives in its own repo ([github.com/ruvnet/rufield](https://github.com/ruvnet/rufield)), vendored here under `vendor/rufield`. Not a `v2/` workspace member. v0.1 reference stack = 6 crates (`rufield-core`/`-provenance`/`-privacy`/`-adapters`/`-fusion`/`-bench`), 60 tests/0 failed; all benchmark metrics are **SYNTHETIC** (simulator ground truth, no hardware — real adapters are roadmap). |
| `vendor/rufield` (submodule) | **RuField MFS** — the open spec for camera-free multimodal field sensing (ADR-260). A common `FieldEvent`/`FieldTensor`/`FusionGraph`/`PrivacyClass`/`ProvenanceReceipt` model *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and quantum sensors. Lives in its own repo ([github.com/ruvnet/rufield](https://github.com/ruvnet/rufield)), vendored here under `vendor/rufield`. Not a `v2/` workspace member. v0.1 reference stack = 7 crates (`rufield-core`/`-provenance`/`-privacy`/`-adapters`/`-fusion`/`-bench`/`-viewer`), 72 tests/0 failed; `rufield-viewer` is an Axum + vanilla-JS read-only dashboard (`cargo run -p rufield-viewer`) completing ADR-260 §27.9. The WiFi-CSI modality is now **real-replay-backed** via `CsiReplayAdapter` (ingests real captured `.csi.jsonl` → fused presence/breathing inferences; replay-from-file, unlabeled CSI-variance proxy, not validated accuracy); mmWave/thermal + all synthetic-bench F1 numbers remain **SYNTHETIC** (no live hardware — live streaming + labeled accuracy are roadmap). |
| `wifi-densepose-rufield` | ADR-262 P1 **anti-corruption bridge** — converts RuView WiFi-CSI sensing output (`SensingSnapshot` mirroring `SensingUpdate` + `TrustedOutput`, owned primitives, no dep on `wifi-densepose-sensing-server`) into **signed RuField `FieldEvent`s** (`Modality::WifiCsi`, real `timestamp_ns`, sha256 + ed25519 provenance, `synthetic=false`). The single coupling point between RuView and the standalone RuField MFS spec (§5.4); path-deps the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion`). **Critical §3.3 privacy mapping** (`map_privacy`): maps RuView class → RuField P0–P5 by **information content, never byte value**, fail-closed (`Derived → P4/P5`, never P1; `demoted` floors to ≥ P2). 15 tests / 0 failed (round-trip / `is_fusable` / fusion-ingest / privacy-safety / determinism). P1 plumbing — not wired into the live server (P3), no accuracy claim. |
| `ruview-swarm` | Drone swarm control system (ADR-148) — hierarchical-mesh topology, Raft consensus, MARL, CSI sensing payload, MAVLink/PX4 compat, Ruflo AI-agent integration |
### RuvSense Modules (`signal/src/ruvsense/`)
+3 -4
@@ -351,12 +351,11 @@ Total test count across the workspace: **60 tests, 0 failed**.
| 6 | Benchmark runner produces deterministic reports | **PASS** — identical report across runs (latency is the only wall-clock field) |
| 7 | Raw waveform storage disabled by default | **PASS** — P0 network transmission denied by default policy |
| 8 | P4 inference requires consent policy approval | **PASS** — P4 without consent → RequiresConsent; breathing/scratch rules carry `requires_consent = true` |
| 9 | Dashboard shows live camera-free room intelligence | **DEFERRED** — no `rufield-viewer` dashboard in v0.1; the benchmark + `room_intelligence` example provide a CLI view. Follow-up. |
| 9 | Dashboard shows live camera-free room intelligence | **PASS** — `rufield-viewer` (Axum + vanilla JS) streams the deterministic SyntheticSim→fusion demo: live room state, privacy-badged (P0–P5) event log, fusion graph, click-to-open signed-receipt modal, behind a permanent `SYNTHETIC — simulated sensors, no hardware` banner. `cargo run -p rufield-viewer`. Read-only demo viewer (not a device-management console — that's the real-adapter milestone). |
| 10 | Spec readable for external implementers | **PASS** — ADR-260 + detailed standalone README with compiling usage examples |
**Decision:** §27 criteria 1–8 and 10 PASS; criterion 9 (live dashboard) is
**deferred** to a follow-up. Per the acceptance rule (1–8, 10 pass; 9 may be
deferred), Status is set to **Accepted — v0.1 reference stack**.
**Decision:** **all §27 criteria 1–10 PASS** (criterion 9, the live dashboard,
was completed by `rufield-viewer`). Status is **Accepted — v0.1 reference stack**.
### Deterministic benchmark report (SYNTHETIC, seed = 2026)
+31 -3
@@ -139,7 +139,7 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie
## 8. Validation
- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **151 passed / 0 failed** (was 131; +20 new tests: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`).
- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **156 passed / 0 failed, 1 ignored** (M1 added 20: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`; M2 added 5 multi-bit/scaling tests; `scaling_report` is the `#[ignore]` measurement that produced the §11 table).
- **`cargo test --workspace --no-default-features`** — GREEN (see §10 for the count).
- **Correctness gate verified to bite:** the recall@10 gate **panicked** on the first (buggy) insert path (§4); after the fix it passes at 0.99+ recall (L2 and cosine).
- **`cargo test -p wifi-densepose-ruvector --no-default-features --release ann_bench_report -- --nocapture`** — prints the §6 table; the numbers above are copied verbatim from that run.
@@ -154,10 +154,13 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie
**Negative / honest.** The 1-bit quantized variant is **not** an equal-recall QPS win at our scale; it is shipped as a measured experiment with a clearly-stated ceiling, not as a recommended default. Anyone reaching for it must read §7.
**Resolved by Milestone-2 (§11, MEASURED — no longer deferred).**
- **Multi-bit traversal score** — implemented (`b ∈ {1,2,4}` bits/dim over the Pass-2 rotated coordinates) and measured. It *does* lift quantized recall (at N=10k, b=4 reaches the 0.90 equal-recall regime where 1-bit could not), but still does not beat float HNSW QPS.
- **Large-N crossover measurement** — measured at N ∈ {10k, 100k, 250k}. **The predicted large-N crossover did NOT materialize — it moved the wrong way** (quant recall *collapses* as N grows). See §11.
**Deferred (not silently dropped).**
- **Multi-bit / RaBitQ-estimator traversal score.** Replace 1-bit Hamming traversal with a ≤4-bit code or the `estimator.rs` unbiased rescale (ADR-156 §10/§11) — the lever most likely to lift quantized recall to the equal-recall regime.
- **Large-N crossover measurement.** Re-run §6 at N=100k–1M (`ANN_BENCH_N`) to find where quantization's per-node saving starts to dominate.
- **Wiring HNSW into the live re-ID path** (AETHER hot-cache / sketch prefilter) behind a flag.
- **N ≥ 1M + SymphonyQG's exact RaBitQ-fused construction** — our impl refutes the *direction* at ≤250k; a true 1:1 reproduction at million-scale with their fused codes remains a separate, larger build.
---
@@ -170,3 +173,28 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie
- `lib.rs` — `pub mod hnsw / hnsw_quantized / ann_measure`; re-export `HnswIndex`, `HnswParams`, `Metric`, `QuantizedHnswIndex`.
- `ADR-156-ruvector-fusion-beyond-sota.md` §5 #1 + §8 backlog — SymphonyQG regraded **CLAIMED → MEASURED-direction-tested (refuted at N=10k for our 1-bit construction)**, pointing here.
- `CHANGELOG.md` — `[Unreleased]` entry.
---
## 11. Milestone-2 — multi-bit traversal + large-N scaling study (MEASURED)
M1 (§7) refuted the SymphonyQG direction at N=10k with a 1-bit code, and *predicted* a crossover at "large N + a higher-bit code." M2 builds both levers and measures them — so the prediction is tested, not assumed.
**Built:** `hnsw_quantized.rs` generalized from 1-bit to a **`b`-bit-per-dimension** code (`b ∈ {1,2,4}`, a mid-rise quantizer over the same `RANGE=3.0` rotated coordinates as ADR-156 §10's `measure_multibit`); `ann_measure.rs` gained `run_scaling_study` / `best_float_op` / `best_quant_op` + a deterministic `scaling_report` (`#[ignore]`, `--release`) and a CI-safe `scaling_study_small_is_consistent`. Memory: **16 / 32 / 64 bytes/node** for b = 1 / 2 / 4.
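
As a rough illustration of the `b`-bit mid-rise code described above, the following sketch quantizes already-rotated coordinates over `[-3, 3]` and scores two codes by per-dimension L1. It is not the code in `hnsw_quantized.rs`: the names `quantize`, `encode`, `code_l1`, and `bytes_per_node` are hypothetical, levels are stored one per byte for readability instead of being bit-packed, and the rotation step is assumed to have happened upstream.

```rust
/// Clamp range of the rotated coordinates (the `RANGE = 3.0` named in the text).
const RANGE: f32 = 3.0;

/// Quantize one coordinate to a level index in `0..2^bits`, for bits in {1, 2, 4}.
/// Mid-rise: [-RANGE, RANGE] is split into 2^bits equal cells, so zero lies on a
/// cell boundary and bits = 1 reduces to the sign bit.
fn quantize(x: f32, bits: u32) -> u8 {
    let levels = 1u32 << bits;
    let step = 2.0 * RANGE / levels as f32; // width of one cell
    let clamped = x.clamp(-RANGE, RANGE);
    let idx = ((clamped + RANGE) / step).floor() as u32;
    idx.min(levels - 1) as u8 // x == RANGE belongs to the top cell
}

/// Encode a whole vector, one level per dimension (unpacked for clarity).
fn encode(v: &[f32], bits: u32) -> Vec<u8> {
    v.iter().map(|&x| quantize(x, bits)).collect()
}

/// Per-dimension L1 distance between two codes of equal length.
fn code_l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}

/// Storage for a bit-packed code: dim * bits, rounded up to whole bytes.
fn bytes_per_node(dim: usize, bits: usize) -> usize {
    (dim * bits + 7) / 8
}
```

With `bits = 1` the levels are `{0, 1}`, so `code_l1` counts differing positions, which is the Hamming distance, and `bytes_per_node(128, b)` evaluates to 16 / 32 / 64 for `b` = 1 / 2 / 4, matching the memory figures above.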
**MEASURED** (dim=128, 64 clusters, 200 queries, K=10, L2, M=16, ef_construction=200, seeded, `--release`, this box; target recall ≥ 0.90):
| N | bits | B/node | quant best recall | float @ target | quant @ target | quant/float |
|--:|--:|--:|--:|--|--|--:|
| 10,000 | 1 | 16 | 1.000 | 23,155 QPS @ r=0.995 | 4,482 QPS @ r=0.965 | **0.19×** |
| 10,000 | 2 | 32 | 1.000 | 23,155 QPS @ r=0.995 | 10,658 QPS @ r=0.908 | **0.46×** |
| 10,000 | 4 | 64 | 1.000 | 23,155 QPS @ r=0.995 | 11,217 QPS @ r=0.946 | **0.48×** |
| 100,000 | 1 / 2 / 4 | 16/32/64 | 0.207 / 0.346 / 0.788 | 2,493 QPS @ r=0.938 | none (never ≥ 0.90) | — |
| 250,000 | 1 / 2 / 4 | 16/32/64 | 0.108 / 0.210 / 0.624 | 1,593 QPS @ r=0.925 | none | — |
**Verdict — NO crossover at any measured (N, b) up to 250k, and the trend REFUTES the large-N prediction:**
1. **Multi-bit helps at small N but not enough.** At N=10k, more bits lift the equal-recall QPS ratio 0.19× → 0.46× → 0.48× (and let b≥2 actually *reach* the 0.90 bar that 1-bit missed) — but quant stays **below 1.0×**, i.e. slower than float HNSW at equal recall.
2. **The predicted large-N crossover moved the wrong way.** As N grows 10k → 100k → 250k, quant's best achievable recall **collapses** (b=4: 1.000 → 0.788 → 0.624) and never reaches the 0.90 comparison point, while float HNSW holds ≥0.92. A denser graph packs near-neighbours whose low-bit codes are nearly identical, so the approximate score steers the beam off-path faster than the bigger float-distance savings can repay. The "crossover at millions" intuition is **not supported by our construction's trend** — if anything it diverges.
3. **Caveat unchanged:** this is our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph. The result refutes the *direction* for our construction at ≤250k; it does not disprove their published numbers on their system at their scale. A real 1:1 reproduction is the deferred million-scale build.
This is a **published negative with the mechanism explained** — the multi-bit + scaling levers were built and measured rather than asserted, and the honest outcome (no crossover, trend diverging) is recorded, not hidden.
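
The "quant/float" column can be read as a ratio of the fastest operating points that still meet the recall target. The sketch below shows one way to compute it; it assumes the selection rule is "highest QPS among points with recall at or above the target", which the text implies but does not spell out. `OpPoint`, `best_op`, and `equal_recall_ratio` are hypothetical names, not the signatures of `best_float_op` / `best_quant_op` in `ann_measure.rs`.

```rust
/// One measured operating point of an index (for example, one `ef_search` setting).
#[derive(Debug, Clone, Copy, PartialEq)]
struct OpPoint {
    recall: f64,
    qps: f64,
}

/// Fastest point whose recall meets the target, or `None` if no point does.
fn best_op(points: &[OpPoint], target_recall: f64) -> Option<OpPoint> {
    points
        .iter()
        .copied()
        .filter(|p| p.recall >= target_recall)
        .max_by(|a, b| a.qps.total_cmp(&b.qps))
}

/// Quantized QPS divided by float QPS at equal (target) recall.
/// `None` means one side never reached the target, so there is no ratio to report.
fn equal_recall_ratio(
    float_points: &[OpPoint],
    quant_points: &[OpPoint],
    target_recall: f64,
) -> Option<f64> {
    let float_best = best_op(float_points, target_recall)?;
    let quant_best = best_op(quant_points, target_recall)?;
    Some(quant_best.qps / float_best.qps)
}
```

Fed the N=10,000, b=1 row (float 23,155 QPS at r=0.995, quant 4,482 QPS at r=0.965), the ratio comes out near 0.19; for the 100k and 250k rows the quantized side has no point at recall ≥ 0.90, so the function returns `None`, which corresponds to the "none" cells in the table.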
@@ -0,0 +1,199 @@
# ADR-262: RuField MFS ↔ RuView integration — a live SensingServerAdapter, a privacy/provenance bridge, MAPPED not papered-over
| Field | Value |
|-------|-------|
| **Status** | Proposed — P1 implemented |
| **Date** | 2026-06-14 |
| **Deciders** | ruv |
| **Codebase target** | New thin bridge crate `wifi-densepose-rufield` (v2 workspace member); taps `wifi-densepose-sensing-server` emit path + `wifi-densepose-engine` `TrustedOutput`; depends on `vendor/rufield/crates/rufield-*` via path (the `vendor/rvcsi` pattern) |
| **Relates to** | ADR-260 (RuField MFS spec + v0.1 reference stack), ADR-261 (RuVector graph-ANN), ADR-141 (BFLD privacy control-plane / modes / attestation), ADR-137 (fusion-engine quality scoring / contradiction), ADR-032 (multistatic mesh security hardening / witness), ADR-116 (cog tamper-evident audit log — `cog-ha-matter` SHA-256+Ed25519), ADR-095/096 (`rvcsi` vendored-submodule precedent) |
| **Scope** | Decide **how** RuView's live WiFi-CSI sensing-server emits RuField `FieldEvent`s, **whether** RuView's ruvsense fusion composes with or is wrapped by rufield-fusion, and **how** to reconcile RuView's existing privacy/witness/provenance machinery with RuField's P0–P5 + ed25519 `ProvenanceReceipt`. The privacy/provenance reconciliation is the crux. |
---
## 0. PROOF discipline (this ADR's contract)
This project has been publicly accused of "AI slop." This ADR answers with **evidence, not adjectives** — every "RuView already does X" carries a `file:line`, and every external/SOTA claim is graded.
- **No accuracy is claimed.** RuField v0.1 is **SYNTHETIC** end-to-end by its own admission (ADR-260 "Honest statement", lines 386-390: *"Every metric here is simulator-based. No ESP32 CSI, mmWave, or thermal capture was used."*). RuView's only real-CSI rufield path today would be **replay of recorded `.csi.jsonl`, unlabeled**: `rufield-adapters::CsiReplayAdapter`'s own module doc (`vendor/rufield/crates/rufield-adapters/src/csi_replay.rs:19-31`) states it is *"real signal, replay from file not live hardware, unlabeled ⇒ proxy not validated accuracy."* This ADR therefore proposes **plumbing**, and grades its own claims as "ARCHITECTURE" (a design decision, testable by a round-trip/compile gate) vs "ACCURACY" (which it explicitly does not assert).
- The privacy/provenance section reports an **honest conflict**: RuView has **three** witness mechanisms across two hash algorithms, and **two** privacy enums, none of which map 1:1 onto RuField's P0-P5. We map them and recommend the cleanest reconciliation rather than asserting they already align.
- Each phase below ships an **independently testable gate** (a round-trip test, a privacy-monotonicity test, a signature-verify test) so the integration is provable, not aspirational.
---
## 0.1 Implementation status
**P1 (§4) is implemented** as the `wifi-densepose-rufield` bridge crate (`v2/crates/wifi-densepose-rufield/`, a new v2 workspace member; path-deps the `vendor/rufield` submodule per §5.4):
- **Input** — `SensingSnapshot` (owned primitives mirroring `SensingUpdate` features/classification/signal_field joined with the `TrustedOutput` `trust_class`/`demoted`/`identity_bound`); the bridge does **not** depend on `wifi-densepose-sensing-server` (anti-corruption layer).
- **Conversion** — `snapshot_to_field_event(&snap, &Signer)` emits a signed `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`, real `timestamp_ns`); position derived from the signal-field peak when present (never fabricated); real sha256 `ProvenanceRef` + ed25519 signature, `synthetic = false`.
- **Privacy (§3.3 crux)** — `map_privacy()` maps by information content, **fail-closed**: `Raw → P0`, `Derived → P4` (or `P5` if identity-bound — **never P1**), `Anonymous → P2`, `Restricted → P2`; a `demoted` cycle floors egress to ≥ P2.
- **Gates that pass** (`tests/p1_gates.rs`, 15 tests / 0 failed = 5 unit + 9 integration + 1 doc): round-trip (snapshot → `FieldEvent` → serde → equal); `is_fusable` (verified ed25519 receipt); `RuFieldFusion::ingest` accept + `infer()` runs; **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy`: `Derived → P4/P5`, never P1; full §3.3 table; fail-closed demotion); determinism (same snapshot + same signer seed → byte-identical event).
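
The privacy bullet above can be sketched in a few lines. This is a minimal, std-only illustration, not the crate's code: `RuViewClass` and `FieldClass` are hypothetical local mirrors of the two enums, and the `egress_class` signature shown here is an assumption (only the function name appears in the bridge source).

```rust
/// Hypothetical mirror of RuView's 4-variant class (byte values per §3.1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum RuViewClass {
    Raw = 0,
    Derived = 1,
    Anonymous = 2,
    Restricted = 3,
}

/// Hypothetical mirror of RuField's ladder; `Ord` follows declaration order,
/// so P0 < P1 < ... < P5.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FieldClass {
    P0,
    P1,
    P2,
    P3,
    P4,
    P5,
}

/// Map by information content, never by byte value: `Derived` (byte 1)
/// lands on P4/P5, not P1.
pub fn map_privacy(class: RuViewClass, identity_bound: bool) -> FieldClass {
    match class {
        RuViewClass::Raw => FieldClass::P0,
        RuViewClass::Derived if identity_bound => FieldClass::P5,
        RuViewClass::Derived => FieldClass::P4,
        RuViewClass::Anonymous | RuViewClass::Restricted => FieldClass::P2,
    }
}

/// Fail-closed floor: a demoted cycle never egresses below P2.
/// (Assumed signature, for illustration only.)
pub fn egress_class(class: RuViewClass, identity_bound: bool, demoted: bool) -> FieldClass {
    let mapped = map_privacy(class, identity_bound);
    if demoted {
        mapped.max(FieldClass::P2)
    } else {
        mapped
    }
}
```

With this shape, a demoted `Raw` cycle is lifted to `P2`, while a demoted `Derived` cycle stays at `P4`/`P5` because the floor only ever raises the class.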
**Deferred:** the §3.3 *provenance carrier* recommendation (reuse the `cog-ha-matter` SHA-256+Ed25519 chain + embed the BLAKE3 engine witness) is **not** in P1 — P1 takes a dedicated `Signer` param (the §8 open question 1 key-ownership decision is unresolved). P2's BLAKE3-embed, P3 (live `/ws/field` surfacing — the bridge is **not** wired into the running server yet), and P4 (multi-modality) remain future work. **No accuracy is claimed** (§0 / §6) — P1 is tested plumbing + a safe privacy mapping.
---
## 1. Context — two architectures, mapped
### 1.1 RuField MFS (ADR-260, `vendor/rufield/`)
A standalone pure-Rust Cargo workspace (serde, serde_json, toml, sha2, ed25519-dalek; **no tch/ndarray/candle**), vendored here as a git submodule (`git submodule status vendor/rufield` → `ba66e2e…`), **not** a v2 workspace member, exactly the `vendor/rvcsi` precedent (ADR-095/096). **Not published to crates.io**: every internal dep is a path dep with a nominal `version = "0.1.0"` (`vendor/rufield/Cargo.toml:31-37`); the `docs.rs/rufield-*` URLs are aspirational.
The data model (graded ARCHITECTURE, evidence read directly):
- **`FieldEvent`** (`vendor/rufield/crates/rufield-core/src/event.rs:96-112`): `spec_version, event_id, timestamp_ns: u64, sensor: SensorDescriptor, tensor: FieldTensor, observation: Observation, provenance: ProvenanceRef`.
- **`Observation`** (`event.rs:25-51`): `zone_id, space_cell, range_m, velocity_mps, motion_vector, confidence: f32, features: BTreeMap<String,f32>` (the derived P1 scalars the fusion engine actually reads), `labels: Vec<String>` (ground-truth, **never read by fusion**), `privacy_class: PrivacyClass`.
- **`PrivacyClass`** (`rufield-core/src/privacy.rs:8-25`): `P0..P5`, `#[serde(rename_all="UPPERCASE")]`, `Ord` by declaration order so **P0 < P1 < … < P5** — higher = more private; `level()->u8` returns 0..=5 (`privacy.rs:27-40`).
- **`ProvenanceRef`** (on-wire, `event.rs:73-93`): `raw_hash, firmware_hash` (`sha256:…`), `model_id, calibration_id, synthetic: bool`, optional `signature_hex` / `signer_pubkey_hex` (detached ed25519).
- The four traits (`rufield-core/src/traits.rs`): **`FieldAdapter`** (`:26-38`, `next_event() -> Result<Option<FieldEvent>>`), **`FieldEncoder`** (`:41-51`, **unimplemented in v0.1** — an open seam), **`FusionEngine`** (`:54-63`, `ingest(event)` + `infer(&query)`), **`PrivacyGuard`** (`:86-97`, `authorize(class, Destination, consent, identity_bound) -> PrivacyDecision{Allow|Deny|RequiresConsent}`).
- **`CsiReplayAdapter`** (`rufield-adapters/src/csi_replay.rs`): constructed from **already-loaded text** (`from_jsonl(&str)` `:249-251`; `from_jsonl_with(text, device_id, &[u8;32])` `:254-323`) — **not** a path/`Read`/`Iterator`. Deserializes `CsiFrameRecord { timestamp: f64 (seconds), subcarriers: Vec<f64> }` (`:74-80`), buffers all frames into a `Vec<CsiFrame>`, then streams via a cursor (`next_event` `:550-557`). Maps each frame → `FieldEvent` with `Modality::WifiCsi`, axes `[Frequency]`, a Welford motion proxy, observation `privacy_class = P2 if presence else P1` (`:439-443`), real `sha256` raw-hash, and a **real ed25519 signature** (`signer.sign_event` `:507-510`). `max_privacy_class = P2`.
- **`RuFieldFusion`** (`rufield-fusion/src/engine.rs:55-78`): `ingest()` **rejects non-fusable events on its first line**, `if !is_fusable(&event) { return Err(NotFusable) }` (`:212-215`), then reads `event.observation.features` into a bounded temporal window; `infer()` applies TOML rules (`WeightedBayes` noisy-OR / `TemporalWindow`) → `Vec<FieldInference>`. TOML rule struct: `inputs, method, feature, threshold, privacy_max, window_ms, requires_consent` (`rules.rs:17-35`).
- **`is_fusable`** (`rufield-provenance/src/lib.rs:179-184`): `synthetic == true` **OR** `verify_event().is_ok()` (the §11 invariant). Signing key is `ed25519_dalek 2.1`, deterministic from a 32-byte seed; raw hash is `sha256_hex` → `"sha256:<hex>"` (`:26-35`).
- **`DefaultPrivacyGuard`** (`rufield-privacy/src/lib.rs:38-110`): default `network_max = P2`, `allow_p0_network = false`. P5-no-identity → `Deny`; P4-no-consent → `RequiresConsent`; `EdgeLocal` → `Allow`; `Network` denies P0 and `class > network_max`.
- **`rufield-viewer`** (Axum 0.7): **self-contained, consumes `SyntheticSim` only** — all routes are read-only GET/SSE (`GET /api/run`, `GET /events`); **there is no ingest endpoint** (`vendor/rufield/crates/rufield-viewer/src/server.rs:63-72`). Feeding it a live stream requires adding a route.
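
The `DefaultPrivacyGuard` rules listed above can be read as a small decision function. The sketch below is an illustration under assumptions, not the `rufield-privacy` source: `GuardSketch` is a hypothetical stand-in, the enums are local mirrors, and the order in which the checks run is assumed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PrivacyClass {
    P0,
    P1,
    P2,
    P3,
    P4,
    P5,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Destination {
    EdgeLocal,
    Network,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivacyDecision {
    Allow,
    Deny,
    RequiresConsent,
}

/// Hypothetical stand-in for `DefaultPrivacyGuard`.
pub struct GuardSketch {
    pub network_max: PrivacyClass,
    pub allow_p0_network: bool,
}

impl Default for GuardSketch {
    fn default() -> Self {
        // Defaults quoted in §1.1: network_max = P2, allow_p0_network = false.
        Self { network_max: PrivacyClass::P2, allow_p0_network: false }
    }
}

impl GuardSketch {
    /// Check order is an assumption; the listed rules do not fix it.
    pub fn authorize(
        &self,
        class: PrivacyClass,
        dest: Destination,
        consent: bool,
        identity_bound: bool,
    ) -> PrivacyDecision {
        if class == PrivacyClass::P5 && !identity_bound {
            return PrivacyDecision::Deny;
        }
        if class == PrivacyClass::P4 && !consent {
            return PrivacyDecision::RequiresConsent;
        }
        match dest {
            Destination::EdgeLocal => PrivacyDecision::Allow,
            Destination::Network => {
                let raw_blocked = class == PrivacyClass::P0 && !self.allow_p0_network;
                if raw_blocked || class > self.network_max {
                    PrivacyDecision::Deny
                } else {
                    PrivacyDecision::Allow
                }
            }
        }
    }
}
```

Under these assumed rules and the default `network_max = P2`, a consented P4 event is still denied on the `Network` destination, which is worth keeping in mind when reading the `Derived → P4` mapping in §3.3.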
### 1.2 RuView (the integration target)
- **Sensing-server is Axum** (`v2/crates/wifi-densepose-sensing-server/src/main.rs:7498-7629`), two listeners (WS `:8765`, HTTP). CSI does **not** arrive over WS/HTTP; it arrives over **UDP** from ESP32 nodes (`use tokio::net::UdpSocket`, `main.rs:53`; `recv_from` loop `main.rs:5286-5299`), parsed by magic `0xC511_0001` → **`Esp32Frame`** (`types.rs:84-100`: `node_id, n_subcarriers, ppdu_type, amplitudes: Vec<f64>, phases: Vec<f64>`, rssi/freq/sequence) → pushed into per-node `NodeState.frame_history: VecDeque<Vec<f64>>` (`main.rs:441-497`).
- **`/ws/sensing` emits a `SensingUpdate`** (`main.rs:267-317`), broadcast over a `tokio::sync::broadcast` channel (`s.tx.send(json)` `main.rs:5938-5991`; the WS handler just subscribes and forwards, `main.rs:3021-3073`). `SensingUpdate` carries `nodes`, `features`, `classification {motion_level, presence, confidence}`, `signal_field`, `persons: Vec<PersonDetection>` (17 COCO keypoints + `position:[f64;3]` from `field_localize`, `main.rs:403-428`), pose, vitals. **`field_localize` (PR #1050) is a module, not a route** (`mod field_localize` `main.rs:17`; honesty caveat `field_localize.rs:16-27` — a single ESP32 link cannot resolve true room position, `position` is "strongest field peak").
- **ruvsense fusion is strictly WITHIN-WiFi-modality.** `MultistaticFuser::fuse(&[MultiBandCsiFrame]) -> FusedSensingFrame` (`v2/crates/wifi-densepose-signal/src/ruvsense/multistatic.rs:285-288`) attention-weights **multiple WiFi CSI nodes/viewpoints** (every input is ESP32 CSI; `multistatic_bridge.rs:50-62` builds the frames from `NodeState` amplitude with `HardwareType::Esp32S3`). `coherence_gate.rs:18-37` is the `GateDecision{Accept|PredictOnly|Reject|Recalibrate}`; `pose_tracker.rs:255-263` is the 17-keypoint Kalman tracker with 128-dim AETHER re-ID; `field_model.rs:301-308` does SVD room-eigenstructure perturbation extraction. **No camera/mmWave/audio enters this path** — ruvsense is a multi-link WiFi-CSI fuser.
- **The governed-trust cycle** runs in the separate **`wifi-densepose-engine`** crate. `StreamingEngine::process_cycle` (`v2/crates/wifi-densepose-engine/src/lib.rs:409`, `run_cycle` `:434-533`) produces **`TrustedOutput`** (`:82-112`): `semantic_id, quality: QualityScore, effective_class: PrivacyClass, demoted: bool, provenance: SemanticProvenance, witness: [u8;32]` (BLAKE3 over `evidence‖model‖calibration‖privacy_decision‖class`, `witness_of` `:598-613`), `recalibration_recommended`. **Crucially, none of this trust metadata is on the `SensingUpdate` wire today** — it is exposed only out-of-band on `GET /api/v1/status` (`main.rs:4173-4178`) and as a single live effect: `EngineBridge::suppress_raw_outputs()` strips per-node amplitude when `effective_class >= Restricted` (`engine_bridge.rs:240-243`, applied `main.rs:5908-5932`). The honest scope is stated in `engine_bridge.rs:14-27`: the governed engine runs *alongside* the bare fusion path; derived outputs are "published ungoverned."
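
The single live effect named in the last bullet (raw amplitude is stripped once `effective_class >= Restricted`) can be sketched as follows. This is an illustration only: `NodeView` and the free function `strip_raw_amplitudes` are hypothetical names, and the real `EngineBridge::suppress_raw_outputs()` may have a different signature.

```rust
/// Local mirror of RuView's 4-variant class; `Ord` follows declaration
/// order, matching the `#[repr(u8)]` byte order described in §3.1.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PrivacyClass {
    Raw,
    Derived,
    Anonymous,
    Restricted,
}

/// Hypothetical per-node view carried on the sensing update.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeView {
    pub node_id: u8,
    pub amplitude: Vec<f64>,
}

/// Strip per-node raw amplitude when the governed class is Restricted
/// or above; leave the nodes untouched otherwise.
pub fn strip_raw_amplitudes(nodes: &mut [NodeView], effective_class: PrivacyClass) {
    if effective_class >= PrivacyClass::Restricted {
        for node in nodes.iter_mut() {
            node.amplitude.clear();
        }
    }
}
```

The node entries themselves survive; only the raw amplitude payload is removed, so downstream consumers still see which nodes reported.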
---
## 2. Decision
1. **Build a thin RuView-side bridge crate `wifi-densepose-rufield`** (a new v2 workspace member) that depends on `vendor/rufield/crates/rufield-core` (+ `rufield-provenance`, `rufield-privacy`, `rufield-fusion`) **via path** — mirroring the `vendor/rvcsi` pattern. RuView does **not** depend on published rufield crates (there are none) and does **not** vendor rufield into the v2 workspace; rufield stays a standalone submodule and the bridge is the only coupling point (an anti-corruption layer).
2. **Emit `FieldEvent`s from the live server via an in-process `SensingServerAdapter`**, not by re-using the file-based `CsiReplayAdapter` on the hot path. The bridge taps the existing `SensingUpdate` build site and the `EngineBridge` trust state, joins them, and emits one signed `FieldEvent` per cycle on a new `tokio::broadcast` topic / optional `/ws/field` endpoint. `CsiReplayAdapter` is retained for the **offline/replay** path (recorded `.csi.jsonl` → events) because it already reads RuView's recording format (`recording.rs` writes `{session}.csi.jsonl`).
3. **Compose the two fusion engines vertically, do not merge them.** ruvsense stays the **WiFi-modality node** (multi-link fusion → one fused WiFi belief); rufield-fusion sits **above** it as the **cross-modality** graph. ruvsense's `FusedSensingFrame`/`TrustedOutput` becomes one `FieldEvent` (modality `wifi_csi`); rufield fuses it against future mmWave/thermal/`rvcsi` events. They do not conflict because ruvsense has no cross-modality fusion to collide with (§1.2 evidence).
4. **Reconcile privacy/provenance with ONE canonical model + a documented mapping** (§3, the crux): RuView's `effective_class` is the **source of truth**, mapped onto RuField `PrivacyClass` at the bridge; RuView's existing **`cog-ha-matter` SHA-256+Ed25519 witness chain** (already RuField's exact crypto) is adopted as the carrier for RuField `ProvenanceReceipt`, with the live BLAKE3 engine witness embedded as a hashed field. We do **not** maintain two parallel signed-receipt systems.
---
## 3. Privacy & provenance reconciliation (the crux)
This is the most important section. RuView and RuField genuinely **overlap and partially conflict**. We map both honestly.
### 3.1 What RuView actually has (implemented, with evidence)
- **TWO privacy enums, not one ladder.** `PrivacyClass`: **4 variants** `Raw=0, Derived=1, Anonymous=2, Restricted=3` (`v2/crates/wifi-densepose-bfld/src/lib.rs:103-116`, `#[repr(u8)]`, higher byte = more private, **non-monotonic in information**: `Derived=1` carries *more* identity than `Anonymous=2`). And `PrivacyMode`: **5 variants** `RawResearch, PrivateHome, EnterpriseAnonymous, CareWithConsent, StrictNoIdentity` (`bfld/src/privacy_mode.rs:18-31`), each mapping to a `PrivacyClass` via `target_class()` (`:63-70`; two modes collapse to `Anonymous`).
- **THREE witness mechanisms across TWO hash algorithms:**
  - BFLD `PrivacyAttestationProof`: **BLAKE3, unsigned**, attests mode/class continuity only; **built but NOT on the live path** (ADR-141 status line ~597; `bfld/src/privacy_mode.rs:121-148`).
  - Engine-cycle `TrustedOutput.witness: [u8;32]`: **BLAKE3, unsigned**, over the full trust decision; **LIVE every cycle** (`wifi-densepose-engine/src/lib.rs:598-613`).
  - `cog-ha-matter::WitnessChain`: **SHA-256 hash chain + Ed25519 signatures** (`v2/crates/cog-ha-matter/src/witness.rs:138-151`; `witness_signing.rs:39-76`), JSONL-persisted, `verify()` + `verify_signature()`. Implemented for ADR-116 (cog/Matter audit log); **standalone, not wired to BFLD/engine**. Its `WitnessHash` newtype doc explicitly anticipates a hash-algo migration (`witness.rs:37-41`).
- **No numeric trust score.** "Trust" in code = `base_coherence: f32∈[0,1]` + `penalized_coherence()` (`signal/.../fusion_quality.rs:99,122-126`) + a **boolean** `forces_privacy_demotion()` (`:116`). Demotion is monotonic and irreversible (`demote_one` clamps at Restricted, `engine/src/lib.rs:617-619`).
- **Structured provenance exists, but no signed "receipt" on the sensing path.** `SemanticProvenance { evidence, model_version, calibration_version, privacy_decision }` (`v2/crates/wifi-densepose-worldgraph/src/model.rs:137-147`) is attached to every belief and is the *input* to the BLAKE3 witness — but it is unsigned and not called a receipt.
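
To make the byte-order trap and the clamped demotion concrete, here is a small sketch. It assumes demotion steps one variant at a time in byte order and saturates at `Restricted`; the real `demote_one` (`engine/src/lib.rs:617-619`) is only described above as clamping at `Restricted`, so the stepping rule is an assumption.

```rust
/// Local mirror of the RuView class with the byte values quoted in §3.1.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
pub enum PrivacyClass {
    Raw = 0,
    Derived = 1,
    Anonymous = 2,
    Restricted = 3,
}

/// Assumed behaviour: one step toward `Restricted`, never back.
pub fn demote_one(class: PrivacyClass) -> PrivacyClass {
    match class {
        PrivacyClass::Raw => PrivacyClass::Derived,
        PrivacyClass::Derived => PrivacyClass::Anonymous,
        PrivacyClass::Anonymous | PrivacyClass::Restricted => PrivacyClass::Restricted,
    }
}

/// The byte value alone says `Derived` is "less private" than `Anonymous`,
/// which is exactly why a byte-for-byte mapping onto P0..P5 is unsafe.
pub fn byte_value(class: PrivacyClass) -> u8 {
    class as u8
}
```

Repeated demotion converges on `Restricted` and never lowers the byte value, while `byte_value(Derived) < byte_value(Anonymous)` shows that the byte order does not track identity content.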
### 3.2 Side-by-side, graded
| Dimension | RuView (file:line) | RuField | Alignment |
|---|---|---|---|
| Privacy ladder | `PrivacyClass` 4 (`bfld/lib.rs:103`) **or** `PrivacyMode` 5 (`bfld/privacy_mode.rs:18`) | `PrivacyClass` 6 (P0-P5, `rufield-core/privacy.rs:8`) | **PARTIAL→CONFLICT**: no clean 1:1; counts differ (4/5 vs 6); RuView class ordering non-monotonic |
| Demotion direction | higher = more private, irreversible (`engine/lib.rs:617`) | higher P# = more private, `Ord` by decl order (`privacy.rs:8-25`) | **STRONG** (same direction) |
| Provenance receipt | `SemanticProvenance` unsigned (`worldgraph/model.rs:137`) | `ProvenanceRef` + ed25519 (`event.rs:73`) | **PARTIAL** — structured but unsigned |
| Witness crypto (live path) | BLAKE3 `[u8;32]`, unsigned (`engine/lib.rs:598`) | sha256 + ed25519 (`rufield-provenance/lib.rs:26,135`) | **CONFLICT** (algo + signing) |
| Witness crypto (cog-ha-matter) | **SHA-256 + Ed25519** (`cog-ha-matter/witness.rs`, `witness_signing.rs`) | **sha256 + ed25519** | **STRONG** — RuField's exact crypto, already in-repo, but unwired and in another bounded context |
| Trust / confidence | `penalized_coherence: f32` + boolean demote (`fusion_quality.rs:122`) | `confidence: f32` per observation | **WEAK** — RuView has no graded trust object; confidence maps, demotion is binary |
### 3.3 The recommendation (the key call)
**Adopt ONE canonical model with a documented, lossy-but-monotonic mapping — do not run two parallel schemes.** Concretely:
1. **Privacy: RuView `effective_class` is the source of truth; the bridge maps it onto RuField `PrivacyClass`** at the egress boundary. The honest mapping (graded ARCHITECTURE — it is a *policy* decision, and it is **monotonicity-testable**, not an accuracy claim):
| RuView `PrivacyClass` | → RuField | Rationale |
|---|---|---|
| `Raw` (raw CSI amplitude) | `P0` | raw waveform |
| `Derived` (identity embedding, LAN-only) | `P4` *(or P5 if identity-bound)* | derived **identity** features ⇒ biometric/identity tier, **not** P1 — RuView's non-monotonic `Derived=1` is the trap; map by *information content*, not byte value |
| `Anonymous` (occupancy/aggregate) | `P2`/`P3` | occupancy → P2, room-count aggregate → P3 |
| `Restricted` (zeroized) | `P2`-capped, raw suppressed | matches `suppress_raw_outputs` (`engine_bridge.rs:240`) |
The bridge **must** map `Derived → P4/P5`, never P1, because RuView's `Derived` carries `identity_embedding` (§3.1) — this is the single most dangerous mapping mistake and gets a dedicated test (P2 in §4). `PrivacyMode` (5) is the better *operator-facing* join to RuField's 6 levels but the **class** is what gates egress, so the class mapping is canonical.
2. **Provenance: adopt `cog-ha-matter`'s SHA-256+Ed25519 chain as the carrier for RuField `ProvenanceReceipt`** — it is already RuField's exact crypto (graded STRONG above), already implemented, already tamper-evident. The bridge constructs the RuField `ProvenanceRef` by: `raw_hash = sha256(csi bytes)`, `model_id`/`calibration_id` from `SemanticProvenance`, and **embeds the live BLAKE3 engine witness `[u8;32]` as a hashed provenance field** (it is already computed every cycle — do not throw it away), then **signs with ed25519** so `is_fusable` passes for live (non-synthetic) events. We do **not** add a second BLAKE3-vs-ed25519 argument: BLAKE3 stays RuView's internal fast cycle-fingerprint; ed25519 is the *external* attestation RuField requires. One signer, one chain.
3. **Trust: map `penalized_coherence` → `Observation.confidence`; keep demotion binary.** RuView has no graded trust object to reconcile; the coherence scalar is the honest analog and the demotion boolean already drives `effective_class`.
This is a **bridge-with-canonical-source**, not "keep both forever." RuView owns the privacy decision (it has the live governed cycle); RuField owns the *external wire shape* (P0-P5 + signed receipt). The bridge is the one-directional translation, and it is the only place the two schemes meet.
---
## 4. Phased plan (each phase independently shippable + testable)
**P1 — `SensingServerAdapter` emitting `FieldEvent`s (ARCHITECTURE).**
New crate `wifi-densepose-rufield` with a `SensingServerAdapter` that consumes a `(SensingUpdate, TrustedOutput)` pair (tapped at `main.rs:5886`/`:5938`) and emits a signed `FieldEvent` (`Modality::WifiCsi`, axes `[Frequency]`, observation features from `SensingUpdate.features`, `confidence` from `penalized_coherence`). Offline path: keep `CsiReplayAdapter` for recorded `.csi.jsonl`. **Gate:** a round-trip test — emit a `FieldEvent` from a fixture `SensingUpdate`, assert it serializes, `is_fusable` passes (ed25519-signed), and `RuFieldFusion::ingest` accepts it. No server changes required beyond exposing the tap; the adapter is a library.
**P2 — privacy/provenance bridge (the crux, ARCHITECTURE).**
Implement the §3.3 mapping: `effective_class → PrivacyClass`, `cog-ha-matter` ed25519 signer for the receipt, BLAKE3 witness embedded. **Gates (three, all monotonicity/safety, not accuracy):** (a) `Derived → P4|P5` never P1 (the dangerous-mapping test); (b) privacy monotonicity — `demoted == true` ⇒ emitted `PrivacyClass >= P2` and raw suppressed; (c) signature round-trip — sign with the cog-ha-matter key, `rufield_provenance::verify_event` passes. This phase is shippable without P3 (events emitted on an internal topic, not yet on the public wire).
**P3 — surface in `/ws` + viewer (ARCHITECTURE).**
Add an opt-in `/ws/field` endpoint (or a `field_events` array on `SensingUpdate` behind a flag) carrying the signed `FieldEvent` + a privacy badge. Add an ingest route to `rufield-viewer` (it has none today — `server.rs:63-72`) so it can replay RuView's live feed instead of only `SyntheticSim`. **Gate:** a WS integration test asserting a connected client receives a privacy-badged, signature-verifiable `FieldEvent`; a viewer test asserting the new ingest route renders a live event. The `cognitum` appliance can speak RuField by consuming this endpoint (it already runs `ruview-vitals-worker`); deferred to its own ADR.
**P4 — fusion composition + multi-modality (ARCHITECTURE, optional).**
Wire a second modality (cheapest: an `rvcsi`-sourced event, or recorded mmWave) into `RuFieldFusion` alongside the WiFi event, proving cross-modality fusion above ruvsense. **Gate:** a fusion test with two modalities producing ≥1 cross-modal inference, with provenance coverage 100%.
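
The P4 gate relies on the `WeightedBayes` noisy-OR rule mentioned in §1.1. A minimal sketch of a weighted noisy-OR over per-modality beliefs follows; how rufield-fusion actually applies its weights is not stated in this ADR, so the form `1 - Π(1 - w_i · p_i)` and the function name `noisy_or` are assumptions.

```rust
/// Weighted noisy-OR over `(weight, probability)` pairs:
/// p = 1 - Π (1 - w_i * p_i), with both factors clamped to [0, 1].
/// An empty input yields 0.0 (no evidence, no belief).
pub fn noisy_or(inputs: &[(f32, f32)]) -> f32 {
    let miss = inputs.iter().fold(1.0_f32, |acc, &(w, p)| {
        acc * (1.0 - w.clamp(0.0, 1.0) * p.clamp(0.0, 1.0))
    });
    1.0 - miss
}
```

Under this form, adding a second modality with a positive weighted belief can only raise the fused probability, which is the property a two-modality gate would exercise.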
---
## 5. Decision matrix
### 5.1 Data-path emission (P1)
| Option | Latency | Reuse | Live-fit | Risk | Verdict |
|---|---|---|---|---|---|
| Re-use `CsiReplayAdapter` on hot path | poor (file buffer, `&str` ctor) | high | **bad** — it's a file-cursor, not a live source | low | **Reject for live** (keep for replay) |
| In-process `SensingServerAdapter` (tap `SensingUpdate`+`TrustedOutput`) | good | medium | **good** — taps the real emit + real trust state | low | **CHOSEN** |
| Server publishes `FieldEvent` on its own topic (no adapter trait) | good | low | good | medium (bypasses `FieldAdapter` contract) | Reject — loses the trait seam |
### 5.2 Fusion relationship (P3/P4)
| Option | Verdict |
|---|---|
| Merge ruvsense into rufield-fusion | **Reject** — different scopes; ruvsense is within-WiFi multi-link, rufield is cross-modality |
| rufield-fusion wraps ruvsense (vertical compose) | **CHOSEN** — ruvsense → one WiFi `FieldEvent` → rufield cross-modality graph |
| Run both as peers, reconcile after | Reject — duplicates fusion semantics, two contradiction models |
### 5.3 Privacy/provenance reconciliation (P2)
| Option | Verdict |
|---|---|
| (a) Map RuView classes onto RuField P0-P5, RuView canonical | **CHOSEN (privacy)**: `effective_class` is the live source of truth |
| (b) Adopt RuField ed25519 receipts as RuView's provenance | **CHOSEN (provenance)** — via the already-present `cog-ha-matter` SHA-256+Ed25519 chain |
| (c) Keep both schemes with a permanent bridge | **Reject** — two signed-receipt systems is the duplication we must not ship |
### 5.4 Dependency direction
| Option | Verdict |
|---|---|
| Depend on published rufield crates | **Reject** — not published (`vendor/rufield/Cargo.toml:31-37`) |
| Make rufield a v2 workspace member | **Reject** — breaks the standalone-spec/`rvcsi` precedent |
| Thin `wifi-densepose-rufield` bridge → path deps on submodule | **CHOSEN** — anti-corruption layer, single coupling point |
---
## 6. Security & honesty notes
- **No accuracy claim.** Live RuField events from RuView are derived from the same single-link CSI whose own caveats are on record (`field_localize.rs:16-27`); the offline path is unlabeled replay (`csi_replay.rs:19-31`). This ADR ships **plumbing with monotonicity/signature gates**, not validated F1.
- **The dangerous mapping is `Derived → P1`.** RuView's `Derived` byte value (1) is numerically below `Anonymous` (2) but carries identity (`bfld/lib.rs`); a naive byte-mapping would leak identity-bearing features as low-privacy P1. P2's gate (a) exists specifically to prevent this.
- **One signer, not two.** Adding a second ed25519 keypair alongside `cog-ha-matter`'s would create two roots of trust. The bridge reuses the cog-ha-matter signing key (`witness_signing.rs`).
- **`is_fusable` is a real gate, not decoration** (`rufield-provenance/lib.rs:179-184`): live events that fail to sign are rejected by `RuFieldFusion::ingest` — we must not paper over a signing failure with `synthetic = true` on a real event (that would be the §11 invariant violation the spec forbids).
- BLAKE3 stays internal; ed25519 is the external attestation. We do not relitigate RuView's BLAKE3 cycle-witness — it is embedded, not replaced.
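
The `is_fusable` gate described above can be sketched without any crypto dependency. In this illustration `ProvenanceSketch`, `IngestError`, and `stub_verify` are hypothetical; `stub_verify` only checks that a signature is present and is **not** real ed25519 verification, it merely stands in for `rufield_provenance::verify_event`.

```rust
/// Hypothetical, trimmed-down provenance record.
pub struct ProvenanceSketch {
    pub synthetic: bool,
    pub signature_hex: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum IngestError {
    NotFusable,
}

/// Stand-in verifier (NOT real ed25519): accepts any non-empty signature.
pub fn stub_verify(prov: &ProvenanceSketch) -> Result<(), String> {
    match &prov.signature_hex {
        Some(sig) if !sig.is_empty() => Ok(()),
        _ => Err("missing signature".to_string()),
    }
}

/// The invariant quoted in §1.1: synthetic OR verified.
pub fn is_fusable<V>(prov: &ProvenanceSketch, verify: V) -> bool
where
    V: Fn(&ProvenanceSketch) -> Result<(), String>,
{
    prov.synthetic || verify(prov).is_ok()
}

/// Reject non-fusable events before doing anything else.
pub fn ingest<V>(prov: &ProvenanceSketch, verify: V) -> Result<(), IngestError>
where
    V: Fn(&ProvenanceSketch) -> Result<(), String>,
{
    if !is_fusable(prov, verify) {
        return Err(IngestError::NotFusable);
    }
    Ok(())
}
```

A live event (`synthetic = false`) with no signature is rejected. The only way to get it through without signing would be to flip `synthetic` to `true`, which is precisely the shortcut this section rules out.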
## 7. Consequences
**Positive:** RuView becomes one honest adapter in the larger RuField ecosystem (ADR-260 goal §9) without forking its fusion or privacy engine; the three witness mechanisms get a single external attestation path; cross-modality fusion becomes possible above the existing WiFi fusion; the `cognitum` appliance gains a standard wire format. The bridge is the only coupling point, so rufield can evolve as a standalone spec.
**Negative:** a fourth crate to maintain; the privacy mapping is lossy (4/5 → 6) and must be kept honest by tests; reusing the `cog-ha-matter` key crosses a bounded-context boundary (cog/Matter ↔ sensing) that ADR-116 kept separate — that coupling needs review. The live trust metadata (`witness`, `effective_class`) is **currently decoupled** from `SensingUpdate` (§1.2), so P1 must do real join work, not a field read.
## 8. Open questions
1. **Signer ownership:** should the bridge reuse the `cog-ha-matter` Ed25519 key, or mint a dedicated RuView-sensing key with its own rotation? (Reuse couples bounded contexts; a new key adds a second root of trust.)
2. **`PrivacyMode` vs `PrivacyClass` as the canonical map target:** class gates egress (chosen), but the 5-mode ladder is the cleaner join to 6 levels — do we expose mode in the receipt too?
3. **Where does the BLAKE3 engine witness live in the RuField receipt** — a `firmware_hash`-style field, an extension field, or a `CalibrationReceipt.data_hash`? (RuField's `ProvenanceRef` has no spare slot; needs a spec extension or reuse of `model_id`.)
4. **Should `field_localize` positions ride in `Observation.space_cell`/`motion_vector`** given the explicit single-link caveat, or stay RuView-only until multi-node calibration lands?
5. **`rvcsi` relationship:** `rvcsi` has its own `CsiFrame`/`CsiWindow` and could implement `FieldAdapter` directly — should the second modality in P4 be `rvcsi`, making RuField the convergence point for *both* vendored sensing runtimes?
6. **Transport:** RuField ADR-260 §29 leaves default transport open (MQTT/NATS/WS/MCP). RuView is WS + UDP + broadcast; does `/ws/field` suffice, or does the appliance need MQTT to match the cog stack?
## 9. Recommendation
Proceed with P1+P2 behind a feature flag. They are independently shippable, carry real gates (round-trip, monotonicity, signature-verify), and require no change to RuView's fusion or privacy engine — only a tap and a translation. Defer P3/P4 and the appliance/transport questions to follow-up ADRs once the bridge round-trips on recorded `.csi.jsonl` and on one live cycle.
Generated
+48
@@ -7085,6 +7085,42 @@ dependencies = [
"smallvec",
]
[[package]]
name = "rufield-core"
version = "0.1.0"
dependencies = [
"serde",
"serde_json",
]
[[package]]
name = "rufield-fusion"
version = "0.1.0"
dependencies = [
"rufield-core",
"rufield-provenance",
"serde",
"toml 0.8.23",
]
[[package]]
name = "rufield-privacy"
version = "0.1.0"
dependencies = [
"rufield-core",
]
[[package]]
name = "rufield-provenance"
version = "0.1.0"
dependencies = [
"ed25519-dalek",
"rufield-core",
"serde",
"serde_json",
"sha2",
]
[[package]]
name = "rumqttc"
version = "0.24.0"
@@ -11045,6 +11081,18 @@ dependencies = [
"tower-http",
]
[[package]]
name = "wifi-densepose-rufield"
version = "0.3.0"
dependencies = [
"rufield-core",
"rufield-fusion",
"rufield-privacy",
"rufield-provenance",
"serde",
"serde_json",
]
[[package]]
name = "wifi-densepose-ruvector"
version = "0.3.2"
+5
@@ -72,6 +72,11 @@ members = [
"crates/homecore-assist", # ADR-133 — HOMECORE voice assistant + ruflo bridge
"crates/homecore-server", # iter-9 — HOMECORE integration binary (all 8 crates wired together)
"crates/ruview-swarm", # ADR-148 — drone swarm control system
# ADR-262 P1 — anti-corruption bridge converting RuView WiFi-CSI sensing
# output into signed RuField FieldEvents. Path-deps the `vendor/rufield`
# submodule crates (rufield-core/-provenance/-privacy/-fusion); single
# coupling point between RuView and the standalone RuField MFS spec.
"crates/wifi-densepose-rufield",
]
# ADR-040: WASM edge crate targets wasm32-unknown-unknown (no_std),
# excluded from workspace to avoid breaking `cargo test --workspace`.
@@ -0,0 +1,26 @@
[package]
name = "wifi-densepose-rufield"
version = "0.3.0"
edition = "2021"
description = "ADR-262 anti-corruption bridge: converts RuView WiFi-CSI sensing output into signed RuField FieldEvents (P0-P5 privacy mapping + ed25519 provenance)"
license.workspace = true
authors.workspace = true
repository.workspace = true
# ADR-262 §5.4: this crate is the single coupling point ("anti-corruption
# layer") between RuView and the standalone RuField MFS spec. It depends on the
# `vendor/rufield` submodule crates **via path** (the `vendor/rvcsi` pattern) —
# RuView does NOT depend on published rufield crates (there are none) and does
# NOT make rufield a v2 workspace member. The four crates below are pure-Rust
# (serde / serde_json / toml / sha2 / ed25519-dalek only — no tch / openblas /
# ndarray / candle), so they build under `--no-default-features`.
[dependencies]
rufield-core = { path = "../../../vendor/rufield/crates/rufield-core" }
rufield-provenance = { path = "../../../vendor/rufield/crates/rufield-provenance" }
rufield-privacy = { path = "../../../vendor/rufield/crates/rufield-privacy" }
rufield-fusion = { path = "../../../vendor/rufield/crates/rufield-fusion" }
serde = { workspace = true }
serde_json = { workspace = true }
[dev-dependencies]
serde_json = { workspace = true }
@@ -0,0 +1,206 @@
//! The conversion: `SensingSnapshot` → signed `FieldEvent` (ADR-262 P1).
//!
//! This is the in-process `SensingServerAdapter` core (ADR-262 §4 P1 / §5.1):
//! it consumes a `(SensingUpdate, TrustedOutput)` join — modelled here as a
//! [`SensingSnapshot`] of owned primitives — and emits one signed
//! [`FieldEvent`] (`Modality::WifiCsi`, axis `[Frequency]`) per cycle.
use crate::privacy::egress_class;
use crate::snapshot::{SensingSnapshot, SignalField};
use rufield_core::{
FieldAxis, FieldEvent, FieldTensor, Modality, Observation, PrivacyClass, ProvenanceRef,
SensorDescriptor,
};
use rufield_provenance::{sha256_hex, Signer};
use std::collections::BTreeMap;
/// Model id stamped on emitted events (ADR-262 — derived features come from
/// RuView's `/ws/sensing` pipeline, not a trained encoder).
const MODEL_ID: &str = "ruview_sensing_server_v1";
/// Firmware hash placeholder until the real ESP32 firmware image hash is wired
/// through (ADR-262 §8 open question 3 — the BLAKE3 engine witness slot). A
/// stable `sha256:` over the model id keeps it a real digest, not a fake.
fn firmware_hash() -> String {
sha256_hex(MODEL_ID.as_bytes())
}
/// Squash a non-negative power-like scalar into `[0, 1]` deterministically.
/// `x / (x + 1)` — monotone, no panics, no calibration claim.
fn squash(x: f64) -> f32 {
if !x.is_finite() || x <= 0.0 {
return 0.0;
}
(x / (x + 1.0)) as f32
}
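/// Build the `Observation.features` map the RuField fusion engine reads
/// (`rufield-fusion/engine.rs:217-228`: `motion_energy`, `breathing_band`,
/// `transient`, `presence`, `range_m`, plus `posture_height`). This bridge
/// populates the first four unconditionally and `range_m` only when a range
/// was derived; `posture_height` is not emitted.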
fn build_features(snap: &SensingSnapshot, range_m: Option<f32>) -> BTreeMap<String, f32> {
let f = &snap.features;
let mut m = BTreeMap::new();
m.insert("motion_energy".to_string(), squash(f.motion_band_power));
m.insert("breathing_band".to_string(), squash(f.breathing_band_power));
m.insert("transient".to_string(), squash(f.change_points as f64));
m.insert(
"presence".to_string(),
if snap.classification.presence { 1.0 } else { 0.0 },
);
if let Some(r) = range_m {
m.insert("range_m".to_string(), r);
}
m
}
/// Derive a range and a unit direction vector from the strongest
/// signal-field peak, if a field is present. The range is in grid-cell units
/// (a real readout, not calibrated metres) even though it is carried in
/// `range_m`. Returns `(range_m, motion_vector, space_cell)`, all `None` when
/// there is no field or no peak (we do NOT fabricate coordinates, per
/// ADR-262 §4 P1).
fn derive_position(
field: Option<&SignalField>,
) -> (Option<f32>, Option<[f32; 3]>, Option<[i32; 3]>) {
let Some(field) = field else {
return (None, None, None);
};
let Some(cell) = field.peak_cell() else {
return (None, None, None);
};
// Range from origin in grid-cell units (real readout, not calibrated
// metres — the honesty caveat from `field_localize.rs:16-27`).
let [x, y, z] = cell;
let range = ((x * x + y * y + z * z) as f32).sqrt();
let mag = if range > 0.0 { range } else { 1.0 };
let motion_vector = [x as f32 / mag, y as f32 / mag, z as f32 / mag];
(Some(range), Some(motion_vector), Some(cell))
}
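The range and direction computation above can be sketched in isolation as follows. This is a minimal sketch, not the crate's code: `range_and_direction` is a hypothetical name, and unlike the original it widens to `f32` before squaring (an assumption made here so that very large cell indices cannot overflow `i32`).

```rust
/// Hypothetical helper: Euclidean range (grid-cell units) and unit direction
/// from the grid origin to a peak cell.
fn range_and_direction(cell: [i32; 3]) -> (f32, [f32; 3]) {
    let [x, y, z] = cell;
    // Widen before squaring so the sum of squares is computed in f32.
    let (xf, yf, zf) = (x as f32, y as f32, z as f32);
    let range = (xf * xf + yf * yf + zf * zf).sqrt();
    // A peak at the origin has zero range; divide by 1.0 instead of 0.0 so
    // the direction is the zero vector rather than NaN.
    let mag = if range > 0.0 { range } else { 1.0 };
    (range, [xf / mag, yf / mag, zf / mag])
}
```

For the cell `[3, 4, 0]` this gives a range of `5.0` and a direction of `[0.6, 0.8, 0.0]`. The values stay in grid-cell units, matching the honesty caveat in the comment above.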
/// Stable, deterministic event id from `(node_id, timestamp_ns)`. No RNG, so
/// the same snapshot always yields the same id (required for the determinism
/// gate).
fn event_id(snap: &SensingSnapshot) -> String {
format!("ruview-{}-{}", snap.node_id, snap.timestamp_ns)
}
/// Convert a [`SensingSnapshot`] to a **signed** [`FieldEvent`] (ADR-262 P1).
///
/// 1. Builds a `FieldTensor` (`Modality::WifiCsi`, axis `[Frequency]`) whose
/// values are the RuView feature scalars, with the real `timestamp_ns`.
/// 2. Builds an `Observation` — `motion_vector`/`range_m`/`space_cell` derived
/// from the signal-field peak when present (else `None`; coordinates are
/// never fabricated), `confidence` from the classification, labels from
/// motion-level/presence.
/// 3. Stamps the §3.3 egress privacy class (information-content mapping with
/// the demotion floor) on both tensor and observation.
/// 4. Builds a real `ProvenanceRef` (sha256 raw hash over the JSON-serialized
/// tensor, `synthetic = false`) and **signs** the event with the supplied
/// ed25519 [`Signer`] so `rufield_provenance::is_fusable` passes.
///
/// Determinism: with no RNG anywhere and a deterministic ed25519 signer, the
/// same `snap` + same signer seed yields a byte-identical event.
#[must_use]
pub fn snapshot_to_field_event(snap: &SensingSnapshot, signer: &Signer) -> FieldEvent {
let class = egress_class(snap.trust_class, snap.identity_bound, snap.demoted);
let (range_m, motion_vector, space_cell) = derive_position(snap.signal_field.as_ref());
// ── 1. Tensor ──────────────────────────────────────────────────────────
// The frequency-domain feature scalars, in a stable order.
let f = &snap.features;
let values: Vec<f32> = vec![
f.mean_rssi as f32,
f.variance as f32,
f.motion_band_power as f32,
f.breathing_band_power as f32,
f.dominant_freq_hz as f32,
f.spectral_power as f32,
];
let confidence = (snap.classification.confidence as f32).clamp(0.0, 1.0);
let noise_floor = f.variance.max(0.0) as f32;
let calibration_id = format!("ruview_node_{}", snap.node_id);
// `FieldTensor::new` only errors on a shape/axis mismatch; our shape
// exactly matches `values.len()` and one axis, so this is infallible here.
let tensor = FieldTensor::new(
snap.timestamp_ns,
Modality::WifiCsi,
vec![FieldAxis::Frequency],
vec![values.len()],
values,
confidence,
noise_floor,
Some(calibration_id.clone()),
class,
)
.expect("feature tensor shape is well-formed by construction");
// ── 2. Observation ─────────────────────────────────────────────────────
let observation = Observation {
zone_id: Some(snap.node_id.clone()),
space_cell,
range_m,
velocity_mps: None,
motion_vector,
confidence,
features: build_features(snap, range_m),
labels: build_labels(snap),
privacy_class: class,
};
// ── 3. Provenance (real sha256 over the tensor bytes) ───────────────────
let raw_hash = sha256_hex(
&serde_json::to_vec(&tensor).expect("tensor serializes to JSON for hashing"),
);
let provenance = ProvenanceRef {
raw_hash,
firmware_hash: firmware_hash(),
model_id: MODEL_ID.to_string(),
calibration_id,
synthetic: false, // a real (non-synthetic) live/replay event
signature_hex: None,
signer_pubkey_hex: None,
};
let sensor = SensorDescriptor {
modality: "wifi_csi".to_string(),
vendor: "esp32".to_string(),
device_id: snap.node_id.clone(),
placement: "unknown".to_string(),
clock_domain: "local".to_string(),
};
let mut event = FieldEvent::new(
event_id(snap),
snap.timestamp_ns,
sensor,
tensor,
observation,
provenance,
);
// ── 4. Sign (ed25519) so `is_fusable` passes for this real event ────────
signer
.sign_event(&mut event)
.expect("ed25519 signing of a serializable event is infallible");
event
}
/// Labels from the classification. These are descriptive (`person_present`,
/// `motion_<level>`); the RuField fusion engine never reads labels
/// (`event.rs:45-48`), so this carries no identity.
fn build_labels(snap: &SensingSnapshot) -> Vec<String> {
let mut labels = Vec::new();
if snap.classification.presence {
labels.push("person_present".to_string());
}
labels.push(format!("motion_{}", snap.classification.motion_level));
labels
}
/// Convenience: the privacy class that *would* be stamped for a snapshot,
/// without building the whole event. Useful for egress badges (P3) and tests.
#[must_use]
pub fn snapshot_egress_class(snap: &SensingSnapshot) -> PrivacyClass {
egress_class(snap.trust_class, snap.identity_bound, snap.demoted)
}
@@ -0,0 +1,85 @@
//! # wifi-densepose-rufield
//!
//! ADR-262 **anti-corruption bridge**: converts RuView's live WiFi-CSI sensing
//! output into signed RuField [`FieldEvent`](rufield_core::FieldEvent)s.
//!
//! This crate is the **single coupling point** (ADR-262 §5.4) between RuView and
//! the standalone RuField MFS spec (`vendor/rufield`, ADR-260). It depends on
//! the four pure-Rust rufield crates **via path** — `rufield-core`,
//! `-provenance`, `-privacy`, `-fusion` — and on **no** RuView internal crate.
//! Inputs are owned primitives ([`SensingSnapshot`]) that mirror what RuView's
//! sensing cycle produces, so the bridge never imports `SensingUpdate` /
//! `TrustedOutput` directly.
//!
//! ## What P1 ships (honesty — ADR-262 §0 / §6)
//!
//! This is **P1 plumbing**: a tested `SensingSnapshot → FieldEvent` conversion
//! plus the **fail-closed privacy mapping** that is the §3.3 correctness item.
//! It is **not** wired into the live server (that is P3) and makes **no accuracy
//! claim** — RuField v0.1 is synthetic end-to-end and RuView's single-link CSI
//! carries its own caveats. The gates here are round-trip / fusability /
//! privacy-safety / determinism, not validated F1.
//!
//! ## The critical correctness item: the privacy mapping (§3.3)
//!
//! RuView's `Derived` class has byte value `1` (below `Anonymous = 2`) yet
//! carries an identity embedding. The bridge maps it to **P4/P5 by information
//! content, never P1** — see [`map_privacy`]. Mapping off the byte would leak
//! identity as low-privacy; [`map_privacy`] (and its dedicated gate test
//! `gate_privacy_safety_derived_never_maps_to_low_privacy`) exist specifically
//! to prevent that.
//!
//! ## Example
//!
//! ```
//! use wifi_densepose_rufield::{
//! snapshot_to_field_event, SensingSnapshot, SensingFeatures, SensingClass,
//! RuViewPrivacyClass,
//! };
//! use rufield_provenance::{Signer, is_fusable};
//!
//! let snap = SensingSnapshot {
//! timestamp_ns: 1_791_986_400_000_000_000,
//! features: SensingFeatures {
//! mean_rssi: -55.0,
//! variance: 0.4,
//! motion_band_power: 2.0,
//! breathing_band_power: 0.3,
//! dominant_freq_hz: 0.25,
//! change_points: 1,
//! spectral_power: 3.0,
//! },
//! classification: SensingClass {
//! motion_level: "low".into(),
//! presence: true,
//! confidence: 0.82,
//! },
//! signal_field: None,
//! trust_class: RuViewPrivacyClass::Anonymous,
//! demoted: false,
//! identity_bound: false,
//! node_id: "esp32_room_01".into(),
//! };
//!
//! let signer = Signer::from_seed(b"adr-262-bridge-seed-32-bytes-ok!");
//! let event = snapshot_to_field_event(&snap, &signer);
//! assert!(is_fusable(&event)); // ed25519-signed, non-synthetic ⇒ fusable
//! ```
#![forbid(unsafe_code)]
pub mod bridge;
pub mod privacy;
pub mod snapshot;
pub use bridge::{snapshot_egress_class, snapshot_to_field_event};
pub use privacy::{apply_demotion_floor, egress_class, map_privacy};
pub use snapshot::{
RuViewPrivacyClass, SensingClass, SensingFeatures, SensingSnapshot, SignalField,
};
// Re-export the rufield surface a bridge consumer needs, so callers depend on
// one crate.
pub use rufield_core::{FieldEvent, Modality, PrivacyClass};
pub use rufield_fusion::RuFieldFusion;
pub use rufield_provenance::{is_fusable, verify_event, Signer};
@@ -0,0 +1,147 @@
//! The ADR-262 §3.3 privacy mapping — the critical correctness item.
//!
//! RuView's effective `PrivacyClass` (4 byte-level classes) is the source of
//! truth; the bridge maps it onto RuField's `PrivacyClass` (P0–P5) **at the
//! egress boundary, by information content, NEVER by byte value**.
//!
//! ## The trap (ADR-262 §3, §6)
//!
//! RuView's `Derived` has byte value `1`, which sorts *below* `Anonymous`
//! (byte `2`). A naive byte-mapping (`Derived = 1 → P1`) would leak
//! identity-bearing features (`identity_embedding`, `identity_risk_score`) as a
//! **low-privacy P1** event. Because `Derived` carries derived *identity*, it
//! must map to the **biometric/identity tier (P4/P5)** — never P1. This is the
//! single most dangerous mapping mistake; it gets a dedicated test
//! (`derived_maps_to_identity_tier_not_p1`).
//!
//! ## Fail-closed
//!
//! [`RuViewPrivacyClass`] is a closed enum, so there is no runtime "unknown"
//! value to receive — but the mapping is written `match`-exhaustively with an
//! explicit, documented arm per class, and the `demoted`/`identity_bound`
//! overlays only ever move the result **toward more privacy**, never less.
use crate::snapshot::RuViewPrivacyClass;
use rufield_core::PrivacyClass;
/// Map a RuView effective `PrivacyClass` onto a RuField `PrivacyClass`
/// (ADR-262 §3.3), by information content.
///
/// | RuView (byte) | → RuField | Rationale |
/// |---|---|---|
/// | `Raw` (0) | `P0` | raw CSI waveform |
/// | `Derived` (1) | `P4` (or `P5` if `identity_bound`) | derived **identity** features ⇒ biometric/identity tier, **not** P1 |
/// | `Anonymous` (2) | `P2` | occupancy / motion only |
/// | `Restricted` (3) | `P2` (raw suppressed) | matches `suppress_raw_outputs` |
///
/// `identity_bound` only promotes `Derived` (already identity-derived) from P4
/// to P5; it can never lower the class.
#[must_use]
pub fn map_privacy(ruview_class: RuViewPrivacyClass, identity_bound: bool) -> PrivacyClass {
match ruview_class {
// Raw CSI amplitude → raw waveform tier.
RuViewPrivacyClass::Raw => PrivacyClass::P0,
// THE CRITICAL ARM (§3.3 / §6): `Derived` carries identity. Map by
// information content to the biometric/identity tier P4, and to P5 when
// the surface is bound to a named identity. NEVER P1.
RuViewPrivacyClass::Derived => {
if identity_bound {
PrivacyClass::P5
} else {
PrivacyClass::P4
}
}
// Anonymous occupancy / motion aggregate → P2.
RuViewPrivacyClass::Anonymous => PrivacyClass::P2,
// Restricted: occupancy with risk score / hash stripped and raw
// suppressed. Capped at P2 (occupancy tier), matching
// `EngineBridge::suppress_raw_outputs` (`engine_bridge.rs:240`).
RuViewPrivacyClass::Restricted => PrivacyClass::P2,
}
}
/// The §4 P2 gate (b) monotonicity overlay: a governed-engine **demotion**
/// (`TrustedOutput.demoted == true`) must never let the emitted class fall
/// below P2 (occupancy floor), and raw is suppressed.
///
/// This is applied *after* [`map_privacy`] and can only raise the class
/// (toward more privacy) — it is fail-closed by construction.
#[must_use]
pub fn apply_demotion_floor(class: PrivacyClass, demoted: bool) -> PrivacyClass {
if demoted && class < PrivacyClass::P2 {
PrivacyClass::P2
} else {
class
}
}
/// The full egress class for a snapshot: information-content mapping with the
/// demotion floor overlaid. This is what the bridge stamps on the emitted
/// `FieldEvent`.
#[must_use]
pub fn egress_class(
ruview_class: RuViewPrivacyClass,
identity_bound: bool,
demoted: bool,
) -> PrivacyClass {
apply_demotion_floor(map_privacy(ruview_class, identity_bound), demoted)
}
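To see how the mapping and the demotion floor compose, the sketch below mirrors both with local stand-in enums so it compiles without the `rufield` crates. The names `Privacy`, `RuView`, `map_demo`, `floor_demo` and `egress_demo` are hypothetical, and the ordering `P0 < P1 < ... < P5` is an assumption inferred from the comparisons used above.

```rust
/// Stand-in for RuField's privacy class; ordered by declaration (assumed).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Privacy { P0, P1, P2, P3, P4, P5 }

/// Stand-in for RuView's four byte-level classes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuView { Raw, Derived, Anonymous, Restricted }

/// Map by information content, never by byte value.
fn map_demo(class: RuView, identity_bound: bool) -> Privacy {
    match class {
        RuView::Raw => Privacy::P0,
        // Derived carries identity: P4, or P5 when bound to an identity.
        RuView::Derived if identity_bound => Privacy::P5,
        RuView::Derived => Privacy::P4,
        RuView::Anonymous | RuView::Restricted => Privacy::P2,
    }
}

/// A demoted cycle is floored at P2; the floor can only raise the class.
fn floor_demo(class: Privacy, demoted: bool) -> Privacy {
    if demoted { class.max(Privacy::P2) } else { class }
}

/// Mapping first, floor second, as in `egress_class`.
fn egress_demo(class: RuView, identity_bound: bool, demoted: bool) -> Privacy {
    floor_demo(map_demo(class, identity_bound), demoted)
}
```

With these stand-ins a demoted `Raw` cycle comes out as `P2` rather than `P0`, while a demoted `Derived` cycle stays at `P4` or `P5`, because the floor never lowers a class.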
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn derived_maps_to_identity_tier_not_p1() {
// The single most dangerous mapping mistake: Derived (byte 1) must NOT
// become P1. It carries identity ⇒ P4, or P5 if identity-bound.
assert_eq!(map_privacy(RuViewPrivacyClass::Derived, false), PrivacyClass::P4);
assert_eq!(map_privacy(RuViewPrivacyClass::Derived, true), PrivacyClass::P5);
}
#[test]
fn full_table_matches_adr_262_section_3_3() {
assert_eq!(map_privacy(RuViewPrivacyClass::Raw, false), PrivacyClass::P0);
assert_eq!(map_privacy(RuViewPrivacyClass::Derived, false), PrivacyClass::P4);
assert_eq!(map_privacy(RuViewPrivacyClass::Anonymous, false), PrivacyClass::P2);
assert_eq!(map_privacy(RuViewPrivacyClass::Restricted, false), PrivacyClass::P2);
}
#[test]
fn mapping_ignores_non_monotonic_byte_value() {
// Derived's byte (1) is *below* Anonymous's byte (2), but Derived's
// mapped class must be *above* Anonymous's mapped class — proving the
// mapping uses information content, not the byte.
assert!(RuViewPrivacyClass::Derived.raw_byte() < RuViewPrivacyClass::Anonymous.raw_byte());
assert!(
map_privacy(RuViewPrivacyClass::Derived, false)
> map_privacy(RuViewPrivacyClass::Anonymous, false)
);
}
#[test]
fn demotion_floor_only_raises_privacy() {
// Raw → P0, but a demoted cycle floors to P2 with raw suppressed.
assert_eq!(apply_demotion_floor(PrivacyClass::P0, true), PrivacyClass::P2);
// Already-high classes are never lowered by the floor.
assert_eq!(apply_demotion_floor(PrivacyClass::P5, true), PrivacyClass::P5);
// No demotion ⇒ unchanged.
assert_eq!(apply_demotion_floor(PrivacyClass::P0, false), PrivacyClass::P0);
}
#[test]
fn identity_bound_only_promotes() {
// identity_bound never lowers privacy; it only promotes Derived P4→P5.
for c in [
RuViewPrivacyClass::Raw,
RuViewPrivacyClass::Derived,
RuViewPrivacyClass::Anonymous,
RuViewPrivacyClass::Restricted,
] {
assert!(map_privacy(c, true) >= map_privacy(c, false));
}
}
}
@@ -0,0 +1,152 @@
//! Owned, primitive input types for the ADR-262 bridge.
//!
//! These deliberately **mirror** the shapes RuView's sensing cycle produces
//! (the `/ws/sensing` `SensingUpdate` build site at
//! `wifi-densepose-sensing-server/src/main.rs:~5938` and the `TrustedOutput`
//! trust state surfaced via `EngineBridge` at `main.rs:~5886`) **without
//! importing** RuView's internal crates. Keeping the bridge an anti-corruption
//! layer (ADR-262 §5.4) means it takes owned primitives, not `SensingUpdate`
//! or `TrustedOutput` directly — so this crate never depends on
//! `wifi-densepose-sensing-server`.
use serde::{Deserialize, Serialize};
/// The CSI feature scalars RuView publishes on every `/ws/sensing` cycle.
///
/// Mirrors `FeatureInfo` (`main.rs:368-377`). All values are in RuView's own
/// units; the bridge normalizes them into `Observation.features` for fusion.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct SensingFeatures {
/// Mean RSSI across the CSI window (dBm).
pub mean_rssi: f64,
/// CSI amplitude variance.
pub variance: f64,
/// Motion-band spectral power (drives `motion_energy`).
pub motion_band_power: f64,
/// Breathing-band spectral power (drives `breathing_band`).
pub breathing_band_power: f64,
/// Dominant frequency of the CSI window (Hz).
pub dominant_freq_hz: f64,
/// Number of change points detected in the window (drives `transient`).
pub change_points: usize,
/// Total spectral power of the window.
pub spectral_power: f64,
}
/// The RuView classification block. Mirrors `ClassificationInfo`
/// (`main.rs:379-384`).
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct SensingClass {
/// Coarse motion level label (e.g. `"none"`, `"low"`, `"high"`).
pub motion_level: String,
/// Whether a person is present.
pub presence: bool,
/// Classification confidence `0.0..=1.0`.
pub confidence: f64,
}
/// A RuView signal field — a floor-plane grid of field values. Mirrors
/// `SignalField` (`main.rs:386-390`). The bridge derives a real position from
/// the strongest field peak (like `field_localize`) and **never fabricates**
/// coordinates when this is absent.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct SignalField {
/// Grid dimensions `[x, y, z]`.
pub grid_size: [usize; 3],
/// Row-major flattened field values; `len()` must equal `nx * ny * nz`.
pub values: Vec<f64>,
}
impl SignalField {
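/// Index `[x, y, z]` of the strongest field cell, or `None` if the grid is
/// empty, has no finite value, or `values.len()` does not match `grid_size`.
/// This is the honest "strongest field peak" readout that `field_localize`
/// (`field_localize.rs:16-27`) exposes, **not** calibrated triangulation.
#[must_use]
pub fn peak_cell(&self) -> Option<[i32; 3]> {
let [nx, ny, nz] = self.grid_size;
// Reject degenerate grids and any length mismatch, so the row-major decode
// below can never yield a cell outside `grid_size`.
let cells = nx.checked_mul(ny).and_then(|n| n.checked_mul(nz));
if cells.map_or(true, |n| n == 0 || n != self.values.len()) {
return None;
}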
let mut best_idx: Option<usize> = None;
let mut best_val = f64::NEG_INFINITY;
for (i, &v) in self.values.iter().enumerate() {
if v.is_finite() && v > best_val {
best_val = v;
best_idx = Some(i);
}
}
let idx = best_idx?;
// Row-major: idx = ((x * ny) + y) * nz + z.
let z = idx % nz;
let y = (idx / nz) % ny;
let x = idx / (nz * ny);
Some([x as i32, y as i32, z as i32])
}
}
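The row-major decode used by `peak_cell` is the inverse of the usual flattening formula. The sketch below shows both directions using only the standard library; `flatten` and `unflatten` are hypothetical names introduced for illustration.

```rust
/// Row-major flat index of `cell` in a grid of size `[nx, ny, nz]`:
/// idx = ((x * ny) + y) * nz + z.
fn flatten(grid: [usize; 3], cell: [usize; 3]) -> usize {
    let [_, ny, nz] = grid;
    let [x, y, z] = cell;
    (x * ny + y) * nz + z
}

/// Inverse of `flatten`: recover `[x, y, z]` from a flat index.
/// Assumes `idx < nx * ny * nz` and that no dimension is zero.
fn unflatten(grid: [usize; 3], idx: usize) -> [usize; 3] {
    let [_, ny, nz] = grid;
    let z = idx % nz; // fastest-varying axis
    let y = (idx / nz) % ny;
    let x = idx / (nz * ny); // slowest-varying axis
    [x, y, z]
}
```

For the `[2, 1, 2]` grid used in the tests, flat index `2` decodes to cell `[1, 0, 0]`, and flattening that cell gives `2` back.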
/// RuView's effective privacy class (the `effective_class` / privacy byte on
/// `TrustedOutput`).
///
/// This **mirrors** `wifi_densepose_bfld::PrivacyClass` (`bfld/lib.rs:103-116`,
/// `#[repr(u8)]`) — the four byte-level classes. The byte values are
/// **deliberately non-monotonic in information content**: `Derived = 1` carries
/// an identity embedding yet sorts *below* `Anonymous = 2`. The bridge's
/// `map_privacy` must therefore map by information content, NEVER by byte value
/// (ADR-262 §3.3 — the central correctness item).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum RuViewPrivacyClass {
/// Byte `0` — raw CSI amplitude, local-only.
Raw,
/// Byte `1` — derived **identity** features (identity_embedding +
/// identity_risk_score), LAN-only. The dangerous one (§3.3).
Derived,
/// Byte `2` — aggregate occupancy / motion, no identity.
Anonymous,
/// Byte `3` — care/regulated: occupancy minus risk score and hash;
/// raw suppressed.
Restricted,
}
impl RuViewPrivacyClass {
/// The raw byte value used by RuView's `#[repr(u8)]` enum
/// (`bfld/lib.rs:103`). Exposed only so callers can demonstrate the
/// non-monotonicity trap in tests; the bridge never maps off this byte.
#[must_use]
pub fn raw_byte(self) -> u8 {
match self {
RuViewPrivacyClass::Raw => 0,
RuViewPrivacyClass::Derived => 1,
RuViewPrivacyClass::Anonymous => 2,
RuViewPrivacyClass::Restricted => 3,
}
}
}
/// One sensing cycle, as a bridge input. Mirrors the join of `SensingUpdate`
/// (features + classification + signal_field) with the `TrustedOutput` trust
/// state (`trust_class`, `demoted`); ADR-262 §1.2 / P1 require that join to
/// happen at the bridge.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct SensingSnapshot {
/// Capture time, nanoseconds since Unix epoch (the real `SensingUpdate`
/// timestamp, ns).
pub timestamp_ns: u64,
/// CSI feature scalars (`/ws/sensing` feature set).
pub features: SensingFeatures,
/// Classification (motion level / presence / confidence).
pub classification: SensingClass,
/// Optional signal field for a real position readout.
pub signal_field: Option<SignalField>,
/// RuView's effective privacy class (the source-of-truth, §3.3).
pub trust_class: RuViewPrivacyClass,
/// Whether the governed engine demoted this cycle (`TrustedOutput.demoted`).
/// When `true` the emitted event must be `>= P2` and raw suppressed
/// (§3.3 / §4 P2 gate (b)).
pub demoted: bool,
/// Whether this cycle's identity surface is bound to an enrolled identity
/// (RuView's `identity_bound`). Promotes `Derived` to P5 when set.
pub identity_bound: bool,
/// Stable node id (e.g. `"esp32_room_01"`).
pub node_id: String,
}
@@ -0,0 +1,172 @@
//! ADR-262 P1 acceptance gates. Each test below IS an acceptance criterion.
//!
//! - round-trip: snapshot → FieldEvent → serde → equal
//! - is_fusable: emitted event passes the §11 fusability invariant
//! - fusion ingest accept: `RuFieldFusion::ingest` accepts it + `infer` runs
//! - privacy safety: `Derived` never maps to a low-privacy class (the §3.3 trap)
//! - determinism: same snapshot + same signer seed → identical event
use rufield_core::{FusionEngine, InferenceQuery, PrivacyClass};
use rufield_fusion::RuFieldFusion;
use rufield_provenance::{is_fusable, verify_event, Signer};
use wifi_densepose_rufield::{
map_privacy, snapshot_to_field_event, RuViewPrivacyClass, SensingClass, SensingFeatures,
SensingSnapshot, SignalField,
};
const SEED: &[u8; 32] = b"adr-262-bridge-seed-32-bytes-ok!";
fn signer() -> Signer {
Signer::from_seed(SEED)
}
/// A representative snapshot with a real signal field (so a position is derived).
fn sample_snapshot() -> SensingSnapshot {
SensingSnapshot {
timestamp_ns: 1_791_986_400_123_456_789,
features: SensingFeatures {
mean_rssi: -52.5,
variance: 0.73,
motion_band_power: 2.4,
breathing_band_power: 0.6,
dominant_freq_hz: 0.27,
change_points: 2,
spectral_power: 4.1,
},
classification: SensingClass {
motion_level: "high".into(),
presence: true,
confidence: 0.88,
},
signal_field: Some(SignalField {
grid_size: [2, 1, 2],
// peak at flat index 2 → cell [1,0,0]
values: vec![0.1, 0.2, 0.9, 0.3],
}),
trust_class: RuViewPrivacyClass::Anonymous,
demoted: false,
identity_bound: false,
node_id: "esp32_room_01".into(),
}
}
#[test]
fn gate_round_trip_serde_equal() {
let ev = snapshot_to_field_event(&sample_snapshot(), &signer());
let json = serde_json::to_string(&ev).expect("serialize");
let back: rufield_core::FieldEvent = serde_json::from_str(&json).expect("deserialize");
assert_eq!(ev, back, "FieldEvent must round-trip through serde unchanged");
}
#[test]
fn gate_is_fusable_verified_receipt() {
let ev = snapshot_to_field_event(&sample_snapshot(), &signer());
// Real (non-synthetic) event must carry a verifying ed25519 signature.
assert!(!ev.provenance.synthetic, "live event must NOT be marked synthetic");
assert!(ev.provenance.signature_hex.is_some(), "must be signed");
assert!(verify_event(&ev).is_ok(), "signature must verify");
assert!(is_fusable(&ev), "verified receipt ⇒ fusable (§11 invariant)");
}
#[test]
fn gate_fusion_ingest_accepts_and_infers() {
let ev = snapshot_to_field_event(&sample_snapshot(), &signer());
let mut engine = RuFieldFusion::new();
engine.ingest(ev).expect("fusion engine must accept the signed event");
// infer() must run without error (may or may not produce inferences).
let inferences = engine
.infer(&InferenceQuery::all())
.expect("infer() must run");
// The graph recorded the event/sensor provenance nodes.
assert!(
engine.graph().node_count() >= 2,
"ingest should record sensor + event nodes"
);
let _ = inferences; // count is not an accuracy claim
}
#[test]
fn gate_privacy_safety_derived_never_maps_to_low_privacy() {
// THE critical §3.3 gate. Derived carries identity ⇒ P4/P5, NEVER P1.
let p4 = map_privacy(RuViewPrivacyClass::Derived, false);
let p5 = map_privacy(RuViewPrivacyClass::Derived, true);
assert_eq!(p4, PrivacyClass::P4);
assert_eq!(p5, PrivacyClass::P5);
assert!(p4 >= PrivacyClass::P4, "Derived must be in the identity tier");
assert_ne!(p4, PrivacyClass::P1, "Derived must NEVER be P1");
// And end-to-end: an emitted event from a Derived snapshot must be P4/P5.
let mut snap = sample_snapshot();
snap.trust_class = RuViewPrivacyClass::Derived;
let ev = snapshot_to_field_event(&snap, &signer());
assert!(
ev.observation.privacy_class >= PrivacyClass::P4,
"emitted Derived event must be P4 or P5, got {:?}",
ev.observation.privacy_class
);
assert_eq!(ev.observation.privacy_class, ev.tensor.privacy_class);
}
/// Full §3.3 table over every RuView class → expected RuField class.
#[test]
fn gate_privacy_table_over_every_ruview_class() {
let cases = [
(RuViewPrivacyClass::Raw, false, PrivacyClass::P0),
(RuViewPrivacyClass::Derived, false, PrivacyClass::P4),
(RuViewPrivacyClass::Derived, true, PrivacyClass::P5),
(RuViewPrivacyClass::Anonymous, false, PrivacyClass::P2),
(RuViewPrivacyClass::Restricted, false, PrivacyClass::P2),
];
for (ruview, id_bound, expected) in cases {
assert_eq!(
map_privacy(ruview, id_bound),
expected,
"{ruview:?} (identity_bound={id_bound}) must map to {expected:?}"
);
}
}
/// Fail-closed: a demoted Raw snapshot must NOT emit P0 (raw) — it floors to P2.
#[test]
fn gate_demotion_is_fail_closed() {
let mut snap = sample_snapshot();
snap.trust_class = RuViewPrivacyClass::Raw; // would be P0
snap.demoted = true; // governed engine demotion
let ev = snapshot_to_field_event(&snap, &signer());
assert!(
ev.observation.privacy_class >= PrivacyClass::P2,
"demoted cycle must floor to >= P2, got {:?}",
ev.observation.privacy_class
);
}
#[test]
fn gate_determinism_same_seed_identical_event() {
let snap = sample_snapshot();
let a = snapshot_to_field_event(&snap, &Signer::from_seed(SEED));
let b = snapshot_to_field_event(&snap, &Signer::from_seed(SEED));
assert_eq!(a, b, "same snapshot + same signer seed ⇒ identical event");
// Including the signature (ed25519 is deterministic).
assert_eq!(a.provenance.signature_hex, b.provenance.signature_hex);
}
#[test]
fn no_fabricated_position_when_field_absent() {
let mut snap = sample_snapshot();
snap.signal_field = None;
let ev = snapshot_to_field_event(&snap, &signer());
assert!(ev.observation.range_m.is_none(), "no field ⇒ no fabricated range");
assert!(ev.observation.space_cell.is_none(), "no field ⇒ no fabricated cell");
assert!(
ev.observation.motion_vector.is_none(),
"no field ⇒ no fabricated motion vector"
);
}
#[test]
fn derives_real_position_from_field_peak() {
let ev = snapshot_to_field_event(&sample_snapshot(), &signer());
// peak at flat index 2, grid [2,1,2] (row-major) → cell [1,0,0]
assert_eq!(ev.observation.space_cell, Some([1, 0, 0]));
assert_eq!(ev.observation.range_m, Some(1.0));
}
@@ -16,12 +16,17 @@
//! so the bench and the report can never measure different graphs.
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use wifi_densepose_ruvector::ann_measure::{build_indices, queries, AnnBenchParams};
use wifi_densepose_ruvector::ann_measure::{
build_indices, build_quant_bits, queries, AnnBenchParams,
};
fn bench_ann(c: &mut Criterion) {
// Modest N so the bench builds quickly; the report covers the larger N.
let p = AnnBenchParams::default_fixture(10_000);
let (float_idx, quant_idx, _v) = build_indices(p);
let (float_idx, quant_idx, vectors) = build_indices(p);
// Multi-bit quant variants over the SAME graph/fixture (ADR-261 §11).
let quant_2bit = build_quant_bits(p, &vectors, 2);
let quant_4bit = build_quant_bits(p, &vectors, 4);
let qs = queries(p);
let k = p.k;
@@ -52,10 +57,10 @@ fn bench_ann(c: &mut Criterion) {
});
}
// Quantized HNSW at matched beam widths + rerank.
// Quantized HNSW (1-bit) at matched beam widths + rerank.
for &ef in &[64usize, 128] {
let rr = k * 5;
group.bench_function(format!("quant_hnsw_ef{ef}_rr{rr}"), |b| {
group.bench_function(format!("quant_hnsw_1bit_ef{ef}_rr{rr}"), |b| {
b.iter(|| {
let mut sink = 0u64;
for q in &qs {
@@ -67,6 +72,25 @@ fn bench_ann(c: &mut Criterion) {
});
}
// Multi-bit quant HNSW (ADR-261 §11): 2-bit and 4-bit traversal codes at the
// same beam widths (`ef` 64 and 128) as the 1-bit case, so the criterion
// medians show the per-bit QPS cost the scaling study reports against recall.
for (label, idx) in [("2bit", &quant_2bit), ("4bit", &quant_4bit)] {
for &ef in &[64usize, 128] {
let rr = k * 5;
group.bench_function(format!("quant_hnsw_{label}_ef{ef}_rr{rr}"), |b| {
b.iter(|| {
let mut sink = 0u64;
for q in &qs {
sink = sink
.wrapping_add(idx.search_quantized(black_box(q), k, ef, rr).len() as u64);
}
black_box(sink)
})
});
}
}
group.finish();
}
@@ -229,8 +229,24 @@ pub fn measure_quantized_hnsw(
}
/// Build both indices for `p` (shared insertion order + graph seed so the float
/// and quantized graphs are identical — the only variable is scoring).
/// and quantized graphs are identical — the only variable is scoring). The
/// quantized index uses the legacy **1-bit** code (ADR-261 §6); use
/// [`build_indices_bits`] for the multi-bit scaling study (§11).
pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec<Vec<f32>>) {
build_indices_bits(p, 1)
}
/// Build the float HNSW + a `bits`-bit quantized HNSW over the same fixture,
/// sharing the graph seed and insertion order so the *only* variable between the
/// float and quantized search is the traversal score. `bits ∈ {1, 2, 4}` (clamped
/// in [`QuantizedHnswIndex::build_bits`]). The float index is **independent of
/// `bits`** — callers sweeping `bits` should build the float index once and reuse
/// it (the quantized graph is identical across `bits`; only the per-node code
/// changes).
pub fn build_indices_bits(
p: AnnBenchParams,
bits: u32,
) -> (HnswIndex, QuantizedHnswIndex, Vec<Vec<f32>>) {
let vectors = fixture(p);
let params = HnswParams {
m: 16,
@@ -242,11 +258,140 @@ pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec<V
for v in &vectors {
float_idx.insert(v);
}
let quant_idx =
QuantizedHnswIndex::build(&vectors, p.dim, Metric::L2, params, p.rot_seed, p.k * 4);
let quant_idx = QuantizedHnswIndex::build_bits(
&vectors,
p.dim,
Metric::L2,
params,
p.rot_seed,
bits,
p.k * 4,
);
(float_idx, quant_idx, vectors)
}
/// Build only the `bits`-bit quantized index for `p`, reusing a fixture the
/// caller already has (avoids regenerating `N×dim` floats per bit-depth in the
/// scaling sweep). The graph seed and insertion order match
/// [`build_indices_bits`], so the resulting quantized graph is identical to the
/// one that function builds for the same `p`.
pub fn build_quant_bits(p: AnnBenchParams, vectors: &[Vec<f32>], bits: u32) -> QuantizedHnswIndex {
let params = HnswParams {
m: 16,
ef_construction: 200,
ef_search: 64,
seed: p.graph_seed,
};
QuantizedHnswIndex::build_bits(vectors, p.dim, Metric::L2, params, p.rot_seed, bits, p.k * 4)
}
/// The fastest operating point of a method that meets `target` recall, as
/// `(qps, recall, label)`; `None` if no swept op met it.
type BestOp = Option<(f64, f64, String)>;
/// Sweep float HNSW over a fixed `ef` ladder; return the fastest op meeting
/// `target` recall.
pub fn best_float_op(
idx: &HnswIndex,
qs: &[Vec<f32>],
truth: &[HashSet<u32>],
k: usize,
target: f64,
) -> BestOp {
let mut best: BestOp = None;
for &ef in &[16usize, 32, 64, 128, 256] {
let r = measure_float_hnsw(idx, qs, truth, k, ef);
if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) {
best = Some((r.qps, r.recall, format!("ef={ef}")));
}
}
best
}
/// Sweep quant HNSW over a fixed `(ef, rerank)` ladder; return the fastest op
/// meeting `target` recall, plus the best recall reached anywhere on the ladder
/// (so a not-found verdict can report how close it got).
pub fn best_quant_op(
qidx: &QuantizedHnswIndex,
qs: &[Vec<f32>],
truth: &[HashSet<u32>],
k: usize,
target: f64,
) -> (BestOp, f64) {
let mut best: BestOp = None;
let mut best_recall_seen = 0.0f64;
for &ef in &[32usize, 64, 128, 256, 512] {
for &rr in &[k * 2, k * 5, k * 10, k * 20] {
let r = measure_quantized_hnsw(qidx, qs, truth, k, ef, rr);
best_recall_seen = best_recall_seen.max(r.recall);
if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) {
best = Some((r.qps, r.recall, format!("ef={ef} rr={rr}")));
}
}
}
(best, best_recall_seen)
}
/// One row of the ADR-261 §11 scaling study: at a fixed `(N, b)`, the equal-recall
/// (≥ `target`) operating points for float vs quant HNSW and their QPS ratio.
#[derive(Debug, Clone)]
pub struct ScalingRow {
/// Indexed vector count.
pub n: usize,
/// Traversal-code bit-depth (1, 2, or 4).
pub bits: u32,
/// Packed bytes per node of the quant code at this `b`.
pub bytes_per_node: usize,
/// Fastest float-HNSW op meeting `target` recall (qps, recall, label).
pub float_op: BestOp,
/// Fastest quant-HNSW op meeting `target` recall (qps, recall, label).
pub quant_op: BestOp,
/// Best recall the quant ladder reached at this `(N, b)` (< `target` ⇒ no op).
pub quant_best_recall: f64,
/// quant/float QPS ratio at equal recall, if both met `target`.
pub ratio: Option<f64>,
}
/// Run the ADR-261 §11 multi-bit scaling study: for each `N ∈ ns` and each
/// `b ∈ bits_set`, measure the equal-recall (≥ `target`) QPS ratio of quant-HNSW
/// vs float-HNSW on the shared fixture. Deterministic and `--no-default-features`
/// runnable. Returns one [`ScalingRow`] per `(N, b)`; the caller prints the table
/// and decides the crossover verdict. The float index is built once per `N` and
/// reused across `b` (the quant graph is identical across `b`).
pub fn run_scaling_study(
base: AnnBenchParams,
ns: &[usize],
bits_set: &[u32],
target: f64,
) -> Vec<ScalingRow> {
let mut rows = Vec::new();
for &n in ns {
let p = AnnBenchParams { n, ..base };
let (float_idx, _q1, vectors) = build_indices_bits(p, 1);
let qs = queries(p);
let truth = ground_truth(&float_idx, &qs, p.k);
let float_op = best_float_op(&float_idx, &qs, &truth, p.k, target);
for &b in bits_set {
let qidx = build_quant_bits(p, &vectors, b);
let (quant_op, quant_best_recall) =
best_quant_op(&qidx, &qs, &truth, p.k, target);
let ratio = match (&float_op, &quant_op) {
(Some((fqps, _, _)), Some((qqps, _, _))) => Some(qqps / fqps),
_ => None,
};
rows.push(ScalingRow {
n,
bits: qidx.bits(),
bytes_per_node: qidx.bytes_per_node(),
float_op: float_op.clone(),
quant_op,
quant_best_recall,
ratio,
});
}
}
rows
}
#[cfg(test)]
mod tests {
use super::*;
@@ -397,4 +542,143 @@ mod tests {
"best quant-HNSW recall {best_quant_recall:.4} below the 0.30 not-broken floor"
);
}
/// The ADR-261 §11 **multi-bit scaling study**. Sweeps `N` and `b ∈ {1,2,4}`,
/// printing the `(N, b) → recall / QPS / quant-vs-float ratio at equal recall`
/// surface and the crossover verdict. This is the source of truth for the §11
/// table. Run for the published numbers with:
///
/// ```text
/// cd v2 && ANN_SCALE_NS=10000,100000,250000 \
/// cargo test -p wifi-densepose-ruvector --no-default-features --release \
/// scaling_report -- --nocapture --ignored
/// ```
///
/// Marked `#[ignore]` so the default (debug) gate stays fast: it builds and
/// queries several indices up to large `N`, which is minutes under `--release`
/// and far too slow in debug. The CI-safe structural invariants are checked by
/// `scaling_study_small_is_consistent` below at tiny `N`.
#[test]
#[ignore = "scaling study — run explicitly with --release --ignored; minutes at large N"]
fn scaling_report() {
// N ladder: default 10k→100k→250k (a clean 25× span that builds+queries in
// a few minutes under --release on the test box). Override with
// ANN_SCALE_NS=a,b,c. The largest feasible N is documented in the ADR with
// the measured build/query time at the cap.
let ns: Vec<usize> = std::env::var("ANN_SCALE_NS")
.ok()
.map(|s| s.split(',').filter_map(|x| x.trim().parse().ok()).collect())
.unwrap_or_else(|| vec![10_000, 100_000, 250_000]);
let bits_set = [1u32, 2, 4];
let target = 0.90f64;
assert!(!ns.is_empty(), "ANN_SCALE_NS parsed to an empty N ladder");
let base = AnnBenchParams::default_fixture(ns[0]);
println!("\n=== ADR-261 §11 multi-bit scaling study (planted-cluster synthetic) ===");
println!(
"dim={} clusters={} queries={} K={} noise={} graph_seed=0x{:X} rot_seed=0x{:X}",
base.dim, base.clusters, base.n_queries, base.k, base.noise, base.graph_seed, base.rot_seed
);
println!("metric=L2 M=16 ef_construction=200 target recall >= {target:.2} (use --release for QPS)");
println!(
"{:<9} {:>4} {:>9} {:>10} {:>22} {:>22} {:>12}",
"N", "bits", "B/node", "q_best_rec", "float@target", "quant@target", "quant/float"
);
let rows = run_scaling_study(base, &ns, &bits_set, target);
for row in &rows {
let float_s = row
.float_op
.as_ref()
.map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}"))
.unwrap_or_else(|| "none".to_string());
let quant_s = row
.quant_op
.as_ref()
.map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}"))
.unwrap_or_else(|| "none".to_string());
let ratio_s = row
.ratio
.map(|x| format!("{x:.2}x"))
.unwrap_or_default();
println!(
"{:<9} {:>4} {:>9} {:>10.3} {:>22} {:>22} {:>12}",
row.n, row.bits, row.bytes_per_node, row.quant_best_recall, float_s, quant_s, ratio_s
);
}
// Crossover verdict: report whether the quant/float ratio EVER exceeds 1.0
// at equal recall, and the per-bit trend of the best-quant-recall as N grows
// (is quant getting closer to the equal-recall regime, or not).
println!("\n--- crossover verdict (quant-HNSW > float-HNSW at equal recall?) ---");
let crossover: Vec<&ScalingRow> = rows
.iter()
.filter(|r| r.ratio.map(|x| x > 1.0).unwrap_or(false))
.collect();
if crossover.is_empty() {
println!("NO crossover at any measured (N, b): quant never both met target recall and beat float QPS.");
} else {
for r in &crossover {
println!(
"CROSSOVER at N={} b={}: quant/float = {:.2}x at recall >= {target:.2}",
r.n, r.bits, r.ratio.unwrap()
);
}
}
for &b in &bits_set {
let trend: Vec<(usize, f64)> = rows
.iter()
.filter(|r| r.bits == b)
.map(|r| (r.n, r.quant_best_recall))
.collect();
let trend_s: Vec<String> = trend
.iter()
.map(|(n, r)| format!("N={n}:{r:.3}"))
.collect();
println!("b={b} best-quant-recall trend: {}", trend_s.join(" "));
}
println!("======================================================================\n");
// Structural invariants (gate-safe at any N): at least one float op met
// target at every N (the baseline must work), and quant recall is in range.
for &n in &ns {
let any_float = rows.iter().any(|r| r.n == n && r.float_op.is_some());
assert!(any_float, "no float-HNSW op met target recall at N={n} — baseline broken");
}
for r in &rows {
assert!(
(0.0..=1.0).contains(&r.quant_best_recall),
"quant recall out of range at N={} b={}: {}",
r.n,
r.bits,
r.quant_best_recall
);
}
}
/// CI-safe structural check for the scaling study at tiny `N` (debug-fast):
/// the study runs end-to-end, bytes/node scales with `b`, and the float
/// baseline meets target at the smallest N. Does **not** assert any crossover
/// (that is the §11 measured question, answered by `scaling_report`).
#[test]
fn scaling_study_small_is_consistent() {
let base = AnnBenchParams::default_fixture(1500);
let ns = [1500usize, 3000];
let bits_set = [1u32, 2, 4];
let rows = run_scaling_study(base, &ns, &bits_set, 0.90);
assert_eq!(rows.len(), ns.len() * bits_set.len());
// Bytes/node scales with b at dim=128 (D=128): 16 / 32 / 64.
for r in rows.iter().filter(|r| r.n == 1500) {
let expect = match r.bits {
1 => 16,
2 => 32,
_ => 64,
};
assert_eq!(r.bytes_per_node, expect, "B/node wrong for b={}", r.bits);
}
// Float baseline must meet target at the smallest N.
assert!(
rows.iter().any(|r| r.n == 1500 && r.float_op.is_some()),
"float baseline failed target at small N"
);
}
}
@@ -1,4 +1,4 @@
//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261.
//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261 (multi-bit, §11).
//!
//! # The SymphonyQG bet (what we are testing)
//!
@@ -25,20 +25,26 @@
//! float and quantized search is **how a candidate is scored during traversal**,
//! so any QPS/recall difference is attributable to the quantization, not to a
//! different graph.
//! - **Quantized score = 1-bit Hamming over the RaBitQ Pass-2 rotated sign code**
//! ([`crate::rotation`] + the sign-quantization in [`crate::sketch`]). Each
//! node stores its `ceil(D/8)`-byte sign code (`D = next_pow2(dim)`). During
//! traversal we compare query-code vs node-code by **POPCNT Hamming** — a few
//! machine words, no per-dimension float work.
//! - **Quantized score = `b`-bit code over the RaBitQ Pass-2 rotated coordinates**
//! ([`crate::rotation`] + the multi-bit scalar quantizer mirrored from
//! [ADR-156 §10](../../../../../docs/adr/ADR-156-ruvector-fusion-beyond-sota.md)'s
//! `coverage::measure_multibit`). Each node stores a `b`-bit-per-dimension code
//! over the padded rotation length `D = next_pow2(dim)`. During traversal we
//! compare query-code vs node-code by the **L1 distance over the per-dim
//! codes** — a few machine words of integer work, no per-dimension float work.
//! For `b == 1` the codes are `{0, 1}` and the L1 distance is **exactly the
//! 1-bit Hamming distance** of the original ADR-261 construction, so `b == 1`
//! is fully backward-compatible.
//! - **Exact float rerank** of the final beam: the top `rerank` candidates by
//! Hamming are re-scored with the true float metric and the best `k` returned.
//! code-L1 are re-scored with the true float metric and the best `k` returned.
//!
//! This trades a small recall hit (the 1-bit code is a coarse angle proxy — the
//! same ~46%-strict limitation ADR-156 §10 measured) for far cheaper per-node
//! scoring, recovered by the float rerank. **Whether that nets a QPS win at our
//! test scale is the measured question ADR-261 answers** — and at small N the
//! float distance is cheap enough that the Hamming saving may not pay off. We
//! report the real number, win or lose, and do not tune to manufacture a speedup.
//! Higher `b` keeps the traversal beam on-path better than 1-bit (ADR-156 §10
//! measured 1/2/3/4-bit strict-K coverage at ~46/54/67/74%), at a memory cost
//! that scales linearly with `b` (bytes/node = `ceil(D·b/8)`). **Whether the
//! extra bits net a QPS win at equal recall — and at what N a crossover with
//! float HNSW appears, if any — is the measured question ADR-261 §11 answers.**
//! We report the real number, win or lose, and do not tune to manufacture a
//! speedup.
//!
//! # Determinism & robustness
//!
@@ -53,56 +59,95 @@ use std::collections::{BinaryHeap, HashSet};
use crate::hnsw::{HnswIndex, HnswParams, Metric};
use crate::rotation::Rotation;
/// A 1-bit Pass-2 sign code for one vector, over the padded rotation length `D`.
/// Stored as packed bytes; compared by POPCNT Hamming.
/// Symmetric clamp range for the uniform mid-rise scalar quantizer, in rotated-
/// coordinate units. The normalized FHT (`1/√D`) puts AETHER-shape rotated
/// coordinates roughly in `[-3, 3]`; out-of-range coords clamp to the end codes.
/// This is the **same `RANGE = 3.0`** as ADR-156 §10's `coverage::measure_multibit`,
/// so the multi-bit code here is the same scheme that module measured.
const RANGE: f32 = 3.0;
/// A `b`-bit-per-dimension scalar code of a rotated embedding over the padded
/// length `D`, compared by per-dim L1.
///
/// For `bits == 1` the per-dim code is `{0, 1}` (sign), and L1 over those codes
/// equals POPCNT Hamming, so the 1-bit case scores candidates exactly as the
/// original ADR-261 construction did (only the in-memory layout differs: one
/// unpacked `u8` per dimension instead of packed bits). For `bits ∈ {2, 4}` the
/// code is a uniform mid-rise quantizer with `2^bits` levels over `[-RANGE, RANGE]`.
#[derive(Debug, Clone)]
struct Code {
bits: Vec<u8>,
/// Per-dimension codes (`0..2^bits`), one entry per padded dimension `D`.
/// Kept unpacked as `u8` for branch-free L1; the *reported* memory cost is
/// the packed footprint (`ceil(D·bits/8)`), since a production node would
/// store the packed form. (We measure the packed bytes/node explicitly in
/// [`QuantizedHnswIndex::bytes_per_node`].)
codes: Vec<u8>,
}
impl Code {
/// Hamming distance to another code of the same length (popcount of XOR).
/// L1 distance over the per-dimension codes — the multi-bit generalization
/// of Hamming. At `bits == 1` (codes in `{0,1}`) this equals the popcount of
/// the XOR, i.e. the 1-bit Hamming distance.
#[inline]
fn hamming(&self, other: &Code) -> u32 {
let n = self.bits.len().min(other.bits.len());
fn l1(&self, other: &Code) -> u32 {
let n = self.codes.len().min(other.codes.len());
let mut acc = 0u32;
for i in 0..n {
acc += (self.bits[i] ^ other.bits[i]).count_ones();
acc += u32::from(self.codes[i].abs_diff(other.codes[i]));
}
acc
}
}
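Review note: the claim that per-dimension L1 reduces to Hamming at 1 bit can be checked with a small standalone sketch. The function names below are hypothetical, and the MSB-first packing is an assumption that mirrors the removed 1-bit `encode`.

```rust
/// L1 over unpacked per-dimension codes (one `u8` per dimension).
pub fn code_l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

/// Pack `{0, 1}` codes MSB-first into bytes (assumed legacy 1-bit layout).
pub fn pack_bits(codes: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; codes.len().div_ceil(8)];
    for (i, &c) in codes.iter().enumerate() {
        if c != 0 {
            out[i / 8] |= 1 << (7 - (i % 8));
        }
    }
    out
}

/// Popcount-of-XOR Hamming distance over packed bytes.
pub fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| (x ^ y).count_ones()).sum()
}
```

For codes restricted to `{0, 1}`, `|x - y|` is 1 exactly when the bits differ, so `code_l1` on the unpacked codes and `hamming` on the packed bytes agree. For wider codes, `code_l1` also weighs how far apart the levels are, which Hamming cannot express.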
/// Build the packed 1-bit sign code of a rotated embedding over the padded
/// length `D = rotation.padded_dim()`. Bit set ⇒ rotated coord ≥ 0.
fn encode(embedding: &[f32], rotation: &Rotation) -> Code {
/// Quantize the rotated coordinates of `embedding` to a `bits`-bit-per-dimension
/// [`Code`] over the padded rotation length `D = rotation.padded_dim()`.
///
/// `bits == 1` reduces to sign-quantization (code `1` iff the rotated coord ≥ 0),
/// preserving the original 1-bit construction; `bits ∈ {2, 4}` uses a uniform
/// mid-rise quantizer with `2^bits` levels over `[-RANGE, RANGE]`, identical to
/// ADR-156 §10's `measure_multibit`.
fn encode(embedding: &[f32], rotation: &Rotation, bits: u32) -> Code {
let rotated = rotation.apply_padded(embedding);
let d = rotated.len();
let mut bits = vec![0u8; d.div_ceil(8)];
for (i, &c) in rotated.iter().enumerate() {
if c >= 0.0 {
bits[i / 8] |= 1 << (7 - (i % 8));
}
}
Code { bits }
let levels = 1u32 << bits; // 2^bits codes per dim
let codes: Vec<u8> = rotated
.iter()
.map(|&x| {
if bits == 1 {
// Sign code: identical to the original 1-bit construction.
u8::from(x >= 0.0)
} else {
let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0); // → [0,1]
let code = (t * (levels - 1) as f32).round() as u32;
code.min(levels - 1) as u8
}
})
.collect();
Code { codes }
}
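Review note: the per-coordinate quantizer is easier to reason about outside the rotation. A minimal sketch follows, assuming `RANGE = 3.0` as in the diff; `quantize_coord` is a hypothetical helper that lifts out the closure body of `encode` and is not part of the crate.

```rust
/// Clamp range, assumed equal to the `RANGE` constant in the diff.
const RANGE: f32 = 3.0;

/// Quantize one rotated coordinate to a `bits`-bit code.
/// `bits == 1` is the sign code; otherwise the coordinate is mapped to one of
/// `2^bits` levels over `[-RANGE, RANGE]`, with out-of-range values clamped.
pub fn quantize_coord(x: f32, bits: u32) -> u8 {
    if bits == 1 {
        return u8::from(x >= 0.0);
    }
    let levels = 1u32 << bits;
    // Map [-RANGE, RANGE] onto [0, 1], clamping anything outside.
    let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0);
    let code = (t * (levels - 1) as f32).round() as u32;
    code.min(levels - 1) as u8
}
```

Under these assumptions `x = 0.0` maps to code 2 at 2 bits and code 8 at 4 bits, while a slightly negative coordinate falls one code lower. The sign boundary therefore coincides with a decision threshold at both depths, which is consistent with describing the multi-bit code as a refinement of the sign code.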
/// Min-heap node for the quantized beam (closest Hamming at the top).
/// Packed bytes a node's `bits`-bit code occupies over padded length `D`:
/// `ceil(D·bits/8)`. The memory cost reported by ADR-261 §11 (1-bit → `D/8`,
/// 2-bit → `D/4`, 4-bit → `D/2`).
#[inline]
fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
(padded_dim * bits as usize).div_ceil(8)
}
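Review note: the index keeps codes unpacked and only reports the packed footprint. A production layout might pack them roughly as sketched below. `pack_codes` is hypothetical (no such function appears in the diff), and it assumes `bits ∈ {1, 2, 4}` with every code below `2^bits`.

```rust
/// Packed bytes for `padded_dim` codes of `bits` bits each: ceil(D * bits / 8).
pub fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
    (padded_dim * bits as usize).div_ceil(8)
}

/// Pack per-dimension codes MSB-first. Assumes `bits` is 1, 2, or 4 and each
/// code is below `2^bits`; neither assumption is validated here.
pub fn pack_codes(codes: &[u8], bits: u32) -> Vec<u8> {
    let mut out = vec![0u8; packed_bytes(codes.len(), bits)];
    let per_byte = (8 / bits) as usize; // 8, 4, or 2 codes per byte
    for (i, &c) in codes.iter().enumerate() {
        let slot = i % per_byte;
        // First code in a byte occupies the highest bits.
        let shift = 8 - bits as usize * (slot + 1);
        out[i / per_byte] |= c << shift;
    }
    out
}
```

At `D = 128` this gives 16, 32, and 64 bytes per node for 1, 2, and 4 bits, matching the figures asserted in the tests.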
/// Min-heap node for the quantized beam (closest code-L1 at the top).
#[derive(Debug, Clone, Copy)]
struct HScored {
/// Hamming distance (quantized score) — the traversal key.
ham: u32,
/// Code-L1 distance (quantized score) — the traversal key.
dist: u32,
id: u32,
}
impl PartialEq for HScored {
fn eq(&self, other: &Self) -> bool {
self.ham == other.ham && self.id == other.id
self.dist == other.dist && self.id == other.id
}
}
impl Eq for HScored {}
impl Ord for HScored {
fn cmp(&self, other: &Self) -> Ordering {
self.ham.cmp(&other.ham).then(self.id.cmp(&other.id))
self.dist.cmp(&other.dist).then(self.id.cmp(&other.id))
}
}
impl PartialOrd for HScored {
@@ -110,7 +155,7 @@ impl PartialOrd for HScored {
Some(self.cmp(other))
}
}
/// Reversed wrapper for a min-heap (smallest Hamming at the top).
/// Reversed wrapper for a min-heap (smallest code-L1 at the top).
#[derive(Debug, Clone, Copy)]
struct MinH(HScored);
impl PartialEq for MinH {
@@ -131,33 +176,34 @@ impl PartialOrd for MinH {
}
/// A SymphonyQG-style HNSW: the same graph as [`HnswIndex`], traversed by a
/// **cheap 1-bit Hamming score**, with a final **exact-float rerank**.
/// **cheap `b`-bit code-L1 score**, with a final **exact-float rerank**.
///
/// Built by inserting the same vectors in the same order with the same seed as
/// a float [`HnswIndex`], so the two indices share identical graph structure and
/// only differ in how the beam is scored. The shared [`Rotation`] (seed + dim)
/// is the index/query frame for the 1-bit codes.
/// is the index/query frame for the `b`-bit codes. `bits ∈ {1, 2, 4}` selects
/// the traversal-code resolution; `bits == 1` is the original 1-bit Hamming
/// construction.
#[derive(Debug, Clone)]
pub struct QuantizedHnswIndex {
/// The underlying graph (built with the float metric for exact rerank).
graph: HnswIndex,
/// Per-node 1-bit Pass-2 codes, indexed by id (parallel to graph vectors).
/// Per-node `b`-bit codes, indexed by id (parallel to graph vectors).
codes: Vec<Code>,
/// The rotation frame shared by index and query codes.
rotation: Rotation,
/// Bits per dimension of the traversal code (`1`, `2`, or `4`).
bits: u32,
/// Number of final candidates to exact-float rerank (≥ k at query time).
default_rerank: usize,
}
impl QuantizedHnswIndex {
/// Build a quantized index over `vectors`, mirroring a float [`HnswIndex`]
/// built with the same `(dim, metric, params)` and insertion order. The
/// `rotation_seed` fixes the 1-bit code frame (index and query share it).
/// Build a 1-bit quantized index (the original ADR-261 construction).
///
/// `default_rerank` is how many top-Hamming candidates get an exact float
/// re-score before returning the best `k`; it is clamped to `≥ k` at query
/// time. A larger rerank recovers more recall at more float cost — the knob
/// that, alongside `ef`, sets the equal-recall operating point.
/// Equivalent to [`QuantizedHnswIndex::build_bits`] with `bits = 1`; kept as
/// the backward-compatible entry point so existing callers and tests are
/// unchanged.
pub fn build(
vectors: &[Vec<f32>],
dim: usize,
@@ -166,17 +212,41 @@ impl QuantizedHnswIndex {
rotation_seed: u64,
default_rerank: usize,
) -> Self {
Self::build_bits(vectors, dim, metric, params, rotation_seed, 1, default_rerank)
}
/// Build a `bits`-bit quantized index over `vectors`, mirroring a float
/// [`HnswIndex`] built with the same `(dim, metric, params)` and insertion
/// order. The `rotation_seed` fixes the code frame (index and query share it).
///
/// `bits` is clamped to `{1, 2, 4}` (the resolutions ADR-261 §11 sweeps) so the
/// constructor is total: `0` maps to `1`, `3` rounds up to `4`, and anything
/// above `4` is capped at `4`. `default_rerank` is how many top-code-L1 candidates get an exact
/// float re-score before returning the best `k`; it is clamped to `≥ k` at
/// query time. A larger rerank recovers more recall at more float cost — the
/// knob that, alongside `ef`, sets the equal-recall operating point.
pub fn build_bits(
vectors: &[Vec<f32>],
dim: usize,
metric: Metric,
params: HnswParams,
rotation_seed: u64,
bits: u32,
default_rerank: usize,
) -> Self {
let bits = clamp_bits(bits);
let rotation = Rotation::new(rotation_seed, dim);
let mut graph = HnswIndex::new(dim, metric, params);
let mut codes = Vec::with_capacity(vectors.len());
for v in vectors {
graph.insert(v);
codes.push(encode(v, &rotation));
codes.push(encode(v, &rotation, bits));
}
Self {
graph,
codes,
rotation,
bits,
default_rerank: default_rerank.max(1),
}
}
@@ -207,9 +277,23 @@ impl QuantizedHnswIndex {
self.default_rerank
}
/// SymphonyQG-style search: traverse the graph scoring candidates by **1-bit
/// Hamming**, collect a beam of `ef`, then **exact-float rerank** the top
/// `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
/// Bits per dimension of the traversal code.
#[inline]
pub fn bits(&self) -> u32 {
self.bits
}
/// Packed memory footprint of one node's traversal code, in bytes:
/// `ceil(D·bits/8)` where `D = next_pow2(dim)` is the padded rotation length.
/// This is the per-node cost ADR-261 §11 reports for each `b`.
#[inline]
pub fn bytes_per_node(&self) -> usize {
packed_bytes(self.rotation.padded_dim(), self.bits)
}
/// SymphonyQG-style search: traverse the graph scoring candidates by the
/// **`b`-bit code-L1**, collect a beam of `ef`, then **exact-float rerank**
/// the top `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
///
/// Degenerate cases mirror [`HnswIndex::search`]: empty ⇒ empty; `k == 0` ⇒
/// empty; `k > n` ⇒ all; never panics.
@@ -225,7 +309,7 @@ impl QuantizedHnswIndex {
}
let ef = ef.max(k).max(1);
let rerank = rerank.max(k);
let q_code = encode(query, &self.rotation);
let q_code = encode(query, &self.rotation, self.bits);
// Entry point: the graph's entry (highest-level node).
let entry = match self.graph.entry_point() {
@@ -233,18 +317,18 @@ impl QuantizedHnswIndex {
None => return Vec::new(),
};
// Greedy-descend upper layers by Hamming, then beam-search layer 0.
// Greedy-descend upper layers by code-L1, then beam-search layer 0.
let mut ep = entry;
let mut layer = self.graph.top_level();
while layer > 0 {
ep = self.greedy_hamming(&q_code, ep, layer);
ep = self.greedy_code(&q_code, ep, layer);
layer -= 1;
}
let beam = self.beam_hamming(&q_code, ep, ef);
let beam = self.beam_code(&q_code, ep, ef);
// Exact-float rerank of the top `rerank` Hamming candidates.
// Exact-float rerank of the top `rerank` code-L1 candidates.
let mut cand: Vec<HScored> = beam;
cand.sort_by_key(|c| c.ham);
cand.sort_by_key(|c| c.dist);
cand.truncate(rerank);
let mut reranked: Vec<(u32, f32)> = cand
.iter()
@@ -265,16 +349,16 @@ impl QuantizedHnswIndex {
self.search_quantized(query, k, self.graph.params_ef_search(), self.default_rerank)
}
/// Greedy single-best descent on a layer scored by Hamming.
fn greedy_hamming(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
/// Greedy single-best descent on a layer scored by code-L1.
fn greedy_code(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
let mut best = start;
let mut best_h = self.codes[best as usize].hamming(q_code);
let mut best_d = self.codes[best as usize].l1(q_code);
loop {
let mut improved = false;
for &nbr in self.graph.neighbours(best, layer) {
let h = self.codes[nbr as usize].hamming(q_code);
if h < best_h {
best_h = h;
let d = self.codes[nbr as usize].l1(q_code);
if d < best_d {
best_d = d;
best = nbr;
improved = true;
}
@@ -285,32 +369,32 @@ impl QuantizedHnswIndex {
}
}
/// Beam search on layer 0 scored by Hamming. Returns the `ef` best-Hamming
/// nodes (unsorted). Iterative — bounded by the visited set + the ef beam.
fn beam_hamming(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
/// Beam search on layer 0 scored by code-L1. Returns up to `ef` nodes with the
/// smallest code-L1 (unsorted). Iterative; bounded by the visited set and the `ef` beam.
fn beam_code(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
let mut visited: HashSet<u32> = HashSet::new();
let mut candidates: BinaryHeap<MinH> = BinaryHeap::new();
let mut results: BinaryHeap<HScored> = BinaryHeap::new(); // max-heap: worst at top
let h0 = self.codes[ep as usize].hamming(q_code);
let s0 = HScored { ham: h0, id: ep };
let d0 = self.codes[ep as usize].l1(q_code);
let s0 = HScored { dist: d0, id: ep };
visited.insert(ep);
candidates.push(MinH(s0));
results.push(s0);
while let Some(MinH(cur)) = candidates.pop() {
let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
if cur.ham > worst && results.len() >= ef {
let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
if cur.dist > worst && results.len() >= ef {
break;
}
for &nbr in self.graph.neighbours(cur.id, 0) {
if !visited.insert(nbr) {
continue;
}
let h = self.codes[nbr as usize].hamming(q_code);
let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
if results.len() < ef || h < worst {
let s = HScored { ham: h, id: nbr };
let d = self.codes[nbr as usize].l1(q_code);
let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
if results.len() < ef || d < worst {
let s = HScored { dist: d, id: nbr };
candidates.push(MinH(s));
results.push(s);
while results.len() > ef {
@@ -323,6 +407,17 @@ impl QuantizedHnswIndex {
}
}
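Review note: the layer-0 beam loop is split across hunks here, so a standalone sketch of the same two-heap pattern may help when reading it. This is not the crate's code. `beam_search`, the adjacency-list graph, and the `score` closure are hypothetical stand-ins, and the trimming step after `while results.len() > ef` is assumed to pop the worst entry.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Beam search over an adjacency list keyed by an integer score.
/// `score(id)` stands in for the code-L1 distance from node `id` to the query.
/// Returns up to `ef` `(score, id)` pairs, sorted ascending.
pub fn beam_search(
    adj: &[Vec<u32>],
    score: impl Fn(u32) -> u32,
    entry: u32,
    ef: usize,
) -> Vec<(u32, u32)> {
    let ef = ef.max(1);
    let mut visited: HashSet<u32> = HashSet::new();
    // Min-heap of nodes still to expand (closest first).
    let mut candidates: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::new();
    // Max-heap holding the current beam (worst kept node on top).
    let mut results: BinaryHeap<(u32, u32)> = BinaryHeap::new();

    let d0 = score(entry);
    visited.insert(entry);
    candidates.push(Reverse((d0, entry)));
    results.push((d0, entry));

    while let Some(Reverse((d, id))) = candidates.pop() {
        let worst = results.peek().map_or(u32::MAX, |&(w, _)| w);
        // Stop once the closest unexpanded node cannot improve a full beam.
        if d > worst && results.len() >= ef {
            break;
        }
        for &nbr in &adj[id as usize] {
            if !visited.insert(nbr) {
                continue; // already scored
            }
            let dn = score(nbr);
            let worst = results.peek().map_or(u32::MAX, |&(w, _)| w);
            if results.len() < ef || dn < worst {
                candidates.push(Reverse((dn, nbr)));
                results.push((dn, nbr));
                // Assumed trimming step: drop the worst until the beam fits.
                while results.len() > ef {
                    results.pop();
                }
            }
        }
    }
    let mut out = results.into_vec();
    out.sort();
    out
}
```

On a five-node path graph with the query nearest the far end, starting at node 0 with `ef = 2`, the beam walks the path and ends holding the two nodes closest to the query.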
/// Clamp a requested bit-depth to the supported `{1, 2, 4}` set: `0` → `1`,
/// `3` rounds up to `4`, and anything above `4` is capped at `4`.
#[inline]
fn clamp_bits(bits: u32) -> u32 {
match bits {
0 | 1 => 1,
2 => 2,
_ => 4,
}
}
#[cfg(test)]
mod tests {
use super::*;
@@ -463,4 +558,116 @@ mod tests {
let r = idx.search_quantized(&[], 2, 16, 4);
assert_eq!(r.len(), 2);
}
// ----- multi-bit (ADR-261 §11) -----
/// `bits == 1` via `build_bits` matches the legacy `build` entry point (which
/// now delegates to it): identical search output. Backward-compatibility pin.
#[test]
fn one_bit_build_bits_matches_legacy_build() {
let vectors = planted(32, 400, 8, 0x1B17);
let legacy = QuantizedHnswIndex::build(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 40);
let viabits =
QuantizedHnswIndex::build_bits(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 1, 40);
assert_eq!(legacy.bits(), 1);
assert_eq!(viabits.bits(), 1);
let q = &vectors[123];
assert_eq!(
legacy.search_quantized(q, 10, 64, 40),
viabits.search_quantized(q, 10, 64, 40),
"build_bits(…,1,…) must equal legacy build(…)"
);
}
/// Unsupported bit-depths clamp to the supported `{1,2,4}` set (`0` → `1`,
/// `3` → `4`, `7` → `4`) so the constructor is total (no panic, predictable resolution).
#[test]
fn bits_are_clamped_to_supported_set() {
let vectors = planted(16, 50, 4, 0xB175);
for (req, exp) in [(0u32, 1u32), (1, 1), (2, 2), (3, 4), (4, 4), (7, 4)] {
let idx = QuantizedHnswIndex::build_bits(
&vectors,
16,
Metric::L2,
params(0x9),
0xB,
req,
16,
);
assert_eq!(idx.bits(), exp, "bits {req} should clamp to {exp}");
// and it must still search without panic
assert!(!idx.search_quantized(&vectors[0], 5, 32, 20).is_empty());
}
}
/// Bytes/node scales linearly with `bits`: for a power-of-two dim `D`,
/// 1-bit → D/8, 2-bit → D/4, 4-bit → D/2.
#[test]
fn bytes_per_node_scales_with_bits() {
let vectors = planted(128, 20, 4, 0xBEEF);
let b1 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 1, 16);
let b2 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 2, 16);
let b4 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 4, 16);
assert_eq!(b1.bytes_per_node(), 16, "128-d 1-bit = 16 B/node");
assert_eq!(b2.bytes_per_node(), 32, "128-d 2-bit = 32 B/node");
assert_eq!(b4.bytes_per_node(), 64, "128-d 4-bit = 64 B/node");
}
/// More bits should not *reduce* recall at a fixed (ef, rerank): the multi-bit
/// code is a strictly finer angle proxy than 1-bit, so the traversal beam is
/// expected to hand the rerank equal-or-better candidates. This is the core
/// ADR-261 §11 hypothesis (multi-bit keeps the beam on-path better), pinned as
/// a regression gate. The assertions allow a 0.02 recall tolerance for ties.
#[test]
fn more_bits_does_not_reduce_recall() {
let dim = 64;
let n = 3000;
let clusters = 32;
let seed = 0x7A11;
let vectors = planted(dim, n, clusters, seed);
let recall_for = |bits: u32| -> f64 {
let idx = QuantizedHnswIndex::build_bits(
&vectors,
dim,
Metric::L2,
params(0xA11A),
0x5EED,
bits,
// Modest rerank so traversal quality — not a huge rerank pool —
// is what drives the recall difference between bit depths.
20,
);
let mut total = 0.0f64;
let n_queries = 64;
for q in 0..n_queries {
let c = q % clusters;
let mut cs = seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
let centre: Vec<f32> = (0..dim).map(|_| gauss(&mut cs) * 3.0).collect();
let mut s = seed ^ 0xDEAD_0000 ^ (q as u64).wrapping_mul(0x2545_F491);
let qv: Vec<f32> = (0..dim).map(|d| centre[d] + gauss(&mut s) * 0.35).collect();
let truth: HashSet<u32> = idx
.graph()
.brute_force(&qv, 10)
.into_iter()
.map(|(id, _)| id)
.collect();
let got = idx.search_quantized(&qv, 10, 64, 20);
let hit = got.iter().filter(|(id, _)| truth.contains(id)).count();
total += hit as f64 / 10.0;
}
total / n_queries as f64
};
let r1 = recall_for(1);
let r2 = recall_for(2);
let r4 = recall_for(4);
// 2-bit and 4-bit must be at least as good as 1-bit (small tie tolerance).
assert!(
r2 + 0.02 >= r1,
"2-bit recall {r2:.4} regressed vs 1-bit {r1:.4}"
);
assert!(
r4 + 0.02 >= r1,
"4-bit recall {r4:.4} regressed vs 1-bit {r1:.4}"
);
}
}
+1 -1