Mirror of https://github.com/ruvnet/RuView (synced 2026-06-14 11:03:18 +00:00)
Compare commits: 2 commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | cf2a85db66 | | |
| | 9b07dff298 | | |
@@ -8,12 +8,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Security
- **ADR-157 Milestone-1 B4 - constant-time HMAC sync-beacon tag compare (`wifi-densepose-hardware`).** `AuthenticatedBeacon::verify` compared the 8-byte HMAC-SHA256 tag with `self.hmac_tag == expected`, which short-circuits on the first differing byte and leaks, through verification latency, how many leading bytes an attacker's forged tag matched - a byte-by-byte tag-recovery oracle (~256*N trials instead of 256^N). Replaced with a hand-rolled branch-free `constant_time_tag_eq` (XOR-accumulate every byte difference into a single `u8`, no early exit, `#[inline(never)]` + `core::hint::black_box` to stop the optimizer reintroducing a short-circuit or a non-constant-time `memcmp`). **No new dependency** - ADR-157 had deferred this only to avoid adding the `subtle` crate; a fixed 8-byte compare needs none. Grade MEASURED (constant-time *construction*; micro-timing on a noisy host is a smoke check only, gated `#[ignore]`). Pinned by `tag_compare_is_constant_time_shape` (equal/first-differ/last-differ/all-differ/length-mismatch + an end-to-end `verify()` last-byte tamper), proven to fail on a last-byte-skipping constant-time bug. ADR-157 §8 B4 -> RESOLVED.
- **ADR-080 open HIGH findings closed on the Rust `wifi-densepose-sensing-server` boundary (ADR-164 G11).** The QE sweep's three HIGH findings — XFF-spoofing bypass, leaked stack traces, JWT-in-URL (CWE-598) — were logged against the Python v1 API and never re-verified against the shipped Rust sensing-server; the HOMECORE/M7 sweep (ADR-161) covered `homecore-server`, not this crate.
- **#2 leaked internal errors (the one live exposure) — FIXED.** Six handlers in `main.rs` serialized the internal error `Display` straight into the JSON response body: `edge_registry_endpoint` returned a panicked `spawn_blocking` `JoinError` (`"task … panicked"`) in a `500`, plus the raw upstream error in a `503`; `delete_model`/`delete_recording`/`start_recording` returned `std::io::Error` strings (OS detail / path); `calibration_start`/`calibration_stop` returned the `FieldModel` error chain. New `error_response` module logs the full detail **server-side only** (with a correlation id) and returns a generic body (`{"error":"internal_error","correlation_id":…}`) — no `panicked`, no file paths, no Debug chain. 5 module tests (a leak-substring guard proven to fail on the reverted old body) + the existing handler suite.
- **#1 XFF-spoofing bypass — VERIFIED ABSENT, regression-pinned.** The sensing-server has no XFF-trusting control to bypass: there is no IP-based rate-limiter or IP-allowlist, and neither `bearer_auth` (token-only) nor `host_validation` (Host-header only) reads `X-Forwarded-For`/`X-Forwarded-Host` (no `forwarded`/`peer_addr`/`client_ip` anywhere in the crate). Added regression tests proving a spoofed `X-Forwarded-For` never flips an auth decision and a spoofed `X-Forwarded-Host` never bypasses the Host allowlist.
- **#3 JWT-in-URL (CWE-598) — VERIFIED ABSENT, regression-pinned.** `require_bearer` reads the token only from the `Authorization` header; the WebSocket handlers take no token query param and the sole `Query` extractor (`EdgeRegistryParams`) is a non-secret `refresh` flag. Added a regression proving `?token=`/`?access_token=` in the URL never authenticates while the header path still does.
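To make the ADR-157 B4 entry above concrete, here is a minimal sketch of a branch-free tag comparison. It is not the crate's `constant_time_tag_eq`: the slice-based signature and the early return on a length mismatch (length is treated as public) are assumptions for illustration. `core::hint::black_box` is a best-effort optimizer barrier, not a guarantee, which is consistent with the entry grading only the constant-time *construction*.

```rust
use core::hint::black_box;

/// Illustrative constant-time equality for short authentication tags.
/// Every byte pair is XORed and OR-accumulated, so the loop never exits
/// early and the running time does not depend on where the tags differ.
#[inline(never)]
pub fn constant_time_tag_eq(a: &[u8], b: &[u8]) -> bool {
    // Assumption: the tag length is public, so a length mismatch may
    // return immediately without leaking anything secret.
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // black_box discourages the optimizer from turning the
        // accumulation back into a short-circuiting compare.
        diff |= black_box(x ^ y);
    }
    black_box(diff) == 0
}
```

A naive `a == b` on slices may stop at the first differing byte, which is the latency oracle the entry describes.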
### Fixed
- **ADR-155 Milestone-1b — metric-definition unification, the §8 backlog subset (Goals A/B/C).** Closed the two §8 metric-integrity items; every change pinned by a test, graded MEASURED. The audit (Goal A) also surfaced findings the §1 table under-counted — recorded honestly in ADR-155 §8.1, not hidden. Workspace stays green; Python proof unchanged (metrics are not on the deterministic proof's signal path).
- **Goal B — `test_metrics.rs` now validates the production metric, not a reimplementation.** The integration test previously asserted properties of its OWN local `compute_pck`/`compute_oks` (a test that can't catch a canonical-impl bug — both could be wrong the same way). Hoisted the canonical core (`pck_canonical`/`oks_canonical`/`canonical_torso_size`/sigmas/`bounding_box_diagonal`) into a new **un-gated** `metrics_core` module so the single definition is reachable under `cargo test --no-default-features` (the `metrics` module is `tch-backend`-gated); `metrics` re-exports it → still exactly ONE implementation. Rewrote the test to assert the production `pck_canonical`/`oks_canonical` equal **hand-computed** fixtures (`canonical_pck_matches_hand_computed_fixture` = 3/4 correct ⇒ 0.75; hip↔hip normalizer pin; zero-visible⇒0.0; OKS perfect⇒1.0; fake-Gold pin) plus a differential cross-check (`test_kernel_agrees_with_canonical`: an independent raw-threshold kernel must AGREE with canonical where torso==1.0). `wifi-densepose-train --no-default-features`: test_metrics **10→12**, 0 failed.
- **Goal C — divergent live-server PCK/OKS relabelled so they're never conflated with canonical.** Goal C named `training_api.rs:804` (torso-HEIGHT PCK); the audit found that file is an **orphan (not `mod`-declared, does not compile)** and the **real** live `best_pck`/`best_oks` come from `trainer.rs` — a **raw, unnormalized** `pck_at_threshold` and an **`area=1.0` fake-Gold** `oks_map` (both MISSED by ADR-155 §1, both on the claim-inflating side, both serialized as bare "PCK@0.2"/"OKS"). Torso-height/raw math is load-bearing (pixel-space, different scale axis, no `ndarray`/train dep), so the honest fix is **relabel, not force-unify**: `training_api.rs` `compute_pck` → `compute_pck_torso_height` + field/log docs; `trainer.rs` kernels documented raw/fake-Gold; `main.rs` prints `pck_raw@0.2` / `oks_map(area=1.0 proxy)`. No wire-format field or `pub`-fn renames (no silent API break). Pinned by `torso_pck_is_labelled_distinctly_from_canonical` + `pck_at_threshold_is_raw_unnormalized_not_canonical`. `wifi-densepose-sensing-server --no-default-features`: lib **450→451**, 0 failed. True unification onto `pck_canonical`/`oks_canonical` remains a tracked ADR-155 §8 item.
- **Pre-existing `SketchBank::topk` heap inversion returned the FARTHEST sketches (found during ADR-156 §8 Pass-2 work).** The `n > k` partial-sort path in `wifi-densepose-ruvector/src/sketch.rs` used `BinaryHeap<Reverse<(dist,id)>>` (a min-heap) but its eviction logic treated the peek as the max, so it kept the k *farthest* sketches and returned them as "nearest." The shipped unit tests only exercised the `n ≤ k` fast path (≤ 3 entries), so the inversion shipped silently in ADR-084. Fixed to a plain max-heap. Pinned by `topk_heap_path_returns_nearest` (farthest-first insertion exposes it) and `tight_clusters_give_high_coverage_with_overfetch` (**measured 0.072 coverage on the old code** — effectively random — vs >0.99 fixed). Every ADR-084 top-K coverage number depends on the fixed path. MEASURED, not a no-op.
- **ADR-154 Milestone-1 — cleared the P1 deferred backlog in `wifi-densepose-signal` (§7.4 #1, #10; partial #9, #13).** Each fix pinned by a regression test that fails on the old behaviour; every claim graded MEASURED / DATA-GATED; no fabricated thresholds. Python proof unchanged (`f8e76f21…46f7a`, bit-exact — the CIR ghost-tap guard is not on the deterministic proof path).
- **#1 (MEASURED metric / DATA-GATED threshold): circular phase variance.** `cir.rs::phase_variance` computed a *linear* sample variance over phase angles that wrap at ±π, so a tightly-clustered set straddling the branch cut reported spuriously HIGH dispersion — false-tripping the `> TAU` ghost-tap **guard** on real, tightly-clustered CIR taps. Replaced with Mardia's **circular variance** V = 1 − R̄, bounded **[0,1]** and invariant to where the cluster sits on the circle. The old TAU-scaled threshold is meaningless on [0,1]; re-derived against a named const `GHOST_TAP_CIRCULAR_VARIANCE_MAX = 0.99` (fires only when R̄ ≤ 0.01 — essentially uniform phase). The **metric is MEASURED**; the **threshold value is DATA-GATED** (a clean single-path ramp also sweeps the circle, so V alone can't separate clean from unsanitized without labelled frames — the default is deliberately conservative, strictly more permissive at the wrap boundary than the buggy linear guard). Fails-on-old: `phase_variance_circular_not_fooled_by_branch_cut` (old linear variance > TAU on wrap-straddling phases while circular V≈0, guard no longer trips) + `phase_variance_circular_is_bounded_and_extremal` (V∈[0,1], V≈0 identical, V≈1 uniform).
- **#10 (MEASURED): Welford n=0/n=1 finiteness guard pinned.** The shared `WelfordStats` (`field_model.rs`) `count < 2` guards keep `variance`/`sample_variance`/`std_dev`/`z_score` finite at the boundaries, but the n=0 case was untested (same family as the §4 divide-by-(n−1) trio). Added `welford_finite_at_n0_and_n1` — finite + documented-sentinel (0.0) at n=0/n=1. Fails-on-old proof: removing the `sample_variance` guard makes the test panic with "attempt to subtract with overflow" at the `(count − 1)` underflow (guard restored).
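The circular phase variance from item #1 above can be sketched as follows. This is an illustrative version, not the code in `cir.rs`: the function names, the `f64` type, and the `0.0` return for an empty slice are assumptions. It computes Mardia's V = 1 − R̄, where R̄ is the length of the mean unit phasor, alongside the linear sample variance it replaces.

```rust
/// Mardia's circular variance V = 1 - R_bar, bounded to [0, 1].
/// R_bar is the length of the mean of the unit vectors (cos p, sin p).
pub fn circular_variance(phases: &[f64]) -> f64 {
    if phases.is_empty() {
        return 0.0; // assumed sentinel for "no samples"
    }
    let n = phases.len() as f64;
    let (sum_sin, sum_cos) = phases
        .iter()
        .fold((0.0_f64, 0.0_f64), |(s, c), &p| (s + p.sin(), c + p.cos()));
    let r_bar = (sum_sin * sum_sin + sum_cos * sum_cos).sqrt() / n;
    (1.0 - r_bar).clamp(0.0, 1.0)
}

/// The linear sample variance (divide by n - 1) that treats angles as
/// plain numbers and therefore breaks at the +/- pi branch cut.
pub fn linear_variance(phases: &[f64]) -> f64 {
    if phases.len() < 2 {
        return 0.0;
    }
    let n = phases.len() as f64;
    let mean = phases.iter().sum::<f64>() / n;
    phases.iter().map(|p| (p - mean).powi(2)).sum::<f64>() / (n - 1.0)
}
```

For phases clustered within a few hundredths of a radian on either side of ±π, the linear variance is dominated by the artificial jump between +π and −π, while the circular variance stays near zero because the unit vectors all point in nearly the same direction.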
@@ -22,9 +27,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Published HuggingFace model was unloadable — RVF format mismatch (#894).** The `ProgressiveLoader` rejected the published `ruvnet/wifi-densepose-pretrained` model with the opaque `invalid magic at offset 0: expected 0x52564653 (RVFS), got 0x77455735`, then silently fell back to signal heuristics (the "10 persons for 1" garbage reporters saw). The HF repo ships `model.safetensors`, `model-q{2,4,8}.bin` (magic `0x77455735` = "5WEw"), and `model.rvf.jsonl` — none carry the binary-RVF magic. New `model_format` module **auto-detects** RVFS / safetensors / HF-quant-bin / JSONL by magic+name, returns a **typed actionable** `ModelLoadError` (lists accepted formats + the one-command convert path — never the opaque magic), and **converts** `model.safetensors` / `model.rvf.jsonl` → RVF in-memory so the published full-precision model now loads via `--model`. A `--convert-model <in> --convert-out <out>` CLI subcommand gives a one-command offline path; the silent heuristics fallback is now a loud, actionable error. **Honest scope:** the converter wires the format/load path (safetensors F32 tensors → RVF weight segment, manifest written, Layer A/B/C all succeed, weights round-trip) — it does **not** claim end-to-end pose accuracy, since the HF pose-decoder architecture differs from this crate's inference head (still data-gated in #894). Quantized `.bin` blobs are rejected with a typed error pointing at the safetensors path. Pinned by `safetensors_converts_and_loads` + `hf_quant_classifies_to_actionable_error` (both fail on the old opaque-magic path).
### Changed
- **ADR-157 Milestone-1 §5 #4 - native `wlanapi.dll` multi-BSSID throughput MEASURED on real hardware (`wifi-densepose-wifiscan`).** The ADR's prior status ("asserted but NOT implemented; live scanner is the ~2 Hz netsh shim") is now stale: `wlanapi_native.rs` already implements the real `WlanOpenHandle` -> `WlanEnumInterfaces` -> `WlanGetNetworkBssList` -> `WlanFreeMemory`/`WlanCloseHandle` FFI and `WlanApiScanner` already wires it native-first with a netsh fallback. This milestone **measured it on this box** (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13): a new `benchmark_backend(backend, window)` drives each backend over the same fixed 10 s wall-clock window so netsh is timed independently (the prior `benchmark()` picked native-first and never measured netsh on a Windows box where native works). **MEASURED: native 21.42 Hz vs netsh 3.84 Hz = 5.57x** (mean 5.0 BSSIDs/scan, both paths); a separate native-only run measured 18.0 Hz. Native genuinely beats netsh - this is a real positive result, not a fabricated "10x". 50 back-to-back native scans completed 50/50 with no handle leak/degradation. Live-WLAN tests (`measure_native_vs_netsh_throughput`, `native_scans_dont_leak_handles`, `measure_native_scan_rate`) are `#[ignore]` for CI but were RUN here; `native_scan_runs_real_ffi_on_windows` is a non-ignored schema-valid pin. ADR-157 §5 #4 + §8 -> MEASURED (was ACCEPTED-FUTURE / CLAIMED-unmeasured).
- **Mesh partition risk now demotes the privacy class and is witnessed (ADR-032).** The dynamic min-cut guard's `at_risk` signal was advisory-only (it fed the recalibration advisor). It now also contributes to the ADR-141 privacy demotion alongside fusion- and array-level contradictions: a mesh close to partitioning makes the fused belief less trustworthy, so the cycle emits at a more restricted class (monotonic — information only removed). Because `effective_class` feeds the BLAKE3 witness, a fragmenting array now shifts the witness — partition risk is auditable, not just logged. The mesh computation moved ahead of the demotion step in `process_cycle`; new `mesh_guard_mut()` exposes risk-threshold tuning. Test proves a forced-risk 3-node cycle demotes PrivateHome Anonymous→Restricted and shifts the witness vs a clean *same-topology* baseline (the only delta between the two cycles is the forced risk).
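The fixed-window measurement described in the ADR-157 §5 #4 entry above (each backend timed independently over the same wall-clock window) might look roughly like this. The names `measure_scan_rate` and `ScanStats` and the closure-based interface are hypothetical; the crate's real `benchmark_backend(backend, window)` signature is not shown in this changelog.

```rust
use std::time::{Duration, Instant};

/// Result of driving one backend for a fixed window (hypothetical type).
pub struct ScanStats {
    pub scans: u32,
    pub rate_hz: f64,
    pub mean_bssids: f64,
}

/// Repeatedly call `scan` (which returns the number of BSSIDs seen)
/// until `window` has elapsed, then report scans per second and the
/// mean BSSID count per scan.
pub fn measure_scan_rate<F>(mut scan: F, window: Duration) -> ScanStats
where
    F: FnMut() -> usize,
{
    let start = Instant::now();
    let mut scans: u32 = 0;
    let mut total_bssids: usize = 0;
    while start.elapsed() < window {
        total_bssids += scan();
        scans += 1;
    }
    // Divide by the actual elapsed time, since the last scan may
    // overrun the nominal window.
    let elapsed = start.elapsed().as_secs_f64();
    if scans == 0 || elapsed <= 0.0 {
        return ScanStats { scans: 0, rate_hz: 0.0, mean_bssids: 0.0 };
    }
    ScanStats {
        scans,
        rate_hz: f64::from(scans) / elapsed,
        mean_bssids: total_bssids as f64 / f64::from(scans),
    }
}
```

Calling this once per backend with the same `window` gives two independently timed rates, which is what lets a ratio such as native vs netsh be reported without one path masking the other.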
### Added
- **ADR-156 §8 Milestone-1: RaBitQ Pass-2 randomized rotation + multi-bit experiment — IMPLEMENTED & MEASURED (RESOLVED-PARTIAL).** Closes the §8 "Multi-bit / Extended RaBitQ" backlog item. New `wifi-densepose-ruvector/src/rotation.rs`: a deterministic randomized orthogonal rotation `R = H·D` — **Fast Hadamard Transform** (`O(d log d)`, in-place, `1/√m`-normalized so norm-preserving) + seeded ±1 sign flips (SplitMix64 from a stored `u64` seed; identical at index + query time). Chosen over a dense `d×d` matrix (`O(d²)`, infeasible at the 65,535-d the wire format provisions for); pads to `next_pow2(d)`. Additive, backward-compatible API (`Sketch::from_embedding_rotated`, `SketchBank::with_rotation` + `insert_embedding`/`topk_embedding`/`novelty_embedding`); Pass-1 and the wire format are byte-for-byte unchanged. New `coverage.rs` single-source-of-truth top-K coverage harness (anisotropic planted-cluster fixture, cosine ground truth) backs both a `#[test]` report and the `sketch_bench` coverage table. **MEASURED (dim=128 N=2048 K=8, 64 clusters, noise=0.35, 128 queries, seeded):** at the strict `candidate_k=K` bar, rotation lifts coverage **36.13% → 46.39%**; Pass-2 reaches the **ADR-084 ≥90% bar at candidate_k=24 (~3× over-fetch)**; multi-bit Pass-3 reaches 54%/67%/74% at 2/3/4-bit (strict bar). **Honest verdict: neither rotation nor ≤4-bit multi-bit clears the strict-K 90% bar on this distribution — the bar is met only via the over-fetch "candidate set" pattern ADR-084 specifies.** No benchmark was tuned to manufacture a pass; the strict-bar gap is documented (ADR-156 §10, ADR-084 "Pass 2" section). +19 tests in the crate (100→119), workspace **3,225 / 0 failed**, Python proof VERDICT: PASS (`f8e76f21…`, unchanged — sketch is not on the proof's signal path).
- **Beyond-SOTA `v2/crates/` sweep (ADR-154–158) + full stub-implementation push — every claim MEASURED or graded.** A 5-milestone review/optimize/secure/benchmark/validate sweep, then a verified-audit-driven push to replace every production stub with real, tested logic (no labels, no placeholders). Each fix is pinned by a test that fails on the old code; every number ships with a reproduce command. Workspace: **3,122 tests / 0 failed** (`cargo test --workspace --no-default-features`), Python proof **VERDICT: PASS** (bit-exact).
- **ADR-154 Signal/DSP** — revived a dead ADR-134 CIR coherence gate (canonical-56 vs ht20 mismatch meant it never ran in production: 8/8 Err → 8/8 Ok); NaN-bypass + window div0 guards; PSD FFT-planner cache (**2.0–3.1×**) + honored DTW band (**2.4–4.1×**).
- **ADR-155 NN/Training** — unified 7 divergent PCK/OKS metric definitions into one canonical torso-normalized source (fixed two claim-inflating bugs: zero-visible PCK 1.0→0.0, OKS fake-Gold); leak-free subject-disjoint MM-Fi split + injected-leak detector; rapid_adapt replaced fake gradients with real finite-difference; proof.rs gained a min-decrease margin + committed-hash requirement; zero-copy ORT input (**1.48×**).
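The randomized rotation `R = H·D` from the ADR-156 entry above can be sketched as seeded ±1 sign flips followed by a normalized Fast Hadamard Transform. This is an illustration under assumptions, not the contents of `rotation.rs`: the function names, the use of the low bit of each SplitMix64 output as the sign, and `f32` storage are all hypothetical choices.

```rust
/// One step of the SplitMix64 generator (standard published constants).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// In-place, unnormalized Fast Hadamard Transform, O(m log m).
/// The slice length must be a power of two.
fn fht_in_place(v: &mut [f32]) {
    let n = v.len();
    debug_assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (v[i], v[i + h]);
                v[i] = a + b;
                v[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Apply R = H * D: zero-pad to the next power of two, flip signs from
/// the seed (D), transform (H), and scale by 1/sqrt(m) so the Euclidean
/// norm is preserved.
pub fn rotate(embedding: &[f32], seed: u64) -> Vec<f32> {
    let m = embedding.len().next_power_of_two();
    let mut v = vec![0.0_f32; m];
    v[..embedding.len()].copy_from_slice(embedding);
    let mut state = seed;
    for x in v.iter_mut() {
        // Assumption: low bit of each draw decides the sign.
        if splitmix64(&mut state) & 1 == 1 {
            *x = -*x;
        }
    }
    fht_in_place(&mut v);
    let scale = 1.0 / (m as f32).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
    v
}
```

Because the only state is the `u64` seed, applying `rotate` with the same seed at index time and at query time yields the same orthogonal map, which is the property the entry relies on.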
@@ -259,14 +259,46 @@ Validation runs against:
- **ADR-083** (Proposed) — Per-cluster Pi compute hop. Defines the
device class that hosts the sketch bank.
## Pass 2 — randomized rotation + multi-bit (ADR-156 §8, landed 2026-06)
The "Open question" below ("does `BinaryQuantized` need a randomized
rotation pre-pass?") is now **answered with measured numbers** via
ADR-156 §10. Summary:
- **Pass 2 (randomized rotation) is implemented** —
`crates/wifi-densepose-ruvector/src/rotation.rs`: a deterministic
`R = H·D` (Fast Hadamard Transform + seeded ±1 sign flips), `O(d log d)`
/ `O(d)`, norm-preserving, reproducible from a stored `u64` seed. Opt-in
via `Sketch::from_embedding_rotated` / `SketchBank::with_rotation`;
Pass-1 API and wire format unchanged.
- **Measured top-K coverage** (anisotropic planted-cluster fixture,
cosine ground truth, dim=128 N=2048 K=8): rotation lifts coverage
**36.13% → 46.39%** at the strict `candidate_k = K` bar, and Pass-2
reaches the **≥90% acceptance bar at candidate_k = 24 (~3× over-fetch)**.
Multi-bit (≤4-bit) reaches 74% at the strict bar. **Honest verdict:
neither rotation nor ≤4-bit multi-bit clears the strict-K 90% bar on
this distribution; the bar is met via the over-fetch "candidate set"
pattern this ADR specifies** (Decision §"the canonical pattern" — sketch
picks the candidate set, full precision refines). Full numbers and
reproduce commands in ADR-156 §10.
- **Pre-existing `SketchBank::topk` bug fixed** — the `n > k` heap path
returned the k *farthest* sketches (min-heap mistaken for max-heap);
only the `n ≤ k` fast path had test coverage. Fixed + regression-pinned
(`topk_heap_path_returns_nearest`,
`tight_clusters_give_high_coverage_with_overfetch`). This makes every
prior top-K acceptance number in this ADR depend on the fixed path; the
≥90% coverage criterion is only meaningful post-fix.
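The `SketchBank::topk` heap fix listed above comes down to which end of the heap gets evicted. A minimal sketch of the corrected `n > k` path, assuming `(distance, id)` pairs with integer distances (the function name and tuple layout are hypothetical, not the crate's API):

```rust
use std::collections::BinaryHeap;

/// Keep the k smallest (distance, id) pairs using a bounded max-heap.
/// The heap's top is always the WORST (largest-distance) candidate kept
/// so far, so a new item replaces it only when it is strictly closer.
pub fn topk_nearest(items: &[(u32, u64)], k: usize) -> Vec<(u32, u64)> {
    if k == 0 {
        return Vec::new();
    }
    // std's BinaryHeap is a max-heap by default: peek() is the largest.
    let mut heap: BinaryHeap<(u32, u64)> = BinaryHeap::with_capacity(k);
    for &item in items {
        if heap.len() < k {
            heap.push(item);
        } else if let Some(&worst) = heap.peek() {
            if item < worst {
                heap.pop();
                heap.push(item);
            }
        }
    }
    // Ascending order: nearest first.
    heap.into_sorted_vec()
}
```

Wrapping the entries in `Reverse` turns the structure into a min-heap whose `peek()` is the *best* candidate. Evicting that entry is the inversion described above, and it only shows up when more than `k` items are inserted.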
## Open questions
- **Does `BinaryQuantized` need a randomized rotation pre-pass for
RuView's embedding distributions?** Pure sign quantization assumes
zero-centered, isotropic embeddings. If AETHER / spectrogram
distributions are skewed (likely for spectrogram), add a
`randomized_rotation` pre-pass following the original RaBitQ paper
(Gao & Long, SIGMOD 2024). Decided after pass-1 benchmark.
RuView's embedding distributions?** **ANSWERED (ADR-156 §10):** rotation
is built and measured — it helps (+10pp at strict K) but is not
sufficient alone for strict-K 90% on the tested anisotropic
distribution; the over-fetch candidate-set pattern meets the bar.
Pure sign quantization assumes zero-centered, isotropic embeddings; the
rotation decorrelates anisotropic coords as the RaBitQ paper
(Gao & Long, SIGMOD 2024) prescribes.
- **Sketch dimension target.** Default to the embedding's native
dimension (128 for AETHER, 256 for spectrogram). Higher-dimensional
sketches (Johnson-Lindenstrauss-projected to 512) trade compute for
@@ -189,10 +189,37 @@ The gap review surfaced ~60 findings; this milestone scoped to the provable inte
- **ONNX read-lock concurrency win** — blocked on an `ort` release exposing `&self` `Session::run` (§4.2); harness already committed.
- **native-conv naive-loop** perf rewrite (§4).
- **`rf_encoder.rs` `assert_eq!`-on-checkpoint** and any other **tch-gated** panic-on-input sites — require a libtorch host to compile/verify (`model.rs` `amp_fc1` unbounded alloc is *indirectly* guarded by the new `config.validate()` upper bounds, but a direct guard + test is deferred).
- **`sensing-server/training_api.rs` PCK** — unify the live-server torso-height PCK with `pck_canonical` (crosses the service + tch boundary).
- **`test_metrics.rs` reference kernels** — the integration test's local `compute_pck`/`compute_oks` are independent reference impls (not production); fold them onto the canonical definition.
- ~~**`sensing-server/training_api.rs` PCK**~~ — **RESOLVED in Milestone-1b (see §8.1, Goal C).** Relabelled (not unified) — and the audit found the *real* live divergence is in `trainer.rs`, not the orphaned `training_api.rs`.
- ~~**`test_metrics.rs` reference kernels**~~ — **RESOLVED in Milestone-1b (see §8.1, Goal B).** Canonical core hoisted to an un-gated module; the integration test now validates the production functions against hand-computed fixtures + a differential cross-check.
- **`metrics.rs` `compute_pck_v2`/`compute_oks_v2`/`MetricsAccumulatorV2`/`evaluate_dataset_v2`/`hungarian_assignment_v2`** — confirmed to have **zero external callers** (only `evaluate_dataset_v2`→`MetricsAccumulatorV2` internally). They are already `#[deprecated]` and route through canonical, so they are not a *divergent-definition* risk, only dead weight. Left in place this pass (public API in a tch-gated module; deleting needs a deprecation-cycle + tch host to verify) — flagged here for a future cleanup, NOT deleted silently.
- **`sensing-server/trainer.rs` `pck_at_threshold` (raw) + `oks_map(area=1.0)` and the `training_bench.rs` raw kernel** — relabelled in Milestone-1b (§8.1); true unification onto `pck_canonical`/`oks_canonical` (needs a torso scale + the train crate as a sensing-server dep) remains deferred.
- The remaining ~40 lower-severity review findings (style, micro-opt, doc) from the NN/training gap review.
### 8.1 Milestone-1b — metric-definition unification (the §8 metric subset) — RESOLVED
This milestone closed the two metric-integrity items above. The work is pinned by tests, graded MEASURED, and surfaced findings the §1 table missed.
**The complete, honest PCK / OKS audit map (every definition in `v2/`):**
| Definition (file:line) | Normalization basis | Threshold convention | Status |
|---|---|---|---|
| `metrics_core.rs` `pck_canonical` (was `metrics.rs`) | **hip↔hip torso WIDTH** (bbox-diag fallback), `[0,1]` coords | `k·torso` | **CANONICAL** |
| `metrics_core.rs` `oks_canonical` | `s=sqrt(area)` from GT pose extent | COCO kernel | **CANONICAL** |
| `metrics.rs` `compute_pck` / `compute_per_joint_pck` / `compute_oks` | — (thin wrappers) | — | route to canonical |
| `metrics.rs` `aggregate_metrics` / `MetricsAccumulator` | — | — | route to canonical |
| `metrics.rs` `compute_pck_v2` / `compute_oks_v2` / `MetricsAccumulatorV2` | hip↔hip (folded) | — | **legacy-redundant, deprecated, NO callers** — route to canonical |
| `tests/test_metrics.rs` local `compute_pck`/`compute_oks` (removed) | raw-threshold reimpl | raw | **was independent reimpl** → now validate canonical + 1 differential kernel |
| `benches/training_bench.rs` `compute_pck` | raw-threshold | raw | distinct-by-design (bench-only), annotated DO-NOT-REPORT |
| `sensing-server/training_api.rs` `compute_pck` | **torso-HEIGHT** (nose→hip), **pixel-space** | `ratio·torso_h`, 50px floor | **distinct-by-design** — and **ORPHAN file (not `mod`-declared, does not compile)**; relabelled `compute_pck_torso_height` |
| `sensing-server/trainer.rs` `pck_at_threshold` | **RAW (no normalization)** | raw `thr` | **distinct, LIVE** (drives `best_pck`); **MISSED by §1 table**; relabelled `pck_raw@0.2` |
| `sensing-server/trainer.rs` `oks_map`→`oks_single(area=1.0)` | `area=1.0` | COCO kernel | **fake-Gold, LIVE** (drives `best_oks`); **MISSED by §1 table**; relabelled `oks_map(area=1.0 proxy)` |
**Findings the §1 seven-definition table under-counted (honest correction):** the live sensing-server claim surface is `trainer.rs` (in `lib.rs`), **not** the named `training_api.rs` — which is an **orphan file, never `mod`-declared, so it does not compile into the crate**. The live `best_pck` is a **raw, unnormalized** PCK and the live `best_oks` still uses the **`area=1.0` fake-Gold** path ADR-155 §2.1 reported as closed elsewhere. So the true metric landscape is **messier than §1 documented**: ≥3 PCK and ≥1 OKS live in `sensing-server`, two of them on the inflating side, and the file the ADR named for the fix was dead code. This is a finding, not a failure — recorded here rather than hidden.
**Goal B (`test_metrics.rs`) — RESOLVED, MEASURED.** The canonical core (`pck_canonical`/`oks_canonical`/`canonical_torso_size`/sigmas/`bounding_box_diagonal`) was hoisted into a new **un-gated** `metrics_core` module (the full `metrics` module is `tch-backend`-gated, so the canonical definition was previously unreachable from the workspace test gate; `metrics` now re-exports it → still ONE implementation). `tests/test_metrics.rs` now asserts the **production** functions against hand-computed fixtures — `canonical_pck_matches_hand_computed_fixture` (3/4 correct ⇒ 0.75, hand-derived), zero-visible⇒0.0, hip↔hip normalizer pin, OKS perfect⇒1.0, the fake-Gold pin — plus `test_kernel_agrees_with_canonical`, a differential test where an independent raw-threshold reference must AGREE with canonical in the torso=1.0 regime. (10→12 tests.)
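A hand-computed fixture of the kind Goal B describes can be sketched as follows. This is not `pck_canonical`: the function name, the hip-index parameters, and the `0.0` return for a degenerate torso are assumptions (the canonical definition uses a bounding-box-diagonal fallback, omitted here). It shows only the hip↔hip normalization, the `k·torso` threshold, and the zero-visible ⇒ 0.0 rule.

```rust
/// Torso-normalized PCK sketch: a keypoint counts as correct when its
/// error is at most k * (hip-to-hip distance in the ground truth).
/// Panics if a hip index is out of bounds (kept simple on purpose).
pub fn pck_torso_normalized(
    pred: &[[f32; 2]],
    gt: &[[f32; 2]],
    visible: &[bool],
    left_hip: usize,
    right_hip: usize,
    k: f32,
) -> f32 {
    let dist = |a: [f32; 2], b: [f32; 2]| {
        ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
    };
    let torso = dist(gt[left_hip], gt[right_hip]);
    if torso <= 0.0 {
        return 0.0; // assumption: no fallback normalizer in this sketch
    }
    let mut correct = 0u32;
    let mut total = 0u32;
    for ((p, g), &vis) in pred.iter().zip(gt).zip(visible) {
        if !vis {
            continue;
        }
        total += 1;
        if dist(*p, *g) <= k * torso {
            correct += 1;
        }
    }
    // Zero visible keypoints must score 0.0, never a "perfect" 1.0.
    if total == 0 {
        0.0
    } else {
        correct as f32 / total as f32
    }
}
```

With hips at x = 0.4 and x = 0.6 the torso width is 0.2, so `k = 0.2` gives a threshold of 0.04. Prediction errors of 0.01, 0.02, 0.0 and 0.1 then yield three correct keypoints out of four, which can be checked by hand.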
**Goal C (`training_api.rs` PCK) — RESOLVED by RELABEL, MEASURED.** Torso-height is **load-bearing** (pixel-space, vertical nose→hip scale, `[17×3]` layout, no `ndarray`/train dep), so unifying would silently change the live numbers' meaning — exactly what to avoid. Resolution: relabel everywhere the metric surfaces so it is never read as canonical, in both the named `training_api.rs` (now `compute_pck_torso_height`, struct/JSON-field docs, `pck_torso_h@0.2` logs) **and** — the real fix — the LIVE `trainer.rs` path (`pck_at_threshold` documented raw-unnormalized; `oks_map` `area=1.0` flagged fake-Gold; `main.rs` prints `pck_raw@0.2` / `oks_map(area=1.0 proxy)`). No wire-format field or `pub`-fn renames (no silent API break). Pinned by `torso_pck_is_labelled_distinctly_from_canonical` (training_api) and `pck_at_threshold_is_raw_unnormalized_not_canonical` (the live kernel). True unification (route the live server through `pck_canonical`/`oks_canonical`) remains a deferred §8 item — it needs a torso scale on the live data and the train crate as a dep.
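Why `area=1.0` sits on the claim-inflating side can be seen from the COCO keypoint-similarity kernel itself. The sketch below is an illustration of that standard kernel, not the `oks_map`/`oks_single` code in `trainer.rs`; the function name is hypothetical and visibility flags are omitted for brevity.

```rust
/// COCO-style OKS sketch: mean over keypoints of
/// exp(-d^2 / (2 * area * (2 * sigma)^2)), where `area` plays the role
/// of s^2 (object scale squared) in the same units as the coordinates.
pub fn oks(pred: &[[f32; 2]], gt: &[[f32; 2]], sigmas: &[f32], area: f32) -> f32 {
    if area <= 0.0 {
        return 0.0;
    }
    let mut sum = 0.0_f32;
    let mut n = 0u32;
    for ((p, g), &sigma) in pred.iter().zip(gt).zip(sigmas) {
        let d2 = (p[0] - g[0]).powi(2) + (p[1] - g[1]).powi(2);
        let kappa = 2.0 * sigma;
        sum += (-d2 / (2.0 * area * kappa * kappa)).exp();
        n += 1;
    }
    if n == 0 {
        0.0
    } else {
        sum / n as f32
    }
}
```

For coordinates normalized to `[0,1]`, a real pose covers well under unit area. Substituting `area = 1.0` enlarges the denominator, so the same pixel error is penalized far less. For example, an error of 0.05 with `sigma = 0.05` scores about 0.04 at `area = 0.04` but about 0.88 at `area = 1.0`.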
---
## 9. Consequences
@@ -200,3 +227,5 @@ The gap review surfaced ~60 findings; this milestone scoped to the provable inte
**Positive.** The training/metrics subsystem can now substantiate a clean accuracy claim: one documented metric used everywhere, a leak-free split, an honest TTA path, a proof that fails on noise and refuses to bless an unbaselined run, and two of the most claim-inflating bugs (false-perfect PCK, fake-Gold OKS) closed and pinned by regression tests. The unmeasured/unprovable parts are **disclosed**, not hidden.
**Negative / honest.** The reportable-metric tch-gated code cannot be compiled on the dev host (libtorch absent), so its validation rests on routing through the workspace-tested canonical functions plus review; the Rust deterministic proof is in SKIP until a baseline is committed on a tch host; the ONNX concurrency win is blocked upstream; and ~45 findings are deferred. None of these is presented as done.
**Picture changed by Milestone-1b (§8.1) — corrected, not hidden.** The §1 "seven divergent metrics" count was an **under-count**. The metric-unification audit (Goal A) found the live `wifi-densepose-sensing-server` carries additional, divergent definitions the §1 table omitted: a **raw, unnormalized** `pck_at_threshold` and an **`area=1.0` fake-Gold** `oks_map` in `trainer.rs` — and these, not the orphaned `training_api.rs` the backlog named, are what actually drive the live-reported `best_pck`/`best_oks`. Milestone-1b **relabelled** them (load-bearing math on different data; relabel beats false unification) and pinned the divergence with tests; full unification onto the canonical definition stays deferred. So the canonical *train/nn* metric is unified and test-validated end-to-end, but the *sensing-server* still computes (now clearly-labelled, non-canonical) progress proxies — disclosed here as the honest current state.
@@ -103,7 +103,7 @@ The double-clone elimination is also correctness-neutral: all 100 `viewpoint`/`m
| # | Candidate | What | Grade | Verdict |
|---|-----------|------|-------|---------|
| **1** | **SymphonyQG** (SIGMOD 2025, public code) | Unified quantization + graph ANN; source reports **3.5–17× QPS over HNSW at equal recall**, pure-CPU / edge-portable. | **CLAIMED** (author-measured; **not reproduced on our hardware** — reproduction is future work) | **Lead beyond-SOTA candidate for the ruvector ANN path.** Propose as ACCEPTED-future; cite honestly as "claimed by source, reproduction pending." Best fit because the ruvector retrieval path (AETHER re-ID, sketch prefilter) is exactly an ANN problem and SymphonyQG is CPU/edge-portable like our deployment. |
| **2** | **Multi-bit / Extended RaBitQ** | Extends our existing **1-bit** `sketch.rs` (ADR-084) to multiple bits per dimension — precisely the "Pass 2" our own `sketch.rs` doc deferred (1-bit sign quantization ships first; rotation/more-bits "later if benchmark-measured top-K coverage drops below the ADR-084 90% threshold"). | **CLAIMED** (RaBitQ family well-characterised; our 1-bit baseline is MEASURED in `sketch_bench`) | **Accepted near-term.** Concrete, in-scope, incremental — extends a MEASURED capability rather than importing a new system. #2 priority. |
| **2** | **Multi-bit / Extended RaBitQ** | Extends our existing **1-bit** `sketch.rs` (ADR-084) to multiple bits per dimension — precisely the "Pass 2" our own `sketch.rs` doc deferred (1-bit sign quantization ships first; rotation/more-bits "later if benchmark-measured top-K coverage drops below the ADR-084 90% threshold"). | **MEASURED-on-our-hardware** (was CLAIMED) — Pass-2 rotation + multi-bit Pass-3 implemented and benchmarked; see §10. Rotation lifts strict-bar coverage 36%→46% and clears 90% only with ~3× over-fetch; multi-bit (≤4-bit) reaches 74% at the strict bar — both **short of the strict 90% bar** on the tested distribution. | **DONE — RESOLVED-PARTIAL.** Built and MEASURED (§10). The honest negative (no strict-bar 90% from rotation or ≤4-bit) is recorded, not hidden. Over-fetch + Pass-2 is the path that meets the bar; that matches ADR-084's "candidate set" deployment pattern. |
| **3** | **GraphPose-Fi-style learned antenna-attention + ChebGConv fusion head** | Would replace the current **untrained identity-projection + mean-pool** "attention" (the `CrossViewpointAttention` default is `ProjectionWeights::identity` — not a *learned* attention) with a learned graph fusion head. | **DATA-GATED** (per ADR-152 measurement (b): architecture is **NOT** the current bottleneck — **data is**) | **ACCEPTED-future, data-gated. Do NOT build now.** ADR-152's measured lesson was that swapping architecture without more/better paired data does not move PCK. Building a learned fusion head before the data exists would repeat the mistake ADR-155 §5 also flagged for GraphPose-Fi. |
| — | **Cramér-Rao / sensor-placement** (`geometry.rs` CRB) | Investigated for a 2026 advance beating the textbook Fisher-information CRB already implemented. | **Investigated — NO ACTION** | **Cleared honestly.** No 2026 method beats the closed-form Fisher-information CRB for this 2-D bearing problem; our implementation is already correct SOTA. (Recording a negative result is a deliberate anti-slop signal.) The only CRB change this milestone is the §2.3 *GDOP* honesty fix, which is a labelling/quantity correction, not an algorithmic one. |
|
||||
|
||||
@@ -139,7 +139,7 @@ The double-clone elimination is also correctness-neutral: all 100 `viewpoint`/`m
|
||||
The review surfaced more than this milestone scoped. Tracked here for a future ADR-156 milestone:
|
||||
|
||||
- **SymphonyQG reproduction** (§5 #1) — reproduce the 3.5–17× QPS-over-HNSW claim on our hardware before integrating into the ruvector ANN path. Currently CLAIMED-only.
|
||||
- **Multi-bit / Extended RaBitQ** (§5 #2) — implement the `sketch.rs` "Pass 2" (more bits per dimension and/or the randomized rotation) and re-measure top-K coverage against the ADR-084 ≥90% acceptance bar in `sketch_bench`.
|
||||
- **Multi-bit / Extended RaBitQ** (§5 #2) — **RESOLVED-PARTIAL** (see §10). Pass-2 randomized rotation (FHT + seeded ±1 sign flips, `src/rotation.rs`) and a multi-bit Pass-3 experiment landed and were MEASURED against the ADR-084 ≥90% bar. **Honest result: rotation helps (+10pp at the strict bar) and Pass-2 reaches 90% with ~3× over-fetch, but NEITHER rotation nor multi-bit (up to 4-bit) clears the strict candidate_k==K 90% bar on the tested anisotropic distribution.** The original `1-bit sign quantization ships first; rotation/more-bits later if benchmark-measured top-K coverage drops below 90%` deferral is therefore retired: the rotation is built, the bar is characterised, and the residual gap is documented rather than deferred.
|
||||
- **Learned cross-viewpoint fusion head** (§5 #3, GraphPose-Fi-style) — **data-gated**: blocked on the paired multi-room data ADR-152 measurement (b) identified as the real bottleneck; do not build the architecture first.
|
||||
- **`CrossViewpointAttention` learned projections** — the default `ProjectionWeights::identity` + mean-pool is honest but unlearned; wiring real learned Q/K/V projections is part of the data-gated item above (no learned weights ⇒ the "attention" is currently a geometric-bias-weighted average, which the code/docs should keep stating plainly).
|
||||
- **`coherence.rs` / `fusion.rs` micro-opts and the remaining lower-severity review findings** (style, doc, further hot-path tuning) from the fusion gap review.
|
||||
@@ -151,3 +151,57 @@ The review surfaced more than this milestone scoped. Tracked here for a future A
|
||||
**Positive.** The fusion path now: uses one canonical wrapped angular-distance helper; reports a **real** dimensionless GDOP instead of a mislabeled RMSE; cannot be panicked by crafted multistatic indices or a zero-bin spectrogram (DoS closed); and does one embedding clone per viewpoint instead of two (measured). Every fix is pinned by a test that fails on the old code, and the ANN/fusion SOTA landscape is graded so the near-term (multi-bit RaBitQ) and the data-gated (learned fusion) are not confused.
|
||||
|
||||
**Negative / honest.** The headline angular-wrap fix is a **numeric no-op** under the current cos kernel — we land it for contract/maintainability, not because it changes an output, and we say so. The two strongest external candidates (SymphonyQG, learned fusion) are **not built here** — one is CLAIMED-pending-reproduction, the other is data-gated by a prior measurement. The perf win is a **local hot-path** improvement, modest in the end-to-end pipeline (attention dominates). None of these is presented as more than it is.
|
||||
|
||||
---
|
||||
|
||||
## 10. RaBitQ Pass-2 / multi-bit — IMPLEMENTED & MEASURED (§8 backlog item #2)
|
||||
|
||||
Milestone-1 of the §8 backlog. Status: **RESOLVED-PARTIAL** — built, measured, honest negative on the strict bar.
|
||||
|
||||
### 10.1 What landed
|
||||
|
||||
- **`crates/wifi-densepose-ruvector/src/rotation.rs`** (new) — `Rotation`, a deterministic randomized orthogonal rotation `R = H·D`: a **Fast Hadamard Transform** (`O(d log d)`, in-place butterfly, `1/√m` normalized so it is norm-preserving) composed with a diagonal of **seeded ±1 sign flips** (SplitMix64 from a stored `u64` seed). Chosen over a dense `d×d` matrix because that is `O(d²)` memory/time and infeasible at the 65,535-d the wire format provisions for; FHT is the standard fast-orthogonal (randomized-Hadamard / fast-JL) construction. Non-power-of-two `d` zero-pads to `next_pow2(d)` and reads back the first `d` coords.
|
||||
- **`sketch.rs`** — additive Pass-2 API: `Sketch::from_embedding_rotated`, `SketchBank::with_rotation` + `insert_embedding` / `topk_embedding` / `novelty_embedding`. **Pass 1 (`from_embedding`) is byte-for-byte unchanged**; a Pass-2 sketch has identical `embedding_dim` / packed-byte length / wire shape, so `WireSketch` and existing callers (`event_log.rs`, `signal/longitudinal.rs`) are untouched. Default behaviour preserved.
|
||||
- **`coverage.rs`** (new) — single-source-of-truth top-K coverage harness on a deterministic **anisotropic planted-cluster** fixture (cosine ground truth, the metric a sign sketch approximates). Backs both the `pass2_coverage_report` unit test and the `sketch_bench` coverage table.
|
||||
- **Multi-bit Pass-3 experiment** — `coverage::measure_multibit`: rotate, then `b`-bit uniform scalar-quantize each coord, rank by L1 over codes. Measures the bit/coverage tradeoff.
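
The `R = H·D` construction described above can be sketched as follows. This is a minimal illustration, not the contents of `rotation.rs`: the function names `fht_in_place` and `rotate` are hypothetical, and the way sign flips are drawn from SplitMix64 (lowest bit of each output) is an assumption. Note that the norm is preserved exactly only when `d` is already a power of two; reading back the first `d` coordinates after zero-padding can drop some energy.

```rust
/// SplitMix64 step, used here only to derive the ±1 sign flips from a seed.
fn split_mix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// In-place, unnormalized fast Hadamard transform (butterfly form).
/// `buf.len()` must be a power of two.
fn fht_in_place(buf: &mut [f32]) {
    let n = buf.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (buf[i], buf[i + h]);
                buf[i] = a + b;
                buf[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Hypothetical stand-in for the rotation: y = H·D·x, zero-padded to the
/// next power of two `m`, scaled by 1/sqrt(m), first `d` coordinates returned.
fn rotate(x: &[f32], seed: u64) -> Vec<f32> {
    let d = x.len();
    let m = d.next_power_of_two();
    let mut state = seed;
    let mut buf = vec![0.0f32; m];
    // D: seeded diagonal of ±1 sign flips (assumed: lowest PRNG bit picks the sign).
    for (slot, &v) in buf.iter_mut().zip(x) {
        let sign = if split_mix64(&mut state) & 1 == 0 { 1.0 } else { -1.0 };
        *slot = sign * v;
    }
    // H: Hadamard transform, then normalize so the full m-vector keeps its norm.
    fht_in_place(&mut buf);
    let scale = 1.0 / (m as f32).sqrt();
    buf.iter_mut().for_each(|v| *v *= scale);
    buf.truncate(d);
    buf
}
```

Because the only stored state is the `u64` seed, two parties holding the same seed reproduce the same rotation, which is what lets a rotated sketch keep the existing wire shape.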
|
||||
|
||||
### 10.2 Pre-existing bug found and fixed (disclosed)
|
||||
|
||||
Building the coverage harness surfaced a **pre-existing correctness bug in `SketchBank::topk`** (shipped in ADR-084): the `n > k` heap path used `BinaryHeap<Reverse<(dist,id)>>` (a *min*-heap) but its comment/logic treated the peek as the max, so it evicted the *nearest* and returned the **k farthest** sketches as "nearest." The shipped unit tests only exercised the `n ≤ k` fast path (≤ 3 entries), so it was never caught. Fixed to a plain max-heap. Pinned by **`topk_heap_path_returns_nearest`** (fails on the old heap when entries are inserted farthest-first) and **`tight_clusters_give_high_coverage_with_overfetch`** (measured **0.072** coverage on the old code — random — vs **>0.99** fixed). This is a real, measured behaviour fix, not a no-op.
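
The heap-direction point can be illustrated in isolation. The sketch below is not the `SketchBank::topk` source; `topk_nearest` and its `(dist, id)` tuple layout are assumptions for illustration. It shows the intended invariant: a bounded max-heap whose `peek` is the farthest kept entry, so eviction always discards the farthest.

```rust
use std::collections::BinaryHeap;

/// Keep the `k` smallest `(dist, id)` pairs using a max-heap of size <= k.
/// Returns them sorted nearest-first.
fn topk_nearest(entries: &[(u32, u64)], k: usize) -> Vec<(u32, u64)> {
    let mut heap: BinaryHeap<(u32, u64)> = BinaryHeap::with_capacity(k + 1);
    for &entry in entries {
        if heap.len() < k {
            heap.push(entry);
        } else if let Some(&worst) = heap.peek() {
            // On a plain `BinaryHeap`, `peek` is the maximum: the farthest
            // entry currently kept. Replace it only if the new one is nearer.
            if entry < worst {
                heap.pop();
                heap.push(entry);
            }
        }
    }
    heap.into_sorted_vec()
}
```

Wrapping the elements in `Reverse` flips `peek` to the minimum, so the same eviction logic would then discard the nearest entry, which is the failure mode described above. Feeding entries farthest-first is what exposes it, because the first `k` pushes are then exactly the entries that must be evicted.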
|
||||
|
||||
### 10.3 MEASURED top-K coverage
|
||||
|
||||
Test machine: Windows 11, `cargo bench --release` / `cargo test`. Fixture: **dim=128, N=2048, K=8, 64 planted clusters, intra-cluster noise=0.35, 128 queries, master_seed=0xAD000084, rotation_seed=0x5EEDC0DE12345678**, ground-truth metric = cosine. Reproduce: `cargo test -p wifi-densepose-ruvector --no-default-features pass2_coverage_report -- --nocapture` or `cargo bench -p wifi-densepose-ruvector --bench sketch_bench -- pass2_coverage`.
|
||||
|
||||
**Coverage vs over-fetch (`coverage = |sketch_topK ∩ float_cosine_topK| / K`):**
|
||||
|
||||
| candidate_k | Pass-1 (1-bit, no rot) | Pass-2 (1-bit, rot) | vs 90% bar |
|---|---|---|---|
| **8 (= K, strict bar)** | **36.13%** | **46.39%** | both **BELOW** |
| 16 | 62.79% | 75.59% | below |
| 24 | 83.89% | **91.60%** | **Pass-2 clears** |
| 32 | 100.00% | 100.00% | clears |
| 64 | 100.00% | 100.00% | clears |
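
For reference, the coverage quantity tabulated above can be computed per query as in the sketch below. The function name `coverage` and the use of `u64` ids are assumptions for illustration; the real harness lives in `coverage.rs` and averages this over all queries.

```rust
use std::collections::HashSet;

/// Fraction of the float top-K that also appears in the sketch candidate set.
/// With over-fetch, `sketch_candidates` holds candidate_k >= K ids while
/// `float_topk` always holds exactly K ids, so the denominator stays K.
fn coverage(sketch_candidates: &[u64], float_topk: &[u64]) -> f64 {
    if float_topk.is_empty() {
        return 0.0;
    }
    let candidates: HashSet<u64> = sketch_candidates.iter().copied().collect();
    let hits = float_topk
        .iter()
        .filter(|id| candidates.contains(*id))
        .count();
    hits as f64 / float_topk.len() as f64
}
```

Over-fetching can only add candidates, so for a fixed query the value is non-decreasing in `candidate_k`, which is consistent with every column of the table rising toward 100%.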
|
||||
|
||||
**Multi-bit Pass-3 at the strict bar (candidate_k = K = 8):**
|
||||
|
||||
| Variant | Coverage | Memory |
|---|---|---|
| Pass-1 (1-bit, no rot) | 36.13% | 16 B/vec |
| Pass-2 (1-bit, rot) | 46.39% | 16 B/vec |
| Pass-3 (rot, 2-bit) | 54.39% | 32 B/vec |
| Pass-3 (rot, 3-bit) | 66.70% | 48 B/vec |
| Pass-3 (rot, 4-bit) | 74.22% | 64 B/vec |
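
The memory column follows directly from `dim × bits / 8` at dim=128. The sketch below checks that arithmetic and shows one plausible shape for a `b`-bit uniform scalar code ranked by L1. It is an illustration only: `code_bytes`, `quantize` and `l1_codes` are hypothetical names, and the clamping range `[lo, hi]` (assumed `hi > lo`, `bits <= 8`) is an assumption, since the range used by `coverage::measure_multibit` is not stated here.

```rust
/// Packed code size in bytes for `dim` coordinates at `bits` bits each.
fn code_bytes(dim: usize, bits: u32) -> usize {
    (dim * bits as usize + 7) / 8
}

/// Uniform scalar quantizer: map `v` (clamped to [lo, hi]) onto one of
/// 2^bits evenly spaced levels and return the level index.
fn quantize(v: f32, lo: f32, hi: f32, bits: u32) -> u8 {
    let levels = (1u32 << bits) - 1;
    let t = ((v - lo) / (hi - lo)).clamp(0.0, 1.0);
    (t * levels as f32).round() as u8
}

/// L1 distance between two code vectors (one code per coordinate).
fn l1_codes(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| x.abs_diff(y) as u32).sum()
}
```

At 1 bit per coordinate the L1 distance over codes reduces to the Hamming distance over sign bits, so the multi-bit rows are a strict generalization of the Pass-2 row rather than a different ranking rule.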
|
||||
|
||||
### 10.4 Honest verdict
|
||||
|
||||
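- **Rotation helps wherever there is headroom:** +10.3 pp at the strict bar (36.13%→46.39%), +12.8 pp at candidate_k=16 and +7.7 pp at candidate_k=24; at candidate_k ≥ 32 both passes already reach 100%, so there is nothing left to lift. The FHT construction is verified norm-preserving and deterministic.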
|
||||
- **Neither rotation nor multi-bit (≤4-bit) clears the strict candidate_k==K 90% bar** on this anisotropic distribution. 1-bit sign quantization simply cannot resolve 8-of-2048 from sign bits alone; even 4× memory (4-bit) reaches only 74%.
|
||||
- **Pass-2 reaches the 90% bar at candidate_k=24 (~3× over-fetch):** fetch ≥24 sketch candidates, then refine to K with full float. This is exactly the "candidate set, then full refinement" deployment pattern ADR-084 specifies, so the bar is met *in the deployment the sensor is designed for*, just not at the strict candidate_k == K setting.
|
||||
- **This is a measured, partial win, reported as such.** No benchmark was tuned to manufacture a pass. The strict-bar gap (and the multi-bit tradeoff that doesn't close it) is documented rather than spun.
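
The "candidate set, then full refinement" pattern referred to in the verdict can be sketched end to end. This is a toy illustration under stated assumptions: `sign_bits`, `hamming`, `cosine_distance` and `topk_overfetch` are hypothetical names, no rotation is applied, and the bank is scanned linearly rather than through `SketchBank`.

```rust
/// Pack one sign bit per coordinate (bit set when the value is >= 0).
fn sign_bits(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two packed sign sketches.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Cosine distance 1 - cos(a, b), guarded against zero norms.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb).max(f32::MIN_POSITIVE)
}

/// Stage 1: take `candidate_k` ids by Hamming distance on sign sketches.
/// Stage 2: re-rank only those candidates by full-float cosine, keep `k`.
fn topk_overfetch(query: &[f32], bank: &[Vec<f32>], k: usize, candidate_k: usize) -> Vec<usize> {
    let q = sign_bits(query);
    let mut by_sketch: Vec<(u32, usize)> = bank
        .iter()
        .enumerate()
        .map(|(i, v)| (hamming(&q, &sign_bits(v)), i))
        .collect();
    by_sketch.sort_unstable();
    by_sketch.truncate(candidate_k.max(k));

    let mut refined: Vec<(f32, usize)> = by_sketch
        .iter()
        .map(|&(_, i)| (cosine_distance(query, &bank[i]), i))
        .collect();
    refined.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    refined.truncate(k);
    refined.into_iter().map(|(_, i)| i).collect()
}
```

With a four-vector bank where one true neighbour has a single small negative coordinate, the strict setting (`candidate_k == k`) loses that neighbour to a same-sign but distant vector, while a modest over-fetch recovers it after refinement. That is the mechanism behind the gap between the strict-bar row and the candidate_k=24 row.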
|
||||
|
||||
### 10.5 Deferred sub-items (graded, not dropped)
|
||||
|
||||
- **Strict-bar 90% from a richer code** — neither rotation nor uniform multi-bit closes it here. A learned/asymmetric quantizer or the full RaBitQ residual-distance estimator (not just a uniform scalar code) might, but is unbuilt and **unmeasured** — explicitly deferred, not claimed.
|
||||
- **Distribution sensitivity** — the result is for one synthetic anisotropic distribution; on real AETHER traces the strict-bar number may differ. Re-measuring on recorded embeddings is deferred to the ADR-084 post-merge soak.
|
||||
- **Promoting a `MultiBitSketch` type** — the multi-bit code lives in the measurement harness, not as a shipped sketch type. Building the production type is gated on a use site actually needing strict-K (vs over-fetch), which the measurement says is not required today.
|
||||
|
||||
@@ -85,9 +85,11 @@ A new criterion bench (`harness = false`, registered in `Cargo.toml`) drives eac
|
||||
|
||||
`OpportunisticCsiBridge::ingest` built `CsiReportPayload { n_subcarriers: self.amp_accum.len() as u16, … }`. The `as u16` would silently wrap a count above 65 535. **This is unreachable in practice**: `ingest` gates `frame.subcarrier_count() > MAX_REPORT_SUBCARRIERS` (484) at entry and returns `None`, and `report.validate()` independently rejects oversized counts downstream. We replaced the cast with `u16::try_from(self.amp_accum.len()).ok()?` (drop-instead-of-truncate) so the construction is **correct-by-construction** rather than relying on the upstream gate. We disclose this as **defense-in-depth on an unreachable path, not a live bug** — no behavior change, no new test (the gate already prevents the input that would exercise it).
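
The difference between the two conversions is easy to see in isolation. In the sketch below, `Payload`, `build` and `truncating` are hypothetical stand-ins, not the bridge's actual types; only `u16::try_from(..).ok()?` and the `as u16` cast mirror the text.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the report payload; only the count field matters.
struct Payload {
    n_subcarriers: u16,
}

/// Drop-instead-of-truncate: an out-of-range count yields `None`.
fn build(len: usize) -> Option<Payload> {
    let n_subcarriers = u16::try_from(len).ok()?;
    Some(Payload { n_subcarriers })
}

/// The old behaviour: `as` keeps only the low 16 bits, silently wrapping.
fn truncating(len: usize) -> u16 {
    len as u16
}
```

A count of 65 536 would wrap to 0 under the cast and a count of 70 000 to 4 464, both of which look like plausible small reports; with `try_from` the frame is dropped instead.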
|
||||
|
||||
### 2.6 §B4 — constant-time HMAC tag compare: **DEFERRED, not landed** (disclosed)
|
||||
### 2.6 §B4 — constant-time HMAC tag compare: **RESOLVED — no-dependency hand-rolled constant-time compare (Milestone-1)**
|
||||
|
||||
`secure_tdm.rs:284` compares the 8-byte HMAC tag with `self.hmac_tag == expected` (data-dependent, non-constant-time). The research authorized adding `subtle::ConstantTimeEq` **only if `subtle` were already a direct dependency** — it is not (only transitive, via a crypto crate). Per that guidance, and because this is an **8-byte tag on a LAN multistatic sync beacon** (not a remote attacker-controlled timing-oracle surface), we **do not add a direct dependency** for it. Tracked in §8 as a deferred item, not silently dropped.
|
||||
`secure_tdm.rs` compared the 8-byte HMAC tag with `self.hmac_tag == expected` (data-dependent, non-constant-time: short-circuits on the first differing byte, leaking through verification latency how many leading bytes a forged tag matched — a byte-by-byte tag-recovery oracle). Milestone-3 deferred this **only** to avoid adding the `subtle` crate as a direct dependency. Milestone-1 resolves it **without any dependency**: a hand-rolled `constant_time_tag_eq(a, b)` that XOR-accumulates every byte difference into a single `u8` with **no early exit**, then compares the accumulator to zero exactly once. `#[inline(never)]` + `core::hint::black_box(diff)` stop the optimizer from reintroducing a short-circuit or lowering the loop into a non-constant-time `memcmp`; a length mismatch returns `false` without inspecting contents. The former `==` verify site now calls this helper.
|
||||
|
||||
**Test (fails on old code, the hard gate):** `tag_compare_is_constant_time_shape` — asserts correct accept/reject for equal, first-byte-differ, last-byte-differ, all-byte-differ, and length-mismatch tags, plus an end-to-end `verify()` last-byte-only tamper. Verified to **bite**: introducing a classic constant-time bug (loop `take(LEN-1)`, skipping the last byte) makes it fail on `last-byte-differ must reject`. A coarse timing-invariance smoke check `tag_compare_timing_invariance_smoke` exists but is `#[ignore]`d (noisy host — not a CI gate). **Grade MEASURED** (constant-time *construction*; micro-timing on a noisy host is only a smoke check, disclosed honestly). Tracked RESOLVED in §8.
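
The "verified to bite" claim can be reproduced with a pair of functions: a correct accumulate-then-compare helper and a deliberately broken copy that skips the last byte. Both names (`ct_eq`, `ct_eq_skips_last`) and the constant `TAG_LEN` are assumptions for illustration; the shipped helper is `constant_time_tag_eq`. Note that the standard library documents `black_box` as best-effort, so this sketch demonstrates the functional pin, not a timing guarantee.

```rust
const TAG_LEN: usize = 8;

/// XOR-accumulate every byte difference, compare to zero once at the end.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    std::hint::black_box(diff) == 0
}

/// Deliberately broken variant: `take(TAG_LEN - 1)` never inspects the last
/// byte, so a tag that differs only there is wrongly accepted.
fn ct_eq_skips_last(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b).take(TAG_LEN - 1) {
        diff |= x ^ y;
    }
    std::hint::black_box(diff) == 0
}
```

A test that covers only equal and first-byte-differ tags passes against both functions; it is the last-byte-differ case that separates them, which is why that case is the one the pin depends on.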
|
||||
|
||||
---
|
||||
|
||||
@@ -143,7 +145,7 @@ Grades: **MEASURED** (source measured it, ideally public method/code), **CLAIMED
|
||||
| 1 | **CSI vital signs (HR/BR)** | Deep-CSI vital-sign models report **MAE ~2–3 BPM** vs our classical IIR-bandpass + autocorrelation/zero-crossing. | **DATA-GATED + CLAIMED** | **NO ACTION on method.** A deep model needs **paired PPG/ECG ground truth** we do not have, and no public ESP32 artifact reproduces the cited MAE on commodity CSI. Our classical method is the honest commodity baseline; the real wins this milestone are the A1/A3 robustness fixes, not a new model. |
|
||||
| 2 | **802.11bf-2025 conformance** | Adopt a conformance test-vector suite for the `ieee80211bf/` forward-compat model. | **CLAIMED (not public)** | **NO ACTION.** No commodity silicon ships a conformant 802.11bf interface as of 2026, and the conformance suites are **WBA / Wi-Fi Alliance pre-certification** material, **not public**. Our model's "no OTA encoding until silicon exists" posture (ADR-153) is the correct one. Tracked in §8: *add SBP conformance vectors when the WFA publishes a test plan* — we will **not invent vectors**. |
|
||||
| 3 | **Per-room calibration (ADR-151)** | Bank-of-specialists + drift-veto vs a 2026 calibration SOTA. | **CLAIMED on numbers, DATA-GATED on a head-to-head** | **NO ACTION on architecture.** The bank-of-specialists + drift-veto design is SOTA-shaped, but we have **no head-to-head PCK** against a published method (no paired multi-room data). The geometry-conditioned LoRA head is **built-but-unconsumed** and data-gated → **ACCEPTED-FUTURE** (§8), not built now. |
|
||||
| 4 | **Multi-BSSID throughput (wifiscan)** | The module docs assert a native `wlanapi.dll` FFI 10–20 Hz path; the current `WlanApiScanner` wraps `netsh` (~2 Hz). | **CLAIMED-unmeasured** | **NO ACTION + corrected expectation.** The native FFI fast path is **asserted but NOT implemented** — the live scanner is the ~2 Hz netsh shim. The "10×" is unmeasured. → **ACCEPTED-FUTURE** (§8). **We explicitly do NOT claim a speedup that does not exist.** |
|
||||
| 4 | **Multi-BSSID throughput (wifiscan)** | The module docs assert a native `wlanapi.dll` FFI 10–20 Hz path; the current `WlanApiScanner` wraps `netsh` (~2 Hz). | **MEASURED (Milestone-1)** | **IMPLEMENTED + MEASURED: real positive win.** Status corrected: the native FFI is **fully implemented and wired live** (`wlanapi_native::scan_native` calls `WlanOpenHandle`/`WlanEnumInterfaces`/`WlanGetNetworkBssList`/`WlanFreeMemory`/`WlanCloseHandle`; `WlanApiScanner::scan_instrumented` runs it native-first with a netsh fallback). Milestone-1 **measured both paths on this box** (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13) over an identical 10 s wall-clock window via a new `benchmark_backend`: **native 21.42 Hz vs netsh 3.84 Hz = 5.57× MEASURED** (mean 5.0 BSSIDs/scan each; native-only run 18.0 Hz). Native genuinely beats netsh, a real measured multiple, **not** a fabricated 10×; the achieved 21.42 Hz sits just above the asserted 10–20 Hz range (the native-only run, 18.0 Hz, falls inside it), while the measured multiple over netsh is 5.57×, not 10×. 50 back-to-back native scans = 50/50 OK, no handle leak. → §8 MEASURED. |
|
||||
|
||||
---
|
||||
|
||||
@@ -176,10 +178,10 @@ Grades: **MEASURED** (source measured it, ideally public method/code), **CLAIMED
|
||||
|
||||
## 8. Deferred backlog (NOT silently dropped)
|
||||
|
||||
- **§B4 constant-time HMAC compare** — `secure_tdm.rs:284` uses `==` on the 8-byte tag. Add `subtle::ConstantTimeEq` **if** `subtle` becomes a direct dependency for another reason; not worth a new dependency for an 8-byte LAN sync-beacon tag (out of the current threat model). Deferred, not dropped.
|
||||
- **§B4 constant-time HMAC compare** — **RESOLVED (Milestone-1).** Replaced the short-circuiting `==` on the 8-byte tag with a hand-rolled branch-free `constant_time_tag_eq` (XOR-accumulate, no early exit, `#[inline(never)]` + `black_box`). **No new dependency** — the `subtle` crate was the only reason this was deferred, and a fixed 8-byte compare needs none. Pinned by `tag_compare_is_constant_time_shape` (proven to fail on a last-byte-skipping bug). Grade MEASURED (constant-time construction). See §2.6.
|
||||
- **802.11bf SBP conformance vectors** (§5 #2) — add real conformance test vectors to the `ieee80211bf/` model **when the Wi-Fi Alliance / WBA publishes a public test plan**. Do not invent vectors before then.
|
||||
- **Geometry-conditioned LoRA calibration head** (§5 #3) — built-but-unconsumed and **data-gated** on paired multi-room PCK data (ADR-152 measurement (b): data, not architecture, is the bottleneck). ACCEPTED-FUTURE.
|
||||
- **Native `wlanapi.dll` FFI multi-BSSID fast path** (§5 #4) — the asserted 10–20 Hz path is **not implemented**; the live scanner is the ~2 Hz netsh shim. Implement and **measure** the real throughput before claiming any multiple. ACCEPTED-FUTURE, CLAIMED-unmeasured until then.
|
||||
- **Native `wlanapi.dll` FFI multi-BSSID fast path** (§5 #4) — **RESOLVED + MEASURED (Milestone-1).** The native FFI is implemented and wired live (native-first, netsh fallback). Measured on this box (Intel Wi-Fi 7 BE201 320MHz, 2026-06-13): **native 21.42 Hz vs netsh 3.84 Hz = 5.57×**, mean 5.0 BSSIDs/scan, 50/50 native scans with no handle leak. Real positive result — no fabricated 10×. See §5 #4. (Note: a prior sweep recorded 9.74 Hz on a different/older adapter; the per-adapter number varies, the ratio over netsh is the claim.)
|
||||
- **Deep-CSI vital-sign model** (§5 #1) — DATA-GATED on paired PPG/ECG ground truth. No public ESP32 artifact reproduces the cited ~2–3 BPM MAE. Not on the near-term path.
|
||||
|
||||
---
|
||||
|
||||
@@ -47,6 +47,42 @@ type HmacSha256 = Hmac<Sha256>;
|
||||
/// Size of the HMAC-SHA256 truncated tag (manual crypto mode).
|
||||
const HMAC_TAG_SIZE: usize = 8;
|
||||
|
||||
/// Constant-time comparison of two fixed-size HMAC/auth tags.
|
||||
///
|
||||
/// ADR-157 §B4: the previous `self.hmac_tag == expected` short-circuits on the
|
||||
/// first differing byte, leaking how many leading bytes matched through its
|
||||
/// execution time. For an authentication tag that is a timing oracle: an
|
||||
/// attacker who can submit forged beacons and measure verification latency can
|
||||
/// recover the correct tag byte-by-byte (~256·N trials instead of 256^N).
|
||||
///
|
||||
/// This hand-rolled compare avoids adding the `subtle` crate (ADR-157 deferred
|
||||
/// B4 only to dodge that dependency — a fixed 8-byte compare needs none). We
|
||||
/// XOR-accumulate every byte difference into a single `u8` with **no early
|
||||
/// exit**, so the work done is identical regardless of where (or whether) the
|
||||
/// tags differ. The accumulator is non-zero iff any byte differed; we compare
|
||||
/// it to zero exactly once at the end.
|
||||
///
|
||||
/// `#[inline(never)]` plus `black_box` on the accumulator stop the optimizer
|
||||
/// from reintroducing a short-circuit or lowering the loop into a `memcmp`
|
||||
/// (which is itself non-constant-time). The two slices are required to be the
|
||||
/// same length by construction (both `[u8; HMAC_TAG_SIZE]`); a length mismatch
|
||||
/// returns `false` without inspecting contents.
|
||||
#[inline(never)]
|
||||
fn constant_time_tag_eq(a: &[u8], b: &[u8]) -> bool {
|
||||
if a.len() != b.len() {
|
||||
return false;
|
||||
}
|
||||
let mut diff: u8 = 0;
|
||||
for (x, y) in a.iter().zip(b.iter()) {
|
||||
// Branch-free: accumulate the bitwise difference of every byte.
|
||||
diff |= x ^ y;
|
||||
}
|
||||
// black_box prevents the compiler from proving `diff == 0` early and
|
||||
// short-circuiting the loop above. The single equality check is the only
|
||||
// data-dependent branch, and it is on the fully-accumulated value.
|
||||
core::hint::black_box(diff) == 0
|
||||
}
|
||||
|
||||
/// Size of the nonce field (manual crypto mode).
|
||||
const NONCE_SIZE: usize = 4;
|
||||
|
||||
@@ -281,7 +317,10 @@ impl AuthenticatedBeacon {
|
||||
msg[..16].copy_from_slice(&self.beacon.to_bytes());
|
||||
msg[16..20].copy_from_slice(&self.nonce.to_le_bytes());
|
||||
let expected = Self::compute_tag(&msg, key);
|
||||
if self.hmac_tag == expected {
|
||||
// ADR-157 §B4: constant-time compare — `==` on the tag would leak,
|
||||
// via short-circuit timing, how many leading bytes an attacker's
|
||||
// forged tag matched, enabling byte-by-byte tag recovery.
|
||||
if constant_time_tag_eq(&self.hmac_tag, &expected) {
|
||||
Ok(())
|
||||
} else {
|
||||
Err(SecureTdmError::BeaconAuthFailed)
|
||||
@@ -752,6 +791,124 @@ mod tests {
|
||||
));
|
||||
}
|
||||
|
||||
// ---- ADR-157 §B4: constant-time tag compare ----
|
||||
|
||||
/// Functional pin proving the new constant-time helper is wired and correct
|
||||
/// for the five tag-shape cases. This is the *hard gate* for §B4: it fails
|
||||
/// on the old `==` path only if the helper is removed/unwired, and it
|
||||
/// guarantees accept/reject semantics are byte-exact. Grade: MEASURED
|
||||
/// (constant-time *construction*); micro-timing on a noisy host is only a
|
||||
/// smoke check (see `tag_compare_timing_invariance_smoke`, #[ignore]).
|
||||
#[test]
|
||||
fn tag_compare_is_constant_time_shape() {
|
||||
let base = [0xA5u8; HMAC_TAG_SIZE];
|
||||
|
||||
// Equal tags accept.
|
||||
assert!(constant_time_tag_eq(&base, &base), "equal tags must accept");
|
||||
|
||||
// First byte differs → reject.
|
||||
let mut first = base;
|
||||
first[0] ^= 0xFF;
|
||||
assert!(
|
||||
!constant_time_tag_eq(&base, &first),
|
||||
"first-byte-differ must reject"
|
||||
);
|
||||
|
||||
// Last byte differs → reject.
|
||||
let mut last = base;
|
||||
last[HMAC_TAG_SIZE - 1] ^= 0x01;
|
||||
assert!(
|
||||
!constant_time_tag_eq(&base, &last),
|
||||
"last-byte-differ must reject"
|
||||
);
|
||||
|
||||
// Every byte differs → reject.
|
||||
let all = [0x5Au8; HMAC_TAG_SIZE]; // bitwise-inverse of 0xA5
|
||||
assert!(
|
||||
!constant_time_tag_eq(&base, &all),
|
||||
"all-bytes-differ must reject"
|
||||
);
|
||||
|
||||
// Length mismatch → reject without inspecting contents.
|
||||
assert!(
|
||||
!constant_time_tag_eq(&base, &base[..HMAC_TAG_SIZE - 1]),
|
||||
"length mismatch must reject"
|
||||
);
|
||||
|
||||
// End-to-end through verify(): a tag whose only difference is the
|
||||
// *last* byte must still be rejected exactly like a first-byte diff.
|
||||
let beacon = SyncBeacon {
|
||||
cycle_id: 7,
|
||||
cycle_period: Duration::from_millis(50),
|
||||
drift_correction_us: 0,
|
||||
generated_at: std::time::Instant::now(),
|
||||
};
|
||||
let key = DEFAULT_TEST_KEY;
|
||||
let nonce = 1u32;
|
||||
let mut msg = [0u8; 20];
|
||||
msg[..16].copy_from_slice(&beacon.to_bytes());
|
||||
msg[16..20].copy_from_slice(&nonce.to_le_bytes());
|
||||
let mut tag = AuthenticatedBeacon::compute_tag(&msg, &key);
|
||||
tag[HMAC_TAG_SIZE - 1] ^= 0x01; // tamper the LAST byte only
|
||||
let auth = AuthenticatedBeacon {
|
||||
beacon,
|
||||
nonce,
|
||||
hmac_tag: tag,
|
||||
};
|
||||
assert!(
|
||||
matches!(auth.verify(&key), Err(SecureTdmError::BeaconAuthFailed)),
|
||||
"last-byte tamper must fail verify()"
|
||||
);
|
||||
}
|
||||
|
||||
/// Coarse timing-invariance smoke check. #[ignore]d so it never flakes CI —
|
||||
/// the host is noisy and a hard timing bound is unreliable. Run manually
|
||||
/// with `cargo test -p wifi-densepose-hardware -- --ignored
|
||||
/// tag_compare_timing_invariance_smoke --nocapture`. The assertion is a
|
||||
/// deliberately *generous* ratio bound (4×): a short-circuit `==` would show
|
||||
/// last-byte-differ ≫ first-byte-differ; the constant-time helper should not.
|
||||
#[test]
|
||||
#[ignore = "timing smoke check — noisy host, run manually with --ignored"]
|
||||
fn tag_compare_timing_invariance_smoke() {
|
||||
use std::time::Instant;
|
||||
const ITERS: u32 = 2_000_000;
|
||||
let base = [0xA5u8; HMAC_TAG_SIZE];
|
||||
let mut first = base;
|
||||
first[0] ^= 0xFF;
|
||||
let mut last = base;
|
||||
last[HMAC_TAG_SIZE - 1] ^= 0x01;
|
||||
|
||||
// Warm up.
|
||||
for _ in 0..ITERS / 10 {
|
||||
core::hint::black_box(constant_time_tag_eq(&base, &first));
|
||||
}
|
||||
|
||||
let t0 = Instant::now();
|
||||
let mut acc = false;
|
||||
for _ in 0..ITERS {
|
||||
acc ^= constant_time_tag_eq(&base, &first);
|
||||
}
|
||||
core::hint::black_box(acc);
|
||||
let dt_first = t0.elapsed().as_nanos() as f64;
|
||||
|
||||
let t1 = Instant::now();
|
||||
let mut acc2 = false;
|
||||
for _ in 0..ITERS {
|
||||
acc2 ^= constant_time_tag_eq(&base, &last);
|
||||
}
|
||||
core::hint::black_box(acc2);
|
||||
let dt_last = t1.elapsed().as_nanos() as f64;
|
||||
|
||||
let ratio = dt_last.max(dt_first) / dt_last.min(dt_first).max(1.0);
|
||||
println!(
|
||||
"first-differ {dt_first:.0}ns, last-differ {dt_last:.0}ns, ratio {ratio:.3}"
|
||||
);
|
||||
assert!(
|
||||
ratio < 4.0,
|
||||
"timing ratio {ratio:.3} too large — possible short-circuit leak"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_auth_beacon_too_short() {
|
||||
let result = AuthenticatedBeacon::from_bytes(&[0u8; 10]);
|
||||
|
||||
@@ -174,5 +174,62 @@ fn bench_topk(c: &mut Criterion) {
|
||||
group.finish();
|
||||
}
|
||||
|
||||
criterion_group!(benches, bench_compare_cost, bench_topk);
|
||||
/// ADR-156 §8 RaBitQ Pass-2 coverage measurement.
|
||||
///
|
||||
/// Not a timing bench — it prints the **measured top-K coverage** (Pass-1 vs
|
||||
/// Pass-2 rotation) on the deterministic anisotropic planted-cluster fixture
|
||||
/// from `wifi_densepose_ruvector::coverage`, so `cargo bench` surfaces the
|
||||
/// numbers quoted in ADR-156 §8 / ADR-084. The same harness backs the
|
||||
/// `pass2_coverage_report` unit test (single source of truth). The criterion
|
||||
/// group re-runs `measure_pass2` on a reduced fixture each iteration, so its
|
||||
/// timing is incidental here on purpose; what matters is the `println!`
|
||||
/// summary.
|
||||
fn bench_pass2_coverage(c: &mut Criterion) {
|
||||
use wifi_densepose_ruvector::coverage::{measure_pass1, measure_pass2, CoverageParams};
|
||||
|
||||
let base = CoverageParams::aether_default(0xAD00_0084);
|
||||
let rot_seed = 0x5EED_C0DE_1234_5678u64;
|
||||
|
||||
println!("\n=== ADR-156 §8 RaBitQ Pass-2 coverage (anisotropic planted clusters) ===");
|
||||
println!(
|
||||
"dim={} N={} K={} clusters={} noise={} queries={} master_seed=0x{:X} rot_seed=0x{:X}",
|
||||
base.dim, base.n, base.k, base.n_clusters, base.noise, base.n_queries, base.seed, rot_seed
|
||||
);
|
||||
println!("(coverage = |sketch_topK ∩ float_cosine_topK| / K, ADR-084 bar = 90%)");
|
||||
for &cand in &[8usize, 16, 24, 32, 64] {
|
||||
let p = CoverageParams {
|
||||
candidate_k: cand,
|
||||
..base
|
||||
};
|
||||
let p1 = measure_pass1(p).coverage;
|
||||
let p2 = measure_pass2(p, rot_seed).coverage;
|
||||
let flag = if p2 >= 0.90 { "Pass2≥90%" } else { "" };
|
||||
println!(
|
||||
" candidate_k={cand:<3} Pass1={:6.2}% Pass2={:6.2}% {flag}",
|
||||
p1 * 100.0,
|
||||
p2 * 100.0
|
||||
);
|
||||
}
|
||||
println!("========================================================================\n");
|
||||
|
||||
// A minimal criterion group so `cargo bench` exercises the path under the
|
||||
// harness (timing is not the point; the printed table above is).
|
||||
let mut group = c.benchmark_group("pass2_coverage");
|
||||
group.sample_size(10);
|
||||
let p = CoverageParams {
|
||||
n: 256,
|
||||
n_queries: 16,
|
||||
n_clusters: 16,
|
||||
..base
|
||||
};
|
||||
group.bench_function("measure_pass2_small", |b| {
|
||||
b.iter(|| {
|
||||
let r = measure_pass2(black_box(p), black_box(rot_seed));
|
||||
hint::black_box(r.coverage)
|
||||
});
|
||||
});
|
||||
group.finish();
|
||||
}
|
||||
|
||||
criterion_group!(benches, bench_compare_cost, bench_topk, bench_pass2_coverage);
|
||||
criterion_main!(benches);
|
||||
|
||||
@@ -0,0 +1,441 @@
|
||||
//! Deterministic top-K **coverage** harness for the RaBitQ sketch
|
||||
//! (ADR-084 acceptance bar / ADR-156 §8 Pass-2 measurement).
|
||||
//!
|
||||
//! Single source of truth for the coverage number quoted in ADR-084 and
|
||||
//! ADR-156: both the in-crate regression test (`pass2_coverage_not_worse_…`)
|
||||
//! and the criterion bench (`benches/sketch_bench.rs`) call into here, so they
|
||||
//! can never silently measure different things.
|
||||
//!
|
||||
//! **Coverage** is defined exactly as in ADR-084:
|
||||
//!
|
||||
//! > the Top-K candidate set chosen by the sketch must contain **≥ 90%** of the
|
||||
//! > candidates the full-float pass would have picked.
|
||||
//!
|
||||
//! i.e. `coverage = |sketch_topK ∩ float_topK| / K`, averaged over a set of
|
||||
//! queries. The float top-K (cosine distance, matching the fixture's ground truth) is the
|
||||
//! ground truth; the sketch top-K is a *candidate* set, so in practice a system
|
||||
//! over-fetches `C ≥ K` sketch candidates and refines. We measure at
|
||||
//! `candidate_k == K` (the strict bar) by default; the bench also reports an
|
||||
//! over-fetch curve.
|
||||
//!
|
||||
//! # The synthetic distribution — and why it is *anisotropic*
|
||||
//!
|
||||
//! Pure 1-bit sign quantization (Pass 1) is near-optimal on **isotropic,
|
||||
//! zero-centred** embeddings — on such data a rotation barely moves the number,
|
||||
//! so testing rotation there proves nothing. ADR-084's "Open questions" and
|
||||
//! ADR-156 §8 both flag the *anisotropic / correlated* case (skewed CSI
|
||||
//! spectrogram embeddings) as exactly where the rotation is supposed to earn
|
||||
//! its keep. So [`make_anisotropic_embedding`] deliberately builds **correlated,
|
||||
//! axis-aligned, non-isotropic** vectors: a few dominant low-frequency factors
|
||||
//! shared across many coordinates (heavy coordinate correlation) plus a small
|
||||
//! per-dim offset that biases signs — the structure that defeats raw
|
||||
//! sign-quantization and that a randomized rotation is designed to fix. Every
|
||||
//! value derives from a seed via SplitMix64, so the whole harness is
|
||||
//! reproducible bit-for-bit.
|
||||
|
||||
use crate::{Rotation, SketchBank};
|
||||
|
||||
/// SplitMix64 step — reproducible PRNG for fixture generation (dependency-free).
|
||||
#[inline]
|
||||
fn split_mix64(state: &mut u64) -> u64 {
|
||||
*state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
|
||||
let mut z = *state;
|
||||
z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
|
||||
z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
|
||||
z ^ (z >> 31)
|
||||
}
|
||||
|
||||
/// A uniform `f32` in `[0, 1)` from the PRNG state.
|
||||
#[inline]
|
||||
fn unif01(state: &mut u64) -> f32 {
|
||||
let r = split_mix64(state);
|
||||
// top 24 bits → [0,1)
|
||||
((r >> 40) as f32) / ((1u64 << 24) as f32)
|
||||
}
|
||||
|
||||
/// A standard-normal-ish `f32` via Box–Muller from two uniforms. Deterministic.
|
||||
#[inline]
|
||||
fn gauss(state: &mut u64) -> f32 {
|
||||
let u1 = unif01(state).max(1e-7); // avoid log(0)
|
||||
let u2 = unif01(state);
|
||||
(-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
|
||||
}
|
||||
|
||||
/// Fixed **anisotropic axis scale** for coordinate `i` of `dim`.
|
||||
///
|
||||
/// A learned embedding space is not isotropic: a handful of axes carry most of
|
||||
/// the variance and the rest are near-flat. We model that with a smoothly
|
||||
/// decaying per-axis scale (≈5.5× spread between the most- and least-energetic
|
||||
/// axes). This axis-aligned imbalance is exactly what a 1-bit sign sketch
|
||||
/// handles poorly (the low-variance axes' sign bits are noise) and exactly what
|
||||
/// a randomized rotation re-balances (it spreads the variance across all axes so
|
||||
/// every sign bit carries comparable information). The scale depends only on the
|
||||
/// coordinate index, so it is the *same fixed geometry* for every vector.
|
||||
#[inline]
|
||||
fn axis_scale(i: usize, dim: usize) -> f32 {
|
||||
let t = i as f32 / dim.max(1) as f32;
|
||||
// The exp term decays from 3.0 to ~0.3; with the +0.3 floor the scale runs ~3.3 → ~0.6 (~5.5× anisotropy).
|
||||
3.0 * (-2.3 * t).exp() + 0.3
|
||||
}
|
||||
|
||||
/// Build the **planted-cluster** fixture: `n_clusters` random centres in the
|
||||
/// anisotropic space. Returned as raw centres (pre-scale); callers add scale +
|
||||
/// intra-cluster noise. Deterministic from `seed`.
|
||||
fn cluster_centres(dim: usize, n_clusters: usize, seed: u64) -> Vec<Vec<f32>> {
|
||||
(0..n_clusters)
|
||||
.map(|c| {
|
||||
let mut s = seed ^ 0xC0FFEE_u64.wrapping_mul(c as u64 + 1);
|
||||
(0..dim).map(|_| gauss(&mut s)).collect()
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// One embedding = its cluster centre + small intra-cluster noise, then the
|
||||
/// fixed anisotropic axis scale, then a small off-centre bias. This makes the
|
||||
/// **cosine top-K meaningful** (same-cluster members are genuine near-neighbours,
|
||||
/// not random-noise ties), while keeping the space anisotropic so the rotation
|
||||
/// has something real to fix.
|
||||
fn realize(centre: &[f32], dim: usize, noise: f32, vec_seed: u64) -> Vec<f32> {
|
||||
let mut s = vec_seed ^ 0x5151_5151_5151_5151;
|
||||
(0..dim)
|
||||
.map(|i| {
|
||||
let jitter = gauss(&mut s) * noise;
|
||||
let bias = ((i % 11) as f32 - 5.0) * 0.05;
|
||||
axis_scale(i, dim) * (centre[i] + jitter) + bias
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// Cosine distance `1 - cos(a,b)` — the metric a sign sketch approximates
|
||||
/// (hamming over sign bits is a monotone estimate of the angle between vectors).
|
||||
/// This is the correct full-float ground truth for top-K *coverage*: the sketch
|
||||
/// is an angular sensor, so we grade it against the angular full-float ranking,
|
||||
/// per ADR-084's `float_cosine` baseline.
|
||||
#[inline]
|
||||
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
|
||||
let mut dot = 0.0f32;
|
||||
let mut na = 0.0f32;
|
||||
let mut nb = 0.0f32;
|
||||
for (&x, &y) in a.iter().zip(b.iter()) {
|
||||
dot += x * y;
|
||||
na += x * x;
|
||||
nb += y * y;
|
||||
}
|
||||
let denom = (na * nb).sqrt();
|
||||
if denom < f32::EPSILON {
|
||||
1.0
|
||||
} else {
|
||||
1.0 - dot / denom
|
||||
}
|
||||
}
|
||||
|
||||
/// Full-float cosine top-K ids (ground truth), ascending by cosine distance.
|
||||
fn float_topk(bank: &[Vec<f32>], query: &[f32], k: usize) -> Vec<u32> {
|
||||
let mut scored: Vec<(u32, f32)> = bank
|
||||
.iter()
|
||||
.enumerate()
|
||||
.map(|(i, v)| (i as u32, cosine_distance(query, v)))
|
||||
.collect();
|
||||
scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
|
||||
scored.truncate(k);
|
||||
scored.into_iter().map(|(id, _)| id).collect()
|
||||
}
|
||||
|
||||
/// Parameters for a coverage measurement, documented in the report.
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub struct CoverageParams {
|
||||
/// Embedding dimension.
|
||||
pub dim: usize,
|
||||
/// Number of stored vectors in the bank (N).
|
||||
pub n: usize,
|
||||
/// Number of distinct query vectors averaged over.
|
||||
pub n_queries: usize,
|
||||
/// True top-K size (the bar's K).
|
||||
pub k: usize,
|
||||
/// Sketch candidate-set size to compare against the float top-K. Equal to
|
||||
/// `k` for the strict ADR-084 bar; `> k` models over-fetch + refine.
|
||||
pub candidate_k: usize,
|
||||
/// Number of planted clusters. Same-cluster vectors are genuine near
|
||||
/// neighbours, so the cosine top-K is *meaningful* (not random-noise ties).
|
||||
pub n_clusters: usize,
|
||||
/// Intra-cluster Gaussian jitter (relative to unit-variance centres). Small
|
||||
/// jitter → tight, easily-recovered clusters; larger → harder top-K.
|
||||
pub noise: f32,
|
||||
/// Master seed (the whole fixture derives from this).
|
||||
pub seed: u64,
|
||||
}
|
||||
|
||||
impl CoverageParams {
|
||||
/// The canonical AETHER-shape fixture used for the ADR-quoted numbers:
|
||||
/// 128-d, planted clusters, modest intra-cluster jitter. Override fields
|
||||
/// with struct-update syntax (`CoverageParams { candidate_k: 32, ..base }`).
|
||||
pub fn aether_default(seed: u64) -> Self {
|
||||
Self {
|
||||
dim: 128,
|
||||
n: 2048,
|
||||
n_queries: 128,
|
||||
k: 8,
|
||||
candidate_k: 8,
|
||||
n_clusters: 64,
|
||||
noise: 0.35,
|
||||
seed,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Result of a coverage measurement.
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub struct CoverageResult {
|
||||
/// Mean coverage in `[0, 1]` (fraction of float top-K found in the sketch
|
||||
/// candidate set), averaged over queries.
|
||||
pub coverage: f64,
|
||||
}
|
||||
|
||||
/// Measure mean top-K coverage of the **Pass-1** (no rotation) sketch against
|
||||
/// the full-float top-K, on the anisotropic synthetic distribution.
|
||||
pub fn measure_pass1(p: CoverageParams) -> CoverageResult {
|
||||
measure_inner(p, None)
|
||||
}
|
||||
|
||||
/// Measure mean top-K coverage of the **Pass-2** (rotated) sketch against the
|
||||
/// full-float top-K, on the anisotropic synthetic distribution. `rotation_seed`
|
||||
/// fixes the rotation (index and query share it — that is the contract).
|
||||
pub fn measure_pass2(p: CoverageParams, rotation_seed: u64) -> CoverageResult {
|
||||
let rot = Rotation::new(rotation_seed, p.dim);
|
||||
measure_inner(p, Some(rot))
|
||||
}
|
||||
|
||||
/// Measure mean top-K coverage of a **multi-bit (Pass-3)** rotated sketch:
|
||||
/// `bits` bits per dimension instead of 1, ranked by L1 distance over the
|
||||
/// per-dim codes (the natural multi-bit generalization of hamming). This is the
|
||||
/// "Multi-bit / Extended RaBitQ" half of ADR-156 §8 — measured here as an
|
||||
/// experiment to decide whether a full `MultiBitSketch` type is worth building.
|
||||
///
|
||||
/// Quantization: rotate (Pass-2 frame), then map each rotated coordinate through
|
||||
/// a uniform mid-rise scalar quantizer with `2^bits` levels over a fixed
|
||||
/// symmetric range `[-RANGE, RANGE]` (RANGE chosen from the rotated-coord scale).
|
||||
/// `bits == 1` reduces to sign-quantization (sanity: should match Pass-2 within
|
||||
/// quantizer-boundary noise). Memory cost is `bits×` the 1-bit sketch.
|
||||
///
|
||||
/// Returns the measured coverage; the caller reports the bit/coverage tradeoff.
|
||||
pub fn measure_multibit(p: CoverageParams, rotation_seed: u64, bits: u32) -> CoverageResult {
|
||||
assert!((1..=8).contains(&bits), "bits must be in 1..=8");
|
||||
let rot = Rotation::new(rotation_seed, p.dim);
|
||||
let levels = 1u32 << bits; // 2^bits codes per dim
|
||||
// Rotated AETHER-shape coords after the normalized FHT sit roughly in
|
||||
// [-RANGE, RANGE]; clamp out-of-range to the end codes. RANGE picked to
|
||||
// cover ~99% of the rotated-coord magnitude on this fixture (empirically
|
||||
// ~3.0 after the 1/√m normalization).
|
||||
const RANGE: f32 = 3.0;
|
||||
let quantize = move |v: &[f32]| -> Vec<u16> {
|
||||
rot.apply(v)
|
||||
.iter()
|
||||
.map(|&x| {
|
||||
let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0); // → [0,1]
|
||||
let code = (t * (levels - 1) as f32).round() as u32;
|
||||
code.min(levels - 1) as u16
|
||||
})
|
||||
.collect()
|
||||
};
|
||||
// L1 distance over per-dim codes.
|
||||
let l1 = |a: &[u16], b: &[u16]| -> u32 {
|
||||
a.iter()
|
||||
.zip(b)
|
||||
.map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
|
||||
.sum()
|
||||
};
|
||||
|
||||
let float_bank = make_fixture(p);
|
||||
let centres = cluster_centres(p.dim, p.n_clusters.max(1), p.seed);
|
||||
let coded_bank: Vec<Vec<u16>> = float_bank.iter().map(|v| quantize(v)).collect();
|
||||
|
||||
let mut total = 0.0f64;
|
||||
for q in 0..p.n_queries {
|
||||
let c = q % p.n_clusters.max(1);
|
||||
let qv = realize(
|
||||
&centres[c],
|
||||
p.dim,
|
||||
p.noise,
|
||||
p.seed ^ 0xDEAD_0000_0000 ^ (q as u64).wrapping_mul(0x2545_F491),
|
||||
);
|
||||
let truth = float_topk(&float_bank, &qv, p.k);
|
||||
let qc = quantize(&qv);
|
||||
// top candidate_k by L1 over codes.
|
||||
let mut scored: Vec<(u32, u32)> = coded_bank
|
||||
.iter()
|
||||
.enumerate()
|
||||
.map(|(i, code)| (i as u32, l1(&qc, code)))
|
||||
.collect();
|
||||
scored.sort_by_key(|&(_, d)| d);
|
||||
scored.truncate(p.candidate_k);
|
||||
let cand_ids: std::collections::HashSet<u32> =
|
||||
scored.into_iter().map(|(id, _)| id).collect();
|
||||
let hit = truth.iter().filter(|id| cand_ids.contains(*id)).count();
|
||||
total += hit as f64 / p.k as f64;
|
||||
}
|
||||
CoverageResult {
|
||||
coverage: total / p.n_queries as f64,
|
||||
}
|
||||
}
|
||||
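The per-coordinate quantizer and the L1 ranking metric used by `measure_multibit` can be exercised in isolation. The following is a minimal sketch, assuming the same `round`-to-nearest code mapping and a caller-supplied symmetric range; `quantize_coord` and `l1_codes` are hypothetical helper names for illustration and are not part of the crate.

```rust
/// Hypothetical standalone version of the per-coordinate quantizer:
/// maps `x` to one of `2^bits` codes over the fixed range [-range, range].
fn quantize_coord(x: f32, bits: u32, range: f32) -> u16 {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let levels = 1u32 << bits;
    // Normalize to [0, 1]; out-of-range values clamp to the end codes.
    let t = ((x + range) / (2.0 * range)).clamp(0.0, 1.0);
    let code = (t * (levels - 1) as f32).round() as u32;
    code.min(levels - 1) as u16
}

/// Hypothetical helper: L1 distance between two code vectors.
/// Extra elements of the longer slice are ignored by `zip`.
fn l1_codes(a: &[u16], b: &[u16]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}
```

With `bits == 1` the code is just the sign of the coordinate (negative maps to 0, positive to 1), which is why the 1-bit case is expected to track the rotated sign sketch. With more bits, nearby coordinates land on nearby codes, so the L1 distance over codes retains some magnitude information that hamming distance over sign bits discards.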
|
||||
/// Build the deterministic float bank for `p`: `p.n` vectors, each assigned to
|
||||
/// one of `p.n_clusters` planted clusters (round-robin), realized as
|
||||
/// `centre + jitter` under the fixed anisotropic axis scale. Cluster ids are not
|
||||
/// returned; vector `i` belongs to cluster `i % n_clusters`, so callers can re-derive them.
|
||||
pub fn make_fixture(p: CoverageParams) -> Vec<Vec<f32>> {
|
||||
let centres = cluster_centres(p.dim, p.n_clusters.max(1), p.seed);
|
||||
(0..p.n)
|
||||
.map(|i| {
|
||||
let c = i % p.n_clusters.max(1);
|
||||
realize(&centres[c], p.dim, p.noise, p.seed ^ (i as u64).wrapping_mul(0x9E37))
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
fn measure_inner(p: CoverageParams, rotation: Option<Rotation>) -> CoverageResult {
|
||||
const SV: u16 = 1;
|
||||
// Float bank (ground truth) + sketch bank from the SAME vectors, so the
|
||||
// only variable is float-vs-sketch (and Pass-1-vs-Pass-2).
|
||||
let float_bank = make_fixture(p);
|
||||
let centres = cluster_centres(p.dim, p.n_clusters.max(1), p.seed);
|
||||
|
||||
let mut bank = match &rotation {
|
||||
Some(r) => SketchBank::with_rotation(r.clone()),
|
||||
None => SketchBank::new(),
|
||||
};
|
||||
for (i, v) in float_bank.iter().enumerate() {
|
||||
// Use the bank's rotation policy for both Pass-1 and Pass-2 uniformly.
|
||||
bank.insert_embedding(i as u32, v, SV)
|
||||
.expect("schema-locked insert");
|
||||
}
|
||||
|
||||
let mut total = 0.0f64;
|
||||
for q in 0..p.n_queries {
|
||||
// Each query is a fresh draw from a planted cluster (disjoint seed
|
||||
// range from the bank), so it HAS genuine same-cluster neighbours in
|
||||
// the bank — a meaningful top-K, not random-noise ties.
|
||||
let c = q % p.n_clusters.max(1);
|
||||
let qv = realize(
|
||||
&centres[c],
|
||||
p.dim,
|
||||
p.noise,
|
||||
p.seed ^ 0xDEAD_0000_0000 ^ (q as u64).wrapping_mul(0x2545_F491),
|
||||
);
|
||||
let truth = float_topk(&float_bank, &qv, p.k);
|
||||
let cand = bank
|
||||
.topk_embedding(&qv, SV, p.candidate_k)
|
||||
.expect("schema match");
|
||||
let cand_ids: std::collections::HashSet<u32> = cand.into_iter().map(|(id, _)| id).collect();
|
||||
let hit = truth.iter().filter(|id| cand_ids.contains(*id)).count();
|
||||
total += hit as f64 / p.k as f64;
|
||||
}
|
||||
CoverageResult {
|
||||
coverage: total / p.n_queries as f64,
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn tight_clusters_give_high_coverage_with_overfetch() {
|
||||
// Sanity / regression: on tight clusters with enough over-fetch the
|
||||
// sketch MUST recover essentially all of the float cosine top-K — this
|
||||
// both proves the harness is correct (a broken topk gives ~random here)
|
||||
// and pins the cluster structure as meaningful. Catches the heap
|
||||
// inversion bug found during this work (which made this ~6%).
|
||||
let p = CoverageParams {
|
||||
n: 1024,
|
||||
n_queries: 64,
|
||||
n_clusters: 64,
|
||||
noise: 0.1,
|
||||
candidate_k: 64,
|
||||
..CoverageParams::aether_default(0x1111)
|
||||
};
|
||||
let cov = measure_pass1(p).coverage;
|
||||
assert!(
|
||||
cov > 0.95,
|
||||
"tight clusters + 8× over-fetch should recover >95% of top-K, got {:.3}",
|
||||
cov
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn multibit_tradeoff_report() {
|
||||
// ADR-156 §8 "Multi-bit / Extended RaBitQ" measurement: bit/coverage
|
||||
// tradeoff at the STRICT bar (candidate_k == K). Reports b=1..4 bits
|
||||
// per dim alongside Pass-1 / Pass-2 (1-bit) baselines. Run with
|
||||
// --nocapture to see the table.
|
||||
let base = CoverageParams::aether_default(0xAD00_0084);
|
||||
let rot_seed = 0x5EED_C0DE_1234_5678u64;
|
||||
let p1 = measure_pass1(base).coverage;
|
||||
let p2 = measure_pass2(base, rot_seed).coverage;
|
||||
println!("\n=== ADR-156 §8 multi-bit tradeoff (strict candidate_k=K={}) ===", base.k);
|
||||
println!("dim={} N={} clusters={} noise={} bar=90%", base.dim, base.n, base.n_clusters, base.noise);
|
||||
println!(" Pass1 (no rot, 1-bit) : {:6.2}%", p1 * 100.0);
|
||||
println!(" Pass2 (rot, 1-bit) : {:6.2}%", p2 * 100.0);
|
||||
for bits in 1..=4u32 {
|
||||
let cov = measure_multibit(base, rot_seed, bits).coverage;
|
||||
let bytes_per_vec = base.dim * bits as usize / 8;
|
||||
println!(
|
||||
" Pass3 (rot, {bits}-bit, {bytes_per_vec:>3} B/vec): {:6.2}% {}",
|
||||
cov * 100.0,
|
||||
if cov >= 0.90 { "≥90%" } else { "" }
|
||||
);
|
||||
}
|
||||
println!("=================================================================\n");
|
||||
assert!((0.0..=1.0).contains(&p1));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn multibit_1bit_matches_pass2_approx() {
|
||||
// Sanity: 1-bit multi-bit quantization is essentially sign-quantization,
|
||||
// so its coverage should track Pass-2 (rotated 1-bit) closely. The match is
|
||||
// not asserted to be exact, but the quantizer's 0/1 boundary is at the RANGE
|
||||
// midpoint (x = 0), which equals the sign boundary, so the two should agree.
|
||||
let p = CoverageParams {
|
||||
n: 256,
|
||||
n_queries: 16,
|
||||
n_clusters: 16,
|
||||
..CoverageParams::aether_default(0x55)
|
||||
};
|
||||
let rot_seed = 0xABCDu64;
|
||||
let p2 = measure_pass2(p, rot_seed).coverage;
|
||||
let mb1 = measure_multibit(p, rot_seed, 1).coverage;
|
||||
assert!(
|
||||
(p2 - mb1).abs() < 0.05,
|
||||
"1-bit multibit {mb1:.3} should track Pass-2 {p2:.3}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn fixture_is_deterministic() {
|
||||
let p = CoverageParams::aether_default(12345);
|
||||
let a = make_fixture(p);
|
||||
let b = make_fixture(p);
|
||||
assert_eq!(a, b);
|
||||
assert_eq!(a.len(), p.n);
|
||||
assert_eq!(a[0].len(), p.dim);
|
||||
let c = make_fixture(CoverageParams::aether_default(12346));
|
||||
assert_ne!(a[0], c[0]);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn coverage_harness_runs_and_is_in_range() {
|
||||
// Small fixed fixture — fast, deterministic, in [0,1].
|
||||
let p = CoverageParams {
|
||||
n: 256,
|
||||
n_queries: 16,
|
||||
n_clusters: 16,
|
||||
..CoverageParams::aether_default(0xABCD)
|
||||
};
|
||||
let c1 = measure_pass1(p);
|
||||
let c2 = measure_pass2(p, 0x1234_5678);
|
||||
assert!((0.0..=1.0).contains(&c1.coverage));
|
||||
assert!((0.0..=1.0).contains(&c2.coverage));
|
||||
// Determinism: same params → same number.
|
||||
assert_eq!(measure_pass1(p).coverage, c1.coverage);
|
||||
assert_eq!(measure_pass2(p, 0x1234_5678).coverage, c2.coverage);
|
||||
}
|
||||
}
|
||||
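The coverage figure reported by each `measure_*` function is the fraction of the float top-K ids that also appear in the sketch candidate set, averaged over queries. A minimal sketch of the per-query step follows; `topk_coverage` is a hypothetical helper name and is not part of the crate. One assumption differs from the harness: this version divides by `truth.len()` rather than by the configured `k`, so it stays well defined when the bank holds fewer than `k` vectors.

```rust
use std::collections::HashSet;

/// Hypothetical helper: fraction of `truth` ids that appear in `candidates`.
/// Returns 0.0 for an empty truth set to avoid dividing by zero.
fn topk_coverage(truth: &[u32], candidates: &[u32]) -> f64 {
    if truth.is_empty() {
        return 0.0;
    }
    let cand: HashSet<u32> = candidates.iter().copied().collect();
    // `id` is `&&u32` inside `filter`, so dereference once for `contains`.
    let hit = truth.iter().filter(|id| cand.contains(*id)).count();
    hit as f64 / truth.len() as f64
}
```

Passing a candidate list longer than the truth list models over-fetch: the sketch proposes more than K ids, and coverage measures how much of the true top-K survives into that larger set.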
@@ -28,13 +28,16 @@
|
||||
|
||||
#[cfg(feature = "crv")]
|
||||
pub mod crv;
|
||||
pub mod coverage;
|
||||
pub mod event_log;
|
||||
pub mod mat;
|
||||
pub mod rotation;
|
||||
pub mod signal;
|
||||
pub mod sketch;
|
||||
pub mod viewpoint;
|
||||
|
||||
pub use event_log::{NoveltyEvent, PrivacyEventLog};
|
||||
pub use rotation::Rotation;
|
||||
pub use sketch::{
|
||||
Sketch, SketchBank, SketchError, WireSketch, WireSketchError, WIRE_SKETCH_FORMAT_VERSION,
|
||||
WIRE_SKETCH_MAGIC, WIRE_SKETCH_MAX_BYTES,
|
||||
|
||||
@@ -0,0 +1,353 @@
|
||||
//! RaBitQ **Pass 2** — deterministic randomized orthogonal rotation.
|
||||
//!
|
||||
//! Implements the "Pass 2" deferred in [`crate::sketch`]'s Pass-1 doc and in
|
||||
//! [ADR-156 §8](../../../../../docs/adr/ADR-156-ruvector-fusion-beyond-sota.md)
|
||||
//! (Multi-bit / Extended RaBitQ). The published *RaBitQ* algorithm
|
||||
//! (Gao & Long, SIGMOD 2024) wraps the 1-bit sign-quantization of Pass 1 with
|
||||
//! a **randomized orthogonal rotation** `R` applied to every embedding *before*
|
||||
//! sign-quantization. The rotation decorrelates coordinates so the per-bit sign
|
||||
//! carries more independent information, which gives both the paper's
|
||||
//! theoretical error bound and better top-K recall on anisotropic / correlated
|
||||
//! embedding distributions (exactly the case ADR-084's "Open questions" flagged
|
||||
//! for skewed spectrogram embeddings).
|
||||
//!
|
||||
//! # Why a Fast Hadamard Transform, not a dense d×d matrix
|
||||
//!
|
||||
//! A full dense orthogonal matrix `R ∈ ℝ^{d×d}` is **O(d²) memory and O(d²)
|
||||
//! time per vector**. ADR-084's wire format already provisions for embeddings
|
||||
//! up to `u16::MAX = 65,535` dimensions; a dense rotation there is ~4.3 G
|
||||
//! floats (~16 GiB as `f32`), completely infeasible on the cluster-Pi / edge targets
|
||||
//! this sketch is built for.
|
||||
//!
|
||||
//! Instead we use the **randomized Hadamard transform** (the "HD" construction,
|
||||
//! a.k.a. a structured Johnson–Lindenstrauss / fast-JL rotation):
|
||||
//!
|
||||
//! ```text
|
||||
//! R · x = H · D · x
|
||||
//! ```
|
||||
//!
|
||||
//! where `D` is a diagonal matrix of random ±1 sign flips and `H` is the
|
||||
//! (normalized) Walsh–Hadamard matrix applied via the **Fast Hadamard
|
||||
//! Transform (FHT)**. The FHT is `O(d log d)` time and `O(1)` extra memory
|
||||
//! (in-place butterfly); `D` is `O(d)` memory (one sign per dimension, packed).
|
||||
//! `H` and `D` are each orthogonal, so `R = H·D` is orthogonal and therefore
|
||||
//! **norm-preserving** — a hard requirement for a rotation that must not distort
|
||||
//! relative distances. This is the same fast-orthogonal trick used by Fast-JL,
|
||||
//! Structured Orthogonal Random Features, and the RaBitQ reference rotation.
|
||||
//!
|
||||
//! # Determinism (index-time == query-time)
|
||||
//!
|
||||
//! The rotation **must** be identical when the bank is built and when it is
|
||||
//! queried, or the two sign-quantizations live in different rotated frames and
|
||||
//! hamming distance becomes meaningless. We therefore derive the ±1 sign flips
|
||||
//! deterministically from a stored `u64` seed via a SplitMix64 PRNG — **never**
|
||||
//! an unseeded / OS RNG. Two [`Rotation`]s built from the same `(seed, dim)`
|
||||
//! produce bit-identical output for the same input (pinned by
|
||||
//! `rotation_is_deterministic_for_seed`).
|
||||
//!
|
||||
//! # Power-of-two padding
|
||||
//!
|
||||
//! The FHT is defined on lengths that are powers of two. For a `d` that is not
|
||||
//! a power of two we pad the (sign-flipped) input with zeros up to the next
|
||||
//! power of two `m = next_pow2(d)`, run the length-`m` FHT, and then **read back
|
||||
//! the first `d` coordinates**. Zero-padding + orthogonal `H` keeps the
|
||||
//! transform norm-preserving on the padded vector; we sign-quantize the first
|
||||
//! `d` rotated coordinates so the sketch dimension is unchanged from Pass 1
|
||||
//! (API-compatible: same `embedding_dim`, same packed-byte length, same
|
||||
//! `SketchBank` schema).
|
||||
|
||||
/// A deterministic randomized orthogonal rotation (FHT-based) applied to an
|
||||
/// embedding before sign-quantization — RaBitQ Pass 2.
|
||||
///
|
||||
/// Construct once per `(seed, dim)` and reuse for **every** embedding that goes
|
||||
/// into the same [`crate::SketchBank`] (and for every query against it). The
|
||||
/// seed is stored so the rotation is reproducible across processes and runs.
|
||||
///
|
||||
/// # Invariants
|
||||
///
|
||||
/// - `dim` is the source-embedding dimension (the sketch keeps this dimension).
|
||||
/// - `padded` is `next_pow2(dim)` — the FHT working length.
|
||||
/// - `signs` has exactly `padded` entries (`+1.0` / `-1.0`), derived from
|
||||
/// `seed` via SplitMix64. Padding positions get signs too; they only ever
|
||||
/// multiply zeros, so their value is irrelevant to the result but they keep
|
||||
/// the construction uniform.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Rotation {
|
||||
/// Source-embedding dimension; the rotated sketch keeps this dimension.
|
||||
dim: usize,
|
||||
/// FHT working length = `next_pow2(dim)`.
|
||||
padded: usize,
|
||||
/// Random ±1 sign flips (the diagonal `D`), length `padded`.
|
||||
signs: Vec<f32>,
|
||||
/// The seed the sign flips were derived from (stored for reproducibility).
|
||||
seed: u64,
|
||||
}
|
||||
|
||||
impl Rotation {
|
||||
/// Build a rotation for `dim`-dimensional embeddings from a fixed `seed`.
|
||||
///
|
||||
/// The same `(seed, dim)` always yields a bit-identical rotation, so an
|
||||
/// index built with `Rotation::new(seed, d)` and a query rotated with a
|
||||
/// freshly-constructed `Rotation::new(seed, d)` agree exactly.
|
||||
///
|
||||
/// `dim == 0` yields an identity (empty) rotation — `apply` returns an
|
||||
/// empty vector — which keeps the constructor total (no panic on a
|
||||
/// degenerate dimension).
|
||||
pub fn new(seed: u64, dim: usize) -> Self {
|
||||
let padded = next_pow2(dim);
|
||||
let mut signs = Vec::with_capacity(padded);
|
||||
// SplitMix64: a tiny, well-distributed, fully deterministic PRNG. We
|
||||
// only need a reproducible stream of bits to pick ±1 per dimension;
|
||||
// SplitMix64 is the standard seeding generator and is more than
|
||||
// adequate (and far better-mixed than the LCG used for bench fixtures).
|
||||
let mut state = seed;
|
||||
for _ in 0..padded {
|
||||
state = split_mix64(&mut state);
|
||||
// Use the top bit of the mixed word to choose the sign.
|
||||
signs.push(if state >> 63 == 1 { 1.0 } else { -1.0 });
|
||||
}
|
||||
Self {
|
||||
dim,
|
||||
padded,
|
||||
signs,
|
||||
seed,
|
||||
}
|
||||
}
|
||||
|
||||
/// The seed this rotation was derived from (for serialization / audit).
|
||||
#[inline]
|
||||
pub fn seed(&self) -> u64 {
|
||||
self.seed
|
||||
}
|
||||
|
||||
/// Source-embedding dimension this rotation expects.
|
||||
#[inline]
|
||||
pub fn dim(&self) -> usize {
|
||||
self.dim
|
||||
}
|
||||
|
||||
/// FHT working length (`next_pow2(dim)`).
|
||||
#[inline]
|
||||
pub fn padded_dim(&self) -> usize {
|
||||
self.padded
|
||||
}
|
||||
|
||||
/// Apply the rotation `R = H·D` to `embedding`, returning the first `dim`
|
||||
/// rotated coordinates.
|
||||
///
|
||||
/// If `embedding.len() != dim` the input is treated charitably: it is
|
||||
/// truncated or zero-extended to `dim` before rotation. This mirrors
|
||||
/// Pass 1's saturating tolerance and keeps the call total.
|
||||
///
|
||||
/// The returned vector has length `self.dim`. When `dim` is a power of two its
|
||||
/// L2 norm equals that of the (dim-truncated / zero-extended) input up to
|
||||
/// floating-point rounding (see `rotation_preserves_norm`). Otherwise the read-back
|
||||
/// drops `padded - dim` coordinates, so the norm can only stay equal or shrink.
|
||||
pub fn apply(&self, embedding: &[f32]) -> Vec<f32> {
|
||||
if self.dim == 0 {
|
||||
return Vec::new();
|
||||
}
|
||||
// Build the padded, sign-flipped working buffer: buf = D · x, then 0-pad.
|
||||
let mut buf = vec![0.0f32; self.padded];
|
||||
let n = embedding.len().min(self.dim);
|
||||
for i in 0..n {
|
||||
buf[i] = embedding[i] * self.signs[i];
|
||||
}
|
||||
// (positions n..dim and dim..padded stay zero — zero-extend + pad)
|
||||
|
||||
// In-place normalized Fast Hadamard Transform.
|
||||
fht_normalized(&mut buf);
|
||||
|
||||
// Read back the first `dim` rotated coordinates as the sketch input.
|
||||
buf.truncate(self.dim);
|
||||
buf
|
||||
}
|
||||
}
|
||||
|
||||
/// Smallest power of two `>= n` (with `next_pow2(0) == 1`, `next_pow2(1) == 1`).
|
||||
///
|
||||
/// Pulled out (and `pub(crate)`) so the sketch layer and tests can reason about
|
||||
/// the FHT working length without duplicating the rule.
|
||||
#[inline]
|
||||
pub(crate) fn next_pow2(n: usize) -> usize {
|
||||
if n <= 1 {
|
||||
return 1;
|
||||
}
|
||||
// `n` here is small relative to usize::MAX in every realistic embedding
|
||||
// (<= 65_535), so `next_power_of_two` cannot overflow.
|
||||
n.next_power_of_two()
|
||||
}
|
||||
|
||||
/// SplitMix64 step: advance `state` and return a well-mixed 64-bit word.
|
||||
///
|
||||
/// Reference algorithm (public domain, by Sebastiano Vigna). Deterministic and
|
||||
/// dependency-free — exactly what we need for a reproducible sign stream.
|
||||
#[inline]
|
||||
fn split_mix64(state: &mut u64) -> u64 {
|
||||
*state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
|
||||
let mut z = *state;
|
||||
z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
|
||||
z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
|
||||
z ^ (z >> 31)
|
||||
}
|
||||
|
||||
/// In-place **normalized** Fast Hadamard Transform on a power-of-two slice.
|
||||
///
|
||||
/// Computes `y = (1/√m) · H_m · x` in place, where `H_m` is the `m × m`
|
||||
/// Walsh–Hadamard matrix and `m = buf.len()` is a power of two. The `1/√m`
|
||||
/// normalization makes `H` orthogonal (`HᵀH = I`), so the transform preserves
|
||||
/// the L2 norm. Runs in `O(m log m)` with `O(1)` extra memory (the standard
|
||||
/// iterative butterfly).
|
||||
///
|
||||
/// # Panics
|
||||
///
|
||||
/// Debug-asserts that `buf.len()` is a power of two. Callers in this module
|
||||
/// always pass `next_pow2(dim)`, so this never fires in practice; it documents
|
||||
/// the precondition.
|
||||
fn fht_normalized(buf: &mut [f32]) {
|
||||
let m = buf.len();
|
||||
debug_assert!(m.is_power_of_two(), "FHT length must be a power of two");
|
||||
if m <= 1 {
|
||||
return;
|
||||
}
|
||||
// Unnormalized in-place Walsh–Hadamard butterfly.
|
||||
let mut h = 1usize;
|
||||
while h < m {
|
||||
let mut i = 0usize;
|
||||
while i < m {
|
||||
for j in i..i + h {
|
||||
let x = buf[j];
|
||||
let y = buf[j + h];
|
||||
buf[j] = x + y;
|
||||
buf[j + h] = x - y;
|
||||
}
|
||||
i += h * 2;
|
||||
}
|
||||
h *= 2;
|
||||
}
|
||||
// Normalize by 1/√m so H is orthogonal (norm-preserving).
|
||||
let inv_sqrt_m = 1.0f32 / (m as f32).sqrt();
|
||||
for v in buf.iter_mut() {
|
||||
*v *= inv_sqrt_m;
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn l2(v: &[f32]) -> f32 {
|
||||
v.iter().map(|&x| x * x).sum::<f32>().sqrt()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn next_pow2_rounds_up() {
|
||||
assert_eq!(next_pow2(0), 1);
|
||||
assert_eq!(next_pow2(1), 1);
|
||||
assert_eq!(next_pow2(2), 2);
|
||||
assert_eq!(next_pow2(3), 4);
|
||||
assert_eq!(next_pow2(128), 128);
|
||||
assert_eq!(next_pow2(129), 256);
|
||||
assert_eq!(next_pow2(200), 256);
|
||||
assert_eq!(next_pow2(65_535), 65_536);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn fht_is_norm_preserving_on_power_of_two() {
|
||||
// Pure FHT (no sign flips) must preserve L2 norm to fp tolerance.
|
||||
let mut v: Vec<f32> = (0..8).map(|i| (i as f32 - 3.5) * 0.7).collect();
|
||||
let before = l2(&v);
|
||||
fht_normalized(&mut v);
|
||||
let after = l2(&v);
|
||||
assert!(
|
||||
(before - after).abs() < 1e-5,
|
||||
"FHT changed norm: {before} -> {after}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn fht_self_inverse_normalized() {
|
||||
// Normalized H is symmetric and orthogonal, so H·H·x == x.
|
||||
let original: Vec<f32> = vec![1.0, -2.0, 3.0, 0.5];
|
||||
let mut v = original.clone();
|
||||
fht_normalized(&mut v);
|
||||
fht_normalized(&mut v);
|
||||
for (a, b) in original.iter().zip(v.iter()) {
|
||||
assert!((a - b).abs() < 1e-5, "H·H·x != x: {a} vs {b}");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rotation_is_deterministic_for_seed() {
|
||||
// Two rotations from the same (seed, dim) must produce identical
|
||||
// output for the same input — the index-time == query-time contract.
|
||||
let r1 = Rotation::new(0xDEAD_BEEF_CAFE_1234, 130);
|
||||
let r2 = Rotation::new(0xDEAD_BEEF_CAFE_1234, 130);
|
||||
let x: Vec<f32> = (0..130).map(|i| (i as f32 * 0.31).sin()).collect();
|
||||
let a = r1.apply(&x);
|
||||
let b = r2.apply(&x);
|
||||
assert_eq!(a.len(), 130);
|
||||
assert_eq!(a, b, "same seed must give identical rotation");
|
||||
|
||||
// A different seed must (almost surely) differ.
|
||||
let r3 = Rotation::new(0x0000_0000_0000_0001, 130);
|
||||
let c = r3.apply(&x);
|
||||
assert_ne!(a, c, "different seed must give different rotation");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rotation_preserves_norm() {
|
||||
// R = H·D is orthogonal; on a power-of-two dim the first `dim`
|
||||
// coordinates ARE the whole transform, so norm is preserved exactly
|
||||
// (to fp tolerance). We test a power-of-two dim for the exact claim.
|
||||
let r = Rotation::new(42, 128);
|
||||
let x: Vec<f32> = (0..128).map(|i| ((i * 7 % 13) as f32 - 6.0) * 0.5).collect();
|
||||
let y = r.apply(&x);
|
||||
let before = l2(&x);
|
||||
let after = l2(&y);
|
||||
assert!(
|
||||
(before - after).abs() < 1e-3 * before.max(1.0),
|
||||
"rotation changed norm: {before} -> {after}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rotation_non_power_of_two_preserves_norm_via_padding() {
|
||||
// For a non-power-of-two dim, reading back the first `dim` coords of a
|
||||
// padded FHT only preserves norm if the padded tail carries ~no energy.
|
||||
// We assert the rotated norm does not EXCEED the input norm (the padded
|
||||
// transform is non-expansive on the truncated read-back) and stays
|
||||
// within a loose band — enough to confirm padding is sane, not a hard
|
||||
// exact-norm claim.
|
||||
let r = Rotation::new(7, 130); // pads 130 -> 256
|
||||
assert_eq!(r.padded_dim(), 256);
|
||||
let x: Vec<f32> = (0..130).map(|i| (i as f32 * 0.13).cos()).collect();
|
||||
let y = r.apply(&x);
|
||||
assert_eq!(y.len(), 130);
|
||||
let before = l2(&x);
|
||||
let after = l2(&y);
|
||||
// Truncated read-back is non-expansive: ||y|| <= ||Hx|| == ||x||.
|
||||
assert!(
|
||||
after <= before + 1e-4,
|
||||
"truncated rotation expanded norm: {before} -> {after}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rotation_dim_zero_is_empty() {
|
||||
let r = Rotation::new(1, 0);
|
||||
assert!(r.apply(&[]).is_empty());
|
||||
assert!(r.apply(&[1.0, 2.0]).is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rotation_handles_ragged_input() {
|
||||
// Charitable length handling: short input zero-extends, long truncates.
|
||||
let r = Rotation::new(99, 64);
|
||||
let short = r.apply(&[1.0, 2.0, 3.0]); // zero-extended to 64
|
||||
assert_eq!(short.len(), 64);
|
||||
let long: Vec<f32> = (0..200).map(|i| i as f32).collect();
|
||||
let truncated = r.apply(&long); // truncated to 64
|
||||
assert_eq!(truncated.len(), 64);
|
||||
}
|
||||
}
|
||||
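The `R = H·D` construction can be tried outside the crate. The following is a minimal sketch, assuming a power-of-two input length and caller-supplied ±1 signs instead of a seeded PRNG; `fht_in_place`, `rotate_hd`, and `l2_norm` are hypothetical names that mirror, but are not, the crate's `fht_normalized` and `Rotation::apply`.

```rust
/// Hypothetical in-place normalized Walsh–Hadamard transform.
/// Panics unless `buf.len()` is a power of two.
fn fht_in_place(buf: &mut [f32]) {
    let m = buf.len();
    assert!(m.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < m {
        // Butterfly over blocks of width 2h.
        for start in (0..m).step_by(2 * h) {
            for j in start..start + h {
                let (x, y) = (buf[j], buf[j + h]);
                buf[j] = x + y;
                buf[j + h] = x - y;
            }
        }
        h *= 2;
    }
    // Scale by 1/sqrt(m) so the transform is orthogonal.
    let scale = 1.0 / (m as f32).sqrt();
    buf.iter_mut().for_each(|v| *v *= scale);
}

/// Hypothetical helper: apply D (element-wise ±1 signs), then H.
fn rotate_hd(x: &[f32], signs: &[f32]) -> Vec<f32> {
    assert_eq!(x.len(), signs.len(), "one sign per coordinate");
    let mut buf: Vec<f32> = x.iter().zip(signs).map(|(&v, &s)| v * s).collect();
    fht_in_place(&mut buf);
    buf
}

/// Euclidean norm, used to check that the rotation preserves length.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|&x| x * x).sum::<f32>().sqrt()
}
```

A unit spike such as `[1, 0, 0, 0]` is spread evenly to `[0.5, 0.5, 0.5, 0.5]`, which illustrates the point of the rotation: energy concentrated on one axis is distributed across all of them, and the overall norm is unchanged.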
@@ -40,8 +40,8 @@
|
||||
//! All sites take a `&Sketch` instead of an `&[f32]`; the bridge to dense
|
||||
//! embeddings is `Sketch::from_embedding`.
|
||||
|
||||
use crate::rotation::Rotation;
|
||||
use ruvector_core::quantization::{BinaryQuantized, QuantizedVector};
|
||||
use std::cmp::Reverse;
|
||||
use std::collections::BinaryHeap;
|
||||
|
||||
/// Errors raised by the sketch API.
|
||||
@@ -151,6 +151,42 @@ impl Sketch {
|
||||
Ok(Self::from_embedding(embedding, sketch_version))
|
||||
}
|
||||
|
||||
/// Construct a sketch from a dense f32 embedding **with RaBitQ Pass 2
|
||||
/// rotation** ([ADR-156 §8](../../../../../docs/adr/ADR-156-ruvector-fusion-beyond-sota.md)).
|
||||
///
|
||||
/// Applies the deterministic randomized orthogonal rotation `R = H·D`
|
||||
/// (Fast Hadamard Transform + seeded ±1 sign flips, see [`Rotation`]) to
|
||||
/// the embedding *before* sign-quantization. The rotation decorrelates
|
||||
/// coordinates so each sign bit carries more independent information,
|
||||
/// improving top-K recall on anisotropic / correlated embedding
|
||||
/// distributions — the published RaBitQ construction.
|
||||
///
|
||||
/// The resulting sketch has the **same `embedding_dim`, packed-byte
|
||||
/// length, and `sketch_version`** as a Pass-1 sketch of the same input, so
|
||||
/// it is fully interchangeable in [`SketchBank`] and [`WireSketch`]. The
|
||||
/// *only* requirement is that the index and the query use the **same
|
||||
/// [`Rotation`]** (same seed + dim) — otherwise their sign bits live in
|
||||
/// different rotated frames and the hamming distance is meaningless.
|
||||
///
|
||||
/// Pass-1 (`from_embedding`) and Pass-2 sketches must **not** be mixed in
|
||||
/// one bank. Use [`SketchBank::with_rotation`] to make a bank that rotates
|
||||
/// every insert and query consistently.
|
||||
pub fn from_embedding_rotated(
|
||||
embedding: &[f32],
|
||||
sketch_version: u16,
|
||||
rotation: &Rotation,
|
||||
) -> Self {
|
||||
let rotated = rotation.apply(embedding);
|
||||
// Preserve the *source* embedding_dim semantics of Pass 1 (saturating
|
||||
// to u16::MAX) so banks/wire framing are byte-identical to Pass 1.
|
||||
let embedding_dim = embedding.len().min(u16::MAX as usize) as u16;
|
||||
Self {
|
||||
inner: BinaryQuantized::quantize(&rotated),
|
||||
embedding_dim,
|
||||
sketch_version,
|
||||
}
|
||||
}
|
||||
|
||||
/// Hamming distance to another sketch in `[0, embedding_dim]`.
|
||||
///
|
||||
/// Returns `None` if the two sketches have different `embedding_dim` or
|
||||
@@ -417,29 +453,113 @@ pub struct SketchBank {
|
||||
embedding_dim: Option<u16>,
|
||||
/// Locked at first insertion; all subsequent inserts must match.
|
||||
sketch_version: Option<u16>,
|
||||
/// Optional RaBitQ Pass-2 rotation ([ADR-156 §8]). When `Some`, the
|
||||
/// embedding-taking helpers ([`SketchBank::insert_embedding`],
|
||||
/// [`SketchBank::topk_embedding`], [`SketchBank::novelty_embedding`])
|
||||
/// rotate every embedding through this exact rotation before sketching, so
|
||||
/// index-time and query-time sketches always share one rotated frame. The
|
||||
/// raw [`SketchBank::insert`] / [`SketchBank::topk`] paths are unchanged —
|
||||
/// callers using pre-built sketches are responsible for having rotated them
|
||||
/// with the same `Rotation`.
|
||||
rotation: Option<Rotation>,
|
||||
}
|
||||
|
||||
impl SketchBank {
|
||||
/// Create an empty bank. Dimension and version are locked at the first
|
||||
/// `insert` call.
|
||||
/// `insert` call. No Pass-2 rotation (pure Pass-1, default behaviour).
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
entries: Vec::new(),
|
||||
embedding_dim: None,
|
||||
sketch_version: None,
|
||||
rotation: None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Create a bank with a pre-locked `embedding_dim` and `sketch_version`.
|
||||
/// Use when the bank's expected schema is known at construction.
|
||||
/// No Pass-2 rotation (pure Pass-1).
|
||||
pub fn with_schema(embedding_dim: u16, sketch_version: u16) -> Self {
|
||||
Self {
|
||||
entries: Vec::new(),
|
||||
embedding_dim: Some(embedding_dim),
|
||||
sketch_version: Some(sketch_version),
|
||||
rotation: None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Create a **RaBitQ Pass-2** bank that rotates every embedding through
|
||||
/// `rotation` before sketching ([ADR-156 §8]).
|
||||
///
|
||||
/// Use the embedding-taking helpers ([`SketchBank::insert_embedding`],
|
||||
/// [`SketchBank::topk_embedding`], [`SketchBank::novelty_embedding`]) with
|
||||
/// this bank so the index and queries share the same rotated frame. The
|
||||
/// `embedding_dim` / `sketch_version` schema is still locked at first
|
||||
/// insert exactly as for a Pass-1 bank — a Pass-2 sketch is byte-identical
|
||||
/// in shape to a Pass-1 sketch, only its bits differ.
|
||||
pub fn with_rotation(rotation: Rotation) -> Self {
|
||||
Self {
|
||||
entries: Vec::new(),
|
||||
embedding_dim: None,
|
||||
sketch_version: None,
|
||||
rotation: Some(rotation),
|
||||
}
|
||||
}
|
||||
|
||||
/// The Pass-2 rotation this bank applies to embeddings, if any.
|
||||
#[inline]
|
||||
pub fn rotation(&self) -> Option<&Rotation> {
|
||||
self.rotation.as_ref()
|
||||
}
|
||||
|
||||
/// Sketch a raw embedding using this bank's rotation policy: Pass-2
|
||||
/// (`from_embedding_rotated`) if the bank has a rotation, else Pass-1
|
||||
/// (`from_embedding`). The single place index-time and query-time sketching
|
||||
/// agree on the rotated frame.
|
||||
fn sketch_embedding(&self, embedding: &[f32], sketch_version: u16) -> Sketch {
|
||||
match &self.rotation {
|
||||
Some(r) => Sketch::from_embedding_rotated(embedding, sketch_version, r),
|
||||
None => Sketch::from_embedding(embedding, sketch_version),
|
||||
}
|
||||
}
|
||||
|
||||
/// Insert a raw embedding, sketching it through the bank's rotation policy.
|
||||
/// Convenience wrapper over [`SketchBank::insert`] that guarantees the
|
||||
/// stored sketch used the same (Pass-1 or Pass-2) frame the queries will.
|
||||
pub fn insert_embedding(
|
||||
&mut self,
|
||||
id: u32,
|
||||
embedding: &[f32],
|
||||
sketch_version: u16,
|
||||
) -> Result<(), SketchError> {
|
||||
let sketch = self.sketch_embedding(embedding, sketch_version);
|
||||
self.insert(id, sketch)
|
||||
}
|
||||
|
||||
/// Top-K over a raw query embedding, sketched through the bank's rotation
|
||||
/// policy. Equivalent to `bank.topk(&bank.sketch_embedding(query, sketch_version), k)` but cannot get
|
||||
/// the rotation frame wrong.
|
||||
pub fn topk_embedding(
|
||||
&self,
|
||||
query: &[f32],
|
||||
sketch_version: u16,
|
||||
k: usize,
|
||||
) -> Result<Vec<(u32, u32)>, SketchError> {
|
||||
let q = self.sketch_embedding(query, sketch_version);
|
||||
self.topk(&q, k)
|
||||
}
|
||||
|
||||
/// Novelty of a raw query embedding, sketched through the bank's rotation
|
||||
/// policy. See [`SketchBank::novelty`].
|
||||
pub fn novelty_embedding(
|
||||
&self,
|
||||
query: &[f32],
|
||||
sketch_version: u16,
|
||||
) -> Result<f32, SketchError> {
|
||||
let q = self.sketch_embedding(query, sketch_version);
|
||||
self.novelty(&q)
|
||||
}
|
||||
|
||||
/// Number of sketches in the bank.
|
||||
#[inline]
|
||||
pub fn len(&self) -> usize {
|
||||
@@ -523,12 +643,22 @@ impl SketchBank {
|
||||
});
|
||||
}
|
||||
}
|
||||
// Pass-1.5 optimisation: O(n log k) partial sort via a fixed-size
|
||||
// max-heap of `Reverse((distance, id))`. The heap's `peek()`
|
||||
// returns the *largest* of the current best-k. Each candidate is
|
||||
// compared against the heap top in O(1); only better candidates
|
||||
// trigger an O(log k) push/pop. Avoids touching the long tail of
|
||||
// large-distance entries that the truncate would have discarded.
|
||||
// Partial top-K via a fixed-size **max-heap** of `(distance, id)`.
|
||||
// `BinaryHeap` is a max-heap, so `peek()` is the *largest* distance
|
||||
// currently held — the worst of the running best-k. Each candidate is
|
||||
// O(1)-compared against that worst; only a *smaller* distance triggers
|
||||
// an O(log k) pop+push, evicting the current worst. The heap therefore
|
||||
// retains the k *smallest* distances. Total O(n log k), touching the
|
||||
// long tail only with a single comparison each.
|
||||
//
|
||||
// BUG FIX (ADR-156 §8 Pass-2 work): this loop previously used
|
||||
// `BinaryHeap<Reverse<(d, id)>>` and called the peek "the largest".
|
||||
// `Reverse` turns the max-heap into a **min-heap**, so `peek()` was the
// *smallest* held distance; evicting on `d < worst` then popped the nearest
// entry held so far instead of the worst, so mostly first-seen entries were
// returned as "nearest". The pre-existing
|
||||
// unit tests only exercised the `n <= k` fast path (≤ 3 entries), so the
|
||||
// inversion went unnoticed until the Pass-2 coverage harness measured
|
||||
// near-random top-K on n > k. Pinned by `topk_heap_path_returns_nearest`.
|
||||
//
|
||||
// Fast path: when n ≤ k there is nothing to discard, so a plain
|
||||
// collect + sort is faster than building a heap.
|
||||
@@ -543,28 +673,25 @@ impl SketchBank {
|
||||
return Ok(scored);
|
||||
}
|
||||
|
||||
let mut heap: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::with_capacity(k + 1);
|
||||
let mut heap: BinaryHeap<(u32, u32)> = BinaryHeap::with_capacity(k + 1);
|
||||
for (id, sk) in &self.entries {
|
||||
let d = sk.distance_unchecked(query);
|
||||
if heap.len() < k {
|
||||
heap.push(Reverse((d, *id)));
|
||||
} else if let Some(&Reverse((worst, _))) = heap.peek() {
|
||||
// L1 hardening (PR #435 review): structural `if let` rather
|
||||
// than `.expect("heap len == k > 0")`. The branch is
|
||||
// mathematically unreachable when `heap.len() >= k > 0`,
|
||||
// but a defensive pattern makes the impossibility a type
|
||||
// property rather than a runtime invariant. Same hot-path
|
||||
// cost (one bounds check); zero panic risk.
|
||||
heap.push((d, *id));
|
||||
} else if let Some(&(worst, _)) = heap.peek() {
|
||||
// `peek()` is the largest distance in the best-k (max-heap).
|
||||
// The `if let` is defensive: when `heap.len() == k > 0` the
|
||||
// heap is non-empty, so this never takes the `else`. Same
|
||||
// hot-path cost (one bounds check), zero panic risk.
|
||||
if d < worst {
|
||||
heap.pop();
|
||||
heap.push(Reverse((d, *id)));
|
||||
heap.push((d, *id));
|
||||
}
|
||||
}
|
||||
}
|
||||
// Drain heap into a Vec — already in (Reverse) descending order;
|
||||
// sort to expose ascending-by-distance per the public contract.
|
||||
let mut scored: Vec<(u32, u32)> =
|
||||
heap.into_iter().map(|Reverse((d, id))| (id, d)).collect();
|
||||
// Drain the max-heap and sort ascending-by-distance per the public
|
||||
// contract (heap drain order is unspecified beyond the root).
|
||||
let mut scored: Vec<(u32, u32)> = heap.into_iter().map(|(d, id)| (id, d)).collect();
|
||||
scored.sort_by_key(|&(_, d)| d);
|
||||
Ok(scored)
|
||||
}
|
||||
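To make the heap-direction fix above concrete, the following standalone sketch contrasts the corrected bounded max-heap with the inverted `Reverse` variant on plain `(id, distance)` pairs. The function names `k_nearest` and `k_nearest_inverted` are hypothetical and exist only for this illustration; they are not part of `SketchBank`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Corrected shape: a max-heap of `(distance, id)` whose root is the worst
/// of the running best-k, so a closer candidate evicts that worst entry.
pub fn k_nearest(scored: impl IntoIterator<Item = (u32, u32)>, k: usize) -> Vec<(u32, u32)> {
    if k == 0 {
        return Vec::new();
    }
    let mut heap: BinaryHeap<(u32, u32)> = BinaryHeap::with_capacity(k);
    for (id, d) in scored {
        if heap.len() < k {
            heap.push((d, id));
        } else if let Some(&(worst, _)) = heap.peek() {
            if d < worst {
                heap.pop(); // removes the largest distance held
                heap.push((d, id));
            }
        }
    }
    let mut out: Vec<(u32, u32)> = heap.into_iter().map(|(d, id)| (id, d)).collect();
    out.sort_by_key(|&(_, d)| d);
    out
}

/// Inverted shape (for comparison only): `Reverse` makes the root the
/// *smallest* distance, so each eviction throws away the best entry held.
pub fn k_nearest_inverted(
    scored: impl IntoIterator<Item = (u32, u32)>,
    k: usize,
) -> Vec<(u32, u32)> {
    if k == 0 {
        return Vec::new();
    }
    let mut heap: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::with_capacity(k);
    for (id, d) in scored {
        if heap.len() < k {
            heap.push(Reverse((d, id)));
        } else if let Some(&Reverse((smallest, _))) = heap.peek() {
            if d < smallest {
                heap.pop(); // removes the smallest distance held
                heap.push(Reverse((d, id)));
            }
        }
    }
    let mut out: Vec<(u32, u32)> =
        heap.into_iter().map(|Reverse((d, id))| (id, d)).collect();
    out.sort_by_key(|&(_, d)| d);
    out
}
```

Feeding both functions ids 8 down to 0 with distance equal to id and `k = 4` separates them: the max-heap version yields ids 0, 1, 2, 3, while the inverted version yields ids 0, 6, 7, 8, because only the slot holding the current minimum is ever replaced.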
@@ -653,6 +780,45 @@ mod tests {
|
||||
assert!(topk[1].1 <= topk[2].1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn topk_heap_path_returns_nearest() {
|
||||
// Regression for the heap-inversion bug found during ADR-156 §8 Pass-2
|
||||
// work: with n > k the topk used a min-heap (`Reverse`) but treated its
// peek as the max, so it evicted the nearest held entry instead of the
// worst. Build a bank where the answer is unambiguous and assert the
// genuine nearest come back. The OLD code keeps three of the farthest
// entries here and fails.
|
||||
let dim = 64;
|
||||
let k = 4;
|
||||
// Query is all-positive (every bit 1).
|
||||
let query = Sketch::from_embedding(&vec![1.0f32; dim], 1);
|
||||
let mut bank = SketchBank::new();
|
||||
// id j has its first `j` dims flipped negative → hamming j to the
|
||||
// all-positive query. So nearest-4 are ids 0,1,2,3 (hamming 0,1,2,3);
|
||||
// farthest are 5..8. n = 9 > k = 4 → exercises the heap path.
|
||||
//
|
||||
// CRITICAL ordering: insert FARTHEST-FIRST (id 8 down to 0). This fills
|
||||
// the heap's first k slots with far entries, so the nearest entries
|
||||
// arrive only after the heap is full and MUST trigger eviction of the
|
||||
// current worst. The old `Reverse` (min-heap-as-max) bug peeked the
// smallest held distance and evicted that entry instead of the worst, so
// it kept k-1 of the first-seen (farthest) entries plus the single
// nearest, and this assertion fails on the old code. Inserting
|
||||
// nearest-first would mask the bug (the heap fills with the right
|
||||
// answer by luck), so the order here is load-bearing.
|
||||
for j in (0..=8u32).rev() {
|
||||
let mut v = vec![1.0f32; dim];
|
||||
for d in v.iter_mut().take(j as usize) {
|
||||
*d = -1.0;
|
||||
}
|
||||
bank.insert(j, Sketch::from_embedding(&v, 1)).unwrap();
|
||||
}
|
||||
let top = bank.topk(&query, k).unwrap();
|
||||
assert_eq!(top.len(), k);
|
||||
let ids: Vec<u32> = top.iter().map(|&(id, _)| id).collect();
|
||||
let dists: Vec<u32> = top.iter().map(|&(_, d)| d).collect();
|
||||
assert_eq!(ids, vec![0, 1, 2, 3], "topk must return the NEAREST k, got {ids:?}");
|
||||
assert_eq!(dists, vec![0, 1, 2, 3], "distances must be the smallest k");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bank_topk_zero_returns_empty() {
|
||||
let mut bank = SketchBank::new();
|
||||
@@ -852,4 +1018,122 @@ mod tests {
|
||||
SketchError::SketchVersionMismatch { .. }
|
||||
));
|
||||
}
|
||||
|
||||
// ─── ADR-156 §8 / ADR-084 Pass 2 — randomized rotation ───────────────────
|
||||
|
||||
#[test]
|
||||
fn rotated_sketch_has_same_shape_as_pass1() {
|
||||
// A Pass-2 sketch must be byte-shape-identical to a Pass-1 sketch of
|
||||
// the same input: same embedding_dim, same packed-byte length, same
|
||||
// sketch_version. Only the bits differ. This is what lets Pass-2
|
||||
// sketches travel through the unchanged WireSketch / SketchBank schema.
|
||||
let v: Vec<f32> = (0..128).map(|i| (i as f32 * 0.21).sin()).collect();
|
||||
let rot = Rotation::new(0xA5A5_A5A5, 128);
|
||||
let p1 = Sketch::from_embedding(&v, 3);
|
||||
let p2 = Sketch::from_embedding_rotated(&v, 3, &rot);
|
||||
assert_eq!(p1.embedding_dim(), p2.embedding_dim());
|
||||
assert_eq!(p1.sketch_version(), p2.sketch_version());
|
||||
assert_eq!(p1.packed_bytes().len(), p2.packed_bytes().len());
|
||||
// The rotation actually changed the bits (else it would be a no-op on
|
||||
// this correlated input).
|
||||
assert_ne!(
|
||||
p1.packed_bytes(),
|
||||
p2.packed_bytes(),
|
||||
"rotation should change the sign bits on correlated input"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rotated_sketch_is_deterministic_for_seed() {
|
||||
// Same (seed, dim) rotation → identical sketch bits across constructions
|
||||
// (the index-time == query-time contract, at the sketch layer).
|
||||
let v: Vec<f32> = (0..96).map(|i| ((i * 5 % 11) as f32 - 5.0) * 0.3).collect();
|
||||
let s1 = Sketch::from_embedding_rotated(&v, 1, &Rotation::new(7, 96));
|
||||
let s2 = Sketch::from_embedding_rotated(&v, 1, &Rotation::new(7, 96));
|
||||
assert_eq!(s1.distance_unchecked(&s2), 0, "same seed must agree exactly");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rotated_bank_self_match_is_zero_distance() {
|
||||
// A rotated bank queried with the same embedding it stored must return
|
||||
// that id at distance 0 — proves the bank rotates index and query in
|
||||
// the same frame.
|
||||
let rot = Rotation::new(0xBEEF, 64);
|
||||
let mut bank = SketchBank::with_rotation(rot);
|
||||
let v: Vec<f32> = (0..64).map(|i| (i as f32 * 0.37).cos()).collect();
|
||||
bank.insert_embedding(42, &v, 1).unwrap();
|
||||
let top = bank.topk_embedding(&v, 1, 1).unwrap();
|
||||
assert_eq!(top.len(), 1);
|
||||
assert_eq!(top[0].0, 42);
|
||||
assert_eq!(top[0].1, 0, "self-query in a rotated bank must be distance 0");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn pass2_coverage_not_worse_than_pass1() {
|
||||
// The core regression: on a small fixed anisotropic fixture, Pass-2
|
||||
// (rotation) coverage must be >= Pass-1 coverage. Rotation must not
|
||||
// *hurt* recall. (We do not assert a hard >= 90% here — that is the
|
||||
// measurement reported in the ADR, not a unit-test invariant — but we
|
||||
// do pin that rotation is not a regression.)
|
||||
use crate::coverage::{measure_pass1, measure_pass2, CoverageParams};
|
||||
let p = CoverageParams {
|
||||
n: 512,
|
||||
n_queries: 32,
|
||||
n_clusters: 32,
|
||||
..CoverageParams::aether_default(0x00C0_FFEE)
|
||||
};
|
||||
let c1 = measure_pass1(p).coverage;
|
||||
let c2 = measure_pass2(p, 0x1234_5678_9ABC_DEF0).coverage;
|
||||
assert!(
|
||||
c2 + 1e-9 >= c1,
|
||||
"Pass-2 coverage {c2:.4} regressed below Pass-1 {c1:.4}"
|
||||
);
|
||||
}
|
||||
|
||||
/// Deterministic, test-runnable coverage measurement that PRINTS the
|
||||
/// numbers quoted in ADR-084 / ADR-156 §8. Run with `--nocapture` to see:
|
||||
/// cargo test -p wifi-densepose-ruvector --no-default-features \
|
||||
/// pass2_coverage_report -- --nocapture
|
||||
#[test]
|
||||
fn pass2_coverage_report() {
|
||||
use crate::coverage::{measure_pass1, measure_pass2, CoverageParams};
|
||||
let base = CoverageParams::aether_default(0xAD00_0084);
|
||||
let rot_seed = 0x5EED_C0DE_1234_5678u64;
|
||||
println!(
|
||||
"\n=== ADR-156 §8 RaBitQ Pass-2 coverage report (anisotropic synthetic) ==="
|
||||
);
|
||||
println!(
|
||||
"dim={} N={} K={} queries={} master_seed=0x{:X} rotation_seed=0x{:X}",
|
||||
base.dim, base.n, base.k, base.n_queries, base.seed, rot_seed
|
||||
);
|
||||
// Strict bar: candidate_k == K.
|
||||
let p1 = measure_pass1(base).coverage;
|
||||
let p2 = measure_pass2(base, rot_seed).coverage;
|
||||
println!(
|
||||
"candidate_k=K={:<2} Pass1={:6.2}% Pass2={:6.2}% bar=90% {}",
|
||||
base.k,
|
||||
p1 * 100.0,
|
||||
p2 * 100.0,
|
||||
if p2 >= 0.90 { "PASS" } else { "BELOW-BAR" }
|
||||
);
|
||||
// Over-fetch curve (models fetch C >= K candidates, refine to K).
|
||||
for &c in &[16usize, 24, 32, 64] {
|
||||
let pc = CoverageParams {
|
||||
candidate_k: c,
|
||||
..base
|
||||
};
|
||||
let cp1 = measure_pass1(pc).coverage;
|
||||
let cp2 = measure_pass2(pc, rot_seed).coverage;
|
||||
println!(
|
||||
"candidate_k={:<3} Pass1={:6.2}% Pass2={:6.2}%",
|
||||
c,
|
||||
cp1 * 100.0,
|
||||
cp2 * 100.0
|
||||
);
|
||||
}
|
||||
println!("========================================================================\n");
|
||||
// Always-true sanity so the test asserts something.
|
||||
assert!((0.0..=1.0).contains(&p1));
|
||||
assert!((0.0..=1.0).contains(&p2));
|
||||
}
|
||||
}
|
||||
|
||||
@@ -6944,8 +6944,12 @@ async fn main() {
|
||||
eprintln!("Starting training for {} epochs...", args.epochs);
|
||||
let result = t.run_training(train_data, val_data);
|
||||
eprintln!("Training complete in {:.1}s", result.total_time_secs);
|
||||
// ADR-155 §2.1: `best_pck` is RAW-threshold PCK (no torso norm) and
|
||||
// `best_oks` uses the fake-Gold area=1.0 proxy — NOT the canonical
|
||||
// hip↔hip `pck_canonical` / COCO OKS. Label them distinctly so the
|
||||
// printed numbers are never read as claim-grade canonical metrics.
|
||||
eprintln!(
|
||||
" Best epoch: {}, PCK@0.2: {:.4}, OKS mAP: {:.4}",
|
||||
" Best epoch: {}, pck_raw@0.2: {:.4}, oks_map(area=1.0 proxy): {:.4}",
|
||||
result.best_epoch, result.best_pck, result.best_oks
|
||||
);
|
||||
|
||||
|
||||
@@ -285,7 +285,24 @@ impl WarmupCosineScheduler {
|
||||
|
||||
// ── Validation metrics ─────────────────────────────────────────────────────
|
||||
|
||||
/// Percentage of Correct Keypoints at a distance threshold.
|
||||
/// **RAW-threshold** Percentage of Correct Keypoints — a keypoint is correct
|
||||
/// iff its raw L2 distance to the target is `≤ thr`, with **NO torso/bbox
|
||||
/// normalization**.
|
||||
///
|
||||
/// # ADR-155 §2.1 / §8 — DIVERGENT from canonical (relabel, do NOT conflate)
|
||||
///
|
||||
/// This is **not** the canonical hip↔hip torso-normalized
|
||||
/// `wifi_densepose_train::pck_canonical`. It is the most divergent PCK in the
|
||||
/// workspace: an unnormalized raw-distance count (the ADR-155 §1 "PCK-4
|
||||
/// raw-threshold" class). It drives the live sensing-server CLI's reported
|
||||
/// `best_pck` (see `Trainer::compute_validation_metrics`, `main.rs` training
|
||||
/// path), which prints/serializes as `PCK@0.2` — that label is **raw-threshold
|
||||
/// PCK**, NOT canonical PCK@0.2. ADR-155 Milestone-1 resolves the collision by
|
||||
/// relabelling the *reported* number (`pck_raw@0.2` in logs/JSON) rather than
|
||||
/// silently changing this `pub` API's math; unifying onto `pck_canonical`
|
||||
/// (requires a torso scale + the train crate as a dep) is a tracked §8 backlog
|
||||
/// item. The ADR-155 §1 table did not enumerate this live `trainer.rs` kernel —
|
||||
/// flagged here as a missed divergence.
|
||||
pub fn pck_at_threshold(pred: &[(f32, f32, f32)], target: &[(f32, f32, f32)], thr: f32) -> f32 {
|
||||
let n = pred.len().min(target.len());
|
||||
if n == 0 {
|
||||
@@ -340,6 +357,20 @@ pub fn oks_single(
|
||||
}
|
||||
|
||||
/// Mean OKS over multiple predictions (simplified mAP).
|
||||
///
|
||||
/// # ADR-155 §2.1 / §8 — FAKE-GOLD `area = 1.0` (flagged finding, not yet fixed)
|
||||
///
|
||||
/// This passes `area = 1.0` to [`oks_single`] — the **exact "fake Gold tier"
|
||||
/// pattern** ADR-155 §2.1 said it had closed in `ruview_metrics` / the train
|
||||
/// crate's `compute_oks`. With keypoints in a small coordinate range and
|
||||
/// `area = 1.0`, every squared distance is tiny relative to `2 σ² area`, so the
|
||||
/// exponential kernel returns ≈1.0 and the reported OKS is inflated regardless
|
||||
/// of pose quality. This live sensing-server kernel was **not** in the ADR-155
|
||||
/// §1 table and is still on the inflating `area = 1.0` path; it drives the live
|
||||
/// `best_oks` (`main.rs`). Until it is unified onto the canonical
|
||||
/// pose-extent-derived scale (tracked as an ADR-155 §8 backlog item), the value
|
||||
/// is relabelled `oks_map(area=1.0 proxy)` everywhere it surfaces and must NOT
|
||||
/// be read as a claim-grade COCO OKS.
|
||||
pub fn oks_map(preds: &[Vec<(f32, f32, f32)>], targets: &[Vec<(f32, f32, f32)>]) -> f32 {
|
||||
let n = preds.len().min(targets.len());
|
||||
if n == 0 {
|
||||
@@ -349,6 +380,7 @@ pub fn oks_map(preds: &[Vec<(f32, f32, f32)>], targets: &[Vec<(f32, f32, f32)>])
|
||||
.iter()
|
||||
.zip(targets.iter())
|
||||
.take(n)
|
||||
// area = 1.0 is the fake-Gold proxy (see fn doc / ADR-155 §8).
|
||||
.map(|(p, t)| oks_single(p, t, &COCO_KEYPOINT_SIGMAS, 1.0))
|
||||
.sum();
|
||||
s / n as f32
|
||||
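The inflation described in the `oks_map` doc comment can be seen on a single keypoint. The sketch below assumes the COCO-style kernel exp(-d² / (2 · area · k²)); the helper name `keypoint_similarity`, the constant `k = 0.052`, and the sample areas are illustrative assumptions, since the body of `oks_single` is not part of this diff.

```rust
/// COCO-style per-keypoint similarity (assumed form):
/// exp(-d^2 / (2 * area * k^2)), where `d2` is the squared pixel or
/// normalized distance and `k` is the per-keypoint falloff constant.
pub fn keypoint_similarity(d2: f32, area: f32, k: f32) -> f32 {
    (-d2 / (2.0 * area * k * k)).exp()
}

/// Same 0.02-unit error scored against two object areas.
/// Returns `(with_area_one, with_small_area)`.
pub fn area_proxy_demo() -> (f32, f32) {
    let d2 = 0.02f32 * 0.02; // squared error in [0,1]-normalized coordinates
    let k = 0.052; // illustrative constant, not taken from the crate
    let with_area_one = keypoint_similarity(d2, 1.0, k); // the area = 1.0 proxy
    let with_small_area = keypoint_similarity(d2, 0.05, k); // a person filling ~5% of the frame
    (with_area_one, with_small_area)
}
```

Under these assumed numbers the same error scores above 0.9 with `area = 1.0` and below 0.3 with the smaller area, which is the sense in which the proxy overstates similarity when keypoints live in a small coordinate range.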
@@ -1271,6 +1303,34 @@ mod tests {
|
||||
fn pck_all_wrong_is_0() {
|
||||
assert!(pck_at_threshold(&mkp(0.0), &mkp(100.0), 0.2) < 1e-6);
|
||||
}
|
||||
|
||||
/// ADR-155 §2.1 / §8: pin that the live `pck_at_threshold` is **raw-threshold**
|
||||
/// (no torso normalization) and is therefore a genuinely different metric
|
||||
/// from the canonical hip↔hip PCK — justifying RELABEL, not silent unify.
|
||||
///
|
||||
/// Two scenes with the **same absolute keypoint error** but **different torso
/// sizes** get the **same** raw PCK (because raw PCK ignores torso scale),
/// whereas a torso-normalized PCK would score them differently. The test uses
/// a single-joint scene with no hips at all and asserts that the verdict
/// depends only on absolute distance vs `thr`: a 0.15-unit error is "correct"
/// at thr=0.2 and "wrong" at thr=0.1.
|
||||
#[test]
|
||||
fn pck_at_threshold_is_raw_unnormalized_not_canonical() {
|
||||
// Target: one keypoint at origin, vis=1. (Single-joint scene.)
|
||||
let target = vec![(0.0f32, 0.0f32, 1.0f32)];
|
||||
// Prediction off by exactly 0.15 in x.
|
||||
let pred = vec![(0.15f32, 0.0f32, 1.0f32)];
|
||||
|
||||
// Raw threshold 0.2: 0.15 ≤ 0.2 ⇒ correct ⇒ PCK 1.0, independent of any
|
||||
// torso scale (there is none in this kernel).
|
||||
let raw = pck_at_threshold(&pred, &target, 0.2);
|
||||
assert!((raw - 1.0).abs() < 1e-6, "raw PCK ignores scale; expected 1.0, got {raw}");
|
||||
|
||||
// Same absolute error, tighter raw threshold 0.1: 0.15 > 0.1 ⇒ wrong ⇒ 0.0.
|
||||
// The verdict is set purely by the absolute distance vs thr — the
|
||||
// signature of a raw (un-normalized) PCK, NOT canonical torso-relative PCK.
|
||||
let raw_tight = pck_at_threshold(&pred, &target, 0.1);
|
||||
assert!(raw_tight < 1e-6, "raw PCK is absolute-distance only; expected 0.0, got {raw_tight}");
|
||||
}
|
||||
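The distinction this test pins can be sketched with one helper that takes the normalizer as an explicit argument. `pck_scaled` and `pck_raw` below are hypothetical names for illustration; they ignore the visibility channel and are not the crate's `pck_at_threshold` or `pck_canonical`.

```rust
/// Fraction of keypoints whose Euclidean error is at most `thr * scale`.
/// `scale` is whatever body measurement the metric normalizes by
/// (hip-to-hip width, nose-to-hip height, ...).
pub fn pck_scaled(pred: &[(f32, f32)], target: &[(f32, f32)], thr: f32, scale: f32) -> f32 {
    let n = pred.len().min(target.len());
    if n == 0 {
        return 0.0; // assumption: empty input scores zero
    }
    let correct = pred
        .iter()
        .zip(target)
        .take(n)
        .filter(|(p, t)| {
            let (dx, dy) = (p.0 - t.0, p.1 - t.1);
            (dx * dx + dy * dy).sqrt() <= thr * scale
        })
        .count();
    correct as f32 / n as f32
}

/// Raw-threshold PCK is the special case with no normalizer at all.
pub fn pck_raw(pred: &[(f32, f32)], target: &[(f32, f32)], thr: f32) -> f32 {
    pck_scaled(pred, target, thr, 1.0)
}
```

With a single keypoint that is 0.15 units off, `pck_raw(.., 0.2)` counts it correct, while `pck_scaled(.., 0.2, 0.5)` (a torso scale of 0.5, so an effective threshold of 0.1) counts it wrong. The choice of normalizer therefore changes the verdict on identical predictions, which is why the diff relabels the metrics instead of treating them as one number.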
#[test]
|
||||
fn oks_perfect_is_1() {
|
||||
assert!((oks_single(&mkp(0.0), &mkp(0.0), &COCO_KEYPOINT_SIGMAS, 1.0) - 1.0).abs() < 1e-6);
|
||||
|
||||
@@ -163,15 +163,26 @@ fn default_lora_epochs() -> u32 {
|
||||
}
|
||||
|
||||
/// Current training status (returned by `GET /api/v1/train/status`).
|
||||
///
|
||||
/// NOTE (ADR-155 §2.1): `val_pck` / `best_pck` carry the **torso-HEIGHT** PCK
|
||||
/// proxy from [`compute_pck_torso_height`] (pixel-space, nose→hip-midpoint),
|
||||
/// which is **deliberately distinct** from the canonical hip↔hip
|
||||
/// `wifi_densepose_train::pck_canonical`. The wire field names are kept for
|
||||
/// API/UI back-compat, but these are torso-height progress proxies, NOT the
|
||||
/// canonical reported-accuracy PCK@0.2 and must not be conflated with it.
|
||||
/// `val_oks` is a rough `0.88 × pck` proxy, not a COCO OKS.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TrainingStatus {
|
||||
pub active: bool,
|
||||
pub epoch: u32,
|
||||
pub total_epochs: u32,
|
||||
pub train_loss: f64,
|
||||
/// Torso-HEIGHT PCK@0.2 proxy (NOT canonical hip↔hip PCK — see struct doc).
|
||||
pub val_pck: f64,
|
||||
/// Rough OKS proxy (`0.88 × val_pck`), NOT a COCO OKS.
|
||||
pub val_oks: f64,
|
||||
pub lr: f64,
|
||||
/// Best torso-HEIGHT PCK@0.2 proxy seen so far (NOT canonical PCK).
|
||||
pub best_pck: f64,
|
||||
pub best_epoch: u32,
|
||||
pub patience_remaining: u32,
|
||||
@@ -199,13 +210,19 @@ impl Default for TrainingStatus {
|
||||
}
|
||||
|
||||
/// Progress update sent over WebSocket.
|
||||
///
|
||||
/// NOTE (ADR-155 §2.1): `val_pck`/`val_oks` are the torso-HEIGHT PCK proxy and
|
||||
/// its `0.88×` OKS proxy — NOT the canonical hip↔hip `pck_canonical`/COCO OKS.
|
||||
/// See [`TrainingStatus`] and [`compute_pck_torso_height`].
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct TrainingProgress {
|
||||
pub epoch: u32,
|
||||
pub batch: u32,
|
||||
pub total_batches: u32,
|
||||
pub train_loss: f64,
|
||||
/// Torso-HEIGHT PCK@0.2 proxy (NOT canonical hip↔hip PCK).
|
||||
pub val_pck: f64,
|
||||
/// Rough OKS proxy (`0.88 × val_pck`), NOT a COCO OKS.
|
||||
pub val_oks: f64,
|
||||
pub lr: f64,
|
||||
pub phase: String,
|
||||
@@ -789,19 +806,39 @@ fn compute_mse(predictions: &[Vec<f64>], targets: &[Vec<f64>]) -> f64 {
|
||||
total / (n * predictions[0].len().max(1) as f64)
|
||||
}
|
||||
|
||||
/// Compute PCK@0.2 (Percentage of Correct Keypoints at threshold 0.2 of torso height).
|
||||
/// Compute **PCK_torso-height@`threshold`** — a metric DELIBERATELY DISTINCT
|
||||
/// from the canonical hip↔hip PCK (`wifi_densepose_train::pck_canonical`).
|
||||
///
|
||||
/// Torso height is estimated as the distance between nose (kp 0) and the midpoint
|
||||
/// of the two hips (kps 11, 12).
|
||||
/// # Why this is `_torso_height`, not the canonical PCK (ADR-155 §2.1 / §8 — RESOLVED)
|
||||
///
|
||||
/// NOTE (ADR-155 §Tier-1.1, DEFERRED backlog item): this is a *separate*,
|
||||
/// torso-HEIGHT-normalized implementation distinct from the canonical hip↔hip
|
||||
/// `wifi_densepose_train::metrics::pck_canonical`. It drives the live server's
|
||||
/// in-loop progress display and is NOT the reported-accuracy metric. Unifying
|
||||
/// it with the canonical definition is tracked as a deferred ADR-155 backlog
|
||||
/// item — left unchanged here to avoid destabilising the running training
|
||||
/// service and to keep this milestone scoped to the train/nn subsystem.
|
||||
fn compute_pck(predictions: &[Vec<f64>], targets: &[Vec<f64>], threshold_ratio: f64) -> f64 {
|
||||
/// ADR-155 unified the workspace's reported-accuracy PCK to ONE definition:
|
||||
/// **hip↔hip torso WIDTH**, on `[0,1]`-normalized `[17,2]` keypoints. This
|
||||
/// live-server function is **not** that metric and must never be conflated
|
||||
/// with it. It is genuinely different on three load-bearing axes:
|
||||
///
|
||||
/// 1. **Coordinate space.** It operates on **pixel-space** teacher targets on a
|
||||
/// 640×480 canvas (`compute_teacher_targets`), not `[0,1]` MM-Fi coords —
|
||||
/// hence the `.max(50.0)` *pixel* torso floor below.
|
||||
/// 2. **Normalization axis.** It normalizes by torso **HEIGHT** (vertical
|
||||
/// nose→hip-midpoint distance), not canonical torso **WIDTH** (hip↔hip).
|
||||
/// Routing through `pck_canonical` would silently change which body axis
|
||||
/// sets the scale, altering every live number this drives.
|
||||
/// 3. **Layout.** It consumes `[17×3]`-flattened `Vec<Vec<f64>>` (x,y,z), not
|
||||
/// `ndarray::Array2<f32>`; `wifi-densepose-sensing-server` does not depend on
|
||||
/// `wifi-densepose-train` or `ndarray`.
|
||||
///
|
||||
/// Because the math is load-bearing (a running training service's progress
|
||||
/// display), ADR-155 Milestone-1 resolves the label collision by **relabelling**
|
||||
/// rather than forcing a false identity: the function and the metric it produces
|
||||
/// are named `_torso_height` everywhere they surface (this fn, the log line),
|
||||
/// and the `val_pck`/`best_pck` API fields document the divergence. The reported
|
||||
/// in-loop value is a torso-HEIGHT PCK proxy on heuristic teacher targets — it is
|
||||
/// NOT a claim-grade accuracy number and is NOT the canonical hip↔hip PCK@0.2.
|
||||
fn compute_pck_torso_height(
|
||||
predictions: &[Vec<f64>],
|
||||
targets: &[Vec<f64>],
|
||||
threshold_ratio: f64,
|
||||
) -> f64 {
|
||||
if predictions.is_empty() {
|
||||
return 0.0;
|
||||
}
|
||||
@@ -1166,8 +1203,11 @@ async fn real_training_loop(
|
||||
|
||||
let val_preds = forward(val_x, &weights, &bias, n_feat, N_TARGETS);
|
||||
let val_mse = compute_mse(&val_preds, val_y);
|
||||
let val_pck = compute_pck(&val_preds, val_y, 0.2);
|
||||
let val_oks = val_pck * 0.88; // approximate OKS from PCK
|
||||
// torso-HEIGHT PCK proxy (NOT canonical hip↔hip PCK@0.2 — see
|
||||
// compute_pck_torso_height / ADR-155 §2.1). Surfaced as `val_pck` for
|
||||
// wire-format back-compat but is a torso-height proxy, not a claim.
|
||||
let val_pck = compute_pck_torso_height(&val_preds, val_y, 0.2);
|
||||
let val_oks = val_pck * 0.88; // rough OKS proxy from torso-height PCK (NOT canonical OKS)
|
||||
|
||||
let val_progress = TrainingProgress {
|
||||
epoch,
|
||||
@@ -1224,14 +1264,17 @@ async fn real_training_loop(
|
||||
};
|
||||
}
|
||||
|
||||
// Logs label this `pck_torso_h@0.2` so it is never read as the canonical
|
||||
// hip↔hip PCK@0.2 (ADR-155 §2.1). It is a torso-HEIGHT proxy on heuristic
|
||||
// teacher targets, not a claim-grade accuracy number.
|
||||
info!(
|
||||
"Epoch {epoch}/{total_epochs}: loss={train_loss:.6}, val_pck={val_pck:.4}, \
|
||||
val_mse={val_mse:.4}, best_pck={best_pck:.4}@{best_epoch}, patience={patience_remaining}"
|
||||
"Epoch {epoch}/{total_epochs}: loss={train_loss:.6}, pck_torso_h@0.2={val_pck:.4}, \
|
||||
val_mse={val_mse:.4}, best_pck_torso_h={best_pck:.4}@{best_epoch}, patience={patience_remaining}"
|
||||
);
|
||||
|
||||
// Early stopping.
|
||||
if patience_remaining == 0 {
|
||||
info!("Early stopping at epoch {epoch} (best={best_epoch}, PCK={best_pck:.4})");
|
||||
info!("Early stopping at epoch {epoch} (best={best_epoch}, pck_torso_h@0.2={best_pck:.4})");
|
||||
let stop_progress = TrainingProgress {
|
||||
epoch,
|
||||
batch: total_batches,
|
||||
@@ -1368,7 +1411,7 @@ async fn real_training_loop(
|
||||
error!("Failed to write trained model RVF: {e}");
|
||||
} else {
|
||||
info!(
|
||||
"Trained model saved: {} ({} params, PCK={:.4})",
|
||||
"Trained model saved: {} ({} params, pck_torso_h@0.2={:.4})",
|
||||
rvf_path.display(),
|
||||
total_params,
|
||||
best_pck
|
||||
@@ -1969,13 +2012,69 @@ mod tests {
|
||||
tgt[37] = 100.0; // right hip y
|
||||
let preds = vec![tgt.clone()];
|
||||
let targets = vec![tgt];
|
||||
let pck = compute_pck(&preds, &targets, 0.2);
|
||||
let pck = compute_pck_torso_height(&preds, &targets, 0.2);
|
||||
assert!(
|
||||
(pck - 1.0).abs() < 1e-9,
|
||||
"Perfect prediction should give PCK=1.0"
|
||||
);
|
||||
}
|
||||
|
||||
/// ADR-155 §2.1 / §8 (RESOLVED): the live-server PCK is torso-HEIGHT
|
||||
/// normalized and is **labelled distinctly** from the canonical hip↔hip
|
||||
/// PCK. This test pins the *divergence*: the same prediction error gives a
|
||||
/// different verdict under torso-HEIGHT (nose→hip, vertical) than under an
|
||||
/// independent hip↔hip-WIDTH (horizontal) computation — proving the two are
|
||||
/// genuinely different metrics, so relabelling (not unifying) is correct.
|
||||
///
|
||||
/// Construction (pixel-space, one keypoint of interest = left_shoulder kp5):
|
||||
/// * nose(0).y = 0, hips(11,12).y = 100 ⇒ torso HEIGHT = 100.
|
||||
/// ⇒ torso-height threshold @0.2 = 20 px.
|
||||
/// * hips x: left(11).x = 0, right(12).x = 10 ⇒ torso WIDTH = 10.
|
||||
/// ⇒ a hip↔hip-WIDTH threshold @0.2 = 2 px.
|
||||
/// * Predicted kp5 is 5 px off in x from its target.
|
||||
/// - torso-HEIGHT verdict: 5 ≤ 20 ⇒ CORRECT.
|
||||
/// - hip↔hip-WIDTH verdict: 5 > 2 ⇒ WRONG.
|
||||
/// The two normalizers must disagree on this exact sample.
|
||||
#[test]
|
||||
fn torso_pck_is_labelled_distinctly_from_canonical() {
|
||||
// Targets: hips define both axes; kp5 is the joint under test.
|
||||
let mut tgt = vec![0.0; N_TARGETS];
|
||||
tgt[0 * 3] = 0.0; // nose x
|
||||
tgt[0 * 3 + 1] = 0.0; // nose y
|
||||
tgt[5 * 3] = 0.0; // l_shoulder x (target)
|
||||
tgt[5 * 3 + 1] = 50.0; // l_shoulder y
|
||||
tgt[11 * 3] = 0.0; // l_hip x
|
||||
tgt[11 * 3 + 1] = 100.0; // l_hip y
|
||||
tgt[12 * 3] = 10.0; // r_hip x ⇒ hip↔hip WIDTH = 10
|
||||
tgt[12 * 3 + 1] = 100.0; // r_hip y ⇒ torso HEIGHT (nose→hip) = 100
|
||||
|
||||
// Prediction: identical except kp5 x is +5 px off.
|
||||
let mut pred = tgt.clone();
|
||||
pred[5 * 3] = 5.0; // 5 px error in x on kp5
|
||||
|
||||
// Live-server torso-HEIGHT PCK: error 5 ≤ 0.2×100 = 20 ⇒ kp5 counts
|
||||
// correct, so ALL 17 joints correct ⇒ PCK = 1.0.
|
||||
let pck_height = compute_pck_torso_height(&[pred.clone()], &[tgt.clone()], 0.2);
|
||||
assert!(
|
||||
(pck_height - 1.0).abs() < 1e-9,
|
||||
"torso-HEIGHT PCK should pass kp5 (5px ≤ 20px), got {pck_height}"
|
||||
);
|
||||
|
||||
// Independent hip↔hip-WIDTH verdict on kp5: error 5 > 0.2×10 = 2 ⇒ kp5
|
||||
// is WRONG. This is the canonical normalization axis (width, not height).
|
||||
let hip_width = (tgt[12 * 3] - tgt[11 * 3]).abs(); // = 10
|
||||
let kp5_err = (pred[5 * 3] - tgt[5 * 3]).abs(); // = 5
|
||||
let width_threshold = 0.2 * hip_width; // = 2
|
||||
assert!(
|
||||
kp5_err > width_threshold,
|
||||
"hip↔hip-WIDTH should REJECT kp5 (5px > 2px) — the two metrics must disagree"
|
||||
);
|
||||
|
||||
// Therefore torso-HEIGHT PCK (1.0) ≠ the hip↔hip-WIDTH verdict on this
|
||||
// sample: the live `val_pck` is genuinely a different metric and is
|
||||
// correctly labelled `pck_torso_h`, never conflated with canonical PCK.
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn infer_pose_returns_17_keypoints() {
|
||||
let n_sub = 56;
|
||||
|
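The arithmetic behind `torso_pck_is_labelled_distinctly_from_canonical` in this hunk can be reproduced without the server crate. The following is a minimal sketch, not the sensing-server's implementation: the helper names (`xy`, `joint_error`, `torso_height`, `hip_width`, `divergence_demo`), the `f64` element type, and the choice of the hip midpoint as the lower end of the torso height are all assumptions made for illustration. Only the flat three-values-per-joint indexing and the sample coordinates come from the test itself.

```rust
// Hypothetical sketch. Assumes a flat layout of 3 values per joint
// (x, y, and one further value), as suggested by the `j * 3` indexing in the test.
const STRIDE: usize = 3;

/// Read the (x, y) pair of one joint from a flat pose vector.
fn xy(pose: &[f64], joint: usize) -> (f64, f64) {
    (pose[joint * STRIDE], pose[joint * STRIDE + 1])
}

/// Euclidean error of one joint between prediction and target.
fn joint_error(pred: &[f64], tgt: &[f64], joint: usize) -> f64 {
    let (px, py) = xy(pred, joint);
    let (tx, ty) = xy(tgt, joint);
    ((px - tx).powi(2) + (py - ty).powi(2)).sqrt()
}

/// Torso HEIGHT: vertical distance from the nose (0) to the hip midpoint (11, 12).
/// Using the hip midpoint is an assumption of this sketch.
fn torso_height(tgt: &[f64]) -> f64 {
    let (_, nose_y) = xy(tgt, 0);
    let (_, left_hip_y) = xy(tgt, 11);
    let (_, right_hip_y) = xy(tgt, 12);
    ((left_hip_y + right_hip_y) / 2.0 - nose_y).abs()
}

/// Hip WIDTH: Euclidean distance between left hip (11) and right hip (12).
fn hip_width(tgt: &[f64]) -> f64 {
    let (lx, ly) = xy(tgt, 11);
    let (rx, ry) = xy(tgt, 12);
    ((lx - rx).powi(2) + (ly - ry).powi(2)).sqrt()
}

/// Returns (accepted under torso height, accepted under hip width) for joint 5
/// on the same sample the test constructs.
fn divergence_demo() -> (bool, bool) {
    let mut tgt = vec![0.0_f64; 17 * STRIDE];
    tgt[5 * STRIDE + 1] = 50.0; // left shoulder y
    tgt[11 * STRIDE + 1] = 100.0; // left hip y
    tgt[12 * STRIDE] = 10.0; // right hip x, so hip width = 10
    tgt[12 * STRIDE + 1] = 100.0; // right hip y, so torso height = 100

    let mut pred = tgt.clone();
    pred[5 * STRIDE] = 5.0; // 5 px error in x on joint 5

    let err = joint_error(&pred, &tgt, 5);
    (err <= 0.2 * torso_height(&tgt), err <= 0.2 * hip_width(&tgt))
}
```

Calling `divergence_demo()` gives `(true, false)`: the 5 px error passes the 20 px torso-height threshold and fails the 2 px hip-width threshold, which is the disagreement the test pins.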
||||
@@ -50,6 +50,10 @@ pub mod error;
|
||||
pub mod eval;
|
||||
pub mod geometry;
|
||||
pub mod mae;
|
||||
/// Canonical pose-metric core (ADR-155 §Tier-1.1) — `pck_canonical` /
|
||||
/// `oks_canonical`, available **without** the `tch-backend` feature so the
|
||||
/// single metric definition is reachable from the workspace test gate.
|
||||
pub mod metrics_core;
|
||||
pub mod rapid_adapt;
|
||||
pub mod ruview_metrics;
|
||||
pub mod signal_features;
|
||||
@@ -79,6 +83,12 @@ pub mod occupancy_bench;
|
||||
pub mod trainer;
|
||||
|
||||
// Convenient re-exports at the crate root.
|
||||
// Canonical metric (ADR-155 §Tier-1.1) — re-exported un-gated so the single
|
||||
// source of truth is reachable with or without `tch-backend`.
|
||||
pub use metrics_core::{
|
||||
canonical_torso_size, oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP,
|
||||
COCO_KP_SIGMAS,
|
||||
};
|
||||
pub use config::TrainingConfig;
|
||||
pub use dataset::{
|
||||
CsiDataset, CsiSample, DataLoader, MmFiDataset, SyntheticConfig, SyntheticCsiDataset,
|
||||
|
||||
@@ -4,7 +4,8 @@
|
||||
//!
|
||||
//! As of ADR-155 there is exactly **one** definition of PCK and one of OKS
|
||||
//! that may be used for any *reported / claimed* number. They live in the
|
||||
//! [`canonical`] region of this module:
|
||||
//! un-gated [`crate::metrics_core`] module (so the single definition is
|
||||
//! reachable with or without `tch-backend`) and are re-exported here:
|
||||
//!
|
||||
//! - [`pck_canonical`] — **PCK\@k, torso-normalized.** A keypoint `j` is
|
||||
//! correct iff `‖pred_j − gt_j‖₂ ≤ k · torso`, where
|
||||
@@ -47,177 +48,23 @@ use petgraph::visit::EdgeRef;
|
||||
use ruvector_mincut::{DynamicMinCut, MinCutBuilder};
|
||||
use std::collections::VecDeque;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// COCO keypoint sigmas (17 joints)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Per-joint sigma values from the COCO keypoint evaluation standard.
|
||||
///
|
||||
/// These constants control the spread of the OKS Gaussian kernel for each
|
||||
/// of the 17 COCO-defined body joints.
|
||||
pub const COCO_KP_SIGMAS: [f32; 17] = [
|
||||
0.026, // 0 nose
|
||||
0.025, // 1 left_eye
|
||||
0.025, // 2 right_eye
|
||||
0.035, // 3 left_ear
|
||||
0.035, // 4 right_ear
|
||||
0.079, // 5 left_shoulder
|
||||
0.079, // 6 right_shoulder
|
||||
0.072, // 7 left_elbow
|
||||
0.072, // 8 right_elbow
|
||||
0.062, // 9 left_wrist
|
||||
0.062, // 10 right_wrist
|
||||
0.107, // 11 left_hip
|
||||
0.107, // 12 right_hip
|
||||
0.087, // 13 left_knee
|
||||
0.087, // 14 right_knee
|
||||
0.089, // 15 left_ankle
|
||||
0.089, // 16 right_ankle
|
||||
];
|
||||
|
||||
// ===========================================================================
|
||||
// CANONICAL METRIC — single source of truth (ADR-155 §Tier-1.1)
|
||||
// ===========================================================================
|
||||
//
|
||||
// The canonical metric core was hoisted to the **un-gated** `metrics_core`
|
||||
// module (ADR-155 Milestone-1) so the single PCK/OKS definition is reachable
|
||||
// from the workspace test gate (`--no-default-features`) — this whole `metrics`
|
||||
// module is gated behind `tch-backend`. Re-exporting here keeps every existing
|
||||
// call site (`MetricsAccumulator`, `compute_pck`, the deprecated v2 path, the
|
||||
// tch trainer) pointing at exactly **one** implementation.
|
||||
|
||||
/// COCO joint index of the left hip.
|
||||
pub const CANON_LEFT_HIP: usize = 11;
|
||||
/// COCO joint index of the right hip.
|
||||
pub const CANON_RIGHT_HIP: usize = 12;
|
||||
|
||||
/// Canonical torso normalizer used by [`pck_canonical`].
|
||||
///
|
||||
/// Returns `‖left_hip − right_hip‖₂` (COCO joints 11↔12) when both hips are
|
||||
/// visible; otherwise the diagonal of the visible-keypoint bounding box. The
|
||||
/// distance is computed in whatever coordinate space `kpts` is expressed in
|
||||
/// (the canonical PCK requires pred and gt to share that space).
|
||||
///
|
||||
/// Returns `None` when there is no positive-extent reference available (no
|
||||
/// visible hips *and* a degenerate/empty visible bbox), signalling the caller
|
||||
/// that the sample cannot be scored.
|
||||
pub fn canonical_torso_size(gt_kpts: &Array2<f32>, visibility: &Array1<f32>) -> Option<f32> {
|
||||
let n = gt_kpts.shape()[0].min(visibility.len());
|
||||
if CANON_LEFT_HIP < n
|
||||
&& CANON_RIGHT_HIP < n
|
||||
&& visibility[CANON_LEFT_HIP] >= 0.5
|
||||
&& visibility[CANON_RIGHT_HIP] >= 0.5
|
||||
{
|
||||
let dx = gt_kpts[[CANON_LEFT_HIP, 0]] - gt_kpts[[CANON_RIGHT_HIP, 0]];
|
||||
let dy = gt_kpts[[CANON_LEFT_HIP, 1]] - gt_kpts[[CANON_RIGHT_HIP, 1]];
|
||||
let torso = (dx * dx + dy * dy).sqrt();
|
||||
if torso > 1e-6 {
|
||||
return Some(torso);
|
||||
}
|
||||
}
|
||||
// Fallback: bounding-box diagonal of visible keypoints.
|
||||
let diag = bounding_box_diagonal(gt_kpts, visibility, n);
|
||||
if diag > 1e-6 {
|
||||
Some(diag)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
/// **CANONICAL PCK\@`threshold`** — the single definition used for every
|
||||
/// reported number (ADR-155 §Tier-1.1).
|
||||
///
|
||||
/// A keypoint `j` with `visibility[j] >= 0.5` is *correct* iff
|
||||
/// `‖pred_j − gt_j‖₂ ≤ threshold · torso`, where `torso` is
|
||||
/// [`canonical_torso_size`] in the keypoint coordinate space.
|
||||
///
|
||||
/// # Returns
|
||||
/// `(correct, total, pck)` where `pck ∈ [0,1]`. **`(0, 0, 0.0)` when no
|
||||
/// keypoint is visible or the torso reference is degenerate** — a sample with
|
||||
/// no measurable evidence scores 0, never 1 (closes the
|
||||
/// `MetricsAccumulator` false-perfect bug).
|
||||
pub fn pck_canonical(
|
||||
pred_kpts: &Array2<f32>,
|
||||
gt_kpts: &Array2<f32>,
|
||||
visibility: &Array1<f32>,
|
||||
threshold: f32,
|
||||
) -> (usize, usize, f32) {
|
||||
let n = pred_kpts.shape()[0]
|
||||
.min(gt_kpts.shape()[0])
|
||||
.min(visibility.len());
|
||||
let torso = match canonical_torso_size(gt_kpts, visibility) {
|
||||
Some(t) => t,
|
||||
// No measurable reference scale ⇒ cannot score ⇒ 0.0 (NOT trivially 1.0).
|
||||
None => return (0, 0, 0.0),
|
||||
};
|
||||
let dist_threshold = threshold * torso;
|
||||
|
||||
let mut correct = 0usize;
|
||||
let mut total = 0usize;
|
||||
for j in 0..n {
|
||||
if visibility[j] < 0.5 {
|
||||
continue;
|
||||
}
|
||||
total += 1;
|
||||
let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
|
||||
let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
|
||||
if (dx * dx + dy * dy).sqrt() <= dist_threshold {
|
||||
correct += 1;
|
||||
}
|
||||
}
|
||||
let pck = if total > 0 {
|
||||
correct as f32 / total as f32
|
||||
} else {
|
||||
0.0
|
||||
};
|
||||
(correct, total, pck)
|
||||
}
|
||||
|
||||
/// **CANONICAL OKS** — COCO Object Keypoint Similarity (ADR-155 §Tier-1.1).
|
||||
///
|
||||
/// `OKS = Σⱼ exp(−dⱼ² / (2 s² kⱼ²)) · δ(vⱼ≥0.5) / Σⱼ δ(vⱼ≥0.5)` with
|
||||
/// `s = sqrt(area)` derived from the **GT keypoint bounding box in the
|
||||
/// keypoint coordinate space** (via [`canonical_torso_size`]² as a robust,
|
||||
/// always-positive proxy for area when an explicit bbox is unavailable).
|
||||
///
|
||||
/// Passing normalized [0,1] coordinates is fine *because the scale is derived
|
||||
/// from the pose itself* — there is no `s = 1.0` escape hatch that would make
|
||||
/// OKS ≈ 1.0 for any pose (the historical "fake Gold tier" bug).
|
||||
///
|
||||
/// Returns 0.0 when no keypoints are visible or the scale is degenerate.
|
||||
pub fn oks_canonical(
|
||||
pred_kpts: &Array2<f32>,
|
||||
gt_kpts: &Array2<f32>,
|
||||
visibility: &Array1<f32>,
|
||||
) -> f32 {
|
||||
let n = pred_kpts.shape()[0]
|
||||
.min(gt_kpts.shape()[0])
|
||||
.min(visibility.len());
|
||||
// Scale: area ≈ torso². Derived from the actual pose, never a fixed 1.0.
|
||||
let s = match canonical_torso_size(gt_kpts, visibility) {
|
||||
Some(t) => t,
|
||||
None => return 0.0,
|
||||
};
|
||||
let s_sq = s * s;
|
||||
if s_sq <= 0.0 {
|
||||
return 0.0;
|
||||
}
|
||||
let mut num = 0.0f32;
|
||||
let mut den = 0.0f32;
|
||||
for j in 0..n {
|
||||
if visibility[j] < 0.5 {
|
||||
continue;
|
||||
}
|
||||
den += 1.0;
|
||||
let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
|
||||
let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
|
||||
let d_sq = dx * dx + dy * dy;
|
||||
let k = if j < COCO_KP_SIGMAS.len() {
|
||||
COCO_KP_SIGMAS[j]
|
||||
} else {
|
||||
0.07
|
||||
};
|
||||
num += (-d_sq / (2.0 * s_sq * k * k)).exp();
|
||||
}
|
||||
if den > 0.0 {
|
||||
num / den
|
||||
} else {
|
||||
0.0
|
||||
}
|
||||
}
|
||||
pub use crate::metrics_core::{
|
||||
canonical_torso_size, oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP,
|
||||
COCO_KP_SIGMAS,
|
||||
};
|
||||
// `bounding_box_diagonal` stays crate-internal (metrics_core); the only caller
|
||||
// here is a test, which references it via its full path.
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// MetricsResult
|
||||
@@ -400,39 +247,9 @@ impl MetricsAccumulator {
|
||||
// ---------------------------------------------------------------------------
|
||||
// Geometric helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Compute the Euclidean diagonal of the bounding box of visible keypoints.
|
||||
///
|
||||
/// The bounding box is defined by the axis-aligned extent of all keypoints
|
||||
/// that have `visibility[j] >= 0.5`. Returns 0.0 if there are no visible
|
||||
/// keypoints or all are co-located.
|
||||
fn bounding_box_diagonal(kp: &Array2<f32>, visibility: &Array1<f32>, num_joints: usize) -> f32 {
|
||||
let mut x_min = f32::MAX;
|
||||
let mut x_max = f32::MIN;
|
||||
let mut y_min = f32::MAX;
|
||||
let mut y_max = f32::MIN;
|
||||
let mut any_visible = false;
|
||||
|
||||
for j in 0..num_joints {
|
||||
if visibility[j] >= 0.5 {
|
||||
let x = kp[[j, 0]];
|
||||
let y = kp[[j, 1]];
|
||||
x_min = x_min.min(x);
|
||||
x_max = x_max.max(x);
|
||||
y_min = y_min.min(y);
|
||||
y_max = y_max.max(y);
|
||||
any_visible = true;
|
||||
}
|
||||
}
|
||||
|
||||
if !any_visible {
|
||||
return 0.0;
|
||||
}
|
||||
|
||||
let w = (x_max - x_min).max(0.0);
|
||||
let h = (y_max - y_min).max(0.0);
|
||||
(w * w + h * h).sqrt()
|
||||
}
|
||||
//
|
||||
// `bounding_box_diagonal` (the canonical normalizer's bbox fallback) now lives
|
||||
// in `metrics_core` alongside the canonical metric it supports.
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Per-sample PCK and OKS free functions (required by the training evaluator)
|
||||
@@ -1441,7 +1258,7 @@ mod tests {
|
||||
fn bbox_diagonal_unit_square() {
|
||||
let kp = array![[0.0_f32, 0.0], [1.0, 1.0]];
|
||||
let vis = array![2.0_f32, 2.0];
|
||||
let diag = bounding_box_diagonal(&kp, &vis, 2);
|
||||
let diag = crate::metrics_core::bounding_box_diagonal(&kp, &vis, 2);
|
||||
assert_abs_diff_eq!(diag, std::f32::consts::SQRT_2, epsilon = 1e-5);
|
||||
}
|
||||
|
||||
|
||||
@@ -0,0 +1,251 @@
|
||||
//! Canonical pose-metric core (ADR-155 §Tier-1.1) — the single source of truth
|
||||
//! for PCK and OKS, **available without the `tch-backend` feature**.
|
||||
//!
|
||||
//! # Why this module exists (ADR-155 Milestone-1, §8 backlog resolution)
|
||||
//!
|
||||
//! The full [`crate::metrics`] module is gated behind `tch-backend` (libtorch
|
||||
//! FFI) because it also hosts the trainer accumulators, min-cut matchers, and
|
||||
//! ndarray/petgraph machinery. But the *metric definition itself*
|
||||
//! ([`pck_canonical`], [`oks_canonical`], [`canonical_torso_size`]) depends only
|
||||
//! on `ndarray` — no tch. Hoisting those four functions here makes the canonical
|
||||
//! definition reachable from the workspace test gate
|
||||
//! (`cargo test --no-default-features`) so the integration test
|
||||
//! (`tests/test_metrics.rs`) can validate the **production** function against
|
||||
//! hand-computed fixtures, instead of testing an independent reimplementation
|
||||
//! that could be wrong the same way (the §8 "reference kernels" finding).
|
||||
//!
|
||||
//! [`crate::metrics`] re-exports every item here, so all existing call sites and
|
||||
//! the tch-gated trainer path are unchanged: there is still exactly **one**
|
||||
//! implementation of each metric, now in one *un-gated* place.
|
||||
//!
|
||||
//! # CANONICAL METRIC (the only definitions valid for a *reported* number)
|
||||
//!
|
||||
//! - [`pck_canonical`] — **PCK\@k, torso-normalized.** A keypoint `j` is correct
|
||||
//! iff `‖pred_j − gt_j‖₂ ≤ k · torso`, where
|
||||
//! `torso = ‖left_hip(11) − right_hip(12)‖₂` in the keypoint coordinate space,
|
||||
//! with a bounding-box-diagonal fallback when the hips are not both visible.
|
||||
//! **Zero visible joints ⇒ `(0, 0, 0.0)`** — no evidence scores 0, never 1.
|
||||
//! - [`oks_canonical`] — **COCO OKS** with `s = sqrt(area)` derived from the GT
|
||||
//! pose extent (never a fixed `1.0`); a degenerate pose returns 0.0.
|
||||
//!
|
||||
//! # No mock data
|
||||
//!
|
||||
//! All computations are grounded in real geometry following published metric
|
||||
//! definitions. No random or synthetic values are introduced at runtime.
|
||||
|
||||
use ndarray::{Array1, Array2};
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// COCO keypoint sigmas (17 joints)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Per-joint sigma values from the COCO keypoint evaluation standard.
|
||||
///
|
||||
/// These constants control the spread of the OKS Gaussian kernel for each
|
||||
/// of the 17 COCO-defined body joints.
|
||||
pub const COCO_KP_SIGMAS: [f32; 17] = [
|
||||
0.026, // 0 nose
|
||||
0.025, // 1 left_eye
|
||||
0.025, // 2 right_eye
|
||||
0.035, // 3 left_ear
|
||||
0.035, // 4 right_ear
|
||||
0.079, // 5 left_shoulder
|
||||
0.079, // 6 right_shoulder
|
||||
0.072, // 7 left_elbow
|
||||
0.072, // 8 right_elbow
|
||||
0.062, // 9 left_wrist
|
||||
0.062, // 10 right_wrist
|
||||
0.107, // 11 left_hip
|
||||
0.107, // 12 right_hip
|
||||
0.087, // 13 left_knee
|
||||
0.087, // 14 right_knee
|
||||
0.089, // 15 left_ankle
|
||||
0.089, // 16 right_ankle
|
||||
];
|
||||
|
||||
// ===========================================================================
|
||||
// CANONICAL METRIC — single source of truth (ADR-155 §Tier-1.1)
|
||||
// ===========================================================================
|
||||
|
||||
/// COCO joint index of the left hip.
|
||||
pub const CANON_LEFT_HIP: usize = 11;
|
||||
/// COCO joint index of the right hip.
|
||||
pub const CANON_RIGHT_HIP: usize = 12;
|
||||
|
||||
/// Compute the Euclidean diagonal of the bounding box of visible keypoints.
|
||||
///
|
||||
/// The bounding box is defined by the axis-aligned extent of all keypoints
|
||||
/// that have `visibility[j] >= 0.5`. Returns 0.0 if there are no visible
|
||||
/// keypoints or all are co-located.
|
||||
pub(crate) fn bounding_box_diagonal(
|
||||
kp: &Array2<f32>,
|
||||
visibility: &Array1<f32>,
|
||||
num_joints: usize,
|
||||
) -> f32 {
|
||||
let mut x_min = f32::MAX;
|
||||
let mut x_max = f32::MIN;
|
||||
let mut y_min = f32::MAX;
|
||||
let mut y_max = f32::MIN;
|
||||
let mut any_visible = false;
|
||||
|
||||
for j in 0..num_joints {
|
||||
if visibility[j] >= 0.5 {
|
||||
let x = kp[[j, 0]];
|
||||
let y = kp[[j, 1]];
|
||||
x_min = x_min.min(x);
|
||||
x_max = x_max.max(x);
|
||||
y_min = y_min.min(y);
|
||||
y_max = y_max.max(y);
|
||||
any_visible = true;
|
||||
}
|
||||
}
|
||||
|
||||
if !any_visible {
|
||||
return 0.0;
|
||||
}
|
||||
|
||||
let w = (x_max - x_min).max(0.0);
|
||||
let h = (y_max - y_min).max(0.0);
|
||||
(w * w + h * h).sqrt()
|
||||
}
|
||||
|
||||
/// Canonical torso normalizer used by [`pck_canonical`].
|
||||
///
|
||||
/// Returns `‖left_hip − right_hip‖₂` (COCO joints 11↔12) when both hips are
|
||||
/// visible; otherwise the diagonal of the visible-keypoint bounding box. The
|
||||
/// distance is computed in whatever coordinate space `gt_kpts` is expressed in
|
||||
/// (the canonical PCK requires pred and gt to share that space).
|
||||
///
|
||||
/// Returns `None` when there is no positive-extent reference available (no
|
||||
/// visible hips *and* a degenerate/empty visible bbox), signalling the caller
|
||||
/// that the sample cannot be scored.
|
||||
pub fn canonical_torso_size(gt_kpts: &Array2<f32>, visibility: &Array1<f32>) -> Option<f32> {
|
||||
let n = gt_kpts.shape()[0].min(visibility.len());
|
||||
if CANON_LEFT_HIP < n
|
||||
&& CANON_RIGHT_HIP < n
|
||||
&& visibility[CANON_LEFT_HIP] >= 0.5
|
||||
&& visibility[CANON_RIGHT_HIP] >= 0.5
|
||||
{
|
||||
let dx = gt_kpts[[CANON_LEFT_HIP, 0]] - gt_kpts[[CANON_RIGHT_HIP, 0]];
|
||||
let dy = gt_kpts[[CANON_LEFT_HIP, 1]] - gt_kpts[[CANON_RIGHT_HIP, 1]];
|
||||
let torso = (dx * dx + dy * dy).sqrt();
|
||||
if torso > 1e-6 {
|
||||
return Some(torso);
|
||||
}
|
||||
}
|
||||
// Fallback: bounding-box diagonal of visible keypoints.
|
||||
let diag = bounding_box_diagonal(gt_kpts, visibility, n);
|
||||
if diag > 1e-6 {
|
||||
Some(diag)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
/// **CANONICAL PCK\@`threshold`** — the single definition used for every
|
||||
/// reported number (ADR-155 §Tier-1.1).
|
||||
///
|
||||
/// A keypoint `j` with `visibility[j] >= 0.5` is *correct* iff
|
||||
/// `‖pred_j − gt_j‖₂ ≤ threshold · torso`, where `torso` is
|
||||
/// [`canonical_torso_size`] in the keypoint coordinate space.
|
||||
///
|
||||
/// # Returns
|
||||
/// `(correct, total, pck)` where `pck ∈ [0,1]`. **`(0, 0, 0.0)` when no
|
||||
/// keypoint is visible or the torso reference is degenerate** — a sample with
|
||||
/// no measurable evidence scores 0, never 1 (closes the
|
||||
/// `MetricsAccumulator` false-perfect bug).
|
||||
///
|
||||
/// # Normalization basis (vs other PCK definitions in the workspace)
|
||||
/// This is **hip↔hip torso WIDTH** normalized in the keypoint coordinate space.
|
||||
/// It is deliberately **distinct** from the live sensing-server's
|
||||
/// `compute_pck_torso_height` (torso-HEIGHT nose→hip, pixel-space) — see ADR-155
|
||||
/// §2.1 / §8. Those numbers must never be conflated.
|
||||
pub fn pck_canonical(
|
||||
pred_kpts: &Array2<f32>,
|
||||
gt_kpts: &Array2<f32>,
|
||||
visibility: &Array1<f32>,
|
||||
threshold: f32,
|
||||
) -> (usize, usize, f32) {
|
||||
let n = pred_kpts.shape()[0]
|
||||
.min(gt_kpts.shape()[0])
|
||||
.min(visibility.len());
|
||||
let torso = match canonical_torso_size(gt_kpts, visibility) {
|
||||
Some(t) => t,
|
||||
// No measurable reference scale ⇒ cannot score ⇒ 0.0 (NOT trivially 1.0).
|
||||
None => return (0, 0, 0.0),
|
||||
};
|
||||
let dist_threshold = threshold * torso;
|
||||
|
||||
let mut correct = 0usize;
|
||||
let mut total = 0usize;
|
||||
for j in 0..n {
|
||||
if visibility[j] < 0.5 {
|
||||
continue;
|
||||
}
|
||||
total += 1;
|
||||
let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
|
||||
let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
|
||||
if (dx * dx + dy * dy).sqrt() <= dist_threshold {
|
||||
correct += 1;
|
||||
}
|
||||
}
|
||||
let pck = if total > 0 {
|
||||
correct as f32 / total as f32
|
||||
} else {
|
||||
0.0
|
||||
};
|
||||
(correct, total, pck)
|
||||
}
|
||||
|
||||
/// **CANONICAL OKS** — COCO Object Keypoint Similarity (ADR-155 §Tier-1.1).
|
||||
///
|
||||
/// `OKS = Σⱼ exp(−dⱼ² / (2 s² kⱼ²)) · δ(vⱼ≥0.5) / Σⱼ δ(vⱼ≥0.5)` with
|
||||
/// `s = sqrt(area)` derived from the **GT keypoint bounding box in the
|
||||
/// keypoint coordinate space** (via [`canonical_torso_size`]² as a robust,
|
||||
/// always-positive proxy for area when an explicit bbox is unavailable).
|
||||
///
|
||||
/// Passing normalized [0,1] coordinates is fine *because the scale is derived
|
||||
/// from the pose itself* — there is no `s = 1.0` escape hatch that would make
|
||||
/// OKS ≈ 1.0 for any pose (the historical "fake Gold tier" bug).
|
||||
///
|
||||
/// Returns 0.0 when no keypoints are visible or the scale is degenerate.
|
||||
pub fn oks_canonical(
|
||||
pred_kpts: &Array2<f32>,
|
||||
gt_kpts: &Array2<f32>,
|
||||
visibility: &Array1<f32>,
|
||||
) -> f32 {
|
||||
let n = pred_kpts.shape()[0]
|
||||
.min(gt_kpts.shape()[0])
|
||||
.min(visibility.len());
|
||||
// Scale: area ≈ torso². Derived from the actual pose, never a fixed 1.0.
|
||||
let s = match canonical_torso_size(gt_kpts, visibility) {
|
||||
Some(t) => t,
|
||||
None => return 0.0,
|
||||
};
|
||||
let s_sq = s * s;
|
||||
if s_sq <= 0.0 {
|
||||
return 0.0;
|
||||
}
|
||||
let mut num = 0.0f32;
|
||||
let mut den = 0.0f32;
|
||||
for j in 0..n {
|
||||
if visibility[j] < 0.5 {
|
||||
continue;
|
||||
}
|
||||
den += 1.0;
|
||||
let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
|
||||
let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
|
||||
let d_sq = dx * dx + dy * dy;
|
||||
let k = if j < COCO_KP_SIGMAS.len() {
|
||||
COCO_KP_SIGMAS[j]
|
||||
} else {
|
||||
0.07
|
||||
};
|
||||
num += (-d_sq / (2.0 * s_sq * k * k)).exp();
|
||||
}
|
||||
if den > 0.0 {
|
||||
num / den
|
||||
} else {
|
||||
0.0
|
||||
}
|
||||
}
|
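The scoring rule documented for `pck_canonical` can be checked by hand with a dependency-free mirror. The following is a sketch under stated assumptions, not the crate's API: `pck_hip_width_sketch` and `dist` are hypothetical names, plain slices stand in for the `ndarray` types, and the bounding-box-diagonal fallback is deliberately omitted, so a sample without two visible hips is simply unscorable here.

```rust
// Illustrative plain-slice mirror of the hip-width PCK rule; NOT the crate's API.
// Assumption: the bounding-box fallback of the real normalizer is omitted.
const LEFT_HIP: usize = 11;
const RIGHT_HIP: usize = 12;

/// Euclidean distance between two 2-D points.
fn dist(a: [f32; 2], b: [f32; 2]) -> f32 {
    let (dx, dy) = (a[0] - b[0], a[1] - b[1]);
    (dx * dx + dy * dy).sqrt()
}

/// Returns (correct, total, pck); (0, 0, 0.0) when the sample cannot be scored.
fn pck_hip_width_sketch(
    pred: &[[f32; 2]],
    gt: &[[f32; 2]],
    vis: &[f32],
    threshold: f32,
) -> (usize, usize, f32) {
    let n = pred.len().min(gt.len()).min(vis.len());
    // Both hips must be present and visible to provide a reference scale.
    if n <= RIGHT_HIP || vis[LEFT_HIP] < 0.5 || vis[RIGHT_HIP] < 0.5 {
        return (0, 0, 0.0);
    }
    let torso = dist(gt[LEFT_HIP], gt[RIGHT_HIP]);
    if torso <= 1e-6 {
        return (0, 0, 0.0); // degenerate scale: no evidence scores 0, never 1
    }
    let limit = threshold * torso;

    let mut correct = 0usize;
    let mut total = 0usize;
    for j in 0..n {
        if vis[j] < 0.5 {
            continue; // invisible joints are neither counted nor scored
        }
        total += 1;
        if dist(pred[j], gt[j]) <= limit {
            correct += 1;
        }
    }
    let pck = if total > 0 { correct as f32 / total as f32 } else { 0.0 };
    (correct, total, pck)
}
```

With hips 10 units apart and a threshold of 0.2, the distance limit is 2 units, so a visible joint that is 1 unit off counts as correct and one that is 5 units off does not. An all-invisible sample returns `(0, 0, 0.0)`, matching the documented "no evidence scores 0" behaviour.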
||||
@@ -1,14 +1,34 @@
|
||||
//! Integration tests for [`wifi_densepose_train::metrics`].
|
||||
//! Integration tests for `wifi_densepose_train` pose metrics.
|
||||
//!
|
||||
//! The metrics module is only compiled when the `tch-backend` feature is
|
||||
//! enabled (because it is gated in `lib.rs`). Tests that use
|
||||
//! `EvalMetrics` are wrapped in `#[cfg(feature = "tch-backend")]`.
|
||||
//! # ADR-155 Milestone-1 — §8 "reference kernels" resolution
|
||||
//!
|
||||
//! The deterministic PCK, OKS, and Hungarian assignment tests that require
|
||||
//! no tch dependency are implemented inline in the non-gated section below
|
||||
//! using hand-computed helper functions.
|
||||
//! The full `metrics` module is gated behind `tch-backend` (libtorch), but the
|
||||
//! **canonical** metric core (`pck_canonical` / `oks_canonical`) now lives in
|
||||
//! the un-gated `metrics_core` module and is re-exported at the crate root, so
|
||||
//! these workspace tests (run under `--no-default-features`) validate the
|
||||
//! **production** functions directly.
|
||||
//!
|
||||
//! All inputs are fixed, deterministic arrays — no `rand`, no OS entropy.
|
||||
//! Previously this file carried its own local `compute_pck` / `compute_oks`
|
||||
//! reimplementations and asserted properties of *those* — a test that could
|
||||
//! not catch a bug in the canonical implementation (both could be wrong the
|
||||
//! same way). That is fixed two ways here:
|
||||
//!
|
||||
//! 1. **Fixture tests** (`canonical_pck_matches_hand_computed_fixture`,
|
||||
//! `canonical_oks_*`) assert the production `pck_canonical` / `oks_canonical`
|
||||
//! equal *hand-computed* expected values — numbers worked out by hand below,
|
||||
//! NOT a second implementation of the same algorithm.
|
||||
//! 2. **Differential test** (`test_kernel_agrees_with_canonical`) keeps a small
|
||||
//! independent reference kernel and asserts it **agrees** with the canonical
|
||||
//! function on shared inputs (in the torso=raw-threshold regime where the two
|
||||
//! coincide), so the reference adds genuine cross-check value rather than
|
||||
//! duplicating the algorithm under test.
|
||||
//!
|
||||
//! `EvalMetrics` tests remain `#[cfg(feature = "tch-backend")]` (that type is in
|
||||
//! the gated module). All inputs are fixed, deterministic arrays — no `rand`,
|
||||
//! no OS entropy.
|
||||
|
||||
use ndarray::{Array1, Array2};
|
||||
use wifi_densepose_train::{oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP};
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Tests that use `EvalMetrics` (requires tch-backend because the metrics
|
||||
@@ -163,146 +183,236 @@ mod eval_metrics_tests {
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Deterministic PCK computation tests (pure Rust, no tch, no feature gate)
|
||||
// Canonical PCK / OKS validation (production functions, no tch)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Compute PCK@threshold for a (pred, gt) pair.
|
||||
fn compute_pck(pred: &[[f64; 2]], gt: &[[f64; 2]], threshold: f64) -> f64 {
|
||||
let n = pred.len();
|
||||
if n == 0 {
|
||||
return 0.0;
|
||||
/// Build a 17-joint pose in `[0,1]` coordinates from an `(x, y)` per-joint list,
|
||||
/// padding any unspecified joint to `(0,0)`. Returns `[17, 2]`.
|
||||
fn pose17(joints: &[(usize, f32, f32)]) -> Array2<f32> {
|
||||
let mut a = Array2::<f32>::zeros((17, 2));
|
||||
for &(j, x, y) in joints {
|
||||
a[[j, 0]] = x;
|
||||
a[[j, 1]] = y;
|
||||
}
|
||||
let correct = pred
|
||||
.iter()
|
||||
.zip(gt.iter())
|
||||
.filter(|(p, g)| {
|
||||
let dx = p[0] - g[0];
|
||||
let dy = p[1] - g[1];
|
||||
(dx * dx + dy * dy).sqrt() <= threshold
|
||||
})
|
||||
.count();
|
||||
correct as f64 / n as f64
|
||||
a
|
||||
}
|
||||
|
||||
/// PCK of a perfect prediction (pred == gt) must be 1.0.
|
||||
#[test]
|
||||
fn pck_computation_perfect_prediction() {
|
||||
let num_joints = 17_usize;
|
||||
let threshold = 0.5_f64;
|
||||
|
||||
let pred: Vec<[f64; 2]> = (0..num_joints)
|
||||
.map(|j| [j as f64 * 0.05, j as f64 * 0.04])
|
||||
.collect();
|
||||
let gt = pred.clone();
|
||||
|
||||
let pck = compute_pck(&pred, &gt, threshold);
|
||||
assert!(
|
||||
(pck - 1.0).abs() < 1e-9,
|
||||
"PCK for perfect prediction must be 1.0, got {pck}"
|
||||
);
|
||||
}
|
||||
|
||||
/// PCK of completely wrong predictions must be 0.0.
|
||||
#[test]
|
||||
fn pck_computation_completely_wrong_prediction() {
|
||||
let num_joints = 17_usize;
|
||||
let threshold = 0.05_f64;
|
||||
|
||||
let gt: Vec<[f64; 2]> = (0..num_joints).map(|_| [0.0, 0.0]).collect();
|
||||
let pred: Vec<[f64; 2]> = (0..num_joints).map(|_| [10.0, 10.0]).collect();
|
||||
|
||||
let pck = compute_pck(&pred, &gt, threshold);
|
||||
assert!(
|
||||
pck.abs() < 1e-9,
|
||||
"PCK for completely wrong prediction must be 0.0, got {pck}"
|
||||
);
|
||||
}
|
||||
|
||||
/// PCK is monotone: a prediction closer to GT scores higher.
|
||||
#[test]
|
||||
fn pck_monotone_with_accuracy() {
|
||||
let gt = vec![[0.5_f64, 0.5_f64]];
|
||||
let close_pred = vec![[0.51_f64, 0.50_f64]];
|
||||
let far_pred = vec![[0.60_f64, 0.50_f64]];
|
||||
let very_far_pred = vec![[0.90_f64, 0.50_f64]];
|
||||
|
||||
let threshold = 0.05_f64;
|
||||
let pck_close = compute_pck(&close_pred, &gt, threshold);
|
||||
let pck_far = compute_pck(&far_pred, &gt, threshold);
|
||||
let pck_very_far = compute_pck(&very_far_pred, &gt, threshold);
|
||||
|
||||
assert!(
|
||||
pck_close >= pck_far,
|
||||
"closer prediction must score at least as high: close={pck_close}, far={pck_far}"
|
||||
);
|
||||
assert!(
|
||||
pck_far >= pck_very_far,
|
||||
"farther prediction must score lower or equal: far={pck_far}, very_far={pck_very_far}"
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Deterministic OKS computation tests (pure Rust, no tch, no feature gate)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Compute OKS for a (pred, gt) pair.
|
||||
fn compute_oks(pred: &[[f64; 2]], gt: &[[f64; 2]], sigma: f64, scale: f64) -> f64 {
|
||||
let n = pred.len();
|
||||
if n == 0 {
|
||||
return 0.0;
|
||||
/// Visibility vector with the listed joints visible (`2.0`), rest invisible.
|
||||
fn vis17(visible: &[usize]) -> Array1<f32> {
|
||||
let mut v = Array1::<f32>::zeros(17);
|
||||
for &j in visible {
|
||||
v[j] = 2.0;
|
||||
}
|
||||
let denom = 2.0 * scale * scale * sigma * sigma;
|
||||
let sum: f64 = pred
|
||||
.iter()
|
||||
.zip(gt.iter())
|
||||
.map(|(p, g)| {
|
||||
let dx = p[0] - g[0];
|
||||
let dy = p[1] - g[1];
|
||||
(-(dx * dx + dy * dy) / denom).exp()
|
||||
})
|
||||
.sum();
|
||||
sum / n as f64
|
||||
v
|
||||
}
|
||||
|
||||
/// OKS of a perfect prediction (pred == gt) must be 1.0.
|
||||
/// **Fixture test (Goal B).** The production `pck_canonical` must equal a value
|
||||
/// worked out *by hand* on a constructed pose — not a reimplementation.
|
||||
///
|
||||
/// Construction (all coordinates in `[0,1]`):
|
||||
/// * left_hip(11) = (0.40, 0.50), right_hip(12) = (0.60, 0.50)
|
||||
/// ⇒ canonical torso = hip↔hip width = 0.20.
|
||||
/// * threshold = 0.2 ⇒ dist_threshold = 0.2 × 0.20 = **0.04**.
|
||||
/// * Visible joints: {0 (nose), 5 (l_shoulder), 11, 12}. (4 visible.)
|
||||
/// - nose(0): pred == gt ⇒ dist 0.00 ≤ 0.04 ⇒ CORRECT
|
||||
/// - l_shoulder(5): pred off by dy=0.10 ⇒ dist 0.10 > 0.04 ⇒ wrong
|
||||
/// - l_hip(11): pred == gt ⇒ dist 0.00 ≤ 0.04 ⇒ CORRECT
|
||||
/// - r_hip(12): pred off by dx=0.03 ⇒ dist 0.03 ≤ 0.04 ⇒ CORRECT
|
||||
/// Hand result: correct = 3, total = 4, pck = 3/4 = **0.75**.
|
||||
#[test]
|
||||
fn oks_perfect_prediction_is_one() {
|
||||
let num_joints = 17_usize;
|
||||
let sigma = 0.05_f64;
|
||||
let scale = 1.0_f64;
|
||||
fn canonical_pck_matches_hand_computed_fixture() {
|
||||
let gt = pose17(&[
|
||||
(0, 0.50, 0.20), // nose
|
||||
(5, 0.35, 0.35), // left_shoulder
|
||||
(CANON_LEFT_HIP, 0.40, 0.50),
|
||||
(CANON_RIGHT_HIP, 0.60, 0.50),
|
||||
]);
|
||||
let pred = pose17(&[
|
||||
(0, 0.50, 0.20), // exact
|
||||
(5, 0.35, 0.45), // off by dy = 0.10 (> 0.04)
|
||||
(CANON_LEFT_HIP, 0.40, 0.50), // exact
|
||||
(CANON_RIGHT_HIP, 0.63, 0.50), // off by dx = 0.03 (<= 0.04)
|
||||
]);
|
||||
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
|
||||
|
||||
let pred: Vec<[f64; 2]> = (0..num_joints).map(|j| [j as f64 * 0.05, 0.3]).collect();
|
||||
let gt = pred.clone();
|
||||
|
||||
let oks = compute_oks(&pred, &gt, sigma, scale);
|
||||
let (correct, total, pck) = pck_canonical(&pred, &gt, &vis, 0.2);
|
||||
assert_eq!(total, 4, "4 visible joints expected, got {total}");
|
||||
assert_eq!(correct, 3, "hand-computed: 3 of 4 within 0.04, got {correct}");
|
||||
assert!(
|
||||
(oks - 1.0).abs() < 1e-9,
|
||||
"OKS for perfect prediction must be 1.0, got {oks}"
|
||||
(pck - 0.75).abs() < 1e-6,
|
||||
"hand-computed PCK is 0.75, got {pck}"
|
||||
);
|
||||
}
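The hand computation in the fixture's doc comment can be replayed with a small standalone sketch. It is not the production `pck_canonical` (its body is not shown in this diff): `pck_sketch` and `fixture_pck` are hypothetical names, the pose is reduced to the four visible joints (indices 0..3 standing in for nose, left shoulder, left hip, right hip), and `f64` tuples replace the `ndarray` types. It assumes only what the comment states: torso = hip-to-hip distance, tolerance = threshold × torso, and a score of 0.0 when nothing is visible.

```rust
/// Hypothetical stand-in for `pck_canonical`: PCK with a hip-to-hip torso normalizer.
/// Joints are (x, y) pairs; `visible[j]` marks the joints that count toward the score.
fn pck_sketch(
    pred: &[(f64, f64)],
    gt: &[(f64, f64)],
    visible: &[bool],
    left_hip: usize,
    right_hip: usize,
    threshold: f64,
) -> (usize, usize, f64) {
    let dist = |a: (f64, f64), b: (f64, f64)| ((a.0 - b.0).powi(2) + (a.1 - b.1).powi(2)).sqrt();
    // Normalizer: ground-truth hip-to-hip width (0.20 in the fixture).
    let torso = dist(gt[left_hip], gt[right_hip]);
    let dist_threshold = threshold * torso; // 0.2 * 0.20 = 0.04 in the fixture
    let (mut correct, mut total) = (0usize, 0usize);
    for j in 0..gt.len() {
        if !visible[j] {
            continue;
        }
        total += 1;
        if dist(pred[j], gt[j]) <= dist_threshold {
            correct += 1;
        }
    }
    // No visible joints scores 0.0, never 1.0.
    let pck = if total > 0 { correct as f64 / total as f64 } else { 0.0 };
    (correct, total, pck)
}

/// The fixture pose, reduced to its four visible joints.
fn fixture_pck() -> (usize, usize, f64) {
    let gt = [(0.50, 0.20), (0.35, 0.35), (0.40, 0.50), (0.60, 0.50)];
    let pred = [(0.50, 0.20), (0.35, 0.45), (0.40, 0.50), (0.63, 0.50)];
    pck_sketch(&pred, &gt, &[true; 4], 2, 3, 0.2)
}
```

Calling `fixture_pck()` yields 3 correct of 4 visible joints: only the shoulder (off by 0.10) exceeds the 0.04 tolerance, which matches the hand result of 0.75.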
|
||||
|
||||
/// OKS must decrease as the L2 distance between pred and GT increases.
|
||||
/// Pin the **normalizer**: PCK uses hip↔hip torso width. A prediction error of
|
||||
/// 0.18 (just under 0.2 × torso=1.0 wide hips) is CORRECT, but the same error
|
||||
/// is WRONG once the hips are squeezed to width 0.20 (threshold 0.04). If the
|
||||
/// implementation ignored the torso normalizer this test would fail.
|
||||
#[test]
|
||||
fn oks_decreases_with_distance() {
|
||||
let sigma = 0.05_f64;
|
||||
let scale = 1.0_f64;
|
||||
fn canonical_pck_uses_hip_to_hip_torso_normalizer() {
|
||||
// Wide hips: width 1.0 ⇒ threshold 0.2. An error of 0.18 on joint 5 is OK.
|
||||
let gt_wide = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.0, 0.5), (CANON_RIGHT_HIP, 1.0, 0.5)]);
|
||||
let pred_wide = pose17(&[(5, 0.68, 0.50), (CANON_LEFT_HIP, 0.0, 0.5), (CANON_RIGHT_HIP, 1.0, 0.5)]);
|
||||
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
|
||||
let (_, _, pck_wide) = pck_canonical(&pred_wide, &gt_wide, &vis, 0.2);
|
||||
|
||||
let gt = vec![[0.5_f64, 0.5_f64]];
|
||||
let pred_d0 = vec![[0.5_f64, 0.5_f64]];
|
||||
let pred_d1 = vec![[0.6_f64, 0.5_f64]];
|
||||
let pred_d2 = vec![[1.0_f64, 0.5_f64]];
|
||||
|
||||
let oks_d0 = compute_oks(&pred_d0, &gt, sigma, scale);
|
||||
let oks_d1 = compute_oks(&pred_d1, &gt, sigma, scale);
|
||||
let oks_d2 = compute_oks(&pred_d2, &gt, sigma, scale);
|
||||
// Narrow hips: width 0.20 ⇒ threshold 0.04. Same 0.18 error on joint 5 is wrong.
|
||||
let gt_narrow = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.40, 0.5), (CANON_RIGHT_HIP, 0.60, 0.5)]);
|
||||
let pred_narrow = pose17(&[(5, 0.68, 0.50), (CANON_LEFT_HIP, 0.40, 0.5), (CANON_RIGHT_HIP, 0.60, 0.5)]);
|
||||
let (_, _, pck_narrow) = pck_canonical(&pred_narrow, &gt_narrow, &vis, 0.2);
|
||||
|
||||
// Joints 11/12 are exact (correct in both); joint 5 flips.
|
||||
// Wide: 3/3 = 1.0; Narrow: 2/3 ≈ 0.667.
|
||||
assert!((pck_wide - 1.0).abs() < 1e-6, "wide-hip PCK should be 1.0, got {pck_wide}");
|
||||
assert!(
|
||||
oks_d0 > oks_d1,
|
||||
"OKS at distance 0 must be > OKS at distance 0.1: {oks_d0} vs {oks_d1}"
|
||||
(pck_narrow - 2.0 / 3.0).abs() < 1e-6,
|
||||
"narrow-hip PCK should be 2/3 (joint 5 now out of tolerance), got {pck_narrow}"
|
||||
);
|
||||
}
|
||||
|
||||
/// The claim-inflating bug: no visible joints must score **0.0**, never 1.0.
|
||||
#[test]
|
||||
fn canonical_pck_zero_visible_is_zero() {
|
||||
let kpts = pose17(&[(CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
|
||||
let vis = vis17(&[]); // nothing visible
|
||||
let (correct, total, pck) = pck_canonical(&kpts, &kpts, &vis, 0.2);
|
||||
assert_eq!((correct, total), (0, 0));
|
||||
assert_eq!(pck, 0.0, "no-visible-joint PCK must be 0.0 (not the old 1.0)");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Canonical OKS validation (production function, no tch)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// **Fixture test (Goal B).** A perfect prediction (pred == gt) makes every
|
||||
/// Gaussian term `exp(0) = 1`, so the canonical OKS is exactly **1.0** —
|
||||
/// hand-evident, independent of the (positive) scale.
|
||||
#[test]
|
||||
fn canonical_oks_perfect_prediction_is_one() {
|
||||
let gt = pose17(&[
|
||||
(0, 0.50, 0.20),
|
||||
(5, 0.35, 0.35),
|
||||
(CANON_LEFT_HIP, 0.40, 0.50),
|
||||
(CANON_RIGHT_HIP, 0.60, 0.50),
|
||||
]);
|
||||
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
|
||||
let oks = oks_canonical(&gt, &gt, &vis);
|
||||
assert!(
|
||||
oks_d1 > oks_d2,
|
||||
"OKS at distance 0.1 must be > OKS at distance 0.5: {oks_d1} vs {oks_d2}"
|
||||
(oks - 1.0).abs() < 1e-6,
|
||||
"OKS for a perfect prediction must be 1.0, got {oks}"
|
||||
);
|
||||
}
|
||||
|
||||
/// **The "fake Gold tier" bug, pinned (Goal B).** On normalized `[0,1]`
|
||||
/// coordinates the historical `s = 1.0` path returned ≈1.0 for *any* pose.
|
||||
/// Canonical derives `s` from the pose extent (here torso width = 0.20), so a
|
||||
/// pose whose visible non-hip joint is off by ~3× the torso scores far below
|
||||
/// the "Gold" tier. Hand bound: for joint 5 with d ≈ 0.60, s = 0.20, k = 0.079,
|
||||
/// the exponent `-d²/(2 s² k²)` is enormously negative ⇒ that term ≈ 0; the two
|
||||
/// (exact) hip terms give 1 each ⇒ OKS ≈ 2/3 at most, and with joint-5 ≈ 0 the
|
||||
/// mean is ≈ 0.667. We assert it is comfortably **< 0.8** (and the wrong joint
|
||||
/// contributes ≈ 0), i.e. nowhere near the old ≈1.0.
|
||||
#[test]
|
||||
fn canonical_oks_not_one_for_wrong_pose_on_normalized_coords() {
|
||||
let gt = pose17(&[
|
||||
(5, 0.30, 0.50),
|
||||
(CANON_LEFT_HIP, 0.40, 0.50),
|
||||
(CANON_RIGHT_HIP, 0.60, 0.50),
|
||||
]);
|
||||
// Joint 5 dragged 0.60 away (3× the 0.20 torso); hips exact.
|
||||
let pred = pose17(&[
|
||||
(5, 0.90, 0.50),
|
||||
(CANON_LEFT_HIP, 0.40, 0.50),
|
||||
(CANON_RIGHT_HIP, 0.60, 0.50),
|
||||
]);
|
||||
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
|
||||
let oks = oks_canonical(&pred, &gt, &vis);
|
||||
assert!(
|
||||
oks < 0.8,
|
||||
"wrong-pose OKS on [0,1] coords must NOT be ≈1.0 (fake-Gold bug); got {oks}"
|
||||
);
|
||||
// The two exact hips alone give 2/3; the wrong joint must add ~nothing.
|
||||
assert!(
|
||||
(oks - 2.0 / 3.0).abs() < 0.05,
|
||||
"wrong joint should contribute ≈0 ⇒ OKS ≈ 2/3, got {oks}"
|
||||
);
|
||||
}
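The hand bound quoted in the doc comment above can be checked numerically. The sketch below evaluates only the per-joint term `exp(-d²/(2 s² k²))` and the mean over visible joints, using the values given in the comment (d = 0.60, s = 0.20, k = 0.079 for joint 5). The hip constant 0.107 is an assumption taken from the usual COCO per-keypoint constants; it does not affect the result because both hips are exact (d = 0). `oks_term`, `oks_mean` and `wrong_pose_oks` are hypothetical names; how the production `oks_canonical` derives `s` is not shown in this diff.

```rust
/// One Gaussian OKS term: exp(-d^2 / (2 s^2 k^2)).
fn oks_term(d: f64, s: f64, k: f64) -> f64 {
    (-(d * d) / (2.0 * s * s * k * k)).exp()
}

/// Mean OKS over visible joints, each given as (distance, per-joint constant k).
fn oks_mean(joints: &[(f64, f64)], s: f64) -> f64 {
    if joints.is_empty() {
        return 0.0;
    }
    joints.iter().map(|&(d, k)| oks_term(d, s, k)).sum::<f64>() / joints.len() as f64
}

/// The wrong-pose case from the test: joint 5 off by 0.60, both hips exact.
fn wrong_pose_oks() -> f64 {
    // Exponent for joint 5: -0.36 / (2 * 0.04 * 0.006241), roughly -721, so the term underflows toward 0.
    oks_mean(&[(0.60, 0.079), (0.0, 0.107), (0.0, 0.107)], 0.20)
}
```

With these inputs the displaced joint contributes essentially nothing and the two exact hips contribute 1 each, so the mean lands at about 2/3, inside both bounds the test asserts.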
|
||||
|
||||
/// Canonical OKS decreases monotonically with prediction error.
|
||||
#[test]
|
||||
fn canonical_oks_decreases_with_distance() {
|
||||
let gt = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
|
||||
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
|
||||
let mk = |x5: f32| pose17(&[(5, x5, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
|
||||
|
||||
let oks0 = oks_canonical(&mk(0.50), &gt, &vis);
|
||||
let oks1 = oks_canonical(&mk(0.52), &gt, &vis);
|
||||
let oks2 = oks_canonical(&mk(0.60), &gt, &vis);
|
||||
assert!(oks0 > oks1, "OKS must drop as error grows: {oks0} vs {oks1}");
|
||||
assert!(oks1 > oks2, "OKS must drop as error grows: {oks1} vs {oks2}");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Differential cross-check: independent reference kernel vs canonical (Goal B)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// A deliberately *independent* PCK reference implementation in the simplest
|
||||
/// regime — a **raw distance threshold** (no torso normalization). It is kept
|
||||
/// only to cross-check the canonical function, not to define the metric.
|
||||
fn reference_pck_raw(pred: &[(f32, f32)], gt: &[(f32, f32)], dist_threshold: f32) -> (usize, usize, f32) {
|
||||
let n = pred.len().min(gt.len());
|
||||
let mut correct = 0usize;
|
||||
for i in 0..n {
|
||||
let dx = pred[i].0 - gt[i].0;
|
||||
let dy = pred[i].1 - gt[i].1;
|
||||
if (dx * dx + dy * dy).sqrt() <= dist_threshold {
|
||||
correct += 1;
|
||||
}
|
||||
}
|
||||
let pck = if n > 0 { correct as f32 / n as f32 } else { 0.0 };
|
||||
(correct, n, pck)
|
||||
}
|
||||
|
||||
/// **Differential test (Goal B).** In the regime where the canonical torso
|
||||
/// normalizer equals 1.0 (hips exactly one unit apart, so `threshold · torso`
|
||||
/// reduces to the raw `threshold`), the canonical PCK and an independent
|
||||
/// raw-threshold reference kernel MUST agree on shared inputs. This catches a
|
||||
/// canonical-side bug that a pure self-fixture could miss, *because* the second
|
||||
/// implementation is genuinely independent.
|
||||
#[test]
|
||||
fn test_kernel_agrees_with_canonical() {
|
||||
// Hips one unit apart ⇒ canonical torso == 1.0 ⇒ dist_threshold == threshold.
|
||||
let gt = pose17(&[
|
||||
(0, 0.30, 0.30),
|
||||
(5, 0.55, 0.55),
|
||||
(7, 0.10, 0.90),
|
||||
(CANON_LEFT_HIP, 0.00, 0.50),
|
||||
(CANON_RIGHT_HIP, 1.00, 0.50),
|
||||
]);
|
||||
let pred = pose17(&[
|
||||
(0, 0.31, 0.30), // err 0.01
|
||||
(5, 0.70, 0.55), // err 0.15
|
||||
(7, 0.10, 0.98), // err 0.08
|
||||
(CANON_LEFT_HIP, 0.00, 0.50), // exact
|
||||
(CANON_RIGHT_HIP, 1.00, 0.50), // exact
|
||||
]);
|
||||
let visible = [0usize, 5, 7, CANON_LEFT_HIP, CANON_RIGHT_HIP];
|
||||
let vis = vis17(&visible);
|
||||
let threshold = 0.1_f32;
|
||||
|
||||
let (c_can, t_can, pck_can) = pck_canonical(&pred, &gt, &vis, threshold);
|
||||
|
||||
// Reference over the SAME visible joints with the SAME raw threshold
|
||||
// (torso == 1.0 so threshold·torso == threshold).
|
||||
let pred_v: Vec<(f32, f32)> = visible.iter().map(|&j| (pred[[j, 0]], pred[[j, 1]])).collect();
|
||||
let gt_v: Vec<(f32, f32)> = visible.iter().map(|&j| (gt[[j, 0]], gt[[j, 1]])).collect();
|
||||
let (c_ref, t_ref, pck_ref) = reference_pck_raw(&pred_v, &gt_v, threshold);
|
||||
|
||||
assert_eq!(t_can, t_ref, "visible counts must match: {t_can} vs {t_ref}");
|
||||
assert_eq!(c_can, c_ref, "correct counts must match: {c_can} vs {c_ref}");
|
||||
assert!(
|
||||
(pck_can - pck_ref).abs() < 1e-6,
|
||||
"canonical PCK {pck_can} must agree with independent reference {pck_ref}"
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -309,6 +309,61 @@ impl WlanApiScanner {
|
||||
})
|
||||
}
|
||||
|
||||
/// Measure the **real** achieved rate of a *specific* backend over a
|
||||
/// fixed wall-clock `window`, for an honest native-vs-netsh comparison.
|
||||
///
|
||||
/// Unlike [`benchmark`](Self::benchmark) (which picks native-first and so
|
||||
/// never exercises netsh on a box where native works), this runs back-to-
|
||||
/// back scans on **exactly** the requested backend until `window` elapses,
|
||||
/// then reports the measured scans/second and mean BSSIDs/scan. This is the
|
||||
/// ADR-157 §5 #4 measurement primitive: drive it once per backend over the
|
||||
/// same window and compare the two `rate_hz` values — no rate is assumed.
|
||||
///
|
||||
/// Returns `None` for [`ScanBackend::Native`] when the native path is
|
||||
/// unavailable (non-Windows or WLAN service error), so a caller can report
|
||||
/// the honest negative rather than a fabricated number.
|
||||
///
|
||||
/// # Errors
|
||||
///
|
||||
/// Propagates the first scan error from the chosen backend.
|
||||
pub fn benchmark_backend(
|
||||
&self,
|
||||
backend: ScanBackend,
|
||||
window: Duration,
|
||||
) -> Result<Option<BenchmarkResult>, WifiScanError> {
|
||||
// Probe native availability first so an unavailable native path is an
|
||||
// honest `None`, not an error charged against the comparison.
|
||||
if backend == ScanBackend::Native && wlanapi_native::scan_native().is_err() {
|
||||
return Ok(None);
|
||||
}
|
||||
|
||||
let start = Instant::now();
|
||||
let mut iterations: u32 = 0;
|
||||
let mut total_bssids: u64 = 0;
|
||||
while start.elapsed() < window {
|
||||
let list = match backend {
|
||||
ScanBackend::Native => wlanapi_native::scan_native()?,
|
||||
ScanBackend::Netsh => self.inner.scan_sync()?,
|
||||
};
|
||||
total_bssids += list.len() as u64;
|
||||
iterations += 1;
|
||||
}
|
||||
let total = start.elapsed();
|
||||
let secs = total.as_secs_f64().max(f64::MIN_POSITIVE);
|
||||
|
||||
Ok(Some(BenchmarkResult {
|
||||
iterations,
|
||||
total,
|
||||
rate_hz: f64::from(iterations) / secs,
|
||||
mean_bssids: if iterations == 0 {
|
||||
0.0
|
||||
} else {
|
||||
total_bssids as f64 / f64::from(iterations)
|
||||
},
|
||||
backend,
|
||||
}))
|
||||
}
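The fixed-window measurement pattern used by `benchmark_backend` can be tried without a WLAN adapter. The sketch below is a standalone approximation, not the crate's API: `RateSample` and `measure_rate` are hypothetical names, and the scan is any closure returning an item count, so a stub can stand in for `scan_native` or `scan_sync`. It keeps the same guards as the diff: a floor on the elapsed seconds and a zero-iteration mean of 0.0.

```rust
use std::time::{Duration, Instant};

/// Hypothetical result type mirroring the fields the diff reports.
#[derive(Debug)]
struct RateSample {
    iterations: u32,
    total: Duration,
    rate_hz: f64,
    mean_items: f64,
}

/// Run `scan` back-to-back until `window` elapses, then report the measured rate.
/// The first error from `scan` is propagated, as in `benchmark_backend`.
fn measure_rate<E, F>(window: Duration, mut scan: F) -> Result<RateSample, E>
where
    F: FnMut() -> Result<usize, E>,
{
    let start = Instant::now();
    let mut iterations: u32 = 0;
    let mut total_items: u64 = 0;
    while start.elapsed() < window {
        total_items += scan()? as u64;
        iterations += 1;
    }
    let total = start.elapsed();
    // Floor the divisor so a zero-length window cannot divide by zero.
    let secs = total.as_secs_f64().max(f64::MIN_POSITIVE);
    Ok(RateSample {
        iterations,
        total,
        rate_hz: f64::from(iterations) / secs,
        mean_items: if iterations == 0 {
            0.0
        } else {
            total_items as f64 / f64::from(iterations)
        },
    })
}
```

Driving it once per backend over the same window, for example `measure_rate(window, || stub_scan())` with a hypothetical `stub_scan`, gives two `rate_hz` values that can be compared directly, which is the comparison the doc comment describes.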
|
||||
|
||||
/// Perform an async scan by offloading the blocking call to a
|
||||
/// background thread (native-first, netsh fallback inside the task).
|
||||
///
|
||||
@@ -560,4 +615,76 @@ mod tests {
|
||||
);
|
||||
assert!(bench.rate_hz > 0.0);
|
||||
}
|
||||
|
||||
/// ADR-157 §5 #4 honest native-vs-netsh throughput comparison. `#[ignore]`
|
||||
/// (live WLAN, ~20 s). Run with:
|
||||
/// `cargo test -p wifi-densepose-wifiscan -- --ignored --nocapture
|
||||
/// measure_native_vs_netsh_throughput`. Drives BOTH backends over the same
|
||||
/// fixed wall-clock window and prints the measured Hz + BSSIDs/scan for
|
||||
/// each, plus the ratio — the real number, whatever it is (a null/negative
|
||||
/// result is a valid outcome and must be reported, not hidden).
|
||||
#[cfg(windows)]
|
||||
#[test]
|
||||
#[ignore = "live WLAN native-vs-netsh comparison; run with --ignored --nocapture"]
|
||||
fn measure_native_vs_netsh_throughput() {
|
||||
let scanner = WlanApiScanner::new();
|
||||
let window = Duration::from_secs(10);
|
||||
|
||||
let native = scanner
|
||||
.benchmark_backend(ScanBackend::Native, window)
|
||||
.expect("native benchmark must not error");
|
||||
let netsh = scanner
|
||||
.benchmark_backend(ScanBackend::Netsh, window)
|
||||
.expect("netsh benchmark must not error")
|
||||
.expect("netsh is always available on Windows");
|
||||
|
||||
match native {
|
||||
Some(n) => {
|
||||
println!(
|
||||
"NATIVE: {:.2} Hz ({} scans / {:?}), mean {:.1} BSSIDs/scan",
|
||||
n.rate_hz, n.iterations, n.total, n.mean_bssids
|
||||
);
|
||||
println!(
|
||||
"NETSH: {:.2} Hz ({} scans / {:?}), mean {:.1} BSSIDs/scan",
|
||||
netsh.rate_hz, netsh.iterations, netsh.total, netsh.mean_bssids
|
||||
);
|
||||
let ratio = n.rate_hz / netsh.rate_hz.max(f64::MIN_POSITIVE);
|
||||
println!("RATIO native/netsh: {ratio:.2}x");
|
||||
assert!(n.rate_hz > 0.0 && netsh.rate_hz > 0.0);
|
||||
}
|
||||
None => {
|
||||
println!(
|
||||
"NATIVE: unavailable on this box (WLAN service error). \
|
||||
NETSH: {:.2} Hz, mean {:.1} BSSIDs/scan",
|
||||
netsh.rate_hz, netsh.mean_bssids
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Determinism + handle-cleanup pin: N back-to-back native scans must all
|
||||
/// succeed (or all be the same typed error) with no resource exhaustion —
|
||||
/// a `WlanOpenHandle`/`WlanCloseHandle` leak would, after enough calls,
|
||||
/// surface as a `ScanFailed`. Running 50 iterations here exercises the
|
||||
/// open→enum→getlist→free→close cycle repeatedly. `#[ignore]` for CI (live
|
||||
/// WLAN service) but RUN on this box to verify no leak.
|
||||
#[cfg(windows)]
|
||||
#[test]
|
||||
#[ignore = "live WLAN handle-cleanup check; run with --ignored --nocapture"]
|
||||
fn native_scans_dont_leak_handles() {
|
||||
let scanner = WlanApiScanner::new();
|
||||
let mut ok = 0u32;
|
||||
let mut failed = 0u32;
|
||||
for _ in 0..50 {
|
||||
match scanner.scan_native() {
|
||||
Ok(_) => ok += 1,
|
||||
Err(WifiScanError::ScanFailed { .. }) => failed += 1,
|
||||
Err(e) => panic!("unexpected error during leak check: {e:?}"),
|
||||
}
|
||||
}
|
||||
println!("native leak check: {ok} ok, {failed} scan-failed of 50");
|
||||
// No leak ⇒ behavior is consistent across all 50 calls (all ok, or all
|
||||
// the same WLAN-service-off failure) — not a degrade partway through.
|
||||
assert!(ok == 50 || failed == 50, "inconsistent results suggest a leak: {ok} ok / {failed} failed");
|
||||
}
|
||||
}
|
||||
|
||||