feat(ADR-261): ruvector HNSW graph-ANN (25x measured vs linear) + honest SymphonyQG-direction refutation (#1063 )

* feat(ruvector): real float HNSW + SymphonyQG-style quantized-traversal index (ADR-261) Adds the graph-ANN index the ruvector retrieval path was missing (ADR-156 §5 #1 noted there was no HNSW baseline to measure SymphonyQG against). - hnsw.rs: correct float HNSW (Malkov & Yashunin) — multi-layer NSW graph, ef_construction/ef_search, Algorithm-4 neighbour selection, seeded- deterministic level assignment (SplitMix64, reused from rotation.rs), L2 + cosine, brute-force ground truth, full degenerate-case guards. recall@10 correctness gate >=0.95 vs brute force (L2 + cosine). - hnsw_quantized.rs: SymphonyQG-style variant — same graph, traversal scored by cheap 1-bit Hamming over the RaBitQ Pass-2 rotated sign code, final exact-float rerank. - ann_measure.rs: shared deterministic planted-cluster fixture + recall/QPS measurement (ann_bench_report is the ADR source of truth). Fixes an index-out-of-bounds bug the recall gate caught: insert wired bidirectional edges before pushing the node's own link row. +20 tests, ruvector lib 131->151, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * bench(ruvector): criterion ann_bench for HNSW vs quantized vs linear (ADR-261) Times the same shared ann_measure fixture/indices through criterion so the bench and the report test can never measure different graphs. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr-261): graph-ANN index ADR with MEASURED HNSW vs quantized verdict ADR-261 (Accepted): float HNSW ~25x QPS over linear scan at recall >=0.99 (the baseline ADR-156 said was missing). Honest negative: the 1-bit quantized traversal is too coarse to beat float HNSW at equal recall at N=10k (best recall 0.738, no >=0.90 equal-recall point) — the SymphonyQG 3.5-17x is NOT reproduced by our 1-bit construction; expected crossover at large N + a multi-bit code. Caveat: our HNSW + our quant, not SymphonyQG's system — direction tested, not a 1:1 reproduction. ADR-156 §5 #1 + §8 backlog: CLAIMED -> MEASURED-direction-tested. CHANGELOG [Unreleased] entry. Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-14 11:03:18 +00:00 · 2026-06-14 02:33:32 -04:00
9 changed files with 1950 additions and 2 deletions
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]

 ### Added
+- **ADR-261: RuVector graph-ANN index — a real HNSW baseline + a SymphonyQG-style quantized variant, MEASURED (honest negative).** Closes the [ADR-156 §5 #1](docs/adr/ADR-156-ruvector-fusion-beyond-sota.md) gap: the SymphonyQG (SIGMOD 2025) **3.5–17× QPS-over-HNSW** claim was CLAIMED-only because **no HNSW baseline existed to compare against**. This adds one. New pure-Rust, `--no-default-features`-buildable modules in `wifi-densepose-ruvector`: `hnsw.rs` (a correct float HNSW — Malkov & Yashunin: multi-layer NSW graph, `ef_construction`/`ef_search`, Algorithm-4 neighbour selection, **seeded-deterministic** level assignment via SplitMix64, L2 + cosine, full degenerate-case guards), `hnsw_quantized.rs` (the SymphonyQG-style variant — the **same** graph traversed by a cheap **1-bit Hamming** score over the RaBitQ Pass-2 rotated sign code, then **exact-float rerank**), `ann_measure.rs` + `benches/ann_bench.rs` (one shared deterministic planted-cluster fixture; the `ann_bench_report` test is the source of truth). **MEASURED (dim=128, N=10k, K=10, `--release`):** float HNSW = **~25× QPS over linear scan at recall ≥0.99** (the baseline this gap needed; recall@10 correctness gate ≥0.95 holds, L2 + cosine). **Honest negative:** the 1-bit quantized traversal is **too coarse to beat float HNSW at equal recall at this scale** — its best recall is **0.738**, never reaching the ≥0.90 equal-recall point, so there is **no QPS win** over float HNSW; the 3.5–17× is **not reproduced** by our 1-bit construction here. The recall gate also **caught a real index-out-of-bounds bug** in the insert path (disclosed in ADR-261 §4). Caveat: this is **our** HNSW + **our** 1-bit quant, not SymphonyQG's exact system — it tests the *direction* of the claim, with the expected crossover at large N + a multi-bit traversal code. **We did not tune to manufacture a speedup.** +20 tests (ruvector lib 131→151, 0 failed). ADR-156 §5 #1 / §8 backlog: CLAIMED → **MEASURED-direction-tested**. Python deterministic proof unchanged (off the signal proof path).
 - **ADR-260: RuField MFS — the open specification for camera-free multimodal field sensing.** A common event / tensor / calibration / privacy / provenance model that sits *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and future quantum sensors (each modality emits a normalized `FieldEvent` → `FieldTensor` → `FusionGraph` → `PrivacyClass` → `ProvenanceReceipt`). Published as a **standalone repo** [`ruvnet/rufield`](https://github.com/ruvnet/rufield) and vendored here as the `vendor/rufield` submodule (the `vendor/rvcsi` pattern — not a `v2/` workspace member). The v0.1 reference stack is a self-contained 6-crate Rust workspace (`rufield-core`, `-provenance` [sha256 + ed25519], `-privacy` [P0–P5 guard], `-adapters` [deterministic `SyntheticSim` across wifi_csi/mmwave_radar/infrared_thermal], `-fusion` [graph + TOML weighted-Bayes rules → 7 room-state inferences], `-bench` [deterministic runner + the §31 acceptance test]). **60 tests / 0 failed, clippy-clean.** §27 acceptance criteria 1–8 and 10 PASS; the live dashboard (9) is deferred. **All benchmark metrics are SYNTHETIC** (scored against the simulator's own ground truth — presence/breathing/bed_exit/room_transition F1 = 1.000, nocturnal_scratch 0.923 reported honestly, p95 latency ~0.01 ms, provenance coverage 100%, 0 privacy violations) — they prove the pipeline recovers known truth, **not** field accuracy; real hardware adapters (ESP32 CSI, mmWave, thermal IR) are a documented roadmap item, none validated in v0.1. The Python deterministic proof is unchanged (rufield is off the signal-processing proof path).

 ### Security
@@ -102,7 +102,7 @@ The double-clone elimination is also correctness-neutral: all 100 `viewpoint`/`m

 | # | Candidate | What | Grade | Verdict |
 |---|-----------|------|-------|---------|
-| **1** | **SymphonyQG** (SIGMOD 2025, public code) | Unified quantization + graph ANN; source reports **3.5–17× QPS over HNSW at equal recall**, pure-CPU / edge-portable. | **CLAIMED** (author-measured; **not reproduced on our hardware** — reproduction is future work) | **Lead beyond-SOTA candidate for the ruvector ANN path.** Propose as ACCEPTED-future; cite honestly as "claimed by source, reproduction pending." Best fit because the ruvector retrieval path (AETHER re-ID, sketch prefilter) is exactly an ANN problem and SymphonyQG is CPU/edge-portable like our deployment. |
+| **1** | **SymphonyQG** (SIGMOD 2025, public code) | Unified quantization + graph ANN; source reports **3.5–17× QPS over HNSW at equal recall**, pure-CPU / edge-portable. | **MEASURED-direction-tested** (was CLAIMED) — **[ADR-261](ADR-261-ruvector-graph-ann-index.md)** built the missing HNSW baseline + a SymphonyQG-style 1-bit quantized-traversal variant and **measured** the ratio on our hardware. | **DONE — direction REFUTED at our scale (honest negative).** ADR-261 built the real HNSW baseline (**~25× QPS over linear scan at recall ≥0.99**, the substrate this row wanted) and a quantized variant. At N=10k the 1-bit Hamming traversal is **too coarse** — its best recall is 0.738, never reaching the ≥0.90 equal-recall point, so **no QPS win over float HNSW** (the SymphonyQG 3.5–17× is *not* reproduced by our 1-bit construction here). Caveat: **our HNSW + our 1-bit quant, not SymphonyQG's system**; expected crossover at large N + a multi-bit code. We did **not** tune to manufacture a speedup. |
 | **2** | **Multi-bit / Extended RaBitQ + unbiased estimator** | Extends our existing **1-bit** `sketch.rs` (ADR-084): Pass-2 rotation, multi-bit Pass-3, and the **real RaBitQ unbiased distance estimator** (Gao & Long SIGMOD 2024) reranking the candidate set from the 1-bit code + 8 B/vec side info (§11). | **MEASURED-on-our-hardware** (was CLAIMED) — rotation (§10), multi-bit (§10), and the estimator (§11) all implemented + benchmarked. Rotation lifts strict-K 36%→46%; multi-bit (≤4-bit) reaches 74% strict; **the estimator reaches 49.71% strict (cosine rerank), still short of 90%.** All clear 90% only with over-fetch (estimator improves the factor: 95% at candidate_k=24 vs sign 91.6%). | **DONE — RESOLVED-PARTIAL / NEGATIVE.** Rotation (§10) + estimator (§11) built and MEASURED. The honest negative (no strict-bar 90% from rotation, ≤4-bit, **or the unbiased estimator**) is recorded, not hidden. Over-fetch + Pass-2 is the path that meets the bar (ADR-084's "candidate set" pattern); the estimator lowers the over-fetch factor needed. |
 | **3** | **GraphPose-Fi-style learned antenna-attention + ChebGConv fusion head** | Would replace the current **untrained identity-projection + mean-pool** "attention" (the `CrossViewpointAttention` default is `ProjectionWeights::identity` — not a *learned* attention) with a learned graph fusion head. | **DATA-GATED** (per ADR-152 measurement (b): architecture is **NOT** the current bottleneck — **data is**) | **ACCEPTED-future, data-gated. Do NOT build now.** ADR-152's measured lesson was that swapping architecture without more/better paired data does not move PCK. Building a learned fusion head before the data exists would repeat the mistake ADR-155 §5 also flagged for GraphPose-Fi. |
 | — | **Cramér-Rao / sensor-placement** (`geometry.rs` CRB) | Investigated for a 2026 advance beating the textbook Fisher-information CRB already implemented. | **Investigated — NO ACTION** | **Cleared honestly.** No 2026 method beats the closed-form Fisher-information CRB for this 2-D bearing problem; our implementation is already correct SOTA. (Recording a negative result is a deliberate anti-slop signal.) The only CRB change this milestone is the §2.3 *GDOP* honesty fix, which is a labelling/quantity correction, not an algorithmic one. |
@@ -138,7 +138,7 @@ The double-clone elimination is also correctness-neutral: all 100 `viewpoint`/`m

 The review surfaced more than this milestone scoped. Tracked here for a future ADR-156 milestone:

- **SymphonyQG reproduction** (§5 #1) — reproduce the 3.5–17× QPS-over-HNSW claim on our hardware before integrating into the ruvector ANN path. Currently CLAIMED-only.
+- **SymphonyQG reproduction** (§5 #1) — **RESOLVED-DIRECTION-TESTED** (see [ADR-261](ADR-261-ruvector-graph-ann-index.md)). The missing HNSW baseline + a SymphonyQG-style 1-bit quantized-traversal variant were built and **MEASURED**: float HNSW is ~25× over linear scan at recall ≥0.99 (the baseline this gap needed), but our 1-bit quantized traversal is **too coarse to beat float HNSW at equal recall at N=10k** (best recall 0.738) — the 3.5–17× is **not reproduced** by our construction. Honest negative recorded; expected crossover is large N + a multi-bit traversal code. (Caveat: our HNSW + our 1-bit quant, not SymphonyQG's exact system.)
 - **Multi-bit / Extended RaBitQ** (§5 #2) — **RESOLVED-PARTIAL** (see §10). Pass-2 randomized rotation (FHT + seeded ±1 sign flips, `src/rotation.rs`) and a multi-bit Pass-3 experiment landed and were MEASURED against the ADR-084 ≥90% bar. **Honest result: rotation helps (+10pp at the strict bar) and Pass-2 reaches 90% with ~3× over-fetch, but NEITHER rotation nor multi-bit (up to 4-bit) clears the strict candidate_k==K 90% bar on the tested anisotropic distribution.** The original `1-bit sign quantization ships first; rotation/more-bits later if benchmark-measured top-K coverage drops below 90%` deferral is therefore retired: the rotation is built, the bar is characterised, and the residual gap is documented rather than deferred.
 - **Learned cross-viewpoint fusion head** (§5 #3, GraphPose-Fi-style) — **data-gated**: blocked on the paired multi-room data ADR-152 measurement (b) identified as the real bottleneck; do not build the architecture first.
 - **`CrossViewpointAttention` learned projections** — the default `ProjectionWeights::identity` + mean-pool is honest but unlearned; wiring real learned Q/K/V projections is part of the data-gated item above (no learned weights ⇒ the "attention" is currently a geometric-bias-weighted average, which the code/docs should keep stating plainly).
@@ -0,0 +1,172 @@
+# ADR-261: RuVector Graph-ANN Index — a real HNSW baseline + a SymphonyQG-style quantized variant, MEASURED
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted |
+| **Date** | 2026-06-14 |
+| **Deciders** | ruv |
+| **Codebase target** | `wifi-densepose-ruvector` — `hnsw.rs`, `hnsw_quantized.rs`, `ann_measure.rs`, `benches/ann_bench.rs`, docs |
+| **Relates to** | ADR-084 (RaBitQ similarity sensor — 1-bit sketch), ADR-156 (RuVector beyond-SOTA sweep — §5 #1 SymphonyQG, §8/§10/§11 RaBitQ Pass-2/multi-bit/estimator), ADR-024 (AETHER re-ID), ADR-016/017 (RuVector integration) |
+| **Scope** | Build the **missing HNSW graph-ANN baseline** in the ruvector retrieval path, build a **SymphonyQG-style quantized-traversal variant** on the same graph, and **MEASURE** the real recall/QPS ratio between them — closing the ADR-156 §5 #1 gap honestly. Resolves ADR-156 §8 backlog item **"SymphonyQG reproduction"** from **CLAIMED-only** to **MEASURED-direction-tested**. |
+
+---
+
+## 0. PROOF discipline (this ADR's contract)
+
+This project has been publicly accused of "AI slop." This ADR answers with **evidence, not adjectives** — the same contract as ADR-154/156:
+
+- The HNSW index ships a **committed recall@10 correctness gate** (≥ 0.95 vs brute force on a planted-cluster fixture). Low recall means a graph bug; the gate is wired to fail in that case. It **did** fail first — and caught a real index-out-of-bounds bug in the insert path (§4) — which is exactly what a real gate is for.
+- Every QPS/recall number below is **MEASURED** on this box with a committed, deterministic, `--no-default-features`-runnable measurement (`src/ann_measure.rs`, `ann_bench_report`) and a committed criterion bench (`benches/ann_bench.rs`). Both call **one** shared fixture/measurement module, so the bench and the report can never measure different graphs.
+- The **headline result is an honest negative**: at our test scale the SymphonyQG-style quantized variant **does not beat float HNSW at equal recall** — the 1-bit Hamming traversal is too coarse to keep recall up. We report the real numbers, explain *why*, and state the expected large-N crossover. **We did not tune the quantized path to manufacture the 3.5–17× the source claims.** A measured negative + a scale caveat is a valid, publishable result.
+- We are explicit that this is **OUR HNSW + OUR 1-bit quantization, not SymphonyQG's exact system**. It tests the **direction** of the claim on our hardware/data, not a 1:1 reproduction.
+
+Test machine: Windows 11, `cargo test --release`, `std::time::Instant` wall-clock. Numbers are warm medians on this box; the **ratio** is the claim, not the absolute QPS.
+
+Reproduce:
+```bash
+cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --release \
+  ann_bench_report -- --nocapture
+# Larger N: ANN_BENCH_N=50000 cargo test ... --release ann_bench_report -- --nocapture
+cargo bench -p wifi-densepose-ruvector --bench ann_bench
+```
+
+---
+
+## 1. Context
+
+The ruvector crate's retrieval path — AETHER re-ID hot-cache (ADR-024), the `sketch.rs` 1-bit prefilter (ADR-084), room fingerprinting — is, at its core, an **approximate nearest-neighbour (ANN)** problem: dense float embedding in, top-K similar ids out. But **the crate had no graph index**. Every `topk` was either a linear scan (`O(N·d)` per query) or a 1-bit Hamming prefilter over a linear scan. That is `O(N)` per query and does not scale.
+
+[ADR-156 §5 #1](ADR-156-ruvector-fusion-beyond-sota.md) graded **SymphonyQG** (SIGMOD 2025) the **lead beyond-SOTA ANN candidate**, citing the source's claim of **3.5–17× QPS over HNSW at equal recall**, but marked it **CLAIMED**:
+
+> *"author-measured; **not reproduced on our hardware** — reproduction is future work."*
+
+And ADR-156 §8 was blunt about *why* it could not be reproduced: **there was no HNSW baseline to compare against.** You cannot measure a ratio against a baseline that does not exist. This ADR builds that missing baseline, builds the quantized variant that tests the direction of the SymphonyQG bet, and measures the real ratio.
+
+---
+
+## 2. Decision
+
+1. Add a correct, dependency-free **float HNSW** graph index (`hnsw.rs`): the real Malkov & Yashunin (TPAMI 2018) algorithm — multi-layer navigable small-world graph, `ef_construction` / `ef_search`, the Algorithm-4 neighbour-selection heuristic, seeded-deterministic level assignment, L2 + cosine. This is the **baseline** ADR-156 said was missing.
+2. Add a **SymphonyQG-style quantized-traversal variant** (`hnsw_quantized.rs`): the *same* graph (same seed, same structure), but the beam search scores candidates with a **cheap 1-bit Hamming distance** over the RaBitQ Pass-2 rotated sign code (reusing `rotation.rs` + the sign-quantization of `sketch.rs`), then **exact-float reranks** the final candidate set. This is the SymphonyQG bet — cheaper per-node scoring, recovered by a final exact rerank.
+3. **Measure** linear vs float-HNSW vs quantized-HNSW (recall@10, QPS, equal-recall ratios) on one deterministic planted-cluster fixture, and record the honest verdict against the SymphonyQG 3.5–17× claim.
+
+### Why 1-bit Hamming for the quantized traversal
+
+The crate already had the exact pieces SymphonyQG fuses: a deterministic orthogonal rotation (`rotation.rs`, RaBitQ Pass-2) and sign-quantization (`sketch.rs`). A 1-bit code compares by POPCNT Hamming — a few machine words, no per-dimension float work — so it is the cheapest possible traversal score and the most direct test of "can a quantized score keep the beam on the right path." The cost (measured below): the 1-bit code is a *coarse* angle proxy (ADR-156 §10 measured ~46% strict-K coverage for sign-only), and that coarseness is what limits recall here.
+
+---
+
+## 3. Design
+
+### 3.1 `hnsw.rs` — float HNSW (the baseline)
+
+- **Graph.** `links[id][layer]` adjacency; layer 0 holds every node, higher layers exponentially sparser. `m_max` is `2·M` on layer 0, `M` above (the paper's asymmetric degree cap).
+- **Insert.** Greedy-descend the upper layers to a good entry point, then for each layer from the node's level down to 0: `search_layer` for `ef_construction` candidates, `select_neighbours` (Algorithm 4 — keep a candidate only if it is closer to the new node than to any already-selected neighbour, giving diverse navigable edges), wire bidirectional edges, re-prune any neighbour that overflows `m_max`. The node is pushed into the arrays **before** wiring so every `links[*]` index is valid mid-insert (§4 — the bug the gate caught).
+- **Search.** Greedy-descend layers `>0`, then best-first beam search of width `ef` on layer 0; return the closest `k`. Iterative (explicit heaps + visited set) — **no recursion**, bounded by the beam and the visited set.
+- **Determinism.** Level assignment is the only randomness and is driven by a **seeded SplitMix64** (the exact pattern from `rotation.rs`) — never `Date::now`/OS RNG/unseeded `rand`. Same `(seed, params, insertion order)` ⇒ bit-identical graph and search (pinned by `hnsw_is_deterministic_for_seed`).
+- **Robustness.** Empty index, `k==0`, `k>n`, single node, zero-dim, ragged query, `ef<k` all return cleanly — pinned by `*_no_panic` tests.
+
+### 3.2 `hnsw_quantized.rs` — the SymphonyQG-style variant
+
+Same graph as the float index (identical seed/structure — the **only** variable is the scoring), plus a per-node `ceil(D/8)`-byte 1-bit Pass-2 sign code (`D = next_pow2(dim)`). `search_quantized(query, k, ef, rerank)`:
+1. Encode the query to its 1-bit code (one rotation + sign pack).
+2. Greedy-descend + beam-search the graph scoring every visited node by **POPCNT Hamming** (query-code XOR node-code) — no per-dim float work.
+3. **Exact-float rerank** the top `rerank` Hamming candidates with the true L2/cosine metric, return the best `k`.
+
+### 3.3 Security / robustness
+
+Both indices: bounded **iterative** traversal (no unbounded recursion), no panic on empty/degenerate/ragged/zero-dim input (the metric compares over the shorter prefix; zero-norm cosine returns max distance, not NaN). The 1-bit encode handles padded dims via the existing `Rotation::apply_padded`.
+
+---
+
+## 4. The bug the correctness gate caught (disclosed, not hidden)
+
+The first run of the recall@10 gate **panicked**: `index out of bounds: the len is 33 but the index is 33` in `search_layer`. Root cause: `insert` wired bidirectional edges (`links[nbr][l].push(id)`) **before** pushing the new node's own `links[id]` row into the array. A later traversal step in the *same* insert could hop to a neighbour that now pointed at `id` and read `links[id]` — which did not exist yet. Fix: push the node (with empty per-layer link lists) into `vectors`/`links`/`levels` **up front**, then wire edges into its existing slot. The new node has no incoming edges and empty outgoing lists until wiring, so it is unreachable by the searches that run first — pushing early is safe and keeps every index valid. This is exactly why the recall gate exists: a silent low-recall graph and an out-of-bounds panic are both "slop" the gate forces into the open.
+
+---
+
+## 5. The SymphonyQG claim being tested
+
+| Source | Claim | Grade (before this ADR) |
+|--------|-------|-------------------------|
+| SymphonyQG, SIGMOD 2025 | **3.5–17× QPS over HNSW at equal recall**, via quantization unified with graph traversal, pure-CPU/edge-portable | **CLAIMED** — author-measured, *not reproduced on our hardware (no HNSW baseline existed)* |
+
+The bet: a quantized traversal score is cheap enough — and accurate enough to keep the beam on-path — that you pay far less per visited node and recover the small recall loss with a final exact rerank.
+
+---
+
+## 6. MEASURED results
+
+Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 queries, K=10, noise=0.35**, L2 metric, `M=16`, `ef_construction=200`. Graph seed `0x6261524741484E53`, rotation seed `0x5EEDC0DE12345678`. `--release`, warm wall-clock on the test machine. (The fixture and both indices are shared by the criterion bench.)
+
+| Method | recall@10 | QPS | latency (µs) |
+|--------|-----------|-----|--------------|
+| **linear scan (brute force)** | 1.0000 | 1,022 | 978 |
+| **float-HNSW** ef=16 | 0.9945 | **25,744** | 39 |
+| float-HNSW ef=32 | 0.9990 | 21,470 | 47 |
+| float-HNSW ef=64 | 1.0000 | 18,779 | 53 |
+| float-HNSW ef=128 | 1.0000 | 12,722 | 79 |
+| float-HNSW ef=256 | 1.0000 | 5,742 | 174 |
+| quant-HNSW ef=32 rr=20 | 0.1620 | 30,005 | 33 |
+| quant-HNSW ef=32 rr=100 | 0.2615 | 36,388 | 28 |
+| quant-HNSW ef=64 rr=100 | 0.4865 | 20,603 | 49 |
+| quant-HNSW ef=128 rr=100 | 0.6785 | 13,718 | 73 |
+| quant-HNSW ef=256 rr=100 | **0.7380** | 6,578 | 152 |
+
+### Equal-recall QPS ratios
+
+| Target recall | Fastest float-HNSW | Fastest quant-HNSW meeting it | quant/float | float/linear |
+|---------------|--------------------|-------------------------------|-------------|--------------|
+| ≥ 0.90 | ef=16 → 25,744 QPS | **none** (best quant recall = 0.738) | — | **25.19×** |
+| ≥ 0.95 | ef=16 → 25,744 QPS | **none** | — | **25.19×** |
+| ≥ 0.99 | ef=16 → 25,744 QPS | **none** | — | **25.19×** |
+
+---
+
+## 7. Honest verdict
+
+**The HNSW baseline is a decisive win over linear scan: ~25× QPS at recall ≥ 0.99** (ef=16: 0.9945 recall, 25,744 QPS vs linear 1,022 QPS). The correctness gate (recall@10 ≥ 0.95 vs brute force, both L2 and cosine) holds. This is the baseline ADR-156 §5 #1 said did not exist — it now does.
+
+**The SymphonyQG-style quantized variant does NOT beat float HNSW at our scale — direction REFUTED at N=10k.** The 1-bit Hamming traversal is too coarse: its best achievable recall is **0.738** (ef=256, rr=100), and it never reaches even the 0.90 equal-recall point where a fair QPS comparison could be made. Where the quantized score *is* faster (ef=32: ~30–36k QPS, beating float's 25.7k), its recall collapses to 0.16–0.26 — a meaningless win. There is **no equal-recall operating point** at which quantized is faster, so the SymphonyQG 3.5–17× claim is **not reproduced** by our 1-bit construction here.
+
+**Why** (so the negative is understood, not just stated):
+1. The 1-bit sign code is a **coarse angle proxy** — ADR-156 §10 already measured it at ~46% strict-K coverage. Driving graph *traversal* by that coarse score steers the beam onto the wrong nodes, and the exact-float rerank can only recover what the beam actually visited. At N=10k, near-neighbours have nearly-identical sign codes, so Hamming cannot separate them.
+2. At this scale **float distance is already cheap**: one 128-d L2 is a handful of µs; the per-node float compute the quantization saves is small relative to the recall it costs. SymphonyQG's win shows up at **much larger N** (millions), where (a) the float-distance fraction of query time dominates and (b) their *multi-bit RaBitQ-fused* code (not our 1-bit sign code) keeps recall high. **Expected crossover: large N + a higher-bit code.** ADR-156 §10 already measured that a ≤4-bit code reaches ~74% strict coverage vs 1-bit's ~46%, so a multi-bit traversal score is the obvious next lever — deferred, not claimed.
+
+**Caveat (stated plainly):** this is **our** HNSW + **our** 1-bit quantization, not SymphonyQG's system. We tested the *direction* of the claim ("does quantized traversal + rerank beat float HNSW at equal recall?") on our hardware/data and got a **measured no at N=10k**. That neither confirms nor refutes SymphonyQG's own published numbers on their system/scale — it refutes the direction *for our construction at our scale*, and identifies the two levers (scale, code bit-depth) a real reproduction would need.
+
+---
+
+## 8. Validation
+
+- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **151 passed / 0 failed** (was 131; +20 new tests: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`).
+- **`cargo test --workspace --no-default-features`** — GREEN (see §10 for the count).
+- **Correctness gate verified to bite:** the recall@10 gate **panicked** on the first (buggy) insert path (§4); after the fix it passes at 0.99+ recall (L2 and cosine).
+- **`cargo test -p wifi-densepose-ruvector --no-default-features --release ann_bench_report -- --nocapture`** — prints the §6 table; the numbers above are copied verbatim from that run.
+- **`cargo bench -p wifi-densepose-ruvector --bench ann_bench`** — compiles and runs the same fixture through criterion.
+- **`python archive/v1/data/proof/verify.py`** — **VERDICT: PASS** (the Rust ANN work is independent of the Python signal-proof pipeline; hash unchanged).
+
+---
+
+## 9. Consequences
+
+**Positive.** ruvector now has a real, deterministic, pure-Rust HNSW graph index (25× over linear scan at high recall) usable by the AETHER re-ID / sketch-prefilter path — the ANN substrate ADR-156 §5 #1 wanted. The SymphonyQG claim is no longer CLAIMED-only: we built the missing baseline and **measured** the direction, with the bug-caught-by-the-gate disclosed.
+
+**Negative / honest.** The 1-bit quantized variant is **not** an equal-recall QPS win at our scale; it is shipped as a measured experiment with a clearly-stated ceiling, not as a recommended default. Anyone reaching for it must read §7.
+
+**Deferred (not silently dropped).**
+- **Multi-bit / RaBitQ-estimator traversal score.** Replace 1-bit Hamming traversal with a ≤4-bit code or the `estimator.rs` unbiased rescale (ADR-156 §10/§11) — the lever most likely to lift quantized recall to the equal-recall regime.
+- **Large-N crossover measurement.** Re-run §6 at N=100k–1M (`ANN_BENCH_N`) to find where quantization's per-node saving starts to dominate.
+- **Wiring HNSW into the live re-ID path** (AETHER hot-cache / sketch prefilter) behind a flag.
+
+---
+
+## 10. What changed, file by file
+
+- `hnsw.rs` (new) — float HNSW: graph, seeded-deterministic level assignment, Algorithm-2 beam search, Algorithm-4 neighbour selection, L2/cosine, brute-force ground truth, full degenerate-case guards; 10 tests incl. the recall@10 correctness gate (L2 + cosine) and determinism. The insert-order bug fix (§4).
+- `hnsw_quantized.rs` (new) — SymphonyQG-style quantized-traversal index over the shared graph: 1-bit Pass-2 code per node, Hamming-scored greedy + beam, exact-float rerank; 7 tests incl. the rerank-recall gate and determinism.
+- `ann_measure.rs` (new) — shared deterministic fixture + recall/QPS measurement for linear / float-HNSW / quant-HNSW, the `ann_bench_report` test (the §6 source of truth), `ANN_BENCH_N` override.
+- `benches/ann_bench.rs` (new) + `Cargo.toml` `[[bench]]` — criterion bench over the same fixture/indices.
+- `lib.rs` — `pub mod hnsw / hnsw_quantized / ann_measure`; re-export `HnswIndex`, `HnswParams`, `Metric`, `QuantizedHnswIndex`.
+- `ADR-156-ruvector-fusion-beyond-sota.md` §5 #1 + §8 backlog — SymphonyQG regraded **CLAIMED → MEASURED-direction-tested (refuted at N=10k for our 1-bit construction)**, pointing here.
+- `CHANGELOG.md` — `[Unreleased]` entry.
@@ -47,3 +47,7 @@ harness = false
 [[bench]]
 name = "fusion_bench"
 harness = false
+
+[[bench]]
+name = "ann_bench"
+harness = false
@@ -0,0 +1,74 @@
+//! Criterion bench for the ADR-261 graph-ANN index: linear scan vs float HNSW
+//! vs quantized HNSW, on the shared `ann_measure` fixture.
+//!
+//! The authoritative recall/QPS numbers in ADR-261 come from the
+//! `--no-default-features --release` test report
+//! (`ann_bench_report` in `src/ann_measure.rs`), which is deterministic and
+//! gate-runnable. This criterion bench times the same operations through the
+//! criterion harness for stable per-op medians:
+//!
+//! ```text
+//! cargo bench -p wifi-densepose-ruvector --bench ann_bench
+//! ```
+//!
+//! Build is excluded from the timed region (done once in setup); only the query
+//! path is measured. The fixture and both indices are identical to the report's,
+//! so the bench and the report can never measure different graphs.
+
+use criterion::{black_box, criterion_group, criterion_main, Criterion};
+use wifi_densepose_ruvector::ann_measure::{build_indices, queries, AnnBenchParams};
+
+fn bench_ann(c: &mut Criterion) {
+    // Modest N so the bench builds quickly; the report covers the larger N.
+    let p = AnnBenchParams::default_fixture(10_000);
+    let (float_idx, quant_idx, _v) = build_indices(p);
+    let qs = queries(p);
+    let k = p.k;
+
+    let mut group = c.benchmark_group("ann_query");
+    group.sample_size(20);
+
+    // Linear scan (brute force) — the no-index baseline.
+    group.bench_function("linear_scan", |b| {
+        b.iter(|| {
+            let mut sink = 0u64;
+            for q in &qs {
+                sink = sink.wrapping_add(float_idx.brute_force(black_box(q), k).len() as u64);
+            }
+            black_box(sink)
+        })
+    });
+
+    // Float HNSW at a mid beam width.
+    for &ef in &[64usize, 128] {
+        group.bench_function(format!("float_hnsw_ef{ef}"), |b| {
+            b.iter(|| {
+                let mut sink = 0u64;
+                for q in &qs {
+                    sink = sink.wrapping_add(float_idx.search(black_box(q), k, ef).len() as u64);
+                }
+                black_box(sink)
+            })
+        });
+    }
+
+    // Quantized HNSW at matched beam widths + rerank.
+    for &ef in &[64usize, 128] {
+        let rr = k * 5;
+        group.bench_function(format!("quant_hnsw_ef{ef}_rr{rr}"), |b| {
+            b.iter(|| {
+                let mut sink = 0u64;
+                for q in &qs {
+                    sink = sink
+                        .wrapping_add(quant_idx.search_quantized(black_box(q), k, ef, rr).len() as u64);
+                }
+                black_box(sink)
+            })
+        });
+    }
+
+    group.finish();
+}
+
+criterion_group!(benches, bench_ann);
+criterion_main!(benches);
@@ -0,0 +1,400 @@
+//! Deterministic, `--no-default-features`-runnable **ANN benchmark measurement**
+//! for ADR-261 — the single source of truth for the QPS/recall numbers the ADR
+//! quotes for **linear scan**, **float HNSW**, and **quantized HNSW**.
+//!
+//! Both the criterion bench (`benches/ann_bench.rs`) and the in-crate report test
+//! ([`tests::ann_bench_report`]) call into here, so they can never silently
+//! measure different things. The numbers in ADR-261 §6 come from running:
+//!
+//! ```text
+//! cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --release \
+//!   ann_bench_report -- --nocapture
+//! ```
+//!
+//! # What is measured, and the honesty contract
+//!
+//! On one fixed planted-cluster fixture (documented dim/N/K/seed), for each
+//! method we measure:
+//! - **recall@10** vs the brute-force exact top-10 (the ground truth),
+//! - **QPS** = queries / total wall-clock query time (warm; build excluded),
+//! at matched recall operating points found by sweeping `ef` (HNSW) and
+//! `(ef, rerank)` (quantized).
+//!
+//! The reported **ratio** is the claim, not the absolute QPS (which is
+//! machine-specific). We do **not** tune the quantized path to manufacture a
+//! win: if at our scale quantized does not beat float HNSW, the report says so
+//! and the ADR records the honest negative + the expected larger-N crossover.
+
+use std::collections::HashSet;
+use std::time::Instant;
+
+use crate::hnsw::{HnswIndex, HnswParams, Metric};
+use crate::hnsw_quantized::QuantizedHnswIndex;
+
+/// SplitMix64 — the crate-wide deterministic PRNG (mirrors `coverage.rs`).
+#[inline]
+fn split_mix64(state: &mut u64) -> u64 {
+    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
+    let mut z = *state;
+    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
+    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
+    z ^ (z >> 31)
+}
+#[inline]
+fn unif01(state: &mut u64) -> f32 {
+    ((split_mix64(state) >> 40) as f32) / ((1u64 << 24) as f32)
+}
+#[inline]
+fn gauss(state: &mut u64) -> f32 {
+    let u1 = unif01(state).max(1e-7);
+    let u2 = unif01(state);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// ANN benchmark fixture parameters, documented in the ADR-261 report.
+#[derive(Debug, Clone, Copy)]
+pub struct AnnBenchParams {
+    /// Embedding dimension.
+    pub dim: usize,
+    /// Number of indexed vectors (N).
+    pub n: usize,
+    /// Number of planted clusters (near-neighbour structure).
+    pub clusters: usize,
+    /// Number of queries timed.
+    pub n_queries: usize,
+    /// Top-K.
+    pub k: usize,
+    /// Intra-cluster Gaussian jitter.
+    pub noise: f32,
+    /// Master fixture seed.
+    pub seed: u64,
+    /// Graph construction/level seed.
+    pub graph_seed: u64,
+    /// Rotation seed for the quantized 1-bit codes.
+    pub rot_seed: u64,
+}
+
+impl AnnBenchParams {
+    /// The default ADR-261 fixture: AETHER-shape 128-d, planted clusters.
+    pub fn default_fixture(n: usize) -> Self {
+        Self {
+            dim: 128,
+            n,
+            clusters: 64,
+            n_queries: 200,
+            k: 10,
+            noise: 0.35,
+            seed: 0xADADADAD_0000_0261,
+            graph_seed: 0x6261_5247_4148_4E53,
+            rot_seed: 0x5EED_C0DE_1234_5678,
+        }
+    }
+}
+
+/// The fixture vectors for `p` (deterministic planted clusters).
+pub fn fixture(p: AnnBenchParams) -> Vec<Vec<f32>> {
+    let centres: Vec<Vec<f32>> = (0..p.clusters)
+        .map(|c| {
+            let mut s = p.seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+            (0..p.dim).map(|_| gauss(&mut s) * 3.0).collect()
+        })
+        .collect();
+    (0..p.n)
+        .map(|i| {
+            let c = i % p.clusters;
+            let mut s = p.seed ^ (i as u64).wrapping_mul(0x9E37);
+            (0..p.dim)
+                .map(|d| centres[c][d] + gauss(&mut s) * p.noise)
+                .collect()
+        })
+        .collect()
+}
+
+/// The timed query set for `p` (drawn from the same clusters, disjoint seed).
+pub fn queries(p: AnnBenchParams) -> Vec<Vec<f32>> {
+    let centres: Vec<Vec<f32>> = (0..p.clusters)
+        .map(|c| {
+            let mut s = p.seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+            (0..p.dim).map(|_| gauss(&mut s) * 3.0).collect()
+        })
+        .collect();
+    (0..p.n_queries)
+        .map(|q| {
+            let c = q % p.clusters;
+            let mut s = p.seed ^ 0xDEAD_0000_0000 ^ (q as u64).wrapping_mul(0x2545_F491);
+            (0..p.dim)
+                .map(|d| centres[c][d] + gauss(&mut s) * p.noise)
+                .collect()
+        })
+        .collect()
+}
+
+/// Per-method measurement: recall@K and QPS.
+#[derive(Debug, Clone, Copy)]
+pub struct MethodResult {
+    /// Mean recall@K vs brute-force ground truth.
+    pub recall: f64,
+    /// Queries per second (warm wall-clock).
+    pub qps: f64,
+    /// Mean query latency in microseconds.
+    pub latency_us: f64,
+}
+
+/// Ground-truth brute-force top-K id sets for every query (computed once).
+/// Public so the criterion bench and the report test share one definition.
+pub fn ground_truth(idx: &HnswIndex, queries: &[Vec<f32>], k: usize) -> Vec<HashSet<u32>> {
+    queries
+        .iter()
+        .map(|q| idx.brute_force(q, k).into_iter().map(|(id, _)| id).collect())
+        .collect()
+}
+
+/// Measure **linear scan** (brute force): recall is 1.0 by definition; QPS is the
+/// timed exact scan. This is the no-index baseline.
+pub fn measure_linear(
+    idx: &HnswIndex,
+    queries: &[Vec<f32>],
+    truth: &[HashSet<u32>],
+    k: usize,
+) -> MethodResult {
+    let mut recall_acc = 0.0f64;
+    let start = Instant::now();
+    let mut sink = 0u64;
+    for (qi, q) in queries.iter().enumerate() {
+        let got = idx.brute_force(q, k);
+        let hit = got.iter().filter(|(id, _)| truth[qi].contains(id)).count();
+        recall_acc += hit as f64 / k as f64;
+        sink = sink.wrapping_add(got.len() as u64);
+    }
+    let elapsed = start.elapsed().as_secs_f64();
+    std::hint::black_box(sink);
+    MethodResult {
+        recall: recall_acc / queries.len() as f64,
+        qps: queries.len() as f64 / elapsed,
+        latency_us: elapsed / queries.len() as f64 * 1e6,
+    }
+}
+
+/// Measure **float HNSW** at a given beam width `ef`.
+pub fn measure_float_hnsw(
+    idx: &HnswIndex,
+    queries: &[Vec<f32>],
+    truth: &[HashSet<u32>],
+    k: usize,
+    ef: usize,
+) -> MethodResult {
+    let mut recall_acc = 0.0f64;
+    let start = Instant::now();
+    let mut sink = 0u64;
+    for (qi, q) in queries.iter().enumerate() {
+        let got = idx.search(q, k, ef);
+        let hit = got.iter().filter(|(id, _)| truth[qi].contains(id)).count();
+        recall_acc += hit as f64 / k as f64;
+        sink = sink.wrapping_add(got.len() as u64);
+    }
+    let elapsed = start.elapsed().as_secs_f64();
+    std::hint::black_box(sink);
+    MethodResult {
+        recall: recall_acc / queries.len() as f64,
+        qps: queries.len() as f64 / elapsed,
+        latency_us: elapsed / queries.len() as f64 * 1e6,
+    }
+}
+
+/// Measure **quantized HNSW** at a given `(ef, rerank)`.
+pub fn measure_quantized_hnsw(
+    qidx: &QuantizedHnswIndex,
+    queries: &[Vec<f32>],
+    truth: &[HashSet<u32>],
+    k: usize,
+    ef: usize,
+    rerank: usize,
+) -> MethodResult {
+    let mut recall_acc = 0.0f64;
+    let start = Instant::now();
+    let mut sink = 0u64;
+    for (qi, q) in queries.iter().enumerate() {
+        let got = qidx.search_quantized(q, k, ef, rerank);
+        let hit = got.iter().filter(|(id, _)| truth[qi].contains(id)).count();
+        recall_acc += hit as f64 / k as f64;
+        sink = sink.wrapping_add(got.len() as u64);
+    }
+    let elapsed = start.elapsed().as_secs_f64();
+    std::hint::black_box(sink);
+    MethodResult {
+        recall: recall_acc / queries.len() as f64,
+        qps: queries.len() as f64 / elapsed,
+        latency_us: elapsed / queries.len() as f64 * 1e6,
+    }
+}
+
+/// Build both indices for `p` (shared insertion order + graph seed so the float
+/// and quantized graphs are identical — the only variable is scoring).
+pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec<Vec<f32>>) {
+    let vectors = fixture(p);
+    let params = HnswParams {
+        m: 16,
+        ef_construction: 200,
+        ef_search: 64,
+        seed: p.graph_seed,
+    };
+    let mut float_idx = HnswIndex::new(p.dim, Metric::L2, params);
+    for v in &vectors {
+        float_idx.insert(v);
+    }
+    let quant_idx =
+        QuantizedHnswIndex::build(&vectors, p.dim, Metric::L2, params, p.rot_seed, p.k * 4);
+    (float_idx, quant_idx, vectors)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn fixture_and_queries_are_deterministic() {
+        let p = AnnBenchParams::default_fixture(500);
+        assert_eq!(fixture(p), fixture(p));
+        assert_eq!(queries(p), queries(p));
+        let p2 = AnnBenchParams {
+            seed: p.seed ^ 1,
+            ..p
+        };
+        assert_ne!(fixture(p)[0], fixture(p2)[0]);
+    }
+
+    #[test]
+    fn linear_recall_is_one() {
+        // Linear scan IS the ground truth, so recall must be exactly 1.0.
+        let p = AnnBenchParams::default_fixture(800);
+        let (float_idx, _q, _v) = build_indices(p);
+        let qs = queries(p);
+        let truth = ground_truth(&float_idx, &qs, p.k);
+        let r = measure_linear(&float_idx, &qs, &truth, p.k);
+        assert!((r.recall - 1.0).abs() < 1e-9, "linear recall {} != 1.0", r.recall);
+        assert!(r.qps > 0.0);
+    }
+
+    /// The ADR-261 measurement report. Prints the linear / float-HNSW /
+    /// quantized-HNSW recall@10 + QPS table and the QPS ratios at matched recall.
+    /// Run with `--release --nocapture` for the numbers the ADR quotes.
+    #[test]
+    fn ann_bench_report() {
+        // N here is the small/CI-friendly default so the standard (debug) test
+        // gate stays fast; the ADR's headline numbers are taken at the larger N
+        // under --release (documented in the ADR with the exact command). This
+        // test asserts only structural invariants so it is gate-safe at any N.
+        let n: usize = std::env::var("ANN_BENCH_N")
+            .ok()
+            .and_then(|s| s.parse().ok())
+            .unwrap_or(10_000);
+        let p = AnnBenchParams::default_fixture(n);
+        let (float_idx, quant_idx, _v) = build_indices(p);
+        let qs = queries(p);
+        let truth = ground_truth(&float_idx, &qs, p.k);
+
+        println!("\n=== ADR-261 ANN benchmark (planted-cluster synthetic) ===");
+        println!(
+            "dim={} N={} clusters={} queries={} K={} noise={} graph_seed=0x{:X} rot_seed=0x{:X}",
+            p.dim, p.n, p.clusters, p.n_queries, p.k, p.noise, p.graph_seed, p.rot_seed
+        );
+        println!("metric=L2  M=16 ef_construction=200  (debug build unless --release)");
+        println!(
+            "{:<28} {:>9} {:>12} {:>12}",
+            "method", "recall@10", "QPS", "lat(us)"
+        );
+
+        let lin = measure_linear(&float_idx, &qs, &truth, p.k);
+        println!(
+            "{:<28} {:>8.4} {:>12.1} {:>12.1}",
+            "linear scan (brute)", lin.recall, lin.qps, lin.latency_us
+        );
+
+        // Float HNSW across an ef sweep.
+        let mut float_ops: Vec<(usize, MethodResult)> = Vec::new();
+        for &ef in &[16usize, 32, 64, 128, 256] {
+            let r = measure_float_hnsw(&float_idx, &qs, &truth, p.k, ef);
+            println!(
+                "{:<28} {:>8.4} {:>12.1} {:>12.1}",
+                format!("float-HNSW ef={ef}"),
+                r.recall,
+                r.qps,
+                r.latency_us
+            );
+            float_ops.push((ef, r));
+        }
+
+        // Quantized HNSW across (ef, rerank) sweep.
+        let mut quant_ops: Vec<((usize, usize), MethodResult)> = Vec::new();
+        for &ef in &[32usize, 64, 128, 256] {
+            for &rr in &[p.k * 2, p.k * 5, p.k * 10] {
+                let r = measure_quantized_hnsw(&quant_idx, &qs, &truth, p.k, ef, rr);
+                println!(
+                    "{:<28} {:>8.4} {:>12.1} {:>12.1}",
+                    format!("quant-HNSW ef={ef} rr={rr}"),
+                    r.recall,
+                    r.qps,
+                    r.latency_us
+                );
+                quant_ops.push(((ef, rr), r));
+            }
+        }
+
+        // Equal-recall comparison: pick, for a target recall, the FASTEST op of
+        // each method that meets it, then report the QPS ratios.
+        println!("\n--- equal-recall QPS ratios ---");
+        for &target in &[0.90f64, 0.95, 0.99] {
+            let best_float = float_ops
+                .iter()
+                .filter(|(_, r)| r.recall >= target)
+                .max_by(|a, b| a.1.qps.partial_cmp(&b.1.qps).unwrap());
+            let best_quant = quant_ops
+                .iter()
+                .filter(|(_, r)| r.recall >= target)
+                .max_by(|a, b| a.1.qps.partial_cmp(&b.1.qps).unwrap());
+            match (best_float, best_quant) {
+                (Some((fef, fr)), Some(((qef, qrr), qr))) => {
+                    let ratio = qr.qps / fr.qps;
+                    let hnsw_vs_lin = fr.qps / lin.qps;
+                    println!(
+                        "recall>={:.2}: float ef={} {:.0} QPS | quant ef={} rr={} {:.0} QPS | quant/float={:.2}x | float/linear={:.2}x",
+                        target, fef, fr.qps, qef, qrr, qr.qps, ratio, hnsw_vs_lin
+                    );
+                }
+                (Some((fef, fr)), None) => {
+                    let hnsw_vs_lin = fr.qps / lin.qps;
+                    println!(
+                        "recall>={:.2}: float ef={} {:.0} QPS | quant: NO op met this recall | float/linear={:.2}x",
+                        target, fef, fr.qps, hnsw_vs_lin
+                    );
+                }
+                _ => {
+                    println!("recall>={:.2}: neither method met this recall at the swept ops", target);
+                }
+            }
+        }
+        println!("=========================================================\n");
+
+        // Structural assertions (gate-safe, any N):
+        // - linear scan is exact,
+        // - the best float-HNSW op clears the correctness gate,
+        // - quantized's best op is at least useful (recall well above random).
+        assert!((lin.recall - 1.0).abs() < 1e-9);
+        let best_float_recall = float_ops.iter().map(|(_, r)| r.recall).fold(0.0, f64::max);
+        assert!(
+            best_float_recall >= 0.95,
+            "best float-HNSW recall {best_float_recall:.4} below 0.95 gate"
+        );
+        let best_quant_recall = quant_ops.iter().map(|(_, r)| r.recall).fold(0.0, f64::max);
+        // Honest floor: the 1-bit Hamming traversal is a COARSE angle proxy, so
+        // at large N its best recall lands well below the float gate (MEASURED
+        // ~0.74 at N=10k — see ADR-261 §6). We assert only that it is clearly
+        // useful (>> random: random top-10 of N=10k is ~0.001), which catches a
+        // fully-broken traversal/rerank without pretending the quantized variant
+        // matches float HNSW. The honest negative IS the result.
+        assert!(
+            best_quant_recall >= 0.30,
+            "best quant-HNSW recall {best_quant_recall:.4} below the 0.30 not-broken floor"
+        );
+    }
+}
@@ -0,0 +1,826 @@
+//! A correct, dependency-free **float HNSW** graph-ANN index — ADR-261.
+//!
+//! # Why this exists
+//!
+//! The ruvector crate's retrieval path (AETHER re-ID hot-cache, the `sketch.rs`
+//! 1-bit prefilter, room fingerprinting) is, at its core, an **approximate
+//! nearest-neighbour** problem: dense float embedding in, top-K similar ids out.
+//! Until now the crate had **no graph index** — every `topk` was a linear scan
+//! (`O(N·d)` per query) or a 1-bit Hamming prefilter over a linear scan. That is
+//! fine at the small N the unit fixtures use, but it is `O(N)` per query and does
+//! not scale.
+//!
+//! [ADR-156 §5 #1](../../../../../docs/adr/ADR-156-ruvector-fusion-beyond-sota.md)
+//! lists **SymphonyQG** (SIGMOD 2025) as the lead beyond-SOTA ANN candidate,
+//! claiming **3.5–17× QPS over HNSW at equal recall** — but graded that claim
+//! **CLAIMED**, *"not reproduced on our hardware (no HNSW baseline exists to
+//! compare against)."* You cannot measure a ratio against a baseline you do not
+//! have. This module **builds that missing HNSW baseline**; [`crate::hnsw_quantized`]
+//! builds the quantized-rerank variant that tests the *direction* of the
+//! SymphonyQG bet. ADR-261 reports the **measured** ratio.
+//!
+//! # The algorithm (Malkov & Yashunin, TPAMI 2018)
+//!
+//! HNSW = a multi-layer navigable small-world graph. Each inserted point gets a
+//! random **level** `ℓ` (geometrically distributed, mean `1/ln(M)`); it appears
+//! in all layers `0..=ℓ`. Layer 0 holds every point; higher layers are
+//! exponentially sparser "express lanes". A search:
+//!
+//! 1. Enters at the top layer's single entry point.
+//! 2. **Greedy-descends** each layer above 0: repeatedly hop to the neighbour
+//!    closest to the query until no neighbour is closer, then drop a layer.
+//! 3. At layer 0, runs a **best-first beam search** with beam width `ef`,
+//!    keeping the `ef` closest candidates seen, and returns the closest `k`.
+//!
+//! Construction inserts each point by searching for its `ef_construction`
+//! nearest existing neighbours at each of its layers, then connecting it to a
+//! pruned subset chosen by the **neighbour-selection heuristic** (Algorithm 4 in
+//! the paper): prefer neighbours that are closer to the new point than to any
+//! already-selected neighbour, which keeps the graph navigable (diverse edges)
+//! instead of clumping all edges toward one cluster.
+//!
+//! # Determinism (the proof contract)
+//!
+//! Level assignment is the only randomness, and it is driven by a **seeded
+//! SplitMix64** PRNG (the exact pattern from [`crate::rotation`]) — never
+//! `Date::now`, an OS RNG, or `rand` without a seed. Two indices built from the
+//! same `(seed, params, insertion order)` are bit-identical, pinned by
+//! [`tests::hnsw_is_deterministic_for_seed`]. This matters for reproducible
+//! benchmarks: the recall/QPS numbers in ADR-261 must be regenerable.
+//!
+//! # Robustness (no panic on degenerate input)
+//!
+//! Empty index, `k > n`, `k == 0`, a single node, zero-dimension vectors,
+//! ragged-length queries, and `ef < k` are all handled without panicking —
+//! pinned by the `*_no_panic` / degenerate tests. Graph traversal is bounded by
+//! the visited-set and the candidate beam, so there is no unbounded recursion
+//! (the search is iterative, using explicit heaps).
+
+use std::cmp::Ordering;
+use std::collections::{BinaryHeap, HashSet};
+
+/// Distance metric for the index. Both are computed over `Vec<f32>` with an
+/// `f64` accumulator for numerical stability on long vectors.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum Metric {
+    /// Squared euclidean distance `Σ (a_i − b_i)²`. Monotone in euclidean
+    /// distance, so top-K ranking is identical; we skip the sqrt.
+    L2,
+    /// Cosine **distance** `1 − cos(a, b)`. Smaller = more similar. This is
+    /// AETHER's actual angular metric and what the `sketch.rs` sign code
+    /// approximates, so it is the default for ruvector re-ID.
+    Cosine,
+}
+
+impl Metric {
+    /// Distance between two equal-length slices under this metric.
+    ///
+    /// Ragged lengths are handled charitably (compared over the shorter prefix);
+    /// a degenerate (zero-norm) cosine input yields the maximum cosine distance
+    /// `1.0` rather than a NaN. Never panics.
+    #[inline]
+    pub fn distance(self, a: &[f32], b: &[f32]) -> f32 {
+        let n = a.len().min(b.len());
+        match self {
+            Metric::L2 => {
+                let mut acc = 0.0f64;
+                for i in 0..n {
+                    let d = a[i] as f64 - b[i] as f64;
+                    acc += d * d;
+                }
+                acc as f32
+            }
+            Metric::Cosine => {
+                let mut dot = 0.0f64;
+                let mut na = 0.0f64;
+                let mut nb = 0.0f64;
+                for i in 0..n {
+                    let (x, y) = (a[i] as f64, b[i] as f64);
+                    dot += x * y;
+                    na += x * x;
+                    nb += y * y;
+                }
+                let denom = (na * nb).sqrt();
+                if denom < 1e-12 {
+                    1.0
+                } else {
+                    (1.0 - dot / denom) as f32
+                }
+            }
+        }
+    }
+}
+
+/// Construction / search hyper-parameters for an [`HnswIndex`].
+///
+/// Defaults follow the paper's recommended starting points (`M = 16`,
+/// `ef_construction = 200`). `ef_search` is the query-time beam width; larger
+/// `ef_search` trades QPS for recall — the knob the ADR-261 benchmark sweeps to
+/// find the equal-recall operating point.
+#[derive(Debug, Clone, Copy)]
+pub struct HnswParams {
+    /// Max neighbours per node on layers ≥ 1. Layer 0 uses `2·M` (`m_max0`),
+    /// the paper's standard asymmetry (the base layer needs higher degree).
+    pub m: usize,
+    /// Candidate list size during construction (`efConstruction`). Larger =
+    /// better-connected graph, slower build.
+    pub ef_construction: usize,
+    /// Default beam width at query time (`ef`). Overridable per-query in
+    /// [`HnswIndex::search`].
+    pub ef_search: usize,
+    /// Seed for the level-assignment PRNG. Fixed ⇒ reproducible graph.
+    pub seed: u64,
+}
+
+impl Default for HnswParams {
+    fn default() -> Self {
+        Self {
+            m: 16,
+            ef_construction: 200,
+            ef_search: 64,
+            seed: 0x1157_0000_0000_0001u64,
+        }
+    }
+}
+
+/// A min-distance ordering wrapper: a `BinaryHeap<Candidate>` is a **max-heap**,
+/// so we negate the comparison to make `peek()` the *closest* candidate when we
+/// want a min-heap, or use it directly for a max-heap of the *farthest*. We keep
+/// two explicit newtypes to make the intent unmistakable at each call site.
+#[derive(Debug, Clone, Copy)]
+struct Scored {
+    dist: f32,
+    id: u32,
+}
+
+impl PartialEq for Scored {
+    fn eq(&self, other: &Self) -> bool {
+        self.dist == other.dist && self.id == other.id
+    }
+}
+impl Eq for Scored {}
+
+/// Max-heap ordering: larger `dist` is "greater" ⇒ at the top. Ties broken by
+/// id so the order is total and deterministic.
+impl Ord for Scored {
+    fn cmp(&self, other: &Self) -> Ordering {
+        self.dist
+            .partial_cmp(&other.dist)
+            .unwrap_or(Ordering::Equal)
+            .then(self.id.cmp(&other.id))
+    }
+}
+impl PartialOrd for Scored {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+/// `Reverse`-equivalent for a min-heap (closest at top) without pulling in
+/// `std::cmp::Reverse` boilerplate at every site.
+#[derive(Debug, Clone, Copy)]
+struct MinScored(Scored);
+impl PartialEq for MinScored {
+    fn eq(&self, other: &Self) -> bool {
+        self.0 == other.0
+    }
+}
+impl Eq for MinScored {}
+impl Ord for MinScored {
+    fn cmp(&self, other: &Self) -> Ordering {
+        other.0.cmp(&self.0) // reversed
+    }
+}
+impl PartialOrd for MinScored {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+/// A multi-layer HNSW graph index over dense `Vec<f32>` embeddings.
+///
+/// IDs are the **insertion index** (`0..len`), returned by [`HnswIndex::search`]
+/// alongside the distance. The original vectors are retained (the graph needs
+/// them for distance computation at query time), so memory is
+/// `O(N·d) + O(N·M)` — the float vectors plus the adjacency lists.
+#[derive(Debug, Clone)]
+pub struct HnswIndex {
+    metric: Metric,
+    params: HnswParams,
+    dim: usize,
+    /// Stored vectors, indexed by id.
+    vectors: Vec<Vec<f32>>,
+    /// `links[id][layer]` = neighbour ids of `id` on `layer`. A node of level
+    /// `ℓ` has `ℓ+1` layers (`0..=ℓ`).
+    links: Vec<Vec<Vec<u32>>>,
+    /// Per-node top level.
+    levels: Vec<usize>,
+    /// Current entry point id (the highest-level node), or `None` if empty.
+    entry: Option<u32>,
+    /// Highest level currently present in the graph.
+    top_level: usize,
+    /// PRNG state for level assignment (advances per insert).
+    rng_state: u64,
+}
+
+impl HnswIndex {
+    /// Create an empty index with the given metric and parameters.
+    ///
+    /// `dim` is the expected embedding dimension. Inserts of a different length
+    /// are accepted charitably (the metric compares over the shorter prefix), so
+    /// a wrong-length vector degrades recall rather than panicking — but callers
+    /// should keep dimension uniform.
+    pub fn new(dim: usize, metric: Metric, params: HnswParams) -> Self {
+        Self {
+            metric,
+            params,
+            dim,
+            vectors: Vec::new(),
+            links: Vec::new(),
+            levels: Vec::new(),
+            entry: None,
+            top_level: 0,
+            rng_state: params.seed.wrapping_add(0x9E37_79B9_7F4A_7C15),
+        }
+    }
+
+    /// Number of indexed points.
+    #[inline]
+    pub fn len(&self) -> usize {
+        self.vectors.len()
+    }
+
+    /// True iff the index holds no points.
+    #[inline]
+    pub fn is_empty(&self) -> bool {
+        self.vectors.is_empty()
+    }
+
+    /// The metric this index ranks by.
+    #[inline]
+    pub fn metric(&self) -> Metric {
+        self.metric
+    }
+
+    /// The expected embedding dimension.
+    #[inline]
+    pub fn dim(&self) -> usize {
+        self.dim
+    }
+
+    /// The current entry-point id (highest-level node), or `None` if empty.
+    /// Exposed so the quantized variant ([`crate::hnsw_quantized`]) can traverse
+    /// the **same** graph with a different (quantized) score.
+    #[inline]
+    pub fn entry_point(&self) -> Option<u32> {
+        self.entry
+    }
+
+    /// The highest level currently present in the graph.
+    #[inline]
+    pub fn top_level(&self) -> usize {
+        self.top_level
+    }
+
+    /// The default query-time beam width (`ef_search`) from this index's params.
+    #[inline]
+    pub fn params_ef_search(&self) -> usize {
+        self.params.ef_search
+    }
+
+    /// Borrow the neighbour ids of `id` on `layer`. Returns an empty slice if the
+    /// id is unknown or the node does not reach that layer — never panics. Used
+    /// by the quantized variant to walk the shared graph.
+    #[inline]
+    pub fn neighbours(&self, id: u32, layer: usize) -> &[u32] {
+        match self.links.get(id as usize).and_then(|l| l.get(layer)) {
+            Some(v) => v.as_slice(),
+            None => &[],
+        }
+    }
+
+    /// `m_max` for a layer: `2·M` on layer 0, `M` above. The base layer carries
+    /// every node and needs higher degree to stay connected (the paper's
+    /// asymmetric degree cap).
+    #[inline]
+    fn m_max(&self, layer: usize) -> usize {
+        if layer == 0 {
+            self.params.m * 2
+        } else {
+            self.params.m
+        }
+    }
+
+    /// Draw the next node's level from a geometric distribution with parameter
+    /// `m_l = 1/ln(M)` — the paper's level generator — using the **seeded**
+    /// SplitMix64 stream. `floor(−ln(U) · m_l)` with `U ∈ (0, 1]`.
+    fn assign_level(&mut self) -> usize {
+        let m = self.params.m.max(2) as f64;
+        let m_l = 1.0 / m.ln();
+        // Uniform in (0, 1] from the top 53 bits of a SplitMix64 word.
+        let r = split_mix64(&mut self.rng_state);
+        let u = (((r >> 11) as f64) + 1.0) / ((1u64 << 53) as f64 + 1.0);
+        let level = (-(u.ln()) * m_l).floor();
+        if level.is_finite() && level >= 0.0 {
+            level as usize
+        } else {
+            0
+        }
+    }
+
+    /// Insert `embedding` with the next sequential id. Returns the assigned id.
+    ///
+    /// Builds the node's adjacency by searching the existing graph for its
+    /// nearest neighbours at each of its layers and connecting via the
+    /// neighbour-selection heuristic. The first insert becomes the entry point.
+    pub fn insert(&mut self, embedding: &[f32]) -> u32 {
+        let id = self.vectors.len() as u32;
+        let vec = embedding.to_vec();
+        let node_level = self.assign_level();
+
+        // Push the node into the arrays UP FRONT with empty per-layer link lists.
+        // This is load-bearing: the bidirectional wiring below does
+        // `self.links[nbr][l].push(id)`, after which a neighbour points at `id`;
+        // a subsequent traversal step in the SAME insert can hop to that
+        // neighbour and read `self.links[id]`. If `id`'s links did not exist yet
+        // that read panics (the bug the recall gate caught). The new node has no
+        // *incoming* edges until we add them, and empty outgoing lists, so it is
+        // unreachable by the searches that run before its edges are wired —
+        // pushing it early is safe and keeps every `self.links[*]` index valid.
+        self.vectors.push(vec.clone());
+        self.links.push(vec![Vec::new(); node_level + 1]);
+        self.levels.push(node_level);
+
+        // First node: it is the entry point, no neighbours to connect.
+        if self.entry.is_none() {
+            self.entry = Some(id);
+            self.top_level = node_level;
+            return id;
+        }
+
+        let entry = self.entry.unwrap();
+        let mut ep = entry;
+
+        // Phase 1: greedy-descend from the top of the graph down to the layer
+        // just above the node's own top level, refining the single entry point.
+        let mut layer = self.top_level;
+        while layer > node_level {
+            ep = self.greedy_closest(&vec, ep, layer);
+            if layer == 0 {
+                break;
+            }
+            layer -= 1;
+        }
+
+        // Phase 2: from min(node_level, top_level) down to 0, search for
+        // ef_construction candidates, select neighbours, and wire bidirectional
+        // edges (pruning the neighbour's list if it overflows m_max).
+        let start = node_level.min(self.top_level);
+        let mut layer = start as isize;
+        while layer >= 0 {
+            let l = layer as usize;
+            let candidates =
+                self.search_layer(&vec, &[ep], self.params.ef_construction.max(1), l);
+            let selected = self.select_neighbours(&vec, &candidates, self.m_max(l));
+
+            // Connect node -> selected (write straight into the node's slot).
+            self.links[id as usize][l] = selected.iter().map(|s| s.id).collect();
+
+            // Connect selected -> node (bidirectional), pruning if needed.
+            for s in &selected {
+                let nbr = s.id as usize;
+                self.links[nbr][l].push(id);
+                if self.links[nbr][l].len() > self.m_max(l) {
+                    self.prune_neighbours(nbr as u32, l);
+                }
+            }
+
+            // Move the entry for the next-lower layer to the closest candidate.
+            if let Some(best) = candidates
+                .iter()
+                .min_by(|a, b| a.dist.partial_cmp(&b.dist).unwrap_or(Ordering::Equal))
+            {
+                ep = best.id;
+            }
+            layer -= 1;
+        }
+
+        if node_level > self.top_level {
+            self.top_level = node_level;
+            self.entry = Some(id);
+        }
+        id
+    }
+
+    /// Greedy single-best descent on one layer: hop to the neighbour closest to
+    /// `query` until no neighbour improves. Iterative (bounded by the graph) —
+    /// no recursion.
+    fn greedy_closest(&self, query: &[f32], start: u32, layer: usize) -> u32 {
+        let mut best = start;
+        let mut best_d = self.metric.distance(query, &self.vectors[best as usize]);
+        loop {
+            let mut improved = false;
+            for &nbr in &self.links[best as usize][layer] {
+                let d = self.metric.distance(query, &self.vectors[nbr as usize]);
+                if d < best_d {
+                    best_d = d;
+                    best = nbr;
+                    improved = true;
+                }
+            }
+            if !improved {
+                return best;
+            }
+        }
+    }
+
+    /// Beam search on one layer (paper Algorithm 2): best-first expansion from
+    /// `entry_points`, keeping the `ef` closest results. Returns the result set
+    /// (unsorted; callers sort/truncate). Bounded by a visited set + the `ef`
+    /// result heap — no recursion, no unbounded growth.
+    fn search_layer(
+        &self,
+        query: &[f32],
+        entry_points: &[u32],
+        ef: usize,
+        layer: usize,
+    ) -> Vec<Scored> {
+        let mut visited: HashSet<u32> = HashSet::new();
+        // `candidates`: min-heap (closest first) of nodes to expand.
+        let mut candidates: BinaryHeap<MinScored> = BinaryHeap::new();
+        // `results`: max-heap (farthest first) of the best-ef found so far, so
+        // the top is the current worst and is cheap to evict.
+        let mut results: BinaryHeap<Scored> = BinaryHeap::new();
+
+        for &ep in entry_points {
+            if ep as usize >= self.vectors.len() {
+                continue;
+            }
+            let d = self.metric.distance(query, &self.vectors[ep as usize]);
+            let s = Scored { dist: d, id: ep };
+            visited.insert(ep);
+            candidates.push(MinScored(s));
+            results.push(s);
+        }
+        // Cap results at ef from the start.
+        while results.len() > ef {
+            results.pop();
+        }
+
+        while let Some(MinScored(cur)) = candidates.pop() {
+            // Stop when the closest unexpanded candidate is farther than the
+            // current worst result and the result set is already full.
+            let worst = results.peek().map(|s| s.dist).unwrap_or(f32::INFINITY);
+            if cur.dist > worst && results.len() >= ef {
+                break;
+            }
+            for &nbr in &self.links[cur.id as usize][layer] {
+                if !visited.insert(nbr) {
+                    continue;
+                }
+                let d = self.metric.distance(query, &self.vectors[nbr as usize]);
+                let worst = results.peek().map(|s| s.dist).unwrap_or(f32::INFINITY);
+                if results.len() < ef || d < worst {
+                    let s = Scored { dist: d, id: nbr };
+                    candidates.push(MinScored(s));
+                    results.push(s);
+                    while results.len() > ef {
+                        results.pop();
+                    }
+                }
+            }
+        }
+        results.into_vec()
+    }
+
+    /// Neighbour-selection heuristic (paper Algorithm 4): from `candidates`,
+    /// greedily pick up to `m` that are **closer to the new point than to any
+    /// already-picked neighbour**, giving diverse, navigable edges instead of a
+    /// clump. Candidates are considered nearest-first.
+    fn select_neighbours(&self, _base: &[f32], candidates: &[Scored], m: usize) -> Vec<Scored> {
+        let mut sorted = candidates.to_vec();
+        sorted.sort_by(|a, b| a.dist.partial_cmp(&b.dist).unwrap_or(Ordering::Equal));
+        let mut selected: Vec<Scored> = Vec::with_capacity(m);
+        for cand in sorted {
+            if selected.len() >= m {
+                break;
+            }
+            // Keep `cand` only if it is closer to `base` than to every already
+            // selected neighbour — the diversity condition.
+            let cand_vec = &self.vectors[cand.id as usize];
+            let mut keep = true;
+            for sel in &selected {
+                let d_cand_sel = self.metric.distance(cand_vec, &self.vectors[sel.id as usize]);
+                if d_cand_sel < cand.dist {
+                    keep = false;
+                    break;
+                }
+            }
+            if keep {
+                selected.push(cand);
+            }
+        }
+        // If the diversity filter left us short (sparse graph), backfill with the
+        // remaining nearest candidates so the node is not under-connected.
+        if selected.len() < m {
+            let chosen: HashSet<u32> = selected.iter().map(|s| s.id).collect();
+            let mut rest: Vec<Scored> = candidates
+                .iter()
+                .filter(|c| !chosen.contains(&c.id))
+                .copied()
+                .collect();
+            rest.sort_by(|a, b| a.dist.partial_cmp(&b.dist).unwrap_or(Ordering::Equal));
+            for c in rest {
+                if selected.len() >= m {
+                    break;
+                }
+                selected.push(c);
+            }
+        }
+        selected
+    }
+
+    /// Re-prune a node's neighbour list on `layer` back down to `m_max` using
+    /// the selection heuristic, after a bidirectional edge pushed it over cap.
+    fn prune_neighbours(&mut self, id: u32, layer: usize) {
+        let base = self.vectors[id as usize].clone();
+        let current: Vec<Scored> = self.links[id as usize][layer]
+            .iter()
+            .map(|&nbr| Scored {
+                dist: self.metric.distance(&base, &self.vectors[nbr as usize]),
+                id: nbr,
+            })
+            .collect();
+        let kept = self.select_neighbours(&base, &current, self.m_max(layer));
+        self.links[id as usize][layer] = kept.iter().map(|s| s.id).collect();
+    }
+
+    /// Search for the `k` nearest neighbours of `query`, using beam width `ef`
+    /// (clamped to at least `k`). Returns up to `k` `(id, distance)` pairs sorted
+    /// ascending by distance.
+    ///
+    /// Degenerate cases return cleanly: empty index ⇒ empty vec; `k == 0` ⇒ empty
+    /// vec; `k > len` ⇒ all points; a single node ⇒ that node. Never panics.
+    pub fn search(&self, query: &[f32], k: usize, ef: usize) -> Vec<(u32, f32)> {
+        if k == 0 || self.is_empty() {
+            return Vec::new();
+        }
+        let entry = match self.entry {
+            Some(e) => e,
+            None => return Vec::new(),
+        };
+        let ef = ef.max(k).max(1);
+
+        // Greedy-descend the upper layers to a good layer-0 entry point.
+        let mut ep = entry;
+        let mut layer = self.top_level;
+        while layer > 0 {
+            ep = self.greedy_closest(query, ep, layer);
+            layer -= 1;
+        }
+        // Beam search on layer 0.
+        let mut results = self.search_layer(query, &[ep], ef, 0);
+        results.sort_by(|a, b| a.dist.partial_cmp(&b.dist).unwrap_or(Ordering::Equal));
+        results.truncate(k);
+        results.into_iter().map(|s| (s.id, s.dist)).collect()
+    }
+
+    /// Search using the index's configured default `ef_search`.
+    #[inline]
+    pub fn search_default(&self, query: &[f32], k: usize) -> Vec<(u32, f32)> {
+        self.search(query, k, self.params.ef_search)
+    }
+
+    /// Borrow a stored vector by id (for the quantized variant / reranking).
+    #[inline]
+    pub fn vector(&self, id: u32) -> Option<&[f32]> {
+        self.vectors.get(id as usize).map(|v| v.as_slice())
+    }
+
+    /// Brute-force exact top-K linear scan over the stored vectors — the ANN
+    /// **ground truth** and the linear-scan baseline the benchmark measures
+    /// against. `O(N·d)` per query. Returns up to `k` `(id, distance)` ascending.
+    pub fn brute_force(&self, query: &[f32], k: usize) -> Vec<(u32, f32)> {
+        if k == 0 || self.is_empty() {
+            return Vec::new();
+        }
+        let mut scored: Vec<(u32, f32)> = self
+            .vectors
+            .iter()
+            .enumerate()
+            .map(|(i, v)| (i as u32, self.metric.distance(query, v)))
+            .collect();
+        scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal));
+        scored.truncate(k);
+        scored
+    }
+}
+
+/// SplitMix64 step — the same deterministic PRNG used by [`crate::rotation`].
+/// Public-domain (Sebastiano Vigna). Dependency-free and reproducible.
+#[inline]
+pub(crate) fn split_mix64(state: &mut u64) -> u64 {
+    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
+    let mut z = *state;
+    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
+    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
+    z ^ (z >> 31)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// SplitMix64-driven uniform in [0,1) for building fixtures (mirrors
+    /// `coverage.rs`'s style so the planted-cluster geometry matches).
+    fn unif01(state: &mut u64) -> f32 {
+        let r = split_mix64(state);
+        ((r >> 40) as f32) / ((1u64 << 24) as f32)
+    }
+    fn gauss(state: &mut u64) -> f32 {
+        let u1 = unif01(state).max(1e-7);
+        let u2 = unif01(state);
+        (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+    }
+
+    /// Build a planted-cluster fixture: `n` vectors of `dim`, in `clusters`
+    /// Gaussian clusters. Returns the vectors. Deterministic from `seed`.
+    fn planted(dim: usize, n: usize, clusters: usize, seed: u64) -> Vec<Vec<f32>> {
+        let centres: Vec<Vec<f32>> = (0..clusters)
+            .map(|c| {
+                let mut s = seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+                (0..dim).map(|_| gauss(&mut s) * 3.0).collect()
+            })
+            .collect();
+        (0..n)
+            .map(|i| {
+                let c = i % clusters;
+                let mut s = seed ^ (i as u64).wrapping_mul(0x9E37);
+                (0..dim).map(|d| centres[c][d] + gauss(&mut s) * 0.35).collect()
+            })
+            .collect()
+    }
+
+    fn build(vectors: &[Vec<f32>], metric: Metric, seed: u64) -> HnswIndex {
+        let params = HnswParams {
+            m: 16,
+            ef_construction: 200,
+            ef_search: 64,
+            seed,
+        };
+        let mut idx = HnswIndex::new(vectors[0].len(), metric, params);
+        for v in vectors {
+            idx.insert(v);
+        }
+        idx
+    }
+
+    /// Recall@k of HNSW search vs brute-force ground truth, averaged over queries
+    /// drawn from the same planted clusters.
+    fn recall_at_k(
+        idx: &HnswIndex,
+        vectors: &[Vec<f32>],
+        dim: usize,
+        clusters: usize,
+        k: usize,
+        ef: usize,
+        n_queries: usize,
+        seed: u64,
+    ) -> f64 {
+        let centres_seed = seed; // reuse fixture seed for matching cluster geometry
+        let mut total = 0.0f64;
+        for q in 0..n_queries {
+            let c = q % clusters;
+            let mut s = centres_seed ^ 0xDEAD_0000 ^ (q as u64).wrapping_mul(0x2545_F491);
+            // A query near cluster centre c: regenerate the centre then jitter.
+            let mut cs = centres_seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+            let centre: Vec<f32> = (0..dim).map(|_| gauss(&mut cs) * 3.0).collect();
+            let qv: Vec<f32> = (0..dim).map(|d| centre[d] + gauss(&mut s) * 0.35).collect();
+
+            let truth: HashSet<u32> = idx.brute_force(&qv, k).into_iter().map(|(id, _)| id).collect();
+            let got = idx.search(&qv, k, ef);
+            let hit = got.iter().filter(|(id, _)| truth.contains(id)).count();
+            total += hit as f64 / k as f64;
+            let _ = vectors;
+        }
+        total / n_queries as f64
+    }
+
+    #[test]
+    fn empty_index_search_is_empty_no_panic() {
+        let idx = HnswIndex::new(8, Metric::L2, HnswParams::default());
+        assert!(idx.is_empty());
+        assert!(idx.search(&[0.0; 8], 5, 16).is_empty());
+        assert!(idx.brute_force(&[0.0; 8], 5).is_empty());
+    }
+
+    #[test]
+    fn single_node_returns_itself() {
+        let mut idx = HnswIndex::new(4, Metric::L2, HnswParams::default());
+        let id = idx.insert(&[1.0, 2.0, 3.0, 4.0]);
+        assert_eq!(id, 0);
+        let r = idx.search(&[1.0, 2.0, 3.0, 4.0], 5, 16);
+        assert_eq!(r.len(), 1);
+        assert_eq!(r[0].0, 0);
+        assert!(r[0].1 < 1e-6);
+    }
+
+    #[test]
+    fn k_zero_and_k_gt_n_no_panic() {
+        let vectors = planted(16, 40, 4, 0xABCD);
+        let idx = build(&vectors, Metric::L2, 0x1234);
+        assert!(idx.search(&vectors[0], 0, 16).is_empty());
+        // k > n returns all n.
+        let r = idx.search(&vectors[0], 1000, 64);
+        assert_eq!(r.len(), 40);
+    }
+
+    #[test]
+    fn ragged_query_no_panic() {
+        let vectors = planted(16, 30, 3, 0x55);
+        let idx = build(&vectors, Metric::Cosine, 0x66);
+        // Short and long queries must not panic.
+        assert!(!idx.search(&[1.0, 2.0, 3.0], 3, 16).is_empty());
+        let long: Vec<f32> = (0..100).map(|i| i as f32).collect();
+        assert!(!idx.search(&long, 3, 16).is_empty());
+    }
+
+    #[test]
+    fn self_query_ranks_self_first() {
+        let vectors = planted(32, 200, 8, 0x77);
+        let idx = build(&vectors, Metric::L2, 0x88);
+        for &probe in &[0usize, 50, 137, 199] {
+            let r = idx.search(&vectors[probe], 1, 64);
+            assert_eq!(r.len(), 1);
+            assert_eq!(r[0].0, probe as u32, "self-query should return the stored self");
+        }
+    }
+
+    #[test]
+    fn hnsw_is_deterministic_for_seed() {
+        // Same (seed, params, insertion order) ⇒ identical level assignment and
+        // identical search output.
+        let vectors = planted(24, 150, 6, 0x2222);
+        let a = build(&vectors, Metric::Cosine, 0xFEED);
+        let b = build(&vectors, Metric::Cosine, 0xFEED);
+        assert_eq!(a.levels, b.levels, "level assignment must be deterministic");
+        let q = &vectors[42];
+        assert_eq!(a.search(q, 10, 64), b.search(q, 10, 64));
+        // A different seed (almost surely) changes the level structure.
+        let c = build(&vectors, Metric::Cosine, 0x1357);
+        assert_ne!(a.levels, c.levels, "different seed should change levels");
+    }
+
+    #[test]
+    fn recall_at_10_meets_correctness_gate_l2() {
+        // THE CORRECTNESS GATE (ADR-261): HNSW recall@10 vs brute-force must be
+        // >= 0.95 at a reasonable ef. Low recall ⇒ a bug in the graph.
+        let dim = 64;
+        let n = 2000;
+        let clusters = 32;
+        let seed = 0x9999;
+        let vectors = planted(dim, n, clusters, seed);
+        let idx = build(&vectors, Metric::L2, 0xAAAA);
+        let recall = recall_at_k(&idx, &vectors, dim, clusters, 10, 128, 64, seed);
+        assert!(
+            recall >= 0.95,
+            "HNSW recall@10 (L2) = {recall:.4} below the 0.95 correctness gate — graph bug"
+        );
+    }
+
+    #[test]
+    fn recall_at_10_meets_correctness_gate_cosine() {
+        let dim = 64;
+        let n = 2000;
+        let clusters = 32;
+        let seed = 0xBBBB;
+        let vectors = planted(dim, n, clusters, seed);
+        let idx = build(&vectors, Metric::Cosine, 0xCCCC);
+        let recall = recall_at_k(&idx, &vectors, dim, clusters, 10, 128, 64, seed);
+        assert!(
+            recall >= 0.95,
+            "HNSW recall@10 (cosine) = {recall:.4} below the 0.95 correctness gate — graph bug"
+        );
+    }
+
+    #[test]
+    fn higher_ef_does_not_reduce_recall() {
+        // Monotonicity sanity: more beam width should not hurt recall.
+        let dim = 48;
+        let vectors = planted(dim, 1000, 16, 0xD00D);
+        let idx = build(&vectors, Metric::L2, 0xE00E);
+        let lo = recall_at_k(&idx, &vectors, dim, 16, 10, 16, 48, 0xD00D);
+        let hi = recall_at_k(&idx, &vectors, dim, 16, 10, 128, 48, 0xD00D);
+        assert!(hi + 1e-9 >= lo, "recall dropped with larger ef: {lo:.3} -> {hi:.3}");
+    }
+
+    #[test]
+    fn zero_dim_no_panic() {
+        // Degenerate zero-dimension index: inserts and searches must not panic.
+        let mut idx = HnswIndex::new(0, Metric::Cosine, HnswParams::default());
+        idx.insert(&[]);
+        idx.insert(&[]);
+        let r = idx.search(&[], 2, 16);
+        assert_eq!(r.len(), 2);
+    }
+}
@@ -0,0 +1,466 @@
+//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261.
+//!
+//! # The SymphonyQG bet (what we are testing)
+//!
+//! [SymphonyQG (SIGMOD 2025)](../../../../../docs/adr/ADR-261-ruvector-graph-ann-index.md)
+//! unifies **quantization with graph traversal**: instead of computing the full
+//! float distance at every node the beam search visits (the cost that dominates
+//! float HNSW — one `O(d)` float dot/diff per visited node), it scores traversal
+//! candidates with a **cheap quantized distance** and only computes the exact
+//! float distance for the *final* candidate set, which it **reranks**. The bet:
+//! the quantized score is cheap enough — and accurate enough to keep the beam on
+//! the right path — that you visit roughly as many nodes but pay far less per
+//! node, and recover the small recall loss with a final exact rerank. Source
+//! reports **3.5–17× QPS over HNSW at equal recall**.
+//!
+//! # Our implementation (honest scope)
+//!
+//! We are **not** reproducing SymphonyQG's exact system (their RaBitQ-fused codes,
+//! their SIMD layout, their refined graph). We build the **direction** of the
+//! claim from the pieces this crate already has, so the comparison is
+//! apples-to-apples on *our* hardware:
+//!
+//! - **Same graph** as the float [`crate::HnswIndex`] — identical structure,
+//!   identical seed, identical level assignment. The *only* variable between the
+//!   float and quantized search is **how a candidate is scored during traversal**,
+//!   so any QPS/recall difference is attributable to the quantization, not to a
+//!   different graph.
+//! - **Quantized score = 1-bit Hamming over the RaBitQ Pass-2 rotated sign code**
+//!   ([`crate::rotation`] + the sign-quantization in [`crate::sketch`]). Each
+//!   node stores its `ceil(D/8)`-byte sign code (`D = next_pow2(dim)`). During
+//!   traversal we compare query-code vs node-code by **POPCNT Hamming** — a few
+//!   machine words, no per-dimension float work.
+//! - **Exact float rerank** of the final beam: the top `rerank` candidates by
+//!   Hamming are re-scored with the true float metric and the best `k` returned.
+//!
+//! This trades a small recall hit (the 1-bit code is a coarse angle proxy — the
+//! same ~46%-strict limitation ADR-156 §10 measured) for far cheaper per-node
+//! scoring, recovered by the float rerank. **Whether that nets a QPS win at our
+//! test scale is the measured question ADR-261 answers** — and at small N the
+//! float distance is cheap enough that the Hamming saving may not pay off. We
+//! report the real number, win or lose, and do not tune to manufacture a speedup.
+//!
+//! # Determinism & robustness
+//!
+//! The graph seed drives everything (level assignment), so the quantized index
+//! is as reproducible as the float one. Empty/degenerate inputs are guarded
+//! exactly as in [`crate::hnsw`] — no panic on empty index, `k > n`, `k == 0`,
+//! single node, ragged query, or zero dim.
+
+use std::cmp::Ordering;
+use std::collections::{BinaryHeap, HashSet};
+
+use crate::hnsw::{HnswIndex, HnswParams, Metric};
+use crate::rotation::Rotation;
+
+/// A 1-bit Pass-2 sign code for one vector, over the padded rotation length `D`.
+/// Stored as packed bytes; compared by POPCNT Hamming.
+#[derive(Debug, Clone)]
+struct Code {
+    bits: Vec<u8>,
+}
+
+impl Code {
+    /// Hamming distance to another code of the same length (popcount of XOR).
+    #[inline]
+    fn hamming(&self, other: &Code) -> u32 {
+        let n = self.bits.len().min(other.bits.len());
+        let mut acc = 0u32;
+        for i in 0..n {
+            acc += (self.bits[i] ^ other.bits[i]).count_ones();
+        }
+        acc
+    }
+}
+
+/// Build the packed 1-bit sign code of a rotated embedding over the padded
+/// length `D = rotation.padded_dim()`. Bit set ⇒ rotated coord ≥ 0.
+fn encode(embedding: &[f32], rotation: &Rotation) -> Code {
+    let rotated = rotation.apply_padded(embedding);
+    let d = rotated.len();
+    let mut bits = vec![0u8; d.div_ceil(8)];
+    for (i, &c) in rotated.iter().enumerate() {
+        if c >= 0.0 {
+            bits[i / 8] |= 1 << (7 - (i % 8));
+        }
+    }
+    Code { bits }
+}
+
+/// Min-heap node for the quantized beam (closest Hamming at the top).
+#[derive(Debug, Clone, Copy)]
+struct HScored {
+    /// Hamming distance (quantized score) — the traversal key.
+    ham: u32,
+    id: u32,
+}
+impl PartialEq for HScored {
+    fn eq(&self, other: &Self) -> bool {
+        self.ham == other.ham && self.id == other.id
+    }
+}
+impl Eq for HScored {}
+impl Ord for HScored {
+    fn cmp(&self, other: &Self) -> Ordering {
+        self.ham.cmp(&other.ham).then(self.id.cmp(&other.id))
+    }
+}
+impl PartialOrd for HScored {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+/// Reversed wrapper for a min-heap (smallest Hamming at the top).
+#[derive(Debug, Clone, Copy)]
+struct MinH(HScored);
+impl PartialEq for MinH {
+    fn eq(&self, other: &Self) -> bool {
+        self.0 == other.0
+    }
+}
+impl Eq for MinH {}
+impl Ord for MinH {
+    fn cmp(&self, other: &Self) -> Ordering {
+        other.0.cmp(&self.0)
+    }
+}
+impl PartialOrd for MinH {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+/// A SymphonyQG-style HNSW: the same graph as [`HnswIndex`], traversed by a
+/// **cheap 1-bit Hamming score**, with a final **exact-float rerank**.
+///
+/// Built by inserting the same vectors in the same order with the same seed as
+/// a float [`HnswIndex`], so the two indices share identical graph structure and
+/// only differ in how the beam is scored. The shared [`Rotation`] (seed + dim)
+/// is the index/query frame for the 1-bit codes.
+#[derive(Debug, Clone)]
+pub struct QuantizedHnswIndex {
+    /// The underlying graph (built with the float metric for exact rerank).
+    graph: HnswIndex,
+    /// Per-node 1-bit Pass-2 codes, indexed by id (parallel to graph vectors).
+    codes: Vec<Code>,
+    /// The rotation frame shared by index and query codes.
+    rotation: Rotation,
+    /// Number of final candidates to exact-float rerank (≥ k at query time).
+    default_rerank: usize,
+}
+
+impl QuantizedHnswIndex {
+    /// Build a quantized index over `vectors`, mirroring a float [`HnswIndex`]
+    /// built with the same `(dim, metric, params)` and insertion order. The
+    /// `rotation_seed` fixes the 1-bit code frame (index and query share it).
+    ///
+    /// `default_rerank` is how many top-Hamming candidates get an exact float
+    /// re-score before returning the best `k`; it is clamped to `≥ k` at query
+    /// time. A larger rerank recovers more recall at more float cost — the knob
+    /// that, alongside `ef`, sets the equal-recall operating point.
+    pub fn build(
+        vectors: &[Vec<f32>],
+        dim: usize,
+        metric: Metric,
+        params: HnswParams,
+        rotation_seed: u64,
+        default_rerank: usize,
+    ) -> Self {
+        let rotation = Rotation::new(rotation_seed, dim);
+        let mut graph = HnswIndex::new(dim, metric, params);
+        let mut codes = Vec::with_capacity(vectors.len());
+        for v in vectors {
+            graph.insert(v);
+            codes.push(encode(v, &rotation));
+        }
+        Self {
+            graph,
+            codes,
+            rotation,
+            default_rerank: default_rerank.max(1),
+        }
+    }
+
+    /// Number of indexed points.
+    #[inline]
+    pub fn len(&self) -> usize {
+        self.graph.len()
+    }
+
+    /// True iff empty.
+    #[inline]
+    pub fn is_empty(&self) -> bool {
+        self.graph.is_empty()
+    }
+
+    /// Borrow the underlying float graph (for shared-graph benchmark parity:
+    /// the float-HNSW baseline runs on *this* graph so the only variable is
+    /// scoring).
+    #[inline]
+    pub fn graph(&self) -> &HnswIndex {
+        &self.graph
+    }
+
+    /// The rerank width this index defaults to.
+    #[inline]
+    pub fn default_rerank(&self) -> usize {
+        self.default_rerank
+    }
+
+    /// SymphonyQG-style search: traverse the graph scoring candidates by **1-bit
+    /// Hamming**, collect a beam of `ef`, then **exact-float rerank** the top
+    /// `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
+    ///
+    /// Degenerate cases mirror [`HnswIndex::search`]: empty ⇒ empty; `k == 0` ⇒
+    /// empty; `k > n` ⇒ all; never panics.
+    pub fn search_quantized(
+        &self,
+        query: &[f32],
+        k: usize,
+        ef: usize,
+        rerank: usize,
+    ) -> Vec<(u32, f32)> {
+        if k == 0 || self.is_empty() {
+            return Vec::new();
+        }
+        let ef = ef.max(k).max(1);
+        let rerank = rerank.max(k);
+        let q_code = encode(query, &self.rotation);
+
+        // Entry point: the graph's entry (highest-level node).
+        let entry = match self.graph.entry_point() {
+            Some(e) => e,
+            None => return Vec::new(),
+        };
+
+        // Greedy-descend upper layers by Hamming, then beam-search layer 0.
+        let mut ep = entry;
+        let mut layer = self.graph.top_level();
+        while layer > 0 {
+            ep = self.greedy_hamming(&q_code, ep, layer);
+            layer -= 1;
+        }
+        let beam = self.beam_hamming(&q_code, ep, ef);
+
+        // Exact-float rerank of the top `rerank` Hamming candidates.
+        let mut cand: Vec<HScored> = beam;
+        cand.sort_by_key(|c| c.ham);
+        cand.truncate(rerank);
+        let mut reranked: Vec<(u32, f32)> = cand
+            .iter()
+            .filter_map(|c| {
+                self.graph
+                    .vector(c.id)
+                    .map(|v| (c.id, self.graph.metric().distance(query, v)))
+            })
+            .collect();
+        reranked.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal));
+        reranked.truncate(k);
+        reranked
+    }
+
+    /// Search using the index's default `ef` (from graph params) and rerank.
+    #[inline]
+    pub fn search_default(&self, query: &[f32], k: usize) -> Vec<(u32, f32)> {
+        self.search_quantized(query, k, self.graph.params_ef_search(), self.default_rerank)
+    }
+
+    /// Greedy single-best descent on a layer scored by Hamming.
+    fn greedy_hamming(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
+        let mut best = start;
+        let mut best_h = self.codes[best as usize].hamming(q_code);
+        loop {
+            let mut improved = false;
+            for &nbr in self.graph.neighbours(best, layer) {
+                let h = self.codes[nbr as usize].hamming(q_code);
+                if h < best_h {
+                    best_h = h;
+                    best = nbr;
+                    improved = true;
+                }
+            }
+            if !improved {
+                return best;
+            }
+        }
+    }
+
+    /// Beam search on layer 0 scored by Hamming. Returns the `ef` best-Hamming
+    /// nodes (unsorted). Iterative — bounded by the visited set + the ef beam.
+    fn beam_hamming(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
+        let mut visited: HashSet<u32> = HashSet::new();
+        let mut candidates: BinaryHeap<MinH> = BinaryHeap::new();
+        let mut results: BinaryHeap<HScored> = BinaryHeap::new(); // max-heap: worst at top
+
+        let h0 = self.codes[ep as usize].hamming(q_code);
+        let s0 = HScored { ham: h0, id: ep };
+        visited.insert(ep);
+        candidates.push(MinH(s0));
+        results.push(s0);
+
+        while let Some(MinH(cur)) = candidates.pop() {
+            let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
+            if cur.ham > worst && results.len() >= ef {
+                break;
+            }
+            for &nbr in self.graph.neighbours(cur.id, 0) {
+                if !visited.insert(nbr) {
+                    continue;
+                }
+                let h = self.codes[nbr as usize].hamming(q_code);
+                let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
+                if results.len() < ef || h < worst {
+                    let s = HScored { ham: h, id: nbr };
+                    candidates.push(MinH(s));
+                    results.push(s);
+                    while results.len() > ef {
+                        results.pop();
+                    }
+                }
+            }
+        }
+        results.into_vec()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn split_mix64(state: &mut u64) -> u64 {
+        *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
+        let mut z = *state;
+        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
+        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
+        z ^ (z >> 31)
+    }
+    fn unif01(state: &mut u64) -> f32 {
+        ((split_mix64(state) >> 40) as f32) / ((1u64 << 24) as f32)
+    }
+    fn gauss(state: &mut u64) -> f32 {
+        let u1 = unif01(state).max(1e-7);
+        let u2 = unif01(state);
+        (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+    }
+    fn planted(dim: usize, n: usize, clusters: usize, seed: u64) -> Vec<Vec<f32>> {
+        let centres: Vec<Vec<f32>> = (0..clusters)
+            .map(|c| {
+                let mut s = seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+                (0..dim).map(|_| gauss(&mut s) * 3.0).collect()
+            })
+            .collect();
+        (0..n)
+            .map(|i| {
+                let c = i % clusters;
+                let mut s = seed ^ (i as u64).wrapping_mul(0x9E37);
+                (0..dim).map(|d| centres[c][d] + gauss(&mut s) * 0.35).collect()
+            })
+            .collect()
+    }
+    fn params(seed: u64) -> HnswParams {
+        HnswParams {
+            m: 16,
+            ef_construction: 200,
+            ef_search: 64,
+            seed,
+        }
+    }
+
+    #[test]
+    fn empty_quantized_search_is_empty_no_panic() {
+        let idx = QuantizedHnswIndex::build(&[], 8, Metric::Cosine, params(1), 0x42, 16);
+        assert!(idx.is_empty());
+        assert!(idx.search_quantized(&[0.0; 8], 5, 16, 16).is_empty());
+    }
+
+    #[test]
+    fn single_node_quantized_returns_itself() {
+        let v = vec![vec![1.0, 2.0, 3.0, 4.0]];
+        let idx = QuantizedHnswIndex::build(&v, 4, Metric::L2, params(2), 0x7, 8);
+        let r = idx.search_quantized(&v[0], 3, 16, 8);
+        assert_eq!(r.len(), 1);
+        assert_eq!(r[0].0, 0);
+    }
+
+    #[test]
+    fn k_zero_and_k_gt_n_no_panic() {
+        let vectors = planted(16, 40, 4, 0xABCD);
+        let idx = QuantizedHnswIndex::build(&vectors, 16, Metric::L2, params(3), 0x9, 32);
+        assert!(idx.search_quantized(&vectors[0], 0, 16, 16).is_empty());
+        let r = idx.search_quantized(&vectors[0], 1000, 64, 64);
+        assert_eq!(r.len(), 40);
+    }
+
+    #[test]
+    fn ragged_query_no_panic() {
+        let vectors = planted(16, 30, 3, 0x55);
+        let idx = QuantizedHnswIndex::build(&vectors, 16, Metric::Cosine, params(4), 0xB, 16);
+        assert!(!idx.search_quantized(&[1.0, 2.0, 3.0], 3, 16, 16).is_empty());
+        let long: Vec<f32> = (0..100).map(|i| i as f32).collect();
+        assert!(!idx.search_quantized(&long, 3, 16, 16).is_empty());
+    }
+
+    #[test]
+    fn quantized_is_deterministic() {
+        let vectors = planted(32, 300, 8, 0x2468);
+        let a = QuantizedHnswIndex::build(&vectors, 32, Metric::Cosine, params(0xFEED), 0xC0DE, 32);
+        let b = QuantizedHnswIndex::build(&vectors, 32, Metric::Cosine, params(0xFEED), 0xC0DE, 32);
+        let q = &vectors[100];
+        assert_eq!(
+            a.search_quantized(q, 10, 64, 32),
+            b.search_quantized(q, 10, 64, 32),
+            "quantized search must be deterministic"
+        );
+    }
+
+    /// Recall@10 of quantized-HNSW vs brute-force ground truth, averaged over
+    /// queries. With an exact-float rerank, recall should be high (the rerank
+    /// repairs most of the 1-bit traversal's coarseness). This is the quantized
+    /// variant's correctness gate.
+    #[test]
+    fn quantized_recall_at_10_is_high_with_rerank() {
+        let dim = 64;
+        let n = 2000;
+        let clusters = 32;
+        let seed = 0x9999;
+        let vectors = planted(dim, n, clusters, seed);
+        // Generous rerank so the exact float repairs the coarse Hamming beam.
+        let idx = QuantizedHnswIndex::build(&vectors, dim, Metric::L2, params(0xAAAA), 0x5EED, 64);
+
+        let mut total = 0.0f64;
+        let n_queries = 64;
+        for q in 0..n_queries {
+            let c = q % clusters;
+            let mut cs = seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+            let centre: Vec<f32> = (0..dim).map(|_| gauss(&mut cs) * 3.0).collect();
+            let mut s = seed ^ 0xDEAD_0000 ^ (q as u64).wrapping_mul(0x2545_F491);
+            let qv: Vec<f32> = (0..dim).map(|d| centre[d] + gauss(&mut s) * 0.35).collect();
+            let truth: HashSet<u32> = idx
+                .graph()
+                .brute_force(&qv, 10)
+                .into_iter()
+                .map(|(id, _)| id)
+                .collect();
+            let got = idx.search_quantized(&qv, 10, 128, 64);
+            let hit = got.iter().filter(|(id, _)| truth.contains(id)).count();
+            total += hit as f64 / 10.0;
+        }
+        let recall = total / n_queries as f64;
+        // The 1-bit code is coarse, so we do not demand the float 0.95 gate here;
+        // but with a 64-wide rerank over an ef=128 beam it must be clearly useful
+        // (well above random). ADR-261 reports the exact number; this gate just
+        // catches a broken traversal/rerank.
+        assert!(
+            recall >= 0.80,
+            "quantized recall@10 = {recall:.4} too low — traversal or rerank bug"
+        );
+    }
+
+    #[test]
+    fn zero_dim_no_panic() {
+        let vectors = vec![vec![], vec![]];
+        let idx = QuantizedHnswIndex::build(&vectors, 0, Metric::Cosine, params(5), 0x1, 4);
+        let r = idx.search_quantized(&[], 2, 16, 4);
+        assert_eq!(r.len(), 2);
+    }
+}
@@ -28,9 +28,12 @@

 #[cfg(feature = "crv")]
 pub mod crv;
+pub mod ann_measure;
 pub mod coverage;
 pub mod estimator;
 pub mod event_log;
+pub mod hnsw;
+pub mod hnsw_quantized;
 pub mod mat;
 pub mod rotation;
 pub mod signal;
@@ -41,6 +44,8 @@ pub use estimator::{
    DistanceEstimator, EstimatorBank, EstimatorQuery, EstimatorSketch, SideInfo,
 };
 pub use event_log::{NoveltyEvent, PrivacyEventLog};
+pub use hnsw::{HnswIndex, HnswParams, Metric};
+pub use hnsw_quantized::QuantizedHnswIndex;
 pub use rotation::Rotation;
 pub use sketch::{
    Sketch, SketchBank, SketchError, WireSketch, WireSketchError, WIRE_SKETCH_FORMAT_VERSION,