feat(ADR-261 M2): multi-bit + large-N ANN scaling study: measured, no crossover (refutes M1 prediction) (#1066)

* feat(ADR-261): multi-bit (b∈{1,2,4}) quantized HNSW traversal + scaling harness

  Generalize the SymphonyQG-style quantized-traversal HNSW from 1-bit Hamming to a b-bit-per-dimension code (b ∈ {1,2,4}), mirroring ADR-156 §10's multi-bit RaBitQ scheme (rotate via FHT Pass-2, uniform mid-rise scalar quantizer over [-3,3], ranked by per-dim L1). b=1 is byte-for-byte the original construction (codes in {0,1} ⇒ L1 == Hamming), pinned by one_bit_build_bits_matches_legacy_build. Bytes/node scales linearly: 128-d → 16/32/64 B for b=1/2/4.

  - hnsw_quantized.rs: QuantizedHnswIndex::build_bits(...,bits,...), bits()/bytes_per_node() accessors, code-L1 greedy+beam traversal. build(...) kept as the b=1 backward-compatible entry point. +4 tests (multi-bit recall regression, bits clamp, bytes/node, legacy parity).
  - ann_measure.rs: build_indices_bits / build_quant_bits / run_scaling_study + best_float_op / best_quant_op; scaling_report (#[ignore], --release) and a CI-safe scaling_study_small_is_consistent.
  - ann_bench.rs: 2-bit and 4-bit quant criterion benches over the shared graph.

  ruvector lib 151 → 156 passed, 0 failed, 1 ignored (scaling_report).

  Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-261): record M2 multi-bit scaling study: measured, no crossover (refutes M1 prediction)

  Multi-bit (b∈{1,2,4}) quantized HNSW traversal + N∈{10k,100k,250k} scaling study, measured on this box. No crossover at any (N,b): at 10k more bits help (ratio 0.19→0.48×, b≥2 reaches 0.90 recall) but quant stays slower than float HNSW at equal recall; at 100k/250k quant recall collapses (b=4: 1.0→0.788→0.624, never ≥0.90) while float holds ≥0.92. The predicted large-N crossover moved the wrong way. Published negative with the mechanism explained. ADR-261 §11.

  Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: ruv <ruvnet@gmail.com>
2026-06-15 11:13:20 +00:00 · 2026-06-14 10:31:00 -04:00
5 changed files with 628 additions and 84 deletions
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added
 - **ADR-261: RuVector graph-ANN index — a real HNSW baseline + a SymphonyQG-style quantized variant, MEASURED (honest negative).** Closes the [ADR-156 §5 #1](docs/adr/ADR-156-ruvector-fusion-beyond-sota.md) gap: the SymphonyQG (SIGMOD 2025) **3.5–17× QPS-over-HNSW** claim was CLAIMED-only because **no HNSW baseline existed to compare against**. This adds one. New pure-Rust, `--no-default-features`-buildable modules in `wifi-densepose-ruvector`: `hnsw.rs` (a correct float HNSW — Malkov & Yashunin: multi-layer NSW graph, `ef_construction`/`ef_search`, Algorithm-4 neighbour selection, **seeded-deterministic** level assignment via SplitMix64, L2 + cosine, full degenerate-case guards), `hnsw_quantized.rs` (the SymphonyQG-style variant — the **same** graph traversed by a cheap **1-bit Hamming** score over the RaBitQ Pass-2 rotated sign code, then **exact-float rerank**), `ann_measure.rs` + `benches/ann_bench.rs` (one shared deterministic planted-cluster fixture; the `ann_bench_report` test is the source of truth). **MEASURED (dim=128, N=10k, K=10, `--release`):** float HNSW = **~25× QPS over linear scan at recall ≥0.99** (the baseline this gap needed; recall@10 correctness gate ≥0.95 holds, L2 + cosine). **Honest negative:** the 1-bit quantized traversal is **too coarse to beat float HNSW at equal recall at this scale** — its best recall is **0.738**, never reaching the ≥0.90 equal-recall point, so there is **no QPS win** over float HNSW; the 3.5–17× is **not reproduced** by our 1-bit construction here. The recall gate also **caught a real index-out-of-bounds bug** in the insert path (disclosed in ADR-261 §4). Caveat: this is **our** HNSW + **our** 1-bit quant, not SymphonyQG's exact system — it tests the *direction* of the claim, with the expected crossover at large N + a multi-bit traversal code. **We did not tune to manufacture a speedup.** +20 tests (ruvector lib 131→151, 0 failed). ADR-156 §5 #1 / §8 backlog: CLAIMED → **MEASURED-direction-tested**. 
Python deterministic proof unchanged (off the signal proof path).
+- **ADR-261 Milestone-2: multi-bit quantized HNSW traversal + large-N scaling study (MEASURED, honest negative).** Extends ADR-261's quantized index from 1-bit to **`b`-bit-per-dimension** (`b ∈ {1,2,4}`, 16/32/64 B/node at dim=128) over the Pass-2 rotated coordinates, and runs a deterministic scaling study (N ∈ {10k, 100k, 250k}) to test M1's *prediction* of a large-N crossover. **Result: no crossover at any measured (N, b), and the trend refutes the prediction.** At N=10k more bits lift the equal-recall QPS ratio (0.19×→0.46×→0.48×) and b≥2 reach the 0.90 recall bar that 1-bit missed in M1 (best recall 0.738), but quant stays slower than float HNSW at equal recall. At N=100k/250k quant recall *collapses* (b=4: 1.000→0.788→0.624, never ≥0.90) while float holds ≥0.92: a denser graph packs near-neighbours whose low-bit codes are nearly identical, so the beam goes off-path faster than the float-distance saving repays. Caveat: this is our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph; it refutes the *direction* at ≤250k, not their million-scale numbers. ruvector lib **151→156** (+5 tests; the `#[ignore]` `scaling_report` run produced the table). A published negative with the mechanism explained. ADR-261 §11.
 - **ADR-260: RuField MFS — the open specification for camera-free multimodal field sensing.** A common event / tensor / calibration / privacy / provenance model that sits *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and future quantum sensors (each modality emits a normalized `FieldEvent` → `FieldTensor` → `FusionGraph` → `PrivacyClass` → `ProvenanceReceipt`). Published as a **standalone repo** [`ruvnet/rufield`](https://github.com/ruvnet/rufield) and vendored here as the `vendor/rufield` submodule (the `vendor/rvcsi` pattern — not a `v2/` workspace member). The v0.1 reference stack is a self-contained 6-crate Rust workspace (`rufield-core`, `-provenance` [sha256 + ed25519], `-privacy` [P0–P5 guard], `-adapters` [deterministic `SyntheticSim` across wifi_csi/mmwave_radar/infrared_thermal], `-fusion` [graph + TOML weighted-Bayes rules → 7 room-state inferences], `-bench` [deterministic runner + the §31 acceptance test]). **60 tests / 0 failed, clippy-clean.** §27 acceptance criteria 1–8 and 10 PASS; the live dashboard (9) is deferred. **All benchmark metrics are SYNTHETIC** (scored against the simulator's own ground truth — presence/breathing/bed_exit/room_transition F1 = 1.000, nocturnal_scratch 0.923 reported honestly, p95 latency ~0.01 ms, provenance coverage 100%, 0 privacy violations) — they prove the pipeline recovers known truth, **not** field accuracy; real hardware adapters (ESP32 CSI, mmWave, thermal IR) are a documented roadmap item, none validated in v0.1. The Python deterministic proof is unchanged (rufield is off the signal-processing proof path).

 ### Security
@@ -139,7 +139,7 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie

 ## 8. Validation

- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **151 passed / 0 failed** (was 131; +20 new tests: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`).
+- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **156 passed / 0 failed, 1 ignored** (M1 added 20: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`; M2 added 5 multi-bit/scaling tests; `scaling_report` is the `#[ignore]` measurement that produced the §11 table).
 - **`cargo test --workspace --no-default-features`** — GREEN (see §10 for the count).
 - **Correctness gate verified to bite:** the recall@10 gate **panicked** on the first (buggy) insert path (§4); after the fix it passes at 0.99+ recall (L2 and cosine).
 - **`cargo test -p wifi-densepose-ruvector --no-default-features --release ann_bench_report -- --nocapture`** — prints the §6 table; the numbers above are copied verbatim from that run.
@@ -154,10 +154,13 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie

 **Negative / honest.** The 1-bit quantized variant is **not** an equal-recall QPS win at our scale; it is shipped as a measured experiment with a clearly-stated ceiling, not as a recommended default. Anyone reaching for it must read §7.

+**Resolved by Milestone-2 (§11, MEASURED — no longer deferred).**
+- **Multi-bit traversal score.** Implemented (`b ∈ {1,2,4}` bits/dim over the Pass-2 rotated coordinates) and measured. It *does* lift quantized recall (at N=10k, b=4 reaches the 0.90 equal-recall regime that M1's 1-bit run, with best recall 0.738, could not), but it still does not beat float HNSW QPS.
+- **Large-N crossover measurement** — measured at N ∈ {10k, 100k, 250k}. **The predicted large-N crossover did NOT materialize — it moved the wrong way** (quant recall *collapses* as N grows). See §11.
+
 **Deferred (not silently dropped).**
- **Multi-bit / RaBitQ-estimator traversal score.** Replace 1-bit Hamming traversal with a ≤4-bit code or the `estimator.rs` unbiased rescale (ADR-156 §10/§11) — the lever most likely to lift quantized recall to the equal-recall regime.
- **Large-N crossover measurement.** Re-run §6 at N=100k–1M (`ANN_BENCH_N`) to find where quantization's per-node saving starts to dominate.
 - **Wiring HNSW into the live re-ID path** (AETHER hot-cache / sketch prefilter) behind a flag.
+- **N ≥ 1M + SymphonyQG's exact RaBitQ-fused construction** — our impl refutes the *direction* at ≤250k; a true 1:1 reproduction at million-scale with their fused codes remains a separate, larger build.

 ---

@@ -170,3 +173,28 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie
 - `lib.rs` — `pub mod hnsw / hnsw_quantized / ann_measure`; re-export `HnswIndex`, `HnswParams`, `Metric`, `QuantizedHnswIndex`.
 - `ADR-156-ruvector-fusion-beyond-sota.md` §5 #1 + §8 backlog — SymphonyQG regraded **CLAIMED → MEASURED-direction-tested (refuted at N=10k for our 1-bit construction)**, pointing here.
 - `CHANGELOG.md` — `[Unreleased]` entry.
+
+---
+
+## 11. Milestone-2 — multi-bit traversal + large-N scaling study (MEASURED)
+
+M1 (§7) refuted the SymphonyQG direction at N=10k with a 1-bit code, and *predicted* a crossover at "large N + a higher-bit code." M2 builds both levers and measures them — so the prediction is tested, not assumed.
+
+**Built:** `hnsw_quantized.rs` generalized from 1-bit to a **`b`-bit-per-dimension** code (`b ∈ {1,2,4}`, a mid-rise quantizer over the same `RANGE=3.0` rotated coordinates as ADR-156 §10's `measure_multibit`); `ann_measure.rs` gained `run_scaling_study` / `best_float_op` / `best_quant_op` + a deterministic `scaling_report` (`#[ignore]`, `--release`) and a CI-safe `scaling_study_small_is_consistent`. Memory: **16 / 32 / 64 bytes/node** for b = 1 / 2 / 4.
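The `b`-bit code and its memory cost can be illustrated in isolation. The following is a minimal sketch, assuming a uniform mid-rise quantizer that splits `[-RANGE, RANGE]` into `2^b` equal cells and clamps out-of-range coordinates to the end codes. The helper names `quantize`, `code_l1`, and `bytes_per_node` are hypothetical and are not taken from `hnsw_quantized.rs`; the crate's actual implementation may differ in detail.

```rust
/// Clamp range for rotated coordinates (the `RANGE = 3.0` named above).
const RANGE: f32 = 3.0;

/// Hypothetical helper: uniform mid-rise code in `0..2^bits` for one coordinate.
fn quantize(x: f32, bits: u32) -> u8 {
    let levels = 1u32 << bits; // 2, 4 or 16 levels for bits = 1, 2, 4
    let step = 2.0 * RANGE / levels as f32; // width of one quantizer cell
    let cell = ((x + RANGE) / step).floor(); // cell index before clamping
    cell.clamp(0.0, (levels - 1) as f32) as u8
}

/// Hypothetical helper: per-dimension L1 distance between two codes.
/// With codes in {0, 1} this counts differing positions, i.e. Hamming distance.
fn code_l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| u32::from(x.abs_diff(y))).sum()
}

/// Hypothetical helper: packed footprint in bytes, ceil(D * bits / 8).
fn bytes_per_node(padded_dim: usize, bits: u32) -> usize {
    (padded_dim * bits as usize + 7) / 8
}

fn main() {
    // Two made-up rotated vectors, only to exercise the helpers.
    let u = [-1.2f32, 0.4, 2.9, -0.1];
    let v = [0.3f32, 0.5, -3.5, -0.2];
    for bits in [1u32, 2, 4] {
        let cu: Vec<u8> = u.iter().map(|&x| quantize(x, bits)).collect();
        let cv: Vec<u8> = v.iter().map(|&x| quantize(x, bits)).collect();
        println!(
            "b={bits}: code L1 = {}, packed bytes/node at D=128 = {}",
            code_l1(&cu, &cv),
            bytes_per_node(128, bits)
        );
    }
}
```

Under this assumption the `b = 1` code is the sign bit (code 1 for a coordinate ≥ 0), which is why L1 over 1-bit codes coincides with Hamming distance, and the packed footprint at D=128 comes out to 16 / 32 / 64 bytes.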
+
+**MEASURED** (dim=128, 64 clusters, 200 queries, K=10, L2, M=16, ef_construction=200, seeded, `--release`, this box; target recall ≥ 0.90):
+
+| N | bits | B/node | quant best recall | float @ target | quant @ target | quant/float |
+|--:|--:|--:|--:|--|--|--:|
+| 10,000 | 1 | 16 | 1.000 | 23,155 QPS @ r=0.995 | 4,482 QPS @ r=0.965 | **0.19×** |
+| 10,000 | 2 | 32 | 1.000 | 23,155 QPS @ r=0.995 | 10,658 QPS @ r=0.908 | **0.46×** |
+| 10,000 | 4 | 64 | 1.000 | 23,155 QPS @ r=0.995 | 11,217 QPS @ r=0.946 | **0.48×** |
+| 100,000 | 1 / 2 / 4 | 16/32/64 | 0.207 / 0.346 / 0.788 | 2,493 QPS @ r=0.938 | none (never ≥ 0.90) | — |
+| 250,000 | 1 / 2 / 4 | 16/32/64 | 0.108 / 0.210 / 0.624 | 1,593 QPS @ r=0.925 | none | — |
+
+**Verdict — NO crossover at any measured (N, b) up to 250k, and the trend REFUTES the large-N prediction:**
+1. **Multi-bit helps at small N but not enough.** At N=10k, more bits lift the equal-recall QPS ratio 0.19× → 0.46× → 0.48× (b≥2 clear the 0.90 bar that 1-bit missed in M1; in this sweep 1-bit also clears it, at r=0.965, but only at 0.19×), yet quant stays **below 1.0×**, i.e. slower than float HNSW at equal recall.
+2. **The predicted large-N crossover moved the wrong way.** As N grows 10k → 100k → 250k, quant's best achievable recall **collapses** (b=4: 1.000 → 0.788 → 0.624) and never reaches the 0.90 comparison point, while float HNSW holds ≥0.92. A denser graph packs near-neighbours whose low-bit codes are nearly identical, so the approximate score steers the beam off-path faster than the bigger float-distance savings can repay. The "crossover at millions" intuition is **not supported by our construction's trend** — if anything it diverges.
+3. **Caveat unchanged:** this is our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph. The result refutes the *direction* for our construction at ≤250k; it does not disprove their published numbers on their system at their scale. A real 1:1 reproduction is the deferred million-scale build.
+
+This is a **published negative with the mechanism explained** — the multi-bit + scaling levers were built and measured rather than asserted, and the honest outcome (no crossover, trend diverging) is recorded, not hidden.
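The quant/float column of the table can be re-derived from the two "@ target" columns. Below is a minimal sketch of that equal-recall comparison, assuming each method is summarized by a list of swept operating points. The `Op` struct and both function names are hypothetical stand-ins for illustration, not the crate's `best_float_op` / `best_quant_op`, and the slices hold only the winners published in the table rather than a full sweep.

```rust
/// One swept operating point (hypothetical struct for this sketch).
#[derive(Debug, Clone, Copy)]
struct Op {
    qps: f64,
    recall: f64,
}

/// Fastest op whose recall is at least `target`, or `None` if none qualifies.
fn fastest_at_target(ops: &[Op], target: f64) -> Option<Op> {
    ops.iter()
        .copied()
        .filter(|op| op.recall >= target)
        .max_by(|a, b| a.qps.total_cmp(&b.qps))
}

/// quant/float QPS ratio, defined only when both methods met the target.
fn equal_recall_ratio(float_op: Option<Op>, quant_op: Option<Op>) -> Option<f64> {
    match (float_op, quant_op) {
        (Some(f), Some(q)) => Some(q.qps / f.qps),
        _ => None,
    }
}

fn main() {
    let target = 0.90;

    // N=10k, b=4 row of the table.
    let float_10k = [Op { qps: 23_155.0, recall: 0.995 }];
    let quant_10k = [Op { qps: 11_217.0, recall: 0.946 }];
    let ratio = equal_recall_ratio(
        fastest_at_target(&float_10k, target),
        fastest_at_target(&quant_10k, target),
    );
    println!("N=10k b=4 ratio: {ratio:?}");

    // N=100k, b=4 row: best quant recall is 0.788, below the target.
    // The qps value is a placeholder; the table reports no QPS for this row.
    let float_100k = [Op { qps: 2_493.0, recall: 0.938 }];
    let quant_100k = [Op { qps: 0.0, recall: 0.788 }];
    let ratio = equal_recall_ratio(
        fastest_at_target(&float_100k, target),
        fastest_at_target(&quant_100k, target),
    );
    println!("N=100k b=4 ratio: {ratio:?}");
}
```

A ratio is reported only when both sides meet the target, which is why the 100k and 250k rows carry no number: the float baseline qualifies, but no quant operating point does.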
@@ -16,12 +16,17 @@
 //! so the bench and the report can never measure different graphs.

 use criterion::{black_box, criterion_group, criterion_main, Criterion};
-use wifi_densepose_ruvector::ann_measure::{build_indices, queries, AnnBenchParams};
+use wifi_densepose_ruvector::ann_measure::{
+    build_indices, build_quant_bits, queries, AnnBenchParams,
+};

 fn bench_ann(c: &mut Criterion) {
    // Modest N so the bench builds quickly; the report covers the larger N.
    let p = AnnBenchParams::default_fixture(10_000);
-    let (float_idx, quant_idx, _v) = build_indices(p);
+    let (float_idx, quant_idx, vectors) = build_indices(p);
+    // Multi-bit quant variants over the SAME graph/fixture (ADR-261 §11).
+    let quant_2bit = build_quant_bits(p, &vectors, 2);
+    let quant_4bit = build_quant_bits(p, &vectors, 4);
    let qs = queries(p);
    let k = p.k;

@@ -52,10 +57,10 @@ fn bench_ann(c: &mut Criterion) {
        });
    }

-    // Quantized HNSW at matched beam widths + rerank.
+    // Quantized HNSW (1-bit) at matched beam widths + rerank.
    for &ef in &[64usize, 128] {
        let rr = k * 5;
-        group.bench_function(format!("quant_hnsw_ef{ef}_rr{rr}"), |b| {
+        group.bench_function(format!("quant_hnsw_1bit_ef{ef}_rr{rr}"), |b| {
            b.iter(|| {
                let mut sink = 0u64;
                for q in &qs {
@@ -67,6 +72,25 @@ fn bench_ann(c: &mut Criterion) {
        });
    }

+    // Multi-bit quant HNSW (ADR-261 §11): 2-bit and 4-bit traversal codes at the
+    // same two beam widths as the 1-bit bench (ef = 64, 128), so the criterion
+    // medians show the per-bit QPS cost the scaling study reports against recall.
+    for (label, idx) in [("2bit", &quant_2bit), ("4bit", &quant_4bit)] {
+        for &ef in &[64usize, 128] {
+            let rr = k * 5;
+            group.bench_function(format!("quant_hnsw_{label}_ef{ef}_rr{rr}"), |b| {
+                b.iter(|| {
+                    let mut sink = 0u64;
+                    for q in &qs {
+                        sink = sink
+                            .wrapping_add(idx.search_quantized(black_box(q), k, ef, rr).len() as u64);
+                    }
+                    black_box(sink)
+                })
+            });
+        }
+    }
+
    group.finish();
 }

@@ -229,8 +229,24 @@ pub fn measure_quantized_hnsw(
 }

 /// Build both indices for `p` (shared insertion order + graph seed so the float
-/// and quantized graphs are identical — the only variable is scoring).
+/// and quantized graphs are identical — the only variable is scoring). The
+/// quantized index uses the legacy **1-bit** code (ADR-261 §6); use
+/// [`build_indices_bits`] for the multi-bit scaling study (§11).
 pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec<Vec<f32>>) {
+    build_indices_bits(p, 1)
+}
+
+/// Build the float HNSW + a `bits`-bit quantized HNSW over the same fixture,
+/// sharing the graph seed and insertion order so the *only* variable between the
+/// float and quantized search is the traversal score. `bits ∈ {1, 2, 4}` (clamped
+/// in [`QuantizedHnswIndex::build_bits`]). The float index is **independent of
+/// `bits`** — callers sweeping `bits` should build the float index once and reuse
+/// it (the quantized graph is identical across `bits`; only the per-node code
+/// changes).
+pub fn build_indices_bits(
+    p: AnnBenchParams,
+    bits: u32,
+) -> (HnswIndex, QuantizedHnswIndex, Vec<Vec<f32>>) {
    let vectors = fixture(p);
    let params = HnswParams {
        m: 16,
@@ -242,11 +258,140 @@ pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec<V
    for v in &vectors {
        float_idx.insert(v);
    }
-    let quant_idx =
-        QuantizedHnswIndex::build(&vectors, p.dim, Metric::L2, params, p.rot_seed, p.k * 4);
+    let quant_idx = QuantizedHnswIndex::build_bits(
+        &vectors,
+        p.dim,
+        Metric::L2,
+        params,
+        p.rot_seed,
+        bits,
+        p.k * 4,
+    );
    (float_idx, quant_idx, vectors)
 }

+/// Build only the `bits`-bit quantized index for `p`, reusing a fixture the
+/// caller already has (avoids regenerating `N×dim` floats per bit-depth in the
+/// scaling sweep). The graph seed/insertion order match [`build_indices_bits`],
+/// so this quantized graph is identical to that one's at the same `p`.
+pub fn build_quant_bits(p: AnnBenchParams, vectors: &[Vec<f32>], bits: u32) -> QuantizedHnswIndex {
+    let params = HnswParams {
+        m: 16,
+        ef_construction: 200,
+        ef_search: 64,
+        seed: p.graph_seed,
+    };
+    QuantizedHnswIndex::build_bits(vectors, p.dim, Metric::L2, params, p.rot_seed, bits, p.k * 4)
+}
+
+/// The fastest operating point of a method that meets `target` recall, as
+/// `(qps, recall, label)`; `None` if no swept op met it.
+type BestOp = Option<(f64, f64, String)>;
+
+/// Sweep float HNSW over a fixed `ef` ladder; return the fastest op meeting
+/// `target` recall.
+pub fn best_float_op(
+    idx: &HnswIndex,
+    qs: &[Vec<f32>],
+    truth: &[HashSet<u32>],
+    k: usize,
+    target: f64,
+) -> BestOp {
+    let mut best: BestOp = None;
+    for &ef in &[16usize, 32, 64, 128, 256] {
+        let r = measure_float_hnsw(idx, qs, truth, k, ef);
+        if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) {
+            best = Some((r.qps, r.recall, format!("ef={ef}")));
+        }
+    }
+    best
+}
+
+/// Sweep quant HNSW over a fixed `(ef, rerank)` ladder; return the fastest op
+/// meeting `target` recall, plus the best recall reached anywhere on the ladder
+/// (so a not-found verdict can report how close it got).
+pub fn best_quant_op(
+    qidx: &QuantizedHnswIndex,
+    qs: &[Vec<f32>],
+    truth: &[HashSet<u32>],
+    k: usize,
+    target: f64,
+) -> (BestOp, f64) {
+    let mut best: BestOp = None;
+    let mut best_recall_seen = 0.0f64;
+    for &ef in &[32usize, 64, 128, 256, 512] {
+        for &rr in &[k * 2, k * 5, k * 10, k * 20] {
+            let r = measure_quantized_hnsw(qidx, qs, truth, k, ef, rr);
+            best_recall_seen = best_recall_seen.max(r.recall);
+            if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) {
+                best = Some((r.qps, r.recall, format!("ef={ef} rr={rr}")));
+            }
+        }
+    }
+    (best, best_recall_seen)
+}
+
+/// One row of the ADR-261 §11 scaling study: at a fixed `(N, b)`, the equal-recall
+/// (≥ `target`) operating points for float vs quant HNSW and their QPS ratio.
+#[derive(Debug, Clone)]
+pub struct ScalingRow {
+    /// Indexed vector count.
+    pub n: usize,
+    /// Traversal-code bit-depth (1, 2, or 4).
+    pub bits: u32,
+    /// Packed bytes per node of the quant code at this `b`.
+    pub bytes_per_node: usize,
+    /// Fastest float-HNSW op meeting `target` recall (qps, recall, label).
+    pub float_op: BestOp,
+    /// Fastest quant-HNSW op meeting `target` recall (qps, recall, label).
+    pub quant_op: BestOp,
+    /// Best recall the quant ladder reached at this `(N, b)` (≤ `target` ⇒ no op).
+    pub quant_best_recall: f64,
+    /// quant/float QPS ratio at equal recall, if both met `target`.
+    pub ratio: Option<f64>,
+}
+
+/// Run the ADR-261 §11 multi-bit scaling study: for each `N ∈ ns` and each
+/// `b ∈ bits_set`, measure the equal-recall (≥ `target`) QPS ratio of quant-HNSW
+/// vs float-HNSW on the shared fixture. Deterministic and `--no-default-features`
+/// runnable. Returns one [`ScalingRow`] per `(N, b)`; the caller prints the table
+/// and decides the crossover verdict. The float index is built once per `N` and
+/// reused across `b` (the quant graph is identical across `b`).
+pub fn run_scaling_study(
+    base: AnnBenchParams,
+    ns: &[usize],
+    bits_set: &[u32],
+    target: f64,
+) -> Vec<ScalingRow> {
+    let mut rows = Vec::new();
+    for &n in ns {
+        let p = AnnBenchParams { n, ..base };
+        let (float_idx, _q1, vectors) = build_indices_bits(p, 1);
+        let qs = queries(p);
+        let truth = ground_truth(&float_idx, &qs, p.k);
+        let float_op = best_float_op(&float_idx, &qs, &truth, p.k, target);
+        for &b in bits_set {
+            let qidx = build_quant_bits(p, &vectors, b);
+            let (quant_op, quant_best_recall) =
+                best_quant_op(&qidx, &qs, &truth, p.k, target);
+            let ratio = match (&float_op, &quant_op) {
+                (Some((fqps, _, _)), Some((qqps, _, _))) => Some(qqps / fqps),
+                _ => None,
+            };
+            rows.push(ScalingRow {
+                n,
+                bits: qidx.bits(),
+                bytes_per_node: qidx.bytes_per_node(),
+                float_op: float_op.clone(),
+                quant_op,
+                quant_best_recall,
+                ratio,
+            });
+        }
+    }
+    rows
+}
+
 #[cfg(test)]
 mod tests {
    use super::*;
@@ -397,4 +542,143 @@ mod tests {
            "best quant-HNSW recall {best_quant_recall:.4} below the 0.30 not-broken floor"
        );
    }
+
+    /// The ADR-261 §11 **multi-bit scaling study**. Sweeps `N` and `b ∈ {1,2,4}`,
+    /// printing the `(N, b) → recall / QPS / quant-vs-float ratio at equal recall`
+    /// surface and the crossover verdict. This is the source of truth for the §11
+    /// table. Run for the published numbers with:
+    ///
+    /// ```text
+    /// cd v2 && ANN_SCALE_NS=10000,100000,250000 \
+    ///   cargo test -p wifi-densepose-ruvector --no-default-features --release \
+    ///   scaling_report -- --nocapture --ignored
+    /// ```
+    ///
+    /// Marked `#[ignore]` so the default (debug) gate stays fast: it builds and
+    /// queries several indices up to large `N`, which is minutes under `--release`
+    /// and far too slow in debug. The CI-safe structural invariants are checked by
+    /// `scaling_study_small_is_consistent` below at tiny `N`.
+    #[test]
+    #[ignore = "scaling study — run explicitly with --release --ignored; minutes at large N"]
+    fn scaling_report() {
+        // N ladder: default 10k→100k→250k (a clean 25× span that builds+queries in
+        // a few minutes under --release on the test box). Override with
+        // ANN_SCALE_NS=a,b,c. The largest feasible N is documented in the ADR with
+        // the measured build/query time at the cap.
+        let ns: Vec<usize> = std::env::var("ANN_SCALE_NS")
+            .ok()
+            .map(|s| s.split(',').filter_map(|x| x.trim().parse().ok()).collect())
+            .unwrap_or_else(|| vec![10_000, 100_000, 250_000]);
+        let bits_set = [1u32, 2, 4];
+        let target = 0.90f64;
+        let base = AnnBenchParams::default_fixture(*ns.first().expect("ANN_SCALE_NS must list at least one N"));
+
+        println!("\n=== ADR-261 §11 multi-bit scaling study (planted-cluster synthetic) ===");
+        println!(
+            "dim={} clusters={} queries={} K={} noise={} graph_seed=0x{:X} rot_seed=0x{:X}",
+            base.dim, base.clusters, base.n_queries, base.k, base.noise, base.graph_seed, base.rot_seed
+        );
+        println!("metric=L2  M=16 ef_construction=200  target recall >= {target:.2}  (use --release for QPS)");
+        println!(
+            "{:<9} {:>4} {:>9} {:>10} {:>22} {:>22} {:>12}",
+            "N", "bits", "B/node", "q_best_rec", "float@target", "quant@target", "quant/float"
+        );
+
+        let rows = run_scaling_study(base, &ns, &bits_set, target);
+        for row in &rows {
+            let float_s = row
+                .float_op
+                .as_ref()
+                .map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}"))
+                .unwrap_or_else(|| "none".to_string());
+            let quant_s = row
+                .quant_op
+                .as_ref()
+                .map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}"))
+                .unwrap_or_else(|| "none".to_string());
+            let ratio_s = row
+                .ratio
+                .map(|x| format!("{x:.2}x"))
+                .unwrap_or_else(|| "—".to_string());
+            println!(
+                "{:<9} {:>4} {:>9} {:>10.3} {:>22} {:>22} {:>12}",
+                row.n, row.bits, row.bytes_per_node, row.quant_best_recall, float_s, quant_s, ratio_s
+            );
+        }
+
+        // Crossover verdict: report whether the quant/float ratio EVER exceeds 1.0
+        // at equal recall, and the per-bit trend of the best-quant-recall as N grows
+        // (is quant getting closer to the equal-recall regime, or not).
+        println!("\n--- crossover verdict (quant-HNSW > float-HNSW at equal recall?) ---");
+        let crossover: Vec<&ScalingRow> = rows
+            .iter()
+            .filter(|r| r.ratio.map(|x| x > 1.0).unwrap_or(false))
+            .collect();
+        if crossover.is_empty() {
+            println!("NO crossover at any measured (N, b): quant never met target recall AND beat float QPS.");
+        } else {
+            for r in &crossover {
+                println!(
+                    "CROSSOVER at N={} b={}: quant/float = {:.2}x at recall >= {target:.2}",
+                    r.n, r.bits, r.ratio.unwrap()
+                );
+            }
+        }
+        for &b in &bits_set {
+            let trend: Vec<(usize, f64)> = rows
+                .iter()
+                .filter(|r| r.bits == b)
+                .map(|r| (r.n, r.quant_best_recall))
+                .collect();
+            let trend_s: Vec<String> = trend
+                .iter()
+                .map(|(n, r)| format!("N={n}:{r:.3}"))
+                .collect();
+            println!("b={b} best-quant-recall trend: {}", trend_s.join("  "));
+        }
+        println!("======================================================================\n");
+
+        // Structural invariants (gate-safe at any N): at least one float op met
+        // target at every N (the baseline must work), and quant recall is in range.
+        for &n in &ns {
+            let any_float = rows.iter().any(|r| r.n == n && r.float_op.is_some());
+            assert!(any_float, "no float-HNSW op met target recall at N={n} — baseline broken");
+        }
+        for r in &rows {
+            assert!(
+                (0.0..=1.0).contains(&r.quant_best_recall),
+                "quant recall out of range at N={} b={}: {}",
+                r.n,
+                r.bits,
+                r.quant_best_recall
+            );
+        }
+    }
+
+    /// CI-safe structural check for the scaling study at tiny `N` (debug-fast):
+    /// the study runs end-to-end, bytes/node scales with `b`, and the float
+    /// baseline meets target at the smallest N. Does **not** assert any crossover
+    /// (that is the §11 measured question, answered by `scaling_report`).
+    #[test]
+    fn scaling_study_small_is_consistent() {
+        let base = AnnBenchParams::default_fixture(1500);
+        let ns = [1500usize, 3000];
+        let bits_set = [1u32, 2, 4];
+        let rows = run_scaling_study(base, &ns, &bits_set, 0.90);
+        assert_eq!(rows.len(), ns.len() * bits_set.len());
+        // Bytes/node scales with b at dim=128 (D=128): 16 / 32 / 64.
+        for r in rows.iter().filter(|r| r.n == 1500) {
+            let expect = match r.bits {
+                1 => 16,
+                2 => 32,
+                _ => 64,
+            };
+            assert_eq!(r.bytes_per_node, expect, "B/node wrong for b={}", r.bits);
+        }
+        // Float baseline must meet target at the smallest N.
+        assert!(
+            rows.iter().any(|r| r.n == 1500 && r.float_op.is_some()),
+            "float baseline failed target at small N"
+        );
+    }
 }
@@ -1,4 +1,4 @@
-//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261.
+//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261 (multi-bit, §11).
 //!
 //! # The SymphonyQG bet (what we are testing)
 //!
@@ -25,20 +25,26 @@
 //!   float and quantized search is **how a candidate is scored during traversal**,
 //!   so any QPS/recall difference is attributable to the quantization, not to a
 //!   different graph.
-//! - **Quantized score = 1-bit Hamming over the RaBitQ Pass-2 rotated sign code**
-//!   ([`crate::rotation`] + the sign-quantization in [`crate::sketch`]). Each
-//!   node stores its `ceil(D/8)`-byte sign code (`D = next_pow2(dim)`). During
-//!   traversal we compare query-code vs node-code by **POPCNT Hamming** — a few
-//!   machine words, no per-dimension float work.
+//! - **Quantized score = `b`-bit code over the RaBitQ Pass-2 rotated coordinates**
+//!   ([`crate::rotation`] + the multi-bit scalar quantizer mirrored from
+//!   [ADR-156 §10](../../../../../docs/adr/ADR-156-ruvector-fusion-beyond-sota.md)'s
+//!   `coverage::measure_multibit`). Each node stores a `b`-bit-per-dimension code
+//!   over the padded rotation length `D = next_pow2(dim)`. During traversal we
+//!   compare query-code vs node-code by the **L1 distance over the per-dim
+//!   codes**: one `u8` difference per padded dimension, no float work.
+//!   For `b == 1` the codes are `{0, 1}` and the L1 distance is **exactly the
+//!   1-bit Hamming distance** of the original ADR-261 construction, so `b == 1`
+//!   is fully backward-compatible.
 //! - **Exact float rerank** of the final beam: the top `rerank` candidates by
-//!   Hamming are re-scored with the true float metric and the best `k` returned.
+//!   code-L1 are re-scored with the true float metric and the best `k` returned.
 //!
-//! This trades a small recall hit (the 1-bit code is a coarse angle proxy — the
-//! same ~46%-strict limitation ADR-156 §10 measured) for far cheaper per-node
-//! scoring, recovered by the float rerank. **Whether that nets a QPS win at our
-//! test scale is the measured question ADR-261 answers** — and at small N the
-//! float distance is cheap enough that the Hamming saving may not pay off. We
-//! report the real number, win or lose, and do not tune to manufacture a speedup.
+//! Higher `b` keeps the traversal beam on-path better than 1-bit (ADR-156 §10
+//! measured 1/2/3/4-bit strict-K coverage at ~46/54/67/74%), at a memory cost
+//! that scales linearly with `b` (bytes/node = `ceil(D·b/8)`). **Whether the
+//! extra bits net a QPS win at equal recall — and at what N a crossover with
+//! float HNSW appears, if any — is the measured question ADR-261 §11 answers.**
+//! We report the real number, win or lose, and do not tune to manufacture a
+//! speedup.
 //!
 //! # Determinism & robustness
 //!
@@ -53,56 +59,95 @@ use std::collections::{BinaryHeap, HashSet};
 use crate::hnsw::{HnswIndex, HnswParams, Metric};
 use crate::rotation::Rotation;

-/// A 1-bit Pass-2 sign code for one vector, over the padded rotation length `D`.
-/// Stored as packed bytes; compared by POPCNT Hamming.
+/// Symmetric clamp range for the uniform mid-rise scalar quantizer, in rotated-
+/// coordinate units. The normalized FHT (`1/√D`) puts AETHER-shape rotated
+/// coordinates roughly in `[-3, 3]`; out-of-range coords clamp to the end codes.
+/// This is the **same `RANGE = 3.0`** as ADR-156 §10's `coverage::measure_multibit`,
+/// so the multi-bit code here is the same scheme that module measured.
+const RANGE: f32 = 3.0;
+
+/// A `b`-bit-per-dimension scalar code of a rotated embedding over the padded
+/// length `D`, compared by per-dim L1.
+///
+/// For `bits == 1` the per-dim code is `{0, 1}` (sign), and L1 over those codes
+/// equals POPCNT Hamming, so the 1-bit case scores exactly like the original
+/// ADR-261 construction. For `bits ∈ {2, 4}` the code is a uniform mid-rise
+/// quantizer with `2^bits` levels over `[-RANGE, RANGE]`.
 #[derive(Debug, Clone)]
 struct Code {
-    bits: Vec<u8>,
+    /// Per-dimension codes (`0..2^bits`), one entry per padded dimension `D`.
+    /// Kept unpacked as `u8` for branch-free L1; the *reported* memory cost is
+    /// the packed footprint (`ceil(D·bits/8)`), since a production node would
+    /// store the packed form. (We measure the packed bytes/node explicitly in
+    /// [`QuantizedHnswIndex::bytes_per_node`].)
+    codes: Vec<u8>,
 }

 impl Code {
-    /// Hamming distance to another code of the same length (popcount of XOR).
+    /// L1 distance over the per-dimension codes — the multi-bit generalization
+    /// of Hamming. At `bits == 1` (codes in `{0,1}`) this equals the popcount of
+    /// the XOR, i.e. the 1-bit Hamming distance.
    #[inline]
-    fn hamming(&self, other: &Code) -> u32 {
-        let n = self.bits.len().min(other.bits.len());
+    fn l1(&self, other: &Code) -> u32 {
+        let n = self.codes.len().min(other.codes.len());
        let mut acc = 0u32;
        for i in 0..n {
-            acc += (self.bits[i] ^ other.bits[i]).count_ones();
+            acc += u32::from(self.codes[i].abs_diff(other.codes[i]));
        }
        acc
    }
 }

-/// Build the packed 1-bit sign code of a rotated embedding over the padded
-/// length `D = rotation.padded_dim()`. Bit set ⇒ rotated coord ≥ 0.
-fn encode(embedding: &[f32], rotation: &Rotation) -> Code {
+/// Quantize the rotated coordinates of `embedding` to a `bits`-bit-per-dimension
+/// [`Code`] over the padded rotation length `D = rotation.padded_dim()`.
+///
+/// `bits == 1` reduces to sign-quantization (code `1` iff the rotated coord ≥ 0),
+/// preserving the original 1-bit construction; `bits ∈ {2, 4}` uses a uniform
+/// mid-rise quantizer with `2^bits` levels over `[-RANGE, RANGE]`, identical to
+/// ADR-156 §10's `measure_multibit`.
+fn encode(embedding: &[f32], rotation: &Rotation, bits: u32) -> Code {
    let rotated = rotation.apply_padded(embedding);
-    let d = rotated.len();
-    let mut bits = vec![0u8; d.div_ceil(8)];
-    for (i, &c) in rotated.iter().enumerate() {
-        if c >= 0.0 {
-            bits[i / 8] |= 1 << (7 - (i % 8));
-        }
-    }
-    Code { bits }
+    let levels = 1u32 << bits; // 2^bits codes per dim
+    let codes: Vec<u8> = rotated
+        .iter()
+        .map(|&x| {
+            if bits == 1 {
+                // Sign code: identical to the original 1-bit construction.
+                u8::from(x >= 0.0)
+            } else {
+                let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0); // → [0,1]
+                let code = (t * (levels - 1) as f32).round() as u32;
+                code.min(levels - 1) as u8
+            }
+        })
+        .collect();
+    Code { codes }
 }

-/// Min-heap node for the quantized beam (closest Hamming at the top).
+/// Packed bytes a node's `bits`-bit code occupies over padded length `D`:
+/// `ceil(D·bits/8)`. The memory cost reported by ADR-261 §11 (1-bit → `D/8`,
+/// 2-bit → `D/4`, 4-bit → `D/2`).
+#[inline]
+fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
+    (padded_dim * bits as usize).div_ceil(8)
+}
+
+/// Min-heap node for the quantized beam (closest code-L1 at the top).
 #[derive(Debug, Clone, Copy)]
 struct HScored {
-    /// Hamming distance (quantized score) — the traversal key.
-    ham: u32,
+    /// Code-L1 distance (quantized score) — the traversal key.
+    dist: u32,
    id: u32,
 }
 impl PartialEq for HScored {
    fn eq(&self, other: &Self) -> bool {
-        self.ham == other.ham && self.id == other.id
+        self.dist == other.dist && self.id == other.id
    }
 }
 impl Eq for HScored {}
 impl Ord for HScored {
    fn cmp(&self, other: &Self) -> Ordering {
-        self.ham.cmp(&other.ham).then(self.id.cmp(&other.id))
+        self.dist.cmp(&other.dist).then(self.id.cmp(&other.id))
    }
 }
 impl PartialOrd for HScored {
@@ -110,7 +155,7 @@ impl PartialOrd for HScored {
        Some(self.cmp(other))
    }
 }
-/// Reversed wrapper for a min-heap (smallest Hamming at the top).
+/// Reversed wrapper for a min-heap (smallest code-L1 at the top).
 #[derive(Debug, Clone, Copy)]
 struct MinH(HScored);
 impl PartialEq for MinH {
@@ -131,33 +176,34 @@ impl PartialOrd for MinH {
 }

 /// A SymphonyQG-style HNSW: the same graph as [`HnswIndex`], traversed by a
-/// **cheap 1-bit Hamming score**, with a final **exact-float rerank**.
+/// **cheap `b`-bit code-L1 score**, with a final **exact-float rerank**.
 ///
 /// Built by inserting the same vectors in the same order with the same seed as
 /// a float [`HnswIndex`], so the two indices share identical graph structure and
 /// only differ in how the beam is scored. The shared [`Rotation`] (seed + dim)
-/// is the index/query frame for the 1-bit codes.
+/// is the index/query frame for the `b`-bit codes. `bits ∈ {1, 2, 4}` selects
+/// the traversal-code resolution; `bits == 1` is the original 1-bit Hamming
+/// construction.
 #[derive(Debug, Clone)]
 pub struct QuantizedHnswIndex {
    /// The underlying graph (built with the float metric for exact rerank).
    graph: HnswIndex,
-    /// Per-node 1-bit Pass-2 codes, indexed by id (parallel to graph vectors).
+    /// Per-node `b`-bit codes, indexed by id (parallel to graph vectors).
    codes: Vec<Code>,
    /// The rotation frame shared by index and query codes.
    rotation: Rotation,
+    /// Bits per dimension of the traversal code (`1`, `2`, or `4`).
+    bits: u32,
    /// Number of final candidates to exact-float rerank (≥ k at query time).
    default_rerank: usize,
 }

 impl QuantizedHnswIndex {
-    /// Build a quantized index over `vectors`, mirroring a float [`HnswIndex`]
-    /// built with the same `(dim, metric, params)` and insertion order. The
-    /// `rotation_seed` fixes the 1-bit code frame (index and query share it).
+    /// Build a 1-bit quantized index (the original ADR-261 construction).
    ///
-    /// `default_rerank` is how many top-Hamming candidates get an exact float
-    /// re-score before returning the best `k`; it is clamped to `≥ k` at query
-    /// time. A larger rerank recovers more recall at more float cost — the knob
-    /// that, alongside `ef`, sets the equal-recall operating point.
+    /// Equivalent to [`QuantizedHnswIndex::build_bits`] with `bits = 1`; kept as
+    /// the backward-compatible entry point so existing callers and tests are
+    /// unchanged.
    pub fn build(
        vectors: &[Vec<f32>],
        dim: usize,
@@ -166,17 +212,41 @@ impl QuantizedHnswIndex {
        rotation_seed: u64,
        default_rerank: usize,
    ) -> Self {
+        Self::build_bits(vectors, dim, metric, params, rotation_seed, 1, default_rerank)
+    }
+
+    /// Build a `bits`-bit quantized index over `vectors`, mirroring a float
+    /// [`HnswIndex`] built with the same `(dim, metric, params)` and insertion
+    /// order. The `rotation_seed` fixes the code frame (index and query share it).
+    ///
+    /// `bits` is mapped onto `{1, 2, 4}` (the resolutions ADR-261 §11 sweeps): `0`
+    /// becomes `1`, `3` rounds up to `4`, values above `4` are capped at `4`, so
+    /// the constructor is total. `default_rerank` is how many top-code-L1
+    /// candidates get an exact float re-score before returning the best `k`; it
+    /// is clamped to `≥ k` at query time. A larger rerank buys recall at more
+    /// float cost; with `ef` it sets the equal-recall operating point.
+    pub fn build_bits(
+        vectors: &[Vec<f32>],
+        dim: usize,
+        metric: Metric,
+        params: HnswParams,
+        rotation_seed: u64,
+        bits: u32,
+        default_rerank: usize,
+    ) -> Self {
+        let bits = clamp_bits(bits);
        let rotation = Rotation::new(rotation_seed, dim);
        let mut graph = HnswIndex::new(dim, metric, params);
        let mut codes = Vec::with_capacity(vectors.len());
        for v in vectors {
            graph.insert(v);
-            codes.push(encode(v, &rotation));
+            codes.push(encode(v, &rotation, bits));
        }
        Self {
            graph,
            codes,
            rotation,
+            bits,
            default_rerank: default_rerank.max(1),
        }
    }
@@ -207,9 +277,23 @@ impl QuantizedHnswIndex {
        self.default_rerank
    }

-    /// SymphonyQG-style search: traverse the graph scoring candidates by **1-bit
-    /// Hamming**, collect a beam of `ef`, then **exact-float rerank** the top
-    /// `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
+    /// Bits per dimension of the traversal code.
+    #[inline]
+    pub fn bits(&self) -> u32 {
+        self.bits
+    }
+
+    /// Packed memory footprint of one node's traversal code, in bytes:
+    /// `ceil(D·bits/8)` where `D = next_pow2(dim)` is the padded rotation length.
+    /// This is the per-node cost ADR-261 §11 reports for each `b`.
+    #[inline]
+    pub fn bytes_per_node(&self) -> usize {
+        packed_bytes(self.rotation.padded_dim(), self.bits)
+    }
+
+    /// SymphonyQG-style search: traverse the graph scoring candidates by the
+    /// **`b`-bit code-L1**, collect a beam of `ef`, then **exact-float rerank**
+    /// the top `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
    ///
    /// Degenerate cases mirror [`HnswIndex::search`]: empty ⇒ empty; `k == 0` ⇒
    /// empty; `k > n` ⇒ all; never panics.
@@ -225,7 +309,7 @@ impl QuantizedHnswIndex {
        }
        let ef = ef.max(k).max(1);
        let rerank = rerank.max(k);
-        let q_code = encode(query, &self.rotation);
+        let q_code = encode(query, &self.rotation, self.bits);

        // Entry point: the graph's entry (highest-level node).
        let entry = match self.graph.entry_point() {
@@ -233,18 +317,18 @@ impl QuantizedHnswIndex {
            None => return Vec::new(),
        };

-        // Greedy-descend upper layers by Hamming, then beam-search layer 0.
+        // Greedy-descend upper layers by code-L1, then beam-search layer 0.
        let mut ep = entry;
        let mut layer = self.graph.top_level();
        while layer > 0 {
-            ep = self.greedy_hamming(&q_code, ep, layer);
+            ep = self.greedy_code(&q_code, ep, layer);
            layer -= 1;
        }
-        let beam = self.beam_hamming(&q_code, ep, ef);
+        let beam = self.beam_code(&q_code, ep, ef);

-        // Exact-float rerank of the top `rerank` Hamming candidates.
+        // Exact-float rerank of the top `rerank` code-L1 candidates.
        let mut cand: Vec<HScored> = beam;
-        cand.sort_by_key(|c| c.ham);
+        cand.sort_by_key(|c| c.dist);
        cand.truncate(rerank);
        let mut reranked: Vec<(u32, f32)> = cand
            .iter()
@@ -265,16 +349,16 @@ impl QuantizedHnswIndex {
        self.search_quantized(query, k, self.graph.params_ef_search(), self.default_rerank)
    }

-    /// Greedy single-best descent on a layer scored by Hamming.
-    fn greedy_hamming(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
+    /// Greedy single-best descent on a layer scored by code-L1.
+    fn greedy_code(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
        let mut best = start;
-        let mut best_h = self.codes[best as usize].hamming(q_code);
+        let mut best_d = self.codes[best as usize].l1(q_code);
        loop {
            let mut improved = false;
            for &nbr in self.graph.neighbours(best, layer) {
-                let h = self.codes[nbr as usize].hamming(q_code);
-                if h < best_h {
-                    best_h = h;
+                let d = self.codes[nbr as usize].l1(q_code);
+                if d < best_d {
+                    best_d = d;
                    best = nbr;
                    improved = true;
                }
@@ -285,32 +369,32 @@ impl QuantizedHnswIndex {
        }
    }

-    /// Beam search on layer 0 scored by Hamming. Returns the `ef` best-Hamming
-    /// nodes (unsorted). Iterative — bounded by the visited set + the ef beam.
-    fn beam_hamming(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
+    /// Beam search on layer 0 scored by code-L1. Returns up to `ef` lowest-code-L1
+    /// nodes (unsorted). Iterative, bounded by the visited set + the ef beam.
+    fn beam_code(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
        let mut visited: HashSet<u32> = HashSet::new();
        let mut candidates: BinaryHeap<MinH> = BinaryHeap::new();
        let mut results: BinaryHeap<HScored> = BinaryHeap::new(); // max-heap: worst at top

-        let h0 = self.codes[ep as usize].hamming(q_code);
-        let s0 = HScored { ham: h0, id: ep };
+        let d0 = self.codes[ep as usize].l1(q_code);
+        let s0 = HScored { dist: d0, id: ep };
        visited.insert(ep);
        candidates.push(MinH(s0));
        results.push(s0);

        while let Some(MinH(cur)) = candidates.pop() {
-            let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
-            if cur.ham > worst && results.len() >= ef {
+            let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
+            if cur.dist > worst && results.len() >= ef {
                break;
            }
            for &nbr in self.graph.neighbours(cur.id, 0) {
                if !visited.insert(nbr) {
                    continue;
                }
-                let h = self.codes[nbr as usize].hamming(q_code);
-                let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
-                if results.len() < ef || h < worst {
-                    let s = HScored { ham: h, id: nbr };
+                let d = self.codes[nbr as usize].l1(q_code);
+                let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
+                if results.len() < ef || d < worst {
+                    let s = HScored { dist: d, id: nbr };
                    candidates.push(MinH(s));
                    results.push(s);
                    while results.len() > ef {
@@ -323,6 +407,17 @@ impl QuantizedHnswIndex {
    }
 }

+/// Clamp a requested bit-depth to the supported `{1, 2, 4}` set (round up to the
+/// nearest supported value; `0` → `1`, `3` → `4`, `> 4` → `4`).
+#[inline]
+fn clamp_bits(bits: u32) -> u32 {
+    match bits {
+        0 | 1 => 1,
+        2 => 2,
+        _ => 4,
+    }
+}
+
 #[cfg(test)]
 mod tests {
    use super::*;
@@ -463,4 +558,116 @@ mod tests {
        let r = idx.search_quantized(&[], 2, 16, 4);
        assert_eq!(r.len(), 2);
    }
+
+    // ----- multi-bit (ADR-261 §11) -----
+
+    /// `build_bits(…, 1, …)` and the legacy `build(…)` entry point agree: same
+    /// `bits()`, same search output for a sample query. Backward-compatibility pin.
+    #[test]
+    fn one_bit_build_bits_matches_legacy_build() {
+        let vectors = planted(32, 400, 8, 0x1B17);
+        let legacy = QuantizedHnswIndex::build(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 40);
+        let viabits =
+            QuantizedHnswIndex::build_bits(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 1, 40);
+        assert_eq!(legacy.bits(), 1);
+        assert_eq!(viabits.bits(), 1);
+        let q = &vectors[123];
+        assert_eq!(
+            legacy.search_quantized(q, 10, 64, 40),
+            viabits.search_quantized(q, 10, 64, 40),
+            "build_bits(…,1,…) must equal legacy build(…)"
+        );
+    }
+
+    /// Unsupported bit-depths map onto the supported `{1,2,4}` set (rounded up,
+    /// capped at 4) so the constructor is total (no panic, predictable resolution).
+    #[test]
+    fn bits_are_clamped_to_supported_set() {
+        let vectors = planted(16, 50, 4, 0xB175);
+        for (req, exp) in [(0u32, 1u32), (1, 1), (2, 2), (3, 4), (4, 4), (7, 4)] {
+            let idx = QuantizedHnswIndex::build_bits(
+                &vectors,
+                16,
+                Metric::L2,
+                params(0x9),
+                0xB,
+                req,
+                16,
+            );
+            assert_eq!(idx.bits(), exp, "bits {req} should clamp to {exp}");
+            // and it must still search without panic
+            assert!(!idx.search_quantized(&vectors[0], 5, 32, 20).is_empty());
+        }
+    }
+
+    /// Bytes/node scales linearly with `bits`: for a power-of-two dim `D`,
+    /// 1-bit → D/8, 2-bit → D/4, 4-bit → D/2.
+    #[test]
+    fn bytes_per_node_scales_with_bits() {
+        let vectors = planted(128, 20, 4, 0xBEEF);
+        let b1 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 1, 16);
+        let b2 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 2, 16);
+        let b4 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 4, 16);
+        assert_eq!(b1.bytes_per_node(), 16, "128-d 1-bit = 16 B/node");
+        assert_eq!(b2.bytes_per_node(), 32, "128-d 2-bit = 32 B/node");
+        assert_eq!(b4.bytes_per_node(), 64, "128-d 4-bit = 64 B/node");
+    }
+
+    /// More bits should not *reduce* recall at a fixed (ef, rerank): the multi-bit
+    /// code is a finer proxy than 1-bit, so the traversal beam is expected to hand
+    /// the rerank equal-or-better candidates. This is the core ADR-261 §11
+    /// hypothesis (multi-bit keeps the beam on-path better), pinned as a
+    /// regression gate at this small N with a 0.02 recall tolerance.
+    #[test]
+    fn more_bits_does_not_reduce_recall() {
+        let dim = 64;
+        let n = 3000;
+        let clusters = 32;
+        let seed = 0x7A11;
+        let vectors = planted(dim, n, clusters, seed);
+        let recall_for = |bits: u32| -> f64 {
+            let idx = QuantizedHnswIndex::build_bits(
+                &vectors,
+                dim,
+                Metric::L2,
+                params(0xA11A),
+                0x5EED,
+                bits,
+                // Modest rerank so traversal quality — not a huge rerank pool —
+                // is what drives the recall difference between bit depths.
+                20,
+            );
+            let mut total = 0.0f64;
+            let n_queries = 64;
+            for q in 0..n_queries {
+                let c = q % clusters;
+                let mut cs = seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+                let centre: Vec<f32> = (0..dim).map(|_| gauss(&mut cs) * 3.0).collect();
+                let mut s = seed ^ 0xDEAD_0000 ^ (q as u64).wrapping_mul(0x2545_F491);
+                let qv: Vec<f32> = (0..dim).map(|d| centre[d] + gauss(&mut s) * 0.35).collect();
+                let truth: HashSet<u32> = idx
+                    .graph()
+                    .brute_force(&qv, 10)
+                    .into_iter()
+                    .map(|(id, _)| id)
+                    .collect();
+                let got = idx.search_quantized(&qv, 10, 64, 20);
+                let hit = got.iter().filter(|(id, _)| truth.contains(id)).count();
+                total += hit as f64 / 10.0;
+            }
+            total / n_queries as f64
+        };
+        let r1 = recall_for(1);
+        let r2 = recall_for(2);
+        let r4 = recall_for(4);
+        // 2-bit and 4-bit must be at least as good as 1-bit (small tie tolerance).
+        assert!(
+            r2 + 0.02 >= r1,
+            "2-bit recall {r2:.4} regressed vs 1-bit {r1:.4}"
+        );
+        assert!(
+            r4 + 0.02 >= r1,
+            "4-bit recall {r4:.4} regressed vs 1-bit {r1:.4}"
+        );
+    }
 }
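
The quantizer, the code-L1 score, and the packed-footprint arithmetic in this diff can be tried out in isolation. The sketch below is not part of the patch. It assumes the input is already rotated (the crate's `Rotation` is left out), and the helper names `quantize`, `code_l1`, `pack_one_bit`, and `hamming` are hypothetical stand-ins for illustration. `pack_one_bit` follows the MSB-first packing of the removed 1-bit `encode`, so the L1-equals-Hamming claim for `bits == 1` can be checked directly.

```rust
/// Clamp range in rotated-coordinate units (same value as `RANGE` in the diff).
const RANGE: f32 = 3.0;

/// Quantize already-rotated coordinates to `bits` bits per dimension.
/// Assumption: `bits` is one of 1, 2, 4 (the diff enforces this via `clamp_bits`).
fn quantize(rotated: &[f32], bits: u32) -> Vec<u8> {
    let levels = 1u32 << bits; // 2^bits codes per dimension
    rotated
        .iter()
        .map(|&x| {
            if bits == 1 {
                // Sign code: 1 iff the coordinate is non-negative.
                u8::from(x >= 0.0)
            } else {
                // Map [-RANGE, RANGE] onto [0, 1]; out-of-range values clamp.
                let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0);
                ((t * (levels - 1) as f32).round() as u32).min(levels - 1) as u8
            }
        })
        .collect()
}

/// L1 distance over unpacked per-dimension codes (the traversal score).
fn code_l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

/// Pack `{0, 1}` codes MSB-first, 8 per byte, as the removed 1-bit encoder did.
fn pack_one_bit(codes: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; codes.len().div_ceil(8)];
    for (i, &c) in codes.iter().enumerate() {
        if c != 0 {
            out[i / 8] |= 1 << (7 - (i % 8));
        }
    }
    out
}

/// Hamming distance over packed bytes (popcount of XOR).
fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| (x ^ y).count_ones()).sum()
}

/// Packed bytes per node: ceil(D * bits / 8).
fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
    (padded_dim * bits as usize).div_ceil(8)
}
```

With these helpers, `code_l1` over two 1-bit codes should equal `hamming` over their packed forms, and `packed_bytes(128, b)` gives the 16/32/64 B figures that `bytes_per_node_scales_with_bits` asserts for `b = 1/2/4`.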