Author SHA1 Message Date
rUv 1f05456588 feat(ADR-261 M2): multi-bit + large-N ANN scaling study — measured, no crossover (refutes M1 prediction) (#1066)
* feat(ADR-261): multi-bit (b∈{1,2,4}) quantized HNSW traversal + scaling harness

Generalize the SymphonyQG-style quantized-traversal HNSW from 1-bit Hamming to a
b-bit-per-dimension code (b ∈ {1,2,4}), mirroring ADR-156 §10's multi-bit RaBitQ
scheme (rotate via FHT Pass-2, uniform mid-rise scalar quantizer over [-3,3],
ranked by per-dim L1). b=1 is byte-for-byte the original construction (codes in
{0,1} ⇒ L1 == Hamming), pinned by one_bit_build_bits_matches_legacy_build.
Bytes/node scales linearly: 128-d → 16/32/64 B for b=1/2/4.

- hnsw_quantized.rs: QuantizedHnswIndex::build_bits(...,bits,...), bits()/
  bytes_per_node() accessors, code-L1 greedy+beam traversal. build(...) kept as
  the b=1 backward-compatible entry point. +4 tests (multi-bit recall regression,
  bits clamp, bytes/node, legacy parity).
- ann_measure.rs: build_indices_bits / build_quant_bits / run_scaling_study +
  best_float_op / best_quant_op; scaling_report (#[ignore], --release) and a
  CI-safe scaling_study_small_is_consistent.
- ann_bench.rs: 2-bit and 4-bit quant criterion benches over the shared graph.

ruvector lib 151 → 156 passed, 0 failed, 1 ignored (scaling_report).

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-261): record M2 multi-bit scaling study — measured, no crossover (refutes M1 prediction)

Multi-bit (b∈{1,2,4}) quantized HNSW traversal + N∈{10k,100k,250k} scaling study,
measured on this box. No crossover at any (N,b): at 10k more bits help (ratio
0.19→0.48×, b≥2 reaches 0.90 recall) but quant stays slower than float HNSW at
equal recall; at 100k/250k quant recall collapses (b=4: 1.0→0.788→0.624, never
≥0.90) while float holds ≥0.92. The predicted large-N crossover moved the wrong
way. Published negative with the mechanism explained. ADR-261 §11.

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: ruv <ruvnet@gmail.com>
2026-06-14 10:31:00 -04:00
5 changed files with 628 additions and 84 deletions
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **ADR-261: RuVector graph-ANN index — a real HNSW baseline + a SymphonyQG-style quantized variant, MEASURED (honest negative).** Closes the [ADR-156 §5 #1](docs/adr/ADR-156-ruvector-fusion-beyond-sota.md) gap: the SymphonyQG (SIGMOD 2025) **3.517× QPS-over-HNSW** claim was CLAIMED-only because **no HNSW baseline existed to compare against**. This adds one. New pure-Rust, `--no-default-features`-buildable modules in `wifi-densepose-ruvector`: `hnsw.rs` (a correct float HNSW — Malkov & Yashunin: multi-layer NSW graph, `ef_construction`/`ef_search`, Algorithm-4 neighbour selection, **seeded-deterministic** level assignment via SplitMix64, L2 + cosine, full degenerate-case guards), `hnsw_quantized.rs` (the SymphonyQG-style variant — the **same** graph traversed by a cheap **1-bit Hamming** score over the RaBitQ Pass-2 rotated sign code, then **exact-float rerank**), `ann_measure.rs` + `benches/ann_bench.rs` (one shared deterministic planted-cluster fixture; the `ann_bench_report` test is the source of truth). **MEASURED (dim=128, N=10k, K=10, `--release`):** float HNSW = **~25× QPS over linear scan at recall ≥0.99** (the baseline this gap needed; recall@10 correctness gate ≥0.95 holds, L2 + cosine). **Honest negative:** the 1-bit quantized traversal is **too coarse to beat float HNSW at equal recall at this scale** — its best recall is **0.738**, never reaching the ≥0.90 equal-recall point, so there is **no QPS win** over float HNSW; the 3.517× is **not reproduced** by our 1-bit construction here. The recall gate also **caught a real index-out-of-bounds bug** in the insert path (disclosed in ADR-261 §4). Caveat: this is **our** HNSW + **our** 1-bit quant, not SymphonyQG's exact system — it tests the *direction* of the claim, with the expected crossover at large N + a multi-bit traversal code. **We did not tune to manufacture a speedup.** +20 tests (ruvector lib 131→151, 0 failed). ADR-156 §5 #1 / §8 backlog: CLAIMED → **MEASURED-direction-tested**. 
Python deterministic proof unchanged (off the signal proof path).
- **ADR-261 Milestone-2: multi-bit quantized HNSW traversal + large-N scaling study — MEASURED (honest negative).** Extends ADR-261's quantized index from 1-bit to **`b`-bit-per-dimension** (`b ∈ {1,2,4}`, 16/32/64 B/node) over the Pass-2 rotated coordinates, and runs a deterministic scaling study (N ∈ {10k, 100k, 250k}) to test M1's *prediction* of a large-N crossover. **Result: no crossover at any measured (N, b), and the trend refutes the prediction.** At N=10k more bits lift the equal-recall QPS ratio (0.19×→0.46×→0.48×) and let b≥2 reach the 0.90 recall bar 1-bit missed — but quant stays slower than float HNSW at equal recall; at N=100k/250k quant recall *collapses* (b=4: 1.000→0.788→0.624, never ≥0.90) while float holds ≥0.92 (denser graph → low-bit codes can't separate near-neighbours, beam goes off-path faster than the float-distance saving repays). Caveat: our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph — refutes the *direction* at ≤250k, not their million-scale numbers. ruvector lib **151→156** (+5 tests; `scaling_report` `#[ignore]` produced the table). A published negative with the mechanism explained. ADR-261 §11.
- **ADR-260: RuField MFS — the open specification for camera-free multimodal field sensing.** A common event / tensor / calibration / privacy / provenance model that sits *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and future quantum sensors (each modality emits a normalized `FieldEvent`, `FieldTensor`, `FusionGraph`, `PrivacyClass`, `ProvenanceReceipt`). Published as a **standalone repo** [`ruvnet/rufield`](https://github.com/ruvnet/rufield) and vendored here as the `vendor/rufield` submodule (the `vendor/rvcsi` pattern — not a `v2/` workspace member). The v0.1 reference stack is a self-contained 6-crate Rust workspace (`rufield-core`, `-provenance` [sha256 + ed25519], `-privacy` [P0–P5 guard], `-adapters` [deterministic `SyntheticSim` across wifi_csi/mmwave_radar/infrared_thermal], `-fusion` [graph + TOML weighted-Bayes rules → 7 room-state inferences], `-bench` [deterministic runner + the §31 acceptance test]). **60 tests / 0 failed, clippy-clean.** §27 acceptance criteria 1–8 and 10 PASS; the live dashboard (9) is deferred. **All benchmark metrics are SYNTHETIC** (scored against the simulator's own ground truth — presence/breathing/bed_exit/room_transition F1 = 1.000, nocturnal_scratch 0.923 reported honestly, p95 latency ~0.01 ms, provenance coverage 100%, 0 privacy violations) — they prove the pipeline recovers known truth, **not** field accuracy; real hardware adapters (ESP32 CSI, mmWave, thermal IR) are a documented roadmap item, none validated in v0.1. The Python deterministic proof is unchanged (rufield is off the signal-processing proof path).
### Security
@@ -139,7 +139,7 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie
## 8. Validation
- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **151 passed / 0 failed** (was 131; +20 new tests: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`).
- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **156 passed / 0 failed, 1 ignored** (M1 added 20: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`; M2 added 5 multi-bit/scaling tests; `scaling_report` is the `#[ignore]` measurement that produced the §11 table).
- **`cargo test --workspace --no-default-features`** — GREEN (see §10 for the count).
- **Correctness gate verified to bite:** the recall@10 gate **panicked** on the first (buggy) insert path (§4); after the fix it passes at 0.99+ recall (L2 and cosine).
- **`cargo test -p wifi-densepose-ruvector --no-default-features --release ann_bench_report -- --nocapture`** — prints the §6 table; the numbers above are copied verbatim from that run.
@@ -154,10 +154,13 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie
**Negative / honest.** The 1-bit quantized variant is **not** an equal-recall QPS win at our scale; it is shipped as a measured experiment with a clearly-stated ceiling, not as a recommended default. Anyone reaching for it must read §7.
**Resolved by Milestone-2 (§11, MEASURED — no longer deferred).**
- **Multi-bit traversal score** — implemented (`b ∈ {1,2,4}` bits/dim over the Pass-2 rotated coordinates) and measured. It *does* lift quantized recall (at N=10k, b=4 reaches the 0.90 equal-recall regime where 1-bit could not), but still does not beat float HNSW QPS.
- **Large-N crossover measurement** — measured at N ∈ {10k, 100k, 250k}. **The predicted large-N crossover did NOT materialize — it moved the wrong way** (quant recall *collapses* as N grows). See §11.
**Deferred (not silently dropped).**
- **Multi-bit / RaBitQ-estimator traversal score.** Replace 1-bit Hamming traversal with a ≤4-bit code or the `estimator.rs` unbiased rescale (ADR-156 §10/§11) — the lever most likely to lift quantized recall to the equal-recall regime.
- **Large-N crossover measurement.** Re-run §6 at N=100k–1M (`ANN_BENCH_N`) to find where quantization's per-node saving starts to dominate.
- **Wiring HNSW into the live re-ID path** (AETHER hot-cache / sketch prefilter) behind a flag.
- **N ≥ 1M + SymphonyQG's exact RaBitQ-fused construction** — our impl refutes the *direction* at ≤250k; a true 1:1 reproduction at million-scale with their fused codes remains a separate, larger build.
---
@@ -170,3 +173,28 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie
- `lib.rs`: `pub mod hnsw / hnsw_quantized / ann_measure`; re-export `HnswIndex`, `HnswParams`, `Metric`, `QuantizedHnswIndex`.
- `ADR-156-ruvector-fusion-beyond-sota.md` §5 #1 + §8 backlog — SymphonyQG regraded **CLAIMED → MEASURED-direction-tested (refuted at N=10k for our 1-bit construction)**, pointing here.
- `CHANGELOG.md`: `[Unreleased]` entry.
---
## 11. Milestone-2 — multi-bit traversal + large-N scaling study (MEASURED)
M1 (§7) refuted the SymphonyQG direction at N=10k with a 1-bit code, and *predicted* a crossover at "large N + a higher-bit code." M2 builds both levers and measures them — so the prediction is tested, not assumed.
**Built:** `hnsw_quantized.rs` generalized from 1-bit to a **`b`-bit-per-dimension** code (`b ∈ {1,2,4}`, a mid-rise quantizer over the same `RANGE=3.0` rotated coordinates as ADR-156 §10's `measure_multibit`); `ann_measure.rs` gained `run_scaling_study` / `best_float_op` / `best_quant_op` + a deterministic `scaling_report` (`#[ignore]`, `--release`) and a CI-safe `scaling_study_small_is_consistent`. Memory: **16 / 32 / 64 bytes/node** for b = 1 / 2 / 4.
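The per-dimension code described above can be sketched in a few lines. This is a minimal, standalone illustration, assuming the input is an already-rotated coordinate vector (the real code applies the Pass-2 rotation first); the function names `quantize`, `code_l1`, and `packed_hamming` are hypothetical and only mirror the shape of `encode`, `Code::l1`, and `packed_bytes` in this change. It also checks the claim that at `b = 1` the per-dim L1 equals the packed popcount Hamming distance.

```rust
/// Clamp range for rotated coordinates (same value as the source's `RANGE`).
const RANGE: f32 = 3.0;

/// Quantize one already-rotated coordinate to a `bits`-bit code.
/// `bits == 1` is a sign code; otherwise 2^bits uniform levels over [-RANGE, RANGE].
fn quantize(x: f32, bits: u32) -> u8 {
    if bits == 1 {
        return u8::from(x >= 0.0);
    }
    let levels = 1u32 << bits;
    let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0); // map to [0, 1]
    ((t * (levels - 1) as f32).round() as u32).min(levels - 1) as u8
}

/// Per-dimension L1 distance between two unpacked codes.
fn code_l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| u32::from(x.abs_diff(y))).sum()
}

/// Packed footprint of a code over padded length D: ceil(D * bits / 8).
fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
    (padded_dim * bits as usize).div_ceil(8)
}

/// Hypothetical helper: pack 1-bit codes MSB-first and take popcount of XOR.
fn packed_hamming(a: &[u8], b: &[u8]) -> u32 {
    let pack = |c: &[u8]| -> Vec<u8> {
        let mut out = vec![0u8; c.len().div_ceil(8)];
        for (i, &bit) in c.iter().enumerate() {
            if bit != 0 {
                out[i / 8] |= 1 << (7 - (i % 8));
            }
        }
        out
    };
    let (pa, pb) = (pack(a), pack(b));
    pa.iter().zip(&pb).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    // Made-up "rotated" coordinates, for illustration only.
    let a = [-2.1f32, 0.4, 1.7, -0.2];
    let b = [-1.9f32, -0.3, 2.5, 0.1];
    for bits in [1u32, 2, 4] {
        let ca: Vec<u8> = a.iter().map(|&x| quantize(x, bits)).collect();
        let cb: Vec<u8> = b.iter().map(|&x| quantize(x, bits)).collect();
        println!(
            "b={bits} l1={} bytes/node at D=128: {}",
            code_l1(&ca, &cb),
            packed_bytes(128, bits)
        );
        if bits == 1 {
            // Sign codes: L1 over {0,1} is the Hamming distance.
            assert_eq!(code_l1(&ca, &cb), packed_hamming(&ca, &cb));
        }
    }
}
```

The sketch keeps codes unpacked as one `u8` per dimension, which matches the trade-off the change describes: traversal uses the unpacked form, while the reported memory cost is the packed footprint.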
**MEASURED** (dim=128, 64 clusters, 200 queries, K=10, L2, M=16, ef_construction=200, seeded, `--release`, this box; target recall ≥ 0.90):
| N | bits | B/node | quant best recall | float @ target | quant @ target | quant/float |
|--:|--:|--:|--:|--|--|--:|
| 10,000 | 1 | 16 | 1.000 | 23,155 QPS @ r=0.995 | 4,482 QPS @ r=0.965 | **0.19×** |
| 10,000 | 2 | 32 | 1.000 | 23,155 QPS @ r=0.995 | 10,658 QPS @ r=0.908 | **0.46×** |
| 10,000 | 4 | 64 | 1.000 | 23,155 QPS @ r=0.995 | 11,217 QPS @ r=0.946 | **0.48×** |
| 100,000 | 1 / 2 / 4 | 16/32/64 | 0.207 / 0.346 / 0.788 | 2,493 QPS @ r=0.938 | none (never ≥ 0.90) | — |
| 250,000 | 1 / 2 / 4 | 16/32/64 | 0.108 / 0.210 / 0.624 | 1,593 QPS @ r=0.925 | none | — |
**Verdict — NO crossover at any measured (N, b) up to 250k, and the trend REFUTES the large-N prediction:**
1. **Multi-bit helps at small N but not enough.** At N=10k, more bits lift the equal-recall QPS ratio 0.19× → 0.46× → 0.48× (and let b≥2 actually *reach* the 0.90 bar that 1-bit missed in M1, where its best recall was 0.738; in this sweep 1-bit also reaches it, at r=0.965), but quant stays **below 1.0×**, i.e. slower than float HNSW at equal recall.
2. **The predicted large-N crossover moved the wrong way.** As N grows 10k → 100k → 250k, quant's best achievable recall **collapses** (b=4: 1.000 → 0.788 → 0.624) and never reaches the 0.90 comparison point, while float HNSW holds ≥0.92. A denser graph packs near-neighbours whose low-bit codes are nearly identical, so the approximate score steers the beam off-path faster than the bigger float-distance savings can repay. The "crossover at millions" intuition is **not supported by our construction's trend** — if anything it diverges.
3. **Caveat unchanged:** this is our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph. The result refutes the *direction* for our construction at ≤250k; it does not disprove their published numbers on their system at their scale. A real 1:1 reproduction is the deferred million-scale build.
This is a **published negative with the mechanism explained** — the multi-bit + scaling levers were built and measured rather than asserted, and the honest outcome (no crossover, trend diverging) is recorded, not hidden.
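As a cross-check on the quant/float column, the ratios can be recomputed from the QPS figures in the table above. The sketch below is a standalone illustration, not the project's `run_scaling_study`: the helper `ratio` is a hypothetical name, the positive-QPS guard is an added assumption, and the rows are copied from the table (the 100k row uses `None` for the quant side because no quant operating point met the target).

```rust
/// quant/float QPS ratio at equal recall.
/// Returns None when either side never met the target recall.
/// The `f > 0.0` guard is an assumption added for this sketch.
fn ratio(float_qps: Option<f64>, quant_qps: Option<f64>) -> Option<f64> {
    match (float_qps, quant_qps) {
        (Some(f), Some(q)) if f > 0.0 => Some(q / f),
        _ => None,
    }
}

fn main() {
    // (N, bits, float QPS at target, quant QPS at target), from the table.
    let rows: [(usize, u32, Option<f64>, Option<f64>); 4] = [
        (10_000, 1, Some(23_155.0), Some(4_482.0)),
        (10_000, 2, Some(23_155.0), Some(10_658.0)),
        (10_000, 4, Some(23_155.0), Some(11_217.0)),
        (100_000, 4, Some(2_493.0), None),
    ];
    for (n, bits, f, q) in rows {
        match ratio(f, q) {
            // A crossover means quant is faster than float at equal recall.
            Some(r) if r > 1.0 => println!("N={n} b={bits}: crossover, {r:.2}x"),
            Some(r) => println!("N={n} b={bits}: {r:.2}x (quant slower)"),
            None => println!("N={n} b={bits}: no quant op met target"),
        }
    }
}
```

Treating a missing operating point as `None` rather than a ratio of zero keeps "never reached the recall target" distinct from "reached it but slowly", which is the distinction the verdict relies on.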
@@ -16,12 +16,17 @@
//! so the bench and the report can never measure different graphs.
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use wifi_densepose_ruvector::ann_measure::{build_indices, queries, AnnBenchParams};
use wifi_densepose_ruvector::ann_measure::{
build_indices, build_quant_bits, queries, AnnBenchParams,
};
fn bench_ann(c: &mut Criterion) {
// Modest N so the bench builds quickly; the report covers the larger N.
let p = AnnBenchParams::default_fixture(10_000);
let (float_idx, quant_idx, _v) = build_indices(p);
let (float_idx, quant_idx, vectors) = build_indices(p);
// Multi-bit quant variants over the SAME graph/fixture (ADR-261 §11).
let quant_2bit = build_quant_bits(p, &vectors, 2);
let quant_4bit = build_quant_bits(p, &vectors, 4);
let qs = queries(p);
let k = p.k;
@@ -52,10 +57,10 @@ fn bench_ann(c: &mut Criterion) {
});
}
// Quantized HNSW at matched beam widths + rerank.
// Quantized HNSW (1-bit) at matched beam widths + rerank.
for &ef in &[64usize, 128] {
let rr = k * 5;
group.bench_function(format!("quant_hnsw_ef{ef}_rr{rr}"), |b| {
group.bench_function(format!("quant_hnsw_1bit_ef{ef}_rr{rr}"), |b| {
b.iter(|| {
let mut sink = 0u64;
for q in &qs {
@@ -67,6 +72,25 @@ fn bench_ann(c: &mut Criterion) {
});
}
// Multi-bit quant HNSW (ADR-261 §11): 2-bit and 4-bit traversal codes at a
// mid beam width, so the criterion medians show the per-bit QPS cost the
// scaling study reports against recall.
for (label, idx) in [("2bit", &quant_2bit), ("4bit", &quant_4bit)] {
for &ef in &[64usize, 128] {
let rr = k * 5;
group.bench_function(format!("quant_hnsw_{label}_ef{ef}_rr{rr}"), |b| {
b.iter(|| {
let mut sink = 0u64;
for q in &qs {
sink = sink
.wrapping_add(idx.search_quantized(black_box(q), k, ef, rr).len() as u64);
}
black_box(sink)
})
});
}
}
group.finish();
}
@@ -229,8 +229,24 @@ pub fn measure_quantized_hnsw(
}
/// Build both indices for `p` (shared insertion order + graph seed so the float
/// and quantized graphs are identical — the only variable is scoring).
/// and quantized graphs are identical — the only variable is scoring). The
/// quantized index uses the legacy **1-bit** code (ADR-261 §6); use
/// [`build_indices_bits`] for the multi-bit scaling study (§11).
pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec<Vec<f32>>) {
build_indices_bits(p, 1)
}
/// Build the float HNSW + a `bits`-bit quantized HNSW over the same fixture,
/// sharing the graph seed and insertion order so the *only* variable between the
/// float and quantized search is the traversal score. `bits ∈ {1, 2, 4}` (clamped
/// in [`QuantizedHnswIndex::build_bits`]). The float index is **independent of
/// `bits`** — callers sweeping `bits` should build the float index once and reuse
/// it (the quantized graph is identical across `bits`; only the per-node code
/// changes).
pub fn build_indices_bits(
p: AnnBenchParams,
bits: u32,
) -> (HnswIndex, QuantizedHnswIndex, Vec<Vec<f32>>) {
let vectors = fixture(p);
let params = HnswParams {
m: 16,
@@ -242,11 +258,140 @@ pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec<V
for v in &vectors {
float_idx.insert(v);
}
let quant_idx =
QuantizedHnswIndex::build(&vectors, p.dim, Metric::L2, params, p.rot_seed, p.k * 4);
let quant_idx = QuantizedHnswIndex::build_bits(
&vectors,
p.dim,
Metric::L2,
params,
p.rot_seed,
bits,
p.k * 4,
);
(float_idx, quant_idx, vectors)
}
/// Build only the `bits`-bit quantized index for `p`, reusing a fixture the
/// caller already has (avoids regenerating `N×dim` floats per bit-depth in the
/// scaling sweep). The graph seed/insertion order match [`build_indices_bits`],
/// so this quantized graph is identical to that one's at the same `p`.
pub fn build_quant_bits(p: AnnBenchParams, vectors: &[Vec<f32>], bits: u32) -> QuantizedHnswIndex {
let params = HnswParams {
m: 16,
ef_construction: 200,
ef_search: 64,
seed: p.graph_seed,
};
QuantizedHnswIndex::build_bits(vectors, p.dim, Metric::L2, params, p.rot_seed, bits, p.k * 4)
}
/// The fastest operating point of a method that meets `target` recall, as
/// `(qps, recall, label)`; `None` if no swept op met it.
type BestOp = Option<(f64, f64, String)>;
/// Sweep float HNSW over a fixed `ef` ladder; return the fastest op meeting
/// `target` recall.
pub fn best_float_op(
idx: &HnswIndex,
qs: &[Vec<f32>],
truth: &[HashSet<u32>],
k: usize,
target: f64,
) -> BestOp {
let mut best: BestOp = None;
for &ef in &[16usize, 32, 64, 128, 256] {
let r = measure_float_hnsw(idx, qs, truth, k, ef);
if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) {
best = Some((r.qps, r.recall, format!("ef={ef}")));
}
}
best
}
/// Sweep quant HNSW over a fixed `(ef, rerank)` ladder; return the fastest op
/// meeting `target` recall, plus the best recall reached anywhere on the ladder
/// (so a not-found verdict can report how close it got).
pub fn best_quant_op(
qidx: &QuantizedHnswIndex,
qs: &[Vec<f32>],
truth: &[HashSet<u32>],
k: usize,
target: f64,
) -> (BestOp, f64) {
let mut best: BestOp = None;
let mut best_recall_seen = 0.0f64;
for &ef in &[32usize, 64, 128, 256, 512] {
for &rr in &[k * 2, k * 5, k * 10, k * 20] {
let r = measure_quantized_hnsw(qidx, qs, truth, k, ef, rr);
best_recall_seen = best_recall_seen.max(r.recall);
if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) {
best = Some((r.qps, r.recall, format!("ef={ef} rr={rr}")));
}
}
}
(best, best_recall_seen)
}
/// One row of the ADR-261 §11 scaling study: at a fixed `(N, b)`, the equal-recall
/// (≥ `target`) operating points for float vs quant HNSW and their QPS ratio.
#[derive(Debug, Clone)]
pub struct ScalingRow {
/// Indexed vector count.
pub n: usize,
/// Traversal-code bit-depth (1, 2, or 4).
pub bits: u32,
/// Packed bytes per node of the quant code at this `b`.
pub bytes_per_node: usize,
/// Fastest float-HNSW op meeting `target` recall (qps, recall, label).
pub float_op: BestOp,
/// Fastest quant-HNSW op meeting `target` recall (qps, recall, label).
pub quant_op: BestOp,
/// Best recall the quant ladder reached at this `(N, b)` (below `target` ⇒ no quant op).
pub quant_best_recall: f64,
/// quant/float QPS ratio at equal recall, if both met `target`.
pub ratio: Option<f64>,
}
/// Run the ADR-261 §11 multi-bit scaling study: for each `N ∈ ns` and each
/// `b ∈ bits_set`, measure the equal-recall (≥ `target`) QPS ratio of quant-HNSW
/// vs float-HNSW on the shared fixture. Deterministic and `--no-default-features`
/// runnable. Returns one [`ScalingRow`] per `(N, b)`; the caller prints the table
/// and decides the crossover verdict. The float index is built once per `N` and
/// reused across `b` (the quant graph is identical across `b`).
pub fn run_scaling_study(
base: AnnBenchParams,
ns: &[usize],
bits_set: &[u32],
target: f64,
) -> Vec<ScalingRow> {
let mut rows = Vec::new();
for &n in ns {
let p = AnnBenchParams { n, ..base };
let (float_idx, _q1, vectors) = build_indices_bits(p, 1);
let qs = queries(p);
let truth = ground_truth(&float_idx, &qs, p.k);
let float_op = best_float_op(&float_idx, &qs, &truth, p.k, target);
for &b in bits_set {
let qidx = build_quant_bits(p, &vectors, b);
let (quant_op, quant_best_recall) =
best_quant_op(&qidx, &qs, &truth, p.k, target);
let ratio = match (&float_op, &quant_op) {
(Some((fqps, _, _)), Some((qqps, _, _))) => Some(qqps / fqps),
_ => None,
};
rows.push(ScalingRow {
n,
bits: qidx.bits(),
bytes_per_node: qidx.bytes_per_node(),
float_op: float_op.clone(),
quant_op,
quant_best_recall,
ratio,
});
}
}
rows
}
#[cfg(test)]
mod tests {
use super::*;
@@ -397,4 +542,143 @@ mod tests {
"best quant-HNSW recall {best_quant_recall:.4} below the 0.30 not-broken floor"
);
}
/// The ADR-261 §11 **multi-bit scaling study**. Sweeps `N` and `b ∈ {1,2,4}`,
/// printing the `(N, b) → recall / QPS / quant-vs-float ratio at equal recall`
/// surface and the crossover verdict. This is the source of truth for the §11
/// table. Run for the published numbers with:
///
/// ```text
/// cd v2 && ANN_SCALE_NS=10000,100000,250000 \
/// cargo test -p wifi-densepose-ruvector --no-default-features --release \
/// scaling_report -- --nocapture --ignored
/// ```
///
/// Marked `#[ignore]` so the default (debug) gate stays fast: it builds and
/// queries several indices up to large `N`, which is minutes under `--release`
/// and far too slow in debug. The CI-safe structural invariants are checked by
/// `scaling_study_small_is_consistent` below at tiny `N`.
#[test]
#[ignore = "scaling study — run explicitly with --release --ignored; minutes at large N"]
fn scaling_report() {
// N ladder: default 10k→100k→250k (a clean 25× span that builds+queries in
// a few minutes under --release on the test box). Override with
// ANN_SCALE_NS=a,b,c. The largest feasible N is documented in the ADR with
// the measured build/query time at the cap.
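let ns: Vec<usize> = std::env::var("ANN_SCALE_NS")
.ok()
.map(|s| s.split(',').filter_map(|x| x.trim().parse().ok()).collect::<Vec<usize>>())
// An empty or unparsable override falls back to the default ladder, so `ns[0]` below cannot panic.
.filter(|v| !v.is_empty())
.unwrap_or_else(|| vec![10_000, 100_000, 250_000]);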
let bits_set = [1u32, 2, 4];
let target = 0.90f64;
let base = AnnBenchParams::default_fixture(ns[0]);
println!("\n=== ADR-261 §11 multi-bit scaling study (planted-cluster synthetic) ===");
println!(
"dim={} clusters={} queries={} K={} noise={} graph_seed=0x{:X} rot_seed=0x{:X}",
base.dim, base.clusters, base.n_queries, base.k, base.noise, base.graph_seed, base.rot_seed
);
println!("metric=L2 M=16 ef_construction=200 target recall >= {target:.2} (use --release for QPS)");
println!(
"{:<9} {:>4} {:>9} {:>10} {:>22} {:>22} {:>12}",
"N", "bits", "B/node", "q_best_rec", "float@target", "quant@target", "quant/float"
);
let rows = run_scaling_study(base, &ns, &bits_set, target);
for row in &rows {
let float_s = row
.float_op
.as_ref()
.map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}"))
.unwrap_or_else(|| "none".to_string());
let quant_s = row
.quant_op
.as_ref()
.map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}"))
.unwrap_or_else(|| "none".to_string());
let ratio_s = row
.ratio
.map(|x| format!("{x:.2}x"))
.unwrap_or_else(|| "".to_string());
println!(
"{:<9} {:>4} {:>9} {:>10.3} {:>22} {:>22} {:>12}",
row.n, row.bits, row.bytes_per_node, row.quant_best_recall, float_s, quant_s, ratio_s
);
}
// Crossover verdict: report whether the quant/float ratio EVER exceeds 1.0
// at equal recall, and the per-bit trend of the best-quant-recall as N grows
// (is quant getting closer to the equal-recall regime, or not).
println!("\n--- crossover verdict (quant-HNSW > float-HNSW at equal recall?) ---");
let crossover: Vec<&ScalingRow> = rows
.iter()
.filter(|r| r.ratio.map(|x| x > 1.0).unwrap_or(false))
.collect();
if crossover.is_empty() {
println!("NO crossover at any measured (N, b): quant never met target recall AND beat float QPS.");
} else {
for r in &crossover {
println!(
"CROSSOVER at N={} b={}: quant/float = {:.2}x at recall >= {target:.2}",
r.n, r.bits, r.ratio.unwrap()
);
}
}
for &b in &bits_set {
let trend: Vec<(usize, f64)> = rows
.iter()
.filter(|r| r.bits == b)
.map(|r| (r.n, r.quant_best_recall))
.collect();
let trend_s: Vec<String> = trend
.iter()
.map(|(n, r)| format!("N={n}:{r:.3}"))
.collect();
println!("b={b} best-quant-recall trend: {}", trend_s.join(" "));
}
println!("======================================================================\n");
// Structural invariants (gate-safe at any N): at least one float op met
// target at every N (the baseline must work), and quant recall is in range.
for &n in &ns {
let any_float = rows.iter().any(|r| r.n == n && r.float_op.is_some());
assert!(any_float, "no float-HNSW op met target recall at N={n} — baseline broken");
}
for r in &rows {
assert!(
(0.0..=1.0).contains(&r.quant_best_recall),
"quant recall out of range at N={} b={}: {}",
r.n,
r.bits,
r.quant_best_recall
);
}
}
/// CI-safe structural check for the scaling study at tiny `N` (debug-fast):
/// the study runs end-to-end, bytes/node scales with `b`, and the float
/// baseline meets target at the smallest N. Does **not** assert any crossover
/// (that is the §11 measured question, answered by `scaling_report`).
#[test]
fn scaling_study_small_is_consistent() {
let base = AnnBenchParams::default_fixture(1500);
let ns = [1500usize, 3000];
let bits_set = [1u32, 2, 4];
let rows = run_scaling_study(base, &ns, &bits_set, 0.90);
assert_eq!(rows.len(), ns.len() * bits_set.len());
// Bytes/node scales with b at dim=128 (D=128): 16 / 32 / 64.
for r in rows.iter().filter(|r| r.n == 1500) {
let expect = match r.bits {
1 => 16,
2 => 32,
_ => 64,
};
assert_eq!(r.bytes_per_node, expect, "B/node wrong for b={}", r.bits);
}
// Float baseline must meet target at the smallest N.
assert!(
rows.iter().any(|r| r.n == 1500 && r.float_op.is_some()),
"float baseline failed target at small N"
);
}
}
@@ -1,4 +1,4 @@
//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261.
//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261 (multi-bit, §11).
//!
//! # The SymphonyQG bet (what we are testing)
//!
@@ -25,20 +25,26 @@
//! float and quantized search is **how a candidate is scored during traversal**,
//! so any QPS/recall difference is attributable to the quantization, not to a
//! different graph.
//! - **Quantized score = 1-bit Hamming over the RaBitQ Pass-2 rotated sign code**
//! ([`crate::rotation`] + the sign-quantization in [`crate::sketch`]). Each
//! node stores its `ceil(D/8)`-byte sign code (`D = next_pow2(dim)`). During
//! traversal we compare query-code vs node-code by **POPCNT Hamming** — a few
//! machine words, no per-dimension float work.
//! - **Quantized score = `b`-bit code over the RaBitQ Pass-2 rotated coordinates**
//! ([`crate::rotation`] + the multi-bit scalar quantizer mirrored from
//! [ADR-156 §10](../../../../../docs/adr/ADR-156-ruvector-fusion-beyond-sota.md)'s
//! `coverage::measure_multibit`). Each node stores a `b`-bit-per-dimension code
//! over the padded rotation length `D = next_pow2(dim)`. During traversal we
//! compare query-code vs node-code by the **L1 distance over the per-dim
//! codes** — a few machine words of integer work, no per-dimension float work.
//! For `b == 1` the codes are `{0, 1}` and the L1 distance is **exactly the
//! 1-bit Hamming distance** of the original ADR-261 construction, so `b == 1`
//! is fully backward-compatible.
//! - **Exact float rerank** of the final beam: the top `rerank` candidates by
//! Hamming are re-scored with the true float metric and the best `k` returned.
//! code-L1 are re-scored with the true float metric and the best `k` returned.
//!
//! This trades a small recall hit (the 1-bit code is a coarse angle proxy — the
//! same ~46%-strict limitation ADR-156 §10 measured) for far cheaper per-node
//! scoring, recovered by the float rerank. **Whether that nets a QPS win at our
//! test scale is the measured question ADR-261 answers** — and at small N the
//! float distance is cheap enough that the Hamming saving may not pay off. We
//! report the real number, win or lose, and do not tune to manufacture a speedup.
//! Higher `b` keeps the traversal beam on-path better than 1-bit (ADR-156 §10
//! measured 1/2/3/4-bit strict-K coverage at ~46/54/67/74%), at a memory cost
//! that scales linearly with `b` (bytes/node = `ceil(D·b/8)`). **Whether the
//! extra bits net a QPS win at equal recall — and at what N a crossover with
//! float HNSW appears, if any — is the measured question ADR-261 §11 answers.**
//! We report the real number, win or lose, and do not tune to manufacture a
//! speedup.
//!
//! # Determinism & robustness
//!
@@ -53,56 +59,95 @@ use std::collections::{BinaryHeap, HashSet};
use crate::hnsw::{HnswIndex, HnswParams, Metric};
use crate::rotation::Rotation;
/// A 1-bit Pass-2 sign code for one vector, over the padded rotation length `D`.
/// Stored as packed bytes; compared by POPCNT Hamming.
/// Symmetric clamp range for the uniform mid-rise scalar quantizer, in rotated-
/// coordinate units. The normalized FHT (`1/√D`) puts AETHER-shape rotated
/// coordinates roughly in `[-3, 3]`; out-of-range coords clamp to the end codes.
/// This is the **same `RANGE = 3.0`** as ADR-156 §10's `coverage::measure_multibit`,
/// so the multi-bit code here is the same scheme that module measured.
const RANGE: f32 = 3.0;
/// A `b`-bit-per-dimension scalar code of a rotated embedding over the padded
/// length `D`, compared by per-dim L1.
///
/// For `bits == 1` the per-dim code is `{0, 1}` (sign), and L1 over those codes
/// is exactly POPCNT Hamming — so the 1-bit case is bit-for-bit the original
/// ADR-261 construction. For `bits ∈ {2, 4}` the code is a uniform mid-rise
/// quantizer with `2^bits` levels over `[-RANGE, RANGE]`.
#[derive(Debug, Clone)]
struct Code {
bits: Vec<u8>,
/// Per-dimension codes (`0..2^bits`), one entry per padded dimension `D`.
/// Kept unpacked as `u8` for branch-free L1; the *reported* memory cost is
/// the packed footprint (`ceil(D·bits/8)`), since a production node would
/// store the packed form. (We measure the packed bytes/node explicitly in
/// [`QuantizedHnswIndex::bytes_per_node`].)
codes: Vec<u8>,
}
impl Code {
/// Hamming distance to another code of the same length (popcount of XOR).
/// L1 distance over the per-dimension codes — the multi-bit generalization
/// of Hamming. At `bits == 1` (codes in `{0,1}`) this equals the popcount of
/// the XOR, i.e. the 1-bit Hamming distance.
#[inline]
fn hamming(&self, other: &Code) -> u32 {
let n = self.bits.len().min(other.bits.len());
fn l1(&self, other: &Code) -> u32 {
let n = self.codes.len().min(other.codes.len());
let mut acc = 0u32;
for i in 0..n {
acc += (self.bits[i] ^ other.bits[i]).count_ones();
acc += (self.codes[i] as i32 - other.codes[i] as i32).unsigned_abs();
}
acc
}
}
/// Build the packed 1-bit sign code of a rotated embedding over the padded
/// length `D = rotation.padded_dim()`. Bit set ⇒ rotated coord ≥ 0.
fn encode(embedding: &[f32], rotation: &Rotation) -> Code {
/// Quantize the rotated coordinates of `embedding` to a `bits`-bit-per-dimension
/// [`Code`] over the padded rotation length `D = rotation.padded_dim()`.
///
/// `bits == 1` reduces to sign-quantization (code `1` iff the rotated coord ≥ 0),
/// preserving the original 1-bit construction; `bits ∈ {2, 4}` uses a uniform
/// mid-rise quantizer with `2^bits` levels over `[-RANGE, RANGE]`, identical to
/// ADR-156 §10's `measure_multibit`.
fn encode(embedding: &[f32], rotation: &Rotation, bits: u32) -> Code {
let rotated = rotation.apply_padded(embedding);
let d = rotated.len();
let mut bits = vec![0u8; d.div_ceil(8)];
for (i, &c) in rotated.iter().enumerate() {
if c >= 0.0 {
bits[i / 8] |= 1 << (7 - (i % 8));
}
}
Code { bits }
let levels = 1u32 << bits; // 2^bits codes per dim
let codes: Vec<u8> = rotated
.iter()
.map(|&x| {
if bits == 1 {
// Sign code: identical to the original 1-bit construction.
u8::from(x >= 0.0)
} else {
let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0); // → [0,1]
let code = (t * (levels - 1) as f32).round() as u32;
code.min(levels - 1) as u8
}
})
.collect();
Code { codes }
}
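// Illustrative sketch (not part of this commit): the per-coordinate quantizer
// pulled out as a standalone function so the mapping can be checked by hand.
// `quantize_coord` is a hypothetical name, and the clamp half-width is passed as
// a parameter; the commit message gives the range as [-3, 3], so 3.0 is assumed
// in the examples.

```rust
/// Quantize one rotated coordinate to a `bits`-bit code in `0..2^bits`.
/// `range` is the clamp half-width (assumed to be 3.0 in the examples).
fn quantize_coord(x: f32, bits: u32, range: f32) -> u8 {
    if bits == 1 {
        // Sign code: 1 iff the coordinate is non-negative.
        return u8::from(x >= 0.0);
    }
    let levels = 1u32 << bits; // 2^bits codes per dimension
    // Map [-range, range] onto [0, 1], saturating anything outside.
    let t = ((x + range) / (2.0 * range)).clamp(0.0, 1.0);
    let code = (t * (levels - 1) as f32).round() as u32;
    code.min(levels - 1) as u8
}
```

// With an even number of levels, zero sits on a decision boundary rather than on
// a code: at 2 bits, `x = 0.0` scales to 1.5 and rounds up to code 2.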
/// Min-heap node for the quantized beam (closest Hamming at the top).
/// Number of bytes a node's `bits`-bit code occupies when packed over padded
/// length `D`: `ceil(D·bits/8)`. This is the memory cost reported by ADR-261 §11
/// (1-bit → `D/8`, 2-bit → `D/4`, 4-bit → `D/2`).
#[inline]
fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
(padded_dim * bits as usize).div_ceil(8)
}
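// Illustrative sketch (not part of this commit): what the packed form counted by
// `packed_bytes` could look like. The index keeps codes unpacked, so
// `pack_codes` and `unpack_codes` are hypothetical helpers, and the LSB-first
// slot order is an arbitrary choice made for this example.

```rust
/// Pack unpacked codes (one per `u8`) into `ceil(n*bits/8)` bytes, LSB-first.
/// Assumes `bits` is 1, 2, or 4, so a code never straddles a byte boundary.
fn pack_codes(codes: &[u8], bits: u32) -> Vec<u8> {
    let per_byte = (8 / bits) as usize;
    let mask = (1u8 << bits) - 1;
    let mut out = vec![0u8; (codes.len() * bits as usize).div_ceil(8)];
    for (i, &c) in codes.iter().enumerate() {
        let shift = (i % per_byte) as u32 * bits;
        out[i / per_byte] |= (c & mask) << shift;
    }
    out
}

/// Inverse of `pack_codes`: recover `n` codes from the packed bytes.
fn unpack_codes(packed: &[u8], bits: u32, n: usize) -> Vec<u8> {
    let per_byte = (8 / bits) as usize;
    let mask = (1u8 << bits) - 1;
    (0..n)
        .map(|i| (packed[i / per_byte] >> ((i % per_byte) as u32 * bits)) & mask)
        .collect()
}
```

// A 128-dimension code at 4 bits packs into 64 bytes, matching the `D/2` figure.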
/// Min-heap node for the quantized beam (closest code-L1 at the top).
#[derive(Debug, Clone, Copy)]
struct HScored {
/// Hamming distance (quantized score) — the traversal key.
ham: u32,
/// Code-L1 distance (quantized score) — the traversal key.
dist: u32,
id: u32,
}
impl PartialEq for HScored {
fn eq(&self, other: &Self) -> bool {
self.ham == other.ham && self.id == other.id
self.dist == other.dist && self.id == other.id
}
}
impl Eq for HScored {}
impl Ord for HScored {
fn cmp(&self, other: &Self) -> Ordering {
self.ham.cmp(&other.ham).then(self.id.cmp(&other.id))
self.dist.cmp(&other.dist).then(self.id.cmp(&other.id))
}
}
impl PartialOrd for HScored {
@@ -110,7 +155,7 @@ impl PartialOrd for HScored {
Some(self.cmp(other))
}
}
/// Reversed wrapper for a min-heap (smallest Hamming at the top).
/// Reversed wrapper for a min-heap (smallest code-L1 at the top).
#[derive(Debug, Clone, Copy)]
struct MinH(HScored);
impl PartialEq for MinH {
@@ -131,33 +176,34 @@ impl PartialOrd for MinH {
}
/// A SymphonyQG-style HNSW: the same graph as [`HnswIndex`], traversed by a
/// **cheap 1-bit Hamming score**, with a final **exact-float rerank**.
/// **cheap `b`-bit code-L1 score**, with a final **exact-float rerank**.
///
/// Built by inserting the same vectors in the same order with the same seed as
/// a float [`HnswIndex`], so the two indices share identical graph structure and
/// only differ in how the beam is scored. The shared [`Rotation`] (seed + dim)
/// is the index/query frame for the 1-bit codes.
/// is the index/query frame for the `b`-bit codes. `bits ∈ {1, 2, 4}` selects
/// the traversal-code resolution; `bits == 1` is the original 1-bit Hamming
/// construction.
#[derive(Debug, Clone)]
pub struct QuantizedHnswIndex {
/// The underlying graph (built with the float metric for exact rerank).
graph: HnswIndex,
/// Per-node 1-bit Pass-2 codes, indexed by id (parallel to graph vectors).
/// Per-node `b`-bit codes, indexed by id (parallel to graph vectors).
codes: Vec<Code>,
/// The rotation frame shared by index and query codes.
rotation: Rotation,
/// Bits per dimension of the traversal code (`1`, `2`, or `4`).
bits: u32,
/// Number of final candidates to exact-float rerank (≥ k at query time).
default_rerank: usize,
}
impl QuantizedHnswIndex {
/// Build a quantized index over `vectors`, mirroring a float [`HnswIndex`]
/// built with the same `(dim, metric, params)` and insertion order. The
/// `rotation_seed` fixes the 1-bit code frame (index and query share it).
/// Build a 1-bit quantized index (the original ADR-261 construction).
///
/// `default_rerank` is how many top-Hamming candidates get an exact float
/// re-score before returning the best `k`; it is clamped to `≥ k` at query
/// time. A larger rerank recovers more recall at more float cost — the knob
/// that, alongside `ef`, sets the equal-recall operating point.
/// Equivalent to [`QuantizedHnswIndex::build_bits`] with `bits = 1`; kept as
/// the backward-compatible entry point so existing callers and tests are
/// unchanged.
pub fn build(
vectors: &[Vec<f32>],
dim: usize,
@@ -166,17 +212,41 @@ impl QuantizedHnswIndex {
rotation_seed: u64,
default_rerank: usize,
) -> Self {
Self::build_bits(vectors, dim, metric, params, rotation_seed, 1, default_rerank)
}
/// Build a `bits`-bit quantized index over `vectors`, mirroring a float
/// [`HnswIndex`] built with the same `(dim, metric, params)` and insertion
/// order. The `rotation_seed` fixes the code frame (index and query share it).
///
/// `bits` is mapped onto `{1, 2, 4}` (the resolutions ADR-261 §11 sweeps):
/// `0` becomes `1`, `3` rounds up to `4`, and anything above `4` is capped at
/// `4`, so the constructor is total. `default_rerank` is how many top-code-L1
/// candidates get an exact
/// float re-score before returning the best `k`; it is clamped to `≥ k` at
/// query time. A larger rerank recovers more recall at more float cost — the
/// knob that, alongside `ef`, sets the equal-recall operating point.
pub fn build_bits(
vectors: &[Vec<f32>],
dim: usize,
metric: Metric,
params: HnswParams,
rotation_seed: u64,
bits: u32,
default_rerank: usize,
) -> Self {
let bits = clamp_bits(bits);
let rotation = Rotation::new(rotation_seed, dim);
let mut graph = HnswIndex::new(dim, metric, params);
let mut codes = Vec::with_capacity(vectors.len());
for v in vectors {
graph.insert(v);
codes.push(encode(v, &rotation));
codes.push(encode(v, &rotation, bits));
}
Self {
graph,
codes,
rotation,
bits,
default_rerank: default_rerank.max(1),
}
}
@@ -207,9 +277,23 @@ impl QuantizedHnswIndex {
self.default_rerank
}
/// SymphonyQG-style search: traverse the graph scoring candidates by **1-bit
/// Hamming**, collect a beam of `ef`, then **exact-float rerank** the top
/// `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
/// Bits per dimension of the traversal code.
#[inline]
pub fn bits(&self) -> u32 {
self.bits
}
/// Packed memory footprint of one node's traversal code, in bytes:
/// `ceil(D·bits/8)` where `D = next_pow2(dim)` is the padded rotation length.
/// This is the per-node cost ADR-261 §11 reports for each `b`.
#[inline]
pub fn bytes_per_node(&self) -> usize {
packed_bytes(self.rotation.padded_dim(), self.bits)
}
/// SymphonyQG-style search: traverse the graph scoring candidates by the
/// **`b`-bit code-L1**, collect a beam of `ef`, then **exact-float rerank**
/// the top `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
///
/// Degenerate cases mirror [`HnswIndex::search`]: empty ⇒ empty; `k == 0` ⇒
/// empty; `k > n` ⇒ all; never panics.
@@ -225,7 +309,7 @@ impl QuantizedHnswIndex {
}
let ef = ef.max(k).max(1);
let rerank = rerank.max(k);
let q_code = encode(query, &self.rotation);
let q_code = encode(query, &self.rotation, self.bits);
// Entry point: the graph's entry (highest-level node).
let entry = match self.graph.entry_point() {
@@ -233,18 +317,18 @@ impl QuantizedHnswIndex {
None => return Vec::new(),
};
// Greedy-descend upper layers by Hamming, then beam-search layer 0.
// Greedy-descend upper layers by code-L1, then beam-search layer 0.
let mut ep = entry;
let mut layer = self.graph.top_level();
while layer > 0 {
ep = self.greedy_hamming(&q_code, ep, layer);
ep = self.greedy_code(&q_code, ep, layer);
layer -= 1;
}
let beam = self.beam_hamming(&q_code, ep, ef);
let beam = self.beam_code(&q_code, ep, ef);
// Exact-float rerank of the top `rerank` Hamming candidates.
// Exact-float rerank of the top `rerank` code-L1 candidates.
let mut cand: Vec<HScored> = beam;
cand.sort_by_key(|c| c.ham);
cand.sort_by_key(|c| c.dist);
cand.truncate(rerank);
let mut reranked: Vec<(u32, f32)> = cand
.iter()
@@ -265,16 +349,16 @@ impl QuantizedHnswIndex {
self.search_quantized(query, k, self.graph.params_ef_search(), self.default_rerank)
}
/// Greedy single-best descent on a layer scored by Hamming.
fn greedy_hamming(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
/// Greedy single-best descent on a layer scored by code-L1.
fn greedy_code(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
let mut best = start;
let mut best_h = self.codes[best as usize].hamming(q_code);
let mut best_d = self.codes[best as usize].l1(q_code);
loop {
let mut improved = false;
for &nbr in self.graph.neighbours(best, layer) {
let h = self.codes[nbr as usize].hamming(q_code);
if h < best_h {
best_h = h;
let d = self.codes[nbr as usize].l1(q_code);
if d < best_d {
best_d = d;
best = nbr;
improved = true;
}
@@ -285,32 +369,32 @@ impl QuantizedHnswIndex {
}
}
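// Illustrative sketch (not part of this commit): the greedy descent reduced to a
// plain adjacency list with precomputed scores, to show its stopping rule.
// `greedy_descent`, `adj`, and `dist` are hypothetical stand-ins for the graph's
// neighbour lists and the per-node code-L1 to the query.

```rust
/// Greedy single-best descent: move to a neighbour whenever it scores strictly
/// lower, and stop once a full scan yields no improvement.
/// `adj[i]` lists node i's neighbours; `dist[i]` is node i's precomputed score.
fn greedy_descent(adj: &[Vec<u32>], dist: &[u32], start: u32) -> u32 {
    let mut best = start;
    loop {
        let mut improved = false;
        // Scan the neighbour list of the node we started this pass on.
        for &nbr in &adj[best as usize] {
            if dist[nbr as usize] < dist[best as usize] {
                best = nbr;
                improved = true;
            }
        }
        if !improved {
            return best;
        }
    }
}
```

// On a path graph 0-1-2-3 with scores [5, 3, 4, 1], a descent from node 0 stops
// at node 1: neither neighbour improves on 3, so the better node 3 is never
// reached. A coarse score can create such local minima.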
/// Beam search on layer 0 scored by Hamming. Returns the `ef` best-Hamming
/// nodes (unsorted). Iterative — bounded by the visited set + the ef beam.
fn beam_hamming(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
/// Beam search on layer 0 scored by code-L1. Returns up to `ef` nodes with the
/// smallest code-L1 (unsorted). Iterative: bounded by the visited set and the `ef` beam.
fn beam_code(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
let mut visited: HashSet<u32> = HashSet::new();
let mut candidates: BinaryHeap<MinH> = BinaryHeap::new();
let mut results: BinaryHeap<HScored> = BinaryHeap::new(); // max-heap: worst at top
let h0 = self.codes[ep as usize].hamming(q_code);
let s0 = HScored { ham: h0, id: ep };
let d0 = self.codes[ep as usize].l1(q_code);
let s0 = HScored { dist: d0, id: ep };
visited.insert(ep);
candidates.push(MinH(s0));
results.push(s0);
while let Some(MinH(cur)) = candidates.pop() {
let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
if cur.ham > worst && results.len() >= ef {
let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
if cur.dist > worst && results.len() >= ef {
break;
}
for &nbr in self.graph.neighbours(cur.id, 0) {
if !visited.insert(nbr) {
continue;
}
let h = self.codes[nbr as usize].hamming(q_code);
let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
if results.len() < ef || h < worst {
let s = HScored { ham: h, id: nbr };
let d = self.codes[nbr as usize].l1(q_code);
let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
if results.len() < ef || d < worst {
let s = HScored { dist: d, id: nbr };
candidates.push(MinH(s));
results.push(s);
while results.len() > ef {
@@ -323,6 +407,17 @@ impl QuantizedHnswIndex {
}
}
/// Clamp a requested bit-depth to the supported `{1, 2, 4}` set: `0` → `1`,
/// `3` rounds up to `4`, and any value above `4` is capped at `4`.
#[inline]
fn clamp_bits(bits: u32) -> u32 {
match bits {
0 | 1 => 1,
2 => 2,
_ => 4,
}
}
#[cfg(test)]
mod tests {
use super::*;
@@ -463,4 +558,116 @@ mod tests {
let r = idx.search_quantized(&[], 2, 16, 4);
assert_eq!(r.len(), 2);
}
// ----- multi-bit (ADR-261 §11) -----
/// `bits == 1` via `build_bits` is byte-for-byte the legacy `build` 1-bit
/// construction: same codes, same search output. Backward-compatibility pin.
#[test]
fn one_bit_build_bits_matches_legacy_build() {
let vectors = planted(32, 400, 8, 0x1B17);
let legacy = QuantizedHnswIndex::build(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 40);
let viabits =
QuantizedHnswIndex::build_bits(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 1, 40);
assert_eq!(legacy.bits(), 1);
assert_eq!(viabits.bits(), 1);
let q = &vectors[123];
assert_eq!(
legacy.search_quantized(q, 10, 64, 40),
viabits.search_quantized(q, 10, 64, 40),
"build_bits(…,1,…) must equal legacy build(…)"
);
}
/// Unsupported bit-depths map onto the supported `{1,2,4}` set (rounded up,
/// capped at 4) so the constructor is total (no panic, predictable resolution).
#[test]
fn bits_are_clamped_to_supported_set() {
let vectors = planted(16, 50, 4, 0xB175);
for (req, exp) in [(0u32, 1u32), (1, 1), (2, 2), (3, 4), (4, 4), (7, 4)] {
let idx = QuantizedHnswIndex::build_bits(
&vectors,
16,
Metric::L2,
params(0x9),
0xB,
req,
16,
);
assert_eq!(idx.bits(), exp, "bits {req} should clamp to {exp}");
// and it must still search without panic
assert!(!idx.search_quantized(&vectors[0], 5, 32, 20).is_empty());
}
}
/// Bytes/node scales linearly with `bits`: for a power-of-two dim `D`,
/// 1-bit → D/8, 2-bit → D/4, 4-bit → D/2.
#[test]
fn bytes_per_node_scales_with_bits() {
let vectors = planted(128, 20, 4, 0xBEEF);
let b1 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 1, 16);
let b2 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 2, 16);
let b4 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 4, 16);
assert_eq!(b1.bytes_per_node(), 16, "128-d 1-bit = 16 B/node");
assert_eq!(b2.bytes_per_node(), 32, "128-d 2-bit = 32 B/node");
assert_eq!(b4.bytes_per_node(), 64, "128-d 4-bit = 64 B/node");
}
/// More bits should not *reduce* recall at a fixed (ef, rerank) on this
/// fixture: the multi-bit code is a finer angle proxy than 1-bit, so the
/// traversal beam is expected to land on equal-or-better candidates for the
/// rerank to repair. This is the core ADR-261 §11 hypothesis (multi-bit keeps
/// the beam on-path better), pinned as a regression gate with a 0.02 tolerance.
#[test]
fn more_bits_does_not_reduce_recall() {
let dim = 64;
let n = 3000;
let clusters = 32;
let seed = 0x7A11;
let vectors = planted(dim, n, clusters, seed);
let recall_for = |bits: u32| -> f64 {
let idx = QuantizedHnswIndex::build_bits(
&vectors,
dim,
Metric::L2,
params(0xA11A),
0x5EED,
bits,
// Modest rerank so traversal quality — not a huge rerank pool —
// is what drives the recall difference between bit depths.
20,
);
let mut total = 0.0f64;
let n_queries = 64;
for q in 0..n_queries {
let c = q % clusters;
let mut cs = seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
let centre: Vec<f32> = (0..dim).map(|_| gauss(&mut cs) * 3.0).collect();
let mut s = seed ^ 0xDEAD_0000 ^ (q as u64).wrapping_mul(0x2545_F491);
let qv: Vec<f32> = (0..dim).map(|d| centre[d] + gauss(&mut s) * 0.35).collect();
let truth: HashSet<u32> = idx
.graph()
.brute_force(&qv, 10)
.into_iter()
.map(|(id, _)| id)
.collect();
let got = idx.search_quantized(&qv, 10, 64, 20);
let hit = got.iter().filter(|(id, _)| truth.contains(id)).count();
total += hit as f64 / 10.0;
}
total / n_queries as f64
};
let r1 = recall_for(1);
let r2 = recall_for(2);
let r4 = recall_for(4);
// 2-bit and 4-bit must be at least as good as 1-bit (small tie tolerance).
assert!(
r2 + 0.02 >= r1,
"2-bit recall {r2:.4} regressed vs 1-bit {r1:.4}"
);
assert!(
r4 + 0.02 >= r1,
"4-bit recall {r4:.4} regressed vs 1-bit {r1:.4}"
);
}
}