ruvnet--RuView/v2/crates/ruview-swarm/evals/RESULTS.md
rUv 8d64434d21 feat(swarm): ADR-149 evaluation harness — GDOP, IQM+bootstrap CI, noise sweep (#875)
Stage-1 kinematic evaluator per ADR-149 (peer-reviewed). Pure Rust, no new deps.

evals/:
- gdop.rs: 2D Geometric Dilution of Precision ((HᵀH)⁻¹ trace-sqrt); None for
  <2 observers or collinear/singular geometry
- stats.rs: IQM (Agarwal 2021) + 95% stratified-bootstrap CI (deterministic LCG)
  + probability_of_improvement
- metrics.rs: EpisodeMetrics + AggregateMetrics::from_strata (IQM±CI, seed-stratified)
- runner.rs: seeded kinematic rollout (FlightPattern-driven), seed×episode matrix,
  3σ×3κ default noise sweep (Gaussian amplitude × von Mises phase)
- report.rs + eval_swarm bin: generates evals/RESULTS.md leaderboard
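
The `probability_of_improvement` statistic named above can be sketched as follows. This is a minimal illustration assuming the Mann-Whitney-style definition from Agarwal et al. (fraction of cross pairs where one sample beats the other, ties counted as half) and a "higher is better" metric; the signature is hypothetical and the crate's `stats.rs` may differ.

```rust
/// Estimate P(X > Y) from two samples of per-episode scores.
/// Hypothetical signature, not taken from the crate.
/// Assumes higher scores are better; swap the arguments for error-style
/// metrics where lower is better.
fn probability_of_improvement(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.is_empty() || y.is_empty() {
        return None;
    }
    let mut wins = 0.0;
    for &a in x {
        for &b in y {
            if a > b {
                wins += 1.0; // X beats Y on this pair
            } else if a == b {
                wins += 0.5; // ties count as half a win
            }
        }
    }
    Some(wins / (x.len() * y.len()) as f64)
}
```

A value near 0.5 means neither pattern is reliably better; values near 1.0 mean the first sample almost always wins.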

RESULTS.md surfaces the coverage-vs-localization-precision trade-off via GDOP:
partitioned_lawnmower wins on coverage (100%) but yields only single-drone sightings (GDOP 0 → 7.0 m);
pheromone gets multistatic fusion (GDOP 1.6 → 4.1 m). A Wi2SAR 5 m paper-baseline row is included.

Stage-2 (Gazebo/PX4 SITL false-alarm + collision on median seeds) is documented follow-on.

Tests: 116 default / 133 full+train (+13 eval tests), 0 failed. Clippy clean (-D warnings).
2026-05-30 17:38:49 -04:00


ruview-swarm Evaluation Results (ADR-149 Stage 1, kinematic)

Statistically rigorous evaluation harness: seeded multi-run rollouts, summarized with the interquartile mean (IQM) and 95% stratified-bootstrap confidence intervals (Agarwal et al., NeurIPS 2021).
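
As a rough illustration of the statistic, the sketch below computes an IQM and a percentile-bootstrap 95% CI that resamples within each seed's stratum. It is not the crate's `stats.rs`: the function names, the LCG constants (Knuth's MMIX multiplier and increment), and the percentile method are assumptions chosen for the example.

```rust
/// Interquartile mean: drop floor(n/4) samples from each end, average the rest.
fn iqm(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let cut = sorted.len() / 4;
    let mid = &sorted[cut..sorted.len() - cut]; // never empty for n >= 1
    Some(mid.iter().sum::<f64>() / mid.len() as f64)
}

/// Small deterministic LCG (assumed constants), so a given seed
/// always produces the same resamples.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }

    /// Index in 0..n, taken from the high bits of the state.
    fn index(&mut self, n: usize) -> usize {
        ((self.next_u64() >> 33) % (n as u64)) as usize
    }
}

/// Percentile-bootstrap 95% CI of the IQM. Each stratum (one per seed) is
/// resampled with replacement on its own, then the strata are pooled.
fn stratified_bootstrap_ci(
    strata: &[Vec<f64>],
    resamples: usize,
    seed: u64,
) -> Option<(f64, f64)> {
    if resamples == 0 || strata.is_empty() || strata.iter().any(|s| s.is_empty()) {
        return None;
    }
    let mut rng = Lcg(seed);
    let mut stats = Vec::with_capacity(resamples);
    for _ in 0..resamples {
        let mut sample = Vec::new();
        for stratum in strata {
            for _ in 0..stratum.len() {
                sample.push(stratum[rng.index(stratum.len())]);
            }
        }
        stats.push(iqm(&sample)?);
    }
    stats.sort_by(|a, b| a.total_cmp(b));
    let lo = stats[(resamples as f64 * 0.025) as usize];
    let hi = stats[((resamples as f64 * 0.975) as usize).min(resamples - 1)];
    Some((lo, hi))
}
```

Resampling within strata keeps every seed represented in each bootstrap replicate, so between-seed variation does not leak into the interval through unequal seed counts.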

Run configuration

  • Stage: 1 (kinematic, self-contained, deterministic per seed)
  • Episodes per pattern: 100 (seed × episode matrix)
  • CI method: 95% stratified bootstrap of the IQM, stratified by seed
  • GDOP: 2-D geometric dilution of precision at first detection
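
The GDOP entry can be made concrete with a small sketch. It assumes a range-style measurement model in which each row of H is the unit line-of-sight vector from an observer to the target, and it returns `None` for fewer than two observers or singular geometry. The function name, signature, and tolerances are hypothetical; the crate's `gdop.rs` may build H differently.

```rust
/// 2-D GDOP = sqrt(trace((HᵀH)⁻¹)), with H rows assumed to be unit
/// line-of-sight vectors from each observer to the target.
fn gdop_2d(target: (f64, f64), observers: &[(f64, f64)]) -> Option<f64> {
    if observers.len() < 2 {
        return None;
    }
    // Accumulate the symmetric 2x2 matrix HᵀH = [[a, b], [b, d]].
    let (mut a, mut b, mut d) = (0.0_f64, 0.0_f64, 0.0_f64);
    for &(ox, oy) in observers {
        let (dx, dy) = (target.0 - ox, target.1 - oy);
        let r = dx.hypot(dy);
        if r < 1e-9 {
            return None; // observer on top of the target: direction undefined
        }
        let (ux, uy) = (dx / r, dy / r);
        a += ux * ux;
        b += ux * uy;
        d += uy * uy;
    }
    let det = a * d - b * b;
    if det < 1e-9 {
        return None; // all lines of sight are parallel: HᵀH is singular
    }
    // For a 2x2 matrix, trace of the inverse is (a + d) / det.
    Some(((a + d) / det).sqrt())
}
```

Under this model, two observers viewing the target at right angles give a GDOP of √2 ≈ 1.414, and the value grows as the lines of sight become more nearly parallel.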

Stage 2 pending: high-fidelity Gazebo/PX4 SITL evaluation (false-alarm rate, real collision rate on the median seeds) is a follow-on — see ADR-149 §6.1. The collision figures below are a kinematic min-separation proxy, not SITL physics.
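
A kinematic min-separation proxy of the kind mentioned above could look like the following sketch. The trajectory layout (`trajectory[t][i]` is the position of drone `i` at timestep `t`) and the function name are assumptions for illustration; how the harness turns the minimum separation into a collision figure is not specified here.

```rust
/// Smallest pairwise distance between any two drones over a whole rollout.
/// Returns None if no frame contains at least two drones.
fn min_separation(trajectory: &[Vec<(f64, f64)>]) -> Option<f64> {
    let mut best: Option<f64> = None;
    for frame in trajectory {
        for i in 0..frame.len() {
            for j in (i + 1)..frame.len() {
                let d = (frame[i].0 - frame[j].0).hypot(frame[i].1 - frame[j].1);
                best = Some(best.map_or(d, |b| b.min(d)));
            }
        }
    }
    best
}
```

Because this only inspects sampled positions, it can miss close passes between timesteps, which is one reason it is a proxy rather than a substitute for SITL physics.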

Flight-pattern leaderboard

| Flight pattern | Coverage IQM [95% CI] | Localization (m) IQM [95% CI] | Detection rate | Mean GDOP |
|---|---|---|---|---|
| partitioned_lawnmower | 1.000 [1.000, 1.000] | 7.022 [5.669, 8.379] | 100.0% | 0.000 |
| pheromone | 0.662 [0.652, 0.671] | 4.110 [3.346, 5.141] | 95.0% | 1.598 |
| levy_flight | 0.490 [0.489, 0.491] | 3.523 [2.897, 4.160] | 100.0% | 0.000 |
| boustrophedon | 0.370 [0.370, 0.370] | 2.740 [2.357, 3.207] | 100.0% | 0.000 |
| spiral | 0.336 [0.336, 0.336] | 3.082 [2.678, 3.568] | 100.0% | 0.000 |
| potential_field | 0.254 [0.252, 0.256] | 4.343 [3.489, 5.265] | 100.0% | 0.000 |
| Wi2SAR (paper baseline) | n/a | 5.0 (paper) | n/a | n/a |

The Wi2SAR row is the published single-drone localization figure (arXiv 2604.09115), shown paper-to-paper for reference only; it was not re-run through this kinematic harness.