feat(swarm): ADR-149 evaluation harness — GDOP, IQM+bootstrap CI, noise sweep (#875)

Stage-1 kinematic evaluator per ADR-149 (peer-reviewed). Pure Rust, no new deps.

evals/:
- gdop.rs: 2D Geometric Dilution of Precision ((HᵀH)⁻¹ trace-sqrt); None for
  <2 observers or collinear/singular geometry
- stats.rs: IQM (Agarwal 2021) + 95% stratified-bootstrap CI (deterministic LCG)
  + probability_of_improvement
- metrics.rs: EpisodeMetrics + AggregateMetrics::from_strata (IQM±CI, seed-stratified)
- runner.rs: seeded kinematic rollout (FlightPattern-driven), seed×episode matrix,
  3σ×3κ default noise sweep (Gaussian amplitude × von Mises phase)
- report.rs + eval_swarm bin: generates evals/RESULTS.md leaderboard

RESULTS.md surfaces the real coverage-vs-localization-precision trade-off via GDOP:
partitioned wins coverage (100%) but single-drone sightings (GDOP 0 → 7.0m);
pheromone gets multistatic fusion (GDOP 1.6 → 4.1m). Wi2SAR 5m paper-baseline row included.

Stage-2 (Gazebo/PX4 SITL false-alarm + collision on median seeds) is documented follow-on.

Tests: 116 default / 133 full+train (+13 eval tests), 0 failed. Clippy clean (-D warnings).
This commit is contained in:
rUv
2026-05-30 17:38:49 -04:00
committed by GitHub
parent 0d3d835bf8
commit 8d64434d21
12 changed files with 1368 additions and 0 deletions
+4
View File
@@ -78,3 +78,7 @@ harness = false
[[bin]]
name = "train_marl"
required-features = ["train"]
# ADR-149 Stage-1 evaluation CLI — pure Rust, no special feature needed.
[[bin]]
name = "eval_swarm"
+2
View File
@@ -0,0 +1,2 @@
# ADR-149 evaluation outputs
RESULTS.md is generated by the `eval_swarm` binary.
+26
View File
@@ -0,0 +1,26 @@
# ruview-swarm Evaluation Results (ADR-149 Stage 1, kinematic)
Statistically-rigorous evaluation harness: seeded multi-run rollouts with IQM + 95% stratified-bootstrap confidence intervals (Agarwal et al., NeurIPS 2021).
## Run configuration
- **Stage**: 1 (kinematic, self-contained, deterministic per seed)
- **Episodes per pattern**: 100 (seed × episode matrix)
- **CI method**: 95% stratified bootstrap of the IQM, stratified by seed
- **GDOP**: 2-D geometric dilution of precision at first detection
> **Stage 2 pending**: high-fidelity Gazebo/PX4 SITL evaluation (false-alarm rate, real collision rate on the median seeds) is a follow-on — see ADR-149 §6.1. The collision figures below are a kinematic min-separation proxy, not SITL physics.
## Flight-pattern leaderboard
| Flight pattern | Coverage IQM [95% CI] | Localization (m) IQM [95% CI] | Detection rate | Mean GDOP |
|----------------|-----------------------|-------------------------------|----------------|-----------|
| partitioned_lawnmower | 1.000 [1.000, 1.000] | 7.022 [5.669, 8.379] | 100.0% | 0.000 |
| pheromone | 0.662 [0.652, 0.671] | 4.110 [3.346, 5.141] | 95.0% | 1.598 |
| levy_flight | 0.490 [0.489, 0.491] | 3.523 [2.897, 4.160] | 100.0% | 0.000 |
| boustrophedon | 0.370 [0.370, 0.370] | 2.740 [2.357, 3.207] | 100.0% | 0.000 |
| spiral | 0.336 [0.336, 0.336] | 3.082 [2.678, 3.568] | 100.0% | 0.000 |
| potential_field | 0.254 [0.252, 0.256] | 4.343 [3.489, 5.265] | 100.0% | 0.000 |
| _Wi2SAR (paper baseline)_ | _n/a_ | _5.0 (paper)_ | _n/a_ | _n/a_ |
_Wi2SAR row is the published single-drone localization figure (arxiv 2604.09115), shown paper-to-paper for reference only — it was not re-run through this kinematic harness._
@@ -0,0 +1,104 @@
//! ADR-149 Stage-1 evaluation CLI.
//!
//! Runs the kinematic eval matrix over every flight pattern (default) and
//! writes a ranked `RESULTS.md` leaderboard. Pure Rust — no special feature
//! flag required, so it builds and runs in default CI.
//!
//! Defaults are intentionally small (10 seeds × 10 episodes) so the run is fast.
//! The full ADR-149 reporting configuration is 10 seeds × 50 episodes — pass
//! `--seeds 10 --episodes 50` for the publication run.
//!
//! ```text
//! cargo run -p ruview-swarm --bin eval_swarm -- \
//! --seeds 10 --episodes 10 --out crates/ruview-swarm/evals/RESULTS.md
//! ```
use std::path::PathBuf;
use ruview_swarm::evals::metrics::AggregateMetrics;
use ruview_swarm::evals::report::render_results_md;
use ruview_swarm::evals::runner::{run_matrix, EvalConfig};
use ruview_swarm::planning::patterns::FlightPattern;
fn main() {
let args: Vec<String> = std::env::args().collect();
let mut seeds = 10usize;
let mut episodes = 10usize;
let mut out = PathBuf::from("crates/ruview-swarm/evals/RESULTS.md");
let mut i = 1;
while i < args.len() {
match args[i].as_str() {
"--seeds" => {
i += 1;
seeds = args.get(i).and_then(|s| s.parse().ok()).unwrap_or(seeds);
}
"--episodes" => {
i += 1;
episodes = args.get(i).and_then(|s| s.parse().ok()).unwrap_or(episodes);
}
"--out" => {
i += 1;
if let Some(p) = args.get(i) {
out = PathBuf::from(p);
}
}
"--help" | "-h" => {
eprintln!(
"eval_swarm — ADR-149 Stage-1 kinematic evaluator\n\
Usage: eval_swarm [--seeds N] [--episodes M] [--out PATH]\n\
Defaults: --seeds 10 --episodes 10 --out crates/ruview-swarm/evals/RESULTS.md"
);
return;
}
other => {
eprintln!("warning: ignoring unknown argument '{other}'");
}
}
i += 1;
}
eprintln!(
"Running ADR-149 Stage-1 eval: {seeds} seeds × {episodes} episodes \
over {} flight patterns...",
FlightPattern::all().len()
);
let mut rows: Vec<(String, AggregateMetrics)> = Vec::new();
for (idx, pattern) in FlightPattern::all().into_iter().enumerate() {
let mut cfg = EvalConfig::sar_small(pattern);
cfg.seeds = seeds;
cfg.episodes_per_seed = episodes;
let matrix = run_matrix(&cfg);
let agg = AggregateMetrics::from_strata(&matrix, 0x0149 ^ idx as u64);
eprintln!(
" {}: coverage IQM {:.3}, detection {:.0}%",
pattern.name(),
agg.coverage_iqm.point,
agg.detection_rate * 100.0
);
rows.push((pattern.name().to_string(), agg));
}
// Rank by descending coverage point estimate.
rows.sort_by(|a, b| {
b.1.coverage_iqm
.point
.partial_cmp(&a.1.coverage_iqm.point)
.unwrap_or(std::cmp::Ordering::Equal)
});
let md = render_results_md(&rows);
if let Some(parent) = out.parent() {
if let Err(e) = std::fs::create_dir_all(parent) {
eprintln!("error: could not create {}: {e}", parent.display());
std::process::exit(1);
}
}
if let Err(e) = std::fs::write(&out, &md) {
eprintln!("error: could not write {}: {e}", out.display());
std::process::exit(1);
}
eprintln!("Wrote {} ({} bytes).", out.display(), md.len());
}
+118
View File
@@ -0,0 +1,118 @@
//! Geometric Dilution of Precision (GDOP) for a constellation of observers.
//!
//! GDOP quantifies how observer geometry amplifies measurement error into
//! position-estimate error. Build the geometry matrix `H` of unit
//! line-of-sight (LOS) vectors from each observer to the target, form the
//! normal matrix `HᵀH`, invert it, and take `GDOP = sqrt(trace((HᵀH)⁻¹))`.
//!
//! For the 2-D (x, y) localization case `H` is `N×2` and `HᵀH` is `2×2`, so a
//! closed-form 2×2 inverse suffices (no linear-algebra dependency needed).
//!
//! Lower GDOP = better geometry: observers spread ~120° apart around the target
//! give low GDOP; (near-)collinear observers give a singular/ill-conditioned
//! `HᵀH` → GDOP → ∞.
use crate::types::Position3D;
/// Geometric Dilution of Precision (2-D) for `observers` viewing a `target`.
///
/// Lower = better geometry. A ~120° constellation → low GDOP; collinear → very
/// large (→∞). Returns `None` if fewer than two observers, if any observer is
/// coincident with the target (undefined LOS), or if the geometry is singular
/// / degenerate (collinear) so `HᵀH` is not invertible.
pub fn gdop(observers: &[Position3D], target: &Position3D) -> Option<f64> {
if observers.len() < 2 {
return None;
}
// Accumulate HᵀH directly (2×2 symmetric) from unit LOS vectors.
// Row i of H is the unit vector from target → observer i in (x, y).
let mut a = 0.0; // sum ux*ux
let mut b = 0.0; // sum ux*uy
let mut d = 0.0; // sum uy*uy
for obs in observers {
let dx = obs.x - target.x;
let dy = obs.y - target.y;
let range = (dx * dx + dy * dy).sqrt();
if range < 1e-9 {
// Observer on top of the target → LOS undefined.
return None;
}
let ux = dx / range;
let uy = dy / range;
a += ux * ux;
b += ux * uy;
d += uy * uy;
}
// Determinant of HᵀH = [[a, b], [b, d]].
let det = a * d - b * b;
if det.abs() < 1e-12 {
// Singular: observers are (near-)collinear with the target.
return None;
}
// (HᵀH)⁻¹ = 1/det * [[d, -b], [-b, a]]; trace = (d + a) / det.
let trace_inv = (a + d) / det;
if trace_inv <= 0.0 || !trace_inv.is_finite() {
return None;
}
Some(trace_inv.sqrt())
}
#[cfg(test)]
mod tests {
use super::*;
fn p(x: f64, y: f64) -> Position3D {
Position3D { x, y, z: 0.0 }
}
#[test]
fn test_triangle_lower_than_collinear() {
let target = p(0.0, 0.0);
// Three observers at 120° around the target, radius 10.
let r = 10.0;
let triangle = [
p(r * 0.0_f64.cos(), r * 0.0_f64.sin()),
p(
r * (2.0 * std::f64::consts::PI / 3.0).cos(),
r * (2.0 * std::f64::consts::PI / 3.0).sin(),
),
p(
r * (4.0 * std::f64::consts::PI / 3.0).cos(),
r * (4.0 * std::f64::consts::PI / 3.0).sin(),
),
];
// Three nearly-collinear observers (tiny y perturbation to stay invertible).
let near_collinear = [p(5.0, 0.01), p(10.0, 0.0), p(15.0, 0.01)];
let tri = gdop(&triangle, &target).expect("triangle finite GDOP");
let col = gdop(&near_collinear, &target).expect("near-collinear finite GDOP");
assert!(tri.is_finite(), "triangle GDOP must be finite: {tri}");
assert!(
tri < col,
"120° constellation should have lower GDOP than near-collinear: tri={tri}, col={col}"
);
}
#[test]
fn test_collinear_degenerate() {
let target = p(0.0, 0.0);
// Perfectly collinear observers along +x → singular HᵀH.
let collinear = [p(5.0, 0.0), p(10.0, 0.0), p(20.0, 0.0)];
let g = gdop(&collinear, &target);
assert!(
g.is_none() || g.unwrap() > 1e6,
"perfectly collinear geometry must be None or huge, got {g:?}"
);
}
#[test]
fn test_single_observer_none() {
let target = p(0.0, 0.0);
assert!(gdop(&[p(5.0, 5.0)], &target).is_none());
assert!(gdop(&[], &target).is_none());
}
}
+150
View File
@@ -0,0 +1,150 @@
//! Per-episode and aggregate SAR + MARL metrics (ADR-149 Stage 1).
use crate::evals::stats::{stratified_bootstrap_ci, ConfidenceInterval};
/// Per-episode SAR metrics (Stage 1 kinematic).
#[derive(Debug, Clone)]
pub struct EpisodeMetrics {
/// Fraction of the mission area scanned at least once, in [0, 1].
pub coverage_pct: f64,
/// Localization error (m) of the fused victim estimate; `None` if no detection.
pub localization_error_m: Option<f64>,
/// GDOP of the contributing-drone constellation at detection; `None` if none.
pub gdop_at_detection: Option<f64>,
/// Mission-elapsed seconds to first detection; `None` if no detection.
pub time_to_first_detection_s: Option<f64>,
/// Whether at least one victim was detected this episode.
pub detected: bool,
/// Count of inter-drone proximity violations (kinematic proxy for collisions).
pub collisions: u32,
/// Fraction of scanned area covered by more than one drone, in [0, 1].
pub overlap_ratio: f64,
/// Scalar episodic return (reward-like coverage/detection objective).
pub episodic_return: f64,
}
/// Aggregate over a seed × episode matrix with IQM + 95% bootstrap CIs.
#[derive(Debug, Clone)]
pub struct AggregateMetrics {
pub coverage_iqm: ConfidenceInterval,
/// IQM over detected episodes only (undetected episodes carry no error).
pub localization_iqm: ConfidenceInterval,
pub detection_rate: f64,
pub mean_gdop: f64,
pub return_iqm: ConfidenceInterval,
pub n_episodes: usize,
}
impl AggregateMetrics {
/// Aggregate a seed-stratified matrix of episodes. Each inner `Vec` is one
/// seed's episodes; bootstrap resampling is stratified by seed so the CI
/// reflects between-seed variance (the dominant source per ADR-149).
pub fn from_strata(per_seed: &[Vec<EpisodeMetrics>], boot_seed: u64) -> Self {
const N_BOOT: usize = 1000;
let coverage_strata: Vec<Vec<f64>> = per_seed
.iter()
.map(|s| s.iter().map(|e| e.coverage_pct).collect())
.collect();
let return_strata: Vec<Vec<f64>> = per_seed
.iter()
.map(|s| s.iter().map(|e| e.episodic_return).collect())
.collect();
// Localization: only detected episodes contribute. Keep stratification
// by seed but drop empty strata so the bootstrap doesn't degenerate.
let loc_strata: Vec<Vec<f64>> = per_seed
.iter()
.map(|s| {
s.iter()
.filter_map(|e| e.localization_error_m)
.collect::<Vec<f64>>()
})
.filter(|v: &Vec<f64>| !v.is_empty())
.collect();
let mut detected = 0usize;
let mut total = 0usize;
let mut gdop_sum = 0.0;
let mut gdop_n = 0usize;
for seed in per_seed {
for e in seed {
total += 1;
if e.detected {
detected += 1;
}
if let Some(g) = e.gdop_at_detection {
if g.is_finite() {
gdop_sum += g;
gdop_n += 1;
}
}
}
}
let detection_rate = if total == 0 {
0.0
} else {
detected as f64 / total as f64
};
let mean_gdop = if gdop_n == 0 {
0.0
} else {
gdop_sum / gdop_n as f64
};
AggregateMetrics {
coverage_iqm: stratified_bootstrap_ci(&coverage_strata, N_BOOT, boot_seed),
localization_iqm: stratified_bootstrap_ci(
&loc_strata,
N_BOOT,
boot_seed.wrapping_add(1),
),
detection_rate,
mean_gdop,
return_iqm: stratified_bootstrap_ci(
&return_strata,
N_BOOT,
boot_seed.wrapping_add(2),
),
n_episodes: total,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
fn ep(cov: f64, loc: Option<f64>, ret: f64, detected: bool) -> EpisodeMetrics {
EpisodeMetrics {
coverage_pct: cov,
localization_error_m: loc,
gdop_at_detection: if detected { Some(2.0) } else { None },
time_to_first_detection_s: if detected { Some(10.0) } else { None },
detected,
collisions: 0,
overlap_ratio: 0.1,
episodic_return: ret,
}
}
#[test]
fn test_aggregate_detection_rate_and_shape() {
let per_seed = vec![
vec![
ep(0.8, Some(1.5), 80.0, true),
ep(0.7, None, 70.0, false),
],
vec![
ep(0.9, Some(2.0), 90.0, true),
ep(0.85, Some(1.0), 85.0, true),
],
];
let agg = AggregateMetrics::from_strata(&per_seed, 7);
assert_eq!(agg.n_episodes, 4);
assert!((agg.detection_rate - 0.75).abs() < 1e-9);
assert!(agg.coverage_iqm.lo <= agg.coverage_iqm.point);
assert!(agg.coverage_iqm.point <= agg.coverage_iqm.hi);
assert!(agg.mean_gdop > 0.0);
}
}
+19
View File
@@ -0,0 +1,19 @@
//! ADR-149 statistically-rigorous evaluation harness (Stage 1, kinematic).
//!
//! Produces SAR + MARL metrics over a seeded N-seed × M-episode matrix with
//! IQM + 95% stratified-bootstrap CIs, a (sigma, kappa) CSI-noise sweep, and
//! GDOP-stratified localization error. Generates evals/RESULTS.md.
//!
//! Stage 2 (Gazebo/PX4 SITL high-fidelity, false-alarm + collision rate on the
//! median seeds) is a follow-on — see ADR-149 §6.1.
pub mod gdop;
pub mod stats;
pub mod metrics;
pub mod runner;
pub mod report;
pub use gdop::gdop;
pub use stats::{iqm, stratified_bootstrap_ci, ConfidenceInterval};
pub use metrics::{EpisodeMetrics, AggregateMetrics};
pub use runner::{EvalConfig, NoiseLevel, run_matrix};
pub use report::render_results_md;
+120
View File
@@ -0,0 +1,120 @@
//! RESULTS.md leaderboard generator (ADR-149 Stage 1).
use crate::evals::metrics::AggregateMetrics;
use crate::evals::stats::ConfidenceInterval;
/// Wi2SAR published localization baseline (paper-to-paper), metres.
const WI2SAR_LOCALIZATION_M: f64 = 5.0;
/// Format a CI as `point [lo, hi]` with two decimals.
fn fmt_ci(ci: &ConfidenceInterval) -> String {
format!("{:.3} [{:.3}, {:.3}]", ci.point, ci.lo, ci.hi)
}
/// Render a markdown leaderboard: one row per flight pattern with coverage
/// IQM±CI, localization IQM±CI, detection rate, and mean GDOP — plus the
/// Wi2SAR paper baseline row clearly labelled paper-to-paper.
///
/// `rows` is `(pattern_name, aggregate)`; rows are emitted in the order given,
/// so callers should pre-sort (e.g. by descending coverage point estimate).
pub fn render_results_md(rows: &[(String, AggregateMetrics)]) -> String {
let mut s = String::new();
s.push_str("# ruview-swarm Evaluation Results (ADR-149 Stage 1, kinematic)\n\n");
s.push_str(
"Statistically-rigorous evaluation harness: seeded multi-run rollouts with \
IQM + 95% stratified-bootstrap confidence intervals (Agarwal et al., \
NeurIPS 2021).\n\n",
);
// Run configuration header.
let (n_episodes, n_seeds) = rows
.first()
.map(|(_, a)| {
let n = a.n_episodes;
// Episodes-per-seed isn't stored; report total + leave seed split to caller note.
(n, 0usize)
})
.unwrap_or((0, 0));
s.push_str("## Run configuration\n\n");
s.push_str(&format!(
"- **Stage**: 1 (kinematic, self-contained, deterministic per seed)\n\
- **Episodes per pattern**: {n_episodes} (seed × episode matrix)\n\
- **CI method**: 95% stratified bootstrap of the IQM, stratified by seed\n\
- **GDOP**: 2-D geometric dilution of precision at first detection\n"
));
let _ = n_seeds;
s.push_str(
"\n> **Stage 2 pending**: high-fidelity Gazebo/PX4 SITL evaluation \
(false-alarm rate, real collision rate on the median seeds) is a \
follow-on — see ADR-149 §6.1. The collision figures below are a \
kinematic min-separation proxy, not SITL physics.\n\n",
);
// Leaderboard table.
s.push_str("## Flight-pattern leaderboard\n\n");
s.push_str(
"| Flight pattern | Coverage IQM [95% CI] | Localization (m) IQM [95% CI] | \
Detection rate | Mean GDOP |\n",
);
s.push_str(
"|----------------|-----------------------|-------------------------------|\
----------------|-----------|\n",
);
for (name, agg) in rows {
s.push_str(&format!(
"| {} | {} | {} | {:.1}% | {:.3} |\n",
name,
fmt_ci(&agg.coverage_iqm),
fmt_ci(&agg.localization_iqm),
agg.detection_rate * 100.0,
agg.mean_gdop,
));
}
// Wi2SAR paper baseline row (paper-to-paper, no kinematic re-run).
s.push_str(&format!(
"| _Wi2SAR (paper baseline)_ | _n/a_ | _{:.1} (paper)_ | _n/a_ | _n/a_ |\n",
WI2SAR_LOCALIZATION_M,
));
s.push_str(
"\n_Wi2SAR row is the published single-drone localization figure \
(arxiv 2604.09115), shown paper-to-paper for reference only — it was \
not re-run through this kinematic harness._\n",
);
s
}
#[cfg(test)]
mod tests {
use super::*;
use crate::evals::stats::ConfidenceInterval;
fn agg(cov: f64, det: f64) -> AggregateMetrics {
let ci = |p: f64| ConfidenceInterval { point: p, lo: p - 0.05, hi: p + 0.05 };
AggregateMetrics {
coverage_iqm: ci(cov),
localization_iqm: ci(1.5),
detection_rate: det,
mean_gdop: 2.1,
return_iqm: ci(80.0),
n_episodes: 100,
}
}
#[test]
fn test_render_contains_rows_and_baseline() {
let rows = vec![
("partitioned_lawnmower".to_string(), agg(0.92, 0.95)),
("levy_flight".to_string(), agg(0.40, 0.50)),
];
let md = render_results_md(&rows);
assert!(md.contains("partitioned_lawnmower"));
assert!(md.contains("levy_flight"));
assert!(md.contains("Wi2SAR"));
assert!(md.contains("Stage 2 pending"));
assert!(md.contains("95% stratified bootstrap"));
// Coverage point estimate appears.
assert!(md.contains("0.920"));
}
}
+364
View File
@@ -0,0 +1,364 @@
//! Stage-1 kinematic rollout + seed × episode matrix (ADR-149).
//!
//! A single `run_episode` deterministically drives `drones` drones across a
//! mission area under a chosen [`FlightPattern`], marks coverage on a grid,
//! simulates CSI victim detection perturbed by `(sigma, kappa)` amplitude /
//! von-Mises-phase noise, and computes the GDOP of the contributing-drone
//! constellation at first detection. It is self-contained and seeded — no
//! Candle / training backend required — so it runs in CI by default.
use crate::config::SwarmConfig;
use crate::evals::gdop::gdop;
use crate::evals::metrics::EpisodeMetrics;
use crate::planning::patterns::{FlightPattern, PatternContext};
use crate::types::{NodeId, Position3D};
/// CSI-noise level: amplitude std `sigma` and von-Mises phase concentration `kappa`.
/// Higher `sigma` = noisier amplitude; *lower* `kappa` = noisier phase (more diffuse).
#[derive(Debug, Clone, Copy)]
pub struct NoiseLevel {
pub sigma: f64,
pub kappa: f64,
}
/// One evaluation configuration: a flight pattern + swarm/mission parameters.
#[derive(Debug, Clone)]
pub struct EvalConfig {
pub flight: FlightPattern,
pub config: SwarmConfig,
pub drones: usize,
pub steps: usize,
pub seeds: usize, // ≥10 per ADR-149
pub episodes_per_seed: usize, // e.g. 50
pub victims: Vec<Position3D>,
pub noise: NoiseLevel,
}
impl EvalConfig {
/// A small SAR default suitable for fast CI runs.
pub fn sar_small(flight: FlightPattern) -> Self {
EvalConfig {
flight,
config: SwarmConfig::sar_default(),
drones: 4,
steps: 120,
seeds: 10,
episodes_per_seed: 10,
victims: vec![
Position3D { x: 120.0, y: 90.0, z: 0.0 },
Position3D { x: 320.0, y: 280.0, z: 0.0 },
],
noise: NoiseLevel { sigma: 0.05, kappa: 8.0 },
}
}
}
/// Minimal reproducible LCG → f64 in [0, 1). Self-contained for determinism.
struct Lcg(u64);
impl Lcg {
fn new(seed: u64) -> Self {
Lcg(seed ^ 0xD1B5_4A32_D192_ED03)
}
#[inline]
fn next_u64(&mut self) -> u64 {
self.0 = self
.0
.wrapping_mul(6364136223846793005)
.wrapping_add(1442695040888963407);
self.0
}
#[inline]
fn unit(&mut self) -> f64 {
(self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
}
/// Standard-normal sample via BoxMuller (deterministic).
#[inline]
fn normal(&mut self) -> f64 {
let u1 = self.unit().max(1e-12);
let u2 = self.unit();
(-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
}
}
/// Run one kinematic episode deterministically from `seed`.
///
/// Drives drones step-by-step by the flight pattern, marks a coarse coverage
/// grid, and on the first step a drone comes within scan range of any victim
/// records a fused localization estimate (weighted centroid of contributing
/// drones' per-drone victim estimates, each perturbed by `(sigma, kappa)`
/// noise) and the GDOP of those contributing drones.
pub fn run_episode(cfg: &EvalConfig, seed: u64) -> EpisodeMetrics {
let mut rng = Lcg::new(seed);
let area_w = cfg.config.mission.area_width_m;
let area_h = cfg.config.mission.area_height_m;
let altitude_z = -cfg.config.planning.flight_altitude_m;
let scan_width = cfg.config.planning.csi_scan_width_m.max(1.0);
let min_sep = cfg.config.formation.min_separation_m.max(0.1);
let n = cfg.drones.max(1);
// Coverage grid sized so each cell ~= scan_width.
let gx = ((area_w / scan_width).ceil() as usize).max(1);
let gy = ((area_h / scan_width).ceil() as usize).max(1);
let cell_w = area_w / gx as f64;
let cell_h = area_h / gy as f64;
let mut cover_count = vec![0u32; gx * gy];
// Spread drones along the bottom edge with a small seeded jitter.
let mut positions: Vec<Position3D> = (0..n)
.map(|i| {
let frac = (i as f64 + 0.5) / n as f64;
Position3D {
x: (frac * area_w + (rng.unit() - 0.5) * scan_width).clamp(0.0, area_w),
y: (rng.unit() * scan_width).clamp(0.0, area_h),
z: altitude_z,
}
})
.collect();
// Recent-visit ring buffer for pheromone / potential-field patterns.
let mut visited: Vec<Position3D> = Vec::new();
let max_visited = 32usize;
let scan_range = scan_width; // detect a victim within one scan footprint
let mut collisions = 0u32;
let mut detected = false;
let mut loc_error: Option<f64> = None;
let mut gdop_val: Option<f64> = None;
let mut t_detect: Option<f64> = None;
let dt = step_seconds(cfg);
for step in 0..cfg.steps {
// Advance each drone one waypoint under the pattern.
let snapshot = positions.clone();
for (i, pos) in positions.iter_mut().enumerate() {
let peers: Vec<Position3D> = snapshot
.iter()
.enumerate()
.filter(|(j, _)| *j != i)
.map(|(_, p)| *p)
.collect();
let ctx = PatternContext {
drone_id: NodeId(i as u32),
swarm_size: n,
current: *pos,
area_w,
area_h,
altitude_z,
scan_width_m: scan_width,
step: step as u64,
visited: &visited,
peers: &peers,
};
*pos = cfg.flight.next_target(&ctx);
}
// Mark coverage + record visits.
for pos in &positions {
let cx = ((pos.x / cell_w).floor() as i64).clamp(0, gx as i64 - 1) as usize;
let cy = ((pos.y / cell_h).floor() as i64).clamp(0, gy as i64 - 1) as usize;
cover_count[cy * gx + cx] = cover_count[cy * gx + cx].saturating_add(1);
visited.push(*pos);
}
if visited.len() > max_visited {
let drop = visited.len() - max_visited;
visited.drain(0..drop);
}
// Proximity / collision check (kinematic proxy).
for a in 0..positions.len() {
for b in (a + 1)..positions.len() {
let d = positions[a].distance_to(&positions[b]);
if d < min_sep {
collisions = collisions.saturating_add(1);
}
}
}
// Detection: first step any victim falls within scan range of ≥1 drone,
// fuse a localization estimate from the contributing drones. A single
// contributor still yields a (noisier) estimate; GDOP is only defined
// for the multistatic ≥2-drone case and is `None` otherwise.
if !detected {
for victim in &cfg.victims {
let contributors: Vec<Position3D> = positions
.iter()
.filter(|p| horiz_dist(p, victim) <= scan_range)
.copied()
.collect();
if !contributors.is_empty() {
let (est, g) = fuse_estimate(&contributors, victim, cfg.noise, &mut rng);
loc_error = Some(horiz_dist(&est, victim));
gdop_val = g; // None for a single contributor
t_detect = Some((step as f64 + 1.0) * dt);
detected = true;
break;
}
}
}
}
// Coverage + overlap.
let total_cells = (gx * gy) as f64;
let scanned = cover_count.iter().filter(|&&c| c > 0).count() as f64;
let overlapped = cover_count.iter().filter(|&&c| c > 1).count() as f64;
let coverage_pct = if total_cells > 0.0 { scanned / total_cells } else { 0.0 };
let overlap_ratio = if scanned > 0.0 { overlapped / scanned } else { 0.0 };
// Episodic return: reward coverage + detection, penalize overlap + collisions.
let detect_bonus = if detected { 1.0 } else { 0.0 };
let loc_term = match loc_error {
Some(e) => (1.0 / (1.0 + e)).max(0.0),
None => 0.0,
};
let episodic_return = 100.0 * coverage_pct + 30.0 * detect_bonus + 20.0 * loc_term
- 10.0 * overlap_ratio
- 5.0 * collisions as f64;
EpisodeMetrics {
coverage_pct,
localization_error_m: loc_error,
gdop_at_detection: gdop_val,
time_to_first_detection_s: t_detect,
detected,
collisions,
overlap_ratio,
episodic_return,
}
}
/// Per-step wall-clock seconds, derived from scan width and drone speed.
fn step_seconds(cfg: &EvalConfig) -> f64 {
let speed = cfg.config.planning.max_speed_ms.max(0.1);
(cfg.config.planning.csi_scan_width_m.max(1.0) / speed).max(0.1)
}
/// Horizontal (x, y) distance, ignoring altitude.
fn horiz_dist(a: &Position3D, b: &Position3D) -> f64 {
(a.x - b.x).hypot(a.y - b.y)
}
/// Fuse contributing drones' per-drone victim estimates into a weighted
/// centroid, perturbed by `(sigma, kappa)` CSI noise, and compute the GDOP of
/// the contributing constellation.
fn fuse_estimate(
contributors: &[Position3D],
victim: &Position3D,
noise: NoiseLevel,
rng: &mut Lcg,
) -> (Position3D, Option<f64>) {
// Phase noise std from von Mises concentration: sigma_phase ≈ 1/sqrt(kappa).
let phase_std = 1.0 / noise.kappa.max(1e-3).sqrt();
let mut sx = 0.0;
let mut sy = 0.0;
let mut wsum = 0.0;
for c in contributors {
let range = horiz_dist(c, victim).max(1e-6);
// Each drone's estimate = true victim + range-scaled amplitude noise +
// bearing error from phase noise (perpendicular to LOS).
let amp = noise.sigma * range;
let nx = rng.normal() * amp;
let ny = rng.normal() * amp;
// Bearing wobble: rotate LOS unit vector by a small phase-noise angle.
let bearing = (victim.y - c.y).atan2(victim.x - c.x);
let dtheta = rng.normal() * phase_std;
let bx = range * (bearing + dtheta).cos();
let by = range * (bearing + dtheta).sin();
let est_x = c.x + bx + nx;
let est_y = c.y + by + ny;
// Inverse-range weighting: closer drones trusted more.
let w = 1.0 / range;
sx += est_x * w;
sy += est_y * w;
wsum += w;
}
let w = wsum.max(1e-9);
let est = Position3D { x: sx / w, y: sy / w, z: 0.0 };
let g = gdop(contributors, victim);
(est, g)
}
/// Run the full seed × episode matrix → per-seed strata of [`EpisodeMetrics`].
pub fn run_matrix(cfg: &EvalConfig) -> Vec<Vec<EpisodeMetrics>> {
(0..cfg.seeds)
.map(|s| {
(0..cfg.episodes_per_seed)
.map(|e| {
// Distinct deterministic seed per (seed, episode) cell.
let cell_seed = (s as u64)
.wrapping_mul(0x100_0000)
.wrapping_add(e as u64)
.wrapping_add(0xABCD);
run_episode(cfg, cell_seed)
})
.collect()
})
.collect()
}
/// Standard ADR-149 noise sweep grid: cartesian product of σ × κ levels.
pub fn default_noise_sweep() -> Vec<NoiseLevel> {
let sigmas = [0.02, 0.05, 0.10];
let kappas = [16.0, 8.0, 4.0];
let mut out = Vec::with_capacity(sigmas.len() * kappas.len());
for &sigma in &sigmas {
for &kappa in &kappas {
out.push(NoiseLevel { sigma, kappa });
}
}
out
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_run_episode_deterministic() {
let cfg = EvalConfig::sar_small(FlightPattern::PartitionedLawnmower);
let a = run_episode(&cfg, 12345);
let b = run_episode(&cfg, 12345);
assert_eq!(a.coverage_pct, b.coverage_pct);
assert_eq!(a.detected, b.detected);
assert_eq!(a.localization_error_m, b.localization_error_m);
assert_eq!(a.collisions, b.collisions);
assert_eq!(a.episodic_return, b.episodic_return);
}
#[test]
fn test_partitioned_beats_levy_coverage() {
let mut part = EvalConfig::sar_small(FlightPattern::PartitionedLawnmower);
part.seeds = 3;
part.episodes_per_seed = 5;
let mut levy = part.clone();
levy.flight = FlightPattern::LevyFlight;
let part_m = run_matrix(&part);
let levy_m = run_matrix(&levy);
let part_agg = crate::evals::metrics::AggregateMetrics::from_strata(&part_m, 1);
let levy_agg = crate::evals::metrics::AggregateMetrics::from_strata(&levy_m, 1);
assert!(
part_agg.coverage_iqm.point > levy_agg.coverage_iqm.point,
"partitioned coverage {} should beat levy {}",
part_agg.coverage_iqm.point,
levy_agg.coverage_iqm.point
);
}
#[test]
fn test_matrix_shape() {
let mut cfg = EvalConfig::sar_small(FlightPattern::Spiral);
cfg.seeds = 4;
cfg.episodes_per_seed = 6;
let m = run_matrix(&cfg);
assert_eq!(m.len(), 4);
assert!(m.iter().all(|s| s.len() == 6));
}
#[test]
fn test_noise_sweep_grid() {
let sweep = default_noise_sweep();
assert_eq!(sweep.len(), 9);
}
}
+203
View File
@@ -0,0 +1,203 @@
//! Hand-rolled robust statistics for the evaluation harness (Agarwal 2021).
//!
//! Implements the interquartile mean (IQM), a 95% stratified-bootstrap
//! confidence interval of the IQM, and the probability-of-improvement metric —
//! the three statistics recommended by "Deep RL at the Edge of the
//! Statistical Precipice" (Agarwal et al., NeurIPS 2021) for reporting
//! few-seed RL results.
//!
//! All randomness comes from a local linear-congruential generator (LCG) seeded
//! explicitly, so every CI is fully reproducible — no `thread_rng`, no clock.
/// Interquartile mean: mean of the middle 50% of samples (drop the bottom 25%
/// and the top 25%). Robust to outliers in either tail.
///
/// Small-N behaviour: with fewer than 4 samples the trim would empty the set,
/// so it falls back to the plain arithmetic mean. An empty slice returns 0.0.
pub fn iqm(samples: &[f64]) -> f64 {
if samples.is_empty() {
return 0.0;
}
if samples.len() < 4 {
return samples.iter().sum::<f64>() / samples.len() as f64;
}
let mut sorted = samples.to_vec();
sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
let n = sorted.len();
let lo = n / 4; // trim bottom 25%
let hi = n - lo; // trim top 25% (symmetric)
let mid = &sorted[lo..hi];
if mid.is_empty() {
return sorted.iter().sum::<f64>() / n as f64;
}
mid.iter().sum::<f64>() / mid.len() as f64
}
/// A point estimate with its lower / upper 95% confidence bounds.
#[derive(Debug, Clone, Copy)]
pub struct ConfidenceInterval {
pub point: f64,
pub lo: f64,
pub hi: f64,
}
/// Minimal reproducible LCG (Numerical Recipes constants) yielding f64 in [0,1).
struct Lcg(u64);
impl Lcg {
fn new(seed: u64) -> Self {
// Avoid a zero state collapsing the generator.
Lcg(seed ^ 0x9E37_79B9_7F4A_7C15)
}
#[inline]
fn next_u64(&mut self) -> u64 {
self.0 = self
.0
.wrapping_mul(6364136223846793005)
.wrapping_add(1442695040888963407);
self.0
}
/// Uniform index in [0, n).
#[inline]
fn index(&mut self, n: usize) -> usize {
if n == 0 {
return 0;
}
(self.next_u64() >> 11) as usize % n
}
}
/// 95% stratified-bootstrap CI of the IQM.
///
/// `strata` groups samples (one inner `Vec` per stratum, e.g. per task or per
/// seed). Each bootstrap replicate resamples WITH replacement *within* each
/// stratum (preserving the stratum sizes), pools all resampled values, and
/// recomputes the IQM. Repeat `n_boot` times and take the 2.5 / 97.5
/// percentiles for the CI bounds. The `point` estimate is the IQM of the pooled
/// original samples. Deterministic for a fixed `seed`.
pub fn stratified_bootstrap_ci(
strata: &[Vec<f64>],
n_boot: usize,
seed: u64,
) -> ConfidenceInterval {
let pooled: Vec<f64> = strata.iter().flatten().copied().collect();
let point = iqm(&pooled);
if pooled.is_empty() || n_boot == 0 {
return ConfidenceInterval { point, lo: point, hi: point };
}
let mut rng = Lcg::new(seed);
let mut replicates = Vec::with_capacity(n_boot);
let mut buf: Vec<f64> = Vec::with_capacity(pooled.len());
for _ in 0..n_boot {
buf.clear();
for stratum in strata {
let m = stratum.len();
for _ in 0..m {
buf.push(stratum[rng.index(m)]);
}
}
replicates.push(iqm(&buf));
}
replicates.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
let lo = percentile(&replicates, 2.5);
let hi = percentile(&replicates, 97.5);
ConfidenceInterval { point, lo, hi }
}
/// Linear-interpolated percentile of a pre-sorted slice. `p` in [0, 100].
fn percentile(sorted: &[f64], p: f64) -> f64 {
if sorted.is_empty() {
return 0.0;
}
if sorted.len() == 1 {
return sorted[0];
}
let rank = (p / 100.0) * (sorted.len() as f64 - 1.0);
let lo = rank.floor() as usize;
let hi = rank.ceil() as usize;
if lo == hi {
return sorted[lo];
}
let frac = rank - lo as f64;
sorted[lo] * (1.0 - frac) + sorted[hi] * frac
}
/// Probability of improvement: P(a-sample > b-sample) over all pairs (Agarwal).
///
/// Counts each (a_i, b_j) pair where `a_i > b_j` as 1, a tie as 0.5, and
/// normalizes by the pair count. 1.0 means `a` strictly dominates; ~0.5 means
/// the two are statistically indistinguishable. Returns 0.5 if either is empty.
pub fn probability_of_improvement(a: &[f64], b: &[f64]) -> f64 {
if a.is_empty() || b.is_empty() {
return 0.5;
}
let mut wins = 0.0;
for &ai in a {
for &bj in b {
if ai > bj {
wins += 1.0;
} else if (ai - bj).abs() < f64::EPSILON {
wins += 0.5;
}
}
}
wins / (a.len() as f64 * b.len() as f64)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_iqm_trims_outliers() {
// 0..=100 plus one extreme outlier; IQM should sit near the middle (~50),
// not be dragged toward 1e9.
let mut samples: Vec<f64> = (0..=100).map(|i| i as f64).collect();
samples.push(1e9);
let v = iqm(&samples);
assert!(
(40.0..=60.0).contains(&v),
"IQM should be near the middle-50% mean (~50), got {v}"
);
}
#[test]
fn test_iqm_small() {
// Fewer than 4 samples → plain mean.
assert_eq!(iqm(&[2.0, 4.0]), 3.0);
assert_eq!(iqm(&[10.0]), 10.0);
assert_eq!(iqm(&[1.0, 2.0, 3.0]), 2.0);
assert_eq!(iqm(&[]), 0.0);
}
#[test]
fn test_bootstrap_ci_brackets_point() {
let strata = vec![
vec![1.0, 2.0, 3.0, 4.0, 5.0],
vec![2.0, 3.0, 4.0, 5.0, 6.0],
];
let ci = stratified_bootstrap_ci(&strata, 500, 42);
assert!(ci.lo <= ci.point, "lo ≤ point: {} ≤ {}", ci.lo, ci.point);
assert!(ci.point <= ci.hi, "point ≤ hi: {} ≤ {}", ci.point, ci.hi);
// Deterministic: same seed → identical interval.
let ci2 = stratified_bootstrap_ci(&strata, 500, 42);
assert_eq!(ci.point, ci2.point);
assert_eq!(ci.lo, ci2.lo);
assert_eq!(ci.hi, ci2.hi);
}
#[test]
fn test_prob_improvement_obvious() {
assert_eq!(
probability_of_improvement(&[10.0, 10.0, 10.0], &[0.0, 0.0, 0.0]),
1.0
);
// Identical samples → all ties → 0.5.
let poi = probability_of_improvement(&[5.0, 5.0], &[5.0, 5.0]);
assert!((poi - 0.5).abs() < 1e-9, "symmetric ties → ~0.5, got {poi}");
}
}
+1
View File
@@ -13,6 +13,7 @@ pub mod security;
pub mod failsafe;
pub mod config;
pub mod demo;
pub mod evals;
pub mod integration;
pub mod bench_support;
pub mod orchestrator;