mirror of
https://github.com/ruvnet/RuView
synced 2026-06-09 10:13:17 +00:00
0d3d835bf8
* feat(swarm): add wifi-densepose-swarm crate implementing ADR-148 drone swarm control system
New crate `wifi-densepose-swarm` with hierarchical-mesh swarm topology,
Raft consensus, MAPPO MARL, CSI sensing integration, and ITAR-gated
coordination features. Closes 3 of 7 milestones (M1, M2, M5) with 5/5
ADR-148 SOTA performance targets met.
## Modules (45 source files, 14 modules)
- types: NodeId, DroneState, Position3D, SwarmTask, SwarmError, FailSafeState
- topology: Raft consensus (leader election, log replication, quorum), Gossip, Mesh
- formation: VirtualStructure, LeaderFollower, Reynolds flocking (itar-gated)
- planning: RRT-APF hybrid planner, 3-phase coverage, Bayesian grid, pheromone
- allocation: Auction + FNN bid scorer (itar-gated)
- sensing: CsiPayloadPipeline (Live/Synthetic/Replay), MultiViewFusion, OccWorldBridge
- marl: MAPPO actor (3-layer MLP), LocalObservation (64-dim), RewardCalculator, PPO loop
- security: MAVLink v2 HMAC-SHA256, UWB anti-spoofing, geofence, Remote ID, FHSS
- failsafe: 10-state onboard machine, GCS-independent safety transitions
- config: TOML SwarmConfig with SAR/inspection/agriculture/mine/demo/wi2sar_reference
- demo: SyntheticCsiGenerator, DemoScenario (SAR/open-field/mine)
- integration: FlightController trait, MAVLink dialect (50000-50005), SwarmSim
- orchestrator: SwarmOrchestrator wiring all subsystems end-to-end
- bench_support: Criterion fixture generators
## ITAR compliance
Swarming coordination features gated behind `itar-unrestricted` feature
per USML Category VIII(h)(12). Default build compiles clean stubs.
## Benchmark results (criterion, release mode)
- MARL actor inference: 3.3 µs (target ≤ 5 ms — 1,516× headroom)
- RRT-APF planning (100 iter): 0.043 ms (target < 300 ms — 6,946× headroom)
- MultiView CSI fusion (3 UAVs): 58.5 ns (target < 10 ms — 171,000× headroom)
- 3-view localization: 1.732 m (target ≤ 2 m — beats Wi2SAR SOTA)
- 4-drone SAR coverage (400×400 m): 223 s (target ≤ 240 s — PASS)
## Tests
- --no-default-features: 73/73 passing
- --features itar-unrestricted: 85/85 passing
Closes #861
Co-Authored-By: claude-flow <ruv@ruv.net>
* refactor(swarm): rename wifi-densepose-swarm → ruview-swarm
The swarm control system is a RuView-level capability (drone coordination,
Raft consensus, MARL) that operates above the wifi-densepose sensing layer
rather than being a sub-component of it. Rename aligns with the project
identity and separates coordination infrastructure from sensing modules.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(swarm): resolve all clippy warnings + add MARL convergence test
- planning/probability_grid: map_or(true,…) → is_none_or (clippy::unnecessary_map_or)
- planning/pheromone: &mut Vec<T> → &mut [T] on evaporate+deposit (clippy::ptr_arg)
- marl/observation: fix doc lazy-continuation warning on TOTAL line
- marl/trainer: manual Default impl → #[derive(Default)] + #[default] on Demo variant
Also adds test_marl_convergence_improves_mean_return: fills 64-transition
ReplayBuffer with mixed rewards (steps 0-31: negative, 32-63: positive),
runs ppo_update, asserts mean_return is finite and non-zero.
Result: 0 clippy warnings · 74/74 tests (default) · 86/86 (itar-unrestricted)
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(swarm): integrate Ruflo AI-agent capabilities into ruview-swarm
Adds a feature-gated Ruflo integration layer connecting ruview-swarm to the
claude-flow daemon's AgentDB, AIDefence, and SONA intelligence subsystems.
Default build is unaffected (all paths behind `Option<Box<dyn RufloBackend>>`).
## New module: src/ruflo/
- backend.rs: RufloBackend trait (9 async methods) + RufloError, MissionMemoryEntry,
PatternEntry, MavlinkScanResult types (always compiled)
- mock_backend.rs: MockRufloBackend in-memory impl for testing (always compiled, 5 tests)
- http_backend.rs: HttpRufloBackend — JSON-RPC 2.0 → claude-flow daemon localhost:3000
(gated behind `ruflo` feature, requires reqwest)
- mission_summary.rs: MissionSummary serializer with pattern description + confidence
scoring from victim recall, coverage %, collision penalty (always compiled, 3 tests)
## 4 capability areas
1. MissionMemory → memory_store / memory_search (cross-mission victim memory)
2. PatternLearner → agentdb_pattern-store / -search (HNSW SONA trajectory patterns)
3. MavlinkDefence → aidefence_is_safe / aidefence_scan (scan MAVLink before accepting)
4. IntelligenceHooks → trajectory-start/step/end (SONA learning loop)
## SwarmOrchestrator integration
- with_ruflo(backend): builder to attach a backend
- start_trajectory(task) / finish_trajectory(success, key): SONA mission lifecycle
- receive_peer_detection_checked(): AIDefence scan before accepting peer detections
## Cargo feature
`ruflo = ["dep:reqwest", "dep:serde_json"]` — optional, not in default
## Tests
- --no-default-features: 82/82 pass (8 new ruflo tests)
- --features ruflo,itar-unrestricted: 94/94 pass
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(swarm): M7 mission profiles with victim confirmation reports + pre-merge docs
Adds end-to-end mission runners producing structured MissionReport output,
and updates project docs (CHANGELOG, README, CLAUDE.md) per pre-merge checklist.
## M7 Mission Profiles (integration/mission_report.rs + swarm_sim.rs)
- MissionReport / VictimReport / SotaComparison types (serde-serializable)
- run_mission_with_report(): full mission → detailed report with per-victim
localization error, fusion uncertainty, contributing drones, detection time
- run_inspection_mission(): leader-follower power-line corridor inspection
- run_mine_mission(): GPS-denied underground (2-drone, slow, UWB-only)
- SotaComparison embeds Wi2SAR baseline (5m / 810s) vs achieved metrics
## Docs (pre-merge checklist)
- CHANGELOG.md: ruview-swarm + Ruflo integration + performance entries
- README.md: ruview-swarm row
- CLAUDE.md: Key Rust Crates table row + ADR-148 in ADR list
## Tests
- --no-default-features: 86/86 pass
- --features ruflo,itar-unrestricted: 98/98 pass
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(swarm): convergence-assist for victim fusion + 5s Ruflo HTTP timeout
Follow-up to 13b08927 which committed an intermediate M7 state with one
failing test. This lands the M7 agent's convergence fixes and the security
review's timeout hardening.
## Fixes
- swarm_sim.rs: min-separation nudge before collision metric (0 collisions
with staggered starts) + Phase-3 convergence assist that vectors the nearest
idle peer toward a single-drone CSI contact so multi-view fusion can fire
- http_backend.rs: add 5s request timeout to reqwest client (security review
Medium finding — a dead daemon would otherwise hang the swarm step loop)
## Security review verdict (HttpRufloBackend)
Safe to merge. No credentials in requests, serde_json prevents injection,
fail-open on daemon-down is documented and appropriate for SAR missions,
MAVLink passed as structured text (not raw bytes). Timeout fix applied.
## Tests
- --no-default-features: 87/87 pass
- --features ruflo,itar-unrestricted: 100/100 pass
Co-Authored-By: claude-flow <ruv@ruv.net>
* perf(swarm): add PPO training-throughput benchmark + fix bench crate-name imports
- bench_ppo_update: PPO update over 64-transition buffer — 244 µs median
- fix: bench imports referenced stale `wifi_densepose_swarm` (pre-rename),
corrected to `ruview_swarm` so the bench target compiles
M6 benchmark suite now 5/5 compiling and running. Tests unchanged: 87/100.
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4)
Replaces the finite-difference PPO placeholder with a real GPU-capable Candle
0.9 autodiff trainer, adds A-MAPPO heterogeneous-role attention, a runnable
training binary, and right-sized GCP/local launch scripts. This is the unlock
that makes "GPU long training cycles" actually mean something — the previous
ppo_update did no gradient descent.
## Real autodiff PPO (feature `train`, optional `cuda`)
- candle_ppo.rs: CandleActorCritic (64→128→64 MLP + action/value heads +
learnable log_std), CandlePpoConfig, CandleTrainer with GAE and a genuine
optimizer.backward_step over the network. select_device() picks CUDA when
built --features cuda and a GPU is present, else CPU.
- Verified: 5-episode CPU smoke run shows value_loss 12643→12375 (critic
actually learning); safetensors checkpoint saved. Placeholder never moved weights.
## A-MAPPO heterogeneous-role attention (role_attention.rs, always compiled)
Addresses the four sensor-vs-relay edge cases:
- relay attention floor (prevents collapse — relays produce no CSI)
- role-segmented sensor/relay attention pools (variable neighbor cardinality)
- sensor-gated triangulation-geometry penalty (protects 3-view fusion baseline,
ADR-148 §4.2 — relays not dragged into triangulation geometry)
- one-hot role embeddings for keys
## Training binary
- src/bin/train_marl.rs (required-features=["train"], excluded from default build)
- CLI: --episodes --drones --profile --steps --checkpoint-dir --checkpoint-every
- Wires CandleTrainer to the SwarmOrchestrator rollout loop; GAE + PPO update
per episode; periodic safetensors checkpoints
## Right-sized launch (scripts/gcp/)
- provision_marl.sh: g2-standard-16 (1× L4, 16 vCPU, ~$1.40/hr) — NOT the
$29/hr A100×8 box. MARL is rollout-bound not matmul-bound; ~21× cheaper.
- run_marl_train.sh: GCP rsync + train + checkpoint pull
- run_marl_train_local.sh: local RTX 5080, $0
- A100×8 provision_training.sh left for OccWorld (which saturates the GPUs)
## Tests
- --no-default-features: 91/91 (87 + 4 role_attention)
- --features train: 96/96 (+ 5 candle_ppo, incl. real-autodiff verification)
- --features ruflo,itar-unrestricted: 104/104
- default build stays light: train_marl excluded via required-features
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr-148): mark M4 complete — real GPU autodiff training; overall 98%
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(swarm): training visualizer — JSONL telemetry + self-contained HTML viewer
Adds an offline, dependency-free visualization for the drone training system:
a top-down swarm replay synced with training-metric curves, fed by a JSONL
telemetry log the trainer emits. No server, no build step, no CDN.
## Telemetry recorder (integration/telemetry.rs, always compiled, no new deps)
- TelemetryRecorder writes newline-delimited JSON: one `meta` (profile, area,
ground-truth victims), many `step` (per-tick drone x/y/heading/battery/detection
+ coverage%), and per-episode `episode` (mean_return, policy_loss, value_loss).
- Written by hand (no serde_json) so it stays in the default build; 2 tests.
## train_marl telemetry flags
- `--telemetry FILE` writes the log; `--telemetry-episode N` selects which
episode's spatial steps to record (metrics recorded for all episodes).
## Visualizer (viz/swarm_viz.html — single file, vanilla JS + canvas)
- LEFT: top-down replay — heading-oriented drone triangles (cyan/lime on
detection), victim markers, growing coverage heatmap, detection pulse rings,
play/pause/scrub/speed controls + live coverage/detection readout.
- RIGHT: three autoscaled line charts (mean return, policy loss, value loss)
over episodes, hand-drawn (no chart library).
- Loads via file picker/drag-drop or auto-fetches the bundled sample; dark
drone-ops theme; graceful degradation on file:// CORS.
- viz/sample_telemetry.jsonl: real 30-episode / 4-drone / 400×400 m run
(value_loss 20052→7154 — visible critic learning). Parses 1 meta / 60 step / 30 episode.
## Usage
cargo run --release -p ruview-swarm --features train,cuda --bin train_marl -- \
--episodes 5000 --telemetry run.jsonl
open v2/crates/ruview-swarm/viz/swarm_viz.html # load run.jsonl
Tests unchanged (91 default / 96 train / 104 ruflo+itar); telemetry adds 2.
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(swarm): selectable flight + self-learning patterns, wired into training + viz
Adds multiple flight/coverage-optimization strategies and self-learning
strategies, selectable from the trainer, and fixes drone clustering — the
demo sweep now covers 36% of the area (was ~0.9%) with 4 disjoint strips.
## Flight patterns (planning/patterns.rs) — `FlightPattern`
- PartitionedLawnmower (new default): area split into per-drone strips → no
overlap, coverage scales ~linearly with swarm size (clustering fix)
- Boustrophedon (baseline), Spiral, Pheromone (stigmergic), PotentialField,
LevyFlight. from_str/name/all + next_target(&PatternContext).
## Self-learning patterns (marl/learning.rs) — `LearningPattern`
- Mappo (CTDE centralized critic), Ippo (independent, jamming-robust),
MappoCuriosity (count-based intrinsic novelty), MetaRl (MAML fast-adapt).
- CuriosityModule (visit_bonus = beta/sqrt(count), novelty decays on revisit),
MetaAdapter (base + fast-weights, reset_fast/consolidate), shaped_reward().
## Trainer wiring (bin/train_marl.rs)
- --flight-pattern {boustrophedon|partitioned|spiral|pheromone|potential|levy}
- --learn-pattern {mappo|ippo|curiosity|meta}
- Rollout now moves each drone per the selected FlightPattern (PatternContext
with visited trail + live peers), curiosity-shapes the reward, and logs
CTDE vs independent. Telemetry meta profile carries the pattern labels so the
viewer header shows `flight=… · learn=…`.
## Verification
- Browser pass (viz at localhost:8777): partitioned run renders 4 distinct
serpentine coverage bands, header shows the patterns, final coverage 36.3%,
scrubber/speed/playback work, ZERO console errors. Screenshot confirmed.
- Regenerated viz/sample_telemetry.jsonl: 1 meta / 120 step / 30 episode,
coverage 0.9% → 36.3%.
## Tests
- --no-default-features: 103/103 (was 91; +6 patterns +6 learning)
- --features train: 108/108
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(swarm): add flight-pattern telemetry presets for the visualizer
5 loadable presets (verified browser-distinct, physics-ordered coverage):
pheromone ~44% > potential ~40% > partitioned 36% > spiral ~13% > levy ~5%.
Load any in viz/swarm_viz.html to compare flight strategies without retraining.
Co-Authored-By: claude-flow <ruv@ruv.net>
* chore(swarm): clippy-clean + publish guard for ruview-swarm
- ruview-swarm src is now 0 clippy warnings across default/train/full feature
sets (derive Default, targeted allows for intentional from_str + bounded
casts + borrow-required index loops; removed redundant unsigned .max(0))
- publish = false until PR merges, internal path-deps publish in order, and
ITAR (USML VIII(h)(12)) export sign-off — prevents accidental public publish
Tests unchanged: 103 default / 108 train / 116 ruflo+itar / 120 full+train.
(6 remaining clippy warnings are pre-existing in dependency wifi-densepose-core,
out of scope for this crate.)
Co-Authored-By: claude-flow <ruv@ruv.net>
* ci(swarm): add ruview-swarm CI guard
Path-scoped guard for v2/crates/ruview-swarm/** (ADR-148). Complements the
main ci.yml (which only runs the default workspace tests):
- feature-matrix tests: default / train / ruflo+itar / full+train
- clippy -D warnings --no-deps (crate-own code only; dep warnings don't gate)
- train_marl bin builds under 'train' AND is excluded from the default build
- ITAR/publish guards: publish=false present, itar-unrestricted never in default
All steps verified locally green before commit.
Co-Authored-By: claude-flow <ruv@ruv.net>
142 lines
6.4 KiB
Bash
Executable File
142 lines
6.4 KiB
Bash
Executable File
#!/usr/bin/env bash
|
|
# Run ruview-swarm MARL training on a GCP L4 instance (ADR-148 M4).
|
|
# Usage: bash scripts/gcp/run_marl_train.sh <INSTANCE_IP> [EPISODES] [DRONES] [PROFILE]
|
|
#
|
|
# Rsyncs the v2/ Rust workspace to the instance, then runs the Candle PPO
|
|
# MARL trainer:
|
|
# cargo run --release -p ruview-swarm --features train,cuda --bin train_marl
|
|
# Downloads the trained checkpoints back on completion.
|
|
#
|
|
# NOTE: the `--bin train_marl` target is added by the companion MARL trainer
|
|
# work (Candle PPO trainer). This script calls it; it is expected to
|
|
# exist once that work lands.
|
|
|
|
set -euo pipefail
|
|
|
|
# ── Usage ─────────────────────────────────────────────────────────────────────
|
|
if [[ $# -lt 1 ]]; then
|
|
echo "Usage: $0 <INSTANCE_IP> [EPISODES] [DRONES] [PROFILE]" >&2
|
|
echo ""
|
|
echo " INSTANCE_IP External IP of the GCP L4 MARL training instance"
|
|
echo " EPISODES Training episodes (default: 5000)"
|
|
echo " DRONES Swarm size (default: 4)"
|
|
echo " PROFILE Mission profile (default: sar)"
|
|
echo ""
|
|
echo "Example:"
|
|
echo " $0 34.123.45.67"
|
|
echo " $0 34.123.45.67 10000 6 sar"
|
|
exit 1
|
|
fi
|
|
|
|
INSTANCE_IP="$1"
|
|
EPISODES="${2:-5000}"
|
|
DRONES="${3:-4}"
|
|
PROFILE="${4:-sar}"
|
|
|
|
GCP_USER="${GCP_USER:-$(gcloud config get-value account 2>/dev/null | cut -d@ -f1)}"
|
|
REMOTE="${GCP_USER}@${INSTANCE_IP}"
|
|
LOCAL_V2_DIR="$(cd "$(dirname "$0")/../.." && pwd)/v2"
|
|
OUTPUT_DIR="./out/gcp-checkpoints/marl"
|
|
REMOTE_CRATE="~/ruview-swarm"
|
|
REMOTE_CHECKPOINTS="~/ruview-swarm/marl-checkpoints"
|
|
|
|
log() { echo "[run_marl_train] $*"; }
|
|
|
|
# ── Validation ────────────────────────────────────────────────────────────────
|
|
if [[ ! -d "$LOCAL_V2_DIR" ]]; then
|
|
echo "ERROR: v2 workspace not found: $LOCAL_V2_DIR" >&2
|
|
exit 1
|
|
fi
|
|
|
|
log "Config: $EPISODES episodes, $DRONES drones, profile=$PROFILE"
|
|
|
|
# ── SSH connectivity check ────────────────────────────────────────────────────
|
|
SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=15 -o BatchMode=yes"
|
|
log "Checking SSH connectivity to $REMOTE ..."
|
|
if ! ssh $SSH_OPTS "$REMOTE" "echo ok" &>/dev/null; then
|
|
echo "ERROR: Cannot SSH to $REMOTE" >&2
|
|
echo " Ensure the instance is running and your SSH key is authorized." >&2
|
|
echo " Try: gcloud compute ssh <INSTANCE_NAME> --project=cognitum-20260110" >&2
|
|
exit 1
|
|
fi
|
|
log "SSH connection OK"
|
|
|
|
# ── Startup script completion check ───────────────────────────────────────────
|
|
log "Checking that startup script completed ..."
|
|
STARTUP_READY=$(ssh $SSH_OPTS "$REMOTE" \
|
|
"grep -c 'setup complete' /var/log/ruview-marl-startup.log 2>/dev/null || echo 0")
|
|
if [[ "$STARTUP_READY" -lt 1 ]]; then
|
|
log "WARNING: Startup script may not have finished yet."
|
|
log " Check /var/log/ruview-marl-startup.log on the instance."
|
|
log " Continuing anyway — the Rust toolchain may need more time."
|
|
fi
|
|
|
|
# ── Rsync the v2 Rust workspace ───────────────────────────────────────────────
|
|
# Exclude build artifacts and VCS — the instance rebuilds from source.
|
|
log "Rsyncing v2 workspace → $REMOTE:$REMOTE_CRATE ..."
|
|
ssh $SSH_OPTS "$REMOTE" "mkdir -p $REMOTE_CRATE"
|
|
rsync -avz --progress --stats \
|
|
-e "ssh $SSH_OPTS" \
|
|
--exclude="target/" \
|
|
--exclude=".git/" \
|
|
--exclude="marl-checkpoints/" \
|
|
--exclude="*.log" \
|
|
"$LOCAL_V2_DIR/" \
|
|
"${REMOTE}:${REMOTE_CRATE}/"
|
|
log "Workspace sync complete"
|
|
|
|
# ── Run MARL training ─────────────────────────────────────────────────────────
|
|
log "=== MARL training ($EPISODES episodes, $DRONES drones, $PROFILE) ==="
|
|
TRAIN_START=$(date +%s)
|
|
|
|
ssh $SSH_OPTS "$REMOTE" bash << REMOTE_TRAIN
|
|
set -euo pipefail
|
|
# shellcheck source=/dev/null
|
|
source "\$HOME/.cargo/env"
|
|
cd "\$HOME/ruview-swarm"
|
|
|
|
mkdir -p ./marl-checkpoints
|
|
|
|
echo "[train] \$(date): starting Candle PPO MARL trainer"
|
|
# --bin train_marl is provided by the companion MARL trainer work.
|
|
cargo run --release -p ruview-swarm --features train,cuda --bin train_marl -- \\
|
|
--episodes ${EPISODES} --drones ${DRONES} --profile ${PROFILE} \\
|
|
--checkpoint-dir ./marl-checkpoints
|
|
|
|
echo "[train] \$(date): MARL training complete"
|
|
ls -lh ./marl-checkpoints/
|
|
REMOTE_TRAIN
|
|
|
|
TRAIN_END=$(date +%s)
|
|
TRAIN_MIN=$(( (TRAIN_END - TRAIN_START) / 60 ))
|
|
log "Training complete in ${TRAIN_MIN} min"
|
|
|
|
# ── Download checkpoints ──────────────────────────────────────────────────────
|
|
log "Downloading checkpoints → $OUTPUT_DIR ..."
|
|
mkdir -p "$OUTPUT_DIR"
|
|
rsync -avz --progress --stats \
|
|
-e "ssh $SSH_OPTS" \
|
|
"${REMOTE}:${REMOTE_CHECKPOINTS}/" \
|
|
"$OUTPUT_DIR/"
|
|
|
|
# ── Verify download ───────────────────────────────────────────────────────────
|
|
LOCAL_FILE_COUNT=$(find "$OUTPUT_DIR" -type f 2>/dev/null | wc -l)
|
|
LOCAL_SIZE_MB=$(du -sm "$OUTPUT_DIR" 2>/dev/null | awk '{print $1}')
|
|
log "Downloaded $LOCAL_FILE_COUNT files, ~${LOCAL_SIZE_MB} MB to $OUTPUT_DIR"
|
|
if [[ "$LOCAL_FILE_COUNT" -lt 1 ]]; then
|
|
echo "WARNING: No checkpoints were downloaded from $REMOTE" >&2
|
|
fi
|
|
|
|
# ── Summary ───────────────────────────────────────────────────────────────────
|
|
TRAIN_HR=$(awk "BEGIN {printf \"%.2f\", $TRAIN_MIN / 60}")
|
|
COST=$(awk "BEGIN {printf \"%.2f\", 1.40 * $TRAIN_HR}")
|
|
log ""
|
|
log "=== MARL training complete ==="
|
|
log " Episodes : $EPISODES (drones=$DRONES, profile=$PROFILE)"
|
|
log " Wall time : ${TRAIN_MIN} min (${TRAIN_HR} hr)"
|
|
log " Est. compute cost: ~\$$COST (at \$1.40/hr on-demand, g2-standard-16)"
|
|
log " Checkpoints in : $OUTPUT_DIR"
|
|
log ""
|
|
log "Next step (teardown):"
|
|
log " bash scripts/gcp/teardown.sh <INSTANCE_NAME> --skip-download"
|