mirror of
https://github.com/ruvnet/RuView
synced 2026-06-09 10:13:17 +00:00
WiFi-DensePose Rust Crates
See through walls with WiFi. No cameras. No wearables. Just radio waves.
A modular Rust workspace for WiFi-based human pose estimation, vital sign monitoring, and disaster response using Channel State Information (CSI). Built on RuVector graph algorithms and the WiFi-DensePose research platform by rUv.
Performance
| Operation | Python v1 | Rust v2 | Speedup |
|---|---|---|---|
| CSI Preprocessing | ~5 ms | 5.19 us | ~1000x |
| Phase Sanitization | ~3 ms | 3.84 us | ~780x |
| Feature Extraction | ~8 ms | 9.03 us | ~890x |
| Motion Detection | ~1 ms | 186 ns | ~5400x |
| Full Pipeline | ~15 ms | 18.47 us | ~810x |
| Vital Signs | N/A | 86 us (11,665 fps) | -- |
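The speedup and throughput columns follow directly from the latencies. The sketch below reproduces that arithmetic; the helper names are illustrative and not part of any crate in this workspace. With the rounded 86 us figure the implied rate is about 11,628 fps, so the 11,665 fps in the table presumably comes from the unrounded measurement (roughly 85.7 us).

```rust
/// Speedup of a new implementation over a baseline.
/// Both latencies are given in nanoseconds.
fn speedup(baseline_ns: f64, new_ns: f64) -> f64 {
    baseline_ns / new_ns
}

/// Frames per second implied by a per-frame latency in nanoseconds.
fn frames_per_second(latency_ns: f64) -> f64 {
    1.0e9 / latency_ns
}
```

For example, `speedup(15.0e6, 18_470.0)` gives roughly 812, which matches the ~810x reported for the full pipeline.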
Crate Overview
Core Foundation
| Crate | Description | crates.io |
|---|---|---|
| wifi-densepose-core | Types, traits, and utilities (CsiFrame, PoseEstimate, SignalProcessor) | |
| wifi-densepose-config | Configuration management (env, TOML, YAML) | |
| wifi-densepose-db | Database persistence (PostgreSQL, SQLite, Redis) | |
Signal Processing & Sensing
| Crate | Description | RuVector Integration | crates.io |
|---|---|---|---|
| wifi-densepose-signal | SOTA CSI signal processing (6 algorithms from SpotFi, FarSense, Widar 3.0) | ruvector-mincut, ruvector-attn-mincut, ruvector-attention, ruvector-solver | |
| wifi-densepose-vitals | Vital sign extraction: breathing (6-30 BPM) and heart rate (40-120 BPM) | -- | |
| wifi-densepose-wifiscan | Multi-BSSID WiFi scanning for Windows-enhanced sensing | -- | |
Neural Network & Training
| Crate | Description | RuVector Integration | crates.io |
|---|---|---|---|
| wifi-densepose-nn | Multi-backend inference (ONNX, PyTorch, Candle) with DensePose head (24 body parts) | -- | |
| wifi-densepose-train | Training pipeline with MM-Fi dataset, 114->56 subcarrier interpolation | All 5 crates | |
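To make the 114->56 subcarrier step concrete, here is a minimal sketch that resamples one CSI amplitude vector by plain linear interpolation. This is a stand-in technique chosen for illustration: the training crate is described elsewhere in this README as using the ruvector-solver sparse solver for interpolation, and the function name `resample_linear` is hypothetical.

```rust
/// Resample `input` to `target_len` points with linear interpolation.
/// Illustrative only; not the interpolation used by wifi-densepose-train.
fn resample_linear(input: &[f32], target_len: usize) -> Vec<f32> {
    match (input.len(), target_len) {
        (0, _) | (_, 0) => Vec::new(),
        (1, n) => vec![input[0]; n],
        (_, 1) => vec![input[0]],
        (len, n) => {
            // Map output index i onto a fractional position in the input.
            let scale = (len - 1) as f32 / (n - 1) as f32;
            (0..n)
                .map(|i| {
                    let pos = i as f32 * scale;
                    let lo = (pos.floor() as usize).min(len - 1);
                    let hi = (lo + 1).min(len - 1);
                    let frac = pos - lo as f32;
                    // Blend the two neighbouring input samples.
                    input[lo] * (1.0 - frac) + input[hi] * frac
                })
                .collect()
        }
    }
}
```

Calling `resample_linear(&amplitudes, 56)` on a 114-element slice yields 56 values whose first and last entries coincide with the first and last input subcarriers.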
Disaster Response
| Crate | Description | RuVector Integration | crates.io |
|---|---|---|---|
| wifi-densepose-mat | Mass Casualty Assessment Tool -- survivor detection, triage, multi-AP localization | ruvector-solver, ruvector-temporal-tensor | |
Hardware & Deployment
| Crate | Description | crates.io |
|---|---|---|
| wifi-densepose-hardware | ESP32, Intel 5300, Atheros CSI sensor interfaces (pure Rust, no FFI) | |
| wifi-densepose-wasm | WebAssembly bindings for browser-based disaster dashboard | |
| wifi-densepose-sensing-server | Axum server: ESP32 UDP ingestion, WebSocket broadcast, sensing UI | |
Applications
| Crate | Description | crates.io |
|---|---|---|
| wifi-densepose-api | REST + WebSocket API layer | |
| wifi-densepose-cli | Command-line tool for MAT disaster scanning | |
Architecture

                         wifi-densepose-core
                       (types, traits, errors)
                                  |
              +-------------------+-------------------+
              |                   |                   |
    wifi-densepose-signal   wifi-densepose-nn   wifi-densepose-hardware
    (CSI processing)        (inference)         (ESP32, Intel 5300)
    + ruvector-mincut       + ONNX Runtime            |
    + ruvector-attn-mincut  + PyTorch (tch)     wifi-densepose-vitals
    + ruvector-attention    + Candle            (breathing, heart rate)
    + ruvector-solver                                 |
              |                   |             wifi-densepose-wifiscan
              +---------+---------+             (BSSID scanning)
                        |
           +------------+------------+
           |                         |
    wifi-densepose-train      wifi-densepose-mat
    (training pipeline)       (disaster response)
    + ALL 5 ruvector          + ruvector-solver
                              + ruvector-temporal-tensor
                                     |
                   +-----------------+-----------------+
                   |                 |                 |
          wifi-densepose-api  wifi-densepose-wasm  wifi-densepose-cli
          (REST/WS)           (browser WASM)       (CLI tool)
                   |
          wifi-densepose-sensing-server
          (Axum + WebSocket)
RuVector Integration
All RuVector crates at v2.0.4 from crates.io:
| RuVector Crate | Used In | Purpose |
|---|---|---|
| ruvector-mincut | signal, train | Dynamic min-cut for subcarrier selection & person matching |
| ruvector-attn-mincut | signal, train | Attention-weighted min-cut for antenna gating & spectrograms |
| ruvector-temporal-tensor | train, mat | Tiered temporal compression (4-10x memory reduction) |
| ruvector-solver | signal, train, mat | Sparse Neumann solver for interpolation & triangulation |
| ruvector-attention | signal, train | Scaled dot-product attention for spatial features & BVP |
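For readers unfamiliar with the Neumann approach mentioned for ruvector-solver, the sketch below shows the underlying idea on a small dense system: the solution of Ax = b is approximated by the partial sums of the series x = sum over k of (I - A)^k b, which converges when the spectral radius of I - A is below 1. This is a toy dense version written for this README; it does not use or mirror the ruvector-solver API, and `neumann_solve` is a hypothetical name.

```rust
/// Approximate the solution of A x = b with a truncated Neumann series.
/// Each pass adds the residual b - A x to x, which is equivalent to
/// appending the next (I - A)^k b term. Requires rho(I - A) < 1.
fn neumann_solve(a: &[Vec<f64>], b: &[f64], max_iters: usize, tol: f64) -> Vec<f64> {
    // The zeroth partial sum is b itself.
    let mut x = b.to_vec();
    for _ in 0..max_iters {
        // Residual r = b - A x for the current estimate.
        let r: Vec<f64> = a
            .iter()
            .zip(b)
            .map(|(row, bi)| bi - row.iter().zip(&x).map(|(aij, xj)| aij * xj).sum::<f64>())
            .collect();
        let norm = r.iter().map(|v| v * v).sum::<f64>().sqrt();
        if norm < tol {
            break;
        }
        for (xi, ri) in x.iter_mut().zip(&r) {
            *xi += ri;
        }
    }
    x
}
```

A production solver would store A in a sparse format and usually precondition it so the convergence condition holds; both are omitted here to keep the series itself visible.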
Signal Processing Algorithms
Six state-of-the-art algorithms implemented in wifi-densepose-signal:
| Algorithm | Paper | Year | Module |
|---|---|---|---|
| Conjugate Multiplication | SpotFi (SIGCOMM) | 2015 | csi_ratio.rs |
| Hampel Filter | WiGest | 2015 | hampel.rs |
| Fresnel Zone Model | FarSense (IMWUT) | 2019 | fresnel.rs |
| CSI Spectrogram | Standard STFT | 2018+ | spectrogram.rs |
| Subcarrier Selection | WiDance (CHI) | 2017 | subcarrier_selection.rs |
| Body Velocity Profile | Widar 3.0 (MobiSys) | 2019 | bvp.rs |
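As one concrete instance from the table, a Hampel filter replaces samples that sit too far from the local median, measured in scaled median absolute deviations. The following is a generic textbook sketch and makes no claim about how hampel.rs is written; the function names and parameters are assumptions made for illustration.

```rust
/// Median of a slice; sorts the slice in place.
fn median(values: &mut [f64]) -> f64 {
    values.sort_by(|a, b| a.total_cmp(b));
    let n = values.len();
    if n % 2 == 1 {
        values[n / 2]
    } else {
        0.5 * (values[n / 2 - 1] + values[n / 2])
    }
}

/// Hampel filter: a sample is replaced by the window median when it
/// deviates from that median by more than `n_sigmas` scaled MADs.
fn hampel_filter(data: &[f64], half_window: usize, n_sigmas: f64) -> Vec<f64> {
    // Scale factor that makes the MAD a consistent estimator of the
    // standard deviation for Gaussian data.
    const MAD_SCALE: f64 = 1.4826;
    let mut out = data.to_vec();
    for i in 0..data.len() {
        // Window is truncated at the edges of the series.
        let start = i.saturating_sub(half_window);
        let end = (i + half_window + 1).min(data.len());
        let mut window = data[start..end].to_vec();
        let med = median(&mut window);
        let mut deviations: Vec<f64> = window.iter().map(|v| (v - med).abs()).collect();
        let mad = MAD_SCALE * median(&mut deviations);
        if (data[i] - med).abs() > n_sigmas * mad {
            out[i] = med;
        }
    }
    out
}
```

With `half_window = 3` and `n_sigmas = 3.0`, a single spike in an otherwise flat amplitude series is replaced by the local median while the surrounding samples are left untouched.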
Quick Start
As a Library

    use wifi_densepose_core::{CsiFrame, CsiMetadata, SignalProcessor};
    use wifi_densepose_signal::{CsiProcessor, CsiProcessorConfig};

    // Configure the CSI processor
    let config = CsiProcessorConfig::default();
    let processor = CsiProcessor::new(config);

    // Process a CSI frame
    let frame = CsiFrame { /* ... */ };
    let processed = processor.process(&frame)?;
Vital Sign Monitoring

    use wifi_densepose_vitals::{
        CsiVitalPreprocessor, BreathingExtractor, HeartRateExtractor,
        VitalAnomalyDetector,
    };

    let mut preprocessor = CsiVitalPreprocessor::new(56); // 56 subcarriers
    let mut breathing = BreathingExtractor::new(100.0);   // 100 Hz sample rate
    let mut heartrate = HeartRateExtractor::new(100.0);

    // Feed CSI frames and extract vitals
    for frame in csi_stream {
        let residuals = preprocessor.update(&frame.amplitudes);
        if let Some(bpm) = breathing.push_residuals(&residuals) {
            println!("Breathing: {:.1} BPM", bpm);
        }
    }
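To illustrate what a rate such as "15 BPM" means for a signal sampled at 100 Hz, the sketch below scans the 6-30 BPM breathing band in 0.5 BPM steps and returns the candidate with the most spectral power, using a direct single-frequency DFT. It is a self-contained illustration under those assumptions, not the algorithm inside `BreathingExtractor`; `dominant_bpm` is a hypothetical helper.

```rust
use std::f64::consts::PI;

/// Return the rate (in BPM) with the highest spectral power inside
/// [min_bpm, max_bpm], scanning in 0.5 BPM steps.
fn dominant_bpm(samples: &[f64], sample_rate_hz: f64, min_bpm: f64, max_bpm: f64) -> Option<f64> {
    if samples.is_empty() || min_bpm >= max_bpm {
        return None;
    }
    // Remove the DC offset so it does not leak into low frequencies.
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    let mut best: Option<(f64, f64)> = None; // (bpm, power)
    let mut bpm = min_bpm;
    while bpm <= max_bpm {
        // Angular step per sample for this candidate rate.
        let omega = 2.0 * PI * (bpm / 60.0) / sample_rate_hz;
        let (mut re, mut im) = (0.0, 0.0);
        for (n, s) in samples.iter().enumerate() {
            let phase = omega * n as f64;
            re += (s - mean) * phase.cos();
            im -= (s - mean) * phase.sin();
        }
        let power = re * re + im * im;
        if best.map_or(true, |(_, p)| power > p) {
            best = Some((bpm, power));
        }
        bpm += 0.5;
    }
    best.map(|(b, _)| b)
}
```

The same function covers the heart-rate band by passing 40.0 and 120.0 as the limits. Frequency resolution is bounded by the window length: 60 seconds of data resolves roughly 1 BPM.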
Disaster Response (MAT)

    use wifi_densepose_mat::{DisasterResponse, DisasterConfig, DisasterType};

    let config = DisasterConfig {
        disaster_type: DisasterType::Earthquake,
        max_scan_zones: 16,
        ..Default::default()
    };

    let mut responder = DisasterResponse::new(config);
    responder.add_scan_zone(zone)?;
    responder.start_continuous_scan().await?;
Hardware (ESP32)

    use wifi_densepose_hardware::{Esp32CsiParser, CsiFrame};

    let parser = Esp32CsiParser::new();
    let raw_bytes: &[u8] = /* UDP packet from ESP32 */;
    let frame: CsiFrame = parser.parse(raw_bytes)?;
    println!("RSSI: {} dBm, {} subcarriers", frame.metadata.rssi, frame.subcarriers.len());
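The snippet above leaves open where `raw_bytes` comes from. One way to obtain a datagram is with the standard library's `UdpSocket`, sketched below. The helper names, the bind address, and the timeout are assumptions for illustration; the port your ESP32 nodes send to depends on your firmware configuration, and the sensing server in this workspace may handle ingestion differently.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::Duration;

/// Bind a UDP socket for incoming CSI packets.
/// `timeout` must be non-zero; it bounds how long a receive call blocks.
fn bind_csi_socket(addr: &str, timeout: Duration) -> io::Result<UdpSocket> {
    let socket = UdpSocket::bind(addr)?;
    socket.set_read_timeout(Some(timeout))?;
    Ok(socket)
}

/// Receive one datagram into `buf`, returning its length and sender.
/// The first `len` bytes of `buf` are what a parser would consume.
fn recv_datagram(socket: &UdpSocket, buf: &mut [u8]) -> io::Result<(usize, SocketAddr)> {
    socket.recv_from(buf)
}
```

After a successful call, `&buf[..len]` is the byte slice that would be handed to `parser.parse(...)` in the example above.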
Training

    # Check training crate (no GPU needed)
    cargo check -p wifi-densepose-train --no-default-features

    # Run training with GPU (requires tch/libtorch)
    cargo run -p wifi-densepose-train --features tch-backend --bin train -- \
        --config training.toml --dataset /path/to/mmfi

    # Verify deterministic training proof
    cargo run -p wifi-densepose-train --features tch-backend --bin verify-training
Building

    # Clone the repository
    git clone https://github.com/ruvnet/wifi-densepose.git
    cd wifi-densepose/v2

    # Check workspace (no GPU dependencies)
    cargo check --workspace --no-default-features

    # Run all tests
    cargo test --workspace --no-default-features

    # Build release
    cargo build --release --workspace
Feature Flags
| Crate | Feature | Description |
|---|---|---|
| wifi-densepose-nn | onnx (default) | ONNX Runtime backend |
| wifi-densepose-nn | tch-backend | PyTorch (libtorch) backend |
| wifi-densepose-nn | candle-backend | Candle (pure Rust) backend |
| wifi-densepose-nn | cuda | CUDA GPU acceleration |
| wifi-densepose-train | tch-backend | Enable GPU training modules |
| wifi-densepose-mat | ruvector (default) | RuVector graph algorithms |
| wifi-densepose-mat | api (default) | REST + WebSocket API |
| wifi-densepose-mat | distributed | Multi-node coordination |
| wifi-densepose-mat | drone | Drone-mounted scanning |
| wifi-densepose-hardware | esp32 | ESP32 protocol support |
| wifi-densepose-hardware | intel5300 | Intel 5300 CSI Tool |
| wifi-densepose-hardware | linux-wifi | Linux commodity WiFi |
| wifi-densepose-wifiscan | wlanapi | Windows WLAN API async scanning |
| wifi-densepose-core | serde | Serialization support |
| wifi-densepose-core | async | Async trait support |
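Cargo features are resolved at compile time, so code can ask which ones were enabled with the built-in `cfg!` macro. The sketch below shows the general mechanism using the wifi-densepose-nn feature names from the table; `compiled_backends` is a hypothetical helper written for this README, not a function exported by the crate.

```rust
/// List the inference backends this build was compiled with.
/// `cfg!(feature = "...")` evaluates to a compile-time boolean, so a
/// build without any of these features returns an empty list.
fn compiled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "onnx") {
        backends.push("onnx");
    }
    if cfg!(feature = "tch-backend") {
        backends.push("tch-backend");
    }
    if cfg!(feature = "candle-backend") {
        backends.push("candle-backend");
    }
    backends
}
```

Inside a crate that declares these features, building with `--no-default-features --features candle-backend` would make this return only `"candle-backend"`.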
Testing

    # Unit tests (all crates)
    cargo test --workspace --no-default-features

    # Signal processing benchmarks
    cargo bench -p wifi-densepose-signal

    # Training benchmarks
    cargo bench -p wifi-densepose-train --no-default-features

    # Detection benchmarks
    cargo bench -p wifi-densepose-mat
Supported Hardware
| Hardware | Crate Feature | CSI Subcarriers | Cost |
|---|---|---|---|
| ESP32-S3 Mesh (3-6 nodes) | hardware/esp32 | 52-56 | ~$54 |
| Intel 5300 NIC | hardware/intel5300 | 30 | ~$50 |
| Atheros AR9580 | hardware/linux-wifi | 56 | ~$100 |
| Any WiFi (Windows/Linux) | wifiscan | RSSI-only | $0 |
Architecture Decision Records
Key design decisions documented in docs/adr/:
| ADR | Title | Status |
|---|---|---|
| ADR-014 | SOTA Signal Processing | Accepted |
| ADR-015 | MM-Fi + Wi-Pose Training Datasets | Accepted |
| ADR-016 | RuVector Training Pipeline | Accepted (Complete) |
| ADR-017 | RuVector Signal + MAT Integration | Accepted |
| ADR-021 | Vital Sign Detection Pipeline | Accepted |
| ADR-022 | Windows WiFi Enhanced Sensing | Accepted |
| ADR-024 | Contrastive CSI Embedding Model | Accepted |
Related Projects
- WiFi-DensePose -- Main repository (Python v1 + Rust v2)
- RuVector -- Graph algorithms for neural networks (5 crates, v2.0.4)
- rUv -- Creator and maintainer
License
All crates are dual-licensed under MIT OR Apache-2.0.
Copyright (c) 2024 rUv