Compare commits

3 Commits

Author SHA1 Message Date
ruv 3b4e151507 docs: ADR-081 add ruvector-cnn spectrogram gesture classification
- Replace DTW with CNN on CSI spectrograms via ruvector-cnn WASM
- Pipeline: CSI → STFT → 64x64 spectrogram → CnnEmbedder → 128-dim → classifier
- Two-phase training: InfoNCE contrastive + supervised classification
- Dual-path fusion: DTW + CNN in parallel for max robustness
- Comparison table: CNN ~95% vs DTW ~85% accuracy (literature)
- Fallback: lightweight 1D CNN for ESP32 edge deployment

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-07 09:04:16 -04:00
ruv 68d47a25d5 docs: ADR-081 add AR camera overlay with floating charts + lower third
- AR overlay: live camera feed with skeleton, gesture cursor, and
  floating charts anchored to hand/body position
- Lower third: RuView "pi" logo, vital signs, gesture state, sensor
  status in broadcast-style bar (semi-transparent dark, teal accents)
- 6 composited layers: camera → skeleton → cursor → chart → labels → lower third
- Chart placement rules: follows dominant hand, stays in frame bounds
- Skeleton style: teal keypoints/bones, yellow highlight on active hand
- Cursor types: open hand, pointing ray, grab, pinch, ghost (CSI-only)

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-07 08:57:41 -04:00
ruv 0d3292314b docs: ADR-081 gesture-controlled data visualization
Camera + CSI fusion for hands-free chart manipulation:
- 11 arm-level gestures (CSI-detectable): swipe, circle, hold, spread
- 7 finger-level gestures (camera-required): pinch, point, grab, thumbs
- Fusion engine: camera precision + CSI through-wall capability
- Chart types: line, bar, 3D scatter, heatmap, gauge, spectrogram
- Visual feedback: gesture cursor overlay + state indicator
- WebSocket protocol for gesture events → UI commands
- Dual-mode: fusion (full precision) or CSI-only (works in dark)
- Builds on WiFlow (ADR-079) + DTW gestures (ADR-029)

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-07 08:52:39 -04:00
9 changed files with 640 additions and 156 deletions
-59
@@ -5,65 +5,6 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Fixed
- **`provision.py` esptool v5 compat** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct.
- **`provision.py` silent NVS wipe** (#391) — The script replaces the entire `csi_cfg` NVS namespace on every run, so partial invocations were silently erasing WiFi credentials and causing `Retrying WiFi connection (10/10)` in the field. Now refuses to run without `--ssid`, `--password`, and `--target-ip` unless `--force-partial` is passed. `--force-partial` prints a warning listing which keys will be wiped.
- **Firmware: defensive `node_id` capture** (#232, #375, #385, #386, #390) — Users on multi-node deployments reported `node_id` reverting to the Kconfig default (`1`) in UDP frames and in the `csi_collector` init log, despite NVS loading the correct value. The root cause (memory corruption of `g_nvs_config`) has not been definitively isolated, but the UDP frame header is now tamper-proof: `csi_collector_init()` captures `g_nvs_config.node_id` into a module-local `s_node_id` once, and `csi_serialize_frame()` plus all other consumers (`edge_processing.c`, `wasm_runtime.c`, `display_ui.c`, `swarm_bridge_init`) read it via the new `csi_collector_get_node_id()` accessor. A canary logs `WARN` if `g_nvs_config.node_id` diverges from `s_node_id` at end-of-init, helping isolate the upstream corruption path. Validated on attached ESP32-S3 (COM8): NVS `node_id=2` propagates through boot log, capture log, init log, and byte[4] of every UDP frame.
### Docs
- **CHANGELOG catch-up** (#367) — Added missing entries for v0.5.5, v0.6.0, and v0.7.0 releases.
## [v0.7.0] — 2026-04-06
Model release (no new firmware binary). Firmware remains at v0.6.0-esp32.
### Added
- **Camera ground-truth training pipeline (ADR-079)** — End-to-end supervised WiFlow pose training using MediaPipe + real ESP32 CSI.
- `scripts/collect-ground-truth.py` — MediaPipe PoseLandmarker webcam capture (17 COCO keypoints, 30fps), synchronized with CSI recording over nanosecond timestamps.
- `scripts/align-ground-truth.js` — Time-aligns camera keypoints with 20-frame CSI windows by binary search, confidence-weighted averaging.
- `scripts/train-wiflow-supervised.js` — 3-phase curriculum training (contrastive → supervised SmoothL1 → bone/temporal refinement) with 4 scale presets (lite/small/medium/full).
- `scripts/eval-wiflow.js` — PCK@10/20/50, MPJPE, per-joint breakdown, baseline proxy mode.
- `scripts/record-csi-udp.py` — Lightweight ESP32 CSI UDP recorder (no Rust build required).
- **ruvector optimizations (O6-O10)** — Subcarrier selection (70→35, 50% reduction), attention-weighted subcarriers, Stoer-Wagner min-cut person separation, multi-SPSA gradient estimation, Mac M4 Pro training via Tailscale.
- **Scalable WiFlow presets** — `lite` (189K params, ~19 min) through `full` (7.7M params, ~8 hrs) to match dataset size.
- **Pre-trained WiFlow v1 model** — 92.9% PCK@20, 974 KB, 186,946 params. Published to [HuggingFace](https://huggingface.co/ruv/ruview) under `wiflow-v1/`.
### Validated
- **92.9% PCK@20** pose accuracy from a 5-minute data collection session with one $9 ESP32-S3 and one laptop webcam.
- Training pipeline validated on real paired data: 345 samples, 19 min training, eval loss 0.082, bone constraint 0.008.
## [v0.6.0-esp32] — 2026-04-03
### Added
- **Pre-trained CSI sensing weights published** — First official pre-trained models on [HuggingFace](https://huggingface.co/ruv/ruview). `model.safetensors` (48 KB), `model-q4.bin` (8 KB 4-bit), `model-q2.bin` (4 KB), `presence-head.json`, per-node LoRA adapters.
- **17 sensing applications** — Sleep monitor, apnea detector, stress monitor, gait analyzer, RF tomography, passive radar, material classifier, through-wall detector, device fingerprint, and more. Each as a standalone `scripts/*.js`.
- **ADRs 069-078** — 10 new architecture decisions covering Cognitum Seed integration, self-supervised pretraining, ruvllm pipeline, WiFlow architecture, channel hopping, SNN, MinCut person separation, CNN spectrograms, novel RF applications, multi-frequency mesh.
- **Kalman tracker** (PR #341 by @taylorjdawson) — temporal smoothing of pose keypoints.
### Fixed
- Security fix merged via PR #310.
### Performance
- Presence detection: 100% accuracy on 60,630 overnight samples.
- Inference: 0.008 ms per sample, 164K embeddings/sec.
- Contrastive self-supervised training: 51.6% improvement over baseline.
## [v0.5.5-esp32] — 2026-04-03
### Added
- **WiFlow SOTA architecture (ADR-072)** — TCN + axial attention pose decoder, 1.8M params, 881 KB at 4-bit. 17 COCO keypoints from CSI amplitude only (no phase).
- **Multi-frequency mesh scanning (ADR-073)** — ESP32 nodes hop across channels 1/3/5/6/9/11 at 200ms dwell. Neighbor WiFi networks used as passive radar illuminators. Null subcarriers reduced from 19% to 16%.
- **Spiking neural network (ADR-074)** — STDP online learning, adapts to new rooms in <30s with no labels, 16-160x less compute than batch training.
- **MinCut person counting (ADR-075)** — Stoer-Wagner min-cut on subcarrier correlation graph. Fixes #348 (was always reporting 4 people).
- **CNN spectrogram embeddings (ADR-076)** — Treat 64×20 CSI as an image, produce 128-dim environment fingerprints (0.95+ same-room similarity).
- **Graph transformer fusion** — Multi-node CSI fusion via GATv2 attention (replaces naive averaging).
- **Camera-free pose training pipeline** — Trains 17-keypoint model from 10 sensor signals with no camera required.
### Fixed
- **#348 person counting** — MinCut correctly counts 1-4 people (24/24 validation windows).
## [v0.5.4-esp32] — 2026-04-02
### Added
@@ -0,0 +1,627 @@
# ADR-081: Gesture-Controlled Data Visualization
- **Status**: Proposed
- **Date**: 2026-04-07
- **Deciders**: ruv
- **Relates to**: ADR-079 (Camera Ground-Truth Training), ADR-029 (RuvSense Gesture Recognition), ADR-072 (WiFlow Architecture), ADR-076 (CNN Spectrogram Embeddings)
## Context
RuView can now track 17 COCO keypoints at 92.9% PCK@20 (ADR-079) and detect gestures
via DTW template matching (ADR-029). These capabilities exist independently — pose
estimation produces skeleton coordinates, and the UI displays static charts. There is no
system that connects hand/arm movements to interactive data exploration.
Gesture-controlled visualization would let users manipulate charts and graphs by waving
their hands in front of the ESP32 sensing zone — no mouse, no touchscreen, no wearable.
This is particularly valuable for:
- **Lab/cleanroom** — gloved hands can't use touchscreens
- **Kitchen/workshop** — dirty or wet hands
- **Presentations** — stand back and gesture at projected dashboards
- **Accessibility** — motor impairments that make mouse use difficult
- **Digital signage** — public displays without touch hardware
### Why Camera + CSI Fusion
Camera alone can do gesture control (e.g., Leap Motion, MediaPipe Hands). CSI alone can
detect coarse gestures (ADR-029). The fusion provides:
| Modality | Strengths | Weaknesses |
|----------|-----------|-----------|
| Camera (MediaPipe Hands) | 21 hand landmarks, finger-level precision, 30fps | Requires line of sight, lighting dependent, privacy concern |
| CSI (ESP32) | Through-wall, works in dark, privacy-preserving, $9 | Coarse spatial resolution, no finger tracking |
| **Fusion** | **Finger precision near camera + coarse tracking everywhere** | Requires both sensors during training |
The fusion model trains on camera + CSI pairs (like ADR-079), then deploys in two modes:
1. **Camera-assisted** — full precision when camera is available
2. **CSI-only** — reduced but functional gesture control without camera
## Decision
Build a gesture-to-visualization control system that maps hand/arm movements to chart
interactions using fused camera + CSI input.
### Gesture Vocabulary
#### Navigation Gestures (arm-level, CSI-detectable)
| Gesture | Motion | Chart Action | CSI Feasibility |
|---------|--------|-------------|-----------------|
| **Swipe left** | Open hand sweeps left | Pan chart left / previous dataset | High — clear directional motion |
| **Swipe right** | Open hand sweeps right | Pan chart right / next dataset | High |
| **Swipe up** | Open hand sweeps up | Scroll up / zoom out | High |
| **Swipe down** | Open hand sweeps down | Scroll down / zoom in | High |
| **Push forward** | Palm pushes toward screen | Select / drill into data point | Medium — depth motion harder |
| **Pull back** | Hand pulls away from screen | Back / zoom out | Medium |
| **Circular CW** | Hand circles clockwise | Increase value / rotate view | Medium — temporal pattern |
| **Circular CCW** | Hand circles counter-clockwise | Decrease value / rotate back | Medium |
| **Hold still** | Hand stationary 2+ seconds | Hover / show tooltip | High — absence of motion |
| **Both hands apart** | Arms spread outward | Expand / zoom into selection | High — bilateral motion |
| **Both hands together** | Arms move inward | Collapse / zoom out | High |
#### Precision Gestures (finger-level, camera-required)
| Gesture | Motion | Chart Action | Sensor |
|---------|--------|-------------|--------|
| **Pinch zoom** | Thumb + index spread/close | Continuous zoom | Camera only |
| **Point** | Index finger extended | Cursor position on chart | Camera only |
| **Grab** | Close fist | Grab and drag data point | Camera only |
| **Thumb up** | Thumbs up | Confirm / approve | Camera only |
| **Thumb down** | Thumbs down | Reject / undo | Camera only |
| **Two-finger rotate** | Two fingers twist | Rotate 3D visualization | Camera only |
| **Finger slider** | Index finger moves along axis | Adjust parameter value | Camera only |
### Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│ Input Layer │
│ │
│ ESP32 CSI (UDP 5005) ──→ CSI Gesture Detector (DTW + WiFlow) │
│ ↓ │
│ Webcam (MediaPipe Hands) ──→ Hand Landmark Tracker (21 joints) │
│ ↓ │
│ Gesture Fusion Engine │
│ ├── CSI coarse: swipe/circle/hold │
│ ├── Camera fine: pinch/point/grab │
│ └── Confidence weighting by modality │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Gesture Interpreter │
│ │
│ Raw gestures ──→ State Machine ──→ Chart Commands │
│ │
│ States: │
│ IDLE ──(motion detected)──→ TRACKING │
│ TRACKING ──(gesture matched)──→ ACTING │
│ ACTING ──(gesture complete)──→ COOLDOWN │
│ COOLDOWN ──(500ms)──→ IDLE │
│ │
│ Debounce: 200ms minimum gesture duration │
│ Cooldown: 500ms between consecutive gestures │
│ Confidence threshold: 0.7 for CSI, 0.9 for camera │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Visualization Controller │
│ │
│ Chart Commands ──→ WebSocket ──→ UI │
│ │
│ Commands: │
│ { type: "pan", dx: -0.1, dy: 0 } │
│ { type: "zoom", factor: 1.2, center: [0.5, 0.5] } │
│ { type: "select", x: 0.45, y: 0.62 } │
│ { type: "rotate", angle: 15 } │
│ { type: "slider", axis: "x", value: 0.73 } │
│ { type: "hover", x: 0.45, y: 0.62 } │
│ { type: "back" } │
│ { type: "confirm" } │
│ { type: "reject" } │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Visualization UI │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Line Chart │ │ Bar Chart │ │ 3D Scatter │ │
│ │ (time │ │ (category │ │ (spatial │ │
│ │ series) │ │ compare) │ │ data) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Heatmap │ │ Gauge │ │ Spectrogram │ │
│ │ (CSI grid) │ │ (vitals) │ │ (frequency) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Visual feedback: gesture cursor overlay + action indicator │
│ Framework: D3.js / Observable Plot in existing UI │
└──────────────────────────────────────────────────────────────────┘
```
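The interpreter states above can be sketched as a small step function. This is a minimal illustration, not the project's implementation: the observation shape (`t`, `motion`, `gesture`, `confidence`, `source`) and the `createInterpreter` name are hypothetical, and the `TRACKING → IDLE` transition on loss of motion is an added assumption so the machine cannot get stuck.

```javascript
const DEBOUNCE_MS = 200; // minimum gesture duration
const COOLDOWN_MS = 500; // pause between consecutive gestures
const THRESHOLD = { csi: 0.7, camera: 0.9 }; // per-modality confidence gates

function createInterpreter() {
  let state = 'IDLE';
  let since = 0; // timestamp (ms) at which the current state was entered

  // obs: { t, motion, gesture, confidence, source }; gesture is null when nothing matched
  return function step(obs) {
    const enter = (next) => { state = next; since = obs.t; };
    let command = null;

    if (state === 'COOLDOWN' && obs.t - since >= COOLDOWN_MS) enter('IDLE');

    if (state === 'IDLE') {
      if (obs.motion) enter('TRACKING');
    } else if (state === 'TRACKING') {
      if (!obs.motion) {
        enter('IDLE'); // assumption: give up tracking when motion stops
      } else if (obs.gesture &&
                 obs.confidence >= THRESHOLD[obs.source] &&
                 obs.t - since >= DEBOUNCE_MS) {
        enter('ACTING');
        command = obs.gesture; // emitted once, on entry to ACTING
      }
    } else if (state === 'ACTING') {
      if (!obs.gesture) enter('COOLDOWN'); // gesture complete
    }
    return { state, command };
  };
}
```

Each call to `step` consumes one observation and returns the current state plus at most one command, so a held gesture does not fire repeatedly.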
### Gesture Detection Pipeline
#### CSI Gesture Detection (arm-level)
Extends the existing DTW gesture classifier (ADR-029) with WiFlow pose input:
```
CSI [35, 20] ──→ WiFlow lite ──→ 17 keypoints ──→ Extract arm features:
- Wrist velocity (dx/dt, dy/dt)
- Elbow angle (shoulder-elbow-wrist)
- Bilateral symmetry (left vs right)
- Motion energy (frame differencing)
DTW template matching:
- 11 gesture templates
- Sliding window (1s)
- Top match + confidence
```
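Two building blocks of this path can be sketched in a few lines: the elbow angle feature and a textbook DTW distance for comparing a feature sequence against a template. The `{x, y}` keypoint shape and the function names are assumptions for illustration; the ADR-029 classifier itself is not shown in this document.

```javascript
// Angle at the elbow (degrees) from three 2D keypoints of the form {x, y}.
// Returns NaN if two of the points coincide (zero-length limb).
function elbowAngle(shoulder, elbow, wrist) {
  const ux = shoulder.x - elbow.x, uy = shoulder.y - elbow.y;
  const vx = wrist.x - elbow.x, vy = wrist.y - elbow.y;
  const cos = (ux * vx + uy * vy) / (Math.hypot(ux, uy) * Math.hypot(vx, vy));
  return Math.acos(Math.min(1, Math.max(-1, cos))) * 180 / Math.PI;
}

// Classic DTW distance between two 1D sequences, using |a - b| as the local cost.
function dtwDistance(a, b) {
  const n = a.length, m = b.length;
  const D = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(Infinity));
  D[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = Math.abs(a[i - 1] - b[j - 1]);
      D[i][j] = cost + Math.min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]);
    }
  }
  return D[n][m];
}
```

A sequence that is merely stretched in time, such as `[0, 1, 2]` against `[0, 1, 1, 2]`, has DTW distance 0, which is why template matching tolerates gestures performed at different speeds.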
#### Camera Gesture Detection (finger-level)
Uses MediaPipe Hands (21 landmarks per hand, 30fps):
```
Webcam ──→ MediaPipe Hands ──→ 21 landmarks × 2 hands ──→ Extract:
- Finger states (extended/curled)
- Pinch distance (thumb-index)
- Grab state (all fingers curled)
- Point direction (index ray)
- Hand center velocity
Rule-based classifier:
- Pinch: thumb-index < 0.05
- Point: only index extended
- Grab: all fingers curled
- Thumbs up/down: thumb angle
```
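The rules above could be sketched as follows, assuming landmarks arrive as an array of 21 `{x, y}` points in normalized image coordinates using the standard MediaPipe Hands indexing (wrist 0, fingertips 4/8/12/16/20). The "tip farther from the wrist than the PIP joint" test for an extended finger is a common heuristic chosen here for illustration. Thumbs up/down is omitted, and the grab rule checks only the four non-thumb fingers.

```javascript
const WRIST = 0, THUMB_TIP = 4, INDEX_TIP = 8;
const FINGERS = [[6, 8], [10, 12], [14, 16], [18, 20]]; // [PIP, TIP] for index..pinky

const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);

// Heuristic: a finger counts as extended when its tip is farther from the wrist than its PIP joint.
function fingerStates(lm) {
  return FINGERS.map(([pip, tip]) => dist(lm[tip], lm[WRIST]) > dist(lm[pip], lm[WRIST]));
}

function classifyHand(lm, pinchThreshold = 0.05) {
  const [index, middle, ring, pinky] = fingerStates(lm);
  if (dist(lm[THUMB_TIP], lm[INDEX_TIP]) < pinchThreshold) return 'pinch';
  if (index && !middle && !ring && !pinky) return 'point';
  if (!index && !middle && !ring && !pinky) return 'grab';
  return 'none';
}
```

Pinch is tested first because a pinching hand can otherwise satisfy the pointing rule.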
#### Fusion Strategy
```
CSI confidence ──┐
├──→ Weighted fusion ──→ Final gesture + confidence
Camera conf ──┘
Rules:
- If both agree: confidence = min(1.0, max(csi_conf, cam_conf) + 0.1 * min(csi_conf, cam_conf))
- If only CSI: use CSI gesture, confidence *= 0.8
- If only camera: use camera gesture, confidence *= 0.95
- If conflict: prefer camera for fine gestures, CSI for coarse gestures
- Minimum confidence for action: 0.6
```
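A minimal sketch of these rules, under two stated assumptions: the agreed confidence is clamped to 1.0 so it stays in the 0-1 range used by the protocol, and "fine" gestures are taken to be the camera-only ones listed in `FINE`. The function name and input shape are hypothetical.

```javascript
const FINE = new Set(['pinch_zoom', 'point', 'grab', 'thumb_up', 'thumb_down']);
const MIN_CONFIDENCE = 0.6;

// csi, cam: { gesture, confidence }, or null when that modality reports nothing.
function fuse(csi, cam) {
  let out = null;
  if (csi && cam) {
    if (csi.gesture === cam.gesture) {
      const hi = Math.max(csi.confidence, cam.confidence);
      const lo = Math.min(csi.confidence, cam.confidence);
      out = { gesture: csi.gesture, confidence: Math.min(1, hi + 0.1 * lo), source: 'fusion' };
    } else {
      // Conflict: the camera wins when it reports a fine gesture, otherwise CSI wins.
      const pick = FINE.has(cam.gesture) ? cam : csi;
      out = { ...pick, source: pick === cam ? 'camera' : 'csi' };
    }
  } else if (csi) {
    out = { gesture: csi.gesture, confidence: csi.confidence * 0.8, source: 'csi' };
  } else if (cam) {
    out = { gesture: cam.gesture, confidence: cam.confidence * 0.95, source: 'camera' };
  }
  return out && out.confidence >= MIN_CONFIDENCE ? out : null;
}
```

Note that with the 0.8 penalty, a CSI-only detection needs a raw confidence of at least 0.75 to clear the 0.6 action threshold.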
### Chart Interaction Mapping
#### Line Chart (Time Series)
| Gesture | Action | Parameters |
|---------|--------|-----------|
| Swipe left/right | Pan time axis | dx proportional to swipe speed |
| Pinch zoom | Zoom time axis | Continuous, centered on hand position |
| Both hands apart/together | Zoom (CSI-only alternative) | Binary zoom in/out |
| Point | Show tooltip at nearest data point | x from index finger position |
| Hold still | Sticky tooltip | Duration-based activation |
| Swipe up/down | Switch dataset / Y-axis scale | Discrete steps |
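
As an illustration of how the line chart rows might translate gesture events into the chart commands defined in the architecture diagram, here is a hedged sketch. The gain constants and the `lineChartCommand` name are assumptions. Swipe up/down is left out because the ADR does not define a command type for dataset switching.

```javascript
const PAN_GAIN = 0.1;  // assumed: fraction of chart width panned per unit of hand speed
const ZOOM_STEP = 1.2; // assumed: factor for binary zoom in CSI-only mode

// ev follows the GestureEvent shape: { gesture, position?, velocity?, param? }
function lineChartCommand(ev) {
  const [x, y] = ev.position ?? [0.5, 0.5];
  switch (ev.gesture) {
    case 'swipe_left':
    case 'swipe_right': {
      const speed = Math.abs(ev.velocity?.[0] ?? 1);
      const sign = ev.gesture === 'swipe_left' ? -1 : 1;
      return { type: 'pan', dx: sign * PAN_GAIN * speed, dy: 0 };
    }
    case 'pinch_zoom':
      return { type: 'zoom', factor: ev.param ?? 1, center: [x, y] };
    case 'spread':
      return { type: 'zoom', factor: ZOOM_STEP, center: [0.5, 0.5] };
    case 'contract':
      return { type: 'zoom', factor: 1 / ZOOM_STEP, center: [0.5, 0.5] };
    case 'point':
    case 'hold':
      return { type: 'hover', x, y };
    default:
      return null; // gesture has no meaning for this chart type
  }
}
```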
#### Bar Chart (Category Comparison)
| Gesture | Action | Parameters |
|---------|--------|-----------|
| Swipe left/right | Navigate categories | One category per swipe |
| Point | Highlight bar | Nearest bar to finger X position |
| Push forward | Select bar for drill-down | Depth gesture |
| Grab + drag | Reorder bars | Camera-only |
| Circular | Sort ascending/descending | Direction determines order |
#### 3D Scatter Plot
| Gesture | Action | Parameters |
|---------|--------|-----------|
| Swipe left/right | Rotate Y axis | Angle proportional to speed |
| Swipe up/down | Rotate X axis | Angle proportional to speed |
| Two-finger rotate | Rotate Z axis | Camera-only |
| Pinch zoom | Zoom | Camera-only |
| Both hands apart | Zoom in (CSI alternative) | Binary |
| Point | Highlight nearest point | Ray-cast from finger direction |
#### Heatmap (CSI Grid)
| Gesture | Action | Parameters |
|---------|--------|-----------|
| Swipe | Pan view | dx, dy |
| Pinch | Zoom region | Center + scale |
| Hold | Show cell value | Position-based |
| Circular | Adjust color scale range | CW = expand, CCW = contract |
#### Gauge (Vital Signs)
| Gesture | Action | Parameters |
|---------|--------|-----------|
| Swipe left/right | Switch vital (HR → BR → SpO2) | Discrete |
| Circular CW | Set high alert threshold | Continuous |
| Circular CCW | Set low alert threshold | Continuous |
| Thumb up | Acknowledge alert | Binary |
### Visual Feedback: AR Camera Overlay
The primary view is the **live camera feed with AR overlays** — the person is visible
with charts, skeleton, and data rendered on top. This creates a "Minority Report" style
interface where you see yourself manipulating data in real-time.
```
┌──────────────────────────────────────────────────────────────┐
│ │
│ ╔══════════════════════════════════════════════════════════╗ │
│ ║ ║ │
│ ║ [Live Camera Feed — person visible] ║ │
│ ║ ║ │
│ ║ ╭─────╮ ║ │
│ ║ │ │ ← skeleton overlay (17 keypoints) ║ │
│ ║ ╰──┬──╯ ║ │
│ ║ ╱ ╲ ║ │
│ ║ ╱ ╲ ┌──────────────────────┐ ║ │
│ ║ │ │ │ CSI Amplitude Chart │ ║ │
│ ║ │ 🖐→ │ │ ┌─╮ ╭─╮ ╭──╮ │ ║ │
│ ║ │ │ │ │ ╰─╯ ╰───╯ │ │ ║ │
│ ║ ╲ ╱ │ │ │ │ ║ │
│ ║ ╲ ╱ └──────────────────────┘ ║ │
│ ║ │ │ ↑ chart follows hand position ║ │
│ ║ ╱ ╲ ║ │
│ ║ ╱ ╲ ║ │
│ ║ ║ │
│ ╚══════════════════════════════════════════════════════════╝ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ LOWER THIRD │ │
│ │ ┌────┐ │ │
│ │ │ pi │ RuView Sensing HR: 72 BPM BR: 16 BPM │ │
│ │ │ │ v0.7.0 Presence: 1 Motion: 0.23 │ │
│ │ └────┘ │ │
│ │ [logo] [gesture: Swipe Right] [CSI ●] [CAM ●] [28fps]│ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
```
#### AR Overlay Layers (bottom to top)
| Layer | Content | Opacity | Update Rate |
|-------|---------|---------|-------------|
| 0 | Live camera feed (full frame) | 100% | 30fps |
| 1 | Skeleton overlay (17 keypoints + bones) | 70% | 30fps |
| 2 | Gesture cursor (hand position + state) | 90% | 30fps |
| 3 | Floating chart (anchored to hand/body region) | 85% | 30fps |
| 4 | Data labels + tooltips | 95% | On gesture |
| 5 | Lower third (RuView branding + vitals + status) | 95% | 1fps |
#### Floating Chart Placement
Charts are **anchored to the person's body** and follow movement:
```
Placement rules:
- Default: chart floats to the right of the person's dominant hand
- If hand moves left: chart slides to left side
- Chart stays within frame bounds (never clips off-screen)
- Multiple charts: stack vertically with 10% gap
- Inactive charts: shrink to thumbnail and anchor near shoulder
Chart anchor point = wrist_position + offset(0.15, -0.1) // right and slightly above hand
Chart size: 30% of frame width × 20% of frame height
```
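One way to read the placement rules is sketched below. It assumes normalized coordinates with the origin at the top-left of the frame, treats the anchor as the chart's top-left corner, and flips the chart to the left of the hand when it would not fit on the right. All three choices are assumptions, since the ADR only states the offset, the size, and the in-bounds requirement.

```javascript
const OFFSET = { x: 0.15, y: -0.1 }; // right of and slightly above the wrist
const SIZE = { w: 0.30, h: 0.20 };   // fraction of frame width and height

const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

// wrist: {x, y} normalized to [0, 1]. Returns the chart rectangle {x, y, w, h}.
function placeChart(wrist) {
  // Flip to the left side when the chart would overflow the right edge.
  const fitsRight = wrist.x + OFFSET.x + SIZE.w <= 1;
  const left = fitsRight ? wrist.x + OFFSET.x : wrist.x - OFFSET.x - SIZE.w;
  const top = wrist.y + OFFSET.y;
  return {
    x: clamp(left, 0, 1 - SIZE.w),
    y: clamp(top, 0, 1 - SIZE.h),
    w: SIZE.w,
    h: SIZE.h,
  };
}
```

The final clamp guarantees the rectangle never leaves the frame, even for wrist positions at the extreme edges.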
#### Lower Third Design
The lower third bar provides persistent status in broadcast-style framing:
```
┌──────────────────────────────────────────────────────────────┐
│ ┌──────┐ │
│ │ pi │ RuView Sensing v0.7.0 │
│ │ │ ────────────────────────────────────────────── │
│ │ logo │ HR: 72 BPM | BR: 16 BPM | Persons: 1 │
│ └──────┘ Motion: Low | Gesture: Swipe Right | 28fps │
│ [CSI ●] [CAM ●] [FUSE] PCK@20: 92.9% │
└──────────────────────────────────────────────────────────────┘
Design:
- Background: semi-transparent dark (#1a1a2e, 80% opacity)
- Logo: RuView "pi" icon (32x32px), left-aligned
- Text: white (#ffffff) primary, gray (#a0a0a0) secondary
- Accent: teal (#00d4aa) for active indicators
- Height: 15% of frame
- Font: system monospace for data, sans-serif for labels
- Divider: thin teal line separating logo from data
```
#### RuView Logo Placement
```
The "pi" logo appears in two contexts:
1. Lower third (persistent):
- Position: bottom-left corner, 12px padding
- Size: 32x32px
- Style: white outline on dark background
- Always visible during gesture mode
2. Watermark (optional):
- Position: top-right corner, 8px padding
- Size: 24x24px, 30% opacity
- Style: subtle, doesn't interfere with data
```
#### Skeleton Rendering Style
```
Keypoint rendering:
- Detected joints: teal circles (#00d4aa), radius 6px
- Low-confidence joints: gray circles (#666), radius 4px
- Active hand (gesturing): yellow highlight (#ffcc00), radius 8px, glow effect
Bone rendering:
- Normal bones: teal lines (#00d4aa), 2px stroke
- Active arm (gesturing): yellow lines (#ffcc00), 3px stroke, glow
- Torso: slightly thicker (3px) to anchor the skeleton visually
Style: dark-theme friendly, high contrast against camera feed
```
**Cursor types:**
- **Open hand** — teal ring around wrist, rays extending from fingers
- **Pointing** — teal ray from index finger toward chart
- **Grabbing** — yellow fist icon, chart border highlights
- **Pinching** — two teal dots (thumb + index) with distance line
- **Ghost cursor** — CSI-only mode: larger, more diffuse circle (no finger detail)
### Data Flow Protocol
WebSocket messages from gesture engine to UI:
```typescript
interface GestureEvent {
type: 'gesture';
gesture: 'swipe_left' | 'swipe_right' | 'swipe_up' | 'swipe_down'
| 'pinch_zoom' | 'point' | 'grab' | 'hold' | 'circle_cw'
| 'circle_ccw' | 'push' | 'pull' | 'spread' | 'contract'
| 'thumb_up' | 'thumb_down';
confidence: number; // 0-1
source: 'csi' | 'camera' | 'fusion';
position?: [number, number]; // Normalized [0,1] hand position
velocity?: [number, number]; // Hand velocity for proportional control
param?: number; // Gesture-specific parameter (pinch distance, rotation angle)
}
interface CursorEvent {
type: 'cursor';
x: number; // 0-1 normalized
y: number; // 0-1 normalized
state: 'tracking' | 'pointing' | 'grabbing' | 'pinching' | 'idle';
hands: number; // 0, 1, or 2
}
interface StatusEvent {
type: 'status';
csi_active: boolean;
camera_active: boolean;
mode: 'fusion' | 'csi_only' | 'camera_only';
fps: number;
gesture_count: number; // Total gestures detected this session
}
```
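On the receiving side, a client might validate incoming frames before acting on them. The sketch below assumes the events are sent as JSON text frames, which the ADR does not state explicitly, and checks only a few required fields. `parseEvent` is a hypothetical helper name.

```javascript
const GESTURES = new Set([
  'swipe_left', 'swipe_right', 'swipe_up', 'swipe_down', 'pinch_zoom', 'point',
  'grab', 'hold', 'circle_cw', 'circle_ccw', 'push', 'pull', 'spread',
  'contract', 'thumb_up', 'thumb_down',
]);
const SOURCES = new Set(['csi', 'camera', 'fusion']);
const inUnit = (v) => typeof v === 'number' && v >= 0 && v <= 1;

// Parse one WebSocket text frame; returns the event object, or null if malformed.
function parseEvent(text) {
  let msg;
  try { msg = JSON.parse(text); } catch { return null; }
  if (msg === null || typeof msg !== 'object') return null;
  switch (msg.type) {
    case 'gesture':
      return GESTURES.has(msg.gesture) && SOURCES.has(msg.source) && inUnit(msg.confidence)
        ? msg : null;
    case 'cursor':
      return inUnit(msg.x) && inUnit(msg.y) ? msg : null;
    case 'status':
      return typeof msg.csi_active === 'boolean' && typeof msg.camera_active === 'boolean'
        ? msg : null;
    default:
      return null;
  }
}
```

In a browser client this would typically be called from the socket's `message` handler, for example `ws.onmessage = (e) => handle(parseEvent(e.data))`, where `handle` is whatever dispatcher the UI provides.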
### Training the CSI Gesture Model
Extends ADR-079's camera ground-truth pipeline:
```bash
# 1. Collect gesture training data (camera + CSI, 10 min)
# Perform each gesture 20+ times with natural variation
python scripts/collect-gesture-gt.py --duration 600 --gestures all --preview
# 2. Label gesture segments (auto-detected from camera)
node scripts/label-gestures.js \
--gt data/ground-truth/gestures-*.jsonl \
--csi data/recordings/csi-*.jsonl
# 3. Train gesture classifier
node scripts/train-gesture-model.js \
--data data/gestures/labeled-*.jsonl \
--scale lite
# 4. Deploy
# CSI-only mode: gestures detected from WiFlow keypoint motion
# Fusion mode: camera adds finger-level precision
```
**Training data per gesture:** ~20 examples × 11 gestures = 220 labeled samples.
With augmentation (time warp, amplitude noise): ~1,000 effective samples.
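
The two augmentations named above could look roughly like this. It is a sketch under assumptions: the warp range (0.8x to 1.2x), the noise level, and the use of uniform rather than Gaussian noise are illustrative choices, and a seeded generator (mulberry32) is used only to make runs reproducible. Producing four or five variants per recording would take the 220 labeled samples to roughly the 1,000 quoted.

```javascript
// Small seeded PRNG (mulberry32) returning values in [0, 1).
function mulberry32(a) {
  return function () {
    let t = (a += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Resample a 1D sequence to the same length, played back at `rate`
// (rate > 1 means a faster gesture). Linear interpolation, clamped at the end.
function timeWarp(seq, rate) {
  const n = seq.length;
  return seq.map((_, i) => {
    const pos = Math.min(n - 1, i * rate);
    const lo = Math.floor(pos), hi = Math.min(n - 1, lo + 1);
    return seq[lo] + (seq[hi] - seq[lo]) * (pos - lo);
  });
}

// Add uniform noise in [-sigma, sigma] to every sample.
function addNoise(seq, sigma, rand) {
  return seq.map((v) => v + (rand() * 2 - 1) * sigma);
}

function augment(seq, rand) {
  const rate = 0.8 + 0.4 * rand(); // assumed warp range
  return addNoise(timeWarp(seq, rate), 0.05, rand);
}
```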
### Optimization: ruvector-cnn Spectrogram Gesture Classification
Replace DTW template matching with a CNN operating on CSI spectrograms via the
`ruvector-cnn` WASM package (ADR-076). This treats each gesture as an image
classification problem on the CSI time-frequency representation.
#### Why CNN Over DTW
| | DTW (current, ADR-029) | CNN Spectrogram (proposed) |
|---|---|---|
| Input | 1D keypoint trajectories | 2D CSI spectrogram image |
| Features | Hand-crafted (wrist velocity, elbow angle) | Learned end-to-end |
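| Robustness | Absorbs speed variation through warping, but sensitive to noise and template quality | Tolerates small time/frequency shifts (pooling layers); speed invariance must be learned from varied or augmented examples |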
| Multi-scale | Single scale | Hierarchical (dilated convolutions) |
| Training | Template recording + DTW distance | Supervised from camera labels |
| New gestures | Record new template | Retrain (or few-shot with embedding) |
| Accuracy | ~85% (DTW literature) | ~95%+ (CNN on spectrograms, literature) |
#### Pipeline
```
CSI [N_subcarriers, T=30] (1-second window)
Spectrogram transform: STFT per subcarrier
→ [N_sub, F_bins, T_bins] ≈ [35, 16, 15]
Reshape to grayscale image: [35×16, 15] = [560, 15]
→ Resize to [64, 64] (bilinear)
ruvector-cnn CnnEmbedder (WASM-accelerated)
→ 128-dim gesture embedding
Classifier head: Linear(128 → 18 gestures) + softmax
→ gesture_id + confidence
```
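The STFT step for a single subcarrier can be sketched with a direct DFT, which is adequate for windows this small. The parameters are an assumption: a 16-sample Hann window with a hop of 1 is one choice that reproduces the quoted 16 × 15 shape from 30 samples, and it keeps all 16 bins even though the upper half mirrors the lower half for real input. The ADR does not specify the actual STFT settings.

```javascript
// Naive STFT magnitude for one subcarrier. Returns frames[T_bins][F_bins].
function stftMagnitude(signal, win = 16, hop = 1) {
  const hann = Array.from({ length: win },
    (_, n) => 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / (win - 1)));
  const frames = [];
  for (let start = 0; start + win <= signal.length; start += hop) {
    const bins = [];
    for (let k = 0; k < win; k++) {
      let re = 0, im = 0;
      for (let n = 0; n < win; n++) {
        const x = signal[start + n] * hann[n];
        const phase = (-2 * Math.PI * k * n) / win;
        re += x * Math.cos(phase);
        im += x * Math.sin(phase);
      }
      bins.push(Math.hypot(re, im)); // magnitude of DFT bin k
    }
    frames.push(bins);
  }
  return frames;
}
```

Applying this to each of the 35 selected subcarriers and stacking the results gives the `[35, 16, 15]` tensor that is then flattened and resized.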
#### ruvector-cnn Integration
The `@ruvector/cnn` WASM package provides:
```javascript
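const { init, CnnEmbedder, InfoNCELoss } = require('@ruvector/cnn');

// csiToSpectrogram and classifyGesture are application helpers assumed to be
// defined elsewhere; they are not shown in this ADR.
async function detectGesture(csiWindow, gestureTemplates) {
  // CommonJS modules have no top-level await, so init() runs inside an async function
  await init();
  // Create embedder for 64x64 CSI spectrogram "images"
  const embedder = new CnnEmbedder({
    inputSize: 64,
    embeddingDim: 128,
    normalize: true,
  });
  // Extract embedding from CSI spectrogram
  const spectrogram = csiToSpectrogram(csiWindow); // [64, 64] Uint8Array
  const embedding = embedder.extract(spectrogram, 64, 64);
  // Classify gesture via nearest-neighbor to trained templates
  return classifyGesture(embedding, gestureTemplates);
}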
```
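The `classifyGesture` helper used in the snippet above is not defined in this ADR. A plausible nearest-neighbor version, assuming templates are stored as `{ label, embedding }` pairs and cosine similarity doubles as a rough confidence, might look like this:

```javascript
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// templates: [{ label, embedding }]. Returns the closest label and its similarity.
function classifyGesture(embedding, templates) {
  let best = { label: null, confidence: -Infinity };
  for (const t of templates) {
    const sim = cosineSimilarity(embedding, t.embedding);
    if (sim > best.confidence) best = { label: t.label, confidence: sim };
  }
  return best;
}
```

With `normalize: true` on the embedder the vectors should already be unit length, in which case the cosine reduces to a plain dot product.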
#### Training with Contrastive + Classification
Two-phase training using ruvector-cnn's built-in losses:
**Phase 1: Contrastive embedding (unsupervised)**
```javascript
const loss = new InfoNCELoss(0.07);
// Same gesture performed at different speeds → positive pairs
// Different gestures → negative pairs
// Train CnnEmbedder to cluster same-gesture spectrograms
```
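For reference, the quantity an InfoNCE loss minimizes for a single anchor can be written out in plain JavaScript. This is the generic formula, not the internals of the package's `InfoNCELoss` class, and it assumes L2-normalized embeddings so that the dot product is the cosine similarity.

```javascript
const dot = (a, b) => a.reduce((s, v, i) => s + v * b[i], 0);

// loss = -log( exp(s_pos / T) / sum_k exp(s_k / T) ), where the sum runs over the
// positive and all negatives. Computed via log-sum-exp for numerical stability.
function infoNCE(anchor, positive, negatives, temperature = 0.07) {
  const logits = [positive, ...negatives].map((e) => dot(anchor, e) / temperature);
  const max = Math.max(...logits);
  const logSumExp = max + Math.log(logits.reduce((s, l) => s + Math.exp(l - max), 0));
  return logSumExp - logits[0];
}
```

The loss approaches 0 when the anchor is much closer to its positive than to any negative, and equals `log(2)` when one negative is exactly as similar as the positive.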
**Phase 2: Gesture classification (supervised)**
```javascript
// Linear classifier on frozen embeddings
// 18 gestures × 20 examples each = 360 labeled samples
// Camera auto-labels: MediaPipe Hands detects gesture type
```
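At inference time, the classifier head from the pipeline (a linear layer followed by softmax) reduces to a few lines. The weight layout below (`weights[class][dim]`) and the function names are assumptions for illustration; training the weights is not shown.

```javascript
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((s, v) => s + v, 0);
  return exps.map((v) => v / sum);
}

// weights: [numClasses][dim], bias: [numClasses], embedding: [dim]
function classify(embedding, weights, bias) {
  const logits = weights.map((row, c) =>
    row.reduce((s, w, i) => s + w * embedding[i], bias[c]));
  const probs = softmax(logits);
  const gestureId = probs.indexOf(Math.max(...probs));
  return { gestureId, confidence: probs[gestureId] };
}
```

For the ADR's configuration, `weights` would be 18 × 128 and the returned `confidence` is the softmax probability of the winning class.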
#### Dual-Path Architecture
Run both CNN and DTW in parallel for maximum robustness:
```
CSI input ──┬──→ WiFlow → keypoints → DTW templates → gesture_A (conf_A)
└──→ Spectrogram → ruvector-cnn → embedding → classifier → gesture_B (conf_B)
Fusion: if gesture_A == gesture_B → conf = min(1.0, max(conf_A, conf_B) + 0.15)
if conflict → pick higher confidence
if only one detects → use it at 0.8× confidence
```
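The dual-path rule is small enough to state directly in code. As with the camera/CSI fusion, the clamp to 1.0 is an assumption added so the boosted confidence stays in range, and `fuseDualPath` is a hypothetical name.

```javascript
// a: DTW path result, b: CNN path result; each is { gesture, confidence } or null.
function fuseDualPath(a, b) {
  if (a && b) {
    if (a.gesture === b.gesture) {
      const boosted = Math.max(a.confidence, b.confidence) + 0.15;
      return { gesture: a.gesture, confidence: Math.min(1, boosted) };
    }
    // Conflict: keep whichever path is more confident.
    return a.confidence >= b.confidence ? { ...a } : { ...b };
  }
  const only = a || b;
  return only ? { gesture: only.gesture, confidence: only.confidence * 0.8 } : null;
}
```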
This dual-path approach provides:
- **DTW** catches gestures the CNN might miss (novel variations)
- **CNN** provides higher accuracy for trained gesture types
- **Fusion** reduces false positives (both paths must agree for a high-confidence result)
### Optimization: Temporal Gesture Encoding
Alternative lightweight path for when ruvector-cnn WASM overhead matters
(e.g., ESP32 edge deployment):
```
Keypoint sequence [T=30 frames, 1 second]:
wrist_x[0..29], wrist_y[0..29],
elbow_angle[0..29],
hand_velocity[0..29]
1D CNN (k=5, d=[1,2,4]) → 64-dim gesture embedding
Nearest-neighbor to gesture templates (cosine distance)
Top gesture + confidence
```
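A quick check of the `k=5, d=[1,2,4]` choice: three stacked stride-1 dilated convolutions have a receptive field of 1 + 4·(1 + 2 + 4) = 29 frames, so the last layer sees almost the entire 30-frame window. The sketch below computes that figure and shows a single-channel dilated convolution. The causal, left zero-padded form is an assumption, as the ADR does not say which padding is used.

```javascript
// Receptive field of stacked stride-1 dilated convolutions: 1 + sum((k - 1) * d_i).
function receptiveField(kernel, dilations) {
  return dilations.reduce((rf, d) => rf + (kernel - 1) * d, 1);
}

// One causal dilated 1D convolution over a single channel (zero padding on the left).
// weights[k - 1] multiplies the current sample, weights[0] the oldest one.
function dilatedConv1d(x, weights, dilation) {
  const k = weights.length;
  return x.map((_, t) => {
    let sum = 0;
    for (let i = 0; i < k; i++) {
      const idx = t - (k - 1 - i) * dilation;
      if (idx >= 0) sum += weights[i] * x[idx];
    }
    return sum;
  });
}
```

Doubling the dilation at each layer is what lets three small kernels cover a one-second gesture without pooling or a large parameter count.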
This is lighter than DTW for real-time use and can be trained end-to-end with
the WiFlow backbone (shared TCN features).
## File Structure
```
scripts/
collect-gesture-gt.py # Camera + CSI gesture data collection
label-gestures.js # Auto-label gesture segments from camera
train-gesture-model.js # Train CSI gesture classifier
gesture-server.js # WebSocket gesture detection server
ui/
components/
GestureOverlay.js # Cursor + feedback overlay
GestureChart.js # Gesture-controlled chart wrapper
GestureStatus.js # Sensor health bar
services/
gesture.service.js # WebSocket client for gesture events
```
## Consequences
### Positive
- **Hands-free data exploration** — manipulate charts without touching anything
- **Works in dark/dirty/gloved conditions** — CSI-only mode needs no camera
- **Natural interaction** — swipe, pinch, point are intuitive
- **Builds on existing infrastructure** — WiFlow + DTW + MediaPipe all exist
- **Dual-mode deployment** — degrade gracefully from fusion to CSI-only
- **Low latency** — WiFlow inference is 0.79ms, gesture detection adds ~5ms
### Negative
- **Learning curve** — users must learn gesture vocabulary
- **False positives** — normal movement may trigger gestures (mitigated by state machine + cooldown)
- **CSI-only precision** — coarse gestures only without camera
- **Single-user** — multi-user gesture disambiguation is hard
### Risks
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Gesture false positives from normal movement | Medium | High | State machine with IDLE→TRACKING threshold, 200ms debounce, 0.7 confidence gate |
| CSI gestures too coarse for chart control | Medium | Medium | Camera fallback for precision; CSI handles navigation-level gestures only |
| Latency > 100ms feels unresponsive | Low | High | WiFlow 0.79ms + gesture 5ms + WebSocket <10ms = ~16ms total |
| User fatigue ("gorilla arm") | Medium | Medium | Support seated gestures; small wrist movements, not full arm sweeps |
| MediaPipe Hands not detecting in low light | Medium | Low | CSI-only fallback; works in complete darkness |
## Implementation Plan
| Phase | Task | Effort | Dependencies |
|-------|------|--------|-------------|
| P1 | `gesture-server.js` — WebSocket server with camera hand tracking | 3 hrs | MediaPipe Hands model |
| P2 | Camera gesture classifier (rule-based from hand landmarks) | 2 hrs | P1 |
| P3 | CSI gesture classifier (WiFlow keypoints → DTW templates) | 3 hrs | WiFlow model (ADR-079) |
| P4 | Fusion engine (confidence-weighted merge) | 2 hrs | P2 + P3 |
| P5 | `GestureOverlay.js` — cursor + feedback UI component | 2 hrs | P1 |
| P6 | `GestureChart.js` — gesture-controlled D3 chart wrapper | 4 hrs | P4 + P5 |
| P7 | Gesture training data collection + model training | 2 hrs | P3 |
| P8 | Integration with existing sensing UI | 2 hrs | P6 |
| **Total** | | **~20 hrs** | |
## References
- MediaPipe Hands — Google's 21-landmark hand tracking (30fps, CPU)
- ADR-029 — RuvSense DTW gesture recognition
- ADR-079 — Camera ground-truth training pipeline (92.9% PCK@20)
- Leap Motion — commercial gesture controller (comparison point)
- SolidJS/D3 gesture interaction patterns
- "GestureWiFi" (IEEE 2023) — WiFi gesture recognition survey
+4 -36
@@ -25,14 +25,6 @@
 /* ADR-060: Access the global NVS config for MAC filter and channel override. */
 extern nvs_config_t g_nvs_config;
-/* Defensive fix (#232, #375, #385, #386, #390): capture node_id at init-time
- * into a module-local static. Using the global g_nvs_config.node_id directly
- * at every callback is vulnerable to any memory corruption that clobbers the
- * struct (which users have reported reverting node_id to the Kconfig default
- * of 1). The local copy is set once at csi_collector_init() and then used
- * exclusively by csi_serialize_frame(). */
-static uint8_t s_node_id = 1;
 /* ADR-057: Build-time guard — fail early if CSI is not enabled in sdkconfig.
  * Without this, the firmware compiles but crashes at runtime with:
  * "E (xxxx) wifi:CSI not enabled in menuconfig!"
@@ -125,9 +117,8 @@ size_t csi_serialize_frame(const wifi_csi_info_t *info, uint8_t *buf, size_t buf
     uint32_t magic = CSI_MAGIC;
     memcpy(&buf[0], &magic, 4);
-    /* Node ID (captured at init into s_node_id to survive memory corruption
-     * that could clobber g_nvs_config.node_id - see #232/#375/#385/#390). */
-    buf[4] = s_node_id;
+    /* Node ID (from NVS runtime config, not compile-time Kconfig) */
+    buf[4] = g_nvs_config.node_id;
     /* Number of antennas */
     buf[5] = n_antennas;
@@ -224,13 +215,6 @@ static void wifi_promiscuous_cb(void *buf, wifi_promiscuous_pkt_type_t type)
void csi_collector_init(void)
{
/* Capture node_id into module-local static at init time. After this point
* csi_serialize_frame() uses s_node_id exclusively, isolating the UDP
* frame node_id field from any memory corruption of g_nvs_config. */
s_node_id = g_nvs_config.node_id;
ESP_LOGI(TAG, "Captured node_id=%u at init (defensive copy for #232/#375/#385/#390)",
(unsigned)s_node_id);
/* ADR-060: Determine the CSI channel.
* Priority: 1) NVS override (--channel), 2) connected AP channel, 3) Kconfig default. */
uint8_t csi_channel = (uint8_t)CONFIG_CSI_WIFI_CHANNEL;
@@ -288,24 +272,8 @@ void csi_collector_init(void)
g_nvs_config.filter_mac[4], g_nvs_config.filter_mac[5]);
}
ESP_LOGI(TAG, "CSI collection initialized (node_id=%u, channel=%u)",
(unsigned)s_node_id, (unsigned)csi_channel);
/* Clobber-detection canary: if g_nvs_config.node_id no longer matches the
* value we captured, something corrupted the struct between nvs_config_load
* and here. This is the historic #232/#375 symptom. */
if (g_nvs_config.node_id != s_node_id) {
ESP_LOGW(TAG, "node_id clobber detected: captured=%u but g_nvs_config=%u "
"(frames will use captured value %u). Please report to #390.",
(unsigned)s_node_id, (unsigned)g_nvs_config.node_id,
(unsigned)s_node_id);
}
}
/* Accessor for other modules that need the authoritative runtime node_id. */
uint8_t csi_collector_get_node_id(void)
{
return s_node_id;
ESP_LOGI(TAG, "CSI collection initialized (node_id=%d, channel=%u)",
g_nvs_config.node_id, (unsigned)csi_channel);
}
/* ---- ADR-029: Channel hopping ---- */
@@ -29,18 +29,6 @@
*/
void csi_collector_init(void);
/**
* Get the runtime node_id captured at csi_collector_init().
*
* This is a defensive copy of g_nvs_config.node_id taken at init time. Other
* modules (edge_processing, wasm_runtime, display_ui) should prefer this
* accessor over reading g_nvs_config.node_id directly, because the global
* struct can be clobbered by memory corruption (see #232, #375, #385, #390).
*
* @return Node ID (0-255) as loaded from NVS or Kconfig default at boot.
*/
uint8_t csi_collector_get_node_id(void);
/**
* Serialize CSI data into ADR-018 binary frame format.
*
+1 -2
@@ -8,7 +8,6 @@
#include "display_ui.h"
#include "nvs_config.h"
#include "csi_collector.h" /* csi_collector_get_node_id() - defensive #390 */
#include "sdkconfig.h"
extern nvs_config_t g_nvs_config;
@@ -351,7 +350,7 @@ void display_ui_update(void)
{
char buf[48];
snprintf(buf, sizeof(buf), "Node: %u", (unsigned)csi_collector_get_node_id());
snprintf(buf, sizeof(buf), "Node: %d", g_nvs_config.node_id);
lv_label_set_text(s_sys_node, buf);
snprintf(buf, sizeof(buf), "Heap: %lu KB free",
@@ -19,7 +19,6 @@
#include "edge_processing.h"
#include "nvs_config.h"
#include "csi_collector.h" /* csi_collector_get_node_id() - defensive #390 */
#include "mmwave_sensor.h"
/* Runtime config — declared in main.c, loaded from NVS at boot. */
@@ -442,7 +441,7 @@ static void send_compressed_frame(const uint8_t *iq_data, uint16_t iq_len,
uint32_t magic = EDGE_COMPRESSED_MAGIC;
memcpy(&pkt[0], &magic, 4);
pkt[4] = csi_collector_get_node_id(); /* #390: defensive copy */
pkt[4] = g_nvs_config.node_id;
pkt[5] = channel;
memcpy(&pkt[6], &iq_len, 2);
memcpy(&pkt[8], &comp_len, 2);
@@ -558,7 +557,7 @@ static void send_vitals_packet(void)
memset(&pkt, 0, sizeof(pkt));
pkt.magic = EDGE_VITALS_MAGIC;
pkt.node_id = csi_collector_get_node_id(); /* #390: defensive copy */
pkt.node_id = g_nvs_config.node_id;
pkt.flags = 0;
if (s_presence_detected) pkt.flags |= 0x01;
@@ -648,7 +647,7 @@ static void send_feature_vector(void)
memset(&pkt, 0, sizeof(pkt));
pkt.magic = EDGE_FEATURE_MAGIC;
pkt.node_id = csi_collector_get_node_id(); /* #390: defensive copy */
pkt.node_id = g_nvs_config.node_id;
pkt.reserved = 0;
pkt.seq = s_feature_seq++;
pkt.timestamp_us = esp_timer_get_time();
+1 -1
@@ -267,7 +267,7 @@ void app_main(void)
strncpy(swarm_cfg.seed_url, g_nvs_config.seed_url, sizeof(swarm_cfg.seed_url) - 1);
strncpy(swarm_cfg.seed_token, g_nvs_config.seed_token, sizeof(swarm_cfg.seed_token) - 1);
strncpy(swarm_cfg.zone_name, g_nvs_config.zone_name, sizeof(swarm_cfg.zone_name) - 1);
swarm_ret = swarm_bridge_init(&swarm_cfg, csi_collector_get_node_id());
swarm_ret = swarm_bridge_init(&swarm_cfg, g_nvs_config.node_id);
if (swarm_ret != ESP_OK) {
ESP_LOGW(TAG, "Swarm bridge init failed: %s", esp_err_to_name(swarm_ret));
}
+1 -2
@@ -13,7 +13,6 @@
#include "sdkconfig.h"
#include "wasm_runtime.h"
#include "nvs_config.h"
#include "csi_collector.h" /* csi_collector_get_node_id() - defensive #390 */
extern nvs_config_t g_nvs_config;
@@ -384,7 +383,7 @@ static void send_wasm_output(uint8_t slot_id)
memset(&pkt, 0, sizeof(pkt));
pkt.magic = WASM_OUTPUT_MAGIC;
pkt.node_id = csi_collector_get_node_id(); /* #390: defensive copy */
pkt.node_id = g_nvs_config.node_id;
pkt.module_id = slot_id;
pkt.event_count = n_filtered;
+3 -40
@@ -9,13 +9,8 @@ Usage:
python provision.py --port COM7 --ssid "MyWiFi" --password "secret" --target-ip 192.168.1.20
Requirements:
pip install 'esptool>=5.0' nvs-partition-gen
pip install esptool nvs-partition-gen
(or use the nvs_partition_gen.py bundled with ESP-IDF)
WARNING -- FULL-REPLACE SEMANTICS (issue #391):
Every invocation REPLACES the entire `csi_cfg` NVS namespace on the device.
Any key you don't pass on the CLI is erased. Always include WiFi credentials
(--ssid, --password, --target-ip) unless you pass --force-partial.
"""
import argparse
@@ -155,7 +150,7 @@ def flash_nvs(port, baud, nvs_bin):
"--chip", "esp32s3",
"--port", port,
"--baud", str(baud),
"write-flash",
"write_flash",
hex(NVS_PARTITION_OFFSET), bin_path,
]
print(f"Flashing NVS partition ({len(nvs_bin)} bytes) to {port}...")
@@ -204,10 +199,6 @@ def main():
parser.add_argument("--swarm-hb", type=int, help="Swarm heartbeat interval in seconds (default 30)")
parser.add_argument("--swarm-ingest", type=int, help="Swarm vector ingest interval in seconds (default 5)")
parser.add_argument("--dry-run", action="store_true", help="Generate NVS binary but don't flash")
parser.add_argument("--force-partial", action="store_true",
help="Allow partial config without WiFi credentials. "
"WARNING: flashing REPLACES the entire csi_cfg NVS namespace - "
"any key not passed on the CLI will be erased (issue #391).")
args = parser.parse_args()
@@ -224,34 +215,6 @@ def main():
if not has_value:
parser.error("At least one config value must be specified")
# Bug 2 (#391): Prevent silent wipe of WiFi credentials on partial invocations.
# Flashing the generated NVS binary to offset 0x9000 REPLACES the entire
# csi_cfg namespace — there is no merge with existing NVS. Require the full
# WiFi trio unless the user explicitly opts in with --force-partial.
wifi_trio_missing = [
name for name, val in [
("--ssid", args.ssid),
("--password", args.password),
("--target-ip", args.target_ip),
] if val is None or val == ""
]
if wifi_trio_missing and not args.force_partial:
parser.error(
f"Missing required WiFi credentials: {', '.join(wifi_trio_missing)}.\n"
f"\n"
f" provision.py REPLACES the entire csi_cfg NVS namespace on each run.\n"
f" Any key not passed on the CLI will be erased -- including WiFi creds.\n"
f"\n"
f" Either pass all of --ssid, --password, --target-ip,\n"
f" or add --force-partial to acknowledge that other NVS keys will be wiped."
)
if args.force_partial and wifi_trio_missing:
print("WARNING: --force-partial is set. The following NVS keys will be WIPED "
"(not present in this invocation):", file=sys.stderr)
for k in wifi_trio_missing:
print(f" - {k.lstrip('-')}", file=sys.stderr)
print(" Plus any other csi_cfg keys not passed on the CLI.\n", file=sys.stderr)
# Validate TDM: if one is given, both should be
if (args.tdm_slot is not None) != (args.tdm_total is not None):
parser.error("--tdm-slot and --tdm-total must be specified together")
@@ -335,7 +298,7 @@ def main():
f.write(nvs_bin)
print(f"NVS binary saved to {out} ({len(nvs_bin)} bytes)")
print(f"Flash manually: python -m esptool --chip esp32s3 --port {args.port} "
f"write-flash 0x9000 {out}")
f"write_flash 0x9000 {out}")
return
flash_nvs(args.port, args.baud, nvs_bin)