Compare commits

...

2 Commits

Author SHA1 Message Date
ruv 425f0e6aac fix(firmware): defensive node_id capture prevents runtime clobber (#390)
Users on multi-node ESP32 deployments have been reporting for months
that their provisioned `node_id` reverts to the Kconfig default of `1`
in UDP frames and the `csi_collector` init log, despite boot showing:

    nvs_config: NVS override: node_id=4
    main: ESP32-S3 CSI Node (ADR-018) - Node ID: 4
    csi_collector: CSI collection initialized (node_id=1, channel=11)

See #232, #375, #385, #386, #390. The root memory-corruption path for
the `g_nvs_config.node_id` byte has not been definitively isolated
(does not reproduce on my attached ESP32-S3 running current source
and the v0.6.0 release binary), but the UDP frame header can be made
tamper-proof regardless:

1. `csi_collector_init()` now captures `g_nvs_config.node_id` into a
   module-local `static uint8_t s_node_id` at init time.
2. `csi_serialize_frame()` reads `buf[4]` from `s_node_id`, not from
   the global - so any later corruption of `g_nvs_config` cannot
   affect outgoing CSI frames.
3. All other consumers (`edge_processing.c` x3, `wasm_runtime.c`,
   `display_ui.c`, `main.c swarm_bridge_init`) now go through a new
   `csi_collector_get_node_id()` accessor instead of reading the
   global directly.
4. A canary at end-of-init logs `WARN` if `g_nvs_config.node_id`
   already diverges from the captured value - this will pinpoint
   the corruption path if it happens on a user's device.

Hardware validation on attached ESP32-S3 (COM8):
  - NVS loads node_id=2
  - Boot log: `main: ... Node ID: 2`
  - NEW log: `csi_collector: Captured node_id=2 at init (defensive
    copy for #232/#375/#385/#390)`
  - Init log: `csi_collector: CSI collection initialized (node_id=2)`
  - UDP frame byte[4] = 2 (verified via socket sniffer, 15/15 packets)

This is defense in depth - it shields the UDP frame from whatever
upstream bug is clobbering the struct. When a user hits the original
bug, the canary WARN will help isolate the root cause.

Refs #232 #375 #385 #386 #390

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-15 13:47:34 -04:00
rUv 6e015c4626 fix: provision.py esptool v5 + refuse partial NVS flashes (#391) (#392)
* fix: provision.py esptool v5 syntax + refuse partial NVS flashes (#391)

Bug 1: `write_flash` -> `write-flash` for esptool v5.x compat
  - Actual flash command (flash_nvs, line 153) was already fixed
  - Dry-run manual-flash hint (line 301) still printed old syntax

Bug 2: Refuse partial invocations that would silently wipe NVS
  - provision.py flashes a fresh NVS binary at offset 0x9000, which
    REPLACES the entire csi_cfg namespace. Any key not passed on the
    CLI is erased.
  - Previously: `provision.py --port COM8 --target-port 5005` would
    silently wipe ssid, password, target_ip, node_id, etc., causing
    "Retrying WiFi connection (10/10)" in the field.
  - Now: refuse unless all of --ssid/--password/--target-ip provided,
    or --force-partial is set (prints warning listing wiped keys).

Validation:
  - Dry-run: binary generates to 24576 bytes, hint uses write-flash
  - Safety check: partial invocation rejected with clear message
  - Force-partial: warning lists keys that will be wiped
  - Hardware: esptool v5.1.0 `read-flash 0x9000 0x100` works on
    attached ESP32-S3 (COM8); NVS preserved, device reconnected at
    192.168.1.104 with node_id=2 intact after reset.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: CHANGELOG catch-up for v0.5.5, v0.6.0, v0.7.0 (#367)

The changelog was stale at v0.5.4 — three releases were cut without
updating it. Added full entries for each, plus an [Unreleased] block
for the #391 provision.py fixes.

version.txt correctly stays at 0.6.0 — v0.7.0 was a model/pipeline
release, not a new firmware binary. Latest firmware is v0.6.0-esp32.

Closes #367

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-15 13:12:46 -04:00
8 changed files with 156 additions and 13 deletions
+59
View File
@@ -5,6 +5,65 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Fixed
- **`provision.py` esptool v5 compat** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct.
- **`provision.py` silent NVS wipe** (#391) — The script replaces the entire `csi_cfg` NVS namespace on every run, so partial invocations were silently erasing WiFi credentials and causing `Retrying WiFi connection (10/10)` in the field. Now refuses to run without `--ssid`, `--password`, and `--target-ip` unless `--force-partial` is passed. `--force-partial` prints a warning listing which keys will be wiped.
- **Firmware: defensive `node_id` capture** (#232, #375, #385, #386, #390) — Users on multi-node deployments reported `node_id` reverting to the Kconfig default (`1`) in UDP frames and in the `csi_collector` init log, despite NVS loading the correct value. The root cause (memory corruption of `g_nvs_config`) has not been definitively isolated, but the UDP frame header is now tamper-proof: `csi_collector_init()` captures `g_nvs_config.node_id` into a module-local `s_node_id` once, and `csi_serialize_frame()` plus all other consumers (`edge_processing.c`, `wasm_runtime.c`, `display_ui.c`, `swarm_bridge_init`) read it via the new `csi_collector_get_node_id()` accessor. A canary logs `WARN` if `g_nvs_config.node_id` diverges from `s_node_id` at end-of-init, helping isolate the upstream corruption path. Validated on attached ESP32-S3 (COM8): NVS `node_id=2` propagates through boot log, capture log, init log, and byte[4] of every UDP frame.
### Docs
- **CHANGELOG catch-up** (#367) — Added missing entries for v0.5.5, v0.6.0, and v0.7.0 releases.
## [v0.7.0] — 2026-04-06
Model release (no new firmware binary). Firmware remains at v0.6.0-esp32.
### Added
- **Camera ground-truth training pipeline (ADR-079)** — End-to-end supervised WiFlow pose training using MediaPipe + real ESP32 CSI.
- `scripts/collect-ground-truth.py` — MediaPipe PoseLandmarker webcam capture (17 COCO keypoints, 30fps), synchronized with CSI recording over nanosecond timestamps.
- `scripts/align-ground-truth.js` — Time-aligns camera keypoints with 20-frame CSI windows by binary search, confidence-weighted averaging.
- `scripts/train-wiflow-supervised.js` — 3-phase curriculum training (contrastive → supervised SmoothL1 → bone/temporal refinement) with 4 scale presets (lite/small/medium/full).
- `scripts/eval-wiflow.js` — PCK@10/20/50, MPJPE, per-joint breakdown, baseline proxy mode.
- `scripts/record-csi-udp.py` — Lightweight ESP32 CSI UDP recorder (no Rust build required).
- **ruvector optimizations (O6-O10)** — Subcarrier selection (70→35, 50% reduction), attention-weighted subcarriers, Stoer-Wagner min-cut person separation, multi-SPSA gradient estimation, Mac M4 Pro training via Tailscale.
- **Scalable WiFlow presets** — `lite` (189K params, ~19 min) through `full` (7.7M params, ~8 hrs) to match dataset size.
- **Pre-trained WiFlow v1 model** — 92.9% PCK@20, 974 KB, 186,946 params. Published to [HuggingFace](https://huggingface.co/ruv/ruview) under `wiflow-v1/`.
### Validated
- **92.9% PCK@20** pose accuracy from a 5-minute data collection session with one $9 ESP32-S3 and one laptop webcam.
- Training pipeline validated on real paired data: 345 samples, 19 min training, eval loss 0.082, bone constraint 0.008.
## [v0.6.0-esp32] — 2026-04-03
### Added
- **Pre-trained CSI sensing weights published** — First official pre-trained models on [HuggingFace](https://huggingface.co/ruv/ruview). `model.safetensors` (48 KB), `model-q4.bin` (8 KB 4-bit), `model-q2.bin` (4 KB), `presence-head.json`, per-node LoRA adapters.
- **17 sensing applications** — Sleep monitor, apnea detector, stress monitor, gait analyzer, RF tomography, passive radar, material classifier, through-wall detector, device fingerprint, and more. Each as a standalone `scripts/*.js`.
- **ADRs 069-078** — 10 new architecture decisions covering Cognitum Seed integration, self-supervised pretraining, ruvllm pipeline, WiFlow architecture, channel hopping, SNN, MinCut person separation, CNN spectrograms, novel RF applications, multi-frequency mesh.
- **Kalman tracker** (PR #341 by @taylorjdawson) — temporal smoothing of pose keypoints.
### Fixed
- Security fix merged via PR #310.
### Performance
- Presence detection: 100% accuracy on 60,630 overnight samples.
- Inference: 0.008 ms per sample, 164K embeddings/sec.
- Contrastive self-supervised training: 51.6% improvement over baseline.
## [v0.5.5-esp32] — 2026-04-03
### Added
- **WiFlow SOTA architecture (ADR-072)** — TCN + axial attention pose decoder, 1.8M params, 881 KB at 4-bit. 17 COCO keypoints from CSI amplitude only (no phase).
- **Multi-frequency mesh scanning (ADR-073)** — ESP32 nodes hop across channels 1/3/5/6/9/11 at 200ms dwell. Neighbor WiFi networks used as passive radar illuminators. Null subcarriers reduced from 19% to 16%.
- **Spiking neural network (ADR-074)** — STDP online learning, adapts to new rooms in <30s with no labels, 16-160x less compute than batch training.
- **MinCut person counting (ADR-075)** — Stoer-Wagner min-cut on subcarrier correlation graph. Fixes #348 (was always reporting 4 people).
- **CNN spectrogram embeddings (ADR-076)** — Treat 64×20 CSI as an image, produce 128-dim environment fingerprints (0.95+ same-room similarity).
- **Graph transformer fusion** — Multi-node CSI fusion via GATv2 attention (replaces naive averaging).
- **Camera-free pose training pipeline** — Trains 17-keypoint model from 10 sensor signals with no camera required.
### Fixed
- **#348 person counting** — MinCut correctly counts 1-4 people (24/24 validation windows).
## [v0.5.4-esp32] — 2026-04-02
### Added
+36 -4
View File
@@ -25,6 +25,14 @@
/* ADR-060: Access the global NVS config for MAC filter and channel override. */
extern nvs_config_t g_nvs_config;
/* Defensive fix (#232, #375, #385, #386, #390): capture node_id at init-time
* into a module-local static. Using the global g_nvs_config.node_id directly
* at every callback is vulnerable to any memory corruption that clobbers the
* struct (which users have reported reverting node_id to the Kconfig default
* of 1). The local copy is set once at csi_collector_init() and then used
* exclusively by csi_serialize_frame(). */
static uint8_t s_node_id = 1;
/* ADR-057: Build-time guard — fail early if CSI is not enabled in sdkconfig.
* Without this, the firmware compiles but crashes at runtime with:
* "E (xxxx) wifi:CSI not enabled in menuconfig!"
@@ -117,8 +125,9 @@ size_t csi_serialize_frame(const wifi_csi_info_t *info, uint8_t *buf, size_t buf
uint32_t magic = CSI_MAGIC;
memcpy(&buf[0], &magic, 4);
/* Node ID (from NVS runtime config, not compile-time Kconfig) */
buf[4] = g_nvs_config.node_id;
/* Node ID (captured at init into s_node_id to survive memory corruption
* that could clobber g_nvs_config.node_id - see #232/#375/#385/#390). */
buf[4] = s_node_id;
/* Number of antennas */
buf[5] = n_antennas;
@@ -215,6 +224,13 @@ static void wifi_promiscuous_cb(void *buf, wifi_promiscuous_pkt_type_t type)
void csi_collector_init(void)
{
/* Capture node_id into module-local static at init time. After this point
* csi_serialize_frame() uses s_node_id exclusively, isolating the UDP
* frame node_id field from any memory corruption of g_nvs_config. */
s_node_id = g_nvs_config.node_id;
ESP_LOGI(TAG, "Captured node_id=%u at init (defensive copy for #232/#375/#385/#390)",
(unsigned)s_node_id);
/* ADR-060: Determine the CSI channel.
* Priority: 1) NVS override (--channel), 2) connected AP channel, 3) Kconfig default. */
uint8_t csi_channel = (uint8_t)CONFIG_CSI_WIFI_CHANNEL;
@@ -272,8 +288,24 @@ void csi_collector_init(void)
g_nvs_config.filter_mac[4], g_nvs_config.filter_mac[5]);
}
ESP_LOGI(TAG, "CSI collection initialized (node_id=%d, channel=%u)",
g_nvs_config.node_id, (unsigned)csi_channel);
ESP_LOGI(TAG, "CSI collection initialized (node_id=%u, channel=%u)",
(unsigned)s_node_id, (unsigned)csi_channel);
/* Clobber-detection canary: if g_nvs_config.node_id no longer matches the
* value we captured, something corrupted the struct between nvs_config_load
* and here. This is the historic #232/#375 symptom. */
if (g_nvs_config.node_id != s_node_id) {
ESP_LOGW(TAG, "node_id clobber detected: captured=%u but g_nvs_config=%u "
"(frames will use captured value %u). Please report to #390.",
(unsigned)s_node_id, (unsigned)g_nvs_config.node_id,
(unsigned)s_node_id);
}
}
/* Accessor for other modules that need the authoritative runtime node_id. */
uint8_t csi_collector_get_node_id(void)
{
return s_node_id;
}
/* ---- ADR-029: Channel hopping ---- */
@@ -29,6 +29,18 @@
*/
void csi_collector_init(void);
/**
* Get the runtime node_id captured at csi_collector_init().
*
* This is a defensive copy of g_nvs_config.node_id taken at init time. Other
* modules (edge_processing, wasm_runtime, display_ui) should prefer this
* accessor over reading g_nvs_config.node_id directly, because the global
* struct can be clobbered by memory corruption (see #232, #375, #385, #390).
*
* @return Node ID (0-255) as loaded from NVS or Kconfig default at boot.
*/
uint8_t csi_collector_get_node_id(void);
/**
* Serialize CSI data into ADR-018 binary frame format.
*
+2 -1
View File
@@ -8,6 +8,7 @@
#include "display_ui.h"
#include "nvs_config.h"
#include "csi_collector.h" /* csi_collector_get_node_id() - defensive #390 */
#include "sdkconfig.h"
extern nvs_config_t g_nvs_config;
@@ -350,7 +351,7 @@ void display_ui_update(void)
{
char buf[48];
snprintf(buf, sizeof(buf), "Node: %d", g_nvs_config.node_id);
snprintf(buf, sizeof(buf), "Node: %u", (unsigned)csi_collector_get_node_id());
lv_label_set_text(s_sys_node, buf);
snprintf(buf, sizeof(buf), "Heap: %lu KB free",
@@ -19,6 +19,7 @@
#include "edge_processing.h"
#include "nvs_config.h"
#include "csi_collector.h" /* csi_collector_get_node_id() - defensive #390 */
#include "mmwave_sensor.h"
/* Runtime config — declared in main.c, loaded from NVS at boot. */
@@ -441,7 +442,7 @@ static void send_compressed_frame(const uint8_t *iq_data, uint16_t iq_len,
uint32_t magic = EDGE_COMPRESSED_MAGIC;
memcpy(&pkt[0], &magic, 4);
pkt[4] = g_nvs_config.node_id;
pkt[4] = csi_collector_get_node_id(); /* #390: defensive copy */
pkt[5] = channel;
memcpy(&pkt[6], &iq_len, 2);
memcpy(&pkt[8], &comp_len, 2);
@@ -557,7 +558,7 @@ static void send_vitals_packet(void)
memset(&pkt, 0, sizeof(pkt));
pkt.magic = EDGE_VITALS_MAGIC;
pkt.node_id = g_nvs_config.node_id;
pkt.node_id = csi_collector_get_node_id(); /* #390: defensive copy */
pkt.flags = 0;
if (s_presence_detected) pkt.flags |= 0x01;
@@ -647,7 +648,7 @@ static void send_feature_vector(void)
memset(&pkt, 0, sizeof(pkt));
pkt.magic = EDGE_FEATURE_MAGIC;
pkt.node_id = g_nvs_config.node_id;
pkt.node_id = csi_collector_get_node_id(); /* #390: defensive copy */
pkt.reserved = 0;
pkt.seq = s_feature_seq++;
pkt.timestamp_us = esp_timer_get_time();
+1 -1
View File
@@ -267,7 +267,7 @@ void app_main(void)
strncpy(swarm_cfg.seed_url, g_nvs_config.seed_url, sizeof(swarm_cfg.seed_url) - 1);
strncpy(swarm_cfg.seed_token, g_nvs_config.seed_token, sizeof(swarm_cfg.seed_token) - 1);
strncpy(swarm_cfg.zone_name, g_nvs_config.zone_name, sizeof(swarm_cfg.zone_name) - 1);
swarm_ret = swarm_bridge_init(&swarm_cfg, g_nvs_config.node_id);
swarm_ret = swarm_bridge_init(&swarm_cfg, csi_collector_get_node_id());
if (swarm_ret != ESP_OK) {
ESP_LOGW(TAG, "Swarm bridge init failed: %s", esp_err_to_name(swarm_ret));
}
+2 -1
View File
@@ -13,6 +13,7 @@
#include "sdkconfig.h"
#include "wasm_runtime.h"
#include "nvs_config.h"
#include "csi_collector.h" /* csi_collector_get_node_id() - defensive #390 */
extern nvs_config_t g_nvs_config;
@@ -383,7 +384,7 @@ static void send_wasm_output(uint8_t slot_id)
memset(&pkt, 0, sizeof(pkt));
pkt.magic = WASM_OUTPUT_MAGIC;
pkt.node_id = g_nvs_config.node_id;
pkt.node_id = csi_collector_get_node_id(); /* #390: defensive copy */
pkt.module_id = slot_id;
pkt.event_count = n_filtered;
+40 -3
View File
@@ -9,8 +9,13 @@ Usage:
python provision.py --port COM7 --ssid "MyWiFi" --password "secret" --target-ip 192.168.1.20
Requirements:
pip install esptool nvs-partition-gen
pip install 'esptool>=5.0' nvs-partition-gen
(or use the nvs_partition_gen.py bundled with ESP-IDF)
WARNING -- FULL-REPLACE SEMANTICS (issue #391):
Every invocation REPLACES the entire `csi_cfg` NVS namespace on the device.
Any key you don't pass on the CLI is erased. Always include WiFi credentials
(--ssid, --password, --target-ip) unless you pass --force-partial.
"""
import argparse
@@ -150,7 +155,7 @@ def flash_nvs(port, baud, nvs_bin):
"--chip", "esp32s3",
"--port", port,
"--baud", str(baud),
"write_flash",
"write-flash",
hex(NVS_PARTITION_OFFSET), bin_path,
]
print(f"Flashing NVS partition ({len(nvs_bin)} bytes) to {port}...")
@@ -199,6 +204,10 @@ def main():
parser.add_argument("--swarm-hb", type=int, help="Swarm heartbeat interval in seconds (default 30)")
parser.add_argument("--swarm-ingest", type=int, help="Swarm vector ingest interval in seconds (default 5)")
parser.add_argument("--dry-run", action="store_true", help="Generate NVS binary but don't flash")
parser.add_argument("--force-partial", action="store_true",
help="Allow partial config without WiFi credentials. "
"WARNING: flashing REPLACES the entire csi_cfg NVS namespace - "
"any key not passed on the CLI will be erased (issue #391).")
args = parser.parse_args()
@@ -215,6 +224,34 @@ def main():
if not has_value:
parser.error("At least one config value must be specified")
# Bug 2 (#391): Prevent silent wipe of WiFi credentials on partial invocations.
# Flashing the generated NVS binary to offset 0x9000 REPLACES the entire
# csi_cfg namespace — there is no merge with existing NVS. Require the full
# WiFi trio unless the user explicitly opts in with --force-partial.
wifi_trio_missing = [
name for name, val in [
("--ssid", args.ssid),
("--password", args.password),
("--target-ip", args.target_ip),
] if val is None or val == ""
]
if wifi_trio_missing and not args.force_partial:
parser.error(
f"Missing required WiFi credentials: {', '.join(wifi_trio_missing)}.\n"
f"\n"
f" provision.py REPLACES the entire csi_cfg NVS namespace on each run.\n"
f" Any key not passed on the CLI will be erased -- including WiFi creds.\n"
f"\n"
f" Either pass all of --ssid, --password, --target-ip,\n"
f" or add --force-partial to acknowledge that other NVS keys will be wiped."
)
if args.force_partial and wifi_trio_missing:
print("WARNING: --force-partial is set. The following NVS keys will be WIPED "
"(not present in this invocation):", file=sys.stderr)
for k in wifi_trio_missing:
print(f" - {k.lstrip('-')}", file=sys.stderr)
print(" Plus any other csi_cfg keys not passed on the CLI.\n", file=sys.stderr)
# Validate TDM: if one is given, both should be
if (args.tdm_slot is not None) != (args.tdm_total is not None):
parser.error("--tdm-slot and --tdm-total must be specified together")
@@ -298,7 +335,7 @@ def main():
f.write(nvs_bin)
print(f"NVS binary saved to {out} ({len(nvs_bin)} bytes)")
print(f"Flash manually: python -m esptool --chip esp32s3 --port {args.port} "
f"write_flash 0x9000 {out}")
f"write-flash 0x9000 {out}")
return
flash_nvs(args.port, args.baud, nvs_bin)