Compare commits

3 Commits

Author SHA1 Message Date
rUv c0d3d7c792 chore(firmware): add release guard against stale-sdkconfig partition mismatch (#1194)
While cutting v0.8.3-esp32, an incremental 8MB build reused a leftover
generated `sdkconfig` and silently linked the 4MB dual-OTA partition layout
(no spiffs, ota_1 @ 0x1F0000) — the would-be released `partition-table.bin`
did not match the 8MB `partitions_display.csv` it claimed.

scripts/firmware-release-guard.sh regenerates the expected partition table
from the CSV the named flash-size variant must use and byte-compares it to the
built `partition-table.bin`, and cross-checks flash size in flasher_args.json.
Fails closed so a release pipeline can't ship a mismatched table.

Usage: scripts/firmware-release-guard.sh <8mb|4mb> <build-dir>


Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36
2026-06-27 13:21:05 -04:00
rUv fca5e6f0a0 fix: multistatic canonicalization, csi_fps burst inflation, control-packet starvation (#1170, #1180, #1183) (#1193)
#1170 — live multistatic bridge fed raw, un-canonicalized per-node CSI
(64/128/192 bins) to MultistaticFuser, tripping DimensionMismatch every
cycle and silently disabling fusion on mixed HT20/HT40 meshes. Add
HardwareNormalizer::resample_to_canonical (resample-only, no z-score) and
canonicalize every node frame onto the 56-tone grid before fusion.
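
As a rough illustration of what "resample-only" canonicalization means, the C sketch below maps an amplitude vector of any length onto a 56-point grid. It is a stand-in under stated assumptions, not the project's code: the name `resample_to_canonical_sketch` is hypothetical, and it uses plain linear interpolation in place of whatever interpolation `HardwareNormalizer::resample_to_canonical` really performs. The property it demonstrates is that only the length changes; values are never mean-centred or rescaled.

```c
#include <stddef.h>

#define CANONICAL_TONES 56  /* canonical grid size named in the commit message */

/* Hypothetical helper: resample src[0..n_src) onto CANONICAL_TONES points.
 * Linear interpolation is an assumption made for this sketch.
 * Returns 0 on success, -1 on invalid input. */
static int resample_to_canonical_sketch(const float *src, size_t n_src,
                                        float dst[CANONICAL_TONES])
{
    if (src == NULL || dst == NULL || n_src == 0) {
        return -1;
    }
    for (size_t k = 0; k < CANONICAL_TONES; k++) {
        /* Map output index k onto the source index range [0, n_src - 1]. */
        double pos = (double)k * (double)(n_src - 1) / (double)(CANONICAL_TONES - 1);
        size_t lo = (size_t)pos;
        size_t hi = (lo + 1 < n_src) ? lo + 1 : lo;
        double frac = pos - (double)lo;
        /* Length-only: interpolate, but do not z-score or rescale. */
        dst[k] = (float)((1.0 - frac) * (double)src[lo] + frac * (double)src[hi]);
    }
    return 0;
}
```

With this shape, a 64-bin node and a 192-bin node both produce 56 values, so a fuser that insists on equal dimensions has nothing left to reject.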

#1180 — update_csi_fps_ema only rejected dt<=0 or >=1s, so sub-ms UDP-burst
arrivals (36us -> ~27kHz) inflated csi_fps_ema 40-840x. Add a 5ms plausibility
floor and stop re-anchoring observe_csi_frame_arrival on burst deltas.
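
The interaction between the plausibility floor and the arrival anchor can be sketched in C as follows. This is an illustrative model only: `fps_tracker_t` and `fps_tracker_observe` are hypothetical names, timestamps are plain `double` seconds, the EMA is assumed to take the form `ema += (sample - ema) / 8`, and re-anchoring after a gap of 1 s or more is likewise an assumption.

```c
#include <stdbool.h>

#define MIN_PLAUSIBLE_DT_SEC 0.005  /* 5 ms floor from the commit message */
#define MAX_PLAUSIBLE_DT_SEC 1.0    /* gaps this long are not rate samples */

typedef struct {
    double last_arrival_sec;  /* anchor: last arrival accepted for timing */
    bool   has_anchor;
    double fps_ema;
} fps_tracker_t;

/* Hypothetical model of the arrival observer. */
static void fps_tracker_observe(fps_tracker_t *t, double now_sec)
{
    if (!t->has_anchor) {
        t->last_arrival_sec = now_sec;
        t->has_anchor = true;
        return;
    }
    double dt = now_sec - t->last_arrival_sec;
    if (dt < MIN_PLAUSIBLE_DT_SEC) {
        /* Burst arrival: skip the sample AND keep the old anchor, so the
         * next real gap is measured from the start of the burst. */
        return;
    }
    if (dt < MAX_PLAUSIBLE_DT_SEC) {
        t->fps_ema += (1.0 / dt - t->fps_ema) / 8.0;  /* assumed EMA form */
    }
    t->last_arrival_sec = now_sec;  /* assumed: re-anchor after any real gap */
}
```

Without the early return, a 40 µs burst delta would move the anchor forward and the following 25 ms gap would be measured slightly short; with it, the tracker only ever sees inter-burst gaps.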

#1183 — global ENOMEM backoff (CSI flood) starved <=48B/<=1Hz control packets.
Add stream_sender_send_priority() bypassing the backoff gate without touching
the streak; route feature_state/HEALTH/sync through it. Fix the misleading
"HEALTH sent" log that printed even on rv_mesh_send failure.

Verified: signal 501, sensing-server 677 tests (0 failed); firmware builds clean.
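
The #1183 behaviour described above (the priority path bypasses the backoff gate and never touches the streak) can be modelled without sockets. The C sketch below is a simplified state model, not the firmware's implementation: `send_gate_t` and the `gate_*` functions are hypothetical names, and the cooldown is passed in by the caller rather than computed here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t  backoff_until_us;  /* bulk sends are suppressed before this time */
    uint32_t enomem_streak;     /* consecutive bulk ENOMEM failures */
} send_gate_t;

/* Bulk (CSI) path: blocked while the backoff window is open. */
static bool gate_bulk_may_send(const send_gate_t *g, int64_t now_us)
{
    return now_us >= g->backoff_until_us;
}

/* Bulk path owns the streak: ENOMEM extends it, success resets it. */
static void gate_bulk_result(send_gate_t *g, int64_t now_us, bool enomem,
                             int64_t cooldown_us)
{
    if (enomem) {
        g->enomem_streak++;
        g->backoff_until_us = now_us + cooldown_us;
    } else {
        g->enomem_streak = 0;
    }
}

/* Priority (control) path: never gated by the backoff window. */
static bool gate_priority_may_send(const send_gate_t *g, int64_t now_us)
{
    (void)g;
    (void)now_us;
    return true;
}

/* Priority results deliberately leave the gate state untouched. */
static void gate_priority_result(send_gate_t *g, bool enomem)
{
    (void)g;
    (void)enomem;
}
```

Keeping the priority result handler a no-op is the design choice that matters: a small control packet failing is treated as a symptom of bulk pressure, so it must not lengthen the cooldown that the bulk stream is waiting on.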


Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36
2026-06-27 13:04:44 -04:00
rUv 7831f29436 fix(firmware): phantom LD2410 detection + ENOMEM backoff (#1135) (#1159)
Bug #2 (root cause): LD2410 probe-detection matched only the 4-byte head
0xF4F3F2F1, so a floating UART at 256000 baud could phantom-detect a sensor
and spawn a UART task. Now requires a full validated report frame (head +
sane length + tail 0xF8F7F6F5), extracted to mmwave_detect.h and shared with
a host unit test (test_mmwave_detect.c, 8 vectors) so firmware and test can't
diverge. Matches the validate-before-trust approach used for MR60 in #1107.

Bug #1: sendto ENOMEM used a fixed 100 ms backoff too short to drain sustained
lwIP/WiFi buffer pressure, so a node could stay stuck. Now exponential
(100->200->...->2000 ms per consecutive ENOMEM, reset on first successful
send). Removing the phantom LD2410 task (bug #2) also removes the extra load
that tipped the reporter's tier-2 node into the stuck state.
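
To make the elided progression concrete, here is a small C sketch of the cooldown schedule. The base (100 ms) and cap (2000 ms) come from the text above; the function name `enomem_cooldown_ms` is hypothetical, and `streak` is assumed to count the consecutive ENOMEM failures seen before the current one.

```c
#include <stdint.h>

#define COOLDOWN_BASE_MS 100u   /* first backoff step */
#define COOLDOWN_MAX_MS  2000u  /* cap on the exponential growth */

/* Hypothetical helper: cooldown for the next ENOMEM, given how many
 * consecutive ENOMEM failures preceded it. */
static uint32_t enomem_cooldown_ms(uint32_t streak)
{
    /* Clamp the shift so the multiplication can never overflow. */
    uint32_t shift = streak < 5u ? streak : 5u;
    uint32_t ms = COOLDOWN_BASE_MS << shift;
    return ms > COOLDOWN_MAX_MS ? COOLDOWN_MAX_MS : ms;
}
```

Under these assumptions the schedule is 100, 200, 400, 800, 1600 ms, and then 2000 ms for every further consecutive failure, because the next doubling (3200 ms) exceeds the cap.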

Validated on ESP32-S3 QFN56 rev v0.2 (the reporter's silicon): tier-2 streams
~100 frames/s with no stuck ENOMEM and correctly reports no mmWave (no
phantom). LD2410 predicate truth table proven (head-without-tail REJECTED).
Could not reproduce the reporter's environment-specific floating-pin noise, so
the deterministic proof is the host unit test.
2026-06-22 12:31:21 -04:00
15 changed files with 520 additions and 35 deletions
+3
@@ -8,6 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Fixed
- **Multistatic fusion never ran on a mixed-mode ESP32 mesh — live bridge fed raw, un-canonicalized per-node CSI to the fuser (#1170).** `node_frame_from_state` (`multistatic_bridge.rs`) wrapped each node's **raw** amplitude vector (HT20 ≈ 64 bins, HT40 ≈ 128/192) into a struct *named* `CanonicalCsiFrame` without ever resampling, so `MultistaticFuser::fuse` tripped `DimensionMismatch` on every cycle, silently fell back to per-node sum/dedup, and spun `total_engine_errors` unbounded. Added `HardwareNormalizer::resample_to_canonical` (resample-only, **no z-score** — preserves the amplitude scale the person-score's `variance/mean²` relies on) and run every node frame through it onto the canonical 56-tone grid before fusion. Heterogeneous meshes now fuse instead of erroring. Pinned by `heterogeneous_node_counts_canonicalize_and_fuse` (mixed 64/192 → fuses), `resample_to_canonical_is_length_only_no_zscore`, and an updated `test_node_frame_conversion`; the pre-existing `engine_bridge::observe_cycle_counts_engine_errors` was retargeted to force a `TimestampMismatch` (its old 56-vs-30 setup now canonicalizes cleanly). `wifi-densepose-signal` 501 / `wifi-densepose-sensing-server` 677 tests, 0 failed.
- **`csi_fps_ema` reported the CSI frame rate 40–840× too high under bursty UDP delivery (#1180).** `update_csi_fps_ema` only rejected deltas `≤ 0` or `≥ 1 s`, so a 36 µs intra-burst arrival delta yielded `1/dt ≈ 27 kHz` straight into the EMA; the metric measured server arrival jitter, not the node's ~40 fps production rate. Added a `MIN_PLAUSIBLE_CSI_DT_SEC = 0.005` floor (derived from the firmware's 50 fps `CSI_MIN_SEND_INTERVAL_US` ceiling, ×4 slack) and made `observe_csi_frame_arrival` keep its anchor across sub-floor bursts so the next genuine inter-frame gap measures true cadence. Pinned by `subms_burst_delta_rejected`, `burst_interleaved_with_nominal_stays_in_band`, and `observe_csi_frame_arrival_ignores_subms_bursts`.
- **`stream_sender` ENOMEM backoff starved low-rate control packets under a weak uplink (#1183, follow-up to #1135/#1159).** The global `s_backoff_until_us` gate (triggered by the 50 Hz CSI flood at weak RSSI) also suppressed the ≤48 B, ≤1 Hz `feature_state` / mesh `HEALTH` / sync packets that contribute negligible buffer pressure, so telemetry failed essentially every cycle. Added `stream_sender_send_priority()` — bypasses the backoff gate, reports ENOMEM quietly, and never extends/resets the global streak — and routed `feature_state`, HEALTH/anomaly (`rv_mesh_send`), and sync packets through it. Also fixed the misleading `"HEALTH sent"` log that printed unconditionally even when `rv_mesh_send` returned `ESP_FAIL` (now prints `sent`/`FAILED` from the actual return). Firmware builds clean (ESP-IDF v5.4).
- **Multistatic fusion guard interval is now operator-configurable; fixes permanent trust demotion with WiFi-synced ESP32 nodes (#1049).** Two independently-clocked ESP32-S3 boards on ESP-NOW sync drift 10–150 ms (typ. ~70 ms): the 100 ms beacon + WiFi-MAC jitter cannot hold them within the published 60 ms default guard, so the governed-trust cycle permanently demoted to `Restricted`, suppressed all pose output, and spun the error counter to 200k+ with **no escape hatch but a container restart**. Added a **direct `WDP_GUARD_INTERVAL_US` override** (+ optional `WDP_SOFT_GUARD_US`) to `multistatic_guard_config_from_env`, so a deployment can lift the hard guard past its measured spread (e.g. `WDP_GUARD_INTERVAL_US=200000`) without having to know its exact TDM schedule. Precedence is most-specific-wins: a direct override beats the existing `WDP_TDM_SLOTS`+`WDP_TDM_SLOT_US` schedule-derived guard, which beats the 60 ms/20 ms default; the override is applied on top of whichever base is selected, the soft band is always clamped strictly below the hard guard, and a malformed/zero value is ignored (falls back to the base rather than breaking fusion). The effective guard is now logged at startup. Pinned by 6 new tests (`multistatic_guard_config_tests`): direct-override-wins / beats-TDM-derived / soft-clamped-below-hard / lowering-hard-pulls-soft-down / malformed-or-zero-falls-back / default-when-unset. `wifi-densepose-sensing-server` bin tests **449 → 455**, 0 failed; Python proof VERDICT PASS, hash unchanged (off the signal proof path).
### Security
@@ -319,7 +319,9 @@ static void emit_feature_state(void)
(uint64_t)esp_timer_get_time(),
profile);
int sent = stream_sender_send((const uint8_t *)&pkt, sizeof(pkt));
/* feature_state is ~1 Hz and small — priority path so the CSI ENOMEM
* backoff can't starve it (#1183). */
int sent = stream_sender_send_priority((const uint8_t *)&pkt, sizeof(pkt));
if (sent < 0) {
ESP_LOGW(TAG, "feature_state emit failed");
}
@@ -333,11 +335,14 @@ static void slow_loop_cb(TimerHandle_t t)
* detect sync-error drift. */
uint8_t nid[8];
node_id_bytes(nid);
rv_mesh_send_health(s_role, s_mesh_epoch, nid);
/* #1183: report the actual send result — the old log printed "HEALTH sent"
* unconditionally even when rv_mesh_send returned ESP_FAIL. */
esp_err_t health_rc = rv_mesh_send_health(s_role, s_mesh_epoch, nid);
ESP_LOGI(TAG, "slow tick (state=%u, feature_state_seq=%u, role=%u, epoch=%u) HEALTH sent",
ESP_LOGI(TAG, "slow tick (state=%u, feature_state_seq=%u, role=%u, epoch=%u) HEALTH %s",
(unsigned)s_state, (unsigned)s_feature_state_seq,
(unsigned)s_role, (unsigned)s_mesh_epoch);
(unsigned)s_role, (unsigned)s_mesh_epoch,
health_rc == ESP_OK ? "sent" : "FAILED");
}
/* ---- Public API ---- */
+3 -1
@@ -341,7 +341,9 @@ static void wifi_csi_callback(void *ctx, wifi_csi_info_t *info)
memcpy(&sync[24], &s_sequence, 4); /* high-water seq for pairing */
uint32_t zero32 = 0;
memcpy(&sync[28], &zero32, 4); /* reserved (room for leader_id low32) */
int sr = stream_sender_send(sync, sizeof(sync));
/* Sync packets are 32 B at ~0.5 Hz — priority path so the CSI
* ENOMEM backoff can't starve cross-node time alignment (#1183). */
int sr = stream_sender_send_priority(sync, sizeof(sync));
static uint32_t s_sync_count = 0;
s_sync_count++;
if (s_sync_count <= 3 || (s_sync_count % 60) == 0) {
@@ -0,0 +1,37 @@
/**
* @file mmwave_detect.h
* @brief Pure (host-testable) mmWave frame-validation predicates for probe-time
* sensor detection. No ESP-IDF deps — safe to #include in a host unit test.
*
* Detection must validate a *full* frame, never a bare header byte/pattern: a
* floating UART with no sensor reads line noise that can contain header-looking
* bytes, which the old loose checks mistook for a real sensor (#1107 MR60,
* #1135 LD2410). These predicates are the validate-before-trust gate.
*/
#ifndef MMWAVE_DETECT_H
#define MMWAVE_DETECT_H
#include <stdint.h>
#include <stdbool.h>
/**
* True iff buf[i..] begins a *validated* LD2410 report frame within [0,len):
* F4 F3 F2 F1 | len(LE,2) | data[len] | F8 F7 F6 F5
* Requires the head magic, a sane intra-frame length, AND the matching tail at
* head+6+len. Pure noise that merely contains 0xF4F3F2F1 fails the tail check.
*/
static inline bool mmwave_ld2410_valid_at(const uint8_t *buf, int i, int len)
{
if (i < 0 || i + 5 >= len) return false;
if (!(buf[i] == 0xF4 && buf[i+1] == 0xF3 && buf[i+2] == 0xF2 && buf[i+3] == 0xF1))
return false;
uint16_t flen = (uint16_t)buf[i+4] | ((uint16_t)buf[i+5] << 8);
/* Real LD2410 report frames are small (basic=13, engineering=35). */
if (flen < 1 || flen > 64) return false;
int tail = i + 6 + (int)flen;
if (tail + 3 >= len) return false;
return buf[tail] == 0xF8 && buf[tail+1] == 0xF7
&& buf[tail+2] == 0xF6 && buf[tail+3] == 0xF5;
}
#endif /* MMWAVE_DETECT_H */
+7 -4
@@ -26,6 +26,7 @@
*/
#include "mmwave_sensor.h"
#include "mmwave_detect.h"
#include <string.h>
#include <math.h>
@@ -401,10 +402,12 @@ static mmwave_type_t probe_at_baud(uint32_t baud)
}
}
}
/* LD2410: 4-byte header 0xF4F3F2F1 (already specific enough). */
if (i + 3 < len && buf[i] == 0xF4 && buf[i+1] == 0xF3
&& buf[i+2] == 0xF2 && buf[i+3] == 0xF1
&& baud == MMWAVE_LD2410_BAUD) {
/* LD2410: require a *full validated* report frame, not just the
* 4-byte head. A floating UART1 at 256000 baud can emit the head
* pattern 0xF4F3F2F1 from line noise (#1135 bug #2). The shared
* predicate (host-unit-tested in mmwave_detect.h) demands a sane
* intra-frame length AND the matching tail 0xF8F7F6F5. */
if (baud == MMWAVE_LD2410_BAUD && mmwave_ld2410_valid_at(buf, i, len)) {
ld2410_header_seen++;
}
}
+3 -1
@@ -188,7 +188,9 @@ size_t rv_mesh_encode_calibration_start(uint8_t sender_role,
esp_err_t rv_mesh_send(const uint8_t *frame, size_t len)
{
if (frame == NULL || len == 0) return ESP_ERR_INVALID_ARG;
int sent = stream_sender_send(frame, len);
/* Mesh control packets (HEALTH, anomaly) are low-rate and tiny — send them
* on the priority path so the CSI ENOMEM backoff can't starve them (#1183). */
int sent = stream_sender_send_priority(frame, len);
if (sent < 0) {
ESP_LOGW(TAG, "rv_mesh_send: stream_sender failed (len=%u)",
(unsigned)len);
+48 -5
@@ -26,9 +26,16 @@ static struct sockaddr_in s_dest_addr;
* rapid-fire CSI callbacks can exhaust the pbuf pool and crash the device.
*/
static int64_t s_backoff_until_us = 0; /* esp_timer timestamp to resume */
#define ENOMEM_COOLDOWN_MS 100 /* suppress sends for 100 ms */
#define ENOMEM_COOLDOWN_MS 100 /* base backoff; doubles per streak */
#define ENOMEM_COOLDOWN_MAX_MS 2000 /* cap on the exponential backoff */
#define ENOMEM_LOG_INTERVAL 50 /* log every Nth suppressed send */
static uint32_t s_enomem_suppressed = 0;
/* Consecutive ENOMEM episodes without an intervening successful send. A fixed
* 100 ms backoff is too short to drain sustained lwIP/WiFi buffer pressure
* (#1135 bug #1: tier-2 + concurrent TX keeps the node stuck), so the backoff
* grows 100→200→400→…→2000 ms per streak and resets on the first send that
* succeeds. */
static uint32_t s_enomem_streak = 0;
static int sender_init_internal(const char *ip, uint16_t port)
{
@@ -93,16 +100,52 @@ int stream_sender_send(const uint8_t *data, size_t len)
(struct sockaddr *)&s_dest_addr, sizeof(s_dest_addr));
if (sent < 0) {
if (errno == ENOMEM) {
/* Start backoff to let lwIP reclaim buffers */
s_backoff_until_us = esp_timer_get_time() +
(int64_t)ENOMEM_COOLDOWN_MS * 1000;
ESP_LOGW(TAG, "sendto ENOMEM — backing off for %d ms", ENOMEM_COOLDOWN_MS);
/* Exponential backoff: double the cooldown each consecutive ENOMEM
* (capped) so sustained buffer pressure actually drains instead of
* the node re-failing every 100 ms forever (#1135 bug #1). */
uint32_t shift = s_enomem_streak < 5 ? s_enomem_streak : 5;
uint32_t cooldown = ENOMEM_COOLDOWN_MS << shift;
if (cooldown > ENOMEM_COOLDOWN_MAX_MS) cooldown = ENOMEM_COOLDOWN_MAX_MS;
s_enomem_streak++;
s_backoff_until_us = esp_timer_get_time() + (int64_t)cooldown * 1000;
ESP_LOGW(TAG, "sendto ENOMEM — backing off for %lu ms (streak %lu)",
(unsigned long)cooldown, (unsigned long)s_enomem_streak);
} else {
ESP_LOGW(TAG, "sendto failed: errno %d", errno);
}
return -1;
}
/* A send got through — buffer pressure cleared; reset the backoff streak. */
s_enomem_streak = 0;
return sent;
}
int stream_sender_send_priority(const uint8_t *data, size_t len)
{
if (s_sock < 0) {
return -1;
}
/* Priority path (#1183): low-rate control packets (feature_state, HEALTH,
* mesh sync) bypass the global ENOMEM backoff gate so the high-rate CSI
* stream cannot starve them. These are ≤48 B at ≤1 Hz — negligible pbuf
* pressure, so they won't re-trigger the crash cascade that the backoff
* (driven by the 50 Hz CSI flood) exists to prevent.
*
* Crucially, an ENOMEM here is reported quietly and does NOT extend the
* global streak/backoff: a tiny control packet failing is a symptom of
* the bulk-stream pressure, not a cause, so it must not feed the cooldown
* that suppresses the next CSI frame. Likewise a success does not reset
* the streak — the bulk path owns that signal. */
int sent = sendto(s_sock, data, len, 0,
(struct sockaddr *)&s_dest_addr, sizeof(s_dest_addr));
if (sent < 0) {
if (errno != ENOMEM) {
ESP_LOGW(TAG, "priority sendto failed: errno %d", errno);
}
return -1;
}
return sent;
}
@@ -36,6 +36,20 @@ int stream_sender_init_with(const char *ip, uint16_t port);
*/
int stream_sender_send(const uint8_t *data, size_t len);
/**
* Send a low-rate control packet, bypassing the ENOMEM backoff gate (#1183).
*
* Intended for ≤48 B, ≤1 Hz control traffic (feature_state, HEALTH, mesh
* sync) that must not be starved by the global backoff the high-rate CSI
* stream triggers. An ENOMEM on this path is reported quietly and does NOT
* extend or reset the global backoff streak.
*
* @param data Frame data buffer.
* @param len Length of data to send.
* @return Number of bytes sent, or -1 on error.
*/
int stream_sender_send_priority(const uint8_t *data, size_t len);
/**
* Close the UDP sender socket.
*/
+15 -4
@@ -44,9 +44,9 @@ FUZZ_DURATION ?= 30
FUZZ_JOBS ?= 1
.PHONY: all clean run_serialize run_edge run_nvs run_all test_adr110 run_adr110 \
test_vitals run_vitals host_tests
test_vitals run_vitals test_mmwave_detect run_mmwave_detect host_tests
all: fuzz_serialize fuzz_edge fuzz_nvs test_adr110 test_vitals
all: fuzz_serialize fuzz_edge fuzz_nvs test_adr110 test_vitals test_mmwave_detect
# --- ADR-110 encoding unit tests ---
# Host-side, no libFuzzer needed — plain C99 deterministic table tests
@@ -69,8 +69,19 @@ test_vitals: test_vitals_count_presence.c $(MAIN_DIR)/edge_processing.h
run_vitals: test_vitals
./test_vitals
host_tests: run_adr110 run_vitals
@echo "Host tests passed (ADR-110 + vitals #998/#996)"
# --- mmWave LD2410 detection predicate (#1135 bug #2) ---
# Host-side, no libFuzzer. Proves a floating-UART head pattern (0xF4F3F2F1)
# without a valid frame length+tail is REJECTED, so a phantom LD2410 is never
# detected on a node with no sensor wired. Tests the real predicate the
# firmware uses (../main/mmwave_detect.h) — test and firmware can't disagree.
test_mmwave_detect: test_mmwave_detect.c $(MAIN_DIR)/mmwave_detect.h
cc -std=c99 -Wall -Wextra -I$(MAIN_DIR) -o $@ $<
run_mmwave_detect: test_mmwave_detect
./test_mmwave_detect
host_tests: run_adr110 run_vitals run_mmwave_detect
@echo "Host tests passed (ADR-110 + vitals #998/#996 + mmwave detect #1135)"
# --- Serialize fuzzer ---
# Tests csi_serialize_frame() with random wifi_csi_info_t inputs.
@@ -0,0 +1,80 @@
/**
* @file test_mmwave_detect.c
* @brief Host-side unit tests for the LD2410 frame-validation predicate (#1135).
*
* Proves the phantom-detection fix: a floating UART can emit the 4-byte head
* 0xF4F3F2F1, but the predicate rejects it unless a sane length + matching tail
* 0xF8F7F6F5 are also present. Tests the REAL predicate from mmwave_detect.h
* (the same code the firmware's probe_at_baud calls).
*
* cc -std=c99 -Wall -I../main -o test_mmwave_detect test_mmwave_detect.c && ./test_mmwave_detect
*
* Exits 0 on all-pass; prints the failing case otherwise.
*/
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include "mmwave_detect.h"
static int failures = 0;
#define CHECK(cond, msg) do { \
if (!(cond)) { printf("FAIL: %s\n", msg); failures++; } \
else { printf("ok: %s\n", msg); } \
} while (0)
/* Build a valid LD2410 report frame: F4F3F2F1 | len(LE) | data[len] | F8F7F6F5 */
static int make_frame(uint8_t *out, uint16_t dlen)
{
int n = 0;
out[n++] = 0xF4; out[n++] = 0xF3; out[n++] = 0xF2; out[n++] = 0xF1;
out[n++] = (uint8_t)(dlen & 0xFF); out[n++] = (uint8_t)(dlen >> 8);
for (uint16_t k = 0; k < dlen; k++) out[n++] = (uint8_t)(0xAA ^ k);
out[n++] = 0xF8; out[n++] = 0xF7; out[n++] = 0xF6; out[n++] = 0xF5;
return n;
}
int main(void)
{
uint8_t buf[256];
/* 1. A real basic-report frame (data len 13) validates. */
int n = make_frame(buf, 13);
CHECK(mmwave_ld2410_valid_at(buf, 0, n), "valid basic frame (len=13) accepted");
/* 2. A real engineering-report frame (data len 35) validates. */
n = make_frame(buf, 35);
CHECK(mmwave_ld2410_valid_at(buf, 0, n), "valid engineering frame (len=35) accepted");
/* 3. Head magic present but NO valid tail — the #1135 phantom case. */
memset(buf, 0x00, sizeof(buf));
buf[0]=0xF4; buf[1]=0xF3; buf[2]=0xF2; buf[3]=0xF1; buf[4]=13; buf[5]=0;
/* data present but tail is zeros, not F8F7F6F5 */
CHECK(!mmwave_ld2410_valid_at(buf, 0, 64), "head magic without valid tail REJECTED (#1135)");
/* 4. Head magic with insane length is rejected. */
memset(buf, 0xFF, sizeof(buf));
buf[0]=0xF4; buf[1]=0xF3; buf[2]=0xF2; buf[3]=0xF1; buf[4]=0xFF; buf[5]=0xFF; /* len=65535 */
CHECK(!mmwave_ld2410_valid_at(buf, 0, 200), "head magic with oversized length REJECTED");
/* 5. Pure noise (no head) is rejected. */
for (int k = 0; k < 64; k++) buf[k] = (uint8_t)(0x5A + k);
CHECK(!mmwave_ld2410_valid_at(buf, 0, 64), "non-header noise REJECTED");
/* 6. Truncated frame (tail would run past the buffer) is rejected. */
n = make_frame(buf, 13);
CHECK(!mmwave_ld2410_valid_at(buf, 0, n - 2), "truncated frame (tail past buffer) REJECTED");
/* 7. Valid frame at a non-zero offset still validates. */
memset(buf, 0x00, sizeof(buf));
n = make_frame(buf + 7, 13);
CHECK(mmwave_ld2410_valid_at(buf, 7, 7 + n), "valid frame at offset 7 accepted");
/* 8. Repeated head bytes without a frame (worst-case noise) rejected. */
for (int k = 0; k + 3 < 64; k += 4) {
buf[k]=0xF4; buf[k+1]=0xF3; buf[k+2]=0xF2; buf[k+3]=0xF1;
}
CHECK(!mmwave_ld2410_valid_at(buf, 0, 64), "repeated bare head bytes REJECTED");
printf("\n%s (%d failures)\n", failures ? "FAILED" : "ALL PASS", failures);
return failures ? 1 : 0;
}
+94
@@ -0,0 +1,94 @@
#!/usr/bin/env bash
#
# firmware-release-guard.sh — guard against shipping firmware built from a
# stale generated `sdkconfig` (the v0.8.3-esp32 release bug).
#
# Symptom it catches: an incremental build reuses a leftover `sdkconfig`
# instead of `sdkconfig.defaults`, so an "8MB" build silently links the 4MB
# dual-OTA partition layout (no spiffs, ota_1 @ 0x1F0000) and the released
# `partition-table.bin` does not match the flash-size variant it claims to be.
#
# What it does: for the named flash-size variant, regenerate the EXPECTED
# partition table from the partition CSV that variant must use, and byte-compare
# it against the freshly built `partition-table.bin`. Also cross-checks the
# flash size recorded in the build's `flasher_args.json`. Exits non-zero on any
# mismatch so a release pipeline fails closed.
#
# Usage:
# scripts/firmware-release-guard.sh <8mb|4mb> <build-dir>
#
# Example:
# scripts/firmware-release-guard.sh 8mb firmware/esp32-csi-node/build
#
set -euo pipefail
VARIANT="${1:-}"
BUILD_DIR="${2:-}"
if [[ -z "$VARIANT" || -z "$BUILD_DIR" ]]; then
echo "usage: $0 <8mb|4mb> <build-dir>" >&2
exit 2
fi
# Firmware project root (this script lives in <repo>/scripts).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
FW_DIR="$SCRIPT_DIR/../firmware/esp32-csi-node"
case "$VARIANT" in
8mb) EXPECT_CSV="partitions_display.csv"; EXPECT_FLASH="8MB" ;;
4mb) EXPECT_CSV="partitions_4mb.csv"; EXPECT_FLASH="4MB" ;;
*) echo "ERROR: unknown variant '$VARIANT' (want 8mb|4mb)" >&2; exit 2 ;;
esac
BUILT_PT="$BUILD_DIR/partition_table/partition-table.bin"
CSV_PATH="$FW_DIR/$EXPECT_CSV"
[[ -f "$BUILT_PT" ]] || { echo "ERROR: built partition table not found: $BUILT_PT" >&2; exit 1; }
[[ -f "$CSV_PATH" ]] || { echo "ERROR: expected CSV not found: $CSV_PATH" >&2; exit 1; }
# Locate the ESP-IDF partition table generator.
GEN="${IDF_PATH:-}/components/partition_table/gen_esp32part.py"
if [[ ! -f "$GEN" ]]; then
GEN="C:/Users/ruv/esp/v5.4/esp-idf/components/partition_table/gen_esp32part.py"
fi
[[ -f "$GEN" ]] || { echo "ERROR: gen_esp32part.py not found (set IDF_PATH)" >&2; exit 1; }
PY="${PYTHON:-python}"
command -v "$PY" >/dev/null 2>&1 || PY="C:/Espressif/tools/python/v5.4/venv/Scripts/python.exe"
TMP="$(mktemp -d)"
trap 'rm -rf "$TMP"' EXIT
EXPECT_PT="$TMP/expected-partition-table.bin"
# Regenerate the expected table from the CSV this variant must use.
"$PY" "$GEN" --quiet "$CSV_PATH" "$EXPECT_PT"
fail=0
if ! cmp -s "$EXPECT_PT" "$BUILT_PT"; then
echo "FAIL: built partition table does not match $EXPECT_CSV for the $VARIANT variant." >&2
echo " The build likely reused a stale sdkconfig. Decoded built table:" >&2
"$PY" "$GEN" "$BUILT_PT" 2>/dev/null | grep -vE '^#|^Parsing|^Verifying' | sed 's/^/ /' >&2
fail=1
fi
# Cross-check the flash size the build actually targeted.
FA="$BUILD_DIR/flasher_args.json"
if [[ -f "$FA" ]]; then
GOT_FLASH="$("$PY" - "$FA" <<'PYEOF'
import json,sys
with open(sys.argv[1]) as f: d=json.load(f)
print(d.get("flash_settings",{}).get("flash_size",""))
PYEOF
)"
if [[ "$GOT_FLASH" != "$EXPECT_FLASH" ]]; then
echo "FAIL: flasher_args.json flash_size='$GOT_FLASH', expected '$EXPECT_FLASH'." >&2
fail=1
fi
fi
if [[ "$fail" -ne 0 ]]; then
exit 1
fi
echo "OK: $VARIANT firmware build matches $EXPECT_CSV (flash_size=$EXPECT_FLASH)."
@@ -402,17 +402,36 @@ mod tests {
assert!(!bridge.suppress_raw_outputs());
}
/// Error wiring (review finding 1a): two live nodes with mismatched
/// subcarrier counts make fusion return a `DimensionMismatch` →
/// `EngineError` — previously dropped by `if let Some(Ok(..))` at the
/// Error wiring (review finding 1a): a live cycle that fails fusion yields
/// an `EngineError` — previously dropped by `if let Some(Ok(..))` at the
/// call sites. The counter must increment and the last good trust state
/// must survive a later failure.
///
/// Originally this forced the failure with a 56-vs-30 subcarrier mismatch
/// (`DimensionMismatch`). Since #1170 the live bridge canonicalizes every
/// node onto the 56-tone grid, so heterogeneous counts now fuse cleanly —
/// a frame-timestamp spread wider than the fuser's 60 ms guard interval is
/// the remaining deterministic way to provoke a fusion error here.
#[test]
fn observe_cycle_counts_engine_errors() {
// Both nodes are 56-subcarrier (canonicalization-clean), but their
// frame timestamps are 500 ms apart — far beyond the 60 ms guard —
// so the fuser rejects the cycle with TimestampMismatch. Future
// offsets keep both instants safely after the bridge's lazy EPOCH.
fn mismatched_states() -> HashMap<u8, NodeState> {
let now = Instant::now();
let mut a = node_state_with_history(1.0, 56);
a.last_frame_time = Some(now + std::time::Duration::from_millis(600));
let mut b = node_state_with_history(1.05, 56);
b.last_frame_time = Some(now + std::time::Duration::from_millis(100));
let mut m = HashMap::new();
m.insert(0u8, a);
m.insert(1u8, b);
m
}
let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
let mut mismatched = HashMap::new();
mismatched.insert(0u8, node_state_with_history(1.0, 56));
mismatched.insert(1u8, node_state_with_history(1.05, 30)); // 30 ≠ 56 subcarriers
let mismatched = mismatched_states();
assert!(bridge.observe_cycle(&mismatched, 1_000).is_none());
assert_eq!(bridge.engine_error_count(), 1);
@@ -518,17 +518,31 @@ const NOVELTY_HISTORY_CAPACITY: usize = 64;
/// subcarrier ordering / normalisation so banks reject stale data.
const NOVELTY_SKETCH_VERSION: u16 = 1;
/// Lower plausibility floor (seconds) for a CSI inter-frame delta.
///
/// The firmware caps CSI sends at `CSI_MIN_SEND_INTERVAL_US = 20 ms`
/// (`csi_collector.c`), so a single node cannot physically produce frames
/// faster than 50 fps. UDP/OS buffering, however, delivers frames in tight
/// bursts whose intra-burst arrival deltas are tens of microseconds apart —
/// a 36 µs delta yields `1/dt ≈ 27 kHz`, which the old `< 1 s` guard let
/// straight into the EMA and inflated `csi_fps_ema` by 1–3 orders of
/// magnitude (issue #1180). We reject any delta implying more than 200 fps
/// (4× the physical ceiling, leaving slack for benign arrival jitter); such
/// deltas are burst artifacts, not distinct production intervals.
pub(crate) const MIN_PLAUSIBLE_CSI_DT_SEC: f64 = 0.005;
/// ADR-110 iter 18 — EMA update for per-node CSI fps tracking.
///
/// Returns the new EMA value, or `None` if the delta is implausible
/// (≤ 0, or > 1 second — likely a connection gap, not a real frame
/// rate sample). α = 1/8 fixed shift, ~8-sample effective window,
/// matching the firmware-side ESP-NOW offset smoother in §A0.10.
/// (below [`MIN_PLAUSIBLE_CSI_DT_SEC`] — a sub-ms burst artifact, see
/// issue #1180 — or `> 1 second`, likely a connection gap rather than a
/// real frame-rate sample). α = 1/8 fixed shift, ~8-sample effective
/// window, matching the firmware-side ESP-NOW offset smoother in §A0.10.
///
/// Free function for testability — every transformation that doesn't
/// touch the rest of `NodeState` lives outside the `impl` block.
pub(crate) fn update_csi_fps_ema(prev_fps: f64, dt_sec: f64) -> Option<f64> {
if !(dt_sec > 0.0 && dt_sec < 1.0) {
if !(dt_sec >= MIN_PLAUSIBLE_CSI_DT_SEC && dt_sec < 1.0) {
return None;
}
let instantaneous = 1.0 / dt_sec;
@@ -569,6 +583,35 @@ mod fps_ema_tests {
fn long_gap_rejected_as_implausible() {
assert!(update_csi_fps_ema(20.0, 2.0).is_none());
}
#[test]
fn subms_burst_delta_rejected() {
// Issue #1180: a 36 µs intra-burst delta implies ~27 kHz and must
// not enter the EMA. Anything below the 5 ms floor is rejected.
assert!(update_csi_fps_ema(40.0, 0.000_036).is_none());
assert!(update_csi_fps_ema(40.0, 0.001).is_none());
// Just above the floor is accepted.
assert!(update_csi_fps_ema(40.0, 0.005).is_some());
}
#[test]
fn burst_interleaved_with_nominal_stays_in_band() {
// A true ~40 fps node whose frames arrive in sub-ms bursts: feeding
// only the plausible (nominal-cadence) deltas keeps the EMA near the
// ground truth instead of blowing up. Burst deltas are rejected by
// the caller (see NodeState::observe_csi_frame_arrival), so the EMA
// only ever sees the ~25 ms inter-group gaps.
let mut fps = 40.0;
for _ in 0..40 {
// nominal 25 ms gap (40 fps); intervening sub-ms bursts skipped
fps = update_csi_fps_ema(fps, 0.025).unwrap();
assert!(update_csi_fps_ema(fps, 0.000_040).is_none());
}
assert!(
(fps - 40.0).abs() < 1.0,
"EMA should stay within ~1 Hz of the 40 fps ground truth, got {fps}"
);
}
}
impl NodeState {
@@ -653,6 +696,15 @@ impl NodeState {
pub(crate) fn observe_csi_frame_arrival(&mut self, now: std::time::Instant) {
if let Some(prev) = self.last_frame_time {
let dt = now.duration_since(prev).as_secs_f64();
// Burst arrivals (sub-floor dt, issue #1180): do NOT re-anchor on
// them. Keeping the previous anchor means the next genuine
// inter-frame gap measures the true cadence across the whole
// burst instead of intra-burst jitter, so a 40 fps node whose
// frames arrive in 36 µs bursts every 25 ms still reads ~40 fps,
// not 27 kHz.
if dt < MIN_PLAUSIBLE_CSI_DT_SEC {
return;
}
if let Some(new_ema) = update_csi_fps_ema(self.csi_fps_ema, dt) {
self.csi_fps_ema = new_ema;
self.csi_fps_samples = self.csi_fps_samples.saturating_add(1);
@@ -8037,6 +8089,36 @@ mod sync_snapshot_helper_tests {
assert_eq!(snap.csi_fps_samples, 42);
}
#[test]
fn observe_csi_frame_arrival_ignores_subms_bursts() {
// Issue #1180 regression: a ~40 fps node whose frames are delivered
// in tight UDP bursts (sub-ms intra-burst deltas) must still report
// ~40 fps, not tens of kHz. Synthesize the arrival stream by adding
// Durations to a base Instant.
use std::time::Duration;
let base = std::time::Instant::now();
let mut ns = NodeState::new();
ns.csi_fps_ema = 40.0; // pretend already warmed up
ns.csi_fps_samples = 10;
// 30 nominal 25 ms groups, each delivered as a 3-frame sub-ms burst.
for g in 0..30u64 {
let group_t = base + Duration::from_millis(25 * g);
ns.observe_csi_frame_arrival(group_t);
// burst: two extra arrivals 40 µs and 80 µs later — must be
// ignored for rate purposes (anchor must not advance to them).
ns.observe_csi_frame_arrival(group_t + Duration::from_micros(40));
ns.observe_csi_frame_arrival(group_t + Duration::from_micros(80));
}
assert!(
(ns.csi_fps_ema - 40.0).abs() < 2.0,
"csi_fps_ema must stay near the 40 fps ground truth despite \
sub-ms bursts, got {}",
ns.csi_fps_ema
);
}
#[test]
fn apply_sync_packet_populates_a_fresh_node() {
// Mirrors what udp_receiver_task does on the very first sync
@@ -10,7 +10,7 @@ use std::collections::HashMap;
use std::sync::LazyLock;
use std::time::{Duration, Instant};
use wifi_densepose_signal::hardware_norm::{CanonicalCsiFrame, HardwareType};
use wifi_densepose_signal::hardware_norm::{CanonicalCsiFrame, HardwareNormalizer, HardwareType};
use wifi_densepose_signal::ruvsense::multiband::MultiBandCsiFrame;
use wifi_densepose_signal::ruvsense::multistatic::{FusedSensingFrame, MultistaticFuser};
@@ -26,6 +26,11 @@ const DEFAULT_FREQ_MHZ: u32 = 2437; // Channel 6
/// are relative to this instant, avoiding wall-clock/monotonic mixing issues.
static EPOCH: LazyLock<Instant> = LazyLock::new(Instant::now);
/// Shared length-only canonicalizer (issue #1170). The default 56-tone grid
/// matches what `MultistaticFuser` (ADR-154) expects. Stateless and immutable,
/// so a single process-wide instance is safe to share across nodes.
static NORMALIZER: LazyLock<HardwareNormalizer> = LazyLock::new(HardwareNormalizer::new);
/// Convert a single `NodeState` into a `MultiBandCsiFrame` suitable for
/// multistatic fusion.
///
@@ -38,7 +43,14 @@ pub fn node_frame_from_state(node_id: u8, ns: &NodeState) -> Option<MultiBandCsi
return None;
}
let amplitude: Vec<f32> = latest.iter().map(|&v| v as f32).collect();
// Issue #1170: resample the raw amplitude onto the canonical 56-tone grid
// BEFORE fusion. ESP32 nodes in mixed HT20/HT40 capture modes report
// different subcarrier counts (64 / 128 / 192); feeding those raw into
// `MultistaticFuser::fuse` tripped `DimensionMismatch` on every cycle and
// silently disabled real multistatic fusion. Length-only canonicalization
// (no z-score) keeps the amplitude scale the person-score relies on.
let canonical_amp = NORMALIZER.resample_to_canonical(latest);
let amplitude: Vec<f32> = canonical_amp.iter().map(|&v| v as f32).collect();
let n_sub = amplitude.len();
let phase = vec![0.0_f32; n_sub];
@@ -201,15 +213,58 @@ mod tests {
assert_eq!(frame.channel_frames.len(), 1);
let ch = &frame.channel_frames[0];
assert_eq!(ch.amplitude.len(), 3);
assert!((ch.amplitude[0] - 10.0_f32).abs() < f32::EPSILON);
assert!((ch.amplitude[1] - 20.0_f32).abs() < f32::EPSILON);
assert!((ch.amplitude[2] - 30.5_f32).abs() < f32::EPSILON);
// Issue #1170: amplitude is now resampled onto the canonical 56-tone
// grid regardless of the raw count.
assert_eq!(ch.amplitude.len(), 56);
// resample_cubic preserves the endpoints (no z-scoring), so the scale
// the person-score relies on is intact.
assert!((ch.amplitude[0] - 10.0_f32).abs() < 1e-3);
assert!((ch.amplitude[55] - 30.5_f32).abs() < 1e-3);
// Phase should be all zeros
assert!(ch.phase.iter().all(|&p| p == 0.0));
assert_eq!(ch.hardware_type, HardwareType::Esp32S3);
}
#[test]
fn heterogeneous_node_counts_canonicalize_and_fuse() {
// Issue #1170 regression: a mixed mesh with HT20 (64-bin) and HT40
// (192-bin) nodes must canonicalize to a uniform 56 tones and fuse,
// instead of tripping DimensionMismatch on every cycle.
let mut states: HashMap<u8, NodeState> = HashMap::new();
let mut h64 = VecDeque::new();
h64.push_back((0..64).map(|i| 1.0 + 0.1 * i as f64).collect::<Vec<f64>>());
states.insert(1, make_node_state(h64, Some(Instant::now()), 1));
let mut h192 = VecDeque::new();
h192.push_back((0..192).map(|i| 2.0 + 0.05 * i as f64).collect::<Vec<f64>>());
states.insert(3, make_node_state(h192, Some(Instant::now()), 1));
let frames = node_frames_from_states(&states);
assert_eq!(frames.len(), 2, "both nodes should produce frames");
for f in &frames {
assert_eq!(
f.channel_frames[0].amplitude.len(),
56,
"every node must present the canonical 56-tone dimension"
);
}
// The fuser must now accept the cycle (no DimensionMismatch).
let fuser = MultistaticFuser::new();
let result = fuser.fuse(&frames);
assert!(
result.is_ok(),
"heterogeneous mesh should fuse after canonicalization, got {result:?}"
);
// And the higher-level fallback path returns the fused frame, not the
// sum/dedup fallback.
let (fused, fallback) = fuse_or_fallback(&fuser, &states, 3.0);
assert!(fused.is_some(), "fusion should succeed");
assert!(fallback.is_none(), "no fallback when fusion succeeds");
}
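For readers without the crate at hand, length-only canonicalization can be sketched as follows. This is an assumption-laden stand-in: the real path calls `resample_cubic`, whereas `resample_linear` below is a hypothetical helper that uses linear interpolation to keep the example short. It shows only the property the test relies on: any input length maps to the target length, and the amplitude scale and endpoints are preserved.

```rust
/// Linearly resample `raw` onto `target_len` evenly spaced points.
/// Hypothetical stand-in for the crate's `resample_cubic`.
fn resample_linear(raw: &[f64], target_len: usize) -> Vec<f64> {
    if raw.is_empty() || target_len == 0 {
        return Vec::new();
    }
    if raw.len() == 1 || target_len == 1 {
        return vec![raw[0]; target_len];
    }
    // Map output index i in [0, target_len-1] to a position in [0, raw.len()-1].
    let scale = (raw.len() - 1) as f64 / (target_len - 1) as f64;
    (0..target_len)
        .map(|i| {
            let pos = i as f64 * scale;
            // Clamp indices so rounding at the last sample cannot overrun.
            let lo = (pos.floor() as usize).min(raw.len() - 1);
            let hi = (lo + 1).min(raw.len() - 1);
            let frac = pos - lo as f64;
            raw[lo] * (1.0 - frac) + raw[hi] * frac
        })
        .collect()
}
```

Resampling a 64-bin and a 192-bin vector with `target_len = 56` yields two vectors of equal length, which is the precondition the fuser checks.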
#[test]
fn test_stale_node_excluded() {
let mut states: HashMap<u8, NodeState> = HashMap::new();
@@ -167,6 +167,22 @@ impl HardwareNormalizer {
hardware_type: hw,
})
}
/// Resample a raw 1-D CSI vector onto the canonical subcarrier grid
/// **without** z-score normalization (length-only canonicalization).
///
/// Used by the live multistatic bridge (issue #1170): heterogeneous
/// ESP32 capture modes report different subcarrier counts (HT20 ≈ 64,
/// HT40 ≈ 128/192), and [`MultistaticFuser`] requires every node frame
/// to share one dimension. Full [`Self::normalize`] would z-score the
/// amplitude (mean → 0), which saturates the downstream person-score
/// (a squared coefficient of variation `variance / mean²`); resampling
/// alone makes frames fusable while preserving amplitude scale.
///
/// [`MultistaticFuser`]: crate::ruvsense::multistatic::MultistaticFuser
pub fn resample_to_canonical(&self, raw: &[f64]) -> Vec<f64> {
resample_cubic(raw, self.canonical_subcarriers)
}
}
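The doc comment's argument about z-scoring can be checked numerically. The sketch below assumes only what the comment states, namely that the person-score behaves like a squared coefficient of variation `variance / mean²`; the helper names (`cv_squared`, `zscore`) are hypothetical and this is not the crate's scoring code.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(x: &[f64]) -> f64 {
    x.iter().sum::<f64>() / x.len() as f64
}

/// Population variance of a non-empty slice.
fn variance(x: &[f64]) -> f64 {
    let m = mean(x);
    x.iter().map(|v| (v - m).powi(2)).sum::<f64>() / x.len() as f64
}

/// Squared coefficient of variation: variance / mean².
fn cv_squared(x: &[f64]) -> f64 {
    let m = mean(x);
    variance(x) / (m * m)
}

/// Plain z-score: subtract the mean, divide by the standard deviation.
fn zscore(x: &[f64]) -> Vec<f64> {
    let m = mean(x);
    let sd = variance(x).sqrt();
    x.iter().map(|v| (v - m) / sd).collect()
}
```

For an amplitude ramp such as `50.0 + 0.1 * i` over 192 bins, `cv_squared` on the raw values is small (below 0.01), while on the z-scored values the denominator collapses toward zero and the ratio becomes astronomically large or infinite. That is the saturation the length-only path avoids.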
impl Default for HardwareNormalizer {
@@ -344,6 +360,25 @@ mod tests {
}
}
#[test]
fn resample_to_canonical_is_length_only_no_zscore() {
// Issue #1170: resample_to_canonical must change length to 56 but
// NOT z-score (mean must be preserved, not driven to ~0). A raw
// amplitude vector with a large positive mean keeps that mean.
let norm = HardwareNormalizer::new();
let raw: Vec<f64> = (0..192).map(|i| 50.0 + 0.1 * i as f64).collect();
let out = norm.resample_to_canonical(&raw);
assert_eq!(out.len(), 56, "must resample onto the 56-tone grid");
let mean = out.iter().sum::<f64>() / out.len() as f64;
assert!(
mean > 40.0,
"resample-only must preserve amplitude scale (mean ~60), got {mean}"
);
// Endpoints preserved.
assert!((out[0] - raw[0]).abs() < 1e-6);
assert!((out[55] - raw[191]).abs() < 0.5);
}
#[test]
fn zscore_produces_zero_mean_unit_std() {
let data: Vec<f64> = (0..100)