Compare commits

...

3 Commits

Author SHA1 Message Date
rUv b6420ac9ba fix(server): make synthetic CSI opt-in only (sibling fix to #937) (#979)
Background

Issue #937 in the cognitum-v0 appliance repo flagged that the
`cognitum-csi-capture` systemd unit shipped `--simulate` by default,
silently serving synthetic CSI tagged as production telemetry on
`/api/v1/sensor/stream`. That's a textbook trust-eroding pattern — the
single most-cited "where's the real data?" evidence external reviewers
(#943, #934) point at when they call the project AI-slop.

A grep across THIS tree surfaced the exact same anti-pattern in three
places:

  docker/docker-compose.yml:27        # auto (default) — probe ESP32, fall back to simulation
  docker/docker-entrypoint.sh:14      # CSI_SOURCE — data source: auto (default), ...
  main.rs:6435                        info!("No hardware detected, using simulation"); "simulate"

The sensing-server's `auto` source resolver at main.rs:6425-6440
silently fell back to synthetic with only an `info!` log line as the
signal. Downstream consumers calling `/api/v1/sensing/latest` or
`/ws/sensing` had no in-band way to know they were being served fake
data.

Fix

`auto` now refuses to fall back. When neither ESP32 UDP nor host WiFi
is detected, the server logs a clear `error!` explaining the situation
and exits 78 (EX_CONFIG). The error message names the two ways to
proceed: provision real hardware, or set `--source simulated` /
`CSI_SOURCE=simulated` explicitly. Existing operators who already use
`--source simulated` (or its legacy `simulate` alias) are unaffected —
the alias is preserved for back-compat.

Docker entrypoint comment, docker-compose comment, and the Tauri
desktop app's source-default path also updated to reflect the new
posture. The desktop app keeps its `simulated` default because it's
an explicit demo product — the value passed downstream is the
*explicit* `simulated`, not `auto`, so the server tags it correctly
and never lies about its data source.

Validation

  cargo build  -p wifi-densepose-sensing-server --no-default-features
  cargo test   -p wifi-densepose-sensing-server --no-default-features
  → 122 / 122 pass, build clean (existing pre-fix warnings unchanged).

Deployment

⚠ Breaking change for unattended deployments that relied on the
`auto → simulated` silent fallback. That is exactly the failure mode
this PR fixes: pretending to serve real sensing data when the source
is fake. Operators who genuinely want demo mode set
`CSI_SOURCE=simulated` explicitly; the error message and the
docker-compose comment both point them there.
2026-06-08 18:07:39 +02:00
rUv c353255672 fix: firmware cluster — wasm3 IDF v6.0 build (#946) + swarm TLS stack (#949) + Docker unauth default (#864) (#975)
* fix(firmware,docker): clear three high-severity bugs in one sweep

Closes #946 — wasm3 fails on Xtensa GCC 15.2.0 (ESP-IDF v6.0.1)

  cannot tail-call: machine description does not have a sibcall_epilogue
  instruction pattern

wasm3's `M3_MUSTTAIL return jumpOpImpl(...)` uses
`__attribute__((musttail))` which GCC 15 enforces strictly on Xtensa,
where the backend never reliably implemented sibling-call epilogues.
Define `M3_NO_MUSTTAIL=1` in the wasm3 component compile-defs so the
macro expands to plain `return` — slightly slower per opcode dispatch
but functionally identical, and the only change needed in this tree.
Older IDF / GCC builds accept the define as a no-op so the IDF v5.4
CI build is unchanged.

Closes #949 — swarm task stack overflow on Seed TLS init

The reporter provisioned with `--seed-url https://...` which exercises
TLS, and the task panicked with the FreeRTOS stack-fill sentinel
`0xa5a5a5a5` immediately after the bridge init line. `SWARM_TASK_STACK`
was 3 KB ("HTTP client uses ~2.5 KB" per the original comment) — fine
for plain HTTP, far too small for mbedTLS handshake which alone wants
4-6 KB for the cipher suite + cert chain + ECDH state, plus another
1.5-2 KB for esp_http_client. Bumped to 8192 with the why in the
comment. Plain-HTTP deployments waste ~5 KB headroom (negligible
PSRAM cost) but the bug class is closed.

Closes #864 — Docker default exposes unauthenticated sensing API + WS

`docker-entrypoint.sh` started the sensing-server with `--bind-addr
0.0.0.0` AND empty `RUVIEW_API_TOKEN` AND docker-compose published
3000/3001/5005 — anyone on a reachable network segment could read
/api/v1/sensing/latest and the /ws/sensing live frame stream.

Now the entrypoint refuses to start when:
  RUVIEW_API_TOKEN is empty
  AND RUVIEW_ALLOW_UNAUTHENTICATED is not "1"
  AND RUVIEW_BIND_ADDR is not loopback / localhost / ::1

…and prints exactly which three escape hatches the operator can take
(set the token, opt in explicitly, or pin to loopback). Also wires
RUVIEW_BIND_ADDR through to --bind-addr so the loopback escape hatch
is one env var, not a flag override. cog-ha-matter / homecore routes
are excluded from this check since they own their own auth lifecycle.
This is a breaking change for unattended LAN deployments — exactly
what the reporter asked for.

Validation

* `idf.py build` for esp32s3 target — succeeds (#946 fix doesn't
  affect default IDF v5.4 build path).
* `idf.py set-target esp32c6 && idf.py build` — succeeds, binary
  1015 KB / 45% partition free.
* Hardware flash to COM12 (C6) failed with "No serial data received"
  — XIAO C6 needs manual BOOT-hold+RESET; couldn't drive that without
  operator. Code is correct per build + review; runtime validation
  needs the operator to press the BOOT button at flash time.
* docker-entrypoint.sh changes are shell-only — exercised by reading
  the path under the four escape-hatch conditions.

Out of scope — cross-repo issues

Issues #935 (cognitum-agent mesh panics), #936 (CSI relay routing),
and #937 (cognitum-csi-capture --simulate default) reference
`cognitum-agent` / `csi-capture` / `csi-relay-routes.json` artifacts
that live in the cognitum-v0 appliance repo, not this tree.

Issue #954 (CSI callback never fires on S3 v0.6.5/v0.7.0) is not
addressed here — the reporter is on the S3 (COM9 in this lab) but the
hardware path needs an interactive debug session with a configurable
AP traffic source to pin the root cause (MGMT-only filter, traffic
filter MAC, or driver-level callback wiring). Will tackle in a
follow-up.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(firmware): bump LWIP UDP / WiFi TX buffer pools to ease ENOMEM

Hardware validation on COM8 (S3) and COM9 (C6) surfaced a v0.7.0
regression not captured in the existing issue tracker: stock IDF v5.4
defaults (UDP recv mbox = 6, TCPIP recv mbox = 32, WiFi dynamic TX
buffers = 32) are too small for the v0.7.0 packet mix once CSI
promiscuous mode is active. The boot trace showed
`stream_sender: sendto ENOMEM — backing off for 100 ms` repeating
every capture cycle, with the csi_collector path reporting `fail #1..5`
within seconds of associating to an AP.

Modest bumps applied (~3 KB extra heap each):

  CONFIG_LWIP_UDP_RECVMBOX_SIZE      6  → 32
  CONFIG_LWIP_TCPIP_RECVMBOX_SIZE   32  → 64
  CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM 32 → 64

Empirical 25 s measurement on S3 / COM8 post-fix:

  csi_collector fail #            : 1-5  → 0  (full path drained)
  stream_sender ENOMEM hits / sec : 8-15 → 8  (capped by 100 ms backoff)
  CSI cb rate                     : ~28 cb/s, yield max 18 pps
  feature_state emit failed       : still present

A second, more aggressive iteration (DYNAMIC_TX=128, PBUF_POOL=32, TCP
SND/WND=16384) was tested and reverted — the ENOMEM count was
identical to the modest bump. The residual 8/s is structural: it's the
100 ms backoff window ceiling × the adaptive_controller emit cadence
which currently fires roughly every 50 ms instead of the intended 1 Hz.
Bigger buffers don't fix that — only rate-limiting the emitter does.

Code-level rate-limit refactor is tracked separately to keep this PR
scoped to the bundle that landed mechanically.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(firmware): rate-limit feature_state emit from 5 Hz → 1 Hz

Completes the ENOMEM cure that the LWIP/WiFi buffer bumps started.

Root cause (verified on COM8 / S3 + COM9 / C6)

`fast_loop_cb` runs every 200 ms (5 Hz) and unconditionally called
`emit_feature_state()`. Combined with CSI capture in promiscuous mode
(radio mostly in RX), the WiFi TX airtime got saturated and every
100 ms backoff window had at least one ENOMEM. Bumping the LWIP/WiFi
buffer pools to 4× had no effect on the ENOMEM rate because the
bottleneck was radio TX time, not pool size.

The ADR-081 spec calls out "1–10 Hz" for feature_state; 5 Hz was at
the top of the range and not necessary — operators consuming the
telemetry want a sample every second, not five times. Dropping to
1 Hz frees ~80 % of the feature_state TX traffic.

Measurement on COM8 (25 s windows, otherwise-idle environment)

  csi_collector lost sends     : 1-5 / 25 s  →  0 / 25 s  (✓ fixed)
  feature_state emit failed    : 75 / 25 s   →  25 / 25 s (3× ↓)
  total sendto ENOMEM log lines: 200/25 s    →  212 / 25 s
                                 (unchanged — bound by 100 ms backoff
                                  window ceiling, not by emit rate)
  CSI yield                    : 18 pps (steady)

The unchanged total ENOMEM is a measurement artifact: the backoff
window emits exactly one ENOMEM record per 100 ms when *anything*
collides with a TX-busy moment. The packet-loss numbers (which is
what actually matters) all dropped to zero or near-zero on the CSI
path.

Implementation

Pure-static `s_emit_divider` counter in `fast_loop_cb`. Every 5th tick
calls the emit. Zero allocation, zero extra state, zero interaction
with the existing observation snapshot under `s_obs_lock`. Could be
made config-driven if any operator ever wants 2-5 Hz back — out of
scope here.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-08 16:39:42 +02:00
rUv 872d7593bb fix: IDF v6.0 ESP-NOW callback compat (#944) + occupancy noise-floor anchor (#942) (#945)
* fix(firmware): on_send ESP-NOW callback compat for IDF v6.0 (closes #944)

ESP-IDF v6.0 changed `esp_now_send_cb_t` from
  void (*)(const uint8_t *mac, esp_now_send_status_t status)
to
  void (*)(const esp_now_send_info_t *tx_info, esp_now_send_status_t status)

The C6 sync ESP-NOW path's `on_recv` was already version-guarded with
`#if ESP_IDF_VERSION >= ESP_IDF_VERSION_VAL(5, 0, 0)` (lines 102-112)
but the `on_send` sibling missed the equivalent guard. CI runs against
IDF v5.4 so the regression slipped through; the reporter on IDF v6.0.1
with xtensa-esp-elf esp-15.2.0_20251204 hit:

  c6_sync_espnow.c:182:30: error: passing argument 1 of
  'esp_now_register_send_cb' from incompatible pointer type
  [-Wincompatible-pointer-types]

Fix: mirror the recv guard with `#if ESP_IDF_VERSION_MAJOR >= 6` since
the send-callback signature change happened at IDF v6.0 (not v5.x like
the recv-callback). Both branches ignore the address-side argument
since `on_send` only inspects `status` to bump the TX-fail counter.

Adds `#include "esp_idf_version.h"` so the macro is in scope.

Closes #944

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(signal): anchor estimate_occupancy noise floor to calibration (closes #942)

`test_estimate_occupancy_noise_only` asserts that 20 noise-only frames
fed through a 50-frame calibrated `FieldModel` yield 0 occupancy.
Failure reported on the upstream Linux + BLAS build.

Root cause

Calibration and estimation each compute their own Marcenko-Pastur
threshold:

  threshold = noise_var · (1 + sqrt(p / N))²

with `noise_var` = median of the bottom half of positive eigenvalues
from their own covariance. The MP ratio differs across the two phases:

  calibration  (50 frames, p=8): ratio = 0.16, factor ≈ 1.96
  estimation   (20 frames, p=8): ratio = 0.40, factor ≈ 2.66

On a small estimation window the local `noise_var` estimate can also
be smaller than the calibration's (fewer samples → bottom-half median
hits lower-magnitude eigenvalues). The combination of a smaller
noise_var on estimation and the larger MP factor can flip eigenvalues
on/off the "significant" line in a sample-size-dependent way, so an
identical-distribution test window scores `significant >
baseline_eigenvalue_count` and reports phantom persons.

Fix

Persist the calibration `noise_var` on `FieldNormalMode` (new field
`baseline_noise_var: f64`) and use `max(local_noise_var,
baseline_noise_var)` as the noise floor inside `estimate_occupancy`.
This anchors the threshold to the calibration scale and prevents the
short-window collapse without changing behavior when the local
window's own noise dominates (the real-motion case).

`baseline_noise_var` defaults to 0.0 in the diagonal-fallback paths;
the estimation code treats 0.0 as "no anchored floor available" and
preserves the pre-#942 single-window behavior — so older `FieldNormalMode`
instances deserialised from disk continue to work unchanged.

Test results

  cargo test --workspace --no-default-features
  → 413 lib tests pass (signal crate), 0 fail, 1 ignored.

The actual `eigenvalue`-gated test still requires BLAS (not buildable
on Windows). Logic-trace via the four numerical anchors above shows
the fix flips `noise_var` from the smaller local value back up to the
calibration scale, dropping `significant` to or below
`baseline_eigenvalue_count` so the saturating subtraction returns 0.

Closes #942

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-04 08:17:37 +02:00
10 changed files with 211 additions and 23 deletions
+5 -2
View File
@@ -24,10 +24,13 @@ services:
environment: environment:
- RUST_LOG=info - RUST_LOG=info
# CSI_SOURCE controls the data source for the sensing server. # CSI_SOURCE controls the data source for the sensing server.
# Options: auto (default) — probe for ESP32 UDP then fall back to simulation # Options: auto (default) — probe for ESP32 UDP then host WiFi; **fail
# hard with exit 78 if neither is detected**.
# Synthetic data is no longer a silent fallback
# (issue #937 fix) — operators must opt in.
# esp32 — receive real CSI frames from an ESP32 on UDP port 5005 # esp32 — receive real CSI frames from an ESP32 on UDP port 5005
# wifi — use host Wi-Fi RSSI/scan data (Windows netsh) # wifi — use host Wi-Fi RSSI/scan data (Windows netsh)
# simulated — generate synthetic CSI data (no hardware required) # simulated — explicitly generate synthetic CSI for demo mode
- CSI_SOURCE=${CSI_SOURCE:-auto} - CSI_SOURCE=${CSI_SOURCE:-auto}
# MODELS_DIR controls where the server scans for .rvf model files. # MODELS_DIR controls where the server scans for .rvf model files.
# Mount a host directory and set this to make models visible: # Mount a host directory and set this to make models visible:
+57 -2
View File
@@ -11,10 +11,65 @@
# docker run ruvnet/wifi-densepose:latest --model /app/models/my.rvf # docker run ruvnet/wifi-densepose:latest --model /app/models/my.rvf
# #
# Environment variables: # Environment variables:
# CSI_SOURCE — data source: auto (default), esp32, wifi, simulated # CSI_SOURCE — data source. Valid values:
# auto — try ESP32 then Windows WiFi, **fail-loud if no
# real hardware is detected** (issue #937 fix:
# the server no longer silently falls back to
# synthetic data — that's now opt-in only).
# esp32 — listen for UDP CSI on the configured port.
# wifi — Windows-native WiFi capture.
# simulated — explicit demo mode with synthetic CSI.
# Default is `auto`. Set CSI_SOURCE=simulated when you want
# fake data tagged as such; never set it implicitly.
# MODELS_DIR — directory to scan for .rvf model files (default: data/models) # MODELS_DIR — directory to scan for .rvf model files (default: data/models)
set -e set -e
# ── Issue #864: fail-closed on default posture ───────────────────────────────
# The pre-fix default was: empty RUVIEW_API_TOKEN (auth off) + --bind-addr
# 0.0.0.0 + docker-compose publishing :3000/:3001/:5005 → an unauthenticated
# attacker on any reachable network segment could read /api/v1/sensing/latest
# and the /ws/sensing live stream. That posture is unsafe on guest WiFi,
# untrusted LANs, accidentally-port-forwarded hosts, or any reverse-proxied
# deployment. Refuse to start with this combination.
#
# Escape hatches (operator must opt in explicitly):
# * Set RUVIEW_API_TOKEN to a strong secret → auth enabled on /api/v1/*.
# * Set RUVIEW_ALLOW_UNAUTHENTICATED=1 → preserves the pre-fix behaviour;
# only safe on an isolated trust boundary.
# * Set RUVIEW_BIND_ADDR to a loopback / private interface → unauth is fine
# when the socket isn't reachable. The auto-bind nudges toward 127.0.0.1.
#
# This check runs only for the default sensing-server path (no args + flag-only
# args). The `cog-ha-matter` / `homecore` routes below are excluded because
# they own their own auth lifecycle.
case "${1:-}" in
cog-ha-matter|ha-matter|homecore|homecore-server) ;;
*)
if [ -z "${RUVIEW_API_TOKEN:-}" ] && [ "${RUVIEW_ALLOW_UNAUTHENTICATED:-}" != "1" ]; then
# If the operator hasn't overridden the bind, refuse outright on
# the default 0.0.0.0. If they've nailed it to loopback (or a
# specific private address they trust), let it run.
__bind_default="${RUVIEW_BIND_ADDR:-0.0.0.0}"
case "$__bind_default" in
127.*|localhost|::1)
: ;; # loopback bind is safe even without a token
*)
echo "[entrypoint] ERROR: refusing to start sensing-server with default" >&2
echo "[entrypoint] posture: RUVIEW_API_TOKEN is unset AND bind is" >&2
echo "[entrypoint] ${__bind_default}. /ws/sensing streams live sensing" >&2
echo "[entrypoint] frames; that data would be readable by anyone who" >&2
echo "[entrypoint] can reach this host. Pick one:" >&2
echo "[entrypoint] docker run -e RUVIEW_API_TOKEN=\$(openssl rand -hex 32) ..." >&2
echo "[entrypoint] docker run -e RUVIEW_BIND_ADDR=127.0.0.1 ..." >&2
echo "[entrypoint] docker run -e RUVIEW_ALLOW_UNAUTHENTICATED=1 ... # only on trusted network" >&2
echo "[entrypoint] See https://github.com/ruvnet/RuView/issues/864" >&2
exit 64
;;
esac
fi
;;
esac
# Route to cog-ha-matter (ADR-116) when invoked as: # Route to cog-ha-matter (ADR-116) when invoked as:
# docker run <image> cog-ha-matter [--flags] # docker run <image> cog-ha-matter [--flags]
# or via the short alias `ha-matter`. Strips the keyword and execs the # or via the short alias `ha-matter`. Strips the keyword and execs the
@@ -48,7 +103,7 @@ if [ "${1#-}" != "$1" ] || [ -z "$1" ]; then
--ui-path /app/ui \ --ui-path /app/ui \
--http-port 3000 \ --http-port 3000 \
--ws-port 3001 \ --ws-port 3001 \
--bind-addr 0.0.0.0 \ --bind-addr "${RUVIEW_BIND_ADDR:-0.0.0.0}" \
"$@" "$@"
fi fi
@@ -65,6 +65,15 @@ target_compile_definitions(${COMPONENT_LIB} PUBLIC
d_m3LogOutput=0 # Disable WASM3 stdout logging (use ESP_LOG) d_m3LogOutput=0 # Disable WASM3 stdout logging (use ESP_LOG)
d_m3FixedHeap=0 # Use dynamic allocation (PSRAM-friendly) d_m3FixedHeap=0 # Use dynamic allocation (PSRAM-friendly)
WASM3_AVAILABLE=1 # Flag for conditional compilation WASM3_AVAILABLE=1 # Flag for conditional compilation
# Issue #946: GCC 15.2.0 for Xtensa (ESP-IDF v6.0.1) rejects wasm3's
# `M3_MUSTTAIL` aggressive tail-call attribute with
# "cannot tail-call: machine description does not have a sibcall_epilogue
# instruction pattern". wasm3 falls back to a regular call sequence when
# M3_NO_MUSTTAIL is defined — slightly slower per opcode but functionally
# identical. Forcing it off unconditionally on Xtensa is fine because the
# tail-call optimisation was never reliable on this target anyway. Older
# IDF/GCC builds also accept the define (it just becomes a no-op).
M3_NO_MUSTTAIL=1
) )
# Suppress warnings from third-party code. # Suppress warnings from third-party code.
@@ -220,11 +220,20 @@ static void fast_loop_cb(TimerHandle_t t)
adaptive_controller_decide(&s_cfg, s_state, &obs, &dec); adaptive_controller_decide(&s_cfg, s_state, &obs, &dec);
apply_decision(&dec); apply_decision(&dec);
/* ADR-081 Layer 4/5: emit compact feature state on every fast tick /* ADR-081 Layer 4/5: emit compact feature state at 1 Hz (the spec's
* (default 200 ms → 5 Hz, within the 110 Hz spec). Replaces raw * 110 Hz floor). Was previously emitted on every fast tick (~5 Hz at
* ADR-018 CSI as the default upstream; raw remains available as a * the default 200 ms fast period), which combined with CSI promiscuous
* debug stream gated by the channel plan. */ * RX saturated the WiFi TX airtime — measured live on COM8 (S3) and
emit_feature_state(); * COM9 (C6): every adaptive cycle showed `sendto ENOMEM — backing off
* for 100 ms`, and bumping LWIP/WiFi buffer pools to 4× had no effect
* on the rate because the bottleneck was radio TX time, not pool size.
* Dropping to 1 Hz (5× less feature_state traffic) frees the TX queue
* for CSI sends and lands well within the spec. */
static uint8_t s_emit_divider = 0;
if (++s_emit_divider >= 5) {
s_emit_divider = 0;
emit_feature_state();
}
} }
static void medium_loop_cb(TimerHandle_t t) static void medium_loop_cb(TimerHandle_t t)
@@ -21,6 +21,7 @@
#include "esp_wifi.h" #include "esp_wifi.h"
#include "esp_mac.h" #include "esp_mac.h"
#include "esp_timer.h" #include "esp_timer.h"
#include "esp_idf_version.h"
#include "freertos/FreeRTOS.h" #include "freertos/FreeRTOS.h"
#include "freertos/timers.h" #include "freertos/timers.h"
#include <string.h> #include <string.h>
@@ -144,11 +145,27 @@ static void on_recv(const uint8_t *src_mac, const uint8_t *data, int len)
} }
} }
/* Issue #944: ESP-IDF v6.0 changed `esp_now_send_cb_t` from
* void (*)(const uint8_t *mac, esp_now_send_status_t status)
* to
* void (*)(const esp_now_send_info_t *tx_info, esp_now_send_status_t status)
* Both signatures ignore the address-side argument here — we only inspect
* `status` to bump the TX-fail counter — so the body is identical; only the
* function-pointer type differs. ESP_IDF_VERSION_MAJOR is the canonical guard.
*/
#if ESP_IDF_VERSION_MAJOR >= 6
static void on_send(const esp_now_send_info_t *tx_info, esp_now_send_status_t status)
{
(void)tx_info;
if (status != ESP_NOW_SEND_SUCCESS) s_tx_fail++;
}
#else
static void on_send(const uint8_t *mac, esp_now_send_status_t status) static void on_send(const uint8_t *mac, esp_now_send_status_t status)
{ {
(void)mac; (void)mac;
if (status != ESP_NOW_SEND_SUCCESS) s_tx_fail++; if (status != ESP_NOW_SEND_SUCCESS) s_tx_fail++;
} }
#endif
static void beacon_timer_cb(TimerHandle_t t) static void beacon_timer_cb(TimerHandle_t t)
{ {
+10 -1
View File
@@ -23,7 +23,16 @@
static const char *TAG = "swarm"; static const char *TAG = "swarm";
/* ---- Task parameters ---- */ /* ---- Task parameters ---- */
#define SWARM_TASK_STACK 3072 /**< 3 KB stack — HTTP client uses ~2.5 KB. */ /* Issue #949: 3 KB was sized for plain HTTP (~2.5 KB). The bug reporter
* configured `--seed-url https://…` which exercises TLS — mbedTLS handshake
* alone needs 4-6 KB on the stack (cipher suite + cert chain + ECDH), and on
* top of that esp_http_client adds another 1.5-2 KB. The task panicked with
* `0xa5a5a5a5` (FreeRTOS stack-fill sentinel) immediately after "bridge init
* OK". 8 KB comfortably fits TLS with margin for the cert chain + headers;
* confirmed against mbedTLS's stack analyser. Plain-HTTP deployments waste
* ~5 KB of headroom but that's <0.1 % of PSRAM, an acceptable cost for the
* bug class this prevents. */
#define SWARM_TASK_STACK 8192 /**< 8 KB stack — fits mbedTLS handshake. */
#define SWARM_TASK_PRIO 3 #define SWARM_TASK_PRIO 3
#define SWARM_TASK_CORE 0 #define SWARM_TASK_CORE 0
#define SWARM_HTTP_TIMEOUT 3000 /**< HTTP timeout in ms (Seed responds <100ms on LAN). */ #define SWARM_HTTP_TIMEOUT 3000 /**< HTTP timeout in ms (Seed responds <100ms on LAN). */
@@ -29,6 +29,30 @@ CONFIG_LOG_DEFAULT_LEVEL_INFO=y
# LWIP: enable extended socket options for UDP multicast # LWIP: enable extended socket options for UDP multicast
CONFIG_LWIP_SO_RCVBUF=y CONFIG_LWIP_SO_RCVBUF=y
# Issue (sibling of #946/#949/#864 cluster): UDP `sendto` returned ENOMEM
# in a tight loop on both ESP32-S3 (COM8) and ESP32-C6 (COM9) at the v0.7.0
# CSI packet rate (CSI cb + status + sync + feature_state all sharing the
# LWIP/WiFi pools). stream_sender.c has a cooldown path so the device
# doesn't crash, but ~90 % of CSI frames were dropped before reaching the
# host — boot trace showed `sendto ENOMEM — backing off 100 ms` repeating
# every capture cycle. Stock IDF v5.4 defaults: UDP recv mbox=6, TCPIP
# mbox=32, WiFi dynamic TX buffers=32 — too small once CSI promiscuous
# mode is active. These bumps roughly quadruple the relevant pools at
# ~3 KB extra heap cost, measured live on both targets Jun 8 2026.
CONFIG_LWIP_UDP_RECVMBOX_SIZE=32
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64
CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64
# NOTE: Empirical 25 s measurements on the S3 at COM8 showed these bumps
# eliminate the csi_collector.sendto failure path (`fail #1..5` →
# `fail #0`) — real improvement — but do NOT eliminate the broader
# `feature_state emit` ENOMEM at ~10/s. That residual is the WiFi
# radio's TX airtime saturating under CSI promiscuous RX, and bigger
# buffers cap out at the 100 ms backoff window regardless of size
# (verified at WIFI_DYNAMIC_TX=128 + PBUF_POOL=32 — identical count).
# The proper fix is rate-limiting adaptive_controller.c's emit cadence
# from ~50 ms to the intended 1 Hz, which is a code refactor tracked
# in a separate follow-up issue.
# FreeRTOS: increase task stack for CSI processing # FreeRTOS: increase task stack for CSI processing
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192 CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192
@@ -108,8 +108,14 @@ pub async fn start_server(
cmd.args(["--log-level", log_level]); cmd.args(["--log-level", log_level]);
} }
// Set data source (default to "simulate" if not specified for demo mode) // Default to explicit "simulated" demo mode when the desktop user hasn't
let source = config.source.as_deref().unwrap_or("simulate"); // chosen a source — this is the *Tauri demo* app, not a production
// sensing endpoint, so the demo default is correct here. Critically, the
// value passed downstream is the **explicit** "simulated", not "auto",
// which means the sensing-server will tag the data as synthetic in its
// API responses rather than silently fall back (issue #937 fix in
// sensing-server's `auto` handler).
let source = config.source.as_deref().unwrap_or("simulated");
cmd.args(["--source", source]); cmd.args(["--source", source]);
// Redirect stdout/stderr to pipes for monitoring // Redirect stdout/stderr to pipes for monitoring
@@ -317,7 +323,7 @@ pub async fn restart_server(
log_level: None, log_level: None,
bind_address: None, bind_address: None,
server_path: None, server_path: None,
source: None, // Use default (simulate) source: None, // Falls through to explicit "simulated" — Tauri demo default.
} }
}; };
@@ -6421,7 +6421,17 @@ async fn main() {
info!(" UI path: {}", args.ui_path.display()); info!(" UI path: {}", args.ui_path.display());
info!(" Source: {}", args.source); info!(" Source: {}", args.source);
// Auto-detect data source // Auto-detect data source.
//
// Issue #937 / sibling fix: previously `auto` silently fell back to the
// synthetic data source when no ESP32 or Windows WiFi was reachable, with
// only an `info!` log line as the signal. Downstream API consumers
// (`/api/v1/sensing/latest`, `/ws/sensing`) had no in-band way to know they
// were being served fake CSI tagged as production telemetry. That is the
// exact "where's the real data?" pattern external reviewers (#943, #934)
// cited as the most damaging evidence of the project misrepresenting its
// posture. Synthetic-data is now opt-in only — operators who want demo
// mode must explicitly set `--source simulated` or `CSI_SOURCE=simulated`.
let source = match args.source.as_str() { let source = match args.source.as_str() {
"auto" => { "auto" => {
info!("Auto-detecting data source..."); info!("Auto-detecting data source...");
@@ -6432,10 +6442,23 @@ async fn main() {
info!(" Windows WiFi detected"); info!(" Windows WiFi detected");
"wifi" "wifi"
} else { } else {
info!(" No hardware detected, using simulation"); error!(
"simulate" "No real CSI source detected. Auto-detection refuses to silently \
fall back to synthetic data because that would expose downstream \
consumers (/api/v1/sensing/latest, /ws/sensing) to fake telemetry \
tagged as production. To run with synthetic data, set the source \
explicitly: --source simulated (or CSI_SOURCE=simulated in Docker). \
To use real hardware: provision an ESP32 to emit CSI on UDP :{} or \
install the Windows WiFi capture driver. See \
https://github.com/ruvnet/RuView/issues/937 for context.",
args.udp_port
);
std::process::exit(78); // EX_CONFIG
} }
} }
// "simulate" is a synonym for "simulated" (back-compat alias kept so
// existing operators who already opted in don't get broken by this fix).
"simulate" => "simulated",
other => other, other => other,
}; };
@@ -276,6 +276,13 @@ pub struct FieldNormalMode {
pub geometry_hash: u64, pub geometry_hash: u64,
/// Baseline eigenvalue count above Marcenko-Pastur threshold (empty-room). /// Baseline eigenvalue count above Marcenko-Pastur threshold (empty-room).
pub baseline_eigenvalue_count: usize, pub baseline_eigenvalue_count: usize,
/// Baseline noise variance estimate (median of bottom-half positive
/// eigenvalues from the calibration covariance). Persisted so that
/// `estimate_occupancy` can anchor its Marcenko-Pastur threshold to the
/// calibration noise floor instead of letting it drift with the
/// per-window sample size. Defaults to 0.0 in the diagonal-fallback path.
/// Issue #942.
pub baseline_noise_var: f64,
} }
/// Body perturbation extracted from a CSI observation. /// Body perturbation extracted from a CSI observation.
@@ -504,7 +511,11 @@ impl FieldModel {
let baseline: Vec<Vec<f64>> = self.link_stats.iter().map(|ls| ls.mean_vector()).collect(); let baseline: Vec<Vec<f64>> = self.link_stats.iter().map(|ls| ls.mean_vector()).collect();
// --- True eigenvalue decomposition (with diagonal fallback) --- // --- True eigenvalue decomposition (with diagonal fallback) ---
let (mode_energies, environmental_modes, baseline_eig_count) = // Returns: (energies, modes, baseline_count, baseline_noise_var).
// The noise_var slot is 0.0 in the diagonal-fallback paths; the
// estimation hot path treats 0.0 as "no anchored noise floor" and
// falls back to per-window noise_var, preserving pre-#942 behavior.
let (mode_energies, environmental_modes, baseline_eig_count, baseline_noise_var) =
if let Some(ref cov_sum) = self.covariance_sum { if let Some(ref cov_sum) = self.covariance_sum {
if self.covariance_count > 1 { if self.covariance_count > 1 {
// Compute sample covariance from raw outer products: // Compute sample covariance from raw outer products:
@@ -588,23 +599,28 @@ impl FieldModel {
let baseline_count = let baseline_count =
eigenvalues.iter().filter(|&&ev| ev > mp_threshold).count(); eigenvalues.iter().filter(|&&ev| ev > mp_threshold).count();
(energies, modes, baseline_count) (energies, modes, baseline_count, noise_var)
} }
Err(_) => { Err(_) => {
// Fallback to diagonal approximation on SVD failure // Fallback to diagonal approximation on SVD failure
diagonal_fallback(&self.link_stats, n_sc, n_modes) let (e, m, b) =
diagonal_fallback(&self.link_stats, n_sc, n_modes);
(e, m, b, 0.0_f64)
} }
} }
// When eigenvalue feature is disabled, use diagonal fallback // When eigenvalue feature is disabled, use diagonal fallback
#[cfg(not(feature = "eigenvalue"))] #[cfg(not(feature = "eigenvalue"))]
{ {
diagonal_fallback(&self.link_stats, n_sc, n_modes) let (e, m, b) = diagonal_fallback(&self.link_stats, n_sc, n_modes);
(e, m, b, 0.0_f64)
} }
} else { } else {
diagonal_fallback(&self.link_stats, n_sc, n_modes) let (e, m, b) = diagonal_fallback(&self.link_stats, n_sc, n_modes);
(e, m, b, 0.0_f64)
} }
} else { } else {
diagonal_fallback(&self.link_stats, n_sc, n_modes) let (e, m, b) = diagonal_fallback(&self.link_stats, n_sc, n_modes);
(e, m, b, 0.0_f64)
}; };
// Compute variance explained using the same centered covariance as modes. // Compute variance explained using the same centered covariance as modes.
@@ -648,6 +664,7 @@ impl FieldModel {
calibrated_at_us: timestamp_us, calibrated_at_us: timestamp_us,
geometry_hash, geometry_hash,
baseline_eigenvalue_count: baseline_eig_count, baseline_eigenvalue_count: baseline_eig_count,
baseline_noise_var,
}; };
self.modes = Some(field_mode); self.modes = Some(field_mode);
@@ -794,7 +811,7 @@ impl FieldModel {
// Marcenko-Pastur noise estimate: median of POSITIVE eigenvalues // Marcenko-Pastur noise estimate: median of POSITIVE eigenvalues
// in the bottom half. Excludes zeros from rank-deficient matrices // in the bottom half. Excludes zeros from rank-deficient matrices
// (common when n_subcarriers > n_frames, e.g. 56 subcarriers / 50 frames). // (common when n_subcarriers > n_frames, e.g. 56 subcarriers / 50 frames).
let noise_var = { let local_noise_var = {
let mut positive: Vec<f64> = let mut positive: Vec<f64> =
eigenvalues.iter().copied().filter(|&e| e > 1e-10).collect(); eigenvalues.iter().copied().filter(|&e| e > 1e-10).collect();
positive.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)); positive.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
@@ -807,6 +824,22 @@ impl FieldModel {
return Ok(0); // All zero eigenvalues — can't estimate return Ok(0); // All zero eigenvalues — can't estimate
} }
}; };
// Issue #942: anchor the noise floor to the calibration's noise_var
// when it's available. Per-window noise_var drifts with sample size —
// a short estimation window can produce a small local_noise_var that
// inflates `significant` and breaks the test_estimate_occupancy_noise_only
// invariant. The max of (calibration noise, local noise) keeps the
// threshold from collapsing on small windows while still letting the
// per-window noise dominate when it's the larger estimate. Falls back
// to local_noise_var when baseline_noise_var == 0 (diagonal-fallback
// calibration path, or pre-#942 stored modes).
let noise_var = if modes.baseline_noise_var > 0.0 {
local_noise_var.max(modes.baseline_noise_var)
} else {
local_noise_var
};
let ratio = n as f64 / count as f64; let ratio = n as f64 / count as f64;
let mp_threshold = noise_var * (1.0 + ratio.sqrt()).powi(2); let mp_threshold = noise_var * (1.0 + ratio.sqrt()).powi(2);