fix(server): make synthetic CSI opt-in only (sibling fix to #937) (#979)

Background

Issue #937 in the cognitum-v0 appliance repo flagged that the `cognitum-csi-capture` systemd unit shipped `--simulate` by default, silently serving synthetic CSI tagged as production telemetry on `/api/v1/sensor/stream`. That's a textbook trust-eroding pattern, and the single most-cited "where's the real data?" evidence external reviewers (#943, #934) point at when they call the project AI-slop.

A grep across THIS tree surfaced the exact same anti-pattern in three places:

    docker/docker-compose.yml:27    # auto (default) — probe ESP32, fall back to simulation
    docker/docker-entrypoint.sh:14  # CSI_SOURCE — data source: auto (default), ...
    main.rs:6435                    info!("No hardware detected, using simulation"); "simulate"

The sensing-server's `auto` source resolver at main.rs:6425-6440 silently fell back to synthetic with only an `info!` log line as the signal. Downstream consumers calling `/api/v1/sensing/latest` or `/ws/sensing` had no in-band way to know they were being served fake data.

Fix

`auto` now refuses to fall back. When neither ESP32 UDP nor host WiFi is detected, the server logs a clear `error!` explaining the situation and exits 78 (EX_CONFIG). The error message names the two ways to proceed: provision real hardware, or set `--source simulated` / `CSI_SOURCE=simulated` explicitly.

Existing operators who already use `--source simulated` (or its legacy `simulate` alias) are unaffected: the alias is preserved for back-compat.

Docker entrypoint comment, docker-compose comment, and the Tauri desktop app's source-default path also updated to reflect the new posture. The desktop app keeps its `simulated` default because it's an explicit demo product; the value passed downstream is the *explicit* `simulated`, not `auto`, so the server tags it correctly and never lies about its data source.

Validation

    cargo build -p wifi-densepose-sensing-server --no-default-features
    cargo test -p wifi-densepose-sensing-server --no-default-features

→ 122 / 122 pass, build clean (existing pre-fix warnings unchanged).

Deployment

⚠ Breaking change for unattended deployments that relied on the `auto → simulated` silent fallback. That is exactly the failure mode this PR fixes: pretending to serve real sensing data when the source is fake. Operators who genuinely want demo mode set `CSI_SOURCE=simulated` explicitly; the error message and the docker-compose comment both point them there.
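The resolver behaviour described under "Fix" can be sketched as a pure function. This is a minimal illustration, not the code in main.rs: `resolve_source` and its two boolean probe results are hypothetical names, and the real server calls `std::process::exit(78)` inline rather than returning an error.

```rust
/// EX_CONFIG from BSD sysexits.h ("configuration error").
pub const EX_CONFIG: i32 = 78;

/// Hypothetical pure version of the `--source` resolver.
///
/// `esp32_detected` and `wifi_detected` stand in for the real hardware
/// probes (assumption: the probes are tried in this order, ESP32 first).
/// Returns the concrete source name, or the exit code the caller should
/// use when `auto` finds no real hardware.
pub fn resolve_source<'a>(
    requested: &'a str,
    esp32_detected: bool,
    wifi_detected: bool,
) -> Result<&'a str, i32> {
    match requested {
        "auto" if esp32_detected => Ok("esp32"),
        "auto" if wifi_detected => Ok("wifi"),
        // No silent fallback to synthetic data: refuse instead.
        "auto" => Err(EX_CONFIG),
        // Legacy alias kept for operators who already opted in.
        "simulate" => Ok("simulated"),
        // Explicit choices (esp32, wifi, simulated, ...) pass through.
        other => Ok(other),
    }
}
```

Keeping the decision free of I/O makes the "auto never yields simulated" property easy to assert in a unit test; the caller decides how to log and exit.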
fix: firmware cluster — wasm3 IDF v6.0 build (#946) + swarm TLS stack (#949) + Docker unauth default (#864) (#975)
2026-06-09 10:13:17 +00:00 · 2026-06-08 18:07:39 +02:00 · 2026-06-08 16:39:42 +02:00 · 2026-06-04 08:17:37 +02:00
10 changed files with 211 additions and 23 deletions
@@ -24,10 +24,13 @@ services:
    environment:
      - RUST_LOG=info
      # CSI_SOURCE controls the data source for the sensing server.
-      # Options: auto (default) — probe for ESP32 UDP then fall back to simulation
+      # Options: auto (default) — probe for ESP32 UDP then host WiFi; **fail
+      #                           hard with exit 78 if neither is detected**.
+      #                           Synthetic data is no longer a silent fallback
+      #                           (issue #937 fix) — operators must opt in.
      #          esp32          — receive real CSI frames from an ESP32 on UDP port 5005
      #          wifi           — use host Wi-Fi RSSI/scan data (Windows netsh)
-      #          simulated      — generate synthetic CSI data (no hardware required)
+      #          simulated      — explicitly generate synthetic CSI for demo mode
      - CSI_SOURCE=${CSI_SOURCE:-auto}
      # MODELS_DIR controls where the server scans for .rvf model files.
      # Mount a host directory and set this to make models visible:
@@ -11,10 +11,65 @@
 #      docker run ruvnet/wifi-densepose:latest --model /app/models/my.rvf
 #
 # Environment variables:
-#   CSI_SOURCE   — data source: auto (default), esp32, wifi, simulated
+#   CSI_SOURCE   — data source. Valid values:
+#                    auto       — try ESP32 then Windows WiFi, **fail-loud if no
+#                                 real hardware is detected** (issue #937 fix:
+#                                 the server no longer silently falls back to
+#                                 synthetic data — that's now opt-in only).
+#                    esp32      — listen for UDP CSI on the configured port.
+#                    wifi       — Windows-native WiFi capture.
+#                    simulated  — explicit demo mode with synthetic CSI.
+#                  Default is `auto`. Set CSI_SOURCE=simulated when you want
+#                  fake data tagged as such; never set it implicitly.
 #   MODELS_DIR   — directory to scan for .rvf model files (default: data/models)
 set -e
+# ── Issue #864: fail-closed on default posture ───────────────────────────────
+# The pre-fix default was: empty RUVIEW_API_TOKEN (auth off) + --bind-addr
+# 0.0.0.0 + docker-compose publishing :3000/:3001/:5005 → an unauthenticated
+# attacker on any reachable network segment could read /api/v1/sensing/latest
+# and the /ws/sensing live stream. That posture is unsafe on guest WiFi,
+# untrusted LANs, accidentally-port-forwarded hosts, or any reverse-proxied
+# deployment. Refuse to start with this combination.
+#
+# Escape hatches (operator must opt in explicitly):
+#   * Set RUVIEW_API_TOKEN to a strong secret → auth enabled on /api/v1/*.
+#   * Set RUVIEW_ALLOW_UNAUTHENTICATED=1 → preserves the pre-fix behaviour;
+#     only safe on an isolated trust boundary.
+#   * Set RUVIEW_BIND_ADDR to a loopback / private interface → unauth is fine
+#     when the socket isn't reachable. The auto-bind nudges toward 127.0.0.1.
+#
+# This check runs only for the default sensing-server path (no args + flag-only
+# args). The `cog-ha-matter` / `homecore` routes below are excluded because
+# they own their own auth lifecycle.
+case "${1:-}" in
+    cog-ha-matter|ha-matter|homecore|homecore-server) ;;
+    *)
+        if [ -z "${RUVIEW_API_TOKEN:-}" ] && [ "${RUVIEW_ALLOW_UNAUTHENTICATED:-}" != "1" ]; then
+            # If the operator hasn't overridden the bind, refuse outright on
+            # the default 0.0.0.0. If they've nailed it to loopback (or a
+            # specific private address they trust), let it run.
+            __bind_default="${RUVIEW_BIND_ADDR:-0.0.0.0}"
+            case "$__bind_default" in
+                127.*|localhost|::1)
+                    : ;;  # loopback bind is safe even without a token
+                *)
+                    echo "[entrypoint] ERROR: refusing to start sensing-server with default" >&2
+                    echo "[entrypoint]        posture: RUVIEW_API_TOKEN is unset AND bind is" >&2
+                    echo "[entrypoint]        ${__bind_default}. /ws/sensing streams live sensing" >&2
+                    echo "[entrypoint]        frames; that data would be readable by anyone who" >&2
+                    echo "[entrypoint]        can reach this host. Pick one:" >&2
+                    echo "[entrypoint]          docker run -e RUVIEW_API_TOKEN=\$(openssl rand -hex 32) ..." >&2
+                    echo "[entrypoint]          docker run -e RUVIEW_BIND_ADDR=127.0.0.1 ..." >&2
+                    echo "[entrypoint]          docker run -e RUVIEW_ALLOW_UNAUTHENTICATED=1 ...   # only on trusted network" >&2
+                    echo "[entrypoint]        See https://github.com/ruvnet/RuView/issues/864" >&2
+                    exit 64
+                    ;;
+            esac
+        fi
+        ;;
+esac
 # Route to cog-ha-matter (ADR-116) when invoked as:
 #   docker run <image> cog-ha-matter [--flags]
 # or via the short alias `ha-matter`. Strips the keyword and execs the
@@ -48,7 +103,7 @@ if [ "${1#-}" != "$1" ] || [ -z "$1" ]; then
        --ui-path /app/ui \
        --http-port 3000 \
        --ws-port 3001 \
-        --bind-addr 0.0.0.0 \
+        --bind-addr "${RUVIEW_BIND_ADDR:-0.0.0.0}" \
        "$@"
 fi
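The fail-closed posture check in the entrypoint hunks above reduces to a small decision table. The sketch below restates it in Rust for illustration only; `check_posture`, `is_loopback_bind`, and `EXIT_REFUSE` are hypothetical names, and the arguments stand in for the RUVIEW_API_TOKEN, RUVIEW_ALLOW_UNAUTHENTICATED, and RUVIEW_BIND_ADDR environment variables. It mirrors the shell `case` patterns literally, which accept only loopback values (`127.*`, `localhost`, `::1`); a private non-loopback address without a token is still refused.

```rust
/// Exit status the entrypoint uses when it refuses to start
/// (64 is EX_USAGE in BSD sysexits.h).
pub const EXIT_REFUSE: i32 = 64;

/// Same patterns as the shell `case`: 127.*, localhost, ::1.
pub fn is_loopback_bind(bind: &str) -> bool {
    bind.starts_with("127.") || bind == "localhost" || bind == "::1"
}

/// Hypothetical restatement of the posture check.
/// `None` models an unset variable; an empty string is treated like unset,
/// matching the `${VAR:-}` / `${VAR:-default}` expansions in the script.
pub fn check_posture(
    api_token: Option<&str>,
    allow_unauthenticated: Option<&str>,
    bind_addr: Option<&str>,
) -> Result<(), i32> {
    let token_set = api_token.map_or(false, |t| !t.is_empty());
    let opted_out = allow_unauthenticated == Some("1");
    if token_set || opted_out {
        return Ok(());
    }
    // Unset or empty bind address falls back to the 0.0.0.0 default.
    let bind = match bind_addr {
        Some(b) if !b.is_empty() => b,
        _ => "0.0.0.0",
    };
    if is_loopback_bind(bind) {
        Ok(())
    } else {
        Err(EXIT_REFUSE)
    }
}
```

The sketch covers only the default sensing-server route; the `cog-ha-matter` and `homecore` routes skip the check entirely in the script.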
@@ -65,6 +65,15 @@ target_compile_definitions(${COMPONENT_LIB} PUBLIC
    d_m3LogOutput=0                  # Disable WASM3 stdout logging (use ESP_LOG)
    d_m3FixedHeap=0                  # Use dynamic allocation (PSRAM-friendly)
    WASM3_AVAILABLE=1                # Flag for conditional compilation
+    # Issue #946: GCC 15.2.0 for Xtensa (ESP-IDF v6.0.1) rejects wasm3's
+    # `M3_MUSTTAIL` aggressive tail-call attribute with
+    # "cannot tail-call: machine description does not have a sibcall_epilogue
+    # instruction pattern". wasm3 falls back to a regular call sequence when
+    # M3_NO_MUSTTAIL is defined — slightly slower per opcode but functionally
+    # identical. Forcing it off unconditionally on Xtensa is fine because the
+    # tail-call optimisation was never reliable on this target anyway. Older
+    # IDF/GCC builds also accept the define (it just becomes a no-op).
+    M3_NO_MUSTTAIL=1
 )
 # Suppress warnings from third-party code.
@@ -220,11 +220,20 @@ static void fast_loop_cb(TimerHandle_t t)
    adaptive_controller_decide(&s_cfg, s_state, &obs, &dec);
    apply_decision(&dec);
-    /* ADR-081 Layer 4/5: emit compact feature state on every fast tick
-     * (default 200 ms → 5 Hz, within the 1–10 Hz spec). Replaces raw
-     * ADR-018 CSI as the default upstream; raw remains available as a
-     * debug stream gated by the channel plan. */
-    emit_feature_state();
+    /* ADR-081 Layer 4/5: emit compact feature state at 1 Hz (the spec's
+     * 1–10 Hz floor). Was previously emitted on every fast tick (~5 Hz at
+     * the default 200 ms fast period), which combined with CSI promiscuous
+     * RX saturated the WiFi TX airtime — measured live on COM8 (S3) and
+     * COM9 (C6): every adaptive cycle showed `sendto ENOMEM — backing off
+     * for 100 ms`, and bumping LWIP/WiFi buffer pools to 4× had no effect
+     * on the rate because the bottleneck was radio TX time, not pool size.
+     * Dropping to 1 Hz (5× less feature_state traffic) frees the TX queue
+     * for CSI sends and lands well within the spec. */
+    static uint8_t s_emit_divider = 0;
+    if (++s_emit_divider >= 5) {
+        s_emit_divider = 0;
+        emit_feature_state();
+    }
 }
 static void medium_loop_cb(TimerHandle_t t)
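The divider added to `fast_loop_cb` above turns a 200 ms tick into a 1 Hz emit by firing on every fifth call. A minimal sketch of the same counting logic, written in Rust so it can be checked off-target; `EmitDivider` and `emits_in` are hypothetical names, and the 200 ms period is the default quoted in the comment, not something this sketch measures.

```rust
/// Fires once every `every` ticks, like the `s_emit_divider` counter.
pub struct EmitDivider {
    count: u8,
    every: u8,
}

impl EmitDivider {
    pub fn new(every: u8) -> Self {
        assert!(every > 0, "divider must be at least 1");
        Self { count: 0, every }
    }

    /// Call once per fast tick; returns true when an emit is due.
    pub fn tick(&mut self) -> bool {
        self.count += 1;
        if self.count >= self.every {
            self.count = 0;
            true
        } else {
            false
        }
    }
}

/// Number of emits produced over `ticks` fast ticks.
pub fn emits_in(ticks: u32, every: u8) -> u32 {
    let mut divider = EmitDivider::new(every);
    (0..ticks).filter(|_| divider.tick()).count() as u32
}
```

With a divider of 5, `emits_in(25, 5)` models five seconds of 200 ms ticks and yields 5 emits, i.e. one per second. The first emit lands on the fifth tick rather than the first, which matches the pre-increment comparison in the C code.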
@@ -21,6 +21,7 @@
 #include "esp_wifi.h"
 #include "esp_mac.h"
 #include "esp_timer.h"
+#include "esp_idf_version.h"
 #include "freertos/FreeRTOS.h"
 #include "freertos/timers.h"
 #include <string.h>
@@ -144,11 +145,27 @@ static void on_recv(const uint8_t *src_mac, const uint8_t *data, int len)
    }
 }
+/* Issue #944: ESP-IDF v6.0 changed `esp_now_send_cb_t` from
+ *   void (*)(const uint8_t *mac, esp_now_send_status_t status)
+ * to
+ *   void (*)(const esp_now_send_info_t *tx_info, esp_now_send_status_t status)
+ * Both signatures ignore the address-side argument here — we only inspect
+ * `status` to bump the TX-fail counter — so the body is identical; only the
+ * function-pointer type differs. ESP_IDF_VERSION_MAJOR is the canonical guard.
+ */
+#if ESP_IDF_VERSION_MAJOR >= 6
+static void on_send(const esp_now_send_info_t *tx_info, esp_now_send_status_t status)
+{
+    (void)tx_info;
+    if (status != ESP_NOW_SEND_SUCCESS) s_tx_fail++;
+}
+#else
 static void on_send(const uint8_t *mac, esp_now_send_status_t status)
 {
    (void)mac;
    if (status != ESP_NOW_SEND_SUCCESS) s_tx_fail++;
 }
+#endif
 static void beacon_timer_cb(TimerHandle_t t)
 {
@@ -23,7 +23,16 @@
 static const char *TAG = "swarm";
 /* ---- Task parameters ---- */
-#define SWARM_TASK_STACK   3072   /**< 3 KB stack — HTTP client uses ~2.5 KB. */
+/* Issue #949: 3 KB was sized for plain HTTP (~2.5 KB). The bug reporter
+ * configured `--seed-url https://…` which exercises TLS — mbedTLS handshake
+ * alone needs 4-6 KB on the stack (cipher suite + cert chain + ECDH), and on
+ * top of that esp_http_client adds another 1.5-2 KB. The task panicked with
+ * `0xa5a5a5a5` (FreeRTOS stack-fill sentinel) immediately after "bridge init
+ * OK". 8 KB comfortably fits TLS with margin for the cert chain + headers;
+ * confirmed against mbedTLS's stack analyser. Plain-HTTP deployments waste
+ * ~5 KB of headroom but that's <0.1 % of PSRAM, an acceptable cost for the
+ * bug class this prevents. */
+#define SWARM_TASK_STACK   8192   /**< 8 KB stack — fits mbedTLS handshake. */
 #define SWARM_TASK_PRIO    3
 #define SWARM_TASK_CORE    0
 #define SWARM_HTTP_TIMEOUT 3000  /**< HTTP timeout in ms (Seed responds <100ms on LAN). */
@@ -29,6 +29,30 @@ CONFIG_LOG_DEFAULT_LEVEL_INFO=y
 # LWIP: enable extended socket options for UDP multicast
 CONFIG_LWIP_SO_RCVBUF=y
+# Issue (sibling of #946/#949/#864 cluster): UDP `sendto` returned ENOMEM
+# in a tight loop on both ESP32-S3 (COM8) and ESP32-C6 (COM9) at the v0.7.0
+# CSI packet rate (CSI cb + status + sync + feature_state all sharing the
+# LWIP/WiFi pools). stream_sender.c has a cooldown path so the device
+# doesn't crash, but ~90 % of CSI frames were dropped before reaching the
+# host — boot trace showed `sendto ENOMEM — backing off 100 ms` repeating
+# every capture cycle. Stock IDF v5.4 defaults: UDP recv mbox=6, TCPIP
+# mbox=32, WiFi dynamic TX buffers=32 — too small once CSI promiscuous
+# mode is active. These bumps roughly quadruple the relevant pools at
+# ~3 KB extra heap cost, measured live on both targets Jun 8 2026.
+CONFIG_LWIP_UDP_RECVMBOX_SIZE=32
+CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64
+CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64
+# NOTE: Empirical 25 s measurements on the S3 at COM8 showed these bumps
+# eliminate the csi_collector.sendto failure path (`fail #1..5` →
+# `fail #0`) — real improvement — but do NOT eliminate the broader
+# `feature_state emit` ENOMEM at ~10/s. That residual is the WiFi
+# radio's TX airtime saturating under CSI promiscuous RX, and bigger
+# buffers cap out at the 100 ms backoff window regardless of size
+# (verified at WIFI_DYNAMIC_TX=128 + PBUF_POOL=32 — identical count).
+# The proper fix is rate-limiting adaptive_controller.c's emit cadence
+# from ~50 ms to the intended 1 Hz, which is a code refactor tracked
+# in a separate follow-up issue.
 # FreeRTOS: increase task stack for CSI processing
 CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192
@@ -108,8 +108,14 @@ pub async fn start_server(
        cmd.args(["--log-level", log_level]);
    }
-    // Set data source (default to "simulate" if not specified for demo mode)
-    let source = config.source.as_deref().unwrap_or("simulate");
+    // Default to explicit "simulated" demo mode when the desktop user hasn't
+    // chosen a source — this is the *Tauri demo* app, not a production
+    // sensing endpoint, so the demo default is correct here. Critically, the
+    // value passed downstream is the **explicit** "simulated", not "auto",
+    // which means the sensing-server will tag the data as synthetic in its
+    // API responses rather than silently fall back (issue #937 fix in
+    // sensing-server's `auto` handler).
+    let source = config.source.as_deref().unwrap_or("simulated");
    cmd.args(["--source", source]);
    // Redirect stdout/stderr to pipes for monitoring
@@ -317,7 +323,7 @@ pub async fn restart_server(
            log_level: None,
            bind_address: None,
            server_path: None,
-            source: None, // Use default (simulate)
+            source: None, // Falls through to explicit "simulated" — Tauri demo default.
        }
    };
@@ -6421,7 +6421,17 @@ async fn main() {
    info!("  UI path:   {}", args.ui_path.display());
    info!("  Source:    {}", args.source);
-    // Auto-detect data source
+    // Auto-detect data source.
+    //
+    // Issue #937 / sibling fix: previously `auto` silently fell back to the
+    // synthetic data source when no ESP32 or Windows WiFi was reachable, with
+    // only an `info!` log line as the signal. Downstream API consumers
+    // (`/api/v1/sensing/latest`, `/ws/sensing`) had no in-band way to know they
+    // were being served fake CSI tagged as production telemetry. That is the
+    // exact "where's the real data?" pattern external reviewers (#943, #934)
+    // cited as the most damaging evidence of the project misrepresenting its
+    // posture. Synthetic-data is now opt-in only — operators who want demo
+    // mode must explicitly set `--source simulated` or `CSI_SOURCE=simulated`.
    let source = match args.source.as_str() {
        "auto" => {
            info!("Auto-detecting data source...");
@@ -6432,10 +6442,23 @@ async fn main() {
                info!("  Windows WiFi detected");
                "wifi"
            } else {
-                info!("  No hardware detected, using simulation");
-                "simulate"
+                error!(
+                    "No real CSI source detected. Auto-detection refuses to silently \
+                     fall back to synthetic data because that would expose downstream \
+                     consumers (/api/v1/sensing/latest, /ws/sensing) to fake telemetry \
+                     tagged as production. To run with synthetic data, set the source \
+                     explicitly: --source simulated (or CSI_SOURCE=simulated in Docker). \
+                     To use real hardware: provision an ESP32 to emit CSI on UDP :{} or \
+                     install the Windows WiFi capture driver. See \
+                     https://github.com/ruvnet/RuView/issues/937 for context.",
+                    args.udp_port
+                );
+                std::process::exit(78); // EX_CONFIG
            }
        }
+        // "simulate" is a synonym for "simulated" (back-compat alias kept so
+        // existing operators who already opted in don't get broken by this fix).
+        "simulate" => "simulated",
        other => other,
    };
@@ -276,6 +276,13 @@ pub struct FieldNormalMode {
    pub geometry_hash: u64,
    /// Baseline eigenvalue count above Marcenko-Pastur threshold (empty-room).
    pub baseline_eigenvalue_count: usize,
+    /// Baseline noise variance estimate (median of bottom-half positive
+    /// eigenvalues from the calibration covariance). Persisted so that
+    /// `estimate_occupancy` can anchor its Marcenko-Pastur threshold to the
+    /// calibration noise floor instead of letting it drift with the
+    /// per-window sample size. Defaults to 0.0 in the diagonal-fallback path.
+    /// Issue #942.
+    pub baseline_noise_var: f64,
 }
 /// Body perturbation extracted from a CSI observation.
@@ -504,7 +511,11 @@ impl FieldModel {
        let baseline: Vec<Vec<f64>> = self.link_stats.iter().map(|ls| ls.mean_vector()).collect();
        // --- True eigenvalue decomposition (with diagonal fallback) ---
-        let (mode_energies, environmental_modes, baseline_eig_count) =
+        // Returns: (energies, modes, baseline_count, baseline_noise_var).
+        // The noise_var slot is 0.0 in the diagonal-fallback paths; the
+        // estimation hot path treats 0.0 as "no anchored noise floor" and
+        // falls back to per-window noise_var, preserving pre-#942 behavior.
+        let (mode_energies, environmental_modes, baseline_eig_count, baseline_noise_var) =
            if let Some(ref cov_sum) = self.covariance_sum {
                if self.covariance_count > 1 {
                    // Compute sample covariance from raw outer products:
@@ -588,23 +599,28 @@ impl FieldModel {
                            let baseline_count =
                                eigenvalues.iter().filter(|&&ev| ev > mp_threshold).count();
-                            (energies, modes, baseline_count)
+                            (energies, modes, baseline_count, noise_var)
                        }
                        Err(_) => {
                            // Fallback to diagonal approximation on SVD failure
-                            diagonal_fallback(&self.link_stats, n_sc, n_modes)
+                            let (e, m, b) =
+                                diagonal_fallback(&self.link_stats, n_sc, n_modes);
+                            (e, m, b, 0.0_f64)
                        }
                    }
                    // When eigenvalue feature is disabled, use diagonal fallback
                    #[cfg(not(feature = "eigenvalue"))]
                    {
-                        diagonal_fallback(&self.link_stats, n_sc, n_modes)
+                        let (e, m, b) = diagonal_fallback(&self.link_stats, n_sc, n_modes);
+                        (e, m, b, 0.0_f64)
                    }
                } else {
-                    diagonal_fallback(&self.link_stats, n_sc, n_modes)
+                    let (e, m, b) = diagonal_fallback(&self.link_stats, n_sc, n_modes);
+                    (e, m, b, 0.0_f64)
                }
            } else {
-                diagonal_fallback(&self.link_stats, n_sc, n_modes)
+                let (e, m, b) = diagonal_fallback(&self.link_stats, n_sc, n_modes);
+                (e, m, b, 0.0_f64)
            };
        // Compute variance explained using the same centered covariance as modes.
@@ -648,6 +664,7 @@ impl FieldModel {
            calibrated_at_us: timestamp_us,
            geometry_hash,
            baseline_eigenvalue_count: baseline_eig_count,
+            baseline_noise_var,
        };
        self.modes = Some(field_mode);
@@ -794,7 +811,7 @@ impl FieldModel {
        // Marcenko-Pastur noise estimate: median of POSITIVE eigenvalues
        // in the bottom half. Excludes zeros from rank-deficient matrices
        // (common when n_subcarriers > n_frames, e.g. 56 subcarriers / 50 frames).
-        let noise_var = {
+        let local_noise_var = {
            let mut positive: Vec<f64> =
                eigenvalues.iter().copied().filter(|&e| e > 1e-10).collect();
            positive.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
@@ -807,6 +824,22 @@ impl FieldModel {
                return Ok(0); // All zero eigenvalues — can't estimate
            }
        };
+        // Issue #942: anchor the noise floor to the calibration's noise_var
+        // when it's available. Per-window noise_var drifts with sample size —
+        // a short estimation window can produce a small local_noise_var that
+        // inflates `significant` and breaks the test_estimate_occupancy_noise_only
+        // invariant. The max of (calibration noise, local noise) keeps the
+        // threshold from collapsing on small windows while still letting the
+        // per-window noise dominate when it's the larger estimate. Falls back
+        // to local_noise_var when baseline_noise_var == 0 (diagonal-fallback
+        // calibration path, or pre-#942 stored modes).
+        let noise_var = if modes.baseline_noise_var > 0.0 {
+            local_noise_var.max(modes.baseline_noise_var)
+        } else {
+            local_noise_var
+        };
        let ratio = n as f64 / count as f64;
        let mp_threshold = noise_var * (1.0 + ratio.sqrt()).powi(2);
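To make the effect of the anchored noise floor concrete, the two expressions from the hunk above can be lifted into standalone functions. This is a sketch under assumptions: `anchored_noise_var` and `mp_threshold` are hypothetical helper names, and `n` and `count` are taken to be the eigenvalue dimension and the number of frames in the window, which the surrounding code suggests but this excerpt does not show.

```rust
/// Pick the noise variance used for the Marcenko-Pastur threshold.
/// A baseline of 0.0 means "no anchored noise floor" (diagonal-fallback
/// calibration or modes stored before the fix), so the local estimate wins.
pub fn anchored_noise_var(local_noise_var: f64, baseline_noise_var: f64) -> f64 {
    if baseline_noise_var > 0.0 {
        local_noise_var.max(baseline_noise_var)
    } else {
        local_noise_var
    }
}

/// Upper edge of the Marcenko-Pastur bulk: sigma^2 * (1 + sqrt(n / count))^2.
/// Assumption: `n` is the dimension and `count` the number of samples.
pub fn mp_threshold(noise_var: f64, n: usize, count: usize) -> f64 {
    let ratio = n as f64 / count as f64;
    noise_var * (1.0 + ratio.sqrt()).powi(2)
}
```

For example, a short window that estimates a local variance of 0.2 against a calibration baseline of 1.0 uses 1.0, so with `n == count` the threshold is 4.0 instead of 0.8. Eigenvalues between 0.8 and 4.0 are then no longer counted as significant, which is the over-counting the fix targets.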