ruvnet--RuView

mirror of https://github.com/ruvnet/RuView synced 2026-07-05 14:33:19 +00:00

Author	SHA1	Message	Date
rUv	e6f26e9ac9	docs(adr): deep review of the RuView npm surface — ADR-263/264/265 optimization strategies (#1229 ) * docs(adr): deep review of the RuView npm surface — ADR-263/264/265 optimization strategies ADR-263 — @ruvnet/ruview@0.1.0 harness review (O1–O9): - HIGH: claim-check CLI fails open on empty input (no --text/--file -> PASS exit 0) - HIGH: MCP stdio server head-of-line blocking (spawnSync verify/calibrate up to 600s) - MEASURED: optionalDependencies triple the cold npx install (4 pkgs/620kB/71 files vs 1 pkg/172kB/22 files with --omit=optional) for a path that never imports them - maxBuffer truncation, python -c port interpolation, version drift, duplicate skills, guardrail METRIC_TERMS substring false positives ('map'/'F1' — found by dogfooding claim-check on these very ADRs), zero CI ADR-264 — @ruvnet/rvagent@0.1.0 + @ruv/ruview-cli review (O1–O9), verified against the published registry tarball: - HIGH: exports.require -> dist/index.cjs which is never built nor published - MEASURED: 44 dead source-map files = 62,698B of the 188kB unpacked payload - stdio-only server described as dual-transport; mixed dot/underscore tool names; double Zod validation + hand-duplicated advertised schemas; 2-fd leak per training job; unbounded body in the unwired HTTP scaffold; dead detectCogBinary candidates; ruview bin-name collision ADR-265 — cross-cutting npm distribution strategy: npm-packages.yml CI matrix (test + pack-content/size gate + tarball-install smoke test), publish-from-CI-only with npm provenance, version single-sourcing from package.json, bin/namespace ownership (ruview bin belongs to @ruvnet/ruview), claim-check on package READMEs. Docs only — no runtime code changed. Index/CHANGELOG/CLAUDE.md/README counts updated. 
Co-Authored-By: claude-flow <ruv@ruv.net> Claude-Session: https://claude.ai/code/session_01WrGfTGKv1oWZ6iwXZACULz * fix(npm): implement ADR-263/264/265 — harness fail-closed + async MCP, rvagent packaging/transport/naming, npm CI+provenance gate ADR-263 (@ruvnet/ruview 0.2.0), O1-O9: - claim-check fails closed on empty input (CLI exit 2, empty_text tool error) - MCP stdio server dispatches tools/call asynchronously (promise-based spawn); ping answers while a 3s fake verify runs — pinned by new e2e test - optionalDependencies dropped: cold npx installs exactly 1 package (MEASURED: was 4 pkgs/620kB/71 files via npm i in a clean prefix) - bounded rolling output tails replace spawnSync 1MiB maxBuffer - node_monitor port passed via sys.argv, never spliced into python -c source - serverInfo.version read from package.json; resources/prompts stubs - skills single-sourced: prepack sync script generates .claude/skills/ copies - which() = memoized dep-free PATH scan - tools underscore-canonical (ruview_claim_check, ...) + dotted aliases - guardrail precision: word-boundary map/f1/auc/iou, code-span + F1/O2 label scrubbing, quantitative-claims-only; packaging reproducer hints - 30/30 tests (was 17), incl. concurrency e2e + fail-open regression pins ADR-264 (@ruvnet/rvagent 0.2.0), O1-O9: - exports fixed: types-first, phantom dist/index.cjs require target removed - tarball map-free: 127,704B unpacked / 46 files / 0 maps (MEASURED, npm pack --dry-run; was 188kB incl. 
44 maps referencing unshipped src) - Streamable HTTP actually wired behind RVAGENT_HTTP_PORT: one transport + one MCP server per session (mcp-session-id routing), 1MiB body cap (413), port-aware localhost origin gate; dual-transport description now true - tools renamed underscore-canonical with dotted router-only aliases - single Zod validation gate; advertised inputSchema generated from the same Zod source (zod-to-json-schema) - train_count: parent log fds closed (was leaking 2/job); job records persisted to <jobsDir>/<id>.json (job_status survives restarts); bounded log-tail reads - detectCogBinary probes its candidates instead of dead-coding them - version from package.json; @types/express dropped; @types/jest -> 29 - README rewritten to match reality (no phantom subcommands/policy layer) - 99/99 jest tests (incl. new session/body-cap suite + previously-broken manifest suite); stdio handshake + HTTP session flow smoke-tested live ADR-265 D1-D4: - .github/workflows/npm-packages.yml: 3-package x Node 20/22 gate — tests, version-literal grep (D3), pack-content/size gate, tarball-install smoke test (catches the ADR-264 F1 class), README claim-check (D4) - .github/workflows/ruview-npm-release.yml: publish from CI only with npm publish --provenance - @ruv/ruview-cli bin renamed ruview-cli (ruview bin belongs to @ruvnet/ruview); version single-sourced - ci.yml NODE_VERSION 18 -> 20 ADR statuses updated to Accepted/implemented; harness manifest re-pinned; ADR-263/264/265 + both package READMEs pass claim-check. Co-Authored-By: claude-flow <ruv@ruv.net> Claude-Session: https://claude.ai/code/session_01WrGfTGKv1oWZ6iwXZACULz * perf(rvagent): lazy-load HTTP transport + memoize generated tool schemas stdio time-to-first-response ~242ms -> ~189ms (-22%; MEASURED, median of repeated initialize round-trips against dist/index.js in this container). 
- ./http-transport.js now imported lazily inside the RVAGENT_HTTP_PORT branch: it chain-loads the MCP SDK streamableHttp module (~48ms MEASURED via per-module import() timing) which the default stdio path never uses - toolInputJsonSchema memoized per tool: schemas are static for the process lifetime; under the session-per-server HTTP model every session calls tools/list, so stop re-walking the Zod tree each time No behavior change: 99/99 jest tests; HTTP session flow re-smoke-tested through the lazy import path (initialize -> 200 + mcp-session-id). Profiled @ruvnet/ruview too and left it alone: 50ms CLI startup vs ~29ms bare 'node -e ""' floor on the same box (MEASURED) — already near the interpreter floor with zero dependencies. Co-Authored-By: claude-flow <ruv@ruv.net> Claude-Session: https://claude.ai/code/session_01WrGfTGKv1oWZ6iwXZACULz * ci(ruview-cli): pass jest --passWithNoTests so the private no-test package doesn't fail the npm-packages matrix Co-Authored-By: claude-flow <ruv@ruv.net> * fix(npm): address 10 verified review findings in harness + rvagent before 0.2.0 publish harness/ruview (@ruvnet/ruview): - guardrails: digit gate now sees numbers inside code spans; F1-style metric tokens followed by ':' or a nearby number are no longer scrubbed (fail-open regressions in the honesty gate) - mcp-server: tools/call requests serialize through a FIFO promise chain (hardware/mutating tools never overlap) while ping/tools/list stay immediate; stdin close drains in-flight responses before exit - tools: which() no longer memoizes negative lookups tools/ruview-mcp (@ruvnet/rvagent): - index: realpath invoked-directly guard — library import no longer connects a stdio transport to the consumer's process - http-transport: explicit allowedOrigins is exact-match only (localhost any-port convenience applies only with no configured allowlist); session map gains maxSessions=64 + 5min idle TTL sweep - train-count: job records persist the child pid and reconcile stale 
'running' status after a server restart (exit-code marker or dead pid) - config: cog binary candidates ordered by process.arch .github/workflows/ruview-npm-release.yml: port the full ADR-265 D1 gate (version-literal check, unpacked-size budget, tarball-install smoke test) from npm-packages.yml so the publish path enforces what the header claims. Tests: harness 30→36, rvagent 99→112, all passing. Co-Authored-By: claude-flow <ruv@ruv.net> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-07-02 13:11:15 -04:00
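The guardrail precision fix in the commit above (substring hits on 'map' and 'F1' replaced by word-boundary matching) can be illustrated with a minimal sketch. This is not the harness's code: the term list is taken from the commit message, and the function name `metric_terms` and the regex are assumptions for illustration only.

```python
import re

# Terms named in the commit message; the real METRIC_TERMS list may be longer.
METRIC_TERMS = ("map", "f1", "auc", "iou")

# \b anchors each term at word boundaries, so "roadmap" no longer hits "map".
_PATTERN = re.compile(r"\b(?:%s)\b" % "|".join(METRIC_TERMS), re.IGNORECASE)


def metric_terms(text):
    """Return metric terms that appear as whole words (hypothetical helper)."""
    return [m.group(0).lower() for m in _PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "The roadmap mentions F1 and mAP."
    print("map" in sample.lower())   # naive substring check also fires on "roadmap"
    print(metric_terms(sample))      # whole-word matches only
```

A whole-word match still fires on labels such as "ADR-264 F1", which is presumably why the commit adds separate label scrubbing on top of the boundary check.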
rUv	1a0492992f	fix(build): repoint stale Makefile targets to v2/ and archive/v1/ (#1201) The root Makefile still referenced pre-reorg paths that no longer exist, so the documented build/test/run entrypoints were all broken: - rust-port/wifi-densepose-rs/ -> v2/ (build-rust, build-wasm[-mat], test-rust, bench, clean cargo step) - v1.src.api.main -> archive.v1.src.api.main (run-api, run-api-dev) test-rust now uses the documented `--no-default-features` invocation (the proven-passing, GPU-free path). Verified: `cd v2 && cargo test --workspace --no-default-features --no-run` compiles the full workspace clean. Surfaced during a metaharness review (the broken root build entrypoint is why genome reported build:none / publish_readiness 0.40 at the repo root). Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36	2026-06-27 18:43:09 -04:00
rUv	c0d3d7c792	chore(firmware): add release guard against stale-sdkconfig partition mismatch (#1194) While cutting v0.8.3-esp32, an incremental 8MB build reused a leftover generated `sdkconfig` and silently linked the 4MB dual-OTA partition layout (no spiffs, ota_1 @ 0x1F0000) — the would-be released `partition-table.bin` did not match the 8MB `partitions_display.csv` it claimed. scripts/firmware-release-guard.sh regenerates the expected partition table from the CSV the named flash-size variant must use and byte-compares it to the built `partition-table.bin`, and cross-checks flash size in flasher_args.json. Fails closed so a release pipeline can't ship a mismatched table. Usage: scripts/firmware-release-guard.sh <8mb|4mb> <build-dir> Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36 v0.8.3-esp32	2026-06-27 13:21:05 -04:00
rUv	fca5e6f0a0	fix: multistatic canonicalization, csi_fps burst inflation, control-packet starvation (#1170 , #1180 , #1183 ) (#1193 ) #1170 — live multistatic bridge fed raw, un-canonicalized per-node CSI (64/128/192 bins) to MultistaticFuser, tripping DimensionMismatch every cycle and silently disabling fusion on mixed HT20/HT40 meshes. Add HardwareNormalizer::resample_to_canonical (resample-only, no z-score) and canonicalize every node frame onto the 56-tone grid before fusion. #1180 — update_csi_fps_ema only rejected dt<=0 or >=1s, so sub-ms UDP-burst arrivals (36us -> ~27kHz) inflated csi_fps_ema 40-840x. Add a 5ms plausibility floor and stop re-anchoring observe_csi_frame_arrival on burst deltas. #1183 — global ENOMEM backoff (CSI flood) starved <=48B/<=1Hz control packets. Add stream_sender_send_priority() bypassing the backoff gate without touching the streak; route feature_state/HEALTH/sync through it. Fix the misleading "HEALTH sent" log that printed even on rv_mesh_send failure. Verified: signal 501, sensing-server 677 tests (0 failed); firmware builds clean. Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36	2026-06-27 13:04:44 -04:00
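The #1180 fix above (a 5 ms plausibility floor on inter-arrival deltas, with no re-anchoring on burst deltas) can be sketched as follows. This is an illustrative model only, not the firmware's `update_csi_fps_ema`: the class name, the smoothing factor `ALPHA`, and the choice to re-anchor on gaps of 1 s or more are assumptions.

```python
MIN_DT_S = 0.005  # 5 ms plausibility floor (from the commit message)
MAX_DT_S = 1.0    # deltas of 1 s or more were already rejected
ALPHA = 0.1       # assumed smoothing factor; the real value is not stated


class FpsEma:
    """Exponential moving average of frame rate with a burst guard (sketch)."""

    def __init__(self):
        self.last_t = None  # timestamp of the last accepted arrival, seconds
        self.fps = None     # smoothed frames per second

    def observe(self, t):
        if self.last_t is None:
            self.last_t = t
            return self.fps
        dt = t - self.last_t
        if dt < MIN_DT_S:
            # Burst arrival (or non-positive delta): ignore it and keep the
            # old anchor, so the next real delta is measured from a sane point.
            return self.fps
        self.last_t = t
        if dt >= MAX_DT_S:
            # Assumption: long gaps re-anchor but do not update the estimate.
            return self.fps
        inst = 1.0 / dt
        self.fps = inst if self.fps is None else (1 - ALPHA) * self.fps + ALPHA * inst
        return self.fps


if __name__ == "__main__":
    ema = FpsEma()
    # 20 Hz stream with one 36 microsecond burst delta mixed in; without the
    # floor that delta alone would read as roughly 27 kHz.
    for t in (0.0, 0.05, 0.050036, 0.10):
        ema.observe(t)
    print(ema.fps)
```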
rUv	7831f29436	fix(firmware): phantom LD2410 detection + ENOMEM backoff (#1135 ) (#1159 ) Bug #2 (root cause): LD2410 probe-detection matched only the 4-byte head 0xF4F3F2F1, so a floating UART at 256000 baud could phantom-detect a sensor and spawn a UART task. Now requires a full validated report frame (head + sane length + tail 0xF8F7F6F5), extracted to mmwave_detect.h and shared with a host unit test (test_mmwave_detect.c, 8 vectors) so firmware and test can't diverge. Matches the validate-before-trust approach used for MR60 in #1107. Bug #1: sendto ENOMEM used a fixed 100 ms backoff too short to drain sustained lwIP/WiFi buffer pressure, so a node could stay stuck. Now exponential (100->200->...->2000 ms per consecutive ENOMEM, reset on first successful send). Removing the phantom LD2410 task (bug #2) also removes the extra load that tipped the reporter's tier-2 node into the stuck state. Validated on ESP32-S3 QFN56 rev v0.2 (the reporter's silicon): tier-2 streams ~100 frames/s with no stuck ENOMEM and correctly reports no mmWave (no phantom). LD2410 predicate truth table proven (head-without-tail REJECTED). Could not reproduce the reporter's environment-specific floating-pin noise, so the deterministic proof is the host unit test. v0.8.2-esp32	2026-06-22 12:31:21 -04:00
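The ENOMEM backoff described above (100 -> 200 -> ... -> 2000 ms per consecutive failure, reset on the first successful send) can be sketched in a few lines. The class and method names are hypothetical, and the doubling factor is inferred from the "100->200" sequence in the commit message rather than confirmed from the firmware source.

```python
BASE_MS = 100   # first backoff step (from the commit message)
CAP_MS = 2000   # upper bound (from the commit message)


class EnomemBackoff:
    """Tracks consecutive ENOMEM failures and yields the next delay (sketch)."""

    def __init__(self):
        self.streak = 0  # consecutive ENOMEM count

    def on_enomem(self):
        """Record one more failure and return the delay to wait, in ms."""
        self.streak += 1
        return min(BASE_MS * 2 ** (self.streak - 1), CAP_MS)

    def on_success(self):
        """A successful send clears the streak, so the next failure waits 100 ms."""
        self.streak = 0


if __name__ == "__main__":
    b = EnomemBackoff()
    print([b.on_enomem() for _ in range(7)])
    b.on_success()
    print(b.on_enomem())
```

Capping the delay keeps a node from backing off indefinitely, while resetting on the first success lets it return to full rate as soon as buffer pressure clears.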
rUv	4bf88e1283	feat(firmware): gate LED gamma viz behind CONFIG_LED_GAMMA_VIZ (ADR-183 follow-up) (#1129 ) The 40 Hz gamma flicker is now Kconfig-gated (default y, unchanged behaviour). Set CONFIG_LED_GAMMA_VIZ=n for a dark, lower-power boot (the LED is simply cleared) — important for photosensitive deployments, no source edit needed. The colormap saturation point is now operator-tunable via CONFIG_LED_MOTION_FULLSCALE_MILLI (default 250 = 0.25). Build + flash confirmed on ESP32-S3 N16R8 (COM8): default y still arms the gamma timer, CSI flows. ADR-183 updated (gate moved from follow-up to done).	2026-06-17 22:22:20 -04:00
rUv	a4c2935a2f	feat(firmware): onboard LED 40 Hz gamma stimulus + CSI-motion colour (ADR-183) (#1127 ) * chore(deps): bump ruv-neural submodule — ColorMap no_std for ESP32 Points to ruvnet/ruv-neural#3 (c9638fa): ruv-neural-viz::ColorMap now builds no_std, so it can run on the ESP32. Unblocks driving the onboard WS2812 from the viridis/cool-warm colormap. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(firmware): onboard LED as 40 Hz gamma stimulus, colour from live CSI motion (ADR-183) The S3 onboard WS2812 (GPIO 48, #962) now runs a GENUS-style 40 Hz gamma square wave (12.5 ms on/off, 50% duty). The ON-phase colour is live CSI motion (edge motion_energy) mapped through a 60-step viridis LUT generated from ruv-neural-viz::ColorMap::viridis() — still=purple, moving=yellow. Uses the now-no_std ColorMap (ruvnet/ruv-neural#3 / #1126). Hardware- confirmed on ESP32-S3 N16R8 (COM8): boot log shows the timer armed, CSI keeps flowing (27-38 pps). Honesty + photosensitivity notes + a Kconfig-gate follow-up are in ADR-183. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-17 21:04:02 -04:00
rUv	315d7df09e	chore(deps): bump ruv-neural submodule — ColorMap no_std for ESP32 (#1126 ) Points to ruvnet/ruv-neural#3 (c9638fa): ruv-neural-viz::ColorMap now builds no_std, so it can run on the ESP32. Unblocks driving the onboard WS2812 from the viridis/cool-warm colormap.	2026-06-17 20:18:35 -04:00
rUv	bdd1eaf927	chore: untrack ruvector.db runtime artifacts + gitignore at any depth (#1124) These are hook/runtime-generated databases (ruvector/intelligence store) that kept showing as dirty and don't belong in version control. Removed from the index (files kept on disk) and ignored globally.	2026-06-17 17:49:47 -04:00
rUv	4001e9e178	feat(harness): npx @ruvnet/ruview operator harness + ADR-182 (#1123 ) A host-portable RuView agent harness minted via MetaHarness and hardened per ADR-182. Published as @ruvnet/ruview@0.1.0 (bare `ruview` blocked by npm's typosquat filter → scoped fallback). What it does: - 6 fail-closed `ruview.*` tools (onboard, claim_check, verify, node_monitor, calibrate, node_flash) exposed as CLI verbs + a dependency-free MCP stdio server. - The "prove everything" rule made executable: `ruview.claim_check` flags untagged accuracy claims and the retracted "100%" framing. - 5 host-neutral skills (onboard/provision-node/calibrate-room/ train-pose/verify) + bundled .claude/ config + provenance manifest. Validated: 17/17 unit tests, live MCP handshake, `ruview.verify` ran the real verify.py to VERDICT: PASS, clean `npx @ruvnet/ruview` from registry. Packs to 16.7 kB / 21 files; kernel+host are optionalDependencies so the operator tools install lightweight. README: documented as the portable, multi-host companion to the in-repo plugins/ruview/ Claude Code plugin (not a replacement).	2026-06-17 17:46:31 -04:00
rUv	65e29ef47a	fix(display): no false display-detect on bare DevKit → CSI starves at MGMT-only (#1000 ) (#1121 ) The SH8601 QSPI panel is write-only, so display_hal_init_panel() 'succeeds' even on a bare display-less board — display_is_active() then returned true and main.c skipped the #893/#906 MGMT->MGMT+DATA CSI filter upgrade (yield=0pps). Gate on the FT3168 touch I2C readback (always present on the Touch-AMOLED board, absent on a bare DevKit): if touch is absent, the panel 'success' was a false-positive — bail to headless before the display task starts, so display_is_active() stays false and CSI captures. Co-authored-by: ruv <ruvnet@gmail.com> v0.8.1-esp32	2026-06-17 11:24:53 -04:00
rUv	cb30988cf9	fix(mmwave): require validated MR60 header on probe — no false detect on empty UART (#1107 ) (#1119 ) probe_at_baud counted bare 0x01 (SOF) bytes and declared MR60BHA2 on a single one. A floating UART1 with no sensor reads noise full of 0x01s → false 'Detected MR60BHA2 (caps=0x000f)'. Now a candidate must be a full 8-byte header with a valid header checksum (bytes 0..6) AND a known frame type (0x0A__ / 0x0F09), and clear the ≥3 threshold; removed the weak single-hit fallback. Real sensors stream valid frames continuously, so detection of present hardware is unaffected. Co-authored-by: ruv <ruvnet@gmail.com>	2026-06-17 11:24:23 -04:00
rUv	128b129474	Merge pull request #1120 from ruvnet/fix/1007-paired-data-pipeline fix(paired-data): 4 bugs in CSI recorder + ground-truth aligner (#1007)	2026-06-17 10:26:14 -04:00
ruv	15a983b555	fix(paired-data): 4 bugs corrupting/blocking camera-supervised training data (#1007 ) 1. record-csi-udp.py stamped LOCAL time with a 'Z' (UTC) suffix → camera/CSI disagreed by the UTC offset → 0 aligned pairs. Now writes true UTC via datetime.now(timezone.utc). 2. align-ground-truth.js kept empty-keypoint (non-detection) records at confidence 0, collapsing window avgConf below threshold → all windows rejected. Now skipped at load. 3. extractCsiMatrix silently zero-padded/truncated mixed-subcarrier frames. Now frames are filtered to the session's modal subcarrier count before windowing — never padded. 4. CSI/feature matrices are filled frame-major but were labeled shape [nSc, nFrames] — transposed. Labels corrected to [nFrames, nSc] / [nFrames, dim]. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-17 10:17:12 -04:00
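Two of the four fixes above lend themselves to a short sketch: writing true UTC before appending the 'Z' suffix (bug 1), and filtering frames to the session's modal subcarrier count instead of padding (bug 3). The helper names `utc_stamp` and `filter_to_modal_width` are hypothetical, as is the millisecond precision; only `datetime.now(timezone.utc)` is taken from the commit message.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def utc_stamp(now=None):
    """ISO-8601 timestamp that is genuinely UTC before it is labeled 'Z'."""
    now = now or datetime.now(timezone.utc)
    # astimezone converts any aware datetime to UTC; a naive datetime would be
    # interpreted as local time, so callers should pass aware values.
    utc = now.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def filter_to_modal_width(frames):
    """Keep only frames whose length equals the most common length."""
    if not frames:
        return []
    modal, _count = Counter(len(f) for f in frames).most_common(1)[0]
    return [f for f in frames if len(f) == modal]


if __name__ == "__main__":
    # 10:17:12 at UTC-4 is 14:17:12 UTC; the old bug would have written 10:17:12Z.
    local = datetime(2026, 6, 17, 10, 17, 12, tzinfo=timezone(timedelta(hours=-4)))
    print(utc_stamp(local))
    frames = [[0.0] * 64, [0.0] * 64, [0.0] * 128, [0.0] * 64]
    print([len(f) for f in filter_to_modal_width(frames)])
```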
rUv	c6e7667676	Merge pull request #1104 from ruvnet/fix/issue-1049-configurable-guard fix(sensing-server): make multistatic guard interval configurable (closes #1049)	2026-06-17 09:53:23 -04:00
rUv	d639c747df	Merge pull request #1114 from ruvnet/examples/through-wall-tools examples(through-wall): ESP32 sensor auto-detection + WiFlow analysis tools	2026-06-16 17:02:38 -04:00
ruv	42c764652d	examples(through-wall): ESP32 sensor auto-detection + WiFlow analysis tools - wiflow_browser.html: auto-detect live ESP32 nodes from the /ws/sensing stream and lock them as the model schema (NODE_IDS/CSI_DIM dynamic), persisted + restorable - wiflow_ab.py: leakage-controlled A/B (chronological/random/blocked-gap/grouped-bucket, multi-seed) — the honest CSI→pose evaluation harness - wiflow_capture.py / wiflow_train.py / wiflow_infer.py: camera-paired capture + train + infer - pose.html: live WiFi-inferred skeleton viewer; serve.py: static server - gitignore the regenerable 1.5MB model.npz artifact Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 17:00:57 -04:00
rUv	db02956c22	Merge pull request #1113 from ruvnet/chore/bump-ruv-neural-submodule chore: bump ruv-neural submodule to current main	2026-06-16 17:00:56 -04:00
ruv	c84ea39e62	chore: bump ruv-neural submodule → current main (web console, closed-loop neuromod, ruvn mention) Advances the vendored ruv-neural submodule from the stale 'Initial' pin (1ece3af) to current main (81be9e1): the static web console UI, the closed-loop neuromodulation platform, repositioned README, and the @ruvnet/ruvn companion-tool mention. ruv-neural is not a v2 workspace member, so this does not affect the workspace build (cargo metadata resolves clean). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 17:00:13 -04:00
rUv	760d05026c	Merge pull request #1112 from ruvnet/chore/extract-swarm-worldgraph-submodules Extract ruview-swarm → ruvnet/ruv-drone and world crates → ruvnet/worldgraph (submodules)	2026-06-16 16:49:58 -04:00
ruv	a784546918	ci(ruview-swarm): drop removed itar-unrestricted feature from test matrix The industrial rescope (ruv-drone) removed the itar-unrestricted feature flag — formation/allocation/raft/flight-control are now default capabilities. Update the 'ruflo+itar' matrix entry to just '--features ruflo' so CI matches the new feature set. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 16:34:06 -04:00
ruv	9c751d0d92	chore(worldgraph): bump submodule — README + metadata polish Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 14:52:34 -04:00
ruv	a13e9b66cb	chore: bump ruv-drone + worldgraph submodules (LICENSE + CI polish) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 14:43:10 -04:00
ruv	6db183bf3e	chore(swarm): bump ruv-drone submodule — README cleanup Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 14:35:06 -04:00
ruv	f65d0f79e7	chore(swarm): bump ruv-drone submodule (rescope + stray-file cleanup) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 14:28:30 -04:00
ruv	7fb3b88061	chore(swarm): bump ruv-drone submodule — industrial rescope (drop ITAR/USML gating) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 14:27:24 -04:00
ruv	aeac5f5543	chore(worldgraph): extract geo+worldgraph+worldmodel to ruvnet/worldgraph submodule - published as github.com/ruvnet/worldgraph (3-crate workspace, history via git-filter-repo) - replace the 3 in-tree crates with one submodule at v2/crates/worldgraph - parent workspace: drop the 3 members, exclude the submodule (it is its own workspace), repoint workspace.dependencies(worldmodel) + engine/sensing-server path-deps into it - cargo metadata resolves clean (geo/worldgraph/worldmodel consumed from the submodule) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 14:14:34 -04:00
ruv	c257e67c3d	chore(swarm): extract ruview-swarm to ruvnet/ruv-drone submodule - ruview-swarm published as github.com/ruvnet/ruv-drone (history preserved via subtree split) - replace the in-tree crate with a submodule at v2/crates/ruview-swarm (branch main) - standalone repo dropped the unused wifi-densepose-core path-dep; export-control NOTICE added there - workspace member path unchanged; cargo metadata resolves ruview-swarm from the submodule Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-16 14:03:56 -04:00
ruv	a4d5ea88f3	feat(examples): in-browser WiFlow trainer + camera-supervised pipeline + ADR-180/181/181A Tonight's real WiFlow work, all honest: - examples/through-wall/: live 2-node CSI demo (index.html), the WiFlow camera-supervised pipeline (wiflow_capture/train/infer.py — proven +9.4pp over mean-pose baseline on ruvultra), the live pose viewer (pose.html), and the COMPLETE in-browser trainer (wiflow_browser.html): 4-stage calibrate->capture->train->infer, TF.js WebGPU/WASM/WebGL, MediaPipe camera supervision, IndexedDB persistence, mean-pose-baseline honesty. - ADR-180 (through-wall hand-off demo), ADR-181 (full browser WiFlow, WASM+WebGPU, calibration phase, mobile/secure-context matrix), ADR-181A (binary CSI framing protocol). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 17:31:19 -04:00
ruv	ebe217569b	feat(examples): real live WiFi-CSI through-wall sensing demo Self-contained Three.js r128 demo at examples/through-wall/ that renders ONLY genuine data streamed from the running sensing-server over ws://localhost:8765/ws/sensing. No simulation, no fabricated frames, no fake skeleton. Renders, driven by real /ws/sensing frames: - 20x20 signal_field floor heatmap (real values) - coarse RF-localization puck from persons[0].position (labeled coarse, NOT pose; peak signal_field cell as fallback) - live motion/breathing/variance/rssi bars + motion sparkline - presence/confidence/estimated_persons/active-node/tick/Hz meters - 3D room with wall + doorway dividing office (node 9) / hallway (node 13) - honest mutually-exclusive banner: LIVE (source=esp32) / SIMULATED / NO SERVER, showing the real source verbatim - optional webcam tile (ground-truth-when-visible, separate from CSI) Reuses scene/lights/bloom/CSS + webcam path from examples/three.js/demos/05-skinned-realtime.html, the floor-heatmap idea from ui/observatory/js/, and the threaded no-cache server from examples/three.js/server/serve-demo.py (serve.py on :8080). Verified against the live server: real frame source=esp32, nodes [9,13], 400 signal_field values, persons[0].position present. Python proof PASS. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 16:20:49 -04:00
ruv	c27d6cc98e	fix(sensing-server): make multistatic guard interval operator-configurable (#1049 ) Two ESP32-S3 nodes on WiFi/ESP-NOW sync drift 10-150 ms (~70 ms typ.), exceeding the 60 ms default guard → permanent trust demotion to Restricted, all pose output suppressed, 200k+ errors, no escape but a container restart. Add a direct WDP_GUARD_INTERVAL_US override (+ optional WDP_SOFT_GUARD_US) to multistatic_guard_config_from_env. Precedence (most-specific wins): direct override > WDP_TDM_SLOTS+WDP_TDM_SLOT_US schedule-derived > 60ms/20ms default. Soft band always clamped strictly below hard; malformed/zero ignored (falls back, never breaks fusion). Effective guard logged at startup. Pinned by 6 tests (multistatic_guard_config_tests). sensing-server bin tests 449 -> 455, 0 failed. Python proof PASS, hash unchanged (off signal path). Closes #1049. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 13:41:43 -04:00
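The precedence rule above (direct override, then schedule-derived, then the 60 ms / 20 ms default, with the soft band clamped strictly below the hard guard) can be sketched as a small resolver. The environment variable names come from the commit message. Everything else is an assumption for illustration: the function name, the `slots * slot_us` derivation, and the handling of an unset soft value do not reflect the actual `multistatic_guard_config_from_env`.

```python
DEFAULT_HARD_US = 60_000  # 60 ms default guard (from the commit message)
DEFAULT_SOFT_US = 20_000  # 20 ms default soft band (from the commit message)


def _positive_int(env, key):
    """Parse env[key] as a positive integer; malformed or zero yields None."""
    try:
        value = int(env.get(key, ""))
    except ValueError:
        return None
    return value if value > 0 else None


def guard_config(env):
    """Return (hard_us, soft_us), most-specific source first (sketch)."""
    hard = _positive_int(env, "WDP_GUARD_INTERVAL_US")
    if hard is None:
        slots = _positive_int(env, "WDP_TDM_SLOTS")
        slot_us = _positive_int(env, "WDP_TDM_SLOT_US")
        if slots is not None and slot_us is not None:
            # Assumption: the schedule-derived guard is one full TDM cycle.
            hard = slots * slot_us
    if hard is None:
        hard = DEFAULT_HARD_US
    soft = _positive_int(env, "WDP_SOFT_GUARD_US") or DEFAULT_SOFT_US
    # The soft band is always clamped strictly below the hard guard.
    return hard, min(soft, hard - 1)


if __name__ == "__main__":
    print(guard_config({}))
    print(guard_config({"WDP_GUARD_INTERVAL_US": "150000"}))
    print(guard_config({"WDP_GUARD_INTERVAL_US": "oops",
                        "WDP_TDM_SLOTS": "4", "WDP_TDM_SLOT_US": "10000"}))
```

Treating malformed or zero values as absent means a bad setting falls through to the next source instead of disabling fusion, which matches the "falls back, never breaks fusion" behavior described in the commit.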
rUv	cafbeb1e81	fix(wasm-edge): sanitize non-finite host floats at the WASM↔host frame boundary (#1102 ) Closing beyond-SOTA security review of wifi-densepose-wasm-edge (ADR-040, ~70 edge modules). The two WASM↔host boundaries (lib.rs::on_frame/on_timer and bin/ghost_hunter.rs::on_frame) read raw IEEE-754 f32 from the csi_get_* imports with no finiteness check — the crate had zero is_finite/is_nan guards and its clamp helpers propagate NaN. A single non-finite host value latches NaN into long-lived per-module accumulators (EMA / Welford / phasor sums / anomaly baselines), after which detectors fail degraded (stuck gate state, silently-disabled checks) — silent corruption, not a crash. Add sanitize_host_f32() (non-finite -> 0.0, core-only for no_std) applied at every host_get_* float read: one chokepoint covering all downstream modules, mirroring the existing M-01 negative-n_subcarriers boundary clamp. LOW / defense-in-depth (the Tier-2 DSP firmware supplies the imports, a semi-trusted boundary). Pinned by boundary_tests::{sanitize_passes_finite_values_through, sanitize_maps_non_finite_to_zero, coherence_monitor_nan_latches_without_sanitize_but_not_with} — the last asserts on the current CoherenceMonitor that a raw NaN frame latches the smoothed score while the sanitized path stays finite. Other review dimensions attested clean with evidence (see CHANGELOG): no hot-path panics (all unwrap/expect are test-only or std-gated RVF builder), all bounds min()-clamped, all index-by-cast const-bounded or guarded, no leaking closures (no move\|\|/forget/leak), no secrets. Verified: host `cargo test --features std,medical-experimental` 672 passed / 0 failed (+3 new tests); all three wasm32-unknown-unknown release artifacts build clean (lib default no_std/panic=abort, ghost_hunter standalone-bin, medical-experimental); Python proof VERDICT PASS, hash unchanged.	2026-06-15 13:06:46 -04:00
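The NaN-latching failure described above is easy to reproduce outside the crate. The sketch below is a Python analogue, not the Rust `sanitize_host_f32`: it shows a single non-finite sample permanently poisoning an exponential moving average, and the same stream staying finite when each read is sanitized at the boundary. The `ema` helper and its smoothing factor are assumptions for illustration.

```python
import math


def sanitize_host_f32(x):
    """Map any non-finite value (NaN, +Inf, -Inf) to 0.0; pass others through."""
    return x if math.isfinite(x) else 0.0


def ema(values, alpha=0.2, sanitize=False):
    """Final EMA state over values, optionally sanitizing each input first."""
    state = None
    for v in values:
        if sanitize:
            v = sanitize_host_f32(v)
        state = v if state is None else (1 - alpha) * state + alpha * v
    return state


if __name__ == "__main__":
    stream = [1.0, float("nan"), 1.0, 1.0]
    print(ema(stream))                 # NaN: the accumulator never recovers
    print(ema(stream, sanitize=True))  # finite: the bad sample is read as 0.0
```

Sanitizing at the single point where host values enter keeps every downstream accumulator finite without adding checks to each module, which is the chokepoint design the commit describes.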
rUv	c859f6f743	security(occworld-candle): int32-checkpoint crash + degenerate-input guards + ADR-179 (closes Milestone #9 ) (#1101 ) * fix(occworld-candle): security review fixes — int32 checkpoint crash + predict input validation Beyond-SOTA security + correctness review of wifi-densepose-occworld-candle (Milestone #9, crate 4/4 — the last ungated crate). Findings fixed: 1. HIGH (MEASURED) — checkpoint-load crash on any int32 tensor. model.rs mapped safetensors I32 -> candle DType::I64 and passed the raw int32 byte buffer (4 bytes/elem) to Tensor::from_raw_buffer(.., I64, ..). Candle derives elem_count = data.len() / dtype.size(), so the I64 path halved the count while keeping the original shape -> a tensor whose shape claims 2x its storage. Reading it PANICS (slice OOB: "range end index 6 out of range for slice of length 3") on any checkpoint containing an int32 tensor. Fixed: I32 -> DType::I32, I16 -> DType::I16 (both first-class candle dtypes). Reproduced on old code; pinned in tests/checkpoint_loading.rs. 2. LOW (MEASURED) — predict() lacked frame/batch validation at the input boundary. f_in > num_frames2 over-indexed the temporal embedding (cryptic candle "gather" error); zero frame/batch fed a zero-element tensor in. Now rejected with a clear ShapeMismatch. Pinned in tests/input_validation.rs. 3. LOW (MEASURED) — divide-by-zero panic in the public VQCodebook::encode on a rank-0 / empty-last-dim tensor (last == 0). Now fails closed with a clear error. Pinned in vqvae.rs unit tests. Dimensions confirmed clean with evidence: panic surface (no unwrap/expect/ panic in prod paths), NaN-state-poisoning (N/A — stateless engine, u8 input), unbounded-alloc/shape-data mismatch (defended upstream by safetensors:: validate), secrets (none). unsafe_code = forbid. 
Validation (MEASURED, Windows): crate 31/31 pass; workspace 0 failed (lone desktop api_integration "Access is denied" file-lock flake passes 21/21 in isolation); Python proof VERDICT PASS, hash f8e76f21…446f7a unchanged. Warrants ADR slot 179 (parent to author). Co-Authored-By: claude-flow <ruv@ruv.net> docs(adr): ADR-179 — occworld-candle checkpoint-load hardening (closes Milestone #9) Records the HIGH int32-checkpoint crash fix (I32→I64 dtype-widening → slice-OOB panic on load = DoS) + 2 LOW degenerate-input fixes from 5e77f47e5. Stateless engine (NaN-poisoning N/A), unsafe forbidden, safetensors validate() defends malloc upstream. occworld 31/31. Final ungated crate — Milestone #9 complete. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 12:35:29 -04:00
rUv	10c813fde3	security(desktop): IPC serial-command-injection + over-broad shell capability + ADR-178 (#1100 ) * fix(security): desktop IPC serial-command-injection + over-broad shell capability (ADR-178) Beyond-SOTA security review of wifi-densepose-desktop (Tauri v2). Two real findings, each MEASURED on Windows (crate builds + tests under --no-default-features): WDP-DESK-01 (MODERATE) — serial command injection via configure_esp32_wifi. The #[tauri::command] handler concatenated webview-supplied ssid/password into newline-terminated serial commands with no validation; a \r\n let a compromised webview inject an arbitrary follow-up firmware command (reboot/erase). Added validate_wifi_credentials() enforcing WPA2 length bounds and rejecting all control characters, called fail-closed before any serial write. Pinned by 3 new tests (rejects \r\n / \n / NUL injection, rejects out-of-range, accepts valid boundaries). WDP-DESK-02 (MODERATE) — removed unused shell:allow-execute / shell:allow-open from capabilities/default.json. The Rust backend spawns processes via std::process::Command (bypassing the allowlist) and the UI only uses dialog.open; the shell perms were unused privilege granting the webview arbitrary host command execution on compromise. Regenerated capabilities.json confirms only core:default + dialog perms remain. lib tests 18 -> 21 (+3 pins), integration 21 -> 21, 0 failed. Python deterministic proof unchanged (f8e76f21...46f7a; desktop off the signal path). Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-178 — desktop IPC injection fix + capability least-privilege Records the 2 MEASURED MODERATE fixes in feddcde9d: WDP-DESK-01 (webview ssid/password \r\n-injected arbitrary firmware serial commands → validated fail-closed) and WDP-DESK-02 (unused shell:allow-execute/open capability granted to the webview → removed). 30-command IPC surface + capability scope audited; 6 dimensions clean-with-evidence. desktop 18→21. 
Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 12:01:17 -04:00
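The WDP-DESK-01 fix above (validate credentials fail-closed before any serial write) can be sketched as follows. This is a Python illustration, not the Rust `validate_wifi_credentials`: the exact bounds the crate enforces are not stated in the commit, so the limits here are the standard ones (SSID of 1 to 32 bytes, WPA2 passphrase of 8 to 63 characters) and should be read as assumptions.

```python
def validate_wifi_credentials(ssid, password):
    """Raise ValueError unless both fields are safe to embed in a serial command."""
    for name, value in (("ssid", ssid), ("password", password)):
        # Reject all ASCII control characters, including \r, \n and NUL, so a
        # value can never terminate the command line and start a new command.
        if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in value):
            raise ValueError(f"{name} contains a control character")
    if not 1 <= len(ssid.encode("utf-8")) <= 32:
        raise ValueError("ssid must be 1 to 32 bytes")
    if not 8 <= len(password) <= 63:
        raise ValueError("password must be 8 to 63 characters")


if __name__ == "__main__":
    validate_wifi_credentials("lab-net", "correct horse battery")
    try:
        validate_wifi_credentials("lab-net\r\nreboot", "correct horse battery")
    except ValueError as err:
        print("rejected:", err)
```

Running the check before building the command string means an injected line break is rejected outright instead of being escaped, so there is no partially written command to clean up.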
rUv	20ad75f30c	feat(ADR-131): HOMECORE-UI dashboard + BFF gateway — review-fixed (supersedes #1082 ) (#1099 ) * feat(ADR-131): HOMECORE-UI operational dashboard + BFF gateway Complete two-tier Cognitum operator dashboard (ADR-131), served by homecore-server at /homecore, plus the single-origin BFF gateway that wires it to real backends. Front-end (zero-dep vanilla TS/JS + CSS, exact Cognitum design tokens): - All 10 panels (§4.1-4.10): dashboard, SEED fleet + detail, fleet map, entities (live WS subscribe_events, never polls), rooms, COGs, calibration wizard, events + automation builder, witness/audit, settings. - §6 UX invariants in code: first-class provenance, prominent stale/veto/ fragility, null(not-trained) vs withheld vs error, --mono everywhere, Hailo vs CPU COG distinction. - api.js calls the gateway routes in production; mock demoted to a dev-only ?demo=1 fixture (no mock in prod); typed error states. - Tests under plain node: import-graph, boot, render-smoke (22), interaction (3), prod-errors (13) — 5 files green; bundle ~137 KB (~37x smaller than HA), <2 ms/cold-render. BFF gateway (homecore-server/src/gateway.rs, compiled + tested on Rust 1.89): - /api/cal/* reverse-proxy to the calibration API (ADR-151). - GET /api/homecore/rooms with the RoomState adapter (breathing->breathing_bpm, heartbeat:null->heart_bpm:null, injected anomaly.threshold/room_id). - GET /api/homecore/cogs supervisor over /var/lib/cognitum/apps/. - GET /api/homecore/appliance from /proc + TCP service probes. - SEED-device/appliance routes return typed 503 upstream_unavailable. - cargo test -p homecore-server = 12/12; run live (curl-verified); fixed a real double-v1 proxy-URL bug found during live testing. Honest scope: W1/W2/W4/W6-appliance functional; W3/W5/W6-Hailo/federation return typed 503 (depend on services/hardware not in this repo). 
Co-Authored-By: claude-flow <ruv@ruv.net> * fix(homecore-ui): resolve code-review findings — SSRF guard, CORS/trace coverage, §6 honesty, crash guards Addresses the high-effort review of PR #1082: - SECURITY: cal_proxy rejects path-traversal/confused-deputy SSRF (`.`/`..` segments, backslash, %2e%2e/%2f, absolute) on raw+decoded forms → 400, before attaching the server-side calibration bearer. - CORRECTNESS: /api/homecore/* + /api/cal/* now covered by the shared CORS allowlist (build_cors_layer, exported from homecore-api) + TraceLayer — previously merged outside router()'s layers (no CORS, no tracing). - §6 HONESTY (no fabricated data): dashboard renders '—' for null metrics (not "null%"/"null°C"); cogs Hailo pill reflects the REAL appliance probe (not hardcoded "connected"); room anomaly threshold passed through / null, not a fabricated 0.5. - ROBUSTNESS: cogs asArray(hef) guards a non-array manifest field; calibration progress guards target<=0 (no NaN%/Infinity%); restart clears the poll timer. - CLEANUP: mock.js is now a cached DYNAMIC import (demo-only) — never bundled in production (§2.2). - New ui/tests/unit-fixes.mjs pins the above; ADR-131 + CHANGELOG updated. Co-Authored-By: claude-flow <ruv@ruv.net> --------- Co-authored-by: Nick Ruest <127058086+nicholas-ruest@users.noreply.github.com>	2026-06-15 11:11:19 -04:00
rUv	1df6d1e1ee	security(nvsim): guard degenerate input — config panic + NaN silent-corruption + ADR-177 (#1098 ) * fix(nvsim): guard degenerate input — config-induced panic + NaN-state poisoning Beyond-SOTA security review of the ADR-089 NV-diamond simulator (milestone #9, crate 2 of 4). Two real degenerate-input findings, each pinned fails-on-old: NVSIM-DT-01 (config panic/DoS, pipeline.rs): an external f_s_hz == 0 made dt == +Inf, dt_us saturated to u64::MAX, and `sample * dt_us` panicked with "attempt to multiply with overflow" at sample >= 2 (debug/WASM panic=abort; garbage t_us in release). Fix: sanitise dt (non-finite/non-positive -> 1 µs fallback), cap the u64 cast, and saturating_mul the timestamp. NVSIM-NAN-01 (NaN-state poisoning, digitiser.rs): a non-finite scene parameter (NaN dipole position / Inf moment / NaN loop radius) bypasses the near-field clamp (NaN < R_MIN_M is false) and yields a NaN field; at the ADC `NaN as i32` == 0 silently emitted b_pt=[0,0,0] with ADC_SATURATED CLEAR — indistinguishable from a legit zero-field reading. Fix at the funnel: adc_quantise treats any non-finite input as out-of-range -> clamps to code 0 AND raises the saturation flag, so the corruption is visible downstream. Determinism integrity, panic-free MagFrame deserialisation, and RNG seeding confirmed clean with evidence. The published cross-machine witness (cc8de9b0…93b4) is unchanged — guards only affect degenerate inputs. cargo test -p nvsim --no-default-features: 50 -> 53 passed, 0 failed. Workspace green; Python deterministic proof unchanged (f8e76f21…46f7a, nvsim off the signal proof path). Needs ADR slot 177. 
Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-177 — nvsim degenerate-input hardening Records the 2 MEASURED MEDIUM fixes in `37764be55` (NVSIM-DT-01 config-induced overflow panic / WASM-abort DoS; NVSIM-NAN-01 non-finite scene param → silent fake zero-field reading with saturation flag clear) + 3 pins, and the clean-with-evidence determinism/deser/div-by-zero verdict. Cross-machine witness cc8de9b0…93b4 reproduces unchanged. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 10:55:04 -04:00
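The two nvsim guards can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the function signatures, the `full_scale` parameter, and the 16-bit `ADC_MAX_CODE` range are hypothetical; only the behaviours (1 µs fallback, saturating timestamp multiply, non-finite input mapped to code 0 with the saturation flag raised) come from the commit message.

```rust
/// Hypothetical signed 16-bit ADC range, chosen only for this sketch.
pub const ADC_MAX_CODE: i32 = 32_767;

/// Timestamp step in microseconds for a given sample rate.
/// A non-finite or non-positive step falls back to 1 µs.
pub fn sanitised_dt_us(f_s_hz: f64) -> u64 {
    let dt_us = 1.0e6 / f_s_hz; // f_s_hz == 0.0 gives +Inf; NaN stays NaN
    if !dt_us.is_finite() || dt_us <= 0.0 {
        return 1;
    }
    // Cap before the cast; keep at least 1 µs so timestamps advance (assumption).
    dt_us.min(u64::MAX as f64).max(1.0) as u64
}

/// Timestamp of a sample; saturates instead of overflowing.
pub fn timestamp_us(sample: u64, dt_us: u64) -> u64 {
    sample.saturating_mul(dt_us)
}

/// Quantise a field value. Returns (code, saturated).
/// Non-finite input is treated as out of range: code 0 with the flag raised,
/// so it cannot be mistaken for a legitimate zero-field reading.
pub fn adc_quantise(value: f64, full_scale: f64) -> (i32, bool) {
    if !value.is_finite() || !full_scale.is_finite() || full_scale <= 0.0 {
        return (0, true);
    }
    let saturated = value.abs() > full_scale;
    let clamped = value.clamp(-full_scale, full_scale);
    let code = (clamped / full_scale * ADC_MAX_CODE as f64).round() as i32;
    (code, saturated)
}
```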
rUv	4a083999e5	security(ruview-swarm): fail-closed on NaN/Inf at the swarm-comm trust boundary + ADR-176 (#1096 ) * fix(ruview-swarm): fail-closed on NaN/Inf at swarm-comm trust boundary (ADR-148) Beyond-SOTA security review of the ADR-148 drone swarm control plane found four IEEE-754 NaN/Inf fail-open / DoS bugs on data crossing the untrusted swarm-comm boundary (receive_peer_state / receive_peer_detection accept full DroneState/CsiDetection whose f64/f32 fields deserialize with no finite-check). - HIGH: failsafe::tick collision-avoidance + battery checks fail-open on NaN (NaN < threshold == false silently disabled collision avoidance / kept a NaN-battery drone Nominal). Now fails closed to EmergencyDiverge / RTH. - MED: geofence::check NaN-altitude bypass returned Safe through the point-in-polygon path. Now leading non-finite-coordinate guard -> HardBreach. - MED/DoS: antijamming FhssRadio panicked with "% 0" on an empty deserialized channels_mhz. Now len==0 early-returns (benign 0.0 sentinel). - LOW: multiview::fuse propagated a NaN victim_position into the fused "confirmed victim" location. Now requires finite confidence + position. Each fix pinned by a fails-on-old / passes-on-new test (MEASURED: old code returned Nominal/Safe or panicked). cargo test -p ruview-swarm --no-default-features: 117 -> 123 passed, 0 failed. Workspace green; Python deterministic proof unchanged (f8e76f21...46f7a, off the signal path). Documented-not-fixed (ADR slot 176): Raft AppendEntries lacks Log-Matching consistency check (topology/raft.rs); MavlinkSigner::verify uses non-constant-time tag compare + no replay-window rejection (already doc-flagged). 
Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-176 — ruview-swarm NaN-fail-open safety review Records the 4 MEASURED fail-open safety bugs fixed in `f671000d7` (collision avoidance, battery RTH, geofence, anti-jamming %0 panic — all NaN/Inf defeating a safety comparison at the swarm-comm trust boundary) + 6 pins, 5 clean-with-evidence dimensions, and the 2 genuine issues deferred to a focused follow-up (Raft AppendEntries log-matching; MAVLink signer constant-time + replay window). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 09:55:40 -04:00
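The NaN fail-open pattern behind these fixes is easy to reproduce: every ordered comparison against NaN is false, so a bare `value < threshold` check silently passes. A minimal sketch, assuming hypothetical thresholds, a simplified two-argument `tick`, and an illustrative `FailsafeAction` enum (the real `failsafe::tick` signature is not shown in the log):

```rust
#[derive(Debug, PartialEq)]
pub enum FailsafeAction {
    Nominal,
    ReturnToHome,
    EmergencyDiverge,
}

// Hypothetical thresholds for illustration only.
const MIN_SEPARATION_M: f64 = 2.0;
const MIN_BATTERY_FRACTION: f64 = 0.2;

/// Fail-closed failsafe decision on peer-supplied values.
pub fn tick(nearest_peer_m: f64, battery_fraction: f64) -> FailsafeAction {
    // A non-finite distance is treated as a collision risk, not as "far away".
    if !nearest_peer_m.is_finite() || nearest_peer_m < MIN_SEPARATION_M {
        return FailsafeAction::EmergencyDiverge;
    }
    // A non-finite battery reading is treated as low battery.
    if !battery_fraction.is_finite() || battery_fraction < MIN_BATTERY_FRACTION {
        return FailsafeAction::ReturnToHome;
    }
    FailsafeAction::Nominal
}
```

Without the `is_finite` guards, `f64::NAN < MIN_SEPARATION_M` evaluates to `false` and both checks fall through to `Nominal`, which is the fail-open behaviour the commit describes.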
rUv	0f64d23516	feat(bench): int8 quantization of WiFlow-STD half pose model — MEASURED trade-off (ADR-175, honest negative) (#1095 ) Sub-deliverable 8.2 of the benchmark/optimization milestone. Quantizes the 843,834-param "half" WiFlow-STD pose model (half_best.pth) to int8 two ways and MEASURES the accuracy/size trade-off vs fp32 under ONE locked normalization (ADR-173 torso-diameter PCK, upstream calculate_pck use_torso_norm=True), on the same seed-42 file-level 70/15/15 test split that produced the fp32 sweep numbers. MEASURED on ruvultra (RTX 5080, torch 2.11.0+cu128, fbgemm; clean test, torso-PCK): fp32 96.62% pck@20 99.47% pck@50 0.008981 mpjpe 3.351 MB int8 PTQ static 40.98% pck@20 94.98% pck@50 0.038262 mpjpe 1.046 MB (-55.64pp) int8 QAT (3 ep) 67.48% pck@20 98.69% pck@50 0.026548 mpjpe 1.043 MB (-29.15pp) Verdict (honest no): int8 is NOT a win at the strict PCK@20 edge target. Static PTQ collapses; QAT recovers a large share but still loses 29 pp @20 for a 3.2x size win — keep fp32/fp16 on the edge. Disclosed: QAT fake-quant val pck@20 was 83.45% but converted int8 scores 67.48% (~16pp convert_fx gap, reported honestly). Deliverables: - v2/crates/wifi-densepose-train/scripts/quantize_half_int8.py (reproducible: header carries the exact ssh command + run date; QAT primary, static PTQ fallback) - docs/adr/ADR-175-int8-quantization-half-pose-model-measured.md (MEASURED table, locked normalization, QAT-vs-PTQ labeling, verdict, reproduction, limitations) - CHANGELOG [Unreleased] ### Added entry No production Rust or signal-pipeline change. Python deterministic proof unchanged (f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a, bit-exact).	2026-06-15 09:16:22 -04:00
rUv	b209b8b778	ci(bench): compile-verify regression gate for v2 criterion benches + ADR-174 (#1094 ) * ci(bench): wire v2 criterion benches into CI as a compile-verify regression gate Sub-deliverable 8.3 of the benchmark/optimization milestone (needs ADR slot 174). The v2/ workspace ships 26 criterion benches across 18 crates, but benches are not part of `cargo test`, so nothing in CI compiled them and they silently rot when a public API they call changes. Add `.github/workflows/bench-regression.yml`: - bench-compile (HARD GATE): `cargo bench --workspace --no-default-features --no-run` compiles + links every default-feature bench (no measurement) plus the cir-gated cir_bench — a real, deterministic regression guard against bench bit-rot. - bench-fast-run (INFORMATIONAL, continue-on-error, never gates): runs a curated pure-CPU subset (nvsim, ruvector sketch/fusion) in criterion quick-mode and uploads logs as an artifact. No timing-regression gate, by design: wall-clock on shared GitHub runners varies 2-3x run-to-run, so a hard threshold or cross-runner `criterion --baseline` compare would manufacture false failures. The honest scope is compile-verify + informational-run; the workflow header documents the self-hosted-runner condition under which true timing-gating becomes honest. The crv-gated crv_bench is excluded because its crates.io dep ruvector-crv 0.1.1 fails to build upstream. Running the gate immediately caught one already-bit-rotted bench: wifi-densepose-mat/detection_bench failed to compile (E0063: missing field last_rssi in SensorPosition). Fixed (last_rssi: None) and re-verified. Validation (MEASURED): mat detection_bench + cir_bench + nvsim + ruvector + vitals + swarm benches compile under --no-default-features; fast subset runs; `cargo test -p wifi-densepose-mat --no-default-features` 174 passed / 0 failed; Python proof PASS, hash f8e76f21...46f7a unchanged. 
Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-174 — CI bench-regression compile-verify gate Records sub-deliverable 8.3 (bench-regression.yml, committed `c4c59e085`): a hard compile-verify gate over all 26 v2 criterion benches (caught + fixed one real bit-rotted bench, mat/detection_bench E0063) + an informational fast-run. Documents the honest scope — no timing-regression gate, since shared-runner wall-clock varies 2-3x; states the self-hosted-runner condition under which timing gating becomes honest. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 08:26:38 -04:00
rUv	90a88ada9a	feat(train): metric-locked PCK/MPJPE accuracy harness + ADR-173 (resolve PCK-definition ambiguity) (#1092 ) * feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4) identifies metric ambiguity as the single biggest threat to any beyond-SOTA claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63% AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up because each silently uses a different normalization. The project was retracted twice over this (a withdrawn 92.9% used absolute pixels, not torso). New src/accuracy.rs makes the normalizer explicit, selectable, and carried with every reported number: - PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip), BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t) (retracted convention, reproducible + clearly non-comparable). - pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the metrics_core geometric primitives (no duplicate kernel). - mpjpe(pred, gt, vis) — 2D/3D, mm. - PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK number is structurally impossible. 17 hand-computed deterministic tests (no GPU, no datasets) prove the harness arithmetic, including the key proof that identical predictions score 0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate handling (zero torso, empty frames, NaN coords — no panic, never false-perfect). This is measurement infrastructure, NOT an accuracy claim. Public API worth an ADR — needs ADR slot 173 (parent to write). wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace green (exit 0); Python deterministic proof unchanged (f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a). 
Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness Documents the accuracy harness (committed `3a8b2ed13`) that resolves the PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief (#1090): three historical numbers (96/81.6/61) used three unstated normalizations. The harness makes normalization explicit + selectable (PckNormalization enum) and every reported number carries its definition. Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-15 00:41:02 -04:00
rUv	cfd0ad76cf	security(core,cli): pin CSI-deserialiser DoS-resistance + ADR-172 (clean-with-evidence) (#1091 ) * test(core,cli): pin DoS-resistance of CSI deserialisers (ADR-127 security review) Beyond-SOTA security review of wifi-densepose-core + wifi-densepose-cli. Load-bearing-question verdict: the NaN-state-poisoning bug class does NOT originate in core — core exposes no stateful accumulator (no Welford, von-Mises, IIR, voxel grid, running mean); each downstream crate rolls its own, so each fix is correctly local. Both crates confirmed clean on every reviewed dimension (panic-on-adversarial-input, NaN handling, unbounded memory, path traversal, secrets) — no production code changed. Adds 4 regression pins locking in two existing-but-untested DoS guards: - core: from_canonical_bytes shape guard (Vec::with_capacity bound) — proven to fail with `capacity overflow` when the saturating-mul guard is removed. - core: canonical decoder never panics on arbitrary/truncated bytes. - cli: parse_csi_packet rejects an oversized n_antennas*n_subcarriers claim before Array2 allocation (33 MB claim in a 2 KB datagram -> None). - cli: parse_csi_packet never panics on arbitrary UDP bytes. core: 35 -> 37 lib tests; cli: 24 -> 26 tests; 0 failed. Python proof unchanged (f8e76f21…46f7a — off the signal path). Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-172 — wifi-densepose-cli + core CSI-deserialiser security review Records the clean-with-evidence verdict + 4 DoS-resistance regression pins (test-only, committed in `a1051607d`). Documents the load-bearing finding: the NaN-state-poisoning bug class does NOT originate in a shared core primitive (core exposes no stateful accumulator — MEASURED via grep), so the 3 prior downstream-local fixes are complete. Gives the wifi-densepose-cli review its own ADR slot (core portion cross-refs ADR-127 §9). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 23:58:09 -04:00
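The "reject the oversized claim before allocating" guard pinned for `parse_csi_packet` can be sketched as a standalone check. The helper name `checked_csi_cells`, the `MAX_CSI_CELLS` cap, and the `bytes_per_cell` parameter are hypothetical; the log only states that a header claiming far more cells than the datagram can hold yields `None` before any `Array2` allocation.

```rust
/// Hypothetical upper bound on antennas × subcarriers for this sketch.
pub const MAX_CSI_CELLS: usize = 64 * 1024;

/// Validate a header's shape claim against the bytes actually received.
/// Returns the cell count only if it is non-zero, below the cap, and
/// backed by enough payload bytes; otherwise None (fail closed).
pub fn checked_csi_cells(
    n_antennas: usize,
    n_subcarriers: usize,
    payload_len: usize,
    bytes_per_cell: usize,
) -> Option<usize> {
    // checked_mul: an attacker-chosen product must not wrap around.
    let cells = n_antennas.checked_mul(n_subcarriers)?;
    if cells == 0 || cells > MAX_CSI_CELLS {
        return None;
    }
    let needed = cells.checked_mul(bytes_per_cell)?;
    if needed > payload_len {
        return None;
    }
    Some(cells)
}
```

Only after this returns `Some(cells)` would a parser size its buffer, so the allocation is bounded by the datagram rather than by the header.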
rUv	71e8756051	docs(research): SOTA evidence brief for nn/train benchmark ADR (#1090 )	2026-06-14 23:32:58 -04:00
rUv	5287497a4a	security(homecore-migrate): redact secret value from malformed secrets.yaml error (#1089 ) * fix(homecore-migrate): redact secret value from malformed secrets.yaml error (secret-leak) `read_secrets` wrapped serde_yaml's parse error into `MigrateError::YamlParse { source }`. serde_yaml's message for a typed-tag coercion failure embeds the offending scalar verbatim, e.g. `invalid value: string "<the-secret-value>"`. That error propagates out of `read_secrets`, is `?`-returned by the `InspectSecrets` CLI path in main.rs, and printed to stderr by anyhow — leaking a secret value despite the CLI's deliberate `<redacted>` design. Fix: secrets.yaml parse failures now map to a new redacting variant `MigrateError::SecretsParse { path, line, column }` that carries only the file path and a coarse location (from `serde_yaml::Error::location()`), never the scalar content. Other (non-secret) YAML files keep `YamlParse`. Pinned by `secrets::tests::malformed_secrets_error_never_contains_secret_value` (asserts the rendered error AND its full #[source] chain never contain the secret value; fails on the old `YamlParse` path) plus `malformed_secrets_error_reports_location` (still fail-closed + locatable). ADR-165 secret-handling rule: a secret value must never appear in output. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(homecore-migrate): record secret-leak fix in ADR-165 + CHANGELOG Note the secrets.yaml error-redaction fix and the review's clean dimensions (read-only source / no traversal / no panic / fail-closed versioning / no injection) in ADR-165 §2.4, bump the test-evidence count 19→21 in §2.6, and add an [Unreleased] Security entry to CHANGELOG. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 23:09:55 -04:00
rUv	bf1dfe79fd	fix(homecore core): TOCTOU race dropped/reordered state_changed events under concurrent writers (~93k→0) + 2 fail-closed hardenings (#1087 ) * fix(homecore): atomic state set — close TOCTOU lost/reordered state_changed events StateMachine::set did get() (release shard lock) → compute next + no-op decision → insert() (re-acquire lock) → send(). The read-modify-write was not atomic w.r.t. a concurrent writer on the same entity: a writer that read a stale `old` could mis-classify a real transition as a no-op and drop its state_changed event (a missed automation trigger) or fire an event whose new_state duplicated the previously delivered one (a spurious trigger for any automation keyed on old_state != new_state). ADR-127 §2.1 promises "writer atomically replaces the map entry"; the implementation did not. Fix: hold the DashMap shard write-lock across the whole read→decide→insert→ fire sequence via entry()/insert_entry(). tx.send is non-blocking, non-async, and never re-enters the map, so firing under the shard lock cannot deadlock and keeps global event order in lock-step with global commit order. Pinned by concurrent_set_fires_no_duplicate_adjacent_events: 4 writers toggling one entity A/B; asserts no two consecutive fired events carry the same new_state (impossible under correct serialisation). Fails reliably on the old code (~365-476 duplicate-adjacent events on the first trial), passes on the fix across repeated runs. Co-Authored-By: claude-flow <ruv@ruv.net> * harden(homecore): bound entity_id length — close memory-DoS at the REST boundary homecore-api/src/rest.rs parses untrusted path segments straight through EntityId::parse (get/delete/set_state). With no length cap, an otherwise-valid id like "a." + many MB of [a-z0-9_] was accepted; a POST /api/states/<giant> would persist it into the DashMap state store, permanently growing memory (amplification across distinct ids). 
Fix: reject ids longer than MAX_ENTITY_ID_LEN (255, HA-compatible) up front in parse(), before any per-char scan, with a new EntityIdError::TooLong. Fails closed at the boundary type so every caller (REST, registry deserialize, automation) is protected. Pinned by entity_id_length_boundary: exactly-MAX accepted, MAX+1 rejected, 4 MiB id rejected as TooLong. Fails on old code (oversized parses Ok). Co-Authored-By: claude-flow <ruv@ruv.net> * harden(homecore): isolate panicking service handlers (catch_unwind) ServiceRegistry::call already ran handlers outside the registry lock (the Arc<dyn ServiceHandler> is cloned out of the read guard first), so a panic could never poison the RwLock or block other callers — good. But a panicking handler unwound through call() into the caller's task; the task driving the engine (e.g. an axum request handler invoking a service) could be aborted by one buggy integration. Fix: wrap the handler future in AssertUnwindSafe + FutureExt::catch_unwind and convert a panic into ServiceError::HandlerPanicked. Mirrors HA isolating service-handler exceptions. The registry stays fully usable afterwards. Pinned by panicking_handler_is_isolated_and_registry_survives: the panicking call returns HandlerPanicked (not an unwind), a sibling healthy service still returns its value, and the bad service remains registered. Fails on old code (the await point panics instead of returning Err). Co-Authored-By: claude-flow <ruv@ruv.net> * test(homecore): pin event-bus lag safety (bounded broadcast, no DoS) Documents-with-evidence that the core EventBus does NOT have the homecore-api WS broadcast-lag failure: with EVENT_CHANNEL_CAPACITY=4096, firing 3x capacity while a subscriber never drains keeps fire_* non-blocking (publisher never waits on slow receivers), gives the slow receiver a recoverable Lagged(n) (drop-oldest + re-sync) rather than a closed channel, and leaves the bus live for a fresh fast subscriber. No code change — pins the clean dimension. 
Co-Authored-By: claude-flow <ruv@ruv.net> * docs(homecore): record ADR-127 §9 security+concurrency review + CHANGELOG Documents the three pinned fixes (HC-RACE-01 state-set TOCTOU, HC-EID-LEN-01 entity_id memory-DoS, HC-SVC-PANIC-01 service-handler isolation) and the clean dimensions (bounded event-bus lag handling, lock discipline / no lock-across-await, no panic-on-input) with their evidence. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 22:28:05 -04:00
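The entity_id length bound (HC-EID-LEN-01) amounts to checking the byte length before any per-character scan. A minimal sketch: `MAX_ENTITY_ID_LEN` and `EntityIdError::TooLong` are named in the commit message, while the free function `parse_entity_id`, the `Malformed` variant, and the exact character rules are assumptions made for illustration.

```rust
pub const MAX_ENTITY_ID_LEN: usize = 255;

#[derive(Debug, PartialEq)]
pub enum EntityIdError {
    TooLong,
    Malformed, // hypothetical catch-all for this sketch
}

/// Parse "domain.object_id", rejecting oversized input up front.
pub fn parse_entity_id(raw: &str) -> Result<(&str, &str), EntityIdError> {
    // O(1) length check first: a multi-megabyte id is rejected without scanning it.
    if raw.len() > MAX_ENTITY_ID_LEN {
        return Err(EntityIdError::TooLong);
    }
    let (domain, object_id) = raw.split_once('.').ok_or(EntityIdError::Malformed)?;
    let valid = |s: &str| {
        !s.is_empty()
            && s.bytes()
                .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_')
    };
    if valid(domain) && valid(object_id) {
        Ok((domain, object_id))
    } else {
        Err(EntityIdError::Malformed)
    }
}
```

Placing the bound in the boundary type's parser, rather than in each REST handler, is what lets every caller inherit the protection.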
rUv	9b126e927e	harden(assist security): bound untrusted utterance (DoS); cmd-injection/ReDoS/NaN/fail-open all proven clean with evidence (#1086 ) * fix(homecore-assist): bound untrusted utterance length, fail closed (ADR-133 security) The intent recognizers accept utterances from untrusted callers (voice transcripts, the WebSocket `assist` command). Neither the regex nor the semantic path bounded utterance length, so a pathological multi-megabyte utterance forced an unbounded `to_lowercase()` clone plus a per-registered-pattern scan (and, in the semantic path, full tokenisation + feature-hash embedding) — an allocation/CPU amplification on attacker-controlled input. The `regex` crate is linear-time (no catastrophic backtracking), so this was a throughput/memory DoS rather than a hang, but it was still unbounded. Fix: introduce MAX_UTTERANCE_BYTES (4 KiB — far above any real spoken command) and check it at both recognizer boundaries BEFORE any allocation or scan. An over-length utterance fails closed: Ok(None) (no intent, no action), identical to an unrecognised phrase. No legitimate command is affected. Pinned by fails-on-old tests: - recognizer::over_length_utterance_fails_closed — an over-length utterance that contains a valid command resolves to None (would have matched before) - semantic_recognizer::over_length_utterance_fails_closed_semantic Co-Authored-By: claude-flow <ruv@ruv.net> * test(homecore-assist): pin clean security dimensions with evidence (ADR-133) Adds regression tests documenting the dimensions reviewed and found clean, so the properties cannot silently regress: - runner: no subprocess surface exists. RufloRunnerOpts.{script_path,env} are inert and never executed; even a hostile script_path/env spawns nothing. And the entity_id capture class [a-z0-9_ .] strips every shell metacharacter, so a resolved slot can never carry ; | & $ ` / etc into a (future) argv — sanitisation by construction. 
(shell_metachars_never_survive_into_a_resolved_slot, runner_opts_are_inert_no_process_spawned) - recognizer: the regex crate is a linear-time finite automaton; a classic catastrophic-backtracking shape (a+)+$ on adversarial input completes in bounded time — no ReDoS. (pathological_backtracking_pattern_completes_in_bounded_time) - embedding: embeddings are structurally finite (FNV feature-hash + guarded L2 normalise, no external float input, no unguarded division), so a crafted utterance cannot inject NaN/Inf to poison cosine k-NN; cosine against the zero vector is a finite 0.0, never NaN. (embeddings_are_structurally_finite, cosine_with_zero_vector_is_finite_not_nan, empty_utterance_against_empty_index_no_panic_no_match) - pipeline: injection-shaped utterances never deliver a metacharacter into a service call; the worst case resolves to a clean entity token, and an unrecognised utterance fails closed to not_understood (no action). (pipeline_injection_shaped_utterance_carries_no_metachars_to_service) Co-Authored-By: claude-flow <ruv@ruv.net> * docs(homecore-assist): record ADR-133 security review (HC-ASSIST-01 + clean dims) CHANGELOG [Unreleased] Security entry + ADR-133 section 6 review notes for the homecore-assist voice/intent pipeline review. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 21:34:38 -04:00
rUv	41bee64593	fix(recorder): bound history query (memory-DoS) + add missing transactional purge (disk-DoS); SQL-injection & NaN dims clean (#1084 ) * fix(homecore-recorder): bound history query + add transactional purge (memory-DoS + disk-DoS) Security review of the HA-compat state recorder (ADR-132) found two real bounding bugs; SQL-injection and NaN-index dimensions confirmed clean. (1) Memory-DoS: get_state_history carried no LIMIT — a wide [since,until] window over a high-frequency entity loaded an unbounded row set into a single in-memory Vec. Added LIMIT MAX_HISTORY_ROWS (1,000,000); the sibling search paths were already k-bounded. (2) Disk-DoS / documented-but-missing purge: README advertised Recorder::purge(older_than) but no retention path existed -> unbounded disk growth. Added a transactional purge with an EXCLUSIVE cutoff (idempotent, no off-by-one) that deletes old states+events and garbage-collects orphaned state_attributes blobs (dedup-shared blobs are kept until their last referencing state is gone). All three deletes run in one transaction so a mid-purge failure rolls back cleanly. Pinning tests (homecore-recorder 19->25 no-default / 25->31 ruvector, 0 failed): - malicious_entity_id_is_stored_literally_not_executed (SQL injection) - like_metacharacters_in_query_are_literal_not_wildcards (LIKE escape) - history_query_carries_a_limit_clause (memory-DoS bound) - purge_keeps_boundary_row_and_drops_older (exclusive-cutoff, true pin) - purge_gcs_orphaned_attributes_but_keeps_shared (dedup-safe GC) - purge_also_removes_old_events No behaviour change beyond the two fixes. Python deterministic proof unchanged (recorder is off the signal proof path). Co-Authored-By: claude-flow <ruv@ruv.net> * docs(homecore-recorder): record ADR-132 security review findings Add a "3a. 
Security review" section to ADR-132 and a CHANGELOG [Unreleased] Security entry covering the homecore-recorder review: SQL-injection and NaN-index dimensions confirmed clean with evidence (every query bound; LIKE pattern bound+escaped; SHA-256->i32->f32 embeddings always finite, empty index/k=0 probed no-panic), plus the two fixes (unbounded history LIMIT, transactional exclusive-cutoff purge with orphan-attribute GC). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 21:00:52 -04:00
rUv	5bc3b634b7	fix(automation security): template-bomb DoS (100MB/11s render → fuel-bounded, HIGH) + delay panic-on-config (MEDIUM) (#1083 ) * fix(homecore-automation): bound template render to stop unbounded-expansion DoS (HC-SEC-01) A `template:` condition / value_template comes straight from user automation config and was rendered with MiniJinja's default (no instruction budget, no output cap). A single condition such as `{% for i in range(5000) %}{% for j in range(5000) %}xxxx{% endfor %}{% endfor %}` rendered a 100 MB string over ~11 s on one render call (proven empirically) — a CPU/memory denial of service, the bfld-class "unbounded expansion". Fix: - Enable MiniJinja's `fuel` feature and set a per-render instruction budget (`set_fuel(Some(1_000_000))`). A nested loop burns one unit per iteration, so the budget caps total work regardless of nesting; the attack now fails fast (~90 ms) with "engine ran out of fuel". - Reject template sources over 64 KiB before compilation (defense in depth so a pathological literal can neither compile nor emit verbatim). Legitimate HA templates (a few dozen instructions) are unaffected. Tests (fail on old — unbounded render / no rejection): - nested_loop_template_is_bounded_not_unbounded_dos - single_huge_repeat_template_is_bounded - oversized_template_source_is_rejected - legitimate_template_still_renders_within_fuel (no regression) Co-Authored-By: claude-flow <ruv@ruv.net> * fix(homecore-automation): stop crafted delay/timeout from panicking the run task (HC-SEC-02) `Action::Delay { seconds }` and `Action::WaitForTrigger { timeout_seconds }` fed the user-supplied float straight into `Duration::from_secs_f64`, which PANICS on negative, NaN, infinite, or overflowing inputs. 
All of those are reachable from a crafted (or simply typo'd) automation YAML — `delay: {seconds: -1}`, `.nan`, `.inf`, `1e308` — so one hostile config aborts the spawned automation task with a panic ("cannot convert float seconds to Duration: value is negative", proven empirically). Fix: a `safe_duration_from_secs` guard that saturates instead of panicking, matching Home Assistant's lenient "non-positive delay = no delay": - NaN / ±inf / negative -> Duration::ZERO - absurdly large (would overflow) -> clamped to ~100 years (MAX_DELAY_SECS) Tests (fail on old — panic = failure): - delay_negative_seconds_does_not_panic - delay_nan_seconds_does_not_panic - delay_infinite_seconds_does_not_panic - wait_for_trigger_negative_timeout_does_not_panic - safe_duration_saturates_hostile_values (incl. overflow clamp) Co-Authored-By: claude-flow <ruv@ruv.net> * docs(homecore-automation): record HC-SEC-01/02 security review (CHANGELOG + ADR-129 §8a) Document the two DoS findings (template unbounded-expansion HC-SEC-01, delay panic-on-config HC-SEC-02) and the dimensions probed clean (condition fail-closed, bounded run-modes, sandboxed read-only templates). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 20:22:07 -04:00
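The HC-SEC-02 guard can be sketched with the standard library alone. `safe_duration_from_secs` and `MAX_DELAY_SECS` are named in the commit message; the exact constant value and the function body below are assumptions that follow the described mapping (NaN, ±inf, and negative values become zero; very large values are clamped to roughly 100 years).

```rust
use std::time::Duration;

/// Roughly 100 years in seconds (assumed value for this sketch).
pub const MAX_DELAY_SECS: f64 = 100.0 * 365.0 * 24.0 * 3600.0;

/// Convert user-supplied seconds to a Duration without panicking.
/// `Duration::from_secs_f64` panics on negative, non-finite, or
/// overflowing input, so those cases are handled before the call.
pub fn safe_duration_from_secs(secs: f64) -> Duration {
    // NaN, +inf, -inf, zero, and negative values all mean "no delay".
    if !secs.is_finite() || secs <= 0.0 {
        return Duration::ZERO;
    }
    // Clamp absurdly large values so the conversion cannot overflow.
    Duration::from_secs_f64(secs.min(MAX_DELAY_SECS))
}
```

With this in place, a config such as `delay: {seconds: -1}` or `.nan` degrades to an immediate continue rather than aborting the automation task.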
rUv	e1f4897269	fix(geo numerical): parse_hgt underflow/inf-grid (HIGH) + haversine asin-NaN; pointcloud confirmed-robust (NaN-poisoning class, 3rd find) (#1081 ) * fix(geo numerical robustness): parse_hgt underflow panic + haversine asin-domain NaN Targeted numerical-robustness audit of wifi-densepose-geo (ADR-154-class sweep). Two real bugs, each pinned by a fails-on-old test: 1. terrain.rs parse_hgt — usize underflow panic on degenerate input. `side = sqrt(n_samples)`; for empty / sub-2x2 buffers side <= 1, so `1.0 / (side - 1)` underflows `usize` (panic "attempt to subtract with overflow" in debug; wraps to a huge value in release → garbage/inf cell_size_deg that poisons every ElevationGrid::get). A truncated HTTP body or a 404 HTML page reaches parse_hgt. Now bails with a clear error when side < 2. 2. coord.rs haversine — asin domain overflow → NaN for (near-)antipodal points. Floating rounding can push `h.sqrt()` to 1.0 + ~4e-16, and `asin(>1)` is NaN (verified: pair (-44.4994,-178.95722)→(44.49939999, 1.04278001) yields h=1.0000000000000004). A NaN distance silently breaks all downstream `<`/`>` comparisons. Clamp into [0,1] before asin. Also pins the ±90° pole-singularity (cos(lat)=0 division) as no-panic; the ENU transform itself is unchanged (no behavior change for valid inputs). Tests: wifi-densepose-geo 9→15 lib (6 new), 8 integration unchanged. 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * test(pointcloud robustness): pin NaN-state-poisoning resistance + degenerate voxel fusion Numerical-robustness audit of wifi-densepose-pointcloud. No bug found — the crate is confirmed-robust against the proven NaN-state-poisoning class that bit calibration/vitals. This adds regression pins documenting why: 1. csi_pipeline.rs — persistent auto-accumulating state (occupancy EMA, vitals) is provably self-healing. 
The UDP parser only emits finite amplitudes/phases (sqrt/atan2 of i8), and even an adversarial hand-built CsiFrame with NaN/inf amplitudes+phases cannot latch non-finite state: motion_score = (NaN/100).min(1.0) → 1.0; breathing path → 0 → clamp(5,40) → 5.0; tomography EMA uses only integer rssi. The new test injects 40 poisoned frames and asserts occupancy/vitals stay finite AND the pipeline recovers to an in-range estimate afterward — so a future refactor that drops a `.min`/`.clamp` self-heal would fail this pin. 2. fusion.rs — fuse_clouds voxel averaging is div-by-zero-safe (per-voxel count >= 1 by construction). Pins empty / single-point / all-coincident inputs as no-panic with finite output. No behavior change. Tests: wifi-densepose-pointcloud 18→22 (4 new), 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(geo/pointcloud robustness): CHANGELOG + ADR-154 sibling-crate sweep note Record the wifi-densepose-geo + wifi-densepose-pointcloud numerical-robustness audit under CHANGELOG [Unreleased] → Fixed, and a sibling-crate-extension note on the ADR-154 horizon ledger (these crates are outside ADR-154's signal scope but the sweep is the same ADR-154 class). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 19:37:08 -04:00
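The haversine fix is a one-line clamp before `asin`. A minimal sketch, assuming a mean Earth radius constant and a hypothetical `haversine_m` signature in degrees (the actual `coord.rs` API is not shown in the log):

```rust
/// Mean Earth radius in metres (assumed constant for this sketch).
pub const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in metres between two (lat, lon) points in degrees.
pub fn haversine_m(lat1_deg: f64, lon1_deg: f64, lat2_deg: f64, lon2_deg: f64) -> f64 {
    let phi1 = lat1_deg.to_radians();
    let phi2 = lat2_deg.to_radians();
    let d_phi = (lat2_deg - lat1_deg).to_radians();
    let d_lambda = (lon2_deg - lon1_deg).to_radians();

    let h = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);

    // Rounding can push sqrt(h) a few ULPs above 1.0 for near-antipodal
    // points; asin of a value > 1 is NaN, so clamp into [0, 1] first.
    let root = h.sqrt().clamp(0.0, 1.0);
    2.0 * EARTH_RADIUS_M * root.asin()
}
```

For valid inputs the clamp is a no-op, so distances for ordinary point pairs are unchanged; it only affects the near-antipodal rounding case.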
rUv	9f80b66ae3	harden(cog-ha-matter crypto): domain-separate witness signing + verify_strict (signing chain otherwise sound — P2 crypto core verified) (#1080 ) * fix(cog-ha-matter): domain-separate witness signing chain + verify_strict (ADR-116 §2.2) Crypto review of the SHA-256 + Ed25519 witness chain that ADR-262 P2 reuses. The sibling wifi-densepose-engine bug class (unframed concatenation of operator-influenceable strings into a signed digest) is ABSENT here — canonical_bytes already length-prefixes kind/payload. Two real hardening gaps fixed: - CHM-WIT-01: add a versioned domain-separation tag (WITNESS_DOMAIN_TAG = b"cog-ha-matter/witness-event/v1\0") to canonical_bytes so the witness SHA-256 preimage / Ed25519 message cannot be replayed as a message for another signing context that shares key infrastructure (notably the manifest binary_signature). Completes the engine review's "domain-tag + length-prefix" rule. Witness bytes change by design (prior on-disk hashes/sigs invalidated); no in-repo crate consumes these bytes programmatically. - CHM-WIT-02: verify_signature uses VerifyingKey::verify_strict (rejects non-canonical encodings + small-order keys) for the audit-uniqueness property. Key stays caller-pinned (not read from the event). Pinned by fails-on-old tests: canonical_bytes_is_domain_separated, canonical_bytes_starts_with_domain_tag_then_prev_hash, witness_preimage_cannot_collide_with_a_bare_manifest_digest, signature_commits_to_domain_tag_not_bare_fields; key-pinning guarded by verify_uses_strict_path_and_pins_caller_key. cog-ha-matter 64 -> 68 tests, 0 failed. 
Co-Authored-By: claude-flow <ruv@ruv.net> * docs(cog-ha-matter): record ADR-116 crypto review findings (CHM-WIT-01/02) CHANGELOG [Unreleased] Security entry + ADR-116 §4.1 review notes: engine-class signed-digest collision confirmed ABSENT (length-prefixing already correct), domain-separation tag added, verify_strict hardening, and the clean dimensions (verify-before-trust, key-handling, determinism, fail-closed parsing) with byte-layout evidence. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 19:04:09 -04:00
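The domain-separation rule from the commit above ("domain-tag + length-prefix") can be illustrated with a small preimage builder. This is a sketch under assumptions: the tag value and the "tag, then prev_hash" ordering are taken from the commit message and its test names, while the function signature, the 32-byte `prev_hash` type, and the 8-byte little-endian length prefix are hypothetical choices for the example. Hashing and Ed25519 signing are left out so the block needs only the standard library.

```rust
/// Versioned domain-separation tag, as quoted in the commit message.
const WITNESS_DOMAIN_TAG: &[u8] = b"cog-ha-matter/witness-event/v1\0";

/// Builds the byte string that would be hashed and signed for one witness event.
/// Layout (assumed beyond the first two fields):
///   tag || prev_hash || len(kind) || kind || len(payload) || payload
fn canonical_bytes(prev_hash: &[u8; 32], kind: &str, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(
        WITNESS_DOMAIN_TAG.len() + prev_hash.len() + 16 + kind.len() + payload.len(),
    );
    // The tag comes first so the message cannot be replayed in another
    // signing context that shares the same key infrastructure.
    out.extend_from_slice(WITNESS_DOMAIN_TAG);
    out.extend_from_slice(prev_hash);
    // Length-prefix every variable-length field so that ("ab", "c") and
    // ("a", "bc") produce different preimages. The u64 little-endian
    // encoding is an assumption of this sketch.
    for field in [kind.as_bytes(), payload] {
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field);
    }
    out
}
```

Because the tag is versioned, a later change to the layout can bump it to a new value without old and new messages ever being confused with each other.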
rUv	02cb84e0bb	fix(vitals safety): non-finite CSI frame permanently froze breathing+HR via IIR-state poisoning (self-heal) + noise-never-Valid pin (#1079 ) * fix(vitals): self-heal IIR filters after non-finite CSI frame (ADR-021/ADR-158 §A1) The 2nd-order resonator bandpass_filter in BreathingExtractor and HeartRateExtractor latches each output y[n] into the filter state (y1/y2). A single non-finite amplitude residual from a corrupt CSI frame produced a NaN output that was written into the state. The existing extract() is_finite() guard dropped that one sample from the history buffer but never sanitized the poisoned filter state, so every subsequent output stayed NaN, was rejected too, and the sliding-window history never refilled: breathing AND heart-rate extraction went silently dead (returning None forever) until reset(). On the vitals alert path this is a safety-relevant denial of service — one bad frame stops monitoring with no error surfaced. Same class as the calibration NaN bug (ADR-154 §3) and the firmware vitals fixes (#998/#996/#987): prior hardening guarded the history boundary but not the filter-state boundary. Fix: when bandpass_filter computes a non-finite output it resets the IIR state to default and returns 0.0, so the resonator recovers on the next clean frame (the 0.0 is still dropped by the caller's finite-check, so no spurious sample enters history). Also de-magic the safety-critical HR physiological plausibility band into named HR_PLAUSIBLE_MIN_BPM/HR_PLAUSIBLE_MAX_BPM consts (value-identical 40/180 BPM). 
Pinned by: - breathing::tests::nan_frame_does_not_permanently_poison_filter (FAILS pre-fix) - breathing::tests::inf_mid_stream_does_not_freeze_history (FAILS pre-fix) - heartrate::tests::nan_frame_does_not_permanently_poison_filter (FAILS pre-fix) - heartrate::tests::pure_noise_is_never_reported_valid (fabricated-vital negative) - heartrate::tests::plausibility_band_constants_pinned (de-magic value pin) wifi-densepose-vitals --no-default-features: 55->60 lib tests, 0 failed. Workspace green (3370 passed, 0 failed). Python proof unchanged (vitals off the deterministic proof's signal path). Co-Authored-By: claude-flow <ruv@ruv.net> * docs(vitals): record IIR NaN/inf self-heal fix (ADR-021, CHANGELOG) Document the wifi-densepose-vitals filter-state poisoning fix in ADR-021 Implementation Notes (parallel to the firmware #998/#996/#987 robustness class) and add a CHANGELOG [Unreleased] Fixed entry. Notes the confirmed clean dimensions with evidence (flat -> None; noise -> low-confidence Unreliable, never Valid; harmonic-rich breathing -> not a confident false HR; out-of-band BPM clamped). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-14 18:01:47 -04:00
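The filter-state self-heal described in the commit above can be sketched with a generic second-order resonator. This is an illustrative assumption, not the code in `BreathingExtractor` or `HeartRateExtractor`: the `Resonator` type, its coefficient formula, the `step` method name, and the use of `f64` are all hypothetical. Only the behavior is taken from the commit message: a non-finite output resets the IIR state to its default and returns 0.0, so the filter recovers on the next clean sample.

```rust
use std::f64::consts::PI;

/// Previous two outputs of the filter (the state that a NaN could poison).
#[derive(Debug, Default, Clone, Copy)]
struct IirState {
    y1: f64,
    y2: f64,
}

/// Two-pole resonator: y[n] = b0*x[n] + a1*y[n-1] + a2*y[n-2].
/// Coefficient choice is a textbook form assumed for this sketch.
struct Resonator {
    b0: f64,
    a1: f64,
    a2: f64,
    state: IirState,
}

impl Resonator {
    /// `r` is the pole radius (0 < r < 1 keeps the filter stable).
    fn new(center_hz: f64, sample_hz: f64, r: f64) -> Self {
        let w = 2.0 * PI * center_hz / sample_hz;
        Self {
            b0: 1.0 - r,
            a1: 2.0 * r * w.cos(),
            a2: -r * r,
            state: IirState::default(),
        }
    }

    /// Filters one sample. A non-finite result is never latched into the
    /// state: the state is reset and 0.0 is returned instead.
    fn step(&mut self, x: f64) -> f64 {
        let y = self.b0 * x + self.a1 * self.state.y1 + self.a2 * self.state.y2;
        if !y.is_finite() {
            self.state = IirState::default();
            return 0.0;
        }
        self.state.y2 = self.state.y1;
        self.state.y1 = y;
        y
    }
}
```

Without the reset branch, a single NaN written into `y1` would propagate through every later output, which is the permanent freeze the commit describes.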


1055 Commits