mirror of
https://github.com/ruvnet/RuView
synced 2026-07-05 14:33:19 +00:00
e6f26e9ac982aadaa18c8271ec65d29a302e49ec
1055 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e6f26e9ac9 |
docs(adr): deep review of the RuView npm surface — ADR-263/264/265 optimization strategies (#1229)
* docs(adr): deep review of the RuView npm surface — ADR-263/264/265 optimization strategies
ADR-263 — @ruvnet/ruview@0.1.0 harness review (O1–O9):
- HIGH: claim-check CLI fails open on empty input (no --text/--file -> PASS exit 0)
- HIGH: MCP stdio server head-of-line blocking (spawnSync verify/calibrate up to 600s)
- MEASURED: optionalDependencies triple the cold npx install (4 pkgs/620kB/71 files
vs 1 pkg/172kB/22 files with --omit=optional) for a path that never imports them
- maxBuffer truncation, python -c port interpolation, version drift, duplicate skills,
guardrail METRIC_TERMS substring false positives ('map'/'F1' — found by dogfooding
claim-check on these very ADRs), zero CI
ADR-264 — @ruvnet/rvagent@0.1.0 + @ruv/ruview-cli review (O1–O9), verified against
the published registry tarball:
- HIGH: exports.require -> dist/index.cjs which is never built nor published
- MEASURED: 44 dead source-map files = 62,698B of the 188kB unpacked payload
- stdio-only server described as dual-transport; mixed dot/underscore tool names;
double Zod validation + hand-duplicated advertised schemas; 2-fd leak per training
job; unbounded body in the unwired HTTP scaffold; dead detectCogBinary candidates;
ruview bin-name collision
ADR-265 — cross-cutting npm distribution strategy: npm-packages.yml CI matrix
(test + pack-content/size gate + tarball-install smoke test), publish-from-CI-only
with npm provenance, version single-sourcing from package.json, bin/namespace
ownership (ruview bin belongs to @ruvnet/ruview), claim-check on package READMEs.
Docs only — no runtime code changed. Index/CHANGELOG/CLAUDE.md/README counts updated.
Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01WrGfTGKv1oWZ6iwXZACULz
* fix(npm): implement ADR-263/264/265 — harness fail-closed + async MCP, rvagent packaging/transport/naming, npm CI+provenance gate
ADR-263 (@ruvnet/ruview 0.2.0), O1-O9:
- claim-check fails closed on empty input (CLI exit 2, empty_text tool error)
- MCP stdio server dispatches tools/call asynchronously (promise-based spawn);
ping answers while a 3s fake verify runs — pinned by new e2e test
- optionalDependencies dropped: cold npx installs exactly 1 package
(MEASURED: was 4 pkgs/620kB/71 files via npm i in a clean prefix)
- bounded rolling output tails replace spawnSync 1MiB maxBuffer
- node_monitor port passed via sys.argv, never spliced into python -c source
- serverInfo.version read from package.json; resources/prompts stubs
- skills single-sourced: prepack sync script generates .claude/skills/ copies
- which() = memoized dep-free PATH scan
- tools underscore-canonical (ruview_claim_check, ...) + dotted aliases
- guardrail precision: word-boundary map/f1/auc/iou, code-span + F1/O2 label
scrubbing, quantitative-claims-only; packaging reproducer hints
- 30/30 tests (was 17), incl. concurrency e2e + fail-open regression pins
ADR-264 (@ruvnet/rvagent 0.2.0), O1-O9:
- exports fixed: types-first, phantom dist/index.cjs require target removed
- tarball map-free: 127,704B unpacked / 46 files / 0 maps (MEASURED,
npm pack --dry-run; was 188kB incl. 44 maps referencing unshipped src)
- Streamable HTTP actually wired behind RVAGENT_HTTP_PORT: one transport +
one MCP server per session (mcp-session-id routing), 1MiB body cap (413),
port-aware localhost origin gate; dual-transport description now true
- tools renamed underscore-canonical with dotted router-only aliases
- single Zod validation gate; advertised inputSchema generated from the same
Zod source (zod-to-json-schema)
- train_count: parent log fds closed (was leaking 2/job); job records
persisted to <jobsDir>/<id>.json (job_status survives restarts); bounded
log-tail reads
- detectCogBinary probes its candidates instead of dead-coding them
- version from package.json; @types/express dropped; @types/jest -> 29
- README rewritten to match reality (no phantom subcommands/policy layer)
- 99/99 jest tests (incl. new session/body-cap suite + previously-broken
manifest suite); stdio handshake + HTTP session flow smoke-tested live
ADR-265 D1-D4:
- .github/workflows/npm-packages.yml: 3-package x Node 20/22 gate — tests,
version-literal grep (D3), pack-content/size gate, tarball-install smoke
test (catches the ADR-264 F1 class), README claim-check (D4)
- .github/workflows/ruview-npm-release.yml: publish from CI only with
npm publish --provenance
- @ruv/ruview-cli bin renamed ruview-cli (ruview bin belongs to
@ruvnet/ruview); version single-sourced
- ci.yml NODE_VERSION 18 -> 20
ADR statuses updated to Accepted/implemented; harness manifest re-pinned;
ADR-263/264/265 + both package READMEs pass claim-check.
Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01WrGfTGKv1oWZ6iwXZACULz
* perf(rvagent): lazy-load HTTP transport + memoize generated tool schemas
stdio time-to-first-response ~242ms -> ~189ms (-22%; MEASURED, median of
repeated initialize round-trips against dist/index.js in this container).
- ./http-transport.js now imported lazily inside the RVAGENT_HTTP_PORT
branch: it chain-loads the MCP SDK streamableHttp module (~48ms MEASURED
via per-module import() timing) which the default stdio path never uses
- toolInputJsonSchema memoized per tool: schemas are static for the process
lifetime; under the session-per-server HTTP model every session calls
tools/list, so stop re-walking the Zod tree each time
No behavior change: 99/99 jest tests; HTTP session flow re-smoke-tested
through the lazy import path (initialize -> 200 + mcp-session-id).
Profiled @ruvnet/ruview too and left it alone: 50ms CLI startup vs ~29ms
bare 'node -e ""' floor on the same box (MEASURED) — already near the
interpreter floor with zero dependencies.
Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01WrGfTGKv1oWZ6iwXZACULz
* ci(ruview-cli): pass jest --passWithNoTests so the private no-test package doesn't fail the npm-packages matrix
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(npm): address 10 verified review findings in harness + rvagent before 0.2.0 publish
harness/ruview (@ruvnet/ruview):
- guardrails: digit gate now sees numbers inside code spans; F1-style
metric tokens followed by ':' or a nearby number are no longer scrubbed
(fail-open regressions in the honesty gate)
- mcp-server: tools/call requests serialize through a FIFO promise chain
(hardware/mutating tools never overlap) while ping/tools/list stay
immediate; stdin close drains in-flight responses before exit
- tools: which() no longer memoizes negative lookups
tools/ruview-mcp (@ruvnet/rvagent):
- index: realpath invoked-directly guard — library import no longer
connects a stdio transport to the consumer's process
- http-transport: explicit allowedOrigins is exact-match only (localhost
any-port convenience applies only with no configured allowlist);
session map gains maxSessions=64 + 5min idle TTL sweep
- train-count: job records persist the child pid and reconcile stale
'running' status after a server restart (exit-code marker or dead pid)
- config: cog binary candidates ordered by process.arch
.github/workflows/ruview-npm-release.yml: port the full ADR-265 D1 gate
(version-literal check, unpacked-size budget, tarball-install smoke test)
from npm-packages.yml so the publish path enforces what the header claims.
Tests: harness 30→36, rvagent 99→112, all passing.
Co-Authored-By: claude-flow <ruv@ruv.net>
---------
Co-authored-by: Claude <noreply@anthropic.com>
|
||
|
|
1a0492992f |
fix(build): repoint stale Makefile targets to v2/ and archive/v1/ (#1201)
The root Makefile still referenced pre-reorg paths that no longer exist, so the documented build/test/run entrypoints were all broken: - rust-port/wifi-densepose-rs/ -> v2/ (build-rust, build-wasm[-mat], test-rust, bench, clean cargo step) - v1.src.api.main -> archive.v1.src.api.main (run-api, run-api-dev) test-rust now uses the documented `--no-default-features` invocation (the proven-passing, GPU-free path). Verified: `cd v2 && cargo test --workspace --no-default-features --no-run` compiles the full workspace clean. Surfaced during a metaharness review (the broken root build entrypoint is why genome reported build:none / publish_readiness 0.40 at the repo root). Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36 |
||
|
|
c0d3d7c792 |
chore(firmware): add release guard against stale-sdkconfig partition mismatch (#1194)
While cutting v0.8.3-esp32, an incremental 8MB build reused a leftover generated `sdkconfig` and silently linked the 4MB dual-OTA partition layout (no spiffs, ota_1 @ 0x1F0000) — the would-be released `partition-table.bin` did not match the 8MB `partitions_display.csv` it claimed. scripts/firmware-release-guard.sh regenerates the expected partition table from the CSV the named flash-size variant must use and byte-compares it to the built `partition-table.bin`, and cross-checks flash size in flasher_args.json. Fails closed so a release pipeline can't ship a mismatched table. Usage: scripts/firmware-release-guard.sh <8mb|4mb> <build-dir> Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36v0.8.3-esp32 |
||
|
|
fca5e6f0a0 |
fix: multistatic canonicalization, csi_fps burst inflation, control-packet starvation (#1170, #1180, #1183) (#1193)
#1170 — live multistatic bridge fed raw, un-canonicalized per-node CSI (64/128/192 bins) to MultistaticFuser, tripping DimensionMismatch every cycle and silently disabling fusion on mixed HT20/HT40 meshes. Add HardwareNormalizer::resample_to_canonical (resample-only, no z-score) and canonicalize every node frame onto the 56-tone grid before fusion. #1180 — update_csi_fps_ema only rejected dt<=0 or >=1s, so sub-ms UDP-burst arrivals (36us -> ~27kHz) inflated csi_fps_ema 40-840x. Add a 5ms plausibility floor and stop re-anchoring observe_csi_frame_arrival on burst deltas. #1183 — global ENOMEM backoff (CSI flood) starved <=48B/<=1Hz control packets. Add stream_sender_send_priority() bypassing the backoff gate without touching the streak; route feature_state/HEALTH/sync through it. Fix the misleading "HEALTH sent" log that printed even on rv_mesh_send failure. Verified: signal 501, sensing-server 677 tests (0 failed); firmware builds clean. Claude-Session: https://claude.ai/code/session_01AgpTcBLRJ32hUsKWxDXf36 |
||
|
|
7831f29436 |
fix(firmware): phantom LD2410 detection + ENOMEM backoff (#1135) (#1159)
Bug #2 (root cause): LD2410 probe-detection matched only the 4-byte head 0xF4F3F2F1, so a floating UART at 256000 baud could phantom-detect a sensor and spawn a UART task. Now requires a full validated report frame (head + sane length + tail 0xF8F7F6F5), extracted to mmwave_detect.h and shared with a host unit test (test_mmwave_detect.c, 8 vectors) so firmware and test can't diverge. Matches the validate-before-trust approach used for MR60 in #1107. Bug #1: sendto ENOMEM used a fixed 100 ms backoff too short to drain sustained lwIP/WiFi buffer pressure, so a node could stay stuck. Now exponential (100->200->...->2000 ms per consecutive ENOMEM, reset on first successful send). Removing the phantom LD2410 task (bug #2) also removes the extra load that tipped the reporter's tier-2 node into the stuck state. Validated on ESP32-S3 QFN56 rev v0.2 (the reporter's silicon): tier-2 streams ~100 frames/s with no stuck ENOMEM and correctly reports no mmWave (no phantom). LD2410 predicate truth table proven (head-without-tail REJECTED). Could not reproduce the reporter's environment-specific floating-pin noise, so the deterministic proof is the host unit test.v0.8.2-esp32 |
||
|
|
4bf88e1283 |
feat(firmware): gate LED gamma viz behind CONFIG_LED_GAMMA_VIZ (ADR-183 follow-up) (#1129)
The 40 Hz gamma flicker is now Kconfig-gated (default y, unchanged behaviour). Set CONFIG_LED_GAMMA_VIZ=n for a dark, lower-power boot (the LED is simply cleared) — important for photosensitive deployments, no source edit needed. The colormap saturation point is now operator-tunable via CONFIG_LED_MOTION_FULLSCALE_MILLI (default 250 = 0.25). Build + flash confirmed on ESP32-S3 N16R8 (COM8): default y still arms the gamma timer, CSI flows. ADR-183 updated (gate moved from follow-up to done). |
||
|
|
a4c2935a2f |
feat(firmware): onboard LED 40 Hz gamma stimulus + CSI-motion colour (ADR-183) (#1127)
* chore(deps): bump ruv-neural submodule — ColorMap no_std for ESP32 Points to ruvnet/ruv-neural#3 (c9638fa): ruv-neural-viz::ColorMap now builds no_std, so it can run on the ESP32. Unblocks driving the onboard WS2812 from the viridis/cool-warm colormap. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(firmware): onboard LED as 40 Hz gamma stimulus, colour from live CSI motion (ADR-183) The S3 onboard WS2812 (GPIO 48, #962) now runs a GENUS-style 40 Hz gamma square wave (12.5 ms on/off, 50% duty). The ON-phase colour is live CSI motion (edge motion_energy) mapped through a 60-step viridis LUT generated from ruv-neural-viz::ColorMap::viridis() — still=purple, moving=yellow. Uses the now-no_std ColorMap (ruvnet/ruv-neural#3 / #1126). Hardware- confirmed on ESP32-S3 N16R8 (COM8): boot log shows the timer armed, CSI keeps flowing (27-38 pps). Honesty + photosensitivity notes + a Kconfig-gate follow-up are in ADR-183. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
315d7df09e |
chore(deps): bump ruv-neural submodule — ColorMap no_std for ESP32 (#1126)
Points to ruvnet/ruv-neural#3 (c9638fa): ruv-neural-viz::ColorMap now builds no_std, so it can run on the ESP32. Unblocks driving the onboard WS2812 from the viridis/cool-warm colormap. |
||
|
|
bdd1eaf927 |
chore: untrack ruvector.db runtime artifacts + gitignore at any depth (#1124)
These are hook/runtime-generated databases (ruvector/intelligence store) that kept showing as dirty and don't belong in version control. Removed from the index (files kept on disk) and ignored globally. |
||
|
|
4001e9e178 |
feat(harness): npx @ruvnet/ruview operator harness + ADR-182 (#1123)
A host-portable RuView agent harness minted via MetaHarness and hardened per ADR-182. Published as @ruvnet/ruview@0.1.0 (bare `ruview` blocked by npm's typosquat filter → scoped fallback). What it does: - 6 fail-closed `ruview.*` tools (onboard, claim_check, verify, node_monitor, calibrate, node_flash) exposed as CLI verbs + a dependency-free MCP stdio server. - The "prove everything" rule made executable: `ruview.claim_check` flags untagged accuracy claims and the retracted "100%" framing. - 5 host-neutral skills (onboard/provision-node/calibrate-room/ train-pose/verify) + bundled .claude/ config + provenance manifest. Validated: 17/17 unit tests, live MCP handshake, `ruview.verify` ran the real verify.py to VERDICT: PASS, clean `npx @ruvnet/ruview` from registry. Packs to 16.7 kB / 21 files; kernel+host are optionalDependencies so the operator tools install lightweight. README: documented as the portable, multi-host companion to the in-repo plugins/ruview/ Claude Code plugin (not a replacement). |
||
|
|
65e29ef47a |
fix(display): no false display-detect on bare DevKit → CSI starves at MGMT-only (#1000) (#1121)
The SH8601 QSPI panel is write-only, so display_hal_init_panel() 'succeeds' even on a bare display-less board — display_is_active() then returned true and main.c skipped the #893/#906 MGMT->MGMT+DATA CSI filter upgrade (yield=0pps). Gate on the FT3168 touch I2C readback (always present on the Touch-AMOLED board, absent on a bare DevKit): if touch is absent, the panel 'success' was a false-positive — bail to headless before the display task starts, so display_is_active() stays false and CSI captures. Co-authored-by: ruv <ruvnet@gmail.com>v0.8.1-esp32 |
||
|
|
cb30988cf9 |
fix(mmwave): require validated MR60 header on probe — no false detect on empty UART (#1107) (#1119)
probe_at_baud counted bare 0x01 (SOF) bytes and declared MR60BHA2 on a single one. A floating UART1 with no sensor reads noise full of 0x01s → false 'Detected MR60BHA2 (caps=0x000f)'. Now a candidate must be a full 8-byte header with a valid header checksum (bytes 0..6) AND a known frame type (0x0A__ / 0x0F09), and clear the ≥3 threshold; removed the weak single-hit fallback. Real sensors stream valid frames continuously, so detection of present hardware is unaffected. Co-authored-by: ruv <ruvnet@gmail.com> |
||
|
|
128b129474 |
Merge pull request #1120 from ruvnet/fix/1007-paired-data-pipeline
fix(paired-data): 4 bugs in CSI recorder + ground-truth aligner (#1007) |
||
|
|
15a983b555 |
fix(paired-data): 4 bugs corrupting/blocking camera-supervised training data (#1007)
1. record-csi-udp.py stamped LOCAL time with a 'Z' (UTC) suffix → camera/CSI disagreed by the UTC offset → 0 aligned pairs. Now writes true UTC via datetime.now(timezone.utc). 2. align-ground-truth.js kept empty-keypoint (non-detection) records at confidence 0, collapsing window avgConf below threshold → all windows rejected. Now skipped at load. 3. extractCsiMatrix silently zero-padded/truncated mixed-subcarrier frames. Now frames are filtered to the session's modal subcarrier count before windowing — never padded. 4. CSI/feature matrices are filled frame-major but were labeled shape [nSc, nFrames] — transposed. Labels corrected to [nFrames, nSc] / [nFrames, dim]. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
c6e7667676 |
Merge pull request #1104 from ruvnet/fix/issue-1049-configurable-guard
fix(sensing-server): make multistatic guard interval configurable (closes #1049) |
||
|
|
d639c747df |
Merge pull request #1114 from ruvnet/examples/through-wall-tools
examples(through-wall): ESP32 sensor auto-detection + WiFlow analysis tools |
||
|
|
42c764652d |
examples(through-wall): ESP32 sensor auto-detection + WiFlow analysis tools
- wiflow_browser.html: auto-detect live ESP32 nodes from the /ws/sensing stream and lock them as the model schema (NODE_IDS/CSI_DIM dynamic), persisted + restorable - wiflow_ab.py: leakage-controlled A/B (chronological/random/blocked-gap/grouped-bucket, multi-seed) — the honest CSI→pose evaluation harness - wiflow_capture.py / wiflow_train.py / wiflow_infer.py: camera-paired capture + train + infer - pose.html: live WiFi-inferred skeleton viewer; serve.py: static server - gitignore the regenerable 1.5MB model.npz artifact Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
db02956c22 |
Merge pull request #1113 from ruvnet/chore/bump-ruv-neural-submodule
chore: bump ruv-neural submodule to current main |
||
|
|
c84ea39e62 |
chore: bump ruv-neural submodule → current main (web console, closed-loop neuromod, ruvn mention)
Advances the vendored ruv-neural submodule from the stale 'Initial' pin (1ece3af) to current main (81be9e1): the static web console UI, the closed-loop neuromodulation platform, repositioned README, and the @ruvnet/ruvn companion-tool mention. ruv-neural is not a v2 workspace member, so this does not affect the workspace build (cargo metadata resolves clean). Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
760d05026c |
Merge pull request #1112 from ruvnet/chore/extract-swarm-worldgraph-submodules
Extract ruview-swarm → ruvnet/ruv-drone and world crates → ruvnet/worldgraph (submodules) |
||
|
|
a784546918 |
ci(ruview-swarm): drop removed itar-unrestricted feature from test matrix
The industrial rescope (ruv-drone) removed the itar-unrestricted feature flag — formation/allocation/raft/flight-control are now default capabilities. Update the 'ruflo+itar' matrix entry to just '--features ruflo' so CI matches the new feature set. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
9c751d0d92 |
chore(worldgraph): bump submodule — README + metadata polish
Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
a13e9b66cb |
chore: bump ruv-drone + worldgraph submodules (LICENSE + CI polish)
Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
6db183bf3e |
chore(swarm): bump ruv-drone submodule — README cleanup
Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
f65d0f79e7 |
chore(swarm): bump ruv-drone submodule (rescope + stray-file cleanup)
Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
7fb3b88061 |
chore(swarm): bump ruv-drone submodule — industrial rescope (drop ITAR/USML gating)
Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
aeac5f5543 |
chore(worldgraph): extract geo+worldgraph+worldmodel to ruvnet/worldgraph submodule
- published as github.com/ruvnet/worldgraph (3-crate workspace, history via git-filter-repo) - replace the 3 in-tree crates with one submodule at v2/crates/worldgraph - parent workspace: drop the 3 members, exclude the submodule (it is its own workspace), repoint workspace.dependencies(worldmodel) + engine/sensing-server path-deps into it - cargo metadata resolves clean (geo/worldgraph/worldmodel consumed from the submodule) Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
c257e67c3d |
chore(swarm): extract ruview-swarm to ruvnet/ruv-drone submodule
- ruview-swarm published as github.com/ruvnet/ruv-drone (history preserved via subtree split) - replace the in-tree crate with a submodule at v2/crates/ruview-swarm (branch main) - standalone repo dropped the unused wifi-densepose-core path-dep; export-control NOTICE added there - workspace member path unchanged; cargo metadata resolves ruview-swarm from the submodule Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
a4d5ea88f3 |
feat(examples): in-browser WiFlow trainer + camera-supervised pipeline + ADR-180/181/181A
Tonight's real WiFlow work, all honest: - examples/through-wall/: live 2-node CSI demo (index.html), the WiFlow camera-supervised pipeline (wiflow_capture/train/infer.py — proven +9.4pp over mean-pose baseline on ruvultra), the live pose viewer (pose.html), and the COMPLETE in-browser trainer (wiflow_browser.html): 4-stage calibrate->capture->train->infer, TF.js WebGPU/WASM/WebGL, MediaPipe camera supervision, IndexedDB persistence, mean-pose-baseline honesty. - ADR-180 (through-wall hand-off demo), ADR-181 (full browser WiFlow, WASM+WebGPU, calibration phase, mobile/secure-context matrix), ADR-181A (binary CSI framing protocol). Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
ebe217569b |
feat(examples): real live WiFi-CSI through-wall sensing demo
Self-contained Three.js r128 demo at examples/through-wall/ that renders ONLY genuine data streamed from the running sensing-server over ws://localhost:8765/ws/sensing. No simulation, no fabricated frames, no fake skeleton. Renders, driven by real /ws/sensing frames: - 20x20 signal_field floor heatmap (real values) - coarse RF-localization puck from persons[0].position (labeled coarse, NOT pose; peak signal_field cell as fallback) - live motion/breathing/variance/rssi bars + motion sparkline - presence/confidence/estimated_persons/active-node/tick/Hz meters - 3D room with wall + doorway dividing office (node 9) / hallway (node 13) - honest mutually-exclusive banner: LIVE (source=esp32) / SIMULATED / NO SERVER, showing the real source verbatim - optional webcam tile (ground-truth-when-visible, separate from CSI) Reuses scene/lights/bloom/CSS + webcam path from examples/three.js/demos/05-skinned-realtime.html, the floor-heatmap idea from ui/observatory/js/, and the threaded no-cache server from examples/three.js/server/serve-demo.py (serve.py on :8080). Verified against the live server: real frame source=esp32, nodes [9,13], 400 signal_field values, persons[0].position present. Python proof PASS. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
c27d6cc98e |
fix(sensing-server): make multistatic guard interval operator-configurable (#1049)
Two ESP32-S3 nodes on WiFi/ESP-NOW sync drift 10-150 ms (~70 ms typ.), exceeding the 60 ms default guard → permanent trust demotion to Restricted, all pose output suppressed, 200k+ errors, no escape but a container restart. Add a direct WDP_GUARD_INTERVAL_US override (+ optional WDP_SOFT_GUARD_US) to multistatic_guard_config_from_env. Precedence (most-specific wins): direct override > WDP_TDM_SLOTS+WDP_TDM_SLOT_US schedule-derived > 60ms/20ms default. Soft band always clamped strictly below hard; malformed/zero ignored (falls back, never breaks fusion). Effective guard logged at startup. Pinned by 6 tests (multistatic_guard_config_tests). sensing-server bin tests 449 -> 455, 0 failed. Python proof PASS, hash unchanged (off signal path). Closes #1049. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
cafbeb1e81 |
fix(wasm-edge): sanitize non-finite host floats at the WASM↔host frame boundary (#1102)
Closing beyond-SOTA security review of wifi-densepose-wasm-edge (ADR-040,
~70 edge modules). The two WASM↔host boundaries (lib.rs::on_frame/on_timer
and bin/ghost_hunter.rs::on_frame) read raw IEEE-754 f32 from the csi_get_*
imports with no finiteness check — the crate had zero is_finite/is_nan
guards and its clamp helpers propagate NaN. A single non-finite host value
latches NaN into long-lived per-module accumulators (EMA / Welford / phasor
sums / anomaly baselines), after which detectors fail degraded (stuck gate
state, silently-disabled checks) — silent corruption, not a crash.
Add sanitize_host_f32() (non-finite -> 0.0, core-only for no_std) applied at
every host_get_* float read: one chokepoint covering all downstream modules,
mirroring the existing M-01 negative-n_subcarriers boundary clamp. LOW /
defense-in-depth (the Tier-2 DSP firmware supplies the imports, a semi-trusted
boundary).
Pinned by boundary_tests::{sanitize_passes_finite_values_through,
sanitize_maps_non_finite_to_zero,
coherence_monitor_nan_latches_without_sanitize_but_not_with} — the last
asserts on the current CoherenceMonitor that a raw NaN frame latches the
smoothed score while the sanitized path stays finite.
Other review dimensions attested clean with evidence (see CHANGELOG): no
hot-path panics (all unwrap/expect are test-only or std-gated RVF builder),
all bounds min()-clamped, all index-by-cast const-bounded or guarded, no
leaking closures (no move||/forget/leak), no secrets.
Verified: host `cargo test --features std,medical-experimental` 672 passed /
0 failed (+3 new tests); all three wasm32-unknown-unknown release artifacts
build clean (lib default no_std/panic=abort, ghost_hunter standalone-bin,
medical-experimental); Python proof VERDICT PASS, hash unchanged.
|
||
|
|
c859f6f743 |
security(occworld-candle): int32-checkpoint crash + degenerate-input guards + ADR-179 (closes Milestone #9) (#1101)
* fix(occworld-candle): security review fixes — int32 checkpoint crash + predict input validation Beyond-SOTA security + correctness review of wifi-densepose-occworld-candle (Milestone #9, crate 4/4 — the last ungated crate). Findings fixed: 1. HIGH (MEASURED) — checkpoint-load crash on any int32 tensor. model.rs mapped safetensors I32 -> candle DType::I64 and passed the raw int32 byte buffer (4 bytes/elem) to Tensor::from_raw_buffer(.., I64, ..). Candle derives elem_count = data.len() / dtype.size(), so the I64 path halved the count while keeping the original shape -> a tensor whose shape claims 2x its storage. Reading it PANICS (slice OOB: "range end index 6 out of range for slice of length 3") on any checkpoint containing an int32 tensor. Fixed: I32 -> DType::I32, I16 -> DType::I16 (both first-class candle dtypes). Reproduced on old code; pinned in tests/checkpoint_loading.rs. 2. LOW (MEASURED) — predict() lacked frame/batch validation at the input boundary. f_in > num_frames*2 over-indexed the temporal embedding (cryptic candle "gather" error); zero frame/batch fed a zero-element tensor in. Now rejected with a clear ShapeMismatch. Pinned in tests/input_validation.rs. 3. LOW (MEASURED) — divide-by-zero panic in the public VQCodebook::encode on a rank-0 / empty-last-dim tensor (last == 0). Now fails closed with a clear error. Pinned in vqvae.rs unit tests. Dimensions confirmed clean with evidence: panic surface (no unwrap/expect/ panic in prod paths), NaN-state-poisoning (N/A — stateless engine, u8 input), unbounded-alloc/shape-data mismatch (defended upstream by safetensors:: validate), secrets (none). unsafe_code = forbid. Validation (MEASURED, Windows): crate 31/31 pass; workspace 0 failed (lone desktop api_integration "Access is denied" file-lock flake passes 21/21 in isolation); Python proof VERDICT PASS, hash f8e76f21…446f7a unchanged. Warrants ADR slot 179 (parent to author). Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-179 — occworld-candle checkpoint-load hardening (closes Milestone #9) Records the HIGH int32-checkpoint crash fix (I32→I64 dtype-widening → slice-OOB panic on load = DoS) + 2 LOW degenerate-input fixes from 5e77f47e5. Stateless engine (NaN-poisoning N/A), unsafe forbidden, safetensors validate() defends malloc upstream. occworld 31/31. Final ungated crate — Milestone #9 complete. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
10c813fde3 |
security(desktop): IPC serial-command-injection + over-broad shell capability + ADR-178 (#1100)
* fix(security): desktop IPC serial-command-injection + over-broad shell capability (ADR-178) Beyond-SOTA security review of wifi-densepose-desktop (Tauri v2). Two real findings, each MEASURED on Windows (crate builds + tests under --no-default-features): WDP-DESK-01 (MODERATE) — serial command injection via configure_esp32_wifi. The #[tauri::command] handler concatenated webview-supplied ssid/password into newline-terminated serial commands with no validation; a \r\n let a compromised webview inject an arbitrary follow-up firmware command (reboot/erase). Added validate_wifi_credentials() enforcing WPA2 length bounds and rejecting all control characters, called fail-closed before any serial write. Pinned by 3 new tests (rejects \r\n / \n / NUL injection, rejects out-of-range, accepts valid boundaries). WDP-DESK-02 (MODERATE) — removed unused shell:allow-execute / shell:allow-open from capabilities/default.json. The Rust backend spawns processes via std::process::Command (bypassing the allowlist) and the UI only uses dialog.open; the shell perms were unused privilege granting the webview arbitrary host command execution on compromise. Regenerated capabilities.json confirms only core:default + dialog perms remain. lib tests 18 -> 21 (+3 pins), integration 21 -> 21, 0 failed. Python deterministic proof unchanged (f8e76f21...46f7a; desktop off the signal path). Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-178 — desktop IPC injection fix + capability least-privilege Records the 2 MEASURED MODERATE fixes in feddcde9d: WDP-DESK-01 (webview ssid/password \r\n-injected arbitrary firmware serial commands → validated fail-closed) and WDP-DESK-02 (unused shell:allow-execute/open capability granted to the webview → removed). 30-command IPC surface + capability scope audited; 6 dimensions clean-with-evidence. desktop 18→21. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
20ad75f30c |
feat(ADR-131): HOMECORE-UI dashboard + BFF gateway — review-fixed (supersedes #1082) (#1099)
* feat(ADR-131): HOMECORE-UI operational dashboard + BFF gateway Complete two-tier Cognitum operator dashboard (ADR-131), served by homecore-server at /homecore, plus the single-origin BFF gateway that wires it to real backends. Front-end (zero-dep vanilla TS/JS + CSS, exact Cognitum design tokens): - All 10 panels (§4.1-4.10): dashboard, SEED fleet + detail, fleet map, entities (live WS subscribe_events, never polls), rooms, COGs, calibration wizard, events + automation builder, witness/audit, settings. - §6 UX invariants in code: first-class provenance, prominent stale/veto/ fragility, null(not-trained) vs withheld vs error, --mono everywhere, Hailo vs CPU COG distinction. - api.js calls the gateway routes in production; mock demoted to a dev-only ?demo=1 fixture (no mock in prod); typed error states. - Tests under plain node: import-graph, boot, render-smoke (22), interaction (3), prod-errors (13) — 5 files green; bundle ~137 KB (~37x smaller than HA), <2 ms/cold-render. BFF gateway (homecore-server/src/gateway.rs, compiled + tested on Rust 1.89): - /api/cal/* reverse-proxy to the calibration API (ADR-151). - GET /api/homecore/rooms with the RoomState adapter (breathing->breathing_bpm, heartbeat:null->heart_bpm:null, injected anomaly.threshold/room_id). - GET /api/homecore/cogs supervisor over /var/lib/cognitum/apps/. - GET /api/homecore/appliance from /proc + TCP service probes. - SEED-device/appliance routes return typed 503 upstream_unavailable. - cargo test -p homecore-server = 12/12; run live (curl-verified); fixed a real double-v1 proxy-URL bug found during live testing. Honest scope: W1/W2/W4/W6-appliance functional; W3/W5/W6-Hailo/federation return typed 503 (depend on services/hardware not in this repo). Co-Authored-By: claude-flow <ruv@ruv.net> * fix(homecore-ui): resolve code-review findings — SSRF guard, CORS/trace coverage, §6 honesty, crash guards Addresses the high-effort review of PR #1082: - SECURITY: cal_proxy rejects path-traversal/confused-deputy SSRF (`.`/`..` segments, backslash, %2e%2e/%2f, absolute) on raw+decoded forms → 400, before attaching the server-side calibration bearer. - CORRECTNESS: /api/homecore/* + /api/cal/* now covered by the shared CORS allowlist (build_cors_layer, exported from homecore-api) + TraceLayer — previously merged outside router()'s layers (no CORS, no tracing). - §6 HONESTY (no fabricated data): dashboard renders '—' for null metrics (not "null%"/"null°C"); cogs Hailo pill reflects the REAL appliance probe (not hardcoded "connected"); room anomaly threshold passed through / null, not a fabricated 0.5. - ROBUSTNESS: cogs asArray(hef) guards a non-array manifest field; calibration progress guards target<=0 (no NaN%/Infinity%); restart clears the poll timer. - CLEANUP: mock.js is now a cached DYNAMIC import (demo-only) — never bundled in production (§2.2). - New ui/tests/unit-fixes.mjs pins the above; ADR-131 + CHANGELOG updated. Co-Authored-By: claude-flow <ruv@ruv.net> --------- Co-authored-by: Nick Ruest <127058086+nicholas-ruest@users.noreply.github.com> |
||
|
|
1df6d1e1ee |
security(nvsim): guard degenerate input — config panic + NaN silent-corruption + ADR-177 (#1098)
* fix(nvsim): guard degenerate input — config-induced panic + NaN-state poisoning
Beyond-SOTA security review of the ADR-089 NV-diamond simulator (milestone #9,
crate 2 of 4). Two real degenerate-input findings, each pinned fails-on-old:
NVSIM-DT-01 (config panic/DoS, pipeline.rs): an external f_s_hz == 0 made
dt == +Inf, dt_us saturated to u64::MAX, and `sample * dt_us` panicked with
"attempt to multiply with overflow" at sample >= 2 (debug/WASM panic=abort;
garbage t_us in release). Fix: sanitise dt (non-finite/non-positive -> 1 µs
fallback), cap the u64 cast, and saturating_mul the timestamp.
NVSIM-NAN-01 (NaN-state poisoning, digitiser.rs): a non-finite scene parameter
(NaN dipole position / Inf moment / NaN loop radius) bypasses the near-field
clamp (NaN < R_MIN_M is false) and yields a NaN field; at the ADC `NaN as i32`
== 0 silently emitted b_pt=[0,0,0] with ADC_SATURATED CLEAR — indistinguishable
from a legit zero-field reading. Fix at the funnel: adc_quantise treats any
non-finite input as out-of-range -> clamps to code 0 AND raises the saturation
flag, so the corruption is visible downstream.
Determinism integrity, panic-free MagFrame deserialisation, and RNG seeding
confirmed clean with evidence. The published cross-machine witness
(cc8de9b0…93b4) is unchanged — guards only affect degenerate inputs.
cargo test -p nvsim --no-default-features: 50 -> 53 passed, 0 failed.
Workspace green; Python deterministic proof unchanged (f8e76f21…46f7a,
nvsim off the signal proof path). Needs ADR slot 177.
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-177 — nvsim degenerate-input hardening
Records the 2 MEASURED MEDIUM fixes in
|
||
|
|
4a083999e5 |
security(ruview-swarm): fail-closed on NaN/Inf at the swarm-comm trust boundary + ADR-176 (#1096)
* fix(ruview-swarm): fail-closed on NaN/Inf at swarm-comm trust boundary (ADR-148)
Beyond-SOTA security review of the ADR-148 drone swarm control plane found
four IEEE-754 NaN/Inf fail-open / DoS bugs on data crossing the untrusted
swarm-comm boundary (receive_peer_state / receive_peer_detection accept full
DroneState/CsiDetection whose f64/f32 fields deserialize with no finite-check).
- HIGH: failsafe::tick collision-avoidance + battery checks fail-open on NaN
(NaN < threshold == false silently disabled collision avoidance / kept a
NaN-battery drone Nominal). Now fails closed to EmergencyDiverge / RTH.
- MED: geofence::check NaN-altitude bypass returned Safe through the
point-in-polygon path. Now leading non-finite-coordinate guard -> HardBreach.
- MED/DoS: antijamming FhssRadio panicked with "% 0" on an empty deserialized
channels_mhz. Now len==0 early-returns (benign 0.0 sentinel).
- LOW: multiview::fuse propagated a NaN victim_position into the fused
"confirmed victim" location. Now requires finite confidence + position.
Each fix pinned by a fails-on-old / passes-on-new test (MEASURED: old code
returned Nominal/Safe or panicked). cargo test -p ruview-swarm
--no-default-features: 117 -> 123 passed, 0 failed. Workspace green; Python
deterministic proof unchanged (f8e76f21...46f7a, off the signal path).
Documented-not-fixed (ADR slot 176): Raft AppendEntries lacks Log-Matching
consistency check (topology/raft.rs); MavlinkSigner::verify uses non-constant
-time tag compare + no replay-window rejection (already doc-flagged).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-176 — ruview-swarm NaN-fail-open safety review
Records the 4 MEASURED fail-open safety bugs fixed in
|
||
|
|
0f64d23516 |
feat(bench): int8 quantization of WiFlow-STD half pose model — MEASURED trade-off (ADR-175, honest negative) (#1095)
Sub-deliverable 8.2 of the benchmark/optimization milestone. Quantizes the 843,834-param "half" WiFlow-STD pose model (half_best.pth) to int8 two ways and MEASURES the accuracy/size trade-off vs fp32 under ONE locked normalization (ADR-173 torso-diameter PCK, upstream calculate_pck use_torso_norm=True), on the same seed-42 file-level 70/15/15 test split that produced the fp32 sweep numbers. MEASURED on ruvultra (RTX 5080, torch 2.11.0+cu128, fbgemm; clean test, torso-PCK): fp32 96.62% pck@20 99.47% pck@50 0.008981 mpjpe 3.351 MB int8 PTQ static 40.98% pck@20 94.98% pck@50 0.038262 mpjpe 1.046 MB (-55.64pp) int8 QAT (3 ep) 67.48% pck@20 98.69% pck@50 0.026548 mpjpe 1.043 MB (-29.15pp) Verdict (honest no): int8 is NOT a win at the strict PCK@20 edge target. Static PTQ collapses; QAT recovers a large share but still loses 29 pp @20 for a 3.2x size win — keep fp32/fp16 on the edge. Disclosed: QAT fake-quant val pck@20 was 83.45% but converted int8 scores 67.48% (~16pp convert_fx gap, reported honestly). Deliverables: - v2/crates/wifi-densepose-train/scripts/quantize_half_int8.py (reproducible: header carries the exact ssh command + run date; QAT primary, static PTQ fallback) - docs/adr/ADR-175-int8-quantization-half-pose-model-measured.md (MEASURED table, locked normalization, QAT-vs-PTQ labeling, verdict, reproduction, limitations) - CHANGELOG [Unreleased] ### Added entry No production Rust or signal-pipeline change. Python deterministic proof unchanged (f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a, bit-exact). |
||
|
|
b209b8b778 |
ci(bench): compile-verify regression gate for v2 criterion benches + ADR-174 (#1094)
* ci(bench): wire v2 criterion benches into CI as a compile-verify regression gate
Sub-deliverable 8.3 of the benchmark/optimization milestone (needs ADR slot 174).
The v2/ workspace ships 26 criterion benches across 18 crates, but benches are
not part of `cargo test`, so nothing in CI compiled them and they silently rot
when a public API they call changes.
Add `.github/workflows/bench-regression.yml`:
- bench-compile (HARD GATE): `cargo bench --workspace --no-default-features
--no-run` compiles + links every default-feature bench (no measurement) plus
the cir-gated cir_bench — a real, deterministic regression guard against
bench bit-rot.
- bench-fast-run (INFORMATIONAL, continue-on-error, never gates): runs a
curated pure-CPU subset (nvsim, ruvector sketch/fusion) in criterion
quick-mode and uploads logs as an artifact.
No timing-regression gate, by design: wall-clock on shared GitHub runners varies
2-3x run-to-run, so a hard threshold or cross-runner `criterion --baseline`
compare would manufacture false failures. The honest scope is compile-verify +
informational-run; the workflow header documents the self-hosted-runner
condition under which true timing-gating becomes honest. The crv-gated crv_bench
is excluded because its crates.io dep ruvector-crv 0.1.1 fails to build upstream.
Running the gate immediately caught one already-bit-rotted bench:
wifi-densepose-mat/detection_bench failed to compile (E0063: missing field
last_rssi in SensorPosition). Fixed (last_rssi: None) and re-verified.
Validation (MEASURED): mat detection_bench + cir_bench + nvsim + ruvector +
vitals + swarm benches compile under --no-default-features; fast subset runs;
`cargo test -p wifi-densepose-mat --no-default-features` 174 passed / 0 failed;
Python proof PASS, hash f8e76f21...46f7a unchanged.
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-174 — CI bench-regression compile-verify gate
Records sub-deliverable 8.3 (bench-regression.yml, committed
|
||
|
|
90a88ada9a |
feat(train): metric-locked PCK/MPJPE accuracy harness + ADR-173 (resolve PCK-definition ambiguity) (#1092)
* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity
The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project was retracted
twice over this (a withdrawn 92.9% used absolute pixels, not torso).
New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
(retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
number is structurally impossible.
17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).
This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).
wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness
Documents the accuracy harness (committed
|
||
|
|
cfd0ad76cf |
security(core,cli): pin CSI-deserialiser DoS-resistance + ADR-172 (clean-with-evidence) (#1091)
* test(core,cli): pin DoS-resistance of CSI deserialisers (ADR-127 security review)
Beyond-SOTA security review of wifi-densepose-core + wifi-densepose-cli.
Load-bearing-question verdict: the NaN-state-poisoning bug class does NOT
originate in core — core exposes no stateful accumulator (no Welford,
von-Mises, IIR, voxel grid, running mean); each downstream crate rolls its
own, so each fix is correctly local. Both crates confirmed clean on every
reviewed dimension (panic-on-adversarial-input, NaN handling, unbounded
memory, path traversal, secrets) — no production code changed.
Adds 4 regression pins locking in two existing-but-untested DoS guards:
- core: from_canonical_bytes shape guard (Vec::with_capacity bound) — proven
to fail with `capacity overflow` when the saturating-mul guard is removed.
- core: canonical decoder never panics on arbitrary/truncated bytes.
- cli: parse_csi_packet rejects an oversized n_antennas*n_subcarriers claim
before Array2 allocation (33 MB claim in a 2 KB datagram -> None).
- cli: parse_csi_packet never panics on arbitrary UDP bytes.
core: 35 -> 37 lib tests; cli: 24 -> 26 tests; 0 failed. Python proof
unchanged (f8e76f21…46f7a — off the signal path).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-172 — wifi-densepose-cli + core CSI-deserialiser security review
Records the clean-with-evidence verdict + 4 DoS-resistance regression pins
(test-only, committed in
|
||
|
|
71e8756051 | docs(research): SOTA evidence brief for nn/train benchmark ADR (#1090) | ||
|
|
5287497a4a |
security(homecore-migrate): redact secret value from malformed secrets.yaml error (#1089)
* fix(homecore-migrate): redact secret value from malformed secrets.yaml error (secret-leak)
`read_secrets` wrapped serde_yaml's parse error into `MigrateError::YamlParse {
source }`. serde_yaml's message for a typed-tag coercion failure embeds the
offending scalar verbatim, e.g. `invalid value: string "<the-secret-value>"`.
That error propagates out of `read_secrets`, is `?`-returned by the
`InspectSecrets` CLI path in main.rs, and printed to stderr by anyhow — leaking
a secret value despite the CLI's deliberate `<redacted>` design.
Fix: secrets.yaml parse failures now map to a new redacting variant
`MigrateError::SecretsParse { path, line, column }` that carries only the file
path and a coarse location (from `serde_yaml::Error::location()`), never the
scalar content. Other (non-secret) YAML files keep `YamlParse`.
Pinned by `secrets::tests::malformed_secrets_error_never_contains_secret_value`
(asserts the rendered error AND its full #[source] chain never contain the
secret value; fails on the old `YamlParse` path) plus
`malformed_secrets_error_reports_location` (still fail-closed + locatable).
ADR-165 secret-handling rule: a secret value must never appear in output.
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(homecore-migrate): record secret-leak fix in ADR-165 + CHANGELOG
Note the secrets.yaml error-redaction fix and the review's clean dimensions
(read-only source / no traversal / no panic / fail-closed versioning / no
injection) in ADR-165 §2.4, bump the test-evidence count 19→21 in §2.6, and add
an [Unreleased] Security entry to CHANGELOG.
Co-Authored-By: claude-flow <ruv@ruv.net>
|
||
|
|
bf1dfe79fd |
fix(homecore core): TOCTOU race dropped/reordered state_changed events under concurrent writers (~93k→0) + 2 fail-closed hardenings (#1087)
* fix(homecore): atomic state set — close TOCTOU lost/reordered state_changed events StateMachine::set did get() (release shard lock) → compute next + no-op decision → insert() (re-acquire lock) → send(). The read-modify-write was not atomic w.r.t. a concurrent writer on the same entity: a writer that read a stale `old` could mis-classify a real transition as a no-op and drop its state_changed event (a missed automation trigger) or fire an event whose new_state duplicated the previously delivered one (a spurious trigger for any automation keyed on old_state != new_state). ADR-127 §2.1 promises "writer atomically replaces the map entry"; the implementation did not. Fix: hold the DashMap shard write-lock across the whole read→decide→insert→ fire sequence via entry()/insert_entry(). tx.send is non-blocking, non-async, and never re-enters the map, so firing under the shard lock cannot deadlock and keeps global event order in lock-step with global commit order. Pinned by concurrent_set_fires_no_duplicate_adjacent_events: 4 writers toggling one entity A/B; asserts no two consecutive fired events carry the same new_state (impossible under correct serialisation). Fails reliably on the old code (~365-476 duplicate-adjacent events on the first trial), passes on the fix across repeated runs. Co-Authored-By: claude-flow <ruv@ruv.net> * harden(homecore): bound entity_id length — close memory-DoS at the REST boundary homecore-api/src/rest.rs parses untrusted path segments straight through EntityId::parse (get/delete/set_state). With no length cap, an otherwise-valid id like "a." + many MB of [a-z0-9_] was accepted; a POST /api/states/<giant> would persist it into the DashMap state store, permanently growing memory (amplification across distinct ids). Fix: reject ids longer than MAX_ENTITY_ID_LEN (255, HA-compatible) up front in parse(), before any per-char scan, with a new EntityIdError::TooLong. Fails closed at the boundary type so every caller (REST, registry deserialize, automation) is protected. Pinned by entity_id_length_boundary: exactly-MAX accepted, MAX+1 rejected, 4 MiB id rejected as TooLong. Fails on old code (oversized parses Ok). Co-Authored-By: claude-flow <ruv@ruv.net> * harden(homecore): isolate panicking service handlers (catch_unwind) ServiceRegistry::call already ran handlers outside the registry lock (the Arc<dyn ServiceHandler> is cloned out of the read guard first), so a panic could never poison the RwLock or block other callers — good. But a panicking handler unwound through call() into the caller's task; the task driving the engine (e.g. an axum request handler invoking a service) could be aborted by one buggy integration. Fix: wrap the handler future in AssertUnwindSafe + FutureExt::catch_unwind and convert a panic into ServiceError::HandlerPanicked. Mirrors HA isolating service-handler exceptions. The registry stays fully usable afterwards. Pinned by panicking_handler_is_isolated_and_registry_survives: the panicking call returns HandlerPanicked (not an unwind), a sibling healthy service still returns its value, and the bad service remains registered. Fails on old code (the await point panics instead of returning Err). Co-Authored-By: claude-flow <ruv@ruv.net> * test(homecore): pin event-bus lag safety (bounded broadcast, no DoS) Documents-with-evidence that the core EventBus does NOT have the homecore-api WS broadcast-lag failure: with EVENT_CHANNEL_CAPACITY=4096, firing 3x capacity while a subscriber never drains keeps fire_* non-blocking (publisher never waits on slow receivers), gives the slow receiver a recoverable Lagged(n) (drop-oldest + re-sync) rather than a closed channel, and leaves the bus live for a fresh fast subscriber. No code change — pins the clean dimension. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(homecore): record ADR-127 §9 security+concurrency review + CHANGELOG Documents the three pinned fixes (HC-RACE-01 state-set TOCTOU, HC-EID-LEN-01 entity_id memory-DoS, HC-SVC-PANIC-01 service-handler isolation) and the clean dimensions (bounded event-bus lag handling, lock discipline / no lock-across-await, no panic-on-input) with their evidence. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
9b126e927e |
harden(assist security): bound untrusted utterance (DoS); cmd-injection/ReDoS/NaN/fail-open all proven clean with evidence (#1086)
* fix(homecore-assist): bound untrusted utterance length, fail closed (ADR-133 security)
The intent recognizers accept utterances from untrusted callers (voice
transcripts, the WebSocket `assist` command). Neither the regex nor the
semantic path bounded utterance length, so a pathological multi-megabyte
utterance forced an unbounded `to_lowercase()` clone plus a per-registered-
pattern scan (and, in the semantic path, full tokenisation + feature-hash
embedding) — an allocation/CPU amplification on attacker-controlled input.
The `regex` crate is linear-time (no catastrophic backtracking), so this was
a throughput/memory DoS rather than a hang, but it was still unbounded.
Fix: introduce MAX_UTTERANCE_BYTES (4 KiB — far above any real spoken
command) and check it at both recognizer boundaries BEFORE any allocation or
scan. An over-length utterance fails closed: Ok(None) (no intent, no action),
identical to an unrecognised phrase. No legitimate command is affected.
Pinned by fails-on-old tests:
- recognizer::over_length_utterance_fails_closed — an over-length utterance
that contains a valid command resolves to None (would have matched before)
- semantic_recognizer::over_length_utterance_fails_closed_semantic
Co-Authored-By: claude-flow <ruv@ruv.net>
* test(homecore-assist): pin clean security dimensions with evidence (ADR-133)
Adds regression tests documenting the dimensions reviewed and found clean,
so the properties cannot silently regress:
- runner: no subprocess surface exists. RufloRunnerOpts.{script_path,env}
are inert and never executed; even a hostile script_path/env spawns
nothing. And the entity_id capture class [a-z0-9_ .] strips every shell
metacharacter, so a resolved slot can never carry ; | & $ ` / etc into a
(future) argv — sanitisation by construction.
(shell_metachars_never_survive_into_a_resolved_slot,
runner_opts_are_inert_no_process_spawned)
- recognizer: the regex crate is a linear-time finite automaton; a classic
catastrophic-backtracking shape (a+)+$ on adversarial input completes in
bounded time — no ReDoS.
(pathological_backtracking_pattern_completes_in_bounded_time)
- embedding: embeddings are structurally finite (FNV feature-hash + guarded
L2 normalise, no external float input, no unguarded division), so a crafted
utterance cannot inject NaN/Inf to poison cosine k-NN; cosine against the
zero vector is a finite 0.0, never NaN.
(embeddings_are_structurally_finite, cosine_with_zero_vector_is_finite_not_nan,
empty_utterance_against_empty_index_no_panic_no_match)
- pipeline: injection-shaped utterances never deliver a metacharacter into a
service call; the worst case resolves to a clean entity token, and an
unrecognised utterance fails closed to not_understood (no action).
(pipeline_injection_shaped_utterance_carries_no_metachars_to_service)
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(homecore-assist): record ADR-133 security review (HC-ASSIST-01 + clean dims)
CHANGELOG [Unreleased] Security entry + ADR-133 section 6 review notes for the
homecore-assist voice/intent pipeline review.
Co-Authored-By: claude-flow <ruv@ruv.net>
|
||
|
|
41bee64593 |
fix(recorder): bound history query (memory-DoS) + add missing transactional purge (disk-DoS); SQL-injection & NaN dims clean (#1084)
* fix(homecore-recorder): bound history query + add transactional purge (memory-DoS + disk-DoS)
Security review of the HA-compat state recorder (ADR-132) found two real
bounding bugs; SQL-injection and NaN-index dimensions confirmed clean.
(1) Memory-DoS: get_state_history carried no LIMIT — a wide [since,until]
window over a high-frequency entity loaded an unbounded row set into a
single in-memory Vec. Added LIMIT MAX_HISTORY_ROWS (1,000,000); the
sibling search paths were already k-bounded.
(2) Disk-DoS / documented-but-missing purge: README advertised
Recorder::purge(older_than) but no retention path existed -> unbounded
disk growth. Added a transactional purge with an EXCLUSIVE cutoff
(idempotent, no off-by-one) that deletes old states+events and
garbage-collects orphaned state_attributes blobs (dedup-shared blobs
are kept until their last referencing state is gone). All three deletes
run in one transaction so a mid-purge failure rolls back cleanly.
Pinning tests (homecore-recorder 19->25 no-default / 25->31 ruvector, 0 failed):
- malicious_entity_id_is_stored_literally_not_executed (SQL injection)
- like_metacharacters_in_query_are_literal_not_wildcards (LIKE escape)
- history_query_carries_a_limit_clause (memory-DoS bound)
- purge_keeps_boundary_row_and_drops_older (exclusive-cutoff, true pin)
- purge_gcs_orphaned_attributes_but_keeps_shared (dedup-safe GC)
- purge_also_removes_old_events
No behaviour change beyond the two fixes. Python deterministic proof
unchanged (recorder is off the signal proof path).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(homecore-recorder): record ADR-132 security review findings
Add a "3a. Security review" section to ADR-132 and a CHANGELOG [Unreleased]
Security entry covering the homecore-recorder review: SQL-injection and
NaN-index dimensions confirmed clean with evidence (every query bound; LIKE
pattern bound+escaped; SHA-256->i32->f32 embeddings always finite, empty
index/k=0 probed no-panic), plus the two fixes (unbounded history LIMIT,
transactional exclusive-cutoff purge with orphan-attribute GC).
Co-Authored-By: claude-flow <ruv@ruv.net>
|
||
|
|
5bc3b634b7 |
fix(automation security): template-bomb DoS (100MB/11s render → fuel-bounded, HIGH) + delay panic-on-config (MEDIUM) (#1083)
* fix(homecore-automation): bound template render to stop unbounded-expansion DoS (HC-SEC-01)
A `template:` condition / value_template comes straight from user
automation config and was rendered with MiniJinja's default (no
instruction budget, no output cap). A single condition such as
`{% for i in range(5000) %}{% for j in range(5000) %}xxxx{% endfor %}{% endfor %}`
rendered a 100 MB string over ~11 s on one render call (proven
empirically) — a CPU/memory denial of service, the bfld-class
"unbounded expansion".
Fix:
- Enable MiniJinja's `fuel` feature and set a per-render instruction
budget (`set_fuel(Some(1_000_000))`). A nested loop burns one unit
per iteration, so the budget caps total work regardless of nesting;
the attack now fails fast (~90 ms) with "engine ran out of fuel".
- Reject template sources over 64 KiB before compilation (defense in
depth so a pathological literal can neither compile nor emit verbatim).
Legitimate HA templates (a few dozen instructions) are unaffected.
Tests (fail on old — unbounded render / no rejection):
- nested_loop_template_is_bounded_not_unbounded_dos
- single_huge_repeat_template_is_bounded
- oversized_template_source_is_rejected
- legitimate_template_still_renders_within_fuel (no regression)
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(homecore-automation): stop crafted delay/timeout from panicking the run task (HC-SEC-02)
`Action::Delay { seconds }` and `Action::WaitForTrigger { timeout_seconds }`
fed the user-supplied float straight into `Duration::from_secs_f64`, which
PANICS on negative, NaN, infinite, or overflowing inputs. All of those are
reachable from a crafted (or simply typo'd) automation YAML —
`delay: {seconds: -1}`, `.nan`, `.inf`, `1e308` — so one hostile config
aborts the spawned automation task with a panic
("cannot convert float seconds to Duration: value is negative", proven
empirically).
Fix: a `safe_duration_from_secs` guard that saturates instead of panicking,
matching Home Assistant's lenient "non-positive delay = no delay":
- NaN / ±inf / negative -> Duration::ZERO
- absurdly large (would overflow) -> clamped to ~100 years (MAX_DELAY_SECS)
Tests (fail on old — panic = failure):
- delay_negative_seconds_does_not_panic
- delay_nan_seconds_does_not_panic
- delay_infinite_seconds_does_not_panic
- wait_for_trigger_negative_timeout_does_not_panic
- safe_duration_saturates_hostile_values (incl. overflow clamp)
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(homecore-automation): record HC-SEC-01/02 security review (CHANGELOG + ADR-129 §8a)
Document the two DoS findings (template unbounded-expansion HC-SEC-01,
delay panic-on-config HC-SEC-02) and the dimensions probed clean
(condition fail-closed, bounded run-modes, sandboxed read-only templates).
Co-Authored-By: claude-flow <ruv@ruv.net>
|
||
|
|
e1f4897269 |
fix(geo numerical): parse_hgt underflow/inf-grid (HIGH) + haversine asin-NaN; pointcloud confirmed-robust (NaN-poisoning class, 3rd find) (#1081)
* fix(geo numerical robustness): parse_hgt underflow panic + haversine asin-domain NaN Targeted numerical-robustness audit of wifi-densepose-geo (ADR-154-class sweep). Two real bugs, each pinned by a fails-on-old test: 1. terrain.rs parse_hgt — usize underflow panic on degenerate input. `side = sqrt(n_samples)`; for empty / sub-2x2 buffers side <= 1, so `1.0 / (side - 1)` underflows `usize` (panic "attempt to subtract with overflow" in debug; wraps to a huge value in release → garbage/inf cell_size_deg that poisons every ElevationGrid::get). A truncated HTTP body or a 404 HTML page reaches parse_hgt. Now bails with a clear error when side < 2. 2. coord.rs haversine — asin domain overflow → NaN for (near-)antipodal points. Floating rounding can push `h.sqrt()` to 1.0 + ~4e-16, and `asin(>1)` is NaN (verified: pair (-44.4994,-178.95722)→(44.49939999, 1.04278001) yields h=1.0000000000000004). A NaN distance silently breaks all downstream `<`/`>` comparisons. Clamp into [0,1] before asin. Also pins the ±90° pole-singularity (cos(lat)=0 division) as no-panic; the ENU transform itself is unchanged (no behavior change for valid inputs). Tests: wifi-densepose-geo 9→15 lib (6 new), 8 integration unchanged. 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * test(pointcloud robustness): pin NaN-state-poisoning resistance + degenerate voxel fusion Numerical-robustness audit of wifi-densepose-pointcloud. No bug found — the crate is confirmed-robust against the proven NaN-state-poisoning class that bit calibration/vitals. This adds regression pins documenting why: 1. csi_pipeline.rs — persistent auto-accumulating state (occupancy EMA, vitals) is provably self-healing. The UDP parser only emits finite amplitudes/phases (sqrt/atan2 of i8), and even an adversarial hand-built CsiFrame with NaN/inf amplitudes+phases cannot latch non-finite state: motion_score = (NaN/100).min(1.0) → 1.0; breathing path → 0 → clamp(5,40) → 5.0; tomography EMA uses only integer rssi. The new test injects 40 poisoned frames and asserts occupancy/vitals stay finite AND the pipeline recovers to an in-range estimate afterward — so a future refactor that drops a `.min`/`.clamp` self-heal would fail this pin. 2. fusion.rs — fuse_clouds voxel averaging is div-by-zero-safe (per-voxel count >= 1 by construction). Pins empty / single-point / all-coincident inputs as no-panic with finite output. No behavior change. Tests: wifi-densepose-pointcloud 18→22 (4 new), 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(geo/pointcloud robustness): CHANGELOG + ADR-154 sibling-crate sweep note Record the wifi-densepose-geo + wifi-densepose-pointcloud numerical-robustness audit under CHANGELOG [Unreleased] → Fixed, and a sibling-crate-extension note on the ADR-154 horizon ledger (these crates are outside ADR-154's signal scope but the sweep is the same ADR-154 class). Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
9f80b66ae3 |
harden(cog-ha-matter crypto): domain-separate witness signing + verify_strict (signing chain otherwise sound — P2 crypto core verified) (#1080)
* fix(cog-ha-matter): domain-separate witness signing chain + verify_strict (ADR-116 §2.2) Crypto review of the SHA-256 + Ed25519 witness chain that ADR-262 P2 reuses. The sibling wifi-densepose-engine bug class (unframed concatenation of operator-influenceable strings into a signed digest) is ABSENT here — canonical_bytes already length-prefixes kind/payload. Two real hardening gaps fixed: - CHM-WIT-01: add a versioned domain-separation tag (WITNESS_DOMAIN_TAG = b"cog-ha-matter/witness-event/v1\0") to canonical_bytes so the witness SHA-256 preimage / Ed25519 message cannot be replayed as a message for another signing context that shares key infrastructure (notably the manifest binary_signature). Completes the engine review's "domain-tag + length-prefix" rule. Witness bytes change by design (prior on-disk hashes/sigs invalidated); no in-repo crate consumes these bytes programmatically. - CHM-WIT-02: verify_signature uses VerifyingKey::verify_strict (rejects non-canonical encodings + small-order keys) for the audit-uniqueness property. Key stays caller-pinned (not read from the event). Pinned by fails-on-old tests: canonical_bytes_is_domain_separated, canonical_bytes_starts_with_domain_tag_then_prev_hash, witness_preimage_cannot_collide_with_a_bare_manifest_digest, signature_commits_to_domain_tag_not_bare_fields; key-pinning guarded by verify_uses_strict_path_and_pins_caller_key. cog-ha-matter 64 -> 68 tests, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(cog-ha-matter): record ADR-116 crypto review findings (CHM-WIT-01/02) CHANGELOG [Unreleased] Security entry + ADR-116 §4.1 review notes: engine-class signed-digest collision confirmed ABSENT (length-prefixing already correct), domain-separation tag added, verify_strict hardening, and the clean dimensions (verify-before-trust, key-handling, determinism, fail-closed parsing) with byte-layout evidence. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|
|
02cb84e0bb |
fix(vitals safety): non-finite CSI frame permanently froze breathing+HR via IIR-state poisoning (self-heal) + noise-never-Valid pin (#1079)
* fix(vitals): self-heal IIR filters after non-finite CSI frame (ADR-021/ADR-158 §A1) The 2nd-order resonator bandpass_filter in BreathingExtractor and HeartRateExtractor latches each output y[n] into the filter state (y1/y2). A single non-finite amplitude residual from a corrupt CSI frame produced a NaN output that was written into the state. The existing extract() is_finite() guard dropped that one sample from the history buffer but never sanitized the poisoned filter state, so every subsequent output stayed NaN, was rejected too, and the sliding-window history never refilled: breathing AND heart-rate extraction went silently dead (returning None forever) until reset(). On the vitals alert path this is a safety-relevant denial of service — one bad frame stops monitoring with no error surfaced. Same class as the calibration NaN bug (ADR-154 §3) and the firmware vitals fixes (#998/#996/#987): prior hardening guarded the history boundary but not the filter-state boundary. Fix: when bandpass_filter computes a non-finite output it resets the IIR state to default and returns 0.0, so the resonator recovers on the next clean frame (the 0.0 is still dropped by the caller's finite-check, so no spurious sample enters history). Also de-magic the safety-critical HR physiological plausibility band into named HR_PLAUSIBLE_MIN_BPM/HR_PLAUSIBLE_MAX_BPM consts (value-identical 40/180 BPM). Pinned by: - breathing::tests::nan_frame_does_not_permanently_poison_filter (FAILS pre-fix) - breathing::tests::inf_mid_stream_does_not_freeze_history (FAILS pre-fix) - heartrate::tests::nan_frame_does_not_permanently_poison_filter (FAILS pre-fix) - heartrate::tests::pure_noise_is_never_reported_valid (fabricated-vital negative) - heartrate::tests::plausibility_band_constants_pinned (de-magic value pin) wifi-densepose-vitals --no-default-features: 55->60 lib tests, 0 failed. Workspace green (3370 passed, 0 failed). Python proof unchanged (vitals off the deterministic proof's signal path). Co-Authored-By: claude-flow <ruv@ruv.net> * docs(vitals): record IIR NaN/inf self-heal fix (ADR-021, CHANGELOG) Document the wifi-densepose-vitals filter-state poisoning fix in ADR-021 Implementation Notes (parallel to the firmware #998/#996/#987 robustness class) and add a CHANGELOG [Unreleased] Fixed entry. Notes the confirmed clean dimensions with evidence (flat -> None; noise -> low-confidence Unreliable, never Valid; harmonic-rich breathing -> not a confident false HR; out-of-band BPM clamped). Co-Authored-By: claude-flow <ruv@ruv.net> |