mirror of https://github.com/ruvnet/RuView synced 2026-06-09 10:13:17 +00:00

rUv 3314c8db8d feat(cog-pose-estimation): scaffold first Cog from this repo (ADR-100 + ADR-101) (#642)

* feat(cog-pose-estimation): scaffold first Cog from this repo (ADR-100 + ADR-101)

Adds the foundation for the pose-estimation Cog that ships from this
repo into Cognitum V0 appliances. Companion ADR-225 + crate land in
cognitum-one/v0-appliance.

ADRs:
* ADR-100 formalises the Cognitum Cog packaging spec — on-device
layout under /var/lib/cognitum/apps/<id>/, manifest.json schema
(incl. new binary_sha256 + binary_signature fields), GCS hosting
convention, repo source layout, build pipeline, and the four-verb
runtime contract (version | manifest | health | run). Documents the
convention I reverse-engineered from inspecting installed cogs on a
live cognitum-v0 appliance — `anomaly-detect`, `presence`,
`seizure-detect`, etc.
* ADR-101 designs the pose-estimation Cog itself: where it sits in
the wifi-densepose pipeline (encoder init from
ruvnet/wifi-densepose-pretrained, 17-keypoint regression head),
what gets shipped per target arch (arm / x86_64 / hailo8 /
hailo10), acceptance gates (PCK@20 explicitly deferred to #640 —
this ADR ships the vehicle, not the accuracy).

Crate v2/crates/cog-pose-estimation/:
* Cargo.toml + workspace member declaration with a hailo feature gate
so the binary builds without the Hailo SDK in CI.
* main.rs implements the four-verb CLI exactly per ADR-100.
* config.rs / manifest.rs / publisher.rs / inference.rs / runtime.rs —
small modules, each <100 lines.
* publisher.rs emits ADR-100 structured JSON events.
* inference.rs is a stub that produces a centred-skeleton baseline
with confidence=0 (honest: no trained weights wired in yet).
* runtime.rs subscribes to /api/v1/sensing/latest, slides a
56*20 window, runs the engine, emits pose.frame events.
* cog/manifest.template.json + cog/config.schema.json define the
release artifact + runtime config schemas.
* cog/Makefile holds build / sign / upload targets.
* tests/smoke.rs covers manifest roundtrip + engine I/O surface.

Verified locally:
* cargo check -p cog-pose-estimation: clean.
* cargo test -p cog-pose-estimation: 4/4 pass.
* ./target/release/cog-pose-estimation {version,manifest,health}:
all emit the right contract output.

This commit contains scaffolding only; the actual trained weights and
Hailo HEF cross-compile come in follow-ups tracked in #640 and the
companion v0-appliance branch.

* feat(cog-pose-estimation): first measured run — Candle CUDA on RTX 5080

Trained pose_v1 on ruvultra (RTX 5080) via Candle 0.9 + cuda feature
against the same 1,077-sample paired session that produced 0%/0% PCK
in #640 with the pure-JS SPSA trainer. First real numbers:

PCK@20 = 3.0% (up from 0.0%)
PCK@50 = 18.5% (up from 0.0%)
MPJPE = 0.093 (down from 0.66, ~7x improvement)

400 epochs in 2.1 s wall time, full-batch, ~5 ms/epoch. Loss curve
0.181 -> 0.014 over the run, eval 0.010. Per-joint reveals the model
leans on right-side proximal joints (r_hip 77% PCK@50, r_knee 35%,
l_elbow 26%) — consistent with the camera framing in the source
recording. Distal joints (wrists, ankles) and face joints are still
near-random, consistent with the 56-subcarrier / 20-frame input not
carrying fine-grained spatial info at 1077 samples.

This commit:

* Adds v2/crates/cog-pose-estimation/cog/artifacts/{pose_v1.safetensors,
train_results.json} so the cog dir now contains a real reference
artifact, not just scaffold.
* Updates cog/README.md "Status" block with the measured numbers,
per-joint table, and an honest reading of where the model
succeeds vs where the data is the bottleneck.
* Adds docs/benchmarks/pose-estimation-cog.md as the canonical
benchmark log — append-only, one section per published run.
* Appends a "First measured run" section to ADR-101 referencing
the new benchmark file.

Still pending in the follow-up:
* Wire pose_v1.safetensors into src/inference.rs (replace stub).
* ONNX export (Candle lacks a writer — needs external conversion).
* Hailo HEF cross-compile + cluster deploy.

The data-bound gap to PCK@20 >= 35% is tracked in #640.

* feat(cog-pose-estimation): wire real weights — cog is no longer a stub

Replaces the centred-skeleton stub in src/inference.rs with a real
Candle-based loader that reads cog/artifacts/pose_v1.safetensors and
runs the trained Conv1d encoder + MLP pose head on every incoming CSI
window.

What changes:

* src/inference.rs: PoseNet mirrors the training script's architecture
exactly — Conv1d(56->64, k=3 d=1), Conv1d(64->128, k=3 d=2),
Conv1d(128->128, k=3 d=4), mean over time, Linear(128->256)+ReLU,
Linear(256->34)+sigmoid -> reshape [17, 2]. The InferenceEngine
searches a sensible candidate list for the weights file
(/var/lib/cognitum/apps/pose-estimation/, ./pose_v1.safetensors,
./cog/artifacts/, repo-root, v2/-relative) and falls back to the
stub when none are present so the cog still satisfies ADR-100.
* Cargo.toml: adds candle-core 0.9 + candle-nn 0.9 (no-default-features,
CPU build by default) + safetensors 0.4. New `cuda` feature opt-in
for GPU inference on hosts that have it. Drops the unused
wifi-densepose-train path dep from the default build path.
* src/main.rs + src/publisher.rs: health.ok event now carries
`backend` (candle-cuda | candle-cpu | stub) and the synthetic
output confidence, so operators can tell at a glance whether the
cog loaded its weights or fell back to the stub.
* tests/smoke.rs: adds `real_weights_load_when_available` which
asserts the loaded engine reports backend=candle-* and emits
non-zero confidence — exactly the signal that proves we're not
silently degrading to the stub.

Verified locally:

* `cargo check -p cog-pose-estimation --no-default-features` — clean
* `cargo test -p cog-pose-estimation --no-default-features` — 5/5 pass
* `./target/release/cog-pose-estimation health` emits:
{"event":"health.ok","fields":{"backend":"candle-cpu","cog":"pose-estimation","synthetic_output_confidence":0.185}}
— 0.185 is the published PCK@50 from cog/artifacts/train_results.json,
emitted by the real Candle inference path (would be 0.0 if it had
fallen back to the stub).

The cog now runs the trained pose_v1 model end-to-end. Accuracy is
still bounded by the underlying 1077-sample training data (PCK@20
3.0%, PCK@50 18.5% per docs/benchmarks/pose-estimation-cog.md) — that
gap is data-bound and tracked in #640. ONNX export + Hailo HEF
cross-compile remain follow-ups.

* docs(benchmarks): measure cog-pose-estimation cold-start latency

100 sequential `cog-pose-estimation health` invocations average 76.2 ms
each on a Windows x86_64 host using the `candle-cpu` backend. Each
invocation re-loads pose_v1.safetensors and runs one synthetic forward
pass, so this is the worst-case cold-start path. Long-running `run`
inference is expected to be sub-millisecond per frame once the model is loaded.

Updates the benchmarks doc accordingly.

* feat(cog-pose-estimation): ONNX export — pose_v1.onnx + scripts/export-onnx.py

Adds the canonical ONNX artifact that unblocks downstream Hailo HEF
cross-compile + ONNX Runtime benchmarks. Generated on ruvultra (torch
2.12.0 + CUDA), 12,059 bytes, opset 18, dynamic batch axis.

* scripts/export-onnx.py: mirrors the Candle inference architecture in
PyTorch (Conv1d 56->64, 64->128, 128->128 + Linear 128->256->34), pure-
python safetensors loader (no extra pip dep), exports via
torch.onnx.export, then verifies via onnx.checker.check_model and
numerical parity against the torch reference.
* Verified parity vs torch: max |torch - onnx| = 8.94e-8, well under the
1e-5 threshold and below float32 machine epsilon (~1.19e-7).
* v2/crates/cog-pose-estimation/cog/artifacts/pose_v1.onnx — the
artifact itself, 12 KB.
* docs/benchmarks/pose-estimation-cog.md — adds an ONNX export
section with the verification numbers.

Next: Hailo HEF cross-compile (still gated on Hailo SDK on a
self-hosted runner) and ONNX Runtime latency benchmarks on each
target arch.

* feat(cog-pose-estimation): release v0.0.1 — signed aarch64 binary on GCS

End-to-end deploy: cross-compiled to aarch64-unknown-linux-gnu on
ruvultra, ran via qemu-aarch64-static, then smoke-tested on a real
cognitum-v0 Pi 5. Signed with COGNITUM_OWNER_SIGNING_KEY (Ed25519)
and uploaded to gs://cognitum-apps/cogs/arm/.

Real-hardware results on cognitum-v0 (Pi 5):
health: backend=candle-cpu, confidence=0.185, real weights loaded
30x sequential `health`: 0.251 s total -> 8.4 ms / invocation (cold)

GCS release artifacts (publicly downloadable):
binary: 3,741,976 bytes
sha256 1e1a7d3dd01ca05d5bfc5dbb142a5941b7866ed9f3224a21edc04d3f09a99bf5
weights: 507,032 bytes
sha256 eb249b9a6b2e10130437a10976ed0230b0d085f86a0553d7226e1ae6eae4b9e5
signature (Ed25519, b64): LUN7xqLPYD3MFzm5dKB5MnYU0LvoRtek5ci5KiKPHBg+Xo6xuazwokn2Dw2JPMaLYJzmWn/SpT4djuR7hYvVDw==

Adds:
* v2/crates/cog-pose-estimation/cog/artifacts/manifest.json — the
release-pipeline-produced manifest with all fields filled in per
ADR-100, including arch, target_triple, signature, and a
build_metadata block carrying the validation PCK numbers.
* docs/benchmarks/pose-estimation-cog.md — new sections covering
the real Pi 5 smoke (8.4 ms cold-start) and the signed GCS
release artifacts.

Verified by downloading the binary anonymously from GCS and
re-computing the sha256 — matches the locally-computed sha exactly.
Signature decoded to the expected 64-byte Ed25519 length.

Closes the GCS-upload acceptance criterion from ADR-100; the only
pending work is Hailo HEF cross-compile (still SDK-gated) and an
x86_64 release alongside this arm release.

* docs(benchmarks): record live cognitum-v0 install + 5-sec smoke run

Adds the "Live appliance install" section documenting what happened
when the signed v0.0.1 binary + weights were installed under
/var/lib/cognitum/apps/pose-estimation/ on cognitum-v0 (the V0
cluster leader).

* Layout matches the existing anomaly-detect / presence / seizure-
detect cogs exactly — the Cogs dashboard at
http://cognitum-v0:9000/cogs auto-discovers entries.
* `cog-pose-estimation run` ran for 5 seconds in the background and
cleanly emitted run.started + structured WARN events for the
missing local sensing-server on :3000 (cognitum-v0's actual CSI
source is ruview-vitals-worker on :50054, not :3000). No crashes,
no NaN, no leaks.
* Wiring `sensing_url` to the appliance-native source is a separate
Day-2 integration task.

2026-05-19 17:03:09 -04:00


ADR-101: Pose Estimation Cog (WiFi-DensePose side)

Status: Accepted
Date: 2026-05-19
Deciders: ruv
Companion ADR (v0-appliance side): v0-appliance ADR-225 (cognitum-pose-estimation crate)

Context

ADR-079 designed the 17-keypoint COCO pose-estimation training pipeline. ADR-100 formalised the Cognitum Cog packaging spec. This ADR is the bridge: it specifies how the wifi-densepose training pipeline produces an artifact that ships as a Cog (cog-pose-estimation) onto the Cognitum V0 appliance and out to the Pi+Hailo cluster.

It is the next product step beyond the published presence Cog (binary head trained from the contrastive encoder on Hugging Face at ruvnet/wifi-densepose-pretrained). Where presence reports a single boolean per tick, cog-pose-estimation reports 17 (x, y) keypoints per person, per tick.

Decision

Pipeline

                         (training side — ruvultra GPU)
ESP32 / rvcsi  ─►  collect-ground-truth.py + sensing-server recording
                         │
                         ▼
                   data/paired/*.paired.jsonl   (CSI window + camera keypoints)
                         │
                         ▼
                   v2/crates/wifi-densepose-train  ──►  Rust + libtorch trainer
                   (uses RTX 5080 / CUDA 12.x)         │
                   init from ruvnet/wifi-densepose-pretrained
                                                       │
                                                       ▼
                                                  model.safetensors  (encoder + pose head)
                                                       │
                                          ─────────────┴─────────────
                                          │                         │
                                          ▼                         ▼
                                  v2/crates/cog-pose-estimation     export to ONNX
                                  (this repo)                       │
                                   • emits manifest.json            ▼
                                   • produces cog binary       cognitum-hailo
                                   • signs + uploads to GCS    (v0-appliance side)
                                                                    │
                                                                    ▼
                                                           cog-pose-estimation.hef
                                                                    │
                                                                    ▼
                              (appliance side — cognitum-v0 + Pi+Hailo cluster)
                                                           
                              gs://cognitum-apps/cogs/{arm,hailo8,hailo10}/cog-pose-estimation-<arch>
                                                                    │
                                                                    ▼
                              `cognitum-cog-gateway` pulls artifact + manifest, verifies signature, installs
                              into /var/lib/cognitum/apps/pose-estimation/
                                                                    │
                                                                    ▼
                              run loop: read CSI frames from local sensing-server
                              → encoder → pose head → emit `{ts, persons: [{keypoints: [...17 x,y...] }]}`
                              on stdout as the Cog runtime contract requires
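
The run loop at the bottom of the diagram needs a fixed-length window of CSI frames before it can call the model. The following is a minimal sketch of such a window, not the crate's actual `runtime.rs`: the type name `CsiWindow`, its methods, and the channel-major `[56 × 20]` output layout (the shape a Conv1d with 56 input channels would expect) are assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Subcarriers per CSI frame and frames per window, as stated in this ADR.
const SUBCARRIERS: usize = 56;
const WINDOW_FRAMES: usize = 20;

/// Hypothetical sliding window over incoming CSI frames.
pub struct CsiWindow {
    frames: VecDeque<[f32; SUBCARRIERS]>,
}

impl CsiWindow {
    pub fn new() -> Self {
        Self { frames: VecDeque::with_capacity(WINDOW_FRAMES) }
    }

    /// Push the newest frame, evicting the oldest once the window is full.
    pub fn push(&mut self, frame: [f32; SUBCARRIERS]) {
        if self.frames.len() == WINDOW_FRAMES {
            self.frames.pop_front();
        }
        self.frames.push_back(frame);
    }

    /// Return the window flattened as [subcarrier][time] (assumed layout),
    /// or None until 20 frames have arrived.
    pub fn as_input(&self) -> Option<Vec<f32>> {
        if self.frames.len() < WINDOW_FRAMES {
            return None;
        }
        let mut out = vec![0.0f32; SUBCARRIERS * WINDOW_FRAMES];
        for (t, frame) in self.frames.iter().enumerate() {
            for (c, &v) in frame.iter().enumerate() {
                out[c * WINDOW_FRAMES + t] = v;
            }
        }
        Some(out)
    }
}
```

With this shape, the loop would push every frame it receives and run inference only when `as_input()` returns `Some`, so the first 19 frames after start-up produce no pose events.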

Architecture (model)

| Stage | Module | Notes |
|---|---|---|
| Input | `[56 subcarriers × 20 frames]` per CSI window | matches today's `data/paired/wiflow-p7-*.paired.jsonl` |
| Encoder | TCN-lite or contrastive encoder lifted from HF presence model | 128-dim embedding; weights init from `ruvnet/wifi-densepose-pretrained/model.safetensors` |
| Pose head | 2-layer MLP `(128 → 256 → 34)` | 34 = 17 × (x, y) |
| Output | `[B, 17, 2]` keypoints in `[0, 1]` image-normalised coords | confidence is implicit in keypoint variance over time; ADR-079 P9 will add explicit per-joint confidence |
| Loss | Confidence-weighted SmoothL1 (frame-level) + bone-length regulariser + temporal smoothness | per ADR-079 Phase 3 refinement |
| Init | Encoder = HF presence weights (frozen for 50 epochs, then jointly fine-tuned) | works around the sigmoid-saturation failure mode observed in #640 |
| Training | `v2/crates/wifi-densepose-train` with libtorch backend on RTX 5080 | replaces the pure-JS SPSA trainer that produced 0% PCK in #640 |
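
As a sanity check on model size, the layer shapes can be turned into a parameter count. The sketch below uses the layer sizes the commit log gives for the wired-in model (three Conv1d layers with k=3 and dilations 1, 2, 4, then the 128 → 256 → 34 head). It assumes every layer has a bias and that weights are stored as 32-bit floats; neither is stated in the source, and the function names are illustrative.

```rust
/// Conv1d parameters: one weight per (out, in, tap) plus one bias per output channel.
pub fn conv1d_params(c_in: usize, c_out: usize, kernel: usize) -> usize {
    c_in * c_out * kernel + c_out
}

/// Linear parameters: weight matrix plus one bias per output unit.
pub fn linear_params(n_in: usize, n_out: usize) -> usize {
    n_in * n_out + n_out
}

/// Receptive field, in frames, of stacked stride-1 convolutions given
/// as (kernel, dilation) pairs.
pub fn receptive_field(layers: &[(usize, usize)]) -> usize {
    1 + layers.iter().map(|&(k, d)| (k - 1) * d).sum::<usize>()
}

/// Total parameters for the encoder + pose head under the stated assumptions.
pub fn total_params() -> usize {
    conv1d_params(56, 64, 3)
        + conv1d_params(64, 128, 3)
        + conv1d_params(128, 128, 3)
        + linear_params(128, 256)
        + linear_params(256, 34)
}
```

Under these assumptions the total is 126,562 parameters, or 506,248 bytes at 4 bytes each. That is 784 bytes less than the 507,032-byte weights file recorded for the v0.0.1 release, a gap that would be consistent with a small safetensors header. The same dilation pattern gives a receptive field of 15 frames, so each encoder output position sees most, but not all, of the 20-frame window before the mean over time.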

Repo layout

v2/crates/cog-pose-estimation/        # NEW (this ADR)
├── Cargo.toml
├── src/
│   ├── main.rs                # CLI: run | health | version | manifest
│   ├── lib.rs
│   ├── inference.rs           # ONNX runtime + Hailo HEF runtime dispatch
│   ├── frame_subscriber.rs    # local sensing-server subscriber
│   └── publisher.rs           # emits structured JSON events per Cog contract
├── cog/
│   ├── manifest.template.json
│   ├── config.schema.json
│   ├── README.md
│   ├── icon.svg
│   └── Makefile               # build-arm | build-x86_64 | sign | upload
└── tests/
    ├── manifest_signature.rs
    └── inference_smoke.rs

Runtime contract

Honours ADR-100's per-Cog CLI contract:

cog-pose-estimation version → pose-estimation 0.0.1
cog-pose-estimation manifest → JSON
cog-pose-estimation health → exits 0 if the encoder and head load and a synthetic frame produces a finite output
cog-pose-estimation run --config /etc/cognitum/cogs/pose-estimation/config.json → long-running; emits one JSON event per inferred frame:

{
  "ts": 1779210883.444,
  "level": "info",
  "event": "pose.frame",
  "fields": {
    "tick": 12345,
    "n_persons": 1,
    "persons": [
      {"keypoints": [[0.48, 0.31], [0.52, 0.28], ...], "confidence": 0.81}
    ]
  }
}
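
A minimal sketch of how the four verbs and the `pose.frame` event could be handled with the standard library alone. It is not the crate's `main.rs` or `publisher.rs`: the names `Verb`, `parse_verb` and `pose_frame_event` are hypothetical, the JSON is assembled by hand rather than with a serialisation crate, and the sketch handles a single person per frame.

```rust
/// The four verbs ADR-100 requires of every Cog binary.
#[derive(Debug, PartialEq)]
pub enum Verb {
    Version,
    Manifest,
    Health,
    Run,
}

/// Map the first CLI argument to a verb; unknown input yields None.
pub fn parse_verb(arg: &str) -> Option<Verb> {
    match arg {
        "version" => Some(Verb::Version),
        "manifest" => Some(Verb::Manifest),
        "health" => Some(Verb::Health),
        "run" => Some(Verb::Run),
        _ => None,
    }
}

/// Format one single-person `pose.frame` event as a JSON line for stdout.
pub fn pose_frame_event(ts: f64, tick: u64, keypoints: &[[f32; 2]; 17], confidence: f32) -> String {
    let kps: Vec<String> = keypoints
        .iter()
        .map(|kp| format!("[{:.2},{:.2}]", kp[0], kp[1]))
        .collect();
    let person = format!(
        r#"{{"keypoints":[{}],"confidence":{:.2}}}"#,
        kps.join(","),
        confidence
    );
    let fields = format!(
        r#"{{"tick":{},"n_persons":1,"persons":[{}]}}"#,
        tick, person
    );
    format!(
        r#"{{"ts":{:.3},"level":"info","event":"pose.frame","fields":{}}}"#,
        ts, fields
    )
}
```

A real binary would call `parse_verb` on `std::env::args().nth(1)` and exit non-zero on `None`. Emitting one complete JSON object per line keeps the stream easy for a supervisor to parse incrementally.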

Hardware deployment

| Target | arch | runtime | notes |
|---|---|---|---|
| ruvultra (dev) | `x86_64` | ONNX Runtime CPU/CUDA | development & smoke tests |
| cognitum-v0 (Pi 5) | `arm` | ONNX Runtime ARM | reference deploy; ~20 ms/frame |
| Pi + Hailo-8 hat | `hailo8` | Hailo HEF runtime via `cognitum-hailo` | ~2 ms/frame, 26 TOPS budget |
| Pi + Hailo-10 hat | `hailo10` | Hailo HEF runtime via `cognitum-hailo` | ~1 ms/frame, 40 TOPS budget |

Acceptance gates

Validates: cargo test -p cog-pose-estimation green; cog-pose-estimation health returns 0 against a synthetic CSI window.
Benchmarks: end-to-end frame latency on each target arch logged in target/criterion/; published in docs/benchmarks/pose-estimation-cog.md.
Optimised: the Hailo-targeted ONNX graph passes through Hailo Dataflow Compiler without quantisation-aware-training warnings.
Published: signed binary at gs://cognitum-apps/cogs/<arch>/cog-pose-estimation-<arch>; manifest valid against the JSON schema in ADR-100; appliance installer can pull and run it.
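
The "Published" gate and the hardware table together imply a small mapping from target arch to artifact location and runtime. The sketch below encodes that mapping. The `Arch` enum and its methods are hypothetical, and the URI simply follows the `gs://cognitum-apps/cogs/<arch>/cog-pose-estimation-<arch>` pattern given above; whether an `x86_64` prefix exists in the bucket is an assumption.

```rust
/// Target architectures from the hardware deployment table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arch {
    X86_64,
    Arm,
    Hailo8,
    Hailo10,
}

impl Arch {
    /// Directory and file-suffix name used in the GCS layout.
    pub fn as_str(self) -> &'static str {
        match self {
            Arch::X86_64 => "x86_64",
            Arch::Arm => "arm",
            Arch::Hailo8 => "hailo8",
            Arch::Hailo10 => "hailo10",
        }
    }

    /// Runtime this ADR plans for the target.
    pub fn runtime(self) -> &'static str {
        match self {
            Arch::X86_64 => "ONNX Runtime CPU/CUDA",
            Arch::Arm => "ONNX Runtime ARM",
            Arch::Hailo8 | Arch::Hailo10 => "Hailo HEF runtime via cognitum-hailo",
        }
    }
}

/// Build the artifact URI for a target following the published pattern.
pub fn artifact_uri(arch: Arch) -> String {
    format!(
        "gs://cognitum-apps/cogs/{a}/cog-pose-estimation-{a}",
        a = arch.as_str()
    )
}
```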

PCK@20 is intentionally not an acceptance gate of this ADR. Achieving the ADR-079 ≥35% target is a separate, data-bound milestone tracked in #640. This ADR ships the vehicle, not the model accuracy.

First measured run — v0.0.1 (2026-05-19)

A Candle-on-CUDA training run on ruvultra's RTX 5080 against the same 1,077-sample paired session that produced the 0%/0% baseline in #640 yielded:

PCK@20 = 3.0%, PCK@50 = 18.5%, MPJPE = 0.093 (normalised).
400 epochs in 2.1 s wall time (~5 ms/epoch, full-batch).
Loss reduction 13× (0.181 → 0.014, eval 0.010).
Strongest signal at r_hip (PCK@50 = 76.9%), r_knee (35.2%), l_elbow (26.4%).

This confirms the pipeline trains end-to-end and produces a signal-bearing model. The remaining gap to PCK@20 ≥ 35% is data-bound (1,077 samples is far below the ADR-079 target of ~30K). See docs/benchmarks/pose-estimation-cog.md for the full result dump.
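
For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them over 17-keypoint poses in normalised coordinates. It is not the repository's evaluation code. In particular, it assumes the PCK threshold is applied directly in normalised units (0.20 for PCK@20, 0.50 for PCK@50); the actual script may instead scale the threshold by torso or bounding-box size.

```rust
/// One pose: 17 keypoints, each (x, y) in [0, 1] normalised image coordinates.
pub type Pose = [[f32; 2]; 17];

/// Euclidean distance between a predicted and a ground-truth keypoint.
fn joint_error(p: [f32; 2], g: [f32; 2]) -> f32 {
    let (dx, dy) = (p[0] - g[0], p[1] - g[1]);
    (dx * dx + dy * dy).sqrt()
}

/// Per-joint errors over all paired samples.
fn all_errors(pred: &[Pose], truth: &[Pose]) -> Vec<f32> {
    let mut errs = Vec::new();
    for (p, g) in pred.iter().zip(truth) {
        for (a, b) in p.iter().zip(g) {
            errs.push(joint_error(*a, *b));
        }
    }
    errs
}

/// Mean per-joint position error (0.0 for empty input).
pub fn mpjpe(pred: &[Pose], truth: &[Pose]) -> f32 {
    let errs = all_errors(pred, truth);
    if errs.is_empty() {
        return 0.0;
    }
    errs.iter().sum::<f32>() / errs.len() as f32
}

/// Fraction of keypoints whose error is within `threshold` (0.0 for empty input).
pub fn pck(pred: &[Pose], truth: &[Pose], threshold: f32) -> f32 {
    let errs = all_errors(pred, truth);
    if errs.is_empty() {
        return 0.0;
    }
    let hits = errs.iter().filter(|&&e| e <= threshold).count();
    hits as f32 / errs.len() as f32
}
```

Read this way, PCK@20 = 3.0% means roughly one keypoint in 33 lands within the tighter threshold, while an MPJPE of 0.093 says the average miss is a little under a tenth of the normalised image extent.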

Consequences

Positive

First Cog from this repo that integrates with the appliance/cog-gateway pipeline. Future cogs (e.g. cog-vitals, cog-fall-alert) follow the same template.
Closes the loop from data collection → training → quantisation → cluster deployment with a single repo-anchored artifact.
Forces a real signature on cog binaries (per ADR-100), which improves supply-chain hygiene across the whole appliance.

Negative

Adds a hard dependency on the Hailo Dataflow Compiler, which lives behind a self-hosted runner — Hailo-targeted PRs land more slowly.
The first published binary will have low PCK (data + training time gap, #640) — UX needs to surface this clearly so end users do not interpret bad keypoints as a bug.

Risks

Model size on Hailo: the encoder fits comfortably in Hailo-8's on-chip SRAM, but the pose-head expansion to [17×2] plus required temporal stacking pushes us close to the Hailo-8 envelope. Mitigation: Hailo-10 path is the primary deploy target; Hailo-8 is a stretch.
Sensing-server schema drift: the cog subscribes to /api/v1/sensing/latest JSON. If the appliance's sensing-server schema changes, the cog fails open (logs a warning and emits nothing). The frame_subscriber.rs module pins to schema version 2.
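
The schema pin described in the second risk could look roughly like the sketch below. It assumes the sensing-server payload has already been parsed into a struct. The struct `SensingFrame`, its field names, and the function `accept_frame` are hypothetical; the source only states that the subscriber pins to schema version 2 and that on drift the cog logs a warning and emits nothing.

```rust
/// Schema version the subscriber is pinned to, per this ADR.
pub const PINNED_SCHEMA_VERSION: u32 = 2;

/// Hypothetical parsed payload from /api/v1/sensing/latest.
pub struct SensingFrame {
    pub schema_version: u32,
    pub amplitudes: Vec<f32>,
}

/// Return the frame's amplitudes if the schema matches the pin.
/// On mismatch, return a warning message; the caller is expected to log it
/// and skip the frame rather than emit a pose event.
pub fn accept_frame(frame: &SensingFrame) -> Result<&[f32], String> {
    if frame.schema_version != PINNED_SCHEMA_VERSION {
        return Err(format!(
            "sensing-server schema version {} does not match pinned version {}; skipping frame",
            frame.schema_version, PINNED_SCHEMA_VERSION
        ));
    }
    Ok(&frame.amplitudes)
}
```

Returning the warning as a value instead of logging inside the function keeps the check easy to test and leaves the log format to the publisher.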

Migration / rollout

Land this ADR + ADR-100 on main of RuView.
Land companion ADR-225 + crate on main of v0-appliance.
First release cog-pose-estimation@0.0.1 ships only to ruvultra and cognitum-v0; it is not pushed to the cluster Pis yet.
After P7→P9 data work (#640) brings PCK above a usable threshold, rebuild + re-publish; only then enable cluster rollout via cognitum-cog-gateway's OTA channel.
