* feat(cog-pose-estimation): scaffold first Cog from this repo (ADR-100 + ADR-101) Adds the foundation for the pose-estimation Cog that ships from this repo into Cognitum V0 appliances. Companion ADR-225 + crate land in cognitum-one/v0-appliance. ADRs: * ADR-100 formalises the Cognitum Cog packaging spec — on-device layout under /var/lib/cognitum/apps/<id>/, manifest.json schema (incl. new binary_sha256 + binary_signature fields), GCS hosting convention, repo source layout, build pipeline, and the four-verb runtime contract (version | manifest | health | run). Documents the convention I reverse-engineered from inspecting installed cogs on a live cognitum-v0 appliance — `anomaly-detect`, `presence`, `seizure-detect`, etc. * ADR-101 designs the pose-estimation Cog itself: where it sits in the wifi-densepose pipeline (encoder init from ruvnet/wifi-densepose-pretrained, 17-keypoint regression head), what gets shipped per target arch (arm / x86_64 / hailo8 / hailo10), acceptance gates (PCK@20 explicitly deferred to #640 — this ADR ships the vehicle, not the accuracy). Crate v2/crates/cog-pose-estimation/: * Cargo.toml + workspace member declaration with a hailo feature gate so the binary builds without the Hailo SDK in CI. * main.rs implements the four-verb CLI exactly per ADR-100. * config.rs / manifest.rs / publisher.rs / inference.rs / runtime.rs — small modules, each <100 lines. * publisher.rs emits ADR-100 structured JSON events. * inference.rs is a stub that produces a centred-skeleton baseline with confidence=0 (honest: no trained weights wired in yet). * runtime.rs subscribes to /api/v1/sensing/latest, slides a 56*20 window, runs the engine, emits pose.frame events. * cog/manifest.template.json + cog/config.schema.json define the release artifact + runtime config schemas. * cog/Makefile holds build / sign / upload targets. * tests/smoke.rs covers manifest roundtrip + engine I/O surface. Verified locally: * cargo check -p cog-pose-estimation: clean. * cargo test -p cog-pose-estimation: 4/4 pass. * ./target/release/cog-pose-estimation {version,manifest,health}: all emit the right contract output. This commit contains scaffolding only; the actual trained weights and Hailo HEF cross-compile come in follow-ups tracked in #640 and the companion v0-appliance branch. * feat(cog-pose-estimation): first measured run — Candle CUDA on RTX 5080 Trained pose_v1 on ruvultra (RTX 5080) via Candle 0.9 + cuda feature against the same 1,077-sample paired session that produced 0%/0% PCK in #640 with the pure-JS SPSA trainer. First real numbers: PCK@20 = 3.0% (up from 0.0%) PCK@50 = 18.5% (up from 0.0%) MPJPE = 0.093 (down from 0.66, ~7x improvement) 400 epochs in 2.1 s wall time, full-batch, ~5 ms/epoch. Loss curve 0.181 -> 0.014 over the run, eval 0.010. Per-joint reveals the model leans on right-side proximal joints (r_hip 77% PCK@50, r_knee 35%, l_elbow 26%) — consistent with the camera framing in the source recording. Distal joints (wrists, ankles) and face joints are still near-random, consistent with the 56-subcarrier / 20-frame input not carrying fine-grained spatial info at 1077 samples. This commit: * Adds v2/crates/cog-pose-estimation/cog/artifacts/{pose_v1.safetensors, train_results.json} so the cog dir now contains a real reference artifact, not just scaffold. * Updates cog/README.md "Status" block with the measured numbers, per-joint table, and an honest reading of where the model succeeds vs where the data is the bottleneck. * Adds docs/benchmarks/pose-estimation-cog.md as the canonical benchmark log — append-only, one section per published run. * Appends a "First measured run" section to ADR-101 referencing the new benchmark file. Still pending in the follow-up: * Wire pose_v1.safetensors into src/inference.rs (replace stub). * ONNX export (Candle lacks a writer — needs external conversion). * Hailo HEF cross-compile + cluster deploy. The data-bound gap to PCK@20 >= 35% is tracked in #640. * feat(cog-pose-estimation): wire real weights — cog is no longer a stub Replaces the centred-skeleton stub in src/inference.rs with a real Candle-based loader that reads cog/artifacts/pose_v1.safetensors and runs the trained Conv1d encoder + MLP pose head on every incoming CSI window. What changes: * src/inference.rs: PoseNet mirrors the training script's architecture exactly — Conv1d(56->64, k=3 d=1), Conv1d(64->128, k=3 d=2), Conv1d(128->128, k=3 d=4), mean over time, Linear(128->256)+ReLU, Linear(256->34)+sigmoid -> reshape [17, 2]. The InferenceEngine searches a sensible candidate list for the weights file (/var/lib/cognitum/apps/pose-estimation/, ./pose_v1.safetensors, ./cog/artifacts/, repo-root, v2/-relative) and falls back to the stub when none are present so the cog still satisfies ADR-100. * Cargo.toml: adds candle-core 0.9 + candle-nn 0.9 (no-default-features, CPU build by default) + safetensors 0.4. New `cuda` feature opt-in for GPU inference on hosts that have it. Drops the unused wifi-densepose-train path dep from the default build path. * src/main.rs + src/publisher.rs: health.ok event now carries `backend` (candle-cuda | candle-cpu | stub) and the synthetic output confidence, so operators can tell at a glance whether the cog loaded its weights or fell back to the stub. * tests/smoke.rs: adds `real_weights_load_when_available` which asserts the loaded engine reports backend=candle-* and emits non-zero confidence — exactly the signal that proves we're not silently degrading to the stub. Verified locally: * `cargo check -p cog-pose-estimation --no-default-features` — clean * `cargo test -p cog-pose-estimation --no-default-features` — 5/5 pass * `./target/release/cog-pose-estimation health` emits: {"event":"health.ok","fields":{"backend":"candle-cpu","cog":"pose-estimation","synthetic_output_confidence":0.185}} — 0.185 is the published PCK@50 from cog/artifacts/train_results.json, emitted by the real Candle inference path (would be 0.0 if it had fallen back to the stub). The cog now runs the trained pose_v1 model end-to-end. Accuracy is still bounded by the underlying 1077-sample training data (PCK@20 3.0%, PCK@50 18.5% per docs/benchmarks/pose-estimation-cog.md) — that gap is data-bound and tracked in #640. ONNX export + Hailo HEF cross-compile remain follow-ups. * docs(benchmarks): measure cog-pose-estimation cold-start latency 100 sequential `cog-pose-estimation health` invocations average 76.2 ms each on a Windows x86_64 host using the `candle-cpu` backend. Each invocation re-loads pose_v1.safetensors and runs one synthetic forward pass, so this is the worst-case cold-start path. Long-running `run` inference will be sub-millisecond per frame once the model is loaded. Updates the benchmarks doc accordingly. * feat(cog-pose-estimation): ONNX export — pose_v1.onnx + scripts/export-onnx.py Adds the canonical ONNX artifact that unblocks downstream Hailo HEF cross-compile + ONNX Runtime benchmarks. Generated on ruvultra (torch 2.12.0 + CUDA), 12,059 bytes, opset 18, dynamic batch axis. * scripts/export-onnx.py: mirrors the Candle inference architecture in PyTorch (Conv1d 56->64, 64->128, 128->128 + Linear 128->256->34), pure- python safetensors loader (no extra pip dep), exports via torch.onnx.export, then verifies via onnx.checker.check_model and numerical parity against the torch reference. * Verified parity vs torch: max |torch - onnx| = 8.94e-8 (1e-5 threshold). Effectively bit-perfect. * v2/crates/cog-pose-estimation/cog/artifacts/pose_v1.onnx — the artifact itself, 12 KB. * docs/benchmarks/pose-estimation-cog.md — adds an ONNX export section with the verification numbers. Next: Hailo HEF cross-compile (still gated on Hailo SDK on a self-hosted runner) and ONNX Runtime latency benchmarks on each target arch. * feat(cog-pose-estimation): release v0.0.1 — signed aarch64 binary on GCS End-to-end deploy: cross-compiled to aarch64-unknown-linux-gnu on ruvultra, ran via qemu-aarch64-static, then smoke-tested on a real cognitum-v0 Pi 5. Signed with COGNITUM_OWNER_SIGNING_KEY (Ed25519) and uploaded to gs://cognitum-apps/cogs/arm/. Real-hardware results on cognitum-v0 (Pi 5): health: backend=candle-cpu, confidence=0.185, real weights loaded 30x sequential `health`: 0.251 s total -> 8.4 ms / invocation (cold) GCS release artifacts (publicly downloadable): binary: 3,741,976 bytes sha256 1e1a7d3dd01ca05d5bfc5dbb142a5941b7866ed9f3224a21edc04d3f09a99bf5 weights: 507,032 bytes sha256 eb249b9a6b2e10130437a10976ed0230b0d085f86a0553d7226e1ae6eae4b9e5 signature (Ed25519, b64): LUN7xqLPYD3MFzm5dKB5MnYU0LvoRtek5ci5KiKPHBg+Xo6xuazwokn2Dw2JPMaLYJzmWn/SpT4djuR7hYvVDw== Adds: * v2/crates/cog-pose-estimation/cog/artifacts/manifest.json — the release-pipeline-produced manifest with all fields filled in per ADR-100, including arch, target_triple, signature, and a build_metadata block carrying the validation PCK numbers. * docs/benchmarks/pose-estimation-cog.md — new sections covering the real Pi 5 smoke (8.4 ms cold-start) and the signed GCS release artifacts. Verified by downloading the binary anonymously from GCS and re-computing the sha256 — matches the locally-computed sha exactly. Signature decoded to the expected 64-byte Ed25519 length. Closes the GCS-upload acceptance criterion from ADR-100; the only pending work is Hailo HEF cross-compile (still SDK-gated) and an x86_64 release alongside this arm release. * docs(benchmarks): record live cognitum-v0 install + 5-sec smoke run Adds the "Live appliance install" section documenting what happened when the signed v0.0.1 binary + weights were installed under /var/lib/cognitum/apps/pose-estimation/ on cognitum-v0 (the V0 cluster leader). * Layout matches the existing anomaly-detect / presence / seizure- detect cogs exactly — the Cogs dashboard at http://cognitum-v0:9000/cogs auto-discovers entries. * `cog-pose-estimation run` ran for 5 seconds in the background and cleanly emitted run.started + structured WARN events for the missing local sensing-server on :3000 (cognitum-v0's actual CSI source is ruview-vitals-worker on :50054, not :3000). No crashes, no NaN, no leaks. * Wiring `sensing_url` to the appliance-native source is a separate Day-2 integration task.
8.0 KiB
ADR-100: Cognitum Cog Packaging Specification
- Status: Accepted (formalises existing convention)
- Date: 2026-05-19
- Deciders: ruv
Context
The Cognitum V0 Appliance (/var/lib/cognitum/apps/) deploys discrete units called Cogs. They appear in the Appliance dashboard (http://cognitum-v0:9000/cogs) under an app-store UI (Today / Apps / Categories / Search / Updates). Until this ADR, the packaging convention has been implicit — derived from inspecting installed cogs (anomaly-detect, presence, seizure-detect, etc.) on a live appliance. Bringing new Cogs to the platform required reverse-engineering the layout each time.
This ADR formalises the layout so:
- A repo crate can be built into a Cog with a deterministic Makefile / CI pipeline.
- Cog binaries can be cross-compiled for every supported architecture from a single source.
- The appliance's installer (
cognitum-cog-gateway) can verify manifests without bespoke per-cog adapters. - Future Cogs in this repo (starting with
cog-pose-estimation— see ADR-101) follow a single rule.
Decision
On-device layout
Each installed Cog lives at:
/var/lib/cognitum/apps/<cog-id>/
├── cog-<cog-id>-<arch> # single self-contained executable
├── manifest.json # immutable; signed by the publisher
├── config.json # mutable; runtime config, owned by the appliance
├── pid # current PID when running; absent when stopped
├── output.log # stdout (truncated on rotation)
└── error.log # stderr (truncated on rotation)
<cog-id> is kebab-case, ASCII, [a-z0-9-]{2,32}. <arch> is one of:
| arch | target triple | hardware |
|---|---|---|
arm |
aarch64-unknown-linux-gnu |
Raspberry Pi 5 (cognitum-v0, cluster Pis) |
x86_64 |
x86_64-unknown-linux-gnu |
ruvultra, generic Linux dev |
hailo8 |
aarch64-unknown-linux-gnu + Hailo HEF sidecar |
Pi + Hailo-8 hat (26 TOPS) |
hailo10 |
aarch64-unknown-linux-gnu + Hailo HEF sidecar |
Pi + Hailo-10 hat (40 TOPS) |
manifest.json schema
{
"id": "anomaly-detect",
"version": "0.1.0",
"binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-anomaly-detect-arm",
"binary_bytes": 461904,
"binary_sha256": "<hex>",
"binary_signature": "<base64 Ed25519 sig over binary_sha256, signed with COGNITUM_OWNER_SIGNING_KEY>",
"installed_at": 1778772536,
"status": "installed"
}
Fields:
id,version,binary_url,binary_bytes,installed_at,status— already implemented and observed in production manifests (e.g.anomaly-detect@0.0.0). Documented here without change.binary_sha256,binary_signature— new, REQUIRED for any Cog shipped from this repo. Backwards-compatible with existing manifests: the appliance gateway treats both fields as optional today, MUST verify them when present. ADR-103 (witness chain) covers the trust model in more detail.statusvalues:"installed","running","stopped","failed","updating".
Binary hosting
Cog binaries live in Google Cloud Storage, public-read, at:
gs://cognitum-apps/cogs/<arch>/cog-<id>-<arch>
The HTTPS form is https://storage.googleapis.com/cognitum-apps/cogs/<arch>/cog-<id>-<arch> (no trailing extension; the URL is the canonical artifact). For Hailo variants, the HEF model file is sibling: cog-<id>-<arch>.hef.
Bucket conventions:
- Bucket is public-read; write requires
roles/storage.objectAdminin projectcognitum-20260110. - Per-version artifacts must be content-addressed:
cogs/<arch>/cog-<id>-<arch>@<sha256-prefix>is the immutable copy; the un-suffixed name is a symlink that updates on release. COGNITUM_OWNER_SIGNING_KEY(GCP Secret Manager) signs every binary before upload.
Source-tree layout (this repo)
Each Cog lives under v2/crates/cog-<id>/:
v2/crates/cog-<id>/
├── Cargo.toml # crate name = cog-<id>; binary = cog-<id>
├── src/
│ ├── main.rs # CLI: cog-<id> run | status | version
│ ├── lib.rs
│ └── inference.rs # the actual work
├── cog/
│ ├── manifest.template.json
│ ├── config.schema.json # JSON schema for runtime config
│ ├── README.md # consumer-facing description (used by the App Store UI)
│ ├── icon.svg # 1024×1024 icon (used by App Store hero)
│ └── Makefile # build / sign / upload targets
└── tests/
├── smoke.rs
└── manifest_signature.rs
Build pipeline
cd v2/crates/cog-<id>
make build-arm # cross-compile to aarch64-unknown-linux-gnu
make build-x86_64 # x86_64 Linux build
make build-hailo8 # arm + HEF compilation (requires Hailo Dataflow Compiler)
make build-hailo10 # arm + HEF compilation
make sign # produce binary_sha256 + binary_signature
make upload # gsutil cp to gs://cognitum-apps/cogs/<arch>/
make manifest # emit manifest.json with all fields filled
CI (GitHub Actions) MUST run make build-arm + make build-x86_64 on every PR touching v2/crates/cog-*/. Hailo HEF compilation requires the proprietary Hailo SDK and runs only on the Hailo-capable runners (currently a labelled self-hosted runner on the Pi cluster — TBD, separate ADR).
Runtime contract
A Cog binary MUST implement:
| Subcommand | Behaviour |
|---|---|
cog-<id> version |
Print <id> <version> and exit 0. |
cog-<id> manifest |
Print the embedded manifest JSON and exit 0. |
cog-<id> run --config /path/to/config.json |
Long-running. Writes structured JSON logs to stdout (parsed by cognitum-cog-gateway). Exit code 0 on graceful shutdown, non-zero on fatal error. |
cog-<id> health |
One-shot. Exit 0 if the cog could come up healthy; non-zero with diagnostic on stderr. Called by the gateway before run. |
stdout JSON line format (one event per line):
{"ts": 1779210883.444, "level": "info", "event": "<event-name>", "fields": { ... }}
Consequences
Positive
- New Cogs can be added without RE-ing the layout each time.
- CI can verify the manifest schema before merge.
- Signed binaries close a real supply-chain gap — current installed cogs (
anomaly-detect@0.0.0) have no signature, and a compromised GCS object could push malicious code to every appliance. - The runtime contract (
run | health | version | manifest) is uniform across cogs, socognitum-cog-gatewaycan stop carrying per-cog adapters.
Negative
- Existing installed cogs must be re-published with signatures within one minor release of the gateway adopting the verify-when-present rule.
- Hailo HEF cross-compile is gated on a self-hosted runner; we accept that PRs touching Hailo variants will be slower to land.
Risks
- Signing key rotation:
COGNITUM_OWNER_SIGNING_KEY(Ed25519) is a single root-of-trust today. ADR-103 (witness chain) describes the rotation/recovery path; this ADR depends on that. - GCS bucket misconfiguration: a public-read bucket with versioning-off could allow rollback attacks. Bucket MUST have Object Versioning enabled + 90-day non-current-version retention.
Migration
- Land this ADR.
- Land ADR-101 (
cog-pose-estimation— first Cog built to this spec). - After two clean releases of
cog-pose-estimation, re-publish the existing cogs (anomaly-detect,presence, etc.) withbinary_sha256+binary_signature. Track in a follow-up issue. - Flip
cognitum-cog-gatewayfrom "verify when present" to "require signature" — separate ADR, separate review.
See also
- ADR-101: Pose Estimation Cog (first Cog built to this spec).
- ADR-103: Witness chain trust model (signing key rotation, future ADR).
docs/adr/ADR-079-camera-ground-truth-training.md— the training pipeline behindcog-pose-estimation.CLAUDE.local.md§ "Fleet Infrastructure (Tailscale)" — appliance layout this ADR describes.