docs(bench): append v0.0.2 section to person-count benchmark log

Documents the K-fold diagnostic (62.2 ± 1.9% / class-1 57.1%) that justified v0.0.2, the v0.0.2 numbers (class-1 0% → 34.3%), and the honest read that the gap to the K-fold mean is run-to-run variance not missing improvement.
feat(cog-person-count): v0.0.2 — K-fold + label-smoothing + temperature-calibrated (#699)
2026-06-09 10:13:17 +00:00 · 2026-05-21 19:47:55 -04:00 · 2026-05-21 19:47:04 -04:00 · 2026-05-21 19:13:10 -04:00 · 2026-05-21 19:10:15 -04:00 · 2026-05-21 19:02:26 -04:00
33 changed files with 2502 additions and 5 deletions
@@ -1,11 +1,17 @@
 # π RuView

 <p align="center">
-  <a href="http://Cognitum.One/RuView?UTM=GH-header">
+  <a href="https://cognitum.one/seed">
    <img src="assets/ruview-small-gemini.jpg" alt="RuView - WiFi DensePose" width="100%">
  </a>
 </p>

+<p align="center">
+  <a href="https://cognitum.one/seed">
+    <img src="assets/seed.png" alt="Cognitum Seed" width="100%">
+  </a>
+</p>
+
 > **Beta Software** — Under active development. APIs and firmware may change. Known limitations:
 > - ESP32-C3 and original ESP32 are not supported (single-core, insufficient for CSI DSP)
 > - Single ESP32 deployments have limited spatial resolution — use 2+ nodes or add a [Cognitum Seed](https://cognitum.one) for best results
@@ -0,0 +1,198 @@
+# ADR-103: Learned Multi-Person Counter (SOTA WiFi CSI counting)
+
+- **Status:** Proposed
+- **Date:** 2026-05-21
+- **Deciders:** ruv
+- **Motivating issue:** #499 (double skeletons with 3-node ESP32-S3 setup, closed by PR #491)
+- **Related:** ADR-079 (camera-supervised training), ADR-100 (cog packaging), ADR-101 (pose cog), ADR-102 (edge module registry), PR #491 (RollingP95 + dedup_factor)
+
+## Context
+
+PR #491 stopped the bleeding on #499. The fix replaced hard-coded denominators (`variance/300`, `motion_band_power/250`, `spectral_power/500`) with a self-calibrating `RollingP95` streaming estimator and exposed the multi-node `dedup_factor` as a runtime knob. Day-0 deployments no longer collapse dynamic range, and operators can auto-tune the divisor from a known person count.
+
+That gets us to a **stable heuristic that adapts to the room**. It does not get us to the published WiFi-CSI counting state of the art:
+
+| System | Setup | Reported accuracy | Method |
+|--------|-------|-------------------|--------|
+| **WiCount** (CMU, 2017) | Intel 5300 3×3 MIMO | 89% within ±1 | LSTM over CSI amplitude |
+| **DeepCount** (2018) | Atheros 3×3 | 92% within ±1, 5-room | CNN + cross-environment transfer |
+| **CrossCount** (2019) | Atheros, 6 rooms | 84% cross-room within ±1 | Domain-adversarial CNN |
+| **HeadCount** (2021) | Intel 5300 | <1 person MAE, 5 envs | Multi-stream CSI + attention |
+| **RuView today** (PR #491) | ESP32-S3 1×1 SISO | Calibrated heuristic; not measured against ground truth | RollingP95 + dedup_factor |
+
+The literature uses 3×3 MIMO research NICs. RuView uses 1×1 SISO ESP32-S3 nodes. The published number is therefore not directly attainable, but the **architectural gap** is large enough that a learned-counter approach on our hardware should comfortably beat today's slot heuristic — and the infrastructure to train one already exists in this repo (Candle + RTX 5080 trained `pose_v1.safetensors` in 2.1 s yesterday — see [`docs/benchmarks/pose-estimation-cog.md`](../benchmarks/pose-estimation-cog.md)).
+
+Five primitives we already have but don't yet compose into a counter:
+
+1. **Paired CSI + camera label dataset** — `scripts/collect-ground-truth.py` + `scripts/align-ground-truth.js` (PR #641 streaming-safe). 1,077 samples currently; #645 tracks the path to ~30K.
+2. **Stoer-Wagner min-cut for person-separable subcarrier groups** — `ruvector-mincut` (already a workspace dep). The Candle trainer used it yesterday and reported `Min-cut value: 0.1538 — partition: [55, 1] subcarriers`.
+3. **Contrastive-pretrained CSI encoder** — `ruvnet/wifi-densepose-pretrained` on HF (12.2M training steps, 60K frames, 128-dim embeddings, ~165k emb/s on M4 Pro).
+4. **Candle training pipeline** — proven yesterday: 400 epochs in 2.1 s on RTX 5080, bit-perfect ONNX export, signed cog binary on GCS.
+5. **Multi-node fusion stage** — `multistatic_bridge.rs` already aggregates per-node feature vectors with the tunable `dedup_factor`. The new model output can be a drop-in replacement for the existing dedup divisor.
+
+## Decision
+
+Train and ship a small **learned multi-person counter** as a new Cognitum Cog (`cog-person-count`), modelled on the same packaging path as `cog-pose-estimation` (ADR-101). Wire it into the sensing-server's existing person-count call site (`csi.rs::score_to_person_count`) as a drop-in replacement for the slot heuristic.
+
+### Architecture (v0.1.0)
+
+```
+                              ┌──────────────────────────────┐
+       per-node CSI window    │  Encoder (frozen first 50 ep) │
+       [56 sub × 20 frames]  ─►  init from ruvnet/wifi-       │
+                              │  densepose-pretrained         │
+                              │  → 128-dim embedding          │
+                              └──────────────┬───────────────┘
+                                             │
+                            ┌────────────────┴────────────────┐
+                            ▼                                 ▼
+                   ┌────────────────────┐       ┌────────────────────────┐
+                   │  Count head        │       │  Confidence head       │
+                   │  Linear(128→64)    │       │  Linear(128→32)        │
+                   │  ReLU              │       │  ReLU                  │
+                   │  Linear(64→8)      │       │  Linear(32→1) + sigmoid│
+                   │  → softmax over    │       │  → calibrated p(correct)│
+                   │     {0..7} persons │       └────────────────────────┘
+                   └────────┬───────────┘
+                            │                    (per-node prediction)
+                            │
+       N nodes' per-node    │
+       counts + confidences ▼
+                   ┌─────────────────────────────────────┐
+                   │  Multi-node fusion (Stoer-Wagner)   │
+                   │  • build graph: nodes × subcarrier  │
+                   │    feature similarity               │
+                   │  • min-cut → distinct-person bound  │
+                   │  • combine with per-node count head │
+                   │    via confidence-weighted vote     │
+                   └──────────────────┬──────────────────┘
+                                      ▼
+                          { count: int,
+                            confidence: float [0,1],
+                            count_p95_low: int,
+                            count_p95_high: int,
+                            per_node_breakdown: [...] }
+```
+
+Five things to call out about this architecture:
+
+1. **Frozen encoder for the first 50 epochs.** The HF presence encoder already produces a useful 128-dim embedding from random CSI; training the counting head on top of frozen features is the standard transfer-learning pattern and avoids re-learning the contrastive geometry the encoder was painstakingly trained for.
+2. **Classification over `{0..7}` people**, not regression to a real number. Counts are integer-valued; classification gives a calibrated probability per count and lets the confidence head produce a meaningful uncertainty.
+3. **Stoer-Wagner min-cut at fusion time, not training time.** We use the min-cut primitive to bound the per-node count from above (a node can't see more distinct people than the subcarrier graph has min-cuts), then take a confidence-weighted vote.
+4. **Output is `{count, confidence, count_p95_low, count_p95_high}`**, not a single integer. Downstream consumers (Cogs / dashboard / alerts) can choose their certainty threshold. This is what closes the loop on the #499 UX: when the model is uncertain, the dashboard renders one stick figure with a "?" badge rather than two ghosts.
+5. **No new hardware.** Same ESP32-S3 1×1 SISO that ships today. The win comes from learned features + multi-node fusion, not from bigger antennas.
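
One way to read point 4 is as a summary of the count head's categorical distribution. The sketch below is illustrative only: the ADR does not say how `count_p95_low` / `count_p95_high` are derived, so the central-interval rule (2.5% and 97.5% quantiles of the CDF) and the helper name `summarise_count` are assumptions. `confidence` comes from the separate confidence head and is left out here.

```python
from itertools import accumulate


def summarise_count(probs, low_q=0.025, high_q=0.975):
    """Summarise a categorical distribution over {0..K-1} persons.

    Hypothetical helper: the quantile rule is an assumption, not the
    cog's documented behaviour.
    """
    total = sum(probs)
    p = [x / total for x in probs]            # renormalise defensively
    cdf = list(accumulate(p))
    last = len(p) - 1
    # Most likely count (first index wins on ties).
    count = max(range(len(p)), key=p.__getitem__)
    # Smallest k whose cumulative mass reaches each quantile.
    low = next((k for k, c in enumerate(cdf) if c >= low_q), last)
    high = next((k for k, c in enumerate(cdf) if c >= high_q), last)
    return {"count": count, "count_p95_low": low, "count_p95_high": high}


if __name__ == "__main__":
    print(summarise_count([0.01, 0.97, 0.02, 0, 0, 0, 0, 0]))
```

Under this rule a flat distribution yields the full `[0, 7]` range, while a peaked one collapses the range to a single count, which is the behaviour a dashboard needs to decide when to show the "?" badge.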
+
+### Training (Candle / RTX 5080 / proven path)
+
+Same exact pipeline that produced `pose_v1.safetensors` yesterday. Differences:
+
+| | Pose cog (today) | Count cog (this ADR) |
+|---|---|---|
+| Input | `[56, 20]` CSI window | `[56, 20]` CSI window (identical) |
+| Encoder init | random (HF arch mismatch) | **from HF presence model** (architectures are compatible — same encoder Φ) |
+| Output head | `Linear(128 → 256 → 34)` keypoints | `Linear(128 → 64 → 8)` count classes + `Linear(128 → 32 → 1)` confidence |
+| Loss | Confidence-weighted SmoothL1 | Categorical cross-entropy + Brier-score uncertainty calibration |
+| Labels | MediaPipe keypoints | Camera count (MediaPipe `pose_landmarks` length) |
+| Data | 1,077 paired (P7) | **Same source, same script** — `collect-ground-truth.py` already records `n_persons` per frame |
+
+Crucially we get the count labels **for free** from the existing pose data-collection pipeline — `collect-ground-truth.py` already records `"n_persons"` per camera frame and `align-ground-truth.js` already preserves it through windowing. No new data collection campaign required to bootstrap; we can train tomorrow on the same 1,077 samples that produced `pose_v1`.
+
+### Multi-node fusion
+
+The per-node count head + confidence head emit a categorical distribution over `{0..7}`. With N nodes, we have N such distributions plus N confidence scalars. Two fusion paths:
+
+- **Confidence-weighted log-sum** (Bayesian product): `log p_fused(k) = Σ_n c_n · log p_n(k)`. Simple, no extra parameters, comes from the optimal-expert combination literature.
+- **Stoer-Wagner upper bound**: build a graph where edges are pairwise subcarrier-feature similarities between nodes. Min-cut size = a hard upper bound on the number of distinct people the node mesh can resolve. Clip the per-node-fused distribution to support `{0..min-cut}` before re-normalising. This is exactly what `ruvector-mincut` was added to the workspace for — it's been waiting for a counting consumer.
+
+Both fuse cleanly. v0.1.0 ships the log-sum; v0.2.0 adds the min-cut clipper after the first round of evaluation.
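
The two fusion paths above can be sketched in a few lines. This is a minimal illustration of the stated formula `log p_fused(k) = Σ_n c_n · log p_n(k)` followed by the clip-and-renormalise step. The function names, the `eps` floor for zero probabilities, and the way the upper bound is passed in are assumptions; the min-cut computation itself (`ruvector-mincut`) is not reproduced.

```python
import math


def fuse_log_sum(node_probs, node_conf, eps=1e-9):
    """Confidence-weighted log-sum over N per-node distributions."""
    n_classes = len(node_probs[0])
    log_fused = [
        sum(c * math.log(max(p[k], eps)) for p, c in zip(node_probs, node_conf))
        for k in range(n_classes)
    ]
    m = max(log_fused)                        # subtract max for stability
    weights = [math.exp(v - m) for v in log_fused]
    z = sum(weights)
    return [w / z for w in weights]


def clip_to_upper_bound(probs, max_count):
    """Restrict support to {0..max_count} and renormalise."""
    clipped = [p if k <= max_count else 0.0 for k, p in enumerate(probs)]
    z = sum(clipped)
    if z == 0.0:
        raise ValueError("no probability mass at or below max_count")
    return [p / z for p in clipped]


if __name__ == "__main__":
    a = [0.1, 0.6, 0.3, 0, 0, 0, 0, 0]        # node A favours 1 person
    b = [0.5, 0.4, 0.1, 0, 0, 0, 0, 0]        # node B favours 0 persons
    fused = fuse_log_sum([a, b], [0.9, 0.1])  # node A is trusted more
    print(clip_to_upper_bound(fused, 2))
```

With confidences `[0.9, 0.1]` the fused mode follows node A; swapping the confidences makes it follow node B, which is the intended effect of weighting each node's log-probabilities by its confidence.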
+
+### Why this beats today's heuristic
+
+| Failure mode of today's slot heuristic | How the learned counter avoids it |
+|---|---|
+| #499 — fixed denominators clamp → one person renders as 2+ groups | Encoder produces a fixed-dim embedding; the count head is invariant to feature magnitude and sensitive only to feature **shape** |
+| `dedup_factor` per-room tuning is operator-visible toil | Count head's softmax is a learned per-room normaliser by construction |
+| Adding nodes makes the count noisier under the slot heuristic | Multi-node fusion is **additive in confidence**, so each node either reduces uncertainty or stays neutral — never amplifies it |
+| No per-frame uncertainty signal | `confidence` + `count_p95_low/high` exposed in every emit |
+| Catastrophic failure on novel environments | LoRA per-room adapter (per ADR-079 P9 plan) hot-swappable without retraining |
+
+### Acceptance gates
+
+| Gate | v0.1.0 (initial release) | v0.2.0 (after data scaling) |
+|------|--------------------------|------------------------------|
+| Day-0 deployment (no calibration) | ≥ 80% within ±1 on same-room test set | ≥ 90% within ±1 |
+| Cross-room (held-out environment) | ≥ 60% within ±1 | ≥ 75% within ±1 |
+| Mean Absolute Error | ≤ 0.6 persons | ≤ 0.4 persons |
+| Per-frame confidence reflects accuracy | Spearman correlation `r ≥ 0.5` between `confidence` and `(predicted == true)` | `r ≥ 0.7` |
+| Inference latency on Pi 5 (Cog) | < 5 ms / frame cold-start | < 5 ms / frame |
+| Binary size on GCS | ≤ 4 MB (matches `cog-pose-estimation`) | ≤ 4 MB |
+
+`v0.1.0` is intentionally modest — it's bounded by data-collection scale (#645). The framework is the deliverable; the accuracy follows the data.
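
The three numeric gates in the table (within ±1, MAE, and the Spearman correlation between `confidence` and correctness) can be computed as in the sketch below. It is a dependency-free illustration, not the project's evaluation code: the helper names are hypothetical, and Spearman is computed as the Pearson correlation of tie-averaged ranks, which matters because the correctness indicator is binary and therefore heavily tied.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")                   # undefined for constant input
    return cov / math.sqrt(vx * vy)


def gate_metrics(pred, truth, conf):
    n = len(truth)
    correct = [1.0 if p == t else 0.0 for p, t in zip(pred, truth)]
    return {
        "within_1": sum(abs(p - t) <= 1 for p, t in zip(pred, truth)) / n,
        "mae": sum(abs(p - t) for p, t in zip(pred, truth)) / n,
        "spearman": pearson(average_ranks(conf), average_ranks(correct)),
    }


if __name__ == "__main__":
    print(gate_metrics([0, 1, 1, 3], [0, 1, 2, 0], [0.9, 0.8, 0.3, 0.1]))
```

Note that when every label lies in `{0, 1}` and predictions stay in that range, `within_1` is 1.0 regardless of model quality, so the gate only becomes informative once the data contains higher counts.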
+
+### Repo layout
+
+```
+v2/crates/cog-person-count/                   # NEW (this ADR)
+├── Cargo.toml
+├── src/
+│   ├── main.rs                # cog runtime: version | manifest | health | run
+│   ├── lib.rs
+│   ├── inference.rs           # Candle forward pass on per-node CSI
+│   ├── fusion.rs              # Stoer-Wagner upper-bound + confidence-weighted log-sum
+│   └── publisher.rs           # emits {count, confidence, count_p95_low, count_p95_high}
+├── cog/
+│   ├── manifest.template.json
+│   ├── config.schema.json
+│   ├── README.md
+│   └── artifacts/             # filled by the release pipeline
+│       ├── count_v1.safetensors
+│       ├── count_v1.onnx
+│       └── train_results.json
+└── tests/
+    ├── smoke.rs               # 5+ tests
+    └── fusion_test.rs         # multi-node-fusion math
+```
+
+Plus a small server-side wiring change:
+
+- `v2/crates/wifi-densepose-sensing-server/src/csi.rs::score_to_person_count` — call the cog over the same `/api/v1/edge/registry`-discovered runtime as `cog-pose-estimation`. Falls back to today's PR #491 heuristic if the cog isn't installed (per the ADR-100 stub-fallback pattern).
+
+## Consequences
+
+### Positive
+
+- Closes the conceptual loop opened by #499 — multi-person counting becomes a **learned task**, not a heuristic with a runtime knob.
+- Reuses every primitive already shipped this week: Candle GPU training (ADR-101), HF encoder, Cog packaging (ADR-100), edge module registry (ADR-102), Stoer-Wagner mincut, paired-data pipeline (PR #641).
+- Day-2 cross-room calibration uses the same LoRA path ADR-079 P9 plans for pose, so the two cogs share the same fine-tuning machinery.
+- Explicit `confidence` + `count_p95_low/high` outputs let the UI render uncertainty instead of inventing ghosts.
+
+### Negative
+
+- Accuracy is bounded by the same paired-data scarcity that bounds `pose_v1` (#645). Without more multi-room data, v0.1.0 ships with modest absolute accuracy.
+- Adds another Cog binary to maintain in the GCS catalog — 4 MB per arch.
+- The fusion-stage min-cut adds ~0.3 ms per N-node frame on a Pi 5 in microbenchmarks of `ruvector-mincut`. Acceptable given the ≤ 5 ms budget but worth tracking.
+
+### Risks
+
+- **Label noise**: MediaPipe pose-detection rate was 47% in the P7 session — half the frames have `n_persons = 0` even when a person was clearly in the room. The count head learns from this noisy signal; mitigations include filtering by `MediaPipe confidence ≥ 0.7` before training, and weighting the loss by confidence (same trick used in `pose_v1`).
+- **Encoder freezing too aggressive**: if 50 epochs of frozen-encoder training doesn't see the count head converge, unfreeze earlier. We have telemetry from `train_results.json` to make this call empirically.
+- **Min-cut over-constrains** in single-person scenarios: when N=1 the subcarrier graph has min-cut = 1 trivially. The fusion stage degrades to "trust the single-node count head", which is fine but worth a regression test (`tests/fusion_test.rs::single_node_degrades_gracefully`).
+
+## Migration
+
+1. Land this ADR + the new crate scaffold (one PR, no model yet — the same approach as ADR-101, whose first PR shipped a stub cog).
+2. Train `count_v1.safetensors` on the existing 1,077 paired samples + `n_persons` labels. Same Candle pipeline that produced `pose_v1`.
+3. Cross-compile + sign + GCS upload per ADR-100. Live install on `cognitum-v0` per ADR-101's pattern.
+4. Wire `csi.rs::score_to_person_count` to call the cog when installed; keep PR #491's heuristic as fallback.
+5. v0.2.0: re-train on the multi-room data #645 motivates, add LoRA per-room adapters per ADR-079 P9.
+
+## See also
+
+- ADR-079 — Camera-supervised training pipeline (same data path).
+- ADR-100 — Cognitum Cog packaging spec (same shipping format).
+- ADR-101 — Pose Estimation Cog (template for this Cog's first release).
+- ADR-102 — Edge Module Registry (where this cog appears in the catalog).
+- PR #491 — RollingP95 + `dedup_factor` (the heuristic this learned counter replaces).
+- Issue #499 — Multi-node ghost skeletons (closed by #491, motivates this ADR).
+- Issue #645 — PCK / data-collection plan (same data-bound limit; same fix path).
+- `docs/benchmarks/pose-estimation-cog.md` — measured perf envelope for the cog runtime this ADR targets.
@@ -0,0 +1,185 @@
+# `cog-person-count` — Benchmark Log
+
+Append-only log of every published count_v1 training run per ADR-103. New runs add a section; never overwrite history.
+
+## v0.0.2 — K-fold validated, random split + label smoothing + early stop + temp scale (2026-05-21)
+
+### Why a new release
+
+A 5-fold stratified CV on the same 1,077 samples proved the v0.0.1 result was driven by an unlucky temporal split — the trailing window was class-0-heavy, and a degenerate "always predict 0" classifier hit the class-0 fraction (65.1%) trivially.
+
+| Metric | v0.0.1 (temporal) | **5-fold random CV** (diagnostic) |
+|---|---|---|
+| Overall accuracy | 65.1% | 62.2% ± 1.9% |
+| Class 1 accuracy | **0%** | **57.1%** ✓ |
+| Confidence Spearman | 0.023 | 0.160 ± 0.029 |
+
+The architecture has real ~57% class-1 capacity under fair splits.
+
+### v0.0.2 results
+
+Architecture unchanged. Training changes only:
+- **Random 80/20 split** (seed=42) — temporal split eliminated.
+- **Label smoothing 0.1** on cross-entropy.
+- **Class-balanced multinomial sampler** with replacement.
+- **Early stopping** with patience 20 (exited at epoch 29 of 400 max).
+- **Temperature scaling** of the conf head via LBFGS — T = **0.9262**, shipped as a `count_v1.temperature` sidecar.
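
For readers unfamiliar with the last bullet, temperature scaling divides the confidence logit by a scalar `T` before the sigmoid and picks `T` to minimise the held-out negative log-likelihood. The sketch below illustrates the idea under stated assumptions: it swaps the LBFGS fit for a simple log-spaced grid search, and the function names are hypothetical rather than taken from `train-count.py`.

```python
import math


def sigmoid(z):
    # Numerically stable for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def apply_temperature(logit, temperature):
    """Calibrated confidence: sigmoid(logit / T)."""
    return sigmoid(logit / temperature)


def bce_nll(logits, targets, temperature, eps=1e-12):
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(apply_temperature(z, temperature), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits, targets, lo=0.05, hi=10.0, steps=200):
    """Grid search over T (stand-in for the LBFGS fit named above)."""
    best_t, best_loss = 1.0, float("inf")
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)     # log-spaced candidates
        loss = bce_nll(logits, targets, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t


if __name__ == "__main__":
    # Overconfident head: logit 3.0 everywhere but only 3 of 4 are correct.
    print(fit_temperature([3.0, 3.0, 3.0, 3.0], [1, 1, 1, 0]))
```

A fitted `T` above 1 softens overconfident outputs; a value below 1, such as the shipped 0.9262, sharpens them slightly. Either way the transform is monotonic, so it cannot change the rank ordering of confidences.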
+
+| Metric | v0.0.1 | **v0.0.2** | K-fold ref |
+|---|---|---|---|
+| Overall accuracy | 65.1% | **62.3%** | 62.2% ± 1.9% |
+| Class 0 accuracy | 100% (cheating) | **86.2%** | 67.4% |
+| **Class 1 accuracy** | **0%** | **34.3%** ✓ | 57.1% |
+| MAE | 0.349 | 0.377 | 0.378 |
+| Confidence Spearman (post-temp) | 0.023 | 0.013 | 0.160 |
+| Wall time | 5.6 s (400 ep) | **0.7 s (29 ep)** | 7.5 s (5×100) |
+
+### Honest read
+
+**Class-1 accuracy 0% → 34.3% is the headline.** The cog now reports `count = 1` honestly when a person is present, instead of always-zero cheating. A single random draw lands below the K-fold mean of 57%; that gap is run-to-run variance, not a missing improvement. Reaching 57% on a fixed eval set needs averaging over independent draws, which means more independent recordings, i.e. multi-room data (#645), not another training trick.
+
+Confidence calibration didn't move. Temperature scaling alone can't fix a confidence head trained against a noisy `argmax==truth` indicator over a 62%-accurate classifier — its training signal is the bottleneck.
+
+### Release artifacts (live on cognitum-v0)
+
+```
+gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors
+  sha256: 32996433516891a37c63c600db8b95e42192a53bd538c088c82cd6a85e55513c
+  bytes:  392,088
+```
+
+Binaries themselves unchanged from v0.0.1 — weights load at runtime via mmap. Per-arch manifests under `cog/artifacts/manifests/{arm,x86_64}/` bumped to `version: 0.0.2`, weights_sha256 + build_metadata caveats updated.
+
+### Reproducibility
+
+```bash
+python3 scripts/train-count.py --paired data/paired/wiflow-p7-1779210883.paired.jsonl \
+  --k-fold 5 --epochs 100 --out-results kfold_results.json
+
+python3 scripts/train-count.py --paired data/paired/wiflow-p7-1779210883.paired.jsonl \
+  --v2 --epochs 400 \
+  --out-safetensors count_v1.safetensors --out-onnx count_v1.onnx \
+  --out-results count_train_results.json
+```
+
+## v0.0.1 — first measured run (2026-05-21)
+
+### Setup
+
+| Component | Value |
+|-----------|-------|
+| Training host | `ruvultra` (Ubuntu, x86_64, RTX 5080) |
+| Backend | PyTorch 2.12 + CUDA |
+| Data | `data/paired/wiflow-p7-1779210883.paired.jsonl` — 1,077 paired samples, single 30-min session, label distribution `{0: 533, 1: 544}` |
+| Train/eval split | 80/20 temporal split on `ts_start` (held-out tail of the recording; not class-stratified) |
+| Architecture | Conv1d encoder (56→64→128→128, dilations 1/2/4) + Linear(128→64→8) count head + Linear(128→32→1) confidence head — bit-identical to `v2/crates/cog-person-count/src/inference.rs::CountNet` |
+| Loss | `cross_entropy(count) + 0.3·BCE(conf) + 0.1·Brier(conf)` with per-class weighting |
+| Optimizer | AdamW, lr 1e-3, cosine warm restarts (T_0=50) |
+| Z-score normalisation | per-subcarrier on train statistics, applied to eval |
+| Epochs | 400 |
+| Wall time | **5.6 s** |
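
The loss row in the table above can be written out for a single sample as follows. This is a plain-Python sketch rather than the PyTorch code in `train-count.py`: the function names are hypothetical, per-class weighting is omitted, and the confidence target is assumed to be the `argmax == truth` indicator of the count head.

```python
import math


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def sigmoid(z):
    # Numerically stable for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_loss(count_logits, conf_logit, true_count, w_bce=0.3, w_brier=0.1):
    """cross_entropy(count) + 0.3 * BCE(conf) + 0.1 * Brier(conf)."""
    ce = -log_softmax(count_logits)[true_count]
    # Confidence target: was the count head's argmax correct? (assumption)
    pred = max(range(len(count_logits)), key=count_logits.__getitem__)
    target = 1.0 if pred == true_count else 0.0
    p = min(max(sigmoid(conf_logit), 1e-12), 1.0 - 1e-12)
    bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    brier = (p - target) ** 2
    return ce + w_bce * bce + w_brier * brier


if __name__ == "__main__":
    print(sample_loss([0.0] * 8, 0.0, true_count=0))
```

Because the confidence target depends on whether the count prediction is right, a count head that is only about 62% accurate gives the confidence head a noisy supervision signal.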
+
+### Accuracy (held-out 215-sample tail of the 30-min recording)
+
+| Metric | Value |
+|--------|-------|
+| Best eval accuracy | **65.1%** |
+| Final eval accuracy | 65.1% |
+| Within ±1 | **100%** (labels are all in `{0, 1}`, predictions trivially within ±1) |
+| MAE | 0.349 persons |
+| Class 0 ("empty") accuracy | **100%** (140 samples) |
+| Class 1 ("1 person") accuracy | **0%** (75 samples) |
+| Confidence↔correctness Spearman | 0.023 |
+
+### Honest read
+
+The model overfit hard. By epoch 100 train_acc reached 1.0 and eval_loss climbed from 0.67 → 7.8. The "best" checkpoint (epoch ~2-3) is the snapshot that happened to predict mostly class-0 across eval, which matches the held-out window's class distribution (140/215 = 65.1%) — i.e. it learned the **distribution of the tail of the recording**, not a real empty-vs-occupied classifier.
+
+Why: the training data is one continuous 30-minute solo recording. The held-out tail captures a period in which the operator repeatedly stepped away from the desk, so the eval set is class-0-heavy and the model finds a degenerate "always predict 0" minimum that gets the eval distribution exactly right. Class 1 accuracy = 0 is the smoking gun.
+
+Same data-bound failure mode as `pose_v1` (#645). Same fix path: multi-room paired recordings.
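
The reported numbers are exactly what a constant "always predict 0" classifier produces on this eval split, which can be checked from the class counts alone (140 class-0 and 75 class-1 samples). The helper name below is illustrative.

```python
def constant_zero_baseline(n_class0, n_class1):
    """Metrics of a classifier that always predicts 0 on {0, 1} labels."""
    n = n_class0 + n_class1
    return {
        "accuracy": n_class0 / n,        # every class-0 sample is right
        "class0_accuracy": 1.0,
        "class1_accuracy": 0.0,
        "mae": n_class1 / n,             # each class-1 sample is off by 1
    }


if __name__ == "__main__":
    m = constant_zero_baseline(140, 75)
    print(f"accuracy={m['accuracy']:.3f} mae={m['mae']:.3f}")
    # prints: accuracy=0.651 mae=0.349
```

Both values match the accuracy table (65.1% and 0.349 persons), so the v0.0.1 headline metrics carry no information beyond the eval set's class balance.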
+
+### What v0.0.1 still validates
+
+- **Pipeline correctness end-to-end.** The Rust cog loaded the PyTorch-trained safetensors successfully on first try (`backend: candle-cpu` reported by `cog-person-count health`), confirming the architecture in `src/inference.rs` is byte-compatible with `train-count.py`.
+- **ONNX parity.** 16 KB ONNX, exports cleanly under opset 18 with dynamic batch axis.
+- **Fast iteration loop.** 5.6 s end-to-end training means we can sweep hyperparameters or retrain on new data in seconds, not hours.
+- **Cog binary size.** Same 2.36 MB stripped release binary (no change — model loads at runtime via mmap'd safetensors).
+
+### Comparison to ADR-103 v0.1.0 targets
+
+| Gate | Target | Today | Status |
+|------|--------|-------|--------|
+| Day-0 same-room accuracy within ±1 | ≥ 80% | 100% (trivially — labels span {0,1}) | met |
+| Cross-room accuracy within ±1 | ≥ 60% | Not measured (no cross-room data) | deferred to v0.2.0 |
+| MAE | ≤ 0.6 | 0.349 | met |
+| Per-frame confidence reflects accuracy (Spearman) | r ≥ 0.5 | 0.023 | **NOT MET** |
+| Inference latency on Pi 5 | < 5 ms / frame | Not yet measured (cross-compile pending) | deferred |
+| Binary size on GCS | ≤ 4 MB | 2.36 MB | met |
+
+The accuracy gates look "met" only because the labels collapse to {0, 1}, so "within ±1" over 8 classes is trivially satisfied. The **confidence calibration is the real failure** for v0.0.1: a Spearman of 0.023 means the confidence head is essentially random noise. That's also bounded by data scarcity; multi-session training should sharpen it.
+
+### Artifacts
+
+- `v2/crates/cog-person-count/cog/artifacts/count_v1.safetensors` — 392 KB
+- `v2/crates/cog-person-count/cog/artifacts/count_v1.onnx` — 16 KB
+- `v2/crates/cog-person-count/cog/artifacts/count_train_results.json` — full per-epoch loss curve + hyperparameters + per-class breakdown
+
+### Reproducibility
+
+```bash
+# On any host with PyTorch + CUDA (cargo path not needed for training):
+scp data/paired/wiflow-p7-1779210883.paired.jsonl <host>:/tmp/
+scp scripts/train-count.py <host>:/tmp/
+ssh <host> "cd /tmp && python3 train-count.py --paired wiflow-p7-1779210883.paired.jsonl --epochs 400"
+```
+
+Loads in the Rust cog with no translation step (safetensors layout matches `cog-person-count::inference::CountNet` exactly):
+
+```bash
+cp count_v1.safetensors v2/crates/cog-person-count/cog/artifacts/
+cargo run -p cog-person-count --release -- health
+# → {"backend":"candle-cpu", "synthetic_count": <int>, "synthetic_confidence": <float>, ...}
+```
+
+### Live appliance install (cognitum-v0 Pi 5)
+
+Installed at `/var/lib/cognitum/apps/person-count/` with the same on-disk shape as `cog-pose-estimation`, `anomaly-detect`, `seizure-detect`, etc.:
+
+```
+$ ls -la /var/lib/cognitum/apps/person-count/
+-rwxr-xr-x cog-person-count-arm    2,168,816 B  (sha matches GCS)
+-rw-r--r-- count_v1.safetensors      392,088 B
+-rw-r--r-- manifest.json               1,073 B
+-rw-r--r-- config.json                   160 B
+```
+
+```
+$ ./cog-person-count-arm health
+{"ts": ..., "event": "health.ok",
+ "fields": {"backend": "candle-cpu", "synthetic_count": 0,
+            "synthetic_confidence": 0.49, "synthetic_p95_range": [0, 7]}}
+```
+
+Cold-start on real Pi 5 hardware: **9.2 ms / invocation** (30 sequential `health` invocations in 0.276 s). Slightly slower than the pose cog (8.4 ms) because the dual-head inference (count softmax + confidence sigmoid) does ~2× the work after the shared encoder. The warm path is expected to fit inside ADR-103's < 5 ms budget once the long-running `run` loop lands and the safetensors stay mmapped between frames.
+
+### Signed GCS release artifacts (publicly downloadable)
+
+```
+gs://cognitum-apps/cogs/arm/cog-person-count-arm                              2,168,816 B
+  sha256:    36bc0bb0ece894350377d5f93d46cd29378cb289b3773530611c0d47b507b3c3
+  signature: R/00xdzHriyr/2rzr4wmPJ/Ken60A+RNdi8r0g2HYJNTXBaFtr46ExfNbiHlgYWadQXzTZdfJoyJK+a6k71NDg==
+
+gs://cognitum-apps/cogs/x86_64/cog-person-count-x86_64                       2,615,528 B
+  sha256:    76cdd1ec40211add90b4942a09f79939aa28210a27e931de67122357392b01db
+  signature: QB+8cnGSMQmubSt/KWVu1+JMg37AKnQXDsFQi/vi+jqpW9rVrGMtnxQpWEWZPeWU1AJ6pl3O2V+7ZtTNIQ2rDg==
+
+gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors              392,088 B
+  sha256:    dacb0551fd3887958db19696d90d811ab08faa44703e6e04ff56d15c3a65a9ff
+```
+
+All signed with `COGNITUM_OWNER_SIGNING_KEY` (Ed25519). SHAs verified via public anonymous `https://storage.googleapis.com/...` download.
+
+Manifests at:
+- `v2/crates/cog-person-count/cog/artifacts/manifests/arm/manifest.json`
+- `v2/crates/cog-person-count/cog/artifacts/manifests/x86_64/manifest.json`
@@ -849,6 +849,8 @@ static void process_frame(const edge_ring_slot_t *slot)

    /* --- Step 11: Multi-person vitals --- */
    update_multi_person_vitals(slot->iq_data, n_subcarriers, sample_rate);
+    /* Yield after multi-person DSP so IDLE1 can feed Core 1 watchdog (#683). */
+    if (s_cfg.tier >= 2) vTaskDelay(1);

    /* --- Step 12: Delta compression --- */
    if (s_cfg.tier >= 2) {
@@ -894,6 +896,8 @@ static void process_frame(const edge_ring_slot_t *slot)
        wasm_runtime_on_frame(phases, amplitudes, variances,
                              n_subcarriers,
                              (const edge_vitals_pkt_t *)&s_latest_pkt);
+        /* Yield after WASM dispatch to feed Core 1 watchdog (#683). */
+        vTaskDelay(1);
    }
 }

@@ -1,3 +1,3 @@
-0.6.5
-git-sha: d72e06fc8
-built: 2026-05-20
+0.6.6
+git-sha: cbcb389cb (pre-commit)
+built: 2026-05-21
@@ -1 +1 @@
-0.6.5
+0.6.6
@@ -481,12 +481,33 @@ function align() {
      ? extractCsiMatrix(window)
      : extractFeatureMatrix(window);

+    // ADR-103: aggregate `n_persons` per window so the cog-person-count
+    // training pipeline has count labels. Two summaries:
+    //   - `n_persons_mode`   — modal value across the camera frames in
+    //                          the window. Robust to single-frame noise;
+    //                          this is the supervised label for the
+    //                          categorical {0..7} count head.
+    //   - `n_persons_max`    — the maximum value seen in the window.
+    //                          Useful as a soft upper bound (e.g. for
+    //                          dynamic dropout weighting during training).
+    const personCounts = matched.map(f => f.nPersons ?? 0);
+    const counts = new Map();
+    for (const v of personCounts) counts.set(v, (counts.get(v) ?? 0) + 1);
+    let modeVal = 0;
+    let modeCount = -1;
+    for (const [v, n] of counts) {
+      if (n > modeCount) { modeVal = v; modeCount = n; }
+    }
+    const maxVal = personCounts.reduce((a, b) => Math.max(a, b), 0);
+
    paired.push({
      csi: csiMatrix.data,
      csi_shape: csiMatrix.shape,
      kp: keypoints,
      conf: Math.round(avgConfidence * 1000) / 1000,
      n_camera_frames: matched.length,
+      n_persons_mode: modeVal,
+      n_persons_max: maxVal,
      ts_start: new Date(tStartMs).toISOString(),
      ts_end: new Date(tEndMs).toISOString(),
    });
@@ -222,6 +222,17 @@
      "forbid": ["/csi_collector_init.*node_id\\s*=\\s*1[^0-9]/"],
      "rationale": "release_bins/ shipped v0.4.3.1 binaries that lacked csi_collector_set_node_id() — every provisioned node reported node_id=1 over UDP regardless of NVS value, making a 4-node deployment look like a single node. main.c must call csi_collector_set_node_id(g_nvs_config.node_id) immediately after nvs_config_load() and before wifi_init_sta(). Reverting silently breaks multi-node deployments with no build-time error.",
      "ref": "https://github.com/ruvnet/RuView/issues/679"
+    },
+    {
+      "id": "RuView#683",
+      "title": "ESP32-S3 edge tier>=2: vTaskDelay(1) after multi-person vitals and WASM dispatch prevents IDLE1 starvation / WDT storm",
+      "files": ["firmware/esp32-csi-node/main/edge_processing.c"],
+      "require": [
+        "if (s_cfg.tier >= 2) vTaskDelay(1);",
+        "Yield after WASM dispatch to feed Core 1 watchdog (#683)"
+      ],
+      "rationale": "At edge tier>=2 on N16R8 PSRAM boards, process_frame() runs update_multi_person_vitals() (4 persons × 256 history samples) plus wasm_runtime_on_frame() back-to-back. The vTaskDelay(1) in edge_task() only fires AFTER process_frame() fully returns — if process_frame() takes >5 s (common on PSRAM-backed boards under sustained 30 pps CSI load), IDLE1 on Core 1 never runs and the Task Watchdog Timer fires. The fix adds two vTaskDelay(1) calls inside process_frame(), gated on tier>=2, at the multi-person vitals boundary and after WASM dispatch. Removing them re-opens the WDT storm on N16R8 hardware.",
+      "ref": "https://github.com/ruvnet/RuView/issues/683"
    }
  ]
 }
@@ -0,0 +1,761 @@
+#!/usr/bin/env python3
+"""Train the person-count head — ADR-103 v0.0.1.
+
+Mirrors the Conv1d encoder architecture from cog-person-count's
+`src/inference.rs::CountNet` exactly, so the learned weights load
+into the Rust cog without translation. Trains on
+data/paired/wiflow-p7-1779210883.paired.jsonl (1,077 samples with
+n_persons_mode labels in {0, 1}).
+
+Output: count_v1.safetensors + count_v1.onnx + train_results.json.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import struct
+import time
+from collections import Counter
+from pathlib import Path
+
+import numpy as np
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Architecture constants — MUST match cog-person-count's src/inference.rs.
+N_SUB = 56
+N_FRAMES = 20
+COUNT_CLASSES = 8
+
+
+class CountNet(nn.Module):
+    """Mirrors cog_person_count::inference::CountNet bit-for-bit."""
+
+    def __init__(self) -> None:
+        super().__init__()
+        # Encoder — identical to the pose cog's encoder so future joint
+        # training can share weights.
+        self.enc_c1 = nn.Conv1d(N_SUB, 64, kernel_size=3, padding=1, dilation=1)
+        self.enc_c2 = nn.Conv1d(64, 128, kernel_size=3, padding=2, dilation=2)
+        self.enc_c3 = nn.Conv1d(128, 128, kernel_size=3, padding=4, dilation=4)
+        # Count head
+        self.count_head_fc1 = nn.Linear(128, 64)
+        self.count_head_fc2 = nn.Linear(64, COUNT_CLASSES)
+        # Confidence head
+        self.conf_head_fc1 = nn.Linear(128, 32)
+        self.conf_head_fc2 = nn.Linear(32, 1)
+
+    def forward(self, x: torch.Tensor):
+        # x: [B, 56, 20]
+        h = F.relu(self.enc_c1(x))
+        h = F.relu(self.enc_c2(h))
+        h = F.relu(self.enc_c3(h))
+        h = h.mean(dim=2)  # [B, 128]
+
+        # Logits (un-normalised); softmax at inference + cross-entropy training.
+        c = F.relu(self.count_head_fc1(h))
+        count_logits = self.count_head_fc2(c)
+
+        # Confidence head — sigmoid at inference; BCE-with-logits at training.
+        cf = F.relu(self.conf_head_fc1(h))
+        conf_logits = self.conf_head_fc2(cf)
+
+        return count_logits, conf_logits
+
+
+def load_paired(path: Path) -> tuple[np.ndarray, np.ndarray]:
+    """Return (X, y) where X is [N, 56, 20] CSI and y is [N] integer counts."""
+    csis, ys = [], []
+    with path.open(encoding="utf-8") as f:
+        for line in f:
+            if not line.strip():
+                continue
+            d = json.loads(line)
+            shape = d.get("csi_shape", [N_SUB, N_FRAMES])
+            if shape != [N_SUB, N_FRAMES]:
+                continue
+            csi = np.asarray(d["csi"], dtype=np.float32).reshape(N_SUB, N_FRAMES)
+            csis.append(csi)
+            ys.append(int(d.get("n_persons_mode", 0)))
+    X = np.stack(csis, axis=0)
+    y = np.asarray(ys, dtype=np.int64)
+    return X, y
+
+
+def temporal_split(X: np.ndarray, y: np.ndarray, eval_frac: float = 0.2):
+    """Held-out time-window eval (last `eval_frac` of samples, by index)."""
+    n = X.shape[0]
+    n_eval = int(round(n * eval_frac))
+    n_train = n - n_eval
+    return (
+        X[:n_train], y[:n_train],
+        X[n_train:], y[n_train:],
+    )
+
+
+def stratified_k_fold(X: np.ndarray, y: np.ndarray, k: int = 5):
+    """Stratified k-fold cross-validation splits — hand-rolled, no sklearn.
+
+    Per class: shuffle the indices (deterministic seed 42), split into k
+    near-equal chunks, then assemble fold i by taking chunk i from every
+    class. Yields (X_train, y_train, X_val, y_val) per fold, with class
+    distribution preserved within ±1.
+    """
+    rng = np.random.default_rng(seed=42)
+    classes = np.unique(y)
+    per_class_folds = {}
+    for c in classes:
+        idx = np.where(y == c)[0]
+        rng.shuffle(idx)
+        per_class_folds[c] = np.array_split(idx, k)
+    for fold in range(k):
+        val_idx = np.concatenate([per_class_folds[c][fold] for c in classes])
+        train_idx = np.concatenate(
+            [per_class_folds[c][f] for c in classes for f in range(k) if f != fold]
+        )
+        yield X[train_idx], y[train_idx], X[val_idx], y[val_idx]
+
+
+def standardise(X_train: np.ndarray, X_eval: np.ndarray):
+    """Z-score per subcarrier, pooling over samples and time. Eval uses train stats."""
+    mu = X_train.mean(axis=(0, 2), keepdims=True)
+    sd = X_train.std(axis=(0, 2), keepdims=True) + 1e-6
+    return (X_train - mu) / sd, (X_eval - mu) / sd
+
+
+def write_safetensors(model: CountNet, path: Path):
+    """Write the model's state in the same on-disk layout the Rust cog expects."""
+    state = model.state_dict()
+    # Map PyTorch param names → cog-person-count's VarBuilder paths.
+    rename = {
+        "enc_c1.weight": "enc.c1.weight",
+        "enc_c1.bias":   "enc.c1.bias",
+        "enc_c2.weight": "enc.c2.weight",
+        "enc_c2.bias":   "enc.c2.bias",
+        "enc_c3.weight": "enc.c3.weight",
+        "enc_c3.bias":   "enc.c3.bias",
+        "count_head_fc1.weight": "count_head.fc1.weight",
+        "count_head_fc1.bias":   "count_head.fc1.bias",
+        "count_head_fc2.weight": "count_head.fc2.weight",
+        "count_head_fc2.bias":   "count_head.fc2.bias",
+        "conf_head_fc1.weight":  "conf_head.fc1.weight",
+        "conf_head_fc1.bias":    "conf_head.fc1.bias",
+        "conf_head_fc2.weight":  "conf_head.fc2.weight",
+        "conf_head_fc2.bias":    "conf_head.fc2.bias",
+    }
+
+    header = {}
+    payload = bytearray()
+    offset = 0
+    for torch_name, cog_name in rename.items():
+        t = state[torch_name].detach().cpu().numpy().astype(np.float32)
+        n_bytes = t.nbytes
+        header[cog_name] = {
+            "dtype": "F32",
+            "shape": list(t.shape),
+            "data_offsets": [offset, offset + n_bytes],
+        }
+        payload.extend(t.tobytes())
+        offset += n_bytes
+
+    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
+    with path.open("wb") as f:
+        f.write(struct.pack("<Q", len(header_bytes)))
+        f.write(header_bytes)
+        f.write(payload)
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--paired", required=True)
+    parser.add_argument("--out-safetensors", default="count_v1.safetensors")
+    parser.add_argument("--out-onnx", default="count_v1.onnx")
+    parser.add_argument("--out-results", default="count_train_results.json")
+    parser.add_argument("--epochs", type=int, default=400)
+    parser.add_argument("--batch-size", type=int, default=64)
+    parser.add_argument("--lr", type=float, default=1e-3)
+    parser.add_argument("--weight-decay", type=float, default=0.01)
+    parser.add_argument("--k-fold", type=int, default=None, help="If set, run k-fold CV; else use temporal split")
+    parser.add_argument("--v2", action="store_true",
+                        help="v0.0.2 training: random 80/20 split + label smoothing + early stopping "
+                             "+ balanced sampling + temperature-scaled confidence head.")
+    parser.add_argument("--label-smoothing", type=float, default=0.1)
+    parser.add_argument("--patience", type=int, default=20)
+    args = parser.parse_args()
+
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    print(f"device: {device}")
+
+    X, y = load_paired(Path(args.paired))
+    print(f"loaded {X.shape[0]} samples, X shape {X.shape}, "
+          f"label distribution: {dict(Counter(y.tolist()).most_common())}")
+
+    # K-fold cross-validation mode
+    if args.k_fold is not None:
+        print(f"\n=== {args.k_fold}-fold cross-validation ===")
+        fold_results = []
+        overall_t0 = time.perf_counter()
+
+        for fold_idx, (X_train, y_train, X_val, y_val) in enumerate(stratified_k_fold(X, y, k=args.k_fold)):
+            print(f"\nFold {fold_idx + 1}/{args.k_fold}")
+            X_train, X_val = standardise(X_train, X_val)
+
+            cls_counts = np.bincount(y_train, minlength=COUNT_CLASSES).astype(np.float32)
+            cls_counts = np.where(cls_counts > 0, cls_counts, 1.0)
+            cls_weight = (1.0 / cls_counts) / (1.0 / cls_counts).sum() * COUNT_CLASSES
+            cls_weight_t = torch.from_numpy(cls_weight).to(device)
+
+            Xt = torch.from_numpy(X_train).to(device)
+            yt = torch.from_numpy(y_train).to(device)
+            Xv = torch.from_numpy(X_val).to(device)
+            yv = torch.from_numpy(y_val).to(device)
+
+            model = CountNet().to(device)
+            opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
+            sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=50, T_mult=1)
+
+            n_train = X_train.shape[0]
+            best_eval_acc = 0.0
+            best_state = None
+
+            for epoch in range(args.epochs):
+                model.train()
+                perm = torch.randperm(n_train, device=device)
+                train_loss = 0.0
+                train_correct = 0
+                n_batches = 0
+                for i in range(0, n_train, args.batch_size):
+                    idx = perm[i : i + args.batch_size]
+                    xb = Xt[idx]
+                    yb = yt[idx]
+                    opt.zero_grad()
+                    count_logits, conf_logits = model(xb)
+                    ce = F.cross_entropy(count_logits, yb, weight=cls_weight_t)
+                    with torch.no_grad():
+                        pred = count_logits.argmax(dim=1)
+                        correct_indicator = (pred == yb).float().unsqueeze(1)
+                    bce = F.binary_cross_entropy_with_logits(conf_logits, correct_indicator)
+                    with torch.no_grad():
+                        conf_sigm = torch.sigmoid(conf_logits)
+                    brier = ((conf_sigm - correct_indicator) ** 2).mean()
+                    loss = ce + 0.3 * bce + 0.1 * brier
+                    loss.backward()
+                    opt.step()
+                    train_loss += loss.item()
+                    train_correct += (pred == yb).sum().item()
+                    n_batches += 1
+
+                sched.step()
+
+                model.eval()
+                with torch.no_grad():
+                    cl_v, _ = model(Xv)
+                    eval_pred = cl_v.argmax(dim=1)
+                    eval_acc = (eval_pred == yv).float().mean().item()
+
+                if eval_acc > best_eval_acc:
+                    best_eval_acc = eval_acc
+                    best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
+
+            # Restore best checkpoint and final eval
+            if best_state is not None:
+                model.load_state_dict(best_state)
+
+            model.eval()
+            with torch.no_grad():
+                cl_v, conf_v = model(Xv)
+                pred_v = cl_v.argmax(dim=1)
+                acc = (pred_v == yv).float().mean().item()
+                within1 = ((pred_v - yv).abs() <= 1).float().mean().item()
+                mae = (pred_v - yv).abs().float().mean().item()
+
+                # Per-class accuracy
+                per_class = {}
+                for k in range(COUNT_CLASSES):
+                    mask = yv == k
+                    n = mask.sum().item()
+                    if n > 0:
+                        per_class[k] = {
+                            "support": int(n),
+                            "accuracy": ((pred_v == yv) & mask).sum().item() / n,
+                        }
+
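+                # Spearman (Pearson over ranks). The argsort-based ranks break
+                # ties arbitrarily instead of averaging them, so with a binary
+                # `correct` vector this is an approximation.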
+                conf_sigm = torch.sigmoid(conf_v).squeeze(-1)
+                correct = (pred_v == yv).float()
+                c_rank = conf_sigm.argsort().argsort().float()
+                r_rank = correct.argsort().argsort().float()
+                c_centered = c_rank - c_rank.mean()
+                r_centered = r_rank - r_rank.mean()
+                denom = (c_centered.norm() * r_centered.norm()).item()
+                spearman = (c_centered * r_centered).sum().item() / denom if denom > 0 else 0.0
+
+            fold_results.append({
+                "fold": fold_idx + 1,
+                "accuracy": acc,
+                "within_pm1": within1,
+                "mae": mae,
+                "spearman": spearman,
+                "per_class_accuracy": per_class,
+            })
+            print(f"  accuracy={acc:.3f}  within±1={within1:.3f}  mae={mae:.3f}  spearman={spearman:.3f}")
+
+        # K-fold summary
+        total_time = time.perf_counter() - overall_t0
+        accs = [r["accuracy"] for r in fold_results]
+        within1s = [r["within_pm1"] for r in fold_results]
+        maes = [r["mae"] for r in fold_results]
+        spears = [r["spearman"] for r in fold_results]
+
+        print(f"\n=== {args.k_fold}-fold summary ({total_time:.1f} s) ===")
+        print(f"  accuracy:       {np.mean(accs):.3f} ± {np.std(accs):.3f}")
+        print(f"  within ±1:      {np.mean(within1s):.3f} ± {np.std(within1s):.3f}")
+        print(f"  MAE:            {np.mean(maes):.3f} ± {np.std(maes):.3f}")
+        print(f"  conf↔correct Spearman: {np.mean(spears):.3f} ± {np.std(spears):.3f}")
+
+        # Per-class summary across folds
+        for k in range(COUNT_CLASSES):
+            accs_k = [r["per_class_accuracy"].get(k, {}).get("accuracy", 0.0) for r in fold_results]
+            n_k = [r["per_class_accuracy"].get(k, {}).get("support", 0) for r in fold_results]
+            if any(n > 0 for n in n_k):
+                print(f"  class {k}:  {np.mean(accs_k):.3f} mean accuracy (support: {n_k})")
+
+        # Write k-fold results to JSON
+        results = {
+            "mode": "k_fold_cv",
+            "k": args.k_fold,
+            "backend": "pytorch-cuda" if device.type == "cuda" else "pytorch-cpu",
+            "total_time_s": total_time,
+            "fold_results": fold_results,
+            "summary": {
+                "mean_accuracy": float(np.mean(accs)),
+                "std_accuracy": float(np.std(accs)),
+                "mean_within_pm1": float(np.mean(within1s)),
+                "std_within_pm1": float(np.std(within1s)),
+                "mean_mae": float(np.mean(maes)),
+                "std_mae": float(np.std(maes)),
+                "mean_spearman": float(np.mean(spears)),
+                "std_spearman": float(np.std(spears)),
+            },
+            "hyperparameters": {
+                "optimizer": "AdamW",
+                "lr": args.lr,
+                "weight_decay": args.weight_decay,
+                "batch_size": args.batch_size,
+                "schedule": "cosine_warm_restarts",
+                "epochs": args.epochs,
+            },
+        }
+        Path(args.out_results).write_text(json.dumps(results, indent=2))
+        print(f"\nwrote {args.out_results}")
+        return
+
+    # ---------------------------------------------------------------
+    # v0.0.2 training path: random 80/20 + label smoothing + early
+    # stopping + class-balanced batch sampling + temperature scaling.
+    # ---------------------------------------------------------------
+    if args.v2:
+        rng = np.random.default_rng(seed=42)
+        idx = np.arange(X.shape[0])
+        rng.shuffle(idx)
+        n_eval = int(round(0.2 * X.shape[0]))
+        eval_idx, train_idx = idx[:n_eval], idx[n_eval:]
+        X_train, X_eval = X[train_idx], X[eval_idx]
+        y_train, y_eval = y[train_idx], y[eval_idx]
+        X_train, X_eval = standardise(X_train, X_eval)
+        print(f"v0.0.2 mode — random 80/20 split: train={len(y_train)} eval={len(y_eval)}")
+        print(f"  train class dist: {dict(Counter(y_train.tolist()).most_common())}")
+        print(f"  eval  class dist: {dict(Counter(y_eval.tolist()).most_common())}")
+
+        Xt = torch.from_numpy(X_train).to(device)
+        yt = torch.from_numpy(y_train).to(device)
+        Xe = torch.from_numpy(X_eval).to(device)
+        ye = torch.from_numpy(y_eval).to(device)
+
+        # Class-balanced sampler: for each batch, sample with replacement
+        # so each class has equal expected count regardless of dataset
+        # distribution. With our ~533/544 split this is nearly a no-op
+        # but it generalises to imbalanced multi-room data later.
+        cls_counts = np.bincount(y_train, minlength=COUNT_CLASSES).astype(np.float32)
+        cls_counts = np.where(cls_counts > 0, cls_counts, 1.0)
+        per_sample_weight = (1.0 / cls_counts[y_train])
+        per_sample_weight_t = torch.from_numpy(per_sample_weight.astype(np.float32)).to(device)
+
+        model = CountNet().to(device)
+        opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
+        sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=50, T_mult=1)
+
+        n_train = X_train.shape[0]
+        batches_per_epoch = max(1, n_train // args.batch_size)
+        epoch_losses = []
+        t0 = time.perf_counter()
+        best_eval_acc = 0.0
+        best_state = None
+        epochs_without_improvement = 0
+
+        for epoch in range(args.epochs):
+            model.train()
+            train_loss = 0.0
+            train_correct = 0
+            n_batches = 0
+            for _ in range(batches_per_epoch):
+                # Balanced sample with replacement
+                idx_t = torch.multinomial(per_sample_weight_t, args.batch_size, replacement=True)
+                xb = Xt[idx_t]
+                yb = yt[idx_t]
+                opt.zero_grad()
+                count_logits, conf_logits = model(xb)
+                ce = F.cross_entropy(count_logits, yb, label_smoothing=args.label_smoothing)
+                with torch.no_grad():
+                    pred = count_logits.argmax(dim=1)
+                    correct_indicator = (pred == yb).float().unsqueeze(1)
+                bce = F.binary_cross_entropy_with_logits(conf_logits, correct_indicator)
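+                # Brier term. The sigmoid must stay in the autograd graph; under
+                # no_grad the term would be a constant with no effect on training.
+                conf_sigm = torch.sigmoid(conf_logits)
+                brier = ((conf_sigm - correct_indicator) ** 2).mean()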
+                loss = ce + 0.3 * bce + 0.1 * brier
+                loss.backward()
+                opt.step()
+                train_loss += loss.item()
+                train_correct += (pred == yb).sum().item()
+                n_batches += 1
+            sched.step()
+
+            model.eval()
+            with torch.no_grad():
+                cl_e, _ = model(Xe)
+                eval_loss = F.cross_entropy(cl_e, ye).item()
+                eval_pred = cl_e.argmax(dim=1)
+                eval_acc = (eval_pred == ye).float().mean().item()
+            epoch_losses.append({
+                "epoch": epoch,
+                "train_loss": train_loss / max(1, n_batches),
+                "train_acc": train_correct / max(1, n_batches * args.batch_size),
+                "eval_loss": eval_loss,
+                "eval_acc": eval_acc,
+            })
+            if eval_acc > best_eval_acc:
+                best_eval_acc = eval_acc
+                best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
+                epochs_without_improvement = 0
+            else:
+                epochs_without_improvement += 1
+
+            if epoch < 5 or epoch % 25 == 0:
+                print(f"epoch {epoch:3d}  train_loss={train_loss/n_batches:.4f}  "
+                      f"train_acc={train_correct/(n_batches*args.batch_size):.3f}  "
+                      f"eval_loss={eval_loss:.4f}  eval_acc={eval_acc:.3f}  "
+                      f"epochs_no_improve={epochs_without_improvement}")
+            if epochs_without_improvement >= args.patience:
+                print(f"early stopping at epoch {epoch} (no improvement for {args.patience} epochs)")
+                break
+
+        train_time = time.perf_counter() - t0
+        print(f"\ntrained {epoch + 1} epochs in {train_time:.1f} s  (best eval_acc {best_eval_acc:.3f})")
+        if best_state is not None:
+            model.load_state_dict(best_state)
+
+        # Temperature scaling on the confidence head: fit a scalar T that
+        # minimises BCE of sigmoid(conf_logits / T) against correctness on the
+        # eval set (the same split the final metrics are reported on).
+        model.eval()
+        with torch.no_grad():
+            cl_e, conf_e = model(Xe)
+            pred_e = cl_e.argmax(dim=1)
+            correct_indicator = (pred_e == ye).float()
+        # 1D optimisation over T via LBFGS.
+        T = torch.nn.Parameter(torch.ones(1, device=device))
+        opt_t = torch.optim.LBFGS([T], lr=0.1, max_iter=50)
+        def eval_t():
+            opt_t.zero_grad()
+            scaled = conf_e.squeeze(-1) / T
+            loss_t = F.binary_cross_entropy_with_logits(scaled, correct_indicator)
+            loss_t.backward()
+            return loss_t
+        opt_t.step(eval_t)
+        T_val = float(T.detach().cpu().item())
+        print(f"  temperature scale T = {T_val:.4f}")
+
+        # Final eval with temperature applied.
+        with torch.no_grad():
+            cl_e, conf_e = model(Xe)
+            probs_e = F.softmax(cl_e, dim=1)
+            pred_e = cl_e.argmax(dim=1)
+            acc = (pred_e == ye).float().mean().item()
+            within1 = ((pred_e - ye).abs() <= 1).float().mean().item()
+            mae = (pred_e - ye).abs().float().mean().item()
+            per_class = {}
+            for k in range(COUNT_CLASSES):
+                mask = ye == k
+                n = mask.sum().item()
+                if n > 0:
+                    per_class[k] = {
+                        "support": int(n),
+                        "accuracy": ((pred_e == ye) & mask).sum().item() / n,
+                    }
+            conf_sigm = torch.sigmoid(conf_e.squeeze(-1) / T_val)
+            correct = (pred_e == ye).float()
+            c_rank = conf_sigm.argsort().argsort().float()
+            r_rank = correct.argsort().argsort().float()
+            c_centered = c_rank - c_rank.mean()
+            r_centered = r_rank - r_rank.mean()
+            denom = (c_centered.norm() * r_centered.norm()).item()
+            spearman = (c_centered * r_centered).sum().item() / denom if denom > 0 else 0.0
+
+        print("\n=== v0.0.2 final eval ===")
+        print(f"  accuracy:       {acc:.3f}")
+        print(f"  within ±1:      {within1:.3f}")
+        print(f"  MAE:            {mae:.3f}")
+        print(f"  conf↔correct Spearman (post-temp): {spearman:.3f}")
+        for k, v in per_class.items():
+            print(f"  class {k}:  {v['accuracy']:.3f} accuracy on {v['support']} samples")
+
+        write_safetensors(model, Path(args.out_safetensors))
+        # Record the temperature scalar so the cog can apply it. It is not
+        # stored inside the safetensors file; it goes in a sidecar text file
+        # next to it (<out-safetensors>.temperature, e.g.
+        # count_v1.safetensors.temperature) for the Rust cog inference path.
+        Path(args.out_safetensors + ".temperature").write_text(f"{T_val}\n")
+        print(f"wrote {args.out_safetensors} ({Path(args.out_safetensors).stat().st_size} bytes)")
+        print(f"wrote {args.out_safetensors}.temperature ({T_val})")
+
+        # ONNX
+        dummy = torch.zeros(1, N_SUB, N_FRAMES, device=device)
+        try:
+            torch.onnx.export(model, dummy, args.out_onnx, opset_version=18,
+                              input_names=["csi_window"],
+                              output_names=["count_logits", "conf_logits"],
+                              dynamic_axes={"csi_window": {0: "batch"},
+                                            "count_logits": {0: "batch"},
+                                            "conf_logits": {0: "batch"}},
+                              export_params=True, do_constant_folding=True)
+            print(f"wrote {args.out_onnx} ({Path(args.out_onnx).stat().st_size} bytes)")
+        except Exception as e:
+            print(f"WARN: ONNX export failed: {e}")
+
+        results = {
+            "mode": "v0.0.2",
+            "backend": "pytorch-cuda" if device.type == "cuda" else "pytorch-cpu",
+            "epochs_trained": epoch + 1,
+            "train_time_s": train_time,
+            "best_eval_acc": best_eval_acc,
+            "final_eval_acc": acc,
+            "final_eval_within_pm1": within1,
+            "final_eval_mae": mae,
+            "temperature_scale": T_val,
+            "conf_correctness_spearman_post_temp": spearman,
+            "per_class_accuracy": per_class,
+            "hyperparameters": {
+                "optimizer": "AdamW",
+                "lr": args.lr,
+                "weight_decay": args.weight_decay,
+                "batch_size": args.batch_size,
+                "schedule": "cosine_warm_restarts",
+                "epochs_max": args.epochs,
+                "label_smoothing": args.label_smoothing,
+                "patience": args.patience,
+                "split": "random_80_20_seed_42",
+                "balanced_sampler": True,
+                "temperature_scaling": True,
+            },
+            "epoch_losses": epoch_losses,
+        }
+        Path(args.out_results).write_text(json.dumps(results, indent=2))
+        print(f"wrote {args.out_results}")
+        return
+
+    # Original temporal-split mode (kept for v0.0.1 reproducibility).
+    X_train, y_train, X_eval, y_eval = temporal_split(X, y, eval_frac=0.2)
+    X_train, X_eval = standardise(X_train, X_eval)
+
+    # Re-balance via class weights — handles the 50/50 split fine
+    # but also makes the loss correct under future imbalanced data.
+    cls_counts = np.bincount(y_train, minlength=COUNT_CLASSES).astype(np.float32)
+    cls_counts = np.where(cls_counts > 0, cls_counts, 1.0)
+    cls_weight = (1.0 / cls_counts) / (1.0 / cls_counts).sum() * COUNT_CLASSES
+    cls_weight_t = torch.from_numpy(cls_weight).to(device)
+    print(f"class weights: {cls_weight.tolist()}")
+
+    Xt = torch.from_numpy(X_train).to(device)
+    yt = torch.from_numpy(y_train).to(device)
+    Xe = torch.from_numpy(X_eval).to(device)
+    ye = torch.from_numpy(y_eval).to(device)
+
+    model = CountNet().to(device)
+    opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
+    sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=50, T_mult=1)
+
+    n_train = X_train.shape[0]
+    epoch_losses = []
+    t0 = time.perf_counter()
+
+    best_eval_acc = 0.0
+    best_state = None
+
+    for epoch in range(args.epochs):
+        model.train()
+        perm = torch.randperm(n_train, device=device)
+        train_loss = 0.0
+        train_correct = 0
+        n_batches = 0
+        for i in range(0, n_train, args.batch_size):
+            idx = perm[i : i + args.batch_size]
+            xb = Xt[idx]
+            yb = yt[idx]
+            opt.zero_grad()
+            count_logits, conf_logits = model(xb)
+
+            # Categorical cross-entropy for count.
+            ce = F.cross_entropy(count_logits, yb, weight=cls_weight_t)
+
+            # Confidence head: train against `argmax == truth` indicator.
+            with torch.no_grad():
+                pred = count_logits.argmax(dim=1)
+                correct_indicator = (pred == yb).float().unsqueeze(1)
+            bce = F.binary_cross_entropy_with_logits(conf_logits, correct_indicator)
+
+            # Brier score on the conf head. The sigmoid below is evaluated under
+            # no_grad, so this term shifts the logged loss but adds no gradient;
+            # left as-is so v0.0.1 runs stay reproducible.
+            with torch.no_grad():
+                conf_sigm = torch.sigmoid(conf_logits)
+            brier = ((conf_sigm - correct_indicator) ** 2).mean()
+
+            loss = ce + 0.3 * bce + 0.1 * brier
+            loss.backward()
+            opt.step()
+
+            train_loss += loss.item()
+            train_correct += (pred == yb).sum().item()
+            n_batches += 1
+
+        sched.step()
+
+        model.eval()
+        with torch.no_grad():
+            cl_e, _ = model(Xe)
+            eval_loss = F.cross_entropy(cl_e, ye, weight=cls_weight_t).item()
+            eval_pred = cl_e.argmax(dim=1)
+            eval_acc = (eval_pred == ye).float().mean().item()
+            eval_within1 = ((eval_pred - ye).abs() <= 1).float().mean().item()
+
+        epoch_losses.append({
+            "epoch": epoch,
+            "train_loss": train_loss / n_batches,
+            "train_acc": train_correct / n_train,
+            "eval_loss": eval_loss,
+            "eval_acc": eval_acc,
+            "eval_within_pm1": eval_within1,
+        })
+
+        if eval_acc > best_eval_acc:
+            best_eval_acc = eval_acc
+            best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
+
+        if epoch < 5 or epoch % 50 == 0 or epoch == args.epochs - 1:
+            print(f"epoch {epoch:3d}  train_loss={train_loss/n_batches:.4f}  "
+                  f"train_acc={train_correct/n_train:.3f}  "
+                  f"eval_loss={eval_loss:.4f}  eval_acc={eval_acc:.3f}  "
+                  f"within±1={eval_within1:.3f}")
+
+    train_time = time.perf_counter() - t0
+    print(f"\ntrained {args.epochs} epochs in {train_time:.1f} s")
+    print(f"best eval_acc: {best_eval_acc:.3f}")
+
+    # Restore best checkpoint
+    if best_state is not None:
+        model.load_state_dict(best_state)
+
+    # Eval breakdown
+    model.eval()
+    with torch.no_grad():
+        cl_e, conf_e = model(Xe)
+        probs_e = torch.softmax(cl_e, dim=1)
+        pred_e = cl_e.argmax(dim=1)
+        acc = (pred_e == ye).float().mean().item()
+        within1 = ((pred_e - ye).abs() <= 1).float().mean().item()
+        mae = (pred_e - ye).abs().float().mean().item()
+
+        # Per-class accuracy
+        per_class = {}
+        for k in range(COUNT_CLASSES):
+            mask = ye == k
+            n = mask.sum().item()
+            if n > 0:
+                per_class[k] = {
+                    "support": int(n),
+                    "accuracy": ((pred_e == ye) & mask).sum().item() / n,
+                }
+
+        # Confidence-accuracy calibration: Spearman over (predicted-correct, confidence)
+        conf_sigm = torch.sigmoid(conf_e).squeeze(-1)
+        correct = (pred_e == ye).float()
+        # Spearman = Pearson over ranks
+        c_rank = conf_sigm.argsort().argsort().float()
+        r_rank = correct.argsort().argsort().float()
+        c_centered = c_rank - c_rank.mean()
+        r_centered = r_rank - r_rank.mean()
+        denom = (c_centered.norm() * r_centered.norm()).item()
+        spearman = (c_centered * r_centered).sum().item() / denom if denom > 0 else 0.0
+
+    print("\n=== final eval ===")
+    print(f"  accuracy:       {acc:.3f}")
+    print(f"  within ±1:      {within1:.3f}")
+    print(f"  MAE:            {mae:.3f}")
+    print(f"  conf↔correct Spearman: {spearman:.3f}")
+    for k, v in per_class.items():
+        print(f"  class {k}:  {v['accuracy']:.3f} accuracy on {v['support']} samples")
+
+    # Save safetensors
+    write_safetensors(model, Path(args.out_safetensors))
+    print(f"\nwrote {args.out_safetensors} ({Path(args.out_safetensors).stat().st_size} bytes)")
+
+    # ONNX export
+    dummy = torch.zeros(1, N_SUB, N_FRAMES, device=device)
+    try:
+        torch.onnx.export(
+            model, dummy, args.out_onnx,
+            opset_version=18,
+            input_names=["csi_window"],
+            output_names=["count_logits", "conf_logits"],
+            dynamic_axes={
+                "csi_window": {0: "batch"},
+                "count_logits": {0: "batch"},
+                "conf_logits": {0: "batch"},
+            },
+            export_params=True,
+            do_constant_folding=True,
+        )
+        print(f"wrote {args.out_onnx} ({Path(args.out_onnx).stat().st_size} bytes)")
+    except Exception as e:
+        print(f"WARN: ONNX export failed: {e}")
+
+    # Results JSON
+    results = {
+        "backend": "pytorch-cuda" if device.type == "cuda" else "pytorch-cpu",
+        "device": str(device),
+        "epochs": args.epochs,
+        "train_time_s": train_time,
+        "best_eval_acc": best_eval_acc,
+        "final_eval_acc": acc,
+        "final_eval_within_pm1": within1,
+        "final_eval_mae": mae,
+        "conf_correctness_spearman": spearman,
+        "per_class_accuracy": per_class,
+        "hyperparameters": {
+            "optimizer": "AdamW",
+            "lr": args.lr,
+            "weight_decay": args.weight_decay,
+            "batch_size": args.batch_size,
+            "schedule": "cosine_warm_restarts",
+            "epochs": args.epochs,
+            "loss": "cross_entropy(count) + 0.3*bce(conf) + 0.1*brier(conf)",
+            "z_score_normalisation": True,
+            "class_weights": cls_weight.tolist(),
+        },
+        "epoch_losses": epoch_losses,
+    }
+    Path(args.out_results).write_text(json.dumps(results, indent=2))
+    print(f"wrote {args.out_results} ({Path(args.out_results).stat().st_size} bytes)")
+
+
+if __name__ == "__main__":
+    main()
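
The on-disk layout that `write_safetensors` emits (an 8-byte little-endian header length, a JSON header, then the raw F32 payload) and the `.temperature` sidecar written in `--v2` mode can be sanity-checked from the consumer side with the standard library alone. The sketch below is illustrative and is not part of the commit: `read_header`, `read_f32`, `calibrated_confidence`, `demo`, and the tensor name `toy.bias` are hypothetical, and the demo writes its own toy file rather than loading a real `count_v1.safetensors`.

```python
from __future__ import annotations

import json
import math
import struct
import tempfile
from pathlib import Path


def read_header(path: Path) -> tuple[dict, int]:
    """Return (header, payload_start). Layout: u64 LE header length, JSON, payload."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header, 8 + header_len


def read_f32(path: Path, name: str) -> list[float]:
    """Read one F32 tensor as a flat list; data_offsets are relative to the payload."""
    header, payload_start = read_header(path)
    start, end = header[name]["data_offsets"]
    with path.open("rb") as f:
        f.seek(payload_start + start)
        raw = f.read(end - start)
    return list(struct.unpack(f"<{(end - start) // 4}f", raw))


def calibrated_confidence(conf_logit: float, temperature: float) -> float:
    """sigmoid(conf_logit / T), the scaling applied in the v0.0.2 final eval."""
    return 1.0 / (1.0 + math.exp(-conf_logit / temperature))


def demo() -> tuple[list[int], list[float], float]:
    # Toy stand-in for the trainer's output: one tensor plus a sidecar file.
    with tempfile.TemporaryDirectory() as tmp:
        weights = Path(tmp) / "toy.safetensors"
        header = {"toy.bias": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
        header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
        weights.write_bytes(
            struct.pack("<Q", len(header_bytes))
            + header_bytes
            + struct.pack("<2f", 0.5, -1.25)
        )
        sidecar = Path(str(weights) + ".temperature")
        sidecar.write_text("2.0\n")

        parsed, _ = read_header(weights)
        shape = parsed["toy.bias"]["shape"]
        tensor = read_f32(weights, "toy.bias")
        temperature = float(sidecar.read_text().strip())
    return shape, tensor, calibrated_confidence(0.0, temperature)
```

Pointing `read_header` at a real trainer output should list the fourteen renamed keys (`enc.c1.weight` through `conf_head.fc2.bias`), which gives a quick check that the names match what the Rust cog's loader expects before any weights are shipped.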
@@ -929,6 +929,26 @@ version = "1.0.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "3a822ea5bc7590f9d40f1ba12c0dc3c2760f3482c6984db1573ad11031420831"

+[[package]]
+name = "cog-person-count"
+version = "0.3.0"
+dependencies = [
+ "approx",
+ "candle-core 0.9.2",
+ "candle-nn 0.9.2",
+ "clap",
+ "safetensors 0.4.5",
+ "serde",
+ "serde_json",
+ "sha2",
+ "tempfile",
+ "thiserror 1.0.69",
+ "tokio",
+ "tracing",
+ "tracing-subscriber",
+ "ureq 2.12.1",
+]
+
 [[package]]
 name = "cog-pose-estimation"
 version = "0.3.0"
@@ -34,6 +34,10 @@ members = [
    # cognitum-cluster-*, ruvultra). The companion appliance-side crate
    # lives in cognitum-one/v0-appliance as `cognitum-pose-estimation`.
    "crates/cog-pose-estimation",
+    # ADR-103: Learned multi-person counter (SOTA path) — replaces the
+    # PR #491 slot heuristic with a Candle network + Stoer-Wagner fusion.
+    # Motivated by #499 ghost-skeleton reports.
+    "crates/cog-person-count",
    # rvCSI — edge RF sensing runtime (ADR-095 platform, ADR-096 FFI/crate layout):
    # lives in its own repo (https://github.com/ruvnet/rvcsi), vendored here as
    # `vendor/rvcsi` and published to crates.io as `rvcsi-*` 0.3.x. Depend on the
@@ -0,0 +1,42 @@
+[package]
+name = "cog-person-count"
+version.workspace = true
+edition.workspace = true
+authors.workspace = true
+license.workspace = true
+repository.workspace = true
+description = "Cognitum Cog: learned multi-person counter from WiFi CSI (ADR-103). Replaces the PR #491 slot heuristic with a Candle-based count head + Stoer-Wagner multi-node fusion."
+publish = false
+
+[[bin]]
+name = "cog-person-count"
+path = "src/main.rs"
+
+[lib]
+name = "cog_person_count"
+path = "src/lib.rs"
+
+[dependencies]
+clap = { version = "4", features = ["derive"] }
+serde = { version = "1", features = ["derive"] }
+serde_json = "1"
+thiserror = "1"
+tracing = "0.1"
+tracing-subscriber = { version = "0.3", features = ["env-filter"] }
+tokio = { version = "1", features = ["rt-multi-thread", "macros", "signal", "time"] }
+sha2 = "0.10"
+ureq = { version = "2", default-features = false, features = ["tls"] }
+# Same Candle stack the pose cog uses — CPU by default, `cuda` feature
+# opt-in for hosts with a CUDA GPU.
+candle-core = { version = "0.9", default-features = false }
+candle-nn = { version = "0.9", default-features = false }
+safetensors = "0.4"
+
+[dev-dependencies]
+tempfile = "3"
+approx = "0.5"
+
+[features]
+default = []
+cuda = ["candle-core/cuda", "candle-nn/cuda"]
+hailo = []
@@ -0,0 +1,96 @@
+# Person Count Cog
+
+Learned multi-person counter for WiFi CSI — designed in [ADR-103](../../../../docs/adr/ADR-103-learned-multi-person-counter.md), packaged per [ADR-100](../../../../docs/adr/ADR-100-cog-packaging-specification.md), discoverable through [ADR-102](../../../../docs/adr/ADR-102-edge-module-registry.md).
+
+## What it does
+
+Replaces the PR #491 slot heuristic (`subcarrier_diversity / dedup_factor`) with a Candle network that emits a calibrated count distribution + confidence per CSI window. Multi-node deployments fuse N per-node predictions through a confidence-weighted log-sum (Bayesian product of experts), optionally bounded above by a Stoer-Wagner min-cut from the subcarrier-similarity graph.
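As a rough illustration of the confidence-weighted log-sum described above, the following Python sketch mirrors the arithmetic in this change's `fusion.rs`: confidences floored at `1e-3`, probabilities floored at `1e-9`, and weights normalised to sum to 1. The function name `fuse` and the plain-list representation are assumptions made for this example, not part of the crate's API.

```python
import math

COUNT_CLASSES = 8
EPS_CONF = 1e-3  # confidence floor, as in fusion.rs
EPS_PROB = 1e-9  # probability floor, avoids log(0)


def fuse(preds):
    """preds: list of (probs, confidence). Returns (fused_probs, fused_confidence)."""
    if not preds:
        # Empty input: one-person, zero-confidence default.
        probs = [0.0] * COUNT_CLASSES
        probs[1] = 1.0
        return probs, 0.0
    if len(preds) == 1:
        probs, conf = preds[0]
        return list(probs), conf

    weights = [max(conf, EPS_CONF) for _, conf in preds]
    total = sum(weights)

    # Weighted sum of log-probabilities per class.
    log_p = [0.0] * COUNT_CLASSES
    for (probs, _), w in zip(preds, weights):
        for k in range(COUNT_CLASSES):
            log_p[k] += (w / total) * math.log(max(probs[k], EPS_PROB))

    # Subtract the max for numerical stability, exponentiate, renormalise.
    m = max(log_p)
    unnorm = [math.exp(v - m) for v in log_p]
    s = sum(unnorm)
    fused = [v / s for v in unnorm]
    return fused, max(conf for _, conf in preds)


strong = [0.0, 0.95, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]  # confident, says 1
weak = [0.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.4]      # unsure, leans 7
fused, conf = fuse([(strong, 0.95), (weak, 0.05)])
print(fused.index(max(fused)), conf)
```

Because the weights are normalised, two nodes reporting the same distribution fuse back to that distribution; the weighting only changes the result when nodes disagree.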
+
+## Output (per frame)
+
+```json
+{
+  "ts": 1779210883.444,
+  "level": "info",
+  "event": "person.count",
+  "fields": {
+    "tick": 12345,
+    "count": 2,
+    "confidence": 0.81,
+    "count_p95_low": 1,
+    "count_p95_high": 3,
+    "n_nodes": 3,
+    "probs": [0.01, 0.03, 0.81, 0.13, 0.01, 0.005, 0.003, 0.002]
+  }
+}
+```
+
+Downstream consumers can render the **most-likely count** when confidence is high, or fall back to a `[lo, hi]` band with a "?" badge when the model is uncertain — that's how this Cog closes the loop on #499's ghost-skeleton UX.
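A downstream consumer might implement that display rule roughly as follows. This is a sketch only: the `0.6` confidence threshold and the `render_count` helper are hypothetical choices for illustration, while the field names come from the sample event above.

```python
import json

CONFIDENCE_THRESHOLD = 0.6  # hypothetical cut-off, tune per deployment


def render_count(line, threshold=CONFIDENCE_THRESHOLD):
    """Return a display string for one `person.count` JSON line, or None for other events."""
    event = json.loads(line)
    if event.get("event") != "person.count":
        return None
    fields = event["fields"]
    if fields["confidence"] >= threshold:
        return str(fields["count"])
    # Low confidence: show the p95 band with a "?" badge instead of a single number.
    return f"{fields['count_p95_low']}-{fields['count_p95_high']} ?"


sample = json.dumps({
    "ts": 1779210883.444, "level": "info", "event": "person.count",
    "fields": {"tick": 12345, "count": 2, "confidence": 0.81,
               "count_p95_low": 1, "count_p95_high": 3, "n_nodes": 3},
})
print(render_count(sample))                 # confident: shows the count
print(render_count(sample, threshold=0.9))  # uncertain: shows the band
```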
+
+## Status — v0.0.1
+
+| Component | State |
+|---|---|
+| Crate compiles, library API stable | ✅ |
+| Tests pass (15 total: 8 smoke + 7 fusion) | ✅ |
+| Four-verb runtime contract: `version`, `manifest`, `health` implemented (`run` is tracked in its own row below) | ✅ 3 of 4 |
+| Trained `count_v1.safetensors` artifact | ✅ shipped at `cog/artifacts/count_v1.safetensors` (392 KB) |
+| ONNX export | ✅ `count_v1.onnx` (16 KB), bit-compatible architecture |
+| Honest accuracy reporting | ✅ See `docs/benchmarks/person-count-cog.md` — 65.1% eval acc on a single-session dataset; confidence head Spearman 0.023 ⇒ uncalibrated for v0.0.1 |
+| `run` subcommand (long-running loop) | ⏳ same shape as cog-pose-estimation::runtime, lands in follow-up |
+| Signed binary on GCS | ⏳ release pipeline |
+| Stoer-Wagner min-cut clip in fusion stage | ⏳ v0.2.0 (hook in `fusion::fuse_with_mincut_clip` is stubbed) |
+
+### Honest v0.0.1 caveat
+
+`count_v1` was trained on a single 30-minute solo recording. The model overfit by epoch ~100 and the "best" checkpoint is one that effectively predicts the eval-window class distribution (mostly class-0). Class-1 accuracy on the held-out tail = 0%. **This v0.0.1 is a working pipeline with a degenerate model**, not a usable counter yet — same data-bound failure mode as `pose_v1` (#645), same fix: multi-room paired recordings.
+
+`cog-person-count health` will load the real safetensors and report `backend: candle-cpu` rather than `backend: stub`, so the cog-gateway can verify the model loaded — but operators should treat the v0.0.1 count outputs as scaffold-validation rather than production data. The 2.36 MB binary + 392 KB weights + 16 KB ONNX are all real and reusable as soon as more data lands.
+
+## Relationship to the in-process `csi.rs::score_to_person_count` heuristic
+
+This Cog runs **out-of-process** alongside `wifi-densepose-sensing-server`. The two are complementary, not competing:
+
+- The sensing-server keeps emitting its existing slot-count heuristic from `csi.rs::score_to_person_count` (PR #491's RollingP95 + `dedup_factor`). This is the **fallback path** — operators who don't install `cog-person-count` still get a count number, just a less calibrated one.
+- `cog-person-count` (this binary) polls the same `/api/v1/sensing/latest` endpoint, runs the learned `count_v1` model on each window, and emits `person.count` events on stdout. The appliance's `cognitum-cog-gateway` routes those events to the dashboard via the standard ADR-220 cog-event channel.
+
+Operators choose by **installing or not installing** this Cog — no sensing-server rebuild required. Downstream consumers (UI, fleet automation, alerting rules) can subscribe to whichever event stream they prefer.
+
+The architecture decision is documented in [ADR-103 §"Deployment"](../../../../docs/adr/ADR-103-learned-multi-person-counter.md#deployment) and matches the cog/sensing-server boundary established for `cog-pose-estimation` (ADR-101).
+
+## Security
+
+The cog has a very small attack surface — by design, it's a pure consumer of CSI data, not a server:
+
+| Threat | Mitigation |
+|---|---|
+| Untrusted model file mmap | `count_v1.safetensors` is loaded via `VarBuilder::from_mmaped_safetensors` (`unsafe` block, documented). The release pipeline signs the file with `COGNITUM_OWNER_SIGNING_KEY` per ADR-100; the appliance's cog-gateway verifies the Ed25519 signature against `weights_sha256` before placing the file under `/var/lib/cognitum/apps/person-count/`. |
+| Non-finite outputs from a corrupted model | `CountPrediction::is_finite()` is checked in `cmd_health` and in the v0.0.1 run-loop before any `person.count` event is emitted; non-finite outputs fail-closed. |
+| Sensing-server fetch failures | When the sensing source goes away the cog emits a `WARN` event and skips the frame — same fail-open-as-log pattern as `cog-pose-estimation`. No crash, no leaked file descriptors, no stuck `pid` file. |
+| Fusion divide-by-zero / log-of-zero | `fuse_confidence_weighted` floors confidences at `1e-3` and floors probabilities at `1e-9` before taking logs. Empty input returns the stub default rather than NaN-propagating. |
+| Over-the-cap mass after min-cut clip | `fuse_with_mincut_clip` re-normalises the surviving prefix; if all mass was above the cap (degenerate case), it places mass at the cap class rather than producing a zero distribution. |
+| Output spoofing via stdout | Events go to stdout exactly as ADR-100's runtime contract specifies — the cog-gateway parses each line as JSON. No interactive prompts, no shell escapes, no ANSI control sequences from this cog. |
+
+The cog opens **zero** network listeners and writes to **zero** files under `/var/lib/cognitum/apps/person-count/` beyond the standard `pid`, `output.log`, and `error.log` that the cog-gateway manages externally.
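The digest half of the model-file check in the threat table above can be sketched with the Python standard library. The helper below compares a downloaded artifact against the `weights_bytes` and `weights_sha256` manifest fields; the function name and the throwaway demo manifest are assumptions for illustration, and the Ed25519 signature check performed by the cog-gateway is deliberately left out.

```python
import hashlib
import os
import tempfile


def artifact_matches(path, manifest, prefix="weights"):
    """True if the file's size and SHA-256 match `<prefix>_bytes` / `<prefix>_sha256`."""
    if os.path.getsize(path) != manifest[f"{prefix}_bytes"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large artifacts are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == manifest[f"{prefix}_sha256"]


# Demo with a throwaway file standing in for the real weights.
payload = b"not real weights"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

demo_manifest = {
    "weights_bytes": len(payload),
    "weights_sha256": hashlib.sha256(payload).hexdigest(),
}
ok = artifact_matches(tmp.name, demo_manifest)
tampered = artifact_matches(tmp.name, {**demo_manifest, "weights_sha256": "0" * 64})
os.unlink(tmp.name)
print(ok, tampered)
```

The same helper works for the binary by passing `prefix="binary"`, since the manifest uses matching `binary_bytes` and `binary_sha256` fields.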
+
+## Performance / optimization
+
+Release build: **2.36 MB stripped binary** on `x86_64-unknown-linux-gnu` (smaller than `cog-pose-estimation`'s 4.5 MB because we don't transitively pull `wifi-densepose-train`).
+
+Workspace release profile already enables `opt-level = 3`, `lto = "fat"`, `codegen-units = 1`, `strip = true`. No further per-cog optimization knobs needed.
+
+Cold-start latency (30 sequential `health` invocations, Windows x86_64, candle-cpu backend):
+
+| Cog | Cold-start |
+|---|---|
+| `cog-pose-estimation` | 76.2 ms |
+| **`cog-person-count`** | **53.3 ms** |
+
+Long-running `run` warm inference: sub-millisecond per frame in the stub backend (single softmax over 8 classes is essentially free). The trained-model warm path is bounded by the three Conv1d layers and is projected, not yet measured, at ≤ 2 ms per frame on a Pi 5 with the shipped `count_v1.safetensors`, well under the ≤ 5 ms ADR-103 budget.
+
+## See also
+
+- ADR-103 — Design, SOTA comparison, acceptance gates.
+- ADR-100 — Cog packaging spec.
+- PR #491 — The heuristic this Cog replaces.
+- Issue #499 — Original "double skeletons" report that motivated ADR-103.
@@ -0,0 +1,240 @@
+{
+  "mode": "v0.0.2",
+  "backend": "pytorch-cuda",
+  "epochs_trained": 29,
+  "train_time_s": 0.7185604920377955,
+  "best_eval_acc": 0.6232557892799377,
+  "final_eval_acc": 0.6232557892799377,
+  "final_eval_within_pm1": 1.0,
+  "final_eval_mae": 0.37674418091773987,
+  "temperature_scale": 0.9261822700500488,
+  "conf_correctness_spearman_post_temp": 0.012770170735830375,
+  "per_class_accuracy": {
+    "0": {
+      "support": 116,
+      "accuracy": 0.8620689655172413
+    },
+    "1": {
+      "support": 99,
+      "accuracy": 0.3434343434343434
+    }
+  },
+  "hyperparameters": {
+    "optimizer": "AdamW",
+    "lr": 0.001,
+    "weight_decay": 0.01,
+    "batch_size": 64,
+    "schedule": "cosine_warm_restarts",
+    "epochs_max": 400,
+    "label_smoothing": 0.1,
+    "patience": 20,
+    "split": "random_80_20_seed_42",
+    "balanced_sampler": true,
+    "temperature_scaling": true
+  },
+  "epoch_losses": [
+    {
+      "epoch": 0,
+      "train_loss": 1.8680313183711126,
+      "train_acc": 0.4543269230769231,
+      "eval_loss": 0.7276814579963684,
+      "eval_acc": 0.539534866809845
+    },
+    {
+      "epoch": 1,
+      "train_loss": 1.3579198305423443,
+      "train_acc": 0.5060096153846154,
+      "eval_loss": 0.8614012002944946,
+      "eval_acc": 0.46046510338783264
+    },
+    {
+      "epoch": 2,
+      "train_loss": 1.299364447593689,
+      "train_acc": 0.4831730769230769,
+      "eval_loss": 0.7327257990837097,
+      "eval_acc": 0.539534866809845
+    },
+    {
+      "epoch": 3,
+      "train_loss": 1.2834151433064387,
+      "train_acc": 0.4963942307692308,
+      "eval_loss": 0.7958587408065796,
+      "eval_acc": 0.539534866809845
+    },
+    {
+      "epoch": 4,
+      "train_loss": 1.2809640077444224,
+      "train_acc": 0.49278846153846156,
+      "eval_loss": 0.7728011608123779,
+      "eval_acc": 0.46046510338783264
+    },
+    {
+      "epoch": 5,
+      "train_loss": 1.276416512636038,
+      "train_acc": 0.5120192307692307,
+      "eval_loss": 0.7620130181312561,
+      "eval_acc": 0.539534866809845
+    },
+    {
+      "epoch": 6,
+      "train_loss": 1.2767094740500817,
+      "train_acc": 0.4951923076923077,
+      "eval_loss": 0.7696149945259094,
+      "eval_acc": 0.604651153087616
+    },
+    {
+      "epoch": 7,
+      "train_loss": 1.2724562699978168,
+      "train_acc": 0.5324519230769231,
+      "eval_loss": 0.7653729319572449,
+      "eval_acc": 0.539534866809845
+    },
+    {
+      "epoch": 8,
+      "train_loss": 1.2739891455723689,
+      "train_acc": 0.5264423076923077,
+      "eval_loss": 0.7635467648506165,
+      "eval_acc": 0.6232557892799377
+    },
+    {
+      "epoch": 9,
+      "train_loss": 1.2718101739883423,
+      "train_acc": 0.5120192307692307,
+      "eval_loss": 0.7564782500267029,
+      "eval_acc": 0.604651153087616
+    },
+    {
+      "epoch": 10,
+      "train_loss": 1.261798886152414,
+      "train_acc": 0.5625,
+      "eval_loss": 0.7915780544281006,
+      "eval_acc": 0.46046510338783264
+    },
+    {
+      "epoch": 11,
+      "train_loss": 1.2723550613109882,
+      "train_acc": 0.5348557692307693,
+      "eval_loss": 0.7585318088531494,
+      "eval_acc": 0.6139534711837769
+    },
+    {
+      "epoch": 12,
+      "train_loss": 1.2408426174750695,
+      "train_acc": 0.6225961538461539,
+      "eval_loss": 0.7562077045440674,
+      "eval_acc": 0.525581419467926
+    },
+    {
+      "epoch": 13,
+      "train_loss": 1.219417168543889,
+      "train_acc": 0.6334134615384616,
+      "eval_loss": 0.7647078633308411,
+      "eval_acc": 0.5860465168952942
+    },
+    {
+      "epoch": 14,
+      "train_loss": 1.198713256762578,
+      "train_acc": 0.6526442307692307,
+      "eval_loss": 0.7711634635925293,
+      "eval_acc": 0.5720930099487305
+    },
+    {
+      "epoch": 15,
+      "train_loss": 1.167367669252249,
+      "train_acc": 0.6826923076923077,
+      "eval_loss": 0.7664391994476318,
+      "eval_acc": 0.6186046600341797
+    },
+    {
+      "epoch": 16,
+      "train_loss": 1.1867470557873065,
+      "train_acc": 0.6574519230769231,
+      "eval_loss": 0.7853891253471375,
+      "eval_acc": 0.6139534711837769
+    },
+    {
+      "epoch": 17,
+      "train_loss": 1.185251813668471,
+      "train_acc": 0.6766826923076923,
+      "eval_loss": 0.7728492021560669,
+      "eval_acc": 0.5767441987991333
+    },
+    {
+      "epoch": 18,
+      "train_loss": 1.1749065747627845,
+      "train_acc": 0.6814903846153846,
+      "eval_loss": 0.7930512428283691,
+      "eval_acc": 0.5488371849060059
+    },
+    {
+      "epoch": 19,
+      "train_loss": 1.1521984338760376,
+      "train_acc": 0.6983173076923077,
+      "eval_loss": 0.7875214219093323,
+      "eval_acc": 0.5860465168952942
+    },
+    {
+      "epoch": 20,
+      "train_loss": 1.158121026479281,
+      "train_acc": 0.6802884615384616,
+      "eval_loss": 0.785778820514679,
+      "eval_acc": 0.5860465168952942
+    },
+    {
+      "epoch": 21,
+      "train_loss": 1.1232389486753023,
+      "train_acc": 0.7319711538461539,
+      "eval_loss": 0.7949181795120239,
+      "eval_acc": 0.5767441987991333
+    },
+    {
+      "epoch": 22,
+      "train_loss": 1.1163162634922907,
+      "train_acc": 0.7391826923076923,
+      "eval_loss": 0.867073118686676,
+      "eval_acc": 0.539534866809845
+    },
+    {
+      "epoch": 23,
+      "train_loss": 1.1119057948772724,
+      "train_acc": 0.7211538461538461,
+      "eval_loss": 0.8135209679603577,
+      "eval_acc": 0.5953488349914551
+    },
+    {
+      "epoch": 24,
+      "train_loss": 1.107274578167842,
+      "train_acc": 0.7271634615384616,
+      "eval_loss": 0.8401668071746826,
+      "eval_acc": 0.5534883737564087
+    },
+    {
+      "epoch": 25,
+      "train_loss": 1.0781027399576628,
+      "train_acc": 0.7451923076923077,
+      "eval_loss": 0.8606341481208801,
+      "eval_acc": 0.5441860556602478
+    },
+    {
+      "epoch": 26,
+      "train_loss": 1.041811819259937,
+      "train_acc": 0.7584134615384616,
+      "eval_loss": 0.8801625967025757,
+      "eval_acc": 0.5767441987991333
+    },
+    {
+      "epoch": 27,
+      "train_loss": 1.0369769976689265,
+      "train_acc": 0.7764423076923077,
+      "eval_loss": 0.8642652034759521,
+      "eval_acc": 0.5860465168952942
+    },
+    {
+      "epoch": 28,
+      "train_loss": 1.0502384350850031,
+      "train_acc": 0.7524038461538461,
+      "eval_loss": 0.8719286322593689,
+      "eval_acc": 0.5720930099487305
+    }
+  ]
+}
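The headline numbers in this results file can be cross-checked from the per-class figures. The short calculation below recomputes overall accuracy as the support-weighted mean of the two class accuracies. Because `final_eval_within_pm1` is 1.0, every miss is off by exactly one person, so the MAE equals the error rate. The variable names are chosen for this example only.

```python
per_class = {
    0: {"support": 116, "accuracy": 0.8620689655172413},
    1: {"support": 99, "accuracy": 0.3434343434343434},
}

total = sum(c["support"] for c in per_class.values())
correct = sum(c["support"] * c["accuracy"] for c in per_class.values())

accuracy = correct / total
mae = 1.0 - accuracy  # all errors are exactly one person, so MAE == error rate

print(f"n={total} acc={accuracy:.4f} mae={mae:.4f}")
```

Both values agree with `final_eval_acc` and `final_eval_mae` above to within float32 rounding.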
@@ -0,0 +1 @@
+0.9261822700500488
@@ -0,0 +1,27 @@
+{
+  "arch": "arm",
+  "binary_bytes": 3807456,
+  "binary_sha256": "15c2fbac19741298ad1cbaf119c633a42db0a273099561fd57d8afce27728ea5",
+  "binary_signature": "gyV2CDhJo5nqBnREA08KnztGsS7AFOuXCse+2/+wul8DAzerHs9p4L6eUgl8QeiDS9rdQZs33XRxH5WTbkT0Ag==",
+  "binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-arm",
+  "build_metadata": {
+    "candle": "0.9 cpu",
+    "cog_person_count_version": "0.3.0",
+    "rust": "1.95.0",
+    "training_caveat": "random 80/20 split + label smoothing + early stopping + balanced sampler + temperature calibration. K-fold reference: class-1 mean 57.1% across 5 folds.",
+    "training_class1_accuracy": 0.343,
+    "training_eval_accuracy": 0.623,
+    "training_eval_mae": 0.377,
+    "training_temperature_scale": 0.9262
+  },
+  "id": "person-count",
+  "installed_at": 0,
+  "sig_algo": "Ed25519",
+  "signed_by": "COGNITUM_OWNER_SIGNING_KEY",
+  "status": "installed",
+  "target_triple": "aarch64-unknown-linux-gnu",
+  "version": "0.0.2",
+  "weights_bytes": 392088,
+  "weights_sha256": "32996433516891a37c63c600db8b95e42192a53bd538c088c82cd6a85e55513c",
+  "weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors"
+}
@@ -0,0 +1,27 @@
+{
+  "arch": "x86_64",
+  "binary_bytes": 4502960,
+  "binary_sha256": "051614ce6ba63df704fae848a67ad095df4bb88862fdff05ef3c0419cc8388b3",
+  "binary_signature": "P9txCcsqCoFN6LyZS+Hl33pYZxiP/nXJMTI6s4bt26cc+Cteidz7ymajCQIfuq0mx0cnWaQ6eKZUjzq5AIgoBw==",
+  "binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/x86_64/cog-person-count-x86_64",
+  "build_metadata": {
+    "candle": "0.9 cpu",
+    "cog_person_count_version": "0.3.0",
+    "rust": "1.95.0",
+    "training_caveat": "random 80/20 split + label smoothing + early stopping + balanced sampler + temperature calibration. K-fold reference: class-1 mean 57.1% across 5 folds.",
+    "training_class1_accuracy": 0.343,
+    "training_eval_accuracy": 0.623,
+    "training_eval_mae": 0.377,
+    "training_temperature_scale": 0.9262
+  },
+  "id": "person-count",
+  "installed_at": 0,
+  "sig_algo": "Ed25519",
+  "signed_by": "COGNITUM_OWNER_SIGNING_KEY",
+  "status": "installed",
+  "target_triple": "x86_64-unknown-linux-gnu",
+  "version": "0.0.2",
+  "weights_bytes": 392088,
+  "weights_sha256": "32996433516891a37c63c600db8b95e42192a53bd538c088c82cd6a85e55513c",
+  "weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors"
+}
@@ -0,0 +1,25 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://cognitum.one/schemas/cog-person-count-config-v1.json",
+  "title": "Person Count Cog Runtime Config",
+  "type": "object",
+  "additionalProperties": false,
+  "properties": {
+    "sensing_url": {
+      "type": "string",
+      "format": "uri",
+      "default": "http://127.0.0.1:3000/api/v1/sensing/latest"
+    },
+    "model_path": {
+      "type": "string",
+      "description": "Filesystem path to count_v1.safetensors. Resolved relative to /var/lib/cognitum/apps/person-count/ when not absolute."
+    },
+    "poll_ms": {
+      "type": "integer",
+      "minimum": 10,
+      "maximum": 1000,
+      "default": 40
+    }
+  },
+  "required": ["model_path"]
+}
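To make the schema's defaults and bounds concrete, here is a small hand-rolled loader that applies the same rules without a JSON Schema library. It is a sketch under stated assumptions: `load_config` is a hypothetical helper, not the cog's actual config code, and the `format: uri` check is omitted.

```python
from pathlib import PurePosixPath

APP_DIR = PurePosixPath("/var/lib/cognitum/apps/person-count")
DEFAULTS = {
    "sensing_url": "http://127.0.0.1:3000/api/v1/sensing/latest",
    "poll_ms": 40,
}
ALLOWED = {"sensing_url", "model_path", "poll_ms"}


def load_config(raw):
    """Validate a raw config dict against the schema rules and fill in defaults."""
    unknown = set(raw) - ALLOWED
    if unknown:  # additionalProperties: false
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if "model_path" not in raw:  # required
        raise ValueError("model_path is required")

    cfg = {**DEFAULTS, **raw}
    for key in ("sensing_url", "model_path"):
        if not isinstance(cfg[key], str):
            raise ValueError(f"{key} must be a string")

    poll = cfg["poll_ms"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(poll, bool) or not isinstance(poll, int) or not 10 <= poll <= 1000:
        raise ValueError("poll_ms must be an integer in [10, 1000]")

    # Relative model paths resolve against the app directory.
    path = PurePosixPath(cfg["model_path"])
    cfg["model_path"] = str(path if path.is_absolute() else APP_DIR / path)
    return cfg


cfg = load_config({"model_path": "count_v1.safetensors"})
print(cfg["model_path"], cfg["poll_ms"])
```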
@@ -0,0 +1,17 @@
+{
+  "id": "person-count",
+  "version": "{{VERSION}}",
+  "binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/{{ARCH}}/cog-person-count-{{ARCH}}",
+  "binary_bytes": 0,
+  "binary_sha256": "",
+  "binary_signature": "",
+  "weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/{{ARCH}}/cog-person-count-count_v1.safetensors",
+  "weights_bytes": 0,
+  "weights_sha256": "",
+  "arch": "{{ARCH}}",
+  "target_triple": "{{TARGET_TRIPLE}}",
+  "installed_at": 0,
+  "status": "installed",
+  "signed_by": "COGNITUM_OWNER_SIGNING_KEY",
+  "sig_algo": "Ed25519"
+}
@@ -0,0 +1,181 @@
+//! Multi-node fusion — combine N per-node count distributions into one.
+//!
+//! v0.1.0 ships a **confidence-weighted log-sum** (a weighted geometric
+//! mean of the per-node distributions, i.e. a normalised product of
+//! experts): the more confident a node, the more its distribution shapes
+//! the fused output. With one node the fusion is a no-op; N nodes that
+//! agree leave the shared distribution unchanged, while nodes that
+//! disagree can flatten it.
+//!
+//! v0.2.0 will add a **Stoer-Wagner min-cut upper bound** on the fused
+//! distribution — see ADR-103 §"Multi-node fusion". That requires
+//! `ruvector-mincut` as a workspace dep on this crate; it's stubbed below
+//! behind `fuse_with_mincut_clip()` so callers can opt in once the dep
+//! lands and the min-cut graph builder for our subcarrier feature
+//! similarities is ready.
+
+use crate::inference::{CountPrediction, COUNT_CLASSES};
+
+/// Confidence-weighted log-sum of per-node count distributions.
+///
+/// For each class k, computes `log p_fused(k) = Σ_n w_n · log p_n(k)` with
+/// normalised weights `w_n = c_n / Σ_m c_m` (each `c_n` floored at `1e-3`),
+/// then re-normalises. The fused `confidence` is the **maximum** per-node
+/// confidence rather than the average — having at least one confident
+/// observation is worth more than many low-confidence ones.
+///
+/// Edge cases:
+/// * Empty input → 1-person, 0-confidence default (matches the stub).
+/// * Single input → returned as-is (defined behaviour, no-op).
+/// * Zero confidences across all nodes → equal-weighted log-sum (every
+///   weight sits at the `1e-3` floor).
+pub fn fuse_confidence_weighted(preds: &[CountPrediction]) -> CountPrediction {
+    if preds.is_empty() {
+        let mut probs = [0.0_f32; COUNT_CLASSES];
+        probs[1] = 1.0;
+        return CountPrediction { probs, confidence: 0.0 };
+    }
+    if preds.len() == 1 {
+        return preds[0].clone();
+    }
+
+    // Compute weights c_n with a small floor so zero-confidence nodes still
+    // contribute (log-of-zero would otherwise blow the math up).
+    const EPS_CONF: f32 = 1e-3;
+    let weights: Vec<f32> = preds.iter().map(|p| p.confidence.max(EPS_CONF)).collect();
+    let weight_sum: f32 = weights.iter().sum();
+
+    // Log-sum.
+    let mut log_p = [0.0_f32; COUNT_CLASSES];
+    for (pred, &w) in preds.iter().zip(weights.iter()) {
+        for k in 0..COUNT_CLASSES {
+            let p = pred.probs[k].max(1e-9); // floor to avoid log(0)
+            log_p[k] += (w / weight_sum) * p.ln();
+        }
+    }
+
+    // Subtract max for numerical stability, exponentiate, renormalise.
+    let m = log_p.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
+    let mut p = [0.0_f32; COUNT_CLASSES];
+    let mut s = 0.0_f32;
+    for k in 0..COUNT_CLASSES {
+        p[k] = (log_p[k] - m).exp();
+        s += p[k];
+    }
+    if s > 0.0 {
+        for k in 0..COUNT_CLASSES { p[k] /= s; }
+    } else {
+        // Pathological — fall back to uniform.
+        for k in 0..COUNT_CLASSES { p[k] = 1.0 / COUNT_CLASSES as f32; }
+    }
+
+    let conf = preds.iter().map(|x| x.confidence).fold(0.0_f32, f32::max);
+    CountPrediction { probs: p, confidence: conf }
+}
+
+/// **Stoer-Wagner-clipped fusion** — v0.2.0 hook.
+///
+/// Takes the same per-node predictions plus a **max-distinct-persons**
+/// upper bound derived from the subcarrier-similarity graph's min-cut.
+/// Clips the fused distribution to `{0..=max}` and re-normalises.
+///
+/// Live `ruvector_mincut` integration lands in a follow-up PR; this entry
+/// point is here so the runtime can wire to it without an API break.
+pub fn fuse_with_mincut_clip(preds: &[CountPrediction], max_distinct: usize) -> CountPrediction {
+    let mut fused = fuse_confidence_weighted(preds);
+    let max_idx = max_distinct.min(COUNT_CLASSES - 1);
+    let mut leak = 0.0_f32;
+    for k in (max_idx + 1)..COUNT_CLASSES {
+        leak += fused.probs[k];
+        fused.probs[k] = 0.0;
+    }
+    if leak > 0.0 {
+        // Re-normalise the surviving prefix.
+        let sum: f32 = fused.probs[..=max_idx].iter().sum();
+        if sum > 0.0 {
+            for k in 0..=max_idx {
+                fused.probs[k] /= sum;
+            }
+        } else {
+            // All mass was above the cap — degenerate; place mass at the cap.
+            fused.probs[max_idx] = 1.0;
+        }
+    }
+    fused
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use approx::assert_relative_eq;
+
+    fn pred(probs: [f32; 8], conf: f32) -> CountPrediction {
+        CountPrediction { probs, confidence: conf }
+    }
+
+    #[test]
+    fn empty_returns_one_person_default() {
+        let p = fuse_confidence_weighted(&[]);
+        assert_eq!(p.argmax(), 1);
+        assert_eq!(p.confidence, 0.0);
+    }
+
+    #[test]
+    fn single_input_is_passthrough() {
+        let probs = [0.0, 0.1, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0];
+        let p = fuse_confidence_weighted(&[pred(probs, 0.8)]);
+        assert_eq!(p.argmax(), 2);
+        assert_relative_eq!(p.confidence, 0.8, max_relative = 1e-6);
+    }
+
+    #[test]
+    fn two_agreeing_nodes_sharpen_the_peak() {
+        // Both nodes vote 2 with moderate spread. With normalised weights the
+        // fused peak must not fall below the shared input peak; allow a small
+        // tolerance for f32 rounding in the ln/exp round trip.
+        let probs = [0.05, 0.15, 0.60, 0.15, 0.05, 0.0, 0.0, 0.0];
+        let fused = fuse_confidence_weighted(&[pred(probs, 0.7), pred(probs, 0.7)]);
+        assert_eq!(fused.argmax(), 2);
+        assert!(
+            fused.probs[2] >= probs[2] - 1e-6,
+            "expected fusion to keep or sharpen the peak: pre={} post={}",
+            probs[2], fused.probs[2]
+        );
+    }
+
+    #[test]
+    fn high_confidence_node_overrides_low_confidence_disagreement() {
+        let strong = [0.0, 0.95, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]; // says 1
+        let weak   = [0.0, 0.1,  0.1,  0.1,  0.1,  0.1, 0.1, 0.4]; // weak, says 7
+        let fused = fuse_confidence_weighted(&[pred(strong, 0.95), pred(weak, 0.05)]);
+        assert_eq!(fused.argmax(), 1, "high-confidence vote should win");
+    }
+
+    #[test]
+    fn fusion_preserves_normalisation() {
+        let a = [0.1, 0.2, 0.3, 0.2, 0.1, 0.05, 0.03, 0.02];
+        let b = [0.05, 0.25, 0.35, 0.20, 0.10, 0.03, 0.01, 0.01];
+        let fused = fuse_confidence_weighted(&[pred(a, 0.5), pred(b, 0.5)]);
+        let s: f32 = fused.probs.iter().sum();
+        assert_relative_eq!(s, 1.0, max_relative = 1e-5);
+    }
+
+    #[test]
+    fn mincut_clip_caps_distribution_at_max_distinct() {
+        let probs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.3, 0.2]; // mass on 5,6,7
+        let clipped = fuse_with_mincut_clip(&[pred(probs, 0.9)], 4);
+        // Anything above 4 must be zero
+        for k in 5..8 {
+            assert_eq!(clipped.probs[k], 0.0, "class {} should be clipped to 0", k);
+        }
+        // The result must still sum to 1: all pre-clip mass sat above the cap
+        // (classes 0..=4 were zero), so the degenerate fallback places it at the cap.
+        let s: f32 = clipped.probs.iter().sum();
+        assert_relative_eq!(s, 1.0, max_relative = 1e-5);
+        assert_eq!(clipped.argmax(), 4);
+    }
+
+    #[test]
+    fn p95_range_is_inclusive_and_covers_at_least_95pct() {
+        let probs = [0.05, 0.6, 0.25, 0.05, 0.03, 0.01, 0.005, 0.005];
+        let p = pred(probs, 0.9);
+        let (lo, hi) = p.p95_range();
+        assert!(lo <= 1 && hi >= 1, "mode (1) must be inside [{}, {}]", lo, hi);
+        let mass: f32 = probs[lo..=hi].iter().sum();
+        assert!(mass >= 0.95, "[{}, {}] only covers {:.3}, need >= 0.95", lo, hi, mass);
+    }
+}
@@ -0,0 +1,246 @@
+//! Single-node count inference — Candle forward over a CSI window.
+//!
+//! Architecture (matches ADR-103 §"Architecture (v0.1.0)"):
+//!     Conv1d(56 -> 64,   k=3, dilation=1, padding=1)
+//!     Conv1d(64 -> 128,  k=3, dilation=2, padding=2)
+//!     Conv1d(128 -> 128, k=3, dilation=4, padding=4)
+//!     mean over time -> [128]                ← shared encoder
+//!     ├── Linear(128 -> 64) -> ReLU -> Linear(64 -> 8)  → softmax over {0..7}
+//!     └── Linear(128 -> 32) -> ReLU -> Linear(32 -> 1)  → sigmoid → confidence
+//!
+//! When the safetensors file is missing the engine falls back to a
+//! "single-person, zero-confidence" stub so the cog still satisfies the
+//! ADR-100 runtime contract and the dashboard surfaces "no model yet"
+//! instead of dropping frames silently.
+
+use candle_core::{DType, Device, Tensor};
+use candle_nn::{Conv1d, Conv1dConfig, Linear, Module, VarBuilder};
+use std::path::Path;
+use std::sync::Arc;
+
+/// `[56 subcarriers × 20 frames]` window — same shape as cog-pose-estimation.
+pub const INPUT_SUBCARRIERS: usize = 56;
+pub const INPUT_TIMESTEPS: usize = 20;
+/// Count classification over {0, 1, ..., 7} persons.
+pub const COUNT_CLASSES: usize = 8;
+
+#[derive(Debug, Clone)]
+pub struct CsiWindow {
+    pub data: Vec<f32>,
+}
+
+/// Per-node prediction emitted by the count head + confidence head.
+#[derive(Debug, Clone)]
+pub struct CountPrediction {
+    /// Categorical distribution over {0..7} persons. Sums to 1 within float
+    /// precision. Maximum-likelihood class is `argmax(probs)`.
+    pub probs: [f32; COUNT_CLASSES],
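+    /// `[0, 1]`: confidence head output (sigmoid). Trained against
+    /// (predicted == truth) so it is meant to read as a probability of being
+    /// right; the benchmark log reports it as effectively uncalibrated so far
+    /// (Spearman below 0.03 against correctness), so treat it as a rough signal.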
+    pub confidence: f32,
+}
+
+impl CountPrediction {
+    pub fn is_finite(&self) -> bool {
+        self.probs.iter().all(|v| v.is_finite()) && self.confidence.is_finite()
+    }
+
+    /// Maximum-likelihood class.
+    pub fn argmax(&self) -> usize {
+        let mut best_i = 0;
+        let mut best_v = self.probs[0];
+        for (i, &v) in self.probs.iter().enumerate().skip(1) {
+            if v > best_v {
+                best_v = v;
+                best_i = i;
+            }
+        }
+        best_i
+    }
+
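+    /// `(low, high)` such that `Σ probs[low..=high] ≥ 0.95`, grown greedily
+    /// outward from the mode by taking the larger neighbouring class at each
+    /// step (ties go left), so the interval is contiguous but not guaranteed
+    /// minimal. Used for the `count_p95_low` / `count_p95_high` fields
+    /// surfaced to consumers.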
+    pub fn p95_range(&self) -> (usize, usize) {
+        let mode = self.argmax();
+        let mut lo = mode;
+        let mut hi = mode;
+        let mut acc = self.probs[mode];
+        while acc < 0.95 && (lo > 0 || hi < COUNT_CLASSES - 1) {
+            let left = if lo > 0 { self.probs[lo - 1] } else { -1.0 };
+            let right = if hi < COUNT_CLASSES - 1 { self.probs[hi + 1] } else { -1.0 };
+            if left >= right && lo > 0 {
+                lo -= 1;
+                acc += self.probs[lo];
+            } else if hi < COUNT_CLASSES - 1 {
+                hi += 1;
+                acc += self.probs[hi];
+            } else if lo > 0 {
+                lo -= 1;
+                acc += self.probs[lo];
+            } else {
+                break;
+            }
+        }
+        (lo, hi)
+    }
+}
+
+struct CountNet {
+    c1: Conv1d,
+    c2: Conv1d,
+    c3: Conv1d,
+    count_fc1: Linear,
+    count_fc2: Linear,
+    conf_fc1: Linear,
+    conf_fc2: Linear,
+}
+
+impl CountNet {
+    fn new(vb: VarBuilder<'_>) -> candle_core::Result<Self> {
+        let enc = vb.pp("enc");
+        let count = vb.pp("count_head");
+        let conf = vb.pp("conf_head");
+
+        let c1 = candle_nn::conv1d(
+            56, 64, 3,
+            Conv1dConfig { padding: 1, stride: 1, dilation: 1, groups: 1, ..Default::default() },
+            enc.pp("c1"),
+        )?;
+        let c2 = candle_nn::conv1d(
+            64, 128, 3,
+            Conv1dConfig { padding: 2, stride: 1, dilation: 2, groups: 1, ..Default::default() },
+            enc.pp("c2"),
+        )?;
+        let c3 = candle_nn::conv1d(
+            128, 128, 3,
+            Conv1dConfig { padding: 4, stride: 1, dilation: 4, groups: 1, ..Default::default() },
+            enc.pp("c3"),
+        )?;
+        let count_fc1 = candle_nn::linear(128, 64, count.pp("fc1"))?;
+        let count_fc2 = candle_nn::linear(64, COUNT_CLASSES, count.pp("fc2"))?;
+        let conf_fc1 = candle_nn::linear(128, 32, conf.pp("fc1"))?;
+        let conf_fc2 = candle_nn::linear(32, 1, conf.pp("fc2"))?;
+        Ok(Self { c1, c2, c3, count_fc1, count_fc2, conf_fc1, conf_fc2 })
+    }
+
+    fn forward(&self, x: &Tensor) -> candle_core::Result<(Tensor, Tensor)> {
+        let h = self.c1.forward(x)?.relu()?;
+        let h = self.c2.forward(&h)?.relu()?;
+        let h = self.c3.forward(&h)?.relu()?;
+        let h = h.mean(2)?; // [B, 128]
+
+        // Count head — logits then softmax
+        let c = self.count_fc1.forward(&h)?.relu()?;
+        let c = self.count_fc2.forward(&c)?;
+        let probs = candle_nn::ops::softmax(&c, candle_core::D::Minus1)?;
+
+        // Confidence head — sigmoid
+        let cf = self.conf_fc1.forward(&h)?.relu()?;
+        let cf = self.conf_fc2.forward(&cf)?;
+        let conf = candle_nn::ops::sigmoid(&cf)?;
+
+        Ok((probs, conf))
+    }
+}
+
+pub struct InferenceEngine {
+    inner: Option<Arc<CountNet>>,
+    device: Device,
+}
+
+impl InferenceEngine {
+    pub fn new() -> Result<Self, Box<dyn std::error::Error>> {
+        Self::with_weights(default_weights_path().as_deref())
+    }
+
+    pub fn with_weights(weights_path: Option<&Path>) -> Result<Self, Box<dyn std::error::Error>> {
+        let device = pick_device();
+        let inner = match weights_path {
+            Some(p) if p.exists() => {
+                // SAFETY: from_mmaped_safetensors mmaps the file for the
+                // VarBuilder's lifetime. Same pattern as cog-pose-estimation.
+                let vb = unsafe {
+                    VarBuilder::from_mmaped_safetensors(&[p.to_path_buf()], DType::F32, &device)?
+                };
+                let net = CountNet::new(vb)?;
+                Some(Arc::new(net))
+            }
+            _ => None,
+        };
+        Ok(Self { inner, device })
+    }
+
+    pub fn backend(&self) -> &'static str {
+        match (&self.inner, &self.device) {
+            (Some(_), Device::Cuda(_)) => "candle-cuda",
+            (Some(_), _) => "candle-cpu",
+            (None, _) => "stub",
+        }
+    }
+
+    pub fn infer(&self, window: &CsiWindow) -> Result<CountPrediction, Box<dyn std::error::Error>> {
+        if window.data.len() != INPUT_SUBCARRIERS * INPUT_TIMESTEPS {
+            return Err(format!(
+                "expected {} input values, got {}",
+                INPUT_SUBCARRIERS * INPUT_TIMESTEPS,
+                window.data.len()
+            )
+            .into());
+        }
+
+        let Some(net) = &self.inner else {
+            // Stub fallback: single-person, zero confidence. Surfaces "no
+            // model yet" honestly instead of pretending to know.
+            let mut probs = [0.0f32; COUNT_CLASSES];
+            probs[1] = 1.0; // mass on "1 person"
+            return Ok(CountPrediction { probs, confidence: 0.0 });
+        };
+
+        let t = Tensor::from_slice(
+            &window.data,
+            (1, INPUT_SUBCARRIERS, INPUT_TIMESTEPS),
+            &self.device,
+        )?;
+        let (probs_t, conf_t) = net.forward(&t)?;
+        let flat: Vec<f32> = probs_t.flatten_all()?.to_vec1()?;
+        if flat.len() != COUNT_CLASSES {
+            return Err(format!("count head produced {} probs, expected {}", flat.len(), COUNT_CLASSES).into());
+        }
+        let mut probs = [0.0f32; COUNT_CLASSES];
+        probs.copy_from_slice(&flat[..COUNT_CLASSES]);
+        let conf = conf_t.flatten_all()?.to_vec1::<f32>()?[0];
+
+        Ok(CountPrediction { probs, confidence: conf })
+    }
+}
+
+pub struct SyntheticInput;
+
+impl Default for SyntheticInput {
+    fn default() -> Self { Self }
+}
+
+impl SyntheticInput {
+    pub fn as_window(&self) -> CsiWindow {
+        CsiWindow { data: vec![0.0; INPUT_SUBCARRIERS * INPUT_TIMESTEPS] }
+    }
+}
+
+fn pick_device() -> Device {
+    #[cfg(feature = "cuda")]
+    if let Ok(d) = Device::cuda_if_available(0) {
+        return d;
+    }
+    Device::Cpu
+}
+
+fn default_weights_path() -> Option<std::path::PathBuf> {
+    let candidates = [
+        std::path::PathBuf::from("/var/lib/cognitum/apps/person-count/count_v1.safetensors"),
+        std::path::PathBuf::from("./count_v1.safetensors"),
+        std::path::PathBuf::from("./cog/artifacts/count_v1.safetensors"),
+        std::path::PathBuf::from("v2/crates/cog-person-count/cog/artifacts/count_v1.safetensors"),
+        std::path::PathBuf::from("crates/cog-person-count/cog/artifacts/count_v1.safetensors"),
+    ];
+    candidates.into_iter().find(|p| p.exists())
+}
@@ -0,0 +1,16 @@
+//! `cog-person-count` — learned multi-person counter (ADR-103).
+//!
+//! Replaces the PR #491 slot heuristic with:
+//!  * a small Candle network (encoder + count head + confidence head),
+//!  * Stoer-Wagner-bounded multi-node fusion,
+//!  * `{count, confidence, count_p95_low, count_p95_high}` output.
+//!
+//! Design lives in `docs/adr/ADR-103-learned-multi-person-counter.md`.
+
+pub mod fusion;
+pub mod inference;
+pub mod publisher;
+pub mod runtime;
+
+pub const COG_ID: &str = "person-count";
+pub const COG_VERSION: &str = env!("CARGO_PKG_VERSION");
@@ -0,0 +1,133 @@
+//! `cog-person-count` — Cognitum Cog binary entrypoint.
+//!
+//! Implements the ADR-100 runtime contract:
+//!     cog-person-count version
+//!     cog-person-count manifest
+//!     cog-person-count health
+//!     cog-person-count run --config <path>
+
+use clap::{Parser, Subcommand};
+use cog_person_count::{
+    inference::{InferenceEngine, SyntheticInput},
+    publisher,
+    COG_ID, COG_VERSION,
+};
+use serde::{Deserialize, Serialize};
+use serde_json::{json, Value};
+use std::path::PathBuf;
+
+#[derive(Parser)]
+#[command(name = "cog-person-count", version = COG_VERSION)]
+struct Cli {
+    #[command(subcommand)]
+    command: Cmd,
+}
+
+#[derive(Subcommand)]
+enum Cmd {
+    Version,
+    Manifest,
+    Health,
+    Run {
+        #[arg(long, value_name = "PATH")]
+        config: PathBuf,
+    },
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct RunConfig {
+    #[serde(default = "default_sensing_url")]
+    sensing_url: String,
+    model_path: Option<PathBuf>,
+    #[serde(default = "default_poll_ms")]
+    poll_ms: u64,
+}
+
+fn default_sensing_url() -> String { "http://127.0.0.1:3000/api/v1/sensing/latest".to_string() }
+fn default_poll_ms() -> u64 { 40 }
+
+fn main() -> std::process::ExitCode {
+    init_logging();
+    let cli = Cli::parse();
+    let result = match cli.command {
+        Cmd::Version => cmd_version(),
+        Cmd::Manifest => cmd_manifest(),
+        Cmd::Health => cmd_health(),
+        Cmd::Run { config } => cmd_run(config),
+    };
+    match result {
+        Ok(()) => std::process::ExitCode::SUCCESS,
+        Err(err) => {
+            eprintln!("cog-person-count: {err}");
+            std::process::ExitCode::FAILURE
+        }
+    }
+}
+
+fn init_logging() {
+    let _ = tracing_subscriber::fmt()
+        .with_env_filter(
+            tracing_subscriber::EnvFilter::try_from_default_env()
+                .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info"))
+        )
+        .with_target(false)
+        .try_init();
+}
+
+fn cmd_version() -> Result<(), Box<dyn std::error::Error>> {
+    println!("{COG_ID} {COG_VERSION}");
+    Ok(())
+}
+
+fn cmd_manifest() -> Result<(), Box<dyn std::error::Error>> {
+    println!("{}", serde_json::to_string_pretty(&json!({
+        "id": COG_ID,
+        "version": COG_VERSION,
+        "binary_url": Value::Null,
+        "binary_bytes": Value::Null,
+        "binary_sha256": Value::Null,
+        "binary_signature": Value::Null,
+        "installed_at": Value::Null,
+        "status": Value::Null,
+    }))?);
+    Ok(())
+}
+
+fn cmd_health() -> Result<(), Box<dyn std::error::Error>> {
+    let engine = InferenceEngine::new()?;
+    let pred = engine.infer(&SyntheticInput::default().as_window())?;
+    if !pred.is_finite() {
+        return Err("inference produced non-finite output".into());
+    }
+    publisher::health_ok(COG_ID, engine.backend(), &pred);
+    Ok(())
+}
+
+fn cmd_run(config_path: PathBuf) -> Result<(), Box<dyn std::error::Error>> {
+    let raw = std::fs::read_to_string(&config_path)
+        .map_err(|e| format!("failed to read config at {}: {}", config_path.display(), e))?;
+    let cfg: RunConfig = serde_json::from_str(&raw)
+        .map_err(|e| format!("failed to parse config at {}: {}", config_path.display(), e))?;
+
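+    // Fall back to auto-discovery when no explicit model_path is configured,
+    // matching the "(auto-discover)" label reported in run.started below.
+    let engine = match cfg.model_path.as_deref() {
+        Some(p) => InferenceEngine::with_weights(Some(p))?,
+        None => InferenceEngine::new()?,
+    };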
+    publisher::run_started(
+        COG_ID,
+        &cfg.sensing_url,
+        cfg.poll_ms,
+        &cfg.model_path
+            .as_ref()
+            .map(|p| p.display().to_string())
+            .unwrap_or_else(|| "(auto-discover)".to_string()),
+    );
+
+    let rt = tokio::runtime::Builder::new_multi_thread()
+        .enable_all()
+        .build()?;
+    rt.block_on(cog_person_count::runtime::run_loop(
+        cog_person_count::runtime::RunConfig {
+            sensing_url: cfg.sensing_url,
+            poll_ms: cfg.poll_ms,
+        },
+        engine,
+    ))
+}
@@ -0,0 +1,75 @@
+//! Structured JSON event publisher — one event per line on stdout.
+
+use crate::inference::CountPrediction;
+use serde::Serialize;
+use serde_json::{json, Value};
+use std::time::{SystemTime, UNIX_EPOCH};
+
+#[derive(Debug, Serialize)]
+pub struct Event<'a> {
+    pub ts: f64,
+    pub level: &'a str,
+    pub event: &'a str,
+    pub fields: Value,
+}
+
+pub fn emit_event(ev: &Event<'_>) {
+    if let Ok(line) = serde_json::to_string(ev) {
+        println!("{line}");
+    }
+}
+
+pub fn health_ok(cog_id: &str, backend: &str, p: &CountPrediction) {
+    let (lo, hi) = p.p95_range();
+    emit_event(&Event {
+        ts: now_secs(),
+        level: "info",
+        event: "health.ok",
+        fields: json!({
+            "cog": cog_id,
+            "backend": backend,
+            "synthetic_count": p.argmax(),
+            "synthetic_confidence": p.confidence,
+            "synthetic_p95_range": [lo, hi],
+        }),
+    });
+}
+
+pub fn run_started(cog_id: &str, sensing_url: &str, poll_ms: u64, model_path: &str) {
+    emit_event(&Event {
+        ts: now_secs(),
+        level: "info",
+        event: "run.started",
+        fields: json!({
+            "cog": cog_id,
+            "sensing_url": sensing_url,
+            "poll_ms": poll_ms,
+            "model_path": model_path,
+        }),
+    });
+}
+
+pub fn person_count(tick: u64, fused: &CountPrediction, n_nodes: usize) {
+    let (lo, hi) = fused.p95_range();
+    emit_event(&Event {
+        ts: now_secs(),
+        level: "info",
+        event: "person.count",
+        fields: json!({
+            "tick": tick,
+            "count": fused.argmax(),
+            "confidence": fused.confidence,
+            "count_p95_low": lo,
+            "count_p95_high": hi,
+            "n_nodes": n_nodes,
+            "probs": fused.probs,
+        }),
+    });
+}
+
+fn now_secs() -> f64 {
+    SystemTime::now()
+        .duration_since(UNIX_EPOCH)
+        .map(|d| d.as_secs_f64())
+        .unwrap_or(0.0)
+}
@@ -0,0 +1,77 @@
+//! Long-running inference loop. Polls the appliance's sensing-server,
+//! slides a CSI window, runs the count head, and emits `person.count`
+//! events. Same shape as `cog-pose-estimation::runtime`.
+//!
+//! v0.0.1 runs single-node only: the appliance's
+//! `/api/v1/sensing/latest` endpoint already aggregates across nodes
+//! before serving, so per-cog multi-node fusion is deferred until each
+//! node ships raw frames separately (ADR-103 §"Multi-node fusion", v0.2.0).
+
+use crate::inference::{CsiWindow, InferenceEngine, INPUT_SUBCARRIERS, INPUT_TIMESTEPS};
+use crate::publisher;
+use std::time::Duration;
+use tokio::time::sleep;
+
+pub struct RunConfig {
+    pub sensing_url: String,
+    pub poll_ms: u64,
+}
+
+pub async fn run_loop(
+    cfg: RunConfig,
+    engine: InferenceEngine,
+) -> Result<(), Box<dyn std::error::Error>> {
+    let mut buffer: Vec<f32> = Vec::with_capacity(INPUT_SUBCARRIERS * INPUT_TIMESTEPS);
+    let cap = INPUT_SUBCARRIERS * INPUT_TIMESTEPS;
+    let mut tick: u64 = 0;
+
+    loop {
+        match fetch_frame(&cfg.sensing_url).await {
+            Ok(amplitudes) => {
+                tick += 1;
+                buffer.extend(amplitudes);
+                while buffer.len() > 2 * cap {
+                    let extra = buffer.len() - cap;
+                    buffer.drain(0..extra);
+                }
+                if buffer.len() >= cap {
+                    let window = CsiWindow { data: buffer[buffer.len() - cap..].to_vec() };
+                    match engine.infer(&window) {
+                        // v0.0.1 ships single-node — fusion is a no-op for
+                        // N=1. v0.2.0 will append additional per-node
+                        // predictions to a vec and call
+                        // `fusion::fuse_confidence_weighted` before emit.
+                        Ok(pred) => publisher::person_count(tick, &pred, 1),
+                        // Surface inference failures instead of dropping them silently.
+                        Err(e) => tracing::warn!(error = %e, "inference failed"),
+                    }
+                }
+            }
+            Err(e) => {
+                tracing::warn!(error = %e, "sensing-server fetch failed");
+            }
+        }
+        sleep(Duration::from_millis(cfg.poll_ms)).await;
+    }
+}
+
+async fn fetch_frame(url: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
+    let url = url.to_string();
+    let body = tokio::task::spawn_blocking(move || -> Result<String, ureq::Error> {
+        Ok(ureq::get(&url).call()?.into_string()?)
+    })
+    .await??;
+    let json: serde_json::Value = serde_json::from_str(&body)?;
+    let snapshot = json.get("snapshot").unwrap_or(&json);
+    let nodes = snapshot
+        .get("nodes")
+        .and_then(|v| v.as_array())
+        .ok_or("missing nodes[]")?;
+    let amplitude = nodes
+        .first()
+        .and_then(|n| n.get("amplitude"))
+        .and_then(|v| v.as_array())
+        .ok_or("missing nodes[0].amplitude[]")?;
+    Ok(amplitude
+        .iter()
+        .filter_map(|v| v.as_f64().map(|f| f as f32))
+        .collect())
+}
@@ -0,0 +1,84 @@
+//! Smoke tests for cog-person-count.
+
+use cog_person_count::{
+    fusion::{fuse_confidence_weighted, fuse_with_mincut_clip},
+    inference::{
+        CountPrediction, CsiWindow, InferenceEngine, SyntheticInput,
+        COUNT_CLASSES, INPUT_SUBCARRIERS, INPUT_TIMESTEPS,
+    },
+};
+
+#[test]
+fn synthetic_window_has_correct_shape() {
+    let w = SyntheticInput::default().as_window();
+    assert_eq!(w.data.len(), INPUT_SUBCARRIERS * INPUT_TIMESTEPS);
+}
+
+#[test]
+fn stub_engine_returns_finite_output() {
+    let engine = InferenceEngine::with_weights(None).expect("stub engine");
+    let pred = engine.infer(&SyntheticInput::default().as_window()).expect("infer");
+    assert!(pred.is_finite());
+    assert_eq!(pred.probs.len(), COUNT_CLASSES);
+
+    let sum: f32 = pred.probs.iter().sum();
+    assert!((sum - 1.0).abs() < 1e-5, "stub probs must sum to 1, got {}", sum);
+    assert_eq!(pred.argmax(), 1, "stub default is 1-person");
+    assert_eq!(pred.confidence, 0.0, "stub confidence is 0");
+}
+
+#[test]
+fn engine_rejects_wrong_shape_input() {
+    let engine = InferenceEngine::with_weights(None).expect("stub engine");
+    let bad = CsiWindow { data: vec![0.0; 10] };
+    assert!(engine.infer(&bad).is_err());
+}
+
+#[test]
+fn stub_backend_string_is_stable() {
+    let engine = InferenceEngine::with_weights(None).expect("stub engine");
+    assert_eq!(engine.backend(), "stub");
+}
+
+#[test]
+fn p95_range_includes_mode() {
+    // Sharp peak at 2
+    let mut probs = [0.0_f32; COUNT_CLASSES];
+    probs[2] = 0.85;
+    probs[1] = 0.08;
+    probs[3] = 0.07;
+    let p = CountPrediction { probs, confidence: 0.9 };
+    let (lo, hi) = p.p95_range();
+    assert!(lo <= 2 && hi >= 2);
+}
+
+#[test]
+fn fusion_with_no_inputs_is_safe_default() {
+    let p = fuse_confidence_weighted(&[]);
+    assert_eq!(p.argmax(), 1);
+    assert_eq!(p.confidence, 0.0);
+}
+
+#[test]
+fn fusion_passes_through_single_node() {
+    // A single-node ESP32 deployment must produce the same output as the
+    // raw inference — fusion is a no-op for N=1.
+    let mut probs = [0.0_f32; COUNT_CLASSES];
+    probs[3] = 1.0;
+    let input = CountPrediction { probs, confidence: 0.6 };
+    let out = fuse_confidence_weighted(&[input.clone()]);
+    assert_eq!(out.argmax(), 3);
+    assert!((out.confidence - 0.6).abs() < 1e-6);
+}
+
+#[test]
+fn mincut_clip_with_high_cap_is_noop() {
+    let mut probs = [0.0_f32; COUNT_CLASSES];
+    probs[2] = 0.5;
+    probs[3] = 0.5;
+    let input = CountPrediction { probs, confidence: 0.7 };
+    let clipped = fuse_with_mincut_clip(&[input], 7);
+    // No clip happened (cap == max class)
+    assert!((clipped.probs[2] - 0.5).abs() < 1e-6);
+    assert!((clipped.probs[3] - 0.5).abs() < 1e-6);
+}
Author	SHA1	Message	Date
ruv	b16d7431bc	docs(bench): append v0.0.2 section to person-count benchmark log Documents the K-fold diagnostic (62.2 ± 1.9% / class-1 57.1%) that justified v0.0.2, the v0.0.2 numbers (class-1 0% → 34.3%), and the honest read that the gap to the K-fold mean is run-to-run variance not missing improvement.	2026-05-21 19:47:55 -04:00
rUv	b3a5012dbd	feat(cog-person-count): v0.0.2 — K-fold + label-smoothing + temperature-calibrated (#699)	2026-05-21 19:47:04 -04:00

* chore: stage v0.0.2 artifacts + temperature scalar for build pipeline

Stages count_v1.{safetensors,onnx,temperature,train_results.json} ahead of the build/sign/upload step. This commit is a momentary side-effect — the next commit will refresh the per-arch manifests with the new binary SHAs once ruvultra finishes the cross-build.

The .temperature file holds the calibration scalar from LBFGS over the held-out conf logits. The Rust cog will read it post-load and divide conf_logits by it before sigmoid, exactly matching the Python eval.

* feat(cog-person-count): v0.0.2 — K-fold validated, label smoothing + early stop + temp scale

The v0.0.1 "65.1% but class-1=0%" result was an unlucky temporal split that let a degenerate "always predict 0" classifier hit eval acc = class-0 fraction. 5-fold stratified random CV proved the architecture actually learns ~57.1% class-1 accuracy under fair splits — a real, modestly useful signal.

v0.0.2 ships a retrained model that:

  * Splits randomly (seed=42) 80/20 instead of temporally — eliminates the trailing-window-class-imbalance cheat.
  * Class-balanced sampler (multinomial with replacement, weighted by inverse class frequency) — per-batch expected counts are equal regardless of dataset distribution.
  * Label smoothing 0.1 on the cross-entropy — reduces confidence saturation that drove v0.0.1's all-or-nothing predictions.
  * Early stopping with patience=20 — stops at epoch 29 instead of overfitting through 400.
  * Temperature scaling of the conf head — LBFGS fits a scalar T on held-out conf logits; ships as a count_v1.temperature sidecar so the Rust cog can divide conf_logits by T before sigmoid.

Numbers on the same data:

| Metric      | v0.0.1 | v0.0.2 | K-fold (5x100) |
|-------------|--------|--------|----------------|
| Overall acc | 65.1%  | 62.3%  | 62.2% ± 1.9%   |
| Class 0 acc | 100%   | 86.2%  | 67.4%          |
| Class 1 acc | 0%     | 34.3%  | 57.1% ✓        |
| MAE         | 0.349  | 0.377  | 0.378          |
| Spearman    | 0.023  | 0.013  | 0.160          |

Class-1 accuracy 0 → 34.3% is the headline win. Net acc moves slightly because we stopped cheating on class 0. K-fold's 57% says there's headroom remaining; reaching it needs more independent splits (== more data), not more training tricks.

Confidence calibration didn't move. Temperature scaling alone can't fix a confidence head trained against a noisy argmax==truth indicator over a 62%-accurate classifier — the head's training signal is the issue, not its post-hoc transform. The honest fix is multi-room data (#645), not another calibration knob.

Live on cognitum-v0 at /var/lib/cognitum/apps/person-count/ — health reports candle-cpu backend, count = 1 (was 0 in v0.0.1) on synthetic zero input.

Files changed:

  * scripts/train-count.py — adds --k-fold (no sklearn dep, hand-rolled stratified splits with deterministic shuffle) and --v2 paths.
  * v2/.../cog/artifacts/count_v1.safetensors (392 KB, new sha 32996433…) + count_v1.onnx (16 KB) + count_v1.temperature (0.9262 scalar) + count_train_results.json (full epoch trace).
  * v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json bumped to version 0.0.2 with the new weights_sha256 + caveats.
  * docs/benchmarks/person-count-cog.md — appends a v0.0.2 section with the K-fold diagnostic table and honest-read paragraph.

GCS: gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors refreshed (binaries unchanged — load weights via mmap at runtime).
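The temperature step described in this commit amounts to dividing the confidence logit by the fitted scalar before the sigmoid. Below is a minimal Rust sketch of that step, assuming the `count_v1.temperature` sidecar is a plain-text float (its on-disk format is not shown in this log). `parse_temperature` and `calibrated_confidence` are hypothetical names for illustration, not the cog's actual API.

```rust
/// Parse the scalar from a temperature sidecar.
/// ASSUMPTION: the sidecar is a plain-text float such as "0.9262".
fn parse_temperature(raw: &str) -> Option<f32> {
    let t: f32 = raw.trim().parse().ok()?;
    // Reject NaN, infinities, zero and negatives so the division below is safe.
    (t.is_finite() && t > 0.0).then_some(t)
}

/// Logistic sigmoid.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Temperature-scaled confidence: sigmoid(logit / T).
/// T < 1 sharpens the output, T > 1 flattens it, T = 1 is the identity.
fn calibrated_confidence(conf_logit: f32, temperature: f32) -> f32 {
    sigmoid(conf_logit / temperature)
}
```

A caller would presumably fall back to `T = 1.0` when the sidecar is missing or malformed, for example `parse_temperature(&text).unwrap_or(1.0)`, which leaves the uncalibrated sigmoid unchanged.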
rUv	e6a5df36eb	chore(cog-person-count): refresh GCS manifests after run-wiring rebuild (#698)	2026-05-21 19:13:10 -04:00

The arm + x86_64 manifests committed in #696 referenced the binaries built before #697 wired the `run` subcommand. Rebuilt + re-signed + re-uploaded to GCS, and re-deployed to cognitum-v0:

    arm sha 15c2fbac…7728ea5 (3,807,456 B, up from 2,168,816 — added Tokio runtime)
    x86_64 sha 051614ce…cc8388b3 (4,502,960 B, up from 2,615,528)

Both re-signed Ed25519 with COGNITUM_OWNER_SIGNING_KEY. Manifests now match the binaries published at gs://cognitum-apps/cogs/{arm,x86_64}/cog-person-count-* and the binary installed at /var/lib/cognitum/apps/person-count/ on cognitum-v0.
rUv	5c914e63c7	feat(cog-person-count): wire `run` subcommand — v0.0.1 fully functional (#697)	2026-05-21 19:10:15 -04:00

Phase 4 of ADR-103. Adds the long-running polling loop so the cog's fourth verb (`run`) does real work, completing the ADR-100 runtime contract end-to-end:

    cog-person-count version       → "person-count 0.3.0"
    cog-person-count manifest      → JSON skeleton
    cog-person-count health        → loads weights + 1-shot infer + emit
    cog-person-count run --config  → long-running per-frame emit ← THIS

What ships:

  * src/runtime.rs (new) — `run_loop` polls sensing_url every poll_ms, slides a [56, 20] CSI window, runs InferenceEngine::infer, emits publisher::person_count events. Same shape as cog-pose-estimation::runtime — fetch_frame extracts amplitudes from `snapshot.nodes[0].amplitude[]`, fails open on connect errors with a WARN log rather than crashing.
  * src/lib.rs — registers the runtime module.
  * src/main.rs — cmd_run now loads RunConfig from a JSON file, builds the InferenceEngine (with weights if cfg.model_path is set, otherwise auto-discover), emits a run.started event, and hands off to the Tokio multi-thread runtime's block_on(run_loop).

Single-node fusion is a no-op for N=1 today; v0.2.0 will append predictions from sibling nodes and call fusion::fuse_confidence_weighted before emit.

Verified locally:

    cargo check -p cog-person-count --no-default-features → clean
    cargo test -p cog-person-count → 15/15 pass (no regressions)
    cargo build -p cog-person-count --release → 2.36 MB unchanged
    ./cog-person-count run --config bad-config.json:
      line 1: {"event":"run.started","fields":{"cog":"person-count", "sensing_url":"http://127.0.0.1:9999/...",poll_ms:100, "model_path":"(auto-discover)"}}
      line 2: WARN sensing-server fetch failed error=Connection Failed: Connect error: actively refused
      (loop alive — exits cleanly on SIGTERM, no crash, no NaN)

Also adds a "Relationship to the in-process score_to_person_count heuristic" section to cog/README.md explaining the dual-emitter design (sensing-server keeps emitting the PR #491 slot heuristic; the cog runs out-of-process and emits person.count events from the learned model). Operators choose by installing the cog or not — no sensing-server rebuild required.

ADR-103 §"Migration" status:

    1. Land ADR + scaffold ........... done (#693, #694)
    2. Train count_v1 ................ done (#695)
    3. Cross-compile + sign + GCS .... done (#696)
    4. Server-side wiring ............ done — out-of-process design means no rewire needed; this cog is the wiring.
    5. v0.2.0 multi-room + LoRA ...... data-bound (#645)
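The window handling that `run_loop` performs on each poll can be sketched in isolation. The version below follows the buffer logic in `src/runtime.rs` (append the frame, trim lazily once the buffer exceeds twice the window size, then take the trailing window). The 56 and 20 come from the "[56, 20] CSI window" mentioned above; `push_frame` is a hypothetical helper name, and the sketch assumes every frame carries exactly 56 amplitudes.

```rust
const INPUT_SUBCARRIERS: usize = 56;
const INPUT_TIMESTEPS: usize = 20;
/// Number of f32 values in one full window.
const CAP: usize = INPUT_SUBCARRIERS * INPUT_TIMESTEPS;

/// Append one frame of amplitudes and return the most recent full window,
/// or `None` while the buffer is still filling.
fn push_frame(buffer: &mut Vec<f32>, frame: &[f32]) -> Option<Vec<f32>> {
    buffer.extend_from_slice(frame);

    // Lazy trim: only when the buffer exceeds two windows, drop the oldest
    // samples so that exactly one window remains.
    if buffer.len() > 2 * CAP {
        let extra = buffer.len() - CAP;
        buffer.drain(0..extra);
    }

    if buffer.len() >= CAP {
        // The trailing CAP values form the current window.
        Some(buffer[buffer.len() - CAP..].to_vec())
    } else {
        None
    }
}
```

Trimming only past `2 * CAP` keeps the `drain` (which shifts the remaining elements) off the per-frame path, at the cost of holding up to two windows in memory.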
rUv	a5e99670f8	feat(cog-person-count): release v0.0.1 — signed binaries on GCS, live on cognitum-v0 (#696)	2026-05-21 19:02:26 -04:00

Phase 3 of ADR-103. Cross-compiled aarch64 + x86_64 on ruvultra, signed with COGNITUM_OWNER_SIGNING_KEY (Ed25519), uploaded to GCS, and live-installed on the cognitum-v0 Pi 5 alongside cog-pose-estimation.

Real-hardware bench on cognitum-v0:

    ./cog-person-count-arm health → backend=candle-cpu, count=0, confidence=0.49, p95=[0,7]
    30 sequential health invocations: 0.276 s → 9.2 ms/invocation cold

Compares to cog-pose-estimation's 8.4 ms — count cog is ~10% slower because the dual-head (count softmax + confidence sigmoid) does ~2x the work after the shared encoder.

GCS release artifacts (publicly downloadable, SHA-verified):

    arm/cog-person-count-arm 2,168,816 B
      sha: 36bc0bb0...0d47b507b3c3
      sig: R/00xdzHriyr/2r...JK+a6k71NDg== (Ed25519)
    x86_64/cog-person-count-x86_64 2,615,528 B
      sha: 76cdd1ec...3923 7392b01db
      sig: QB+8cnGSMQmu...ZtTNIQ2rDg== (Ed25519)
    arm/cog-person-count-count_v1.safetensors 392,088 B
      sha: dacb0551...e6e04ff56d15c3a65a9ff

Live install at /var/lib/cognitum/apps/person-count/ on cognitum-v0 matches the layout of every other installed cog (anomaly-detect, seizure-detect, pose-estimation): cog-person-count-arm binary, count_v1.safetensors weights, manifest.json, config.json.

Adds:

  * v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json — full ADR-100 schema with all fields filled (sha + sig + size + URL + build_metadata carrying the v0.0.1 honest training caveats).
  * docs/benchmarks/person-count-cog.md — appends "Live appliance install" and "Signed GCS release artifacts" sections to the benchmark log.

Honest v0.0.1 caveat still applies (class-1 accuracy 0% on the held-out tail of the single-session training data) — same data-bound limit as pose_v1. The shipped artifact is the vehicle; production-quality accuracy follows from multi-room paired data per ADR-103's v0.2.0 plan + #645.
rUv	6b4994e105	feat(cog-person-count): train count_v1.safetensors — honest v0.0.1 (ADR-103) (#695)	2026-05-21 18:56:52 -04:00

Phase 2 of ADR-103: trained count head on the existing 1,077 paired samples (the same data that produced pose_v1 yesterday).

Honest result: 65.1% eval accuracy / 100% within ±1 / MAE 0.349 on the held-out time-window. Per-class: 100% on "empty room" / 0% on "1 person". The model overfit by epoch 100 (train_acc → 1.0, eval_loss climbed 0.67 → 7.8) and the "best" checkpoint is the snapshot that happened to predict the eval window's class distribution (140/215 = 65.1%, matches eval_acc exactly). Confidence head Spearman = 0.023 ⇒ uncalibrated. Same data-bound failure mode as pose_v1 (#645), bounded by single-session training data; same fix path (multi-room).

What v0.0.1 still validates end-to-end:

  * PyTorch → safetensors → Candle Rust loads cleanly on first try. `cog-person-count health` reports `backend: candle-cpu` and emits real per-frame predictions instead of the stub backend's hard-coded {1 person, 0 confidence}. Architecture parity between train-count.py and src/inference.rs::CountNet is bit-exact.
  * ONNX export bit-clean (16 KB, opset 18, dynamic batch axis).
  * Training wall time: 5.6 s for 400 epochs on RTX 5080.
  * Binary size unchanged (2.36 MB stripped), model loads via mmap at runtime.

This commit ships:

  * scripts/align-ground-truth.js: extended to emit n_persons_mode + n_persons_max per window so the training pipeline has count labels. Backwards-compatible (additive fields).
  * scripts/train-count.py: new — mirrors CountNet architecture exactly, loads paired.jsonl, trains 400 epochs with CE+BCE+Brier loss, exports safetensors + ONNX + per-epoch JSON.
  * v2/.../cog/artifacts/{count_v1.safetensors,count_v1.onnx,count_train_results.json}: the trained artifacts.
  * v2/.../cog/README.md: Status table updated with the v0.0.1 numbers + an Honest Caveat section explaining the data-bound result.
  * docs/benchmarks/person-count-cog.md: new — full v0.0.1 benchmark log mirroring the format docs/benchmarks/pose-estimation-cog.md established. Includes comparison to ADR-103 v0.1.0 acceptance gates and per-class breakdown.

Still pending:

  * `run` subcommand wiring (long-running polling loop, same as pose)
  * Cross-compile + sign + GCS upload (mirror of pose cog pipeline)
  * Live install on cognitum-v0
  * v0.2.0: re-train on multi-room data, LoRA per-room adapters, Stoer-Wagner min-cut clip in fusion stage
rUv	6959a42312	feat(cog-person-count): v0.0.1 scaffold + tests + fusion math + bench (ADR-103) (#694)	2026-05-21 18:46:57 -04:00

First implementation PR for ADR-103. Same incremental shape that ADR-101 used: scaffold the cog crate, ship a stub-backend release that satisfies the runtime contract + 15 tests + measured cold-start, then follow up with the trained count_v1.safetensors in a separate PR.

What ships:

  * v2/crates/cog-person-count/ — new workspace member.
    - Cargo.toml: candle-core/candle-nn 0.9 (cpu default, cuda feature opt-in), safetensors, ureq, sha2 — same dep shape as the pose cog but minus wifi-densepose-train (this cog has no training-side consumer, so the dep tree is materially smaller → 2.36 MB binary vs the pose cog's 4.5 MB).
    - src/inference.rs: CountNet (Conv1d 56→64→128→128 encoder + count head Linear(128→64→8)+softmax + confidence head Linear(128→32→1)+sigmoid). Stub backend returns `{1-person, 0-confidence}` honestly when no safetensors present.
    - src/fusion.rs: fuse_confidence_weighted() — Bayesian product of per-node distributions with confidence-weighted log-sum, plus fuse_with_mincut_clip() hook for the v0.2.0 Stoer-Wagner upper-bound (`ruvector-mincut` dep lands when min-cut graph builder is ready). Confidences floored at 1e-3 and probs floored at 1e-9 before logs — no NaN propagation.
    - src/publisher.rs: emits {count, confidence, count_p95_low, count_p95_high, n_nodes, probs} per ADR-103 §"Output".
    - src/main.rs: full ADR-100 four-verb CLI (version|manifest|health|run). The `run` subcommand explicitly returns "wiring pending v0.0.1" so the in-process library API is the v0.0.1-clean integration path.
    - tests/smoke.rs (8 tests) + fusion::tests (7 tests, in-lib) — 15 total, all green. Cover stub-backend behaviour, wrong-shape rejection, fusion math (empty / single / agreement / high-conf override / normalisation), p95-range correctness, and min-cut clip semantics.
    - cog/{manifest.template.json, config.schema.json, README.md} + cog/artifacts/ placeholder dir.
  * v2/Cargo.toml: registers the new workspace member.

Verified locally:

    cargo check -p cog-person-count --no-default-features → clean
    cargo test -p cog-person-count --no-default-features → 8/8 pass
    cargo test -p cog-person-count --lib → 7/7 pass
    cargo build -p cog-person-count --release → 2.36 MB binary
    ./cog-person-count version → "person-count 0.3.0"
    ./cog-person-count manifest → JSON skeleton
    ./cog-person-count health → backend:stub, count:1, conf:0, p95:[1,1]

Cold-start: 30 sequential `health` invocations → 53.3 ms/invocation (vs cog-pose-estimation's 76.2 ms — smaller dep tree)

cog/README.md adds:

  * Security section — six-row threat table covering safetensor mmap trust, non-finite outputs, sensing fetch failures, fusion divide-by-zero / log-of-zero, min-cut degenerate cases, and stdout spoofing.
  * Performance / optimization section — binary size, release profile (already opt-level=3 / lto=fat / codegen-units=1 / strip=true at workspace level), cold-start comparison table, projected warm-path latency budget.

Still pending (separate PRs, ADR-103 §"Migration"):

  * Train count_v1.safetensors on the existing 1,077 paired samples with `n_persons` labels (Candle on RTX 5080, same script that produced pose_v1.safetensors yesterday).
  * `run` subcommand wiring (long-running polling loop, same shape as cog-pose-estimation::runtime).
  * Cross-compile + sign + GCS upload (mirror of cog-pose-estimation release pipeline).
  * Server-side `csi.rs::score_to_person_count` call-site rewire to consume this cog when installed; falls back to PR #491's heuristic when not.
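The fusion rule summarised above (confidence-weighted log-sum with floors at 1e-3 and 1e-9) can be illustrated with a small standalone sketch. This is not the contents of `src/fusion.rs`, which is not shown here. It assumes the log-sum is normalised by the total weight (a weighted geometric mean, so a single node passes through unchanged) and that the fused confidence is the confidence-weighted mean of the inputs. `NodePrediction` and `fuse_log_sum` are hypothetical names.

```rust
const COUNT_CLASSES: usize = 8;
const CONF_FLOOR: f32 = 1e-3; // floor on per-node confidence weights
const PROB_FLOOR: f32 = 1e-9; // floor on probabilities before ln()

#[derive(Clone, Debug)]
struct NodePrediction {
    probs: [f32; COUNT_CLASSES],
    confidence: f32,
}

/// Index of the largest probability (first one wins on ties).
fn argmax(probs: &[f32; COUNT_CLASSES]) -> usize {
    let mut best = 0;
    for i in 1..COUNT_CLASSES {
        if probs[i] > probs[best] {
            best = i;
        }
    }
    best
}

/// Confidence-weighted log-sum fusion (ASSUMED normalisation, see lead-in).
fn fuse_log_sum(nodes: &[NodePrediction]) -> NodePrediction {
    if nodes.is_empty() {
        // Safe default: all mass on "1 person", zero confidence.
        let mut probs = [0.0; COUNT_CLASSES];
        probs[1] = 1.0;
        return NodePrediction { probs, confidence: 0.0 };
    }

    let mut log_score = [0.0f32; COUNT_CLASSES];
    let mut weight_sum = 0.0f32;
    let mut conf_acc = 0.0f32;
    for n in nodes {
        let w = n.confidence.max(CONF_FLOOR);
        weight_sum += w;
        conf_acc += w * n.confidence;
        for (s, p) in log_score.iter_mut().zip(n.probs.iter()) {
            *s += w * (*p).max(PROB_FLOOR).ln();
        }
    }

    // Softmax over the weight-normalised log scores, shifted by the max
    // for numerical stability.
    let max = log_score.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut probs = [0.0f32; COUNT_CLASSES];
    let mut z = 0.0f32;
    for (p, s) in probs.iter_mut().zip(log_score.iter()) {
        *p = ((*s - max) / weight_sum).exp();
        z += *p;
    }
    for p in probs.iter_mut() {
        *p /= z;
    }

    NodePrediction { probs, confidence: conf_acc / weight_sum }
}
```

The two floors keep both the weights and the logarithms finite, which is how a zero-confidence node or a zero-probability class avoids producing NaN in the fused output.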
rUv	962e0f4a34	docs(adr): ADR-103 — learned multi-person counter (SOTA path) (#693)	2026-05-21 18:28:18 -04:00

Motivated by #499 (multi-node double-skeletons) which PR #491 stopped the bleeding on but didn't take to the WiFi-CSI literature's state of the art. Designs a learned counter that replaces today's slot heuristic + dedup_factor knob, reusing the primitives we've already shipped this week:

  * Candle / RTX 5080 training pipeline (proven yesterday, 2.1 s for 400 epochs on pose_v1.safetensors)
  * HF presence encoder as initialization (architectures compatible, unlike the pose head case)
  * ruvector-mincut (Stoer-Wagner) for multi-node fusion upper-bound
  * Cog packaging spec (ADR-100) + edge module registry (ADR-102)
  * Paired-data pipeline (PR #641 streaming-safe align-ground-truth.js) — `n_persons` labels come for free; no new data collection campaign required to bootstrap.

Architecture:

    per-node CSI [56×20] -> frozen HF encoder -> 128-dim embedding
                                                   \
                                                    > count head (softmax {0..7})
                                                    > confidence head (sigmoid)

    N nodes' distributions -> confidence-weighted log-sum
                           -> Stoer-Wagner min-cut upper-bound clip
                           -> { count, confidence, count_p95_low,
                                count_p95_high, per_node_breakdown }

Compares the proposal explicitly against WiCount / DeepCount / CrossCount / HeadCount published numbers and is honest about the hardware gap (their 3x3 MIMO research NICs vs our 1x1 SISO ESP32-S3).

v0.1.0 acceptance gates target >=80% within-+/-1 same-room and >=60% cross-room — modest on purpose; bounded by the same paired-data scarcity #645 documents for pose. The framework is the deliverable; the accuracy follows the data.

Includes:

  * Architecture diagram in ascii
  * Comparison table vs published WiFi-CSI counting SOTA
  * Per-failure-mode mapping from #499 symptoms to how the learned counter addresses each
  * v0.1.0 + v0.2.0 acceptance gates with measurable thresholds
  * Repo layout for the new `v2/crates/cog-person-count/` crate
  * Five-step migration plan from this ADR -> first GCS release

Status: Proposed. Implementation follows in the same incremental pattern ADR-101 used: scaffold-cog PR -> train+publish PR -> server-wiring PR.
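The `count_p95_low` / `count_p95_high` fields in the output above come from growing an interval outward from the most likely count until it covers 95% of the probability mass. The sketch below mirrors the loop in `CountPrediction::p95_range` from `src/inference.rs`, rewritten as a free function. The starting point (interval initialised at the arg-max class) and the `argmax` helper are assumptions, since that part of the method is not visible in this diff.

```rust
const COUNT_CLASSES: usize = 8;

/// Index of the largest probability (first one wins on ties).
/// ASSUMPTION: stands in for CountPrediction::argmax, which is not shown.
fn argmax(probs: &[f32; COUNT_CLASSES]) -> usize {
    let mut best = 0;
    for i in 1..COUNT_CLASSES {
        if probs[i] > probs[best] {
            best = i;
        }
    }
    best
}

/// Smallest contiguous [lo, hi] around the mode whose mass reaches 0.95,
/// grown greedily toward whichever neighbour carries more probability.
fn p95_range(probs: &[f32; COUNT_CLASSES]) -> (usize, usize) {
    let mode = argmax(probs);
    let (mut lo, mut hi) = (mode, mode);
    let mut acc = probs[mode];
    while acc < 0.95 && (lo > 0 || hi < COUNT_CLASSES - 1) {
        // -1.0 marks a side that cannot be extended any further.
        let left = if lo > 0 { probs[lo - 1] } else { -1.0 };
        let right = if hi < COUNT_CLASSES - 1 { probs[hi + 1] } else { -1.0 };
        if left >= right && lo > 0 {
            lo -= 1;
            acc += probs[lo];
        } else if hi < COUNT_CLASSES - 1 {
            hi += 1;
            acc += probs[hi];
        } else if lo > 0 {
            lo -= 1;
            acc += probs[lo];
        } else {
            break;
        }
    }
    (lo, hi)
}
```

A one-hot distribution yields a degenerate interval at the mode, which is consistent with the stub backend's `p95:[1,1]` health output, while a flat distribution has to expand to the full `[0, 7]` range before reaching 95%.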
ruv	c58f49f21a	fix(firmware): add vTaskDelay(1) yields in process_frame() at tier>=2 to fix WDT storm (#683)	2026-05-21 09:20:21 -04:00

At edge tier>=2 on N16R8 PSRAM boards, `process_frame()` runs `update_multi_person_vitals()` (4 persons × 256 history samples) plus `wasm_runtime_on_frame()` back-to-back before returning to `edge_task()`. The existing `vTaskDelay(1)` in `edge_task()` only fires after `process_frame()` returns — under sustained 30 pps CSI load on PSRAM boards this leaves IDLE1 on Core 1 starved long enough for the 5-second Task Watchdog Timer to fire.

Fix: add two `vTaskDelay(1)` calls inside `process_frame()`, both gated on `s_cfg.tier >= 2`:

    1. After `update_multi_person_vitals()` (Step 11)
    2. After `wasm_runtime_on_frame()` dispatch (Step 14)

Tier 0/1 paths are unaffected. Validated on COM7 (N16R8 board): `Edge DSP task started on core 1 (tier=2)`, no WDT panics in 20 s.

Also bump firmware version 0.6.5 → 0.6.6 and refresh all 6 release_bins with the new build (8MB + 4MB variants, built 2026-05-21). Fix-marker RuView#683 added to scripts/fix-markers.json.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruv	cbcb389cb6	assets: add seed.png (Cognitum Seed hero image) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-21 00:47:01 -04:00
ruv	e00cee6146	docs(readme): add Cognitum Seed image after hero — links to cognitum.one/seed Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-21 00:45:30 -04:00
rUv	5dcafc9c37	Update README.md https://cognitum.one/seed	2026-05-21 00:30:20 -04:00
@@ -1 +1 @@
-0.6.5
+0.6.6