Compare commits

4 Commits

Author SHA1 Message Date
ruv b16d7431bc docs(bench): append v0.0.2 section to person-count benchmark log
Documents the K-fold diagnostic (62.2 ± 1.9% / class-1 57.1%) that
justified v0.0.2, the v0.0.2 numbers (class-1 0% → 34.3%), and the
honest read that the gap to the K-fold mean is run-to-run variance,
not a missing improvement.
2026-05-21 19:47:55 -04:00
rUv b3a5012dbd feat(cog-person-count): v0.0.2 — K-fold + label-smoothing + temperature-calibrated (#699)
* chore: stage v0.0.2 artifacts + temperature scalar for build pipeline

Stages count_v1.{safetensors,onnx,temperature,train_results.json}
ahead of the build/sign/upload step. This commit is a momentary
side-effect — the next commit will refresh the per-arch manifests
with the new binary SHAs once ruvultra finishes the cross-build.

The .temperature file holds the calibration scalar from LBFGS over the
held-out conf logits. The Rust cog will read it post-load and divide
conf_logits by it before sigmoid, exactly matching the Python eval.

* feat(cog-person-count): v0.0.2 — K-fold validated, label smoothing + early stop + temp scale

The v0.0.1 "65.1% but class-1=0%" result was an unlucky temporal split
that let a degenerate "always predict 0" classifier hit eval acc =
class-0 fraction. 5-fold stratified random CV proved the architecture
actually learns ~57.1% class-1 accuracy under fair splits — a real,
modestly useful signal.
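
The arithmetic behind the degenerate result can be sketched as follows. This is a minimal illustration, not code from the repository: the label list is a hypothetical 20-sample eval window, and both helper names are assumptions made for the example.

```python
from collections import Counter


def majority_baseline_accuracy(eval_labels):
    """Accuracy of a classifier that always predicts the most common label."""
    most_common_count = Counter(eval_labels).most_common(1)[0][1]
    return most_common_count / len(eval_labels)


def per_class_accuracy(eval_labels, predictions):
    """Fraction of correct predictions within each true class."""
    hits, totals = Counter(), Counter()
    for truth, pred in zip(eval_labels, predictions):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical class-0-heavy eval window (13 zeros, 7 ones).
eval_labels = [0] * 13 + [1] * 7
always_zero = [0] * len(eval_labels)

overall = majority_baseline_accuracy(eval_labels)
by_class = per_class_accuracy(eval_labels, always_zero)
```

Here the overall accuracy equals the class-0 fraction while class-1 accuracy is zero, which is why per-class accuracy is the more informative number to watch.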

v0.0.2 ships a retrained model that:

* **Splits randomly (seed=42) 80/20** instead of temporally — eliminates
  the trailing-window-class-imbalance cheat.
* **Class-balanced sampler** (multinomial with replacement, weighted by
  inverse class frequency) — per-batch expected counts are equal
  regardless of dataset distribution.
* **Label smoothing 0.1** on the cross-entropy — reduces confidence
  saturation that drove v0.0.1's all-or-nothing predictions.
* **Early stopping** with patience=20 — stops at epoch 29 instead of
  overfitting through 400.
* **Temperature scaling** of the conf head — LBFGS fits a scalar T on
  held-out conf logits; ships as a count_v1.temperature sidecar so the
  Rust cog can divide conf_logits by T before sigmoid.
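
The class-balanced sampler above can be sketched in plain Python. This is an illustration under assumptions, not the training script itself: the helper names and the 90/10 label mix are hypothetical, and the real script draws indices with `torch.multinomial` rather than computing the masses explicitly.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weight 1 / (number of samples sharing that label)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def class_sampling_mass(labels, weights):
    """Total sampling probability that lands on each class."""
    total = sum(weights)
    mass = {}
    for y, w in zip(labels, weights):
        mass[y] = mass.get(y, 0.0) + w / total
    return mass


# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
mass = class_sampling_mass(labels, weights)
```

Each class ends up with the same total probability mass, so a batch drawn with replacement from these weights has equal expected counts per class regardless of how skewed the dataset is.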

Numbers on the same data:

  | Metric           | v0.0.1 | v0.0.2 | K-fold (5x100) |
  |------------------|--------|--------|----------------|
  | Overall acc      | 65.1%  | 62.3%  | 62.2% ± 1.9%   |
  | Class 0 acc      | 100%   | 86.2%  | 67.4%          |
  | Class 1 acc      |  0%    | 34.3%  | 57.1% ✓        |
  | MAE              | 0.349  | 0.377  | 0.378          |
  | Spearman         | 0.023  | 0.013  | 0.160          |

Class-1 accuracy 0 → 34.3% is the headline win. Net acc moves slightly
because we stopped cheating on class 0. K-fold's 57% says there's
headroom remaining; reaching it needs more independent splits (== more
data), not more training tricks.

Confidence calibration didn't move. Temperature scaling alone can't fix
a confidence head trained against a noisy argmax==truth indicator over
a 62%-accurate classifier — the head's training signal is the issue,
not its post-hoc transform. The honest fix is multi-room data (#645),
not another calibration knob.

Live on cognitum-v0 at /var/lib/cognitum/apps/person-count/ — health
reports candle-cpu backend, count = 1 (was 0 in v0.0.1) on synthetic
zero input.

Files changed:
* scripts/train-count.py — adds --k-fold (no sklearn dep, hand-rolled
  stratified splits with deterministic shuffle) and --v2 paths.
* v2/.../cog/artifacts/count_v1.safetensors (392 KB, new sha
  32996433…) + count_v1.onnx (16 KB) + count_v1.temperature (0.9262
  scalar) + count_train_results.json (full epoch trace).
* v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json bumped to
  version 0.0.2 with the new weights_sha256 + caveats.
* docs/benchmarks/person-count-cog.md — appends a v0.0.2 section
  with the K-fold diagnostic table and honest-read paragraph.
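
A dependency-free sketch of the stratified split idea follows. It is a variation for illustration, not the function in `scripts/train-count.py`: it deals shuffled indices round-robin instead of using `np.array_split`, uses the standard-library `random` module, and the function name is an assumption.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=42):
    """Return k lists of sample indices with per-class counts within 1 of each other."""
    rng = random.Random(seed)  # deterministic shuffle
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Deal this class's indices round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Hypothetical labels: 13 samples of class 0 and 7 of class 1.
labels = [0] * 13 + [1] * 7
folds = stratified_folds(labels, k=5)
class1_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Fold `i` serves as the validation set and the remaining folds as the training set; every sample is validated exactly once.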

GCS:
  gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors
    refreshed (binaries unchanged — load weights via mmap at runtime).
2026-05-21 19:47:04 -04:00
rUv e6a5df36eb chore(cog-person-count): refresh GCS manifests after run-wiring rebuild (#698)
The arm + x86_64 manifests committed in #696 referenced the binaries
built before #697 wired the `run` subcommand. Rebuilt + re-signed +
re-uploaded to GCS, and re-deployed to cognitum-v0:

  arm    sha 15c2fbac…7728ea5  (3,807,456 B, up from 2,168,816 — added Tokio runtime)
  x86_64 sha 051614ce…cc8388b3 (4,502,960 B, up from 2,615,528)

Both re-signed Ed25519 with COGNITUM_OWNER_SIGNING_KEY. Manifests
now match the binaries published at gs://cognitum-apps/cogs/{arm,
x86_64}/cog-person-count-* and the binary installed at
/var/lib/cognitum/apps/person-count/ on cognitum-v0.
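
A consumer could check a downloaded artifact against the manifest's byte count and SHA-256 roughly as below. This is a sketch under assumptions: the function name is hypothetical, the payload is a stand-in for a real binary, and the Ed25519 signature check is deliberately left out because it needs a third-party library.

```python
import hashlib


def matches_manifest(data: bytes, expected_sha256: str, expected_bytes: int) -> bool:
    """True when both the size and the SHA-256 digest match the manifest entry."""
    if len(data) != expected_bytes:
        return False
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


# Stand-in payload; a real check would read the downloaded binary instead.
payload = b"example artifact bytes"
manifest_entry = {
    "binary_bytes": len(payload),
    "binary_sha256": hashlib.sha256(payload).hexdigest(),
}

ok = matches_manifest(payload, manifest_entry["binary_sha256"], manifest_entry["binary_bytes"])
tampered = matches_manifest(payload + b"!", manifest_entry["binary_sha256"], manifest_entry["binary_bytes"])
```

Checking the size first is cheap and catches truncated downloads before hashing.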
2026-05-21 19:13:10 -04:00
rUv 5c914e63c7 feat(cog-person-count): wire run subcommand — v0.0.1 fully functional (#697)
Phase 4 of ADR-103. Adds the long-running polling loop so the cog's
fourth verb (`run`) does real work, completing the ADR-100 runtime
contract end-to-end:

  cog-person-count version    → "person-count 0.3.0"
  cog-person-count manifest   → JSON skeleton
  cog-person-count health     → loads weights + 1-shot infer + emit
  cog-person-count run --config  → long-running per-frame emit  ← THIS

What ships:

* src/runtime.rs (new) — `run_loop` polls sensing_url every poll_ms,
  slides a [56, 20] CSI window, runs InferenceEngine::infer, emits
  publisher::person_count events. Same shape as
  cog-pose-estimation::runtime — fetch_frame extracts amplitudes
  from `snapshot.nodes[0].amplitude[]`, fails open on connect errors
  with a WARN log rather than crashing.
* src/lib.rs — registers the runtime module.
* src/main.rs — cmd_run now loads RunConfig from a JSON file, builds
  the InferenceEngine (with weights if cfg.model_path is set,
  otherwise auto-discover), emits a run.started event, and hands off
  to the Tokio multi-thread runtime's block_on(run_loop). Single-node
  fusion is a no-op for N=1 today; v0.2.0 will append predictions
  from sibling nodes and call fusion::fuse_confidence_weighted before
  emit.
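
The sliding-window behaviour of `run_loop` can be sketched in Python as follows. This is an illustration, not a port of `src/runtime.rs`: it swaps the Rust `Vec` plus manual trimming for a bounded `deque`, treats the window as a flat buffer of 56 × 20 values without assuming an axis order, and the class name is hypothetical.

```python
from collections import deque

N_SUBCARRIERS = 56
N_TIMESTEPS = 20
WINDOW = N_SUBCARRIERS * N_TIMESTEPS


class SlidingWindow:
    """Keeps only the most recent `capacity` amplitude values."""

    def __init__(self, capacity=WINDOW):
        self._buf = deque(maxlen=capacity)  # old values fall off the left

    def push(self, amplitudes):
        self._buf.extend(amplitudes)

    def ready(self):
        """True once enough samples have arrived to fill one window."""
        return len(self._buf) == self._buf.maxlen

    def latest(self):
        return list(self._buf)


# Tiny capacity so the behaviour is easy to follow.
window = SlidingWindow(capacity=4)
window.push([1.0, 2.0, 3.0])
ready_before = window.ready()
window.push([4.0, 5.0])
ready_after = window.ready()
snapshot = window.latest()
```

Inference would run only when `ready()` is true, on the values returned by `latest()`.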

Verified locally:

  cargo check  -p cog-person-count --no-default-features   → clean
  cargo test   -p cog-person-count                          → 15/15 pass (no regressions)
  cargo build  -p cog-person-count --release                → 2.36 MB unchanged
  ./cog-person-count run --config bad-config.json:
    line 1: {"event":"run.started","fields":{"cog":"person-count",
             "sensing_url":"http://127.0.0.1:9999/...","poll_ms":100,
             "model_path":"(auto-discover)"}}
    line 2: WARN sensing-server fetch failed
            error=Connection Failed: Connect error: actively refused
    (loop alive — exits cleanly on SIGTERM, no crash, no NaN)

Also adds a "Relationship to the in-process score_to_person_count
heuristic" section to cog/README.md explaining the dual-emitter
design (sensing-server keeps emitting the PR #491 slot heuristic;
the cog runs out-of-process and emits person.count events from the
learned model). Operators choose by installing the cog or not — no
sensing-server rebuild required.

ADR-103 §"Migration" status:
  1. Land ADR + scaffold ........... done (#693, #694)
  2. Train count_v1 ................ done (#695)
  3. Cross-compile + sign + GCS .... done (#696)
  4. Server-side wiring ............ done — out-of-process design
                                      means no rewire needed; this
                                      cog is the wiring.
  5. v0.2.0 multi-room + LoRA ...... data-bound (#645)
2026-05-21 19:10:15 -04:00
12 changed files with 755 additions and 3182 deletions
+60
@@ -2,6 +2,66 @@
Append-only log of every published count_v1 training run per ADR-103. New runs add a section; never overwrite history.
## v0.0.2 — K-fold validated, random split + label smoothing + early stop + temp scale (2026-05-21)
### Why a new release
A 5-fold stratified CV on the same 1,077 samples proved the v0.0.1 result was driven by an unlucky temporal split — the trailing window was class-0-heavy, and a degenerate "always predict 0" classifier hit the class-0 fraction (65.1%) trivially.
| Metric | v0.0.1 (temporal) | **5-fold random CV** (diagnostic) |
|---|---|---|
| Overall accuracy | 65.1% | 62.2% ± 1.9% |
| Class 1 accuracy | **0%** | **57.1%** ✓ |
| Confidence Spearman | 0.023 | 0.160 ± 0.029 |
The architecture has real ~57% class-1 capacity under fair splits.
### v0.0.2 results
Architecture unchanged. Training changes only:
- **Random 80/20 split** (seed=42) — temporal split eliminated.
- **Label smoothing 0.1** on cross-entropy.
- **Class-balanced multinomial sampler** with replacement.
- **Early stopping** with patience 20 (exited at epoch 29 of 400 max).
- **Temperature scaling** of the conf head via LBFGS — T = **0.9262**, shipped as a `count_v1.temperature` sidecar.
| Metric | v0.0.1 | **v0.0.2** | K-fold ref |
|---|---|---|---|
| Overall accuracy | 65.1% | **62.3%** | 62.2% ± 1.9% |
| Class 0 accuracy | 100% (cheating) | **86.2%** | 67.4% |
| **Class 1 accuracy** | **0%** | **34.3%** ✓ | 57.1% |
| MAE | 0.349 | 0.377 | 0.378 |
| Confidence Spearman (post-temp) | 0.023 | 0.013 | 0.160 |
| Wall time | 5.6 s (400 ep) | **0.7 s (29 ep)** | 7.5 s (5×100) |
### Honest read
**Class-1 accuracy 0% → 34.3% is the headline.** The cog can now report `count = 1` when a person is present (it does so on 34.3% of class-1 eval samples) instead of always predicting zero. A single random draw lands below the K-fold mean of 57%; that gap is run-to-run variance, not a missing improvement. Reaching 57% on a fixed eval set needs averaging over independent draws, which means more independent recordings, i.e. multi-room data (#645), not another training trick.
Confidence calibration didn't move. Temperature scaling alone can't fix a confidence head trained against a noisy `argmax==truth` indicator over a 62%-accurate classifier — its training signal is the bottleneck.
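
One way to see why a scalar temperature cannot change a rank-based metric: dividing logits by a positive T is a monotonic transform, so the ordering of confidences is preserved. The sketch below is illustrative only; the logits are hypothetical and the helper names are assumptions, while T = 0.9262 is the value reported above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibrated(logits, temperature):
    """Confidence after temperature scaling: sigmoid(logit / T)."""
    return [sigmoid(z / temperature) for z in logits]


def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


# Hypothetical confidence logits for four eval samples.
logits = [-2.0, 0.3, 1.5, -0.7]
raw = calibrated(logits, 1.0)
scaled = calibrated(logits, 0.9262)
same_order = ranks(raw) == ranks(scaled)
```

The individual confidence values shift, but their ordering does not, so a rank correlation such as Spearman between confidence and correctness is unaffected by the temperature.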
### Release artifacts (live on cognitum-v0)
```
gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors
sha256: 32996433516891a37c63c600db8b95e42192a53bd538c088c82cd6a85e55513c
bytes: 392,088
```
Binaries themselves unchanged from v0.0.1 — weights load at runtime via mmap. Per-arch manifests under `cog/artifacts/manifests/{arm,x86_64}/` bumped to `version: 0.0.2`, weights_sha256 + build_metadata caveats updated.
### Reproducibility
```bash
python3 scripts/train-count.py --paired data/paired/wiflow-p7-1779210883.paired.jsonl \
--k-fold 5 --epochs 100 --out-results kfold_results.json
python3 scripts/train-count.py --paired data/paired/wiflow-p7-1779210883.paired.jsonl \
--v2 --epochs 400 \
--out-safetensors count_v1.safetensors --out-onnx count_v1.onnx \
--out-results count_train_results.json
```
## v0.0.1 — first measured run (2026-05-21)
### Setup
+401
@@ -95,6 +95,29 @@ def temporal_split(X: np.ndarray, y: np.ndarray, eval_frac: float = 0.2):
)
def stratified_k_fold(X: np.ndarray, y: np.ndarray, k: int = 5):
    """Stratified k-fold cross-validation splits (hand-rolled, no sklearn).

    Per class: shuffle the indices (deterministic seed 42), split into k
    near-equal chunks, then assemble fold i by taking chunk i from every
    class. Yields (X_train, y_train, X_val, y_val) per fold, with class
    distribution preserved within ±1.
    """
    rng = np.random.default_rng(seed=42)
    classes = np.unique(y)
    per_class_folds = {}
    for c in classes:
        idx = np.where(y == c)[0]
        rng.shuffle(idx)
        per_class_folds[c] = np.array_split(idx, k)

    for fold in range(k):
        val_idx = np.concatenate([per_class_folds[c][fold] for c in classes])
        train_idx = np.concatenate(
            [per_class_folds[c][f] for c in classes for f in range(k) if f != fold]
        )
        yield X[train_idx], y[train_idx], X[val_idx], y[val_idx]


def standardise(X_train: np.ndarray, X_eval: np.ndarray):
    """Z-score by subcarrier across the time axis. Eval uses train stats."""
    mu = X_train.mean(axis=(0, 2), keepdims=True)
@@ -154,6 +177,12 @@ def main():
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--weight-decay", type=float, default=0.01)
    parser.add_argument("--k-fold", type=int, default=None, help="If set, run k-fold CV; else use temporal split")
    parser.add_argument("--v2", action="store_true",
                        help="v0.0.2 training: random 80/20 split + label smoothing + early stopping "
                             "+ balanced sampling + temperature-scaled confidence head.")
    parser.add_argument("--label-smoothing", type=float, default=0.1)
    parser.add_argument("--patience", type=int, default=20)
    args = parser.parse_args()

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
@@ -163,6 +192,378 @@ def main():
    print(f"loaded {X.shape[0]} samples, X shape {X.shape}, "
          f"label distribution: {dict(Counter(y.tolist()).most_common())}")

    # K-fold cross-validation mode
    if args.k_fold is not None:
        print(f"\n=== {args.k_fold}-fold cross-validation ===")
        fold_results = []
        overall_t0 = time.perf_counter()

        for fold_idx, (X_train, y_train, X_val, y_val) in enumerate(stratified_k_fold(X, y, k=args.k_fold)):
            print(f"\nFold {fold_idx + 1}/{args.k_fold}")
            X_train, X_val = standardise(X_train, X_val)

            # Inverse-frequency class weights, normalised to sum to COUNT_CLASSES.
            cls_counts = np.bincount(y_train, minlength=COUNT_CLASSES).astype(np.float32)
            cls_counts = np.where(cls_counts > 0, cls_counts, 1.0)
            cls_weight = (1.0 / cls_counts) / (1.0 / cls_counts).sum() * COUNT_CLASSES
            cls_weight_t = torch.from_numpy(cls_weight).to(device)

            Xt = torch.from_numpy(X_train).to(device)
            yt = torch.from_numpy(y_train).to(device)
            Xv = torch.from_numpy(X_val).to(device)
            yv = torch.from_numpy(y_val).to(device)

            model = CountNet().to(device)
            opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
            sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=50, T_mult=1)

            n_train = X_train.shape[0]
            best_eval_acc = 0.0
            best_state = None

            for epoch in range(args.epochs):
                model.train()
                perm = torch.randperm(n_train, device=device)
                train_loss = 0.0
                train_correct = 0
                n_batches = 0
                for i in range(0, n_train, args.batch_size):
                    idx = perm[i : i + args.batch_size]
                    xb = Xt[idx]
                    yb = yt[idx]
                    opt.zero_grad()
                    count_logits, conf_logits = model(xb)
                    ce = F.cross_entropy(count_logits, yb, weight=cls_weight_t)
                    with torch.no_grad():
                        pred = count_logits.argmax(dim=1)
                        correct_indicator = (pred == yb).float().unsqueeze(1)
                    bce = F.binary_cross_entropy_with_logits(conf_logits, correct_indicator)
                    # NOTE: computed under no_grad, so the Brier term shifts the
                    # reported loss value but contributes no gradient.
                    with torch.no_grad():
                        conf_sigm = torch.sigmoid(conf_logits)
                        brier = ((conf_sigm - correct_indicator) ** 2).mean()
                    loss = ce + 0.3 * bce + 0.1 * brier
                    loss.backward()
                    opt.step()
                    train_loss += loss.item()
                    train_correct += (pred == yb).sum().item()
                    n_batches += 1
                sched.step()

                model.eval()
                with torch.no_grad():
                    cl_v, _ = model(Xv)
                    eval_pred = cl_v.argmax(dim=1)
                    eval_acc = (eval_pred == yv).float().mean().item()
                if eval_acc > best_eval_acc:
                    best_eval_acc = eval_acc
                    best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}

            # Restore best checkpoint and final eval
            if best_state is not None:
                model.load_state_dict(best_state)
            model.eval()
            with torch.no_grad():
                cl_v, conf_v = model(Xv)
                pred_v = cl_v.argmax(dim=1)
                acc = (pred_v == yv).float().mean().item()
                within1 = ((pred_v - yv).abs() <= 1).float().mean().item()
                mae = (pred_v - yv).abs().float().mean().item()

            # Per-class accuracy
            per_class = {}
            for k in range(COUNT_CLASSES):
                mask = yv == k
                n = mask.sum().item()
                if n > 0:
                    per_class[k] = {
                        "support": int(n),
                        "accuracy": ((pred_v == yv) & mask).sum().item() / n,
                    }

            # Spearman
            conf_sigm = torch.sigmoid(conf_v).squeeze(-1)
            correct = (pred_v == yv).float()
            c_rank = conf_sigm.argsort().argsort().float()
            r_rank = correct.argsort().argsort().float()
            c_centered = c_rank - c_rank.mean()
            r_centered = r_rank - r_rank.mean()
            denom = (c_centered.norm() * r_centered.norm()).item()
            spearman = (c_centered * r_centered).sum().item() / denom if denom > 0 else 0.0

            fold_results.append({
                "fold": fold_idx + 1,
                "accuracy": acc,
                "within_pm1": within1,
                "mae": mae,
                "spearman": spearman,
                "per_class_accuracy": per_class,
            })
            print(f" accuracy={acc:.3f} within±1={within1:.3f} mae={mae:.3f} spearman={spearman:.3f}")

        # K-fold summary
        total_time = time.perf_counter() - overall_t0
        accs = [r["accuracy"] for r in fold_results]
        within1s = [r["within_pm1"] for r in fold_results]
        maes = [r["mae"] for r in fold_results]
        spears = [r["spearman"] for r in fold_results]
        print(f"\n=== {args.k_fold}-fold summary ({total_time:.1f} s) ===")
        print(f" accuracy: {np.mean(accs):.3f} ± {np.std(accs):.3f}")
        print(f" within ±1: {np.mean(within1s):.3f} ± {np.std(within1s):.3f}")
        print(f" MAE: {np.mean(maes):.3f} ± {np.std(maes):.3f}")
        print(f" conf↔correct Spearman: {np.mean(spears):.3f} ± {np.std(spears):.3f}")

        # Per-class summary across folds
        for k in range(COUNT_CLASSES):
            accs_k = [r["per_class_accuracy"].get(k, {}).get("accuracy", 0.0) for r in fold_results]
            n_k = [r["per_class_accuracy"].get(k, {}).get("support", 0) for r in fold_results]
            if any(n > 0 for n in n_k):
                print(f" class {k}: {np.mean(accs_k):.3f} mean accuracy (support: {n_k})")

        # Write k-fold results to JSON
        results = {
            "mode": "k_fold_cv",
            "k": args.k_fold,
            "backend": "pytorch-cuda" if device.type == "cuda" else "pytorch-cpu",
            "total_time_s": total_time,
            "fold_results": fold_results,
            "summary": {
                "mean_accuracy": float(np.mean(accs)),
                "std_accuracy": float(np.std(accs)),
                "mean_within_pm1": float(np.mean(within1s)),
                "std_within_pm1": float(np.std(within1s)),
                "mean_mae": float(np.mean(maes)),
                "std_mae": float(np.std(maes)),
                "mean_spearman": float(np.mean(spears)),
                "std_spearman": float(np.std(spears)),
            },
            "hyperparameters": {
                "optimizer": "AdamW",
                "lr": args.lr,
                "weight_decay": args.weight_decay,
                "batch_size": args.batch_size,
                "schedule": "cosine_warm_restarts",
                "epochs": args.epochs,
            },
        }
        Path(args.out_results).write_text(json.dumps(results, indent=2))
        print(f"\nwrote {args.out_results}")
        return
    # ---------------------------------------------------------------
    # v0.0.2 training path: random 80/20 + label smoothing + early
    # stopping + class-balanced batch sampling + temperature scaling.
    # ---------------------------------------------------------------
    if args.v2:
        rng = np.random.default_rng(seed=42)
        idx = np.arange(X.shape[0])
        rng.shuffle(idx)
        n_eval = int(round(0.2 * X.shape[0]))
        eval_idx, train_idx = idx[:n_eval], idx[n_eval:]
        X_train, X_eval = X[train_idx], X[eval_idx]
        y_train, y_eval = y[train_idx], y[eval_idx]
        X_train, X_eval = standardise(X_train, X_eval)
        print(f"v0.0.2 mode, random 80/20 split: train={len(y_train)} eval={len(y_eval)}")
        print(f" train class dist: {dict(Counter(y_train.tolist()).most_common())}")
        print(f" eval class dist: {dict(Counter(y_eval.tolist()).most_common())}")

        Xt = torch.from_numpy(X_train).to(device)
        yt = torch.from_numpy(y_train).to(device)
        Xe = torch.from_numpy(X_eval).to(device)
        ye = torch.from_numpy(y_eval).to(device)

        # Class-balanced sampler: for each batch, sample with replacement
        # so each class has equal expected count regardless of dataset
        # distribution. With our ~533/544 split this is nearly a no-op
        # but it generalises to imbalanced multi-room data later.
        cls_counts = np.bincount(y_train, minlength=COUNT_CLASSES).astype(np.float32)
        cls_counts = np.where(cls_counts > 0, cls_counts, 1.0)
        per_sample_weight = (1.0 / cls_counts[y_train])
        per_sample_weight_t = torch.from_numpy(per_sample_weight.astype(np.float32)).to(device)

        model = CountNet().to(device)
        opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
        sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=50, T_mult=1)

        n_train = X_train.shape[0]
        batches_per_epoch = max(1, n_train // args.batch_size)
        epoch_losses = []
        t0 = time.perf_counter()
        best_eval_acc = 0.0
        best_state = None
        epochs_without_improvement = 0

        for epoch in range(args.epochs):
            model.train()
            train_loss = 0.0; train_correct = 0; n_batches = 0
            for _ in range(batches_per_epoch):
                # Balanced sample with replacement
                idx_t = torch.multinomial(per_sample_weight_t, args.batch_size, replacement=True)
                xb = Xt[idx_t]; yb = yt[idx_t]
                opt.zero_grad()
                count_logits, conf_logits = model(xb)
                ce = F.cross_entropy(count_logits, yb, label_smoothing=args.label_smoothing)
                with torch.no_grad():
                    pred = count_logits.argmax(dim=1)
                    correct_indicator = (pred == yb).float().unsqueeze(1)
                bce = F.binary_cross_entropy_with_logits(conf_logits, correct_indicator)
                # NOTE: computed under no_grad, so the Brier term shifts the
                # reported loss value but contributes no gradient.
                with torch.no_grad():
                    conf_sigm = torch.sigmoid(conf_logits)
                    brier = ((conf_sigm - correct_indicator) ** 2).mean()
                loss = ce + 0.3 * bce + 0.1 * brier
                loss.backward()
                opt.step()
                train_loss += loss.item()
                train_correct += (pred == yb).sum().item()
                n_batches += 1
            sched.step()

            model.eval()
            with torch.no_grad():
                cl_e, _ = model(Xe)
                eval_loss = F.cross_entropy(cl_e, ye).item()
                eval_pred = cl_e.argmax(dim=1)
                eval_acc = (eval_pred == ye).float().mean().item()
            epoch_losses.append({
                "epoch": epoch,
                "train_loss": train_loss / max(1, n_batches),
                "train_acc": train_correct / max(1, n_batches * args.batch_size),
                "eval_loss": eval_loss,
                "eval_acc": eval_acc,
            })
            if eval_acc > best_eval_acc:
                best_eval_acc = eval_acc
                best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
            if epoch < 5 or epoch % 25 == 0:
                print(f"epoch {epoch:3d} train_loss={train_loss/n_batches:.4f} "
                      f"train_acc={train_correct/(n_batches*args.batch_size):.3f} "
                      f"eval_loss={eval_loss:.4f} eval_acc={eval_acc:.3f} "
                      f"epochs_no_improve={epochs_without_improvement}")
            if epochs_without_improvement >= args.patience:
                print(f"early stopping at epoch {epoch} (no improvement for {args.patience} epochs)")
                break

        train_time = time.perf_counter() - t0
        print(f"\ntrained {epoch + 1} epochs in {train_time:.1f} s (best eval_acc {best_eval_acc:.3f})")
        if best_state is not None:
            model.load_state_dict(best_state)

        # Temperature scaling on the confidence head: fit a scalar T such that
        # sigmoid(conf_logits / T) is best-calibrated on the eval set.
        model.eval()
        with torch.no_grad():
            cl_e, conf_e = model(Xe)
            pred_e = cl_e.argmax(dim=1)
            correct_indicator = (pred_e == ye).float()

        # 1D optimisation over T via LBFGS.
        T = torch.nn.Parameter(torch.ones(1, device=device))
        opt_t = torch.optim.LBFGS([T], lr=0.1, max_iter=50)

        def eval_t():
            opt_t.zero_grad()
            scaled = conf_e.squeeze(-1) / T
            loss_t = F.binary_cross_entropy_with_logits(scaled, correct_indicator)
            loss_t.backward()
            return loss_t

        opt_t.step(eval_t)
        T_val = float(T.detach().cpu().item())
        print(f" temperature scale T = {T_val:.4f}")

        # Final eval with temperature applied.
        with torch.no_grad():
            cl_e, conf_e = model(Xe)
            probs_e = F.softmax(cl_e, dim=1)
            pred_e = cl_e.argmax(dim=1)
            acc = (pred_e == ye).float().mean().item()
            within1 = ((pred_e - ye).abs() <= 1).float().mean().item()
            mae = (pred_e - ye).abs().float().mean().item()

        per_class = {}
        for k in range(COUNT_CLASSES):
            mask = ye == k
            n = mask.sum().item()
            if n > 0:
                per_class[k] = {
                    "support": int(n),
                    "accuracy": ((pred_e == ye) & mask).sum().item() / n,
                }

        conf_sigm = torch.sigmoid(conf_e.squeeze(-1) / T_val)
        correct = (pred_e == ye).float()
        c_rank = conf_sigm.argsort().argsort().float()
        r_rank = correct.argsort().argsort().float()
        c_centered = c_rank - c_rank.mean()
        r_centered = r_rank - r_rank.mean()
        denom = (c_centered.norm() * r_centered.norm()).item()
        spearman = (c_centered * r_centered).sum().item() / denom if denom > 0 else 0.0

        print(f"\n=== v0.0.2 final eval ===")
        print(f" accuracy: {acc:.3f}")
        print(f" within ±1: {within1:.3f}")
        print(f" MAE: {mae:.3f}")
        print(f" conf↔correct Spearman (post-temp): {spearman:.3f}")
        for k, v in per_class.items():
            print(f" class {k}: {v['accuracy']:.3f} accuracy on {v['support']} samples")

        write_safetensors(model, Path(args.out_safetensors))
        # Record the temperature scalar in a sidecar file written next to
        # the weights (<out_safetensors>.temperature) so the Rust cog
        # inference path can read it and apply it to the conf logits.
        Path(args.out_safetensors + ".temperature").write_text(f"{T_val}\n")
        print(f"wrote {args.out_safetensors} ({Path(args.out_safetensors).stat().st_size} bytes)")
        print(f"wrote {args.out_safetensors}.temperature ({T_val})")

        # ONNX
        dummy = torch.zeros(1, N_SUB, N_FRAMES, device=device)
        try:
            torch.onnx.export(model, dummy, args.out_onnx, opset_version=18,
                              input_names=["csi_window"],
                              output_names=["count_logits", "conf_logits"],
                              dynamic_axes={"csi_window": {0: "batch"},
                                            "count_logits": {0: "batch"},
                                            "conf_logits": {0: "batch"}},
                              export_params=True, do_constant_folding=True)
            print(f"wrote {args.out_onnx} ({Path(args.out_onnx).stat().st_size} bytes)")
        except Exception as e:
            print(f"WARN: ONNX export failed: {e}")

        results = {
            "mode": "v0.0.2",
            "backend": "pytorch-cuda" if device.type == "cuda" else "pytorch-cpu",
            "epochs_trained": epoch + 1,
            "train_time_s": train_time,
            "best_eval_acc": best_eval_acc,
            "final_eval_acc": acc,
            "final_eval_within_pm1": within1,
            "final_eval_mae": mae,
            "temperature_scale": T_val,
            "conf_correctness_spearman_post_temp": spearman,
            "per_class_accuracy": per_class,
            "hyperparameters": {
                "optimizer": "AdamW",
                "lr": args.lr,
                "weight_decay": args.weight_decay,
                "batch_size": args.batch_size,
                "schedule": "cosine_warm_restarts",
                "epochs_max": args.epochs,
                "label_smoothing": args.label_smoothing,
                "patience": args.patience,
                "split": "random_80_20_seed_42",
                "balanced_sampler": True,
                "temperature_scaling": True,
            },
            "epoch_losses": epoch_losses,
        }
        Path(args.out_results).write_text(json.dumps(results, indent=2))
        print(f"wrote {args.out_results}")
        return

    # Original temporal-split mode (kept for v0.0.1 reproducibility).
    X_train, y_train, X_eval, y_eval = temporal_split(X, y, eval_frac=0.2)
    X_train, X_eval = standardise(X_train, X_eval)
+11
@@ -47,6 +47,17 @@ Downstream consumers can render the **most-likely count** when confidence is hig
`cog-person-count health` will load the real safetensors and report `backend: candle-cpu` rather than `backend: stub`, so the cog-gateway can verify the model loaded — but operators should treat the v0.0.1 count outputs as scaffold-validation rather than production data. The 2.36 MB binary + 392 KB weights + 16 KB ONNX are all real and reusable as soon as more data lands.
## Relationship to the in-process `csi.rs::score_to_person_count` heuristic
This Cog runs **out-of-process** alongside `wifi-densepose-sensing-server`. The two are complementary, not competing:
- The sensing-server keeps emitting its existing slot-count heuristic from `csi.rs::score_to_person_count` (PR #491's RollingP95 + `dedup_factor`). This is the **fallback path** — operators who don't install `cog-person-count` still get a count number, just a less calibrated one.
- `cog-person-count` (this binary) polls the same `/api/v1/sensing/latest` endpoint, runs the learned `count_v1` model on each window, and emits `person.count` events on stdout. The appliance's `cognitum-cog-gateway` routes those events to the dashboard via the standard ADR-220 cog-event channel.
Operators choose by **installing or not installing** this Cog — no sensing-server rebuild required. Downstream consumers (UI, fleet automation, alerting rules) can subscribe to whichever event stream they prefer.
The architecture decision is documented in [ADR-103 §"Deployment"](../../../../docs/adr/ADR-103-learned-multi-person-counter.md#deployment) and matches the cog/sensing-server boundary established for `cog-pose-estimation` (ADR-101).
## Security
The cog has a very small attack surface — by design, it's a pure consumer of CSI data, not a server:
File diff suppressed because it is too large
@@ -0,0 +1 @@
0.9261822700500488
@@ -1,25 +1,27 @@
{
"id": "person-count",
"version": "0.0.1",
"binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-arm",
"binary_bytes": 2168816,
"binary_sha256": "36bc0bb0ece894350377d5f93d46cd29378cb289b3773530611c0d47b507b3c3",
"binary_signature": "R/00xdzHriyr/2rzr4wmPJ/Ken60A+RNdi8r0g2HYJNTXBaFtr46ExfNbiHlgYWadQXzTZdfJoyJK+a6k71NDg==",
"weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors",
"weights_bytes": 392088,
"weights_sha256": "dacb0551fd3887958db19696d90d811ab08faa44703e6e04ff56d15c3a65a9ff",
"arch": "arm",
"target_triple": "aarch64-unknown-linux-gnu",
"installed_at": 0,
"status": "installed",
"signed_by": "COGNITUM_OWNER_SIGNING_KEY",
"sig_algo": "Ed25519",
"binary_bytes": 3807456,
"binary_sha256": "15c2fbac19741298ad1cbaf119c633a42db0a273099561fd57d8afce27728ea5",
"binary_signature": "gyV2CDhJo5nqBnREA08KnztGsS7AFOuXCse+2/+wul8DAzerHs9p4L6eUgl8QeiDS9rdQZs33XRxH5WTbkT0Ag==",
"binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-arm",
"build_metadata": {
"rust": "1.95.0",
"candle": "0.9 cpu",
"cog_person_count_version": "0.3.0",
"training_eval_accuracy": 0.651,
"rust": "1.95.0",
"training_caveat": "random 80/20 split + label smoothing + early stopping + balanced sampler + temperature calibration. K-fold reference: class-1 mean 57.1% across 5 folds.",
"training_class1_accuracy": 0.343,
"training_eval_accuracy": 0.623,
"training_eval_mae": 0.349,
"training_caveat": "single-session data; class-1 accuracy 0% — see docs/benchmarks/person-count-cog.md"
}
}
"training_temperature_scale": 0.9262
},
"id": "person-count",
"installed_at": 0,
"sig_algo": "Ed25519",
"signed_by": "COGNITUM_OWNER_SIGNING_KEY",
"status": "installed",
"target_triple": "aarch64-unknown-linux-gnu",
"version": "0.0.2",
"weights_bytes": 392088,
"weights_sha256": "32996433516891a37c63c600db8b95e42192a53bd538c088c82cd6a85e55513c",
"weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors"
}
@@ -1,25 +1,27 @@
{
"id": "person-count",
"version": "0.0.1",
"binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/x86_64/cog-person-count-x86_64",
"binary_bytes": 2615528,
"binary_sha256": "76cdd1ec40211add90b4942a09f79939aa28210a27e931de67122357392b01db",
"binary_signature": "QB+8cnGSMQmubSt/KWVu1+JMg37AKnQXDsFQi/vi+jqpW9rVrGMtnxQpWEWZPeWU1AJ6pl3O2V+7ZtTNIQ2rDg==",
"weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors",
"weights_bytes": 392088,
"weights_sha256": "dacb0551fd3887958db19696d90d811ab08faa44703e6e04ff56d15c3a65a9ff",
"arch": "x86_64",
"target_triple": "x86_64-unknown-linux-gnu",
"installed_at": 0,
"status": "installed",
"signed_by": "COGNITUM_OWNER_SIGNING_KEY",
"sig_algo": "Ed25519",
"binary_bytes": 4502960,
"binary_sha256": "051614ce6ba63df704fae848a67ad095df4bb88862fdff05ef3c0419cc8388b3",
"binary_signature": "P9txCcsqCoFN6LyZS+Hl33pYZxiP/nXJMTI6s4bt26cc+Cteidz7ymajCQIfuq0mx0cnWaQ6eKZUjzq5AIgoBw==",
"binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/x86_64/cog-person-count-x86_64",
"build_metadata": {
"rust": "1.95.0",
"candle": "0.9 cpu",
"cog_person_count_version": "0.3.0",
"training_eval_accuracy": 0.651,
"rust": "1.95.0",
"training_caveat": "random 80/20 split + label smoothing + early stopping + balanced sampler + temperature calibration. K-fold reference: class-1 mean 57.1% across 5 folds.",
"training_class1_accuracy": 0.343,
"training_eval_accuracy": 0.623,
"training_eval_mae": 0.349,
"training_caveat": "single-session data; class-1 accuracy 0% — see docs/benchmarks/person-count-cog.md"
}
}
"training_temperature_scale": 0.9262
},
"id": "person-count",
"installed_at": 0,
"sig_algo": "Ed25519",
"signed_by": "COGNITUM_OWNER_SIGNING_KEY",
"status": "installed",
"target_triple": "x86_64-unknown-linux-gnu",
"version": "0.0.2",
"weights_bytes": 392088,
"weights_sha256": "32996433516891a37c63c600db8b95e42192a53bd538c088c82cd6a85e55513c",
"weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors"
}
+1
@@ -10,6 +10,7 @@
pub mod fusion;
pub mod inference;
pub mod publisher;
pub mod runtime;
pub const COG_ID: &str = "person-count";
pub const COG_VERSION: &str = env!("CARGO_PKG_VERSION");
+27 -6
@@ -103,10 +103,31 @@ fn cmd_health() -> Result<(), Box<dyn std::error::Error>> {
Ok(())
}
fn cmd_run(_config_path: PathBuf) -> Result<(), Box<dyn std::error::Error>> {
// Long-running mode is wired in the v0.0.1 release follow-up — same
// approach as cog-pose-estimation's runtime.rs. For now, the cog
// satisfies the four-verb contract; downstream consumers integrate
// via the in-process `InferenceEngine` API.
Err("`run` subcommand wiring is pending v0.0.1 — for now consume via the InferenceEngine library API".into())
fn cmd_run(config_path: PathBuf) -> Result<(), Box<dyn std::error::Error>> {
let raw = std::fs::read_to_string(&config_path)
.map_err(|e| format!("failed to read config at {}: {}", config_path.display(), e))?;
let cfg: RunConfig = serde_json::from_str(&raw)
.map_err(|e| format!("failed to parse config at {}: {}", config_path.display(), e))?;
let engine = InferenceEngine::with_weights(cfg.model_path.as_deref())?;
publisher::run_started(
COG_ID,
&cfg.sensing_url,
cfg.poll_ms,
&cfg.model_path
.as_ref()
.map(|p| p.display().to_string())
.unwrap_or_else(|| "(auto-discover)".to_string()),
);
let rt = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()?;
rt.block_on(cog_person_count::runtime::run_loop(
cog_person_count::runtime::RunConfig {
sensing_url: cfg.sensing_url,
poll_ms: cfg.poll_ms,
},
engine,
))
}
+77
@@ -0,0 +1,77 @@
//! Long-running inference loop. Polls the appliance's sensing-server,
//! slides a CSI window, runs the count head, and emits `person.count`
//! events. Same shape as `cog-pose-estimation::runtime`.
//!
//! v0.0.1 is single-node only: the appliance's
//! `/api/v1/sensing/latest` endpoint already aggregates across nodes
//! before serving, so per-cog fusion is deferred until each node ships
//! raw frames separately (ADR-103 §"Multi-node fusion" v0.2.0).

use crate::inference::{CsiWindow, InferenceEngine, INPUT_SUBCARRIERS, INPUT_TIMESTEPS};
use crate::publisher;
use std::time::Duration;
use tokio::time::sleep;

pub struct RunConfig {
    pub sensing_url: String,
    pub poll_ms: u64,
}

pub async fn run_loop(
    cfg: RunConfig,
    engine: InferenceEngine,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut buffer: Vec<f32> = Vec::with_capacity(INPUT_SUBCARRIERS * INPUT_TIMESTEPS);
    let cap = INPUT_SUBCARRIERS * INPUT_TIMESTEPS;
    let mut tick: u64 = 0;
    loop {
        match fetch_frame(&cfg.sensing_url).await {
            Ok(amplitudes) => {
                tick += 1;
                buffer.extend(amplitudes);
                // Trim back to one window once the buffer exceeds two windows.
                while buffer.len() > 2 * cap {
                    let extra = buffer.len() - cap;
                    buffer.drain(0..extra);
                }
                if buffer.len() >= cap {
                    let window = CsiWindow { data: buffer[buffer.len() - cap..].to_vec() };
                    if let Ok(pred) = engine.infer(&window) {
                        // v0.0.1 ships single-node, so fusion is a no-op for
                        // N=1. v0.2.0 will append additional per-node
                        // predictions to a vec and call
                        // `fusion::fuse_confidence_weighted` before emit.
                        publisher::person_count(tick, &pred, 1);
                    }
                }
            }
            Err(e) => {
                // Fail open: log and keep polling rather than exiting.
                tracing::warn!(error = %e, "sensing-server fetch failed");
            }
        }
        sleep(Duration::from_millis(cfg.poll_ms)).await;
    }
}

async fn fetch_frame(url: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let url = url.to_string();
    // ureq is blocking, so run the request off the async executor threads.
    let body = tokio::task::spawn_blocking(move || -> Result<String, ureq::Error> {
        Ok(ureq::get(&url).call()?.into_string()?)
    })
    .await??;
    let json: serde_json::Value = serde_json::from_str(&body)?;
    let snapshot = json.get("snapshot").unwrap_or(&json);
    let nodes = snapshot
        .get("nodes")
        .and_then(|v| v.as_array())
        .ok_or("missing nodes[]")?;
    let amplitude = nodes
        .first()
        .and_then(|n| n.get("amplitude"))
        .and_then(|v| v.as_array())
        .ok_or("missing nodes[0].amplitude[]")?;
    Ok(amplitude
        .iter()
        .filter_map(|v| v.as_f64().map(|f| f as f32))
        .collect())
}