ruvnet--RuView/v2/crates/cog-person-count/cog/artifacts/count_train_results.json
rUv b3a5012dbd feat(cog-person-count): v0.0.2 — K-fold + label-smoothing + temperature-calibrated (#699)
* chore: stage v0.0.2 artifacts + temperature scalar for build pipeline

Stages count_v1.{safetensors,onnx,temperature,train_results.json}
ahead of the build/sign/upload step. This commit is an intermediate
state: the next commit refreshes the per-arch manifests with the new
binary SHAs once ruvultra finishes the cross-build.

The .temperature file holds the calibration scalar that LBFGS fits on the
held-out conf logits. The Rust cog will read it after loading the model and
divide conf_logits by it before the sigmoid, exactly matching the Python eval.
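The following is a minimal sketch of that calibration round trip. It uses pure-stdlib Python rather than the PyTorch used by the training script. It fits T by minimising the mean binary cross-entropy of sigmoid(z / T) against 0/1 correctness labels.

Two deliberate differences from the described method:
* It replaces LBFGS with a golden-section search over 1/T. The loss is convex in 1/T, so a bracketed 1-D search is enough.
* The function names, the search bracket [0.05, 20] for 1/T, and the toy data are illustrative assumptions, not the script's API.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    # log(1 + exp(x)) without overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_at_temperature(logits, correct, t):
    """Mean BCE of sigmoid(z / t) against 0/1 labels (1 = prediction was right)."""
    total = 0.0
    for z, y in zip(logits, correct):
        s = z / t
        # -log(sigmoid(s)) = softplus(-s); -log(1 - sigmoid(s)) = softplus(s).
        total += softplus(-s) if y else softplus(s)
    return total / len(logits)


def fit_temperature(logits, correct, lo=0.05, hi=20.0, iters=100):
    """Golden-section search over beta = 1/T, where the loss is convex.

    Assumes the optimum lies inside [lo, hi]; otherwise the result sits at an edge.
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc = bce_at_temperature(logits, correct, 1.0 / c)
    fd = bce_at_temperature(logits, correct, 1.0 / d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = bce_at_temperature(logits, correct, 1.0 / c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = bce_at_temperature(logits, correct, 1.0 / d)
    return 1.0 / ((a + b) / 2.0)


def calibrated_confidence(conf_logit, t):
    # Inference-time step: divide the raw conf logit by T, then apply the sigmoid.
    return sigmoid(conf_logit / t)


# Toy held-out conf logits and correctness labels (illustrative only).
logits = [2.0, -1.0, 0.5, 3.0, -2.0, 1.5]
correct = [1, 0, 0, 1, 0, 1]
t = fit_temperature(logits, correct)
print(f"T = {t:.4f}, calibrated conf for logit 2.0 = {calibrated_confidence(2.0, t):.3f}")
```

Because T > 0, the division never reorders the confidences. It only sharpens them (T < 1, as with the shipped 0.9262) or softens them (T > 1).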

* feat(cog-person-count): v0.0.2 — K-fold validated, label smoothing + early stop + temp scale

The v0.0.1 "65.1% but class-1=0%" result came from an unlucky temporal
split that let a degenerate "always predict 0" classifier score an eval
accuracy equal to the class-0 fraction. 5-fold stratified random CV showed
that the architecture actually learns ~57.1% class-1 accuracy under fair
splits: a real, modestly useful signal.
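The `--k-fold` option in scripts/train-count.py uses hand-rolled stratified splits with no sklearn dependency. Below is a minimal sketch of one way to build such splits. It assumes each class is shuffled with a seeded `random.Random` and then dealt round-robin into folds. The function name and the seed value are hypothetical, not taken from the script.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=42):
    """Yield (train_idx, eval_idx) for k folds that each keep the global class mix.

    Hypothetical helper: shuffles each class with a seeded RNG (deterministic),
    then deals that class's indices round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for y in sorted(by_class):  # fixed class order keeps the RNG stream reproducible
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)

    for f in range(k):
        eval_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, eval_idx


# Illustrative labels: 60 class-0 samples, 40 class-1 samples.
labels = [0] * 60 + [1] * 40
folds = list(stratified_kfold(labels, k=5))
for n, (train_idx, eval_idx) in enumerate(folds):
    n_pos = sum(labels[i] for i in eval_idx)
    print(f"fold {n}: eval={len(eval_idx)} class1={n_pos} train={len(train_idx)}")
```

With these labels, every eval fold gets 12 class-0 and 8 class-1 samples. The fold assignment therefore cannot hand a constant predictor a lucky class ratio.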

v0.0.2 ships a retrained model that:

* **Splits randomly (seed=42) 80/20** instead of temporally, which removes
  the class-imbalanced trailing eval window the degenerate classifier exploited.
* **Class-balanced sampler** (multinomial with replacement, weighted by
  inverse class frequency) — per-batch expected counts are equal
  regardless of dataset distribution.
* **Label smoothing 0.1** on the cross-entropy — reduces confidence
  saturation that drove v0.0.1's all-or-nothing predictions.
* **Early stopping** with patience=20: training stops after 29 epochs
  (best eval accuracy at epoch 8) instead of overfitting through all 400.
* **Temperature scaling** of the conf head — LBFGS fits a scalar T on
  held-out conf logits; ships as a count_v1.temperature sidecar so the
  Rust cog can divide conf_logits by T before sigmoid.
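Below is a pure-Python sketch of two of the changes above: inverse-frequency sampling weights and a label-smoothed cross-entropy. The smoothing follows the convention PyTorch documents for `nn.CrossEntropyLoss(label_smoothing=...)`. That convention mixes the one-hot target with a uniform distribution as (1 - eps) * onehot + eps / K.

Whether train-count.py uses exactly that call is an assumption. The helper names and the toy labels are also illustrative.

```python
import math
import random
from collections import Counter


def balanced_weights(labels):
    # Inverse class frequency: each class ends up with total weight 1.0,
    # so every draw picks each class with equal probability.
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def sample_batch(labels, batch_size, rng):
    # Multinomial draw with replacement, weighted per sample.
    return rng.choices(range(len(labels)), weights=balanced_weights(labels), k=batch_size)


def smoothed_cross_entropy(logits, target, eps=0.1):
    # Stable log-softmax: subtract the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_p = [v - log_z for v in logits]
    k = len(logits)
    # Smoothed target: eps / k on every class, plus (1 - eps) on the true class.
    q = [eps / k + (1.0 - eps if c == target else 0.0) for c in range(k)]
    return -sum(qc * lp for qc, lp in zip(q, log_p))


labels = [0] * 300 + [1] * 100  # illustrative 3:1 imbalance
rng = random.Random(42)
draws = sample_batch(labels, 10_000, rng)
print(Counter(labels[i] for i in draws))  # class counts land close to equal
print(smoothed_cross_entropy([4.0, -4.0], target=0, eps=0.0))
print(smoothed_cross_entropy([4.0, -4.0], target=0, eps=0.1))
```

With eps = 0.1, a confidently correct prediction still pays a small loss. That penalty is what discourages the saturated, all-or-nothing logits described for v0.0.1.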

Numbers on the same data:

  | Metric           | v0.0.1 | v0.0.2 | K-fold (5x100) |
  |------------------|--------|--------|----------------|
  | Overall acc      | 65.1%  | 62.3%  | 62.2% ± 1.9%   |
  | Class 0 acc      | 100%   | 86.2%  | 67.4%          |
  | Class 1 acc      |  0%    | 34.3%  | 57.1% ✓        |
  | MAE              | 0.349  | 0.377  | 0.378          |
  | Spearman         | 0.023  | 0.013  | 0.160          |

Class-1 accuracy 0 → 34.3% is the headline win. Overall accuracy drops
slightly (65.1% → 62.3%) because the model no longer games class 0.
K-fold's 57% class-1 accuracy says there is headroom left; reaching it
needs more independent splits (== more data), not more training tricks.
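The v0.0.2 column can be cross-checked against the per_class_accuracy block of count_train_results.json with a few lines of arithmetic. Overall accuracy is the support-weighted mean of the per-class accuracies. The same supports also give the accuracy a constant predictor would score on this split.

```python
# Eval-split numbers copied from count_train_results.json (v0.0.2).
support = {0: 116, 1: 99}
class_acc = {0: 0.8620689655172413, 1: 0.3434343434343434}

n = sum(support.values())
hits = {c: round(support[c] * class_acc[c]) for c in support}

overall = sum(hits.values()) / n      # support-weighted mean of per-class accuracy
always_zero = support[0] / n          # degenerate "always predict 0" baseline
always_one = support[1] / n           # degenerate "always predict 1" baseline
# final_eval_within_pm1 is 1.0, so every miss is off by exactly one person
# and the mean absolute error equals the error rate.
mae = 1.0 - overall

print(n, hits, round(overall, 4), round(always_zero, 4), round(always_one, 4), round(mae, 4))
```

The numbers check out:
* 100 + 34 = 134 correct out of 215 gives 0.6233, matching best_eval_acc. The MAE of 81 / 215 = 0.3767 matches final_eval_mae.
* The two baselines are 116 / 215 ≈ 0.5395 and 99 / 215 ≈ 0.4605. These are exactly the eval_acc values recorded at epochs 0 through 5, 7, 10 and 22. At those points the model was likely predicting a single class.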

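Confidence calibration didn't move. Dividing the logits by a positive T
is monotonic, so temperature scaling cannot reorder confidences. The
conf/correctness Spearman (0.013 post-temperature) is therefore unchanged
by construction. Temperature scaling alone can't fix a confidence head
trained against a noisy argmax==truth indicator over a 62%-accurate
classifier: the head's training signal is the issue, not its post-hoc
transform. The honest fix is multi-room data (#645), not another
calibration knob.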
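The invariance argument is easy to check numerically. The sketch below assumes the reported Spearman is computed between the confidence and the 0/1 argmax==truth indicator. It uses tie-averaged ranks, since the indicator is full of ties. The logits are made up; only T is taken from the results JSON.

```python
import math


def average_ranks(xs):
    # 1-based ranks; tied values share the mean of the positions they occupy.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    # Spearman's rho is the Pearson correlation of the tie-averaged ranks.
    return pearson(average_ranks(a), average_ranks(b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


T = 0.9261822700500488  # temperature_scale from count_train_results.json
conf_logits = [1.2, -0.3, 0.8, 2.5, -1.1, 0.1]  # illustrative values
correct = [1, 0, 0, 1, 1, 0]  # 1 where argmax == truth

before = spearman([sigmoid(z) for z in conf_logits], correct)
after = spearman([sigmoid(z / T) for z in conf_logits], correct)
print(before, after)  # identical: the ranking of the confidences is unchanged
```

Only a change to what the conf head is trained on can move this metric. Metrics that depend on the probability values themselves, such as reliability or expected calibration error, are the ones a temperature can shift.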

Live on cognitum-v0 at /var/lib/cognitum/apps/person-count/ — health
reports candle-cpu backend, count = 1 (was 0 in v0.0.1) on synthetic
zero input.

Files changed:
* scripts/train-count.py — adds --k-fold (no sklearn dep, hand-rolled
  stratified splits with deterministic shuffle) and --v2 paths.
* v2/.../cog/artifacts/count_v1.safetensors (392 KB, new sha
  32996433…) + count_v1.onnx (16 KB) + count_v1.temperature (0.9262
  scalar) + count_train_results.json (full epoch trace).
* v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json bumped to
  version 0.0.2 with the new weights_sha256 + caveats.
* docs/benchmarks/person-count-cog.md — appends a v0.0.2 section
  with the K-fold diagnostic table and honest-read paragraph.
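The manifests pin weights_sha256, so a deploy step can recompute the digest before trusting a downloaded count_v1.safetensors. Below is a minimal sketch using Python's hashlib. It assumes weights_sha256 is a top-level key of manifest.json, which is a guess about the manifest schema. The helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so large weight files never sit fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def weights_match_manifest(weights_path, manifest_path):
    # ASSUMPTION: "weights_sha256" is a top-level key in manifest.json.
    manifest = json.loads(Path(manifest_path).read_text())
    return sha256_file(weights_path) == manifest["weights_sha256"]
```

A mismatch here means the weights and the manifest were published from different builds. This is exactly the transient state the staging commit above describes.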

GCS:
  gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors
    refreshed (binaries unchanged, since the cog loads weights via mmap at runtime).
2026-05-21 19:47:04 -04:00

240 lines
6.1 KiB
JSON

{
"mode": "v0.0.2",
"backend": "pytorch-cuda",
"epochs_trained": 29,
"train_time_s": 0.7185604920377955,
"best_eval_acc": 0.6232557892799377,
"final_eval_acc": 0.6232557892799377,
"final_eval_within_pm1": 1.0,
"final_eval_mae": 0.37674418091773987,
"temperature_scale": 0.9261822700500488,
"conf_correctness_spearman_post_temp": 0.012770170735830375,
"per_class_accuracy": {
"0": {
"support": 116,
"accuracy": 0.8620689655172413
},
"1": {
"support": 99,
"accuracy": 0.3434343434343434
}
},
"hyperparameters": {
"optimizer": "AdamW",
"lr": 0.001,
"weight_decay": 0.01,
"batch_size": 64,
"schedule": "cosine_warm_restarts",
"epochs_max": 400,
"label_smoothing": 0.1,
"patience": 20,
"split": "random_80_20_seed_42",
"balanced_sampler": true,
"temperature_scaling": true
},
"epoch_losses": [
{
"epoch": 0,
"train_loss": 1.8680313183711126,
"train_acc": 0.4543269230769231,
"eval_loss": 0.7276814579963684,
"eval_acc": 0.539534866809845
},
{
"epoch": 1,
"train_loss": 1.3579198305423443,
"train_acc": 0.5060096153846154,
"eval_loss": 0.8614012002944946,
"eval_acc": 0.46046510338783264
},
{
"epoch": 2,
"train_loss": 1.299364447593689,
"train_acc": 0.4831730769230769,
"eval_loss": 0.7327257990837097,
"eval_acc": 0.539534866809845
},
{
"epoch": 3,
"train_loss": 1.2834151433064387,
"train_acc": 0.4963942307692308,
"eval_loss": 0.7958587408065796,
"eval_acc": 0.539534866809845
},
{
"epoch": 4,
"train_loss": 1.2809640077444224,
"train_acc": 0.49278846153846156,
"eval_loss": 0.7728011608123779,
"eval_acc": 0.46046510338783264
},
{
"epoch": 5,
"train_loss": 1.276416512636038,
"train_acc": 0.5120192307692307,
"eval_loss": 0.7620130181312561,
"eval_acc": 0.539534866809845
},
{
"epoch": 6,
"train_loss": 1.2767094740500817,
"train_acc": 0.4951923076923077,
"eval_loss": 0.7696149945259094,
"eval_acc": 0.604651153087616
},
{
"epoch": 7,
"train_loss": 1.2724562699978168,
"train_acc": 0.5324519230769231,
"eval_loss": 0.7653729319572449,
"eval_acc": 0.539534866809845
},
{
"epoch": 8,
"train_loss": 1.2739891455723689,
"train_acc": 0.5264423076923077,
"eval_loss": 0.7635467648506165,
"eval_acc": 0.6232557892799377
},
{
"epoch": 9,
"train_loss": 1.2718101739883423,
"train_acc": 0.5120192307692307,
"eval_loss": 0.7564782500267029,
"eval_acc": 0.604651153087616
},
{
"epoch": 10,
"train_loss": 1.261798886152414,
"train_acc": 0.5625,
"eval_loss": 0.7915780544281006,
"eval_acc": 0.46046510338783264
},
{
"epoch": 11,
"train_loss": 1.2723550613109882,
"train_acc": 0.5348557692307693,
"eval_loss": 0.7585318088531494,
"eval_acc": 0.6139534711837769
},
{
"epoch": 12,
"train_loss": 1.2408426174750695,
"train_acc": 0.6225961538461539,
"eval_loss": 0.7562077045440674,
"eval_acc": 0.525581419467926
},
{
"epoch": 13,
"train_loss": 1.219417168543889,
"train_acc": 0.6334134615384616,
"eval_loss": 0.7647078633308411,
"eval_acc": 0.5860465168952942
},
{
"epoch": 14,
"train_loss": 1.198713256762578,
"train_acc": 0.6526442307692307,
"eval_loss": 0.7711634635925293,
"eval_acc": 0.5720930099487305
},
{
"epoch": 15,
"train_loss": 1.167367669252249,
"train_acc": 0.6826923076923077,
"eval_loss": 0.7664391994476318,
"eval_acc": 0.6186046600341797
},
{
"epoch": 16,
"train_loss": 1.1867470557873065,
"train_acc": 0.6574519230769231,
"eval_loss": 0.7853891253471375,
"eval_acc": 0.6139534711837769
},
{
"epoch": 17,
"train_loss": 1.185251813668471,
"train_acc": 0.6766826923076923,
"eval_loss": 0.7728492021560669,
"eval_acc": 0.5767441987991333
},
{
"epoch": 18,
"train_loss": 1.1749065747627845,
"train_acc": 0.6814903846153846,
"eval_loss": 0.7930512428283691,
"eval_acc": 0.5488371849060059
},
{
"epoch": 19,
"train_loss": 1.1521984338760376,
"train_acc": 0.6983173076923077,
"eval_loss": 0.7875214219093323,
"eval_acc": 0.5860465168952942
},
{
"epoch": 20,
"train_loss": 1.158121026479281,
"train_acc": 0.6802884615384616,
"eval_loss": 0.785778820514679,
"eval_acc": 0.5860465168952942
},
{
"epoch": 21,
"train_loss": 1.1232389486753023,
"train_acc": 0.7319711538461539,
"eval_loss": 0.7949181795120239,
"eval_acc": 0.5767441987991333
},
{
"epoch": 22,
"train_loss": 1.1163162634922907,
"train_acc": 0.7391826923076923,
"eval_loss": 0.867073118686676,
"eval_acc": 0.539534866809845
},
{
"epoch": 23,
"train_loss": 1.1119057948772724,
"train_acc": 0.7211538461538461,
"eval_loss": 0.8135209679603577,
"eval_acc": 0.5953488349914551
},
{
"epoch": 24,
"train_loss": 1.107274578167842,
"train_acc": 0.7271634615384616,
"eval_loss": 0.8401668071746826,
"eval_acc": 0.5534883737564087
},
{
"epoch": 25,
"train_loss": 1.0781027399576628,
"train_acc": 0.7451923076923077,
"eval_loss": 0.8606341481208801,
"eval_acc": 0.5441860556602478
},
{
"epoch": 26,
"train_loss": 1.041811819259937,
"train_acc": 0.7584134615384616,
"eval_loss": 0.8801625967025757,
"eval_acc": 0.5767441987991333
},
{
"epoch": 27,
"train_loss": 1.0369769976689265,
"train_acc": 0.7764423076923077,
"eval_loss": 0.8642652034759521,
"eval_acc": 0.5860465168952942
},
{
"epoch": 28,
"train_loss": 1.0502384350850031,
"train_acc": 0.7524038461538461,
"eval_loss": 0.8719286322593689,
"eval_acc": 0.5720930099487305
}
]
}
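The epoch trace is consistent with patience-based early stopping on eval accuracy. The sketch below replays the recorded eval_acc values, rounded to four decimals. It assumes any strict improvement resets the patience counter. The helper name is hypothetical.

```python
def early_stop(eval_accs, patience=20):
    """Return (best_epoch, epochs_trained) for patience-based early stopping.

    Any strict improvement in eval accuracy resets the patience window.
    """
    best_acc, best_epoch = float("-inf"), -1
    for epoch, acc in enumerate(eval_accs):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch + 1  # stop after this epoch
    return best_epoch, len(eval_accs)


# eval_acc per epoch from count_train_results.json, rounded to 4 decimals.
eval_acc = [
    0.5395, 0.4605, 0.5395, 0.5395, 0.4605, 0.5395, 0.6047, 0.5395, 0.6233, 0.6047,
    0.4605, 0.6140, 0.5256, 0.5860, 0.5721, 0.6186, 0.6140, 0.5767, 0.5488, 0.5860,
    0.5860, 0.5767, 0.5395, 0.5953, 0.5535, 0.5442, 0.5767, 0.5860, 0.5721,
]
print(early_stop(eval_acc))  # -> (8, 29)
```

This matches epochs_trained = 29, with the best eval accuracy at epoch 8. Two further observations:
* Monitoring eval_loss instead, which is lowest at epoch 0, would have stopped after 21 epochs. Accuracy therefore looks like the monitored metric.
* final_eval_acc equals best_eval_acc rather than the last epoch's 0.5721. This suggests the best checkpoint is restored before the final eval.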