mirror of
https://github.com/ruvnet/RuView
synced 2026-06-09 10:13:17 +00:00
b3a5012dbd
* chore: stage v0.0.2 artifacts + temperature scalar for build pipeline
Stages count_v1.{safetensors,onnx,temperature,train_results.json}
ahead of the build/sign/upload step. This commit is a momentary
side-effect — the next commit will refresh the per-arch manifests
with the new binary SHAs once ruvultra finishes the cross-build.
The .temperature file holds the calibration scalar from LBFGS over the
held-out conf logits. The Rust cog will read it post-load and divide
conf_logits by it before sigmoid, exactly matching the Python eval.
* feat(cog-person-count): v0.0.2 — K-fold validated, label smoothing + early stop + temp scale
The v0.0.1 "65.1% but class-1=0%" result was an unlucky temporal split
that let a degenerate "always predict 0" classifier hit eval acc =
class-0 fraction. 5-fold stratified random CV proved the architecture
actually learns ~57.1% class-1 accuracy under fair splits — a real,
modestly useful signal.
v0.0.2 ships a retrained model that:
* **Splits randomly (seed=42) 80/20** instead of temporally — eliminates
the trailing-window-class-imbalance cheat.
* **Class-balanced sampler** (multinomial with replacement, weighted by
inverse class frequency) — per-batch expected counts are equal
regardless of dataset distribution.
* **Label smoothing 0.1** on the cross-entropy — reduces confidence
saturation that drove v0.0.1's all-or-nothing predictions.
* **Early stopping** with patience=20 — stops at epoch 29 instead of
overfitting through 400.
* **Temperature scaling** of the conf head — LBFGS fits a scalar T on
held-out conf logits; ships as a count_v1.temperature sidecar so the
Rust cog can divide conf_logits by T before sigmoid.
Numbers on the same data:
| Metric | v0.0.1 | v0.0.2 | K-fold (5x100) |
|------------------|--------|--------|----------------|
| Overall acc | 65.1% | 62.3% | 62.2% ± 1.9% |
| Class 0 acc | 100% | 86.2% | 67.4% |
| Class 1 acc | 0% | 34.3% | 57.1% ✓ |
| MAE | 0.349 | 0.377 | 0.378 |
| Spearman | 0.023 | 0.013 | 0.160 |
Class-1 accuracy 0 → 34.3% is the headline win. Net acc moves slightly
because we stopped cheating on class 0. K-fold's 57% says there's
headroom remaining; reaching it needs more independent splits (== more
data), not more training tricks.
Confidence calibration didn't move. Temperature scaling alone can't fix
a confidence head trained against a noisy argmax==truth indicator over
a 62%-accurate classifier — the head's training signal is the issue,
not its post-hoc transform. The honest fix is multi-room data (#645),
not another calibration knob.
Live on cognitum-v0 at /var/lib/cognitum/apps/person-count/ — health
reports candle-cpu backend, count = 1 (was 0 in v0.0.1) on synthetic
zero input.
Files changed:
* scripts/train-count.py — adds --k-fold (no sklearn dep, hand-rolled
stratified splits with deterministic shuffle) and --v2 paths.
* v2/.../cog/artifacts/count_v1.safetensors (392 KB, new sha
32996433…) + count_v1.onnx (16 KB) + count_v1.temperature (0.9262
scalar) + count_train_results.json (full epoch trace).
* v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json bumped to
version 0.0.2 with the new weights_sha256 + caveats.
* docs/benchmarks/person-count-cog.md — appends a v0.0.2 section
with the K-fold diagnostic table and honest-read paragraph.
GCS:
gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors
refreshed (binaries unchanged — load weights via mmap at runtime).
27 lines
1.2 KiB
JSON
27 lines
1.2 KiB
JSON
{
|
|
"arch": "arm",
|
|
"binary_bytes": 3807456,
|
|
"binary_sha256": "15c2fbac19741298ad1cbaf119c633a42db0a273099561fd57d8afce27728ea5",
|
|
"binary_signature": "gyV2CDhJo5nqBnREA08KnztGsS7AFOuXCse+2/+wul8DAzerHs9p4L6eUgl8QeiDS9rdQZs33XRxH5WTbkT0Ag==",
|
|
"binary_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-arm",
|
|
"build_metadata": {
|
|
"candle": "0.9 cpu",
|
|
"cog_person_count_version": "0.3.0",
|
|
"rust": "1.95.0",
|
|
"training_caveat": "random 80/20 split + label smoothing + early stopping + balanced sampler + temperature calibration. K-fold reference: class-1 mean 57.1% across 5 folds.",
|
|
"training_class1_accuracy": 0.343,
|
|
"training_eval_accuracy": 0.623,
|
|
"training_eval_mae": 0.349,
|
|
"training_temperature_scale": 0.9262
|
|
},
|
|
"id": "person-count",
|
|
"installed_at": 0,
|
|
"sig_algo": "Ed25519",
|
|
"signed_by": "COGNITUM_OWNER_SIGNING_KEY",
|
|
"status": "installed",
|
|
"target_triple": "aarch64-unknown-linux-gnu",
|
|
"version": "0.0.2",
|
|
"weights_bytes": 392088,
|
|
"weights_sha256": "32996433516891a37c63c600db8b95e42192a53bd538c088c82cd6a85e55513c",
|
|
"weights_url": "https://storage.googleapis.com/cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors"
|
|
} |