ruvnet--RuView/benchmarks/wiflow-std/results/edge_optimization.json
rUv 17471e93ff ADR-152: WiFi-Pose SOTA 2026 intake — WiFlow-STD benchmark, Rust integrations, ADR-153 802.11bf layer, efficiency frontier (#1008)
* feat(calibration): NodeGeometry transceiver-geometry recording (ADR-152 §2.1.1)

PerceptAlign-motivated geometry capture at enrollment: per-node optional
records (position, antenna orientation, inter-node distances, acquisition
method) — recorded when known, never required. Event-sourced via
EnrollmentEvent::GeometryRecorded (latest recording wins); persisted on
SpecialistBank with serde defaults so pre-ADR-152 bank JSON loads cleanly
(fixture-proven, and geometry-free banks serialize byte-shape-identical
to the old schema); threaded through MultiNodeMixture as data only — the
learned geometry embeddings and algorithmic fusion use are §2.1.2,
deliberately deferred until the ADR-151 P6 LoRA heads exist.

Geometry recorded from now on means banks captured today remain usable
for layout-conditioned training later — you can't retroactively add
geometry to data you didn't record.

8 new tests (3 geometry, 2 anchor, 2 bank, 1 multistatic) + full-loop
extension (2-node geometry, one tape-measured + one unknown, surviving
the bank JSON round-trip the runtime loads from). 50/50 calibration
(both feature configs) + 23 CLI tests green.

Co-Authored-By: RuFlo <ruv@ruv.net>

* feat(training): two-checkerboard camera↔room calibration for ADR-079 labels (ADR-152 §2.1.3)

Defends the camera-supervised pipeline against PerceptAlign's
"coordinate overfitting": MediaPipe keypoints were emitted in raw camera
coordinates with no shared frame and no transceiver-geometry metadata —
the exact label shape that memorizes deployment layout and collapses
cross-layout.

- scripts/calibrate-camera-room.py + calibration_lib.py: OpenCV
  two-checkerboard calibration → versioned bundle JSON (intrinsics,
  camera→room extrinsics, checkerboard spec, transceiver geometry,
  sha256 calibration_id). Intrinsics resolve from file > cache >
  multi-view computation > loud-warning 2-view fallback.
- collect-ground-truth.py --calibration <bundle>: every sample gains
  keypoints_room (unit bearing rays from the camera center in the room
  frame — documented projective alignment; raw image coords preserved
  so training chooses), camera_origin_room, calibration_id, and the
  transceiver geometry stamp. Without the flag, output is byte-identical
  to before (tested) + a one-line ADR-152 warning.

Design finding (recorded for ADR-152): a single planar checkerboard's
corner grid is centrosymmetric — the reversed corner ordering fits a
ghost camera pose with IDENTICAL reprojection error, so per-board flip
disambiguation is mathematically ill-posed. solve_two_board_extrinsics
solves the joint wall+floor set over all 4 flip combinations, where the
minimum is unique — an independent reason the TWO-checkerboard method is
required, beyond what PerceptAlign states.
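
The single-board ambiguity can be seen without OpenCV. Reversing the corner ordering of a regular planar grid gives the same point set reflected through the board center (a 180 degree in-plane rotation), so one board alone offers two equally valid correspondences. A minimal pure-Python sketch follows; the board dimensions and helper names are illustrative assumptions, and it does not reproduce solve_two_board_extrinsics.

```python
# Illustrative values only (assumptions): 4x3 inner corners, 25 mm squares.
COLS, ROWS, SQUARE_MM = 4, 3, 25.0

def board_corners(cols, rows, square):
    """Row-major planar corner grid, in the order a detector would list it."""
    return [(c * square, r * square) for r in range(rows) for c in range(cols)]

def rotate_180_about_center(points):
    """Point reflection through the centroid (a 180 degree in-plane rotation)."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [(2 * cx - x, 2 * cy - y) for x, y in points]

corners = board_corners(COLS, ROWS, SQUARE_MM)
reversed_order = corners[::-1]
rotated = rotate_180_about_center(corners)

# If the reversed ordering coincides with the rotated board, a pose solver
# cannot distinguish the true pose from the flipped "ghost" pose using
# this board alone.
ambiguous = all(
    abs(a[0] - b[0]) < 1e-9 and abs(a[1] - b[1]) < 1e-9
    for a, b in zip(reversed_order, rotated)
)
print("reversed ordering == 180-degree rotation:", ambiguous)
```

Adding a second, non-coplanar board breaks this symmetry, which is consistent with the joint wall+floor solve described above.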

15 headless pytest tests green (synthetic corners: extrinsics recovery
incl. ghost resolution, bundle round-trip + hash stability, ray
transforms w/ distortion + cross-resolution, no-calibration byte
identity).

Co-Authored-By: RuFlo <ruv@ruv.net>

* feat(benchmarks): WiFlow-STD reproduction harness + measurement (a) results (ADR-152 §2.2)

Shipped checkpoint REFUTED (0.08% PCK@20, wrong keypoint normalization);
6 reproducibility defects documented (broken imports, corrupted dataset
tail with float32-max garbage that NaN-poisons fp16 BatchNorm, unreachable
test phase). After repairs, retraining with upstream defaults reproduces
96.09% PCK@20 full-test / 96.61% corruption-free (published 97.25%) on
RTX 5080. Claims graded MEASURED-EQUIVALENT; 2.23M params + ~0.055 GFLOPs
verified. Third-party code/weights/data stay out of tree (gitignored).

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: ADR-152 Rust integrations + ADR-153 802.11bf protocol model

- calibration: GeometryEmbedding — 32-slot permutation-invariant NodeGeometry
  featurization for future LoRA-head conditioning (ADR-152 §2.1.2); derived
  SpecialistBank::geometry_embedding() accessor; 59 tests
- train: MaePretrainConfig + patchify/random-mask with UNSW measured recipe
  (80% masking, (30,3) patches; ADR-152 §2.3, arXiv 2511.18792); strict
  no-truncate/no-NaN policy; proptest properties
- train: WiFlowStdModel — tch-gated port of the verified ~96%-PCK@20
  WiFlow-STD architecture (ADR-152 §2.2 beyond-SOTA); ungated param formula
  pinned to 2,225,042; 15/17-keypoint support; 239 crate tests
- hardware: ieee80211bf forward-compatibility protocol model (ADR-153):
  SpecProfile gates, SensingCapabilities negotiation, required ConsentMode,
  session FSM, SensingTransport + SimTransport + OpportunisticCsiBridge;
  full acceptance checklist covered; 156+4 tests
- deps: ruvector bumps per ADR-152 §2.6 survey (mincut/solver 2.0.6,
  attention 2.1.0, gnn 2.2.0); vendor/ruvector synced to a083bd77f
- docs: ADR-153 accepted; ADR-152 §2.2 status, §2.4 amendment, §2.6 added

Workspace: 162 test suites green (--no-default-features); Python proof PASS.
Known pre-existing flake: homecore-api env_empty_falls_back_to_defaults
(unserialized env-var mutation) — untouched, follow-up.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: CHANGELOG + CLAUDE.md entries for ADR-152 integrations and ADR-153

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(train): repair tch-backend bit-rot — gated path compiles and tests run again

Mechanical API refresh against current tch: Vec::from(Tensor) -> try_from
(+ explicit flatten), numel() usize cast, Rem/div ops -> remainder() /
divide_scalar_mode(floor) — the latter fixed a silent true-division bug in
heatmap argmax decoding; clamp(1.0, f64::MAX) -> clamp_min (torch 2.x scalar
overflow panic); petgraph EdgeRef import; missing EvalMetrics and
verify_checkpoint_dir APIs that tests documented. wiflow_std roundtrip test
uses safetensors (.pt _save_parameters roundtrip broken in torch 2.11
Windows). Gated: 349 passed (incl. all 20 wiflow_std); ungated: unchanged.
Known pre-existing: gaussian-heatmap convention mismatch (2 tests), proof
seed race under parallel threads — documented, deliberate follow-ups.
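
The true-division bug is easy to reproduce in plain Python: decoding a flat argmax index into (row, col) needs floor division, and true division silently yields a fractional row. The helper names and heatmap width below are hypothetical and only illustrate the arithmetic; this is not the crate's tch code.

```python
def decode_flat_argmax(flat_index: int, width: int) -> tuple[int, int]:
    """Convert a flat heatmap argmax index to (row, col) with floor division."""
    row = flat_index // width  # floor division gives the integer row
    col = flat_index % width   # remainder gives the integer column
    return row, col

def decode_with_true_division(flat_index: int, width: int) -> tuple[float, int]:
    """Buggy variant: true division returns a fractional 'row'."""
    return flat_index / width, flat_index % width

WIDTH = 56              # illustrative heatmap width (assumption)
flat = 3 * WIDTH + 21   # peak placed at row 3, column 21

print(decode_flat_argmax(flat, WIDTH))         # (3, 21)
print(decode_with_true_division(flat, WIDTH))  # (3.375, 21)
```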

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(train): WiFlow-STD PyTorch->tch weight import + numerical parity proof

export_to_safetensors.py maps the retrained checkpoint (295 tensors -> 248
mapped, param sum exactly 2,225,042; num_batches_tracked dropped) into a
tch-loadable safetensors plus a deterministic parity fixture. Gated #[ignore]
integration test loads it strictly and asserts forward-pass agreement:
max abs diff 1.192e-7 on the seed-42 fixture. dump_variable_names test makes
the tch name layout authoritative. Zero architecture discrepancies found.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: workflow-review findings — BN gamma init, ThresholdParams serde, init docs

Concurrent validation workflow (2 review lanes + adversarial verification,
13 agents): 5 confirmed findings, 3 refuted. Fixes:
- wiflow_std: pin BatchNorm gamma to 1.0 (tch default draws Uniform(0,1) —
  silently halves activations in from-scratch training; loaded checkpoints
  unaffected, parity re-verified after the change)
- wiflow_std: document the conv-init divergences vs the reference's
  effective kaiming_normal(fan_out) re-init (from-scratch dynamics only)
- ieee80211bf: ThresholdParams deserialization validates via try_from so
  the <=100 invariant holds for untrusted payloads (+ rejection test)

Benchmarks (release, ruvzen): GeometryEmbedding 1.84us/call (542k/s),
MAE tokenization 7.38us/window (135k/s), 802.11bf FSM 8.9M events/s —
nothing suspicious.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-152 §2.1.4 gate resolved — PerceptAlign repo MIT, dataset on HF

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): edge optimization measured + measurement (b) blocked + 92.9% retraction

Edge optimization (ADR-152 optimize track): ONNX Runtime fp32 is the CPU
latency win (3.2 ms/window, ~3.4x faster than torch, parity 2.4e-7); ORT
dynamic int8 reaches 2.44 MB (paper's ~2.2 MB claim plausible only via
conv-capable toolchains; -0.16pt PCK@20, +18% MPJPE, 2x slower); torch
dynamic quant converts 0% of this conv-only model; fp16 halves storage at
negligible accuracy cost but is slower on CPU.
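
The headline ratios above can be rechecked from the numbers stored in edge_optimization.json. The sketch below copies the relevant values by hand (batch-1 medians from latency_controlled_rerun, accuracy from onnx_accuracy); the variable names are illustrative, not part of any benchmark script.

```python
# Batch-1 median ms/window, copied from latency_controlled_rerun.
torch_fp32_ms = 10.969150001983508
onnx_fp32_ms = 3.186650001225644
onnx_int8_ms = 6.50984999811044

# Accuracy on the 10,000-window subset, copied from onnx_accuracy.
fp32_pck20, int8_pck20 = 0.9668200004577636, 0.965240001964569
fp32_mpjpe, int8_mpjpe = 0.00936222568154335, 0.01108054072111845

speedup = torch_fp32_ms / onnx_fp32_ms               # ORT fp32 vs torch fp32
int8_slowdown = onnx_int8_ms / onnx_fp32_ms          # ORT dynamic int8 vs ORT fp32
pck_delta_pt = (int8_pck20 - fp32_pck20) * 100       # percentage points
mpjpe_increase_pct = (int8_mpjpe / fp32_mpjpe - 1) * 100

print(f"ORT fp32 speedup over torch: {speedup:.2f}x")
print(f"ORT int8 slowdown vs ORT fp32: {int8_slowdown:.2f}x")
print(f"PCK@20 delta: {pck_delta_pt:+.2f} pt, MPJPE: {mpjpe_increase_pct:+.0f}%")
```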

Measurement (b) BLOCKED-ON-DATA: only 1,077 paired ESP32 windows exist
(stop rule <2k). Forensic recheck of the surviving April holdout RETRACTS
the ADR-079 '92.9% PCK@20' figure: constant-output model, absolute (not
torso) threshold, 69 near-static frames — mean predictor scores 100% under
that protocol; torso-PCK@20 is 19.1%. Corroborates PR #535. Stale citations
removed from user-guide, readme-details, ADR-152 §2.1.3; no-citation rule
extended to ADR-079 accuracy claims. Unblock: >=2k-window multi-pose paired
session + torso-PCK re-baseline.
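
Why a constant predictor can score 100% under an absolute threshold is easiest to see on a toy example. The sketch below uses made-up near-static frames (not the April holdout) and assumes PCK@20 means "error within 0.20" in the absolute case and "error within 0.20 x torso length" in the torso-normalized case; the keypoint layout and helper names are hypothetical.

```python
import math

# Synthetic near-static frames in normalized coordinates (made-up data).
# Hypothetical keypoint order: 0 = shoulder, 1 = hip, 2 = wrist.
frames = [
    [(0.5, 0.3), (0.5, 0.6), (0.4, 0.5)],
    [(0.5, 0.3), (0.5, 0.6), (0.6, 0.5)],
    [(0.5, 0.3), (0.5, 0.6), (0.4, 0.5)],
    [(0.5, 0.3), (0.5, 0.6), (0.6, 0.5)],
]

def mean_pose(frames):
    """Per-keypoint mean over all frames: a predictor that ignores its input."""
    n = len(frames)
    return [
        (sum(f[k][0] for f in frames) / n, sum(f[k][1] for f in frames) / n)
        for k in range(len(frames[0]))
    ]

def pck(frames, prediction, threshold_for_frame):
    """Fraction of keypoints whose error is within the per-frame threshold."""
    hits = total = 0
    for frame in frames:
        threshold = threshold_for_frame(frame)
        for truth, pred in zip(frame, prediction):
            hits += math.dist(truth, pred) <= threshold
            total += 1
    return hits / total

constant_prediction = mean_pose(frames)

absolute_pck = pck(frames, constant_prediction, lambda f: 0.20)
torso_pck = pck(frames, constant_prediction,
                lambda f: 0.20 * math.dist(f[0], f[1]))

print(f"absolute PCK@20: {absolute_pck:.1%}")
print(f"torso PCK@20:    {torso_pck:.1%}")
```

In this toy case the only moving keypoint stays within the loose absolute threshold but misses the tighter torso-scaled one, which is the same failure shape the retraction describes.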

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(user-guide): corrected camera-supervised collection tutorial

Step 0 CSI-rate check + session-length math (window yield = frames/20 —
the May session's 8x under-delivery was a ~12 Hz CSI rate, not an aligner
bug); two-checkerboard calibration step (ADR-152 §2.1.3); pose-variety and
confidence guidance; torso-normalized PCK + temporal-split + pred-variance
eval protocol (lessons from the 92.9% retraction); scale presets re-keyed
to realistic window counts.
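
The session-length math can be sketched as follows, assuming non-overlapping 20-frame windows (as "window yield = frames/20" implies). The function names and the 100 Hz planning rate are illustrative assumptions, not values taken from the tutorial.

```python
FRAMES_PER_WINDOW = 20  # from "window yield = frames/20"

def expected_windows(csi_rate_hz: float, session_seconds: float) -> int:
    """Windows a session yields at a given CSI frame rate."""
    frames = int(csi_rate_hz * session_seconds)
    return frames // FRAMES_PER_WINDOW

def seconds_needed(target_windows: int, csi_rate_hz: float) -> float:
    """Session length required to reach a target window count."""
    return target_windows * FRAMES_PER_WINDOW / csi_rate_hz

ten_minutes = 600
print(expected_windows(100.0, ten_minutes))  # 3000 at an assumed 100 Hz
print(expected_windows(12.0, ten_minutes))   # 360 at the ~12 Hz observed rate
print(f"{seconds_needed(2000, 12.0) / 60:.1f} min for 2,000 windows at 12 Hz")
```

Checking the actual CSI rate first (Step 0) is what makes the duration estimate meaningful.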

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): static PTQ int8 (calibrated) results + overnight capture script

Conv-only static QDQ beats dynamic int8 on accuracy (PCK@20 96.61-96.63%
vs 96.52%, MPJPE +10% vs +18% over fp32) at ~equal size/latency; all-ops
QDQ strictly worse (int8 activations through attention glue). Entropy
calibration verified bit-identical to MinMax on this data. Deployment:
ONNX fp32 for speed (3.2ms), static conv-only QDQ for smallest (2.53MB).

Also: scripts/overnight-empty-capture.py — segmented UDP CSI recorder for
empty-room baselines (no glob collisions, detach-safe).

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): measurement (b) MEASURED — optimization transfer only, mean-pose baseline wins

WiFlow-STD fine-tuned on 2,046 fresh single-room ESP32 paired windows
(temporal 70/15/15, 70->540 adapter, K=17): pretrained-init 65% PCK@20 vs
scratch 0% (optimization transfer) but frozen-trunk ~0% (no feature
transfer), and NOTHING beats the mean-pose baseline (95.9% PCK@20 —
single subject, near-static normalized coords). Honesty gates held: pred
std 0.0113 (non-constant model) but mean-baseline dominance means no
citable CSI->pose capability from this data. ADR-152 open question 1
answered partially; definitive answer needs multi-subject/position data.

Two new aligner findings: heterogeneous csi_shape with silent zero-padding
(~20%), and extractCsiMatrix's transposed shape label (frame-major data,
[nSc, nFrames] label) — fixes pending.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): efficiency sweep MEASURED — half model dominates full reference

Compact WiFlow-STD variants on the same data/split/protocol: half (843,834
params, 0.38x) strictly dominates the 2.23M reference (PCK@20 96.62 vs
96.61, PCK@50 99.47 vs 99.11, MPJPE 0.00898 vs 0.0094) — the published
architecture is over-parameterized for its own benchmark. quarter (338k)
96.05%; tiny (56,290 params, 1/39.5) holds 94.11% — a ~220KB fp32 edge
candidate. In-domain caveats recorded; cross-domain untested.
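
The size ratios quoted above follow directly from the parameter counts. A quick check, assuming 4 bytes per fp32 weight and counting weights only (no file-format overhead); the variable names are illustrative:

```python
REFERENCE_PARAMS = 2_225_042
variants = {"half": 843_834, "tiny": 56_290}

half_ratio = variants["half"] / REFERENCE_PARAMS   # fraction of the reference
tiny_shrink = REFERENCE_PARAMS / variants["tiny"]  # reference is this many times larger
tiny_fp32_kib = variants["tiny"] * 4 / 1024        # raw fp32 weights, in KiB

print(f"half: {half_ratio:.2f}x of reference")
print(f"tiny: 1/{tiny_shrink:.1f} of reference, ~{tiny_fp32_kib:.0f} KiB of fp32 weights")
```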

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(train): compact WiFlow-STD presets in Rust + tiny edge artifact (ADR-152)

WiFlowStdConfig gains half()/quarter()/tiny() mirroring the overnight sweep
exactly: TcnGroupsMode (Fixed/Gcd/Depthwise), input_pw_groups, derived
stride schedule and decoder-mid (all default to upstream behavior; legacy
serde JSON unaffected). Param formulas match trained ground truth on the
first try: 843,834 / 338,600 / 56,290; default 2,225,042 pin and 1.192e-7 parity
unchanged. 248 tests green.

Tiny edge artifact (tiny_edge_bench.py): ONNX fp32 = 295 KB, 0.66 ms/win
(~1,500/s CPU), 94.11% PCK@20 (matches sweep clean-test exactly; parity
1.49e-7). Static int8 is a bad trade at this scale (-1.43pt, +19% MPJPE,
-16% size, slower) — recorded as negative result. Export note: width-16
breaks AdaptiveAvgPool((15,1)) TorchScript export; replaced by exact
mean+matmul equivalent, proven by parity.
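
The pooling replacement can be illustrated in one dimension. The sketch below assumes the usual adaptive-pooling bin convention (start = floor(i*in/out), end = ceil((i+1)*in/out)) and shows that a constant averaging matrix reproduces direct bin averaging for width 16 pooled to 15. It is pure Python with hypothetical helper names, not the code in tiny_edge_bench.py.

```python
def adaptive_bins(in_size: int, out_size: int):
    """Half-open [start, end) bins, using integer floor/ceil arithmetic."""
    return [
        ((i * in_size) // out_size,
         ((i + 1) * in_size + out_size - 1) // out_size)
        for i in range(out_size)
    ]

def averaging_matrix(in_size: int, out_size: int):
    """Constant (out_size x in_size) matrix whose rows average one bin each."""
    matrix = []
    for start, end in adaptive_bins(in_size, out_size):
        weight = 1.0 / (end - start)
        matrix.append([weight if start <= j < end else 0.0
                       for j in range(in_size)])
    return matrix

def pool_direct(signal, out_size):
    """Reference: average each bin directly."""
    return [sum(signal[s:e]) / (e - s)
            for s, e in adaptive_bins(len(signal), out_size)]

def pool_matmul(signal, out_size):
    """Same result expressed as a matrix-vector product."""
    return [sum(w * x for w, x in zip(row, signal))
            for row in averaging_matrix(len(signal), out_size)]

signal = [float(v * v) for v in range(16)]  # arbitrary test input
direct = pool_direct(signal, 15)
via_matmul = pool_matmul(signal, 15)
max_abs_diff = max(abs(a - b) for a, b in zip(direct, via_matmul))
print("bins:", adaptive_bins(16, 15)[:3], "...")
print("max abs diff:", max_abs_diff)
```

Because the matrix is a compile-time constant, an exporter only has to handle a plain matmul, which is the property the export note relies on.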

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: resolve all 10 confirmed code-review findings (7-angle review, 20/20 verified)

wiflow_std: min_feature_width (default 15) replaces the keypoints->stride
coupling — for_keypoints(17) now provably builds the trained [2,2,2,2]
graph and pools 15->17, matching the validated Python protocol (pinned by
tests); param_count() total on invalid configs; random_mask returns Result
and rejects non-finite/out-of-range ratios; trainer checkpoints switched
to safetensors (.pt VarStore roundtrip broken on Windows torch 2.11).

ieee80211bf: SBP proxy now re-triggers instances and relays reports via
Action::RelaySbpReport -> SensingFrame::SbpReport (clients consume via
their existing path); missed_instances reset on success = consecutive
semantics; SessionTable gains a guarded SBP entry point + unknown-id drop
counter; initiator-role sessions reject inbound setup/SBP requests
(RejectedNotSupported) closing the idle hijack; StartSetup/StartSbp
outside Idle return InvalidStateForCommand; SBP validation unified
through evaluate_setup with a 1:1 SetupStatus->SbpStatus mapping.
events.rs split out to honor the 500-line cap.

calibration/cli: enrollment geometry now actually reaches trained banks —
both production call sites attach .with_geometry; --geometry flag on
train-room and POST /enroll/geometry + train-body geometry on
calibrate-serve give production a recording surface; geometry-free banks
log the ADR-152 §2.1.2 note.

benchmarks: corruption masks committed as ground truth (unregenerable
after in-place cleaning; verified bit-identical regeneration from the
pristine copy) + generate_corruption_masks.py producer; _bench_common.py
dedups the 5x-copied shim/evaluate/seed/remap (post-refactor PCK@20
re-verified equal to the last digit); remote scripts get the mmap patch;
tiny_edge --calib validated multiple-of-64; onnx_bench --help no longer
runs the export (which overwrote the artifact); artifact restored byte-exact.

Workspace: 2,963 tests passed, 0 failed; Python proof PASS.

Co-Authored-By: claude-flow <ruv@ruv.net>

* ci: build workspace tests without debuginfo — runner disk exhaustion

The combined 38-crate debug target exceeds the GitHub runner's disk
('final link failed: No space left on device'); the same tree measured
151GB locally with full debuginfo. CARGO_PROFILE_{DEV,TEST}_DEBUG=0
shrinks the target ~5-10x; debuginfo serves no purpose in CI test runs.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 17:02:23 -04:00


{
"torch": {
"env": {
"torch": "2.12.0+cpu",
"platform": "Windows-11-10.0.26200-SP0",
"processor": "Intel64 Family 6 Model 197 Stepping 2, GenuineIntel",
"num_threads": 16,
"checkpoint": "results\\retrained_best_pose_model.pth",
"params": 2225042
},
"variants": {
"fp32": {
"file": "retrained_fp32_resaved.pth",
"size_bytes": 9068948,
"size_mb": 9.068948,
"latency_batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 24.903650000851485,
"median_ms_per_window": 24.903650000851485,
"windows_per_second": 40.15475642991324
},
"latency_batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 184.02919999789447,
"median_ms_per_window": 2.875456249967101,
"windows_per_second": 347.77089723115813
},
"accuracy": {
"samples": 10000,
"pck@20": 0.9668200004577636,
"pck@50": 0.9915333324432373,
"mpjpe": 0.00936222033649683,
"wall_seconds": 37.85407733917236
}
},
"fp16": {
"file": "retrained_fp16.pth",
"size_bytes": 4580332,
"size_mb": 4.580332,
"latency_batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 23.936699999467237,
"median_ms_per_window": 23.936699999467237,
"windows_per_second": 41.776853117691964
},
"latency_batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 102.32584999903338,
"median_ms_per_window": 1.5988414062348966,
"windows_per_second": 625.4529036465817
},
"accuracy": {
"samples": 10000,
"pck@20": 0.966773332977295,
"pck@50": 0.9915066654205322,
"mpjpe": 0.009460017587244511,
"wall_seconds": 21.632277250289917
}
},
"int8_dynamic": {
"file": "retrained_int8_dynamic.pth",
"size_bytes": 9068948,
"size_mb": 9.068948,
"latency_batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 18.105350000041653,
"median_ms_per_window": 18.105350000041653,
"windows_per_second": 55.23229321707117
},
"latency_batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 168.77549999844632,
"median_ms_per_window": 2.6371171874757238,
"windows_per_second": 379.20195763359703
},
"accuracy": {
"samples": 10000,
"pck@20": 0.9668200004577636,
"pck@50": 0.9915333324432373,
"mpjpe": 0.00936222033649683,
"wall_seconds": 45.35376596450806
}
}
},
"int8_dynamic_quant_report": {
"eligible_module_counts": {
"nn.Linear": 0,
"nn.Conv1d": 21,
"nn.Conv2d": 22
},
"modules_actually_quantized": [],
"n_modules_quantized": 0,
"params_total": 2225042,
"params_quantized": 0,
"params_quantized_fraction": 0.0
},
"accuracy_subset": {
"description": "seed-42 file-level 70/15/15 test split, corrupted windows (files 487-499) excluded, seed-42 random subset",
"subset_size": 10000,
"clean_test_total": 10000
}
},
"onnx": {
"env": {
"torch": "2.12.0+cpu",
"onnxruntime": "1.26.0",
"platform": "Windows-11-10.0.26200-SP0"
},
"export": {
"mode": "dynamic-batch",
"exporter": "torchscript",
"file": "retrained_fp32_dynamic.onnx",
"size_mb": 8.971781
},
"parity": {
"fixture": "results/parity_fixture.npz (batch 2, seed 42)",
"max_abs_diff_vs_stored_fixture": 2.384185791015625e-07,
"max_abs_diff_vs_torch_now": 2.384185791015625e-07,
"pass_lt_1e-4": true
},
"latency": {
"batch1": {
"batch_size": 1,
"runs": 100,
"median_ms_per_batch": 2.5410999987798277,
"median_ms_per_window": 2.5410999987798277,
"windows_per_second": 393.5303610563043
},
"batch64": {
"batch_size": 64,
"runs": 30,
"median_ms_per_batch": 181.95204999938142,
"median_ms_per_window": 2.8430007812403346,
"windows_per_second": 351.7410218803118
}
},
"ort_int8_dynamic_supplementary": {
"file": "retrained_int8_ort_dynamic.onnx",
"size_mb": 2.438794,
"runs": true,
"max_abs_diff_vs_fp32_fixture": 0.00827130675315857
}
},
"onnx_accuracy": {
"onnx_fp32": {
"samples": 10000,
"pck@20": 0.9668200004577636,
"pck@50": 0.9915333324432373,
"mpjpe": 0.00936222568154335,
"wall_seconds": 22.34790802001953
},
"onnx_int8_ort_dynamic": {
"samples": 10000,
"pck@20": 0.965240001964569,
"pck@50": 0.9915466655731201,
"mpjpe": 0.01108054072111845,
"wall_seconds": 55.742953062057495
}
},
"latency_controlled_rerun": {
"note": "3 interleaved repetitions per variant, median ms/window; quiet box",
"fp32": {
"batch1_ms_per_window_median": 10.969150001983508,
"batch1_reps": [
10.969150001983508,
12.646450000829645,
10.49820000116597
],
"batch64_ms_per_window_median": 2.2734187500077496,
"batch64_reps": [
2.377234374989712,
2.124126562478068,
2.2734187500077496
]
},
"fp16": {
"batch1_ms_per_window_median": 24.313550000442774,
"batch1_reps": [
25.1078499986761,
21.856999999727122,
24.313550000442774
],
"batch64_ms_per_window_median": 2.414695312495496,
"batch64_reps": [
2.5705156249955508,
1.7137437499741281,
2.414695312495496
]
},
"int8_dynamic": {
"batch1_ms_per_window_median": 15.627150000000256,
"batch1_reps": [
17.67525000104797,
14.627999998992891,
15.627150000000256
],
"batch64_ms_per_window_median": 2.0546906250160646,
"batch64_reps": [
2.0546906250160646,
2.03407343752815,
2.9325796875241394
]
},
"onnx_fp32": {
"batch1_ms_per_window_median": 3.186650001225644,
"batch1_reps": [
2.7332500012562377,
3.1995500012271805,
3.186650001225644
],
"batch64_ms_per_window_median": 1.9893374999924163,
"batch64_reps": [
1.5590843750032946,
1.9893374999924163,
2.2144343749914697
]
},
"onnx_int8_ort_dynamic": {
"batch1_ms_per_window_median": 6.50984999811044,
"batch1_reps": [
6.50984999811044,
6.455249998907675,
6.789299999581999
],
"batch64_ms_per_window_median": 5.770093750015803,
"batch64_reps": [
5.770093750015803,
3.912374999970325,
7.8067296875019565
]
}
},
"onnx_static_ptq": {
"env": {
"onnxruntime": "1.26.0",
"torch": "2.12.0+cpu",
"platform": "Windows-11-10.0.26200-SP0",
"source_model": "retrained_fp32_dynamic.onnx",
"preprocessed_model": {
"file": "retrained_fp32_preproc.onnx",
"size_mb": 8.981529
}
},
"variants": {
"minmax_all": {
"file": "retrained_int8_static_minmax_all.onnx",
"size_bytes": 2604286,
"size_mb": 2.604286,
"calibration": {
"method": "minmax",
"windows": 1000,
"percentile": null,
"seconds": 5.052440166473389
},
"scope": "all",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 283,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 181,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.015945255756378174,
"accuracy": {
"samples": 10000,
"pck@20": 0.9545266661643982,
"pck@50": 0.9913666645050049,
"mpjpe": 0.014860070134699345,
"wall_seconds": 43.455235958099365
}
},
"minmax_conv": {
"file": "retrained_int8_static_minmax_conv.onnx",
"size_bytes": 2527421,
"size_mb": 2.527421,
"calibration": {
"method": "minmax",
"windows": 1000,
"percentile": null,
"seconds": 4.380746126174927
},
"scope": "conv",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 156,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 78,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.010693132877349854,
"accuracy": {
"samples": 10000,
"pck@20": 0.9663399996757507,
"pck@50": 0.9918666641235352,
"mpjpe": 0.01084446222037077,
"wall_seconds": 35.937947034835815
}
},
"entropy_all": {
"file": "retrained_int8_static_entropy_all.onnx",
"size_bytes": 2604268,
"size_mb": 2.604268,
"calibration": {
"method": "entropy",
"windows": 512,
"percentile": null,
"seconds": 23.835066318511963
},
"scope": "all",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 283,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 181,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.015280365943908691,
"accuracy": {
"samples": 10000,
"pck@20": 0.9530466662406921,
"pck@50": 0.9912600006103516,
"mpjpe": 0.015098519864678382,
"wall_seconds": 51.514281034469604
}
},
"entropy_conv": {
"file": "retrained_int8_static_entropy_conv.onnx",
"size_bytes": 2527403,
"size_mb": 2.527403,
"calibration": {
"method": "entropy",
"windows": 512,
"percentile": null,
"seconds": 9.634419918060303
},
"scope": "conv",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 156,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 78,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.012535125017166138,
"accuracy": {
"samples": 10000,
"pck@20": 0.9659599989891052,
"pck@50": 0.9918666648864746,
"mpjpe": 0.010778637571632861,
"wall_seconds": 41.01180171966553
}
},
"percentile_all": {
"file": "retrained_int8_static_percentile_all.onnx",
"size_bytes": 2604052,
"size_mb": 2.604052,
"calibration": {
"method": "percentile",
"windows": 512,
"percentile": 99.99,
"seconds": 20.221954584121704
},
"scope": "all",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 283,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 181,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.017689883708953857,
"accuracy": {
"samples": 10000,
"pck@20": 0.9639333323478698,
"pck@50": 0.9916799991607667,
"mpjpe": 0.012176512064039708,
"wall_seconds": 49.365190744400024
}
},
"percentile_conv": {
"file": "retrained_int8_static_percentile_conv.onnx",
"size_bytes": 2527241,
"size_mb": 2.527241,
"calibration": {
"method": "percentile",
"windows": 512,
"percentile": 99.99,
"seconds": 8.223475694656372
},
"scope": "conv",
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"node_counts": {
"Add": 9,
"AveragePool": 1,
"BatchNormalization": 12,
"Concat": 10,
"Conv": 43,
"DequantizeLinear": 156,
"Einsum": 4,
"Gather": 16,
"Mul": 39,
"QuantizeLinear": 78,
"Reshape": 14,
"Shape": 2,
"Sigmoid": 37,
"Slice": 8,
"Softmax": 2,
"Squeeze": 1,
"Transpose": 7,
"Unsqueeze": 11
},
"max_abs_diff_vs_fp32_fixture": 0.014725983142852783,
"accuracy": {
"samples": 10000,
"pck@20": 0.9660599988937378,
"pck@50": 0.9916066654205322,
"mpjpe": 0.010310938355326652,
"wall_seconds": 36.89548587799072
}
}
},
"latency": {
"note": "3 interleaved repetitions per variant, median ms/window; onnx_fp32 / onnx_int8_ort_dynamic are same-session references",
"onnx_fp32": {
"batch1_reps": [
4.5327999996516155,
2.535649999117595,
2.167549997466267
],
"batch64_reps": [
1.9354515624740998,
2.4948054687854437,
1.9334703125082342
],
"batch1_ms_per_window_median": 2.535649999117595,
"batch64_ms_per_window_median": 1.9354515624740998
},
"onnx_int8_ort_dynamic": {
"batch1_reps": [
5.698599999959697,
5.721350000385428,
4.805099997611251
],
"batch64_reps": [
4.096601562508795,
4.857628124995017,
4.583800000006022
],
"batch1_ms_per_window_median": 5.698599999959697,
"batch64_ms_per_window_median": 4.583800000006022
},
"entropy_all": {
"batch1_reps": [
6.444149999879301,
5.038299999796436,
5.713200000172947
],
"batch64_reps": [
4.149468750028973,
3.437125000004926,
4.410960937491382
],
"batch1_ms_per_window_median": 5.713200000172947,
"batch64_ms_per_window_median": 4.149468750028973
},
"entropy_conv": {
"batch1_reps": [
4.874750000453787,
5.169099998965976,
5.236699998931726
],
"batch64_reps": [
3.010160156236452,
3.1175546875203963,
3.516850781238645
],
"batch1_ms_per_window_median": 5.169099998965976,
"batch64_ms_per_window_median": 3.1175546875203963
},
"percentile_all": {
"batch1_reps": [
5.184749999898486,
5.2898499998264015,
5.916899999647285
],
"batch64_reps": [
4.305105468745296,
4.460741406262514,
4.184502343747454
],
"batch1_ms_per_window_median": 5.2898499998264015,
"batch64_ms_per_window_median": 4.305105468745296
},
"percentile_conv": {
"batch1_reps": [
4.916449999655015,
7.150899999032845,
5.284949998895172
],
"batch64_reps": [
3.855813281262499,
4.688969531230214,
5.220103124997877
],
"batch1_ms_per_window_median": 5.284949998895172,
"batch64_ms_per_window_median": 4.688969531230214
},
"minmax_all": {
"batch1_reps": [
6.463300000177696,
7.149449998905766,
5.3209000016067876
],
"batch64_reps": [
3.9251343750095202,
4.033442187505898,
3.428199218745931
],
"batch1_ms_per_window_median": 6.463300000177696,
"batch64_ms_per_window_median": 3.9251343750095202
},
"minmax_conv": {
"batch1_reps": [
5.9961499991914025,
5.236549999608542,
4.854399998293957
],
"batch64_reps": [
4.368359375007458,
3.249617187492504,
3.0238906249735464
],
"batch1_ms_per_window_median": 5.236549999608542,
"batch64_ms_per_window_median": 3.249617187492504
}
},
"accuracy_subset": {
"description": "seed-42 file-level 70/15/15 test split, corrupted windows excluded, seed-42 random subset (same as quantize_bench/eval_ort_accuracy)",
"subset_size": 10000
}
},
"tiny_variant": {
"env": {
"torch": "2.12.0+cpu",
"onnxruntime": "1.26.0",
"platform": "Windows-11-10.0.26200-SP0",
"num_threads": 16,
"checkpoint": "results\\tiny_best.pth",
"checkpoint_size_bytes": 340555,
"params": 56290,
"variant_config": {
"tcn": [
68,
56,
44,
32
],
"conv": [
2,
4,
8,
16
],
"attn_groups": 2,
"groups_mode": "depthwise",
"input_pw_groups": 4
}
},
"export": {
"mode": "dynamic-batch",
"exporter": "torchscript",
"opset": 17,
"file": "tiny_fp32_dynamic.onnx",
"size_bytes": 295279,
"size_mb": 0.295279,
"verified_batches": [
1,
2,
64
],
"note": "AdaptiveAvgPool2d((15,1)) replaced at export by an exact mean(-1) + constant averaging matmul (final_width 16 is not a multiple of 15, which the TorchScript exporter rejects); exactness proven by the parity check vs the original torch model"
},
"parity": {
"fixture": "results/parity_fixture.npz input (batch 2, seed 42); reference output recomputed with the tiny torch model",
"max_abs_diff_vs_torch": 1.4901161193847656e-07,
"pass_lt_1e-4": true
},
"int8_static_percentile_conv": {
"file": "tiny_int8_static_percentile_conv.onnx",
"size_bytes": 248278,
"size_mb": 0.248278,
"calibration": {
"method": "percentile",
"percentile": 99.99,
"windows": 512,
"scope": "conv-only TRAIN-split corruption-free",
"seconds": 1.5347836017608643
},
"per_channel": true,
"activation_type": "QInt8",
"weight_type": "QInt8",
"max_abs_diff_vs_fp32_fixture": 0.018491357564926147
},
"latency": {
"note": "3 interleaved repetitions per variant, median ms/window; full-model sessions are same-session references",
"tiny_onnx_fp32": {
"batch1_reps": [
0.6312500008789357,
0.6834500018157996,
0.6595999984710943
],
"batch64_reps": [
0.37747578119251557,
0.24196640623586063,
0.2314671875183194
],
"batch1_ms_per_window_median": 0.6595999984710943,
"batch64_ms_per_window_median": 0.24196640623586063
},
"tiny_onnx_int8_static_percentile_conv": {
"batch1_reps": [
0.7988500001374632,
0.9382499993080273,
0.8451000030618161
],
"batch64_reps": [
0.9211476562995813,
1.3045390625165965,
1.026230468767153
],
"batch1_ms_per_window_median": 0.8451000030618161,
"batch64_ms_per_window_median": 1.026230468767153
},
"full_onnx_fp32_reference": {
"batch1_reps": [
2.267249998112675,
2.80170000041835,
2.132149998942623
],
"batch64_reps": [
1.3050578124875756,
1.4244992187855132,
1.8014164062947202
],
"batch1_ms_per_window_median": 2.267249998112675,
"batch64_ms_per_window_median": 1.4244992187855132
},
"full_onnx_int8_static_percentile_conv_reference": {
"batch1_reps": [
5.529599999135826,
4.768399998283712,
6.215800000063609
],
"batch64_reps": [
3.815724218725336,
3.1025562500417436,
4.333318749957016
],
"batch1_ms_per_window_median": 5.529599999135826,
"batch64_ms_per_window_median": 3.815724218725336
}
},
"accuracy_subset": {
"description": "seed-42 file-level 70/15/15 test split, corrupted windows excluded, seed-42 random subset (same as quantize_bench/eval_ort_accuracy/static_ptq_bench)",
"subset_size": 10000
},
"accuracy": {
"tiny_onnx_fp32": {
"samples": 10000,
"pck@20": 0.941106667804718,
"pck@50": 0.99369333152771,
"mpjpe": 0.012527281279861927,
"wall_seconds": 10.927234888076782
},
"tiny_onnx_int8_static_percentile_conv": {
"samples": 10000,
"pck@20": 0.9268133331298828,
"pck@50": 0.9932933319091797,
"mpjpe": 0.014906252065300942,
"wall_seconds": 12.320892333984375
}
}
}
}