ruvnet--RuView

mirror of https://github.com/ruvnet/RuView synced 2026-06-24 12:43:18 +00:00

Files

T

rUv 0f64d23516 feat(bench): int8 quantization of WiFlow-STD half pose model — MEASURED trade-off (ADR-175, honest negative) (#1095 )

Sub-deliverable 8.2 of the benchmark/optimization milestone. Quantizes the
843,834-param "half" WiFlow-STD pose model (half_best.pth) to int8 two ways and
MEASURES the accuracy/size trade-off vs fp32 under ONE locked normalization
(ADR-173 torso-diameter PCK, upstream calculate_pck use_torso_norm=True), on the
same seed-42 file-level 70/15/15 test split that produced the fp32 sweep numbers.

MEASURED on ruvultra (RTX 5080, torch 2.11.0+cu128, fbgemm; clean test, torso-PCK):
  fp32             96.62% pck@20  99.47% pck@50  0.008981 mpjpe  3.351 MB
  int8 PTQ static  40.98% pck@20  94.98% pck@50  0.038262 mpjpe  1.046 MB  (-55.64pp)
  int8 QAT (3 ep)  67.48% pck@20  98.69% pck@50  0.026548 mpjpe  1.043 MB  (-29.15pp)

Verdict (honest no): int8 is NOT a win at the strict PCK@20 edge target. Static
PTQ collapses; QAT recovers a large share but still loses 29 pp @20 for a 3.2x
size win — keep fp32/fp16 on the edge. Disclosed: QAT fake-quant val pck@20 was
83.45% but converted int8 scores 67.48% (~16pp convert_fx gap, reported honestly).

Deliverables:
- v2/crates/wifi-densepose-train/scripts/quantize_half_int8.py (reproducible:
  header carries the exact ssh command + run date; QAT primary, static PTQ fallback)
- docs/adr/ADR-175-int8-quantization-half-pose-model-measured.md (MEASURED table,
  locked normalization, QAT-vs-PTQ labeling, verdict, reproduction, limitations)
- CHANGELOG [Unreleased] ### Added entry

No production Rust or signal-pipeline change. Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a, bit-exact).

2026-06-15 09:16:22 -04:00

quantize_half_int8.py

feat(bench): int8 quantization of WiFlow-STD half pose model — MEASURED trade-off (ADR-175, honest negative) (#1095 )

2026-06-15 09:16:22 -04:00