mirror of
https://github.com/ruvnet/RuView
synced 2026-06-29 13:33:19 +00:00
17471e93ff
* feat(calibration): NodeGeometry transceiver-geometry recording (ADR-152 §2.1.1)

PerceptAlign-motivated geometry capture at enrollment: per-node optional records (position, antenna orientation, inter-node distances, acquisition method) — recorded when known, never required. Event-sourced via EnrollmentEvent::GeometryRecorded (latest recording wins); persisted on SpecialistBank with serde defaults so pre-ADR-152 bank JSON loads cleanly (fixture-proven, and geometry-free banks serialize byte-shape-identical to the old schema); threaded through MultiNodeMixture as data only — the learned geometry embeddings and algorithmic fusion use are §2.1.2, deliberately deferred until the ADR-151 P6 LoRA heads exist. Geometry recorded from now on means banks captured today remain usable for layout-conditioned training later — you can't retroactively add geometry to data you didn't record.

8 new tests (3 geometry, 2 anchor, 2 bank, 1 multistatic) + full-loop extension (2-node geometry, one tape-measured + one unknown, surviving the bank JSON round-trip the runtime loads from). 50/50 calibration (both feature configs) + 23 CLI tests green.

Co-Authored-By: RuFlo <ruv@ruv.net>

* feat(training): two-checkerboard camera↔room calibration for ADR-079 labels (ADR-152 §2.1.3)

Defends the camera-supervised pipeline against PerceptAlign's "coordinate overfitting": MediaPipe keypoints were emitted in raw camera coordinates with no shared frame and no transceiver-geometry metadata — the exact label shape that memorizes deployment layout and collapses cross-layout.

- scripts/calibrate-camera-room.py + calibration_lib.py: OpenCV two-checkerboard calibration → versioned bundle JSON (intrinsics, camera→room extrinsics, checkerboard spec, transceiver geometry, sha256 calibration_id). Intrinsics resolve from file > cache > multi-view computation > loud-warning 2-view fallback.
- collect-ground-truth.py --calibration <bundle>: every sample gains keypoints_room (unit bearing rays from the camera center in the room frame — documented projective alignment; raw image coords preserved so training chooses), camera_origin_room, calibration_id, and the transceiver geometry stamp. Without the flag, output is byte-identical to before (tested) + a one-line ADR-152 warning.

Design finding (recorded for ADR-152): a single planar checkerboard's corner grid is centrosymmetric — the reversed corner ordering fits a ghost camera pose with IDENTICAL reprojection error, so per-board flip disambiguation is mathematically ill-posed. solve_two_board_extrinsics solves the joint wall+floor set over all 4 flip combinations, where the minimum is unique — an independent reason the TWO-checkerboard method is required, beyond what PerceptAlign states.

15 headless pytest tests green (synthetic corners: extrinsics recovery incl. ghost resolution, bundle round-trip + hash stability, ray transforms w/ distortion + cross-resolution, no-calibration byte identity).

Co-Authored-By: RuFlo <ruv@ruv.net>

* feat(benchmarks): WiFlow-STD reproduction harness + measurement (a) results (ADR-152 §2.2)

Shipped checkpoint REFUTED (0.08% PCK@20, wrong keypoint normalization); 6 reproducibility defects documented (broken imports, corrupted dataset tail with float32-max garbage that NaN-poisons fp16 BatchNorm, unreachable test phase). After repairs, retraining with upstream defaults reproduces 96.09% PCK@20 full-test / 96.61% corruption-free (published 97.25%) on RTX 5080. Claims graded MEASURED-EQUIVALENT; 2.23M params + ~0.055 GFLOPs verified. Third-party code/weights/data stay out of tree (gitignored).

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: ADR-152 Rust integrations + ADR-153 802.11bf protocol model

- calibration: GeometryEmbedding — 32-slot permutation-invariant NodeGeometry featurization for future LoRA-head conditioning (ADR-152 §2.1.2); derived SpecialistBank::geometry_embedding() accessor; 59 tests
- train: MaePretrainConfig + patchify/random-mask with UNSW measured recipe (80% masking, (30,3) patches; ADR-152 §2.3, arXiv 2511.18792); strict no-truncate/no-NaN policy; proptest properties
- train: WiFlowStdModel — tch-gated port of the verified ~96%-PCK@20 WiFlow-STD architecture (ADR-152 §2.2 beyond-SOTA); ungated param formula pinned to 2,225,042; 15/17-keypoint support; 239 crate tests
- hardware: ieee80211bf forward-compatibility protocol model (ADR-153): SpecProfile gates, SensingCapabilities negotiation, required ConsentMode, session FSM, SensingTransport + SimTransport + OpportunisticCsiBridge; full acceptance checklist covered; 156+4 tests
- deps: ruvector bumps per ADR-152 §2.6 survey (mincut/solver 2.0.6, attention 2.1.0, gnn 2.2.0); vendor/ruvector synced to a083bd77f
- docs: ADR-153 accepted; ADR-152 §2.2 status, §2.4 amendment, §2.6 added

Workspace: 162 test suites green (--no-default-features); Python proof PASS. Known pre-existing flake: homecore-api env_empty_falls_back_to_defaults (unserialized env-var mutation) — untouched, follow-up.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: CHANGELOG + CLAUDE.md entries for ADR-152 integrations and ADR-153

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(train): repair tch-backend bit-rot — gated path compiles and tests run again

Mechanical API refresh against current tch: Vec::from(Tensor) -> try_from (+ explicit flatten), numel() usize cast, Rem/div ops -> remainder() / divide_scalar_mode(floor) — the latter fixed a silent true-division bug in heatmap argmax decoding; clamp(1.0, f64::MAX) -> clamp_min (torch 2.x scalar overflow panic); petgraph EdgeRef import; missing EvalMetrics and verify_checkpoint_dir APIs that tests documented. wiflow_std roundtrip test uses safetensors (.pt _save_parameters roundtrip broken in torch 2.11 Windows).

Gated: 349 passed (incl. all 20 wiflow_std); ungated: unchanged. Known pre-existing: gaussian-heatmap convention mismatch (2 tests), proof seed race under parallel threads — documented, deliberate follow-ups.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(train): WiFlow-STD PyTorch->tch weight import + numerical parity proof

export_to_safetensors.py maps the retrained checkpoint (295 tensors -> 248 mapped, param sum exactly 2,225,042; num_batches_tracked dropped) into a tch-loadable safetensors plus a deterministic parity fixture. Gated #[ignore] integration test loads it strictly and asserts forward-pass agreement: max abs diff 1.192e-7 on the seed-42 fixture. dump_variable_names test makes the tch name layout authoritative. Zero architecture discrepancies found.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: workflow-review findings — BN gamma init, ThresholdParams serde, init docs

Concurrent validation workflow (2 review lanes + adversarial verification, 13 agents): 5 confirmed findings, 3 refuted.

Fixes:
- wiflow_std: pin BatchNorm gamma to 1.0 (tch default draws Uniform(0,1) — silently halves activations in from-scratch training; loaded checkpoints unaffected, parity re-verified after the change)
- wiflow_std: document the conv-init divergences vs the reference's effective kaiming_normal(fan_out) re-init (from-scratch dynamics only)
- ieee80211bf: ThresholdParams deserialization validates via try_from so the <=100 invariant holds for untrusted payloads (+ rejection test)

Benchmarks (release, ruvzen): GeometryEmbedding 1.84us/call (542k/s), MAE tokenization 7.38us/window (135k/s), 802.11bf FSM 8.9M events/s — nothing suspicious.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-152 §2.1.4 gate resolved — PerceptAlign repo MIT, dataset on HF

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): edge optimization measured + measurement (b) blocked + 92.9% retraction

Edge optimization (ADR-152 optimize track): ONNX Runtime fp32 is the CPU latency win (3.2 ms/window, ~3.4x faster than torch, parity 2.4e-7); ORT dynamic int8 reaches 2.44 MB (paper's ~2.2 MB claim plausible only via conv-capable toolchains; -0.16pt PCK@20, +18% MPJPE, 2x slower); torch dynamic quant converts 0% of this conv-only model; fp16 halves storage free but is slower on CPU.

Measurement (b) BLOCKED-ON-DATA: only 1,077 paired ESP32 windows exist (stop rule <2k). Forensic recheck of the surviving April holdout RETRACTS the ADR-079 '92.9% PCK@20' figure: constant-output model, absolute (not torso) threshold, 69 near-static frames — mean predictor scores 100% under that protocol; torso-PCK@20 is 19.1%. Corroborates PR #535. Stale citations removed from user-guide, readme-details, ADR-152 §2.1.3; no-citation rule extended to ADR-079 accuracy claims. Unblock: >=2k-window multi-pose paired session + torso-PCK re-baseline.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(user-guide): corrected camera-supervised collection tutorial

Step 0 CSI-rate check + session-length math (window yield = frames/20 — the May session's 8x under-delivery was a ~12 Hz CSI rate, not an aligner bug); two-checkerboard calibration step (ADR-152 §2.1.3); pose-variety and confidence guidance; torso-normalized PCK + temporal-split + pred-variance eval protocol (lessons from the 92.9% retraction); scale presets re-keyed to realistic window counts.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): static PTQ int8 (calibrated) results + overnight capture script

Conv-only static QDQ beats dynamic int8 on accuracy (PCK@20 96.61-96.63% vs 96.52%, MPJPE +10% vs +18% over fp32) at ~equal size/latency; all-ops QDQ strictly worse (int8 activations through attention glue). Entropy calibration verified bit-identical to MinMax on this data. Deployment: ONNX fp32 for speed (3.2ms), static conv-only QDQ for smallest (2.53MB).

Also: scripts/overnight-empty-capture.py — segmented UDP CSI recorder for empty-room baselines (no glob collisions, detach-safe).

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): measurement (b) MEASURED — optimization transfer only, mean-pose baseline wins

WiFlow-STD fine-tuned on 2,046 fresh single-room ESP32 paired windows (temporal 70/15/15, 70->540 adapter, K=17): pretrained-init 65% PCK@20 vs scratch 0% (optimization transfer) but frozen-trunk ~0% (no feature transfer), and NOTHING beats the mean-pose baseline (95.9% PCK@20 — single subject, near-static normalized coords). Honesty gates held: pred std 0.0113 (non-constant model) but mean-baseline dominance means no citable CSI->pose capability from this data. ADR-152 open question 1 answered partially; definitive answer needs multi-subject/position data.

Two new aligner findings: heterogeneous csi_shape with silent zero-padding (~20%), and extractCsiMatrix's transposed shape label (frame-major data, [nSc, nFrames] label) — fixes pending.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): efficiency sweep MEASURED — half model dominates full reference

Compact WiFlow-STD variants on the same data/split/protocol: half (843,834 params, 0.38x) strictly dominates the 2.23M reference (PCK@20 96.62 vs 96.61, PCK@50 99.47 vs 99.11, MPJPE 0.00898 vs 0.0094) — the published architecture is over-parameterized for its own benchmark. quarter (338k) 96.05%; tiny (56,290 params, 1/39.5) holds 94.11% — a ~220KB fp32 edge candidate. In-domain caveats recorded; cross-domain untested.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(train): compact WiFlow-STD presets in Rust + tiny edge artifact (ADR-152)

WiFlowStdConfig gains half()/quarter()/tiny() mirroring the overnight sweep exactly: TcnGroupsMode (Fixed/Gcd/Depthwise), input_pw_groups, derived stride schedule and decoder-mid (all default to upstream behavior; legacy serde JSON unaffected). Param formulas pin to trained ground truth first try: 843,834 / 338,600 / 56,290; default 2,225,042 pin and 1.192e-7 parity unchanged. 248 tests green.

Tiny edge artifact (tiny_edge_bench.py): ONNX fp32 = 295 KB, 0.66 ms/win (~1,500/s CPU), 94.11% PCK@20 (matches sweep clean-test exactly; parity 1.49e-7). Static int8 is a bad trade at this scale (-1.43pt, +19% MPJPE, -16% size, slower) — recorded as negative result.

Export note: width-16 breaks AdaptiveAvgPool((15,1)) TorchScript export; replaced by exact mean+matmul equivalent, proven by parity.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: resolve all 10 confirmed code-review findings (7-angle review, 20/20 verified)

wiflow_std: min_feature_width (default 15) replaces the keypoints->stride coupling — for_keypoints(17) now provably builds the trained [2,2,2,2] graph and pools 15->17, matching the validated Python protocol (pinned by tests); param_count() total on invalid configs; random_mask returns Result and rejects non-finite/out-of-range ratios; trainer checkpoints switched to safetensors (.pt VarStore roundtrip broken on Windows torch 2.11).

ieee80211bf: SBP proxy now re-triggers instances and relays reports via Action::RelaySbpReport -> SensingFrame::SbpReport (clients consume via their existing path); missed_instances reset on success = consecutive semantics; SessionTable gains a guarded SBP entry point + unknown-id drop counter; initiator-role sessions reject inbound setup/SBP requests (RejectedNotSupported) closing the idle hijack; StartSetup/StartSbp outside Idle return InvalidStateForCommand; SBP validation unified through evaluate_setup with a 1:1 SetupStatus->SbpStatus mapping. events.rs split out to honor the 500-line cap.

calibration/cli: enrollment geometry now actually reaches trained banks — both production call sites attach .with_geometry; --geometry flag on train-room and POST /enroll/geometry + train-body geometry on calibrate-serve give production a recording surface; geometry-free banks log the ADR-152 §2.1.2 note.

benchmarks: corruption masks committed as ground truth (unregenerable after in-place cleaning; verified bit-identical regeneration from the pristine copy) + generate_corruption_masks.py producer; _bench_common.py dedups the 5x-copied shim/evaluate/seed/remap (post-refactor PCK@20 re-verified equal to the last digit); remote scripts get the mmap patch; tiny_edge --calib validated multiple-of-64; onnx_bench --help no longer executes (and overwrote) the export — artifact restored byte-exact.

Workspace: 2,963 tests passed, 0 failed; Python proof PASS.

Co-Authored-By: claude-flow <ruv@ruv.net>

* ci: build workspace tests without debuginfo — runner disk exhaustion

The combined 38-crate debug target exceeds the GitHub runner's disk ('final link failed: No space left on device'); the same tree measured 151GB locally with full debuginfo. CARGO_PROFILE_{DEV,TEST}_DEBUG=0 shrinks the target ~5-10x; debuginfo serves no purpose in CI test runs.

Co-Authored-By: claude-flow <ruv@ruv.net>
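The centrosymmetry finding in the two-checkerboard commit above can be checked numerically. This is a minimal sketch, not `solve_two_board_extrinsics`. It assumes an ideal, distortion-free pinhole camera and a 6x9 inner-corner grid centered on the board origin, and its helper names (`board_corners`, `rot_x`, `rot_z`, `project`) are hypothetical. The sketch projects the corners under a chosen pose, then under the same pose composed with a 180° rotation about the board normal. The second projection equals the first in reversed corner order, so a single board cannot distinguish the true pose from the ghost.

```python
import numpy as np


def board_corners(rows, cols, square):
    """Row-major inner-corner grid of a planar board, centered on the board origin (z = 0)."""
    ys, xs = np.mgrid[0:rows, 0:cols].astype(float)
    xs = (xs - (cols - 1) / 2) * square
    ys = (ys - (rows - 1) / 2) * square
    return np.stack([xs.ravel(), ys.ravel(), np.zeros(rows * cols)], axis=1)


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def project(K, R, t, pts):
    """Ideal pinhole projection (no lens distortion) of board points to pixels."""
    cam = pts @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R_true, t_true = rot_x(0.4) @ rot_z(0.1), np.array([0.05, -0.02, 1.5])

obj = board_corners(rows=6, cols=9, square=0.025)
observed = project(K, R_true, t_true, obj)          # detected corners, in label order

# Ghost pose: same translation, board spun 180 degrees about its own normal.
ghost = project(K, R_true @ rot_z(np.pi), t_true, obj)

# The ghost explains the observations perfectly once the corner labels are reversed...
ghost_err = np.abs(ghost - observed[::-1]).max()
# ...even though it is a genuinely different pose (in the original label order it disagrees).
label_gap = np.abs(ghost - observed).max()
print(f"ghost error vs reversed labels: {ghost_err:.1e} px; vs original labels: {label_gap:.0f} px")
```

Reversing a row-major grid maps corner (r, c) to (rows-1-r, cols-1-c). In centered board coordinates, that is (x, y) to (-x, -y), which is exactly the 180° in-plane rotation. If the origin sits at a corner instead, the ghost pose picks up an extra translation, but the ambiguity is the same. Per the commit, fitting the wall and floor boards jointly over the flip combinations is what makes the minimum unique.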
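The MAE tokenization listed in the Rust integrations commit can be sketched in NumPy. The commit specifies 80% masking, (30,3) patches, and a strict no-truncate/no-NaN policy. This is a hypothetical illustration, not the `MaePretrainConfig` code. It makes three assumptions: the first window axis is cut into runs of 30 and the second into runs of 3, a valid mask ratio lies in [0, 1), and the window is an illustrative 60x6.

```python
import numpy as np


def patchify(window, patch=(30, 3)):
    """Split a 2-D window into non-overlapping patches; never truncate, never admit NaN/inf."""
    h, w = window.shape
    ph, pw = patch
    if h % ph or w % pw:
        raise ValueError(f"window shape {window.shape} is not a multiple of patch {patch}")
    if not np.isfinite(window).all():
        raise ValueError("window contains NaN or inf")
    # (h, w) -> (h/ph, ph, w/pw, pw) -> (h/ph, w/pw, ph, pw) -> (n_patches, ph*pw)
    return (window.reshape(h // ph, ph, w // pw, pw)
            .transpose(0, 2, 1, 3)
            .reshape(-1, ph * pw))


def random_mask(n_patches, ratio=0.8, rng=None):
    """Boolean mask hiding round(ratio * n_patches) patches (True = masked)."""
    if not (np.isfinite(ratio) and 0.0 <= ratio < 1.0):   # assumed valid range
        raise ValueError(f"mask ratio must be finite and in [0, 1), got {ratio}")
    rng = np.random.default_rng() if rng is None else rng
    n_masked = int(round(ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_masked]] = True
    return mask


window = np.arange(60 * 6, dtype=np.float32).reshape(60, 6)   # illustrative 60x6 window
patches = patchify(window)                                     # 2 x 2 = 4 patches of 90 values
mask = random_mask(len(patches), ratio=0.8, rng=np.random.default_rng(0))
visible = patches[~mask]                                       # what an MAE encoder would see
print(patches.shape, int(mask.sum()), visible.shape)
```

Under the no-truncate policy, a window whose shape is not a multiple of the patch size raises an error instead of silently dropping trailing rows or columns.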
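The 92.9% retraction turns on the difference between an absolute PCK threshold and a torso-normalized one. The sketch below uses synthetic data only and does not reproduce the April holdout. It assumes COCO-17 keypoint ordering, with the left-shoulder/right-hip diagonal as the torso reference, and the helper names are hypothetical. The test case is a constant mean-pose "model" on a near-static session:

- Under an absolute 0.2 threshold, it scores 100%.
- Under the torso-scaled threshold, it scores well below 100%.
- Its across-frame prediction spread is zero, which is the condition a pred-variance gate checks for.

```python
import numpy as np

# Assumed COCO-17 ordering: 5 = left shoulder, 12 = right hip (torso diagonal).
L_SHOULDER, R_HIP = 5, 12


def pck(pred, gt, alpha, ref):
    """Fraction of joints with error <= alpha * ref, where ref holds one scale per frame."""
    err = np.linalg.norm(pred - gt, axis=-1)                  # (frames, joints)
    return float(np.mean(err <= alpha * np.asarray(ref)[:, None]))


def torso_length(gt):
    """Per-frame ground-truth torso diagonal, shape (frames,)."""
    return np.linalg.norm(gt[:, L_SHOULDER] - gt[:, R_HIP], axis=-1)


rng = np.random.default_rng(0)
base = rng.uniform(0.3, 0.7, size=(17, 2))                    # synthetic rest pose
base[L_SHOULDER], base[R_HIP] = (0.45, 0.40), (0.55, 0.52)   # torso diagonal ~0.16

# Synthetic near-static session: 200 frames, each joint jitters at most 0.05 per axis.
gt = base + rng.uniform(-0.05, 0.05, size=(200, 17, 2))

# A "model" that ignores its input and always emits the mean pose.
pred = np.broadcast_to(gt.mean(axis=0), gt.shape)

abs_pck20 = pck(pred, gt, 0.2, np.ones(len(gt)))      # absolute threshold (normalized units)
torso_pck20 = pck(pred, gt, 0.2, torso_length(gt))    # threshold scales with the torso
pred_spread = float(pred.std(axis=0).max())           # ~0 for a constant-output model
print(f"absolute PCK@20 {abs_pck20:.1%}, torso PCK@20 {torso_pck20:.1%}, pred std {pred_spread:.2g}")
```

Because every joint stays within 0.05 of its rest position per axis, the mean-pose error never exceeds about 0.14. The absolute score is therefore guaranteed to be 100% regardless of what the model learned. The torso threshold here is roughly 0.03, so the same predictions fail most of the time.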
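The session-length math from the user-guide commit is simple arithmetic. This sketch makes three assumptions:

- Windows are non-overlapping and 20 frames long (window yield = frames/20).
- Frames lost to camera pairing or alignment are ignored.
- The 2,000-window target is the stop rule from the measurement (b) commit.

Rates other than the ~12 Hz observed in the May session are illustrative.

```python
import math

FRAMES_PER_WINDOW = 20   # window yield = frames / 20 (non-overlapping windows assumed)
MIN_WINDOWS = 2_000      # stop rule: measurement (b) needs at least 2k paired windows


def window_yield(csi_rate_hz, seconds):
    """Complete windows produced by a session at a given CSI frame rate."""
    return int(csi_rate_hz * seconds) // FRAMES_PER_WINDOW


def seconds_needed(csi_rate_hz, windows=MIN_WINDOWS):
    """Shortest capture (whole seconds) that yields `windows` windows at this rate."""
    return math.ceil(windows * FRAMES_PER_WINDOW / csi_rate_hz)


for rate in (12, 50, 100):   # ~12 Hz is the May session's rate; 50 and 100 Hz are illustrative
    need = seconds_needed(rate)
    print(f"{rate:>3} Hz: {need:>5} s ({need / 60:.1f} min) -> {window_yield(rate, need)} windows")
```

At ~12 Hz, the 2,000-window target needs 3,334 s (about 56 minutes) of capture, versus 400 s at 100 Hz. This is why the CSI-rate check comes first.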
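The size and ratio figures quoted for the efficiency sweep follow directly from the parameter counts in the commits above. This sketch counts raw weight bytes only (4, 2, and 1 bytes per parameter for fp32, fp16, and int8). Exported files add graph structure and, for quantized models, extra quantization data, so the ONNX artifacts are larger: for example, 295 KB for tiny fp32 and 2.44 MB for dynamic int8.

```python
# Parameter counts as quoted in the commits above (reference = full WiFlow-STD).
PARAMS = {
    'reference': 2_225_042,
    'half': 843_834,
    'quarter': 338_600,
    'tiny': 56_290,
}
# Raw weight storage only; exported files add graph and quantization metadata.
BYTES_PER_PARAM = {'fp32': 4, 'fp16': 2, 'int8': 1}


def weight_kib(n_params, dtype):
    """Raw weight size in KiB for a parameter count at a given storage dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024


ref = PARAMS['reference']
for name, n in PARAMS.items():
    sizes = ', '.join(f"{dt} {weight_kib(n, dt):,.0f} KiB" for dt in BYTES_PER_PARAM)
    print(f"{name:9s} {n:>9,} params  {n / ref:.3f}x reference  ({sizes})")
```

The figures check out as follows:

- The half model is 0.379x of the reference (the quoted 0.38x).
- Tiny is 1/39.5 of the reference, with about 220 KiB of raw fp32 weights.
- The reference's raw int8 weights total about 2.23 MB, in line with the ~2.2 MB figure the edge-optimization commit discusses.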
279 lines
11 KiB
Python
"""WiFlow-STD compact-variant efficiency sweep (ADR-152) — sequential overnight runner.

Trains compact variants of the upstream WiFlow-STD architecture on the same
data/split as the full-size reference retraining (seed 42, file-level 70/15/15,
upstream dataset.py) and evaluates PCK@10..50 + MPJPE on the full test split and
the corruption-free test subset (file indices < 487).

Training mirrors upstream run.py/train.py defaults except:
  - fp32 only (no fp16 autocast / GradScaler — avoids the BN-poisoning trap
    documented in RESULTS.md defect 5; data on disk is already cleaned).
  - batch 64 (kept modest: another GPU job may share the 16 GB card tonight).
  - scheduler + early stopping keyed on val MPJPE (upstream early-stops on val MPE
    with patience 5; same here).

Usage:
    venv/bin/python sweep/run_sweep.py --dry-run      # param counts only
    nohup venv/bin/python sweep/run_sweep.py > sweep/sweep.log 2>&1 &

Idempotent: variants already recorded in sweep/results.jsonl (including failed
runs) are skipped; delete a variant's line to re-run it.

NOTE: deployed to ruvultra (~/wiflow-std-bench/sweep) as a standalone file, so
it deliberately inlines its helpers. The reference implementations (upstream
import shim, >1GB np.load mmap patch, key-remap loader, canonical evaluate
loop) live in benchmarks/wiflow-std/_bench_common.py — keep copies in sync.
"""

import argparse
import copy
import json
import os
import random
import sys
import time

import numpy as np
import torch
from torch.utils.data import DataLoader, Subset

# csi_windows.npy is ~13 GB; mmap large arrays instead of eagerly loading
# ~15 GB into RAM (same patch as _bench_common._np_load_mmap).
_np_load = np.load


def _np_load_mmap(path, *a, **kw):
    if (isinstance(path, str) and path.endswith('.npy')
            and os.path.getsize(path) > 1 << 30 and 'mmap_mode' not in kw):
        kw['mmap_mode'] = 'r'
    return _np_load(path, *a, **kw)


np.load = _np_load_mmap

BENCH = os.path.expanduser('~/wiflow-std-bench')
SWEEP = os.path.join(BENCH, 'sweep')
sys.path.insert(0, os.path.join(BENCH, 'upstream'))
sys.path.insert(0, SWEEP)

from dataset import PreprocessedCSIKeypointsDataset, create_preprocessed_train_val_test_loaders  # noqa: E402
from losses.pose_loss import PoseLoss  # noqa: E402
from utils.metrics import calculate_pck, calculate_mpjpe  # noqa: E402
from model_compact import CompactWiFlowPoseModel, describe  # noqa: E402

VARIANTS = [
    # keys: name, tcn (TCN channels), conv (conv channels), attn_groups, groups_mode, input_pw_groups
    dict(name='half', tcn=[270, 220, 170, 120], conv=[4, 8, 16, 32], attn_groups=4,
         groups_mode='gcd20', input_pw_groups=1),
    dict(name='quarter', tcn=[135, 110, 85, 60], conv=[2, 4, 8, 16], attn_groups=2,
         groups_mode='gcd20', input_pw_groups=1),
    dict(name='tiny', tcn=[68, 56, 44, 32], conv=[2, 4, 8, 16], attn_groups=2,
         groups_mode='depthwise', input_pw_groups=4),
]

BATCH = 64
EPOCHS = 50
PATIENCE = 5
LR = 1e-4
WEIGHT_DECAY = 5e-5
SEED = 42
CORRUPT_FILE_START = 487  # files 487-499 were zero-filled by clean_nan.py


def set_seed(seed=SEED):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def build_model(v, dropout=0.5):
    return CompactWiFlowPoseModel(
        tcn_channels=v['tcn'], conv_channels=v['conv'], attn_groups=v['attn_groups'],
        groups_mode=v['groups_mode'], input_pw_groups=v['input_pw_groups'],
        dropout=dropout)


@torch.no_grad()
def evaluate(model, loader, device):
    model.eval()
    totals = {t: 0.0 for t in (0.1, 0.2, 0.3, 0.4, 0.5)}
    total_mpe, n = 0.0, 0
    for bx, by in loader:
        bx, by = bx.to(device), by.to(device)
        out = model(bx)
        bs = by.size(0)
        total_mpe += calculate_mpjpe(out, by) * bs
        pck = calculate_pck(out, by, thresholds=list(totals))
        for t in totals:
            totals[t] += pck[t] * bs
        n += bs
    return {'samples': n, 'mpjpe': total_mpe / n,
            **{f'pck@{int(t * 100)}': totals[t] / n for t in totals}}


def train_variant(v, dataset, device):
    set_seed(SEED)
    train_loader, val_loader, test_loader = create_preprocessed_train_val_test_loaders(
        dataset=dataset, batch_size=BATCH, num_workers=2, random_seed=SEED)

    set_seed(SEED)  # re-seed after split so init is split-independent
    model = build_model(v).to(device)
    info = describe(model)
    print(f"[{v['name']}] params={info['params']:,} tcn_groups={info['tcn_groups_per_block']} "
          f"conv_strides={info['conv_strides']} final_width={info['final_width']}", flush=True)

    criterion = PoseLoss(position_weight=1.0, bone_weight=0.2, loss_type='smooth_l1')
    optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY,
                                  betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=3, min_lr=LR / 1000,
        cooldown=1, threshold=1e-4)

    best_val_mpe = float('inf')
    best_val_pck20 = 0.0
    best_epoch = 0
    best_state = None
    patience_counter = 0
    t0 = time.time()
    error = None
    epochs_run = 0

    for epoch in range(1, EPOCHS + 1):
        model.train()
        ep_loss, nb = 0.0, 0
        te = time.time()
        for i, (bx, by) in enumerate(train_loader):
            bx = bx.to(device, non_blocking=True)
            by = by.to(device, non_blocking=True)
            optimizer.zero_grad(set_to_none=True)
            out = model(bx)
            loss, _parts = criterion(out, by)
            if not torch.isfinite(loss):
                error = f'non-finite loss at epoch {epoch} step {i}'
                break
            loss.backward()
            optimizer.step()
            ep_loss += loss.item()
            nb += 1
            if epoch == 1 and i % 500 == 0:
                print(f"[{v['name']}] e1 step {i}/{len(train_loader)} loss={loss.item():.5f}",
                      flush=True)
        if error:
            break
        epochs_run = epoch

        val = evaluate(model, val_loader, device)
        scheduler.step(val['mpjpe'])
        lr_now = optimizer.param_groups[0]['lr']
        print(f"[{v['name']}] epoch {epoch}/{EPOCHS} train_loss={ep_loss / max(nb, 1):.5f} "
              f"val_mpjpe={val['mpjpe']:.5f} val_pck20={val['pck@20'] * 100:.2f}% "
              f"lr={lr_now:.2e} ({time.time() - te:.0f}s)", flush=True)

        if val['mpjpe'] < best_val_mpe:
            best_val_mpe = val['mpjpe']
            best_val_pck20 = val['pck@20']
            best_epoch = epoch
            best_state = copy.deepcopy(model.state_dict())
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= PATIENCE:
                print(f"[{v['name']}] early stop at epoch {epoch} (best {best_epoch})", flush=True)
                break

    train_seconds = time.time() - t0
    result = {
        'variant': v['name'], 'params': info['params'],
        'tcn_channels': v['tcn'], 'conv_channels': v['conv'],
        'attn_groups': v['attn_groups'], 'groups_mode': v['groups_mode'],
        'input_pw_groups': v['input_pw_groups'],
        'tcn_groups_per_block': info['tcn_groups_per_block'],
        'conv_strides': info['conv_strides'], 'final_width': info['final_width'],
        'batch_size': BATCH, 'max_epochs': EPOCHS, 'patience': PATIENCE,
        'lr': LR, 'weight_decay': WEIGHT_DECAY, 'seed': SEED, 'precision': 'fp32',
        'epochs_run': epochs_run, 'best_epoch': best_epoch,
        'best_val_mpjpe': best_val_mpe if best_state else None,
        'best_val_pck20': best_val_pck20 if best_state else None,
        'train_seconds': round(train_seconds, 1),
        'torch': torch.__version__, 'error': error,
        'finished_utc': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
    }

    if best_state is not None:
        ckpt = os.path.join(SWEEP, f"{v['name']}_best.pth")
        torch.save(best_state, ckpt)
        result['checkpoint'] = ckpt
        model.load_state_dict(best_state)

    eval_loader = DataLoader(test_loader.dataset, batch_size=256, shuffle=False,
                             num_workers=2)
    result['test_full'] = evaluate(model, eval_loader, device)

    w2f = dataset.window_to_file
    clean_idx = [i for i in test_loader.dataset.indices if w2f[i] < CORRUPT_FILE_START]
    clean_loader = DataLoader(Subset(dataset, clean_idx), batch_size=256,
                              shuffle=False, num_workers=2)
    result['test_clean'] = evaluate(model, clean_loader, device)
    print(f"[{v['name']}] TEST clean: pck20={result['test_clean']['pck@20'] * 100:.2f}% "
          f"mpjpe={result['test_clean']['mpjpe']:.5f} | full: "
          f"pck20={result['test_full']['pck@20'] * 100:.2f}%", flush=True)
    return result


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--dry-run', action='store_true', help='print param counts and exit')
    args = ap.parse_args()

    if args.dry_run:
        for v in VARIANTS:
            m = build_model(v)
            info = describe(m)
            x = torch.randn(2, 540, 20)
            m.eval()
            with torch.no_grad():  # shape check only; no autograd graph needed
                y = m(x)
            print(f"{v['name']:8s} params={info['params']:>9,} "
                  f"tcn={v['tcn']} conv={v['conv']} attn_g={v['attn_groups']} "
                  f"mode={v['groups_mode']} pw_g={v['input_pw_groups']} "
                  f"tcn_groups={info['tcn_groups_per_block']} strides={info['conv_strides']} "
                  f"W'={info['final_width']} out={tuple(y.shape)}")
        return

    results_path = os.path.join(SWEEP, 'results.jsonl')
    done = set()
    if os.path.exists(results_path):
        with open(results_path) as f:
            for line in f:
                try:
                    done.add(json.loads(line)['variant'])
                except Exception:
                    pass

    device = torch.device('cuda')
    print(f"torch {torch.__version__} on {torch.cuda.get_device_name(0)}", flush=True)
    data_dir = os.path.join(BENCH, 'preprocessed_csi_data')
    dataset = PreprocessedCSIKeypointsDataset(data_dir=data_dir, keypoint_scale=1000.0,
                                              enable_temporal_clean=True)

    for v in VARIANTS:
        if v['name'] in done:
            print(f"[{v['name']}] already in results.jsonl — skipping", flush=True)
            continue
        print(f"\n===== variant: {v['name']} =====", flush=True)
        try:
            result = train_variant(v, dataset, device)
        except Exception as e:  # record and move on to next variant
            import traceback
            traceback.print_exc()
            result = {'variant': v['name'], 'error': repr(e),
                      'finished_utc': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())}
        with open(results_path, 'a') as f:
            f.write(json.dumps(result) + '\n')
            f.flush()
    print('\nSWEEP COMPLETE', flush=True)


if __name__ == '__main__':
    main()