ruvnet--RuView/aether-arena/STATUS.md
ruv a6808568a2 feat(aether-arena): ADR-149 spatial-intelligence benchmark — scorer + CI harness gate (M1-M4)
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:

- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
  (pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
  only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
  formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
  harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
  SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
  every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).

Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:47:22 -04:00


AetherArena — Build Status

Tracks ADR-149 implementation milestones. "Complete" means the benchmark infrastructure is done, tested, CI-gated, and deploy-ready, with the RuView baseline entered and the §7 acceptance test passing. Model SOTA (e.g. MM-Fi PCK@20 ~72%) is a separate long-running ML effort, blocked on ADR-079 camera-ground-truth collection; it does not block infra completion.
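For readers unfamiliar with the PCK@20 figure mentioned above, here is a minimal sketch of a PCK-style metric. It assumes 2D keypoints and a single caller-supplied reference length for normalization. The function name `pck`, the `ref_length` parameter, and the sample points are illustrative assumptions; this is not the `ruview_metrics` implementation or the MM-Fi evaluation protocol.

```python
import math


def pck(pred, truth, ref_length, threshold=0.20):
    """Fraction of keypoints whose error is within threshold * ref_length.

    Hypothetical helper: the normalization length used by the real
    benchmark is not specified here, so it is passed in explicitly.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    limit = threshold * ref_length
    # A keypoint counts as correct when its Euclidean error is within the limit.
    hits = sum(1 for p, t in zip(pred, truth) if math.dist(p, t) <= limit)
    return hits / len(pred)


# Made-up keypoints, for illustration only.
truth = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
pred = [(0.5, 0.0), (10.0, 1.0), (0.0, 13.0), (10.0, 10.0)]
score = pck(pred, truth, ref_length=10.0)  # limit is 2.0; errors are 0.5, 1.0, 3.0, 0.0
```

With a threshold of 0.20 and a reference length of 10, three of the four sample keypoints fall within the 2.0 limit, so `score` is 0.75.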

| # | Milestone | Status |
|---|---|---|
| M1 | ADR-149 Accepted + committed | done |
| M2 | Deterministic scorer runner (aa_score_runner) → tier + proof hash | done (builds --no-default-features, hash stable, VERDICT: PASS) |
| M3 | CI harness-gate workflow (PR runs the scorer) | done (.github/workflows/aether-arena-harness.yml) |
| M4 | Scaffold: README + submission schema + VERIFY (acceptance test) | done |
| M5 | Public smoke split (committed) + private MM-Fi held-out split prep | next |
| M6 | HF Space (Gradio) submission flow + sandboxed scorer container | blocked (needs HF token / maintainer authorization to deploy) |
| M7 | Signed append-only Parquet results ledger | |
| M8 | RuView baseline entry (honest PCK@20) + public launch | |
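The append-only property planned for the M7 ledger can be illustrated with a simple hash chain. The sketch below is an assumption-laden stand-in: it omits both signing and Parquet storage, and the names `GENESIS`, `entry_hash`, `append`, and `verify`, as well as the record fields, are hypothetical rather than taken from ADR-149.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder hash for the first entry


def entry_hash(prev_hash, record):
    """Hash a record together with the previous entry's hash."""
    # Canonical JSON (sorted keys, fixed separators) keeps the bytes stable.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(ledger, record):
    """Append a record, linking it to the current tail of the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append(
        {"record": record, "prev": prev_hash, "hash": entry_hash(prev_hash, record)}
    )


def verify(ledger):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in ledger:
        expected = entry_hash(prev_hash, entry["record"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


# Made-up results, for illustration only.
ledger = []
append(ledger, {"submission": "example-a", "pck20": 0.5})
append(ledger, {"submission": "example-b", "pck20": 0.6})
intact = verify(ledger)
```

Because each entry commits to the hash of its predecessor, rewriting an earlier result changes every later hash, which is what makes after-the-fact edits detectable. A real ledger would additionally sign the tail hash.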

Blockers / decisions needed

  • HF deploy (M6) needs an HF token and authorization to create the public ruvnet/aether-arena Space.
  • MM-Fi is CC BY-NC → AA must stay non-commercial / legally distinct from the commercial RuView product.
  • Realism of M2 fixture: current fixture is a determinism fixture (stable hash), not a realistic baseline; M5 swaps in real MM-Fi held-out scoring.
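To make the idea of a determinism fixture concrete, the following sketch derives values from a fixed seed and hashes them with SHA-256. It is not the `aa_score_runner` code: the helpers `make_fixture` and `proof_hash`, the fixture size, and the fixed-point quantization step are all assumptions chosen for illustration.

```python
import hashlib
import random
import struct


def make_fixture(seed=42, n=8):
    """Generate a small, reproducible list of floats from a fixed seed."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    return [rng.random() for _ in range(n)]


def proof_hash(values, scale=1_000_000):
    """SHA-256 over the values, quantized to fixed-point integers.

    Quantizing and packing with an explicit byte order (big-endian,
    64-bit signed) is one assumed way to keep the hashed bytes identical
    regardless of platform endianness or float formatting.
    """
    data = b"".join(struct.pack(">q", round(v * scale)) for v in values)
    return hashlib.sha256(data).hexdigest()


first = proof_hash(make_fixture())
second = proof_hash(make_fixture())
other = proof_hash(make_fixture(seed=43))
```

Two runs with the same seed yield the same digest, while a different seed yields a different one. A stable hash of this kind shows only that the pipeline is reproducible; it says nothing about how realistic the scored data is, which is the gap M5 is meant to close.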