mirror of
https://github.com/ruvnet/RuView
synced 2026-06-09 10:13:17 +00:00
a6808568a2
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:
- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
(pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).
Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.
Co-Authored-By: claude-flow <ruv@ruv.net>
23 lines
1.6 KiB
Markdown
# AetherArena — Build Status
Tracks ADR-149 implementation milestones. "Complete" means the benchmark **infrastructure** is done:
tested, CI-gated, deploy-ready, with the RuView baseline entered and the §7 acceptance test passing.
Model **SOTA** (e.g. MM-Fi PCK@20 ~72%) is a separate long-running ML effort, blocked on
ADR-079 camera-ground-truth collection; it is *not* an infra-completion blocker.

| # | Milestone | Status |
|---|-----------|--------|
| M1 | ADR-149 Accepted + committed | ✅ done |
| M2 | Deterministic scorer runner (`aa_score_runner`) → tier + proof hash | ✅ done: builds `--no-default-features`, hash stable, VERDICT: PASS |
| M3 | CI harness-gate workflow (PR runs the scorer) | ✅ done: `.github/workflows/aether-arena-harness.yml` |
| M4 | Scaffold: README + submission schema + VERIFY (acceptance test) | ✅ done |
| M5 | Public smoke split (committed) + private MM-Fi held-out split prep | ⏳ next |
| M6 | HF Space (Gradio) submission flow + sandboxed scorer container | ⛔ blocked: needs HF token / maintainer authorization to deploy |
| M7 | Signed append-only Parquet results ledger | ⏳ |
| M8 | RuView baseline entry (honest PCK@20) + public launch | ⏳ |
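
The milestones above refer to PCK@20 without defining it. As a rough illustration only, PCK@k is commonly computed as the fraction of predicted keypoints that fall within k% of a reference length from the ground truth. The sketch below is written in Python purely for readability; the function name `pck` and the choice of the ground-truth bounding-box diagonal as the reference length are assumptions made here, not the normalization used by ADR-149, `ruview_metrics`, or the MM-Fi protocol.

```python
import math


def pck(pred, truth, threshold=0.20):
    """Fraction of keypoints within `threshold` * reference length of the truth.

    ASSUMPTION: the reference length is the diagonal of the ground-truth
    bounding box. Real benchmarks may normalize differently (e.g. torso size).
    """
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and equally long")
    xs = [x for x, _ in truth]
    ys = [y for _, y in truth]
    ref = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    if ref == 0:
        raise ValueError("degenerate pose: bounding box has zero size")
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold * ref
    )
    return hits / len(truth)


# Toy pose: bounding-box diagonal is 5, so the PCK@20 radius is 1.0.
truth = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (0.0, 4.0)]
pred = [(0.0, 0.5), (3.0, 4.0), (5.0, 0.0), (0.0, 4.0)]
score = pck(pred, truth)  # three of four keypoints are within the radius: 0.75
```

A dataset-level score would typically average this over all frames of the held-out split.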
## Blockers / decisions needed

- **HF deploy (M6)** needs an HF token and authorization to create the public `ruvnet/aether-arena` Space.
- **MM-Fi is CC BY-NC**, so AA must stay non-commercial and legally distinct from the commercial RuView product.
- **Realism of M2 fixture**: the current fixture is a *determinism* fixture (stable hash), not a realistic baseline; M5 swaps in real MM-Fi held-out scoring.
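
To make the idea of a determinism fixture concrete, here is a minimal sketch of the pattern: generate data from a fixed seed, compute a metric, serialize the result canonically, and hash it with SHA-256. It is not the `aa_score_runner` implementation, which is not shown in this document. The function names, the synthetic keypoint fixture, the mean-error metric, and the JSON serialization are all hypothetical, and Python is used only for illustration.

```python
import hashlib
import json
import math
import random


def make_fixture(seed=42, n_frames=8, n_joints=17):
    """Hypothetical fixture: seeded ground-truth and predicted 2D keypoints."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    truth = [[(rng.random(), rng.random()) for _ in range(n_joints)]
             for _ in range(n_frames)]
    pred = [[(x + rng.uniform(-0.05, 0.05), y + rng.uniform(-0.05, 0.05))
             for (x, y) in frame]
            for frame in truth]
    return truth, pred


def mean_joint_error(truth, pred):
    """Average Euclidean distance between predicted and true keypoints."""
    dists = [math.sqrt((px - tx) ** 2 + (py - ty) ** 2)
             for t_frame, p_frame in zip(truth, pred)
             for (tx, ty), (px, py) in zip(t_frame, p_frame)]
    return sum(dists) / len(dists)


def proof_hash(seed=42):
    """SHA-256 over a canonical record of the seeded run (illustrative only)."""
    truth, pred = make_fixture(seed)
    record = {
        "seed": seed,
        # Fixed-precision string so float formatting cannot alter the hash.
        "mean_joint_error": f"{mean_joint_error(truth, pred):.6f}",
    }
    # Sorted keys and fixed separators give one byte sequence per record.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs with the same seed must agree; a CI gate could compare this
# value against a committed expected hash.
first = proof_hash(42)
second = proof_hash(42)
```

A hash like this shows only that the scoring path is reproducible. It says nothing about model quality, which is why the fixture is not a realistic baseline.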