AoA - Age of AI Arena

We built four LLM-designed quant bots (GPT, Nebula, Grok, Gemini), locked each bot's forecast before the run, and graded actual results against forecast on a real BTC market tape, then again natively on a real Kalshi sports slate.
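The post doesn't say how the forecasts were locked. One common way to make a pre-run forecast tamper-evident is a hash commitment: publish the SHA-256 digest of the forecast before the run, reveal the forecast afterward, verify it, and grade it against the realized result. A minimal sketch, assuming a forecast is a small JSON-serializable record with a hypothetical `net_pnl` field; the bot name and numbers are illustrative, not Arena results.

```python
import hashlib
import json
import secrets


def commit(forecast: dict, nonce: str) -> str:
    """Return a SHA-256 digest that binds the forecast before the run starts."""
    # sort_keys makes serialization deterministic, so the same forecast
    # always produces the same digest.
    payload = json.dumps(forecast, sort_keys=True) + nonce
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(forecast: dict, nonce: str, digest: str) -> bool:
    """Check that the revealed forecast matches the published commitment."""
    return commit(forecast, nonce) == digest


def grade(forecast: dict, actual_net_pnl: float) -> float:
    """Actual minus forecast net P&L; negative means the bot over-promised."""
    return actual_net_pnl - forecast["net_pnl"]


# Hypothetical forecast, locked before the run.
forecast = {"bot": "ExampleBot", "net_pnl": 12.5}
nonce = secrets.token_hex(16)  # random salt so the digest can't be brute-forced
digest = commit(forecast, nonce)

# After the run: reveal, verify, grade.
assert verify(forecast, nonce, digest)
print(grade(forecast, actual_net_pnl=-3.0))  # -15.5
```

Any change to the forecast after the fact changes the digest, so `verify` fails and the over-promise can't be quietly revised.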

The efficient-tape floor held: NO bot beat costs net. Three were positive gross of fees, but fees turned every one of them negative. After the rake, the directional binary is roughly a coin flip, exactly what the fee math predicts.
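The fee math fits in a few lines. A binary contract bought at price p pays $1 if it settles YES, so with a true win probability q the expected gross per contract is q - p, and the fee comes off the top. A minimal sketch in integer cents, assuming a fee of the form rate × C × p × (1 - p) rounded up to the next cent, with a placeholder rate of 0.07. That form and rate are assumptions for illustration, not a quoted schedule; check the venue's current fees.

```python
import math
from fractions import Fraction

RATE = Fraction(7, 100)  # placeholder fee rate (assumption, not a quoted schedule)


def fee_cents(contracts: int, price_cents: int) -> int:
    """Assumed fee model: rate * C * p * (1 - p) dollars, rounded up to the cent."""
    p = Fraction(price_cents, 100)
    # Fractions keep the arithmetic exact, so rounding up never overshoots
    # because of binary floating-point error.
    return math.ceil(RATE * contracts * p * (1 - p) * 100)


def expected_net_cents(contracts: int, price_cents: int, win_prob: Fraction) -> Fraction:
    """Expected net P&L in cents of buying YES contracts that settle at 100 cents."""
    gross = contracts * (win_prob * 100 - price_cents)  # edge before costs
    return gross - fee_cents(contracts, price_cents)


# A coin flip at 50 cents: zero gross edge, so the fee is the whole result.
print(expected_net_cents(100, 50, Fraction(1, 2)))    # -175
# A one-point edge: +100 cents gross, still negative net.
print(expected_net_cents(100, 50, Fraction(51, 100)))  # -75
```

Under these assumptions, 100 contracts at 50 cents need q above 0.5175 just to break even, which is why a gross-positive bot can still finish net-negative.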

What actually separated them

Calibration (Brier score against settlement), the least-gameable read, barely moved across bots, and none beat the market-implied line.
The honest-strategist scoring caught the over-optimist: Gemini, the most over-confident bot in the field, missed on its only net-positive forecast (again).
The differentiator wasn't alpha — it was risk shape: gating turnover and abstaining cut the cost drag, but couldn't manufacture edge that wasn't there.
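The Brier score is the mean squared gap between a forecast probability and the 0/1 settlement; lower is better. A minimal sketch of the comparison against the market-implied line, assuming the market-implied probability is simply the price as a fraction of $1 (ignoring vig and spread). The forecasts and outcomes below are invented for illustration, not the Arena's data.

```python
def brier(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 settlements."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill(bot: list[float], market: list[float], outcomes: list[int]) -> float:
    """Skill relative to the market line: > 0 only if the bot is better calibrated."""
    reference = brier(market, outcomes)
    if reference == 0:
        raise ValueError("skill is undefined when the market line is perfect")
    return 1 - brier(bot, outcomes) / reference


outcomes = [1, 0, 1, 1, 0]              # settlements (1 = YES)
market = [0.60, 0.40, 0.55, 0.70, 0.30]  # assumed market-implied probabilities
bot = [0.80, 0.20, 0.40, 0.90, 0.60]     # a hypothetical over-confident bot

print(round(brier_skill(bot, market, outcomes), 3))  # -0.153: worse than the market line
```

Squaring the error makes confident misses (0.40 on a YES, 0.60 on a NO) dominate the score, which is why over-confidence shows up in calibration even when some picks land.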
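Gating and abstention can be sketched as one rule: trade only when the perceived edge per contract clears the fee per contract. If the tape is efficient (true probability equals the price), every trade taken has zero expected gross on either side, so the gate can shrink the loss but never flip its sign. A minimal sketch, assuming a placeholder fee model of 0.07 × C × p × (1 - p) rounded up to the cent; `gated_expected_net` and the `markets` pairs (price in cents, bot probability) are hypothetical.

```python
import math
from fractions import Fraction

RATE = Fraction(7, 100)  # placeholder fee rate (assumption, not a quoted schedule)


def fee_cents(contracts: int, price_cents: int) -> int:
    """Assumed fee model: rate * C * p * (1 - p) dollars, rounded up to the cent."""
    p = Fraction(price_cents, 100)
    return math.ceil(RATE * contracts * p * (1 - p) * 100)


def gated_expected_net(markets, contracts=100, gate=True):
    """Expected net P&L (cents) and trade count on an efficient tape.

    Each market is a (price_cents, bot_prob) pair. Because true probability
    equals price, the expected gross of YES or NO is zero, so every trade
    taken is expected to lose exactly its fee.
    """
    total, trades = 0, 0
    for price_cents, bot_prob in markets:
        perceived_edge = abs(bot_prob * 100 - price_cents)  # cents per contract, either side
        fee = fee_cents(contracts, price_cents)
        if gate and perceived_edge <= fee / contracts:
            continue  # abstain: the perceived edge doesn't clear the cost
        trades += 1
        total -= fee  # zero expected gross, so only the fee remains
    return total, trades


markets = [(50, 0.52), (50, 0.505), (30, 0.40), (70, 0.69), (45, 0.45)]
print(gated_expected_net(markets, gate=False))  # (-818, 5)
print(gated_expected_net(markets))              # (-322, 2)
```

The gate cuts the expected cost drag from 818 to 322 cents by skipping three trades, but the result stays negative: a noisy estimate that clears the gate (the 30-cent market here) still pays the fee with no real edge behind it.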

This is the empirical backbone of the whole Desk: discipline and abstention, not prediction. We show the losers because the losers are the lesson.

Informational decision-math education, not betting advice. 18+. SIM-safe · read-only odds. We show our calibration and our losers.
