Recent test · 4 min read

We ran four AI quant bots. Here's what we found.

Across two rounds on real market tape, no bot came out ahead net of costs. The vig wins.

We built four LLM-designed quant bots (GPT, Nebula, Grok, Gemini), locked each one's forecast before the run, and graded actual against forecast on a real BTC market tape, then ran the same grading natively on a real Kalshi sports slate.

The efficient-tape floor held: no bot beat costs on a net basis. Three were gross-positive, but fees turned every one of them negative. After the rake, the directional binary is roughly a coin flip, which is exactly the lesson the fee math predicts.
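The fee math behind that result can be sketched in a few lines. This is a minimal illustration, not the Desk's actual accounting: it assumes a binary contract that pays $1 on a win, and a flat per-contract `fee`. The function names and the flat-fee model are hypothetical simplifications; real venue fee schedules differ.

```python
def net_expected_value(win_prob: float, price: float, fee: float) -> float:
    """Expected profit per contract, net of fees.

    Assumes a binary contract bought at `price` that pays 1.0 on a win
    and 0.0 on a loss, plus a flat per-contract `fee` (an assumption).
    """
    # Win: collect (1 - price). Lose: forfeit price.
    gross = win_prob * (1.0 - price) - (1.0 - win_prob) * price
    return gross - fee


def breakeven_win_prob(price: float, fee: float) -> float:
    """Win probability needed for zero expected profit after the fee."""
    return price + fee


# A true coin flip bought at fair value loses exactly the fee on average.
coin_flip = net_expected_value(win_prob=0.5, price=0.5, fee=0.02)

# A small real edge can still be net-negative once the fee is paid.
small_edge = net_expected_value(win_prob=0.51, price=0.5, fee=0.02)
```

Under these assumptions a bot must be right more often than `price + fee` to break even, so a gross-positive edge smaller than the fee still ends up negative net.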

What actually separated them

  • Calibration (Brier vs settlement) — the least-gameable read — barely moved across bots; none beat the market-implied line.
  • The honest-strategist scoring caught the over-optimist: Gemini's only net-positive forecast missed (again), making it the most over-confident bot in the field.
  • The differentiator wasn't alpha — it was risk shape: gating turnover and abstaining cut the cost drag, but couldn't manufacture edge that wasn't there.
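For readers unfamiliar with the calibration measure in the first bullet, here is a minimal sketch of a Brier score compared against a market-implied line. The probabilities and outcomes below are made-up illustrative numbers, not results from our runs, and `brier_score` is a hypothetical helper name.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 settlements.

    Lower is better; always forecasting 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative data only (an assumption, not from the bot-league results).
settlements = [1, 0, 0]             # how each contract actually settled
bot_probs = [0.9, 0.8, 0.7]         # an over-confident bot
market_probs = [0.6, 0.55, 0.5]     # probabilities implied by market prices

bot_brier = brier_score(bot_probs, settlements)
market_brier = brier_score(market_probs, settlements)

# The bot "beats the line" only if its score is lower than the market's.
bot_beats_market = bot_brier < market_brier
```

In this toy case the confident bot scores worse than the market-implied line, which is the pattern the over-confidence bullet describes.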

This is the empirical backbone of the whole Desk: discipline and abstention, not prediction. We show the losers because the losers are the lesson.

Source · AiiQ bot-league results — our own platform runs, provenance-hashed and stored immutably

Informational decision-math education, not betting advice. 18+. SIM-safe · read-only odds. We show our calibration and our losers.