Model Stats (/model-stats)

Compares Claude, GPT-4o, and Gemini performance across all evaluations.

Metrics per model

  • Total evaluations scored
  • Average score given
  • Win rate when that model recommended trading
  • Accuracy: how often the model’s recommendation matched the outcome
  • Agreement rate: how often the model’s recommendation matched the other models’ recommendations on the same evaluation
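
As a rough illustration of how the five metrics above could be derived from raw evaluation records, here is a minimal Python sketch. It is not the page's actual implementation. The `Evaluation` record, its field names, and the `model_stats` function are hypothetical, and the sketch assumes that a recommendation "matches the outcome" when the model recommended trading and the trade won, or recommended skipping and it lost. The sample rows are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evaluation:
    # Hypothetical record shape; the real data model may differ.
    model: str                # e.g. "claude", "gpt-4o", "gemini"
    item_id: str              # id of the thing being evaluated
    score: float              # score the model gave
    recommended_trade: bool   # True if the model recommended trading
    won: Optional[bool]       # outcome; None if not yet known


def _rate(hits: int, total: int) -> Optional[float]:
    """Return hits/total, or None when there is nothing to measure."""
    return hits / total if total else None


def model_stats(evals):
    by_model = defaultdict(list)
    by_item = defaultdict(dict)  # item_id -> {model: recommended_trade}
    for e in evals:
        by_model[e.model].append(e)
        by_item[e.item_id][e.model] = e.recommended_trade

    stats = {}
    for model, rows in by_model.items():
        resolved = [r for r in rows if r.won is not None]
        traded = [r for r in resolved if r.recommended_trade]

        # Agreement: compare this model with every other model
        # that evaluated the same item.
        pairs = agree = 0
        for r in rows:
            for other, rec in by_item[r.item_id].items():
                if other != model:
                    pairs += 1
                    agree += rec == r.recommended_trade

        stats[model] = {
            "total": len(rows),
            "avg_score": sum(r.score for r in rows) / len(rows),
            "win_rate": _rate(sum(r.won for r in traded), len(traded)),
            "accuracy": _rate(
                sum(r.recommended_trade == r.won for r in resolved),
                len(resolved),
            ),
            "agreement": _rate(agree, pairs),
        }
    return stats


# Made-up sample data, for illustration only.
sample = [
    Evaluation("claude", "a", 8, True, True),
    Evaluation("gpt-4o", "a", 6, True, True),
    Evaluation("gemini", "a", 4, False, True),
    Evaluation("claude", "b", 3, False, False),
    Evaluation("gpt-4o", "b", 7, True, False),
    Evaluation("gemini", "b", 5, False, False),
]
stats = model_stats(sample)
```

Metrics with no data behind them (for example, win rate for a model that never recommended trading) come back as `None` in this sketch rather than 0, so an empty sample is not mistaken for a poor result.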

Model comparison

Side-by-side table showing which model is most accurate, most conservative, and most aggressive.
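
The page does not state how "most conservative" and "most aggressive" are determined. The sketch below shows one plausible reading, under the assumption that a conservative model recommends trading least often and an aggressive model most often. The `compare_models` function, the `trade_rate` field, and the example numbers are all hypothetical.

```python
def compare_models(summary):
    """Pick the standout model for each comparison column.

    `summary` maps model name -> {"accuracy": float or None,
    "trade_rate": float}, where trade_rate is the assumed fraction
    of evaluations in which the model recommended trading.
    """
    # Ignore models whose accuracy cannot be measured yet.
    scored = {m: s for m, s in summary.items() if s["accuracy"] is not None}
    return {
        "most_accurate": (
            max(scored, key=lambda m: scored[m]["accuracy"]) if scored else None
        ),
        "most_conservative": min(summary, key=lambda m: summary[m]["trade_rate"]),
        "most_aggressive": max(summary, key=lambda m: summary[m]["trade_rate"]),
    }


# Made-up numbers, for illustration only.
example = {
    "claude": {"accuracy": 0.62, "trade_rate": 0.40},
    "gpt-4o": {"accuracy": 0.58, "trade_rate": 0.75},
    "gemini": {"accuracy": 0.55, "trade_rate": 0.20},
}
result = compare_models(example)
```

An average-score threshold would be an equally reasonable way to define conservative and aggressive; the trade-recommendation rate is used here only because it is the simplest to explain.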