Free, ad-free, open-data

The interactive AI decision engine

Compare 30+ language and image models across real-world scenarios. Drag the weight sliders, set your budget, see what wins.

Build your own scenario · Decision wizard · Cost calculator

Pick a scenario

Each scenario weights the benchmarks differently. Switch scenarios and the leaderboard re-ranks live.
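Here is a minimal sketch of how a scenario score could be computed from weight-slider values. It assumes benchmark scores are already on a 0-100 scale. The benchmark keys, the weights, and the choice to skip missing benchmarks and renormalize over the rest are all illustrative assumptions, not the site's documented method. "Coverage" here means the share of total weight backed by actual data, which is one plausible reading of the coverage percentages in the leaderboard.

```python
from typing import Optional

# Hypothetical slider values: any non-negative numbers work, only their ratios matter.
WEIGHTS = {"livecodebench": 40, "swe_bench": 30, "humaneval": 10, "arena": 20}


def scenario_score(scores: dict[str, Optional[float]],
                   weights: dict[str, float]) -> tuple[float, float]:
    """Return (score, coverage) for one model.

    Benchmarks with no data (missing key or None) are skipped and the remaining
    weights are renormalized (an assumption). Coverage is the fraction of the
    total weight that had data.
    """
    total = sum(weights.values())
    available = {b: w for b, w in weights.items() if scores.get(b) is not None}
    covered = sum(available.values())
    if total == 0 or covered == 0:
        return 0.0, 0.0
    score = sum(scores[b] * w for b, w in available.items()) / covered
    return score, covered / total


def rank(models: dict[str, dict[str, Optional[float]]],
         weights: dict[str, float]) -> list[tuple[str, float, float]]:
    """Re-rank every model for the given weights, best score first."""
    rows = [(name, *scenario_score(s, weights)) for name, s in models.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical benchmark results for two made-up models.
models = {
    "model-a": {"livecodebench": 70.0, "swe_bench": 60.0, "humaneval": 90.0, "arena": 80.0},
    "model-b": {"humaneval": 90.0, "arena": 60.0},  # two benchmarks missing
}
print(rank(models, WEIGHTS))  # [('model-a', 71.0, 1.0), ('model-b', 70.0, 0.3)]
```

Under renormalization, model-b scores almost as high as model-a even though only 30% of the weight is backed by data. That is why a coverage figure is worth showing next to the score.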


A pair-programming assistant for IDE and agent loops. Weighted heavily toward coding benchmarks, with a real-world agentic component (SWE-bench) and some weight on cost, since coding loops burn through tokens quickly.
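Because this scenario puts some weight on cost, the monthly estimate has to be turned into something comparable with benchmark scores. The sketch below is a minimal version under two assumptions. First, the estimate is token volume times per-million-token prices. Second, cost maps onto a 0-100 scale logarithmically, so the cheapest model gets 100 and the most expensive gets 0. The token volumes, the prices, and the log mapping are all hypothetical; the page does not say how it estimates or weights cost.

```python
import math


def monthly_cost(input_mtok: float, output_mtok: float,
                 usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Estimated $/month = millions of tokens x $ per million tokens, input plus output."""
    return input_mtok * usd_per_m_in + output_mtok * usd_per_m_out


def cost_score(cost: float, cheapest: float, priciest: float) -> float:
    """Map a $/month figure to 0-100 on a log scale: cheapest -> 100, priciest -> 0.

    Assumes cheapest <= cost <= priciest. The log scale (an assumption) makes a
    $5 vs $50 gap count as much as a $100 vs $1,000 gap.
    """
    if priciest <= cheapest:
        return 100.0
    t = (math.log(cost) - math.log(cheapest)) / (math.log(priciest) - math.log(cheapest))
    return 100.0 * (1.0 - t)


# Hypothetical workload: 24M input + 9M output tokens a month at $3 / $15 per million.
print(monthly_cost(24, 9, 3.0, 15.0))  # 207.0
# Endpoints of the leaderboard's $5 to $1,035 range of monthly estimates.
print(cost_score(5, 5, 1035), cost_score(1035, 5, 1035))  # 100.0 0.0
```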

#	Model (provider; modality)	Scenario score	Est. $/month	Context	LiveCodeBench	SWE-bench	HumanEval	Arena
1	o3 (OpenAI)	95.4	$600	200k
2	Gemini 2.5 Pro (Google; vision)	92.6	$120	2.0M
3	DeepSeek R1 (DeepSeek)	79.7	$33	128k
4	Claude Opus 4 (Anthropic; vision)	75.4	$1,035	200k
5	Gemini 2.0 Flash (Google; vision)	65.6 (30% coverage)	$6	1.0M	—	—
6	Claude Sonnet 4 (Anthropic; vision)	64.9	$207	200k
7	o1-mini (OpenAI)	61.9 (30% coverage)	$180	128k	—	—
8	o3-mini (OpenAI)	61.3	$66	200k
9	Qwen3 235B (Alibaba Qwen)	59.9	$10	131k
10	o1 (OpenAI)	56.5 (60% coverage)	$900	200k	—
11	Grok 2 (xAI)	53.1 (30% coverage)	$138	131k	—	—
12	Gemini 1.5 Pro (Google; vision)	49.9 (30% coverage)	$75	2.0M	—	—
13	Llama 3.1 405B Instruct (Meta)	49.3 (30% coverage)	$116	128k	—	—
14	Grok 3 (xAI)	47.4 (60% coverage)	$207	1.0M	—
15	Llama 3.3 70B Instruct (Meta)	46.5 (30% coverage)	$29	128k	—	—
16	Llama 4 Scout (Meta; vision)	44.1 (30% coverage)	$10	10.0M	—	—
17	Mistral Large 2 (Mistral)	42.7 (30% coverage)	$102	128k	—	—
18	Qwen2.5 72B Instruct (Alibaba Qwen)	42.4 (30% coverage)	$30	131k	—	—
19	Claude 3.5 Haiku (Anthropic)	41.6 (30% coverage)	$55	200k	—	—
20	Claude 3.5 Sonnet (Anthropic; vision)	40.7	$207	200k
21	Claude 3 Opus (Anthropic; vision)	38.3 (30% coverage)	$1,035	200k	—	—
22	DeepSeek V3 (DeepSeek)	35.9	$16	128k
23	Llama 3.1 70B Instruct (Meta)	31.4 (30% coverage)	$29	128k	—	—
24	Gemini 1.5 Flash (Google; vision)	27.9 (30% coverage)	$5	1.0M	—	—
25	Llama 4 Maverick (Meta; vision)	26.3 (60% coverage)	$14	1.0M	—
26	GPT-4o (OpenAI; vision)	23.1	$150	128k
27	GPT-4o mini (OpenAI; vision)	20.2 (70% coverage)	$9	128k		—
28	GPT-4 Turbo (OpenAI; vision)	14.7 (60% coverage)	$510	128k		—	—
29	Mixtral 8x22B (Mistral)	2.0 (30% coverage)	$40	66k	—	—
30	DALL-E 3 (OpenAI; image)	0.0 (0% coverage)	—	—	—	—	—	—
31	GPT Image 1 (OpenAI; image)	0.0 (0% coverage)	—	—	—	—	—	—
32	Imagen 3 (Google; image)	0.0 (0% coverage)	—	—	—	—	—	—
33	Imagen 4 (Google; image)	0.0 (0% coverage)	—	—	—	—	—	—
34	Midjourney v6.1 (Midjourney; image)	0.0 (0% coverage)	—	—	—	—	—	—
35	FLUX.1.1 [pro] (Black Forest Labs; image)	0.0 (0% coverage)	—	—	—	—	—	—
36	FLUX.1 [dev] (Black Forest Labs; image)	0.0 (0% coverage)	—	—	—	—	—	—
37	Stable Diffusion 3.5 Large (Stability AI; image)	0.0 (0% coverage)	—	—	—	—	—	—
38	Ideogram 2.0 (Ideogram; image)	0.0 (0% coverage)	—	—	—	—	—	—

Showing 38 of 38 models. Hover any score for the source. Click a model to see its full benchmark profile.
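"Set your budget, see what wins" amounts to filtering the leaderboard by estimated cost and taking the highest scenario score. The sketch below is a minimal version that uses a few rows from the table above. Two details are assumptions, not documented site behavior: rows with no coverage note are treated as 100% coverage, and the `min_coverage` cut-off is an optional extra.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    name: str
    score: float           # scenario score
    cost: Optional[float]  # est. $/month; None where the table shows a dash
    coverage: float        # 1.0 assumed for rows with no coverage note


# A subset of the coding-scenario leaderboard.
ROWS = [
    Row("o3", 95.4, 600, 1.0),
    Row("Gemini 2.5 Pro", 92.6, 120, 1.0),
    Row("DeepSeek R1", 79.7, 33, 1.0),
    Row("Gemini 2.0 Flash", 65.6, 6, 0.3),
    Row("Qwen3 235B", 59.9, 10, 1.0),
    Row("DALL-E 3", 0.0, None, 0.0),
]


def best_within_budget(rows: list[Row], budget: float,
                       min_coverage: float = 0.0) -> Optional[Row]:
    """Highest-scoring row whose estimated cost fits the budget, or None."""
    candidates = [r for r in rows
                  if r.cost is not None and r.cost <= budget
                  and r.coverage >= min_coverage]
    return max(candidates, key=lambda r: r.score, default=None)


print(best_within_budget(ROWS, 150).name)             # Gemini 2.5 Pro
print(best_within_budget(ROWS, 8).name)               # Gemini 2.0 Flash
print(best_within_budget(ROWS, 8, min_coverage=0.5))  # None
```

The last call shows why a coverage threshold can matter. Within an $8 budget, the only candidate's score rests on 30% of the scenario's weight.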

Cost vs quality

Models on the Pareto frontier (highlighted) offer the best quality at their cost: no other model scores higher for the same or lower estimated monthly cost.
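One way to compute such a frontier is to sort by estimated cost and keep each model that outscores every model already seen. The sketch below uses a handful of leaderboard rows. It ignores benchmark coverage and context window, and the page's own highlighting may be computed differently.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    score: float  # scenario score, higher is better
    cost: float   # estimated $/month, lower is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Models that no other model beats on both axes, cheapest first.

    Walk from cheapest to most expensive (higher score first on a cost tie) and
    keep a model only if it scores higher than every model already seen.
    """
    frontier: list[Model] = []
    best_so_far = float("-inf")
    for m in sorted(models, key=lambda m: (m.cost, -m.score)):
        if m.score > best_so_far:
            frontier.append(m)
            best_so_far = m.score
    return frontier


# A subset of the coding-scenario leaderboard.
MODELS = [
    Model("o3", 95.4, 600),
    Model("Gemini 2.5 Pro", 92.6, 120),
    Model("DeepSeek R1", 79.7, 33),
    Model("Claude Opus 4", 75.4, 1035),
    Model("Gemini 2.0 Flash", 65.6, 6),
    Model("Qwen3 235B", 59.9, 10),
    Model("Gemini 1.5 Flash", 27.9, 5),
    Model("GPT-4o", 23.1, 150),
]
print([m.name for m in pareto_frontier(MODELS)])
# ['Gemini 1.5 Flash', 'Gemini 2.0 Flash', 'DeepSeek R1', 'Gemini 2.5 Pro', 'o3']
```

Both Gemini Flash models land on this frontier even though their scores rest on only 30% coverage in this scenario. A frontier that ignores coverage should be read with that caveat.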

[Chart: cost vs quality. Highlighted points mark the Pareto frontier; bubble size = context window.]