Llama 4 Maverick
Pricing verified 1y ago
Benchmarks
preference
Crowdsourced pairwise human preference rankings of LLM responses. Higher Elo means more frequently preferred by users.
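As a sketch of what an Elo gap means in practice, assuming the standard Elo expected-score formula (logistic, base 10, scale 400):

```python
def elo_win_prob(r_a: float, r_b: float) -> float:
    """Expected probability that model A is preferred over model B
    under the standard Elo model (base-10 logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# A 100-point Elo lead corresponds to roughly a 64% preference rate;
# equal ratings give exactly 50%.
print(round(elo_win_prob(1300, 1200), 2))  # -> 0.64
print(elo_win_prob(1200, 1200))            # -> 0.5
```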
math
American Invitational Mathematics Examination 2024 problems. Answers are integers from 0 to 999; very hard for non-reasoning models.
coding
164 hand-written Python programming problems (HumanEval) scored by unit-test pass rate. Saturated for frontier models.
agentic
Real GitHub issues solved end-to-end. The Verified subset is a 500-task, human-validated slice of SWE-bench.
vision
long context
Long-context retrieval and reasoning suite. We report the 128k token effective-context score.
performance
throughput
Median sustained output speed in tokens per second on the model's first-party API for medium-length prompts. Higher is faster.
latency
Median time from request to first output chunk in milliseconds on the model's first-party API for medium-length prompts. Lower is snappier; reasoning models are penalised here because they think before talking.
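Time-to-first-token and sustained throughput together give a rough end-to-end response-time estimate. A minimal sketch; the numbers below are illustrative, not this model's measured figures:

```python
def response_time_s(ttft_ms: float, tokens: int, tps: float) -> float:
    """Rough wall-clock time for one completion: time to first token
    plus the remaining tokens streamed at the sustained rate."""
    return ttft_ms / 1000.0 + tokens / tps

# e.g. 500 ms TTFT and a 600-token reply at 120 tokens/s -> 5.5 s total
print(response_time_s(500, 600, 120))  # -> 5.5
```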
Providers
| Provider | Input $/M | Output $/M | Context | Quant |
|---|---|---|---|---|
| DeepInfra deepinfra/base | $0.15 | $0.60 | 1.0M | fp8 |
| Novita novita/fp8 | $0.27 | $0.85 | 1.0M | fp8 |
| Parasail parasail/fp8 | $0.35 | $1.00 | 524k | fp8 |
| SambaNova sambanova | $0.63 | $1.80 | 131k | unknown |
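Per-request cost follows directly from the per-million-token rates in the table. A sketch using the listed prices and an illustrative request size (2,000 input tokens, 500 output tokens):

```python
# (input $/M tokens, output $/M tokens), taken from the providers table
PROVIDERS = {
    "deepinfra/base": (0.15, 0.60),
    "novita/fp8":     (0.27, 0.85),
    "parasail/fp8":   (0.35, 1.00),
    "sambanova":      (0.63, 1.80),
}

def request_cost(slug: str, in_tok: int, out_tok: int) -> float:
    """Dollar cost of one request at the table's per-million-token rates."""
    in_price, out_price = PROVIDERS[slug]
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

for slug in PROVIDERS:
    print(f"{slug}: ${request_cost(slug, 2_000, 500):.6f}")
# deepinfra/base works out to $0.000600 per such request
```

Note that the cheapest provider per token is not automatically the best choice: the table's context limits (131k at SambaNova vs. 1.0M at DeepInfra) constrain which requests each provider can serve at all.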