Llama 3.3 70B Instruct
Pricing verified 1 year ago
Benchmarks
preference
Crowdsourced pairwise human preference rankings of LLM responses. Higher Elo means more frequently preferred by users.
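For intuition, an Elo gap maps to an expected head-to-head preference rate through the standard logistic Elo formula. A minimal sketch (the function name is illustrative, not taken from any leaderboard's code):

```python
def expected_preference_rate(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score: the probability that model A's
    response is preferred over model B's in a pairwise comparison."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))

# A 100-point Elo lead corresponds to roughly a 64% preference rate.
print(f"{expected_preference_rate(1300, 1200):.2f}")  # 0.64
```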
knowledge
Harder variant of MMLU testing knowledge across a broad range of academic subjects; questions are filtered and given more answer options so correct answers are harder to guess.
reasoning
Graduate-level Google-proof Q&A in physics, chemistry, and biology. Diamond subset is the hardest tier with PhD-validated answers.
math
coding
164 hand-written Python programming problems scored by passing unit tests. Saturated for frontier models.
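Scores of this kind are typically reported as pass@k, using the unbiased estimator from the original HumanEval paper (n samples per problem, c of which pass the tests). A minimal sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper:
    n = samples generated per problem, c = samples that pass all tests."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=20, c=5, k=1))  # 0.25 -- 5 of 20 samples passed
```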
instruction following
Verifiable instruction-following benchmark; 25 categories of strict formatting / structural directives.
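"Verifiable" means each directive can be checked mechanically rather than judged by a model. A toy checker for one made-up directive, assuming nothing about the benchmark's actual harness:

```python
def follows_bullet_directive(response: str, n_bullets: int = 3) -> bool:
    """Toy verifier for 'answer in exactly three bullet points'.
    Illustrative only; not the benchmark's real checking code."""
    bullets = [ln for ln in response.splitlines()
               if ln.lstrip().startswith(("-", "*"))]
    return len(bullets) == n_bullets

print(follows_bullet_directive("- a\n- b\n- c"))  # True
```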
long context
Long-context retrieval and reasoning suite. We report the 128k token effective-context score.
performance
throughput
Median sustained output speed in tokens per second on the model's first-party API for medium-length prompts. Higher is faster.
time to first token
Median time from request to first output chunk in milliseconds on the model's first-party API for medium-length prompts. Lower is snappier; reasoning models are penalised here because they think before talking.
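Both metrics can be approximated against any OpenAI-compatible streaming endpoint, which many of the providers below expose. A rough sketch; the base_url and model id are placeholders, not any listed provider's actual values:

```python
import time
from openai import OpenAI  # pip install openai

# Assumes an OpenAI-compatible endpoint; URL and model id are placeholders.
client = OpenAI(base_url="https://api.example-provider.com/v1", api_key="...")

start = time.monotonic()
first_token_at = None
n_chunks = 0
stream = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarise the history of Unix."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.monotonic()  # first output chunk arrives
        n_chunks += 1
elapsed = time.monotonic() - start

print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")
# Chunks are a rough proxy for tokens; real harnesses count with the tokenizer.
print(f"~{n_chunks / (elapsed - (first_token_at - start)):.1f} chunks/s sustained")
```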
Providers
| Provider | Endpoint | Input $/M | Output $/M | Context | Quant |
|---|---|---|---|---|---|
| DeepInfra | deepinfra/turbo | $0.10 | $0.32 | 131k | fp8 |
| Inceptron | inceptron/fp8 | $0.12 | $0.38 | 131k | fp8 |
| Nebius | nebius/fp8 | $0.13 | $0.40 | 131k | fp8 |
| AkashML | akashml/fp8 | $0.13 | $0.40 | 131k | fp8 |
| Novita | novita/bf16 | $0.14 | $0.40 | 131k | bf16 |
| Parasail | parasail/int8 | $0.22 | $0.50 | 131k | int8 |
| Friendli | friendli | $0.60 | $0.60 | 131k | unknown |
| WandB | wandb/fp16 | $0.71 | $0.71 | 128k | fp16 |
| Google | google-vertex | $0.72 | $0.72 | 128k | unknown |
| Groq | groq | $0.59 | $0.79 | 131k | unknown |
| Together | together/fp8 | $0.88 | $0.88 | 131k | fp8 |
| SambaNova | sambanova-turbo | $0.45 | $0.90 | 16k | bf16 |
| SambaNova | sambanova/bf16 | $0.60 | $1.20 | 131k | bf16 |
| Cloudflare | cloudflare/fp8 | $0.29 | $2.25 | 24k | fp8 |
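All prices are per million tokens, so a request's cost is a simple weighted sum. A worked example using the DeepInfra row:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars, given per-million-token prices from the table."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# DeepInfra row: $0.10 in / $0.32 out per million tokens.
# 10k prompt tokens + 1k completion tokens => $0.001 + $0.00032 = $0.00132
print(f"${request_cost(10_000, 1_000, 0.10, 0.32):.5f}")  # $0.00132
```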