Token Speed Calculator

Compare Token Generation Speed Between Models

Estimate token processing speeds and response times for different models and workloads.

Estimated Processing Times

  • TTFT: time to first token
  • Prefill: input processing time
  • Generation: time to produce the output tokens
  • Total: end-to-end response time
  • Output speed: tokens per second (tok/s), shown on a scale from 0 to 200+

Estimates are based on typical API benchmarks. Actual speed varies with server load, network conditions, and provider infrastructure. Reasoning models (o1/o3/o4) may have a higher TTFT because they produce chain-of-thought tokens before the visible answer.
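The four figures above can be related with simple arithmetic. The sketch below is one plausible model, not necessarily the formula this calculator uses: it assumes TTFT is a fixed overhead plus prefill time, and that total time is TTFT plus generation time. The function name, the `overhead_s` parameter, and the example speeds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResponseEstimate:
    ttft_s: float        # time to first token
    prefill_s: float     # input processing time
    generation_s: float  # time spent emitting output tokens
    total_s: float       # end-to-end response time


def estimate_response_time(
    input_tokens: int,
    output_tokens: int,
    output_tok_per_s: float,
    prefill_tok_per_s: float,
    overhead_s: float = 0.0,
) -> ResponseEstimate:
    """Assumed model: TTFT = overhead + prefill; total = TTFT + generation."""
    if output_tok_per_s <= 0 or prefill_tok_per_s <= 0:
        raise ValueError("speeds must be positive")
    prefill_s = input_tokens / prefill_tok_per_s
    generation_s = output_tokens / output_tok_per_s
    ttft_s = overhead_s + prefill_s
    return ResponseEstimate(ttft_s, prefill_s, generation_s, ttft_s + generation_s)


# Hypothetical workload: 2,000 input tokens, 500 output tokens,
# 50 tok/s output, 2,000 tok/s prefill, 0.25 s network/queue overhead.
est = estimate_response_time(
    2000, 500, output_tok_per_s=50.0, prefill_tok_per_s=2000.0, overhead_s=0.25
)
print(f"TTFT {est.ttft_s:.2f}s, generation {est.generation_s:.2f}s, total {est.total_s:.2f}s")
```

In this model, generation time usually dominates for short prompts, while prefill takes a larger share as the prompt grows.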

Context Length Impact on Response Time

How longer prompts affect total response time for the selected model (500 output tokens).

Context Size | Prefill Time | Total Time | Visual
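A table like this can be approximated by holding the output length at 500 tokens and varying only the prompt size. The sketch below assumes prefill time grows linearly with context size, which is a simplification (attention cost can grow faster than linearly on long prompts). The function name and the example speeds are hypothetical, not values taken from this page.

```python
def context_impact(context_sizes, prefill_tok_per_s, output_tok_per_s, output_tokens=500):
    """Return (context_size, prefill_seconds, total_seconds) rows.

    Assumes linear prefill cost and ignores network overhead.
    """
    generation_s = output_tokens / output_tok_per_s
    rows = []
    for n in context_sizes:
        prefill_s = n / prefill_tok_per_s
        rows.append((n, prefill_s, prefill_s + generation_s))
    return rows


# Hypothetical model: 4,000 tok/s prefill, 50 tok/s output.
rows = context_impact(
    [1_000, 8_000, 32_000, 128_000],
    prefill_tok_per_s=4000.0,
    output_tok_per_s=50.0,
)
for size, prefill_s, total_s in rows:
    print(f"{size:>8,} tokens  prefill {prefill_s:6.2f}s  total {total_s:6.2f}s")
```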

Speed Benchmarks — All Models (2026)

Model | Provider | Tier | tok/s | 500 tokens | 2000 tokens

Understanding Token Generation Speed

Token generation speed is critical for user experience and application design. Here's what affects it:

Key Factors

  • Model size: Larger models need more compute per token, so they generate fewer tokens per second.
  • Hardware: For raw throughput, H100 > A100 > RTX 4090.
  • Speculative decoding: Some providers use a smaller draft model to propose tokens that the main model then verifies, which accelerates output.
  • Reasoning models: o1/o3/o4 series and extended-thinking models have high TTFT (2–10+ s) due to internal chain-of-thought.
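The model-size and hardware factors can be illustrated with a common rule of thumb: at batch size 1, decoding is usually limited by memory bandwidth, because the model weights are read roughly once per generated token. The sketch below computes that rough ceiling. It is an upper-bound estimate under stated assumptions, not a benchmark; the bandwidth figures are approximate vendor-published peaks, and the function name and the 8B-parameter example are hypothetical.

```python
# Approximate peak memory bandwidth in GB/s (rounded vendor figures; assumption).
BANDWIDTH_GB_S = {
    "H100 SXM": 3350.0,
    "A100 80GB": 2039.0,
    "RTX 4090": 1008.0,
}


def decode_ceiling_tok_per_s(params_billions, bytes_per_param, bandwidth_gb_s):
    """Rough single-stream ceiling: bandwidth divided by bytes read per token.

    Ignores KV-cache reads, batching, and kernel overheads, so real
    throughput will be lower.
    """
    model_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_gb


# Hypothetical 8B-parameter model stored in 16-bit weights (2 bytes per parameter).
ceilings = {
    gpu: decode_ceiling_tok_per_s(8, 2, bw) for gpu, bw in BANDWIDTH_GB_S.items()
}
for gpu, tok_s in ceilings.items():
    print(f"{gpu}: at most ~{tok_s:.0f} tok/s")
```

Doubling the parameter count halves the ceiling in this model, which is the intuition behind larger models being slower on the same hardware.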

Speed Tiers (2026 API)

  • Fast (100–150+ tok/s): Gemini 2.0 Flash, GPT-4o Mini, Claude 3 Haiku, Gemini 1.5 Flash, Mistral Small
  • Standard (50–90 tok/s): GPT-4o, Claude Sonnet 4, Gemini 2.5 Pro, Mistral Medium 3, DeepSeek V3
  • Slow (15–45 tok/s): Large API models such as Llama 3.1 405B, Mistral Large, Qwen 2.5 72B
  • Thinking (20–30 tok/s plus high TTFT): Reasoning models such as o3, o4-mini, Claude Opus 4, DeepSeek R2
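To show how the tiers above might be applied to a measured speed, here is a minimal classifier sketch. The cutoffs are assumptions: the listed ranges leave gaps (45–50 and 90–100 tok/s), which this sketch assigns to the lower tier, and it treats a TTFT of 2 seconds or more as "high". The function name is hypothetical.

```python
def speed_tier(tok_per_s: float, ttft_s: float = 0.0) -> str:
    """Map an output speed (and optional TTFT) to a tier label.

    Cutoffs are illustrative assumptions, not the page's exact rules.
    """
    if ttft_s >= 2.0:  # assumed threshold for "high TTFT"
        return "Thinking"
    if tok_per_s >= 100:
        return "Fast"
    if tok_per_s >= 50:
        return "Standard"
    return "Slow"


for speed, ttft in [(120, 0.3), (70, 0.5), (30, 0.8), (25, 5.0)]:
    print(f"{speed} tok/s, TTFT {ttft}s -> {speed_tier(speed, ttft)}")
```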

Frequently Asked Questions