Estimate token processing speeds and response times for different models and workloads.
How longer prompts affect total response time for the selected model (500 output tokens).
| Context Size | Prefill Time | Total Time | Visual |
|---|---|---|---|

| Model | Provider | Tier | tok/s | 500 tokens | 2000 tokens |
|---|---|---|---|---|---|
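
The relationship behind both tables can be sketched with a simple two-phase model: the prompt is processed first (prefill), then output tokens are generated one at a time (decode). The snippet below is a minimal sketch under that assumption, not the calculator's actual implementation. The function name `estimate_response_time` and the example speeds (2000 tok/s prefill, 50 tok/s decode) are hypothetical placeholders rather than measurements of any model, and the model ignores network latency, queuing, and batching effects.

```python
def estimate_response_time(prompt_tokens, output_tokens,
                           prefill_tok_per_s, decode_tok_per_s):
    """Return (prefill_seconds, total_seconds) for a single request.

    Assumes constant prefill and decode rates, which is a simplification.
    """
    if prefill_tok_per_s <= 0 or decode_tok_per_s <= 0:
        raise ValueError("token rates must be positive")
    prefill_s = prompt_tokens / prefill_tok_per_s   # time to read the prompt
    decode_s = output_tokens / decode_tok_per_s     # time to write the answer
    return prefill_s, prefill_s + decode_s


if __name__ == "__main__":
    # Hypothetical rates chosen only to illustrate the arithmetic.
    PREFILL_RATE = 2000.0  # prompt tokens processed per second (assumed)
    DECODE_RATE = 50.0     # output tokens generated per second (assumed)

    for context_size in (1_000, 8_000, 32_000):
        prefill_s, total_s = estimate_response_time(
            context_size, 500, PREFILL_RATE, DECODE_RATE
        )
        print(f"{context_size:>6} prompt tokens: "
              f"prefill {prefill_s:.1f}s, total {total_s:.1f}s")
```

With a fixed 500-token output, the decode time stays constant in this model, so any growth in total time comes entirely from the prefill term as the context gets longer.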
Token generation speed is critical for user experience and application design. Here's what affects it: