Ollama Cloud models
Live speed benchmarks for every model available on Ollama Cloud. Numbers update hourly, and the table is sorted by the latest tokens-per-second (TPS) reading.
All Ollama Cloud models by speed
| Rank and model | TPS now (tokens/s) | TPS 24h avg (tokens/s) | TTFT (time to first token) | Reliability | Intelligence Index |
|---|---|---|---|---|---|
| 1 GLM 4.7 | 265.8 | 122.4 | 418ms | 100% | 33.8 |
| 2 Nemotron 3 Nano 30B (non-reasoning) | 261.5 | 223.2 | 427ms | 100% | 7.4 |
| 3 Ministral 3 3B (non-reasoning) | 221.5 | 204.0 | 474ms | 100% | 5.6 |
| 4 Ministral 3 3B (non-reasoning) | 217.8 | 211.4 | 444ms | 100% | 5.6 |
| 5 GLM 4.7 | 195.6 | 109.1 | 448ms | 100% | 33.8 |
| 6 Ministral 3 8B (non-reasoning) | 171.6 | 132.8 | 453ms | 100% | 8.9 |
| 7 Qwen3 Coder 480B (non-reasoning) | 162.7 | 108.6 | 784ms | 100% | 18 |
| 8 Qwen3 Coder 480B (non-reasoning) | 147.8 | 129.2 | 841ms | 100% | 18 |
| 9 Gemini 3 Flash Preview | 145.0 | 114.5 | 1.7s | 100% | 37.8 |
| 10 Ministral 3 8B (non-reasoning) | 140.9 | 129.5 | 591ms | 100% | 8.9 |
| 11 MiniMax M2.1 | 140.1 | 126.2 | 1.4s | 100% | 31.4 |
| 12 DeepSeek V4 Pro | 137.7 | 152.8 | 656ms | 100% | 44.3 |
| 13 RNJ 1 8B | 126.1 | 126.0 | 327ms | 100% | — |
| 14 RNJ 1 8B | 125.8 | 126.2 | 347ms | 100% | — |
| 15 DeepSeek V4 Flash | 125.2 | 214.1 | 557ms | 100% | 40.3 |
| 16 MiniMax M2.1 | 123.6 | 125.9 | — | 100% | 31.4 |
| 17 Nemotron 3 Super | 106.3 | 55.8 | 609ms | 100% | 25.4 |
| 18 Kimi K2.7 Code | 105.1 | 149.0 | 929ms | 100% | 41.9 |
| 19 Qwen3 Coder Next (non-reasoning) | 103.2 | 89.7 | 335ms | 100% | 21.2 |
| 20 Qwen3 Coder Next (non-reasoning) | 98.5 | 88.7 | 374ms | 100% | 21.2 |
| 21 Nemotron 3 Nano 30B (non-reasoning) | 96.3 | 215.5 | 439ms | 100% | 7.4 |
| 22 GLM 5.1 | 94.8 | 135.3 | 938ms | 100% | 40.2 |
| 23 Ministral 3 14B (non-reasoning) | 88.9 | 104.8 | 488ms | 100% | 10 |
| 24 MiniMax M2.5 | 88.3 | 84.9 | 296ms | 100% | 33.7 |
| 25 Gemma4 31B | 85.2 | 117.9 | 391ms | 100% | 29.4 |
| 26 Devstral 2 123B (non-reasoning) | 82.4 | 58.4 | 547ms | 100% | 15.5 |
| 27 Devstral Small 2 24B (non-reasoning) | 82.3 | 58.9 | 484ms | 100% | 13.1 |
| 28 Ministral 3 14B (non-reasoning) | 79.5 | 105.0 | 548ms | 100% | 10 |
| 29 Devstral Small 2 24B (non-reasoning) | 78.6 | 68.2 | 504ms | 100% | 13.1 |
| 30 GLM 5.2 | 78.5 | 106.2 | 566ms | 100% | 50.7 |
| 31 Devstral 2 123B (non-reasoning) | 77.1 | 66.2 | 530ms | 100% | 15.5 |
| 32 Kimi K2.6 | 71.6 | 92.0 | 645ms | 93% | 42.8 |
| 33 GPT-OSS 20B | 68.0 | 92.6 | 543ms | 100% | 14.9 |
| 34 Mistral Large 3 675B (non-reasoning) | 64.5 | 57.8 | 715ms | 100% | 16.2 |
| 35 Nemotron 3 Super | 63.7 | 62.9 | 705ms | 100% | 25.4 |
| 36 GPT-OSS 120B | 63.5 | 132.6 | 449ms | 100% | 23.8 |
| 37 GPT-OSS 20B | 55.8 | 105.3 | 454ms | 100% | 14.9 |
| 38 MiniMax M3 | 53.4 | 58.7 | 1.6s | 100% | 44.4 |
| 39 Gemma3 4B (non-reasoning) | 53.0 | 51.8 | 544ms | 99% | 1.1 |
| 40 GPT-OSS 120B | 52.4 | 131.0 | 450ms | 100% | 23.8 |
| 41 Gemma4 31B | 51.9 | 101.4 | 422ms | 100% | 29.4 |
| 42 Kimi K2.5 | 48.7 | 134.7 | 1.0s | 100% | 38.1 |
| 43 MiniMax M3 | 47.9 | 58.1 | 1.3s | 100% | 44.4 |
| 44 Gemma3 4B (non-reasoning) | 46.4 | 49.0 | 557ms | 100% | 1.1 |
| 45 Qwen3.5 397B | 42.3 | 86.0 | 8.3s | 100% | 33.7 |
| 46 Gemma3 12B (non-reasoning) | 36.6 | 39.4 | 972ms | 100% | 3.4 |
| 47 MiniMax M2.7 | 34.8 | 39.4 | 2.1s | 100% | 38.1 |
| 48 GLM 5 | 31.9 | 112.0 | 656ms | 100% | 39.5 |
| 49 Gemma3 12B (non-reasoning) | 29.3 | 39.1 | 669ms | 100% | 3.4 |
| 50 Nemotron 3 Ultra | 21.6 | 30.1 | 28.2s | 93% | 37.8 |
| 51 Gemma3 27B (non-reasoning) | 16.7 | 16.1 | 574ms | 100% | 4.8 |
| 52 DeepSeek V3.2 | 15.8 | 33.7 | 732ms | 100% | 33.4 |
| 53 Nemotron 3 Ultra | 14.8 | 19.5 | 23.4s | 98% | 37.8 |
| 54 Gemma3 27B (non-reasoning) | 13.6 | 16.5 | 752ms | 100% | 4.8 |
| 55 DeepSeek V3.1 671B (non-reasoning) | 11.6 | 9.1 | 1.5s | 100% | 21 |
| 56 MiniMax M2.5 | 3.9 | 85.0 | 28.7s | 100% | 33.7 |
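
One way to read the two TPS columns together is to divide the current reading by the 24-hour average: a ratio well below 1 suggests an entry is running slower than usual right now. The sketch below applies that idea to four rows copied from the table. The helper name `slowdown_ratio` and the 0.5 threshold are illustrative assumptions, not part of the benchmark itself.

```python
# Rows copied from the table above: (model, TPS now, TPS 24h avg).
rows = [
    ("GLM 4.7", 265.8, 122.4),
    ("DeepSeek V4 Flash", 125.2, 214.1),
    ("GLM 5", 31.9, 112.0),
    ("MiniMax M2.5", 3.9, 85.0),
]


def slowdown_ratio(tps_now: float, tps_avg: float) -> float:
    """Current speed as a fraction of the 24h average (hypothetical helper)."""
    return tps_now / tps_avg


# Assumed threshold: flag anything running at under half its 24h average.
THRESHOLD = 0.5

flagged = [name for name, now, avg in rows if slowdown_ratio(now, avg) < THRESHOLD]
print(flagged)
```

A single reading can be noisy, so a low ratio is a prompt to check the speed timeline rather than a verdict on the model.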
About Ollama Cloud models
Ollama Cloud provides access to large language models via a hosted API. Models range from lightweight coding assistants (e.g. gpt-oss:20b, usage level 1) to flagship reasoning models (e.g. deepseek-v4-pro, usage level 4). Local model usage is always unlimited; cloud models count toward your plan's usage balance.
The benchmarks on this page come from continuous automated tests: one streaming chat completion per model roughly every 10 minutes, measured from outside Ollama's network. See the methodology page for the full measurement spec.
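As a rough illustration of what a streaming test like this can measure, the sketch below derives TTFT and tokens per second from the arrival times of streamed chunks. The names `StreamStats` and `summarize_stream`, the event format, and the exact definitions used here (TTFT as the time from request start to the first token; TPS as tokens received after the first chunk divided by the time elapsed since that chunk) are assumptions for illustration. The methodology page remains the authoritative spec.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class StreamStats:
    ttft_s: float         # seconds from request start to first token
    tps: Optional[float]  # tokens/second after the first chunk; None if undefined


def summarize_stream(request_start: float,
                     chunks: Sequence[Tuple[float, int]]) -> StreamStats:
    """Summarize one streamed completion.

    chunks holds (arrival_time_seconds, tokens_in_chunk) pairs in arrival order.
    """
    # Ignore keep-alive or empty chunks that carry no tokens.
    token_chunks = [(t, n) for t, n in chunks if n > 0]
    if not token_chunks:
        raise ValueError("stream produced no tokens")

    first_time, first_count = token_chunks[0]
    last_time = token_chunks[-1][0]
    total_tokens = sum(n for _, n in token_chunks)

    ttft = first_time - request_start
    generation_time = last_time - first_time
    # Tokens in the first chunk arrive at first_time, so they are excluded
    # from the rate to avoid counting them against zero elapsed time.
    tokens_after_first = total_tokens - first_count
    tps = tokens_after_first / generation_time if generation_time > 0 else None
    return StreamStats(ttft_s=ttft, tps=tps)


stats = summarize_stream(0.0, [(0.5, 1), (1.0, 50), (1.5, 50)])
print(stats)
```

Separating TTFT from the generation rate matters when reading the table: an entry can stream quickly once it starts yet still feel slow if its first token takes several seconds to arrive.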
Compare models side by side
Use the compare tool to overlay speed timelines for up to 6 models. This is useful for choosing between similarly sized models or for tracking performance changes over time.