Ollama Cloud models

Live speed benchmarks for every model available on Ollama Cloud. Numbers update hourly, and the table is sorted by each model's latest tokens-per-second (TPS) reading.

Total models benchmarked: 56
Available on current plan: 56
Data updates: live, hourly

All Ollama Cloud models by speed

| # | Model | TPS now | TPS 24h avg | TTFT | Reliability | Intelligence Index |
|---|---|---|---|---|---|---|
| 1 | GLM 4.7 | 265.8 | 122.4 | 418ms | 100% | 33.8 |
| 2 | Nemotron 3 Nano 30B (non-reasoning) | 261.5 | 223.2 | 427ms | 100% | 7.4 |
| 3 | Ministral 3 3B (non-reasoning) | 221.5 | 204.0 | 474ms | 100% | 5.6 |
| 4 | Ministral 3 3B (non-reasoning) | 217.8 | 211.4 | 444ms | 100% | 5.6 |
| 5 | GLM 4.7 | 195.6 | 109.1 | 448ms | 100% | 33.8 |
| 6 | Ministral 3 8B (non-reasoning) | 171.6 | 132.8 | 453ms | 100% | 8.9 |
| 7 | Qwen3 Coder 480B (non-reasoning) | 162.7 | 108.6 | 784ms | 100% | 18 |
| 8 | Qwen3 Coder 480B (non-reasoning) | 147.8 | 129.2 | 841ms | 100% | 18 |
| 9 | Gemini 3 Flash Preview | 145.0 | 114.5 | 1.7s | 100% | 37.8 |
| 10 | Ministral 3 8B (non-reasoning) | 140.9 | 129.5 | 591ms | 100% | 8.9 |
| 11 | MiniMax M2.1 | 140.1 | 126.2 | 1.4s | 100% | 31.4 |
| 12 | DeepSeek V4 Pro | 137.7 | 152.8 | 656ms | 100% | 44.3 |
| 13 | RNJ 1 8B | 126.1 | 126.0 | 327ms | 100% | |
| 14 | RNJ 1 8B | 125.8 | 126.2 | 347ms | 100% | |
| 15 | DeepSeek V4 Flash | 125.2 | 214.1 | 557ms | 100% | 40.3 |
| 16 | MiniMax M2.1 | 123.6 | 125.9 | | 100% | 31.4 |
| 17 | Nemotron 3 Super | 106.3 | 55.8 | 609ms | 100% | 25.4 |
| 18 | Kimi K2.7 Code | 105.1 | 149.0 | 929ms | 100% | 41.9 |
| 19 | Qwen3 Coder Next (non-reasoning) | 103.2 | 89.7 | 335ms | 100% | 21.2 |
| 20 | Qwen3 Coder Next (non-reasoning) | 98.5 | 88.7 | 374ms | 100% | 21.2 |
| 21 | Nemotron 3 Nano 30B (non-reasoning) | 96.3 | 215.5 | 439ms | 100% | 7.4 |
| 22 | GLM 5.1 | 94.8 | 135.3 | 938ms | 100% | 40.2 |
| 23 | Ministral 3 14B (non-reasoning) | 88.9 | 104.8 | 488ms | 100% | 10 |
| 24 | MiniMax M2.5 | 88.3 | 84.9 | 296ms | 100% | 33.7 |
| 25 | Gemma4 31B | 85.2 | 117.9 | 391ms | 100% | 29.4 |
| 26 | Devstral 2 123B (non-reasoning) | 82.4 | 58.4 | 547ms | 100% | 15.5 |
| 27 | Devstral Small 2 24B (non-reasoning) | 82.3 | 58.9 | 484ms | 100% | 13.1 |
| 28 | Ministral 3 14B (non-reasoning) | 79.5 | 105.0 | 548ms | 100% | 10 |
| 29 | Devstral Small 2 24B (non-reasoning) | 78.6 | 68.2 | 504ms | 100% | 13.1 |
| 30 | GLM 5.2 | 78.5 | 106.2 | 566ms | 100% | 50.7 |
| 31 | Devstral 2 123B (non-reasoning) | 77.1 | 66.2 | 530ms | 100% | 15.5 |
| 32 | Kimi K2.6 | 71.6 | 92.0 | 645ms | 93% | 42.8 |
| 33 | GPT-OSS 20B | 68.0 | 92.6 | 543ms | 100% | 14.9 |
| 34 | Mistral Large 3 675B (non-reasoning) | 64.5 | 57.8 | 715ms | 100% | 16.2 |
| 35 | Nemotron 3 Super | 63.7 | 62.9 | 705ms | 100% | 25.4 |
| 36 | GPT-OSS 120B | 63.5 | 132.6 | 449ms | 100% | 23.8 |
| 37 | GPT-OSS 20B | 55.8 | 105.3 | 454ms | 100% | 14.9 |
| 38 | MiniMax M3 | 53.4 | 58.7 | 1.6s | 100% | 44.4 |
| 39 | Gemma3 4B (non-reasoning) | 53.0 | 51.8 | 544ms | 99% | 1.1 |
| 40 | GPT-OSS 120B | 52.4 | 131.0 | 450ms | 100% | 23.8 |
| 41 | Gemma4 31B | 51.9 | 101.4 | 422ms | 100% | 29.4 |
| 42 | Kimi K2.5 | 48.7 | 134.7 | 1.0s | 100% | 38.1 |
| 43 | MiniMax M3 | 47.9 | 58.1 | 1.3s | 100% | 44.4 |
| 44 | Gemma3 4B (non-reasoning) | 46.4 | 49.0 | 557ms | 100% | 1.1 |
| 45 | Qwen3.5 397B | 42.3 | 86.0 | 8.3s | 100% | 33.7 |
| 46 | Gemma3 12B (non-reasoning) | 36.6 | 39.4 | 972ms | 100% | 3.4 |
| 47 | MiniMax M2.7 | 34.8 | 39.4 | 2.1s | 100% | 38.1 |
| 48 | GLM 5 | 31.9 | 112.0 | 656ms | 100% | 39.5 |
| 49 | Gemma3 12B (non-reasoning) | 29.3 | 39.1 | 669ms | 100% | 3.4 |
| 50 | Nemotron 3 Ultra | 21.6 | 30.1 | 28.2s | 93% | 37.8 |
| 51 | Gemma3 27B (non-reasoning) | 16.7 | 16.1 | 574ms | 100% | 4.8 |
| 52 | DeepSeek V3.2 | 15.8 | 33.7 | 732ms | 100% | 33.4 |
| 53 | Nemotron 3 Ultra | 14.8 | 19.5 | 23.4s | 98% | 37.8 |
| 54 | Gemma3 27B (non-reasoning) | 13.6 | 16.5 | 752ms | 100% | 4.8 |
| 55 | DeepSeek V3.1 671B (non-reasoning) | 11.6 | 9.1 | 1.5s | 100% | 21 |
| 56 | MiniMax M2.5 | 3.9 | 85.0 | 28.7s | 100% | 33.7 |

About Ollama Cloud models

Ollama Cloud provides access to large language models via a hosted API. Models range from lightweight coding assistants (e.g. gpt-oss:20b, usage level 1) to flagship reasoning models (e.g. deepseek-v4-pro, usage level 4). Local model usage is always unlimited; cloud models count toward your plan's usage balance.

The benchmarks on this page come from continuous automated tests: one streaming chat completion per model roughly every 10 minutes, measured from outside Ollama's network. See the methodology page for the full measurement spec.
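
As a rough illustration of how the TTFT and TPS columns could be derived from a single streamed completion, here is a minimal Python sketch. It is not the site's measurement code; the methodology page holds the actual spec. The names `StreamMetrics` and `measure_stream` are hypothetical, and the sketch assumes that throughput is counted over the decode window after the first token arrives, so that TTFT does not lower the TPS figure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # throughput after the first token, in tokens/s


def measure_stream(request_sent_at: float, token_times: Sequence[float]) -> StreamMetrics:
    """Derive TTFT and TPS from the timestamps of one streamed completion.

    request_sent_at: clock reading (seconds) when the request was sent.
    token_times: clock reading (seconds) when each output token arrived, in order.
    """
    if not token_times:
        raise ValueError("stream produced no tokens")

    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent_at

    # Assumption: TPS covers only the decode window (first token to last token).
    decode_window = token_times[-1] - token_times[0]
    if decode_window <= 0:
        # A single token (or identical timestamps) gives no measurable rate.
        tps = 0.0
    else:
        tps = (len(token_times) - 1) / decode_window

    return StreamMetrics(ttft_s=ttft, tokens_per_s=tps)


# Example: request sent at t=0.0, five tokens arriving 0.25 s apart from t=0.5.
metrics = measure_stream(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(metrics.ttft_s, metrics.tokens_per_s)  # 0.5 4.0
```

In a real probe the timestamps would come from a monotonic clock such as `time.perf_counter()`, read once when the request is sent and once per streamed chunk. Because the probe runs outside Ollama's network, network latency is part of both numbers.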

Compare models side by side

Use the compare tool to overlay speed timelines for up to 6 models. This is useful for choosing between similar-sized models or for tracking performance changes over time.