
Ollama Cloud limits

What limits Ollama Cloud places on your usage — by plan tier, concurrency, and model weight. All figures from ollama.com/cloud (June 2026). For token-per-minute or request-per-minute limits, see the rate limits section below.

Concurrency limits

Each plan limits how many cloud model requests can be in-flight at once. Requests over the limit are queued; once the queue fills, additional requests are rejected until a slot opens.

Plan | Concurrent cloud models | Price
Free | 1 | $0
Pro | 3 | $20/mo or $200/yr
Max | 10 | $100/mo
Team | TBA | Contact Ollama
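
A client can stay under its plan's concurrency cap by limiting in-flight requests itself, rather than relying on the server-side queue. The following is a minimal sketch, not an official client: `run_with_cap` and the `send` callable are hypothetical names, and the caps are copied from the table above (Team is omitted because its limit is unpublished).

```python
import asyncio

# Concurrency caps copied from the plan table; Team is unpublished, so it is omitted.
PLAN_CONCURRENCY = {"free": 1, "pro": 3, "max": 10}


async def run_with_cap(jobs, plan, send):
    """Run `send(job)` for every job without exceeding the plan's in-flight cap.

    `send` is a hypothetical async callable that performs one cloud request.
    Results are returned in the same order as `jobs`.
    """
    slots = asyncio.Semaphore(PLAN_CONCURRENCY[plan])

    async def guarded(job):
        async with slots:  # wait locally until one of the plan's slots is free
            return await send(job)

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With `plan="pro"`, at most three `send` calls are awaiting a response at any moment; the rest wait on the semaphore until a slot frees up.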

Usage balance and resets

Ollama Cloud charges by GPU time, not token count. Usage balance resets on a rolling weekly cycle:

  • Weekly limit — resets every 7 days. The overall cap for a rolling week.

An email alert fires at 90% of your plan limit. Pro and Max users can purchase additional usage balance when the plan balance is exhausted — Ollama does not publicly list the per-unit price, so check your account dashboard.
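
The two figures above (a 7-day reset and a 90% alert threshold) are enough for simple local bookkeeping. This is a sketch under stated assumptions: `next_reset` and `alert_due` are hypothetical helpers, the moment your own weekly window started has to come from your account dashboard, and `used` and `plan_limit` are in whatever unit the dashboard reports.

```python
from datetime import datetime, timedelta, timezone

RESET_PERIOD = timedelta(days=7)  # weekly reset, as described above
ALERT_FRACTION = 0.90             # email alert threshold, as described above


def next_reset(last_reset: datetime) -> datetime:
    """Return the next reset time, given when the current window began."""
    return last_reset + RESET_PERIOD


def alert_due(used: float, plan_limit: float) -> bool:
    """Return True once usage has reached 90% of the plan limit."""
    if plan_limit <= 0:
        raise ValueError("plan_limit must be positive")
    return used / plan_limit >= ALERT_FRACTION


# Example with made-up numbers: a window that began on 1 June 2026 (UTC).
window_start = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(next_reset(window_start).date())  # 2026-06-08
```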

Model usage levels

Each cloud model has a usage difficulty level from 1 to 4. Heavier models consume more GPU time per request and therefore drain your plan balance faster.

Level | Description | Example models
1 | Light | gpt-oss:20b and similar small models
2 | Moderate | Mid-size models
3 | Heavy | Large models
4 | Extra heavy | deepseek-v4-pro and similar flagship models

Ollama does not publish the exact GPU-seconds each level consumes. Check your account usage dashboard or the model's detail page on ollama.com for the level of a specific model.

Rate limits (RPM / TPM / context)

Ollama Cloud does not publicly list requests-per-minute (RPM), tokens-per-minute (TPM), or maximum context window sizes in its documentation as of June 2026. The only usage constraints described publicly are the concurrency limits and GPU-time usage balance above.

If you need exact rate limit numbers for production planning:

Model deprecation schedule

Ollama Cloud deprecates cloud models with advance notice via email and the website. The following models were announced for retirement on June 16, 2026:

Retiring model | Replacement
kimi-k2-thinking | kimi-k2.6
kimi-k2:1t | kimi-k2.6
minimax-m2 | minimax-m3
glm-4.6 | glm-5.1
qwen3-next:80b | qwen3.5
qwen3-vl:235b | qwen3.5
qwen3-vl:235b-instruct | qwen3.5
cogito-2.1:671b | deepseek-v4-flash

Deprecations only affect cloud models. Local models are not affected. Source: docs.ollama.com/cloud.
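
If your code pins cloud model names, the retirement list can be encoded as a lookup so that retired names are swapped for their listed replacements. This is an illustrative sketch: `resolve_model` is a hypothetical helper, and the names are taken as they appear in the list above, so confirm the exact tags your account uses before relying on it.

```python
# Retiring cloud model -> replacement, copied from the list above.
REPLACEMENTS = {
    "kimi-k2-thinking": "kimi-k2.6",
    "kimi-k2:1t": "kimi-k2.6",
    "minimax-m2": "minimax-m3",
    "glm-4.6": "glm-5.1",
    "qwen3-next:80b": "qwen3.5",
    "qwen3-vl:235b": "qwen3.5",
    "qwen3-vl:235b-instruct": "qwen3.5",
    "cogito-2.1:671b": "deepseek-v4-flash",
}


def resolve_model(name: str) -> str:
    """Return the listed replacement for a retiring model, or the name unchanged."""
    return REPLACEMENTS.get(name, name)


print(resolve_model("glm-4.6"))      # glm-5.1
print(resolve_model("gpt-oss:20b"))  # gpt-oss:20b (not on the retirement list)
```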

How limits affect benchmark numbers

The benchmarks on this site run on the Ollama Cloud Pro plan. Benchmarks run sequentially (one request at a time) so concurrency limits do not affect our measurements. Usage balance limits can affect data freshness — if the balance is exhausted, benchmarks pause until the weekly reset.

See the full methodology for details, including how circuit-breaker logic and rate-limit backoff work.
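
For readers unfamiliar with the term, rate-limit backoff generally means waiting progressively longer between retries after a rejected request. The sketch below shows one common form, exponential backoff with jitter. It is not this site's actual implementation: `RateLimited`, `with_backoff`, and all default values are hypothetical, and how a rejected request is signalled depends on the client library you use.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by `call` when the server rejects a request."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0,
                 sleep=time.sleep):
    """Call `call()` and retry on RateLimited with exponentially growing waits.

    The wait before retry k (counting from 0) is at most
    min(max_delay, base_delay * 2**k), scaled by a random factor in [0.5, 1.0]
    so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists only so the waits can be replaced in tests; in normal use the default `time.sleep` applies.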

Source: ollama.com/cloud and docs.ollama.com/cloud, retrieved June 2026. Always verify at the source — limits and pricing can change without notice.