Ollama Cloud pricing
Ollama Cloud bills by GPU time, not by token count. The information below reflects the plans shown at ollama.com/cloud as of June 2026. Check that page for the current rates — they can change.
Plan comparison
Free
- 1 concurrent cloud model
- Light cloud usage included
- Unlimited local model usage
- 40,000+ community integrations
- CLI, API, and desktop app access
Good for: evaluating cloud models, local-first workflows with occasional cloud calls.
Pro
- 3 concurrent cloud models
- ~50× more cloud usage than Free
- Extra usage balance purchasable
- Usage alert at 90% of limit
- Weekly limits reset every 7 days
Good for: developers, power users, coding assistants running throughout the day.
Max
- 10 concurrent cloud models
- ~5× more usage than Pro (~250× Free)
- Extra usage balance purchasable
- Same reset cadence as Pro
Good for: teams using a shared key, high-throughput agentic pipelines, running heavy models (level 3–4) regularly.
- Shared team usage pool
- Centralised billing and admin
- SSO support
- Model access controls
- Priority support + dedicated Slack
Contact [email protected] for details.
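The usage multipliers quoted for the plans are relative, not absolute, so it can help to put them on one scale. The sketch below expresses each plan's cloud allowance as a multiple of the Free allowance, using only the approximate ratios from the plan list. The absolute size of the Free allowance is not published, so it is treated as an abstract unit; the names `PRO_VS_FREE`, `MAX_VS_PRO`, and `relative_usage` are illustrative only.

```python
# Approximate ratios quoted in the plan list. The Free allowance itself is
# not published, so it is treated here as one abstract unit.
PRO_VS_FREE = 50  # "~50x more cloud usage than Free"
MAX_VS_PRO = 5    # "~5x more usage than Pro"


def relative_usage() -> dict:
    """Return each plan's allowance as a multiple of the Free allowance."""
    return {
        "Free": 1,
        "Pro": PRO_VS_FREE,
        "Max": PRO_VS_FREE * MAX_VS_PRO,
    }


if __name__ == "__main__":
    for plan, multiple in relative_usage().items():
        print(f"{plan}: ~{multiple}x the Free allowance")
```

Because the published ratios are approximate, treat the results as rough guides rather than exact quotas.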
How Ollama Cloud billing works
Ollama Cloud charges by GPU time, not by the number of tokens you generate. Each model has a usage difficulty level from 1 to 4:
- Level 1 — lightweight models (e.g. gpt-oss:20b). Low GPU time per request.
- Level 2 — mid-size models. Moderate usage.
- Level 3 — large models. Significant usage per call.
- Level 4 — extra-heavy models (e.g. deepseek-v4-pro). Highest usage per call.
Because billing is based on GPU time, Ollama states that it does not cap you at a fixed token count. Usage is driven by how long a request occupies the GPU, so the same number of tokens can consume more or less of your allowance depending on the model.
Extra usage balance: Pro and Max users can purchase additional usage after exhausting their plan balance. Exact per-unit pricing is not publicly listed — check your account dashboard or ollama.com/cloud.
Concurrency limits
The number of cloud model requests you can have in flight simultaneously is plan-gated:
| Plan | Concurrent cloud models |
|---|---|
| Free | 1 |
| Pro | 3 |
| Max | 10 |
Requests beyond the concurrency limit are queued. When the queue fills, new requests are rejected until a slot opens.
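One way to avoid relying on the server-side queue is to cap in-flight requests on the client. The following is a minimal sketch, assuming an asyncio-based client: it uses a semaphore sized to the plan's limit from the table above. `call_model` is a hypothetical async callable standing in for whatever code actually sends the request; it is not part of any Ollama library.

```python
import asyncio

# Concurrency limits taken from the table above.
PLAN_CONCURRENCY = {"Free": 1, "Pro": 3, "Max": 10}


async def run_limited(plan, jobs, call_model):
    """Run every job, keeping at most the plan's limit in flight at once.

    call_model is a hypothetical async callable (job -> result).
    Results are returned in the same order as the jobs.
    """
    gate = asyncio.Semaphore(PLAN_CONCURRENCY[plan])

    async def guarded(job):
        # Wait for a free slot before issuing the request.
        async with gate:
            return await call_model(job)

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A caller would run it with something like `asyncio.run(run_limited("Pro", prompts, call_model))`. Capping on the client keeps excess work waiting locally instead of in the service's queue, so requests are less likely to be rejected when that queue fills.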
Usage resets
Usage limits reset on a rolling weekly cycle, every 7 days.
An email alert fires at 90% of your plan limit; it can be disabled in account settings.
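The reset cadence and alert threshold can be mirrored in local tooling. The sketch below computes the next reset time from a known cycle start and checks whether usage has crossed the 90% mark. It assumes you know when your current cycle began (the page does not say how the anchor is chosen) and that usage and limit are tracked in some consistent unit of your own. `next_reset` and `should_alert` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

CYCLE = timedelta(days=7)  # weekly limits reset every 7 days
ALERT_PERCENT = 90         # alert fires at 90% of the plan limit


def next_reset(cycle_start: datetime, now: datetime) -> datetime:
    """Return the first reset time strictly after `now`.

    cycle_start is an assumed anchor: the moment a known cycle began.
    """
    cycles_done = (now - cycle_start) // CYCLE  # whole cycles elapsed
    return cycle_start + (cycles_done + 1) * CYCLE


def should_alert(used: int, limit: int) -> bool:
    """True once usage reaches 90% of the limit (integer math, no rounding)."""
    return used * 100 >= limit * ALERT_PERCENT


if __name__ == "__main__":
    start = datetime(2026, 6, 1, tzinfo=timezone.utc)
    now = datetime(2026, 6, 10, 12, tzinfo=timezone.utc)
    print(next_reset(start, now).isoformat())
    print(should_alert(90, 100))
```

Integer arithmetic is used for the threshold so that a value sitting exactly at 90% is not affected by floating-point rounding.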
How these prices affect benchmark results
The benchmarks on this site run on the Pro plan. Speed numbers reflect Pro-tier infrastructure. Free-plan users may see different throughput under load — Ollama Cloud prioritises paid plans during peak times.
See the methodology page for the full measurement spec.