Ollama Cloud pricing

Ollama Cloud bills by GPU time, not by token count. The information below reflects the plans shown at ollama.com/cloud as of June 2026. Check that page for the current rates — they can change.

Plan comparison

Free $0 forever
  • 1 concurrent cloud model
  • Light cloud usage included
  • Unlimited local model usage
  • 40,000+ community integrations
  • CLI, API, and desktop app access

Good for: evaluating cloud models, local-first workflows with occasional cloud calls.

Max $100/mo
  • 10 concurrent cloud models
  • ~5× more usage than Pro (~250× Free)
  • Extra usage balance purchasable
  • Same reset cadence as Pro

Good for: teams using a shared key, high-throughput agentic pipelines, running heavy models (level 3–4) regularly.
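
The Max multipliers above imply a rough ratio between Pro and Free. Here is a minimal sketch of that arithmetic. It treats the approximate figures (~5× and ~250×) as exact for illustration and normalises Free usage to 1 unit. The variable names are illustrative, and the implied Pro figure is derived here rather than quoted from Ollama.

```python
# Relative usage allowances, normalised so that Free = 1 unit.
# Both ratios come from the Max plan card and are approximate ("~").
MAX_VS_PRO = 5     # Max offers roughly 5x the usage of Pro
MAX_VS_FREE = 250  # Max offers roughly 250x the usage of Free

free_units = 1.0
max_units = free_units * MAX_VS_FREE
pro_units = max_units / MAX_VS_PRO  # implied by the two ratios, not stated directly

relative_usage = {"Free": free_units, "Pro": pro_units, "Max": max_units}
for plan, units in relative_usage.items():
    print(f"{plan}: ~{units:g}x Free")
```

On these figures, Pro works out to roughly 50× the Free allowance.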

Team TBA coming soon
  • Shared team usage pool
  • Centralised billing and admin
  • SSO support
  • Model access controls
  • Priority support + dedicated Slack

Contact [email protected] for details.

How Ollama Cloud billing works

Ollama Cloud charges by GPU time, not by the number of tokens you generate. Each model has a usage difficulty level from 1 to 4:

  • Level 1 — lightweight models (e.g. gpt-oss:20b). Low GPU time per request.
  • Level 2 — mid-size models. Moderate usage.
  • Level 3 — large models. Significant usage per call.
  • Level 4 — extra-heavy models (e.g. deepseek-v4-pro). Highest usage per call.

Because billing is based on GPU time, Ollama states that it does not cap you at a fixed token count; a slower model at the same level costs the same GPU time as a faster one.
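
One way to use the level scheme is to prefer the lightest model that fits a task. The sketch below assumes a hand-maintained lookup table. Only the two model-to-level pairs named above are included, and the function name `lightest_model` is hypothetical, not part of any Ollama API.

```python
# Usage levels as described above; a higher level means more GPU time per call.
USAGE_LEVELS = {
    1: "lightweight",
    2: "mid-size",
    3: "large",
    4: "extra-heavy",
}

# Only these two model-to-level pairs appear in the text above.
# Any other entries would have to come from Ollama's own model listings.
MODEL_LEVELS = {
    "gpt-oss:20b": 1,
    "deepseek-v4-pro": 4,
}


def lightest_model(candidates):
    """Return the candidate with the lowest known usage level."""
    known = [m for m in candidates if m in MODEL_LEVELS]
    if not known:
        raise ValueError("no candidate has a known usage level")
    return min(known, key=MODEL_LEVELS.get)


choice = lightest_model(["deepseek-v4-pro", "gpt-oss:20b"])
level = MODEL_LEVELS[choice]
print(f"{choice} -> level {level} ({USAGE_LEVELS[level]})")
```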

Extra usage balance: Pro and Max users can purchase additional usage after exhausting their plan balance. Exact per-unit pricing is not publicly listed — check your account dashboard or ollama.com/cloud.

Concurrency limits

The number of cloud model requests you can have in flight simultaneously is plan-gated:

Plan    Concurrent cloud models
Free    1
Pro     3
Max     10

Requests beyond the concurrency limit are queued. When the queue fills, new requests are rejected until a slot opens.
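
To avoid relying on the server-side queue, a client can cap its own in-flight requests at the plan limit. The sketch below does this with `asyncio.Semaphore`. The `fake_request` coroutine is a stand-in that performs no network I/O, and `run_limited` is a hypothetical helper, not part of any Ollama client library.

```python
import asyncio

# Concurrency limits per plan, as listed in the table above.
PLAN_CONCURRENCY = {"Free": 1, "Pro": 3, "Max": 10}


async def run_limited(jobs, plan):
    """Run coroutine functions with at most PLAN_CONCURRENCY[plan] in flight."""
    semaphore = asyncio.Semaphore(PLAN_CONCURRENCY[plan])

    async def guarded(job):
        async with semaphore:  # wait here until a slot is free
            return await job()

    # gather returns results in the same order as the input jobs
    return await asyncio.gather(*(guarded(job) for job in jobs))


# Demo: track how many stand-in requests overlap at any moment.
stats = {"in_flight": 0, "peak": 0}


async def fake_request():
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate model latency
    stats["in_flight"] -= 1
    return "ok"


results = asyncio.run(run_limited([fake_request] * 8, "Pro"))
print(len(results), "requests done, peak in flight:", stats["peak"])
```

Throttling on the client keeps excess requests waiting locally instead of filling the remote queue, so fewer of them risk rejection.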

Usage resets

Usage limits reset on a rolling weekly cycle, every 7 days.

An email alert fires at 90% of your plan limit (can be disabled in account settings).
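
The reset and alert rules can be sketched as simple date arithmetic. This assumes you know when your current cycle started (`cycle_start` is a hypothetical value you would read from your account) and that usage is tracked in some abstract unit. Ollama's actual accounting units are not specified here.

```python
from datetime import datetime, timedelta, timezone

RESET_PERIOD = timedelta(days=7)  # weekly limits reset every 7 days
ALERT_FRACTION = 0.90             # email alert fires at 90% of the plan limit


def next_reset(cycle_start, now):
    """Return the first reset time strictly after `now` (assumes now >= cycle_start)."""
    elapsed_cycles = (now - cycle_start) // RESET_PERIOD  # whole cycles completed
    return cycle_start + (elapsed_cycles + 1) * RESET_PERIOD


def alert_due(used, limit):
    """True once usage reaches the alert threshold."""
    return used >= ALERT_FRACTION * limit


# Hypothetical dates and usage figures, for illustration only.
cycle_start = datetime(2026, 6, 1, tzinfo=timezone.utc)
now = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)
upcoming = next_reset(cycle_start, now)
print("next reset:", upcoming.isoformat())
print("alert due at 91/100 units:", alert_due(91, 100))
```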

How these prices affect benchmark results

The benchmarks on this site run on the Pro plan. Speed numbers reflect Pro-tier infrastructure. Free-plan users may see different throughput under load — Ollama Cloud prioritises paid plans during peak times.

See the methodology page for the full measurement spec.

Source: ollama.com/cloud, retrieved June 2026. Prices and limits may change — always verify at the source.