Ollama Cloud pricing

Ollama Cloud bills by GPU time, not by token count. The information below reflects the plans shown at ollama.com/cloud as of June 2026. Check that page for the current rates — they can change.

Plan comparison

Free $0 forever

1 concurrent cloud model
Light cloud usage included
Unlimited local model usage
40,000+ community integrations
CLI, API, and desktop app access

Good for: evaluating cloud models, local-first workflows with occasional cloud calls.

Pro $20/mo or $200/yr (billed annually)

3 concurrent cloud models
~50× more cloud usage than Free
Extra usage balance purchasable
Usage alert at 90% of limit
Weekly limits reset every 7 days

Good for: developers, power users, coding assistants running throughout the day.

Max $100/mo

10 concurrent cloud models
~5× more usage than Pro (~250× Free)
Extra usage balance purchasable
Same reset cadence as Pro

Good for: teams using a shared key, high-throughput agentic pipelines, running heavy models (level 3–4) regularly.
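The multipliers and prices listed above can be compared with a little arithmetic. The sketch below is only an illustration: the constants are copied from the plan list, the usage multipliers are approximate ("~") in the source, and the function names are made up for this example.

```python
# Figures copied from the plan list above; verify them at ollama.com/cloud.
PRO_MONTHLY_USD = 20
PRO_YEARLY_USD = 200
MAX_MONTHLY_USD = 100

PRO_USAGE_VS_FREE = 50  # "~50x more cloud usage than Free" (approximate)
MAX_USAGE_VS_PRO = 5    # "~5x more usage than Pro" (approximate)


def annual_saving(monthly_usd: int, yearly_usd: int) -> int:
    """Dollars saved per year by paying annually instead of monthly."""
    return monthly_usd * 12 - yearly_usd


def price_per_usage_unit(monthly_usd: float, usage_multiple: float) -> float:
    """Monthly price divided by usage, in multiples of the Free allowance."""
    return monthly_usd / usage_multiple


max_usage_vs_free = PRO_USAGE_VS_FREE * MAX_USAGE_VS_PRO

print(annual_saving(PRO_MONTHLY_USD, PRO_YEARLY_USD))           # 40
print(max_usage_vs_free)                                         # 250
print(price_per_usage_unit(PRO_MONTHLY_USD, PRO_USAGE_VS_FREE))  # 0.4
print(price_per_usage_unit(MAX_MONTHLY_USD, max_usage_vs_free))  # 0.4
```

If the approximate multipliers hold, Pro and Max cost about the same per unit of usage, so the practical difference between them is headroom and concurrency rather than a cheaper rate.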

Team TBA coming soon

Shared team usage pool
Centralised billing and admin
SSO support
Model access controls
Priority support + dedicated Slack

Contact Ollama by email for details.

How Ollama Cloud billing works

Ollama Cloud charges by GPU time, not by the number of tokens you generate. Each model has a usage difficulty level from 1 to 4:

Level 1 — lightweight models (e.g. gpt-oss:20b). Low GPU time per request.
Level 2 — mid-size models. Moderate usage.
Level 3 — large models. Significant usage per call.
Level 4 — extra-heavy models (e.g. deepseek-v4-pro). Highest usage per call.

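Because billing is GPU-time-based, Ollama states that it does not cap you at a fixed token count. Two requests that return the same number of tokens can therefore consume different amounts of usage, depending on the model's level and how long the request occupies the GPU.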

Extra usage balance: Pro and Max users can purchase additional usage after exhausting their plan balance. Exact per-unit pricing is not publicly listed — check your account dashboard or ollama.com/cloud.
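To get a rough feel for how much time a request takes, you can read the timing fields that the Ollama API includes in its final response (total_duration, eval_count, eval_duration, with durations in nanoseconds). The sketch below assumes cloud responses carry the same fields as local ones. The source does not say whether these durations are what Ollama meters, so treat them as a proxy, not a bill. The helper name summarize_timing and the sample values are made up for illustration.

```python
def summarize_timing(response: dict) -> dict:
    """Convert the nanosecond timing fields of an Ollama response to seconds."""
    ns_per_second = 1e9
    eval_seconds = response["eval_duration"] / ns_per_second
    return {
        "total_seconds": response["total_duration"] / ns_per_second,
        "eval_seconds": eval_seconds,
        "output_tokens": response["eval_count"],
        # Guard against a zero duration so the division cannot fail.
        "tokens_per_second": (
            response["eval_count"] / eval_seconds if eval_seconds else 0.0
        ),
    }


# Made-up sample values for illustration only (not a measurement).
sample = {
    "total_duration": 5_000_000_000,
    "eval_count": 200,
    "eval_duration": 4_000_000_000,
}
print(summarize_timing(sample))
```

Comparing total_seconds across models of different levels for the same prompt gives a sense of relative cost, even though the exact metering formula is not public.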

Concurrency limits

The number of cloud model requests you can have in flight at the same time depends on your plan:

Plan	Concurrent cloud models
Free	1
Pro	3
Max	10

Requests beyond the concurrency limit are queued. When the queue fills, new requests are rejected until a slot opens.
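One way to avoid queueing or rejection is to cap in-flight requests on the client side so you never exceed your plan's limit. The following is a minimal sketch using asyncio.Semaphore. The function names are hypothetical, and fake_request is a stand-in that sleeps instead of calling the cloud API.

```python
import asyncio

# Limits copied from the table above; verify them at ollama.com/cloud.
PLAN_CONCURRENCY = {"free": 1, "pro": 3, "max": 10}


async def run_limited(jobs, plan: str):
    """Run coroutine factories with at most the plan's limit in flight."""
    semaphore = asyncio.Semaphore(PLAN_CONCURRENCY[plan])

    async def guarded(job):
        # Each job waits here until one of the plan's slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_request():
        # Stand-in for a real cloud call; tracks how many run at once.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "ok"

    results = await run_limited([fake_request] * 8, "pro")
    return len(results), peak


print(asyncio.run(demo()))  # (8, 3): eight jobs done, never more than 3 at once
```

Limiting on the client keeps excess work in your own process, where you control ordering and timeouts, instead of relying on the server-side queue.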

Usage resets

Usage limits reset on a rolling weekly cycle, every 7 days.

An email alert is sent when you reach 90% of your plan limit; it can be disabled in account settings.
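The reset cadence and alert threshold can be tracked locally with simple date arithmetic. This sketch makes two assumptions: that you know when your current cycle started (the source does not say how the start is determined), and that the alert triggers at or above 90%. The usage units and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RESET_PERIOD = timedelta(days=7)  # "Weekly limits reset every 7 days"
ALERT_PERCENT = 90                # alert threshold stated above


def next_reset(cycle_start: datetime, now: datetime) -> datetime:
    """First reset after `now`, counting 7-day periods from cycle_start."""
    elapsed_periods = (now - cycle_start) // RESET_PERIOD
    return cycle_start + (elapsed_periods + 1) * RESET_PERIOD


def alert_due(used: int, limit: int) -> bool:
    """True once usage reaches 90% of the limit (integer maths, no rounding)."""
    return used * 100 >= ALERT_PERCENT * limit


# Hypothetical dates and usage units, for illustration only.
start = datetime(2026, 6, 1, tzinfo=timezone.utc)
now = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)

print(next_reset(start, now))  # 2026-06-15 00:00:00+00:00
print(alert_due(90, 100))      # True
print(alert_due(89, 100))      # False
```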

How these prices affect benchmark results

The benchmarks on this site run on the Pro plan, so the speed numbers reflect Pro-tier infrastructure. Free-plan users may see lower throughput under load, because Ollama Cloud prioritises paid plans during peak times.

See the methodology page for the full measurement spec.

Source: ollama.com/cloud, retrieved June 2026. Prices and limits may change — always verify at the source.