Inference Pricing

Last verified 1 Jul 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

Inference has a usage-based pricing model, so costs scale with your actual usage.

Bring Your Own Models (BYOM)

BYOM model weights are stored in a service-managed, non-accessible Spaces location, and are billed at $5.00 per month. We do not charge you for browsing or managing imported models in Model Catalog. Costs apply only for storing model weights and for using those models with other paid features, such as dedicated inference deployments.

Model Playground

Usage is charged at the same rate as serverless inference.

Serverless Inference

Serverless inference is billed by DigitalOcean for both open-source and commercial models. Prices align with each provider’s published rates for transparency.

Warning

Serverless inference is prepaid only. You must maintain a positive prepaid account balance to send serverless inference requests, and we deduct usage charges from this balance. If your balance reaches $0, access is suspended until you replenish it. To add a balance or enable auto-reload, see Manage Serverless Inference Prepayment.

The following shows pricing for foundation models available through serverless inference.

Anthropic Models

Note

When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider’s rates.

Claude Opus 4.6, Sonnet 5, Sonnet 4.6, and Sonnet 4.5 support an input context window of up to 1M tokens.

Model	Serverless Inference
Claude Fable 5	Input/output tokens: $10.00 per 1M input tokens; $50.00 per 1M output tokens. Prompt caching: $12.50 per 1M cache creation 5m input tokens; $20.00 per 1M cache creation 1h input tokens; $1.00 per 1M cache read input tokens
Claude Haiku 4.5	Input/output tokens: $1.00 per 1M input tokens; $5.00 per 1M output tokens. Prompt caching: $1.25 per 1M cache creation 5m input tokens; $2.00 per 1M cache creation 1h input tokens; $0.10 per 1M cache read input tokens
Claude Opus 4.8	Input/output tokens: $5.00 per 1M input tokens; $25.00 per 1M output tokens. Prompt caching: $6.25 per 1M cache creation 5m input tokens; $10.00 per 1M cache creation 1h input tokens; $0.50 per 1M cache read input tokens
Claude Opus 4.7	Input/output tokens: $5.00 per 1M input tokens; $25.00 per 1M output tokens. Prompt caching: $6.25 per 1M cache creation 5m input tokens; $10.00 per 1M cache creation 1h input tokens; $0.50 per 1M cache read input tokens
Claude Opus 4.6	Prompts ≤200K tokens: $5.00 per 1M input tokens; $25.00 per 1M output tokens. Prompts >200K tokens: $10.00 per 1M input tokens; $37.50 per 1M output tokens. Prompt caching: $6.25 per 1M cache creation 5m input tokens; $10.00 per 1M cache creation 1h input tokens; $0.50 per 1M cache read input tokens
Claude Opus 4.5	Input/output tokens: $5.00 per 1M input tokens; $25.00 per 1M output tokens. Prompt caching: $6.25 per 1M cache creation 5m input tokens; $10.00 per 1M cache creation 1h input tokens; $0.50 per 1M cache read input tokens
Claude Opus 4.1	Input/output tokens: $15.00 per 1M input tokens; $75.00 per 1M output tokens. Prompt caching: $18.75 per 1M cache creation 5m input tokens; $30.00 per 1M cache creation 1h input tokens; $1.50 per 1M cache read input tokens
Claude Sonnet 5	Input/output tokens: $2.00 per 1M input tokens; $10.00 per 1M output tokens. Prompt caching: $2.50 per 1M cache creation 5m input tokens; $4.00 per 1M cache creation 1h input tokens; $0.20 per 1M cache read input tokens
Claude Sonnet 4.6	Prompts ≤200K tokens: $3.00 per 1M input tokens; $15.00 per 1M output tokens. Prompts >200K tokens: $6.00 per 1M input tokens; $22.50 per 1M output tokens. Prompt caching: $3.75 per 1M cache creation 5m input tokens; $6.00 per 1M cache creation 1h input tokens; $0.30 per 1M cache read input tokens
Claude Sonnet 4.5	Prompts ≤200K tokens: $3.00 per 1M input tokens; $15.00 per 1M output tokens. Prompts >200K tokens: $6.00 per 1M input tokens; $22.50 per 1M output tokens. Prompt caching: $3.75 per 1M cache creation 5m input tokens; $6.00 per 1M cache creation 1h input tokens; $0.30 per 1M cache read input tokens
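
As a rough illustration of how the tiered rows above translate into a per-request cost, the following Python sketch uses the Claude Sonnet 4.6 rates from the table. The helper name, the dictionary layout, and the rule that the >200K tier is selected by total prompt size (uncached plus cache-read tokens) and then applies to the whole request are assumptions made for illustration, not documented billing behavior.

```python
# Rates in USD per 1M tokens, copied from the Claude Sonnet 4.6 row above.
SONNET_4_6 = {
    "threshold": 200_000,
    "standard": {"input": 3.00, "output": 15.00},
    "long": {"input": 6.00, "output": 22.50},
    "cache_read": 0.30,
}


def estimate_request_cost(input_tokens, output_tokens, cache_read_tokens=0,
                          rates=SONNET_4_6):
    """Estimate the USD cost of one request (hypothetical helper)."""
    # Assumption: the tier depends on total prompt size and covers the whole request.
    prompt_tokens = input_tokens + cache_read_tokens
    tier = rates["long"] if prompt_tokens > rates["threshold"] else rates["standard"]
    cost = (
        input_tokens * tier["input"]
        + output_tokens * tier["output"]
        + cache_read_tokens * rates["cache_read"]
    )
    return cost / 1_000_000


if __name__ == "__main__":
    print(f"{estimate_request_cost(100_000, 2_000):.4f}")  # 0.3300
    print(f"{estimate_request_cost(300_000, 2_000):.4f}")  # 1.8450
```

Under these assumptions, a 100K-token prompt with 2K output tokens costs about $0.33, while a 300K-token prompt with the same output costs about $1.85 because both rates move to the higher tier.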

Arcee Models

Model	Serverless Inference
Trinity Large	Input/output tokens: $0.25 per 1M input tokens; $0.90 per 1M output tokens. Prompt caching: $0.06 per 1M cache read input tokens

fal Models

Model	Serverless Inference
Fast SDXL	$0.0011 per compute second
Flux Schnell	$0.0030 per megapixel
Stable Audio 2.5 (Text-to-Audio)	$0.00058 per compute second
Multilingual TTS v2	$0.10 per 1000 characters
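
The fal models are priced in different units, so a small lookup can help when estimating costs. This is a minimal sketch: the function names are hypothetical, and it assumes that one megapixel is 1,000,000 pixels and that usage is billed proportionally without rounding, neither of which is stated above.

```python
# (price in USD, number of units that price covers), copied from the table above.
FAL_RATES = {
    "Fast SDXL": (0.0011, 1),                          # per compute second
    "Flux Schnell": (0.0030, 1),                       # per megapixel
    "Stable Audio 2.5 (Text-to-Audio)": (0.00058, 1),  # per compute second
    "Multilingual TTS v2": (0.10, 1000),               # per 1000 characters
}


def megapixels(width, height):
    """Image size in megapixels, assuming 1 MP = 1,000,000 pixels."""
    return width * height / 1_000_000


def fal_cost(model, quantity):
    """Estimated USD cost for `quantity` seconds, megapixels, or characters."""
    price, units = FAL_RATES[model]
    # Assumption: billing is proportional to usage, with no rounding up.
    return quantity / units * price


if __name__ == "__main__":
    print(fal_cost("Multilingual TTS v2", 2_500))            # 2,500 characters
    print(fal_cost("Flux Schnell", megapixels(1024, 1024)))  # one 1024x1024 image
```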

OpenAI Models

Note

When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider’s rates.

Model	Serverless Inference
gpt-oss-120b	Input/output tokens: $0.10 per 1M input tokens; $0.70 per 1M output tokens
gpt-oss-20b	Input/output tokens: $0.05 per 1M input tokens; $0.45 per 1M output tokens
GPT-5.5	Input/output tokens: $5.00 per 1M input tokens; $30.00 per 1M output tokens. Prompt caching: $0.50 per 1M cache read input tokens
GPT-5.4	Input/output tokens: $2.50 per 1M input tokens; $15.00 per 1M output tokens. Prompt caching: $0.25 per 1M cache read input tokens
GPT-5.4 mini	Input/output tokens: $0.75 per 1M input tokens; $4.50 per 1M output tokens. Prompt caching: $0.075 per 1M cache read input tokens
GPT-5.4 nano	Input/output tokens: $0.20 per 1M input tokens; $1.25 per 1M output tokens. Prompt caching: $0.02 per 1M cache read input tokens
GPT-5.4 pro	Input/output tokens: $30.00 per 1M input tokens; $180.00 per 1M output tokens
GPT-5.3-Codex	Input/output tokens: $1.75 per 1M input tokens; $14.00 per 1M output tokens. Prompt caching: $0.175 per 1M cache read input tokens
GPT-5.2	Input/output tokens: $1.75 per 1M input tokens; $14.00 per 1M output tokens. Prompt caching: $0.175 per 1M cache read input tokens
GPT-5.2 pro	Input/output tokens: $21.00 per 1M input tokens; $168.00 per 1M output tokens
GPT-5.1-Codex-Max	Input/output tokens: $1.25 per 1M input tokens; $10.00 per 1M output tokens. Prompt caching: $0.125 per 1M cache read input tokens
GPT-5	Input/output tokens: $1.25 per 1M input tokens; $10.00 per 1M output tokens. Prompt caching: $0.125 per 1M cache read input tokens
GPT-5 mini	Input/output tokens: $0.25 per 1M input tokens; $2.00 per 1M output tokens. Prompt caching: $0.025 per 1M cache read input tokens
GPT-5 nano	Input/output tokens: $0.05 per 1M input tokens; $0.40 per 1M output tokens. Prompt caching: $0.005 per 1M cache read input tokens
GPT-4.1	Input/output tokens: $2.00 per 1M input tokens; $8.00 per 1M output tokens. Prompt caching: $0.50 per 1M cache read input tokens
GPT-4o	Input/output tokens: $2.50 per 1M input tokens; $10.00 per 1M output tokens. Prompt caching: $1.25 per 1M cache read input tokens
GPT-4o mini	Input/output tokens: $0.15 per 1M input tokens; $0.60 per 1M output tokens. Prompt caching: $0.075 per 1M cache read input tokens
o1	Input/output tokens: $15.00 per 1M input tokens; $60.00 per 1M output tokens. Prompt caching: $7.50 per 1M cache read input tokens
o3	Input/output tokens: $2.00 per 1M input tokens; $8.00 per 1M output tokens. Prompt caching: $0.50 per 1M cache read input tokens
o3-mini	Input/output tokens: $1.10 per 1M input tokens; $4.40 per 1M output tokens. Prompt caching: $0.55 per 1M cache read input tokens
GPT-image-1	Input/output tokens: $5.00 per 1M input tokens; $40.00 per 1M output tokens. Prompt caching: $1.25 per 1M cache read input tokens
GPT Image 1.5	Input/output tokens: $5.00 per 1M input tokens; $10.00 per 1M output tokens. Prompt caching: $1.00 per 1M cache read input tokens
GPT Image 2	Text input: $5.00 per 1M tokens; Text output: $0.00 per 1M tokens; Text cache read: $1.25 per 1M tokens; Image input: $8.00 per 1M tokens; Image output: $30.00 per 1M tokens; Image cache read: $2.00 per 1M tokens
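
GPT Image 2 is the only row above that prices text and image tokens separately. The sketch below shows one way to total a request across those six categories. The `usage` dictionary keys and the function name are hypothetical and do not reflect the field names of any real API response.

```python
# USD per 1M tokens, copied from the GPT Image 2 row above.
GPT_IMAGE_2_RATES = {
    "text_input": 5.00,
    "text_output": 0.00,
    "text_cache_read": 1.25,
    "image_input": 8.00,
    "image_output": 30.00,
    "image_cache_read": 2.00,
}


def gpt_image_2_cost(usage):
    """Estimate USD cost from a mapping of token category -> token count."""
    unknown = set(usage) - set(GPT_IMAGE_2_RATES)
    if unknown:
        raise ValueError(f"unknown token categories: {sorted(unknown)}")
    total = sum(tokens * GPT_IMAGE_2_RATES[kind] for kind, tokens in usage.items())
    return total / 1_000_000


if __name__ == "__main__":
    # A text prompt of 1,000 tokens that produces 4,000 image output tokens.
    print(gpt_image_2_cost({"text_input": 1_000, "image_output": 4_000}))  # 0.125
```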

DigitalOcean-Hosted Models

Provider	Model	Serverless Inference
Alibaba	Qwen3-32B	Input/output tokens: $0.25 per 1M input tokens; $0.55 per 1M output tokens
Alibaba	Qwen3 Coder Flash	Input/output tokens: $0.45 per 1M input tokens; $1.70 per 1M output tokens. Prompt caching: $0.09 per 1M tokens
Alibaba	Qwen 3.5 397B A17B	Input/output tokens: $0.385 per 1M input tokens; $2.45 per 1M output tokens. Prompt caching: $0.111 per 1M tokens
Alibaba	Qwen 3 TTS (1.7B)	$20.00 per 1M character tokens
Alibaba	Wan2.2-T2V-A14B	$0.60 per video
DeepSeek	DeepSeek R1 Distill Llama 70B	Input/output tokens: $0.99 per 1M input tokens; $0.99 per 1M output tokens
DeepSeek	DeepSeek V4 Pro	Input/output tokens: $1.392 per 1M input tokens; $2.784 per 1M output tokens. Prompt caching: $0.348 per 1M tokens
DeepSeek	DeepSeek V4 Flash	Input/output tokens: $0.112 per 1M input tokens; $0.224 per 1M output tokens. Prompt caching: $0.028 per 1M tokens
DeepSeek	DeepSeek V3.2	Input/output tokens: $0.425 per 1M input tokens; $1.36 per 1M output tokens. Prompt caching: $0.15 per 1M tokens
Google	Gemma 4	Input/output tokens: $0.18 per 1M input tokens; $0.50 per 1M output tokens
MiniMax	MiniMax M2.5 (Public Preview)	Input/output tokens: $0.225 per 1M input tokens; $0.90 per 1M output tokens. Prompt caching: $0.06 per 1M tokens
Moonshot AI	Kimi K2.5	Input/output tokens: $0.375 per 1M input tokens; $2.025 per 1M output tokens. Prompt caching: $0.203 per 1M tokens
Moonshot AI	Kimi K2.6	Input/output tokens: $0.76 per 1M input tokens; $3.20 per 1M output tokens. Prompt caching: $0.19 per 1M tokens
Meta	Llama 3.3 Instruct-70B	Input/output tokens: $0.65 per 1M input tokens; $0.65 per 1M output tokens
Meta	Llama 4 Maverick 17B 128E Instruct	Input/output tokens: $0.25 per 1M input tokens; $0.87 per 1M output tokens
Mistral AI	Ministral 3 14B Instruct	Input/output tokens: $0.20 per 1M input tokens; $0.20 per 1M output tokens
NVIDIA	Nemotron 3 Ultra	Input/output tokens: $0.90 per 1M input tokens; $1.70 per 1M output tokens
NVIDIA	Nemotron-3-Super-120B (Public Preview)	Input/output tokens: $0.21 per 1M input tokens; $0.455 per 1M output tokens
NVIDIA	Nemotron Nano 3 Omni	Input/output tokens: $0.50 per 1M input tokens; $0.90 per 1M output tokens
NVIDIA	Nemotron Nano 12B v2 VL	Input/output tokens: $0.20 per 1M input tokens; $0.60 per 1M output tokens
Stability AI	Stable Diffusion 3.5 Large	$0.08 per image
Xiaomi	MiMo-V2.5	Input/output tokens: $0.105 per 1M input tokens; $0.28 per 1M output tokens. Prompt caching: $0.028 per 1M tokens
Xiaomi	MiMo V2.5 Pro	Input/output tokens: $0.60 per 1M input tokens; $3.00 per 1M output tokens. Prompt caching: $0.16 per 1M tokens
Z.ai	GLM-5.2	Input/output tokens: $1.05 per 1M input tokens; $4.40 per 1M output tokens. Prompt caching: $0.21 per 1M tokens
Z.ai	GLM-5.1	Input/output tokens: $0.975 per 1M input tokens; $4.30 per 1M output tokens. Prompt caching: $0.26 per 1M tokens
Z.ai	GLM 5	Input/output tokens: $0.75 per 1M input tokens; $2.40 per 1M output tokens. Prompt caching: $0.20 per 1M tokens
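
To compare a few of the hosted models above for a fixed workload, a sketch like the following can rank them by estimated cost. It reads each row as input price first and output price second, covers only four of the listed models, and ignores prompt caching and any quality differences between models. The function name is hypothetical.

```python
# (input, output) USD per 1M tokens for a few rows of the table above.
DO_HOSTED_RATES = {
    "Qwen3-32B": (0.25, 0.55),
    "Llama 3.3 Instruct-70B": (0.65, 0.65),
    "DeepSeek V3.2": (0.425, 1.36),
    "Gemma 4": (0.18, 0.50),
}


def rank_by_cost(input_tokens, output_tokens, rates=DO_HOSTED_RATES):
    """Return (model, estimated USD cost) pairs, cheapest first."""
    costs = {
        model: (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        for model, (in_rate, out_rate) in rates.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens and 10M output tokens.
    for model, cost in rank_by_cost(50_000_000, 10_000_000):
        print(f"{model}: ${cost:.2f}")
```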

Dedicated Inference

Dedicated Inference is billed per GPU-hour based on the GPU you use.

GPU	Price
AMD MI300X	$2.59 per hour
AMD MI300X (8x)	$20.70 per hour
AMD MI325X	$2.98 per hour
AMD MI325X (8x)	$23.82 per hour
AMD MI350X	$6.89 per hour
NVIDIA B300	$10.39 per hour
NVIDIA B300 (8x)	$83.10 per hour
NVIDIA H100	$4.41 per hour
NVIDIA H100 (8x)	$30.32 per hour
NVIDIA H200	$4.47 per hour
NVIDIA H200 (8x)	$35.78 per hour
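
The hourly rates above can be turned into a deployment estimate with simple multiplication. The sketch below assumes that partial hours are billed proportionally and that "(8x)" denotes an eight-GPU configuration. Neither is stated above, and the function names are hypothetical.

```python
# USD per hour, copied from the GPU table above.
GPU_HOURLY = {
    "AMD MI300X": 2.59,
    "AMD MI300X (8x)": 20.70,
    "AMD MI325X": 2.98,
    "AMD MI325X (8x)": 23.82,
    "AMD MI350X": 6.89,
    "NVIDIA B300": 10.39,
    "NVIDIA B300 (8x)": 83.10,
    "NVIDIA H100": 4.41,
    "NVIDIA H100 (8x)": 30.32,
    "NVIDIA H200": 4.47,
    "NVIDIA H200 (8x)": 35.78,
}


def dedicated_cost(gpu, hours):
    """Estimated USD cost of running one deployment for `hours` hours."""
    return GPU_HOURLY[gpu] * hours


def eight_gpu_difference(gpu):
    """Hourly price of eight single-GPU deployments minus the (8x) price."""
    return 8 * GPU_HOURLY[gpu] - GPU_HOURLY[f"{gpu} (8x)"]


if __name__ == "__main__":
    print(f"{dedicated_cost('NVIDIA H100', 100):.2f}")    # 441.00
    print(f"{eight_gpu_difference('NVIDIA H100'):.2f}")   # 4.96
```

The difference is not uniform across GPUs: with the listed prices it is $4.96 per hour for the H100 but close to zero for the MI300X and H200.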

Batch Inference

Batch inference is charged at up to a 50% discount on OpenAI and Anthropic models.

You are only charged for completed requests. If a batch job fails, is blocked by guardrails, or expires partway through, requests that were not processed are not charged.
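
The two rules above (a discount of up to 50%, and charges only for completed requests) can be sketched as follows. The request dictionaries and the `status` values are hypothetical, and the default discount is the 50% maximum. The actual discount for a given model may be lower.

```python
def batch_cost(requests, input_rate, output_rate, discount=0.50):
    """Estimate the USD cost of a batch job.

    `requests` is an iterable of dicts with hypothetical keys
    'status', 'input_tokens', and 'output_tokens'. Rates are USD per 1M tokens.
    """
    total = 0.0
    for request in requests:
        # Only completed requests are charged; failed, blocked, or expired ones are skipped.
        if request["status"] != "completed":
            continue
        total += (
            request["input_tokens"] * input_rate
            + request["output_tokens"] * output_rate
        ) / 1_000_000
    # Assumption: the discount applies uniformly to input and output tokens.
    return total * (1 - discount)


if __name__ == "__main__":
    job = [
        {"status": "completed", "input_tokens": 1_000_000, "output_tokens": 100_000},
        {"status": "failed", "input_tokens": 500_000, "output_tokens": 0},
    ]
    # GPT-5 serverless rates from the OpenAI table: $1.25 input, $10.00 output.
    print(batch_cost(job, input_rate=1.25, output_rate=10.00))  # 1.125
```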

Inference Router (Public Preview)

Inference Router is available in public preview and enabled for all users. You can contact support for questions or assistance.

Inference Router has no additional cost during public preview. It forwards requests to foundation models running on serverless inference or dedicated inference, and you are billed for the models that serve each request.

Tools Usage

Knowledge base retrieval, DigitalOcean MCP servers, and Anthropic- and OpenAI-only tools, such as tool search and computer use, incur no charges beyond the standard per-token inference costs.

The following tools incur charges in addition to the standard per-token inference costs:

Web search: $10 per 1000 requests, not charged when using Anthropic models
Web fetch: $3 per 1000 requests, not charged when using Anthropic models
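
A minimal sketch of the tool charges listed above. It assumes requests are billed proportionally rather than in blocks of 1,000, and the function name and `provider` argument are hypothetical.

```python
# USD per 1000 requests, from the list above.
TOOL_RATES_PER_1000 = {"web_search": 10.00, "web_fetch": 3.00}


def tool_charges(provider, web_search_requests=0, web_fetch_requests=0):
    """Estimated USD tool charges on top of per-token inference costs."""
    # Web search and web fetch are not charged when using Anthropic models.
    if provider.lower() == "anthropic":
        return 0.0
    total = (
        web_search_requests * TOOL_RATES_PER_1000["web_search"]
        + web_fetch_requests * TOOL_RATES_PER_1000["web_fetch"]
    )
    return total / 1000


if __name__ == "__main__":
    print(tool_charges("openai", web_search_requests=2_500, web_fetch_requests=1_000))  # 28.0
    print(tool_charges("anthropic", web_search_requests=2_500))  # 0.0
```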

DigitalOcean Evaluations

DigitalOcean Evaluations can use a model or an Inference Router configuration as the candidate. Token usage by candidate models deployed on Serverless Inference, and by judge models, is charged at the same token rates as serverless inference.

Candidate models deployed on Dedicated Inference do not incur additional evaluation-specific token charges.

Storage for evaluation datasets and evaluation results is currently provided at no additional charge. DigitalOcean may introduce or modify storage fees in the future.

Knowledge Bases

Knowledge base pricing is shown per million tokens, but billing is calculated per thousand tokens.

You’re billed for indexing and retrieval tokens, for reranking tokens if reranking is enabled, and for storage:

Tokens used for indexing and retrieval query vectorization: We charge for tokens used to generate embeddings during indexing and to vectorize user queries during retrieval. Both use the same embeddings model pricing.

Indexing pricing is the same for manual and auto-indexing. Indexing charges apply only when changes are detected, such as new, updated, or deleted files or URLs. If auto-indexing is paused or no changes are found, there are no indexing charges.

Note

Retrieval requests sent through an MCP server are billed the same as retrieval requests sent directly to the knowledge base retrieve endpoint. This includes the tokens used to vectorize the retrieval query with the selected embeddings model.

For example, a 10 MB dataset is about 3 million tokens, and a 1 GB dataset is about 250 million tokens.

Actual costs depend on the embeddings model:

Model	Price
`all-mini-lm-l6-v2`	$0.009 per 1M input tokens
`multi-qa-mpnet-base-dot-v1`	$0.009 per 1M input tokens
`gte-large-en-v1.5`	$0.09 per 1M input tokens
`Qwen3 Embedding 0.6B`	$0.04 per 1M input tokens
`BGE-M3`	$0.02 per 1M input tokens
`E5 Large V2`	$0.02 per 1M input tokens

Note

One token is roughly four characters (approximately 75 words per 100 tokens). Non-Latin scripts, emojis, or binary data may increase token counts.
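
Combining the embeddings prices with the four-characters-per-token rule of thumb gives a rough indexing estimate. This sketch assumes Latin-script text where one character is about one byte, and it ignores how billing rounds to thousands of tokens, which is not specified here. The function names are hypothetical.

```python
# USD per 1M tokens, copied from the embeddings model table above.
EMBEDDING_RATES = {
    "all-mini-lm-l6-v2": 0.009,
    "multi-qa-mpnet-base-dot-v1": 0.009,
    "gte-large-en-v1.5": 0.09,
    "Qwen3 Embedding 0.6B": 0.04,
    "BGE-M3": 0.02,
    "E5 Large V2": 0.02,
}

CHARS_PER_TOKEN = 4  # rule of thumb from the note above


def estimate_tokens(num_characters):
    """Approximate token count, rounded up to a whole token."""
    return -(-num_characters // CHARS_PER_TOKEN)


def indexing_cost(num_characters, model):
    """Estimated USD cost to embed `num_characters` of text once."""
    return estimate_tokens(num_characters) * EMBEDDING_RATES[model] / 1_000_000


if __name__ == "__main__":
    one_gb = 1_000_000_000  # characters, assuming about one byte per character
    print(estimate_tokens(one_gb))  # 250000000
    print(f"{indexing_cost(one_gb, 'gte-large-en-v1.5'):.2f}")  # 22.50
```

With these assumptions, indexing a 1 GB dataset once costs about $22.50 with `gte-large-en-v1.5` and about $2.25 with `all-mini-lm-l6-v2`.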

Reranking tokens: If reranking is enabled, tokens used to rerank results are billed based on the selected reranking model. For supported reranking models, see available reranking models.

Model	Price
`BGE Reranker v2 m3`	$0.01 per 1M reranking tokens

Storage: Embeddings are stored in OpenSearch. See OpenSearch pricing.

Chunking has no separate charge. Its cost depends on embedding token usage, your OpenSearch database, and the selected embeddings model.

Chunking strategy cost depends on how many tokens the strategy embeds and returns:

Section-based and fixed-length chunking are the most cost-efficient because they use simple splitting and have predictable token usage.
Semantic chunking costs more because it uses the embeddings model to detect semantic boundaries and embed final chunks, often resulting in 1.5 to 3 times more indexing tokens.
Hierarchical chunking slightly increases indexing cost by creating parent and child embeddings. It can also increase retrieval cost because agents receive both child and parent chunks for each lookup.
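
The 1.5 to 3 times multiplier for semantic chunking can be applied to a baseline estimate to get a cost range. This is a sketch only: the baseline token count must come from your own data, and the function name is hypothetical.

```python
SEMANTIC_MULTIPLIER_RANGE = (1.5, 3.0)  # from the description of semantic chunking above


def semantic_indexing_cost_range(base_tokens, rate_per_million):
    """Return (low, high) USD estimates for indexing with semantic chunking.

    `base_tokens` is the token count expected with section-based or
    fixed-length chunking; `rate_per_million` is the embeddings model price.
    """
    base_cost = base_tokens * rate_per_million / 1_000_000
    low, high = SEMANTIC_MULTIPLIER_RANGE
    return base_cost * low, base_cost * high


if __name__ == "__main__":
    # A dataset of about 3 million tokens embedded at $0.09 per 1M tokens.
    low, high = semantic_indexing_cost_range(3_000_000, 0.09)
    print(f"${low:.3f} to ${high:.3f}")
```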

Changing your chunking strategy or configuration requires re-indexing the affected data source, which consumes additional tokens. For guidance on chunking configurations and best practices, see our chunking parameters reference and chunking best practices.

If you use RAG Playground, answer generation is billed separately based on the selected serverless inference model. Free tokens for RAG Playground are not separate; they are shared with Model Playground.

Agent Platform

Agent creation is free. We charge for model usage and for additional features like knowledge bases and guardrails. We display prices per million tokens and bill per thousand tokens for accuracy.

Model usage is billed by DigitalOcean. You are charged for all input and output tokens processed by the agent at the same token rates as serverless inference. Token usage depends on factors such as input length, agent instructions, attached knowledge bases, and configuration settings. To optimize usage, test your agents and adjust their parameters.

Agent Guardrails

Charges apply for all tokens processed through agent guardrails:

Guardrail	Price
Content Moderation	$0.20 per 1,000,000 tokens
Jailbreak Detection	$0.20 per 1,000,000 tokens
Sensitive Data Detection	$0.34 per 1,000,000 tokens

Costs are per token. Creating, editing, or duplicating guardrails has no additional cost.
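
As a sketch of how guardrail charges add up, the following assumes that every attached guardrail processes the same tokens and is billed independently. That assumption and the function name are illustrative only.

```python
# USD per 1,000,000 tokens, copied from the guardrail table above.
GUARDRAIL_RATES = {
    "Content Moderation": 0.20,
    "Jailbreak Detection": 0.20,
    "Sensitive Data Detection": 0.34,
}


def guardrail_cost(tokens, guardrails):
    """Estimated USD cost of passing `tokens` through each named guardrail."""
    # Assumption: each attached guardrail is charged separately for the same tokens.
    combined_rate = sum(GUARDRAIL_RATES[name] for name in guardrails)
    return tokens * combined_rate / 1_000_000


if __name__ == "__main__":
    print(f"{guardrail_cost(5_000_000, GUARDRAIL_RATES):.2f}")  # 3.70
```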

Functions

If you attach DigitalOcean Functions to your agent, you are billed at functions pricing.

Agent Evaluations

Agent evaluations are charged by token usage at the same rates as model usage.

Agent Development Kit (Public Preview)

You are not charged for using the Agent Development Kit (ADK) during public preview. However, you are billed for other DigitalOcean Inference features you use with your agent deployment:

Model usage: We charge for model usage with the ADK. If you use a DigitalOcean-hosted model, you are charged for usage made with those model keys.

Note

At General Availability, we will charge for agent deployment hosting, measured in GiB-seconds. We will also charge for judge input and output tokens, which are the tokens used to judge the agent's inputs and outputs against the test case’s chosen metrics. These costs are waived during public preview.
