Give Feedback

Available Models for Inference

Last verified 24 Jun 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

Copy page as Markdown View page as Markdown

The following foundation, embeddings, and reranking models are available.

Note

For pricing information, see the pricing page.

We regularly update our model offerings to provide the most performant and efficient models, and deprecate older models. For information on our model deprecation policy and recommended model replacements, see Model Support Policy.

Foundation Models

Inference supports both open source and commercial foundation models. Open source models are generally published by research labs, available under open licenses. Commercial models are proprietary such as OpenAI and Anthropic models. All models are offered using DigitalOcean API access keys, but you can also bring your own provider’s API keys to access the commercial models.

We offer the following foundation models, subject to the AI Model Terms, our Service Terms, and the Terms of Service Agreement.

You can use these models in serverless inference, dedicated inference, inference routers, batch inference, agents, or Agent Development Kit (ADK). See the model-specific usage information below.

Anthropic Models

Anthropic models available on DigitalOcean Inference support tool (function) calling, prompt caching, adaptive thinking, fast mode, dynamic workflows, mid-conversation system messages, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model	Model ID	Context Window	Max Output Tokens	Serverless Inference	ADK	Agents	Usage Notes	Tentative End-of-Support
Claude Fable 5	`anthropic-claude-fable-5`	1,000,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Prompt caching ✔️ Tool calling ✔️ Adaptive thinking ℹ️ Requires a mandatory 30-day data retention of prompts and completions for trust and safety reviews
Claude Haiku 4.5	`anthropic-claude-haiku-4.5`	200,000	64,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling	No sooner than October 2026
Claude Opus 4.8	`anthropic-claude-opus-4.8`	1,000,000	128,000	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens ✔️ Prompt caching ✔️ Tool calling ✔️ Fast mode ✔️ Adaptive thinking ✔️ Dynamic workflows ✔️ Mid-conversation system messages	No sooner than May 2027
Claude Opus 4.7	`anthropic-claude-opus-4.7`	200,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool calling	No sooner than April 2027
Claude Opus 4.6	`anthropic-claude-opus-4.6`	200,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool calling	No sooner than February 2027
Claude Opus 4.5	`anthropic-claude-opus-4.5`	200,000	64,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling	No sooner than November 2026
Claude Opus 4.1	`anthropic-claude-4.1-opus`	200,000	32,000	✔️	✔️		✔️ Prompt caching ✔️ Tool calling	No sooner than August 2026
Claude Sonnet 5	`anthropic-claude-5-sonnet`	1,000,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Prompt caching ✔️ Tool (function) calling ✔️ Adaptive thinking (API default: on, effort high) ✔️ Effort levels: low, medium, high, max, x-high
Claude Sonnet 4.6	`anthropic-claude-4.6-sonnet`	200,000	64,000	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool (function) calling	No sooner than February 2027
Claude Sonnet 4.5	`anthropic-claude-4.5-sonnet`	200,000	64,000	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool calling	No sooner than September 2026

Arcee Models

Model	Model ID	Context Window	Max Output Tokens	Serverless Inference	ADK	Usage Notes
Trinity Large (Public Preview)	`arcee-trinity-large-thinking`	128,000	128,000	✔️	✔️	✔️ Chat Completions API for sending prompts. ✔️ Prompt caching. ℹ️ Use is subject to Public Preview Terms including Arcee Terms & Conditions.

fal Models

Model	Model ID	Type	Use for	Usage Notes
Fast SDXL	`fal-ai/fast-sdxl`	Image generation	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model
Flux Schnell	`fal-ai/flux/schnell`	Image generation	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model
Stable Audio 2.5	`fal-ai/stable-audio-25/text-to-audio`	Text-to-audio	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model
Multilingual TTS v2	`fal-ai/elevenlabs/tts/multilingual-v2`	Text-to-speech	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model

OpenAI Models

OpenAI models available on DigitalOcean Inference support tool (function) calling, prompt caching, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model	Model ID	Context Window	Max Output Tokens	Serverless Inference	ADK	Agents	Usage Notes
GPT-5.5	`openai-gpt-5.5`	1,000,000	128,000	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens ✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4	`openai-gpt-5.4`	400,000	128,000	✔️	✔️		✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens (beta) ✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4 mini	`openai-gpt-5.4-mini`	400,000	128,000	✔️	✔️		✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4 nano	`openai-gpt-5.4-nano`	400,000	128,000	✔️	✔️		✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4 pro	`openai-gpt-5.4-pro`	1,050,000	128,000	✔️	✔️		✔️ Evaluations judge model ✔️ Only the Responses API for sending prompts for serverless inference ✔️ Tool calling
GPT-5.3-Codex	`openai-gpt-5.3-codex`	400,000	128,000	✔️	✔️		✔️ Prompt caching ✔️ Tool calling
GPT-5.2	`openai-gpt-5.2`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-5.2 pro	`openai-gpt-5.2-pro`	400,000	128,000	✔️	✔️		✔️ Prompt caching ✔️ Tool calling
GPT-5.1-Codex-Max	`openai-gpt-5.1-codex-max`	400,000	128,000	✔️	✔️		✔️ Prompt caching ✔️ Tool calling
GPT-5	`openai-gpt-5`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-5 mini	`openai-gpt-5-mini`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-5 nano	`openai-gpt-5-nano`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-4.1	`openai-gpt-4.1`	1,047,576	32,768	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-4o	`openai-gpt-4o`	128,000	16,384	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Prompt caching ✔️ Tool calling
GPT-4o mini	`openai-gpt-4o-mini`	128,000	16,384	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
o1	`openai-o1`	200,000	Not published	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
o3	`openai-o3`	200,000	Not published	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Prompt caching ✔️ Tool calling
o3-mini	`openai-o3-mini`	200,000	Not published	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT Image 1	`openai-gpt-image-1`	Not published	Not published	✔️	✔️		✔️ Prompt caching ✔️ Tool calling
GPT Image 1.5	`openai-gpt-image-1.5`	Not published	Not published	✔️	✔️
GPT Image 2	`openai-gpt-image-2`	Not published	Not published	✔️	✔️

DigitalOcean-Hosted Models

Provider	Model	Model ID	Parameters	Context Window	Max Output Tokens	Serverless Inference	Dedicated Inference	ADK	Agents	Usage Notes
Alibaba	Qwen 2.5 14B Instruct	`qwen-2.5-14b-instruct`	14 billion	32,768	8,192		✔️
Alibaba	Qwen3-32B	`alibaba-qwen3-32b`	32.8 billion	32,768	40,960	✔️	✔️	✔️	✔️	✔️ Evaluations judge model
Alibaba	Qwen3 Coder Flash	`qwen3-coder-flash`	30 billion	262,144	65,536	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
Alibaba	Qwen 3.5 397B A17B	`qwen3.5-397b-a17b`	397 billion	131,072	81,920	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching. ✔️ Evaluations judge model
Alibaba	Qwen 3 TTS (1.7B)	`qwen3-tts-voicedesign`	1.7 billion	32,768	Not published	✔️		✔️		ℹ️ Text-to-speech. Multimodal and generative model.
Alibaba	Wan2.2-T2V-A14B	`wan2-2-t2v-a14b`	14 billion	100	Not published	✔️		✔️		ℹ️ Text-to-video. Multimodal and generative model.
DeepSeek	DeepSeek R1 Distill Llama 70B	`deepseek-r1-distill-llama-70b`	70 billion	32,678	32,768	✔️	✔️	✔️	✔️	ℹ️ When using in a user-facing agent, we strongly recommend adding all available guardrails for a safer conversational experience.
DeepSeek	DeepSeek V4 Pro	`deepseek-v4-pro`	1.6 trillion	87,040	87,040	✔️		✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
DeepSeek	DeepSeek V4 Flash	`deepseek-4-flash`	284 billion	65,536	65,536	✔️		✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
DeepSeek	DeepSeek V3.2	`deepseek-3.2`	680 billion	65,536	65,536	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
DeepSeek	DeepSeek V3	`deepseek-v3`	671 billion	163,840	8,000		✔️
Google	Gemma 4	`gemma-4-31B-it`	31 billion	256,000	8,192	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
MiniMax	M2.5 (Public Preview)	`minimax-m2.5`	230 billion	65,536	65,536	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to Public Preview Terms including MiniMax Model License.
Moonshot AI	Kimi K2.5	`kimi-k2.5`	1 trillion	256,000	32,768	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to a Modified MIT license.
Moonshot AI	Kimi K2.6	`kimi-k2.6`	1 trillion	96,000	96,000	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to a Modified MIT license.
Meta	Llama 3.1 Instruct (8B)	`llama3-8b-instruct`	80 billion	131,072	2,048		✔️
Meta	Llama 3.3 Instruct-70B	`llama3.3-70b-instruct`	70 billion	128,000	128,000	✔️	✔️	✔️	✔️	✔️ Evaluations judge model
Meta	Llama 4 Maverick 17B 128E Instruct	`llama-4-maverick`	400 billion	128,000	16,384	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Mistral AI	Ministral 3 8B Instruct	`ministral-3-8b-instruct-2512`	8.92 billion	262,144	4,096		✔️
Mistral AI	Ministral 3 14B Instruct	`mistral-3-14B`	14 billion	262,144	16,384	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Mistral AI	Mistral 7B Instruct v0.3	`mistral-7b-instruct-v0.3`	7 billion	32,768	8,192		✔️
NVIDIA	Nemotron 3 Ultra	`nemotron-3-ultra-550b`	550 billion	131,072	131,072	✔️	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
NVIDIA	Nemotron-3-Super-120B (Public Preview)	`nvidia-nemotron-3-super-120b`	120 billion	1,000,000	Not published	✔️		✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ℹ️ Use is subject to Public Preview Terms including NVIDIA Model License.
NVIDIA	Nemotron 3 Nano 30B A3B	`nemotron-3-nano-30b`	30 billion	262,144	128,000		✔️
NVIDIA	Nemotron 3 Nano Omni	`nemotron-3-nano-omni`	30 billion	65,536	65,536	✔️		✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ℹ️ Context window 65,536 tokens.
NVIDIA	Nemotron Nano 12B v2 VL	`nemotron-nano-12b-v2-vl`	12 billion	128,000	16,384	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
OpenAI	gpt-oss-120b	`openai-gpt-oss-120b`	Not published	128,000	131,072	✔️	✔️	✔️	✔️	✔️ Prompt caching
OpenAI	gpt-oss-20b	`openai-gpt-oss-20b`	Not published	128,000	131,072	✔️	✔️	✔️	✔️
Stability AI	Stable Diffusion 3.5 Large	`stable-diffusion-3.5-large`	8 billion	256	Not published	✔️		✔️		ℹ️ Image generation. Multimodal and generative model.
Xiaomi	MiMo V2.5	`mimo-v2.5`	Not published	32,000	32,000	✔️		✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Xiaomi	MiMo V2.5 Pro	`mimo-v2.5-pro`	1 trillion	87,040	87,040	✔️	✔️			✔️ Text only ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Z.ai	GLM-5.2	`glm-5.2`	Not published	262,144	65,536	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Text only ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Z.ai	GLM-5.1	`glm-5.1`	754 billion	65,536	65,536	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Text only ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Z.ai	GLM 5	`glm-5`	744 billion	64,000	64,000	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to the MIT License.

Embeddings Models

An embedding model converts data into vector embeddings. DigitalOcean stores vector embeddings in an OpenSearch database cluster for use with agent knowledge bases. The following embeddings models are available on the platform, along with their token windows and recommended chunking ranges.

Alibaba Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
GTE Large (v1.5)	Not available	8192 tokens	0-750	500-1000	300-500
Qwen3 Embedding 0.6B (Multilingual) (in Public Preview)	600 million	8000 tokens	0-750	500-1000	300-500

BAAI Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
BGE M3	568M	8192 tokens	0-8192	Not Specified	Not Specified

Intfloat Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
E5 Large (multilingual)	560 million	514 tokens	0-512	100-512	100-500
E5 Large (v2)	Not available	512 tokens	0-512	Not Specified	Not Specified

UKP Lab (Technical University of Darmstadt) Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
All-MiniLM-L6-v2	22 million	256 tokens	0-256	100-256	100-200
Multi-QA-mpnet-base-dot-v1	109 million	512 tokens	0-512	100-512	100-500

Reranking Models

Reranking models reorder retrieved results to improve relevance after the initial retrieval step, and can also be used with vector databases. DigitalOcean supports the following reranking model for knowledge base retrieval:

BAAI Models

Model	Parameters	Usage Notes
BGE Reranker (v2) M3	Not available	Can be enabled at knowledge base creation, updated after creation.

Available Models for Inference

Foundation Models

Embeddings Models

Reranking Models

We can't find any results for your search.