Available Models for Inference

Last verified 24 Jun 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

The following foundation, embeddings, and reranking models are available.

We regularly update our model offerings to provide the most performant and efficient models, and deprecate older models. For information on our model deprecation policy and recommended model replacements, see Model Support Policy.

Foundation Models

Inference supports both open source and commercial foundation models. Open source models are generally published by research labs, available under open licenses. Commercial models are proprietary such as OpenAI and Anthropic models. All models are offered using DigitalOcean API access keys, but you can also bring your own provider’s API keys to access the commercial models.

We offer the following foundation models, subject to the AI Model Terms, our Service Terms, and the Terms of Service Agreement.

You can use these models in serverless inference, dedicated inference, inference routers, batch inference, agents, or Agent Development Kit (ADK). See the model-specific usage information below.

Anthropic Models

Anthropic models available on DigitalOcean Inference support tool (function) calling, prompt caching, adaptive thinking, fast mode, dynamic workflows, mid-conversation system messages, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model Model ID Context Window Max Output Tokens Serverless Inference ADK Agents Usage Notes Tentative End-of-Support
Claude Fable 5 anthropic-claude-fable-5 1,000,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens
✔️ Prompt caching
✔️ Tool calling
✔️ Adaptive thinking
ℹ️ Requires a mandatory 30-day data retention of prompts and completions for trust and safety reviews
Claude Haiku 4.5 anthropic-claude-haiku-4.5 200,000 64,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
No sooner than October 2026
Claude Opus 4.8 anthropic-claude-opus-4.8 1,000,000 128,000
✔️
✔️
✔️
✔️ Evaluations judge model
✔️ Input context window of up to 1M tokens
✔️ Prompt caching
✔️ Tool calling
✔️ Fast mode
✔️ Adaptive thinking
✔️ Dynamic workflows
✔️ Mid-conversation system messages
No sooner than May 2027
Claude Opus 4.7 anthropic-claude-opus-4.7 200,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool calling
No sooner than April 2027
Claude Opus 4.6 anthropic-claude-opus-4.6 200,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool calling
No sooner than February 2027
Claude Opus 4.5 anthropic-claude-opus-4.5 200,000 64,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
No sooner than November 2026
Claude Opus 4.1 anthropic-claude-4.1-opus 200,000 32,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
No sooner than August 2026
Claude Sonnet 5 anthropic-claude-5-sonnet 1,000,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens
✔️ Prompt caching
✔️ Tool (function) calling
✔️ Adaptive thinking (API default: on, effort high)
✔️ Effort levels: low, medium, high, max, x-high
Claude Sonnet 4.6 anthropic-claude-4.6-sonnet 200,000 64,000
✔️
✔️
✔️
✔️ Evaluations judge model
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool (function) calling
No sooner than February 2027
Claude Sonnet 4.5 anthropic-claude-4.5-sonnet 200,000 64,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool calling
No sooner than September 2026
Arcee Models
Model Model ID Context Window Max Output Tokens Serverless Inference ADK Usage Notes
Trinity Large (Public Preview) arcee-trinity-large-thinking 128,000 128,000
✔️
✔️
✔️ Chat Completions API for sending prompts.
✔️ Prompt caching.
ℹ️ Use is subject to Public Preview Terms including Arcee Terms & Conditions.
fal Models
Model Model ID Type Use for Usage Notes
Fast SDXL fal-ai/fast-sdxl Image generation ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
Flux Schnell fal-ai/flux/schnell Image generation ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
Stable Audio 2.5 fal-ai/stable-audio-25/text-to-audio Text-to-audio ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
Multilingual TTS v2 fal-ai/elevenlabs/tts/multilingual-v2 Text-to-speech ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
OpenAI Models

OpenAI models available on DigitalOcean Inference support tool (function) calling, prompt caching, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model Model ID Context Window Max Output Tokens Serverless Inference ADK Agents Usage Notes
GPT-5.5 openai-gpt-5.5 1,000,000 128,000
✔️
✔️
✔️
✔️ Evaluations judge model
✔️ Input context window of up to 1M tokens
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 openai-gpt-5.4 400,000 128,000
✔️
✔️
✔️ Evaluations judge model
✔️ Input context window of up to 1M tokens (beta)
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 mini openai-gpt-5.4-mini 400,000 128,000
✔️
✔️
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 nano openai-gpt-5.4-nano 400,000 128,000
✔️
✔️
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 pro openai-gpt-5.4-pro 1,050,000 128,000
✔️
✔️
✔️ Evaluations judge model
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Tool calling
GPT-5.3-Codex openai-gpt-5.3-codex 400,000 128,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5.2 openai-gpt-5.2 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5.2 pro openai-gpt-5.2-pro 400,000 128,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5.1-Codex-Max openai-gpt-5.1-codex-max 400,000 128,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5 openai-gpt-5 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5 mini openai-gpt-5-mini 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5 nano openai-gpt-5-nano 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-4.1 openai-gpt-4.1 1,047,576 32,768
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-4o openai-gpt-4o 128,000 16,384
✔️
✔️
✔️
✔️ Evaluations judge model
✔️ Prompt caching
✔️ Tool calling
GPT-4o mini openai-gpt-4o-mini 128,000 16,384
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
o1 openai-o1 200,000 Not published
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
o3 openai-o3 200,000 Not published
✔️
✔️
✔️
✔️ Evaluations judge model
✔️ Prompt caching
✔️ Tool calling
o3-mini openai-o3-mini 200,000 Not published
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT Image 1 openai-gpt-image-1 Not published Not published
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT Image 1.5 openai-gpt-image-1.5 Not published Not published
✔️
✔️
GPT Image 2 openai-gpt-image-2 Not published Not published
✔️
✔️
DigitalOcean-Hosted Models
Provider Model Model ID Parameters Context Window Max Output Tokens Serverless Inference Dedicated Inference ADK Agents Usage Notes
Alibaba Qwen 2.5 14B Instruct qwen-2.5-14b-instruct 14 billion 32,768 8,192
✔️
Alibaba Qwen3-32B alibaba-qwen3-32b 32.8 billion 32,768 40,960
✔️
✔️
✔️
✔️
✔️ Evaluations judge model
Alibaba Qwen3 Coder Flash qwen3-coder-flash 30 billion 262,144 65,536
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
Alibaba Qwen 3.5 397B A17B qwen3.5-397b-a17b 397 billion 131,072 81,920
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching.
✔️ Evaluations judge model
Alibaba Qwen 3 TTS (1.7B) qwen3-tts-voicedesign 1.7 billion 32,768 Not published
✔️
✔️
ℹ️ Text-to-speech. Multimodal and generative model.
Alibaba Wan2.2-T2V-A14B wan2-2-t2v-a14b 14 billion 100 Not published
✔️
✔️
ℹ️ Text-to-video. Multimodal and generative model.
DeepSeek DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 70 billion 32,678 32,768
✔️
✔️
✔️
✔️
ℹ️ When using in a user-facing agent, we strongly recommend adding all available guardrails for a safer conversational experience.
DeepSeek DeepSeek V4 Pro deepseek-v4-pro 1.6 trillion 87,040 87,040
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
DeepSeek DeepSeek V4 Flash deepseek-4-flash 284 billion 65,536 65,536
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
DeepSeek DeepSeek V3.2 deepseek-3.2 680 billion 65,536 65,536
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
DeepSeek DeepSeek V3 deepseek-v3 671 billion 163,840 8,000
✔️
Google Gemma 4 gemma-4-31B-it 31 billion 256,000 8,192
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
MiniMax M2.5 (Public Preview) minimax-m2.5 230 billion 65,536 65,536
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
ℹ️ Use is subject to Public Preview Terms including MiniMax Model License.
Moonshot AI Kimi K2.5 kimi-k2.5 1 trillion 256,000 32,768
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
ℹ️ Use is subject to a Modified MIT license.
Moonshot AI Kimi K2.6 kimi-k2.6 1 trillion 96,000 96,000
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
ℹ️ Use is subject to a Modified MIT license.
Meta Llama 3.1 Instruct (8B) llama3-8b-instruct 80 billion 131,072 2,048
✔️
Meta Llama 3.3 Instruct-70B llama3.3-70b-instruct 70 billion 128,000 128,000
✔️
✔️
✔️
✔️
✔️ Evaluations judge model
Meta Llama 4 Maverick 17B 128E Instruct llama-4-maverick 400 billion 128,000 16,384
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Mistral AI Ministral 3 8B Instruct ministral-3-8b-instruct-2512 8.92 billion 262,144 4,096
✔️
Mistral AI Ministral 3 14B Instruct mistral-3-14B 14 billion 262,144 16,384
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Mistral AI Mistral 7B Instruct v0.3 mistral-7b-instruct-v0.3 7 billion 32,768 8,192
✔️
NVIDIA Nemotron 3 Ultra nemotron-3-ultra-550b 550 billion 131,072 131,072
✔️
✔️
✔️
✔️
✔️ Evaluations judge model
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
NVIDIA Nemotron-3-Super-120B (Public Preview) nvidia-nemotron-3-super-120b 120 billion 1,000,000 Not published
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Use is subject to Public Preview Terms including NVIDIA Model License.
NVIDIA Nemotron 3 Nano 30B A3B nemotron-3-nano-30b 30 billion 262,144 128,000
✔️
NVIDIA Nemotron 3 Nano Omni nemotron-3-nano-omni 30 billion 65,536 65,536
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Context window 65,536 tokens.
NVIDIA Nemotron Nano 12B v2 VL nemotron-nano-12b-v2-vl 12 billion 128,000 16,384
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
OpenAI gpt-oss-120b openai-gpt-oss-120b Not published 128,000 131,072
✔️
✔️
✔️
✔️
✔️ Prompt caching
OpenAI gpt-oss-20b openai-gpt-oss-20b Not published 128,000 131,072
✔️
✔️
✔️
✔️
Stability AI Stable Diffusion 3.5 Large stable-diffusion-3.5-large 8 billion 256 Not published
✔️
✔️
ℹ️ Image generation. Multimodal and generative model.
Xiaomi MiMo V2.5 mimo-v2.5 Not published 32,000 32,000
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
✔️ Tool calling
✔️ Structured outputs
✔️ Reasoning
✔️ Multilingual
ℹ️ Use is subject to the MIT License.
Xiaomi MiMo V2.5 Pro mimo-v2.5-pro 1 trillion 87,040 87,040
✔️
✔️
✔️ Text only
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
✔️ Tool calling
✔️ Structured outputs
✔️ Reasoning
✔️ Multilingual
ℹ️ Use is subject to the MIT License.
Z.ai GLM-5.2 glm-5.2 Not published 262,144 65,536
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
✔️ Text only
✔️ Tool calling
✔️ Structured outputs
✔️ Reasoning
✔️ Multilingual
ℹ️ Use is subject to the MIT License.
Z.ai GLM-5.1 glm-5.1 754 billion 65,536 65,536
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
✔️ Text only
✔️ Tool calling
✔️ Structured outputs
✔️ Reasoning
✔️ Multilingual
ℹ️ Use is subject to the MIT License.
Z.ai GLM 5 glm-5 744 billion 64,000 64,000
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Prompt caching
ℹ️ Use is subject to the MIT License.

Embeddings Models

An embedding model converts data into vector embeddings. DigitalOcean stores vector embeddings in an OpenSearch database cluster for use with agent knowledge bases. The following embeddings models are available on the platform, along with their token windows and recommended chunking ranges.

Alibaba Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
GTE Large (v1.5) Not available 8192 tokens 0-750 500-1000 300-500
Qwen3 Embedding 0.6B (Multilingual)
(in Public Preview)
600 million 8000 tokens 0-750 500-1000 300-500
BAAI Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
BGE M3 568M 8192 tokens 0-8192 Not Specified Not Specified
Intfloat Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
E5 Large (multilingual) 560 million 514 tokens 0-512 100-512 100-500
E5 Large (v2) Not available 512 tokens 0-512 Not Specified Not Specified
UKP Lab (Technical University of Darmstadt) Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
All-MiniLM-L6-v2 22 million 256 tokens 0-256 100-256 100-200
Multi-QA-mpnet-base-dot-v1 109 million 512 tokens 0-512 100-512 100-500

Reranking Models

Reranking models reorder retrieved results to improve relevance after the initial retrieval step, and can also be used with vector databases. DigitalOcean supports the following reranking model for knowledge base retrieval:

BAAI Models
Model Parameters Usage Notes
BGE Reranker (v2) M3 Not available Can be enabled at knowledge base creation, updated after creation.

We can't find any results for your search.

Try using different keywords or simplifying your search terms.