# Supported models
## Endpoints enabled by model type

When you create a model router, you must specify a model type. The model type determines the model's capabilities and enables a specific set of endpoints.
| Model type | Endpoint enabled | Description |
|---|---|---|
| text-generation | /v1/chat/completions | LLM models for text generation. |
| text-classification | /v1/rerank | Reranking models for text classification. |
| text-embeddings-inference | /v1/embeddings | Text embedding models for text similarity and clustering. |
| image-to-text | /v1/ocr | OCR models for image-to-text conversion (only Mistral On-Prem supported). |
| image-text-to-text | /v1/chat/completions | Multimodal models for chat and image analysis. |
| automatic-speech-recognition | /v1/audio/transcriptions | Automatic speech recognition models for audio-to-text conversion. |
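The model-type-to-endpoint mapping above can be sketched as a simple lookup table. This is a minimal illustration mirroring the table, not part of any official SDK:

```python
# Endpoint enabled for each model type, as listed in the table above.
MODEL_TYPE_ENDPOINTS = {
    "text-generation": "/v1/chat/completions",
    "text-classification": "/v1/rerank",
    "text-embeddings-inference": "/v1/embeddings",
    "image-to-text": "/v1/ocr",
    "image-text-to-text": "/v1/chat/completions",
    "automatic-speech-recognition": "/v1/audio/transcriptions",
}

def endpoint_for(model_type: str) -> str:
    """Return the endpoint path enabled for a given model type."""
    try:
        return MODEL_TYPE_ENDPOINTS[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type}") from None

print(endpoint_for("text-generation"))  # -> /v1/chat/completions
```

Note that text-generation and image-text-to-text share the same /v1/chat/completions endpoint; they differ only in whether image inputs are accepted.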
## Supported self-hosted API providers

### vLLM

vLLM is an open-source, production-grade LLM server. It supports a wide range of models and is a great choice for self-hosted deployments.
Supported model types:
| Model type | Example model |
|---|---|
| text-generation | openai/gpt-oss-120b |
| image-text-to-text | mistralai/Mistral-Small-3.2-24B-Instruct-2506 |
| text-classification | BAAI/bge-reranker-v2-m3 |
| text-embeddings-inference | intfloat/e5-mistral-7b-instruct |
| automatic-speech-recognition | openai/whisper-large-v3 |
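Since vLLM serves an OpenAI-compatible API, a chat request to a self-hosted instance can be sketched as below. The base URL is an assumption for illustration, and no request is actually sent; the payload shape follows the standard /v1/chat/completions schema:

```python
import json

# Hypothetical base URL of a self-hosted vLLM server; adjust to your deployment.
BASE_URL = "http://localhost:8000"

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "openai/gpt-oss-120b",  # example text-generation model from the table above
    "messages": [
        {"role": "user", "content": "Summarize vLLM in one sentence."},
    ],
}

body = json.dumps(payload)
print(f"POST {BASE_URL}/v1/chat/completions")
print(body)
```

Sending this body with any HTTP client (with a `Content-Type: application/json` header) is all that is needed; the same shape works for the image-text-to-text models, with image content added to the messages.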
### Mistral On-Prem

Mistral On-Prem is Mistral AI's solution for self-hosting its commercial models.
Supported model types:
| Model type | Example model |
|---|---|
| text-generation | mistral-medium-2508 |
| image-text-to-text | mistral-medium-2508 |
| image-to-text | ocr-3-25-12 |
| text-embeddings-inference | mistral-embed-23-12 |
| automatic-speech-recognition | voxtral-mini-2507 |
### Hugging Face Text Embeddings Inference

Hugging Face Text Embeddings Inference (TEI) is an open-source API server dedicated to embeddings and reranking. It is a great choice for self-hosted models.
Supported model types:
| Model type | Example model |
|---|---|
| text-embeddings-inference | BAAI/bge-m3 |
| text-classification | BAAI/bge-reranker-v2-m3 |
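A rerank request against the /v1/rerank endpoint from the table above can be sketched as follows. The field names ("query"/"texts") follow TEI's rerank schema; treat them as an assumption and verify against your deployment:

```python
import json

# Hypothetical rerank request body: a query plus candidate documents to score.
# Field names follow TEI's rerank schema and should be verified for your setup.
payload = {
    "query": "What is the capital of France?",
    "texts": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ],
}

body = json.dumps(payload)
print(body)
```

The server responds with a relevance score per document, which is how text-classification (reranking) models are typically consumed.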
### Ollama

Ollama is a local-first model runtime for self-hosted models.
Supported model types:
| Model type | Example model |
|---|---|
| text-generation | qwen3.5:14b |