# Supported models
## Endpoints enabled by model type

When you create a model router, you must specify a model type. The model type determines the model's capabilities and enables a specific set of endpoints.
| Model type | Endpoint enabled | Description |
|---|---|---|
| text-generation | /v1/chat/completions | LLM models for text generation. |
| text-classification | /v1/rerank | Reranking models for text classification. |
| text-embeddings-inference | /v1/embeddings | Text embedding models for text similarity and clustering. |
| image-to-text | /v1/ocr | OCR models for image-to-text conversion (only Mistral On-Prem supported). |
| image-text-to-text | /v1/chat/completions | Multimodal models for chat and image analysis. |
| automatic-speech-recognition | /v1/audio/transcriptions | Automatic speech recognition models for audio-to-text conversion. |
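The model-type-to-endpoint mapping above can be sketched as a simple lookup table. This is a minimal illustration mirroring the table, not part of any official SDK:

```python
# Endpoint enabled for each model type, as listed in the table above.
MODEL_TYPE_ENDPOINTS = {
    "text-generation": "/v1/chat/completions",
    "text-classification": "/v1/rerank",
    "text-embeddings-inference": "/v1/embeddings",
    "image-to-text": "/v1/ocr",
    "image-text-to-text": "/v1/chat/completions",
    "automatic-speech-recognition": "/v1/audio/transcriptions",
}

def endpoint_for(model_type: str) -> str:
    """Return the endpoint path enabled for a given model type."""
    try:
        return MODEL_TYPE_ENDPOINTS[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type}") from None

print(endpoint_for("text-generation"))  # -> /v1/chat/completions
```

Note that text-generation and image-text-to-text share the same /v1/chat/completions endpoint; they differ only in whether image inputs are accepted.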
## Supported self-hosted API providers

### vLLM

vLLM is an open-source, production-grade LLM server. It supports a wide range of models and is a great choice for self-hosted deployments.
Supported model types:
| Model type | Example model |
|---|---|
| text-generation | openai/gpt-oss-120b |
| image-text-to-text | mistralai/Mistral-Small-3.2-24B-Instruct-2506 |
| text-classification | BAAI/bge-reranker-v2-m3 |
| text-embeddings-inference | intfloat/e5-mistral-7b-instruct |
| automatic-speech-recognition | openai/whisper-large-v3 |
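Since vLLM serves an OpenAI-compatible API, a chat request to a self-hosted instance can be sketched as below. The base URL is an assumption for illustration, and no request is actually sent; the payload shape follows the standard /v1/chat/completions schema:

```python
import json

# Hypothetical base URL of a self-hosted vLLM server; adjust to your deployment.
BASE_URL = "http://localhost:8000"

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "openai/gpt-oss-120b",  # example text-generation model from the table above
    "messages": [
        {"role": "user", "content": "Summarize vLLM in one sentence."},
    ],
}

body = json.dumps(payload)
print(f"POST {BASE_URL}/v1/chat/completions")
print(body)
```

Sending this body with any HTTP client (with a `Content-Type: application/json` header) is all that is needed; the same shape works for the image-text-to-text models, with image content added to the messages.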
### Mistral On-Prem

Mistral On-Prem is Mistral AI's solution for self-hosting its commercial models.
Supported model types:
| Model type | Example model |
|---|---|
| text-generation | mistral-medium-2508 |
| image-text-to-text | mistral-medium-2508 |
| image-to-text | ocr-3-25-12 |
| text-embeddings-inference | mistral-embed-23-12 |
| automatic-speech-recognition | voxtral-mini-2507 |
### Hugging Face Text Embeddings Inference

Hugging Face Text Embeddings Inference (TEI) is an open-source API server dedicated to embeddings and reranking. It is a great choice for self-hosted models.
Supported model types:
| Model type | Example model |
|---|---|
| text-embeddings-inference | BAAI/bge-m3 |
| text-classification | BAAI/bge-reranker-v2-m3 |
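A rerank request against the /v1/rerank endpoint from the table above can be sketched as follows. The field names ("query"/"texts") follow TEI's rerank schema; treat them as an assumption and verify against your deployment:

```python
import json

# Hypothetical rerank request body: a query plus candidate documents to score.
# Field names follow TEI's rerank schema and should be verified for your setup.
payload = {
    "query": "What is the capital of France?",
    "texts": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ],
}

body = json.dumps(payload)
print(body)
```

The server responds with a relevance score per document, which is how text-classification (reranking) models are typically consumed.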
### Ollama

Ollama is a local-first model runtime for self-hosted models.
Supported model types:
| Model type | Example model |
|---|---|
| text-generation | qwen3.5:14b |