Routing
The Albert API allows you to configure one or more external API providers for each model.
These providers are defined in the configuration file (see deployment).
A single model can have multiple providers.
Example Configuration
In the example below, the turbo model is configured with two providers: an OpenAI provider and a vLLM provider.
The model can be called either using its ID (turbo) or using the alias defined in the aliases field (turbo-alias).
Each provider calls a different model, specified by the model field.
For example, the OpenAI provider calls gpt-3.5-turbo, while the vLLM provider calls meta-llama/Llama-3.1-8B-Instruct.
❗️ Important:
When configuring multiple providers for a model, we strongly recommend that they be of the same type and call the same underlying model.
Otherwise, responses to the same request may differ in structure depending on which provider handles it.
```yaml
models:
  - id: turbo
    type: text-generation
    aliases: ['turbo-alias']
    load_balancing_strategy: least_busy
    providers:
      - model: gpt-3.5-turbo
        type: openai
        args:
          api_url: https://api.openai.com
          api_key: sk-...sA
          timeout: 60
      - model: meta-llama/Llama-3.1-8B-Instruct
        type: vllm
        args:
          api_url: http://localhost:8000
          api_key: sf...Df
          timeout: 60
```
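The recommendation about provider consistency can be checked mechanically at startup. The following is a minimal sketch, assuming the configuration has already been loaded into a plain Python dict (for example with a YAML parser). The function `check_provider_consistency` is hypothetical and not part of the Albert API.

```python
from typing import Any

# The configuration example above, already loaded into a plain dict
# (API keys kept elided as in the example).
CONFIG: dict[str, Any] = {
    "models": [
        {
            "id": "turbo",
            "type": "text-generation",
            "aliases": ["turbo-alias"],
            "load_balancing_strategy": "least_busy",
            "providers": [
                {"model": "gpt-3.5-turbo", "type": "openai",
                 "args": {"api_url": "https://api.openai.com", "api_key": "sk-...sA", "timeout": 60}},
                {"model": "meta-llama/Llama-3.1-8B-Instruct", "type": "vllm",
                 "args": {"api_url": "http://localhost:8000", "api_key": "sf...Df", "timeout": 60}},
            ],
        }
    ]
}


def check_provider_consistency(config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: one warning per model whose providers differ in type or model."""
    warnings: list[str] = []
    for model in config["models"]:
        providers = model["providers"]
        types = {p["type"] for p in providers}
        names = {p["model"] for p in providers}
        if len(types) > 1:
            warnings.append(f"{model['id']}: providers have different types {sorted(types)}")
        if len(names) > 1:
            warnings.append(f"{model['id']}: providers call different models {sorted(names)}")
    return warnings


for warning in check_provider_consistency(CONFIG):
    print(warning)
```

Applied to the example configuration, both checks fire for `turbo`, since it mixes an OpenAI provider calling gpt-3.5-turbo with a vLLM provider calling Llama 3.1.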
Code Logic
When the API starts, a ModelRegistry object is initialized.
This registry contains a ModelRouter for each model defined under models in the configuration file.
Each ModelRouter contains one or more ModelProvider objects, as specified in the providers list.
ModelRegistry
ModelRegistry acts like a dictionary and allows retrieving a model by its ID or one of its aliases (see deployment).
```python
from app.utils.lifespan import models

# Look up a model by its ID (an alias works the same way)
model = models["guillaumetell-7b"]
```
If the model does not exist, the API returns an HTTP 404 error (Model not found) instead of raising a KeyError.
The returned object is a ModelRouter, which contains the model’s configuration and its associated providers.
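The dictionary-like lookup can be sketched as follows. This is an illustration under stated assumptions, not the Albert implementation: `ModelNotFoundError` is a hypothetical exception that carries the 404 status, `ModelRouter` is reduced to the two fields the lookup needs, and rejecting duplicate IDs or aliases at construction time is an assumption.

```python
from dataclasses import dataclass, field


class ModelNotFoundError(Exception):
    """Hypothetical error that the API layer would turn into HTTP 404 (Model not found)."""

    status_code = 404

    def __init__(self, name: str) -> None:
        super().__init__(f"Model not found: {name}")
        self.name = name


@dataclass
class ModelRouter:
    # Minimal stand-in for the real ModelRouter: only what the lookup needs.
    id: str
    aliases: list[str] = field(default_factory=list)


class ModelRegistry:
    """Dictionary-like registry resolving a model by its ID or any of its aliases."""

    def __init__(self, routers: list[ModelRouter]) -> None:
        self._by_name: dict[str, ModelRouter] = {}
        for router in routers:
            for name in [router.id, *router.aliases]:
                # Assumption: an ID or alias may only point to one model.
                if name in self._by_name:
                    raise ValueError(f"Duplicate model ID or alias: {name}")
                self._by_name[name] = router

    def __getitem__(self, name: str) -> ModelRouter:
        try:
            return self._by_name[name]
        except KeyError:
            # Surface a 404-style error instead of a bare KeyError.
            raise ModelNotFoundError(name) from None

    def __contains__(self, name: object) -> bool:
        return name in self._by_name


models = ModelRegistry([ModelRouter(id="turbo", aliases=["turbo-alias"])])
print(models["turbo-alias"].id)  # turbo
```

Indexing every alias in the same dict keeps lookups constant-time and makes an ID and its aliases resolve to the very same router object.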
ModelRouter
The ModelRouter object stores the model configuration and its providers.
It exposes a get_provider method to select a provider for the model.
If multiple providers are available, the method selects one according to the configured routing_strategy (see deployment).
The model information corresponds to what is returned by the GET /v1/models endpoint:
- id: model identifier used by providers
- type: model type (see models)
- aliases: list of model aliases
- max_context_length: maximum input length supported by the model
```python
from app.utils.lifespan import models

model = models["guillaumetell-7b"]
# Select a provider for this model, checking that it can serve chat completions
provider = model.get_provider(endpoint="chat/completions")
```
The endpoint parameter is optional.
If it is provided, get_provider checks that the model type is compatible with the requested endpoint.
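The behaviour of get_provider can be sketched as follows. This is a minimal illustration, not the Albert implementation: the `COMPATIBLE_ENDPOINTS` table and the simplified `Provider` class are hypothetical, and the uniform random pick is only a placeholder for whatever routing strategy is configured.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical compatibility table: which Albert endpoints each model type can serve.
# Only "text-generation" appears in the configuration example; other types are omitted.
COMPATIBLE_ENDPOINTS: dict[str, set[str]] = {
    "text-generation": {"chat/completions", "completions"},
}


@dataclass
class Provider:
    # Stand-in for a ModelProvider: only what routing needs.
    model: str
    api_url: str


@dataclass
class ModelRouter:
    id: str
    type: str
    providers: list[Provider]
    aliases: list[str] = field(default_factory=list)
    max_context_length: Optional[int] = None

    def get_provider(self, endpoint: Optional[str] = None) -> Provider:
        # The compatibility check only runs when an endpoint is given.
        if endpoint is not None and endpoint not in COMPATIBLE_ENDPOINTS.get(self.type, set()):
            raise ValueError(f"Model {self.id} ({self.type}) does not support {endpoint}")
        # Placeholder strategy: uniform random choice among the providers.
        return random.choice(self.providers)

    def to_dict(self) -> dict:
        # The fields listed above for GET /v1/models.
        return {"id": self.id, "type": self.type, "aliases": self.aliases,
                "max_context_length": self.max_context_length}


router = ModelRouter(
    id="turbo",
    type="text-generation",
    providers=[Provider("gpt-3.5-turbo", "https://api.openai.com"),
               Provider("meta-llama/Llama-3.1-8B-Instruct", "http://localhost:8000")],
    aliases=["turbo-alias"],
)
print(router.get_provider(endpoint="chat/completions").model)  # either provider
```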
ModelProvider
ModelProvider is an AsyncOpenAI-like object that handles requests to the external API.
It exposes three main attributes:
- api_url: the external API URL
- api_key: the external API key
- model: the external model ID
Several ModelProvider subclasses exist, such as VllmModelProvider and OpenAIModelProvider.
Each defines an ENDPOINT_TABLE mapping the supported external API endpoints to Albert API endpoints.
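A per-subclass endpoint table can be sketched like this. It is a hypothetical illustration: it keys the table by Albert endpoint and stores the external path as the value, which is one plausible reading of the mapping described above. The `get_url` method, the `rerank` entry and the exact paths are assumptions, and the real tables may differ. Request sending, which the real AsyncOpenAI-like classes handle, is left out.

```python
from typing import ClassVar, Optional


class ModelProvider:
    """Base class: knows how to reach one external model."""

    # Albert endpoint -> external endpoint path (None means unsupported).
    ENDPOINT_TABLE: ClassVar[dict[str, Optional[str]]] = {}

    def __init__(self, api_url: str, api_key: str, model: str, timeout: int = 60) -> None:
        self.api_url = api_url.rstrip("/")
        self.api_key = api_key
        self.model = model
        self.timeout = timeout

    def get_url(self, endpoint: str) -> str:
        # Hypothetical helper: resolve an Albert endpoint to the external URL.
        external = self.ENDPOINT_TABLE.get(endpoint)
        if external is None:
            raise NotImplementedError(f"{type(self).__name__} does not support {endpoint}")
        return f"{self.api_url}{external}"


class OpenAIModelProvider(ModelProvider):
    # Illustrative entries only.
    ENDPOINT_TABLE = {
        "chat/completions": "/v1/chat/completions",
        "rerank": None,  # the OpenAI API has no rerank endpoint
    }


class VllmModelProvider(ModelProvider):
    # Illustrative entries only.
    ENDPOINT_TABLE = {
        "chat/completions": "/v1/chat/completions",
        "rerank": "/v1/rerank",
    }
```

Keeping the table as class data lets each subclass declare its capabilities without overriding any method, and an unsupported endpoint fails with an explicit error rather than a request to a non-existent path.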
Routing strategies
Shuffle
The shuffle strategy randomly distributes requests among the available providers so that, over many requests, each provider receives a roughly equal share.
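The difference between a random pick and a fixed rotation can be seen with a small simulation. This is a sketch, assuming shuffle means a uniform random choice per request; the round_robin variant corresponds to the strategy named in the summary below, and both helper functions are hypothetical.

```python
import itertools
import random
from collections import Counter
from typing import Callable, Iterator, Sequence

Provider = str  # providers are represented by name in this sketch


def shuffle_strategy(providers: Sequence[Provider]) -> Callable[[], Provider]:
    """Each call picks a provider uniformly at random."""
    return lambda: random.choice(providers)


def round_robin_strategy(providers: Sequence[Provider]) -> Callable[[], Provider]:
    """Each call returns the next provider in a fixed cycle."""
    cycle: Iterator[Provider] = itertools.cycle(providers)
    return lambda: next(cycle)


providers = ["openai", "vllm"]
pick = shuffle_strategy(providers)
print(Counter(pick() for _ in range(10_000)))  # close to an even split
pick = round_robin_strategy(providers)
print(Counter(pick() for _ in range(10_000)))  # exactly 5000 each
```

Random selection needs no shared state, which makes it easy to use from several API workers at once; round robin is perfectly even but only within a single process holding the cycle.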
Celery-Based Routing
When Celery is enabled, routing moves from a synchronous API-side decision to an asynchronous distributed routing layer, allowing additional capabilities:
- dynamic QoS evaluation
- rate/latency-aware provider scoring
- congestion-aware rerouting
- retrying failed routing attempts
- distributing load across multiple workers
Each model has its own dedicated RabbitMQ routing queue (e.g., router.<model_id>), ensuring isolation, per-model scaling, and preventing noisy-neighbor effects when some models receive more traffic.
Celery-based routing is optional and activated when the queuing subsystem is enabled.
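The per-model queue naming and a congestion-aware provider choice can be sketched without Celery itself. This is a hypothetical illustration: the `ProviderMetrics` fields and the `score` formula are assumptions, since the exact QoS scoring is not specified here. With Celery, a routing task would typically be sent to the model's queue via `apply_async(..., queue=routing_queue(model_id))`.

```python
import math
from dataclasses import dataclass
from typing import Optional


def routing_queue(model_id: str) -> str:
    """Per-model RabbitMQ queue name, following the router.<model_id> pattern."""
    return f"router.{model_id}"


@dataclass
class ProviderMetrics:
    # Hypothetical real-time metrics a worker might read (e.g. from Redis).
    provider_id: str
    p50_latency_s: float
    inflight: int
    max_inflight: int
    healthy: bool = True


def score(m: ProviderMetrics) -> float:
    """Hypothetical QoS score: lower is better; saturated or unhealthy providers get inf."""
    if not m.healthy or m.inflight >= m.max_inflight:
        return math.inf
    load = m.inflight / m.max_inflight
    # Penalise latency more as the provider gets busier.
    return m.p50_latency_s * (1.0 + load)


def choose_provider(metrics: list[ProviderMetrics]) -> Optional[str]:
    """Return the best-scoring provider, or None so the caller can retry later."""
    candidates = [m for m in metrics if math.isfinite(score(m))]
    if not candidates:
        return None
    return min(candidates, key=score).provider_id


metrics = [
    ProviderMetrics("openai-0", p50_latency_s=0.4, inflight=1, max_inflight=10),
    ProviderMetrics("vllm-0", p50_latency_s=0.3, inflight=8, max_inflight=10),
]
print(routing_queue("turbo"), choose_provider(metrics))  # router.turbo openai-0
```

In this example the faster but congested vLLM provider scores 0.54 against 0.44 for the lightly loaded OpenAI provider, which is the kind of congestion-aware rerouting described above.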
Celery Routing Workflow Diagram
Below is the sequence diagram representing the Celery-driven routing flow:
Summary
Routing in the Albert API can operate in two modes:
- Local Routing
  - Provider selection is done synchronously inside the API
  - Uses configured strategies (shuffle, round_robin, least_busy, etc.)
- Celery Routing
  - Provider selection is offloaded to Celery workers
  - Workers use real-time Redis metrics
  - Enables QoS scoring, retries, congestion handling, and distributed scalability
  - API simply waits for the provider_id stored in Redis
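On the API side, waiting for the worker's decision amounts to polling a key until it appears or a deadline passes. This is a minimal sketch under assumptions: the `routing:<request_id>` key layout and the timeout values are hypothetical, and a plain dict stands in for Redis. A redis-py client created with `decode_responses=True` should satisfy the same `get` interface.

```python
import time
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    def get(self, key: str, /) -> Optional[str]: ...


def wait_for_provider(store: KeyValueStore, request_id: str,
                      timeout_s: float = 5.0, poll_interval_s: float = 0.05) -> Optional[str]:
    """Poll the store until a worker has written the chosen provider_id, or give up."""
    key = f"routing:{request_id}"  # hypothetical key layout
    deadline = time.monotonic() + timeout_s
    while True:
        provider_id = store.get(key)
        if provider_id is not None:
            return provider_id
        if time.monotonic() >= deadline:
            return None  # the caller decides how to handle a routing timeout
        time.sleep(poll_interval_s)


store = {"routing:req-1": "vllm-0"}  # as if a worker had already written it
print(wait_for_provider(store, "req-1"))  # vllm-0
```

Polling keeps the sketch simple; a production setup might prefer a blocking primitive such as a Redis list with a blocking pop to avoid repeated round trips.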
For high availability, dynamic load balancing, or multi-provider setups, Celery-based routing is strongly recommended.