# Production recommendations
This guide provides practical defaults and hardening recommendations for running OpenGateLLM in production.
## General recommendations

- Use environment variables for sensitive data in the configuration file, for example:

  ```yaml
  settings:
    [...]
    auth_master_key: ${AUTH_MASTER_KEY}
    session_secret_key: ${SESSION_SECRET_KEY}
  ```

- Add the `GUNICORN_CMD_ARGS` environment variable to the deployment configuration to configure the Gunicorn server. We recommend the following configuration:

  ```shell
  GUNICORN_CMD_ARGS="--workers {{ workers }} --worker-connections 1000 --timeout 240 --keep-alive 75 --graceful-timeout 75"
  ```

  Configure the number of workers based on the expected load, the number of CPU cores, and the maximum number of PostgreSQL connections. See the Gunicorn documentation for more information.
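The worker-count guidance above can be sketched as a small shell snippet. This is a hedged sketch, not a definitive sizing rule: it assumes a Linux host (`nproc`) and uses the common `(2 × cores) + 1` heuristic from the Gunicorn documentation; adjust the result so it stays within your PostgreSQL connection budget.

```shell
# Heuristic worker count: (2 * CPU cores) + 1.
# Each worker opens its own PostgreSQL pool (pool_size + max_overflow), so keep
# workers * (pool_size + max_overflow) below the server's max_connections.
CORES="$(nproc)"
WORKERS=$(( 2 * CORES + 1 ))

export GUNICORN_CMD_ARGS="--workers ${WORKERS} --worker-connections 1000 --timeout 240 --keep-alive 75 --graceful-timeout 75"
echo "$GUNICORN_CMD_ARGS"
```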
## Security and access control

- Use the master key only for bootstrap operations: creating the first admin role and user.
- Do not use the master identity for day-to-day model administration. When you create a router, the model is shown with an `owned_by` attribute set to the organization of the user who created it.
- Set a strong `auth_master_key` (at least 32 characters, high entropy).
- Set a dedicated `session_secret_key` instead of reusing the master key.
- Limit API key lifetime with `auth_key_max_expiration_days`.
- Hide sensitive routes (`admin` and `auth`) from the public API docs.

Example:

```yaml
settings:
  [...]
  auth_master_key: "your-strong-master-key"
  session_secret_key: "your-strong-session-secret-key"
  auth_key_max_expiration_days: 365 # days
  hidden_routers: ["admin", "auth"]
```

For more information, see the configuration file documentation.
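The strong keys recommended above can be generated with, for example, the `openssl` CLI (an assumption about the host; any cryptographic random source works equally well):

```shell
# 32 random bytes hex-encode to 64 characters, comfortably above the
# 32-character minimum recommended for auth_master_key.
AUTH_MASTER_KEY="$(openssl rand -hex 32)"
SESSION_SECRET_KEY="$(openssl rand -hex 32)"

echo "${#AUTH_MASTER_KEY}"   # 64
```

Export these as environment variables so the `${AUTH_MASTER_KEY}` / `${SESSION_SECRET_KEY}` placeholders in the configuration file resolve at startup.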
## Model configuration and routing

- Add stable aliases to models so applications do not depend on provider-specific names.
- Tune provider timeouts according to workload (`timeout: 120` is a common production baseline for long generations).
- If you run multiple providers for one model, choose a load-balancing strategy that matches your objective (`shuffle` for distribution, `least_busy` for latency under load).
- Do not declare models in the configuration file; prefer the API, either through the endpoints or the Playground UI (see Models configuration).
## API metadata and versioning

- Set a clear `app_title` for your deployment.
- Align `swagger_version` with the deployed release tag to simplify incident and support workflows.

Example:

```yaml
settings:
  [...]
  app_title: "Albert API"
  swagger_version: "${RELEASE_TAG:-latest}"
```

For more information, see the configuration file documentation.
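The `swagger_version` example above uses shell-style default expansion. Assuming the configuration loader follows standard shell semantics for `${VAR:-default}`, it behaves as follows:

```shell
# ${RELEASE_TAG:-latest} resolves to "latest" when RELEASE_TAG is unset or empty.
unset RELEASE_TAG
echo "${RELEASE_TAG:-latest}"   # latest

RELEASE_TAG="v1.2.3"            # illustrative tag value
echo "${RELEASE_TAG:-latest}"   # v1.2.3
```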
## Observability

- Keep usage logging enabled with `monitoring_postgres_enabled` unless you explicitly accept losing usage history. Consider implementing a data retention policy for the usage table to manage database size. For more information, see the usage monitoring documentation.
- Keep Prometheus metrics enabled with `monitoring_prometheus_enabled`. For more information, see the Prometheus documentation.
- For error monitoring, configure the optional `dependencies.sentry` section (there is no separate `monitoring_sentry_enabled` setting). For more information, see the Sentry documentation.

Example:

```yaml
dependencies:
  [...]
  sentry:
    dsn: ${SENTRY_DSN}
    environment: production

settings:
  [...]
  monitoring_postgres_enabled: true
  monitoring_prometheus_enabled: true
```

## Dependencies

### PostgreSQL
- Use `postgresql+asyncpg://` URLs.
- Configure a connection pool (`pool_size`, `max_overflow`, `pool_pre_ping`) for stable throughput.
- Set a SQL statement timeout to avoid stalled requests.
- Set an application name to identify the application in PostgreSQL logs.

```yaml
dependencies:
  [...]
  postgres:
    url: "postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}"
    echo: false
    pool_size: 5
    max_overflow: 10
    pool_pre_ping: true
    connect_args:
      server_settings:
        application_name: "OpenGateLLM production"
        statement_timeout: "60s"
```

### Redis

- Use authentication and network isolation.
- Raise `max_connections` based on expected concurrency.
- Enable timeout-related resilience parameters.

```yaml
dependencies:
  [...]
  redis:
    url: "redis://${REDIS_USERNAME}:${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"
    max_connections: 200
    socket_connect_timeout: 5
    retry_on_timeout: true
    health_check_interval: 30
    decode_responses: false
    socket_keepalive: true
```

For the Redis server itself, also apply standard production hardening:
- Keep Redis up to date.
- Enable protected mode.
- Disable the default user and create least-privilege users.
- Configure dedicated log files.
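The hardening points above map to a few standard `redis.conf` directives (Redis 6+ for the ACL lines). This is a hedged sketch: the user name `opengatellm`, the password placeholder, and the ACL rule are illustrative assumptions; tighten the command categories to what the application actually uses.

```conf
# Refuse connections from non-loopback clients unless authentication is configured
protected-mode yes

# Dedicated log file
logfile /var/log/redis/redis-server.log

# ACL: turn off the default user and define a least-privilege application user
user default off
user opengatellm on >replace-with-a-strong-password ~* +@all -@admin -@dangerous
```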
### Elasticsearch (optional vector store)

- Override index defaults (`index_name`, `index_language`, `number_of_shards`, `number_of_replicas`) for your workload.
- Increase the request timeout for large corpus search workloads.
- Use an index name that reflects the embedding model.
- If Elasticsearch is enabled, `settings.vector_store_model` must reference a configured model of type `text-embeddings-inference`.

```yaml
dependencies:
  elasticsearch:
    index_name: "bge-m3-1024"
    index_language: french
    number_of_shards: 24
    number_of_replicas: 0
    hosts: "http://${ELASTIC_HOST}:${ELASTIC_PORT}"
    basic_auth:
      - "${ELASTIC_USER}"
      - "${ELASTIC_PASSWORD}"
    request_timeout: 120
    retry_on_timeout: true

settings:
  [...]
  vector_store_model: "my-embeddings-model"
```