Production recommendations
This guide provides practical defaults and hardening recommendations for running OpenGateLLM in production.
General recommendations
- Use environment variables for sensitive data in the configuration file, for example:

      settings:
        [...]
        auth_secret_key: ${AUTH_SECRET_KEY}
        session_secret_key: ${SESSION_SECRET_KEY}

- Add the GUNICORN_CMD_ARGS environment variable to the deployment configuration to configure the Gunicorn server. We recommend the following configuration:

      GUNICORN_CMD_ARGS="--workers {{ workers }} --worker-connections 1000 --timeout 240 --keep-alive 75 --graceful-timeout 75"

  Configure the number of workers based on the expected load, the number of CPU cores, and the maximum number of PostgreSQL connections. See the Gunicorn documentation for more information.
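As a rough starting point for the worker count, the sketch below combines the "(2 x cores) + 1" rule of thumb from the Gunicorn documentation with a cap derived from the PostgreSQL connection limit. It assumes that each worker process opens its own pool of up to pool_size + max_overflow connections. The function name and the reserved_connections margin are illustrative and not part of OpenGateLLM.

```python
import os


def recommended_workers(
    cpu_cores: int,
    pg_max_connections: int,
    pool_size: int = 5,
    max_overflow: int = 10,
    reserved_connections: int = 10,
) -> int:
    """Return a worker count bounded by CPU cores and PostgreSQL connections."""
    # Rule of thumb from the Gunicorn documentation: (2 x cores) + 1.
    cpu_bound = 2 * cpu_cores + 1
    # Assumption: every worker holds its own pool, so in the worst case it
    # uses pool_size + max_overflow connections.
    per_worker = pool_size + max_overflow
    # Keep a margin for migrations, admin sessions and other clients.
    usable = max(pg_max_connections - reserved_connections, 0)
    connection_bound = usable // per_worker
    # Always run at least one worker.
    return max(1, min(cpu_bound, connection_bound))


# Example: size the deployment for the current host and a 100-connection server.
workers = recommended_workers(os.cpu_count() or 1, pg_max_connections=100)
```

Treat the result as an upper bound to validate under real load, not as a definitive value.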
Security and access control
- Change the default auth_secret_key settings parameter to a strong, random string of at least 32 characters.

  The first time the API is launched, an administrator account is created with the default values (admin and changeme). This account is only intended to let you create your first administrator account. We recommend deleting it once you have created your own admin account. On subsequent restarts, the bootstrap administrator account is not created if an administrator account already exists.

  Example:

      settings:
        [...]
        auth_secret_key: "your-strong-secret-key"
        # delete this account after you have created your own admin account and clean your config file
        auth_bootsrap_admin_username: "admin@example.com"
        auth_bootsrap_admin_password: "your-strong-password"

- Limit API key lifetime with the auth_key_max_expiration_days settings parameter.

  Example:

      settings:
        [...]
        auth_key_max_expiration_days: 365 # days

- Hide the sensitive admin and auth routes from the public API docs with the hidden_routers settings parameter.

  Example:

      settings:
        [...]
        hidden_routers: ["admin", "auth"]

  We also recommend not exposing the admin routes on the internet (make them accessible only from a VPN). For example, with Nginx:

      # block admin endpoints on the public api host
      location ~ ^/v1/admin(/|$) {
          return 404;
      }

      # other routes are exposed to the public
      location / {
          proxy_pass http://<api_host>:<api_port>;
          [...]
      }

- Disable the admin pages of the Playground UI with the playground_disabled_pages settings parameter for any internet-exposed Playground UI.

  Example:

      settings:
        [...]
        playground_disabled_pages: ["roles", "users", "organizations", "routers", "providers"]

  We recommend deploying two separate Playground UIs: one accessible from a VPN with all pages enabled, and one accessible from the internet with only non-admin pages enabled.
For more information, see configuration file documentation.
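One way to produce a value for auth_secret_key (and session_secret_key) that meets the 32-character recommendation above is Python's standard secrets module. This is a minimal sketch; the helper name is illustrative, and any cryptographically secure generator works equally well.

```python
import secrets


def generate_secret_key(num_bytes: int = 48) -> str:
    """Return a URL-safe random string drawn from the OS CSPRNG."""
    # token_urlsafe(48) Base64-encodes 48 random bytes into 64 characters,
    # comfortably above the 32-character minimum.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print once, then store the value in your secret manager or
    # export it as AUTH_SECRET_KEY; do not commit it to the config file.
    print(generate_secret_key())
```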
Model configuration and routing
- Do not declare models in the configuration file. Prefer declaring them through the API endpoints or the Playground UI (see Models configuration).

  Model entries in the configuration file are only used for the initial bootstrap of the API. The model section of the configuration is ignored if any models are already registered in the database.

- Add stable aliases to models so that applications do not depend on provider-specific names.

- Tune provider timeouts according to workload (timeout: 120 is a common production baseline for long generations).

- If you run multiple providers for one model, choose a load-balancing strategy that matches your objective (shuffle for distribution, least_busy for latency under load).
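To make the difference between the two strategies concrete, here is a small conceptual sketch. It is not OpenGateLLM's implementation: the Provider class, the in_flight counter and both function names are hypothetical and only illustrate the general idea behind random distribution versus picking the least loaded provider.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Provider:
    name: str
    in_flight: int = 0  # hypothetical count of requests currently being served


def pick_shuffle(providers: Sequence[Provider],
                 rng: Optional[random.Random] = None) -> Provider:
    """Pick uniformly at random: traffic spreads evenly over time."""
    return (rng or random).choice(providers)


def pick_least_busy(providers: Sequence[Provider]) -> Provider:
    """Pick the provider with the fewest in-flight requests."""
    return min(providers, key=lambda p: p.in_flight)


providers = [Provider("a", in_flight=3), Provider("b", in_flight=1), Provider("c", in_flight=2)]
chosen = pick_least_busy(providers)  # provider "b", the least loaded one
```

Random selection ignores current load, so a slow provider keeps receiving its share of requests; load-aware selection steers new requests away from it, which is why it tends to help latency when providers are saturated.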
API metadata and versioning
- Set a clear app_title for your deployment.
- Align swagger_version with the deployed release tag to simplify incident and support workflows.

Example:

    settings:
      [...]
      app_title: "Albert API"
      swagger_version: "${RELEASE_TAG:-latest}"

For more information, see configuration file documentation.
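The "${RELEASE_TAG:-latest}" placeholder in the example reads like shell-style parameter expansion: use RELEASE_TAG when it is set and non-empty, otherwise fall back to latest. The sketch below illustrates that semantics under this assumption. It is not OpenGateLLM's resolver, and the expand helper is hypothetical.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${NAME} and ${NAME:-default} placeholders, shell style."""
    variables = os.environ if env is None else env

    def _replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if default is None:
            # Plain ${NAME}: fail loudly rather than silently using "".
            # (Design choice of this sketch; a shell would expand to "".)
            if name not in variables:
                raise KeyError(f"environment variable {name} is not set")
            return variables[name]
        # ${NAME:-default}: the default applies when NAME is unset or empty.
        return variables.get(name) or default

    return _PLACEHOLDER.sub(_replace, value)


tagged = expand("${RELEASE_TAG:-latest}", {"RELEASE_TAG": "v1.2.3"})
fallback = expand("${RELEASE_TAG:-latest}", {})
```

Setting RELEASE_TAG in the deployment pipeline keeps the version shown in the API docs in step with the image that is actually running.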
Observability
- Keep usage logging enabled with monitoring_postgres_enabled unless you explicitly accept losing usage history. Consider implementing a data retention policy for the usage table to manage database size. For more information, see usage monitoring documentation.
- Keep Prometheus metrics enabled with monitoring_prometheus_enabled. For more information, see Prometheus documentation.
- For error monitoring, configure the optional dependencies.sentry section (there is no separate monitoring_sentry_enabled setting). For more information, see Sentry documentation.

Example:

    dependencies:
      [...]
      sentry:
        dsn: ${SENTRY_DSN}
        environment: production

    settings:
      [...]
      monitoring_postgres_enabled: true
      monitoring_prometheus_enabled: true

Dependencies
PostgreSQL
- Use postgresql+asyncpg:// URLs.
- Configure a connection pool (pool_size, max_overflow, pool_pre_ping) for stable throughput.
- Set a SQL statement timeout to avoid stalled requests.
- Set an application name to identify the application in PostgreSQL logs.

    dependencies:
      [...]
      postgres:
        url: "postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}"
        echo: false
        pool_size: 5
        max_overflow: 10
        pool_pre_ping: true
        connect_args:
          server_settings:
            application_name: "OpenGateLLM production"
            statement_timeout: "60s"

Redis
- Use authentication and network isolation.
- Raise max_connections based on expected concurrency.
- Enable timeout-related resilience parameters.

    dependencies:
      [...]
      redis:
        url: "redis://${REDIS_USERNAME}:${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"
        max_connections: 200
        socket_connect_timeout: 5
        retry_on_timeout: true
        health_check_interval: 30
        decode_responses: false
        socket_keepalive: true

For the Redis server itself, also apply standard production hardening:
- Keep Redis up to date.
- Enable protected mode.
- Disable default users and create least-privilege users.
- Configure dedicated log files.
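The PostgreSQL and Redis URLs above embed the username and password directly. Strong passwords often contain characters such as @, : or /, which have a meaning in URLs and need to be percent-encoded. The sketch below shows one way to do that with the standard library. The build_url helper and the sample host and database names are illustrative. Whether OpenGateLLM substitutes the ${...} variables verbatim into the URL is an assumption; if it does, the environment variable should already hold the encoded value.

```python
from urllib.parse import quote


def build_url(scheme: str, user: str, password: str,
              host: str, port: int, database: str = "") -> str:
    """Build a connection URL with percent-encoded credentials."""
    # safe="" forces "/", "@" and ":" to be escaped as well.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    path = f"/{database}" if database else ""
    return f"{scheme}://{credentials}@{host}:{port}{path}"


# Hypothetical values, for illustration only.
pg_url = build_url("postgresql+asyncpg", "app", "p@ss:w/rd",
                   "db.internal", 5432, "opengatellm")
redis_url = build_url("redis", "app", "p@ss:w/rd", "cache.internal", 6379)
```

Here the password p@ss:w/rd becomes p%40ss%3Aw%2Frd inside the URL, so the parser no longer mistakes its @ for the host separator.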
Elasticsearch (optional vector store)
- Override index defaults (index_name, index_language, number_of_shards, number_of_replicas) for your workload.
- Increase request timeout for large corpus search workloads.
- Use an index name that reflects the embedding model.
- If Elasticsearch is enabled, settings.vector_store_model must reference a configured model of type text-embeddings-inference.

    dependencies:
      elasticsearch:
        index_name: "bge-m3-1024"
        index_language: french
        number_of_shards: 24
        number_of_replicas: 0
        hosts: "http://${ELASTIC_HOST}:${ELASTIC_PORT}"
        basic_auth:
          - "${ELASTIC_USER}"
          - "${ELASTIC_PASSWORD}"
        request_timeout: 120
        retry_on_timeout: true

    settings:
      [...]
      vector_store_model: "my-embeddings-model"

Playground UI
- Use Redis as the state manager. To activate Redis, set the dependencies.redis section in the configuration file.

  Example:

      dependencies:
        [...]
        redis:
          url: "redis://${REDIS_USERNAME}:${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"

- Increase the timeout of the OpenGateLLM API with the playground_opengatellm_timeout settings parameter.

  Example:

      settings:
        [...]
        playground_opengatellm_timeout: 300
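The values suggested in this guide grow from the inside out: 120 seconds for the provider timeout and 300 seconds for playground_opengatellm_timeout. One plausible reading, stated here as an assumption rather than a documented rule, is that each outer layer should wait longer than the layer it calls, so the caller does not give up while a generation is still running. The helper below is a hypothetical pre-deployment check for that ordering.

```python
from typing import List, Sequence, Tuple


def timeout_chain_issues(layers: Sequence[Tuple[str, float]]) -> List[str]:
    """Check that timeouts increase from the innermost to the outermost layer.

    `layers` is ordered from innermost (closest to the model) to outermost.
    Returns a list of human-readable problems, empty when the chain is fine.
    """
    issues = []
    for (inner_name, inner), (outer_name, outer) in zip(layers, layers[1:]):
        if outer <= inner:
            issues.append(
                f"{outer_name} ({outer}s) should exceed {inner_name} ({inner}s)"
            )
    return issues


# Values taken from the examples in this guide.
ok = timeout_chain_issues([
    ("provider timeout", 120),
    ("playground_opengatellm_timeout", 300),
])
```

An empty list means the ordering holds; any returned message names the pair of settings to revisit.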