Production recommendations

This guide provides practical defaults and hardening recommendations for running OpenGateLLM in production.

  • Use environment variables for sensitive values in the configuration file, for example:

    settings:
      [...]
      auth_secret_key: ${AUTH_SECRET_KEY}
      session_secret_key: ${SESSION_SECRET_KEY}
  • Add the GUNICORN_CMD_ARGS environment variable to the deployment configuration to configure the Gunicorn server. We recommend the following configuration:

    GUNICORN_CMD_ARGS="--workers {{ workers }} --worker-connections 1000 --timeout 240 --keep-alive 75 --graceful-timeout 75"

    Configure the number of workers based on the expected load, the number of CPU cores, and the maximum number of PostgreSQL connections. See the Gunicorn documentation for more information.

  • Change the default value of the auth_secret_key settings parameter to a strong, random string of at least 32 characters.

    When you launch the API for the first time, a bootstrap administrator account is created with the default credentials (admin and changeme). This account exists only so that you can create your own administrator account; we recommend deleting it once you have done so. On subsequent restarts, the bootstrap administrator account is not recreated if an administrator account already exists.

    Example:

    settings:
      [...]
      auth_secret_key: "your-strong-secret-key"
      # delete this account after you have created your own admin account and clean your config file
      auth_bootsrap_admin_username: "admin@example.com"
      auth_bootsrap_admin_password: "your-strong-password"
  • Limit API key lifetime with the auth_key_max_expiration_days settings parameter.

    Example:

    settings:
      [...]
      auth_key_max_expiration_days: 365 # days
  • Hide the sensitive admin and auth routes from the public API docs with the hidden_routers settings parameter.

    Example:

    settings:
      [...]
      hidden_routers: ["admin", "auth"]

    In addition, we recommend not exposing the admin routes on the internet (make them accessible only from a VPN). For example, with an Nginx configuration:

    # block admin endpoints on the public api host
    location ~ ^/v1/admin(/|$) {
        return 404;
    }
    # other routes are exposed to the public
    location / {
        proxy_pass http://<api_host>:<api_port>;
        [...]
    }
  • Disable the admin pages of the Playground UI with the playground_disabled_pages settings parameter when the Playground UI is exposed to the internet.

    Example:

    settings:
      [...]
      playground_disabled_pages: ["roles", "users", "organizations", "routers", "providers"]

    We recommend deploying two separate Playground UIs: one accessible only from a VPN with all pages enabled, and one accessible from the internet with only the non-admin pages enabled.
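The Nginx location pattern shown above is anchored so that it blocks /v1/admin and everything beneath it without also catching unrelated paths that merely share the prefix. The short Python sketch below reproduces that pattern to show which request paths it matches; the sample paths are illustrative assumptions, not an inventory of OpenGateLLM routes, and Nginx itself evaluates the pattern with PCRE rather than Python's re module.

```python
import re

# Same pattern as the Nginx block: location ~ ^/v1/admin(/|$)
ADMIN_LOCATION = re.compile(r"^/v1/admin(/|$)")


def is_blocked(path: str) -> bool:
    """Return True if the public host would answer 404 for this path."""
    return ADMIN_LOCATION.search(path) is not None


if __name__ == "__main__":
    # Illustrative paths only (assumptions, not a list of real routes).
    for path in ("/v1/admin", "/v1/admin/users", "/v1/administrator", "/v1/chat/completions"):
        state = "blocked" if is_blocked(path) else "proxied"
        print(f"{path}: {state}")
```

The (/|$) group is what prevents a path such as /v1/administrator from being blocked by accident.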

For more information, see configuration file documentation.
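One way to produce a value that satisfies the "strong, random string of at least 32 characters" recommendation for auth_secret_key (and, presumably, session_secret_key) is Python's standard secrets module. This is a minimal sketch; the function name and the 48-byte default are arbitrary choices made for illustration, not OpenGateLLM requirements.

```python
import secrets


def generate_secret_key(num_bytes: int = 48) -> str:
    """Return a URL-safe random string drawn from the OS CSPRNG.

    48 random bytes encode to a 64-character string, comfortably above
    the 32-character minimum recommended above.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print a fresh key, e.g. to store it in a secret manager and expose it
    # to the API as the AUTH_SECRET_KEY environment variable.
    print(generate_secret_key())
```

Generate a separate value for each secret rather than reusing one key for several parameters.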

  • Do not use the configuration file to declare models; declare them through the API endpoints or the Playground UI instead (see Models configuration).

    The model section of the configuration file is only used for the initial bootstrap of the API. It is ignored if any models are already registered in the database.

  • Add stable aliases to models so applications do not depend on provider-specific names.

  • Tune provider timeouts according to workload (timeout: 120 is a common production baseline for long generations).

  • If you run multiple providers for one model, choose a load-balancing strategy that matches your objective (shuffle for distribution, least_busy for latency under load).
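To make the trade-off between the two strategies concrete, here is a conceptual Python sketch. It is not OpenGateLLM's implementation: the Provider class, the in_flight counter, and both function names are hypothetical, and it assumes "busy" means the number of requests currently in flight, which may differ from the metric the gateway actually uses.

```python
import random
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical stand-in for one provider serving a model."""
    name: str
    in_flight: int = 0  # assumed load metric: requests currently being served


def pick_shuffle(providers, rng=random):
    """Spread traffic evenly on average by picking a provider at random."""
    return rng.choice(providers)


def pick_least_busy(providers):
    """Favor latency under load by picking the provider with the least work."""
    return min(providers, key=lambda p: p.in_flight)


if __name__ == "__main__":
    pool = [Provider("a", in_flight=7), Provider("b", in_flight=1), Provider("c", in_flight=4)]
    print("shuffle    ->", pick_shuffle(pool).name)
    print("least_busy ->", pick_least_busy(pool).name)
```

Random selection ignores current load, so a slow provider keeps receiving its share of requests; load-aware selection steers new requests away from it.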

  • Set a clear app_title for your deployment.
  • Align swagger_version with the deployed release tag to simplify incident and support workflows.

Example:

settings:
  [...]
  app_title: "Albert API"
  swagger_version: "${RELEASE_TAG:-latest}"

For more information, see configuration file documentation.
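Because the examples on this page rely on placeholders such as ${AUTH_SECRET_KEY} and ${RELEASE_TAG:-latest}, a pre-deployment check that every referenced variable is actually set can catch misconfiguration early. The sketch below is an illustration only: it assumes shell-like semantics (a placeholder with a :-default never counts as missing) and does not reproduce OpenGateLLM's own configuration loader; the function name is hypothetical.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def missing_variables(config_text: str, env=None) -> list[str]:
    """List variables referenced without a default that are absent from env."""
    env = os.environ if env is None else env
    missing = []
    for match in PLACEHOLDER.finditer(config_text):
        name, default = match.group(1), match.group(2)
        if default is None and name not in env and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    sample = (
        "settings:\n"
        "  auth_secret_key: ${AUTH_SECRET_KEY}\n"
        '  swagger_version: "${RELEASE_TAG:-latest}"\n'
    )
    # Passing an empty mapping simulates an environment with nothing set.
    print(missing_variables(sample, env={}))
```

Running such a check in the deployment pipeline, before the API starts, surfaces a missing secret before it turns into a runtime failure.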

  • Keep usage logging enabled with monitoring_postgres_enabled unless you explicitly accept losing usage history. Consider implementing a data retention policy for the usage table to manage database size. For more information, see usage monitoring documentation.
  • Keep Prometheus metrics enabled with monitoring_prometheus_enabled. For more information, see Prometheus documentation.
  • For error monitoring, configure the optional dependencies.sentry section (there is no separate monitoring_sentry_enabled setting). For more information, see Sentry documentation.

Example:

dependencies:
  [...]
  sentry:
    dsn: ${SENTRY_DSN}
    environment: production
settings:
  [...]
  monitoring_postgres_enabled: true
  monitoring_prometheus_enabled: true

  • Use postgresql+asyncpg:// URLs for the PostgreSQL connection.
  • Configure a connection pool (pool_size, max_overflow, pool_pre_ping) for stable throughput.
  • Set a SQL statement timeout to avoid stalled requests.
  • Set an application name to identify the application in PostgreSQL logs.

dependencies:
  [...]
  postgres:
    url: "postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}"
    echo: false
    pool_size: 5
    max_overflow: 10
    pool_pre_ping: true
    connect_args:
      server_settings:
        application_name: "OpenGateLLM production"
        statement_timeout: "60s"
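The pool settings above interact with the Gunicorn worker count recommended earlier: if each worker process keeps its own connection pool (an assumption about the deployment, though it is the usual behavior with forked workers), the peak number of PostgreSQL connections is roughly workers × (pool_size + max_overflow). The sketch below combines that bound with the (2 × cores) + 1 starting point suggested in the Gunicorn documentation. The function name and the reserved parameter (connections kept free for migrations, monitoring, or other clients) are hypothetical.

```python
import os


def max_workers(pg_max_connections: int, pool_size: int, max_overflow: int,
                reserved: int = 10, cpu_cores=None) -> int:
    """Estimate a worker count that respects both CPU and PostgreSQL limits."""
    cores = cpu_cores or os.cpu_count() or 1
    cpu_bound = 2 * cores + 1                 # Gunicorn's suggested starting point
    per_worker = pool_size + max_overflow     # peak connections held by one worker
    db_bound = (pg_max_connections - reserved) // per_worker
    return max(1, min(cpu_bound, db_bound))


if __name__ == "__main__":
    # With pool_size: 5 and max_overflow: 10 as in the example above, a server
    # limited to 100 connections supports at most (100 - 10) // 15 = 6 workers.
    print(max_workers(pg_max_connections=100, pool_size=5, max_overflow=10, cpu_cores=4))
```

Treat the result as an upper bound to validate under real load, not as a tuned value.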

  • Use authentication and network isolation for Redis.
  • Raise max_connections based on expected concurrency.
  • Enable timeout-related resilience parameters.

dependencies:
  [...]
  redis:
    url: "redis://${REDIS_USERNAME}:${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"
    max_connections: 200
    socket_connect_timeout: 5
    retry_on_timeout: true
    health_check_interval: 30
    decode_responses: false
    socket_keepalive: true

For the Redis server itself, also apply standard production hardening:

  • Keep Redis up to date.
  • Enable protected mode.
  • Disable default users and create least-privilege users.
  • Configure dedicated log files.

  • Override the Elasticsearch index defaults (index_name, index_language, number_of_shards, number_of_replicas) for your workload.
  • Increase the request timeout for search workloads over large corpora.
  • Use an index name that reflects the embedding model.
  • If Elasticsearch is enabled, settings.vector_store_model must reference a configured model of type text-embeddings-inference.

dependencies:
  elasticsearch:
    index_name: "bge-m3-1024"
    index_language: french
    number_of_shards: 24
    number_of_replicas: 0
    hosts: "http://${ELASTIC_HOST}:${ELASTIC_PORT}"
    basic_auth:
      - "${ELASTIC_USER}"
      - "${ELASTIC_PASSWORD}"
    request_timeout: 120
    retry_on_timeout: true
settings:
  [...]
  vector_store_model: "my-embeddings-model"

  • Use Redis as the state manager mode. To enable Redis, set the dependencies.redis section in the configuration file.

    Example:

    dependencies:
      [...]
      redis:
        url: "redis://${REDIS_USERNAME}:${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"
  • Increase the timeout of the OpenGateLLM API with the playground_opengatellm_timeout settings parameter.

    Example:

    settings:
      [...]
      playground_opengatellm_timeout: 300