Production recommendations

This guide provides practical defaults and hardening recommendations for running OpenGateLLM in production.

General recommendations

Use environment variables for sensitive data in the configuration file, for example:

```
settings:
  [...]
  auth_secret_key: ${AUTH_SECRET_KEY}
  session_secret_key: ${SESSION_SECRET_KEY}
```

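The placeholders are resolved from the environment when the configuration is loaded. The Python sketch below only illustrates how `${VAR}` substitution works, along with the shell-style `${VAR:-default}` form that this guide uses for swagger_version. It is not OpenGateLLM's configuration loader. Two details are assumptions: the helper name `expand_env`, and raising an error for an unset variable that has no default.

```python
import os
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}, where NAME follows shell naming rules.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace ${VAR} and ${VAR:-default} placeholders with values from env.

    Hypothetical helper: an unset variable without a default raises KeyError,
    so a secret is never silently replaced by an empty string.
    """

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is not None:
            # Shell ":-" semantics: use the default if unset or empty.
            return value if value else default
        if value is None:
            raise KeyError(f"environment variable {name} is not set")
        return value

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    env = {"AUTH_SECRET_KEY": "s3cret"}
    print(expand_env("auth_secret_key: ${AUTH_SECRET_KEY}", env))
    # auth_secret_key: s3cret
    print(expand_env('swagger_version: "${RELEASE_TAG:-latest}"', env))
    # swagger_version: "latest"
```

Failing loudly on a missing secret is a deliberate choice in this sketch. An empty auth_secret_key would otherwise go unnoticed until requests start failing.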
Add the GUNICORN_CMD_ARGS environment variable to the deployment configuration to configure the Gunicorn server. We recommend the following configuration:
```
GUNICORN_CMD_ARGS="--workers {{ workers }} --worker-connections 1000 --timeout 240 --keep-alive 75 --graceful-timeout 75"
```
Configure the number of workers based on the expected load, the number of CPU cores and the maximum number of PostgreSQL connections. See the Gunicorn documentation for more information.

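As a starting point, the Gunicorn documentation suggests (2 × CPU cores) + 1 workers. The Python sketch below applies the sizing rule above by capping that heuristic with the PostgreSQL connection budget. It assumes that each Gunicorn worker process opens its own SQLAlchemy pool of up to pool_size + max_overflow connections (5 + 10 in the PostgreSQL example of this guide). The function name and the `reserved_connections` headroom are hypothetical. Adjust them for everything else that connects to the same database.

```python
import os


def suggested_workers(
    cpu_cores: int,
    pg_max_connections: int,
    pool_size: int = 5,
    max_overflow: int = 10,
    reserved_connections: int = 10,
) -> int:
    """Return a Gunicorn worker count bounded by CPU and PostgreSQL limits.

    Assumes every worker process holds its own pool of up to
    pool_size + max_overflow connections; reserved_connections keeps
    headroom for migrations, admin tools and monitoring (hypothetical value).
    """
    cpu_based = 2 * cpu_cores + 1  # Gunicorn's suggested starting point
    per_worker = pool_size + max_overflow
    db_based = (pg_max_connections - reserved_connections) // per_worker
    return max(1, min(cpu_based, db_based))


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # PostgreSQL's default max_connections is 100.
    print(suggested_workers(cpu_cores=cores, pg_max_connections=100))
```

With 4 cores and the default max_connections of 100, the CPU heuristic gives 9 workers but the database budget only allows (100 − 10) // 15 = 6. The database is then the limiting factor.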
Security and access control

Change the default value of the auth_secret_key settings parameter to a strong random string of at least 32 characters.

The auth_secret_key is used to encrypt user API keys. If you change it, existing user API keys must all be regenerated.

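Any cryptographically secure random generator is suitable. For example, Python's standard `secrets` module can produce such a key: `token_urlsafe(32)` encodes 32 random bytes as a 43-character URL-safe string, which is above the 32-character minimum. Store the result in the AUTH_SECRET_KEY environment variable referenced by the configuration, not in the file itself. The wrapper function name is illustrative.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random URL-safe string, e.g. for auth_secret_key."""
    # token_urlsafe base64-encodes num_bytes random bytes without padding,
    # so 32 bytes yield a 43-character string.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_secret_key())
```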
When the API is launched for the first time, it creates a bootstrap administrator account with the default credentials (admin / changeme). This account only exists to let you create your own administrator account, and we recommend deleting it once you have done so. On subsequent restarts, the bootstrap administrator account is not created again if an administrator account already exists.

Example:
```
settings:
  [...]
  auth_secret_key: "your-strong-secret-key"

  # Delete the bootstrap account once you have created your own admin account, then remove these lines from the config file
  auth_bootstrap_admin_username: "admin@example.com"
  auth_bootstrap_admin_password: "your-strong-password"
```
Limit the lifetime of API keys with the auth_key_max_expiration_days settings parameter.

Example:
```
settings:
  [...]
  auth_key_max_expiration_days: 365 # days
```

Hide the sensitive admin and auth routers from the public API documentation with the hidden_routers settings parameter.

Example:

```
settings:
  [...]
  hidden_routers: ["admin", "auth"]
```

Hiding routers only removes them from the documentation. We also recommend not exposing the admin routes to the internet and making them reachable only from a VPN, for example with the following Nginx configuration:

```
# Block admin endpoints on the public API host
location ~ ^/v1/admin(/|$) {
  return 404;
}

# Other routes are exposed to the public
location / {
  proxy_pass http://<api_host>:<api_port>;
  [...]
}
```

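Nginx evaluates `~` locations as case-sensitive regular expressions. A matching regular-expression location takes precedence over the `location /` prefix block. The `(/|$)` group blocks `/v1/admin` and everything under it, but it leaves alone paths that merely share the prefix. The Python sketch below checks the pattern with the `re` module, which handles this particular expression the same way as Nginx's PCRE engine. The sample paths are illustrative, not a list of real OpenGateLLM routes.

```python
import re

# Same expression as the Nginx "location ~ ^/v1/admin(/|$)" rule.
ADMIN_LOCATION = re.compile(r"^/v1/admin(/|$)")


def is_blocked(path: str) -> bool:
    """Return True if the admin location would match this request path."""
    return ADMIN_LOCATION.search(path) is not None


if __name__ == "__main__":
    for path in ["/v1/admin", "/v1/admin/users", "/v1/administrators", "/v1/models"]:
        print(path, "->", "404" if is_blocked(path) else "proxied")
```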
For a Playground UI exposed to the internet, disable the admin pages with the playground_disabled_pages settings parameter.

Example:
```
settings:
  [...]
  playground_disabled_pages: ["roles", "users", "organizations", "routers", "providers"]
```
We recommend deploying two Playground UIs: one reachable only from a VPN with all pages enabled, and one exposed to the internet with only the non-admin pages enabled.

For more information, see the configuration file documentation.

Model configuration and routing

Do not declare models in the configuration file. Declare them through the API endpoints or in the Playground UI instead (see Models configuration).

Models declared in the configuration file are only used for the initial bootstrap of the API. The models section is ignored if any model is already registered in the database.

Add stable aliases to models so applications do not depend on provider-specific names.
Tune provider timeouts to the workload (timeout: 120 is a common production baseline for long generations).
If several providers serve the same model, choose a load-balancing strategy that matches your objective: shuffle to spread requests across providers, least_busy to reduce latency under load.

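The Python sketch below shows the selection rule that each strategy name describes. shuffle picks a provider at random, so traffic spreads evenly on average. least_busy picks the provider with the fewest requests in flight. This is not OpenGateLLM's router implementation: the `Provider` class, its `in_flight` counter and the function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Provider:
    """Hypothetical view of one provider serving a model."""
    name: str
    in_flight: int = 0  # requests currently being processed


def pick_shuffle(providers: List[Provider], rng: random.Random) -> Provider:
    # Uniform random choice: spreads traffic evenly on average.
    return rng.choice(providers)


def pick_least_busy(providers: List[Provider]) -> Provider:
    # Fewest pending requests wins; ties go to the first provider in the list.
    return min(providers, key=lambda p: p.in_flight)


if __name__ == "__main__":
    pool = [Provider("vllm-a", in_flight=7), Provider("vllm-b", in_flight=2)]
    print(pick_least_busy(pool).name)  # vllm-b
    rng = random.Random(0)
    counts = {p.name: 0 for p in pool}
    for _ in range(1000):
        counts[pick_shuffle(pool, rng).name] += 1
    print(counts)
```

Random selection needs no shared state but ignores how loaded each provider is. Least-busy selection reacts to load but requires an accurate count of pending requests per provider.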
API metadata and versioning

Set a clear app_title for your deployment.
Align swagger_version with the deployed release tag to simplify incident and support workflows.

Example:

```
settings:
  [...]
  app_title: "Albert API"
  swagger_version: "${RELEASE_TAG:-latest}"
```

For more information, see the configuration file documentation.

Observability

Keep usage logging enabled with monitoring_postgres_enabled unless you explicitly accept losing usage history. Consider implementing a data retention policy for the usage table to control database size. For more information, see the usage monitoring documentation.
Keep Prometheus metrics enabled with monitoring_prometheus_enabled. For more information, see the Prometheus documentation.
For error monitoring, configure the optional dependencies.sentry section (there is no separate monitoring_sentry_enabled setting). For more information, see the Sentry documentation.

Example:

```
dependencies:
  [...]
  sentry:
    dsn: ${SENTRY_DSN}
    environment: production

settings:
  [...]
  monitoring_postgres_enabled: true
  monitoring_prometheus_enabled: true
```

Dependencies

PostgreSQL

Use postgresql+asyncpg:// connection URLs.
Configure the connection pool (pool_size, max_overflow, pool_pre_ping) for stable throughput.
Set an SQL statement timeout (statement_timeout) so that long-running queries are cancelled instead of stalling requests.
Set an application name (application_name) to identify the application in PostgreSQL logs.

```
dependencies:
  [...]
  postgres:
    url: "postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}"
    echo: false
    pool_size: 5
    max_overflow: 10
    pool_pre_ping: true
    connect_args:
      server_settings:
        application_name: "OpenGateLLM production"
        statement_timeout: "60s"
```

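Suppose the environment variables are substituted verbatim into the URL. In that case, a password containing characters such as `@`, `:` or `/` breaks URL parsing, and SQLAlchemy's documentation recommends percent-encoding such characters. The Python sketch below builds an encoded URL. It assumes your deployment tooling generates the URL, or the encoded password that you export. The helper name and sample values are hypothetical.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a postgresql+asyncpg URL with percent-encoded credentials."""
    # safe="" also encodes "/", ":" and "@", which would otherwise be read
    # as URL delimiters.
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    print(postgres_url("opengatellm", "p@ss:w/rd", "db.internal", 5432, "opengatellm"))
    # postgresql+asyncpg://opengatellm:p%40ss%3Aw%2Frd@db.internal:5432/opengatellm
```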
Redis

Use authentication and network isolation.
Raise max_connections according to the expected concurrency.
Enable the timeout and resilience parameters (socket_connect_timeout, retry_on_timeout, health_check_interval, socket_keepalive).

```
dependencies:
  [...]
  redis:
    url: "redis://${REDIS_USERNAME}:${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"
    max_connections: 200
    socket_connect_timeout: 5
    retry_on_timeout: true
    health_check_interval: 30
    decode_responses: false
    socket_keepalive: true
```

For the Redis server itself, also apply standard production hardening:

Keep Redis up to date.
Enable protected mode.
Disable the default user and create least-privilege users with ACLs.
Configure dedicated log files.

Elasticsearch (optional vector store)

Override index defaults (index_name, index_language, number_of_shards, number_of_replicas) for your workload.
Increase request_timeout for search workloads over large corpora.
Use an index name that reflects the embedding model.
If Elasticsearch is enabled, settings.vector_store_model must reference a configured model of type text-embeddings-inference.

```
dependencies:
  [...]
  elasticsearch:
    index_name: "bge-m3-1024"
    index_language: french
    number_of_shards: 24
    number_of_replicas: 0
    hosts: "http://${ELASTIC_HOST}:${ELASTIC_PORT}"
    basic_auth:
      - "${ELASTIC_USER}"
      - "${ELASTIC_PASSWORD}"
    request_timeout: 120
    retry_on_timeout: true

settings:
  [...]
  vector_store_model: "my-embeddings-model"
```

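The example name bge-m3-1024 appears to combine the model name with its embedding dimension, since BAAI's bge-m3 produces 1024-dimensional dense vectors. The Python sketch below derives a name following that convention, which is an assumption rather than an OpenGateLLM requirement. It then applies Elasticsearch's index-name rules:

- lowercase only
- none of `\ / * ? " < > | , # :` or spaces
- must not start with `-`, `_` or `+`
- at most 255 bytes

The helper name is hypothetical.

```python
# Characters Elasticsearch does not allow in index names.
_FORBIDDEN = set('\\/*?"<>| ,#:')


def index_name_for(model: str, dimension: int) -> str:
    """Derive an index name such as "bge-m3-1024" from a model id and dimension.

    Hypothetical convention: drop any organisation prefix ("BAAI/"),
    lowercase, replace forbidden characters with "-", append the dimension.
    """
    base = model.rsplit("/", 1)[-1].lower()
    base = "".join("-" if ch in _FORBIDDEN else ch for ch in base)
    # Names must not start with "-", "_" or "+"; a leading "." is deprecated.
    name = f"{base}-{dimension}".lstrip("-_+.")
    if len(name.encode("utf-8")) > 255:
        raise ValueError(f"index name longer than 255 bytes: {name!r}")
    return name


if __name__ == "__main__":
    print(index_name_for("BAAI/bge-m3", 1024))  # bge-m3-1024
```

Encoding the dimension in the name means that switching to a model with a different vector size creates a new index. It does not reuse an incompatible one.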
Playground UI

Use Redis as the state manager of the Playground UI. To enable it, configure the dependencies.redis section in the configuration file.

Example:
```
dependencies:
  [...]
  redis:
    url: "redis://${REDIS_USERNAME}:${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"
```
Increase the timeout of Playground UI requests to the OpenGateLLM API with the playground_opengatellm_timeout settings parameter.

Example:
```
settings:
  [...]
  playground_opengatellm_timeout: 300
```