Inference monitoring
Request usage logging
OpenGateLLM tracks inference activity by storing usage data for each API request. This monitoring helps you analyze model usage over time, identify consumption patterns, and support reporting needs.
Usage monitoring is backed by PostgreSQL and can be enabled through the configuration file. Once activated, requests are recorded in the usage table and can be explored from the Playground Usage page or queried directly from the database.
The logs contain the following information:
- user ID
- router ID
- provider ID
- number of input tokens
- number of output tokens
- environmental footprint (see the dedicated documentation here)
- cost (see the dedicated documentation here)
- duration
- timestamp
Sensitive information such as the prompt or response content is not included in the logs.
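As a rough illustration of how these fields might be analyzed once rows have been exported from the usage table, the sketch below totals requests, tokens, and cost per user. The `UsageRecord` layout and its field names are hypothetical and chosen only for this example; check the actual column names of the usage table before adapting it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    # Hypothetical field names mirroring the logged information listed above;
    # the real column names in the usage table may differ.
    user_id: int
    input_tokens: int
    output_tokens: int
    cost: float


def summarize_by_user(records):
    """Total requests, tokens, and cost for each user ID."""
    totals = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0, "cost": 0.0}
    )
    for record in records:
        entry = totals[record.user_id]
        entry["requests"] += 1
        entry["input_tokens"] += record.input_tokens
        entry["output_tokens"] += record.output_tokens
        entry["cost"] += record.cost
    return dict(totals)
```

The same grouping could also be done directly in SQL against the database; the Python version is shown only to make the shape of the data concrete.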
Configuration
To log requests for usage monitoring, set monitoring_postgres_enabled to true in settings (enabled by default).
    settings:
      [...]
      monitoring_postgres_enabled: true

Model health monitoring
OpenGateLLM provides a health check endpoint to monitor the health of the models. This endpoint is available at /health/models and returns a JSON response with the health status of each model.
    {
      "data": [
        {
          "id": "model_name",
          "status": "green" | "yellow" | "red"
        }
      ]
    }

The health status is derived from an indicator based on Little’s law. The indicator is the ratio of the current number of in-flight requests to the expected number of in-flight requests. The expected number of in-flight requests is the mean latency multiplied by the request rate (requests per millisecond).
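The indicator described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not OpenGateLLM's implementation: the function name and parameters are hypothetical, the latency is assumed to be expressed in milliseconds so that it matches the per-millisecond request rate, and raising an error when no traffic is expected is a choice made for the example.

```python
def littles_law_indicator(inflight_requests, mean_latency_ms, requests_per_ms):
    """Ratio of current in-flight requests to the number expected by Little's law."""
    # Little's law: expected in-flight requests = mean latency * arrival rate.
    expected_inflight = mean_latency_ms * requests_per_ms
    if expected_inflight <= 0:
        # Assumption for this sketch: with no expected traffic the ratio is undefined.
        raise ValueError("expected in-flight requests must be positive")
    return inflight_requests / expected_inflight
```

For instance, with a mean latency of 2000 ms and a rate of 0.005 requests per millisecond, 10 requests are expected to be in flight, so 6 observed in-flight requests correspond to an indicator of about 0.6.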
The indicator maps to a health status as follows:
| Indicator | Status |
|---|---|
| < 0.8 | green |
| 0.8 - 1.1 | yellow |
| > 1.1 | red |
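The thresholds in the table can be expressed as a small helper. The function name is hypothetical, and the handling of the exact boundary values is an assumption: the sketch treats both 0.8 and 1.1 as yellow, which the table does not state explicitly.

```python
def status_from_indicator(indicator):
    """Map an indicator value to "green", "yellow", or "red"."""
    if indicator < 0.8:
        return "green"
    if indicator <= 1.1:
        # Assumption: the boundary values 0.8 and 1.1 both count as yellow.
        return "yellow"
    return "red"
```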
The health status is calculated for each model. When a model is served by several providers, its health status is the worst status among them.
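A minimal sketch of the "worst status wins" rule, together with a helper that reads a /health/models response body, might look like this. Both function names are hypothetical, and the response is assumed to have exactly the shape shown in the JSON example earlier in this section.

```python
import json

# Severity order used to pick the worst status: green < yellow < red.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def worst_status(provider_statuses):
    """Return the most severe status among a model's providers."""
    statuses = list(provider_statuses)
    if not statuses:
        raise ValueError("at least one provider status is required")
    return max(statuses, key=SEVERITY.__getitem__)


def unhealthy_models(response_body):
    """List the IDs of models whose reported status is not green."""
    payload = json.loads(response_body)
    return [model["id"] for model in payload["data"] if model["status"] != "green"]
```

A monitoring script could call `unhealthy_models` on the body returned by the endpoint and alert when the list is not empty.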