Prometheus
OpenGateLLM can expose metrics in Prometheus format for monitoring and observability. This allows you to track API performance, usage patterns, and system health in real time.
When Prometheus monitoring is enabled, OpenGateLLM exposes metrics at the /metrics endpoint in Prometheus format. These metrics can be scraped by a Prometheus server for visualization in tools like Grafana.
Metrics
Once enabled, you can access the metrics endpoint at:
    http://localhost:8000/metrics
This endpoint returns metrics in the Prometheus text-based exposition format, which can be scraped by your Prometheus server.
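To inspect the endpoint without a Prometheus server, the exposition text can be downloaded and parsed by hand. The following is a minimal sketch using only the Python standard library. The `fetch_metrics` and `parse_samples` helpers and the label values in `SAMPLE` are illustrative assumptions, not part of OpenGateLLM, and the parser deliberately ignores escaping rules and assumes label values contain no commas.

```python
import urllib.request

# Hypothetical sample of what the endpoint might return; label values are made up.
SAMPLE = """# HELP ogl_inference_requests_total Total number of LLM requests.
# TYPE ogl_inference_requests_total counter
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="my-model",status_code="200"} 42.0
"""


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> str:
    """Download the raw exposition text from the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse sample lines into (name, labels, value) tuples.

    Simplified on purpose: comment lines (# HELP, # TYPE) are skipped,
    optional timestamps are ignored, and label values are assumed to
    contain no commas or escaped quotes.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        labels: dict[str, str] = {}
        if "{" in line:
            name, _, remainder = line.partition("{")
            label_text, _, value_text = remainder.rpartition("}")
            for pair in label_text.split(","):
                if pair:
                    key, _, raw = pair.partition("=")
                    labels[key.strip()] = raw.strip().strip('"')
        else:
            name, _, value_text = line.partition(" ")
        # The first token after the series is the value; a timestamp may follow.
        value = float(value_text.split()[0])
        samples.append((name.strip(), labels, value))
    return samples
```

Calling `parse_samples(fetch_metrics())` against a running instance would return one tuple per exposed sample. For production use, a maintained parser such as the one shipped with the official Prometheus client library is a safer choice than this sketch.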
List of metrics
All default metrics of prometheus-fastapi-instrumentator are available; see its README for more information.
These metrics are prefixed by the namespace ogl_.
In addition, OpenGateLLM exposes the following metrics for inference:
| Metric | Type | Description |
|---|---|---|
| ogl_inference_requests_total | Counter | Total number of LLM requests (endpoint, model, status_code). |
| ogl_inference_requests_duration_seconds | Histogram | Duration of LLM requests in seconds (endpoint, model, status_code). |
| ogl_inference_ttft_milliseconds | Histogram | Time to first token for streaming responses in milliseconds (endpoint, model, status_code). |
| ogl_inference_output_tokens_per_second | Histogram | Output generation speed in tokens/second (endpoint, model). |
| ogl_inference_tokens_total | Counter | Total number of consumed tokens with `type=prompt |
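Prometheus histograms such as ogl_inference_requests_duration_seconds are exposed as cumulative `_bucket` series (one per `le` upper bound) plus `_sum` and `_count` series. The sketch below shows how a mean and an approximate quantile could be derived from those values. The bucket boundaries and counts are hypothetical, since the boundaries OpenGateLLM actually configures are not stated here, and the interpolation is a simplified approximation in the spirit of PromQL's `histogram_quantile`, not a reimplementation of it.

```python
import math


def mean_from_histogram(total_sum: float, total_count: float) -> float:
    """Mean observation, computed from the _sum and _count series."""
    if total_count == 0:
        return math.nan
    return total_sum / total_count


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank,
    assuming observations are spread evenly within each bucket and that the
    lowest bucket starts at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The +Inf bucket has no width; fall back to the last finite bound.
                return lower_bound
            if count == lower_count:
                return upper_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative bucket counts for request duration, in seconds.
EXAMPLE_BUCKETS = [(0.5, 10.0), (1.0, 30.0), (2.0, 40.0), (math.inf, 40.0)]
```

With these example values, `estimate_quantile(0.5, EXAMPLE_BUCKETS)` lands halfway through the 0.5 to 1.0 second bucket. On a real deployment the same calculation is usually left to the Prometheus server, which also handles rates over time windows.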
Grafana dashboard
Work in progress. Check our roadmap for more information.
Configuration
To enable Prometheus metrics exposure, configure the monitoring setting in the settings section of your config.yml file.
See the Settings section of the configuration file documentation for more information.
Example:
    settings:
      [...]
      monitoring_prometheus_enabled: true
By default, Prometheus monitoring is enabled. Set it to false to disable the /metrics endpoint.