
Prometheus

OpenGateLLM can expose metrics in Prometheus format for monitoring and observability. This allows you to track API performance, usage patterns, and system health in real-time.

When Prometheus monitoring is enabled, OpenGateLLM exposes metrics at the /metrics endpoint in Prometheus format. These metrics can be scraped by a Prometheus server for visualization in tools like Grafana.

Once enabled, you can access the metrics endpoint at:

http://localhost:8000/metrics

This endpoint returns metrics in Prometheus text-based exposition format, which can be scraped by your Prometheus server.
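For example, a Prometheus server could scrape this endpoint with a configuration like the following sketch. The job name, scrape interval, and target address are illustrative and depend on your deployment:

```yaml
scrape_configs:
  - job_name: "opengatellm"          # illustrative job name
    scrape_interval: 15s             # adjust to your needs
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8000"]  # OpenGateLLM host:port
```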

All default metrics from prometheus-fastapi-instrumentator are available; see its README for more information. These metrics are prefixed with the `ogl_` namespace.

In addition, OpenGateLLM exposes the following metrics for inference:

| Metric | Type | Description |
| --- | --- | --- |
| `ogl_inference_requests_total` | Counter | Total number of LLM requests (labels: `endpoint`, `model`, `status_code`). |
| `ogl_inference_requests_duration_seconds` | Histogram | Duration of LLM requests in seconds (labels: `endpoint`, `model`, `status_code`). |
| `ogl_inference_ttft_milliseconds` | Histogram | Time to first token for streaming responses, in milliseconds (labels: `endpoint`, `model`, `status_code`). |
| `ogl_inference_output_tokens_per_second` | Histogram | Output generation speed in tokens per second (labels: `endpoint`, `model`). |
| `ogl_inference_tokens_total` | Counter | Total number of consumed tokens with `type=prompt` […] |
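As a minimal sketch of how these counters can be consumed, the following standard-library Python snippet parses Prometheus text exposition output and sums request counts across label sets. The sample payload, model name, and values are illustrative (real data comes from GET /metrics); in production, the official `prometheus_client` parser is a more robust choice.

```python
# Illustrative sample of Prometheus text exposition output; real values
# come from scraping the /metrics endpoint.
SAMPLE = """\
# HELP ogl_inference_requests_total Total number of LLM requests
# TYPE ogl_inference_requests_total counter
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="llama-3",status_code="200"} 42.0
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="llama-3",status_code="500"} 3.0
"""

def parse_counters(text: str) -> dict[str, float]:
    """Map each sample line (metric name + labels) to its numeric value."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE metadata
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples

counters = parse_counters(SAMPLE)
# Sum requests across all label combinations of one metric.
total = sum(v for k, v in counters.items()
            if k.startswith("ogl_inference_requests_total"))
print(total)  # 45.0
```

Note this simple `rpartition`-based split assumes label values contain no spaces; the official parser handles the full exposition grammar.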

Work in progress; check our roadmap for more information.

To enable Prometheus metrics exposure, configure the monitoring setting in the settings section of your config.yml file.
See the Settings section of the configuration file documentation for more information.

Example:

settings:
  [...]
  monitoring_prometheus_enabled: true

By default, Prometheus monitoring is enabled. Set monitoring_prometheus_enabled to false to disable the /metrics endpoint.
