
Prometheus

OpenGateLLM can expose metrics in Prometheus format for monitoring and observability. This allows you to track API performance, usage patterns, and system health in real-time.

When Prometheus monitoring is enabled, OpenGateLLM exposes metrics at the /metrics endpoint in Prometheus format. These metrics can be scraped by a Prometheus server for visualization in tools like Grafana.

Once enabled, you can access the metrics endpoint at:

http://localhost:8000/metrics

This endpoint returns metrics in Prometheus text-based exposition format, which can be scraped by your Prometheus server.
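For example, a Prometheus server could scrape this endpoint with a configuration like the following sketch. The job name, scrape interval, and target address are illustrative and depend on your deployment:

```yaml
scrape_configs:
  - job_name: "opengatellm"          # illustrative job name
    scrape_interval: 15s             # adjust to your needs
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8000"]  # OpenGateLLM host:port
```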

All default metrics from prometheus-fastapi-instrumentator are available; see its README for more information. These metrics are prefixed with the `ogl_` namespace.

In addition, OpenGateLLM exposes the following metrics for inference:

| Metric | Type | Description |
| --- | --- | --- |
| `ogl_inference_requests_total` | Counter | Total number of LLM requests (labels: `endpoint`, `model`, `status_code`). |
| `ogl_inference_requests_duration_seconds` | Histogram | Duration of LLM requests in seconds (labels: `endpoint`, `model`, `status_code`). |
| `ogl_inference_ttft_milliseconds` | Histogram | Time to first token for streaming responses, in milliseconds (labels: `endpoint`, `model`, `status_code`). |
| `ogl_inference_output_tokens_per_second` | Histogram | Output generation speed in tokens per second (labels: `endpoint`, `model`). |
| `ogl_inference_tokens_total` | Counter | Total number of consumed tokens with `type=prompt` […] |
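As a minimal sketch of how these counters can be consumed, the following standard-library Python snippet parses Prometheus text exposition output and sums request counts across label sets. The sample payload, model name, and values are illustrative (real data comes from GET /metrics); in production, the official `prometheus_client` parser is a more robust choice.

```python
# Illustrative sample of Prometheus text exposition output; real values
# come from scraping the /metrics endpoint.
SAMPLE = """\
# HELP ogl_inference_requests_total Total number of LLM requests
# TYPE ogl_inference_requests_total counter
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="llama-3",status_code="200"} 42.0
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="llama-3",status_code="500"} 3.0
"""

def parse_counters(text: str) -> dict[str, float]:
    """Map each sample line (metric name + labels) to its numeric value."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE metadata
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples

counters = parse_counters(SAMPLE)
# Sum requests across all label combinations of one metric.
total = sum(v for k, v in counters.items()
            if k.startswith("ogl_inference_requests_total"))
print(total)  # 45.0
```

Note this simple `rpartition`-based split assumes label values contain no spaces; the official parser handles the full exposition grammar.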

Work in progress; check our roadmap for more information.

To enable Prometheus metrics exposure, configure the monitoring setting in the settings section of your config.yml file.
See the Settings section of the configuration file documentation for more information.

Example:

settings:
  [...]
  monitoring_prometheus_enabled: true

By default, Prometheus monitoring is enabled. Set monitoring_prometheus_enabled to false to disable the /metrics endpoint.
