
User budget

OpenGateLLM allows you to define the costs for each model router (for more information about model routers, see the Setup your models documentation). It then attaches a budget to each user to limit that user's usage. The compute cost of a request is calculated from the number of tokens used and the costs defined for the model router, using the following formula:

cost = round((prompt_tokens / 1000000 * router.costs.prompt_tokens) + (completion_tokens / 1000000 * router.costs.completion_tokens), ndigits=6)
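The formula can be written as a small Python function. This is a minimal sketch for checking costs by hand: the function name and parameter names are illustrative, and the prices in the usage line (2.0 per million prompt tokens, 6.0 per million completion tokens) are hypothetical.

```python
def compute_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_cost_per_million: float,
                 completion_cost_per_million: float) -> float:
    """Return the request cost, rounded to 6 decimal places as in the formula."""
    prompt_part = prompt_tokens / 1_000_000 * prompt_cost_per_million
    completion_part = completion_tokens / 1_000_000 * completion_cost_per_million
    return round(prompt_part + completion_part, ndigits=6)


# Hypothetical router prices: 2.0 per million prompt tokens, 6.0 per million completion tokens.
print(compute_cost(1500, 500, 2.0, 6.0))  # → 0.006
```

Because both costs are expressed per million tokens, 1,500 prompt tokens contribute 0.003 and 500 completion tokens contribute another 0.003.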

The compute cost is returned in the response, in the usage.cost field. After the request is processed, the user's budget is updated by the hooks decorator attached to each endpoint. The request cost is also stored in the usage table; see the usage monitoring documentation for more information.
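To illustrate the idea of a hook that updates the budget after each request, here is a simplified sketch. It is not OpenGateLLM's actual hooks decorator: the in-memory `BUDGETS` dictionary, the `budget_hook` decorator, the `InsufficientBudgetError` exception, and the stand-in `chat_completions` endpoint are all hypothetical, and we assume that requests are rejected once a user's remaining budget reaches zero.

```python
import functools

# Hypothetical in-memory store of remaining budgets, keyed by user ID.
BUDGETS: dict[str, float] = {"alice": 0.01}


class InsufficientBudgetError(Exception):
    """Raised (in this sketch) when a user has no budget left."""


def budget_hook(endpoint):
    """Hypothetical decorator: reject the request if the budget is exhausted,
    otherwise run the endpoint and subtract the returned usage.cost."""
    @functools.wraps(endpoint)
    def wrapper(user_id: str, *args, **kwargs):
        if BUDGETS.get(user_id, 0.0) <= 0:
            raise InsufficientBudgetError(f"budget exhausted for {user_id}")
        response = endpoint(user_id, *args, **kwargs)
        cost = response["usage"]["cost"]
        # Update the budget only after the request has been processed.
        BUDGETS[user_id] = round(BUDGETS[user_id] - cost, 6)
        return response
    return wrapper


@budget_hook
def chat_completions(user_id: str, prompt: str) -> dict:
    # Stand-in endpoint that always reports the same usage block.
    return {"usage": {"prompt_tokens": 10, "completion_tokens": 20, "cost": 0.000015}}


chat_completions("alice", "Hello")
print(BUDGETS["alice"])  # → 0.009985
```

Doing the deduction in a decorator keeps the accounting logic out of each endpoint body, which is the general pattern the hooks decorator follows.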

There are three ways to configure the model pricing used for budget computation: the Playground UI, the API, or the configuration file.

To define pricing in the Playground, go to the Provider page and create or edit a provider with:

  • Prompt token cost: Cost per million input tokens.
  • Completion token cost: Cost per million output tokens.

Each user's budget is set with the create user endpoint or the update user endpoint, in the budget field. You need admin permission to create or update a user.

See the POST and PATCH /v1/admin/users endpoints in the API reference for more information.
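As a rough sketch, an admin client could build a budget update request with the Python standard library. Several details here are assumptions to check against the API reference: that the PATCH endpoint takes the user ID as a path segment (/v1/admin/users/{user_id}), that the admin API key is sent as a Bearer token, and that a body containing only the budget field is accepted. The base URL, key, and user ID are placeholders.

```python
import json
import urllib.request


def build_budget_update_request(base_url: str, admin_api_key: str,
                                user_id: int, budget: float) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets a user's budget.

    Assumed: user ID as a path segment, Bearer authentication,
    and a JSON body with only the "budget" field.
    """
    url = f"{base_url.rstrip('/')}/v1/admin/users/{user_id}"
    body = json.dumps({"budget": budget}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {admin_api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; urllib.request.urlopen(req) would actually send the request.
req = build_budget_update_request("http://localhost:8000", "ADMIN_KEY", 42, 10.0)
print(req.get_method(), req.full_url)  # → PATCH http://localhost:8000/v1/admin/users/42
```

Creating a user with POST works the same way, with the budget field included alongside whatever other fields the create user endpoint requires.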

Users can see the cost of each request in the usage.cost field of the API response. In addition, the Usage page in the Playground shows the history of a user's requests and the cost of each one. For example, a chat completion response looks like this:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "my-language-model",
  "choices": [
    ...
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30,
    "cost": 0.000015,
    "carbon": {"kWh": 0.0001456, "kgCO2eq": 0.0000672}
  }
}
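A client can read the cost from the usage object and, if it knows the router prices, check it against the formula. In this sketch, the router prices of 0.5 per million tokens for both prompt and completion tokens are hypothetical. They were chosen only because they reproduce the example cost, and the real values are whatever the router for "my-language-model" is configured with.

```python
import json

# The example response, trimmed to the fields used here.
raw = """{"model": "my-language-model",
          "usage": {"prompt_tokens": 10, "completion_tokens": 20,
                    "total_tokens": 30, "cost": 0.000015}}"""

usage = json.loads(raw)["usage"]

# Hypothetical router prices, per million tokens.
PROMPT_COST, COMPLETION_COST = 0.5, 0.5

expected = round(usage["prompt_tokens"] / 1_000_000 * PROMPT_COST
                 + usage["completion_tokens"] / 1_000_000 * COMPLETION_COST, ndigits=6)
print(usage["cost"], expected == usage["cost"])  # → 1.5e-05 True
```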