Provider Routing

Route requests to the best provider

OneRouter routes requests to the best available providers for your model.

You can customize how your requests are routed using the provider object in the request body for LLM and generative model requests.

The provider object can contain the following fields:

Field            Type     Default  Description
allow_fallbacks  boolean  true     Whether to allow backup providers when the primary is unavailable.
sort             string   -        Sort providers by price or throughput (e.g. "price" or "throughput").
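For illustration, here is a minimal sketch that combines both fields in a single request. It follows the OpenAI-SDK pattern used in the examples below; extra_body is the SDK's mechanism for passing non-standard fields such as provider, and the model name is taken from those examples.

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

# Sketch: set both provider fields at once.
completion = client.chat.completions.create(
  model="claude-3-5-sonnet@20240620",
  messages=[{"role": "user", "content": "What is the meaning of life?"}],
  extra_body={
    "provider": {
      "allow_fallbacks": True,   # default; set False to disable backups
      "sort": "throughput"       # or "price"; omit to keep load balancing
    }
  }
)

print(completion.choices[0].message.content)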

Uptime-Based Load Balancing (Default Strategy)

By default, requests are load balanced across the top providers to maximize uptime.

Price-Based Load Balancing

For each model in your request, OneRouter can load balance requests across providers, prioritizing price.

If you are more sensitive to throughput than price, you can use the sort field to explicitly prioritize throughput.

Here is OneRouter's default load balancing strategy:

  1. Prioritize providers that have not seen significant outages in the last 30 seconds.

  2. For the stable providers, look at the lowest-cost candidates and select one weighted by the inverse square of the price (see the example and sketch below).

  3. Use the remaining providers as fallbacks.

A Load Balancing Example

Suppose Provider A costs $1 per million tokens, Provider B costs $2, Provider C costs $3, and Provider B recently saw a few outages. Then:

  • Your request is routed to Provider A first in most cases: with inverse-square weighting, A's weight is 1/1² = 1 while Provider C's is 1/3² = 1/9, so A is 9x more likely than C to be tried first.

  • If Provider A fails, then Provider C will be tried next.

  • If Provider C also fails, Provider B will be tried last.
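To make the arithmetic concrete, here is a small sketch of the inverse-square weighting described above. The prices are the hypothetical ones from this example, and Provider B is excluded as unstable.

import random

# Inverse-square price weighting over the stable providers.
# Provider B is excluded because of its recent outages.
prices = {"A": 1.0, "C": 3.0}  # $ per million tokens (hypothetical)
weights = {p: 1 / cost ** 2 for p, cost in prices.items()}  # A: 1.0, C: 1/9

total = sum(weights.values())
probabilities = {p: w / total for p, w in weights.items()}
print(probabilities)  # {'A': 0.9, 'C': 0.1} -> A is 9x more likely than C

# Weighted pick of the first provider to try.
first = random.choices(list(weights), weights=list(weights.values()), k=1)[0]
print(first)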

If you have sort set in your provider preferences, load balancing will be disabled.

To always prioritize low prices, and not apply any load balancing, set sort to "price".

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="claude-3-5-sonnet@20240620",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  # "provider" is not a standard OpenAI SDK argument, so it is passed
  # through extra_body and sent as part of the request body.
  extra_body={
    "provider": {
      "sort": "price"
    }
  }
)

print(completion.choices[0].message.content)

To always prioritize high throughput, and not apply any load balancing, set sort to "throughput".

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="claude-3-5-sonnet@20240620",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  # "provider" is passed through extra_body so the SDK includes it
  # in the request body.
  extra_body={
    "provider": {
      "sort": "throughput"
    }
  }
)

print(completion.choices[0].message.content)

Disabling Fallbacks

To guarantee that your request is only served by the first-tried provider, you can disable fallbacks.

from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="claude-3-5-sonnet@20240620",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  # Passed via extra_body; note Python's False (the JSON body will
  # carry "allow_fallbacks": false).
  extra_body={
    "provider": {
      "allow_fallbacks": False
    }
  }
)

print(completion.choices[0].message.content)
