Provider Routing
Route requests to the best provider
OneRouter routes requests to the best available providers for your model.
You can customize how your requests are routed using the provider object in the request body for LLM and generative model requests.
The provider object can contain the following fields:
allow_fallbacks (boolean, default: true): Whether to allow backup providers when the primary is unavailable.
sort (string, default: none): Sort providers by price or throughput (e.g. "price" or "throughput").
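For example, a provider object that sets both fields might look like the following (a minimal sketch; the values and the provider_preferences variable name are purely illustrative):

provider_preferences = {
    "allow_fallbacks": True,   # fall back to other providers if the primary is unavailable
    "sort": "throughput",      # or "price"
}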
Uptime-Based Load Balancing (Default Strategy)
By default, requests are load balanced across the top providers to maximize uptime.
Price-Based Load Balancing
For each model in your request, OneRouter can load balance requests across providers, prioritizing price.
If you are more sensitive to throughput than price, you can use the sort field to explicitly prioritize throughput.
Here is OneRouter's default load balancing strategy:
1. Prioritize providers that have not seen significant outages in the last 30 seconds.
2. For the stable providers, look at the lowest-cost candidates and select one weighted by the inverse square of its price (example below).
3. Use the remaining providers as fallbacks.
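As a rough sketch of that weighting (with made-up prices, not real provider data), three stable providers priced at 1, 2, and 4 dollars per million tokens would be selected with roughly 76%, 19%, and 5% probability:

# Illustrative inverse-square price weighting, not OneRouter's actual implementation
prices = [1.0, 2.0, 4.0]                       # hypothetical prices per million tokens
weights = [1 / p ** 2 for p in prices]         # inverse square of the price
total = sum(weights)
print([round(w / total, 2) for w in weights])  # [0.76, 0.19, 0.05]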
If you have sort set in your provider preferences, load balancing will be disabled.
To always prioritize low prices, and not apply any load balancing, set sort to "price".
Python

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<API_KEY>",
)

completion = client.chat.completions.create(
    model="claude-3-5-sonnet@20240620",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ],
    # The OpenAI SDK has no provider argument of its own, so the routing
    # preferences are sent through extra_body.
    extra_body={
        "provider": {
            "sort": "price"
        }
    },
)

print(completion.choices[0].message.content)

JavaScript

import OpenAI from 'openai';
const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    provider: {
      sort: 'price',
    },
  });

  console.log(completion.choices[0].message);
}

main();

To always prioritize high throughput, and not apply any load balancing, set sort to "throughput".
Python

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<API_KEY>",
)

completion = client.chat.completions.create(
    model="claude-3-5-sonnet@20240620",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ],
    # Routing preferences go through extra_body, as above.
    extra_body={
        "provider": {
            "sort": "throughput"
        }
    },
)

print(completion.choices[0].message.content)

JavaScript

import OpenAI from 'openai';
const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    provider: {
      sort: 'throughput',
    },
  });

  console.log(completion.choices[0].message);
}

main();

Disabling Fallbacks
To guarantee that your request is only served by the first-tried provider, you can disable fallbacks.
Python

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<API_KEY>",
)

completion = client.chat.completions.create(
    model="claude-3-5-sonnet@20240620",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ],
    # Routing preferences go through extra_body, as above.
    extra_body={
        "provider": {
            "allow_fallbacks": False
        }
    },
)

print(completion.choices[0].message.content)

JavaScript

import OpenAI from 'openai';
const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
    provider: {
      allow_fallbacks: false,
    },
  });

  console.log(completion.choices[0].message);
}
main();