Model Routing
Dynamically route requests to models
Using the fallback_models & fallback_rules parameters
fallback_models
The fallback_models parameter lets you automatically try other models when the primary model's providers are unavailable or degraded, for example when:
The endpoint is down: e.g. a 400/500/504/503/508/524 error code is returned.
Streaming conversations are lagging: end-to-end (E2E) latency suddenly increases abnormally while TPM (Transactions Per Minute) unexpectedly drops.
The model refuses to reply due to content moderation.
Validation errors occur: e.g. invalid input parameters or context length validation errors.
{
  "model": "gemini-2.5-pro",
  "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
  "fallback_rules": "auto", // default value is "auto"
  ... // Other params
}
If the fallback_rules parameter is set to "auto" or "", or if the parameter isn't passed at all, OneRouter automatically calculates baseline metrics from your historical data and continuously makes dynamic decisions about whether a model fallback is needed.
fallback_rules
If you need more granular control over the fallback switching strategy, or want a strategy that better fits your business needs, you can explicitly specify the fallback_rules parameter in your request.
{
  "model": "gemini-2.5-pro",
  "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
  "fallback_rules": {
    "error_code": {
      "hint_array": [400, 500, 504, 503, 508, 524],
      "action": "fallback"
    },
    "Latency": {
      "hint_threshold": 500,
      "action": "fallback"
    },
    "TTFT": {
      "hint_threshold": 1000,
      "action": "fallback"
    },
    "TPM": {
      "hint_threshold": 100,
      "action": "fallback"
    },
    "RPM": {
      "hint_threshold": 100,
      "action": "fallback"
    }
  }
}
Processing pipeline
If the primary model you selected returns an error (the endpoint is down, or it refuses to reply due to content moderation), streaming lags, or a context length validation error occurs, OneRouter will try the fallback_models instead.
If all the fallback_models are down or return errors, OneRouter will return that error.
Requests are priced using the model that was ultimately used, which is returned in the model attribute of the response body.
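Because pricing follows the model that actually served the request, it can be useful to log that model. Below is a minimal sketch using plain HTTP, assuming the OpenAI-compatible /chat/completions path on the base URL shown in the SDK examples, and that the response body mirrors the OpenAI schema; the API key placeholder is illustrative.

import requests

# Send a request with fallback_models; replace YOUR_API_KEY with a real key.
response = requests.post(
    "https://llm.onerouter.pro/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gemini-2.5-pro",
        "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
        "messages": [{"role": "user", "content": "What is the meaning of life?"}],
    },
)
data = response.json()

# The model attribute reports which model ultimately answered,
# i.e. the model the request is priced against.
print(data["model"])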
Using the OpenAI SDK
To use fallback_models and fallback_rules with the OpenAI SDK, include them in the extra_body parameter. In the example below, gemini-2.5-pro is tried first, and the models in the fallback_models array are tried in order as fallbacks.
from openai import OpenAI

openai_client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="{{API_KEY}}",
)

completion = openai_client.chat.completions.create(
    model="gemini-2.5-pro",
    extra_body={
        "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
    },
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)

The same request with the TypeScript SDK:

import OpenAI from 'openai';
const openrouterClient = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  // API key and headers
});

async function main() {
  // @ts-expect-error
  const completion = await openrouterClient.chat.completions.create({
    model: 'gemini-2.5-pro',
    fallback_models: ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });
  console.log(completion.choices[0].message);
}
main();
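The SDK examples above pass only fallback_models; explicit fallback_rules travel through extra_body in the same way. Here is a sketch under that assumption; the threshold values are illustrative, taken from the fallback_rules example above, not recommended defaults.

from openai import OpenAI

openai_client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="{{API_KEY}}",
)

# fallback_rules rides along in extra_body next to fallback_models.
completion = openai_client.chat.completions.create(
    model="gemini-2.5-pro",
    extra_body={
        "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
        "fallback_rules": {
            "error_code": {"hint_array": [500, 503, 504], "action": "fallback"},
            "Latency": {"hint_threshold": 500, "action": "fallback"},
        },
    },
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
)

# The response's model attribute names the model that actually served the
# request, which is also the model used for pricing.
print(completion.model)
print(completion.choices[0].message.content)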