Model Routing & Fallbacks
Automatic failover between models.
OneRouter Smart Routing — Intelligent Optimization for Global AI API Access
OneRouter Smart Routing acts as an intelligent router between your application and global AI API providers. Its architecture consists of two key modules:
Global Health Monitoring Module
This module continuously monitors the health status of AI API providers across different regions and time zones. By collecting real-time data on availability and performance, OneRouter maintains a global view of each provider's stability and service quality.
Reinforcement Learning‑Driven Routing Engine
Based on historical conversion data and the real‑time health metrics of various AI API providers, this engine evaluates multiple factors such as price, TPM (tokens per minute), RPM (requests per minute), and latency. Using reinforcement learning, it automatically generates an optimized candidate routing list every five minutes.
When Smart Routing is enabled, OneRouter dynamically directs each request to the most cost‑effective and stable model, ensuring users get the best balance between performance and budget.
By acting as a flexible, intelligent scheduling layer between clients and AI service providers, OneRouter can help businesses reduce operational costs by up to 90% while significantly improving overall performance and reliability.

How It Works
The Model Routing & Fallbacks feature lets you automatically try other models if the primary model’s providers are down, rate-limited, or refuse to reply due to content moderation.
fallback_models
The fallback_models parameter lets you automatically try other models when the primary model's providers:
- Are down: the endpoint returns an error code such as 400/500/503/504/508/524.
- Are lagging on streaming conversations: end-to-end (E2E) latency suddenly increases abnormally while TPM (tokens per minute) unexpectedly drops.
- Refuse to reply due to content moderation.
- Return validation errors: e.g. invalid input parameters or context-length validation errors.
```json
{
  "model": "gemini-2.5-flash",
  "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
  "fallback_rules": "auto" // default value is "auto"
  ... // Other params
}
```

If the fallback_rules parameter is set to "auto" or "", or if the parameter isn't passed at all, OneRouter will automatically calculate baseline metrics from your historical data and continuously make dynamic decisions about whether a model fallback is needed.
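For reference, the same request can be sketched as a raw JSON payload posted to the chat completions endpoint (a sketch: the base URL is taken from the SDK examples below, and everything besides the two fallback fields is a standard Chat Completions body):

```python
import json

# Sketch of a request body for POST https://llm.onerouter.pro/v1/chat/completions.
# The fallback_models / fallback_rules field names mirror the JSON example above.
payload = {
    "model": "gemini-2.5-flash",
    "fallback_models": [
        "gemini-2.5-flash",
        "grok-4-fast-non-reasoning",
        "qwen3-next-80b-a3b-instruct",
    ],
    "fallback_rules": "auto",  # same effect as "" or omitting the field
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
}
print(json.dumps(payload, indent=2))
```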
fallback_rules
If you need more granular control over your model fallback switching strategy or want to create a strategy that better fits your business needs, you can explicitly specify the fallback_rules parameter in your input.
```json
{
  "model": "gemini-2.5-flash",
  "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
  "fallback_rules": {
    "Error_code": [400, 500, 504, 503, 508, 524],
    "Latency_threshold": 500,
    "TTFT_threshold": 1000,
    "TPM_threshold": 100,
    "RPM_threshold": 100
  }
}
```

Fallback Behavior
If the model you selected returns an error, OneRouter will try to use the fallback model instead.
By default, any error can trigger the use of a fallback model, including:
- Context-length validation errors
- Moderation flags for filtered models
- Rate-limiting
- Downtime
- Lagging streaming conversations
If the fallback model is down or returns an error, OneRouter will return that error.
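Conceptually, a custom fallback_rules object can be read as a per-request check like the following sketch. This is an illustration, not OneRouter's actual server-side logic, and the units for Latency_threshold and TTFT_threshold are assumed to be milliseconds:

```python
# Error codes from the fallback_rules example above, used when none are configured.
DEFAULT_ERROR_CODES = {400, 500, 503, 504, 508, 524}

def should_fall_back(status_code, latency_ms, ttft_ms, tpm, rpm, rules):
    """Return True if any configured fallback rule is violated (sketch)."""
    if status_code in set(rules.get("Error_code", DEFAULT_ERROR_CODES)):
        return True  # endpoint returned a failing status code
    if latency_ms > rules.get("Latency_threshold", float("inf")):
        return True  # end-to-end latency rose above the threshold
    if ttft_ms > rules.get("TTFT_threshold", float("inf")):
        return True  # time-to-first-token rose above the threshold
    if tpm < rules.get("TPM_threshold", 0):
        return True  # token throughput dropped below the threshold
    if rpm < rules.get("RPM_threshold", 0):
        return True  # request throughput dropped below the threshold
    return False

rules = {"Error_code": [500, 504], "Latency_threshold": 500}
print(should_fall_back(200, 800, 120, 1000, 50, rules))  # True: latency over threshold
```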
Pricing
Requests are priced using the model that was ultimately used, which will be returned in the model attribute of the response body.
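You can therefore detect that a fallback happened by comparing the response's model field to the model you requested. A minimal sketch (the response body is abbreviated and its values are illustrative):

```python
requested_model = "gemini-2.5-flash"

# Illustrative response body after a fallback occurred (abbreviated).
response_body = {
    "model": "grok-4-fast-non-reasoning",  # the model that actually answered — and was billed
    "choices": [{"message": {"role": "assistant", "content": "42."}}],
}

served_model = response_body["model"]
if served_model != requested_model:
    print(f"Fallback occurred: priced as {served_model}, not {requested_model}")
```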
Using with OpenAI SDK
To use fallback_models and fallback_rules with the OpenAI SDK, include them in the extra_body parameter. In the example below, gemini-2.5-flash will be tried first, and the models in the fallback_models array will be tried in order as fallbacks.
```python
from openai import OpenAI

openai_client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="{{API_KEY}}",
)

completion = openai_client.chat.completions.create(
    model="gemini-2.5-flash",
    extra_body={
        "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
        "fallback_rules": "auto",
    },
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?",
        }
    ],
)

print(completion.choices[0].message.content)
```

```typescript
import OpenAI from 'openai';

const onerouterClient = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  // API key and headers
});

async function main() {
  // @ts-expect-error — fallback_models and fallback_rules are OneRouter extensions
  const completion = await onerouterClient.chat.completions.create({
    model: 'gemini-2.5-flash',
    fallback_models: ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
    fallback_rules: "auto",
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });
  console.log(completion.choices[0].message);
}

main();