Model Routing & Fallbacks
Automatic failover between models.
OneRouter Smart Routing — Intelligent Optimization for Global AI API Access
OneRouter Smart Routing acts as an intelligent router between your application and global AI API providers. Its architecture consists of two key modules:
Global Health Monitoring Module
This module continuously monitors the health status of AI API providers across different regions and time zones. By collecting real-time data on availability and performance, OneRouter maintains a global view of each provider's stability and service quality.
Reinforcement Learning‑Driven Routing Engine
Based on historical conversion data and the real‑time health metrics of various AI API providers, this engine evaluates multiple factors such as price, TPM (tokens per minute), RPM (requests per minute), and latency. Using reinforcement learning, it automatically generates an optimized candidate routing list every five minutes.
When Smart Routing is enabled, OneRouter dynamically directs each request to the most cost‑effective and stable model, ensuring users get the best balance between performance and budget.
By acting as a flexible, intelligent scheduling layer between clients and AI service providers, OneRouter can help businesses reduce operational costs by up to 90% while significantly improving overall performance and reliability.

How It Works
The Model Routing & Fallbacks feature lets you automatically try other models if the primary model’s providers are down, rate-limited, or refuse to reply due to content moderation.
fallback_models
The fallback_models parameter lets you automatically try other models when the primary model's providers:
- Are down: the endpoint returns an error code such as 400/500/503/504/508/524.
- Are lagging on streaming conversations: end-to-end (E2E) latency suddenly increases abnormally while TPM (tokens per minute) unexpectedly drops.
- Refuse to reply due to content moderation.
- Return validation errors: e.g. invalid input parameters or context-length validation errors.
```json
{
  "model": "gemini-2.5-flash",
  "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
  "fallback_rules": "auto" // default value is "auto"
  ... // Other params
}
```

If the fallback_rules parameter is set to "auto" or "", or if the parameter isn't passed at all, OneRouter will automatically calculate baseline metrics from your historical data and continuously make dynamic decisions about whether a model fallback is needed.
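For reference, the same request can be sketched as a raw JSON payload posted to the chat completions endpoint (a sketch: the base URL is taken from the SDK examples below, and everything besides the two fallback fields is a standard Chat Completions body):

```python
import json

# Sketch of a request body for POST https://llm.onerouter.pro/v1/chat/completions.
# The fallback_models / fallback_rules field names mirror the JSON example above.
payload = {
    "model": "gemini-2.5-flash",
    "fallback_models": [
        "gemini-2.5-flash",
        "grok-4-fast-non-reasoning",
        "qwen3-next-80b-a3b-instruct",
    ],
    "fallback_rules": "auto",  # same effect as "" or omitting the field
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
}
print(json.dumps(payload, indent=2))
```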
fallback_rules
If you need more granular control over your model fallback switching strategy or want to create a strategy that better fits your business needs, you can explicitly specify the fallback_rules parameter in your input.
```json
{
  "model": "gemini-2.5-flash",
  "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
  "fallback_rules": {
    "Error_code": [400, 500, 504, 503, 508, 524],
    "Latency_threshold": 500,
    "TTFT_threshold": 1000,
    "TPM_threshold": 100,
    "RPM_threshold": 100
  }
}
```

Fallback Behavior
If the model you selected returns an error, OneRouter will try to use the fallback model instead.
By default, any error can trigger the use of a fallback model, including:
- Context-length validation errors
- Moderation flags for filtered models
- Rate-limiting
- Downtime
- Lagging streaming conversations
If the fallback model is down or returns an error, OneRouter will return that error.
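Conceptually, a custom fallback_rules object can be read as a per-request check like the following sketch. This is an illustration, not OneRouter's actual server-side logic, and the units for Latency_threshold and TTFT_threshold are assumed to be milliseconds:

```python
# Error codes from the fallback_rules example above, used when none are configured.
DEFAULT_ERROR_CODES = {400, 500, 503, 504, 508, 524}

def should_fall_back(status_code, latency_ms, ttft_ms, tpm, rpm, rules):
    """Return True if any configured fallback rule is violated (sketch)."""
    if status_code in set(rules.get("Error_code", DEFAULT_ERROR_CODES)):
        return True  # endpoint returned a failing status code
    if latency_ms > rules.get("Latency_threshold", float("inf")):
        return True  # end-to-end latency rose above the threshold
    if ttft_ms > rules.get("TTFT_threshold", float("inf")):
        return True  # time-to-first-token rose above the threshold
    if tpm < rules.get("TPM_threshold", 0):
        return True  # token throughput dropped below the threshold
    if rpm < rules.get("RPM_threshold", 0):
        return True  # request throughput dropped below the threshold
    return False

rules = {"Error_code": [500, 504], "Latency_threshold": 500}
print(should_fall_back(200, 800, 120, 1000, 50, rules))  # True: latency over threshold
```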
Pricing
Requests are priced using the model that was ultimately used, which will be returned in the model attribute of the response body.
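You can therefore detect that a fallback happened by comparing the response's model field to the model you requested. A minimal sketch (the response body is abbreviated and its values are illustrative):

```python
requested_model = "gemini-2.5-flash"

# Illustrative response body after a fallback occurred (abbreviated).
response_body = {
    "model": "grok-4-fast-non-reasoning",  # the model that actually answered — and was billed
    "choices": [{"message": {"role": "assistant", "content": "42."}}],
}

served_model = response_body["model"]
if served_model != requested_model:
    print(f"Fallback occurred: priced as {served_model}, not {requested_model}")
```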
Using with OpenAI SDK
To use fallback_models and fallback_rules with the OpenAI SDK, include them in the extra_body parameter. In the example below, gemini-2.5-flash will be tried first, and the models in the fallback_models array will be tried in order as fallbacks.
```python
from openai import OpenAI

openai_client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="{{API_KEY}}",
)

completion = openai_client.chat.completions.create(
    model="gemini-2.5-flash",
    extra_body={
        "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
        "fallback_rules": "auto",
    },
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?",
        }
    ],
)

print(completion.choices[0].message.content)
```

```typescript
import OpenAI from 'openai';

const onerouterClient = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  // API key and headers
});

async function main() {
  // @ts-expect-error — fallback_models and fallback_rules are OneRouter extensions
  const completion = await onerouterClient.chat.completions.create({
    model: 'gemini-2.5-flash',
    fallback_models: ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
    fallback_rules: "auto",
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });
  console.log(completion.choices[0].message);
}

main();