Model Routing

Dynamically route requests to models

Using the fallback_models & fallback_rules parameters

fallback_models

The fallback_models parameter lets you automatically try other models when the primary model's provider runs into trouble, for example:

  • The URL endpoint is down: e.g. a 400/500/504/503/508/524 error code.

  • Streaming conversations are lagging: end-to-end (E2E) latency suddenly increases abnormally while TPM (Transactions Per Minute) unexpectedly drops.

  • The model refuses to reply due to content moderation.

  • Validation errors: e.g. invalid input parameters or context-length validation errors.

{
  "model": "gemini-2.5-pro",
  "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
  "fallback_rules": "auto"  // default value is "auto"
  ... // Other params
}

model: The primary model.

fallback_models: The fallback model list.

fallback_rules: Rules that determine whether to trigger a model fallback; the default value is "auto".

If the fallback_rules parameter is set to "auto", set to an empty string (""), or not passed at all, OneRouter will automatically calculate baseline metrics from your historical data and continuously make dynamic decisions about whether a model fallback is needed.
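For illustration, a raw HTTP request that relies on automatic fallback might look like this (a minimal Python sketch; the /chat/completions path and Bearer authentication are assumptions based on the OpenAI-compatible setup shown in the SDK section below):

import requests

API_KEY = "{{API_KEY}}"  # your OneRouter API key

payload = {
    "model": "gemini-2.5-pro",
    "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
    # Any of the following is equivalent and enables automatic fallback:
    #   "fallback_rules": "auto"
    #   "fallback_rules": ""
    #   (omit the key entirely)
    "messages": [{"role": "user", "content": "Hello"}],
}

resp = requests.post(
    "https://llm.onerouter.pro/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
print(resp.json()["model"])  # the model that ultimately served the request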

fallback_rules

If you need more granular control over the fallback-switching strategy, or want a strategy that better fits your business needs, you can explicitly specify the fallback_rules parameter in your request.

{
  "model": "gemini-2.5-pro",
  "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
  "fallback_rules": {
    "error_code": {
      "hint_array": [400, 500, 504, 503, 508, 524],  // status codes that trigger fallback
      "action": "fallback"
    },
    "Latency": {
      "hint_threshold": 500,  // fall back when end-to-end (E2E) latency rises above this
      "action": "fallback"
    },
    "TTFT": {
      "hint_threshold": 1000,  // Time To First Token threshold
      "action": "fallback"
    },
    "TPM": {
      "hint_threshold": 100,  // fall back when Transactions Per Minute drops below this
      "action": "fallback"
    },
    "RPM": {
      "hint_threshold": 100,  // fall back when Requests Per Minute drops below this
      "action": "fallback"
    }
  }
}
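Reading the example above: error_code rules match on response status codes, while the metric rules compare a live measurement against hint_threshold. Below is a conceptual Python sketch of how such a rules object could be evaluated; the real evaluation happens inside OneRouter, and the comparison directions here are inferred from the trigger descriptions earlier on this page:

def should_fallback(rules: dict, metrics: dict, status_code: int) -> bool:
    """Illustrative only -- not OneRouter's actual implementation."""
    error_rule = rules.get("error_code", {})
    if status_code in error_rule.get("hint_array", []):
        return error_rule.get("action") == "fallback"
    # Latency-style metrics: trigger when the measurement rises above the threshold.
    for name in ("Latency", "TTFT"):
        rule = rules.get(name)
        if rule and metrics.get(name, 0) > rule["hint_threshold"]:
            return rule["action"] == "fallback"
    # Throughput-style metrics: trigger when the measurement drops below the threshold.
    for name in ("TPM", "RPM"):
        rule = rules.get(name)
        if rule and metrics.get(name, float("inf")) < rule["hint_threshold"]:
            return rule["action"] == "fallback"
    return False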

Processing pipeline

If the primary model you selected returns an error (the endpoint is down, a refusal due to content moderation, or a context-length validation error), or if the streaming conversation starts lagging, OneRouter will try the fallback_models instead.

If all of the fallback_models are also down or return errors, OneRouter will return the resulting error.

Requests are priced using the model that was ultimately used, which will be returned in the model attribute of the response body.
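Conceptually, the pipeline behaves like the loop below (a simplified sketch; the real routing and the metrics-based triggers run server-side inside OneRouter):

def route(primary, fallback_models, send_request):
    """Simplified sketch of OneRouter's fallback pipeline (illustrative only)."""
    last_error = None
    for model in [primary, *fallback_models]:  # primary first, then fallbacks in order
        try:
            # Success: the request is billed for this model, and the response
            # body's "model" attribute reports it.
            return send_request(model)
        except Exception as err:  # endpoint error, moderation refusal, lagging stream...
            last_error = err
    raise last_error  # every model failed, so the error is surfaced to the caller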

Using with OpenAI SDK

To use fallback_models and fallback_rules with the OpenAI SDK, include them in the extra_body parameter. In the example below, gemini-2.5-pro is tried first, and the models in the fallback_models array are tried in order as fallbacks.

from openai import OpenAI

openai_client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="{{API_KEY}}",  # your OneRouter API key
)

completion = openai_client.chat.completions.create(
    model="gemini-2.5-pro",
    extra_body={
        "fallback_models": ["claude-3-5-sonnet@20240620", "gpt-5-chat"],
    },
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
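Since requests are billed for the model that ultimately answered (see the processing pipeline above), it can be useful to log the response's model attribute:

# completion.model reports the model that actually served the request; it will
# differ from "gemini-2.5-pro" whenever OneRouter fell back to another model.
print(completion.model)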
