Reasoning Tokens

For models that support it, the OneRouter API can return Reasoning Tokens, also known as thinking tokens. OneRouter normalizes the different ways providers let you customize how many reasoning tokens a model will use, providing a unified interface across providers.

Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.

Reasoning tokens are included in the response by default if the model decides to output them. Reasoning tokens will appear in the reasoning field of each message.

Controlling Reasoning Tokens in OpenAI Chat Completions

You can control reasoning tokens in your requests using the reasoning parameter:

{
  "model": "your-model",
  "messages": [],
  "reasoning": {
    // One of the following (not both):
    "effort": "high", // Can be "xhigh", "high", "medium", "low", "minimal" or "none"
    "max_tokens": 2000 // Specific token limit
  }
}

The reasoning config object consolidates settings for controlling reasoning strength across different models.

The effort level can be one of the following:

  • "effort": "xhigh" - Allocates the largest portion of tokens for reasoning (approximately 95% of max_tokens)

  • "effort": "high" - Allocates a large portion of tokens for reasoning (approximately 80% of max_tokens)

  • "effort": "medium" - Allocates a moderate portion of tokens (approximately 50% of max_tokens)

  • "effort": "low" - Allocates a smaller portion of tokens (approximately 20% of max_tokens)

  • "effort": "minimal" - Allocates an even smaller portion of tokens (approximately 10% of max_tokens)

  • "effort": "none" - Disables reasoning entirely

For models that only support reasoning.max_tokens, the effort level is converted into a token budget using the percentages above.
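As an illustration of that conversion (a sketch, not OneRouter's actual implementation), an effort level could be mapped to a reasoning token budget like this:

```python
# Approximate effort-to-budget ratios from the list above.
EFFORT_RATIOS = {
    "xhigh": 0.95,
    "high": 0.80,
    "medium": 0.50,
    "low": 0.20,
    "minimal": 0.10,
    "none": 0.0,
}

def effort_to_reasoning_budget(effort: str, max_tokens: int) -> int:
    """Translate an effort level into an approximate reasoning token budget."""
    return int(max_tokens * EFFORT_RATIOS[effort])

print(effort_to_reasoning_budget("high", 10000))  # → 8000
print(effort_to_reasoning_budget("medium", 10000))  # → 5000
```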

Examples

Basic Usage with Reasoning Tokens
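A minimal sketch of a request body using an effort level. The model name and prompt are placeholders; send the JSON body to the chat completions endpoint with your API key using your preferred HTTP client:

```python
import json

# Request body with the unified reasoning parameter.
payload = {
    "model": "your-model",  # placeholder model slug
    "messages": [
        {"role": "user", "content": "How many r's are in 'strawberry'?"}
    ],
    "reasoning": {"effort": "high"},
}

body = json.dumps(payload)
# POST `body` to the chat completions endpoint, e.g. with requests:
#   requests.post(url, headers={"Authorization": f"Bearer {key}"}, data=body)
# The returned message carries any reasoning text in its `reasoning` field.
print(json.loads(body)["reasoning"]["effort"])  # → high
```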

Using Max Tokens for Reasoning

You can specify the exact number of tokens to use for reasoning:
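The same request shape, but with an explicit token budget instead of an effort level (model name is a placeholder):

```python
# Request body pinning the reasoning budget to an exact token count.
payload = {
    "model": "your-model",  # placeholder model slug
    "messages": [
        {"role": "user", "content": "Plan a three-step experiment."}
    ],
    "reasoning": {"max_tokens": 2000},  # exact budget, not an effort level
}

print(payload["reasoning"]["max_tokens"])  # → 2000
```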

Streaming Mode with Reasoning Tokens
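A sketch of consuming a streamed response. It assumes (this shape is an assumption, mirroring the `reasoning` field on complete messages) that streamed deltas expose reasoning text under a `reasoning` key alongside the usual `content`:

```python
# Simulated stream chunks; a real stream yields these incrementally.
sample_chunks = [
    {"choices": [{"delta": {"reasoning": "First, count the letters. "}}]},
    {"choices": [{"delta": {"reasoning": "There are three r's."}}]},
    {"choices": [{"delta": {"content": "The answer is 3."}}]},
]

reasoning, answer = "", ""
for chunk in sample_chunks:
    delta = chunk["choices"][0]["delta"]
    # Accumulate reasoning and final-answer text separately.
    reasoning += delta.get("reasoning", "")
    answer += delta.get("content", "")

print(answer)  # → The answer is 3.
```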

Responses API Shape

When reasoning models generate responses, the reasoning information is structured in a standardized format through the reasoning_content item.
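An illustrative sketch of reading that shape. Only the `reasoning_content` item type comes from this document; the surrounding field names (`output`, `text`, `type`) are assumptions for the example:

```python
# Hypothetical response containing a reasoning_content item among the output.
sample_response = {
    "output": [
        {"type": "reasoning_content", "text": "Break the task into steps..."},
        {"type": "message", "content": "Here is the final answer."},
    ]
}

# Collect the reasoning text by filtering on the item type.
reasoning_items = [
    item["text"]
    for item in sample_response["output"]
    if item["type"] == "reasoning_content"
]
print(reasoning_items[0])  # → Break the task into steps...
```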
