Overview
An overview of OneRouter's API
OneRouter's request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, OneRouter normalizes the schema across models and providers so you only need to learn one.
Requests
Completions Request Format
Here is the request schema as a TypeScript type. This will be the body of your POST request to the /v1/chat/completions endpoint (see the quick start above for an example).
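The full type includes many more optional parameters than the ones used on this page; below is a minimal sketch (an assumed, simplified shape) covering only model, messages, and the stream flag referenced in the response discussion later on.
// Minimal sketch of the request body (assumed, simplified shape)
type Request = {
  // The model to route to, e.g. 'openai/gpt-3.5-turbo'
  model: string;
  // The conversation so far. The last entry may be a partial
  // assistant message (see Assistant Prefill below).
  messages: Array<{
    role: 'system' | 'user' | 'assistant';
    content: string;
  }>;
  // Set to true to receive streamed chunks instead of a single response
  stream?: boolean;
};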
For a complete list of parameters, see the Parameters page.
Headers
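The request examples on this page send just two headers: a bearer API key for authentication and a JSON content type. A minimal sketch:
// Headers used by the request examples on this page
const headers = {
  Authorization: 'Bearer <API_KEY>', // your OneRouter API key
  'Content-Type': 'application/json',
};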
Assistant Prefill
OneRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.
To use this feature, include a message with role: "assistant" at the end of your messages array.
fetch('https://app.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      { role: 'user', content: 'What is the meaning of life?' },
      { role: 'assistant', content: "I'm not sure, but my best guess is" },
    ],
  }),
});
Responses
Completions Response Format
OneRouter normalizes the schema across models and providers to comply with the OpenAI Chat API.
This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models.
Here's the response schema as a TypeScript type:
// Definitions of subtypes are below
type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';
  system_fingerprint?: string; // Only present if the provider supports it
  // Usage data is always returned for non-streaming.
  // When streaming, you will get one usage object at
  // the end accompanied by an empty choices array.
  usage?: ResponseUsage;
};

// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4 tokenizer.
type ResponseUsage = {
  /** Including images and tools if any */
  prompt_tokens: number;
  /** The tokens generated */
  completion_tokens: number;
  /** Sum of the above two fields */
  total_tokens: number;
};

// Subtypes:
type NonChatChoice = {
  finish_reason: string | null;
  text: string;
  error?: ErrorResponse;
};

type NonStreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type StreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type ErrorResponse = {
  code: number; // See "Error Handling" section
  message: string;
  metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};
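The ToolCall subtype above references a FunctionCall type that is not defined in this snippet; assuming it follows the OpenAI function-calling shape, it would look like this:
// Assumed shape, mirroring the OpenAI function-calling convention:
// "arguments" is a JSON-encoded string of the call's arguments.
type FunctionCall = {
  name: string;
  arguments: string;
};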
Here's an example:
{
  "id": "xxx-xxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop", // Normalized finish_reason
      "native_finish_reason": "stop", // The raw finish_reason from the provider
      "message": {
        // will be "delta" if streaming
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 4,
    "total_tokens": 4
  },
  "model": "openai/gpt-3.5-turbo" // Could also be "anthropic/claude-2.1", etc, depending on the "model" that ends up being used
}
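Because every choice carries the same inner fields whether they arrive under message (non-streaming) or delta (streaming), the handling code can be shared. A minimal sketch, assuming the response has been parsed into the Response type above:
// Extract the generated text from a choice, whether it came from a
// complete response (message) or a streamed chunk (delta).
function contentOf(choice: NonStreamingChoice | StreamingChoice): string {
  if ('message' in choice) {
    return choice.message.content ?? '';
  }
  return choice.delta.content ?? '';
}
// The same helper works for streaming chunks and full responses, so the
// rest of the pipeline does not need mode- or model-specific code.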
Finish Reason
OneRouter normalizes each model's finish_reason to one of the following values: tool_calls, stop, length, content_filter, error.
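Since these values are the same for every provider, callers can branch on finish_reason without model-specific logic. A small sketch, assuming the choice types defined above (native_finish_reason still exposes the provider's raw value):
// Branch on the normalized finish_reason values.
function describeFinish(choice: NonStreamingChoice): string {
  switch (choice.finish_reason) {
    case 'stop':
      return 'The model finished naturally.';
    case 'length':
      return 'The output hit the token limit and was truncated.';
    case 'tool_calls':
      return 'The model is requesting a tool call.';
    case 'content_filter':
      return 'The output was blocked by a content filter.';
    case 'error':
      return 'The provider returned an error for this choice.';
    default:
      return 'No finish reason was reported.';
  }
}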
Querying Cost and Stats
The token counts that are returned in the completions API response are not counted via the model's native tokenizer. Instead, a normalized, model-agnostic count is used (computed with the GPT-4o tokenizer). This is because some providers do not reliably return native token counts. This behavior is becoming less common, however, and we may add native token counts to the response object in the future.
Credit usage and model pricing are based on the native token counts (not the 'normalized' token counts returned in the API response).
Note that token counts are also available in the usage field of the response body for non-streaming completions.