Batch Processing API
What's Batch Processing
Batch processing is a powerful approach for handling large volumes of requests efficiently. Instead of processing requests one at a time with immediate responses, batch processing allows you to submit multiple requests together for asynchronous processing. This pattern is particularly useful when:
You need to process large volumes of data
Immediate responses are not required
You want to optimize for cost efficiency
You're running large-scale evaluations or analyses
Batch processing (batching) lets you send multiple message requests in a single batch and retrieve the results later, within at most 24 hours. The main goals are to reduce costs by up to 50% and to increase throughput for analytical or offline workloads.
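As a rough illustration of the discount, the sketch below compares the standard and batch cost of the same workload, assuming batch usage is billed at exactly 50% of standard prices. The per-million-token prices are hypothetical placeholders, not actual rates for any model; substitute the published prices for the model you use.

```python
# HYPOTHETICAL per-million-token prices (USD); replace with your model's real rates.
STANDARD_INPUT_PRICE = 0.15
STANDARD_OUTPUT_PRICE = 0.60
BATCH_DISCOUNT = 0.5  # batch usage billed at 50% of standard prices


def standard_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at standard (non-batch) prices."""
    return (input_tokens * STANDARD_INPUT_PRICE
            + output_tokens * STANDARD_OUTPUT_PRICE) / 1_000_000


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of the same workload submitted as a batch."""
    return standard_cost(input_tokens, output_tokens) * BATCH_DISCOUNT


if __name__ == "__main__":
    # Example workload: 10,000 requests, about 500 input and 300 output tokens each.
    n = 10_000
    inp, out = 500 * n, 300 * n
    print(f"standard: ${standard_cost(inp, out):.2f}")
    print(f"batch:    ${batch_cost(inp, out):.2f}")
```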
How to use the Batches API
A batch is composed of a list of requests. Each individual request consists of:
A unique custom_id for identifying the Messages request
A params object with the standard Messages API parameters
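The request shape above can also be assembled programmatically. The sketch below builds the requests list from (custom_id, prompt) pairs and rejects a duplicate custom_id before anything is submitted, since custom_id is what identifies each request. The helper name build_batch_requests is hypothetical, and only a minimal subset of Messages API parameters is filled in; the default model is taken from the example on this page.

```python
from typing import Iterable


def build_batch_requests(
    items: Iterable[tuple[str, str]],
    model: str = "gpt-4o-mini-batch",
    max_tokens: int = 1024,
) -> list[dict]:
    """HYPOTHETICAL helper: build a batch's `requests` list from (custom_id, prompt) pairs.

    Each entry holds a `custom_id` and a `params` object with standard
    Messages API parameters. Raises ValueError on a duplicate custom_id.
    """
    seen: set[str] = set()
    batch_requests = []
    for custom_id, prompt in items:
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id!r}")
        seen.add(custom_id)
        batch_requests.append({
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return batch_requests


payload = {"requests": build_batch_requests([
    ("my-request-01", "How to learn nestjs?"),
    ("my-request-02", "How to learn Reactjs?"),
])}
```

Validating locally is cheaper than discovering a duplicate ID only after the batch has been queued.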
You can create a batch by passing this list into the requests parameter:
Create a message batch
Create a batch of messages for asynchronous processing. All usage is charged at 50% of the standard API prices.
import json

import requests

headers = {
    "Authorization": "Bearer <<API_KEY>>",
    "Content-Type": "application/json",
}


def build_params(question):
    # Standard Messages API parameters; "text" values are placeholders to replace.
    return {
        "model": "gpt-4o-mini-batch",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": question}],
        "metadata": {"ANY_ADDITIONAL_PROPERTY": "text"},
        "stop_sequences": ["text"],
        "system": "text",
        "temperature": 1,
        "tool_choice": None,  # Python None is sent as JSON null
        "tools": [],
        "top_k": 1,
        "top_p": 1,
        "thinking": {"budget_tokens": 1024, "type": "enabled"},
    }


questions = ["How to learn nestjs?", "How to learn Reactjs?", "How to learn Nextjs?"]

# Each request pairs a unique custom_id (my-request-01, -02, -03) with its params.
data = {
    "requests": [
        {"custom_id": f"my-request-{i:02d}", "params": build_params(q)}
        for i, q in enumerate(questions, start=1)
    ]
}

# json= serializes the payload to JSON for the request body
response = requests.post("https://llm.onerouter.pro/v1/batches", headers=headers, json=data)
response.raise_for_status()  # fail loudly on HTTP errors
batch = response.json()
print("Batch created:", json.dumps(batch, indent=2, ensure_ascii=False))

In this example, three separate requests are batched together for asynchronous processing. Each request has a unique custom_id and contains the standard parameters you'd use for a Messages API call.
Get status or results of a specific message batch
Retrieve the batch status while it is in progress, or stream its results in JSONL format once it has completed.
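A minimal retrieval sketch follows. It assumes the per-batch endpoint is the create URL with the batch ID appended (GET https://llm.onerouter.pro/v1/batches/{batch_id}) and that each JSONL result line is a JSON object carrying the custom_id of its request; both are assumptions to confirm against the API reference. It uses urllib from the standard library so it has no extra dependencies, and API_KEY is a placeholder.

```python
import json
import urllib.request

# ASSUMPTIONS: per-batch URL is the create URL plus "/<batch_id>";
# API_KEY is a placeholder for your key.
BASE_URL = "https://llm.onerouter.pro/v1/batches"
API_KEY = "<<API_KEY>>"


def fetch_batch(batch_id: str) -> str:
    """GET one batch: status while in progress, JSONL results once completed."""
    req = urllib.request.Request(
        f"{BASE_URL}/{batch_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="GET",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_jsonl_results(body: str) -> dict[str, dict]:
    """Index JSONL result lines by custom_id (ASSUMPTION: every line carries one)."""
    results: dict[str, dict] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines such as a trailing newline
        record = json.loads(line)
        results[record["custom_id"]] = record
    return results
```

Call parse_jsonl_results only after the batch has completed; while it is still in progress the endpoint returns status rather than JSONL. Keying by custom_id lets you match each result to the request that produced it without relying on line order.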
Cancel a specific batch
You can cancel a batch that is currently processing by using the cancel endpoint. Immediately after cancellation is requested, the batch's processing_status is canceling. Canceled batches end with a processing_status of ended and may contain partial results for requests that were processed before cancellation.
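The sketch below requests cancellation and then polls until processing_status reaches ended, passing through canceling as described above. The cancel path (POST https://llm.onerouter.pro/v1/batches/{batch_id}/cancel) is an assumption modeled on the create URL, not a documented route, and API_KEY is a placeholder. The status call is passed in as a function, so the polling logic works with any retrieval method and can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://llm.onerouter.pro/v1/batches"
API_KEY = "<<API_KEY>>"  # placeholder


def cancel_batch(batch_id: str) -> dict:
    """Request cancellation. ASSUMPTION: endpoint is POST BASE_URL/<batch_id>/cancel."""
    req = urllib.request.Request(
        f"{BASE_URL}/{batch_id}/cancel",
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="POST",  # empty body; urllib sends Content-Length: 0
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ended(
    get_status: Callable[[], dict],
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status() until processing_status is "ended", then return that status."""
    while True:
        status = get_status()
        if status.get("processing_status") == "ended":
            return status
        sleep(interval)  # still processing or canceling; wait before retrying
```

After wait_until_ended returns, retrieve the results as usual; for a canceled batch they may cover only the requests processed before cancellation.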