Streaming
The OneRouter API allows streaming responses from any model. This is useful for building chat interfaces or other applications where the UI should update as the model generates the response.
To enable streaming, you can set the stream parameter to true in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.
Examples
Here is an example of how to stream a response and process it as it arrives:
import requests
import json

question = "How would you build the tallest building ever?"

# Same endpoint as in the stream cancellation example further down.
url = "https://app.onerouter.pro/v1/chat/completions"
# {{API_KEY}} and {{MODEL}} are placeholders: substitute your own values.
headers = {
    "Authorization": f"Bearer {{API_KEY}}",
    "Content-Type": "application/json"
}

payload = {
    "model": "{{MODEL}}",
    "messages": [{"role": "user", "content": question}],
    "stream": True
}

buffer = ""
with requests.post(url, headers=headers, json=payload, stream=True) as r:
    # SSE streams are UTF-8; set it explicitly so chunks decode to str correctly.
    r.encoding = "utf-8"
    for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
        buffer += chunk
        while True:
            try:
                # Find the next complete SSE line
                line_end = buffer.find('\n')
                if line_end == -1:
                    break

                line = buffer[:line_end].strip()
                buffer = buffer[line_end + 1:]

                if line.startswith('data: '):
                    data = line[6:]
                    if data == '[DONE]':
                        break

                    try:
                        data_obj = json.loads(data)
                        content = data_obj["choices"][0]["delta"].get("content")
                        if content:
                            print(content, end="", flush=True)
                    except json.JSONDecodeError:
                        # Skip payloads that are not valid JSON
                        pass
            except Exception:
                break
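The buffering step in the example can also be factored into a small helper that runs without a network connection, which makes it easier to test. This is a minimal sketch rather than part of the OneRouter API: extract_deltas is a hypothetical name, and it assumes every data payload has the choices[0].delta.content shape used in the example.

```python
import json
from typing import List, Tuple


def extract_deltas(buffer: str) -> Tuple[List[str], str, bool]:
    """Consume complete SSE lines from `buffer` (hypothetical helper).

    Returns (content_pieces, remaining_buffer, done), where `done` is True
    once the `[DONE]` sentinel has been seen.
    """
    pieces: List[str] = []
    while True:
        line_end = buffer.find("\n")
        if line_end == -1:
            # No complete line left; keep the partial line for the next chunk.
            return pieces, buffer, False
        line = buffer[:line_end].strip()
        buffer = buffer[line_end + 1:]
        if not line.startswith("data: "):
            continue  # blank separators, comments, and other SSE fields
        data = line[len("data: "):]
        if data == "[DONE]":
            return pieces, buffer, True
        try:
            delta = json.loads(data)["choices"][0]["delta"]
        except (json.JSONDecodeError, KeyError, IndexError, TypeError):
            continue  # skip payloads that do not match the assumed shape
        content = delta.get("content")
        if content:
            pieces.append(content)
```

Inside a read loop, this would be called as pieces, buffer, done = extract_deltas(buffer + chunk), and the loop can stop reading as soon as done is true.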
Additional Information
For SSE (Server-Sent Events) streams, OneRouter occasionally sends comments to prevent connection timeouts. These comments look like:
: ONEROUTER PROCESSING
Comment payloads can be safely ignored, as the SSE specification allows. However, you can also use them to improve UX, for example by showing a dynamic loading indicator while the model is still processing.
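One way to handle comments explicitly is to classify each line before attempting to parse it as JSON. The sketch below is illustrative only: classify_sse_line and its return labels are hypothetical, not part of any OneRouter client. It relies on the SSE rule that a line starting with a colon is a comment.

```python
import json
from typing import Any, Tuple


def classify_sse_line(line: str) -> Tuple[str, Any]:
    """Label one SSE line as blank, comment, done, data, invalid, or other."""
    line = line.strip()
    if not line:
        return ("blank", None)
    if line.startswith(":"):
        # SSE comment, e.g. ": ONEROUTER PROCESSING"
        return ("comment", line[1:].strip())
    if line.startswith("data:"):
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return ("done", None)
        try:
            return ("data", json.loads(data))
        except json.JSONDecodeError:
            # Non-JSON payload: report it instead of raising
            return ("invalid", data)
    return ("other", line)
```

A UI could show a loading indicator whenever the label is "comment" and hide it again on the first "data" line.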
Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.parse the non-JSON payloads. We recommend the following clients:
Stream Cancellation
Streaming requests can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.
To implement stream cancellation:
import requests
from threading import Event, Thread

def stream_with_cancellation(prompt: str, cancel_event: Event):
    with requests.Session() as session:
        response = session.post(
            "https://app.onerouter.pro/v1/chat/completions",
            headers={"Authorization": f"Bearer {{API_KEY}}"},
            json={"model": "{{MODEL}}", "messages": [{"role": "user", "content": prompt}], "stream": True},
            stream=True
        )

        try:
            for line in response.iter_lines():
                if cancel_event.is_set():
                    response.close()
                    return
                if line:
                    print(line.decode(), end="", flush=True)
        finally:
            response.close()

# Example usage:
cancel_event = Event()
stream_thread = Thread(target=lambda: stream_with_cancellation("Write a story", cancel_event))
stream_thread.start()

# To cancel the stream:
cancel_event.set()
Cancellation only works for streaming requests with supported providers. For non-streaming requests or unsupported providers, the model will continue processing and you will be billed for the complete response.
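The cancellation check itself can be exercised without calling the API by separating it from the HTTP request. The following is a sketch under that assumption: read_until_cancelled is a hypothetical helper that takes any iterable of byte lines and a close callback, so a fake stream can stand in for the real response.

```python
from threading import Event
from typing import Callable, Iterable, List


def read_until_cancelled(
    lines: Iterable[bytes],
    cancel_event: Event,
    close: Callable[[], None],
) -> List[str]:
    """Collect decoded lines until `cancel_event` is set, then close the stream."""
    received: List[str] = []
    try:
        for line in lines:
            if cancel_event.is_set():
                break  # stop reading; the connection is closed below
            if line:
                received.append(line.decode())
    finally:
        # Always release the connection, whether cancelled or finished
        close()
    return received
```

With requests, this could be called as read_until_cancelled(response.iter_lines(), cancel_event, response.close), where response comes from a request made with stream=True.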