Universal Audio API

A unified API for speech-to-text transcription models, speech-to-text translation models, and text-to-speech creation models.

API Overview

To simplify integration with different speech-to-text transcription (STT), speech-to-text translation (STT), and text-to-speech creation (TTS) models, OneRouter provides a unified audio API.

API Specification

text-to-speech-creation models (tts)

Generates audio from the input text.

curl https://audio.onerouter.pro/v1/audio/speech \
    -H "Content-Type: application/json" \
    -H "Authorization: <API_KEY>" \
    -d '{
    "model": "gpt-4o-mini-tts",
    "input": "A cute baby sea otter",
    "voice": "alloy"
  }' \
  --output speech.mp3
  • <API_KEY> is your API key, generated on the API page.

  • model is the model name, such as gpt-4o-mini-tts; the list of available models is on the Model page.

  • input is the text to generate audio for.

  • voice is the voice to use when generating the audio. Supported voices are alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, and verse.
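The same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names `build_speech_request` and `synthesize` are hypothetical, and the `Authorization` value is passed exactly as in the curl example because this page does not say whether a prefix such as `Bearer ` is expected.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
SPEECH_URL = "https://audio.onerouter.pro/v1/audio/speech"


def build_speech_request(api_key, text, model="gpt-4o-mini-tts", voice="alloy"):
    """Build (but do not send) the POST request for text-to-speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the raw key is sent, mirroring the curl example.
            "Authorization": api_key,
        },
        method="POST",
    )


def synthesize(api_key, text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(api_key, text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out_file:
        out_file.write(response.read())
    return out_path
```

Calling `synthesize("<API_KEY>", "A cute baby sea otter", voice="alloy")` would correspond to the curl command, with the response body written to speech.mp3. Splitting request construction from sending keeps the payload easy to inspect without making a network call.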

Example response

The response body is the audio file content itself. In the example above, --output saves it as speech.mp3.

speech-to-text-translation models (stt)

Translates audio into English.

curl https://audio.onerouter.pro/v1/audio/translations \
    -H "Content-Type: multipart/form-data" \
    -H "Authorization: <API_KEY>" \
    --form 'file=@/path/to/file/speech.m4a' \
    --form 'model="whisper-1"'
  • <API_KEY> is your API key, generated on the API page.

  • model is the model name, such as whisper-1; the list of available models is on the Model page.

  • file is the audio file object (not the file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

Example response

{
  "text": "Hello, my name is Wolfgang and I come from Germany. Where are you heading today?"
}
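For clients without curl, the multipart upload can be assembled by hand. The following is a minimal Python sketch using only the standard library. The helper names `encode_multipart` and `build_translation_request` are hypothetical, the `application/octet-stream` part type is an assumption, and the `Authorization` value is sent as-is to mirror the curl example.

```python
import uuid
import urllib.request

# Endpoint taken from the curl example above.
TRANSLATIONS_URL = "https://audio.onerouter.pro/v1/audio/translations"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        # Assumption: a generic binary type is accepted for the audio part.
        'Content-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_translation_request(api_key, filename, file_bytes, model="whisper-1"):
    """Build (but do not send) the POST request for audio translation."""
    body, content_type = encode_multipart({"model": model}, "file", filename, file_bytes)
    return urllib.request.Request(
        TRANSLATIONS_URL,
        data=body,
        headers={"Content-Type": content_type, "Authorization": api_key},
        method="POST",
    )
```

To send it, read the audio file in binary mode, pass the bytes to `build_translation_request`, open the result with `urllib.request.urlopen`, and decode the JSON body; the translated text is in the `text` field, as in the example response.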

speech-to-text-transcription models (stt)

Transcribes audio into the input language.

curl https://audio.onerouter.pro/v1/speech/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -H "Authorization: <API_KEY>" \
    --form 'file=@/path/to/file/speech.m4a' \
    --form 'model="whisper-1"'
  • <API_KEY> is your API key, generated on the API page.

  • model is the model name, such as whisper-1; the list of available models is on the Model page.

  • file is the audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

Example response

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that.",
  "usage": {
    "type": "tokens",
    "input_tokens": 14,
    "input_token_details": {
      "text_tokens": 0,
      "audio_tokens": 14
    },
    "output_tokens": 45,
    "total_tokens": 59
  }
}
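A response shaped like the one above can be reduced to the transcript and its token counts. This is a minimal Python sketch; `summarize_transcription` is a hypothetical helper, and it treats the `usage` object as optional, an assumption based on the translation example response, which contains only `text`.

```python
import json


def summarize_transcription(raw_json):
    """Extract the transcript and token counts from a transcription response."""
    data = json.loads(raw_json)
    # Assumption: "usage" may be absent, so fall back to an empty dict.
    usage = data.get("usage") or {}
    return {
        "text": data["text"],
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

In the example response, total_tokens (59) equals input_tokens (14) plus output_tokens (45), and all 14 input tokens are audio tokens.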
