Transcribe audio
Convert speech to text with the flux-voice model through FluxRouter.
FluxRouter exposes speech-to-text through the flux-voice model alias. You send an audio file and get a transcript back, using the same API key and base URL (https://api.fluxrouter.ai/v1) as your text requests. The request is metered and billed like any other Flux call.
This page covers transcribing an audio file. It uses the OpenAI-compatible /v1/audio/transcriptions endpoint, so if you already call OpenAI-style transcription, pointing at FluxRouter is a base-URL-and-key change.
The model id
Use flux-voice as the model. Where the text tier aliases (flux-fast, flux-standard, flux-reasoning) route to text models, flux-voice routes to a speech-to-text model behind the same key and endpoint.
flux-voice automatically balances speed and accuracy per clip. Two explicit variants are available when you want to pin one:
flux-voice— the default. Picks the right engine for each clip automatically. Omitmodelentirely and you get this.flux-voice-accurate— favor transcription accuracy.flux-voice-fast— favor latency.
You can confirm the alias is live by listing models:
curl https://api.fluxrouter.ai/v1/models \
-H "Authorization: Bearer $FLUX_API_KEY"
GET /v1/models returns every flux-* id you can use. See Models for the catalog.
Transcribe a file
Because FluxRouter is OpenAI-compatible, transcription uses the OpenAI audio request shape: a model and a file, sent as multipart/form-data. The OpenAI SDK sends this through its audio.transcriptions.create method.
from openai import OpenAI
client = OpenAI(
api_key="sk-...", # your Flux key
base_url="https://api.fluxrouter.ai/v1", # the one line you change
)
with open("meeting.m4a", "rb") as f:
result = client.audio.transcriptions.create(
model="flux-voice",
file=f,
)
print(result.text)
curl
curl https://api.fluxrouter.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $FLUX_API_KEY" \
-F "file=@meeting.m4a" \
-F "model=flux-voice"
Request fields
file is required; everything else is optional.
| Field | Notes |
|---|---|
file | The audio. Accepts wav, mp3, mp4/m4a, mpeg/mpga, ogg, webm, and flac. Up to 8 MB. |
model | flux-voice (default), flux-voice-accurate, or flux-voice-fast. |
language | ISO-639-1 code (for example en). Omit to auto-detect. |
prompt | A short hint of names, jargon, or spellings to improve accuracy. |
response_format | json (default), verbose_json, or text. |
timestamp_granularities[] | word and/or segment. Request word for word-level timestamps. |
temperature | Sampling temperature. |
Streaming (stream: true) and the subtitle formats (srt, vtt) are not supported and return a 400.
Response formats
json(default) —{ "text": "..." }.text— the transcript as a plain string.verbose_json—textpluslanguage,duration, andsegments. Addtimestamp_granularities[]=wordto also get per-word timestamps inwords— useful for cursor positioning and word-level editing in dictation UIs.
curl https://api.fluxrouter.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $FLUX_API_KEY" \
-F "file=@note.ogg" \
-F "model=flux-voice" \
-F "response_format=verbose_json" \
-F "timestamp_granularities[]=word"
Every successful response also carries transparency headers: x-flux-routed-model (the variant that served), x-flux-billed-seconds, and x-flux-cost-usd.
Paid plan required
Audio transcription is available on paid plans. A key without a paid plan receives a 402 with code premium_locked and no transcription. If you are building a client, handle that status distinctly — prompt the user to upgrade rather than treating it as a generic error. (A 401 means the key itself is missing or invalid; a 402 means a valid key without a paid plan.)
Limits
- File size: up to 8 MB per request. Larger uploads return
413. - Rate: transcription is rate-limited per account. A burst over the limit returns
429; back off and retry.
Billing
Transcription is billed at $0.10 per audio-minute, charged per second of audio with a 10-second minimum per request, and rolls up to the same single bill as the rest of your Flux usage. The price is the same whichever variant serves. See Routing and pricing for how billing works.
Notes
- Use
flux-voicefor transcription; do not send audio to the text aliases. - The default
flux-voicenever transcribes worse than the accurate variant — it only chooses when to favor speed. Pinflux-voice-accurateorflux-voice-fastonly when you have a specific reason. - For recorded-in-browser audio,
ogg/opusis a good container choice and keeps the automatic speed/accuracy selection working.