Speech to Text | Venice API Docs

curl --request POST \
  --url https://api.venice.ai/api/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file='@example-file' \
  --form model=openai/whisper-large-v3 \
  --form response_format=json \
  --form timestamps=false

{
  "text": "<string>",
  "duration": 123,
  "timestamps": {
    "word": [
      { "word": "<string>", "start": 123, "end": 123 }
    ],
    "segment": [
      { "text": "<string>", "start": 123, "end": 123 }
    ],
    "char": [
      { "char": "<string>", "start": 123, "end": 123 }
    ]
  }
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
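As a minimal sketch of the header format described above, the Python helper below builds the Authorization header. The function name `auth_headers` and the environment variable `VENICE_API_KEY` are hypothetical names chosen for illustration; the documentation does not define them.

```python
import os
from typing import Dict, Optional


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Return the Bearer Authorization header for a request."""
    # VENICE_API_KEY is an assumed variable name, not part of the API docs.
    token = token or os.environ.get("VENICE_API_KEY", "")
    if not token:
        raise ValueError("no API token provided")
    return {"Authorization": f"Bearer {token}"}
```

Passing the token explicitly, as in `auth_headers("my-token")`, skips the environment lookup.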

Body

multipart/form-data

Request to transcribe audio to text.

file

The audio file object (not a base64 string). Supported formats: WAV, WAVE, FLAC, M4A, AAC, MP4, MP3, OGG, OGA, WEBM.
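A client can reject obviously unsupported files before uploading. The sketch below checks the file extension against the formats listed above. Mapping each format name to a same-named extension is an assumption, and the server may inspect the file contents rather than the name; `has_supported_extension` is a hypothetical helper.

```python
from pathlib import Path

# Extensions derived from the supported-format list (assumed one-to-one mapping).
SUPPORTED_EXTENSIONS = {
    ".wav", ".wave", ".flac", ".m4a", ".aac",
    ".mp4", ".mp3", ".ogg", ".oga", ".webm",
}


def has_supported_extension(path: str) -> bool:
    """Return True if the file name ends in one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```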

model

enum<string>

default:nvidia/parakeet-tdt-0.6b-v3

The model to use for transcription. See https://docs.venice.ai/models/overview for more information.

Available options: nvidia/parakeet-tdt-0.6b-v3, openai/whisper-large-v3, fal-ai/wizper, elevenlabs/scribe-v2, stt-xai-v1

Example:

"openai/whisper-large-v3"

response_format

enum<string>

default:json

The format of the transcript output, in one of these options: json, text.

Available options: json, text

Example:

"json"

timestamps

boolean

default:false

Whether to include timestamps in the response.

Example:

false

language

string

ISO 639-1 language code (e.g., "en", "es", "fr"). Optional: if omitted, the model auto-detects the language. Only certain models (e.g., Whisper) support language hints; models that do not support them ignore this parameter.

Example:

"en"
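The body parameters above can be assembled into a multipart/form-data request using only the Python standard library. This is a sketch, not an official client: `build_transcription_request` is a hypothetical helper, the file part's `application/octet-stream` content type is an assumption, and the boolean is sent as the string `true` or `false` to mirror the curl example.

```python
import uuid
import urllib.request
from typing import Optional

API_URL = "https://api.venice.ai/api/v1/audio/transcriptions"


def build_transcription_request(
    audio: bytes,
    filename: str,
    token: str,
    model: str = "nvidia/parakeet-tdt-0.6b-v3",
    response_format: str = "json",
    timestamps: bool = False,
    language: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request with a multipart body."""
    if response_format not in ("json", "text"):
        raise ValueError("response_format must be 'json' or 'text'")

    fields = {
        "model": model,
        "response_format": response_format,
        "timestamps": "true" if timestamps else "false",
    }
    if language is not None:  # optional; omitted means auto-detect
        fields["language"] = language

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))

    # The audio goes in the "file" field as raw bytes, not base64.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        API_URL, data=b"".join(parts), headers=headers, method="POST"
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and read the response body.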

Response

Transcription completed successfully

Transcription response

text

string

required

The transcribed text

duration

number

Duration of the audio in seconds

timestamps

object

Timestamps for the transcription, returned only when timestamps=true. May contain word, segment, and char arrays; each entry has start and end values.
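A JSON response with the fields above can be parsed into a small typed object. The sketch below assumes response_format=json and treats duration and timestamps as optional, since only text is marked required. `Transcription` and `parse_transcription` are hypothetical names, and the shape of the timestamps object follows the sample response rather than a formal schema.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Transcription:
    text: str
    duration: Optional[float] = None
    timestamps: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def words(self) -> List[Tuple[str, float, float]]:
        """Return (word, start, end) tuples, or [] if no word timestamps."""
        return [
            (w["word"], w["start"], w["end"])
            for w in self.timestamps.get("word", [])
        ]


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON transcription response body."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],                       # required field
        duration=data.get("duration"),           # may be absent
        timestamps=data.get("timestamps") or {}, # absent unless requested
    )
```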
