TRANSCRIPTION API

Video Transcription API
40+ Languages, Word-Level Timestamps

V100's video transcription API converts spoken audio to text with word-level timestamps, speaker diarization, and per-word confidence scores. It supports 40+ languages, works in real time on live meetings or asynchronously on uploaded files, and exports transcripts as SRT, VTT, or structured JSON. Unlike standalone speech-to-text services, V100's transcription is integrated into a full video platform: edit the transcript text and the video edits itself.
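
To make the relationship between word-level timestamps and subtitle export concrete, here is a minimal sketch that groups per-word entries into SRT cues. The field names (`word`, `start`, `end`, `speaker`) and the helpers `srt_time` and `words_to_srt` are assumptions made for illustration, not V100's documented JSON schema or SDK; only the SRT cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text) follows the standard format.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group word entries into SRT cues.

    Assumed (hypothetical) word shape:
        {"word": str, "start": float, "end": float, "speaker": str}
    A new cue starts after max_words words or when the speaker changes.
    """
    cues: List[List[Dict]] = []
    current: List[Dict] = []
    for w in words:
        speaker_changed = bool(current) and w["speaker"] != current[0]["speaker"]
        if current and (len(current) >= max_words or speaker_changed):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w["word"] for w in cue)
        # Cue timing spans the first word's start to the last word's end.
        timing = f"{srt_time(cue[0]['start'])} --> {srt_time(cue[-1]['end'])}"
        blocks.append(f"{index}\n{timing}\n{cue[0]['speaker']}: {text}")
    return "\n\n".join(blocks) + "\n"
```

Splitting cues on speaker changes keeps each subtitle attributable to one speaker, which is where diarization and word timing work together. A per-word confidence score, if present, could be used the same way to flag low-confidence cues for review.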