No separate services for transcription, captioning, and editing. V100 handles the full pipeline.
Word-level timestamps with 97%+ accuracy across 20 languages. Speaker diarization identifies up to 12 speakers. Returns JSON, SRT, VTT, or plain text.
```http
POST /v1/transcribe

{
  "source": "s3://bucket/video.mp4",
  "language": "auto",
  "diarize": true,
  "format": "json"
}
```
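With `diarize` enabled, the JSON output includes word-level timestamps and speaker labels. A common next step is to merge consecutive words into speaker turns.

The minimal sketch below assumes each word is an object with `text`, `start`, `end`, and `speaker` fields. Those field names are hypothetical, not a documented V100 schema, so check them against a real response first.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def speaker_turns(words: list[dict]) -> list[Turn]:
    """Merge consecutive words from the same speaker into a single turn.

    Hypothetical word shape: {"text": str, "start": float, "end": float, "speaker": str}.
    """
    turns: list[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w["speaker"]:
            # Same speaker is still talking: extend the open turn.
            turns[-1].end = w["end"]
            turns[-1].text += " " + w["text"]
        else:
            # First word, or the speaker changed: start a new turn.
            turns.append(Turn(w["speaker"], w["start"], w["end"], w["text"]))
    return turns


if __name__ == "__main__":
    words = [
        {"text": "Welcome", "start": 0.00, "end": 0.42, "speaker": "S1"},
        {"text": "back.", "start": 0.45, "end": 0.80, "speaker": "S1"},
        {"text": "Thanks!", "start": 1.10, "end": 1.50, "speaker": "S2"},
    ]
    for t in speaker_turns(words):
        print(f"[{t.start:.2f}-{t.end:.2f}] {t.speaker}: {t.text}")
```

Run as a script, this prints `[0.00-0.80] S1: Welcome back.` followed by `[1.10-1.50] S2: Thanks!`.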
Dual-mode detection: waveform energy analysis for hard silence, plus transcript alignment for filler-word gaps. Configurable minimum silence duration from 0.3s to 5s.
```http
POST /v1/editor/edit

{
  "source": "https://example.com/podcast.mp4",
  "instructions": "Remove silence longer than 0.5 seconds",
  "output": { "format": "mp4" }
}
```
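The hard-silence half of the detection can be approximated with a generic RMS-energy gate:

1. Slice the audio into short frames.
2. Mark frames that are quieter than a dBFS threshold.
3. Keep quiet runs that are at least as long as the configured minimum.

This is a sketch of the standard technique, not V100's implementation. The 20 ms frame size and the -40 dBFS default are assumptions, and the transcript-alignment pass for filler words is not modeled.

```python
import math


def find_silences(samples, sample_rate, threshold_db=-40.0,
                  min_silence=0.5, frame_ms=20):
    """Return (start_s, end_s) spans where frame RMS stays below threshold_db.

    samples: mono floats in [-1.0, 1.0]; threshold_db is relative to full scale.
    min_silence mirrors the 0.3s-5s range described above.
    """
    if not 0.3 <= min_silence <= 5.0:
        raise ValueError("min_silence must be between 0.3 and 5 seconds")
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = math.ceil(len(samples) / frame_len)
    runs = []
    run_start = None  # frame index where the current quiet run began
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if db < threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, n_frames))
    # Convert frame indices to seconds and drop runs shorter than min_silence.
    spans = []
    for a, b in runs:
        start = a * frame_len / sample_rate
        end = min(b * frame_len, len(samples)) / sample_rate
        if end - start >= min_silence:
            spans.append((start, end))
    return spans
```

Cutting the returned spans out of the timeline is the "remove silence" step. Longer frames smooth out brief dips, at the cost of coarser cut points.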
Generate captions in 20 languages with burned-in rendering or sidecar SRT/VTT export. Customize font, size, position, background color, and animation style.
```http
POST /v1/captions

{
  "source": "s3://bucket/video.mp4",
  "languages": ["en", "es", "ja"],
  "style": "burned_in",
  "position": "bottom_center"
}
```
Submit up to 10,000 videos per batch request. Jobs run in parallel across our GPU cluster, and a webhook callback fires on completion. Process an entire content library overnight.
```http
POST /v1/batch

{
  "jobs": [
    { "source": "s3://b/vid1.mp4", "instructions": "..." },
    { "source": "s3://b/vid2.mp4", "instructions": "..." }
  ],
  "webhook": "https://your-app.com/done"
}
```
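A library larger than 10,000 videos has to be split across several batch requests. The sketch below chunks a list of source URIs into request bodies shaped like the example above.

It assumes every job in the library shares one instruction string and one webhook URL. Per-job instructions would need a different input shape.

```python
MAX_JOBS_PER_BATCH = 10_000  # per-request limit stated above


def batch_payloads(sources, instructions, webhook, max_jobs=MAX_JOBS_PER_BATCH):
    """Split source URIs into /v1/batch request bodies of at most max_jobs jobs each."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be positive")
    payloads = []
    for i in range(0, len(sources), max_jobs):
        chunk = sources[i:i + max_jobs]
        payloads.append({
            "jobs": [{"source": src, "instructions": instructions} for src in chunk],
            "webhook": webhook,
        })
    return payloads
```

For example, 25,000 sources yield three bodies holding 10,000, 10,000, and 5,000 jobs. Serialize each one with `json.dumps` before POSTing.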
Export as MP4, WebM, MOV, GIF, or audio-only (MP3, WAV, FLAC). Set resolution, bitrate, codec (H.264, H.265, VP9, AV1), and frame rate per output.
```http
POST /v1/editor/edit

{
  "source": "...",
  "instructions": "...",
  "output": {
    "format": "webm",
    "codec": "vp9",
    "resolution": "720p",
    "fps": 30
  }
}
```
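Container and codec are set independently in the `output` object, so some combinations are invalid. A client-side check can catch these before a job is queued. For example, WebM carries only VP8, VP9, or AV1 video, so `"webm"` paired with H.264 or H.265 cannot work.

This sketch makes two assumptions:
- Codec identifiers are lowercase strings such as `"h264"` and `"av1"`. Only `"vp9"` appears above.
- Bitrate is omitted because its field name isn't shown.

```python
VIDEO_FORMATS = {"mp4", "webm", "mov", "gif"}
AUDIO_FORMATS = {"mp3", "wav", "flac"}
# Hypothetical codec identifiers; only "vp9" appears in the documented example.
CODECS = {"h264", "h265", "vp9", "av1"}
# WebM only carries VP8, VP9, or AV1 video; of the codecs listed above, VP9 and AV1 fit.
WEBM_CODECS = {"vp9", "av1"}


def build_output(fmt, codec=None, resolution=None, fps=None):
    """Build an `output` object for POST /v1/editor/edit, rejecting invalid combinations."""
    fmt = fmt.lower()
    if fmt not in VIDEO_FORMATS | AUDIO_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    out = {"format": fmt}
    if fmt in AUDIO_FORMATS:
        # Audio-only exports have no video stream to configure.
        if codec is not None or resolution is not None or fps is not None:
            raise ValueError("audio-only output takes no video codec, resolution, or fps")
        return out
    if codec is not None:
        codec = codec.lower()
        if codec not in CODECS:
            raise ValueError(f"unsupported codec: {codec}")
        if fmt == "gif":
            raise ValueError("GIF output does not take a video codec")
        if fmt == "webm" and codec not in WEBM_CODECS:
            raise ValueError(f"{codec} cannot be stored in a WebM container")
        out["codec"] = codec
    if resolution is not None:
        out["resolution"] = resolution
    if fps is not None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        out["fps"] = fps
    return out
```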
All editing jobs run asynchronously. Get a job ID immediately, then poll for status or receive a webhook POST when processing completes. Status responses include a progress percentage for long jobs.
```http
GET /v1/jobs/job_abc123
```

Response:

```json
{
  "status": "completed",
  "progress": 100,
  "output_url": "https://cdn.v100.ai/..."
}
```
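When you poll instead of using a webhook, exponential backoff keeps long jobs from triggering a request every second. A few assumptions in this sketch:
- `fetch_status` is a hypothetical callable you supply. It performs `GET /v1/jobs/{job_id}` and returns the parsed JSON. The base URL and authentication scheme aren't specified here.
- `"completed"` comes from the example response; treating `"failed"` as a terminal status is an assumption.
- `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[str], dict], job_id: str,
                 timeout: float = 3600.0, initial_delay: float = 2.0,
                 max_delay: float = 30.0, sleep=time.sleep,
                 clock=time.monotonic) -> dict:
    """Poll a job until it reaches a terminal status, doubling the delay each time."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        job = fetch_status(job_id)
        # "completed" is documented; "failed" is an assumed terminal status.
        if job.get("status") in ("completed", "failed"):
            return job
        if clock() + delay > deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

If a webhook is configured, this loop only needs to run as a fallback, for example when a callback never arrives.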