What is a live transcription API?

A live transcription API provides real-time speech-to-text conversion during active video meetings, unlike post-processing transcription, which generates text only after a meeting ends. V100's live transcription delivers interim results within 300 milliseconds and final results with word-level timestamps and speaker diarization in 40+ languages.

How is live transcription different from post-meeting transcription?

Live transcription processes audio in real time as participants speak, enabling live captions, accessibility features, and real-time AI analysis. Post-meeting transcription processes the recorded audio after the meeting ends, which allows higher accuracy through multiple processing passes but cannot support real-time features. V100 offers both: live transcription during meetings and enhanced post-processing transcription from recordings.

What languages does V100 live transcription support?

V100 live transcription supports 40+ languages including English, Spanish, French, German, Portuguese, Japanese, Korean, Mandarin Chinese, Arabic, Hindi, Italian, Dutch, Polish, Russian, Turkish, and many more. Language detection can be automatic or manually specified per participant.

REAL-TIME

Live Transcription API with Real-Time Captions

A live transcription API converts speech to text during an active video meeting, not after it ends. V100's live transcription provides interim results within 300 milliseconds, final results with word-level timestamps, and speaker diarization across 40+ languages. It uses Deepgram for low-latency streaming and Whisper for enhanced accuracy, and it runs as participants speak, which enables a live captions overlay, real-time AI analysis, accessibility compliance, and searchable meeting records. Post-processing transcription, by contrast, generates text only after a recording is complete.

Sub-300ms latency

40+ languages supported

Speaker diarization included

Live Transcription vs. Post-Processing: Why Real-Time Matters

Most video APIs treat transcription as an afterthought: the transcript arrives minutes to hours after the meeting ends. Live transcription changes what is possible during a meeting by making spoken content available as structured data while people are still talking.

Live Transcription (V100)

TIMING: Results delivered during the meeting, within 300ms of speech

CAPTIONS: Live captions overlay displayed to all participants in real time

AI: Enables real-time AI analysis (highlights, action items, sentiment)

A11Y: Immediate accessibility for deaf and hard-of-hearing participants

Post-Processing (Traditional)

TIMING: Results available minutes to hours after the meeting ends

CAPTIONS: No live captions; the transcript is only available after processing

AI: AI analysis can only run after the meeting ends (no real-time insights)

A11Y: No accessibility support during the meeting itself

How Live Transcription Works

V100's live transcription pipeline processes the audio stream from every participant in real time. The system uses Deepgram for low-latency streaming recognition and Whisper for enhanced accuracy on complex audio, and it automatically selects an engine based on language and audio quality.

Audio Capture

Each participant's audio stream is captured independently from the WebRTC media track. Audio is preprocessed with noise suppression and automatic gain control before transcription.

Stream to ASR

Preprocessed audio is streamed to the ASR engine via WebSocket. Deepgram handles most languages with sub-200ms latency. Whisper activates for languages where it provides better accuracy.

Interim + Final Results

Interim results (partial, provisional text) are delivered immediately for live caption display. Once a phrase is complete, final results follow with word-level timestamps, confidence scores, and speaker labels.

Diarize and Deliver

Speaker diarization labels each transcript segment with the participant who spoke. Results are delivered via WebSocket to the client SDK and stored as a structured transcript for the meeting record.
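The interim and final results from the steps above can be folded into a caption display roughly as follows. This is a minimal sketch, assuming events carry the `text`, `speaker.id`, `speaker.name`, and `isFinal` fields shown in the integration guide; the `CaptionBuffer` class and its methods are hypothetical and not part of the V100 SDK.

```javascript
// Hypothetical helper (not part of the V100 SDK): keeps one provisional
// caption line per speaker plus a short list of committed lines.
class CaptionBuffer {
  constructor(maxLines = 3) {
    this.maxLines = maxLines; // how many committed lines stay on screen
    this.committed = [];      // finalized caption lines, oldest first
    this.pending = new Map(); // speaker id -> latest interim line
  }

  // Accepts an event shaped like { text, speaker: { id, name }, isFinal }.
  push(event) {
    const line = `${event.speaker.name}: ${event.text}`;
    if (event.isFinal) {
      // A final result replaces whatever interim text that speaker had.
      this.pending.delete(event.speaker.id);
      this.committed.push(line);
      if (this.committed.length > this.maxLines) this.committed.shift();
    } else {
      // Each interim result overwrites the previous one for the same speaker.
      this.pending.set(event.speaker.id, line);
    }
  }

  // Lines to render: committed lines first, then in-progress lines.
  render() {
    return [...this.committed, ...this.pending.values()];
  }
}

const captions = new CaptionBuffer(2);
const alice = { id: 'usr_abc', name: 'Alice Chen' };
captions.push({ text: 'we should', speaker: alice, isFinal: false });
captions.push({ text: 'we should consider the', speaker: alice, isFinal: false });
captions.push({ text: 'we should consider the new pricing model', speaker: alice, isFinal: true });
console.log(captions.render());
```

Keeping interim text separate from committed text means a revised interim result never duplicates a line, and the final result cleanly replaces the provisional one.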

40+ Languages

English, Spanish, French, German, Portuguese, Japanese, Korean, Mandarin, Arabic, Hindi, Italian, Dutch, Polish, Russian, Turkish, Swedish, Norwegian, Danish, Finnish, Thai, Vietnamese, Indonesian, Malay, and many more. Language can be auto-detected or specified per participant.

Word-Level Timestamps

Every word in the transcript includes a precise start and end timestamp. This enables click-to-seek functionality in recorded meetings, accurate subtitle alignment, and precise correlation between transcript text and video playback position.
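As an illustration of subtitle alignment, per-word timestamps can be grouped into WebVTT cues. The sketch below assumes words shaped like the `data.words` entries in the integration guide (`word`, `start`, `end`, with times in seconds); the function names and the fixed words-per-cue grouping are illustrative choices, not V100 behavior.

```javascript
// Format a time in seconds as a WebVTT timestamp: HH:MM:SS.mmm
function toVttTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Group words into cues of at most maxWords words. Each cue starts at its
// first word's start time and ends at its last word's end time.
function wordsToVtt(words, maxWords = 7) {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    const start = toVttTime(chunk[0].start);
    const end = toVttTime(chunk[chunk.length - 1].end);
    const text = chunk.map((w) => w.word).join(' ');
    cues.push(`${start} --> ${end}\n${text}`);
  }
  return `WEBVTT\n\n${cues.join('\n\n')}\n`;
}

const sampleWords = [
  { word: 'we', start: 12.34, end: 12.5 },
  { word: 'should', start: 12.55, end: 12.9 },
  { word: 'consider', start: 12.95, end: 13.4 },
];
console.log(wordsToVtt(sampleWords, 2));
```

A production captioner would more likely break cues at pauses or punctuation than at a fixed word count.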

Speaker Diarization

Speaker diarization automatically identifies who said what. Each transcript segment is labeled with the speaking participant's identity. It combines audio source tracking (from WebRTC tracks) with voice fingerprinting to attribute speech accurately even when participants talk over each other.
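On the consuming side, speaker-labeled segments are often merged into conversational turns for display. The sketch below does only that grouping; it does not perform diarization. It assumes segments arrive in time order and are shaped like `{ speaker: { id, name }, text, start, end }`, which is an illustrative shape rather than a documented one.

```javascript
// Merge consecutive segments from the same speaker into a single turn.
// Assumes segments are already sorted by start time.
function groupBySpeakerTurn(segments) {
  const turns = [];
  for (const seg of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speakerId === seg.speaker.id) {
      // Same speaker kept talking: extend the current turn.
      last.text += ` ${seg.text}`;
      last.end = seg.end;
    } else {
      // A different speaker starts a new turn.
      turns.push({
        speakerId: seg.speaker.id,
        speakerName: seg.speaker.name,
        text: seg.text,
        start: seg.start,
        end: seg.end,
      });
    }
  }
  return turns;
}

const turns = groupBySpeakerTurn([
  { speaker: { id: 'usr_abc', name: 'Alice Chen' }, text: 'We should consider', start: 0.0, end: 1.8 },
  { speaker: { id: 'usr_abc', name: 'Alice Chen' }, text: 'the new pricing model.', start: 1.9, end: 3.4 },
  { speaker: { id: 'usr_def', name: 'Bob Ortiz' }, text: 'Agreed.', start: 3.6, end: 4.1 },
]);
console.log(turns);
```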

Integration Guide

Enable live transcription when creating a meeting. Transcript results stream to the client SDK via WebSocket, or your backend can receive them via webhook.

live-transcription.js

// npm install v100-sdk
import { V100 } from 'v100-sdk';

const v100 = new V100('v100_live_YOUR_API_KEY');

// Create meeting with live transcription enabled
const meeting = await v100.meetings.create({
  name: 'Product Review',
  transcription: {
    enabled: true,
    mode: 'live',               // 'live' | 'post' | 'both'
    language: 'auto',           // Auto-detect, or specify: 'en', 'es', 'fr', etc.
    diarize: true,              // Speaker identification
    interimResults: true,       // Partial results for live captions
    wordTimestamps: true,       // Per-word start/end times
    customVocabulary: ['V100', 'Dilithium', 'ML-KEM']  // Domain-specific terms
  },
  captions: {
    overlay: true,              // Show live captions in meeting UI
    position: 'bottom',         // 'bottom' | 'top' | 'custom'
    fontSize: 'medium'          // 'small' | 'medium' | 'large'
  }
});

// Listen for real-time transcript events (client SDK)
v100.on('transcription.interim', (data) => {
  // data.text      → "we should consider the"
  // data.speaker   → { id: "usr_abc", name: "Alice Chen" }
  // data.isFinal   → false
});

v100.on('transcription.final', (data) => {
  // data.text      → "we should consider the new pricing model"
  // data.speaker   → { id: "usr_abc", name: "Alice Chen" }
  // data.words     → [{ word: "we", start: 12.34, end: 12.50, confidence: 0.98 }, ...]
  // data.language  → "en"
  // data.isFinal   → true
});

// Get full transcript after meeting ends
const transcript = await v100.transcripts.get(meeting.id);
// transcript.segments → [{ speaker: "Alice", text: "...", start: 0.0, end: 5.2 }, ...]
// transcript.fullText → "Alice: We should consider..."

REST API for creating a meeting with live transcription:

terminal

curl -X POST https://api.v100.ai/v1/meetings \
  -H "Authorization: Bearer v100_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Review",
    "transcription": {
      "enabled": true,
      "mode": "live",
      "language": "auto",
      "diarize": true,
      "interimResults": true,
      "wordTimestamps": true
    },
    "captions": { "overlay": true }
  }'
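For backends that do not use the SDK, the same request can be made from Node.js 18+ with the built-in `fetch`. This is a sketch: the endpoint, headers, and body fields are copied from the curl example above, while the helper names are hypothetical and the response is assumed to be JSON because its shape is not documented here.

```javascript
// Builds the same request as the curl example. Kept separate from the
// network call so the payload can be inspected or unit-tested.
function buildCreateMeetingRequest(apiKey, name) {
  return {
    url: 'https://api.v100.ai/v1/meetings',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        name,
        transcription: {
          enabled: true,
          mode: 'live',
          language: 'auto',
          diarize: true,
          interimResults: true,
          wordTimestamps: true,
        },
        captions: { overlay: true },
      }),
    },
  };
}

// Sends the request. Assumption: the API answers with a JSON body, which is
// returned as-is without interpreting any fields.
async function createMeeting(apiKey, name) {
  const { url, init } = buildCreateMeetingRequest(apiKey, name);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Meeting creation failed: HTTP ${res.status}`);
  return res.json();
}
```

Calling `createMeeting('v100_live_YOUR_API_KEY', 'Product Review')` would issue the request; read the key from an environment variable rather than hard-coding it.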

Use Cases

Live transcription turns spoken words into structured, searchable, actionable data while the meeting is still in progress. This enables capabilities that post-processing transcription cannot provide.

Accessibility Compliance

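The ADA, Section 508, and WCAG 2.1 Level AA all call for video content to be accessible to people who are deaf or hard of hearing, and WCAG 2.1 Success Criterion 1.2.4 (Level AA) specifically addresses captions for live audio. Live captions provide immediate accessibility during meetings without requiring a human captioner.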

Real-Time AI Analysis

Feed live transcript data to V100's AI highlights engine for real-time detection of action items, key decisions, and sentiment shifts. Without live transcription, AI analysis can only happen after the meeting ends.

Multilingual Meetings

Participants speaking different languages can each have captions in their preferred language. Live transcription detects the source language automatically and can provide real-time translated captions through V100's translation integration.

Clinical Documentation

Live transcription during telehealth sessions produces a running transcript that can feed clinical notes as the appointment happens. Clinicians can review and correct the transcript during the appointment rather than spending time on documentation afterward.

Instant Searchability

As soon as a meeting ends, the full transcript with word-level timestamps is available for search. Find specific moments by keyword and jump directly to the corresponding point in the recording using precise timestamp correlation.
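Keyword search with jump-to-moment can be sketched as a scan over the word list. The example assumes a flat array of words with `word` and `start` fields, as in the `data.words` payload from the integration guide; `findMentions` is a hypothetical helper, and a real search feature would likely use an index rather than a linear scan.

```javascript
// Returns the start time (in seconds) of every occurrence of a keyword or
// multi-word phrase. Matching ignores case and punctuation.
function findMentions(words, phrase) {
  const normalize = (s) => s.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, '');
  const target = phrase.split(/\s+/).map(normalize).filter(Boolean);
  const hits = [];
  if (target.length === 0) return hits;
  for (let i = 0; i + target.length <= words.length; i++) {
    const match = target.every((t, j) => normalize(words[i + j].word) === t);
    if (match) hits.push(words[i].start);
  }
  return hits;
}

const transcriptWords = [
  { word: 'the', start: 12.9 },
  { word: 'new', start: 13.1 },
  { word: 'Pricing', start: 13.3 },
  { word: 'model.', start: 13.8 },
];
console.log(findMentions(transcriptWords, 'pricing model'));
```

In a browser player, a returned time can be assigned to the video element's `currentTime` property to seek to that moment.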

Custom Integrations

Stream transcript data to your own systems via webhook or WebSocket. Build custom dashboards, trigger CRM updates when specific keywords are mentioned, or pipe transcripts to your data warehouse for analytics.
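A keyword-driven trigger of the kind described above might look like the following. This is a sketch under assumptions: `createKeywordTrigger` is a hypothetical helper, events are assumed to carry `text` and `speaker.name` as in the `transcription.final` payload, and the callback stands in for whatever CRM or analytics call an integration would make.

```javascript
// Hypothetical helper: returns a handler for final transcript events that
// calls onMatch once for each watched keyword found in the event text.
// Matching is a case-insensitive substring test, so "price" also matches "priced".
function createKeywordTrigger(keywords, onMatch) {
  const watched = keywords.map((k) => k.toLowerCase());
  return function handleFinal(event) {
    const text = event.text.toLowerCase();
    for (const keyword of watched) {
      if (text.includes(keyword)) {
        onMatch({ keyword, speaker: event.speaker.name, text: event.text });
      }
    }
  };
}

const matches = [];
const onFinal = createKeywordTrigger(['pricing', 'renewal'], (m) => matches.push(m));
onFinal({
  text: 'We should consider the new pricing model',
  speaker: { id: 'usr_abc', name: 'Alice Chen' },
});
console.log(matches);
```

With the client SDK, the handler could be registered as `v100.on('transcription.final', onFinal)`.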

Pricing

Live transcription is billed per minute of transcribed audio. Speaker diarization, word-level timestamps, and live captions overlay are included at no additional cost.

Free Tier

$0/mo

Test live transcription.

60 min transcription/month

Live captions

Speaker diarization

5 languages

RECOMMENDED

Pro

$0.006/min

Production live transcription.

40+ languages

Custom vocabulary

Word-level timestamps

Webhook delivery

Enterprise

Custom

Volume discounts + SLA.

Volume pricing

Dedicated ASR capacity

Custom model training

99.99% SLA
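The per-minute rate makes a monthly estimate simple arithmetic. The sketch below uses the Pro rate of $0.006 per minute listed above and assumes, since the tiers do not say otherwise, that Pro usage includes no free minutes and that each minute of transcribed audio is billed once; `estimateProCost` is a hypothetical helper.

```javascript
const PRO_RATE_PER_MINUTE = 0.006; // USD per minute, from the Pro tier

// Estimated charge in USD for a number of transcribed minutes.
// Assumption: no free allowance is subtracted on the Pro tier.
function estimateProCost(minutes, ratePerMinute = PRO_RATE_PER_MINUTE) {
  if (minutes < 0) throw new RangeError('minutes must be non-negative');
  // Round to 4 decimal places to hide floating-point noise.
  return Math.round(minutes * ratePerMinute * 1e4) / 1e4;
}

// Example: 20 meetings of 45 minutes each is 900 minutes.
const monthlyMinutes = 20 * 45;
console.log(`$${estimateProCost(monthlyMinutes).toFixed(2)}`); // $5.40
```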

Add Live Transcription to Your App

Real-time captions in 40+ languages. Speaker diarization, word-level timestamps, and live overlay included on every plan.

Related Resources

Post-Processing Transcription

Speaker Diarization API

Auto Captions
