The average meeting recording is 40% silence and filler. V100's silence removal API detects dead air through waveform analysis and transcript alignment, then cuts it out while preserving natural pacing and conversation flow.
The audio waveform is segmented into 50ms frames. Each frame's RMS energy is computed and compared against a noise floor baseline derived from the first 2 seconds of audio (or a user-specified reference). Frames below the silence threshold (default: -40dB relative to peak) are marked as candidate silence regions.
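The framing step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not V100's implementation: the function names are hypothetical, samples are assumed to be floats in a flat list, and the noise floor baseline from the first 2 seconds is left out so that only the -40dB relative-to-peak default is applied.

```python
import math

def frame_levels_db(samples, sample_rate, frame_ms=50):
    """Return each frame's RMS level in dB relative to the signal peak."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    peak = max((abs(s) for s in samples), default=0.0)
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms == 0.0 or peak == 0.0:
            levels.append(float("-inf"))  # digital silence
        else:
            levels.append(20.0 * math.log10(rms / peak))
    return levels

def silent_frames(levels_db, threshold_db=-40.0):
    """Flag frames whose level falls below the silence threshold."""
    return [level < threshold_db for level in levels_db]

def silence_regions(flags, frame_ms=50):
    """Merge runs of silent frames into (start_s, end_s) candidate regions."""
    regions, run_start = [], None
    for i, silent in enumerate(flags):
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            regions.append((run_start * frame_ms / 1000, i * frame_ms / 1000))
            run_start = None
    if run_start is not None:
        regions.append((run_start * frame_ms / 1000, len(flags) * frame_ms / 1000))
    return regions

# Synthetic test signal: 0.5s tone, 0.5s near-silence, 0.5s tone.
sample_rate = 1000

def tone(amplitude, n):
    return [amplitude * math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(n)]

samples = tone(1.0, 500) + tone(0.001, 500) + tone(1.0, 500)
levels = frame_levels_db(samples, sample_rate)
regions = silence_regions(silent_frames(levels))
```

Here `regions` holds a single candidate region covering the quiet middle half-second. A production detector would also need the noise floor baseline and some smoothing so that a single loud frame does not split a long pause.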
Simultaneously, the audio is transcribed with word-level timestamps. The transcript reveals gaps between words that represent natural pauses, filler words ("um", "uh", "like", "you know", "so", "basically"), and extended hesitations. These are cross-referenced with the waveform silence regions to distinguish between intentional dramatic pauses and unintentional dead air.
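One way to picture the cross-referencing is the sketch below. It assumes a transcript given as a list of words with `start` and `end` times in seconds, and it treats a transcript gap as dead air only when most of it is also silent in the waveform. The 0.8 overlap ratio, the field names, and the function names are all hypothetical; the actual rule the API uses to separate intentional pauses from dead air is not documented here.

```python
FILLERS = {"um", "uh", "like", "you know", "so", "basically"}

def word_gaps(words, min_gap_s=0.3):
    """Gaps between consecutive words that are at least min_gap_s long."""
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        if nxt["start"] - prev["end"] >= min_gap_s:
            gaps.append((prev["end"], nxt["start"]))
    return gaps

def overlap_s(a, b):
    """Length in seconds of the overlap between two (start, end) spans."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def confirmed_dead_air(gaps, silence_regions, min_overlap_ratio=0.8):
    """Keep only the gaps that the waveform also reports as mostly silent."""
    confirmed = []
    for gap in gaps:
        covered = sum(overlap_s(gap, region) for region in silence_regions)
        if covered / (gap[1] - gap[0]) >= min_overlap_ratio:
            confirmed.append(gap)
    return confirmed

def _norm(word):
    return word["word"].lower().strip(".,!?")

def filler_spans(words, fillers=FILLERS):
    """Time spans of one- and two-word fillers (naive string matching)."""
    spans, i = [], 0
    while i < len(words):
        pair = _norm(words[i]) + " " + _norm(words[i + 1]) if i + 1 < len(words) else None
        if pair in fillers:
            spans.append((words[i]["start"], words[i + 1]["end"]))
            i += 2
        elif _norm(words[i]) in fillers:
            spans.append((words[i]["start"], words[i]["end"]))
            i += 1
        else:
            i += 1
    return spans

words = [
    {"word": "So", "start": 0.0, "end": 0.25},
    {"word": "um", "start": 0.5, "end": 0.75},
    {"word": "revenue", "start": 2.0, "end": 2.5},
    {"word": "grew", "start": 2.5, "end": 3.0},
]
gaps = word_gaps(words)
dead_air = confirmed_dead_air(gaps, [(0.8, 2.0)])
fillers_found = filler_spans(words)
```

Plain string matching like this would also flag meaningful uses of "like" and "so", which is why a real detector needs context from the transcript rather than a word list alone.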
Silence regions longer than your configured threshold (settable from 0.3s to 5s) are removed, with 80ms crossfade transitions to prevent audio clicks. A configurable "keep padding" (default: 150ms) is preserved on each side of the remaining speech to maintain a natural breathing rhythm. The video track is cut in sync with zero frame drift.
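The cut planning described above might look like the following sketch. It assumes the padding is taken from the inside edges of each qualifying silence region and that the crossfade is a simple linear blend; both are guesses made for illustration, and every function name is hypothetical.

```python
def plan_cuts(silence_regions, threshold_s=0.8, keep_padding_ms=150):
    """Turn silence regions longer than the threshold into cut spans,
    leaving keep_padding_ms of silence next to the speech on each side."""
    pad = keep_padding_ms / 1000
    cuts = []
    for start, end in silence_regions:
        if end - start > threshold_s:
            cut_start, cut_end = start + pad, end - pad
            if cut_end > cut_start:
                cuts.append((cut_start, cut_end))
    return cuts

def keep_segments(cuts, duration_s):
    """Complement of the cuts: the spans of the recording that survive."""
    segments, cursor = [], 0.0
    for cut_start, cut_end in sorted(cuts):
        if cut_start > cursor:
            segments.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
    if cursor < duration_s:
        segments.append((cursor, duration_s))
    return segments

def crossfade(a, b, n):
    """Join sample lists a and b, linearly blending the last n samples of a
    into the first n samples of b. The overlap shortens the result by n."""
    n = min(n, len(a), len(b))
    if n == 0:
        return a + b
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of b rises across the overlap
        mixed.append(a[len(a) - n + i] * (1 - w) + b[i] * w)
    return a[:len(a) - n] + mixed + b[n:]

# A 10s clip with one 2s pause (cut) and one 0.5s pause (kept).
cuts = plan_cuts([(2.0, 4.0), (6.0, 6.5)])
segments = keep_segments(cuts, duration_s=10.0)
kept_seconds = sum(end - start for start, end in segments)
joined = crossfade([1.0] * 4, [0.0] * 4, n=2)
```

With the defaults, the 2-second pause shrinks to 0.3 seconds of padding and the 0.5-second pause is left untouched because it is under the threshold. At a 48 kHz sample rate, an 80ms crossfade would correspond to `n = 3840` samples.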
Remove silence with a single API request. Configure thresholds, filler word detection, and padding.
curl -X POST https://api.v100.ai/v1/editor/edit \
  -H "Authorization: Bearer $V100_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "s3://your-bucket/meeting-recording.mp4",
    "instructions": "Remove all silence longer than 0.8 seconds and remove filler words",
    "silence_options": {
      "threshold_seconds": 0.8,
      "remove_fillers": true,
      "filler_words": ["um", "uh", "like", "you know", "basically"],
      "keep_padding_ms": 150,
      "crossfade_ms": 80
    },
    "output": {
      "format": "mp4",
      "resolution": "source"
    },
    "webhook": "https://your-app.com/api/webhooks/v100"
  }'

# Response (immediate):
# {
#   "job_id": "job_sil_7f3a9b2c",
#   "status": "processing",
#   "estimated_seconds": 180
# }
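For callers who prefer Python to curl, the same request could be assembled roughly as follows. The endpoint, headers, and field names are copied from the curl example; the helper name, the generated `instructions` text, and the client-side check against the 0.3s to 5s threshold range are assumptions added for illustration. Whether the API accepts an empty `filler_words` list is not documented here.

```python
import json
import urllib.request

API_URL = "https://api.v100.ai/v1/editor/edit"  # endpoint from the curl example

def build_silence_request(source, api_key, threshold_seconds=0.8,
                          keep_padding_ms=150, crossfade_ms=80,
                          filler_words=None, webhook=None):
    """Build (but do not send) a silence-removal request."""
    # Assumed client-side guard, based on the documented 0.3s to 5s range.
    if not 0.3 <= threshold_seconds <= 5.0:
        raise ValueError("threshold_seconds must be between 0.3 and 5.0")
    payload = {
        "source": source,
        "instructions": f"Remove all silence longer than {threshold_seconds} seconds",
        "silence_options": {
            "threshold_seconds": threshold_seconds,
            "remove_fillers": bool(filler_words),
            "filler_words": list(filler_words or []),
            "keep_padding_ms": keep_padding_ms,
            "crossfade_ms": crossfade_ms,
        },
        "output": {"format": "mp4", "resolution": "source"},
    }
    if webhook:
        payload["webhook"] = webhook
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_silence_request(
    "s3://your-bucket/meeting-recording.mp4",
    api_key="test-key",
    filler_words=["um", "uh", "like", "you know", "basically"],
    webhook="https://your-app.com/api/webhooks/v100",
)
```

Passing `request` to `urllib.request.urlopen` would submit the job. Building the request separately from sending it keeps the payload easy to inspect and test.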
When the job completes, the result looks like this:
{
  "job_id": "job_sil_7f3a9b2c",
  "status": "completed",
  "output_url": "https://cdn.v100.ai/out/7f3a9b2c.mp4",
  "stats": {
    "original_duration_seconds": 3612,
    "output_duration_seconds": 2247,
    "removed_seconds": 1365,
    "silence_segments_removed": 284,
    "filler_words_removed": 67,
    "reduction_percent": 37.8
  }
}
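The figures in `stats` follow from simple arithmetic, which a client can re-check before trusting a result. The sketch below uses the field names from the completed-job example; the helper name and the tolerance are hypothetical, and it assumes a non-zero original duration.

```python
def check_stats(stats, tolerance=0.05):
    """Recompute removed_seconds and reduction_percent from the durations."""
    original = stats["original_duration_seconds"]
    removed = original - stats["output_duration_seconds"]
    percent = round(100.0 * removed / original, 1)
    consistent = (
        removed == stats["removed_seconds"]
        and abs(percent - stats["reduction_percent"]) <= tolerance
    )
    return {
        "removed_seconds": removed,
        "reduction_percent": percent,
        "consistent": consistent,
    }

stats = {
    "original_duration_seconds": 3612,
    "output_duration_seconds": 2247,
    "removed_seconds": 1365,
    "silence_segments_removed": 284,
    "filler_words_removed": 67,
    "reduction_percent": 37.8,
}
result = check_stats(stats)
```

For the example payload, 3612 - 2247 = 1365 seconds removed, and 1365 / 3612 rounds to 37.8%, so the reported numbers agree with each other.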
Interview recordings typically contain 15-25% dead air from thinking pauses, connection delays, and filler words. Removing these makes episodes tighter and more listenable without manual editing.
A 60-minute meeting recording often contains 20-25 minutes of silence from screen sharing transitions, people joining/leaving, and "can you hear me?" troubleshooting. Cut it to a focused 35-minute recap.
Lecture recordings are full of pauses for writing, thinking, and slide transitions. Students watch at 1.5-2x speed anyway; removing silence first makes normal-speed playback feel natural and saves bandwidth.
Free tier includes 60 minutes of processing per month. No credit card required.