Audio quality is the single most important factor in video call satisfaction. Not video resolution, not latency, not layout — audio. Research from Microsoft and Stanford has consistently shown that poor audio quality causes participants to disengage faster than poor video quality. A 720p video with crystal-clear audio feels professional. A 4K video with keyboard clicking, fan whine, and HVAC rumble in the background feels like a disaster.
The problem is that real-world audio environments are hostile. Mechanical keyboards produce 60-80dB transients on every keystroke. Desktop fans generate continuous broadband noise at 30-45dB. Air conditioning creates a persistent 100-300Hz rumble. Construction outside adds low-frequency thumps and high-frequency grinding. A participant in a coffee shop brings all of these together with espresso machines, background conversation, and door chimes. The microphone captures everything. The other participants hear everything.
V100 solves this with a four-stage noise suppression pipeline built on the Web Audio API. It runs entirely in the browser, requires zero downloads, adds less than 10ms of latency, and is controlled through a simple API toggle with three suppression levels. This post explains exactly how it works, why we chose signal processing over machine learning for the current implementation, and where we are taking it next.
The Four-Stage Pipeline
V100's noise suppression pipeline chains four Web Audio API nodes in series. Each stage addresses a specific category of noise. The signal flows from the raw microphone input through all four stages and emerges as a clean, normalized audio stream that replaces the original track on all peer connections.
Audio signal chain: microphone input → high-pass filter → noise gate (compressor) → gain boost → limiter → MediaStreamAudioDestinationNode
Stage 1: High-Pass Filter
The first stage is a BiquadFilterNode configured as a high-pass filter. This attenuates audio energy below a cutoff frequency, eliminating the low-frequency noise that plagues most environments. HVAC systems generate continuous noise between 50Hz and 250Hz. Desktop fans produce broadband noise with dominant energy below 200Hz. Traffic rumble sits between 40Hz and 150Hz. A high-pass filter tuned to the right cutoff removes most of this while preserving the frequency range of the human voice (fundamental frequencies from roughly 85Hz in low male voices to 255Hz in higher female voices, with formants extending past 4kHz).
The cutoff frequency varies by suppression level. At low suppression, the cutoff is 200Hz, conservative enough to leave deep male voices largely intact while removing the worst of the HVAC rumble (the biquad's gradual 12dB-per-octave rolloff attenuates fundamentals just below the cutoff rather than cutting them abruptly). At medium, it rises to 300Hz, which catches more ambient noise at the cost of slightly thinning the deepest voices. At high, it goes to 400Hz, aggressively cutting everything below the primary voice band. Most users will not notice the difference between 200Hz and 300Hz on their voice. They will notice the dramatic reduction in background noise.
Stage 2: Dynamics Compressor as Noise Gate
The second stage is the core of the suppression pipeline. A DynamicsCompressorNode is configured with a threshold, ratio, attack, and release to function as a noise gate. Audio above the threshold passes through with minimal compression. Audio below the threshold is heavily compressed — effectively silenced.
When the user stops speaking, the ambient room noise (typically -50dB to -30dB) falls below the threshold. The compressor applies its ratio (12:1 at low, 16:1 at medium, 20:1 at high), reducing the noise to near-silence. When the user speaks, their voice (typically -20dB to 0dB) exceeds the threshold, and the compressor passes it through with minimal attenuation. The result: clean voice during speech, silence between speech.
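The gating behavior can be illustrated with an idealized transfer curve. The function below is a simplified model of downward expansion, not the exact soft-knee curve of DynamicsCompressorNode; names and numbers are illustrative:

```javascript
// Idealized noise-gate transfer curve (a simplified model, not the exact
// DynamicsCompressorNode behavior): signal at or above the threshold passes
// unchanged; signal below it is pushed down by the ratio.
function gateOutputDb(inputDb, thresholdDb, ratio) {
  if (inputDb >= thresholdDb) {
    return inputDb; // voice: passes with minimal attenuation
  }
  // ambient noise: every dB below threshold becomes `ratio` dB below it
  return thresholdDb - (thresholdDb - inputDb) * ratio;
}

// Medium level: threshold -40dB, ratio 16:1
console.log(gateOutputDb(-20, -40, 16)); // speech at -20dB  -> -20 (unchanged)
console.log(gateOutputDb(-45, -40, 16)); // room noise at -45dB -> -120 (silenced)
```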
| Parameter | Low | Medium | High |
|---|---|---|---|
| High-pass cutoff | 200Hz | 300Hz | 400Hz |
| Gate threshold | -50dB | -40dB | -30dB |
| Compression ratio | 12:1 | 16:1 | 20:1 |
| Attack time | 0.003s | 0.003s | 0.003s |
| Release time | 0.25s | 0.20s | 0.15s |
| Gain boost | 1.2x | 1.4x | 1.6x |
The attack time is uniformly fast at 3ms across all levels. This ensures the compressor engages almost instantly when the user starts speaking, preventing the first syllable from being clipped. The release time varies: 250ms at low (smooth, natural decay), 200ms at medium, and 150ms at high (faster gate close to catch noise between short pauses). The tradeoff is that faster release can make the gate audible as a slight "breathing" effect in quiet environments, which is why low suppression uses a slower release.
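The table above maps directly onto a parameter lookup. A minimal sketch (the object shape and property names are my own, not V100's internal representation):

```javascript
// Per-level pipeline parameters, taken from the table above.
// (Illustrative structure; V100's internals may differ.)
const SUPPRESSION_PRESETS = {
  low:    { cutoffHz: 200, thresholdDb: -50, ratio: 12, attack: 0.003, release: 0.25, gainBoost: 1.2 },
  medium: { cutoffHz: 300, thresholdDb: -40, ratio: 16, attack: 0.003, release: 0.20, gainBoost: 1.4 },
  high:   { cutoffHz: 400, thresholdDb: -30, ratio: 20, attack: 0.003, release: 0.15, gainBoost: 1.6 },
};

function presetFor(level) {
  const preset = SUPPRESSION_PRESETS[level];
  if (!preset) throw new Error(`Unknown suppression level: ${level}`);
  return preset;
}
```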
Stage 3: Gain Boost
The dynamics compressor reduces the overall level of the audio, even for speech that passes above the threshold, because the compressor's gain reduction is not fully transparent to loud material. The gain stage compensates for this level reduction, boosting the output to match the expected input level. Without gain compensation, the suppressed audio would sound noticeably quieter than unsuppressed audio, which would be jarring when toggling noise suppression on and off.
The gain boost is calibrated per level: 1.2x at low, 1.4x at medium, 1.6x at high. Higher suppression levels use more aggressive compression (higher ratio, higher threshold), which reduces the output level more, requiring more gain compensation. These values were calibrated by measuring the average voice level before and after compression and adjusting the gain to match.
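For intuition, those linear multipliers are modest in decibel terms (gain in dB equals 20·log10 of the linear factor):

```javascript
// Convert a linear gain factor to decibels: dB = 20 * log10(gain)
const toDb = (gain) => 20 * Math.log10(gain);

console.log(toDb(1.2).toFixed(2)); // 1.58 dB (low)
console.log(toDb(1.4).toFixed(2)); // 2.92 dB (medium)
console.log(toDb(1.6).toFixed(2)); // 4.08 dB (high)
```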
Stage 4: Limiter
The final stage is a second DynamicsCompressorNode, but configured as a hard limiter rather than a noise gate. Its threshold is set to -1dB with a ratio of 20:1 and a 0.001s attack. Any audio that approaches 0dBFS (digital full scale) is immediately compressed to prevent clipping. This protects against loud transients that the gain stage might boost into clipping territory: a sudden cough, a laugh, a desk slap, or the sharp consonant in an emphatic "absolutely."
Without the limiter, the gain boost in stage 3 could push loud speech segments past 0dBFS, causing digital clipping — the harsh, distorted sound that makes listeners wince. The limiter is a safety net that ensures the output never clips, regardless of input level. It engages rarely (only on genuine peaks), so it has no audible effect during normal speech.
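Putting the four stages together: the sketch below wires the chain with standard Web Audio nodes, using the medium preset from the table above. It is a minimal reconstruction from the parameters described in this post, not V100's actual source.

```javascript
// Build the four-stage suppression chain on a given AudioContext.
// Parameter values are the "medium" preset described above.
function buildSuppressionChain(ctx) {
  // Stage 1: high-pass filter removes low-frequency rumble
  const highpass = ctx.createBiquadFilter();
  highpass.type = 'highpass';
  highpass.frequency.value = 300;

  // Stage 2: compressor configured as a noise gate
  const gate = ctx.createDynamicsCompressor();
  gate.threshold.value = -40;
  gate.ratio.value = 16;
  gate.attack.value = 0.003;
  gate.release.value = 0.20;

  // Stage 3: gain compensates for the level reduction from the gate
  const makeup = ctx.createGain();
  makeup.gain.value = 1.4;

  // Stage 4: hard limiter prevents the boosted signal from clipping
  const limiter = ctx.createDynamicsCompressor();
  limiter.threshold.value = -1;
  limiter.ratio.value = 20;
  limiter.attack.value = 0.001;

  highpass.connect(gate);
  gate.connect(makeup);
  makeup.connect(limiter);

  // Feed the microphone source into `input`; take the cleaned signal from
  // `output` (e.g. into a MediaStreamAudioDestinationNode).
  return { input: highpass, output: limiter };
}
```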
Seamless Track Replacement: No Renegotiation
A critical implementation detail is how the processed audio replaces the original microphone track on active WebRTC peer connections. The naive approach — removing the old track and adding a new one — triggers a full SDP renegotiation on every peer connection. In a room with 8 participants, that means 7 renegotiation cycles every time someone toggles noise suppression. Each cycle adds 200-500ms of signaling overhead and risks a brief audio gap.
V100 uses RTCRtpSender.replaceTrack() instead. This method swaps the audio track on an existing sender without triggering renegotiation. The WebRTC connection continues uninterrupted. The remote participant's audio stream switches from raw microphone to processed audio (or back) with zero gap, zero renegotiation, and zero interruption. The toggle is instantaneous and seamless.
The processed audio comes from a MediaStreamAudioDestinationNode at the end of the Web Audio pipeline. This node produces a MediaStream whose audio track can be used directly with replaceTrack(). When noise suppression is disabled, V100 swaps the original microphone track back in. The pipeline nodes remain instantiated but disconnected, ready for instant reconnection.
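The toggle itself can be sketched with replaceTrack(). The function and variable names below are illustrative; V100 handles this internally:

```javascript
// Swap the outgoing audio track on every peer connection without
// renegotiation. `peerConnections` holds RTCPeerConnection-like objects;
// `newTrack` is either the processed track (suppression on) or the
// original microphone track (suppression off).
async function swapOutgoingAudio(peerConnections, newTrack) {
  const swaps = [];
  for (const pc of peerConnections) {
    for (const sender of pc.getSenders()) {
      if (sender.track && sender.track.kind === 'audio') {
        // replaceTrack() switches the media without touching SDP or ICE
        swaps.push(sender.replaceTrack(newTrack));
      }
    }
  }
  await Promise.all(swaps);
}
```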
Audio Level Monitoring
V100 exposes a getAudioLevel() method that returns the current RMS audio level (0.0 to 1.0) of either the raw or processed audio stream. This powers UI audio meters that give visual feedback of the suppression effect. Developers can show a before/after meter that demonstrates how the suppression pipeline reduces background noise while preserving voice level.
The audio level is computed using an AnalyserNode tapped at the output of the pipeline. The analyser provides frequency-domain data via getByteFrequencyData() and time-domain data via getByteTimeDomainData(). V100 computes RMS from the time-domain data for the level meter and uses frequency-domain data for the optional spectral visualization.
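The RMS computation from getByteTimeDomainData() samples looks roughly like this (the analyser returns unsigned bytes centered at 128, so silence is a flat line at 128):

```javascript
// Compute an RMS level (0.0 - 1.0) from AnalyserNode.getByteTimeDomainData()
// output: unsigned bytes where 128 represents the zero crossing.
function rmsFromTimeDomain(bytes) {
  let sumSquares = 0;
  for (let i = 0; i < bytes.length; i++) {
    const sample = (bytes[i] - 128) / 128; // normalize to roughly [-1, 1]
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / bytes.length);
}

// Silence: every byte is 128 -> RMS 0
console.log(rmsFromTimeDomain(new Uint8Array(1024).fill(128))); // 0
```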
Code Sample: Toggle and Level Selection
```javascript
// Enable noise suppression at medium level
await room.setNoiseSuppression({
  enabled: true,
  level: 'medium', // 'low' | 'medium' | 'high'
});

// Change level on the fly (no renegotiation)
await room.setNoiseSuppression({ level: 'high' });

// Disable noise suppression
await room.setNoiseSuppression({ enabled: false });

// Get current audio level for UI meter
const level = room.getAudioLevel(); // 0.0 - 1.0
console.log(`Current audio level: ${(level * 100).toFixed(0)}%`);

// Listen for suppression state changes
room.on('noiseSuppression:change', (event) => {
  console.log(`Suppression: ${event.enabled ? event.level : 'off'}`);
});

// Per-participant suppression control
await room.setNoiseSuppression({
  enabled: true,
  level: 'high',
  participantId: 'user_abc123', // Apply to specific user
});
```
V100 vs. Krisp vs. Zoom vs. Teams
| Feature | V100 | Krisp | Zoom | Teams |
|---|---|---|---|---|
| Approach | Web Audio pipeline | Deep neural network | Proprietary ML | Proprietary ML |
| Complex noise (dogs, babies) | Moderate | Excellent | Good | Good |
| Steady-state noise (HVAC, fans) | Excellent | Excellent | Good | Good |
| Keyboard clicks | Excellent | Excellent | Good | Good |
| Download required | None | Desktop app or SDK | Zoom client | Teams client |
| Added latency | <10ms | ~20ms | Undisclosed | Undisclosed |
| API controllable | Yes (per participant) | SDK only | User settings only | User settings only |
| Suppression levels | 3 (low/med/high) | 2 (on/off) | 3 (auto/low/high) | 3 (auto/low/high) |
| Cost | Included | $60/yr per user | Included in plan | Included in plan |
The honest assessment: Krisp's neural network approach is better at complex, non-stationary noise like barking dogs, baby cries, and overlapping human speech. These are sounds that occupy the same frequency range as the target voice and cannot be separated by a frequency filter or noise gate alone. V100's signal processing pipeline excels at steady-state noise (the types that dominate most professional environments) and transient mechanical noise (keyboards, mouse clicks, desk taps). For most office, home-office, and professional environments, V100's approach is more than sufficient.
The tradeoffs favor V100 in deployment simplicity. V100's pipeline runs natively in the browser — there is nothing to download, nothing to install, nothing to keep updated. Krisp requires either a desktop application (for end users) or an SDK integration (for developers). Zoom and Teams require their respective native clients. V100 works in any modern browser on any operating system with zero prerequisites.
Roadmap: RNNoise WASM for ML-Based Suppression
The Web Audio pipeline is V100's production noise suppression today. The next step is RNNoise, Mozilla's open-source recurrent neural network for noise suppression, compiled to WebAssembly. RNNoise is trained on a dataset of 100+ hours of noise and speech, and operates on 10ms audio frames in the frequency domain. It separates voice from noise with near-Krisp quality for common noise types, while running at under 5% CPU on modern hardware.
The WASM-compiled RNNoise model is approximately 200KB — small enough to load on demand without impacting page load time. V100 will integrate it as a new AudioWorkletNode that slots into the existing pipeline, replacing the high-pass filter and dynamics compressor with spectral noise estimation and gain calculation. The limiter stage will remain as a safety net.
The timeline: RNNoise integration is in internal testing now and will ship as an opt-in upgrade. Developers will be able to choose between the signal processing pipeline (lowest latency, zero download) and the ML pipeline (better complex noise removal, 200KB WASM download on first use). Both will be controllable through the same setNoiseSuppression() API.
Integration with V100's Audio Stack
Noise suppression is one component of V100's broader audio quality stack. It works alongside speaker diarization (which identifies who said what in a recording), active speaker detection (which determines who is currently speaking for auto-zoom), and echo cancellation (handled by the browser's native AEC). The noise suppression pipeline feeds into the audio level monitoring that drives speaker detection, meaning cleaner audio also means more accurate speaker tracking.
For developers building on V100, noise suppression is a single API call. Toggle it on, pick a level, and let the pipeline handle the rest. No AudioContext management, no node wiring, no track replacement logic. V100 abstracts all of the Web Audio complexity behind a clean interface that does the right thing.
Hear the difference
Start a V100 room, turn on a fan or type on your keyboard, and toggle noise suppression. The background noise disappears. The voice stays.