Virtual backgrounds are table stakes for any video platform in 2026. Zoom has them. Teams has them. Google Meet has them. Your users expect to blur their messy apartment, replace their background with a corporate logo, or hide the fact that they are taking a board meeting from a coffee shop. This is not a differentiator. It is a baseline requirement.
What is a differentiator is whether developers can control virtual backgrounds programmatically. On Zoom, the user clicks a button in the UI. On Teams, the user selects a background from a gallery. On both platforms, the developer building on top of the SDK has limited or no control over what backgrounds are available, when they are applied, or whether they can be enforced. The background is a user feature, not a developer feature.
V100 takes a fundamentally different approach. Virtual backgrounds are an API. Developers set them via session configuration, swap them mid-call via API calls, enforce company-approved backgrounds for all participants, disable them for compliance scenarios, and control them per-participant. The user-facing experience is still a button in the meeting UI. But the developer has full programmatic control over what happens behind that button. This post is a technical deep-dive into how the implementation works.
How It Works: MediaPipe + Canvas + captureStream
V100's virtual background pipeline has three stages: segmentation, compositing, and stream replacement. The entire pipeline runs in the browser on the participant's device. No video frames are sent to a server for background processing. This is critical for both latency and privacy — the unprocessed camera feed never leaves the user's device.
Stage 1: Segmentation (MediaPipe Selfie Segmentation)
The segmentation stage uses MediaPipe Selfie Segmentation, a lightweight machine learning model that runs in the browser via WebAssembly. The model takes each video frame as input and produces a segmentation mask: a grayscale image where white pixels represent the person and black pixels represent the background. The model runs at the camera's frame rate (typically 30fps) and adds approximately 3-8 milliseconds of processing time per frame on modern hardware.
MediaPipe's segmentation model is trained specifically for selfie-style video: a single person facing the camera at arm's length. It handles common challenges well: varying skin tones, glasses, hats, headphones, and partially visible hands. It struggles with unusual poses (person turned sideways), transparent objects (glass on a desk), and very similar foreground/background colors (wearing a white shirt against a white wall). For these edge cases, V100 applies Gaussian blur to the mask edges to create a smooth transition rather than a hard cutoff.
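The edge-softening idea can be illustrated with a small, self-contained sketch: a separable box blur over a grayscale mask array (a stand-in for the Gaussian blur V100 applies; the function and parameter names here are illustrative, and the real pipeline operates on canvas pixels rather than raw arrays):

```javascript
// Illustrative edge softening for a segmentation mask.
// `mask` is a flat Uint8ClampedArray of grayscale values
// (0 = background, 255 = person), laid out as w x h pixels.
// A box blur is run horizontally, then vertically, which
// approximates a Gaussian blur and turns hard mask edges
// into a smooth foreground/background transition.
function softenMaskEdges(mask, w, h, radius) {
  const pass = (src, stride, lineLen, lineCount, lineStride) => {
    const out = new Uint8ClampedArray(src.length);
    for (let line = 0; line < lineCount; line++) {
      for (let i = 0; i < lineLen; i++) {
        let sum = 0;
        let count = 0;
        for (let k = -radius; k <= radius; k++) {
          const j = i + k;
          if (j >= 0 && j < lineLen) {
            sum += src[line * lineStride + j * stride];
            count++;
          }
        }
        // Uint8ClampedArray rounds and clamps the average for us.
        out[line * lineStride + i * stride] = sum / count;
      }
    }
    return out;
  };
  const horizontal = pass(mask, 1, w, h, w); // blur along each row
  return pass(horizontal, w, h, w, 1);       // then along each column
}
```

A hard 0/255 boundary in the input comes out as a graded ramp, which is what hides segmentation errors at hairlines and shoulders.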
Stage 2: Compositing (Canvas API)
The compositing stage uses the HTML5 Canvas API to combine the person (foreground) with the selected background. For each frame, the pipeline draws the camera frame to the canvas, clips it to the segmentation mask using the destination-in composite operation, and then draws the selected background behind the remaining person pixels using destination-over.
The globalCompositeOperation approach is the most performant method for canvas-based compositing because it uses the browser's hardware-accelerated 2D renderer. Alternative approaches — pixel-by-pixel manipulation via getImageData/putImageData — are 10-50x slower because they forgo GPU acceleration and force a CPU round-trip for every pixel.
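A minimal sketch of the frame-then-mask-then-background compositing order (names are illustrative; `ctx` can be a real CanvasRenderingContext2D or any object with the same drawing interface, and the mask is assumed to encode the person in its alpha channel):

```javascript
// Composite one frame from the camera image, segmentation mask,
// and background. Only globalCompositeOperation is used, so the
// browser's accelerated 2D renderer does all of the pixel work.
function compositeFrame(ctx, cameraFrame, mask, background, w, h) {
  ctx.clearRect(0, 0, w, h);
  // 1. Draw the raw camera frame.
  ctx.globalCompositeOperation = 'source-over';
  ctx.drawImage(cameraFrame, 0, 0, w, h);
  // 2. Keep only the pixels covered by the mask (the person).
  ctx.globalCompositeOperation = 'destination-in';
  ctx.drawImage(mask, 0, 0, w, h);
  // 3. Fill everything behind the person with the background.
  ctx.globalCompositeOperation = 'destination-over';
  ctx.drawImage(background, 0, 0, w, h);
  // Reset to the default for any later drawing.
  ctx.globalCompositeOperation = 'source-over';
}
```

Each step is a single hardware-accelerated draw call, which is why the whole compositing stage stays under a millisecond per frame.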
Stage 3: Stream Replacement (captureStream)
The final stage replaces the original camera track in the WebRTC connection with the composited canvas output. The canvas produces a MediaStream via canvas.captureStream(30) (30fps). The video track from this stream replaces the original camera track using RTCRtpSender.replaceTrack(). This is a seamless, glitch-free replacement — the remote participant sees the background change without any interruption in the video feed.
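A sketch of this replacement step against the standard WebRTC API (the function name is illustrative; `pc` is an RTCPeerConnection and `canvas` is the compositing canvas):

```javascript
// Swap the outgoing camera track for the composited canvas output.
async function swapToCanvasTrack(pc, canvas) {
  // captureStream(30) yields a MediaStream mirroring the canvas at 30fps.
  const canvasTrack = canvas.captureStream(30).getVideoTracks()[0];
  // Find the sender currently carrying the camera's video track.
  const sender = pc
    .getSenders()
    .find((s) => s.track && s.track.kind === 'video');
  // replaceTrack swaps the media without SDP renegotiation or a glitch.
  await sender.replaceTrack(canvasTrack);
  return canvasTrack;
}
```

Because replaceTrack avoids renegotiation, the remote side keeps decoding the same RTP stream and simply sees different frames.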
The canvas handles both portrait and landscape orientations automatically. The compositing logic detects the video resolution and adjusts the background image scaling to fill the frame without stretching or letterboxing. When the user rotates their device or resizes their window, the canvas dimensions update and the background scales accordingly.
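The fill-without-stretching scaling reduces to a small piece of arithmetic. A sketch (names illustrative):

```javascript
// Compute the draw rectangle that scales a background image to cover
// the video frame completely while preserving aspect ratio: overflow
// is cropped, never stretched or letterboxed.
function coverFit(frameW, frameH, imageW, imageH) {
  const scale = Math.max(frameW / imageW, frameH / imageH);
  const w = imageW * scale;
  const h = imageH * scale;
  // Center the image so any cropping is symmetric.
  return { x: (frameW - w) / 2, y: (frameH - h) / 2, w, h };
}
```

Passing the resulting rectangle to drawImage fills the canvas for both portrait and landscape frames; when the frame dimensions change on rotation or resize, recomputing the rectangle is all that is needed.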
The 7 Presets + Custom Upload
V100 ships with 7 built-in virtual background presets that cover the most common use cases, plus a none value that disables the feature entirely. Each preset is selectable via the meeting UI or settable via the API.
| Preset | API Value | Description | Use Case |
|---|---|---|---|
| Slight Blur | blur-light | Gaussian blur (radius 10px) on background | Subtle background de-emphasis |
| Heavy Blur | blur-heavy | Gaussian blur (radius 30px) on background | Complete background concealment |
| Solid Black | solid-black | Pure black (#000000) background | Professional, minimal distraction |
| Solid White | solid-white | Pure white (#FFFFFF) background | Clean, studio-style look |
| Office | office | Professional office environment image | Work-from-home professionalism |
| Green Screen | green-screen | Solid green (#00FF00) background | Post-processing, chroma key workflows |
| Custom Image | custom | User-uploaded image (URL or base64) | Branding, custom environments |
| None | none | Disable virtual background | Original camera feed |
The green screen preset deserves special mention. It enables post-production workflows where the meeting recording is later processed with professional chroma key software to composite participants into any environment. This is used by broadcast media companies, content creators, and event production teams who need studio-quality compositing that goes beyond what real-time browser-based processing can achieve.
API Integration: Set and Swap Backgrounds via Code
The key difference between V100 and every other video platform is that virtual backgrounds are controllable via the API. This enables use cases that are impossible when virtual backgrounds are a user-only feature.
```javascript
// Set virtual background in session config (before joining)
const session = await v100.createSession({
  roomId: 'board-meeting-q1',
  virtualBackground: {
    enabled: true,
    preset: 'blur-heavy', // or 'office', 'custom', etc.
  },
});

// Switch background mid-call
await session.setVirtualBackground({
  preset: 'custom',
  imageUrl: 'https://company.com/branding/bg.jpg',
});

// Enforce company background for all participants
await v100.rooms.update('board-meeting-q1', {
  virtualBackground: {
    enforced: true,
    preset: 'custom',
    imageUrl: 'https://company.com/branding/bg.jpg',
    allowOverride: false, // participants cannot change it
  },
});

// Disable virtual backgrounds (compliance mode)
await v100.rooms.update('deposition-room', {
  virtualBackground: {
    enabled: false, // no backgrounds allowed
  },
});

// Per-participant control
await session.participants.setBackground('participant-id', {
  preset: 'blur-light',
});
```
The enforcement API is particularly important for enterprise customers. A company can mandate that all employees use the corporate-branded background on client-facing calls. A legal firm can disable virtual backgrounds entirely for depositions, ensuring that the video recording shows the actual environment. A healthcare provider can enforce a neutral background for patient consultations to maintain professionalism. These are policy decisions that require programmatic control, not user preferences.
Screen Share Integration: Automatic Pause and Resume
A common edge case that most virtual background implementations handle poorly is the transition between camera and screen share. When a participant starts sharing their screen, the virtual background should stop processing — applying a segmentation mask to a screen capture makes no sense and wastes CPU. When screen sharing stops, the virtual background should resume seamlessly.
V100 handles this automatically. When a participant starts screen sharing, the virtual background pipeline pauses: the MediaPipe model stops receiving frames, the canvas compositing loop stops, and the screen share track replaces the composited track in the WebRTC connection. When screen sharing ends, the pipeline resumes within one frame (33ms at 30fps). The participant sees their virtual background return instantly. Remote participants see a seamless transition with no black frames, no flickering, and no delay.
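In SDK terms, the behavior amounts to gating the processing loop on screen-share state. A sketch of that wiring (the event names and pipeline interface here are illustrative, not the documented V100 API):

```javascript
// Gate the virtual background pipeline on screen-share state.
// `session` is any event emitter exposing on(event, handler);
// `pipeline` exposes pause() and resume().
function wireScreenShareGuard(session, pipeline) {
  session.on('screenShareStarted', () => pipeline.pause());
  session.on('screenShareStopped', () => pipeline.resume());
}

// Minimal pipeline stub: while paused, frames skip segmentation
// and compositing entirely, freeing the CPU for the screen capture.
function makePipeline() {
  let paused = false;
  return {
    pause: () => { paused = true; },
    resume: () => { paused = false; },
    shouldProcessFrame: () => !paused,
  };
}
```

The frame loop checks shouldProcessFrame() before running segmentation, so resuming takes effect on the very next frame.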
This integration extends to V100's picture-in-picture mode. When a participant is screen sharing with a camera PiP overlay, the virtual background is applied only to the PiP overlay (the camera feed) and not to the screen share content. The segmentation model processes the small PiP frame, which is computationally cheaper than processing a full-resolution camera feed, resulting in even lower CPU usage during screen share.
Graceful Fallback: When MediaPipe Is Unavailable
MediaPipe Selfie Segmentation loads from a CDN. In environments where CDN access is blocked (corporate firewalls, air-gapped networks, regions with CDN restrictions), the model cannot load and segmentation is unavailable. V100 handles this gracefully with a two-tier fallback strategy.
Tier 1: Full-frame blur. If the segmentation model cannot load but the user has selected a virtual background, V100 applies a full-frame Gaussian blur to the entire video feed. This does not separate the person from the background (the person is also blurred), but it provides privacy by obscuring the environment. This is better than showing the raw camera feed when the user explicitly requested a background.
Tier 2: Raw camera feed with notification. If even canvas processing is unavailable (extremely old browsers, canvas disabled by policy), V100 falls back to the raw camera feed and displays a notification to the user explaining that virtual backgrounds are not available in their environment. The video call proceeds normally — virtual backgrounds are never a hard requirement for joining a meeting.
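The tier selection is a straightforward decision cascade. Sketched as a pure function (names and flags are illustrative):

```javascript
// Decide how to render local video given what the environment supports.
// Returns 'segmented' (full pipeline), 'full-blur' (tier 1), or
// 'raw' (tier 2, shown alongside a user-facing notification).
function chooseRenderMode({ backgroundRequested, modelLoaded, canvasAvailable }) {
  if (!backgroundRequested) return 'raw';           // user wants no background
  if (modelLoaded && canvasAvailable) return 'segmented';
  if (canvasAvailable) return 'full-blur';          // model blocked: blur all
  return 'raw';                                     // nothing available
}
```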
For customers who need virtual backgrounds in air-gapped environments, V100 supports self-hosted MediaPipe model files. The model weights can be served from the customer's own infrastructure, eliminating the CDN dependency entirely. This is configured via the session configuration by providing a custom modelUrl parameter.
Technical Architecture Diagram
```
Camera (getUserMedia)
        |
        v  30fps video frames
+--------------------+
| MediaPipe Selfie   |  ~3-8ms per frame (WASM)
| Segmentation       |  Output: grayscale mask
+--------+-----------+
         |
         v  mask + original frame
+--------------------+
| Canvas Compositor  |  <1ms per frame (GPU-accelerated)
|                    |
| 1. Draw camera     |  (raw frame onto canvas)
|    frame           |
| 2. destination-in  |  (clip to person via mask)
| 3. destination-    |  (draw background behind person:
|    over            |   preset image, blur, or color)
+--------+-----------+
         |
         v  composited frame
+--------------------+
| captureStream(30)  |  Canvas -> MediaStream
+--------+-----------+
         |
         v  video track
+--------------------+
| replaceTrack()     |  Swap into WebRTC connection
| RTCRtpSender       |  Seamless, no renegotiation
+--------------------+
         |
         v
Remote participants see composited video

Screen Share Active?
--------------------
YES -> Pause segmentation + canvas loop
       Replace track with screen capture
NO  -> Resume pipeline within 1 frame (33ms)
```
Performance: CPU Impact and Optimization
Virtual background processing adds CPU load on the participant's device. The MediaPipe segmentation model is the primary cost at 3-8ms per frame. The canvas compositing adds less than 1ms per frame when GPU-accelerated. At 30fps, the total CPU overhead is approximately 120-270ms of processing per second, or 12-27% of a single CPU core. On modern laptops and phones with multiple cores, this is well within budget. On older devices, V100 automatically reduces the segmentation frequency to 15fps to halve the CPU load while maintaining acceptable visual quality.
The blur presets are cheaper to composite than image-replacement backgrounds because there is no separate background image to decode and draw. Instead, the camera frame is drawn twice: once through the canvas context's blur filter (the background) and once sharp, clipped by the segmentation mask (the foreground). The segmentation cost is unchanged, but the compositing step drops to under 2ms per frame.
Per-frame processing cost by preset type
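The blurred-background path can be sketched with the same context-style interface as before (names are illustrative; the mask is assumed to encode the person in its alpha channel, and the sharp person layer is built on an offscreen canvas):

```javascript
// Blur-preset compositing: draw the whole frame blurred, then overlay
// the sharp person layer built on an offscreen canvas.
function compositeBlurFrame(mainCtx, personCtx, frame, mask, w, h, radius) {
  // Sharp person layer: frame clipped to the mask, on the offscreen canvas.
  personCtx.clearRect(0, 0, w, h);
  personCtx.globalCompositeOperation = 'source-over';
  personCtx.drawImage(frame, 0, 0, w, h);
  personCtx.globalCompositeOperation = 'destination-in';
  personCtx.drawImage(mask, 0, 0, w, h);
  // Blurred background: the full camera frame through a blur filter.
  mainCtx.filter = `blur(${radius}px)`;
  mainCtx.drawImage(frame, 0, 0, w, h);
  mainCtx.filter = 'none';
  // Sharp person composited on top of the blurred frame.
  mainCtx.drawImage(personCtx.canvas, 0, 0, w, h);
}
```

Both draws are hardware-accelerated, which is what keeps the blur presets' compositing cost low.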
Use Cases: Why API-Controlled Backgrounds Matter
Work from home: The most common use case. Employees hide their home environment on professional calls. V100's API allows companies to pre-configure the default background for their organization, so employees do not need to select it manually on every call.
HIPAA compliance: In telehealth sessions, the patient's environment may contain sensitive information (medication bottles, medical equipment, other people). Enforcing a blur or solid background protects patient privacy and prevents incidental disclosure of PHI. V100's enforcement API ensures the background is applied automatically, removing the burden from the patient.
Corporate branding: Enterprise customers use custom background images with their company logo, product imagery, or event branding. For webinars and customer-facing calls, a consistent branded background reinforces professionalism. V100's room-level enforcement ensures every participant presents a unified brand image.
Legal depositions: In video depositions, attorneys may want to disable virtual backgrounds entirely to ensure the recording shows the actual environment of the deponent. V100's disable API prevents any participant from activating a background, providing an authentic record. This is discussed further in our video deposition recording guide.
Content creation: The green screen preset enables professional post-production workflows. Creators record with a digital green screen, then use professional compositing software to place themselves in any environment with higher quality than real-time browser processing can achieve. This bridges V100's real-time video capabilities with traditional production workflows.
Comparison: V100 vs. Zoom vs. Teams vs. Daily
| Feature | V100 | Zoom | Teams | Daily |
|---|---|---|---|---|
| Virtual backgrounds | Yes | Yes | Yes | Yes |
| Set via API | Yes | No | No | Limited |
| Swap mid-call via API | Yes | No | No | No |
| Enforce per-room | Yes | No | Admin only | No |
| Disable for compliance | Yes (API) | Admin setting | Admin setting | No |
| Per-participant control | Yes | No | No | No |
| Custom image upload | Yes (URL + base64) | Yes (file upload) | Yes (file upload) | Limited |
| Screen share auto-pause | Yes (automatic) | Yes | Yes | Varies |
| Green screen preset | Yes | Yes | No | No |
The pattern is clear. Zoom and Teams treat virtual backgrounds as a user feature controlled through their UI. Daily offers limited SDK control. V100 treats virtual backgrounds as a developer feature controlled through the API. For developers building custom video experiences — telehealth platforms, corporate meeting tools, event production systems, legal deposition software — API-level control is not a nice-to-have. It is the difference between building a polished product and hacking around platform limitations.
Implementation: Switching Backgrounds Mid-Call
```javascript
// Example: branded background for sales calls, blur for internal
const session = await v100.joinRoom('sales-demo-42');

// Start with company branding
await session.setVirtualBackground({
  preset: 'custom',
  imageUrl: 'https://cdn.acme.com/bg/sales-backdrop.jpg',
});

// Client leaves, switch to casual internal standup
session.on('participantLeft', async (participant) => {
  if (participant.role === 'external') {
    const externals = session.participants
      .filter((p) => p.role === 'external');
    if (externals.length === 0) {
      // All external participants left -- relax background
      await session.setVirtualBackground({
        preset: 'blur-light',
      });
    }
  }
});

// Dynamic per-participant backgrounds
session.on('participantJoined', async (participant) => {
  if (participant.role === 'presenter') {
    await session.participants.setBackground(
      participant.id,
      { preset: 'custom', imageUrl: participant.brandedBg }
    );
  }
});
```
This kind of dynamic, event-driven background control is impossible on platforms where virtual backgrounds are a user-facing button. V100's API makes the background a programmable element of the video experience, just like the video layout, the recording settings, or the participant permissions. It is one more dimension of control that developers need to build polished, professional video products.
Build with API-controlled virtual backgrounds
7 presets, custom upload, per-participant control, enforcement policies, and automatic screen share integration. Virtual backgrounds as a developer feature, not just a user button.