Peer-to-peer WebRTC is elegant for two people. Each participant sends their video and audio directly to the other participant. One connection. No server in the middle. The latency is as low as physics allows.
Add a third person and each participant now needs two connections. Four people: three connections each. Six people: five connections each, for a total of 30 directional streams. Ten people: 90 streams. Twenty people: 380 streams. The formula is N × (N-1), and it scales catastrophically. By the time you reach 6 participants, most consumer internet connections cannot sustain the upstream bandwidth required to send a separate video stream to each peer. At 10, it is impossible. At 200, the number of peer connections (39,800) is absurd on its face.
This is the fundamental scaling problem of WebRTC, and it is why every large video call in the world uses a Selective Forwarding Unit (SFU). The SFU sits between participants and changes the connection topology from "everyone connects to everyone" to "everyone connects to one server." Each participant sends one stream to the SFU. The SFU forwards each participant's stream to all other participants. The number of connections is 2N (one inbound, one outbound per participant) instead of N×(N-1). This is the architecture that makes Zoom, Teams, Meet, and every other large video call work.
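The quadratic-versus-linear difference is easy to check. A quick sketch of the arithmetic (illustrative only, not V100 source code):

```rust
// Mesh vs. SFU scaling for an N-participant call.

/// Directional media streams in a full-mesh P2P topology: N × (N − 1).
fn mesh_streams(n: u32) -> u32 {
    n * (n - 1)
}

/// Connections in an SFU topology: one inbound + one outbound per participant.
fn sfu_connections(n: u32) -> u32 {
    2 * n
}
```

At 200 participants, the mesh needs 39,800 streams while the SFU needs 400 connections.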
V100's SFU is written from scratch in Rust. It supports 200 participants per room, 1,000 rooms per server instance, three-layer simulcast with adaptive quality selection, and automatic P2P-to-SFU switching. This post is the complete architectural deep-dive: how the SFU is structured, how simulcast works, how adaptive quality selection decides what to forward, and why Rust gives us a structural advantage over Go and C++ alternatives.
Architecture: SfuRoom + SfuRouter
V100's SFU is organized around two primary structures: SfuRoom and SfuRouter. The SfuRoom manages a single room of up to 200 participants. The SfuRouter manages up to 1,000 SfuRooms on a single server instance. The separation of concerns is clean: the Room handles participant lifecycle, stream forwarding, and quality selection. The Router handles room creation, destruction, and resource allocation.
SFU Architecture
Participant state is stored in a DashMap&lt;ParticipantId, ParticipantState&gt;, a concurrent hash map built on sharded locking: the map is divided into shards (typically 64), and each shard has its own read-write lock. Operations on different participants almost never contend for the same shard. This means 200 participants can join, leave, publish, and unpublish streams concurrently without blocking each other. Compare this to a single sync.Mutex protecting a Go map, where every concurrent operation must acquire the same lock.
Simulcast: 3 Quality Layers
Simulcast is the technique that makes SFU-based video calls practical at scale. Without simulcast, the SFU forwards the single video stream from each participant to all recipients at the same quality. With 200 participants, the server must forward 200 high-quality streams to each recipient — an egress bandwidth explosion. Simulcast solves this by having each participant encode and upload three quality layers simultaneously. The SFU then selects the appropriate layer for each recipient based on their bandwidth and the sender's importance (active speaker vs. non-speaker).
| Layer | Resolution | Frame Rate | Bitrate | Use Case |
|---|---|---|---|---|
| Low | 320x180 | 15fps | 150 Kbps | Non-speaking thumbnails |
| Medium | 640x360 | 25fps | 500 Kbps | Visible but not speaking |
| High | 1280x720 | 30fps | 1.5 Mbps | Active speaker, pinned |
Each participant's browser encodes all three layers simultaneously, configured through the WebRTC RTCRtpEncodingParameters API. Each layer is carried as a separate RTP stream with its own SSRC, distinguished by an RTP stream ID (RID). The participant's upstream bandwidth requirement is the sum of all three layers: approximately 2.15 Mbps. That is about 40% more than sending a single 720p stream alone, but it gives the SFU the flexibility to select the optimal layer for each recipient independently.
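The layer table can be expressed directly as data. The struct and field names in this Rust sketch are illustrative assumptions, not V100's internal types, but the numbers match the table above:

```rust
// The three simulcast layers, as data. Bitrates are in Kbps.
struct SimulcastLayer {
    rid: &'static str,
    width: u32,
    height: u32,
    max_framerate: u32,
    max_bitrate_kbps: u32,
}

const LAYERS: [SimulcastLayer; 3] = [
    SimulcastLayer { rid: "q", width: 320,  height: 180, max_framerate: 15, max_bitrate_kbps: 150 },
    SimulcastLayer { rid: "h", width: 640,  height: 360, max_framerate: 25, max_bitrate_kbps: 500 },
    SimulcastLayer { rid: "f", width: 1280, height: 720, max_framerate: 30, max_bitrate_kbps: 1_500 },
];

/// Total upstream cost of publishing all three layers, in Kbps.
fn upstream_kbps() -> u32 {
    LAYERS.iter().map(|l| l.max_bitrate_kbps).sum()
}
```

The RID values ("q", "h", "f") match the room configuration shown in the code sample later in this post.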
Adaptive Quality Selection
The SFU's core intelligence is in layer selection: deciding which simulcast layer of each sender to forward to each recipient. This decision is made per-sender, per-recipient, and is re-evaluated continuously. The algorithm considers three factors: whether the sender is the active speaker, the recipient's available downstream bandwidth, and whether the sender's tile is visible in the recipient's viewport.
Layer selection rules
- Active speaker: Forward high layer (1280x720@30fps) to all recipients. The person talking deserves the best quality.
- Visible, not speaking, good bandwidth: Forward medium layer (640x360@25fps). Good enough for facial expressions without consuming high bandwidth.
- Visible, not speaking, constrained bandwidth: Forward low layer (320x180@15fps). Preserves presence without overwhelming the connection.
- Not visible (scrolled off-screen): Forward nothing. Zero bandwidth for tiles the recipient cannot see.
- Audio-only mode: Forward audio track only. Video paused to conserve bandwidth.
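The rules above amount to a small per-sender, per-recipient decision function. Here is a hedged sketch in Rust; the types, names, and exact rule ordering are illustrative assumptions, not V100's internal code:

```rust
/// What the SFU forwards for one sender to one recipient.
#[derive(PartialEq, Debug)]
enum Forward {
    High,      // 1280x720 @ 30fps
    Medium,    // 640x360 @ 25fps
    Low,       // 320x180 @ 15fps
    AudioOnly, // recipient opted out of video
    Nothing,   // tile off-screen: zero bandwidth
}

/// The recipient-side facts the selector considers.
struct RecipientView {
    tile_visible: bool,
    bandwidth_constrained: bool,
    audio_only_mode: bool,
}

fn select_layer(sender_is_active_speaker: bool, view: &RecipientView) -> Forward {
    if view.audio_only_mode {
        Forward::AudioOnly
    } else if !view.tile_visible {
        Forward::Nothing
    } else if sender_is_active_speaker {
        Forward::High
    } else if view.bandwidth_constrained {
        Forward::Low
    } else {
        Forward::Medium
    }
}
```

The key property is that the decision is cheap and stateless per pair, so it can be re-evaluated continuously as speakers, bandwidth, and viewports change.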
The "not visible" optimization is significant in large rooms. In a 200-person call, a typical participant sees 9-16 tiles on screen at a time (depending on layout and screen size). The remaining 184-191 participants are off-screen. For those participants, the SFU forwards zero video — only audio (which is tiny at ~30Kbps per stream with Opus). This means the recipient's downstream bandwidth scales with the number of visible tiles, not the number of total participants. A 200-person call consumes roughly the same downstream bandwidth as a 16-person call.
Bandwidth Math: 200 Participants
Let us work through the bandwidth numbers for a 200-participant call with adaptive quality selection. This is the math that determines whether the architecture is practical.
Per-recipient downstream bandwidth (200 participants)
Approximately 12 Mbps downstream per recipient in a 200-person call. This is well within the capacity of a typical broadband connection (25-100 Mbps). The server-side egress for the same room: 12 Mbps × 200 recipients = 2.4 Gbps. This is high but manageable on modern cloud instances with 10-25 Gbps network interfaces. And this is the ceiling — most real-world rooms do not have 200 participants all with video enabled. Webinar-style rooms with 1-3 presenters and 197 view-only participants require dramatically less server egress.
Compare this to naive forwarding without simulcast or visibility optimization: 200 participants × 1.5 Mbps each × 199 recipients = 59.7 Gbps per room. Simulcast and adaptive quality reduce server egress by approximately 25x.
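The egress arithmetic can be restated in code, using only figures from the text (~12 Mbps per recipient with adaptive selection, 1.5 Mbps per high-quality stream under naive forwarding):

```rust
const PARTICIPANTS: u32 = 200;
const PER_RECIPIENT_MBPS: f64 = 12.0; // with simulcast + visibility optimization
const HIGH_LAYER_MBPS: f64 = 1.5;     // naive forwarding sends this to everyone

/// Server egress with adaptive quality selection, in Gbps.
fn adaptive_egress_gbps() -> f64 {
    PER_RECIPIENT_MBPS * PARTICIPANTS as f64 / 1_000.0
}

/// Server egress forwarding every stream at full quality, in Gbps.
fn naive_egress_gbps() -> f64 {
    HIGH_LAYER_MBPS * PARTICIPANTS as f64 * (PARTICIPANTS - 1) as f64 / 1_000.0
}
```

The ratio of the two (59.7 / 2.4) is where the ~25x reduction figure comes from.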
DashMap: Sharded Concurrency in Rust
The SFU's performance under high participant counts depends on concurrent access to shared state: participant lists, stream metadata, quality selections, and room events. In Go, the standard approach is a sync.RWMutex protecting a map. This works at small scale but creates contention at large scale. When 200 participants are simultaneously joining, leaving, publishing, and unpublishing, every operation must acquire the mutex. Even a read-write lock creates contention when writes are frequent.
V100 uses DashMap, a concurrent hash map from the Rust ecosystem. DashMap uses sharded locking: the key space is divided into shards (64 on a typical server), each with its own RwLock. Two operations contend only if they hash to the same shard, so with operations spread uniformly across 64 shards, the chance that any two concurrent operations collide on the same lock is roughly 1/64, about 1.6%. In practice, this means the SFU handles concurrent participant operations with near-zero lock contention.
The DashMap advantage compounds at the Router level. With 1,000 rooms, the Router's room map is also a DashMap. Room creation, destruction, and lookup are all concurrent with per-shard locking. A Go implementation would need either a global mutex (creating a bottleneck) or manual sharding (adding complexity). DashMap provides optimal sharding out of the box.
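To make the sharding idea concrete, here is a minimal sharded map built from std types only. DashMap itself is considerably more refined (and faster), but the contention argument is the same: each key maps to one shard, and locking that shard leaves the other 63 available.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

const SHARDS: usize = 64;

/// A toy sharded map: one RwLock per shard instead of one for the whole map.
struct ShardedMap<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V> ShardedMap<K, V> {
    fn new() -> Self {
        Self {
            shards: (0..SHARDS).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Which shard a key belongs to, by hashing the key.
    fn shard_index(&self, key: &K) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % SHARDS
    }

    /// Only this key's shard is write-locked; the other 63 stay available.
    fn insert(&self, key: K, value: V) {
        let idx = self.shard_index(&key);
        self.shards[idx].write().unwrap().insert(key, value);
    }

    fn get_cloned(&self, key: &K) -> Option<V>
    where
        V: Clone,
    {
        let idx = self.shard_index(key);
        self.shards[idx].read().unwrap().get(key).cloned()
    }
}
```

A Go implementation would have to hand-roll exactly this kind of sharding to match; DashMap ships it (with a far more optimized internal layout) out of the box.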
Room Events: Real-Time Participant Lifecycle
The SFU broadcasts room events to all participants when state changes occur. These events drive the client's UI: adding and removing video tiles, updating speaker indicators, showing mute state, and triggering auto-zoom speaker transitions.
| Event | Trigger | Payload |
|---|---|---|
| participant:join | New participant connects | participant_id, display_name, capabilities |
| participant:leave | Participant disconnects | participant_id, reason |
| track:publish | Participant starts sending video/audio | participant_id, track_id, kind, simulcast_layers |
| track:unpublish | Participant stops sending | participant_id, track_id |
| track:mute | Track muted (audio or video) | participant_id, track_id, muted |
| speaker:change | Active speaker changes | participant_id, audio_level |
Events are broadcast via the signaling channel (WebSocket), not through WebRTC data channels. This ensures event delivery is reliable and ordered, even when media streams are experiencing packet loss. The signaling server and SFU share the same process, so event dispatch has zero network latency — the event is sent directly from the SfuRoom to the WebSocket handler.
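The event taxonomy from the table above can be sketched as a Rust enum. The variant and field shapes here are assumptions for illustration, not V100's actual wire format; only the event names come from the table:

```rust
/// Room lifecycle events, mirroring the event table above.
enum RoomEvent {
    ParticipantJoin { participant_id: String, display_name: String },
    ParticipantLeave { participant_id: String, reason: String },
    TrackPublish { participant_id: String, track_id: String, kind: String },
    TrackUnpublish { participant_id: String, track_id: String },
    TrackMute { participant_id: String, track_id: String, muted: bool },
    SpeakerChange { participant_id: String, audio_level: f32 },
}

impl RoomEvent {
    /// Event name as broadcast on the signaling channel.
    fn name(&self) -> &'static str {
        match self {
            RoomEvent::ParticipantJoin { .. } => "participant:join",
            RoomEvent::ParticipantLeave { .. } => "participant:leave",
            RoomEvent::TrackPublish { .. } => "track:publish",
            RoomEvent::TrackUnpublish { .. } => "track:unpublish",
            RoomEvent::TrackMute { .. } => "track:mute",
            RoomEvent::SpeakerChange { .. } => "speaker:change",
        }
    }
}
```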
Auto P2P ↔ SFU Switching
An SFU adds a server hop. For 2-3 participants, this hop is unnecessary overhead — P2P gives the lowest latency and simplest topology. For 4+ participants, the SFU is essential. V100 handles this transition automatically.
When a room has 3 or fewer participants, V100 uses direct P2P WebRTC connections. When the 4th participant joins, V100 seamlessly transitions to SFU mode. Each existing participant's peer connection is renegotiated to route through the SFU instead of directly to peers. The transition happens in under 500ms and is invisible to the user — no reconnection dialog, no audio gap, no video freeze.
When participants leave and the room drops back below 4, V100 transitions back to P2P. The hysteresis threshold prevents oscillation: the SFU-to-P2P transition requires staying at 3 or fewer participants for 5 seconds, preventing flip-flopping when a 4th participant briefly connects and disconnects.
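The hysteresis described above reduces to a small state machine: upgrade to SFU immediately at the threshold, but fall back to P2P only after the count has stayed below it for the full hold period. A sketch (names and structure are illustrative, assuming the 4-participant threshold and 5-second hold from the text):

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode {
    P2p,
    Sfu,
}

const SFU_THRESHOLD: usize = 4;
const FALLBACK_HOLD: Duration = Duration::from_secs(5);

struct TopologySwitcher {
    mode: Mode,
    below_threshold_since: Option<Instant>,
}

impl TopologySwitcher {
    fn new() -> Self {
        Self { mode: Mode::P2p, below_threshold_since: None }
    }

    /// Call on every participant-count change and periodically on a timer.
    fn update(&mut self, participant_count: usize, now: Instant) -> Mode {
        if participant_count >= SFU_THRESHOLD {
            // Upgrade immediately; clear any pending fallback.
            self.mode = Mode::Sfu;
            self.below_threshold_since = None;
        } else if self.mode == Mode::Sfu {
            // Downgrade only after the hold period, to avoid flip-flopping
            // when a 4th participant briefly connects and disconnects.
            let since = *self.below_threshold_since.get_or_insert(now);
            if now.duration_since(since) >= FALLBACK_HOLD {
                self.mode = Mode::P2p;
                self.below_threshold_since = None;
            }
        }
        self.mode
    }
}
```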
Topology switching
Code Sample: Room Creation and Management
```javascript
// Create a room with SFU configuration
const room = await v100.createRoom({
  name: 'company-all-hands',
  maxParticipants: 200,
  mode: 'auto', // 'p2p' | 'sfu' | 'auto'
  simulcast: {
    enabled: true,
    layers: [
      { rid: 'q', width: 320, height: 180, maxBitrate: 150_000, maxFramerate: 15 },
      { rid: 'h', width: 640, height: 360, maxBitrate: 500_000, maxFramerate: 25 },
      { rid: 'f', width: 1280, height: 720, maxBitrate: 1_500_000, maxFramerate: 30 },
    ],
  },
  adaptiveQuality: true, // SFU selects layer per recipient
  autoSfuThreshold: 4,   // Switch to SFU at 4 participants
});

// Listen for participant events
room.on('participant:join', (p) => {
  console.log(`${p.displayName} joined (${room.participantCount}/200)`);
});

room.on('participant:leave', (p) => {
  console.log(`${p.displayName} left`);
});

room.on('speaker:change', (event) => {
  console.log(`Active speaker: ${event.participantId}`);
});

// Get room stats
const stats = await room.getStats();
console.log(`Mode: ${stats.mode}`); // 'p2p' or 'sfu'
console.log(`Participants: ${stats.count}`);
console.log(`Egress: ${stats.egressMbps} Mbps`);
```
V100 SFU vs. LiveKit vs. mediasoup
| Feature | V100 | LiveKit | mediasoup |
|---|---|---|---|
| Language | Rust | Go | C++ / Node.js |
| Concurrency model | DashMap (sharded locking) | sync.Mutex | Single-threaded event loop |
| GC pauses | None (no GC) | Yes (Go GC) | V8 GC (control plane) |
| Max participants/room | 200 | ~100 (docs) | Varies by deployment |
| Simulcast layers | 3 (configurable) | 3 | 3 |
| Adaptive quality | Bandwidth + viewport + speaker | Bandwidth + priority | Manual layer selection |
| Auto P2P/SFU switch | Yes (seamless) | No (SFU always) | No (SFU always) |
| Visibility optimization | Yes (zero BW off-screen) | Partial | Manual |
| Open source | No (proprietary) | Yes (Apache 2.0) | Yes (ISC) |
| Managed platform | Yes (V100 API) | Yes (LiveKit Cloud) | No (self-hosted only) |
LiveKit is the closest comparable SFU. It is well-engineered, open source, and powers many production applications. The architectural difference is the concurrency model. Go's goroutine scheduler is excellent for I/O-bound workloads, but the garbage collector introduces non-deterministic pauses that can cause frame drops during peak load. Rust eliminates GC pauses entirely. Additionally, DashMap's sharded locking outperforms Go's sync.Mutex under high contention — the scenario that occurs when 200 participants are simultaneously active.
mediasoup is the C++ heavyweight of the SFU world. Its media-handling layer (written in C++) is extremely fast. But its control plane runs on Node.js, which introduces V8 GC pauses and single-threaded event loop limitations for room management operations. V100's pure Rust stack has no such split — room management, media forwarding, and participant lifecycle all run in the same process with the same memory model.
The honest tradeoff: LiveKit and mediasoup are open source; V100's SFU is proprietary. If you need source code access or want to host the SFU yourself with full customization, LiveKit and mediasoup are strong choices. If you want a managed SFU with these performance characteristics and no infrastructure to maintain, V100's API gives you all of the above without deploying a single server.
33 Tests Passing
V100's SFU has a comprehensive test suite covering every aspect of room management, participant lifecycle, simulcast layer selection, and edge cases. The test suite runs on every commit and must pass before any deployment.
Test coverage
- ✓ Room creation and destruction
- ✓ Participant join/leave (including abrupt disconnection)
- ✓ Max participant enforcement (201st participant rejected)
- ✓ Track publish/unpublish lifecycle
- ✓ Simulcast layer negotiation
- ✓ Adaptive quality selection (speaker, bandwidth, visibility)
- ✓ Mute/unmute propagation
- ✓ Speaker detection accuracy
- ✓ P2P to SFU transition (forward and reverse)
- ✓ Hysteresis on SFU-to-P2P fallback
- ✓ Room event broadcast ordering
- ✓ Concurrent participant operations (DashMap stress test)
- ✓ Router room limit enforcement (1,001st room rejected)
What This Means for Developers
For developers building on V100, the SFU is invisible. You create a room, participants join, and V100 handles the rest: deciding when to use P2P vs. SFU, selecting simulcast layers, managing bandwidth, and broadcasting events. You do not configure SFU instances. You do not manage TURN servers. You do not tune simulcast parameters (unless you want to). The API abstracts the entire SFU layer behind room configuration options that express intent ("I want up to 200 participants with adaptive quality") rather than implementation ("route RTP packets through a media relay with these codec parameters").
The SFU works seamlessly with V100's other features: AI auto-zoom uses the SFU's speaker detection events. Per-tile zoom works on SFU-forwarded streams. Noise suppression processes the local audio before it reaches the SFU. These features compose because V100 owns the entire stack from client to server.
Build for 200 participants
Create a room, invite participants, and scale to 200 without configuring a single server. V100's SFU handles the infrastructure so you can focus on the product.