The Competitors
The WebRTC server ecosystem in 2026 is dominated by five open-source projects and a handful of proprietary implementations. Each makes different trade-offs in language, architecture, and feature scope. Before we compare numbers, here is what each project is and what it is designed to do.
coturn (C)
The most widely deployed TURN server in the world. coturn is a mature C implementation of RFC 5766 (TURN) and RFC 5389 (STUN). It powers TURN relay for countless WebRTC deployments, from small startups to large enterprises. coturn is battle-tested and feature-complete, but it was written in an era when C was the default choice for network servers. It does not publish per-operation latency benchmarks.
LiveKit (Go)
A modern WebRTC SFU (Selective Forwarding Unit) written in Go. LiveKit has gained significant traction since its open-source launch, offering a complete real-time communication stack including video, audio, and data channels. It uses Go's goroutine-based concurrency model. LiveKit does not publish per-operation protocol latency numbers, but is well-regarded for its developer experience and feature set.
mediasoup (C++)
A powerful WebRTC SFU with its core written in C++ and a Node.js signaling layer. mediasoup is designed for high-performance media routing and is used by several commercial video platforms. The C++ worker handles media transport while Node.js handles signaling and control. mediasoup does not publish per-operation TURN/STUN benchmarks.
Janus (C)
A general-purpose WebRTC server written in C, created by Meetecho. Janus provides a plugin architecture that supports SFU, MCU, streaming, and recording use cases. It is one of the most versatile WebRTC servers available. Like coturn, it is written in C and does not publish per-operation latency benchmarks.
V100 (Rust)
V100's TURN/STUN implementation is a pure Rust crate with zero-copy parsing, lock-free concurrency, and no garbage collector. It is purpose-built for sub-microsecond protocol operations. The architecture comprises 20 Rust microservices, with no Node.js anywhere in the stack. V100 publishes per-operation benchmarks on production hardware (AWS Graviton4, c8g.16xlarge, 64 vCPUs). After gateway optimizations (request coalescing, a Cachee-backed tiered cache, and QUIC support), V100 achieves 0.01ms server processing, 220,661 RPS on Apple Silicon (~1M+ extrapolated on a 96-core Graviton4), and a p99 tail latency of 13.4ms at 50 concurrent connections.
The Comparison Table
We want this table to be useful and honest. Where a project has published benchmark data, we cite it. Where no published data exists, we say so explicitly. We do not estimate or extrapolate numbers for other projects.
| Metric | V100 (Rust) | coturn (C) | LiveKit (Go) | mediasoup (C++) | Janus (C) |
|---|---|---|---|---|---|
| Language | Rust | C | Go | C++ / Node.js | C |
| STUN Parse Latency | 68.4ns | Not published | Not published | Not published | Not published |
| Full Pipeline Tick | 263.1ns | Not published | Not published | Not published | Not published |
| HMAC-SHA1 (integrity) | 664.2ns | Not published | Not published | Not published | Not published |
| TURN Credential | 863.0ns | Not published | Not published | Not published | Not published |
| Channel Binding | 526.9ns | Not published | Not published | Not published | Not published |
| Sustained Throughput | 3.63M ops/sec | Not published | Not published | Not published | Not published |
| Server Processing | 0.01ms (10µs) | Not published | Not published | Not published | Not published |
| Gateway RPS | 220,661 | Not published | Not published | Not published | Not published |
| p99 Tail Latency | 13.4ms @ 50 conc. | Not published | Not published | Not published | Not published |
| Cache Latency | sub-ns L1 / 31ns L2 | Not published | Not published | Not published | Not published |
| GC Pauses | None (no GC) | None (no GC) | Yes (Go GC) | None in media path (C++); Node.js GC in signaling | None (no GC) |
| Memory Safety | Compile-time | Manual | Runtime (GC) | Manual (C++) | Manual |
| PQ Crypto Support | 17/17 tests pass | Not available | Not available | Not available | Not available |
| Test Coverage | 542/542 pass | Not published | Not published | Not published | Not published |
The elephant in the table: V100 is the only project in this comparison that publishes per-operation protocol latency benchmarks. This does not mean V100 is necessarily faster than coturn or Janus in all scenarios. It means we cannot compare directly because the other projects have not published equivalent measurements. We encourage every project listed here to publish Criterion-style benchmarks so the community can make informed decisions.
What Makes a WebRTC Server Fast?
Performance in a WebRTC server breaks down into several distinct categories, each with different optimization characteristics:
1. Protocol Parsing Speed
Every STUN/TURN message must be parsed from raw bytes into a structured representation. This involves reading the 20-byte header, validating the magic cookie, and parsing variable-length attributes. V100 does this in 68.4 nanoseconds using zero-copy parsing — the parser returns references into the original byte buffer without allocating memory. C implementations (coturn, Janus) would be expected to be fast here as well, since C has no runtime overhead, but neither publishes this benchmark.
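To make the header layout concrete, here is a minimal Python sketch of the validation steps described above — the fixed 20-byte header, the type check, and the 0x2112A442 magic cookie from RFC 5389. This is illustrative only; V100's actual parser is zero-copy Rust that returns references into the receive buffer rather than copying.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442

def parse_stun_header(buf: bytes):
    """Parse the fixed 20-byte STUN header.

    Returns (msg_type, msg_length, transaction_id) or raises ValueError
    on malformed input. A zero-copy parser would return views into `buf`
    instead of slicing out a new bytes object for the transaction ID.
    """
    if len(buf) < 20:
        raise ValueError("STUN header is 20 bytes; got %d" % len(buf))
    # Network byte order: 2-byte type, 2-byte length, 4-byte magic cookie.
    msg_type, msg_length, cookie = struct.unpack_from("!HHI", buf, 0)
    if msg_type & 0xC000:
        raise ValueError("top two bits of a STUN message type must be zero")
    if cookie != STUN_MAGIC_COOKIE:
        raise ValueError("bad magic cookie: 0x%08X" % cookie)
    transaction_id = buf[8:20]  # 96-bit transaction ID
    return msg_type, msg_length, transaction_id

# A Binding Request (type 0x0001) with no attributes (length 0):
pkt = struct.pack("!HHI", 0x0001, 0, STUN_MAGIC_COOKIE) + b"\x01" * 12
msg_type, msg_length, tid = parse_stun_header(pkt)
print(msg_type, msg_length)  # 1 0
```

Attribute parsing (type-length-value records after the header) follows the same pattern, which is why a tight parser spends most of its time in bounds checks rather than allocation.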
2. Credential Validation Speed
TURN requires long-term credential validation, which involves HMAC-SHA1 computation. This is a cryptographic operation and is inherently more expensive than byte parsing. V100 validates TURN credentials in 863.0 nanoseconds, with the HMAC-SHA1 itself taking 664.2 nanoseconds. The speed here depends primarily on the HMAC implementation and whether hardware-accelerated SHA instructions are used.
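The mechanics can be sketched with the standard library's `hmac` and `hashlib` modules. This is a simplified illustration of RFC 5389 long-term credentials, not V100's implementation: the RFC's SASLprep processing of the password and the length-field adjustment required before hashing are omitted here.

```python
import hashlib
import hmac

def long_term_key(username: str, realm: str, password: str) -> bytes:
    """RFC 5389 long-term credential key: MD5(username ":" realm ":" password).

    (The RFC applies SASLprep to the password first; omitted for brevity.)
    """
    return hashlib.md5(f"{username}:{realm}:{password}".encode()).digest()

def message_integrity(key: bytes, stun_message: bytes) -> bytes:
    """HMAC-SHA1 over the STUN message preceding the MESSAGE-INTEGRITY
    attribute. A real implementation must also rewrite the header's
    length field before hashing, per the RFC."""
    return hmac.new(key, stun_message, hashlib.sha1).digest()

key = long_term_key("alice", "example.org", "s3cret")
mac = message_integrity(key, b"\x00\x01\x00\x00\x21\x12\xa4\x42" + b"\x00" * 12)
print(len(key), len(mac))  # 16 20
```

The HMAC computation dominates this path, which is why hardware SHA instructions matter so much for the 664.2ns figure.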
3. Relay Throughput
The number of media packets a TURN server can relay per second is the ultimate scaling metric. V100 sustains 3.63 million operations per second on 64 vCPUs. This includes a mix of STUN bindings, credential validations, channel bindings, and relay forwarding. No other TURN/STUN server has published an equivalent throughput number.
4. Tail Latency
Average latency matters, but tail latency (p99, p999) matters more for real-time media. A server that averages 100ns but spikes to 10ms during GC pauses delivers a worse user experience than a server that consistently runs at 300ns. Rust and C have an inherent advantage here — no garbage collector means no unpredictable pauses. Go's GC has improved dramatically, but it still introduces pauses that are visible at the microsecond scale.
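A simple nearest-rank percentile calculation shows why mid-range percentiles hide spikes. In this synthetic example, 1% of operations stall at 10ms (a GC-style pause) while the rest complete in 300ns — the p50 and even the p99 look healthy, but the p999 exposes the stalls:

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p% of samples are <= it."""
    ordered = sorted(samples)
    rank = -(-len(ordered) * p // 100)  # ceil(n * p / 100)
    return ordered[max(0, int(rank) - 1)]

# 990 fast operations at 300ns, 10 GC-style spikes at 10ms (in nanoseconds):
latencies = [300] * 990 + [10_000_000] * 10
print(percentile(latencies, 50))    # 300
print(percentile(latencies, 99))    # 300
print(percentile(latencies, 99.9))  # 10000000
```

This is exactly the failure mode a p99-only dashboard misses: the spiking server and the consistent server report identical p99s, and only the p999 (or a latency histogram) tells them apart.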
5. Concurrency Model
How a server handles thousands of simultaneous connections directly impacts throughput. V100 uses lock-free concurrent data structures (DashMap) across 64 worker threads. coturn uses a thread pool with mutex-guarded state. LiveKit uses Go goroutines with channel-based communication. mediasoup uses a multi-process model with one C++ worker per CPU core. Each approach has trade-offs between simplicity and raw performance.
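The middle ground between one global lock and fully lock-free structures can be illustrated with a lock-striped map — a simplified, mutex-based cousin of sharded concurrent maps like DashMap (which shards its table similarly but avoids blocking readers). The class name and shard count below are illustrative, not from any of the projects above:

```python
import threading

class ShardedMap:
    """Hash-sharded map: writers contend only on one shard's lock,
    so unrelated keys proceed in parallel."""

    def __init__(self, shards: int = 16):
        self._shards = [({}, threading.Lock()) for _ in range(shards)]

    def _shard(self, key):
        return self._shards[hash(key) % len(self._shards)]

    def put(self, key, value):
        data, lock = self._shard(key)
        with lock:
            data[key] = value

    def get(self, key, default=None):
        data, lock = self._shard(key)
        with lock:
            return data.get(key, default)

m = ShardedMap()
m.put("session-1", "allocation")
print(m.get("session-1"))  # allocation
```

With 64 worker threads and a single mutex, every operation serializes; with 16+ shards, contention drops roughly in proportion to the shard count, which is the intuition behind DashMap's design.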
Language-Level Analysis
Rust (V100) vs C (coturn, Janus)
In theory, C should match or beat Rust on raw throughput — both compile to native code with no runtime overhead. In practice, Rust's ownership system enables optimizations that are unsafe or impractical in C. Rust's type system rules out data races in safe code at compile time, so the unsafe internals of a lock-free structure stay confined to a small, auditable core. In C, lock-free code is correct only if the programmer is perfect. The result is that V100 can use aggressive concurrency patterns (lock-free hash maps, wait-free queues) with confidence, while C codebases tend to use conservative locking for safety.
The other advantage Rust has over C is memory safety without runtime cost. Buffer overflows, use-after-free, and data races are compile-time errors in Rust. In C, they are CVEs. For a TURN server that processes untrusted network input, this is not a theoretical benefit — it is an operational security property.
Rust (V100) vs Go (LiveKit)
Go is an excellent language for building network services quickly. Its goroutine model makes concurrent programming accessible. However, Go has a garbage collector, and the garbage collector has latency implications. Go's stop-the-world pauses are short by historical standards — typically tens to hundreds of microseconds — but for a server targeting sub-microsecond operations, even a 100-microsecond pause is roughly 380 pipeline ticks worth of stalled work.
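The arithmetic behind that figure, using the pipeline-tick number from the table above:

```python
tick_ns = 263.1        # V100 full pipeline tick, in nanoseconds
gc_pause_ns = 100_000  # a 100-microsecond GC pause
print(round(gc_pause_ns / tick_ns))  # 380
```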
Go also has higher per-operation overhead from its runtime: goroutine scheduling, interface dispatch, and escape analysis misses that cause unexpected heap allocations. These add up when the target latency is measured in nanoseconds.
Rust (V100) vs C++ (mediasoup)
C++ and Rust are close competitors on raw performance. Both compile to native code, both have zero-cost abstractions, and neither has a garbage collector. The difference is safety: Rust prevents data races and memory errors at compile time, while C++ relies on programmer discipline and sanitizer tools. mediasoup's C++ worker is highly optimized, but it also has a Node.js signaling layer, which introduces the performance characteristics of the V8 runtime for control plane operations.
What V100 Does Not Benchmark (Yet)
Transparency requires acknowledging gaps. Here are metrics we have not yet published for V100, and where competitors may have advantages:
- SFU media routing latency: The benchmarks in this post cover STUN/TURN protocol operations. SFU-specific metrics (simulcast switching, SVC layer selection, packet duplication for multi-party) are not yet published. LiveKit and mediasoup are mature SFUs with years of production optimization in this area.
- Ecosystem and SDK breadth: LiveKit has client SDKs for every major platform. mediasoup has a large community. coturn works with every WebRTC implementation. V100's SDK ecosystem is newer.
- Community battle-testing: coturn has been deployed in production by thousands of organizations for over a decade. Janus has an extensive plugin ecosystem. V100 is newer and has less production mileage.
- Global edge network: Large-scale commercial platforms (Twilio, Agora, Daily) operate global edge networks with PoPs in dozens of regions. V100's edge footprint is still growing.
Performance is not everything. coturn's maturity, LiveKit's developer experience, mediasoup's flexibility, and Janus's plugin architecture are all valid reasons to choose those projects. V100's sub-microsecond performance matters when protocol-level latency is your bottleneck — which is the case for high-scale, latency-sensitive deployments.
The Post-Quantum Advantage
One area where V100 is unambiguously ahead is post-quantum cryptography. V100's TURN server includes hybrid X25519+ML-KEM-768 key exchange and ML-DSA-65 artifact signing, with 17 out of 17 post-quantum crypto tests passing. No other WebRTC server in this comparison offers post-quantum protection.
As quantum computing advances, the key exchanges that protect WebRTC sessions (currently ECDH) will become vulnerable to Shor's algorithm. V100 is the only WebRTC server that protects against harvest-now-decrypt-later attacks today. Read our full post-quantum architecture post for details.
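The hybrid principle is simple: derive the session key from both a classical and a post-quantum shared secret, so an attacker must break both. The sketch below models only the combiner step — the secrets are random stand-ins (Python's standard library has no ML-KEM), and `hybrid_secret` and its context label are illustrative names, not V100's actual KDF; a production scheme would use a vetted combiner such as HKDF over the concatenated secrets.

```python
import hashlib
import hmac
import os

def hybrid_secret(x25519_ss: bytes, mlkem_ss: bytes, context: bytes) -> bytes:
    """Combine a classical and a post-quantum shared secret into one key.

    Simplified combiner: HMAC-SHA256 over the concatenation. The session
    key stays safe as long as EITHER component secret is unbroken.
    """
    return hmac.new(context, x25519_ss + mlkem_ss, hashlib.sha256).digest()

# Stand-in secrets; a real handshake derives these from X25519 and ML-KEM-768.
classical = os.urandom(32)
post_quantum = os.urandom(32)
key = hybrid_secret(classical, post_quantum, b"v100-hybrid-demo")
print(len(key))  # 32
```

Against a harvest-now-decrypt-later adversary, the classical secret is the one presumed to fall; the combiner guarantees the recorded session stays confidential as long as the ML-KEM component holds.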
Benchmark Reproducibility
All V100 benchmarks cited in this post were run on:
- Hardware: AWS c8g.16xlarge (Graviton4, 64 vCPUs, ARM Neoverse V2)
- Framework: Criterion.rs with warmup, outlier detection, and confidence intervals
- Conditions: Dedicated instance, no co-located workloads, performance CPU governor
- Total tests: 542/542 passing, including 17 post-quantum crypto tests
We invite every project listed in this comparison to publish equivalent benchmarks on equivalent hardware. A fair comparison requires standardized methodology and transparent reporting. We will update this post with any published numbers from competing projects.
Conclusion
V100 is the fastest WebRTC server with published per-operation benchmarks in 2026. 0.01ms server processing. 220,661 RPS on Apple Silicon (~1M+ extrapolated on Graviton4). 68.4ns STUN binding parse. 263.1ns full pipeline tick. 3.63 million ops/sec sustained on 64 vCPUs. 863.0ns TURN credential validation. p99 tail latency: 13.4ms at 50 concurrent. Sub-nanosecond L1 cache (DashMap) + 31ns L2 (Cachee). These numbers are measured, not estimated, and they run on production hardware.
The honest caveat: we are the only project in this comparison that publishes these numbers. coturn, LiveKit, mediasoup, and Janus may be faster than their lack of published benchmarks suggests. We hope this post encourages the WebRTC community to adopt standardized performance reporting so developers can make informed infrastructure decisions.
For a deeper dive into V100's pipeline architecture, read Inside V100's 263ns Pipeline Tick. For the full latency comparison including commercial APIs, see Video API Latency Benchmark Comparison.
Deploy the Fastest WebRTC Server
0.01ms server processing. 220K+ RPS. 68.4ns STUN parse. 263ns pipeline tick. 3.63M ops/sec. Post-quantum ready. Start building on V100.
Get Started with V100