20 Rust Microservices · 0 Memory Safety CVEs · 68.4ns STUN Parse (Rust) · 542/542 Tests Passing

The Decision We Made

In 2024, when we started building V100's real-time video infrastructure, we had three realistic language choices for the data plane: C, C++, or Rust. Go and Java were eliminated immediately — garbage collection is incompatible with sub-microsecond latency targets. Node.js was never considered for the data plane (though many video platforms use it for signaling).

The existing landscape looked like this: coturn and Janus in C, mediasoup's worker in C++, and LiveKit in Go.

We chose Rust because it was the only language that offered C/C++ performance with compile-time memory safety guarantees. Two years later, the benchmarks validate the decision.

The Performance Argument: Rust Matches C/C++

The common claim is that Rust performs "within 5% of C/C++" on typical workloads. For V100's specific workload — network packet parsing, cryptographic operations, and concurrent state management — we believe Rust performs at parity or better. Here is why.

Zero-Cost Abstractions Are Real

Rust's type system, traits, generics, and iterators compile down to the same machine code that equivalent C would produce. There is no runtime vtable dispatch (unless you explicitly opt in with dyn Trait), no hidden allocations, and no implicit copies. When we write a STUN parser that borrows a reference to the incoming byte buffer, the compiled code is a pointer and a length — identical to what a C programmer would write by hand.

This is not theoretical. V100 parses a STUN binding request in 68.4 nanoseconds. That is the time to read a 20-byte header, validate the magic cookie, and parse variable-length attributes from a raw byte buffer. The Rust compiler emits tight, branchless inner loops that are equivalent to hand-tuned C. We verified this by examining the generated assembly.
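To make the zero-copy claim concrete, here is a minimal sketch of a borrowing STUN header parser (the `StunHeader` and `parse_header` names are illustrative, not V100's actual API): the parser returns a view into the caller's buffer, with a single up-front bounds check instead of C-style unchecked pointer arithmetic.

```rust
const MAGIC_COOKIE: u32 = 0x2112_A442;

struct StunHeader<'a> {
    msg_type: u16,
    msg_len: u16,
    txn_id: &'a [u8], // borrowed 12-byte transaction ID, no copy
}

fn parse_header(buf: &[u8]) -> Option<StunHeader<'_>> {
    // One bounds check; `get` returns None on a short packet instead of
    // overreading the way unchecked C pointer arithmetic would.
    let header = buf.get(..20)?;
    let msg_type = u16::from_be_bytes([header[0], header[1]]);
    let msg_len = u16::from_be_bytes([header[2], header[3]]);
    let cookie = u32::from_be_bytes([header[4], header[5], header[6], header[7]]);
    if cookie != MAGIC_COOKIE {
        return None; // reject non-STUN traffic early
    }
    Some(StunHeader { msg_type, msg_len, txn_id: &header[8..20] })
}

fn main() {
    let mut pkt = [0u8; 20];
    pkt[0..2].copy_from_slice(&0x0001u16.to_be_bytes()); // Binding Request
    pkt[4..8].copy_from_slice(&MAGIC_COOKIE.to_be_bytes());
    let hdr = parse_header(&pkt).expect("valid header");
    assert_eq!(hdr.msg_type, 0x0001);
    assert_eq!(hdr.txn_id.len(), 12);
    println!("parsed type=0x{:04x}, len={}", hdr.msg_type, hdr.msg_len);
}
```

The compiled output of `parse_header` is a pointer, a length, and a handful of loads and compares — the same shape a hand-written C parser would have, but a truncated packet yields `None` rather than undefined behavior.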

The Lock-Free Advantage

Where Rust actually beats typical C/C++ codebases is in concurrent data structure use. Rust's ownership system guarantees at compile time that safe code cannot contain data races. This means we can use lock-free concurrent hash maps (DashMap), wait-free queues, and atomic reference counting without the fear that a subtle race condition will corrupt state under load.

In C++ codebases, engineers tend to use mutexes defensively — locking more than necessary because the cost of a data race (undefined behavior, silent corruption) is catastrophic. In Rust, the compiler prevents data races, so we lock only when absolutely necessary. The result: V100's session state lookup takes approximately 40 nanoseconds without acquiring any lock.

A mutex-guarded std::unordered_map in C++ would add 20-50ns for lock acquisition and release to every lookup. Over millions of operations per second, that adds up to significant overhead.

Concrete example: V100's TURN server manages session state for potentially thousands of concurrent allocations. Each incoming packet requires a state lookup. At 3.63 million ops/sec, even a 25ns overhead per lookup from mutex contention would cost 90 milliseconds of compute per second — enough to reduce throughput by measurable percentage points.
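The shape of this lock-free sharing is easy to demonstrate with standard-library primitives. Below is a minimal sketch (illustrative, not V100's session table) in which eight threads mutate shared state with no mutex: `Arc` supplies the atomic reference counting, and the `Send`/`Sync` checks guarantee the sharing is race-free at compile time.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Shared counter standing in for per-session statistics.
    let hits = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                for _ in 0..100_000 {
                    // No lock acquired: a single atomic RMW instruction.
                    hits.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    // Every increment is accounted for, with zero mutexes involved.
    assert_eq!(hits.load(Ordering::Relaxed), 800_000);
    println!("total = {}", hits.load(Ordering::Relaxed));
}
```

In C++ the same pattern is expressible with `std::atomic`, but nothing stops a colleague from also touching the counter through a plain non-atomic alias; in Rust that alias simply does not type-check.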

The Safety Argument: Not Just for Academic Papers

Memory safety in a TURN/STUN server is not abstract. These servers process untrusted network input — packets from the public internet that may be malformed, oversized, or crafted to exploit buffer handling bugs. Every STUN attribute parser, every credential validator, and every relay forwarder is an attack surface.

C/C++ TURN Server CVEs

coturn, the most widely deployed C TURN server, has had multiple CVEs related to buffer overflows and memory handling. This is not a criticism of coturn's developers — it is a consequence of writing network-facing code in a language that does not prevent buffer overruns. Every C network server has this vulnerability surface. The question is when the bug will be found, not whether it exists.

mediasoup's C++ worker has fewer publicly reported memory safety issues, partly because C++ provides better abstractions (std::vector, std::string) that reduce raw buffer manipulation. But C++ still allows use-after-free, dangling references, and data races — the compiler does not prevent them.

Rust's Compile-Time Guarantee

In Rust, the following categories of bugs are impossible in safe code: buffer overflows and out-of-bounds reads, use-after-free, double free, dangling pointers, and data races.

V100 has zero memory safety CVEs since its first production deployment. This is not because we are better programmers than the coturn or mediasoup teams. It is because the language prevents the class of bugs that produces CVEs in C and C++ network servers.

V100 Benchmarks vs C/C++ Expectations

Neither mediasoup nor Janus publishes per-operation protocol latency benchmarks, so we cannot make a direct comparison. What we can do is present V100's numbers and analyze whether they are consistent with what C/C++ should achieve.

Operation             | V100 (Rust)   | mediasoup (C++)      | Janus (C)            | coturn (C)
STUN Binding Parse    | 68.4ns        | Not published        | Not published        | Not published
XOR Address (IPv4)    | 34.5ns        | Not published        | Not published        | Not published
XOR Address (IPv6)    | 125.8ns       | Not published        | Not published        | Not published
Pipeline Tick         | 263.1ns       | Not published        | Not published        | Not published
HMAC-SHA1             | 664.2ns       | Not published        | Not published        | Not published
TURN Credential       | 863.0ns       | Not published        | Not published        | Not published
Channel Binding       | 526.9ns       | Not published        | Not published        | Not published
Sustained Throughput  | 3.63M ops/sec | Not published        | Not published        | Not published
Memory Safety         | Compile-time  | Manual (C++)         | Manual (C)           | Manual (C)
Data Race Prevention  | Compile-time  | Runtime (sanitizers) | Runtime (sanitizers) | Runtime (sanitizers)

The HMAC-SHA1 at 664.2ns and STUN parse at 68.4ns are consistent with or faster than what we would expect from well-optimized C. The HMAC timing is dominated by the SHA-1 computation, which is platform-dependent and benefits from hardware acceleration on ARM. The STUN parse timing reflects zero-copy parsing with no heap allocation — exactly what good C code would do, but with bounds checking that C does not have.
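For reference, the IPv4 XOR-address transform is small enough to show in full: per RFC 5389, the address is XORed with the 32-bit magic cookie and the port with the cookie's top 16 bits (function names here are illustrative, not V100's code).

```rust
const MAGIC_COOKIE: u32 = 0x2112_A442;

/// XOR-MAPPED-ADDRESS transform for IPv4 (RFC 5389 §15.2).
fn xor_mapped_ipv4(addr: [u8; 4], port: u16) -> ([u8; 4], u16) {
    let x_addr = (u32::from_be_bytes(addr) ^ MAGIC_COOKIE).to_be_bytes();
    let x_port = port ^ (MAGIC_COOKIE >> 16) as u16; // XOR with 0x2112
    (x_addr, x_port)
}

fn main() {
    let (xa, xp) = xor_mapped_ipv4([192, 0, 2, 1], 32853);
    // XOR is its own inverse, so applying the transform twice round-trips.
    let (addr, port) = xor_mapped_ipv4(xa, xp);
    assert_eq!(addr, [192, 0, 2, 1]);
    assert_eq!(port, 32853);
    println!("xored addr = {:?}, port = {}", xa, xp);
}
```

At this granularity the work is a couple of XOR instructions, which is why the 34.5ns figure is dominated by buffer handling rather than arithmetic; the IPv6 case is slower because it XORs 128 bits against the cookie concatenated with the transaction ID.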

The Concurrency Model

V100 runs 64 worker threads on Graviton4, each processing protocol operations independently. Shared state (session tables, allocation maps, channel bindings) is accessed through lock-free concurrent data structures. Here is how Rust's concurrency model compares to C++:

Property                       | Rust                                | C++
Data race detection            | Compile-time error                  | Runtime (TSan, optional)
Send/Sync traits               | Compiler-enforced thread safety     | No equivalent
Lock-free structure confidence | Borrow checker validates            | Programmer must verify
Async runtime                  | tokio (work-stealing)               | Various (Boost.Asio, libuv)
Typical locking pattern        | Minimal (lock-free where possible)  | Defensive (mutexes for safety)

The practical impact: V100 uses DashMap (a concurrent hash map built on sharded RwLocks) for session state. A read takes only a short shard-local read lock, so concurrent reads never block each other and only contend with writes to the same shard. In C++ codebases, the equivalent pattern typically uses a single std::shared_mutex around a std::unordered_map, which serializes all writes and can contend with concurrent reads at high throughput.
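The sharding idea is worth seeing in miniature. Here is an illustrative std-only sketch of how a sharded map avoids a single global lock — a simplification for exposition, not DashMap's actual implementation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Toy sharded map: the key hash picks one of N shards, so writers to
/// different shards never contend and readers only take a per-shard
/// read lock.
struct ShardedMap<V> {
    shards: Vec<RwLock<HashMap<u64, V>>>,
}

impl<V: Clone> ShardedMap<V> {
    fn new(n: usize) -> Self {
        Self { shards: (0..n).map(|_| RwLock::new(HashMap::new())).collect() }
    }

    fn shard_for(&self, key: u64) -> &RwLock<HashMap<u64, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    fn insert(&self, key: u64, val: V) {
        self.shard_for(key).write().unwrap().insert(key, val);
    }

    fn get_cloned(&self, key: u64) -> Option<V> {
        self.shard_for(key).read().unwrap().get(&key).cloned()
    }
}

fn main() {
    let map = ShardedMap::new(16);
    map.insert(42, "session-state");
    assert_eq!(map.get_cloned(42), Some("session-state"));
    assert_eq!(map.get_cloned(7), None);
}
```

With 16+ shards and keys spread by a good hash, two threads touching different sessions almost never hit the same lock — the property that lets per-lookup cost stay in the tens of nanoseconds.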

The Honest Trade-offs

Choosing Rust was not free. Here are the real costs we paid:

1. Compile Times

Rust's compile times are significantly longer than C or C++ for equivalent codebases. Our full build takes minutes, not seconds. Incremental builds are fast, but a clean build is noticeably slow. This impacts development velocity, especially when making changes across multiple crates.

2. Hiring

There are fewer experienced Rust systems programmers than experienced C or C++ systems programmers. The Rust community is growing rapidly, but the talent pool for "Rust + real-time media + systems networking" is small. We invest heavily in training engineers who come from C++ backgrounds.

3. Ecosystem Maturity

The C/C++ ecosystem for media processing is decades old. Libraries like libavcodec, libsrtp, and OpenSSL are battle-hardened. Rust equivalents exist but are newer. We use Rust bindings to some C libraries where the Rust-native alternative is not yet mature enough for production.

4. Learning Curve

The borrow checker is genuinely difficult for new Rust programmers. Engineers who have spent years writing C++ find the restrictions frustrating initially. The productivity loss during the learning period is real — typically 2-3 months before an experienced C++ programmer is productive in Rust.

We are biased. We chose Rust and we are happy with the choice. That does not make C++ or C wrong for every video server. mediasoup's C++ worker is excellent engineering. coturn serves millions of users reliably. The right language depends on your team, your performance requirements, and your security posture. For V100's specific requirements — sub-microsecond latency, memory safety on untrusted input, fearless concurrency across 64 cores — Rust was the right choice.

What About Go?

LiveKit proves that Go can build a successful, widely-deployed WebRTC server. Go's goroutine model is excellent for I/O-heavy workloads and its developer experience is superb. However, Go has a garbage collector, and for V100's target of sub-microsecond operations, GC pauses are a fundamental constraint.

Go's GC pauses are typically 100 microseconds to a few milliseconds. At V100's throughput of 3.63 million ops/sec, a 100-microsecond GC pause stalls approximately 363 operations. For workloads where occasional microsecond-scale pauses are acceptable (most video applications), Go is an excellent choice. For workloads where consistent sub-microsecond latency is the goal, it is not.
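The arithmetic above is easy to check. A sketch using this post's assumed numbers (3.63M ops/sec throughput, a 100-microsecond pause):

```rust
fn main() {
    let throughput_ops_per_sec: f64 = 3.63e6; // sustained throughput
    let gc_pause_sec: f64 = 100e-6;           // one 100-microsecond pause

    // Operations that arrive (and stall) during a single pause.
    let stalled_ops = throughput_ops_per_sec * gc_pause_sec;
    assert!((stalled_ops - 363.0).abs() < 1e-6);
    println!("one 100us pause stalls ~{stalled_ops:.0} operations");
}
```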

The Benchmark Proves the Decision

V100 Graviton4 Benchmark Summary (Rust, 20 microservices)
stun_binding_parse:    68.4ns         // zero-copy, no heap alloc
xor_mapped_ipv4:       34.5ns         // 32-bit XOR
xor_mapped_ipv6:       125.8ns        // 128-bit XOR + txn ID
full_pipeline_tick:    263.1ns        // parse to serialize
turn_channel_binding:  526.9ns        // state alloc included
stun_integrity_hmac:   664.2ns        // HMAC-SHA1
turn_credential:       863.0ns        // full validation
sustained_throughput:  3.63M ops/sec  // 64 vCPUs, Graviton4
pipeline_throughput:   3.61M ops/sec  // <1% scheduling overhead
memory_safety_cves:    0              // compile-time guaranteed
total_tests:           542/542        // including 17 PQ crypto

These numbers represent the ceiling of what is possible when you combine a language with no GC overhead, compile-time memory safety, zero-cost abstractions, and fearless concurrency. C and C++ can match the raw throughput, but they cannot match the safety properties without runtime overhead (sanitizers, Valgrind, fuzzing). Rust gives you both.

Conclusion

V100 chose Rust for its video infrastructure because Rust is the only language that delivers C/C++ performance with compile-time memory safety. The benchmarks prove this is not a theoretical claim: 68.4ns STUN parse, 263.1ns pipeline tick, 3.63 million ops/sec sustained on 64 Graviton4 vCPUs, zero memory safety CVEs, 542 tests passing.

The trade-offs are real — longer compile times, smaller talent pool, younger ecosystem. But for a network-facing server that processes millions of untrusted packets per second at nanosecond latency, the combination of performance and safety is decisive. We process untrusted input from the public internet at 3.63 million operations per second, and the compiler guarantees we cannot have a buffer overflow. No other language provides both properties simultaneously.

If you are evaluating language choices for a new real-time media server, we hope this post provides useful data points. For more on V100's benchmarks, read Inside V100's 263ns Pipeline Tick or Fastest WebRTC Server 2026. For V100's post-quantum cryptography implementation, see How We Built Post-Quantum Encrypted Video Conferencing.

Built in Rust. Benchmarked on Graviton4. Ready for Production.

68.4ns STUN parse. 263ns pipeline tick. 3.63M ops/sec. Zero memory safety CVEs. Start building on V100.

Get Started with V100