UPDATE (March 2026): After gateway optimizations (request coalescing, Cachee-backed tiered cache, QUIC support), V100 now achieves 0.01ms server processing (10µs via Server-Timing header) and 220K+ RPS on Apple Silicon (~1M+ extrapolated on Graviton4 96-core). p50 latency at 50 concurrent: 2.1ms. p99: 13.4ms. 0% error rate. Cache: sub-ns L1 (DashMap) + 31ns L2 (Cachee). The STUN/TURN protocol-level numbers below remain unchanged and verified on Graviton4.
The Latency You See vs. The Latency That Matters
When video API vendors talk about latency, they almost always mean API-level latency — the time between your REST call and the HTTP response. Twilio publishes approximately 50ms API latency in their documentation. Daily claims "sub-100ms" join times in their marketing materials. Agora references "ultra-low latency" with approximately 200ms end-to-end in their published case studies. These are useful numbers, but they measure the wrong thing.
API latency tells you how fast the control plane responds. It does not tell you how fast the data plane processes the thousands of protocol operations that keep your video call alive. Every second of a WebRTC call involves a constant stream of STUN keepalives, ICE connectivity checks, TURN relay operations, DTLS heartbeats, and SRTP packet forwarding. Each of these operations has its own latency, and those latencies compound.
Consider a 10-person video call. Each participant sends and receives multiple media streams. The TURN server processes STUN binding requests for NAT traversal, validates credentials, manages channel bindings, and relays encrypted media packets. At steady state, a single participant generates dozens of protocol operations per second. Multiply by 10 participants and you are looking at hundreds of operations per second per call, and thousands across your infrastructure.
If each of those operations takes 1 millisecond, you burn 1 second of compute per 1,000 operations. If each takes 263 nanoseconds, you burn 263 microseconds. The difference is not academic — it determines how many concurrent calls a single server can handle, how quickly ICE restarts complete after network changes, and how much jitter your relay layer injects into the media path.
V100's Protocol-Level Benchmark Numbers
We benchmark every protocol operation independently on production hardware. These numbers come from our Graviton4 benchmark suite running on a c8g.16xlarge instance with 64 vCPUs. The benchmarks use Criterion for statistical rigor, with warmup phases and confidence intervals.
| Operation | Latency | What It Does |
|---|---|---|
| STUN Binding Parse | 68.4ns | Parse an incoming STUN binding request from raw bytes |
| XOR Mapped Address (IPv4) | 34.5ns | Encode the XOR-mapped address attribute for IPv4 |
| XOR Mapped Address (IPv6) | 125.8ns | Encode the XOR-mapped address attribute for IPv6 |
| Full Pipeline Tick | 263.1ns | Complete request-to-response cycle for one operation |
| STUN Integrity (HMAC-SHA1) | 664.2ns | Compute and verify MESSAGE-INTEGRITY attribute |
| TURN Credential Validation | 863.0ns | Validate long-term TURN credentials |
| TURN Channel Binding | 526.9ns | Establish a channel binding for efficient relay |
The headline number is the full pipeline tick at 263.1 nanoseconds. This is the time from receiving a raw packet to emitting the response, including parsing, validation, state lookup, response construction, and serialization. At sustained load, V100 processes 3.63 million operations per second on 64 vCPUs; the mixed-workload pipeline benchmark measures 3.61 million ops/sec.
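The XOR Mapped Address rows in the table measure a small, fixed transform defined by RFC 5389: the mapped port is XORed with the top 16 bits of the STUN magic cookie, and the IPv4 address with the full cookie. Here is a minimal Rust sketch of the IPv4 encoding — illustrative only, not V100's actual implementation:

```rust
// Sketch of XOR-MAPPED-ADDRESS encoding per RFC 5389 §15.2.
// Illustrative only; not V100's production code.

const MAGIC_COOKIE: u32 = 0x2112_A442;

/// Encode the 8-byte XOR-MAPPED-ADDRESS attribute value for IPv4.
/// The port is XORed with the top 16 bits of the magic cookie,
/// the address with the full cookie.
fn xor_mapped_address_v4(addr: [u8; 4], port: u16) -> [u8; 8] {
    let x_port = port ^ (MAGIC_COOKIE >> 16) as u16;
    let x_addr = u32::from_be_bytes(addr) ^ MAGIC_COOKIE;

    let mut out = [0u8; 8];
    out[0] = 0x00; // reserved
    out[1] = 0x01; // address family: IPv4
    out[2..4].copy_from_slice(&x_port.to_be_bytes());
    out[4..8].copy_from_slice(&x_addr.to_be_bytes());
    out
}
```

Decoding is the same XOR applied again. This is why the operation benchmarks in the tens of nanoseconds: it is a handful of integer operations with no allocation.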
What Competitors Publish (and Don't Publish)
We looked at every major video API vendor's public documentation, technical blogs, and case studies to find protocol-level latency numbers. Here is what we found:
| Vendor | Published Latency | Metric Type | Source |
|---|---|---|---|
| V100 | 263.1ns per op | Protocol-level pipeline tick | Graviton4 benchmark (this post) |
| Twilio | ~50ms | API latency | Twilio documentation |
| Daily | <100ms | Join time | Daily marketing materials |
| Agora | ~200ms | End-to-end latency | Agora case studies |
| LiveKit | Not published | No per-op latency data | Go-based SFU, no public benchmarks |
| Zoom | Not published | No protocol-level data | C++ backend, proprietary |
| Mux | Not published | Focused on VOD/streaming | Not a conferencing platform |
| coturn | Not published | No public benchmarks | C TURN server, open source |
Honesty note: We are comparing different things. Twilio's 50ms is an API response time. Agora's 200ms is end-to-end including network transit. V100's 263.1ns is a server-side protocol operation. These are not directly equivalent measurements. What we are saying is that nobody else publishes the protocol-level numbers, and protocol-level latency is what determines server-side processing overhead, jitter contribution, and scalability ceiling.
Why Protocol Latency Compounds
A single slow protocol operation does not ruin a call. But protocol operations are not single events — they are continuous. STUN binding requests arrive every few seconds from every participant. ICE restart scenarios can trigger dozens of connectivity checks in rapid succession. TURN channel data forwarding happens for every single media packet when relay is required.
Consider the math for a 100-person webinar where 30% of participants require TURN relay (a typical corporate network scenario):
- 30 relayed participants × ~50 packets/second (audio) = 1,500 relay operations/sec
- 30 relayed participants × ~30 packets/second (video) = 900 relay operations/sec
- 100 participants × 1 STUN keepalive every 5 seconds = 20 STUN ops/sec
- Total: ~2,420 protocol operations per second for one call
At V100's 263.1ns per operation, that consumes 0.64 milliseconds of compute per second. At 1 microsecond per operation (a generous estimate for a typical TURN server), that consumes 2.42 milliseconds. The roughly 4x difference seems small until you multiply by thousands of concurrent calls. On a server handling 5,000 concurrent calls, the difference is 3.2 seconds vs. 12.1 seconds of compute per second — and that is just the protocol overhead, before you account for media forwarding.
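The arithmetic above can be checked in a few lines. The 1µs-per-operation baseline for a "typical" TURN server is an assumption for comparison, not a measured competitor number:

```rust
// Back-of-envelope check of the webinar compute budget above.
// The 1µs/op baseline is an assumed figure, not a measured one.

/// Milliseconds of compute consumed per wall-clock second, given an
/// operation rate and a per-operation latency in nanoseconds.
fn overhead_ms_per_sec(ops_per_sec: f64, ns_per_op: f64) -> f64 {
    ops_per_sec * ns_per_op / 1e6
}

fn main() {
    // 30 relayed participants: ~50 audio + ~30 video packets/sec each;
    // 100 participants sending one STUN keepalive every 5 seconds.
    let ops = 30.0 * 50.0 + 30.0 * 30.0 + 100.0 / 5.0; // 2,420 ops/sec

    let v100 = overhead_ms_per_sec(ops, 263.1);      // ≈ 0.64ms
    let typical = overhead_ms_per_sec(ops, 1_000.0); // ≈ 2.42ms at 1µs/op

    println!("per call: {v100:.2}ms vs {typical:.2}ms of compute/sec");
    // Scaled to 5,000 concurrent calls: ≈ 3.2s vs ≈ 12.1s.
    println!(
        "5,000 calls: {:.1}s vs {:.1}s",
        v100 * 5_000.0 / 1_000.0,
        typical * 5_000.0 / 1_000.0
    );
}
```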
The ICE Restart Problem
ICE restarts are where protocol latency becomes visible to users. When a participant switches from WiFi to cellular, or when a NAT binding expires, ICE restarts. The WebRTC stack sends a burst of STUN connectivity checks to every candidate pair — often 20 to 50 checks in rapid succession. The server must parse, validate, and respond to each one as fast as possible, because the user is staring at a frozen video frame until ICE re-establishes.
At 68.4ns per STUN parse and 263.1ns per full pipeline tick, V100 processes 50 connectivity checks in 13.2 microseconds. The bottleneck is the network round-trip, not the server. The user experience is a brief flicker, not a multi-second freeze.
The Architecture Behind the Numbers
V100's video infrastructure is built as 20 Rust microservices with zero Node.js. This is not a philosophical statement about programming languages. It is a direct consequence of the latency target. Garbage-collected runtimes introduce unpredictable pause times that make sub-microsecond protocol processing impossible to guarantee.
The TURN server specifically is a pure Rust implementation that avoids heap allocation in the hot path. STUN message parsing is zero-copy — the parser operates directly on the incoming byte buffer without allocating intermediate structures. Credential validation uses pre-computed HMAC contexts. Channel binding lookup uses a lock-free concurrent hash map.
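To make the zero-copy idea concrete, here is a sketch of what borrowing directly from the incoming buffer looks like for the fixed 20-byte STUN header (RFC 5389 wire layout). It is illustrative, not V100's parser:

```rust
// Illustrative zero-copy STUN header parse: the view borrows from the
// input datagram and allocates nothing. Layout per RFC 5389 §6.
// A sketch, not V100's production parser.

const MAGIC_COOKIE: u32 = 0x2112_A442;

/// A STUN header view borrowed from the incoming datagram.
struct StunHeader<'a> {
    msg_type: u16,
    msg_len: u16,
    transaction_id: &'a [u8], // 12 bytes, borrowed, never copied
}

fn parse_header(buf: &[u8]) -> Option<StunHeader<'_>> {
    if buf.len() < 20 {
        return None; // the STUN header is always 20 bytes
    }
    // The two most significant bits of a STUN message are zero.
    if buf[0] & 0xC0 != 0 {
        return None;
    }
    let cookie = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]);
    if cookie != MAGIC_COOKIE {
        return None;
    }
    Some(StunHeader {
        msg_type: u16::from_be_bytes([buf[0], buf[1]]),
        msg_len: u16::from_be_bytes([buf[2], buf[3]]),
        transaction_id: &buf[8..20],
    })
}
```

Because the header fields are read in place and the transaction ID is a borrowed slice, a parse like this touches no allocator and stays cache-friendly — the property the sub-100ns parse number depends on.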
The benchmark hardware is an AWS c8g.16xlarge (Graviton4) with 64 vCPUs. Graviton4's ARM Neoverse V2 cores provide consistent, predictable performance without the frequency scaling variability of x86 turbo boost. This matters for latency benchmarks — we care about worst-case as much as average-case.
What We Don't Know (Yet)
We are rigorous about distinguishing what we have measured from what we have not. Here are the metrics we do not yet have published data for:
- Global edge latency: We have not yet published multi-region benchmarks showing latency from every continent to V100's nearest edge node. This matters for users in Southeast Asia, Africa, and South America.
- Mobile SDK latency: Our Graviton4 benchmarks are server-side. We have not yet published end-to-end latency measurements from mobile SDKs including encoding, network transit, and decoding.
- Large-scale concurrent call benchmarks: 3.63M ops/sec is our single-instance throughput. We have not yet published numbers for horizontally scaled clusters handling tens of thousands of concurrent calls.
We will publish these numbers as we collect them. In the meantime, the protocol-level benchmarks represent what a single V100 instance can do, and they are independently reproducible on the same hardware.
Benchmark Methodology
Reproducibility matters more than impressive numbers. Here is exactly how we generated the benchmarks cited in this post:
- Hardware: AWS c8g.16xlarge instance (Graviton4, 64 vCPUs, ARM Neoverse V2)
- OS: Amazon Linux 2023, default kernel settings, no custom tuning
- Benchmark framework: Criterion.rs with 5-second warmup, 100 iterations minimum, 95% confidence intervals
- Test conditions: Dedicated instance, no other workloads, CPU governor set to performance mode
- Methodology: Each operation benchmarked independently. Pipeline throughput measured with realistic mixed workloads. Sustained throughput measured over full benchmark duration.
- Test suite: 542 total tests passing, including 17 post-quantum cryptography tests
Reproduce our claims. The V100 TURN server is a standalone Rust crate. The benchmark suite runs with cargo bench on any ARM64 instance. Numbers will vary with hardware — the Graviton4 results represent our production deployment target.
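For readers unfamiliar with how micro-benchmarks of this kind are structured, here is a simplified, std-only stand-in for what a harness like Criterion does: warm up, then time many iterations and report a mean. The real suite uses Criterion's statistical machinery (outlier detection, confidence intervals); this sketch with a hypothetical workload only conveys the shape:

```rust
// Simplified micro-benchmark loop in the spirit of Criterion: warm up,
// then average many timed iterations. Criterion adds sampling and
// statistics on top of this; the workload below is a stand-in.

use std::time::Instant;

/// Time `iters` runs of `f` after a warmup phase, returning mean ns/iter.
fn bench<F: FnMut()>(warmup: u32, iters: u32, mut f: F) -> f64 {
    for _ in 0..warmup {
        f(); // warmup: populate caches, train branch predictors
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    // Hypothetical workload standing in for a protocol operation.
    let buf = [0u8; 20];
    let ns = bench(1_000, 100_000, || {
        // black_box prevents the compiler from optimizing the work away.
        let _ = std::hint::black_box(&buf).iter().map(|b| *b as u32).sum::<u32>();
    });
    println!("~{ns:.1} ns/iter");
}
```

The warmup phase matters: without it, cold caches and unpredicted branches inflate the first samples, which is one reason single-shot timings overstate latency relative to a proper harness.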
Why This Matters for Your Product
If you are building a product that depends on real-time video, the latency of your video infrastructure directly impacts user experience in ways that are difficult to diagnose after the fact. Users do not report "my ICE restart took 800ms." They report "the video froze for a second when I switched networks." Users do not report "the TURN relay added 3ms of jitter." They report "the audio was choppy."
Protocol-level latency is the foundation that everything else builds on. You can optimize your encoding, tune your adaptive bitrate algorithm, and build the best UI in the world — but if the underlying protocol operations are slow, you are fighting against your infrastructure instead of building on top of it.
V100 gives you a video API where the protocol layer is not the bottleneck. 263 nanoseconds per operation means the server-side overhead is effectively zero from the user's perspective. The remaining latency in your video calls is network physics and client-side processing — the parts that are supposed to dominate.
Build on the Fastest Video Infrastructure
0.01ms server processing. 220K+ RPS. 3.63 million protocol ops/sec at 263ns per op. Start building with the lowest-latency video API available.
Get Started with V100