IMMERSIVE AUDIO · FOA Ambisonics · Rust-Native

Spatial Audio for the Spatial Computing Era.

First-Order Ambisonics encoding, binaural HRTF rendering, and real-time head tracking for Apple Vision Pro, AirPods Pro, and any stereo headphones. Sub-millisecond pipeline latency.

4 B-Format Channels · <1ms Render Latency · 6DOF Head Tracking · 48kHz Sample Rate
Ambisonics Encoding

Full Soundfield in 4 Channels

First-Order Ambisonics (FOA) captures the complete 3D soundfield using four B-format channels derived from spherical harmonics. V100 encodes, rotates, and decodes in real time.


W — Omnidirectional

Pressure signal. Captures sound equally from all directions. The "mono" reference channel.

Y₀⁰ = 1
ACN index: 0

X — Front–Back

Figure-8 dipole aligned to the front axis. Positive lobe faces forward.

Y₁¹ = cos(θ)cos(φ)
ACN index: 3

Y — Left–Right

Figure-8 dipole on the lateral axis. Positive lobe points left.

Y₁⁻¹ = sin(θ)cos(φ)
ACN index: 1

Z — Up–Down

Figure-8 dipole on the vertical axis. Captures height information.

Y₁⁰ = sin(φ)
ACN index: 2
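Put together, the four channel formulas above give a complete first-order encoder. A minimal Rust sketch (the function name and conventions are illustrative, not a V100 API; θ = azimuth, φ = elevation, both in radians; output in ACN order W, Y, Z, X):

```rust
/// Encode one mono sample into first-order B-format (ACN order, SN3D).
/// theta = azimuth (0 = front, positive left), phi = elevation.
fn foa_encode(sample: f32, theta: f32, phi: f32) -> [f32; 4] {
    let w = sample;                               // ACN 0: omni, Y₀⁰ = 1
    let y = sample * theta.sin() * phi.cos();     // ACN 1: left-right dipole
    let z = sample * phi.sin();                   // ACN 2: up-down dipole
    let x = sample * theta.cos() * phi.cos();     // ACN 3: front-back dipole
    [w, y, z, x]
}

fn main() {
    // A source directly in front at ear level drives only W and X.
    let b = foa_encode(1.0, 0.0, 0.0);
    println!("{:?}", b); // [1.0, 0.0, 0.0, 1.0]
}
```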

SN3D Normalization

Schmidt semi-normalization ensures all spherical harmonic components have comparable energy levels. Unlike N3D (full normalization) or maxN (Furse-Malham), SN3D prevents clipping while maintaining numerical stability across all channel orders.

// SN3D normalization factors
W: 1.000   (order 0)
Y: 1.000   (order 1, index 1)
Z: 1.000   (order 1, index 2)
X: 1.000   (order 1, index 3)
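The other conventions the section names differ from SN3D by fixed scales: N3D multiplies each degree-l SN3D channel by √(2l+1), and FuMa/maxN lowers W by 1/√2. A quick helper for the N3D relation (illustrative, not a V100 API):

```rust
/// N3D factor relative to SN3D for spherical-harmonic degree l: sqrt(2l + 1).
fn n3d_from_sn3d(l: u32) -> f32 {
    ((2 * l + 1) as f32).sqrt()
}

fn main() {
    println!("{}", n3d_from_sn3d(0)); // W: 1.0, same in both conventions
    println!("{}", n3d_from_sn3d(1)); // Y, Z, X: ~1.732 under N3D
}
```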

ACN Channel Ordering

Ambisonics Channel Number (ACN) is the standard ordering used by MPEG-H, YouTube 360, and Google spatial audio. Each channel is assigned a unique integer index based on its spherical harmonic degree and order.

// ACN = n² + n + m
ACN 0 W → omni (n=0, m=0)
ACN 1 Y → left-right (n=1, m=-1)
ACN 2 Z → up-down (n=1, m=0)
ACN 3 X → front-back (n=1, m=1)
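The index formula can be checked directly (a trivial helper, not part of any V100 API):

```rust
/// ACN index from spherical-harmonic degree n and order m: ACN = n² + n + m.
fn acn(n: u32, m: i32) -> i32 {
    (n * n + n) as i32 + m
}

fn main() {
    // The four FOA channels in ACN order: W, Y, Z, X.
    println!("{} {} {} {}", acn(0, 0), acn(1, -1), acn(1, 0), acn(1, 1)); // 0 1 2 3
}
```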
3D Audio Positioner

Place Audio Objects in 3D Space

Position mono or stereo sources anywhere around the listener using azimuth, elevation, and distance. V100 encodes each object into the Ambisonics soundfield in real time.

Example scene around the listener (azimuth 0° = front, ±90° = left/right, 180° = rear):

  • Commentator (Commentary): az 0°, el 0°, d 2m
  • Crowd L (Crowd / Ambience): az -90°, el 5°, d 15m
  • Crowd R (Crowd / Ambience): az +90°, el 5°, d 15m
  • Field Mic (Field Microphones): az 0°, el 30°, d 8m
  • Effects (Effects / Music): az -135°, el 0°, d 5m

Azimuth

-180° to +180°

Horizontal angle around the listener. 0° is front, ±180° is directly behind.

Elevation

-90° to +90°

Vertical angle. +90° is directly overhead, -90° is directly below.

Distance

0.1m–100m

Distance from listener with inverse-square attenuation and air absorption modeling.
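The attenuation law can be sketched as a gain curve. This assumes unity gain at a 1m reference and clamps to the stated 0.1m–100m range; V100's air-absorption model is not specified here, so it is omitted:

```rust
/// Amplitude gain vs. distance: inverse-square power law = 1/r in amplitude,
/// referenced to 1 m (assumption) and clamped to the 0.1 m - 100 m range.
fn distance_gain(distance_m: f32) -> f32 {
    let d = distance_m.clamp(0.1, 100.0);
    (1.0 / d).min(1.0) // never boost sources closer than the reference
}

fn main() {
    // Doubling distance halves amplitude (-6 dB per doubling).
    println!("{}", distance_gain(2.0));  // 0.5
    println!("{}", distance_gain(15.0)); // crowd mics at 15 m
}
```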

Head Tracking

Real-Time Soundfield Rotation

Quaternion-based head tracking from Vision Pro, AirPods Pro, or Meta Quest feeds directly into V100's rotation matrix. The soundfield follows the listener's head in real time.

Vision Pro (IMU sensors, 1000 Hz) → WebSocket quaternion stream [w, x, y, z] → V100 engine soundfield rotation R(q) · B(t) → binaural out (left + right ears, stereo PCM)

Yaw

Rotation around the vertical axis. Turning your head left or right. Maps to azimuth shift in the soundfield.

axis: Y (up)
range: ±180°
effect: horizontal pan

Pitch

Rotation around the lateral axis. Tilting your head up or down. Maps to elevation shift in the soundfield.

axis: X (right)
range: ±90°
effect: vertical pan

Roll

Rotation around the front axis. Tilting your head ear-to-shoulder. Adjusts left-right balance and height cues.

axis: Z (front)
range: ±180°
effect: tilt compensation
head_tracking.json
// WebSocket message from Vision Pro
{
  "timestamp": 1711547823.456,
  "quaternion": {
    "w": 0.9239,   // scalar component
    "x": 0.0000,   // pitch axis
    "y": 0.3827,   // yaw axis (45deg turn)
    "z": 0.0000    // roll axis
  },
  "device": "apple_vision_pro",
  "session_id": "sa_sess_7f3a..."
}
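The sample message encodes a pure yaw turn (x = z = 0), for which yaw = 2·atan2(y, w) and first-order rotation reduces to a 2-D rotation of the X and Y dipoles. A sketch of that special case; the sign convention (rotate vs. counter-rotate to compensate the head) depends on the tracker, and a general orientation needs the full quaternion rotation matrix:

```rust
/// Yaw angle (radians) from a pure-yaw quaternion (w, 0, y, 0).
fn yaw_from_quat(qw: f32, qy: f32) -> f32 {
    2.0 * qy.atan2(qw)
}

/// Rotate a first-order frame [W, Y, Z, X] (ACN order) by yaw.
/// W and Z are yaw-invariant; X and Y rotate like a 2-D vector.
fn rotate_yaw(b: [f32; 4], yaw: f32) -> [f32; 4] {
    let [w, y, z, x] = b;
    let (s, c) = yaw.sin_cos();
    [w, c * y - s * x, z, s * y + c * x]
}

fn main() {
    let yaw = yaw_from_quat(0.9239, 0.3827);
    let rotated = rotate_yaw([1.0, 0.0, 0.0, 1.0], yaw);
    println!("yaw {:.1} deg, rotated {:?}", yaw.to_degrees(), rotated);
}
```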
Binaural Rendering

HRTF Convolution for True 3D Perception

Head-Related Transfer Functions model how sound diffracts around your head, pinnae, and torso. V100 convolves the Ambisonics soundfield with HRTF filters to produce binaural stereo that works with any headphones.

Each ear gets the soundfield convolved with its own HRTF filter (pinna diffraction + ITD + ILD):

  • Left ear signal: HRTF_L(θ, φ) * B(t), with ITD of 0–700µs and ILD of 0–20dB
  • Right ear signal: HRTF_R(θ, φ) * B(t), with spectral pinna cues and temporal early reflections
Step 1. Decode FOA: extract W, X, Y, Z from the Ambisonics stream
Step 2. Apply Rotation: rotate the soundfield by the head-tracking quaternion
Step 3. HRTF Convolve: frequency-domain multiply with HRTF filters
Step 4. Stereo Output: left + right PCM at 48kHz / 24-bit
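The convolution step can be illustrated with a plain time-domain implementation. Real HRIRs are hundreds of taps and V100 multiplies in the frequency domain, but the operation is the same; the filters below are placeholder stand-ins for HRIR data:

```rust
/// Convolve one decoded channel with a per-ear head-related impulse
/// response (HRIR). Output length = signal length + filter length - 1.
fn convolve(signal: &[f32], hrir: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0; signal.len() + hrir.len() - 1];
    for (i, &s) in signal.iter().enumerate() {
        for (j, &h) in hrir.iter().enumerate() {
            out[i + j] += s * h;
        }
    }
    out
}

fn main() {
    // A one-sample delay plus attenuation stands in for an ITD/ILD pair.
    let left = convolve(&[1.0, 0.5], &[1.0]);       // near ear: unchanged
    let right = convolve(&[1.0, 0.5], &[0.0, 0.7]); // far ear: delayed, quieter
    println!("{:?} {:?}", left, right);
}
```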
Works with any stereo headphones — no special hardware required
Compatible Devices

From Vision Pro to Any Headphones

Full spatial audio with head tracking on supported devices. Binaural rendering on everything else. No listener left behind.

Apple Vision Pro

6DOF Head Tracking
Spatial Audio + HRTF + Tracking

AirPods Pro

3DOF Head Tracking
Dynamic Head Tracking + Spatial

Meta Quest 3

6DOF Head Tracking
Inside-Out Tracking + Spatial

Sony WH-1000XM5

3DOF Head Tracking
360 Reality Audio Compatible

Any Stereo Headphones

Binaural Rendering
Fixed HRTF, No Tracking Required
Feature             Vision Pro   AirPods Pro   Meta Quest   Stereo
Binaural HRTF       Yes          Yes           Yes          Yes
Head Tracking       6DOF         3DOF          6DOF         None
Personalized HRTF
Latency             <1ms         <5ms          <2ms         <1ms
Immersive Metadata

MPEG-H & Dolby Atmos Ready

V100 generates standards-compliant metadata for object-based audio delivery. Export to MPEG-H 3D Audio, Dolby Atmos ADM, or our native spatial format.


MPEG-H 3D Audio

ISO/IEC 23008-3

Object-based audio with scene metadata. Supports up to 128 audio objects with position, gain, and spread parameters. Used in broadcast (ATSC 3.0, DVB).

  • Scene description metadata
  • Object position + interactivity
  • HOA + channel bed support

Dolby Atmos

ADM BWF / DAMF

Export Audio Definition Model metadata for Dolby Atmos workflows. Object positions, bed assignments, and binaural render metadata in standard ADM XML.

  • ADM BWF export
  • 7.1.4 bed + objects
  • Renderer-agnostic positioning

V100 Spatial

Native JSON Format

Our native format optimized for low-latency streaming. Compact binary metadata interleaved with audio frames for sub-millisecond scene updates.

  • Binary + JSON hybrid
  • Frame-level position updates
  • WebSocket real-time streaming
spatial_scene.json
{
  "format": "v100_spatial_v1",
  "sample_rate": 48000,
  "ambisonics_order": 1,
  "normalization": "SN3D",
  "channel_order": "ACN",
  "objects": [
    {
      "id": "commentator",
      "azimuth": 0.0,
      "elevation": 0.0,
      "distance": 2.0,
      "gain": 1.0
    },
    {
      "id": "crowd_left",
      "azimuth": -90.0,
      "elevation": 5.0,
      "distance": 15.0,
      "gain": 0.8
    }
  ],
  "export": ["mpeg_h", "dolby_atmos_adm"]
}
Spatial Audio API

7 Endpoints. Full Control.

Create spatial scenes, position objects, stream head tracking, and render binaural output. All through a single REST + WebSocket API.

Method Endpoint Description
POST /v1/spatial/scenes Create a new spatial audio scene with Ambisonics config
POST /v1/spatial/scenes/{id}/objects Add an audio object with position (azimuth, elevation, distance)
PATCH /v1/spatial/objects/{id}/position Update object position in real time (azimuth, elevation, distance, gain)
WS /v1/spatial/scenes/{id}/tracking WebSocket for head-tracking quaternion stream (device → V100)
POST /v1/spatial/scenes/{id}/render Render binaural output (returns stereo PCM or streams via WebSocket)
POST /v1/spatial/scenes/{id}/export Export metadata (MPEG-H, Dolby Atmos ADM, or V100 native)
GET /v1/spatial/scenes/{id} Retrieve scene state, all objects, and current head-tracking status
create_scene.sh
# Create a spatial audio scene
curl -X POST https://api.v100.ai/v1/spatial/scenes \
  -H "Authorization: Bearer $V100_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Live Sports Mix",
    "ambisonics_order": 1,
    "normalization": "SN3D",
    "sample_rate": 48000,
    "head_tracking": true
  }'

# Response
{
  "scene_id": "sa_scene_7f3a...",
  "ws_url": "wss://api.v100.ai/v1/spatial/scenes/sa_scene_7f3a.../tracking",
  "status": "active"
}
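Once the scene is active, the client streams quaternions to the returned ws_url. A std-only sketch of serializing one tracking message (a real client would use a WebSocket library; field names and values follow the head_tracking.json example above):

```rust
/// Serialize one head-tracking message in the documented JSON shape.
/// q is [w, x, y, z], matching the quaternion object in head_tracking.json.
fn tracking_message(timestamp: f64, q: [f64; 4], device: &str) -> String {
    format!(
        concat!(
            "{{\"timestamp\":{:.3},",
            "\"quaternion\":{{\"w\":{:.4},\"x\":{:.4},\"y\":{:.4},\"z\":{:.4}}},",
            "\"device\":\"{}\"}}"
        ),
        timestamp, q[0], q[1], q[2], q[3], device
    )
}

fn main() {
    let msg = tracking_message(
        1711547823.456,
        [0.9239, 0.0, 0.3827, 0.0], // 45-degree yaw turn
        "apple_vision_pro",
    );
    println!("{}", msg);
}
```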
Available Now

Build Immersive Audio Experiences

From live sports to virtual events, V100 spatial audio turns flat stereo into a fully immersive 3D soundfield. Start with our free tier.