IMMERSIVE AUDIO · FOA Ambisonics · Rust-Native

Spatial Audio for the Spatial Computing Era.

First-Order Ambisonics encoding, binaural HRTF rendering, and real-time head tracking for Apple Vision Pro, AirPods Pro, and any stereo headphones. Sub-millisecond pipeline latency.

4 B-Format Channels · <1ms Render Latency · 6DOF Head Tracking · 48kHz Sample Rate
Ambisonics Encoding

Full Soundfield in 4 Channels

First-Order Ambisonics (FOA) captures the complete 3D soundfield using four B-format channels derived from spherical harmonics. V100 encodes, rotates, and decodes in real time.


W — Omnidirectional

Pressure signal. Captures sound equally from all directions. The "mono" reference channel.

Y₀⁰ = 1
ACN index: 0

X — Front–Back

Figure-8 dipole aligned to the front axis. Positive lobe faces forward.

Y₁¹ = cos(θ)cos(φ)
ACN index: 3

Y — Left–Right

Figure-8 dipole on the lateral axis. Positive lobe points left.

Y₁⁻¹ = sin(θ)cos(φ)
ACN index: 1

Z — Up–Down

Figure-8 dipole on the vertical axis. Captures height information.

Y₁⁰ = sin(φ)
ACN index: 2
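Put together, the four channel formulas above give a complete first-order encoder. A minimal Rust sketch (the function name and conventions are illustrative, not a V100 API; θ = azimuth, φ = elevation, both in radians; output in ACN order W, Y, Z, X):

```rust
/// Encode one mono sample into first-order B-format (ACN order, SN3D).
/// theta = azimuth (0 = front, positive left), phi = elevation.
fn foa_encode(sample: f32, theta: f32, phi: f32) -> [f32; 4] {
    let w = sample;                               // ACN 0: omni, Y₀⁰ = 1
    let y = sample * theta.sin() * phi.cos();     // ACN 1: left-right dipole
    let z = sample * phi.sin();                   // ACN 2: up-down dipole
    let x = sample * theta.cos() * phi.cos();     // ACN 3: front-back dipole
    [w, y, z, x]
}

fn main() {
    // A source directly in front at ear level drives only W and X.
    let b = foa_encode(1.0, 0.0, 0.0);
    println!("{:?}", b); // [1.0, 0.0, 0.0, 1.0]
}
```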

SN3D Normalization

Schmidt semi-normalization ensures all spherical harmonic components have comparable energy levels. Unlike N3D (full normalization) or maxN (Furse-Malham), SN3D prevents clipping while maintaining numerical stability across all channel orders.

// SN3D normalization factors
W: 1.000   (order 0)
Y: 1.000   (order 1, index 1)
Z: 1.000   (order 1, index 2)
X: 1.000   (order 1, index 3)
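The other conventions the section names differ from SN3D by fixed scales: N3D multiplies each degree-l SN3D channel by √(2l+1), and FuMa/maxN lowers W by 1/√2. A quick helper for the N3D relation (illustrative, not a V100 API):

```rust
/// N3D factor relative to SN3D for spherical-harmonic degree l: sqrt(2l + 1).
fn n3d_from_sn3d(l: u32) -> f32 {
    ((2 * l + 1) as f32).sqrt()
}

fn main() {
    println!("{}", n3d_from_sn3d(0)); // W: 1.0, same in both conventions
    println!("{}", n3d_from_sn3d(1)); // Y, Z, X: ~1.732 under N3D
}
```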

ACN Channel Ordering

Ambisonics Channel Number (ACN) is the standard ordering used by MPEG-H, YouTube 360, and Google spatial audio. Each channel is assigned a unique integer index based on its spherical harmonic degree and order.

// ACN = n² + n + m
ACN 0 W → omni (n=0, m=0)
ACN 1 Y → left-right (n=1, m=-1)
ACN 2 Z → up-down (n=1, m=0)
ACN 3 X → front-back (n=1, m=1)
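The index formula can be checked directly (a trivial helper, not part of any V100 API):

```rust
/// ACN index from spherical-harmonic degree n and order m: ACN = n² + n + m.
fn acn(n: u32, m: i32) -> i32 {
    (n * n + n) as i32 + m
}

fn main() {
    // The four FOA channels in ACN order: W, Y, Z, X.
    println!("{} {} {} {}", acn(0, 0), acn(1, -1), acn(1, 0), acn(1, 1)); // 0 1 2 3
}
```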
3D Audio Positioner

Place Audio Objects in 3D Space

Position mono or stereo sources anywhere around the listener using azimuth, elevation, and distance. V100 encodes each object into the Ambisonics soundfield in real time.

Example scene around the listener (azimuth 0° = front, ±90° = left/right, 180° = rear):

  • Commentator (Commentary): az 0°, el 0°, d 2m
  • Crowd L (Crowd / Ambience): az -90°, el 5°, d 15m
  • Crowd R (Crowd / Ambience): az +90°, el 5°, d 15m
  • Field Mic (Field Microphones): az 0°, el 30°, d 8m
  • Effects (Effects / Music): az -135°, el 0°, d 5m

Azimuth

-180° to +180°

Horizontal angle around the listener. 0° is front, ±180° is directly behind.

Elevation

-90° to +90°

Vertical angle. +90° is directly overhead, -90° is directly below.

Distance

0.1m–100m

Distance from listener with inverse-square attenuation and air absorption modeling.
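The attenuation law can be sketched as a gain curve. This assumes unity gain at a 1m reference and clamps to the stated 0.1m–100m range; V100's air-absorption model is not specified here, so it is omitted:

```rust
/// Amplitude gain vs. distance: inverse-square power law = 1/r in amplitude,
/// referenced to 1 m (assumption) and clamped to the 0.1 m - 100 m range.
fn distance_gain(distance_m: f32) -> f32 {
    let d = distance_m.clamp(0.1, 100.0);
    (1.0 / d).min(1.0) // never boost sources closer than the reference
}

fn main() {
    // Doubling distance halves amplitude (-6 dB per doubling).
    println!("{}", distance_gain(2.0));  // 0.5
    println!("{}", distance_gain(15.0)); // crowd mics at 15 m
}
```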

Head Tracking

Real-Time Soundfield Rotation

Quaternion-based head tracking from Vision Pro, AirPods Pro, or Meta Quest feeds directly into V100's rotation matrix. The soundfield follows the listener's head in real time.

Vision Pro (IMU sensors, 1000 Hz) → WebSocket quaternion stream [w, x, y, z] → V100 engine soundfield rotation R(q) · B(t) → binaural out (left + right ears, stereo PCM)

Yaw

Rotation around the vertical axis. Turning your head left or right. Maps to azimuth shift in the soundfield.

axis: Y (up)
range: ±180°
effect: horizontal pan

Pitch

Rotation around the lateral axis. Tilting your head up or down. Maps to elevation shift in the soundfield.

axis: X (right)
range: ±90°
effect: vertical pan

Roll

Rotation around the front axis. Tilting your head ear-to-shoulder. Adjusts left-right balance and height cues.

axis: Z (front)
range: ±180°
effect: tilt compensation
head_tracking.json
// WebSocket message from Vision Pro
{
  "timestamp": 1711547823.456,
  "quaternion": {
    "w": 0.9239,   // scalar component
    "x": 0.0000,   // pitch axis
    "y": 0.3827,   // yaw axis (45deg turn)
    "z": 0.0000    // roll axis
  },
  "device": "apple_vision_pro",
  "session_id": "sa_sess_7f3a..."
}
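The sample message encodes a pure yaw turn (x = z = 0), for which yaw = 2·atan2(y, w) and first-order rotation reduces to a 2-D rotation of the X and Y dipoles. A sketch of that special case; the sign convention (rotate vs. counter-rotate to compensate the head) depends on the tracker, and a general orientation needs the full quaternion rotation matrix:

```rust
/// Yaw angle (radians) from a pure-yaw quaternion (w, 0, y, 0).
fn yaw_from_quat(qw: f32, qy: f32) -> f32 {
    2.0 * qy.atan2(qw)
}

/// Rotate a first-order frame [W, Y, Z, X] (ACN order) by yaw.
/// W and Z are yaw-invariant; X and Y rotate like a 2-D vector.
fn rotate_yaw(b: [f32; 4], yaw: f32) -> [f32; 4] {
    let [w, y, z, x] = b;
    let (s, c) = yaw.sin_cos();
    [w, c * y - s * x, z, s * y + c * x]
}

fn main() {
    let yaw = yaw_from_quat(0.9239, 0.3827);
    let rotated = rotate_yaw([1.0, 0.0, 0.0, 1.0], yaw);
    println!("yaw {:.1} deg, rotated {:?}", yaw.to_degrees(), rotated);
}
```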
Binaural Rendering

HRTF Convolution for True 3D Perception

Head-Related Transfer Functions model how sound diffracts around your head, pinnae, and torso. V100 convolves the Ambisonics soundfield with HRTF filters to produce binaural stereo that works with any headphones.

Each ear gets the soundfield convolved with its own HRTF filter (pinna diffraction + ITD + ILD):

  • Left ear signal: HRTF_L(θ, φ) * B(t), with ITD of 0–700µs and ILD of 0–20dB
  • Right ear signal: HRTF_R(θ, φ) * B(t), with spectral pinna cues and temporal early reflections
Step 1. Decode FOA: extract W, X, Y, Z from the Ambisonics stream
Step 2. Apply Rotation: rotate the soundfield by the head-tracking quaternion
Step 3. HRTF Convolve: frequency-domain multiply with HRTF filters
Step 4. Stereo Output: left + right PCM at 48kHz / 24-bit
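The convolution step can be illustrated with a plain time-domain implementation. Real HRIRs are hundreds of taps and V100 multiplies in the frequency domain, but the operation is the same; the filters below are placeholder stand-ins for HRIR data:

```rust
/// Convolve one decoded channel with a per-ear head-related impulse
/// response (HRIR). Output length = signal length + filter length - 1.
fn convolve(signal: &[f32], hrir: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0; signal.len() + hrir.len() - 1];
    for (i, &s) in signal.iter().enumerate() {
        for (j, &h) in hrir.iter().enumerate() {
            out[i + j] += s * h;
        }
    }
    out
}

fn main() {
    // A one-sample delay plus attenuation stands in for an ITD/ILD pair.
    let left = convolve(&[1.0, 0.5], &[1.0]);       // near ear: unchanged
    let right = convolve(&[1.0, 0.5], &[0.0, 0.7]); // far ear: delayed, quieter
    println!("{:?} {:?}", left, right);
}
```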
Works with any stereo headphones — no special hardware required
Compatible Devices

From Vision Pro to Any Headphones

Full spatial audio with head tracking on supported devices. Binaural rendering on everything else. No listener left behind.

Apple Vision Pro

6DOF Head Tracking
Spatial Audio + HRTF + Tracking

AirPods Pro

3DOF Head Tracking
Dynamic Head Tracking + Spatial

Meta Quest 3

6DOF Head Tracking
Inside-Out Tracking + Spatial

Sony WH-1000XM5

3DOF Head Tracking
360 Reality Audio Compatible

Any Stereo Headphones

Binaural Rendering
Fixed HRTF, No Tracking Required
Feature             Vision Pro   AirPods Pro   Meta Quest   Stereo
Binaural HRTF       Yes          Yes           Yes          Yes
Head Tracking       6DOF         3DOF          6DOF         None
Personalized HRTF
Latency             <1ms         <5ms          <2ms         <1ms
Immersive Metadata

MPEG-H & Dolby Atmos Ready

V100 generates standards-compliant metadata for object-based audio delivery. Export to MPEG-H 3D Audio, Dolby Atmos ADM, or our native spatial format.


MPEG-H 3D Audio

ISO/IEC 23008-3

Object-based audio with scene metadata. Supports up to 128 audio objects with position, gain, and spread parameters. Used in broadcast (ATSC 3.0, DVB).

  • Scene description metadata
  • Object position + interactivity
  • HOA + channel bed support

Dolby Atmos

ADM BWF / DAMF

Export Audio Definition Model metadata for Dolby Atmos workflows. Object positions, bed assignments, and binaural render metadata in standard ADM XML.

  • ADM BWF export
  • 7.1.4 bed + objects
  • Renderer-agnostic positioning

V100 Spatial

Native JSON Format

Our native format optimized for low-latency streaming. Compact binary metadata interleaved with audio frames for sub-millisecond scene updates.

  • Binary + JSON hybrid
  • Frame-level position updates
  • WebSocket real-time streaming
spatial_scene.json
{
  "format": "v100_spatial_v1",
  "sample_rate": 48000,
  "ambisonics_order": 1,
  "normalization": "SN3D",
  "channel_order": "ACN",
  "objects": [
    {
      "id": "commentator",
      "azimuth": 0.0,
      "elevation": 0.0,
      "distance": 2.0,
      "gain": 1.0
    },
    {
      "id": "crowd_left",
      "azimuth": -90.0,
      "elevation": 5.0,
      "distance": 15.0,
      "gain": 0.8
    }
  ],
  "export": ["mpeg_h", "dolby_atmos_adm"]
}
Spatial Audio API

7 Endpoints. Full Control.

Create spatial scenes, position objects, stream head tracking, and render binaural output. All through a single REST + WebSocket API.

Method Endpoint Description
POST /v1/spatial/scenes Create a new spatial audio scene with Ambisonics config
POST /v1/spatial/scenes/{id}/objects Add an audio object with position (azimuth, elevation, distance)
PATCH /v1/spatial/objects/{id}/position Update object position in real time (azimuth, elevation, distance, gain)
WS /v1/spatial/scenes/{id}/tracking WebSocket for head-tracking quaternion stream (device → V100)
POST /v1/spatial/scenes/{id}/render Render binaural output (returns stereo PCM or streams via WebSocket)
POST /v1/spatial/scenes/{id}/export Export metadata (MPEG-H, Dolby Atmos ADM, or V100 native)
GET /v1/spatial/scenes/{id} Retrieve scene state, all objects, and current head-tracking status
create_scene.sh
# Create a spatial audio scene
curl -X POST https://api.v100.ai/v1/spatial/scenes \
  -H "Authorization: Bearer $V100_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Live Sports Mix",
    "ambisonics_order": 1,
    "normalization": "SN3D",
    "sample_rate": 48000,
    "head_tracking": true
  }'

# Response
{
  "scene_id": "sa_scene_7f3a...",
  "ws_url": "wss://api.v100.ai/v1/spatial/scenes/sa_scene_7f3a.../tracking",
  "status": "active"
}
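Once the scene is active, the client streams quaternions to the returned ws_url. A std-only sketch of serializing one tracking message (a real client would use a WebSocket library; field names and values follow the head_tracking.json example above):

```rust
/// Serialize one head-tracking message in the documented JSON shape.
/// q is [w, x, y, z], matching the quaternion object in head_tracking.json.
fn tracking_message(timestamp: f64, q: [f64; 4], device: &str) -> String {
    format!(
        concat!(
            "{{\"timestamp\":{:.3},",
            "\"quaternion\":{{\"w\":{:.4},\"x\":{:.4},\"y\":{:.4},\"z\":{:.4}}},",
            "\"device\":\"{}\"}}"
        ),
        timestamp, q[0], q[1], q[2], q[3], device
    )
}

fn main() {
    let msg = tracking_message(
        1711547823.456,
        [0.9239, 0.0, 0.3827, 0.0], // 45-degree yaw turn
        "apple_vision_pro",
    );
    println!("{}", msg);
}
```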
Available Now

Build Immersive Audio Experiences

From live sports to virtual events, V100 spatial audio turns flat stereo into a fully immersive 3D soundfield. Start with our free tier.