ISO 26428-11: Digital Cinema — Part 11: Additional Audio Channels

Immersive audio specifications for digital cinema — object-based audio, height channels, and multi-dimensional sound

1. Scope of ISO 26428-11

ISO 26428-11 specifies additional audio channels for digital cinema. It extends the baseline 5.1/7.1 surround formats to immersive configurations that include object-based audio, height channels, and multi-dimensional soundscapes. The standard is intended to ensure interoperability between digital cinema servers (DCS), audio processors, and screen processing systems from different vendors, while maintaining backward compatibility with existing cinema sound infrastructure.

Designing for object-based audio (OBA) requires the audio processor to handle up to 128 simultaneous audio objects plus 16 bed channels. Size your DSP resources for this worst-case mix so that processing latency does not exceed 5 ms at 48 kHz sampling.

| Parameter | Baseline (5.1) | ISO 26428-11 Extended |
|---|---|---|
| Maximum channels | 6 | 64 output channels (rendered from ≤ 16 bed channels + 128 objects) |
| Sampling rate | 48 kHz | 48 / 96 / 192 kHz |
| Bit depth | 16 or 24 bit | 24 bit (mandatory), 32 bit float (optional) |
| Audio codec | Linear PCM | PCM / Dolby Atmos / DTS:X / Auro-3D |
| Channel mapping | Fixed | Dynamic, per-reel metadata |
| Latency budget (AES67) | N/A | ≤ 1 ms network + 5 ms processing |
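
When sizing buffers it can help to express the latency budget in sample periods. The following is a minimal sketch, assuming the 5 ms processing and 1 ms network figures from the table above; the function name `latency_in_samples` is illustrative and does not come from the standard.

```python
def latency_in_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Convert a latency budget in milliseconds to whole sample periods."""
    return round(latency_ms * sample_rate_hz / 1000)


PROCESSING_BUDGET_MS = 5.0  # processing budget quoted in the table
NETWORK_BUDGET_MS = 1.0     # network budget quoted in the table

for rate in (48_000, 96_000, 192_000):
    processing = latency_in_samples(PROCESSING_BUDGET_MS, rate)
    total = latency_in_samples(PROCESSING_BUDGET_MS + NETWORK_BUDGET_MS, rate)
    print(f"{rate} Hz: {processing} samples processing, {total} samples end to end")
```

At 48 kHz the 5 ms processing budget corresponds to 240 samples, and the same time budget covers proportionally more samples at 96 and 192 kHz.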

2. Immersive Audio Architecture

The additional audio framework specified in ISO 26428-11 uses a metadata-driven rendering model. Each audio essence is accompanied by a Composition Playlist (CPL) that describes spatial positioning metadata including azimuth, elevation, distance, and gain coefficients. The cinema audio processor (CAP) renders these objects in real time to the available speaker array, applying panning laws (VBAP, DBAP, or Ambisonic decoding) based on the auditorium’s specific speaker layout configuration stored in the Speaker Configuration File (SCF).
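
As an illustration of one of the panning laws named above, here is a minimal sketch of two-dimensional VBAP (vector base amplitude panning) for a single speaker pair. It is not taken from the standard: the function name `vbap_2d`, the azimuth-only geometry, and the power normalisation are assumptions made for the example, and a real renderer would work in three dimensions over speaker triplets read from the SCF.

```python
import math


def vbap_2d(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Return power-normalised gains (g1, g2) for a source between two speakers."""

    def unit(az_deg):
        a = math.radians(az_deg)
        return math.cos(a), math.sin(a)

    px, py = unit(source_az_deg)
    l1x, l1y = unit(spk1_az_deg)
    l2x, l2y = unit(spk2_az_deg)

    # Solve p = g1 * l1 + g2 * l2 for the gains using Cramer's rule.
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-9:
        raise ValueError("speakers are collinear with the listener")
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    if g1 < -1e-9 or g2 < -1e-9:
        raise ValueError("source lies outside the arc spanned by this pair")

    # Scale so that g1^2 + g2^2 = 1 (constant power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# A source straight ahead between speakers at +30 and -30 degrees
# receives equal gain in both speakers.
print(vbap_2d(0.0, 30.0, -30.0))
```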

When deploying height-channel speaker arrays, the vertical angle between adjacent layers should not exceed 30° to avoid localization gaps. For a typical cinema measuring 15 m from screen to rear wall, a three-layer (floor, ear-level, ceiling) configuration requires a minimum of 9 height speakers per side wall.
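
The 30° guideline can be checked during layout planning. The sketch below assumes that the angle is measured from a single reference listening position and that each layer can be represented by one mounting height at a given horizontal distance. Both are simplifications introduced for this example, and the function names are hypothetical.

```python
import math


def layer_elevations_deg(layer_heights_m, ear_height_m, horizontal_distance_m):
    """Elevation angle of each speaker layer as seen from the listener's ears."""
    return [
        math.degrees(math.atan2(h - ear_height_m, horizontal_distance_m))
        for h in layer_heights_m
    ]


def max_adjacent_gap_deg(elevations_deg):
    """Largest vertical angle between neighbouring layers."""
    ordered = sorted(elevations_deg)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


# Hypothetical room: layers at 0.5 m, 1.2 m and 4.0 m, ears at 1.2 m,
# side wall 6 m from the reference seat.
elevations = layer_elevations_deg([0.5, 1.2, 4.0], ear_height_m=1.2, horizontal_distance_m=6.0)
gap = max_adjacent_gap_deg(elevations)
print(f"largest gap: {gap:.1f} degrees, within guideline: {gap <= 30.0}")
```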

Audio synchronisation is critical. The standard mandates that all audio channels and objects be time-aligned within ±1 sample period at 48 kHz (±20.8 μs). This requires a common clock reference distributed via the IEEE 1588 Precision Time Protocol (PTP), which AES67 also uses for clock synchronisation. Network jitter must stay below 1 μs RMS to maintain this alignment across all processing nodes.
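
The ±20.8 μs figure is simply one sample period at 48 kHz (1 / 48 000 s). A small sketch of the corresponding check follows; the helper names and the idea of feeding it measured per-channel offsets are assumptions made for illustration.

```python
def sample_period_us(sample_rate_hz):
    """Duration of one sample period in microseconds."""
    return 1_000_000 / sample_rate_hz


def is_time_aligned(offsets_us, sample_rate_hz=48_000):
    """True if every measured offset is within +/- one sample period."""
    tolerance = sample_period_us(sample_rate_hz)
    return all(abs(offset) <= tolerance for offset in offsets_us)


print(f"{sample_period_us(48_000):.1f} us per sample at 48 kHz")
print(is_time_aligned([0.0, 10.0, -20.0]))  # all offsets inside the window
print(is_time_aligned([25.0]))              # 25 us exceeds one sample period
```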

3. Engineering Implementation Considerations

Real-world implementation of ISO 26428-11 compliant systems presents several engineering challenges. The audio processor must support both channel-based and object-based rendering simultaneously, with seamless transitions between reels that may use different audio formats. Digital cinema servers must embed the additional audio metadata within the existing MXF (Material Exchange Format) container without disrupting the core audio essence decoding path.

  • DSP throughput: A 64-channel immersive system at 96 kHz/24-bit requires approximately 150 MIPS of dedicated DSP processing for real-time rendering, excluding object panning calculations which can add another 200–400 MIPS depending on object count.
  • Network topology: Use dedicated AVB (Audio Video Bridging) or Milan-certified network switches with ≤ 100 μs store-and-forward latency per hop. Standard Ethernet switches introduce unpredictable queuing delays that can violate the 1 ms end-to-end latency budget.
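
The figures in the two points above lend themselves to quick back-of-the-envelope checks. The sketch below computes the raw PCM payload rate of a 64-channel, 96 kHz, 24-bit stream and the number of switch hops that fit into a 1 ms budget at 100 μs per hop. The function names are hypothetical, packet and protocol overhead is ignored, and the hop count assumes the switches are the only source of network delay.

```python
def pcm_bitrate_mbps(channels, sample_rate_hz, bits_per_sample):
    """Raw PCM payload rate in Mbit/s, excluding any packet overhead."""
    return channels * sample_rate_hz * bits_per_sample / 1_000_000


def max_switch_hops(network_budget_us, per_hop_latency_us):
    """Whole number of hops that fit in the budget if only switches add delay."""
    return int(network_budget_us // per_hop_latency_us)


print(f"{pcm_bitrate_mbps(64, 96_000, 24):.3f} Mbit/s raw payload")
print(f"{max_switch_hops(1000, 100)} hops at 100 us each within 1 ms")
```

In practice the usable hop count is lower, because cabling, endpoint buffering, and packetisation also consume part of the budget.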

A proper commissioning process using the ISO 26428-11 calibration test tracks (pink noise with spatial metadata) enables automated verification of all speaker channels, delays, and SPL alignment in under 30 minutes, compared with 3–4 hours for manual alignment.

Power sequencing is critical in immersive audio systems. Amplifier racks can draw more than 100 A of peak inrush current while their capacitor banks charge. Always implement a staggered power-up sequence (2–3 second delay between racks) to avoid tripping building circuit breakers during system startup.
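
A staggered power-up can be sketched as a simple loop. This is only an illustration: `switch_on` stands in for whatever relay or PDU control your installation provides (it is a hypothetical callback, not a real API), and the 2.5 s default is just a value inside the 2–3 second range mentioned above.

```python
import time


def run_power_up(rack_names, switch_on, delay_s=2.5, sleep=time.sleep):
    """Switch racks on one at a time, waiting delay_s between consecutive racks."""
    for index, name in enumerate(rack_names):
        if index > 0:
            sleep(delay_s)  # let the previous rack's inrush current settle
        switch_on(name)


# Dry run: record the sequence instead of driving hardware or really sleeping.
log = []
run_power_up(
    ["rack-A", "rack-B", "rack-C"],
    switch_on=log.append,
    sleep=lambda seconds: log.append(f"wait {seconds} s"),
)
print(log)
```

Passing `sleep` as a parameter keeps the sequence testable without waiting in real time.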

4. Frequently Asked Questions

Q: Is ISO 26428-11 backward compatible with existing 5.1 cinema installations?
A: Yes. The standard specifies a backward-compatible downmix algorithm encoded in the audio metadata. A legacy 5.1 processor will reproduce a 7.1.4 immersive mix as a standard 5.1 output, though the immersive effect is naturally lost.
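
Conceptually, a metadata-driven downmix is a matrix applied to each frame of samples. The sketch below shows that idea only: the channel labels and the −3 dB gains in `EXAMPLE_MATRIX` are assumptions invented for illustration and are not the coefficients defined by the standard, which would be read from the audio metadata.

```python
import math

ATT = 1 / math.sqrt(2)  # -3 dB, an assumed fold-down gain for this example

# Hypothetical 7.1.4 to 5.1 matrix: output channel -> {input channel: gain}.
EXAMPLE_MATRIX = {
    "L": {"L": 1.0, "Ltf": ATT},
    "R": {"R": 1.0, "Rtf": ATT},
    "C": {"C": 1.0},
    "LFE": {"LFE": 1.0},
    "Ls": {"Lss": ATT, "Lrs": ATT, "Ltr": ATT},
    "Rs": {"Rss": ATT, "Rrs": ATT, "Rtr": ATT},
}


def apply_downmix(frame, matrix):
    """Mix one frame (input channel -> sample value) down to the matrix outputs."""
    return {
        out: sum(gain * frame.get(src, 0.0) for src, gain in sources.items())
        for out, sources in matrix.items()
    }


# One frame with signal in the centre and the left side surround only.
print(apply_downmix({"C": 1.0, "Lss": 1.0}, EXAMPLE_MATRIX))
```
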
Q: What is the maximum number of audio objects supported?
A: ISO 26428-11 supports up to 128 simultaneous audio objects plus 16 bed channels, for a total of 144 simultaneous audio elements. Practical deployments typically use 32–64 objects.
Q: How does the standard handle speaker layout variations between cinemas?
A: The Speaker Configuration File (SCF) stored in each auditorium’s audio processor maps the ideal sound field to the physical speaker layout. The rendering engine adapts object panning based on actual speaker positions, number of height layers, and subwoofer configuration.
