Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
IEC 62312 provides a comprehensive framework for achieving and maintaining synchronization between audio and video signals in professional and consumer AV systems. The standard addresses the fundamental challenge that audio and video signals often traverse different processing paths with varying latencies: video processing (scaling, frame-rate conversion, compression/decompression) typically introduces 1-3 frames of delay, while audio processing (sample-rate conversion, filtering, perceptual coding) may add 10-50 ms. Without careful synchronization design, these latency differences produce perceptible lip-sync errors.
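The mismatch is easiest to judge once both paths are expressed in milliseconds. Here is a minimal sketch, assuming a constant frame rate. The helper name `av_offset_ms` and its sign convention (positive means audio reaches the viewer before the matching picture) are illustrative choices, not terms from the standard:

```python
def av_offset_ms(video_delay_frames: float, frame_rate_hz: float, audio_delay_ms: float) -> float:
    """Return how far audio leads video, in milliseconds (hypothetical helper).

    Positive: audio is presented before the matching picture (audio leads).
    Negative: audio lags the picture.
    """
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    video_delay_ms = video_delay_frames * 1000.0 / frame_rate_hz
    return video_delay_ms - audio_delay_ms


# The ranges quoted above, evaluated at an assumed 25 Hz frame rate (40 ms per frame).
for frames in (1, 3):
    for audio_ms in (10, 50):
        print(f"{frames} frame(s) of video, {audio_ms} ms of audio: "
              f"offset {av_offset_ms(frames, 25, audio_ms):+.0f} ms")
```

At 25 Hz, the quoted ranges give anything from audio lagging by 10 ms to audio leading by 110 ms. The worst case therefore sits well outside the ±40 ms consumer tolerance.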
The standard applies to a wide range of systems: broadcast production and transmission chains, home theater systems, video conferencing equipment, digital cinema, live event production, and streaming media platforms. It covers both wired and wireless transmission paths and addresses synchronization across heterogeneous networks where audio and video may be transported over different protocols (e.g., AES67 audio with SMPTE ST 2110 video).
IEC 62312 defines a hierarchical clock architecture where a master clock generator (MCG) provides the primary timing reference. The master clock must have accuracy better than ±1 ppm for standard-definition systems and ±0.1 ppm for high-definition and UHD systems. Clock distribution follows a daisy-chain or star topology using dedicated timing signals (e.g., AES11 for audio, SMPTE ST 2059 for video over IP).
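A ppm figure becomes more tangible when it is converted into accumulated time error. The sketch below assumes two free-running clocks whose frequency errors sit at opposite ends of the tolerance, so their relative error is twice the specification. It then asks how long they take to drift apart by a 40 ms budget. In a properly locked system the slaves follow the master and this drift does not accumulate. Both helper names are hypothetical.

```python
def drift_ms(relative_error_ppm: float, elapsed_s: float) -> float:
    """Time error accumulated between two clocks after elapsed_s seconds, in ms."""
    return relative_error_ppm * 1e-6 * elapsed_s * 1000.0


def seconds_to_exceed(budget_ms: float, relative_error_ppm: float) -> float:
    """Seconds until the accumulated drift reaches budget_ms."""
    return budget_ms / (relative_error_ppm * 1e-6 * 1000.0)


for spec_ppm in (1.0, 0.1):
    # Worst case (assumed): one clock fast and the other slow by the full tolerance
    worst_case_ppm = 2 * spec_ppm
    print(f"±{spec_ppm} ppm: {drift_ms(worst_case_ppm, 3600):.2f} ms per hour, "
          f"reaches 40 ms after {seconds_to_exceed(40, worst_case_ppm) / 3600:.1f} h")
```

Under this worst-case pairing, ±1 ppm clocks drift apart by about 7.2 ms per hour and use up a 40 ms budget in roughly 5.6 hours. At ±0.1 ppm the same budget lasts about 55.6 hours.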
The standard establishes quantitative synchronization tolerances. For consumer applications, the audio-to-video offset must not exceed ±40 ms (ITU-R BT.1359 recommendation). For professional broadcast and production, the tolerance tightens to ±15 ms for critical monitoring and to ±5 ms for live production where talent monitors are used. Jitter requirements are specified separately. For broadcast production, audio clock jitter must not exceed 1 ns RMS (20 Hz to 20 kHz) to avoid degrading digital-to-analog conversion quality. The other application classes have looser or tighter jitter limits, as listed in the table below.
| Application Class | Max A/V Offset | Clock Accuracy | Audio Clock Jitter (RMS) | Video Timing Tolerance |
|---|---|---|---|---|
| Consumer home theater | ±40 ms | ±5 ppm | 5 ns | ±0.5 frame |
| Broadcast production | ±15 ms | ±0.5 ppm | 1 ns | ±0.1 frame |
| Live event / studio | ±5 ms | ±0.1 ppm | 0.5 ns | ±0.05 frame |
| Digital cinema | ±10 ms | ±0.1 ppm | 0.2 ns | ±0.01 frame |
| Video conferencing | ±30 ms | ±1 ppm | 2 ns | ±0.25 frame |
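The limits in the table lend themselves to a simple compliance check. The sketch below encodes the A/V offset and audio-jitter columns as a lookup keyed by application class and reports which limits a measurement violates. The class keys and the `check` function are hypothetical, and the limits are transcribed from the table rather than taken from the standard's own text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_av_offset_ms: float   # symmetric ± limit on audio-to-video offset
    max_jitter_ns_rms: float  # audio clock jitter limit (RMS)


# Values transcribed from the table above; the key names are hypothetical.
LIMITS = {
    "consumer_home_theater": Limits(40, 5.0),
    "broadcast_production": Limits(15, 1.0),
    "live_event_studio": Limits(5, 0.5),
    "digital_cinema": Limits(10, 0.2),
    "video_conferencing": Limits(30, 2.0),
}


def check(app_class: str, av_offset_ms: float, jitter_ns_rms: float) -> list[str]:
    """Return a description of each violated limit (an empty list means compliant)."""
    lim = LIMITS[app_class]
    problems = []
    if abs(av_offset_ms) > lim.max_av_offset_ms:
        problems.append(f"A/V offset {av_offset_ms:+.1f} ms exceeds ±{lim.max_av_offset_ms} ms")
    if jitter_ns_rms > lim.max_jitter_ns_rms:
        problems.append(f"jitter {jitter_ns_rms} ns RMS exceeds {lim.max_jitter_ns_rms} ns")
    return problems


print(check("broadcast_production", av_offset_ms=-12.0, jitter_ns_rms=0.8))  # []
print(check("live_event_studio", av_offset_ms=-12.0, jitter_ns_rms=0.8))
```

The same measurement passes the broadcast production limits but fails both the offset and the jitter limit for live event and studio work.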
IEC 62312 provides detailed guidance on managing and correcting synchronization errors. The standard distinguishes between fixed latency (deterministic, caused by processing pipelines and buffers) and variable latency (non-deterministic, caused by network congestion, clock drift, or codec rate control). Fixed latency is compensated by a static delay inserted in the faster (lower-latency) path. Variable latency requires adaptive algorithms that continuously monitor and adjust the relative timing.
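The split between fixed and variable latency maps onto two pieces of code. The first is a one-off calculation of the static delay for the faster path. The second is a slow control loop that trims away whatever residual offset remains. The sketch below shows one simple way to build that loop: a proportional correction with a dead band and a per-step slew limit. The gain, dead band and slew values are arbitrary assumptions, and this is not presented as the algorithm IEC 62312 prescribes.

```python
def static_delays(video_latency_ms: float, audio_latency_ms: float) -> tuple[float, float]:
    """Return (audio_delay_ms, video_delay_ms) that equalize the fixed latencies.

    Only the faster (lower-latency) path is delayed.
    """
    diff = float(video_latency_ms - audio_latency_ms)
    return (diff if diff > 0 else 0.0, -diff if diff < 0 else 0.0)


class AdaptiveTrim:
    """Proportional correction of variable latency on top of the static delays.

    relative_delay_ms is audio delay minus video delay. gain, deadband_ms and
    max_step_ms are assumed tuning values, not figures from IEC 62312.
    """

    def __init__(self, video_latency_ms: float, audio_latency_ms: float,
                 gain: float = 0.1, deadband_ms: float = 1.0, max_step_ms: float = 0.5):
        audio_d, video_d = static_delays(video_latency_ms, audio_latency_ms)
        self.relative_delay_ms = audio_d - video_d
        self.gain = gain
        self.deadband_ms = deadband_ms
        self.max_step_ms = max_step_ms

    def update(self, measured_offset_ms: float) -> tuple[float, float]:
        """Apply one measurement of (audio time - video time); positive = audio late.

        Returns the new (audio_delay_ms, video_delay_ms).
        """
        if abs(measured_offset_ms) > self.deadband_ms:
            # Move against the error, but never by more than max_step_ms per update
            step = -self.gain * measured_offset_ms
            step = max(-self.max_step_ms, min(self.max_step_ms, step))
            self.relative_delay_ms += step
        d = self.relative_delay_ms
        return (d if d > 0 else 0.0, -d if d < 0 else 0.0)


# Video path 80 ms, audio path 20 ms: the audio path gets 60 ms of static delay.
audio_d, video_d = static_delays(80, 20)
print(audio_d, video_d)  # 60.0 0.0

# Assume 6 ms of extra audio latency appears later (e.g. from network congestion).
trim = AdaptiveTrim(80, 20)
for _ in range(100):
    offset = (20 + 6 + audio_d) - (80 + video_d)
    audio_d, video_d = trim.update(offset)
print(f"residual offset: {(20 + 6 + audio_d) - (80 + video_d):.2f} ms")
```

In this sketch, the dead band stops the loop from chasing measurement noise. The slew limit keeps each correction small, so timing changes stay gradual rather than abrupt.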
For IP-based systems, the standard recommends using RTP timestamps combined with PTP-synchronized wall clocks to compute the end-to-end latency difference between audio and video streams. The synchronization plane should operate independently of the media transport plane to avoid feedback loops. The standard also addresses the critical issue of “sync leader” selection—in a multi-device system, one device is designated the timing leader, and all others slave their output timing to it.
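The latency calculation can be sketched as follows. The sketch assumes SMPTE ST 2110-style streams in which each 32-bit RTP timestamp counts media-clock ticks since the PTP epoch with zero offset. It also assumes a 90 kHz clock for video and 48 kHz for audio. Subtracting the recovered sampling time from the PTP arrival time gives each path's latency. All function names are hypothetical.

```python
RTP_WRAP = 2 ** 32  # RTP timestamps are 32-bit counters


def rtp_sample_time_s(rtp_ts: int, clock_rate_hz: int, ptp_now_s: float) -> float:
    """PTP time (s) at which the media in a packet was sampled.

    Assumes the timestamp counts media-clock ticks since the PTP epoch modulo 2**32
    (zero media-clock offset), and that the sample instant lies within half a wrap
    period of ptp_now_s.
    """
    now_ticks = int(ptp_now_s * clock_rate_hz)
    back = (now_ticks - rtp_ts) % RTP_WRAP  # ticks from sampling to now, modulo the wrap
    if back >= RTP_WRAP // 2:  # timestamp slightly ahead of our clock: treat as negative
        back -= RTP_WRAP
    return (now_ticks - back) / clock_rate_hz


def path_latency_s(rtp_ts: int, clock_rate_hz: int, arrival_ptp_s: float) -> float:
    """End-to-end latency of one stream: arrival time minus sampling time."""
    return arrival_ptp_s - rtp_sample_time_s(rtp_ts, clock_rate_hz, arrival_ptp_s)


def av_latency_difference_ms(audio_rtp: int, audio_arrival_s: float,
                             video_rtp: int, video_arrival_s: float,
                             audio_rate_hz: int = 48_000, video_rate_hz: int = 90_000) -> float:
    """Audio latency minus video latency in ms; negative means the audio path is faster."""
    audio = path_latency_s(audio_rtp, audio_rate_hz, audio_arrival_s)
    video = path_latency_s(video_rtp, video_rate_hz, video_arrival_s)
    return (audio - video) * 1000.0


# Both streams sampled at PTP time 1000 s; audio arrives 10 ms later, video 40 ms later.
audio_rtp = (1000 * 48_000) % RTP_WRAP
video_rtp = (1000 * 90_000) % RTP_WRAP
print(round(av_latency_difference_ms(audio_rtp, 1000.010, video_rtp, 1000.040), 1))  # -30.0
```

A result of -30 ms means the audio path is 30 ms faster. Roughly that much delay would then be added to the audio before presentation.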
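The most common cause of lip-sync error in home theater setups is an audio processing chain in TVs and soundbars that fails to match the video delay. Many modern TVs apply advanced video processing (motion interpolation, noise reduction, upscaling) that adds 2-5 frames of video delay, roughly 33 to 83 ms at 60 Hz. Meanwhile, the audio path (especially via HDMI ARC/eARC or optical) may not add a corresponding delay. The result is audio that leads video. This is a particularly distracting form of lip-sync error because viewers rarely encounter it in everyday life: sound travels more slowly than light, so real-world audio can arrive late but never early.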
The same principles apply to OTT streaming services, but these services face additional challenges. Client devices have heterogeneous processing capabilities, and adaptive bitrate switching can cause timing discontinuities. There is also no common clock reference between encoder and decoder, so synchronization must be timestamp-based. In DASH this relies on the presentation timeline described in the Media Presentation Description (MPD). In HLS it relies on the Program Clock Reference (PCR) and presentation timestamps carried in MPEG-2 Transport Stream segments.
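For transport-stream delivery, a decoder recovers its system clock from the PCR. It then presents each audio or video access unit when that clock reaches the unit's presentation timestamp (PTS). The sketch below shows only the timestamp arithmetic, using the MPEG-2 systems encodings: a 33-bit PCR base at 90 kHz plus an extension counting 27 MHz ticks (0 to 299), and a 33-bit PTS at 90 kHz. The function names are hypothetical. Buffering, discontinuity flags and bitrate switching are ignored.

```python
PTS_WRAP = 2 ** 33               # PTS and PCR base are 33-bit counters at 90 kHz
PCR_WRAP_27MHZ = PTS_WRAP * 300  # the same wrap period expressed in 27 MHz ticks


def pcr_to_27mhz(pcr_base: int, pcr_ext: int) -> int:
    """Combine the PCR base and extension into a single 27 MHz tick count."""
    if not 0 <= pcr_ext < 300:
        raise ValueError("PCR extension must be in 0..299")
    return pcr_base * 300 + pcr_ext


def seconds_until_presentation(pts: int, pcr_27mhz: int) -> float:
    """Time from the current system clock (PCR) to a unit's PTS.

    Negative values mean the unit is already late. Assumes the two values are
    less than half a wrap period (about 13 hours) apart.
    """
    delta = (pts * 300 - pcr_27mhz) % PCR_WRAP_27MHZ
    if delta >= PCR_WRAP_27MHZ // 2:
        delta -= PCR_WRAP_27MHZ
    return delta / 27_000_000


clock = pcr_to_27mhz(pcr_base=PTS_WRAP - 900, pcr_ext=0)  # just before the 33-bit wrap
video_pts = 8_100           # after the wrap
audio_pts = 8_100 + 1_800   # stamped 20 ms after the video frame
print(seconds_until_presentation(video_pts, clock))  # 0.1
print(seconds_until_presentation(audio_pts, clock))  # 0.12
```

Because both streams are scheduled against the same recovered clock, any A/V offset in this model comes from the timestamps themselves (here, 20 ms), not from the decoder's local timing.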
The standard recommends using test signals with simultaneous audio and video events—such as a flash (video) synchronized with a tone burst (audio) or a clapperboard pattern. Professional testing uses test pattern generators with known delay characteristics and precision oscilloscope measurements at the system output.
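With a flash-and-tone test signal, the measurement reduces to finding the onset of each event in the captured output and subtracting the two times. The sketch below assumes the captured video has already been reduced to one average-luma value per frame and the audio to a list of samples. The thresholds, synthetic signals and function names are illustrative assumptions. A real measurement would use a proper envelope detector and sub-frame timing.

```python
import math


def first_index_above(values, threshold):
    """Index of the first value above threshold, or None if there is none."""
    for i, v in enumerate(values):
        if v > threshold:
            return i
    return None


def measure_av_offset_ms(luma_per_frame, frame_rate_hz, audio_samples, sample_rate_hz,
                         luma_threshold=0.5, audio_threshold=0.1):
    """Tone onset time minus flash onset time, in ms; positive means audio lags video."""
    flash = first_index_above(luma_per_frame, luma_threshold)
    tone = first_index_above([abs(s) for s in audio_samples], audio_threshold)
    if flash is None or tone is None:
        raise ValueError("test event not found in capture")
    return (tone / sample_rate_hz - flash / frame_rate_hz) * 1000.0


# Synthetic capture: 25 fps video with a flash at frame 10 (400 ms), and a
# 48 kHz audio capture whose 1 kHz tone burst starts at sample 21120 (440 ms).
luma = [0.05] * 10 + [0.95] * 2 + [0.05] * 13
audio = [0.0] * 21_120 + [0.5 * math.cos(2 * math.pi * 1000 * n / 48_000) for n in range(4_800)]
print(round(measure_av_offset_ms(luma, 25, audio, 48_000), 1))  # 40.0
```

Video onset resolution here is one frame (40 ms at 25 fps). This is one reason professional setups time the flash with a photodiode and an oscilloscope rather than from decoded frames.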
The sync leader is the device that generates or distributes the master timing reference. All other devices in the system lock their output timing to the sync leader. The sync leader should be the device with the most stable clock source (typically a dedicated master clock generator or a device locked to GPS/GNSS for broadcast applications).
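Choosing the sync leader can be expressed as a ranking over candidate devices. The sketch below is loosely modelled on the idea behind PTP's Best Master Clock Algorithm (compare clock quality first, then break ties deterministically), but it is much simpler. The source categories, their ordering (GNSS-locked ahead of a dedicated master clock generator), and the device names are assumptions for illustration. This is not the selection procedure defined in IEC 62312.

```python
from dataclasses import dataclass
from enum import IntEnum


class ClockSource(IntEnum):
    """Lower value = preferred reference (assumed ordering)."""
    GNSS_LOCKED = 0
    MASTER_CLOCK_GENERATOR = 1
    INTERNAL_OSCILLATOR = 2


@dataclass(frozen=True)
class Device:
    name: str
    source: ClockSource
    accuracy_ppm: float  # specified frequency accuracy of the device's clock


def choose_sync_leader(devices: list[Device]) -> Device:
    """Pick the best clock source, then the tightest accuracy, then the name as tie-break."""
    if not devices:
        raise ValueError("no candidate devices")
    return min(devices, key=lambda d: (d.source, d.accuracy_ppm, d.name))


devices = [
    Device("switcher", ClockSource.INTERNAL_OSCILLATOR, 5.0),
    Device("mcg-1", ClockSource.MASTER_CLOCK_GENERATOR, 0.1),
    Device("gps-ref", ClockSource.GNSS_LOCKED, 0.1),
]
print(choose_sync_leader(devices).name)  # gps-ref
```

The name tie-break ensures that every device running the same comparison on the same candidate list reaches the same answer, which avoids two devices both claiming the leader role.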