Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 14496-8:2005, titled “Information technology — Coding of audio-visual objects — Part 8: Carriage of ISO/IEC 14496 contents over IP networks,” is the part of the MPEG-4 (ISO/IEC 14496) family that covers IP transport. It defines how MPEG-4 content (audio, video, scene description, and object-oriented streams) can be transported efficiently and reliably over Internet Protocol (IP) networks. The standard is designed to work with existing IETF protocols such as RTP (Real-time Transport Protocol), RTSP (Real Time Streaming Protocol), and SDP (Session Description Protocol).
Adopted by many national bodies (for example, CAN/CSA-ISO/IEC 14496-8:05 in Canada), this standard ensures interoperability between diverse implementations, from IP-based streaming servers to consumer media devices. It addresses key challenges such as timing recovery, error resilience, and multiplexing of multiple MPEG-4 streams.
The standard defines a system where MPEG-4 content is packetized into RTP packets according to specific payload formats. The architecture comprises three main layers:
ISO/IEC 14496-8 specifies distinct RTP payload formats for each type of MPEG-4 elementary stream:
| Stream Type | MIME Type / Encoding Name | RTP Clock Rate | Key Parameters |
|---|---|---|---|
| MPEG-4 Audio (AAC, etc.) | audio/mpeg4-generic | Varies (up to 96 kHz) | mode, profile-level-id |
| MPEG-4 Video (MPEG-4 Visual, etc.) | video/mpeg4-generic | 90 kHz | profile-level-id, config |
| MPEG-4 Systems (BIFS, OD) | application/mpeg4-generic | 90 kHz | streamType, objectType |
Each payload format supports fragmentation, so that a large MPEG-4 Access Unit or SL-packetized data unit can be split across several RTP packets. The standard also mandates specific RTP header usage, such as setting the marker bit on the last packet of a frame.
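To make the header usage concrete, here is a minimal sketch of packing the 12-byte RTP fixed header, with the marker bit exposed as a flag. The layout follows the RTP fixed header as defined by the IETF (RFC 3550); the function name `build_rtp_header` and the example values (payload type 96, SSRC 0x1234) are assumptions for illustration and are not taken from ISO/IEC 14496-8.

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Pack an RTP fixed header with no CSRC entries and no extension.

    Hypothetical helper for illustration only.
    """
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    # Byte 0: version=2 in the top two bits; padding, extension, CSRC count = 0.
    byte0 = 2 << 6
    # Byte 1: marker bit in the top bit, payload type in the low 7 bits.
    byte1 = (int(bool(marker)) << 7) | payload_type
    # Network byte order: 2 bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


# Last packet of a frame: marker bit set.
header = build_rtp_header(96, seq=1, timestamp=90000, ssrc=0x1234, marker=True)
```

A sender would prepend this header to each payload and pass `marker=True` only on the packet that completes a frame.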
For synchronized playback, the standard relies on RTP timestamps and RTCP sender reports. MPEG-4’s Object Clock Reference (OCR) is mapped to the RTP timestamp domain using a system of periodic beacon frames. Implementations must maintain a common reference clock across all streams of the same session.
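The role of RTCP sender reports in synchronization can be sketched as follows. Each sender report pairs an RTP timestamp with a wall-clock (NTP) time, so a receiver can place any later RTP timestamp of that stream on the shared reference clock. The function name and the sample numbers below are assumptions for illustration; the sketch does not model the OCR mapping itself.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to wall-clock seconds using the latest sender report.

    sr_rtp_ts and sr_ntp_seconds are the RTP/NTP timestamp pair carried in
    the sender report; clock_rate is the stream's RTP clock rate in Hz.
    Hypothetical helper for illustration only.
    """
    # Interpret the difference as a signed 32-bit value so that timestamp
    # wraparound and packets slightly older than the report both work.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate


# Two streams of one session with different clock rates land on the same
# wall-clock instant one second after their respective sender reports.
video_time = rtp_to_wallclock(180000, 90000, 100.0, 90000)
audio_time = rtp_to_wallclock(88200, 44100, 100.0, 44100)
```

Because both results are expressed on the same clock, a player can schedule the audio and video units for presentation at the same instant.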
The standard extends SDP to describe MPEG-4 streams. Mandatory fields include a=rtpmap with the encoding name derived from the MPEG-4 stream type and a unique payload type number. Configuration information (e.g., AudioSpecificConfig for MPEG-4 Audio) is transported in a=fmtp lines:
a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; config=1210
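As a rough illustration of how a receiver might consume such a description, the sketch below parses an a=fmtp line and decodes the leading fields of an AAC AudioSpecificConfig from the config parameter. The helper names are hypothetical, and the config value 1210 is an example chosen here (it encodes AAC LC, 44.1 kHz, stereo). Only the first 13 bits are decoded; the escape values for extended object types and explicit sampling frequencies are rejected rather than handled.

```python
# Sampling-frequency table indexed by the 4-bit samplingFrequencyIndex.
SAMPLING_FREQUENCIES = [
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350,
]


def parse_fmtp(line):
    """Split 'a=fmtp:<pt> key=value; key=value' into (payload_type, params)."""
    prefix, _, rest = line.partition(":")
    if prefix != "a=fmtp":
        raise ValueError("not an a=fmtp line")
    payload_type, _, param_text = rest.partition(" ")
    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        # Parameter names are compared case-insensitively in this sketch.
        params[key.strip().lower()] = value.strip()
    return int(payload_type), params


def decode_audio_specific_config(config_hex):
    """Decode object type, sampling rate and channel configuration."""
    total_bits = len(config_hex) * 4
    if total_bits < 16:
        raise ValueError("config too short")
    bits = int(config_hex, 16)
    object_type = (bits >> (total_bits - 5)) & 0x1F    # first 5 bits
    freq_index = (bits >> (total_bits - 9)) & 0x0F     # next 4 bits
    channel_config = (bits >> (total_bits - 13)) & 0x0F  # next 4 bits
    if object_type == 31 or freq_index >= len(SAMPLING_FREQUENCIES):
        raise ValueError("escape values are not handled in this sketch")
    return {
        "object_type": object_type,
        "sampling_rate": SAMPLING_FREQUENCIES[freq_index],
        "channel_config": channel_config,
    }


pt, params = parse_fmtp("a=fmtp:96 streamtype=5; profile-level-id=15; config=1210")
audio = decode_audio_specific_config(params["config"])
```

A receiver can compare the decoded sampling rate with the clock rate announced in a=rtpmap as a basic consistency check before starting playback.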
An RTP packet may carry a single MPEG-4 Access Unit (AU) or a fragment thereof. The standard defines a fragmentation unit (FU) header for video and audio to support large AUs. Aggregation (multiple small AUs per packet) is allowed to reduce overhead for low-rate streams, but the marker bit must be set accordingly.
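The interplay of fragmentation, aggregation, and the marker bit can be sketched with a simplified packetizer. This is an illustration under stated assumptions, not the payload format itself: the function `packetize` is hypothetical, it emits raw payload bytes only, and it omits the per-AU size information a real payload needs so that a receiver can separate aggregated AUs. The marker rule assumed here is that the bit is set when a packet ends with a complete AU or with the final fragment of an AU.

```python
def packetize(access_units, max_payload):
    """Return a list of (payload_bytes, marker) tuples.

    Small AUs are aggregated while they fit into max_payload; an AU larger
    than max_payload is fragmented and never shares a packet with other AUs.
    """
    packets = []
    pending = []       # complete AUs waiting to be sent together
    pending_size = 0

    def flush():
        nonlocal pending, pending_size
        if pending:
            # A packet of complete AUs always carries the marker bit.
            packets.append((b"".join(pending), True))
            pending, pending_size = [], 0

    for au in access_units:
        if len(au) > max_payload:
            flush()
            for offset in range(0, len(au), max_payload):
                chunk = au[offset:offset + max_payload]
                is_last = offset + max_payload >= len(au)
                # Only the final fragment of the AU carries the marker bit.
                packets.append((chunk, is_last))
        else:
            if pending_size + len(au) > max_payload:
                flush()
            pending.append(au)
            pending_size += len(au)
    flush()
    return packets


packets = packetize([b"a" * 10, b"b" * 10, b"c" * 50], max_payload=20)
```

In this example the two 10-byte AUs share one packet, while the 50-byte AU is split into three fragments with the marker bit set only on the last one.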
MPEG-4’s built-in error resilience tools (e.g., Reversible VLC for AAC, video packetization) are preserved. The standard also encourages use of RTP’s own mechanisms, such as the sequence number field in the RTP header for detecting loss and RTCP feedback for reporting loss patterns.
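Loss detection from sequence numbers can be sketched as below. The RTP sequence number is a 16-bit counter that wraps around, so gaps are computed modulo 65536. The function name `count_lost_packets` is an assumption for illustration, and the sketch deliberately stays simple: a packet that arrives late is ignored rather than credited back against the loss count.

```python
def count_lost_packets(sequence_numbers):
    """Count gaps in a series of received 16-bit RTP sequence numbers."""
    lost = 0
    previous = None
    for seq in sequence_numbers:
        if previous is None:
            previous = seq
            continue
        gap = (seq - previous) & 0xFFFF
        if 0 < gap < 0x8000:
            # Forward jump: every skipped number is a missing packet.
            lost += gap - 1
            previous = seq
        # gap == 0 is a duplicate; gap >= 0x8000 is a late or reordered
        # packet. Both are ignored in this sketch.
    return lost


# Packet 0 is missing across the wraparound from 65535 to 1.
lost_across_wrap = count_lost_packets([65534, 65535, 1, 2])
```

A receiver would typically feed counts like this into its RTCP receiver reports, which is how the sender learns about loss patterns.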
Compliance with ISO/IEC 14496-8:2005 is typically verified through:
config parameter for codecs that require it (e.g., AAC).
Developers should consult the following alongside the standard:
© 2026 International Standards Organization. This article is for informational purposes and does not replace the official standard text.