Understanding ISO/IEC 14496-8:2005 – Carriage of MPEG-4 Content over IP Networks

A Technical Overview of the Standard for Network Transport of Audio-Visual Objects

1. Scope and Overview

ISO/IEC 14496-8:2005, titled “Information technology — Coding of audio-visual objects — Part 8: Carriage of ISO/IEC 14496 contents over IP networks,” is an essential standard within the MPEG-4 (ISO/IEC 14496) family. It defines how MPEG-4 content—including audio, video, scene description, and object-oriented streams—can be efficiently and reliably transported over Internet Protocol (IP) networks. The standard is designed to work seamlessly with existing protocols such as RTP (Real-time Transport Protocol), RTSP (Real-time Streaming Protocol), and SDP (Session Description Protocol).

Adopted by many national bodies (for example, CAN/CSA-ISO/IEC 14496-8:05 in Canada), this standard ensures interoperability between diverse implementations, from IP-based streaming servers to consumer media devices. It addresses key challenges such as timing recovery, error resilience, and multiplexing of multiple MPEG-4 streams.

Key Benefit: ISO/IEC 14496-8:2005 provides a uniform framework for delivering MPEG-4 content over IP, enabling rich multimedia services on any IP-connected device.

2. Technical Requirements and Architecture

2.1 Overall System Model

The standard defines a system where MPEG-4 content is packetized into RTP packets according to specific payload formats. The architecture comprises three main layers:

  • Content Layer: The original MPEG-4 streams (Systems, Audio, Video).
  • Packetization Layer: RTP payload formats that fragment and encapsulate MPEG-4 data.
  • Transport Layer: RTP/RTCP over UDP/IP for real-time delivery.

2.2 RTP Payload Formats

ISO/IEC 14496-8 specifies distinct RTP payload formats for each type of MPEG-4 elementary stream:

Stream Type                    | MIME Type / Encoding Name | Clock Rate            | Key Parameters
MPEG-4 Audio (AAC, etc.)       | audio/mpeg4-generic       | Varies (up to 96 kHz) | mode, profile-level-id
MPEG-4 Video (AVC/H.264, etc.) | video/mpeg4-generic       | 90 kHz                | profile-level-id, config
MPEG-4 Systems (BIFS, OD)      | application/mpeg4-generic | 90 kHz                | streamType, objectType

Each payload format supports fragmentation (e.g., using MPEG-4 Access Units or SL packetized data). The standard mandates specific RTP header usage (e.g., marker bit for last packet of a frame).
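To make the header usage concrete, here is a minimal sketch that packs the 12-byte RTP fixed header described in RFC 3550, with the marker bit exposed as a parameter. The function name build_rtp_header and the example values are assumptions for illustration. The sketch omits CSRC entries, header extensions, and any payload-specific header.

```python
import struct

def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Pack the 12-byte RTP fixed header (no CSRC list, no extension)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = 2 << 6                            # version=2, padding=0, extension=0, CC=0
    byte1 = (int(marker) << 7) | payload_type  # marker bit is the top bit of byte 1
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,            # 16-bit sequence number
                       timestamp & 0xFFFFFFFF,  # 32-bit media timestamp
                       ssrc & 0xFFFFFFFF)       # 32-bit synchronization source

# Last packet of a frame: marker bit set, dynamic payload type 96.
header = build_rtp_header(96, seq=1, timestamp=90000, ssrc=0x1234ABCD, marker=True)
```

The network byte order ("!") in the format string matters: RTP fields are big-endian regardless of the host platform.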

2.3 Clock Recovery and Synchronization

For synchronized playback, the standard relies on RTP timestamps and RTCP sender reports. MPEG-4’s Object Clock Reference (OCR) is mapped to the RTP timestamp domain using a system of periodic beacon frames. Implementations must maintain a common reference clock across all streams of the same session.
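The RTP/RTCP part of this can be sketched as follows: each RTCP sender report pairs an RTP timestamp with a wall-clock (NTP) time, so a receiver can place packets from different streams on one timeline. This is a minimal sketch of that general RFC 3550 mechanism, not of the OCR mapping itself. The function name and the sample values are assumptions for illustration.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wall-clock seconds using the latest SR."""
    # Interpret the 32-bit difference as signed so wraparound is handled.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate

# Audio on a 44.1 kHz clock and video on a 90 kHz clock land on the same instant.
audio_time = rtp_to_wallclock(1000 + 44100, 1000, 100.0, 44100)
video_time = rtp_to_wallclock(500000 + 45000, 500000, 100.5, 90000)
```

In this example both packets map to 101.0 seconds, so they would be presented together even though their RTP timestamps and clock rates differ.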

Implementation Note: In networks with non-negligible jitter, a de-jitter buffer of at least 100 ms is recommended. The buffer size should account for the greatest expected inter-arrival time variation.
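One way to size such a buffer is to track the interarrival jitter estimate defined in RFC 3550 and scale it. The sketch below assumes arrival times are already expressed in RTP timestamp units. The 100 ms floor comes from the note above, while the safety factor of 4 and all function names are assumptions chosen for illustration.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_ts):
    """One step of the RFC 3550 estimator: J += (|D| - J) / 16."""
    transit = arrival - rtp_ts
    if prev_transit is None:          # first packet: nothing to compare against
        return jitter, transit
    d = abs(transit - prev_transit)   # change in relative transit time
    return jitter + (d - jitter) / 16.0, transit

def estimate_jitter(packets):
    """packets: iterable of (arrival, rtp_timestamp), both in clock ticks."""
    jitter, prev = 0.0, None
    for arrival, rtp_ts in packets:
        jitter, prev = update_jitter(jitter, prev, arrival, rtp_ts)
    return jitter

def buffer_depth_ms(jitter_ticks, clock_rate, floor_ms=100.0, safety=4.0):
    """Hypothetical sizing rule: a multiple of the jitter, never below the floor."""
    return max(floor_ms, safety * jitter_ticks * 1000.0 / clock_rate)

jitter = estimate_jitter([(0, 0), (3900, 3000)])  # second packet is 900 ticks late
depth = buffer_depth_ms(jitter, 90000)
```

The smoothed estimate reacts slowly by design, so a production receiver would typically also watch peak delay variation rather than rely on this value alone.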

3. Implementation Highlights

3.1 Session Description with SDP

The standard extends SDP to describe MPEG-4 streams. Mandatory fields include a=rtpmap with the encoding name derived from the MPEG-4 stream type and a unique payload type number. Configuration information (e.g., AudioSpecificConfig for MPEG-4 Audio) is transported in a=fmtp lines:

 a=rtpmap:96 mpeg4-generic/44100/2
 a=fmtp:96 streamtype=5; profile-level-id=15; config=1210
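A receiver can check that the config value agrees with the clock rate declared in a=rtpmap by decoding the leading fields of the AudioSpecificConfig. The sketch below parses an a=fmtp line and reads the audio object type, sampling frequency, and channel configuration. The function names are assumptions for illustration, and escape values (object type 31, explicit sampling frequency) are deliberately not handled.

```python
# Sampling frequency index table from ISO/IEC 14496-3.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]

def parse_fmtp(line):
    """Split 'a=fmtp:<pt> k=v; k=v' into (payload_type, {key: value})."""
    head, _, params = line.partition(" ")
    payload_type = int(head.split(":", 1)[1])
    fields = {}
    for item in params.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    return payload_type, fields

def decode_audio_config(hex_config):
    """Return (object_type, sample_rate, channel_config) from the first 13 bits."""
    if len(hex_config) < 4:
        raise ValueError("config too short")
    bits, nbits = int(hex_config, 16), len(hex_config) * 4

    def take(offset, width):
        # Read 'width' bits starting 'offset' bits from the most significant end.
        return (bits >> (nbits - offset - width)) & ((1 << width) - 1)

    object_type, freq_index, channels = take(0, 5), take(5, 4), take(9, 4)
    if object_type == 31 or freq_index >= len(SAMPLE_RATES):
        raise ValueError("escape values are not handled in this sketch")
    return object_type, SAMPLE_RATES[freq_index], channels

pt, fields = parse_fmtp("a=fmtp:96 streamtype=5; profile-level-id=15; config=1210")
object_type, sample_rate, channels = decode_audio_config(fields["config"])
```

Here config=1210 decodes to object type 2 (AAC LC), 44100 Hz, and 2 channels, which is what an rtpmap of mpeg4-generic/44100/2 would lead a receiver to expect.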

3.2 Fragmentation and Aggregation

An RTP packet may carry a single MPEG-4 Access Unit (AU) or a fragment thereof. The standard defines a fragmentation unit (FU) header for video and audio to support large AUs. Aggregation (multiple small AUs per packet) is allowed to reduce overhead for low-rate streams, but the marker bit must be set accordingly.
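The two packing strategies can be sketched as below. This is only an illustration of the splitting and grouping logic together with the marker-bit rule. It does not produce the payload headers defined by the standard, and the function names and the max_payload parameter are assumptions.

```python
def fragment_access_unit(au, max_payload):
    """Split one large AU into (chunk, marker) pairs; marker only on the last."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not au:
        raise ValueError("empty access unit")
    chunks = [au[i:i + max_payload] for i in range(0, len(au), max_payload)]
    return [(chunk, i == len(chunks) - 1) for i, chunk in enumerate(chunks)]

def aggregate_access_units(aus, max_payload):
    """Group whole small AUs into packets of at most max_payload bytes each."""
    packets, current, size = [], [], 0
    for au in aus:
        if len(au) > max_payload:
            raise ValueError("AU needs fragmentation, not aggregation")
        if current and size + len(au) > max_payload:
            packets.append(current)   # current packet is full, start a new one
            current, size = [], 0
        current.append(au)
        size += len(au)
    if current:
        packets.append(current)
    return packets

fragments = fragment_access_unit(bytes(2500), 1000)
groups = aggregate_access_units([b"a" * 400, b"b" * 400, b"c" * 400], 1000)
```

A real packetizer would also subtract the size of the RTP header and any per-AU headers from max_payload before splitting.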

3.3 Error Resilience

MPEG-4’s built-in error resilience tools (e.g., Reversible VLC for AAC, video packetization) are preserved. The standard also encourages use of RTP’s generic mechanisms, such as the sequence number field in the RTP header for detecting loss and RTCP receiver reports for reporting loss statistics.
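Loss detection from sequence numbers can be sketched as follows. The code tracks the highest sequence number seen, records gaps, and clears a gap when a late packet arrives, with 16-bit wraparound taken into account. The function name find_missing is an assumption for illustration, and the sketch ignores very large gaps that a real receiver would treat as a stream restart.

```python
def find_missing(seq_numbers):
    """Return the sorted 16-bit sequence numbers that never arrived."""
    missing = set()
    highest = None
    for seq in seq_numbers:
        if highest is None:
            highest = seq
            continue
        gap = (seq - highest) & 0xFFFF      # forward distance modulo 2^16
        if gap == 0:
            continue                        # duplicate of the newest packet
        if gap < 0x8000:                    # packet is ahead of what we have seen
            for k in range(1, gap):
                missing.add((highest + k) & 0xFFFF)
            highest = seq
        else:                               # late or reordered packet
            missing.discard(seq)
    return sorted(missing)

lost_after_wrap = find_missing([65534, 65535, 0, 3])
lost_with_reorder = find_missing([10, 11, 13, 12, 14])
```

In the first call the counter wraps from 65535 to 0 and packets 1 and 2 are reported missing. In the second call packet 12 arrives late, so nothing is reported.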

Performance Tip: When using MPEG-4 Video with AVC, enable the optional RTP payload format for Scalable Video Coding (if supported) so that playback degrades gracefully under packet loss.

4. Compliance and Testing Notes

4.1 Conformance Points

Compliance with ISO/IEC 14496-8:2005 is typically verified through:

  • Reception tests: Decoding of pre-encapsulated streams provided by certification bodies.
  • Bitstream compliance: The generated RTP packets must conform to the specified syntax and rules.
  • Interoperability events: Liaising with industry forums (e.g., MPEG Industry Forum).

4.2 Common Pitfalls

  • Declaring a clock rate in a=rtpmap that does not match the stream (e.g., 44100 for audio that is actually sampled at 48 kHz).
  • Improper setting of the RTP marker bit – must be set for the last packet of each AU.
  • Omitting the config parameter for codecs that require it (e.g., AAC).
  • Mixing different Access Unit types within one RTP packet without proper fragmentation signaling.

4.3 Reference Documents and Tools

Developers should consult the following alongside the standard:

  • RFC 3640 (RTP Payload Format for Transport of MPEG-4 Elementary Streams) – nearly identical to ISO/IEC 14496-8.
  • ISO/IEC 14496-1 (Systems) for SL packet semantics.
  • ISO/IEC 14496-3 (Audio) and -10 (AVC) for codec-specific configuration.

Critical: Always verify that your implementation matches the version of the standard referenced in your certification requirements. The 2005 edition includes errata from the 2004 edition that affect fragmentation handling.


Frequently Asked Questions

Q: What is the main difference between ISO/IEC 14496-8 and RFC 3640?
A: RFC 3640 is functionally equivalent to ISO/IEC 14496-8:2005 for the core payload formats. The ISO standard provides additional normative guidance for MPEG-4 Systems Object Descriptor signaling and aligns with the full MPEG-4 reference model. Both documents use the same RTP payload structures.

Q: Does ISO/IEC 14496-8 support multiple streams in one RTP session?
A: Yes. The standard permits multiple MPEG-4 elementary streams to be carried in a single RTP session by using different payload type numbers and SSRC identifiers. However, each elementary stream must be described separately in the session description.

Q: Is this standard applicable to web-based streaming over HTTP?
A: ISO/IEC 14496-8 focuses on RTP-based transport. For adaptive HTTP streaming (e.g., DASH), refer to ISO/IEC 23009-1, which uses segments encapsulated in MP4 rather than raw MPEG-4 streams over RTP. Still, the RTP payload formats may be used for live contribution links.

Q: Where can I obtain a copy of ISO/IEC 14496-8:2005?
A: The standard can be purchased through ISO, IEC, or national member bodies such as CSA Group (Canada) or ANSI (USA). Technical previews are often available via the MPEG website or national committee document repositories.


© 2026 International Standards Organization. This article is for informational purposes and does not replace the official standard text.
