Understanding ISO/IEC 14496-14:2007 – The MP4 File Format Specification

Scope, Technical Structure, and Compliance for the Universal Multimedia Container

Scope of ISO/IEC 14496-14:2007

ISO/IEC 14496-14:2007, also known as MPEG-4 Part 14, defines the MP4 file format as a container for timed multimedia content. It is a derivative of the ISO Base Media File Format (ISO/IEC 14496-12) and specifies how to store MPEG-4 encoded audio, video, and other streams in a structured, streamable, and extensible file. The standard primarily targets interoperability among authoring tools, players, and streaming servers, ensuring that a compliant MP4 file can be decoded and rendered across a wide range of platforms.

The format formally adopts the object-oriented box (or atom) structure that the ISO Base Media File Format inherited from the QuickTime file architecture. Every MP4 file consists of a sequence of boxes, each carrying a size field and a four-character type code (e.g., ftyp, moov, mdat). The standard defines the logical order, mandatory presence, and permissible nesting of these boxes for a conformant file. It also relies on brands, declared in the File Type Box (ftyp), to signal which specification the file follows and which capabilities a reader needs.
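
The box layout described above can be walked with very little code. The following is a minimal sketch, not a reference parser: it assumes the whole file fits in memory, and the names iter_boxes and make_box are hypothetical helpers introduced only for this illustration. It handles the 32-bit size field, the 64-bit extended size (size value 1), and the "extends to end of container" case (size value 0).

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_offset, payload_size) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        (size,) = struct.unpack_from(">I", data, pos)
        box_type = data[pos + 4:pos + 8].decode("latin-1")
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if pos + 16 > end:
                raise ValueError(f"truncated largesize in {box_type!r} at {pos}")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of its container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box {box_type!r} at offset {pos}")
        yield box_type, pos + header, size - header
        pos += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (illustration helper, not a muxer)."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


# A tiny synthetic file: ftyp, an empty moov, and a 4-byte mdat.
sample = (
    make_box(b"ftyp", b"mp42" + struct.pack(">I", 0) + b"isommp42")
    + make_box(b"moov")
    + make_box(b"mdat", b"\x00" * 4)
)
for box_type, offset, size in iter_boxes(sample):
    print(box_type, offset, size)
```

Container boxes such as moov can be descended into by calling iter_boxes again with the payload range of the parent. Boxes that start with version and flags fields need those four bytes skipped first, which this sketch does not attempt.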

Foundation Note: ISO/IEC 14496-14:2007 relies entirely on the box semantics of ISO/IEC 14496-12:2008. Any implementation of MP4 must also conform to the parent specification (ISO Base Media File Format).

Beyond storage, the format supports streaming scenarios. A file whose moov box precedes the media data can be played during progressive download, and the fragmented MP4 feature (moof boxes, inherited from the base format) underpins dynamic adaptive streaming over HTTP (DASH). This makes the MP4 file format equally relevant for local playback and network distribution.

Technical Requirements and Box Structure

A conformant MP4 file must contain a minimum set of boxes arranged according to a fixed hierarchy. The requirement ensures that any compliant parser can initialise the decoding pipeline without ambiguity.

Mandatory Boxes

Every MP4 file begins with a File Type Box (ftyp) that declares the major brand, a minor version number, and a list of compatible brands. The Movie Box (moov), which may appear directly after ftyp or later in the file, contains the metadata for all tracks (duration, sample descriptions, codec parameters). The Media Data Box (mdat) holds the actual encoded samples. In fragmented files, Movie Fragment Boxes (moof), each paired with an mdat, deliver metadata and media data incrementally.

Interoperability Concern: Although mdat is technically optional (media data may reside in external files referenced from moov), a self-contained file without it carries no media. Most validators require at least one mdat box for a meaningful presentation.

Table 1: Key Box Types Defined in ISO/IEC 14496-14:2007

| Box Type | Mandatory | Description |
|----------|-----------|-------------|
| ftyp | Yes | File Type Box; declares the major brand, minor version, and compatible brands. |
| moov | Yes | Movie Box; contains track metadata, sample descriptions, codec configuration. |
| mdat | No* | Media Data Box; stores raw media samples (audio frames, video frames, etc.). |
| moof | No | Movie Fragment Box; used only in fragmented MP4, contains per-fragment metadata. |
| stbl | Yes** | Sample Table Box (within minf); provides decoding time, sample size, and offset maps. |
| mdia | Yes** | Media Box; houses the track's information, handler, and media header. |

* Often considered essential for playback; ** Required inside the trak hierarchy.

Brands and Compatibility

The ftyp box uses four-character codes to specify which family of standards the file follows. For example, brand mp42 indicates MP4 version 2, the format specified in ISO/IEC 14496-14. Compatible brands list older or alternative base specifications to enable backward compatibility. A player should attempt to open the file only if it supports at least one of the listed compatible brands.

Interoperability Milestone: The brand system allows a single MP4 file to be played across devices designed for ISO BMFF (e.g., isom) and specific MPEG-4 decoders (mp41, mp42).
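
To make the brand mechanism concrete, here is a minimal sketch of decoding an ftyp payload and applying the "at least one compatible brand" rule. The names FileType, parse_ftyp, and reader_may_open are hypothetical and chosen for this illustration; the payload layout (4-byte major brand, 4-byte minor version, then a list of 4-byte compatible brands) follows the ISO Base Media File Format.

```python
import struct
from typing import List, NamedTuple, Set


class FileType(NamedTuple):
    major_brand: str
    minor_version: int
    compatible_brands: List[str]


def parse_ftyp(payload: bytes) -> FileType:
    """Decode the payload of an ftyp box (the bytes after the 8-byte header)."""
    if len(payload) < 8 or (len(payload) - 8) % 4 != 0:
        raise ValueError("ftyp payload must be 8 bytes plus a multiple of 4")
    major = payload[0:4].decode("latin-1")
    (minor,) = struct.unpack_from(">I", payload, 4)
    compatible = [
        payload[i:i + 4].decode("latin-1") for i in range(8, len(payload), 4)
    ]
    return FileType(major, minor, compatible)


def reader_may_open(ftyp: FileType, supported: Set[str]) -> bool:
    """True if the reader implements at least one listed compatible brand."""
    return any(brand in supported for brand in ftyp.compatible_brands)


# Synthetic payload: major brand mp42, minor version 0, compatible isom + mp42.
payload = b"mp42" + struct.pack(">I", 0) + b"isom" + b"mp42"
file_type = parse_ftyp(payload)
print(file_type)
print(reader_may_open(file_type, {"isom"}))
```

Brands are compared as exact four-character strings, so padded brands (for example a three-letter code followed by a space) must keep their trailing space.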

Implementation Highlights

Implementing an MP4 multiplexer or demultiplexer requires careful attention to track organisation and sample mapping. ISO/IEC 14496-14:2007 defines how to encapsulate MPEG-4 elementary streams, such as AAC audio and MPEG-4 Visual video, using the esds box (which carries the Elementary Stream Descriptor) inside a sample entry in the stsd (Sample Description) box. AVC/H.264 streams are stored according to the companion specification ISO/IEC 14496-15, which uses an avcC box in the sample entry instead.

  • Codec Initialisation: All codec configuration data (e.g., AVCDecoderConfigurationRecord for H.264) must be stored in the sample entry inside stsd.
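  • Timing and Indexing: The stts (decoding time to sample) table provides the decode timestamp of each sample and stsz (sample size) gives its length in bytes. Byte offsets are derived by combining stsz with the sample-to-chunk (stsc) and chunk offset (stco or co64) tables.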
  • Streaming Support: For progressive download, the standard encourages placing moov before mdat (fast-start layout). Fragmented MP4 uses a single moov for global metadata followed by multiple moof+mdat pairs.
  • Track References: The tref box allows tracks to reference each other, which supports features such as multi-angle video or timed text.

Performance Tip: Placing the moov box at the beginning of the file lets a player start parsing immediately and significantly reduces start-up delay during progressive download, because the player does not have to fetch the end of the file before decoding can begin.
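
The timing and indexing tables mentioned in the list above can be illustrated with a short sketch. It assumes the tables have already been decoded into Python lists (the function names decode_times and sample_offsets are hypothetical), and it uses the run-length layouts of the base format: stts entries are (sample_count, sample_delta) pairs, and stsc entries are (first_chunk, samples_per_chunk, sample_description_index) with 1-based chunk numbers.

```python
from typing import List, Tuple


def decode_times(stts: List[Tuple[int, int]]) -> List[int]:
    """Expand (sample_count, sample_delta) runs into one decode time per sample.

    Times are in media timescale units; divide by the timescale for seconds.
    """
    times, current = [], 0
    for count, delta in stts:
        for _ in range(count):
            times.append(current)
            current += delta
    return times


def sample_offsets(stsc: List[Tuple[int, int, int]],
                   chunk_offsets: List[int],
                   sizes: List[int]) -> List[int]:
    """Compute the file offset of every sample from stsc, stco/co64, and stsz."""
    offsets: List[int] = []
    sample = 0
    for i, (first_chunk, per_chunk, _description) in enumerate(stsc):
        # A run lasts until the next entry's first_chunk, or to the last chunk.
        if i + 1 < len(stsc):
            last_chunk = stsc[i + 1][0] - 1
        else:
            last_chunk = len(chunk_offsets)
        for chunk in range(first_chunk, last_chunk + 1):
            position = chunk_offsets[chunk - 1]  # chunk numbers are 1-based
            for _ in range(per_chunk):
                if sample >= len(sizes):
                    raise ValueError("stsc describes more samples than stsz")
                offsets.append(position)
                position += sizes[sample]  # samples in a chunk are contiguous
                sample += 1
    return offsets


# Synthetic tables: five samples spread over three chunks.
times = decode_times([(3, 1024), (2, 512)])
offsets = sample_offsets(
    stsc=[(1, 2, 1), (3, 1, 1)],
    chunk_offsets=[100, 500, 900],
    sizes=[10, 20, 30, 40, 50],
)
print(times)
print(offsets)
```

Expanding the tables eagerly keeps the example short; a real demultiplexer would more likely index into the run-length tables on demand to avoid holding one entry per sample for long files.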

Compliance and Conformance

Conformance to ISO/IEC 14496-14:2007 is verified at two levels: structural conformance (correct box hierarchy and mandatory presence) and content conformance (codec parameters consistent with the declared stream types).

Conformance Checkpoints

  1. Brand Declaration: The ftyp box must list at least one brand that is defined in the standard (mp41, mp42) or a recognised derived brand.
  2. Box Ordering: The moov box should ideally appear before mdat for compatibility, though the standard permits it to follow mdat (non-fast-start).
  3. Sample Description Integrity: Every track must contain a sample entry matching its coding type, for example mp4v for MPEG-4 Visual, avc1 or avc3 for H.264 (AVC), and mp4a for MPEG-4 audio.
  4. File Extension Mapping: The .mp4 extension is intended for files carrying brand mp41 or mp42. Other extensions (.m4v, .m4a) are conventions associated with derivative specifications rather than with this standard.

Common Pitfall: Omitting the stbl box inside the track hierarchy causes the file to be unparseable because no timing or sample mapping information is available. Always validate the complete box tree with a reference parser.
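
A few of the structural checkpoints above can be automated. The sketch below is illustrative only and is no substitute for a conformance tool: it assumes the file is loaded into memory, supports only 32-bit box sizes, and uses hypothetical names (iter_boxes, has_path, check_structure, make_box). It reports a missing ftyp or moov, a moov that follows mdat, and any trak without the mdia/minf/stbl chain.

```python
import struct
from typing import Iterator, List, Optional, Sequence, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_offset, payload_size); 32-bit sizes only."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        (size,) = struct.unpack_from(">I", data, pos)
        box_type = data[pos + 4:pos + 8].decode("latin-1")
        if size < 8 or pos + size > end:
            raise ValueError(f"malformed box {box_type!r} at offset {pos}")
        yield box_type, pos + 8, size - 8
        pos += size


def has_path(data: bytes, start: int, end: int, path: Sequence[str]) -> bool:
    """True if the nested box path (e.g. mdia/minf/stbl) exists in the range."""
    if not path:
        return True
    return any(
        has_path(data, off, off + size, path[1:])
        for box_type, off, size in iter_boxes(data, start, end)
        if box_type == path[0]
    )


def check_structure(data: bytes) -> List[str]:
    """Return a list of findings; an empty list means no problem was detected."""
    findings: List[str] = []
    top = list(iter_boxes(data))
    types = [box_type for box_type, _, _ in top]
    if "ftyp" not in types:
        findings.append("error: ftyp missing")
    if "moov" not in types:
        findings.append("error: moov missing")
        return findings
    if "mdat" in types and types.index("mdat") < types.index("moov"):
        findings.append("note: moov follows mdat (permitted, but not fast-start)")
    _, moov_off, moov_size = top[types.index("moov")]
    traks = [
        (off, off + size)
        for box_type, off, size in iter_boxes(data, moov_off, moov_off + moov_size)
        if box_type == "trak"
    ]
    if not traks:
        findings.append("error: moov contains no trak")
    for index, (start, end) in enumerate(traks):
        if not has_path(data, start, end, ["mdia", "minf", "stbl"]):
            findings.append(f"error: trak {index} lacks mdia/minf/stbl")
    return findings


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (illustration helper)."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


# Two synthetic files: one well-formed, one with late moov and no stbl.
ftyp = make_box(b"ftyp", b"mp42" + struct.pack(">I", 0) + b"isommp42")
good_trak = make_box(b"trak", make_box(b"mdia", make_box(b"minf", make_box(b"stbl"))))
bad_trak = make_box(b"trak", make_box(b"mdia", make_box(b"minf")))
good = ftyp + make_box(b"moov", good_trak) + make_box(b"mdat", b"\x00" * 4)
bad = ftyp + make_box(b"mdat", b"\x00" * 4) + make_box(b"moov", bad_trak)
print(check_structure(good))
print(check_structure(bad))
```

The late moov is reported as a note rather than an error because the standard permits that layout; only the missing stbl makes the second file unusable.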

To test compliance, developers should use the ISO BMFF validator tools provided by the MPEG industry forum or the official ISO conformance software. Interoperability testing across multiple platforms (Windows, macOS, iOS, Android) is strongly recommended before product release.

Frequently Asked Questions

Q: What is the difference between MP4 and the ISO Base Media File Format (ISOBMFF)?
A: MP4 (ISO/IEC 14496-14) is a specific profile of ISOBMFF (ISO/IEC 14496-12). It inherits the box structure from ISOBMFF but imposes additional constraints and brands (mp41, mp42) to guarantee MPEG-4 codec compatibility. ISOBMFF is a generic standard; MP4 tailors it to MPEG-4 content.
Q: Are there mandatory boxes in an MP4 file?
A: Yes. The File Type Box (ftyp) and Movie Box (moov) are mandatory. Inside moov, at least one Track Box (trak) is needed for meaningful content, and each track must contain an stbl box. The Media Data Box (mdat) is a top-level box rather than a child of moov; it is formally optional but needed in practice to carry the samples.
Q: Can an MP4 file contain codecs beyond MPEG-4 Visual and AAC, such as H.264 or HEVC?
A: Yes, as long as the file declares an appropriate compatible brand or follows a derived specification that extends the baseline. H.264 is itself part of the MPEG-4 family (MPEG-4 Part 10, AVC), and its avc1 sample entry type is widely supported. MP4 is extensible, but a file that carries codecs from outside the MPEG-4 family may no longer qualify for pure ‘mp42’ branding.
Q: What does the brand (mp41, mp42) in ftyp signify?
A: The major brand identifies the specification to which the file conforms best. mp41 denotes MP4 version 1, originally specified in ISO/IEC 14496-1, while mp42 denotes MP4 version 2, specified in ISO/IEC 14496-14. Compatible brands list other specifications whose readers can also process the file, e.g., isom for baseline ISOBMFF readers.

Technical article based on ISO/IEC 14496-14:2007. All rights reserved. Last updated in 2026.
