Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 14496-14:2007, also known as MPEG-4 Part 14, defines the MP4 file format as a container for timed multimedia content. It is a derivative of the ISO Base Media File Format (ISO/IEC 14496-12) and specifies how to store MPEG-4 encoded audio, video, and other streams in a structured, streamable, and extensible file. The standard primarily targets interoperability among authoring tools, players, and streaming servers, ensuring that a compliant MP4 file can be decoded and rendered across a wide range of platforms.
The format formally adopts the object-oriented box (or atom) structure inherited from the QuickTime file architecture. Every MP4 file consists of a sequence of boxes, each identified by a four-character code (e.g., ftyp, moov, mdat). The standard defines the logical order, mandatory presence, and permissible nesting of these boxes for a conformant file. It also introduces the concept of brands—used in the File Type Box (ftyp)—to signal the version of the format and the expected capabilities of a reader.
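The box structure described above can be walked with very little code. The following is a minimal sketch, not a reference parser: it assumes the standard ISO Base Media box header (a 32-bit big-endian size, a four-character type, an optional 64-bit size when the 32-bit field is 1, and "extends to end of file" when it is 0). The function name `iter_boxes` is a hypothetical helper chosen for illustration.

```python
import io
import struct


def iter_boxes(stream, end=None):
    """Yield (box_type, payload_offset, payload_size) for each box in a seekable stream."""
    while end is None or stream.tell() < end:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            break  # end of data, or a truncated header
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit size field follows the four-character type.
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        elif size == 0:
            # The box runs to the end of the file.
            stream.seek(0, io.SEEK_END)
            size = stream.tell() - start
        if size < header_len:
            raise ValueError(f"invalid box size {size} at offset {start}")
        yield box_type.decode("latin-1"), start + header_len, size - header_len
        # Seek absolutely so the caller may move the stream between iterations.
        stream.seek(start + size)


# Usage: a tiny in-memory "file" with an ftyp box and an mdat box.
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"mp42" + b"\x00\x00\x00\x00"
    + struct.pack(">I4s", 11, b"mdat") + b"abc"
)
for box_type, offset, size in iter_boxes(io.BytesIO(sample)):
    print(box_type, offset, size)
```

Container boxes such as moov can be walked the same way by seeking to the payload offset and passing `end=payload_offset + payload_size`.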
Beyond storage, the standard covers streaming scenarios through the fragmented MP4 feature (moof boxes), enabling progressive download and dynamic adaptive streaming over HTTP (DASH). This makes the MP4 file format equally relevant for local playback and network distribution.
A conformant MP4 file must contain a minimum set of boxes arranged according to a fixed hierarchy. The requirement ensures that any compliant parser can initialise the decoding pipeline without ambiguity.
Every MP4 file begins with a File Type Box (ftyp) that declares the major brand, a minor version number, and the list of compatible brands. The Movie Box (moov), which may come directly after ftyp or later in the file, contains the metadata for all tracks (duration, sample descriptions, codec parameters). The Media Data Box (mdat) holds the actual encoded samples. For streaming, Movie Fragment Boxes (moof), each paired with its own mdat, deliver metadata and data incrementally.
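Because the relative order of these top-level boxes determines how a file can be consumed, a simple classification can be derived from the box order alone. The sketch below assumes the ordered list of top-level box types has already been extracted from the file; the function name and the returned labels are illustrative choices, not terms defined by the standard.

```python
def classify_layout(box_types):
    """Classify a file from the ordered list of its top-level box types."""
    if "moov" not in box_types:
        return "invalid: no moov"
    if "moof" in box_types:
        # One moov with global metadata, then moof+mdat pairs.
        return "fragmented"
    if "mdat" not in box_types:
        return "metadata only"
    if box_types.index("moov") < box_types.index("mdat"):
        # Metadata precedes media data, so playback can start early.
        return "fast-start"
    return "non-fast-start"


print(classify_layout(["ftyp", "moov", "mdat"]))
print(classify_layout(["ftyp", "mdat", "moov"]))
print(classify_layout(["ftyp", "moov", "moof", "mdat", "moof", "mdat"]))
```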
mdat is technically optional, but omitting it results in an empty file. Most validators require at least one mdat box for a meaningful presentation.

| Box Type | Mandatory | Description |
|---|---|---|
| ftyp | Yes | File Type Box; declares compatibility brands and file version. |
| moov | Yes | Movie Box; contains track metadata, sample descriptions, codec configuration. |
| mdat | No* | Media Data Box; stores raw media samples (audio frames, video frames, etc.). |
| moof | No | Movie Fragment Box; used only in fragmented MP4, contains per-fragment metadata. |
| stbl | Yes** | Sample Table Box (within minf); provides decoding time, sample size, and offset maps. |
| mdia | Yes** | Media Box; houses the track’s information, handler, and media header. |

\* Often considered essential for playback. \*\* Required inside the trak hierarchy.
The ftyp box uses four-character codes to specify which family of standards the file follows. For example, the brand mp42 indicates version 2 of the MP4 file format, the version specified in MPEG-4 Part 14. The compatible brands list names older or alternative base specifications to enable backward compatibility. A player must support at least one compatible brand to open the file.
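To make the brand mechanism concrete, here is a minimal sketch of decoding an ftyp payload. It assumes the usual layout of that box (a four-character major brand, a 32-bit minor version, then zero or more four-character compatible brands) and that the payload bytes have already been located. The names `parse_ftyp` and `reader_can_open` are hypothetical helpers.

```python
import struct


def parse_ftyp(payload):
    """Decode the payload of an ftyp box (the bytes after the box header)."""
    if len(payload) < 8 or len(payload) % 4 != 0:
        raise ValueError("ftyp payload must be 8 bytes plus a multiple of 4")
    major, minor = struct.unpack(">4sI", payload[:8])
    compatible = [
        payload[i:i + 4].decode("latin-1") for i in range(8, len(payload), 4)
    ]
    return {
        "major_brand": major.decode("latin-1"),
        "minor_version": minor,
        "compatible_brands": compatible,
    }


def reader_can_open(ftyp, supported_brands):
    """A reader qualifies if it supports at least one compatible brand."""
    return any(brand in supported_brands for brand in ftyp["compatible_brands"])


# Usage with a hand-built payload: major brand mp42, compatible with isom and mp41.
info = parse_ftyp(b"mp42" + struct.pack(">I", 0) + b"isom" + b"mp41")
print(info)
print(reader_can_open(info, {"isom"}))
```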
Brands can therefore address both generic ISO Base Media readers (isom) and specific MPEG-4 decoders (mp41, mp42).

Implementing an MP4 multiplexer or demultiplexer requires careful attention to track organisation and sample mapping. ISO/IEC 14496-14:2007 defines how to encapsulate MPEG-4 elementary streams such as AAC audio using descriptor structures such as esds (Elementary Stream Descriptor) within the stsd (Sample Description) box. AVC/H.264 video is stored with its own configuration box (avcC) in the sample entry rather than an esds.
- Each track's codec configuration is stored in stsd.
- The stts (decoding time to sample) and stsz (sample size) tables give the duration and size of each sample; combined with the chunk tables (stsc, and stco or co64), they yield precise timestamps and byte offsets for each sample.
- Progressive download requires moov before mdat (fast-start layout). Fragmented MP4 uses a single moov for global metadata followed by multiple moof+mdat pairs.
- The tref box allows tracks to reference each other, which is used for features such as multi-angle video or timed text.
- Placing the moov box at the beginning of the file significantly reduces initial buffering time during playback and enables immediate parsing without scanning the entire file.

Conformance to ISO/IEC 14496-14:2007 is verified at two levels: structural conformance (correct box hierarchy and mandatory presence) and content conformance (codec parameters consistent with the declared stream types).
- The ftyp box must list at least one brand that is defined in the standard (mp41, mp42) or a recognised derived brand.
- The moov box should ideally appear before mdat for compatibility, though the standard permits it to follow mdat (non-fast-start).
- Sample entries should use registered four-character codes, such as mp4v for MPEG-4 Visual or avc1/avc3 for H.264.
- Use the .mp4 extension for files with brand mp41 or mp42. Other extensions (.m4v, .m4a) are used for derivative specifications.
- A missing stbl box inside the track hierarchy makes the file unparseable because no timing or sample mapping information is available. Always validate the complete box tree with a reference parser.

To test compliance, developers should use the ISO BMFF validator tools provided by the MPEG Industry Forum or the official ISO conformance software. Interoperability testing across multiple platforms (Windows, macOS, iOS, Android) is strongly recommended before product release.
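As a rough illustration of the structural level of conformance only, the sketch below checks that ftyp and moov are present and that every trak contains the mdia → minf → stbl chain. It is not a substitute for a reference validator: it assumes the whole file fits in memory as a `bytes` object, ignores content conformance entirely, and all function names (`child_boxes`, `find_child`, `check_structure`, `make_box`) are hypothetical.

```python
import struct


def child_boxes(data, start, end):
    """Yield (type, payload_start, payload_end) for the boxes in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit size follows the type field
            if pos + 16 > end:
                raise ValueError(f"truncated 64-bit box header at offset {pos}")
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of its container
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"bad box size at offset {pos}")
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def find_child(data, start, end, wanted):
    """Return (payload_start, payload_end) of the first child of that type, or None."""
    for box_type, s, e in child_boxes(data, start, end):
        if box_type == wanted:
            return s, e
    return None


def check_structure(data):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    top = list(child_boxes(data, 0, len(data)))
    types = [t for t, _, _ in top]
    for required in ("ftyp", "moov"):
        if required not in types:
            problems.append(f"missing {required}")
    for box_type, s, e in top:
        if box_type != "moov":
            continue
        traks = [(ts, te) for t, ts, te in child_boxes(data, s, e) if t == "trak"]
        if not traks:
            problems.append("moov has no trak")
        for n, span in enumerate(traks, 1):
            # Descend trak -> mdia -> minf -> stbl, stopping at the first gap.
            for name in ("mdia", "minf", "stbl"):
                span = find_child(data, span[0], span[1], name)
                if span is None:
                    problems.append(f"trak {n}: missing {name}")
                    break
    return problems


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size header (test helper for the usage example)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Usage: a skeletal file whose only track lacks its stbl box.
broken = make_box(b"ftyp", b"mp42\x00\x00\x00\x00") + make_box(
    b"moov", make_box(b"trak", make_box(b"mdia", make_box(b"minf")))
)
print(check_structure(broken))
```

Returning a list of problems rather than raising on the first one makes it easier to report every structural gap in a single pass.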
MP4 constrains ISOBMFF with its own brands (mp41, mp42) to guarantee MPEG-4 codec compatibility. ISOBMFF is a generic standard; MP4 tailors it to MPEG-4 content.

The File Type Box (ftyp) and Movie Box (moov) are mandatory. Meaningful content also needs at least one Track Box (trak) inside moov and a Media Data Box (mdat) to hold the samples. The standard also requires an stbl box inside each track.

For H.264 video, the avc1 codec identifier is widely supported. MP4 is extensible, but the file may lose pure ‘mp42’ branding if non-MPEG-4 codecs are present.

What do the brands (mp41, mp42) in ftyp signify?

mp41 identifies version 1 of the MP4 file format, originally specified in ISO/IEC 14496-1, while mp42 identifies version 2, specified in Part 14. Compatible brands list other specifications whose readers can also handle the file, e.g., isom for baseline ISOBMFF readers.