Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 14496-12 defines the ISO Base Media File Format (ISOBMFF), the foundational framework for a vast family of multimedia container formats. The CAN/CSA-ISO/IEC 14496-12:16 standard represents the Canadian adoption of this international specification, ensuring that implementations within Canada align with global multimedia interoperability requirements.
ISOBMFF is a time-based media file format designed to hold media data (video, audio, text, and images) in a structured, extensible, and object-oriented manner. It serves as the basis for a family of derived formats, including MP4 (ISO/IEC 14496-14), HEIF, and CMAF.
The standard specifies how to describe media content, the timing and structure of media samples, and how to organize data for local playback, streaming, and editing. The CAN/CSA adoption mirrors the ISO text verbatim, providing a harmonized benchmark for compliance testing in North American markets.
The fundamental data structure in ISOBMFF is the Box (historically called an Atom in QuickTime terminology). Every file is a sequence of boxes. Each box is identified by a Four Character Code (4CC), includes a size field, and can contain nested child boxes, forming a hierarchical tree.
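As a minimal sketch of the box layout described above, the following Python reads one box header from a byte buffer. The function name `read_box_header` and the hand-built sample bytes are illustrative assumptions, not part of the standard's text; the handling of `size == 1` (64-bit largesize follows) and `size == 0` (box runs to end of file) follows the ISOBMFF box definition.

```python
import struct

def read_box_header(buf, offset=0):
    """Return (box_type, box_size, header_size) for the box starting at offset."""
    # Every box starts with a 32-bit big-endian size and a four-character code.
    size, fourcc = struct.unpack_from(">I4s", buf, offset)
    header_size = 8
    if size == 1:
        # size == 1: the real size follows as a 64-bit largesize field.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    elif size == 0:
        # size == 0: the box extends to the end of the file.
        size = len(buf) - offset
    return fourcc.decode("ascii", errors="replace"), size, header_size

# Hypothetical example: a 16-byte 'free' box with 8 bytes of payload.
sample = struct.pack(">I4s", 16, b"free") + b"\x00" * 8
print(read_box_header(sample))  # ('free', 16, 8)
```

Child boxes, where present, start at `offset + header_size` and can be read with the same function, which is what gives the format its tree structure.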
| Box Type (4CC) | Name | Purpose | Mandatory |
|---|---|---|---|
| ftyp | File Type Box | Defines the file brand, major version, and list of compatible brands. | Yes |
| moov | Movie Box | Container for all presentation metadata, track information, and sample tables. | Yes (or moof) |
| mdat | Media Data Box | Holds the raw, interleaved encoded media sample data. | Yes |
| moof | Movie Fragment Box | Container for fragment-level metadata; essential for streaming and dynamic content. | For Fragmented Files |
| sidx | Segment Index Box | Provides segment-level indexing for subsegments in fragmented files. | Recommended for DASH |
| trak | Track Box | Describes a single media track (video, audio, timed text). | In moov |
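To make the ftyp row of the table concrete, here is a small sketch that decodes an ftyp payload (the bytes after the 8-byte box header) into its major brand, minor version, and compatible brands. The function name `parse_ftyp` and the sample payload are assumptions made for illustration only.

```python
import struct

def parse_ftyp(payload):
    """Decode an ftyp payload into (major_brand, minor_version, compatible_brands)."""
    # major_brand is a 4CC, minor_version a 32-bit big-endian integer.
    major, minor = struct.unpack_from(">4sI", payload, 0)
    # The rest of the payload is a packed list of 4-byte compatible brands.
    brands = [
        payload[i:i + 4].decode("ascii")
        for i in range(8, len(payload) - 3, 4)
    ]
    return major.decode("ascii"), minor, brands

# Hypothetical HEIC-style payload: major brand 'heic', compatible brands mif1 and heic.
payload = b"heic" + struct.pack(">I", 0) + b"mif1" + b"heic"
print(parse_ftyp(payload))  # ('heic', 0, ['mif1', 'heic'])
```

A reader would typically check whether any brand it supports appears in the returned list before attempting to parse the rest of the file.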
For streaming applications, the standard defines a fragmented movie structure. Instead of storing all metadata upfront in a single moov box, metadata can be distributed across multiple moof boxes. This allows muxing data in small segments that can be delivered independently, enabling low-latency streaming and seamless ad insertion.
The typical structural order for a fragmented file optimized for CMAF or DASH is:
ftyp -> moov (with mvex) -> styp -> sidx -> moof -> mdat
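The order above can be checked mechanically. The sketch below walks the top-level boxes of a buffer and reports two ordering problems: ftyp not first, and moov missing or placed after the first moof. The helper names (`top_level_types`, `check_fragmented_order`, `box`) and the empty placeholder boxes are hypothetical; a real validator would check far more than this.

```python
import struct

def top_level_types(buf):
    """List the 4CCs of the top-level boxes in file order."""
    types, offset = [], 0
    while offset + 8 <= len(buf):
        size, fourcc = struct.unpack_from(">I4s", buf, offset)
        if size == 1:        # 64-bit largesize follows the type
            (size,) = struct.unpack_from(">Q", buf, offset + 8)
        elif size == 0:      # box runs to the end of the buffer
            size = len(buf) - offset
        if size < 8:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        types.append(fourcc.decode("ascii", errors="replace"))
        offset += size
    return types

def check_fragmented_order(types):
    """Return a list of ordering problems (empty if none were found)."""
    problems = []
    if not types or types[0] != "ftyp":
        problems.append("ftyp is not the first box")
    if "moof" in types:
        if "moov" not in types:
            problems.append("moof present but moov missing")
        elif types.index("moov") > types.index("moof"):
            problems.append("moov appears after the first moof")
    return problems

def box(fourcc, payload=b""):
    """Build a box with a 32-bit size field (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), fourcc) + payload

# Empty placeholder boxes in the order shown above.
order = (b"ftyp", b"moov", b"styp", b"sidx", b"moof", b"mdat")
data = b"".join(box(t) for t in order)
print(top_level_types(data))
print(check_fragmented_order(top_level_types(data)))  # []
```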
- Write the ftyp box first. The compatible_brands list is critical for ensuring backward compatibility. For example, an HEIC file may list mif1 and heic as compatible brands in the ftyp box.
- Placing the moov box before any moof or mdat box is highly recommended. Non-standard ordering is a common source of parsing and interoperability issues.

ISOBMFF is the backbone of MPEG-DASH (ISO/IEC 23009-1) and CMAF. CMAF strictly applies fragmentation rules using moof and mdat boxes for chunked encoding and delivery. Conforming to CAN/CSA-ISO/IEC 14496-12:16 helps ensure that generated CMAF tracks are compatible across CDNs and players without requiring transcoding at the edge.
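The standard integrates tightly with ISO/IEC 23001-7 (Common Encryption, CENC). It defines how protection scheme information is stored in the schm (Scheme Type Box) inside the sinf (Protection Scheme Information Box), while Common Encryption defines the tenc (Track Encryption Box), which carries the track-level default encryption parameters. Both boxes sit within the sample entry in the moov box. Per-sample parameters such as initialization vectors are carried as sample auxiliary information, typically in the moof box, and key rotation is signaled through sample groups in the fragments.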
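The schm box mentioned above has a small fixed layout: a FullBox header (version and flags), a four-character scheme type, and a 32-bit scheme version, optionally followed by a URI when the lowest flag bit is set. The sketch below decodes that payload; `parse_schm` and the sample values (scheme type `cenc`, version `0x00010000`) are illustrative assumptions, and locating the box inside a real file is left out.

```python
import struct

def parse_schm(payload):
    """Decode a schm payload into (scheme_type, scheme_version, scheme_uri)."""
    # FullBox header: 8-bit version and 24-bit flags packed into one 32-bit word.
    version_flags, scheme_type, scheme_version = struct.unpack_from(">I4sI", payload, 0)
    flags = version_flags & 0xFFFFFF
    uri = None
    if flags & 0x000001:
        # Optional null-terminated UTF-8 URI after the fixed fields.
        uri = payload[12:].split(b"\x00", 1)[0].decode("utf-8")
    return scheme_type.decode("ascii"), scheme_version, uri

# Hypothetical payload: version 0, flags 0, scheme 'cenc', version 1.0.
payload = struct.pack(">I4sI", 0, b"cenc", 0x00010000)
print(parse_schm(payload))  # ('cenc', 65536, None)
```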
HEIF builds on ISOBMFF and stores images as items within the meta box or as tracks for image sequences. Derivations and transformations (e.g., thumbnails, overlays) are all defined using the standard ISOBMFF infrastructure, which makes HEIF flexible for modern imaging workflows.
Chunk locations are recorded in the stco (Chunk Offset Box) or co64 (64-bit Chunk Offset Box). A single incorrect offset can render the mdat data unreadable or cause severe audio/video desynchronization.
Conformance to the Canadian standard requires strict adherence to the ISO specification text. Key compliance checkpoints include:

- The largesize field must be used for boxes over 4 GB.
- The timescale and duration fields in the moov and tkhd boxes must be consistent. The average bitrate calculated from the data must match the values stored in the btrt box.
- The mvex box must be present in moov if fragments are used. The default_sample_duration and default_sample_size in the trex box must be set consistently with the fragments that rely on them.
- The ftyp box must accurately reflect the file’s capabilities. Writing unsupported brands can cause validators to reject the file.

For formal testing, reference software from the MPEG committee remains the ultimate authority. Open-source tools such as GPAC (MP4Box) and Bento4 offer robust validation capabilities and are widely used in compliance testing pipelines.
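Before reaching for a full validator, the chunk-offset concern raised above can be approximated with a quick sanity check: decode the stco or co64 payload and confirm that every offset falls inside an mdat payload range. This is a minimal sketch under stated assumptions: the function names, the sample offsets, and the mdat range are hypothetical, and the caller is assumed to have already located the boxes.

```python
import struct

def parse_chunk_offsets(payload, wide=False):
    """Decode a stco (32-bit) or co64 (64-bit, wide=True) payload into offsets."""
    # Skip the 4-byte FullBox version/flags word, then read entry_count.
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    fmt = ">%d%s" % (entry_count, "Q" if wide else "I")
    return list(struct.unpack_from(fmt, payload, 8))

def offsets_outside_mdat(offsets, mdat_ranges):
    """Return offsets that fall in none of the (start, end) mdat payload ranges."""
    return [
        o for o in offsets
        if not any(start <= o < end for start, end in mdat_ranges)
    ]

# Hypothetical stco payload with three chunk offsets; the last one is bad.
payload = struct.pack(">II3I", 0, 3, 48, 1048, 9000)
offsets = parse_chunk_offsets(payload)
print(offsets_outside_mdat(offsets, [(48, 4096)]))  # [9000]
```

An offset that passes this check can still point at the wrong sample, so the check catches gross errors only.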
Brands (e.g., mp42, isom) identify the specification a file conforms to and the restrictions on permissible box types and codecs.
The moov box contains the metadata required to commence decoding. For progressive playback, the moov box should be placed before the mdat box (a practice known as “fast start”). If the moov box is at the end of the file, the player typically must download the entire file, or issue additional range requests, before playback can begin, which is inefficient for streaming.
Common errors include (1) incorrect chunk offsets (stco/co64), (2) mismatched track durations and edit list totals, (3) missing or malformed ftyp brands, and (4) inconsistent fragmentation parameters in the mvex box.
Year of implementation reference: 2026. Users of this document should consult the latest version of CAN/CSA-ISO/IEC 14496-12 for the most current normative text.