Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 29500-2 defines the Open Packaging Conventions (OPC), the foundational packaging technology that underlies all OOXML document formats. OPC specifies how application data (parts, relationships, and metadata) is organized into a single self-contained package, using a ZIP archive as the physical container. While OPC was designed for OOXML, its general-purpose architecture has been adopted by other standards, including the XPS (OpenXPS) document format and various industry-specific container formats. Understanding OPC is essential for engineers who read, write, or validate these formats.
The OPC model is built on two fundamental abstractions: parts and relationships. A part is a named stream of bytes with a MIME content type, stored as an entry in the ZIP archive. Relationships are typed, directed links from a source (a part or the package itself) to a target part or an external resource. Relationships are stored in XML parts called relationship parts, conventionally placed in a _rels folder and named with a .rels suffix.
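To make the part and relationship model concrete, here is a minimal sketch that reads the package-level relationship part from a small in-memory package. It assumes only Python's standard `zipfile` and `xml.etree.ElementTree` modules; `read_relationships` and `build_demo_package` are hypothetical helper names, not part of any OPC library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by OPC relationship parts.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_relationships(package, rels_name="_rels/.rels"):
    """Return the relationships declared in one relationship part."""
    root = ET.fromstring(package.read(rels_name))
    return [
        {
            "id": rel.get("Id"),
            "type": rel.get("Type"),
            "target": rel.get("Target"),
            # TargetMode is optional; a missing attribute means "Internal".
            "mode": rel.get("TargetMode", "Internal"),
        }
        for rel in root.findall(f"{{{RELS_NS}}}Relationship")
    ]


def build_demo_package():
    """Build a tiny illustrative package in memory (not a valid Word file)."""
    rels_xml = (
        f'<Relationships xmlns="{RELS_NS}">'
        '<Relationship Id="rId1" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
        'relationships/officeDocument" '
        'Target="word/document.xml"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("_rels/.rels", rels_xml)
        zf.writestr("word/document.xml", "<document/>")
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    with build_demo_package() as pkg:
        for rel in read_relationships(pkg):
            print(rel["id"], "->", rel["target"], f"({rel['mode']})")
```

The same function can be pointed at a part-level relationship part such as `word/_rels/document.xml.rels` by passing a different `rels_name`.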
| OPC Concept | Description | Example |
|---|---|---|
| Part | A named stream with a content type | /word/document.xml (Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml) |
| Relationship | A typed link from a source part to a target | rId1 → /word/media/image1.png (Type: http://schemas.openxmlformats.org/officeDocument/2006/relationships/image) |
| Package-level relationship | Relationship whose source is the package root | Package → /word/document.xml (Type: …/officeDocument) |
| Content Type | MIME type identifying the part format | [Content_Types].xml — the single part that routes all content types |
| Part name | Unicode string with forward-slash path notation | /word/theme/theme1.xml |
Part names in OPC follow strict rules: they must be valid absolute URI paths starting with “/”, use forward slashes, and avoid characters that would require percent-encoding except where unavoidable. Part names are compared case-insensitively but stored case-preserving, so /word/document.xml and /Word/Document.xml are treated as the same part name and cannot both exist in one package. This design choice simplifies cross-platform interoperability.
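The naming rules can be sketched as a small validator plus an equivalence check. This is a simplified illustration that covers only a subset of the rules (leading slash, no empty segments, no trailing slash, no segment ending in a dot) and ignores percent-encoding entirely; `is_plausible_part_name` and `same_part` are hypothetical names.

```python
import re

# One or more "/segment" groups: absolute, no empty segments, no trailing slash.
_PART_NAME = re.compile(r"(/[^/]+)+")


def is_plausible_part_name(name):
    """Check a simplified subset of the OPC part-name rules."""
    if not _PART_NAME.fullmatch(name):
        return False
    # A segment must not end with a dot, which also rules out "." and "..".
    return all(not segment.endswith(".") for segment in name.split("/")[1:])


def _ascii_fold(name):
    """Lower-case ASCII letters only, leaving other characters untouched."""
    return "".join(ch.lower() if ch.isascii() else ch for ch in name)


def same_part(name_a, name_b):
    """Treat two part names as equivalent if they differ only in ASCII case."""
    return _ascii_fold(name_a) == _ascii_fold(name_b)


if __name__ == "__main__":
    for candidate in ["/word/document.xml", "word/document.xml", "/word/"]:
        print(candidate, is_plausible_part_name(candidate))
    print(same_part("/word/document.xml", "/Word/Document.XML"))
```

A writer that keeps a dictionary of parts could use the folded name as the key while storing the original spelling as the value, which gives case-insensitive lookup with case-preserving output.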
The physical OPC package is a standard ZIP archive with specific constraints. Each entry must use either DEFLATE compression or the store method (no compression); other ZIP compression methods are not permitted, and the archive must not span multiple volumes. The [Content_Types].xml part must be the first entry in the ZIP file. This is the only ordering constraint, and it enables streaming consumption of OPC packages.
| ZIP Feature | OPC Requirement | Engineering Impact |
|---|---|---|
| Compression | DEFLATE (store method allowed) | Images & media already compressed should use “store” to avoid double compression |
| Encryption | Standard ZIP 2.0 encryption NOT allowed | Apply encryption at the packaging level rather than per ZIP entry; OPC Digital Signatures provide integrity, not confidentiality |
| Entry ordering | [Content_Types].xml must be first | Streaming readers depend on this for content-type resolution |
| Entry names | UTF-8 (no CP437/IBM437 encoding) | Cross-platform compatibility requires UTF-8 flag set in ZIP local header |
| Segmenting | No multi-volume archives | OPC packages are always single-file; use OPC chunking for large documents |
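The constraints in the table above can be checked mechanically. The following is a minimal sketch using Python's standard `zipfile` module; `check_physical_constraints` and `build_package` are hypothetical helpers, and the ordering check simply mirrors the requirement as stated in this article. Note that `infolist()` follows the central directory, which normally (but not necessarily) matches the physical order of entries.

```python
import io
import zipfile

ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
ENCRYPTED_FLAG = 0x0001  # general-purpose bit 0: entry is encrypted
UTF8_FLAG = 0x0800       # general-purpose bit 11: name is UTF-8 encoded


def check_physical_constraints(package):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    entries = package.infolist()
    if not entries or entries[0].filename != "[Content_Types].xml":
        problems.append("[Content_Types].xml is not the first entry")
    for info in entries:
        if info.compress_type not in ALLOWED_METHODS:
            problems.append(f"{info.filename}: unsupported compression method")
        if info.flag_bits & ENCRYPTED_FLAG:
            problems.append(f"{info.filename}: entry is encrypted")
        if not info.filename.isascii() and not info.flag_bits & UTF8_FLAG:
            problems.append(f"{info.filename}: non-ASCII name without UTF-8 flag")
    return problems


def build_package(names):
    """Create an in-memory ZIP whose entries are written in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, "<x/>")
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    good = build_package(["[Content_Types].xml", "word/document.xml"])
    bad = build_package(["word/document.xml", "[Content_Types].xml"])
    print(check_physical_constraints(good))
    print(check_physical_constraints(bad))
```

Multi-volume archives are not covered here; detecting them would require inspecting the end-of-central-directory record, which `zipfile` does not expose directly.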
The [Content_Types].xml part serves as the MIME type directory for the entire package. Every part in the package must have a corresponding content type entry, either through a part-specific override or a default extension mapping. The resolution algorithm is straightforward: (1) exact part name match wins, (2) if no override exists, the default for the file extension is used, (3) if neither exists, the package is invalid.
From an engineering perspective, the override/default duality enables efficient specification: common extensions (e.g., .xml, .png, .rels) get a single default entry, while parts with non-standard or context-specific types use overrides. This reduces the [Content_Types].xml size for packages with many similar parts.
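The three-step resolution algorithm described above can be sketched in a few lines. This is an illustrative implementation under simplifying assumptions (case folding via `str.lower()`, no validation of duplicate entries); the `ContentTypes` class is a hypothetical name, while the namespace and content type strings are the standard ones.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

SAMPLE = f"""
<Types xmlns="{CT_NS}">
  <Default Extension="rels"
           ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="png" ContentType="image/png"/>
  <Override PartName="/word/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
"""


class ContentTypes:
    """Resolve part names to content types: Override first, then Default."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        self.overrides = {
            el.get("PartName").lower(): el.get("ContentType")
            for el in root.findall(f"{{{CT_NS}}}Override")
        }
        self.defaults = {
            el.get("Extension").lower(): el.get("ContentType")
            for el in root.findall(f"{{{CT_NS}}}Default")
        }

    def resolve(self, part_name):
        key = part_name.lower()
        # Step 1: an exact part-name override wins.
        if key in self.overrides:
            return self.overrides[key]
        # Step 2: fall back to the default for the file extension.
        last_segment = key.rsplit("/", 1)[-1]
        if "." in last_segment:
            extension = last_segment.rsplit(".", 1)[1]
            if extension in self.defaults:
                return self.defaults[extension]
        # Step 3: neither exists, so the package is invalid for this part.
        raise ValueError(f"no content type registered for {part_name}")


if __name__ == "__main__":
    types = ContentTypes(SAMPLE)
    print(types.resolve("/word/document.xml"))
    print(types.resolve("/word/media/image1.png"))
```

Because `/word/theme/theme1.xml` has no override in the sample, it falls through to the `xml` default, which illustrates why real packages add overrides for parts whose type is more specific than generic XML.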
OPC specifies a comprehensive digital signature framework. Signatures are stored as OPC parts under /_xmlsignatures/ and reference signed parts via relationships. The framework supports multiple signers, signature policies (e.g., sign only parts, sign relationships), and co-signing scenarios. Signature validation involves verifying the XML signature, checking that no unsigned parts have been added, and confirming that no signed parts have been modified.
The signature framework uses the W3C XML Signature (XMLDSIG) standard with specific OPC profiles. Engineers implementing OPC signature verification should pay careful attention to the transform chain: OPC requires an enveloped signature transform and a relationship transform that canonicalizes relationship content before signing.
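The last two validation checks (no unsigned parts added, no signed parts modified) amount to comparing the current package contents against the digests recorded at signing time. The sketch below illustrates only that comparison logic. It is deliberately not an XMLDSIG implementation: it performs no cryptographic signature verification, applies no transforms or canonicalization, and uses a plain dictionary as a stand-in for the signed manifest. All function names are hypothetical.

```python
import base64
import hashlib

# Signature parts live under this prefix and are excluded from the comparison.
SIGNATURE_PREFIX = "/_xmlsignatures/"


def sha256_b64(data):
    """Base64-encoded SHA-256 digest of a part's bytes."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def compare_with_manifest(manifest, current_parts):
    """Compare current parts against digests recorded at signing time.

    manifest:      part name -> digest recorded when the package was signed
    current_parts: part name -> current bytes of that part
    """
    content = {
        name: data
        for name, data in current_parts.items()
        if not name.startswith(SIGNATURE_PREFIX)
    }
    return {
        "modified": sorted(
            name
            for name, recorded in manifest.items()
            if name in content and sha256_b64(content[name]) != recorded
        ),
        "missing": sorted(name for name in manifest if name not in content),
        "added": sorted(name for name in content if name not in manifest),
    }


if __name__ == "__main__":
    signed = {"/word/document.xml": b"<a/>", "/word/styles.xml": b"<s/>"}
    manifest = {name: sha256_b64(data) for name, data in signed.items()}
    tampered = {"/word/document.xml": b"<b/>", "/word/extra.xml": b"<x/>"}
    print(compare_with_manifest(manifest, tampered))
```

Whether an added part actually invalidates a signature depends on the signature policy in use, so a real verifier would interpret the `added` list in light of that policy.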
A: Yes, but you must update [Content_Types].xml with an entry for the new part. Additionally, if you want the part to be discoverable, you need to add a relationship to it from a known part (or from the package root). Parts without any incoming relationships are not accessible through the standard OPC navigation API but are still valid if their content type is registered.
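As a rough sketch of those two steps, the function below copies a package while adding one new part, an Override entry in [Content_Types].xml, and a package-level relationship. It assumes the source package already contains `_rels/.rels`, that the caller supplies a relationship Id not already in use, and that a custom relationship type URI is acceptable; `add_part` and the example.com URI in the usage note are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def _serialize(root, default_ns):
    # Register the namespace as the default so output has no "ns0:" prefixes.
    ET.register_namespace("", default_ns)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def add_part(src, part_name, data, content_type, rel_type, rel_id):
    """Return new package bytes containing one extra, discoverable part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        # Copy entries in their original order so existing ordering is kept.
        for info in zin.infolist():
            payload = zin.read(info.filename)
            if info.filename == "[Content_Types].xml":
                root = ET.fromstring(payload)
                ET.SubElement(root, f"{{{CT_NS}}}Override",
                              PartName=part_name, ContentType=content_type)
                payload = _serialize(root, CT_NS)
            elif info.filename == "_rels/.rels":
                root = ET.fromstring(payload)
                ET.SubElement(root, f"{{{RELS_NS}}}Relationship",
                              Id=rel_id, Type=rel_type, Target=part_name)
                payload = _serialize(root, RELS_NS)
            zout.writestr(info.filename, payload)
        # ZIP entry names have no leading slash, unlike OPC part names.
        zout.writestr(part_name.lstrip("/"), data)
    return out.getvalue()
```

A call might look like `add_part(original_bytes, "/custom/data.json", b"{}", "application/json", "http://example.com/relationships/custom-data", "rId99")`. Rewriting the whole archive is the simplest approach with `zipfile`, since entries cannot be replaced in place.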
A: OPC does not impose a part size limit. However, the classic ZIP format (without ZIP64 extensions) limits each entry to 2^32 – 1 bytes of uncompressed data (approximately 4 GB), regardless of compression method. For larger content, use the OPC chunking pattern: split the content across multiple sequential parts with a container relationship.
A: External relationships reference resources outside the package (e.g., a hyperlink to a web URL or a linked file on a network share). These are stored in relationship parts with TargetMode="External". The OPC API resolves external relationships but does not validate the external target’s existence or accessibility.
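For illustration, the following sketch separates internal from external relationships in a relationship part, using only the TargetMode attribute. The sample XML and the `split_by_target_mode` helper are hypothetical; like the behavior described above, the code makes no attempt to check that an external target exists.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

SAMPLE_RELS = f"""
<Relationships xmlns="{RELS_NS}">
  <Relationship Id="rId1" Type="{REL_BASE}/image" Target="media/image1.png"/>
  <Relationship Id="rId2" Type="{REL_BASE}/hyperlink"
                Target="https://example.com/" TargetMode="External"/>
</Relationships>
"""


def split_by_target_mode(rels_xml):
    """Return (internal, external) lists of (Id, Target) pairs."""
    internal, external = [], []
    for rel in ET.fromstring(rels_xml).findall(f"{{{RELS_NS}}}Relationship"):
        # A missing TargetMode attribute means the target is inside the package.
        bucket = external if rel.get("TargetMode") == "External" else internal
        bucket.append((rel.get("Id"), rel.get("Target")))
    return internal, external


if __name__ == "__main__":
    internal, external = split_by_target_mode(SAMPLE_RELS)
    print("internal:", internal)
    print("external:", external)
```

Listing external targets this way is a common first step when auditing a document for links that leave the package.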
A: For packages with thousands of parts, the relationship traversal overhead can be noticeable. The recommended optimization is to flatten deeply nested relationship chains where possible. Additionally, many OPC implementations cache the parsed relationship XML, so repeated lookups are cheap after initial load.
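The caching idea can be sketched with a small wrapper that parses each relationship part at most once. This is an assumption about how such a cache might look, not a description of any particular OPC implementation; `RelationshipCache` and its `read_part` callback are hypothetical.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


class RelationshipCache:
    """Parse each relationship part once and serve later lookups from memory."""

    def __init__(self, read_part):
        # read_part: callable taking a relationship part name, returning its XML.
        self._read_part = read_part
        self._cache = {}
        self.parse_count = 0  # exposed only to make the caching visible

    def relationships(self, rels_name):
        if rels_name not in self._cache:
            self.parse_count += 1
            root = ET.fromstring(self._read_part(rels_name))
            self._cache[rels_name] = {
                rel.get("Id"): rel.get("Target")
                for rel in root.findall(f"{{{RELS_NS}}}Relationship")
            }
        return self._cache[rels_name]

    def target(self, rels_name, rel_id):
        """Look up one relationship target by Id."""
        return self.relationships(rels_name)[rel_id]


if __name__ == "__main__":
    parts = {
        "_rels/.rels": (
            f'<Relationships xmlns="{RELS_NS}">'
            '<Relationship Id="rId1" Type="urn:example" Target="word/document.xml"/>'
            "</Relationships>"
        )
    }
    cache = RelationshipCache(parts.__getitem__)
    cache.target("_rels/.rels", "rId1")
    cache.target("_rels/.rels", "rId1")
    print("parses:", cache.parse_count)
```

With a real package, `read_part` could simply be the `read` method of an open `zipfile.ZipFile`. The cache must be invalidated if the package is modified after loading.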