Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 29341-16-1:2011 defines the UPnP AV Architecture:2 specification, the overarching framework that unifies all UPnP Audio/Video device and service templates into a coherent interoperability model. This standard establishes the fundamental device roles — Media Server, Media Renderer, and Control Point — and defines the interaction patterns through which these roles collaborate to deliver a seamless media streaming experience across heterogeneous home and enterprise networks. The architecture builds upon the core UPnP Device Architecture (UDA) version 1.0, adding AV-specific extensions for content management, transport control, and connection handling.
The AV Architecture:2 specification is organized around a three-role model. A Media Server hosts content and makes it available for streaming; DLNA calls the corresponding device class a Digital Media Server (DMS). A Media Renderer receives and renders content (DLNA: Digital Media Renderer, DMR). A Control Point (DLNA: Digital Media Controller, DMC) orchestrates the interaction: it discovers content on the server, identifies a suitable renderer, establishes a connection between them, and controls playback. Control Points may be embedded in the same device as a server or renderer (e.g., a smart TV with a built-in media browser) or operate as standalone entities (e.g., a wall-mounted tablet running a home automation dashboard).
A key architectural innovation in version 2 is the introduction of the AVTransport:2 service, which separates transport control from connection management more cleanly than version 1. The architecture also defines the concept of virtual channels — logical pathways that can carry multiple media streams simultaneously, enabling features such as picture-in-picture, multi-room audio synchronization, and simultaneous recording and playback on the same device.
The Media Server role bundles three core services: ContentDirectory (for browsing and searching content), ConnectionManager (for establishing data transport), and optionally AVTransport (for controlling the flow of content when the transfer protocol places transport control on the server side). The Media Renderer role bundles ConnectionManager, AVTransport, and RenderingControl (for adjusting volume, brightness, etc.). The Control Point does not expose services; it consumes them. It discovers devices via SSDP, queries their capabilities, and invokes actions to establish and control media flows.
| Device Role | Required Services | Optional Services | Example Devices |
|---|---|---|---|
| Media Server (DMS) | ContentDirectory:2, ConnectionManager:2 | AVTransport:2 | NAS, media server software, smartphone |
| Media Renderer (DMR) | ConnectionManager:2, AVTransport:2, RenderingControl:2 | (none) | Smart TV, streaming box, wireless speaker |
| Control Point (DMC) | (none; consumes services only) | (none) | Smartphone app, tablet, smart home hub |
| Media Player (DMP) | ContentDirectory:2, ConnectionManager:2, AVTransport:2, RenderingControl:2 | (none) | Set-top box with integrated UI, game console |
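The role-to-service mapping in the table can be expressed as a small lookup. The following is a minimal sketch, not a normative check: the names `REQUIRED_SERVICES`, `service_name`, and `roles_satisfied` are hypothetical, and the service sets simply mirror the table rows above rather than the device templates themselves.

```python
# Required services per role, copied from the table above (illustrative only).
REQUIRED_SERVICES = {
    "MediaServer": {"ContentDirectory", "ConnectionManager"},
    "MediaRenderer": {"ConnectionManager", "AVTransport", "RenderingControl"},
}


def service_name(service_type: str) -> str:
    """Extract 'AVTransport' from 'urn:schemas-upnp-org:service:AVTransport:2'."""
    parts = service_type.split(":")
    return parts[-2] if len(parts) >= 2 else service_type


def roles_satisfied(service_types):
    """Return the roles whose required services are all present, sorted by name."""
    names = {service_name(s) for s in service_types}
    return sorted(
        role for role, required in REQUIRED_SERVICES.items() if required <= names
    )
```

A Control Point could run a check like this against the service list of each discovered device to decide whether to treat it as a server, a renderer, or both.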
The interaction flow follows a well-defined sequence. First, the Control Point discovers available Media Servers and Media Renderers on the network through SSDP multicast discovery. Second, it browses the ContentDirectory service on a server to find desired media items. Third, it examines each item’s res (resource) elements to determine the available transport protocols and content formats. Fourth, it queries the ConnectionManager services on both the server and renderer to verify protocol compatibility and establish a connection. Finally, it uses AVTransport actions to control playback on the renderer and RenderingControl actions to adjust the audio/video presentation parameters. This sequence can be optimized by caching discovery and capability information to reduce latency for frequently used devices.
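The first step of this sequence, SSDP discovery, can be sketched as follows. This is an illustrative outline under stated assumptions rather than a complete Control Point: the function names `build_msearch`, `parse_ssdp_response`, and `discover` are hypothetical, the search target is supplied by the caller, and `discover` needs a real network with UPnP devices to return anything.

```python
import socket

SSDP_ADDR = "239.255.255.250"  # SSDP multicast group
SSDP_PORT = 1900


def build_msearch(search_target: str, mx: int = 2) -> bytes:
    """Build an SSDP M-SEARCH request for the given search target (ST)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",  # maximum number of seconds a device may delay its reply
        f"ST: {search_target}",
        "",
        "",  # the two empty entries produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_ssdp_response(data: bytes) -> dict:
    """Parse response headers into a dict keyed by upper-cased header name."""
    text = data.decode("utf-8", errors="replace")
    headers = {}
    for line in text.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers


def discover(search_target: str, timeout: float = 3.0) -> list:
    """Send one M-SEARCH and collect replies until the socket times out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.settimeout(timeout)
    found = []
    try:
        sock.sendto(build_msearch(search_target), (SSDP_ADDR, SSDP_PORT))
        while True:
            data, _addr = sock.recvfrom(65507)
            found.append(parse_ssdp_response(data))
    except socket.timeout:
        pass  # no more replies within the timeout window
    finally:
        sock.close()
    return found
```

Calling `discover("urn:schemas-upnp-org:device:MediaServer:1")` on a home network would return one header dictionary per responding server; the `LOCATION` header of each reply points at the device description that the later steps depend on.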
The AV Architecture:2 specification defines several media flow patterns. In the general three-box model, the Control Point, Server, and Renderer are all separate physical devices: the Control Point sets up the transfer between the Media Server and the Media Renderer but is not involved in the data path; it only manages signaling. In the two-box push model, the Control Point is co-located with the Media Server and directs content to a separate Renderer. This is the pattern used by DLNA Push Controller applications. The architecture also supports the two-box pull model, where the Renderer itself acts as the Control Point and pulls content from the Server (common in smart TVs with integrated media browsing UIs).
Transport mechanisms supported by the architecture include HTTP GET (the most common, for progressive download and streaming), HTTP POST (for uploading content to a server), RTP (for real-time streaming with timing information), and vendor-extensible protocols. The architecture also defines the concept of transport layers that can encapsulate DRM, link protection (DTCP-IP), and quality-of-service markings without modifying the core service interfaces. The ConnectionManager’s protocol info data structure is designed to be extensible — new transport protocols can be added by defining new protocol identifiers without revising the service specification.
The specification also addresses content format identification. Each media resource carries a MIME type (e.g., video/mpeg, audio/mpeg, image/jpeg) inside the protocolInfo attribute of its res element. A protocolInfo value has four colon-separated fields: protocol, network, contentFormat, and additionalInfo (for example, http-get:*:audio/mpeg:*). The architecture recommends that implementations use the full four-field protocol info format rather than abbreviated forms to maximize interoperability. For DLNA-certified devices, media profile parameters (e.g., DLNA.ORG_PN=MP3) are carried in the additionalInfo field to enable precise capability matching.
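A short sketch of how a Control Point might parse and compare protocol info strings is given below. It assumes the four colon-separated fields (protocol, network, content format, additional info) and treats `*` as a wildcard; the names `ProtocolInfo`, `parse_protocol_info`, and `compatible` are hypothetical, and the comparison deliberately ignores the fourth field, which a real implementation would also need to inspect for DLNA parameters.

```python
from typing import NamedTuple


class ProtocolInfo(NamedTuple):
    protocol: str
    network: str
    content_format: str
    additional_info: str


def parse_protocol_info(value: str) -> ProtocolInfo:
    """Split a protocolInfo string into its four fields."""
    # maxsplit=3 keeps any further colons inside the additional-info field.
    parts = value.strip().split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {len(parts)}: {value!r}")
    return ProtocolInfo(*parts)


def _field_matches(a: str, b: str) -> bool:
    """Two fields match if they are equal or either one is the '*' wildcard."""
    return a == "*" or b == "*" or a == b


def compatible(source: ProtocolInfo, sink: ProtocolInfo) -> bool:
    """Simplified check: compare the first three fields only."""
    return (
        _field_matches(source.protocol, sink.protocol)
        and _field_matches(source.network, sink.network)
        and _field_matches(source.content_format, sink.content_format)
    )
```

For example, a resource advertised as `http-get:*:audio/mpeg:DLNA.ORG_PN=MP3` is compatible under this simplified check with a renderer sink entry of `http-get:*:audio/mpeg:*`, but not with `http-get:*:video/mpeg:*`.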
Implementing a system that conforms to the UPnP AV Architecture:2 requires careful attention to device discovery robustness. SSDP sends discovery messages over UDP to the multicast address 239.255.255.250 on port 1900. UDA 1.0 specifies a default TTL of 4, but routers forward multicast traffic only where multicast routing is configured, so in practice discovery is usually confined to the local subnet. For multi-subnet deployments (e.g., enterprise networks with VLANs), deploy an SSDP proxy or enable multicast routing between subnets. Alternatively, where the devices implement UDA 1.1, a Control Point that already knows a device's address can send it a unicast M-SEARCH request.
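The multicast TTL discussed above is a per-socket option that the sender sets before transmitting. The snippet below is a minimal sketch of configuring it with Python's standard `socket` module; the helper name `make_ssdp_socket` is hypothetical, and the default of 4 is taken from the text rather than being a recommendation for any particular network.

```python
import socket


def make_ssdp_socket(ttl: int = 4) -> socket.socket:
    """Create a UDP socket whose outgoing multicast packets use the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # IP_MULTICAST_TTL limits how many router hops a multicast packet may cross.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

Raising the TTL alone does not make discovery cross subnets; the intermediate routers still have to forward the multicast group.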
Service coordination is another critical area. When a Control Point invokes actions across multiple services (e.g., ContentDirectory browsing followed by ConnectionManager setup), it must handle both the case where the services reside on different devices and the case where they share one. The architecture recommends that device implementations expose a single device description document (XML) that enumerates all embedded services and their control and event URLs. Control Points should parse this description to build a complete service topology before initiating any AV operations. Caching the device description with a timeout aligned to the lifetime announced in the SSDP advertisement (its CACHE-CONTROL max-age value) minimizes redundant network requests.
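Building the service topology from a device description can be sketched as below. This assumes the description uses the standard `urn:schemas-upnp-org:device-1-0` namespace and that relative URLs are resolved against the URL the description was fetched from; the function name `list_services` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {"d": "urn:schemas-upnp-org:device-1-0"}


def list_services(description_xml: str, base_url: str) -> list:
    """Return one dict per <service> element, including embedded devices."""
    root = ET.fromstring(description_xml)
    services = []
    # './/' descends into nested deviceList entries as well as the root device.
    for svc in root.iterfind(".//d:service", NS):

        def text(tag, svc=svc):
            return (svc.findtext(f"d:{tag}", default="", namespaces=NS) or "").strip()

        services.append(
            {
                "type": text("serviceType"),
                "id": text("serviceId"),
                "control_url": urljoin(base_url, text("controlURL")),
                "event_url": urljoin(base_url, text("eventSubURL")),
                "scpd_url": urljoin(base_url, text("SCPDURL")),
            }
        )
    return services
```

A Control Point would typically store the returned list keyed by the device's location URL and discard it when the advertisement lifetime expires.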
Error handling at the architecture level requires a layered approach. Network-level errors (timeouts, connection refused) are handled by the transport layer. Service-level errors (invalid arguments, resource exhaustion) are returned as UPnP error codes in SOAP responses. Application-level errors (content format mismatch, DRM restriction) must be communicated through state variables and events. The architecture recommends that implementations log all errors with a severity level and timestamp to facilitate debugging of multi-vendor interoperability issues. A centralized error aggregator service within the Control Point can correlate errors across devices to identify systematic problems such as incompatible firmware versions or misconfigured network parameters.
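For the service-level layer, the sketch below shows one way a Control Point might extract a UPnP error code from a SOAP fault response. It assumes the fault carries a `UPnPError` element in the `urn:schemas-upnp-org:control-1-0` namespace inside the SOAP 1.1 envelope; the class and function names are hypothetical, and network-level and application-level errors would be handled elsewhere.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
UPNP_CTRL_NS = "urn:schemas-upnp-org:control-1-0"


@dataclass
class UPnPError:
    code: int
    description: str


def parse_upnp_fault(body: str) -> Optional[UPnPError]:
    """Return the UPnP error carried in a SOAP fault, or None if there is none."""
    root = ET.fromstring(body)
    fault = root.find(f".//{{{SOAP_NS}}}Fault")
    if fault is None:
        return None  # a normal action response, not a fault
    err = fault.find(f".//{{{UPNP_CTRL_NS}}}UPnPError")
    if err is None:
        return None  # a SOAP fault without UPnP-specific detail
    code = (err.findtext(f"{{{UPNP_CTRL_NS}}}errorCode", default="") or "").strip()
    desc = (err.findtext(f"{{{UPNP_CTRL_NS}}}errorDescription", default="") or "").strip()
    if not code.isdigit():
        return None  # malformed or missing error code
    return UPnPError(int(code), desc)
```

The returned code and description can then be logged with a timestamp and severity, in line with the logging recommendation above.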