Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
The ISO/IEC 29341-18-12 standard defines the UPnP AVTransport v2 service, a core component of the UPnP AV architecture responsible for managing playback control of audio and video content across a home network. As part of the ISO/IEC 29341 series, which publishes the UPnP Device Architecture and its device control protocols as international standards, this service specification provides a standardized interface for controlling transport functions such as play, pause, stop, seek, and skip on media rendering devices.
Version 2 of the AVTransport service builds upon the capabilities of v1 by adding support for multiple transport instances, enhanced seek modes, group coordination, and richer state reporting. This makes it suitable for modern multi-room audio systems, home theater configurations, and synchronized media playback scenarios. The service is typically hosted by media renderers (e.g., smart speakers, smart TVs) but can also be implemented by media servers that offer trick-play control of streaming content.
At the heart of the AVTransport v2 service is a well-defined state machine that governs the lifecycle of media playback. The service maintains a TransportState variable whose values include STOPPED, PLAYING, PAUSED_PLAYBACK, PAUSED_RECORDING, RECORDING, TRANSITIONING, and NO_MEDIA_PRESENT. Each state transition is triggered by specific actions, and not all transitions are valid from every state; the standard defines which transitions are allowed.
The service also manages multiple logical transports within a single device instance. Each transport instance is identified by an InstanceID argument, allowing a single device to independently control multiple simultaneous playback streams. This is critical for multi-room audio systems where different zones play different content. The CurrentTransportActions state variable complements this by listing, as a comma-separated string, the actions currently available on a given instance, so control points can enable or disable UI controls dynamically.
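As a rough illustration of how a control point might use CurrentTransportActions per instance, the sketch below splits the comma-separated value into a set and checks it before enabling a button. The cache `actions_by_instance`, the helper names, and the sample values are hypothetical and not taken from the standard.

```python
def parse_transport_actions(csv_value: str) -> set[str]:
    """Split a CurrentTransportActions value such as "Play,Stop,Seek" into a set."""
    return {name.strip() for name in csv_value.split(",") if name.strip()}


# Hypothetical per-instance cache: InstanceID -> last reported actions.
actions_by_instance = {
    0: parse_transport_actions("Play,Stop,Seek"),
    1: parse_transport_actions("Pause, Stop"),
}


def is_enabled(instance_id: int, action: str) -> bool:
    """Return True if the UI control for `action` should be enabled."""
    # Unknown instances have no known actions, so every control stays disabled.
    return action in actions_by_instance.get(instance_id, set())
```

Keeping one set per InstanceID means each zone's controls can be refreshed independently whenever that instance reports a change.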
| TransportState | Valid Actions | Description |
|---|---|---|
| STOPPED | Play, Next, Previous, Seek | No media is being rendered; ready to begin playback |
| PLAYING | Pause, Stop, Next, Previous, Seek, Record | Media is actively being rendered to the user |
| PAUSED_PLAYBACK | Play, Stop, Next, Previous, Seek | Playback is suspended at the current position |
| TRANSITIONING | None | Device is buffering or preparing a new track |
| RECORDING | Stop, Pause (if supported) | Media content is being recorded to local storage |
| PAUSED_RECORDING | Record, Stop | Recording is temporarily suspended |
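The table above can be encoded directly as a lookup structure. The following is a minimal sketch that mirrors only the rows shown here; a real device may permit a different set of actions, and the names `VALID_ACTIONS` and `can_invoke` are illustrative.

```python
from enum import Enum


class TransportState(Enum):
    STOPPED = "STOPPED"
    PLAYING = "PLAYING"
    PAUSED_PLAYBACK = "PAUSED_PLAYBACK"
    TRANSITIONING = "TRANSITIONING"
    RECORDING = "RECORDING"
    PAUSED_RECORDING = "PAUSED_RECORDING"


# Allowed actions per state, copied from the table above.
# "Pause" while RECORDING is only valid if the device supports it.
VALID_ACTIONS = {
    TransportState.STOPPED: {"Play", "Next", "Previous", "Seek"},
    TransportState.PLAYING: {"Pause", "Stop", "Next", "Previous", "Seek", "Record"},
    TransportState.PAUSED_PLAYBACK: {"Play", "Stop", "Next", "Previous", "Seek"},
    TransportState.TRANSITIONING: set(),
    TransportState.RECORDING: {"Stop", "Pause"},
    TransportState.PAUSED_RECORDING: {"Record", "Stop"},
}


def can_invoke(state: TransportState, action: str) -> bool:
    """Return True if `action` is listed as valid for `state` in the table."""
    return action in VALID_ACTIONS[state]
```

A control point could call `can_invoke(TransportState("PLAYING"), "Pause")` before sending a request, rather than relying on the device to reject it.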
The AVTransport v2 service defines over 20 actions, including SetAVTransportURI, Play, Pause, Stop, Seek, Next, Previous, GetPositionInfo, GetTransportInfo, and SetPlayMode. The SetAVTransportURI action is particularly important as it establishes the media resource to be played. It accepts an InstanceID, a URI string (CurrentURI), and an optional metadata XML fragment in DIDL-Lite format (CurrentURIMetaData) describing the content. A companion action, SetNextAVTransportURI, queues the resource to play next; the queued URI is exposed through the NextAVTransportURI state variable, enabling gapless playback and playlist queuing.
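To make the shape of a SetAVTransportURI call concrete, here is a sketch that builds the SOAP request body with Python's standard library. The function name and the example URI are assumptions for illustration; the control URL and HTTP transport are device-specific and deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
AVT_NS = "urn:schemas-upnp-org:service:AVTransport:2"

# Value for the SOAPACTION HTTP header that accompanies the request.
SOAP_ACTION = f'"{AVT_NS}#SetAVTransportURI"'


def build_set_uri_request(instance_id: int, uri: str, didl_lite: str = "") -> bytes:
    """Build the SOAP envelope for SetAVTransportURI (illustrative helper)."""
    ET.register_namespace("s", SOAP_NS)
    ET.register_namespace("u", AVT_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    envelope.set(f"{{{SOAP_NS}}}encodingStyle",
                 "http://schemas.xmlsoap.org/soap/encoding/")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    action = ET.SubElement(body, f"{{{AVT_NS}}}SetAVTransportURI")
    ET.SubElement(action, "InstanceID").text = str(instance_id)
    ET.SubElement(action, "CurrentURI").text = uri
    # The DIDL-Lite fragment travels as an escaped string; ElementTree
    # escapes the angle brackets when serializing the text node.
    ET.SubElement(action, "CurrentURIMetaData").text = didl_lite
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```

The returned bytes would be sent with an HTTP POST to the service's control URL, with `SOAP_ACTION` in the SOAPACTION header.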
From an engineering perspective, the standard recommends that implementations pre-buffer the next track when NextAVTransportURI is set, minimizing the gap between consecutive tracks. This pattern is commonly called “look-ahead buffering” and is important wherever an audible gap between tracks is unacceptable. The Seek action takes a Unit and a Target and supports multiple seek modes, including ABS_TIME (time position from the start of the media), REL_TIME (time position from the start of the current track), TRACK_NR (track number), and FRAME (frame-accurate seeking). The SetPlayMode action selects play modes such as NORMAL, SHUFFLE, and REPEAT_ONE, allowing control points to customize the playback experience according to user preferences.
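The sketch below shows one way a control point might assemble the arguments for a Seek call, assuming the H:MM:SS form for time-based targets. Only the seek modes named above are accepted; the helper names are hypothetical.

```python
TIME_UNITS = {"ABS_TIME", "REL_TIME"}
SEEK_UNITS = TIME_UNITS | {"TRACK_NR", "FRAME"}


def format_time_target(total_seconds: int) -> str:
    """Format whole seconds as H:MM:SS for a time-based seek target."""
    if total_seconds < 0:
        raise ValueError("seek position must not be negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def seek_arguments(instance_id: int, unit: str, value: int) -> dict[str, str]:
    """Return the InstanceID, Unit, and Target arguments for a Seek action."""
    if unit not in SEEK_UNITS:
        raise ValueError(f"unsupported seek unit: {unit}")
    # Time units take a formatted position; track and frame units take a number.
    target = format_time_target(value) if unit in TIME_UNITS else str(value)
    return {"InstanceID": str(instance_id), "Unit": unit, "Target": target}
```

For example, `seek_arguments(0, "REL_TIME", 90)` yields a Target of `0:01:30`, while `seek_arguments(0, "TRACK_NR", 3)` yields a Target of `3`.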
When implementing an AVTransport v2 service, there are several critical engineering considerations. First, the LastChange evented variable must be managed carefully: its value is an XML document that aggregates all state changes, grouped by InstanceID, since the last event notification. Control points subscribe to this variable via the UPnP eventing mechanism (GENA). Aggregating changes into one XML payload reduces network traffic compared to eventing each state variable individually.
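As an illustration of consuming LastChange on the control point side, the following sketch parses an already-unescaped event document into a per-instance dictionary. The sample payload and the function name are made up for the example; real events may carry many more variables.

```python
import xml.etree.ElementTree as ET

AVT_EVENT_NS = "urn:schemas-upnp-org:metadata-1-0/AVT/"

# Hypothetical LastChange value after the GENA property set has been unescaped.
SAMPLE_LAST_CHANGE = """<Event xmlns="urn:schemas-upnp-org:metadata-1-0/AVT/">
  <InstanceID val="0">
    <TransportState val="PLAYING"/>
    <CurrentTransportActions val="Pause,Stop,Seek"/>
  </InstanceID>
</Event>"""


def parse_last_change(xml_text: str) -> dict[int, dict[str, str]]:
    """Map each InstanceID to the state variables reported as changed."""
    root = ET.fromstring(xml_text)
    changes: dict[int, dict[str, str]] = {}
    for instance in root.findall(f"{{{AVT_EVENT_NS}}}InstanceID"):
        instance_id = int(instance.get("val", "0"))
        variables = changes.setdefault(instance_id, {})
        for variable in instance:
            # Strip the "{namespace}" prefix ElementTree puts on tag names.
            name = variable.tag.split("}", 1)[-1]
            variables[name] = variable.get("val", "")
    return changes
```

Because only changed variables appear in each event, a control point would typically merge the returned dictionary into its cached state instead of replacing it.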
Second, error handling must be robust. The AVTransport service can return a variety of error codes, including Transition Not Available (701), Seek Mode Not Supported (710), Illegal Seek Target (711), Play Mode Not Supported (712), and Resource Not Found (716). Each of these must be handled gracefully by the control point UI. Third, the service should implement the GetDeviceCapabilities action to advertise supported features such as playback storage media (for example, NETWORK or HDD), record storage media, and record quality modes.
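On the control point side, errors arrive as a SOAP fault carrying a UPnPError element with a numeric code and a description. The sketch below extracts both; the `UPnPActionError` class, the function name, and the sample fault are illustrative assumptions, not part of the standard.

```python
import xml.etree.ElementTree as ET

CONTROL_NS = "urn:schemas-upnp-org:control-1-0"

# Hypothetical fault body as a device might return it for a bad InstanceID.
SAMPLE_FAULT = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Body><s:Fault>
    <faultcode>s:Client</faultcode>
    <faultstring>UPnPError</faultstring>
    <detail>
      <UPnPError xmlns="urn:schemas-upnp-org:control-1-0">
        <errorCode>718</errorCode>
        <errorDescription>Invalid InstanceID</errorDescription>
      </UPnPError>
    </detail>
  </s:Fault></s:Body>
</s:Envelope>"""


class UPnPActionError(Exception):
    """Carries the numeric code and description from a UPnPError element."""

    def __init__(self, code: int, description: str):
        super().__init__(f"UPnP error {code}: {description}")
        self.code = code
        self.description = description


def parse_upnp_fault(soap_xml: str) -> "UPnPActionError | None":
    """Return a UPnPActionError if the response contains one, else None."""
    root = ET.fromstring(soap_xml)
    error = root.find(f".//{{{CONTROL_NS}}}UPnPError")
    if error is None:
        return None
    code = int(error.findtext(f"{{{CONTROL_NS}}}errorCode", default="0"))
    description = error.findtext(f"{{{CONTROL_NS}}}errorDescription", default="")
    return UPnPActionError(code, description.strip())
```

A UI layer could then branch on `error.code` to show a specific message and fall back to `error.description` for codes it does not recognize.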