ISO/IEC 29341-17-12: UPnP AV AVTransport Service

UPnP Audio/Video — Part 17-12: AVTransport Service Specification

1. AVTransport Service and Transport State Machine

ISO/IEC 29341-17-12 specifies the AVTransport service, which is the central playback control component in the UPnP AV architecture. While ConnectionManager handles the establishment of media connections, AVTransport is responsible for controlling the actual playback of media content — managing the transport state machine, URI-based content selection, seeking, speed control, and track-based playback navigation. This service is the primary interface through which control points implement the familiar play-pause-stop-seek user experience.

The AVTransport service maintains a formal transport state machine that governs the device’s playback behavior. The core states are STOPPED, PLAYING, PAUSED_PLAYBACK, TRANSITIONING, and NO_MEDIA_PRESENT. Each state has well-defined legal transitions triggered by specific actions. For example, from STOPPED, the only valid transitions are to PLAYING (via Play) or to NO_MEDIA_PRESENT (if media is removed). From PLAYING, the transport can transition to PAUSED_PLAYBACK (via Pause), STOPPED (via Stop), or TRANSITIONING (when changing tracks with Next or Previous). Control points must respect these state machine semantics to ensure predictable device behavior.
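The transitions described above can be captured in a small lookup table. The following is a minimal sketch, assuming only the core states and the transitions named in this article; the names `TRANSITIONS`, `next_state`, and `TransitionNotAvailable` are hypothetical and not part of the standard, and a real device would also cover recording states and device-initiated changes.

```python
from enum import Enum


class TransportState(Enum):
    STOPPED = "STOPPED"
    PLAYING = "PLAYING"
    PAUSED_PLAYBACK = "PAUSED_PLAYBACK"
    TRANSITIONING = "TRANSITIONING"
    NO_MEDIA_PRESENT = "NO_MEDIA_PRESENT"


# (current state, action) -> next state. Only transitions named in the
# article are listed; this is an illustrative subset, not the full machine.
TRANSITIONS = {
    (TransportState.STOPPED, "Play"): TransportState.PLAYING,
    (TransportState.PAUSED_PLAYBACK, "Play"): TransportState.PLAYING,
    (TransportState.PLAYING, "Pause"): TransportState.PAUSED_PLAYBACK,
    (TransportState.PLAYING, "Stop"): TransportState.STOPPED,
    (TransportState.PAUSED_PLAYBACK, "Stop"): TransportState.STOPPED,
    (TransportState.PLAYING, "Next"): TransportState.TRANSITIONING,
    (TransportState.PLAYING, "Previous"): TransportState.TRANSITIONING,
}


class TransitionNotAvailable(Exception):
    """Hypothetical error raised when an action is illegal in the current state."""


def next_state(current: TransportState, action: str) -> TransportState:
    """Return the state that follows `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise TransitionNotAvailable(
            f"{action} is not allowed in {current.value}"
        ) from None
```

Keeping the rules in one table makes it easy to reject an illegal request before it reaches the decoder, and to audit the machine against the specification.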

When implementing the AVTransport state machine, always treat the TransportState and TransportStatus state variables as the single source of truth; GetTransportInfo returns them as the CurrentTransportState and CurrentTransportStatus output arguments. Never infer transport state from cached action results, because the state may have changed due to a concurrent control point invocation or an internal device event.
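On the control point side, reading the live state means parsing a GetTransportInfo response rather than remembering what the last action returned. The sketch below assumes a hand-written sample response (not captured from a real device) and a hypothetical helper name `parse_transport_info`; it uses only the standard library XML parser.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only.
SAMPLE_RESPONSE = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <s:Body>
    <u:GetTransportInfoResponse xmlns:u="urn:schemas-upnp-org:service:AVTransport:1">
      <CurrentTransportState>PLAYING</CurrentTransportState>
      <CurrentTransportStatus>OK</CurrentTransportStatus>
      <CurrentSpeed>1</CurrentSpeed>
    </u:GetTransportInfoResponse>
  </s:Body>
</s:Envelope>"""

WANTED = ("CurrentTransportState", "CurrentTransportStatus", "CurrentSpeed")


def parse_transport_info(soap_xml: str) -> dict:
    """Extract the GetTransportInfo output arguments from a SOAP response."""
    root = ET.fromstring(soap_xml)
    result = {}
    for element in root.iter():
        # Drop a "{namespace}" prefix if the device qualified the element.
        name = element.tag.rsplit("}", 1)[-1]
        if name in WANTED:
            result[name] = (element.text or "").strip()
    return result


info = parse_transport_info(SAMPLE_RESPONSE)
```

Calling this before each decision, instead of caching the result, keeps the control point in step with changes made by other control points.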

2. URI Management and Playback Navigation

The AVTransport service uses a URI-based content model. The control point sets the content to be played by calling SetAVTransportURI with a URI pointing to the media resource and a current URI metadata string (in DIDL-Lite format) describing the content. The service also supports a next URI via SetNextAVTransportURI, enabling gapless playback between consecutive tracks — the device buffers the next track while the current one is still playing, eliminating the silence gap between songs or video chapters.
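To make the SetAVTransportURI call concrete, the sketch below builds the SOAP request body and the SOAPACTION header value a control point would send. The argument names InstanceID, CurrentURI, and CurrentURIMetaData follow the AVTransport service description; the helper names, the media URI, and the DIDL-Lite item are made up for illustration, and sending the request over HTTP is left out.

```python
from xml.sax.saxutils import escape

SERVICE_TYPE = "urn:schemas-upnp-org:service:AVTransport:1"

# Illustrative metadata; a real control point would take this from a
# ContentDirectory browse result.
EXAMPLE_DIDL = (
    '<DIDL-Lite xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/">'
    '<item id="1" parentID="0" restricted="1">'
    "<dc:title>Example Track</dc:title>"
    "<upnp:class>object.item.audioItem.musicTrack</upnp:class>"
    "</item></DIDL-Lite>"
)


def build_action_body(action: str, arguments: dict) -> str:
    """Build a SOAP envelope for one AVTransport action (hypothetical helper)."""
    # Argument values are XML-escaped, so the DIDL-Lite document travels as
    # an escaped string inside CurrentURIMetaData.
    args_xml = "".join(
        f"<{name}>{escape(str(value))}</{name}>" for name, value in arguments.items()
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        f'<s:Body><u:{action} xmlns:u="{SERVICE_TYPE}">{args_xml}</u:{action}></s:Body>'
        "</s:Envelope>"
    )


def soap_action_header(action: str) -> str:
    """Value of the SOAPACTION HTTP header, including the required quotes."""
    return f'"{SERVICE_TYPE}#{action}"'


body = build_action_body(
    "SetAVTransportURI",
    {
        "InstanceID": 0,
        "CurrentURI": "http://192.0.2.10/media/track.mp3?x=1&y=2",
        "CurrentURIMetaData": EXAMPLE_DIDL,
    },
)
```

The same helper would serve SetNextAVTransportURI by changing the action name and using the NextURI and NextURIMetaData arguments.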

Playback navigation actions include Next and Previous for track skipping, Seek for positional navigation within a track, and Play with a Speed parameter for variable-speed playback. The Seek action supports multiple unit types: TRACK_NR for seeking to a specific track in a multi-track resource, ABS_TIME for seeking to an absolute time position on the media, REL_TIME for seeking to a time offset measured from the start of the current track, ABS_COUNT for seeking to an absolute counter position, and X_DLNA_REL_BYTE (a DLNA extension) for byte-level seeking. The GetPositionInfo action returns the current playback position in several of these formats at once, allowing control points to display position information without additional conversion.
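Time-based positions travel as strings of the form H:MM:SS, optionally with a fractional part, and a device may report NOT_IMPLEMENTED for a position format it does not support. The helpers below are a minimal sketch of converting in both directions; the function names are hypothetical, and the sketch ignores rarer fractional notations.

```python
from typing import Optional


def seconds_to_upnp_time(total_seconds: int) -> str:
    """Format whole seconds as H:MM:SS for use as a time-based Seek target."""
    if total_seconds < 0:
        raise ValueError("position must not be negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def upnp_time_to_seconds(value: str) -> Optional[float]:
    """Parse an H:MM:SS[.fraction] position; None if the device reports none."""
    if value == "NOT_IMPLEMENTED":
        return None
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```

For example, a control point could turn a slider position into a Seek target with `seconds_to_upnp_time(3725)` and convert the RelTime value from GetPositionInfo back with `upnp_time_to_seconds`.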

| Action | State Transition | Description | Common Error Codes |
|---|---|---|---|
| SetAVTransportURI | Any -> STOPPED (or NO_MEDIA_PRESENT cleared) | Set the URI of the media to be played | 714 (Illegal MIME-type), 716 (Resource not found) |
| Play | STOPPED/PAUSED_PLAYBACK -> PLAYING | Start or resume playback at specified speed | 701 (Transition not available), 702 (No contents) |
| Pause | PLAYING -> PAUSED_PLAYBACK | Temporarily suspend playback | 701 (Transition not available) if not PLAYING |
| Stop | PLAYING/PAUSED_PLAYBACK -> STOPPED | Stop playback and reset position | 701 (Transition not available) if Stop is not allowed in the current state |
| Seek | PLAYING/PAUSED_PLAYBACK -> TRANSITIONING | Seek to specified position in the media | 710 (Seek mode not supported), 711 (Illegal seek target) |
| Next | PLAYING/PAUSED_PLAYBACK/STOPPED -> TRANSITIONING | Skip to the next track | 711 (Illegal seek target) if there is no next track |
| Previous | PLAYING/PAUSED_PLAYBACK/STOPPED -> TRANSITIONING | Go to previous track or restart current | 711 (Illegal seek target) at first track |
| GetPositionInfo | Any (no state change) | Retrieve current playback position | 718 (Invalid InstanceID) |
| GetTransportInfo | Any (no state change) | Retrieve transport state and status | 718 (Invalid InstanceID) |

The TRANSITIONING state is the most commonly misunderstood state in the AVTransport machine. It is a transient state that the device enters when moving between tracks (Next/Previous) or performing a seek operation. Control points must not issue new transport commands during TRANSITIONING; they should wait for the device to return to PLAYING, PAUSED_PLAYBACK, or STOPPED before issuing further actions.
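One way for a control point to honor this rule is to poll the transport state until it settles before sending the next command. The sketch below assumes a caller-supplied `get_state` callable (for example, a wrapper around GetTransportInfo); the function name, timeout, and polling interval are illustrative choices. Subscribing to state change events would avoid polling altogether, but is outside the scope of this sketch.

```python
import time
from typing import Callable

SETTLED_STATES = {"PLAYING", "PAUSED_PLAYBACK", "STOPPED"}


def wait_until_settled(
    get_state: Callable[[], str],
    timeout: float = 5.0,
    interval: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until the transport leaves TRANSITIONING.

    Returns the settled state, or raises TimeoutError if the device has not
    settled within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in SETTLED_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"transport still in {state} after {timeout} s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real delays; production code would leave them at their defaults.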

3. Engineering Design Insights for AVTransport

Implementing a robust AVTransport service requires handling several asynchronous complexities. The most significant is managing the relationship between the transport state machine and the underlying media decoder pipeline. Decoder initialization, buffer pre-rolling, and audio-video synchronization all take real time, and the device must reflect these phases accurately through the state variables. For example, after SetAVTransportURI, the device should report a TransportState of STOPPED while buffering, then transition to PLAYING once Play has been invoked and sufficient data has been buffered.

Multiple AVTransport instances, each identified by an InstanceID, allow a single device to manage several independent playback sessions simultaneously. Each instance has its own state machine, URI, position, and transport settings. This is essential for multi-room audio systems, picture-in-picture video, or recording devices that need to monitor playback while recording. The ConnectionManager’s PrepareForConnection action associates a connection with a specific AVTransport instance and returns its AVTransportID, which the control point then passes as the InstanceID argument in all subsequent AVTransport action invocations.
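On the device side, per-instance isolation can be as simple as a registry that maps each instance identifier to its own state record. This is a minimal sketch; the class names, field names, and the `InvalidInstance` exception are hypothetical, and a real implementation would tie allocation to PrepareForConnection and release instances when connections close.

```python
from dataclasses import dataclass


@dataclass
class TransportInstance:
    """State owned by one virtual transport (illustrative subset of fields)."""
    transport_state: str = "NO_MEDIA_PRESENT"
    uri: str = ""
    position_seconds: float = 0.0


class InvalidInstance(Exception):
    """Hypothetical error for an identifier that was never allocated."""


class InstanceRegistry:
    def __init__(self):
        self._instances = {}
        self._next_id = 0

    def allocate(self) -> int:
        """Create a fresh instance and return its identifier."""
        instance_id = self._next_id
        self._next_id += 1
        self._instances[instance_id] = TransportInstance()
        return instance_id

    def get(self, instance_id: int) -> TransportInstance:
        """Look up the instance an incoming action refers to."""
        try:
            return self._instances[instance_id]
        except KeyError:
            raise InvalidInstance(instance_id) from None


registry = InstanceRegistry()
living_room = registry.allocate()
kitchen = registry.allocate()
registry.get(living_room).uri = "http://192.0.2.10/a.mp3"
registry.get(living_room).transport_state = "STOPPED"
```

Every action handler would start by resolving the identifier through `get`, so a change to one session can never leak into another.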

The Play action’s Speed parameter deserves careful engineering attention. The standard specifies that Speed=1 denotes normal playback, values greater than 1 indicate fast forward (e.g., 2, 4, 8, 16, 32), and values between 0 and 1 (exclusive), written as fractions such as 1/2, indicate slow motion. Negative values indicate reverse playback. The device must advertise which speeds it supports through the allowed values of the TransportPlaySpeed state variable, and control points must check these before attempting non-normal-speed playback. Engineers should implement speed transitions gracefully, maintaining audio-video synchronization at non-standard speeds whenever the decoder pipeline supports it.
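Because speeds are exchanged as strings that may be fractions, exact rational arithmetic is a convenient way to classify and compare them. The sketch below uses the standard library `Fraction` type; the function names and the category labels are hypothetical, and the list of supported speeds is assumed to have been read from the device beforehand.

```python
from fractions import Fraction
from typing import Iterable


def classify_speed(speed: str) -> str:
    """Map a Speed string such as "1", "1/2", or "-2" to a playback category."""
    value = Fraction(speed)
    if value == 0:
        raise ValueError("a speed of 0 is not meaningful for Play")
    if value < 0:
        return "reverse"
    if value == 1:
        return "normal"
    return "fast_forward" if value > 1 else "slow_motion"


def is_speed_supported(requested: str, supported: Iterable[str]) -> bool:
    """Compare numerically so that "2/4" matches an advertised "1/2"."""
    return Fraction(requested) in {Fraction(s) for s in supported}
```

A control point could call `is_speed_supported("2", ["1", "1/2", "2"])` before invoking Play with Speed set to 2, and fall back to normal speed when the check fails.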

Audio synchronization (lip sync) is a subtle but critical implementation detail. When the AVTransport service manages both audio and video streams, it must ensure that the audio output remains synchronized with the video frames. The standard defines an AVSyncOffset state variable that allows control points to adjust the synchronization offset (in milliseconds) to compensate for varying processing delays in the audio and video paths. A positive offset delays audio relative to video; a negative offset advances audio.

Implement gapless playback by using SetNextAVTransportURI to pre-buffer the subsequent track while the current track is still playing. This technique eliminates the inter-track silence gap that occurs when a new URI is set after the current track ends. For local file playback, start decoding the next track when the current track reaches 5-10 seconds from its end to ensure seamless transition regardless of decoder initialization latency.

Never block the AVTransport action handler while the underlying decoder performs time-consuming operations like format detection or buffer initialization. Return from the action immediately with the state set to TRANSITIONING, and update the state to PLAYING or STOPPED asynchronously when the operation completes. Blocking the action handler stalls the entire UPnP device stack and degrades responsiveness for all concurrent control points.
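The non-blocking pattern can be sketched with a worker thread: the handler records TRANSITIONING and returns at once, while the slow decoder setup runs in the background and publishes the final state. The class name and the `prepare_decoder` callable are hypothetical stand-ins for a real pipeline, and a production device would also notify subscribers when the state changes.

```python
import threading
from typing import Callable


class Transport:
    def __init__(self, prepare_decoder: Callable[[], object]):
        self._prepare_decoder = prepare_decoder  # slow: probing, pre-roll
        self._lock = threading.Lock()
        self.state = "STOPPED"

    def play(self) -> threading.Thread:
        """Action handler: mark TRANSITIONING and return without waiting."""
        with self._lock:
            self.state = "TRANSITIONING"
        worker = threading.Thread(target=self._start_playback, daemon=True)
        worker.start()
        return worker

    def _start_playback(self) -> None:
        """Background work: settle on PLAYING, or STOPPED if setup fails."""
        try:
            self._prepare_decoder()
            new_state = "PLAYING"
        except Exception:
            new_state = "STOPPED"
        with self._lock:
            self.state = new_state
```

Returning the worker thread is a convenience for callers that want to wait for completion; the action response itself does not depend on it.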

4. Frequently Asked Questions

Q: Can AVTransport handle streaming media (e.g., HTTP live streams) as well as local file playback?
A: Yes, the URI-based content model is transport-agnostic. The same AVTransport actions handle both local URIs (file://, internal://) and network URIs (http://, rtsp://, mms://). However, seek behavior differs: seek operations on live streams may be restricted or unsupported. The seek modes a device supports appear in the allowed values of the A_ARG_TYPE_SeekMode state variable, and CurrentTransportActions reports which actions are available at a given moment.

Q: How does the service handle multiple control points issuing conflicting transport commands?
A: The AVTransport service processes actions sequentially. If two control points issue Play commands simultaneously, both succeed (the second Play from PLAYING state is a benign idempotent operation). Conflicting commands like Play and Stop from different control points are resolved by action ordering: whichever action the device processes last determines the resulting state.

Q: What happens when SetAVTransportURI is called during active playback?
A: The device stops the current playback, resets the transport state to STOPPED, loads the new URI, and awaits a Play command. The CurrentTrackURI and CurrentTrackMetadata state variables are updated to reflect the new content.

Q: Is there a limit to the number of tracks that can be enqueued via SetNextAVTransportURI?
A: The standard does not define a playlist mechanism within AVTransport. Only a single next URI is supported. For multi-track playlists, the control point should manage the queue and call SetAVTransportURI with each successive track as the previous one completes, monitoring the transport state to detect track completion events.
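A control-point-side queue along these lines might look like the sketch below. It assumes that a change from PLAYING to STOPPED means the track finished, which is a simplification: a user pressing Stop produces the same change, so a real control point would need extra bookkeeping. The class name and the `set_uri` and `play` callables are hypothetical wrappers around the corresponding actions.

```python
from collections import deque
from typing import Callable, Iterable, Optional


class PlaylistController:
    def __init__(
        self,
        tracks: Iterable[str],
        set_uri: Callable[[str], None],
        play: Callable[[], None],
    ):
        self._queue = deque(tracks)
        self._set_uri = set_uri
        self._play = play
        self._last_state: Optional[str] = None

    def start(self) -> bool:
        """Load and play the first queued track, if any."""
        return self._advance()

    def on_state(self, state: str) -> bool:
        """Feed each observed transport state; returns True if a new track started."""
        finished = self._last_state == "PLAYING" and state == "STOPPED"
        self._last_state = state
        return self._advance() if finished else False

    def _advance(self) -> bool:
        if not self._queue:
            return False
        self._set_uri(self._queue.popleft())
        self._play()
        return True
```

The controller is driven by whatever state source the control point already has, whether polling or event notifications, by passing each observed state to `on_state`.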
