Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
ISO/IEC 29341-9-11 defines the AVTransport v3 service, the most complex service in the UPnP AV architecture. AVTransport is responsible for controlling audio/video playback — including play, pause, stop, seek, speed control, and track management — across all types of AV devices. While ConnectionManager manages the streaming connection, AVTransport manages what happens to the content once the connection is established: how it plays, in what order tracks are presented, and how the user interacts with the playback experience.
Version 3 of AVTransport represents a major evolution from v2, introducing multi-track playlist management, gapless playback support, enhanced seek modes (including frame-accurate seeking), and improved synchronization capabilities for multi-room audio scenarios. It also formalizes the playback queue concept, allowing Control Points to build, reorder, and manipulate a queue of content items without requiring a separate ContentDirectory service on the renderer.
The AVTransport v3 architecture is built around a formal state machine with six transport states: STOPPED, PLAYING, PAUSED_PLAYBACK, PAUSED_RECORDING, RECORDING, and TRANSITIONING (a transient state between tracks when gapless playback is active). Each state defines which actions are valid. For example, Stop() is valid in all states, Play() is valid only from STOPPED, PAUSED_PLAYBACK, and TRANSITIONING, and Record() is valid only from STOPPED and PAUSED_RECORDING. An action invoked from a state that does not allow it returns error code 701 (Transition not available).
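The validity rules above can be captured as a lookup table. The following is a minimal sketch, not taken from the specification: it encodes only the three rules stated in the paragraph, and the names `apply_action`, `ALLOWED_FROM`, `TARGET`, and `TransportError`, as well as the target state chosen for each action, are assumptions made for illustration.

```python
from enum import Enum


class TransportState(Enum):
    STOPPED = "STOPPED"
    PLAYING = "PLAYING"
    PAUSED_PLAYBACK = "PAUSED_PLAYBACK"
    PAUSED_RECORDING = "PAUSED_RECORDING"
    RECORDING = "RECORDING"
    TRANSITIONING = "TRANSITIONING"


S = TransportState
ERROR_TRANSITION_NOT_AVAILABLE = 701

# States from which each action may be invoked (only the rules given in the text).
ALLOWED_FROM = {
    "Stop": set(S),
    "Play": {S.STOPPED, S.PAUSED_PLAYBACK, S.TRANSITIONING},
    "Record": {S.STOPPED, S.PAUSED_RECORDING},
}

# Assumed resulting state for each action (illustrative, not from the text).
TARGET = {"Stop": S.STOPPED, "Play": S.PLAYING, "Record": S.RECORDING}


class TransportError(Exception):
    """Hypothetical exception carrying a UPnP-style error code."""

    def __init__(self, code: int, description: str) -> None:
        super().__init__(f"{code}: {description}")
        self.code = code


def apply_action(state: TransportState, action: str) -> TransportState:
    """Return the new state, or raise error 701 if the action is not allowed."""
    if state not in ALLOWED_FROM[action]:
        raise TransportError(ERROR_TRANSITION_NOT_AVAILABLE, "Transition not available")
    return TARGET[action]
```

With this table, `apply_action(S.STOPPED, "Play")` yields `S.PLAYING`, while `apply_action(S.RECORDING, "Play")` raises a `TransportError` whose `code` is 701.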
The service manages multiple independent transport instances, each identified by an InstanceID (integer, starting from 0). Each InstanceID maintains its own complete transport state: AVTransportURI (the current content URI), TransportState, PlayMode, record quality, current track metadata (CurrentTrackMetaData), and position information (RelativeTimePosition, AbsoluteTimePosition, CurrentTrackDuration). This multi-instance design allows a single device to support multiple simultaneous playback sessions, for example picture-in-picture or multiple audio zones. InstanceIDs are dynamically allocated by the SetAVTransportURI() action and released when the transport returns to STOPPED with no next URI.
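One way to picture the per-instance bookkeeping is a registry keyed by InstanceID. This is a rough sketch under the allocation and release behavior described above; `TransportInstance`, `InstanceRegistry`, and their methods are hypothetical names, and only a few of the state variables are modeled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TransportInstance:
    """A subset of the state each InstanceID keeps for itself."""
    instance_id: int
    av_transport_uri: str = ""
    next_av_transport_uri: str = ""
    transport_state: str = "STOPPED"
    play_mode: str = "NORMAL"
    relative_time_position: str = "0:00:00"


class InstanceRegistry:
    def __init__(self) -> None:
        self._instances: Dict[int, TransportInstance] = {}

    def allocate(self, uri: str) -> TransportInstance:
        """Create an instance with the lowest free InstanceID, starting from 0."""
        instance_id = 0
        while instance_id in self._instances:
            instance_id += 1
        instance = TransportInstance(instance_id, av_transport_uri=uri)
        self._instances[instance_id] = instance
        return instance

    def release_if_idle(self, instance_id: int) -> bool:
        """Release the instance if it is STOPPED and has no next URI queued."""
        instance = self._instances[instance_id]
        if instance.transport_state == "STOPPED" and not instance.next_av_transport_uri:
            del self._instances[instance_id]
            return True
        return False
```

In this sketch `release_if_idle()` is meant to be called when a transport returns to STOPPED, not right after allocation. Changing the state of one instance leaves every other instance untouched, which is the property the multi-instance design relies on.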
| Feature | AVTransport v2 | AVTransport v3 |
|---|---|---|
| Transport states | 5 (no TRANSITIONING) | 6 (+ TRANSITIONING for gapless) |
| Seek modes | TRACK_NR, ABS_TIME, REL_TIME | + ABS_FRAME, REL_FRAME (frame-accurate) |
| Playlist management | Single track | Multi-track with NextAVTransportURI |
| Gapless playback | Not supported | Full support with pre-buffering |
| Multi-room sync | Not supported | AVTransportSyncGroup (+/-5 ms) |
| Play modes | 4 (no REPEAT_ALL_SHUFFLE) | 5 (+ REPEAT_ALL_SHUFFLE) |
| Max InstanceIDs | 1 (implicit) | Multiple, dynamically allocated |
AVTransport v3 defines a comprehensive set of transport actions organized into functional groups. Playback control: Play(), Stop(), Pause(), Next(), Previous(). Seek: Seek() with the modes TRACK_NR (track selection), ABS_TIME (absolute time position), REL_TIME (time position relative to the start of the current track), and the frame-accurate ABS_FRAME and REL_FRAME. Playlist management and status: SetAVTransportURI() and SetNextAVTransportURI() set the current and next content, while GetPositionInfo(), GetTransportInfo(), and GetTransportSettings() report the current position, state, and settings. Device capabilities: GetDeviceCapabilities() returns supported play modes, seek modes, and record quality modes.
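To make the Seek() action concrete, the sketch below builds the SOAP body a Control Point might send. It is illustrative only: the service type URN with version 3 is assumed from the standard UPnP naming pattern, `build_seek_request` is a hypothetical helper, and the HTTP layer (control URL, SOAPACTION header) is omitted.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENCODING = "http://schemas.xmlsoap.org/soap/encoding/"
# Assumed service type URN for AVTransport v3.
SERVICE_TYPE = "urn:schemas-upnp-org:service:AVTransport:3"

# Seek modes listed in the text.
SEEK_MODES = {"TRACK_NR", "ABS_TIME", "REL_TIME", "ABS_FRAME", "REL_FRAME"}


def build_seek_request(instance_id: int, unit: str, target: str) -> str:
    """Return a SOAP envelope for Seek(InstanceID, Unit, Target) as a string."""
    if unit not in SEEK_MODES:
        raise ValueError(f"unsupported seek mode: {unit}")
    ET.register_namespace("s", SOAP_NS)
    ET.register_namespace("u", SERVICE_TYPE)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    envelope.set(f"{{{SOAP_NS}}}encodingStyle", SOAP_ENCODING)
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    action = ET.SubElement(body, f"{{{SERVICE_TYPE}}}Seek")
    # Arguments are unqualified child elements of the action element.
    for name, value in (("InstanceID", str(instance_id)), ("Unit", unit), ("Target", target)):
        ET.SubElement(action, name).text = value
    return ET.tostring(envelope, encoding="unicode")
```

For example, `build_seek_request(0, "REL_TIME", "0:01:30")` asks instance 0 to jump to one and a half minutes into the current track.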
The gapless playback feature in v3 uses the NextAVTransportURI mechanism. When a Control Point calls SetNextAVTransportURI() while content is playing, the service pre-buffers the next track. When the current track reaches its end, the transport transitions through TRANSITIONING (typically 0-500 ms, depending on buffering) and automatically starts playing the next track. The SetNextAVTransportURI() action returns error 705 if the next URI cannot be decoded. For seamless looping, the Control Point can set NextAVTransportURI equal to AVTransportURI before playback ends.
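The hand-off at the end of a track can be sketched as a small state update. This is a simplified model of the behavior described above, not an implementation of pre-buffering or decoding; `GaplessTransport` and `on_track_end` are hypothetical names, and clearing the next URI after promotion is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GaplessTransport:
    av_transport_uri: str = ""
    next_av_transport_uri: str = ""
    transport_state: str = "STOPPED"
    history: List[str] = field(default_factory=list)  # states entered, in order

    def _enter(self, state: str) -> None:
        self.transport_state = state
        self.history.append(state)

    def on_track_end(self) -> None:
        """Called when the current track finishes playing."""
        if not self.next_av_transport_uri:
            self._enter("STOPPED")
            return
        # Pass through the transient state, promote the queued URI, resume playing.
        self._enter("TRANSITIONING")
        self.av_transport_uri = self.next_av_transport_uri
        self.next_av_transport_uri = ""  # assumption: the slot is emptied once used
        self._enter("PLAYING")
```

A Control Point that wants continuous playback would refill `next_av_transport_uri` (via SetNextAVTransportURI()) each time the promotion happens. Setting it to the current URI gives the looping behavior mentioned above.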
PlayMode control in v3 supports five standard modes: NORMAL (sequential playback, stop at end), REPEAT_ONE (loop current track), REPEAT_ALL (loop entire playlist), SHUFFLE (randomized playback order), and REPEAT_ALL_SHUFFLE (shuffle with repeat). The CurrentTrackURI and CurrentTrackMetaData state variables update automatically as the transport moves through tracks. The NumberOfTracks state variable indicates total playlist size, while CurrentTrack indicates the active position (1-based index).
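How the play mode drives the 1-based CurrentTrack index can be shown with two small helpers. This is a sketch of one plausible interpretation: `next_track` and `shuffled_order` are hypothetical names, and treating the shuffle modes as a random permutation of the playlist is an assumption.

```python
import random
from typing import List, Optional


def next_track(current: int, number_of_tracks: int, play_mode: str) -> Optional[int]:
    """Return the next 1-based track index, or None when playback should stop."""
    if number_of_tracks < 1 or not 1 <= current <= number_of_tracks:
        raise ValueError("current must be within 1..number_of_tracks")
    if play_mode == "REPEAT_ONE":
        return current
    if play_mode == "NORMAL":
        return current + 1 if current < number_of_tracks else None
    if play_mode == "REPEAT_ALL":
        return current % number_of_tracks + 1  # wrap from the last track to track 1
    raise ValueError(f"use shuffled_order() for play mode {play_mode}")


def shuffled_order(number_of_tracks: int, rng: random.Random) -> List[int]:
    """Return one randomized pass over all tracks (SHUFFLE or REPEAT_ALL_SHUFFLE)."""
    order = list(range(1, number_of_tracks + 1))
    rng.shuffle(order)
    return order
```

Under this interpretation, SHUFFLE plays one permutation and stops, while REPEAT_ALL_SHUFFLE draws a fresh permutation after each pass.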
Implementing AVTransport v3 correctly requires rigorous state machine management. The transport state machine must be thread-safe because multiple Control Points and internal events (track completion, buffer underrun) can trigger state transitions concurrently. The recommended implementation pattern is a single-threaded event loop with a state transition queue: actions enqueue state change requests, the event loop processes them sequentially, and events are sent for each completed transition. This avoids race conditions without requiring fine-grained locking.
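The queue-based pattern described above might look like the following minimal sketch. `TransportLoop` is a hypothetical class; validation of transitions and actual UPnP eventing are left out, and the `on_event` callback stands in for sending an event notification.

```python
import queue
import threading
from typing import Callable, Optional


class TransportLoop:
    """Applies state change requests one at a time on a single worker thread."""

    def __init__(self, on_event: Callable[[str, str], None]) -> None:
        self.state = "STOPPED"
        self._on_event = on_event
        self._requests: "queue.Queue[Optional[str]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def request(self, new_state: str) -> None:
        """Safe to call from any thread: only enqueues, never touches state."""
        self._requests.put(new_state)

    def close(self) -> None:
        """Process everything already queued, then stop the worker."""
        self._requests.put(None)  # sentinel
        self._worker.join()

    def _run(self) -> None:
        while True:
            new_state = self._requests.get()
            if new_state is None:
                return
            old_state, self.state = self.state, new_state
            self._on_event(old_state, new_state)  # one event per completed transition
```

Because only the worker thread ever writes `state`, callers need no lock of their own. The queue is the single synchronization point.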
Position tracking performance is critical for user experience. The RelativeTimePosition and AbsoluteTimePosition state variables must be updated at least once per second (the UPnP AV moderation guideline). Implementations should use a high-resolution timer (microsecond precision) for the underlying time base but format the position as H:MM:SS.F (hours:minutes:seconds.fractions, where fractions is 1/10 second by default or 1/100 second if DLNA.ORG_PARMAP indicates higher precision). The GetPositionInfo() action should respond within 50 ms to maintain responsive seek bar rendering on Control Points.
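Converting a high-resolution time base into the H:MM:SS.F string is simple integer arithmetic. The helper below is a sketch: `format_position` is a hypothetical name, and truncating (not rounding) the fraction is a choice made here, not something the text prescribes.

```python
def format_position(microseconds: int, fraction_digits: int = 1) -> str:
    """Format a position in microseconds as H:MM:SS.F (tenths) or H:MM:SS.FF (hundredths)."""
    if microseconds < 0:
        raise ValueError("position must be non-negative")
    if fraction_digits not in (1, 2):
        raise ValueError("fraction_digits must be 1 or 2")
    per_second = 10 ** fraction_digits
    # Work in whole fraction units (tenths or hundredths) to avoid float error.
    total_units = microseconds * per_second // 1_000_000
    whole_seconds, fraction = divmod(total_units, per_second)
    total_minutes, seconds = divmod(whole_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{fraction:0{fraction_digits}d}"
```

For a position of 3,723.4 seconds, `format_position(3_723_400_000)` returns `"1:02:03.4"`, and `format_position(3_723_400_000, fraction_digits=2)` returns `"1:02:03.40"`.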
For multi-room audio synchronization, v3 introduces the AVTransportSyncGroup concept. Devices within the same sync group share a common clock reference and coordinate playback timing to within +/-5 ms. The GroupID and GroupCoordinatorID state variables identify the sync group, while the GroupPlaybackMode determines whether all devices play the same content (SAME) or different tracks from a shared playlist (DISTINCT). Implementation of this feature requires network time protocol (NTP or IEEE 1588 PTP) support at the OS level and careful audio buffer management to compensate for network jitter.
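The timing side of a sync group can be illustrated with two small calculations: translating a start time on the shared clock into a device's local clock, and checking that reported positions stay within the +/-5 ms window. This is only a sketch; `local_start_time`, `within_tolerance`, and the offset convention (local clock minus group clock) are assumptions, and real clock offsets would come from NTP or PTP.

```python
from typing import Dict

TOLERANCE_MS = 5.0  # sync window stated in the text


def local_start_time(group_start_ms: float, clock_offset_ms: float) -> float:
    """Convert a start time on the shared group clock to a device's local clock.

    clock_offset_ms is assumed to be (local clock - group clock).
    """
    return group_start_ms + clock_offset_ms


def within_tolerance(
    positions_ms: Dict[str, float],
    coordinator_id: str,
    tolerance_ms: float = TOLERANCE_MS,
) -> bool:
    """True if every device's position is within tolerance of the coordinator's.

    positions_ms maps device IDs to positions sampled at the same instant
    on the shared clock.
    """
    reference = positions_ms[coordinator_id]
    return all(abs(pos - reference) <= tolerance_ms for pos in positions_ms.values())
```

A device that drifts outside the window would need to adjust its buffer or playback rate. How it does so is outside the scope of this sketch.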