Minutes of the Audio/Video Transport Working Group

Reported by Steve Casner

1.  Introduction and status

The AVT working group had not planned to meet in Munich during a
period of dormancy while implementation and testing of RTCP scaling
mechanisms were underway.  However, in response to several requests,
one AVT session was scheduled to discuss a new problem with the RTCP
scaling plus open issues for several new payload formats that were
submitted since the last meeting.

The next goal for AVT is to get RFCs 1889 and 1890, the Real-time
Transport Protocol and the companion RTP profile for audio/video
conferencing, revised for advancement to Draft Standard by year's end.
The profile has been revised in draft-ietf-avt-new-profile-01.txt and
.ps.  The main RTP specification needs to incorporate RTCP scaling
changes that are still being refined; however, an interim first draft
of the other revisions, including extensions for layered encodings,
should be published as soon as possible.

As of the Munich meeting, the RTP payload format for H.263 video had
been approved for publication as a Proposed Standard, and the payload
format for redundant audio was close to approval.  (Since the meeting,
both have been published as RFCs 2190 and 2198.)  The drafts on
compression of IP/UDP/RTP headers and a revision to the RTP payload
format for MPEG2 in RFC 2038 have been posted for IESG Last Call
before publication as Proposed Standards.


2.  RTCP "timer reconsideration"

Jonathan Rosenberg gave a brief review of the problem that a flood of
RTCP packets can occur if a large number of participants
simultaneously join a session.  This problem and the solution of RTCP
"timer reconsideration" have been discussed at the previous two IETF
meetings and are explained in the recently posted Internet-Draft
draft-ietf-avt-reconsider-00.ps.  However, a concern raised at the
last meeting was that the timer reconsideration algorithm exhibits a
"plateau effect" wherein no RTCP packets are sent for a period after
the algorithm stops the flood.  New simulations show that the plateau
effect is significantly reduced as the joins are spread out in time
and disappears when the join rate is no more than 500 users/sec.
Since perfectly synchronized joins are very unlikely, the plateau
effect is not considered to be a problem.
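
The reconsideration step itself can be sketched as follows.  This is
a minimal illustration, assuming that a participant whose timer
expires recomputes its interval from the current group size estimate
and sends only if that interval has fully elapsed since its last
packet.  The function names and the per-member constant are
hypothetical, and randomization and bandwidth terms are left out; the
Internet-Draft remains the authoritative description.

```python
def rtcp_interval(members, per_member_interval=0.5, minimum=5.0):
    """Interval that grows linearly with the group size estimate.

    Assumption: per_member_interval is an illustrative constant.  Real
    RTCP intervals also depend on session bandwidth and packet size
    and are randomized; those details are omitted here.
    """
    return max(minimum, members * per_member_interval)


def on_timer_expiry(now, last_sent, members):
    """Decide what to do when the RTCP timer fires.

    Returns ("send", next_time) if the packet may go out now, or
    ("wait", next_time) if the timer must be rescheduled because the
    group size estimate grew while this participant was waiting.
    """
    interval = rtcp_interval(members)
    if last_sent + interval <= now:
        return ("send", now + interval)
    return ("wait", last_sent + interval)
```

A participant that scheduled its first packet when it believed it was
alone, and then hears from 1000 others before its timer fires, would
get ("wait", 500.0) from on_timer_expiry(5.0, 0.0, 1000) instead of
contributing to the flood.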

There is an analogous problem of an RTCP BYE packet flood on
simultaneous leaves.  Unlike the initial RTCP packets at the time of
joining, BYE packets can't be delayed because the application is
terminating.  Some partial remedies were discussed at the last
meeting.

The newly discovered problem is a relatively minor one related to the
BYE flood: if many participants leave a session at once, other
participants that remain in the session may be falsely timed out.
Some of the remaining participants will have their RTCP interval
expire earlier than others.  These participants will reduce their
estimate of the number of participants and consequently also reduce
their RTCP interval.  Since the remaining participants may still have
a long time to wait before their previously calculated RTCP interval
expires, they might not send any RTCP packets while multiple shorter
RTCP intervals elapse for the participants that have noticed the drop.
The long-waiting participants will therefore be timed out (considered
inactive) by those that have adjusted to shorter intervals.  Even if
the BYE flood is slowed to the normal 5% RTCP bandwidth, simulations
show that a single missed packet can cause a timeout.

An extension to the timer reconsideration algorithm, dubbed "reverse
reconsideration", recomputes the RTCP timer whenever the group size
estimate decreases (due to a BYE).  This significantly reduces the
amount of time during which the group size estimate may be wrong, but
there is still a problem that the estimate can drop to zero.  This
problem can be eliminated, as suggested in an INFOCOM97 paper by
Sharma, Estrin, Floyd and Jacobson on a similar problem, using a
filter to slow the rate of decrease of the estimate.  Further analysis
to determine the right kind of filter is underway.
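
One way the recomputation might work is to shrink the pending wait in
proportion to the drop in the estimate.  The sketch below assumes
that proportional rule, which these minutes do not spell out; the
function and parameter names are hypothetical, and no smoothing
filter is applied.

```python
def reverse_reconsider(now, next_send, last_sent, old_members, new_members):
    """Rescale RTCP timer state after the group size estimate decreases.

    Assumption: both the remaining wait and the time elapsed since the
    last packet are scaled by new_members / old_members.
    Returns the pair (new_next_send, new_last_sent).
    """
    if new_members >= old_members:
        # Nothing to do unless the group actually shrank.
        return next_send, last_sent
    ratio = new_members / old_members
    new_next = now + ratio * (next_send - now)
    new_last = now - ratio * (now - last_sent)
    return new_next, new_last
```

For example, a participant at time 100 with a packet scheduled for
time 600 that sees the estimate fall from 1000 to 10 would move its
next packet to about time 105.  With an estimate of zero the wait
collapses entirely, which is the case a filter on the estimate is
meant to avoid.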

Jon Crowcroft suggested keeping a bit of history from interval to
interval and continuously estimating the group size to use as a
predictor.  However, a predictor might not work well with unexpected
sharp transitions in group size.  He also suggested that the BYE flood
might be managed by having participants send BYEs with a limited scope
and then having the participants that remain act as proxies to
retransmit the BYEs spread out in time.  A proxy election protocol
would be needed.  This starts to get pretty complicated.

Dave Oran suggested that there could be a server thread that takes
responsibility for sending a delayed BYE after the application is
closed, just as TCP implementations must save some state when a
connection closes to shut down gracefully.  Steve Casner responded
that if some implementations can do this, great, but we can't assume
all will.  Similarly, it is important to note that although we want to
define a timer reconsideration algorithm to go in the appendix of the
RTP spec, different implementations can implement different timer
reconsideration algorithms and the whole system will still work.

The group was asked if there were any objections to the proposal to
add the timer reconsideration algorithm to the RTP spec.  There were
none.


2.1 Large-scale tests of the RTCP scaling mechanisms

The timer reconsideration algorithm has been simulated, but we would
have more confidence if it were tested on a large scale with real
implementations.  If that is not practical, testing on a large-scale
simulation/emulation testbed network would be a good backup for the
initial simulations.  The audience was queried about any planned
tests, but no plans were reported at this time.

Jonathan Rosenberg is implementing a test program called rtpbomb that
can emulate a large number of participants.  It may be used in
combination with real participants to test timer reconsideration and other
aspects of RTCP scaling.  His analysis suggests that most of the
phenomena in the algorithm are linear with group size, so a smaller
test should still be useful to predict behavior of a larger group.


3. Other RTP Issues/Questions

Steve Casner brought up three other issues for the group to consider,
but there were few comments:

  - During the discussion of the global multicast address allocation
    scheme in MBONED working group, Van Jacobson asserted that
    applications should not depend upon sequential multicast addresses
    being allocated for multiple groups carrying the different layers
    of a layered coding scheme.  The layered encoding extension to
    RTP proposed by Speer and McCanne does assume sequential
    addresses, so a revision may be necessary.

  - AVT has received a Request for Proposals from the DAVIC group.  It
    would be difficult for the group as a whole to respond, but
    individual participants are invited to do so.

  - Should AVT set a policy of allocating no more static payload
    types such that dynamic payload types should be used instead?

Scott Petrack responded to the last point to say that we should
consider none of the assigned payload types to be static, but rather
default assignments.  When a session control protocol is in use, if
there is a need for more than the 32 dynamic payload types already set
aside, the default values could be reassigned.  The group seemed to be
in general agreement with this proposal, but it should be discussed on
the mailing list as well.


4. New RTP payload format proposals

It is expected that additional payload formats for RTP will continue
to be developed as new encodings are developed and as RTP is employed
in new applications.  At this meeting, payload formats were introduced
for H.263+ and BT-656 video, MPEG4 and QuickTime multimedia, and DTMF
tone signalling in audio.  In addition, "meta" payload formats that
specify methods for repair of packet losses were discussed, including
the question of how much error correction is appropriate.

4.1 H.263+ video payload format

As noted in the introduction, the payload format specification for
H.263 video has been published as a Proposed Standard.  However,
enhancement of the encoding itself continues under the label H.263+
for improved loss resilience and to utilize increased processing power
to achieve higher quality.  As agreed in previous AVT meetings, a
separate RTP payload format is to be devised to support the enhanced
encoding.  In fact, two different approaches were proposed in draft
payload format specifications sent to the AVT mailing list just before
the meeting.  Stephan Wenger, who is the primary author of one and a
co-author on the other, gave an overview of the H.263+ encoding
enhancements and the tradeoffs between the two payload formats.

The first approach, from TU-Berlin and U-Bremen, uses a very simple
payload header but includes the H.263+ picture header in each packet
except for "follow-on packets" when app-level fragmentation is done.
This allows for the use of almost all of the optional modes defined in
the H.263+ encoding scheme.  The second approach, from Intel, follows
more the design of the existing H.263 payload format in which selected
fields from the picture header are incorporated into a set of payload
format headers for different modes.  This approach reduces overhead to
support small packet sizes but precludes use of some encoding options
deemed to be rarely used.

Wenger stated the intention of the authors of both approaches to
resolve the differences and produce one merged proposal.  He suggested
that the merged proposal might include both approaches.  Steve Casner
countered that this might not be a good result because it increases
complexity and reduces interoperability -- putting in all the options
when a decision can't be made is often the result of a committee
design.  Christian Huitema commented that some encoding functions were
left out when the H.261 payload format was designed because they were
not applicable to packet transport, and asked if a similar analysis
could be made here.  Wenger replied that H.263+ has been designed to
work with packet loss and that some of the optional modes were
added expressly to deal with packet loss.

Wenger requested input soon from the working group regarding the
tradeoffs between the two approaches because the recommendations for
mode combinations to use are being prepared now by the ad-hoc group he
leads.  The two specifications should be posted as Internet-Drafts for
wider exposure.

4.2 MPEG4 payload format

There is no draft payload format specification for MPEG4 yet because
Gerard Fernando would like feedback on some design questions first.
He gave a brief overview of the scope and structure of MPEG4, which is
a framework for integrating different kinds of natural and synthetic
media streams built around audio/visual objects called AVIOs.  There
are two primary questions:

  - How can the MPEG4 payload format refer to existing and forthcoming
    payload formats for the encodings of individual media streams
    (AVIOs) rather than trying to redo them all?

  - How should the "scene description data", which composes the
    individual AVIOs into the complete presentation, be transmitted?
    If this data is lost, the whole scene is lost, so adequate loss
    resilience mechanisms must be employed.  Can the scene description
    data be decomposed over both time and "space" -- time because a
    transmission may be joined after the start, and space because
    media are sent in separate streams and some receivers might not
    tune into all of them?

MPEG4 is still being developed, with standardization due in January
1999.  The MPEG4 committee would like to work with AVT to ensure that
packetization considerations are included in the design; they view
RTP/UDP multiplexing as an appropriate means of multiplexing
elementary streams.  This is clearly beneficial from AVT's point of
view as well.  One aspect of the use of separate media streams in RTP
is the ability to apply different network QoS and reliability
mechanisms as needed; the VMIF part of MPEG4 is trying to formalize
the selection of stream-specific QoS in a transport-independent and
network-independent way.  Fernando and Casner are to explore what
formal or informal liaison procedures should be followed in this case,
both for information transfer and to enable reference to RTP in the
MPEG4 specification.

4.3 BT-656 video payload format

Dermot Tynan presented a proposal, draft-tynan-rtp-bt656-00.txt, for
carrying ITU-R BT.656-3 uncompressed video over RTP.  BT.656 is
studio-quality digital video sampled according to BT.601-5 (formerly
CCIR601) at 13.5 or 18 MHz.  At the normal, lower rate, each scan line
contains 720 samples occupying 1440 bytes in the 4:2:2 chrominance
encoding.  At the "high definition" rate, each line contains 1144
samples for NTSC or 1152 samples for PAL.  The payload format consists
of a simple header with bit fields to indicate NTSC/PAL, sampling
rate, framing and scan line information, followed by one scan line of
samples.

Steve Casner expressed the concern that the packet size might exceed
the MTU for some networks.  Although at the lower sampling rate an
IP/UDP/RTP packet will fit within a 1500-byte MTU, at the higher
sampling rate it would not.  Don Hoffman pointed out that if the
packet size does exceed the MTU such that IP fragmentation occurs,
integrated services packet classification won't work on fragments
other than the first, so those packets won't get the desired QoS.
Christian Huitema
claimed it is important to have application-level fragmentation so
that services such as forward error correction will apply when a
fragment is lost.  It should not cost much to be able to indicate that
a packet contained part of a scan line, perhaps by giving the starting
sample number.  Tynan agreed to consider this addition.
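
The packet size concern can be checked with a little arithmetic.  The
sketch below assumes IPv4 without options, a fixed 12-byte RTP
header, and a 4-byte payload header; the last figure is a placeholder,
since the size of the draft's payload header is not given in these
minutes.  The fragment_line helper is a hypothetical illustration of
splitting a scan line by starting sample number.

```python
IP_HEADER = 20         # IPv4 header without options
UDP_HEADER = 8
RTP_HEADER = 12        # fixed RTP header, no CSRC list or extension
BYTES_PER_SAMPLE = 2   # 4:2:2 encoding: 720 samples occupy 1440 bytes


def packet_size(samples_per_line, payload_header=4):
    """Bytes in an IP packet carrying one complete scan line.

    Assumption: payload_header=4 is a placeholder value.
    """
    return (IP_HEADER + UDP_HEADER + RTP_HEADER + payload_header
            + samples_per_line * BYTES_PER_SAMPLE)


def fragment_line(samples_per_line, mtu=1500, payload_header=4):
    """Split a scan line into (starting_sample, sample_count) pieces
    that each fit in one packet of at most `mtu` bytes."""
    room = mtu - (IP_HEADER + UDP_HEADER + RTP_HEADER + payload_header)
    per_packet = room // BYTES_PER_SAMPLE
    # Keep fragments on an even sample boundary so that a pair of
    # samples sharing chrominance is not split (an assumed preference).
    per_packet -= per_packet % 2
    return [(start, min(per_packet, samples_per_line - start))
            for start in range(0, samples_per_line, per_packet)]
```

Under these assumptions a 720-sample line yields a 1484-byte packet,
while the 1144-sample and 1152-sample lines yield 2332 and 2348
bytes, so each of the latter needs two fragments.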

4.4 QuickTime payload format

Alagu Periyannan asked only to call the working group's attention to
the recently posted draft "RTP Payload Format for QuickTime Media
Streams", draft-ietf-avt-qt-rtp-00.txt, and in particular to the list
of open issues in section 4 of that draft.  The motivation in
developing this format was to carry all the payloads in QuickTime
without having to define an RTP payload format for each.

Steve Casner pointed out that the tradeoff of this technique is that
there is some amount of additional constant description info that must
be carried in each packet but would not be needed with separate
formats.  Philipp Hoschka asked if it wouldn't be better to define
individual payload formats because the encodings might also be used
outside of QuickTime.  Periyannan replied that where
individual payload formats are defined, they should be used in
preference to this format.  However, several of the encodings used
with QuickTime are not standardized, such as Apple Video, Cinepak, and
several proprietary codecs.

4.5 DTMF audio payload format

Jonathan Rosenberg presented Henning Schulzrinne's proposed audio
payload format to carry DTMF (tone dialing) signals as defined in
draft-ietf-avt-dtmf-00.txt and .ps.  This payload format might be used
for calling across the Internet through IP telephony gateways to
control an answering machine or other device.  Low data rate speech
codecs may not reproduce the DTMF tones faithfully enough to work
properly at the far end.  This payload format essentially provides a
very low rate encoding specialized for DTMF tones.

The draft defines a primary format in which each DTMF digit is
represented by 32 bits to control frequency, amplitude and duration.
Redundancy can be provided using the mechanism specified in RFC 2198.
This introduces 64 bits of overhead, but the data rate is so low that
this probably does not matter.  Alternatively, a more compact
representation of each digit would allow adding redundancy within the
DTMF payload format and still fitting each digit into 32 bits.  The
selection between these techniques is an open issue on which input
from the working group is sought.  Steve Casner noted that the more
compact format would require the RTP clock rate to be different than
that of the normal audio payload format within which the DTMF packets
are interspersed, and suggested that this is a significant enough
disadvantage to prefer the larger format.  Scott Petrack agreed that
the size difference was probably not significant.

4.6 Forward Error Correction payload format

Two "meta" payload formats on forward error correction (that is,
independent of media type and format) have recently been posted:
draft-budge-media-error-correction-00.txt by Budge, et al., and
draft-ietf-avt-fec-00.txt by Rosenberg and Schulzrinne.  The authors
of the first draft did not attend, but Jonathan Rosenberg presented
the second draft which builds on ideas from the first, and compared
the two drafts.  Both schemes are based on the idea of sending
additional FEC packets which are the XOR of multiple packets from the
original packet stream.
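
The XOR idea common to both schemes can be illustrated with a short
sketch.  It operates on bare payloads only; the function names are
hypothetical, and neither draft's header layout is modelled.

```python
def xor_bytes(blocks):
    """XOR a list of byte strings, treating shorter ones as if they
    were padded with zero bytes at the end."""
    length = max(len(block) for block in blocks)
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover(received, fec):
    """Rebuild the one missing payload from the payloads that did
    arrive plus the FEC payload covering the whole group."""
    return xor_bytes(list(received) + [fec])
```

Given three payloads and their XOR, any one of the three can be
rebuilt from the other two; a second loss within the same group
cannot be repaired.  If payload lengths differ, the recovered payload
carries trailing zero padding, so the original length would need to
be protected separately.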

It should be noted that the scheme presented by Rosenberg differs from
that described in the second draft.  An RTP header extension (X bit)
is no longer used; instead, the payload type is changed to indicate an
FEC packet, and an additional format-specific header is inserted
before the XOR of the covered payloads, as in the Budge scheme.

One drawback of the Budge draft was that the timestamp and marker bit
of lost packets could not always be recovered.  The proposed remedy is
for these fields in the FEC packets to be the XOR of the corresponding
fields from the original packets covered by the FEC packet.  This has
its own drawback that the timestamp will vary erratically, perturbing
the jitter feedback calculation unless the FEC packets are excluded.
Carsten Bormann also pointed out that when RTP header fields are
perturbed from their usual increments, it can have a negative impact
on the efficiency of RTP header compression.

A second drawback of the Budge scheme was that the payload type was
changed in the original packets to be the FEC payload type, which means
that receivers without FEC capability could not receive just the
original packets and ignore the FEC packets.  In Rosenberg's scheme,
the original packets are unmodified, including the payload type.

To extend the range of packet patterns that could be covered by an FEC
without having to predefine specific schemes, Rosenberg proposes to
carry a bit mask to indicate the pattern explicitly.  The RTP sequence
number of FEC packets in Rosenberg's proposal is the minimum of the
sequence numbers of the packets covered by the FEC to serve as a base
for the mask.  Steve Casner claimed that this is not acceptable
because it will prevent the FEC packets from passing header validation
and duplicate suppression algorithms in RTP packet processing.
Rosenberg said alternative sequence number schemes had been
considered, but the cost is additional overhead.
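
A bit mask relative to a base sequence number might be built and
interpreted as in the sketch below.  The bit order (bit i stands for
packet base+i) and the unbounded mask width are assumptions, as the
minutes give neither, and sequence number wraparound is ignored.

```python
def make_mask(covered_seqs):
    """Return (base, mask) for the given sequence numbers, where base
    is the minimum and bit i of mask is set if base+i is covered."""
    base = min(covered_seqs)
    mask = 0
    for seq in covered_seqs:
        mask |= 1 << (seq - base)
    return base, mask


def covered_from_mask(base, mask):
    """List the sequence numbers that a (base, mask) pair covers."""
    seqs = []
    offset = 0
    while mask:
        if mask & 1:
            seqs.append(base + offset)
        mask >>= 1
        offset += 1
    return seqs
```

For instance, an FEC packet covering sequence numbers 100, 101 and
103 would carry base 100 and mask 0b1011.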

Christian Huitema argued that if a general FEC payload format is to be
defined, it should support other schemes with better performance and
only a marginal increase in complexity compared to that of "n+1"
schemes like XOR.  For example, it is possible to specify a scheme
that adds two FEC packets to eight original packets which will allow
recovery from a loss of any two of the ten packets.  This is important
because simulations have shown that one packet of redundancy was not
enough.  Rosenberg agreed that the FEC proposals need further
refinement.

4.7 Applicability of error correction

In addition to the redundant audio and FEC payload formats already
mentioned, retransmission and interleaving are two more loss
resilience schemes that might be employed with RTP.  Colin Perkins
reviewed these four methods to compare the tradeoffs in latency,
bandwidth overhead and processing overhead (details are available at
http://www.cs.ucl.ac.uk/staff/c.perkins/slides/).  It is an important
question to consider when these schemes are applicable to particular
applications or to particular network conditions.

Colin presented some network loss statistics showing that single
packet losses predominated by a factor of four over two-packet loss
bursts.  The probability drops rapidly such that long burst losses are
rare.  On the other hand, in a large conference, most packets will be
lost by at least one receiver; therefore a retransmission scheme may
need to resend almost every packet.
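
The large-conference observation follows from simple probability.
The sketch below assumes that each receiver loses a given packet
independently with the same probability, which is a simplification
(receivers behind a shared congested link lose packets together), and
the 2% loss rate used in the example is an illustrative figure, not
one from the presentation.

```python
def prob_lost_by_someone(loss_rate, receivers):
    """Probability that at least one of `receivers` misses a packet,
    assuming independent losses at rate `loss_rate` per receiver."""
    return 1.0 - (1.0 - loss_rate) ** receivers
```

With a 2% loss rate, a single receiver misses 2% of the packets, but
among 500 receivers the chance that somebody misses a given packet
exceeds 99%.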

The redundant transmission scheme has low latency and low bandwidth
overhead, but potentially high processor overhead if the low-rate
redundant encoding is computationally complex.  A retransmission
scheme is likely to impose much longer delay and will incur the
overhead of control traffic in addition to the duplicate data, but
provides exact repair and can correct more than single-packet losses.
There is a synergy between redundancy and retransmission in that
redundancy can cover most of the errors, leaving retransmission to
pick up the remainder for those receivers that care more about low
loss than low delay.

Interleaving disperses the effects of loss, but does not eliminate
them.  It has low overhead, but high latency.  FEC may have a
similarly high latency or a high bandwidth overhead, depending on the
size of the pattern covered, but its major advantage is media and
format independence.  For interactive applications, either redundant
transmission or a low-latency FEC would seem most suitable, while for
broadcast-style applications interleaving works well.
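
How interleaving disperses a loss burst can be seen in a small
sketch.  It assumes a simple block interleaver over a fixed number of
media units; the function names and the block size in the example are
hypothetical.

```python
def interleave(units, depth):
    """Reorder units so that units adjacent in the original order are
    transmitted several slots apart."""
    return [units[i] for start in range(depth)
            for i in range(start, len(units), depth)]


def deinterleave(sent, depth):
    """Restore the original order at the receiver."""
    order = interleave(list(range(len(sent))), depth)
    out = [None] * len(sent)
    for slot, original_index in enumerate(order):
        out[original_index] = sent[slot]
    return out
```

With twelve units and a depth of four, the transmission order is
0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11.  A burst that wipes out the
fourth through sixth transmitted units removes units 1, 5 and 9,
leaving three isolated gaps rather than one long one.  The receiver
must hold the whole block before it can restore the order, which is
the source of the added latency.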

An important open question is what constitutes a sensible operating
point for real-time media transmission?  How much loss should
applications try to cope with before declaring that the user should
try again at some less congested time?  There is little congestion
control in many real-time media applications, and extreme loss
resilience measures would just worsen congestion.  That would not be
network friendly.  If fair congestion avoidance mechanisms are
deployed in routers, then applications that don't implement congestion
control may be penalized.  Therefore, any loss resilience schemes that
are defined for RTP should consider not just loss performance but also
impact on the network.

5. Revision of RTP MIB specification

The final topic of the meeting was an update by Mark Baugher on the
RTP MIB design and implementation plans.  An interim draft of the MIB
was sent to the mailing list before the meeting to provide an
opportunity for feedback; a real Internet-Draft will be posted in
October.  A more complete description of the MIB was given in Memphis;
the changes since then were to fix errors identified by Fred Baker in
his review of the MIB, to add reporting of receiver feedback, to
support RTP operation over different underlying protocols, and to
remove from the specification those parts that won't be included in
initial implementations and hence won't be validated yet.  In
particular, support for RTP translators was removed.

Receiver feedback is reported through additional tables that are
created only upon request from the network manager as a side-effect of
creating an entry in the Session Table.  This avoids the state
explosion that might occur if all receiver feedback was always
monitored for all sessions.

There are a few details remaining to be worked out, but implementation
of a network management application using the RTP MIB is underway to
evaluate how well this MIB works for managing real-time applications
on networks that weren't designed for them.  To make this really
workable, the management application needs to handle multicast routing
in addition to RTP.  The next stage of MIB development will be based
on the results of the implementation.

6.  Next meeting

Steve Casner closed the meeting saying that AVT will meet again at the
next IETF in December.  As stated above, a goal for the working group
is to get the RTP spec revised for Draft Standard by then.  In
addition, there is outstanding work on several topics presented at
this meeting which should be discussed on the mailing list so that
completed drafts are ready by the next meeting.