CURRENT_MEETING_REPORT_


Reported by Jeffrey Mogul/DEC

AGENDA

     (a) Report on current draft (McCloghrie/Fox/Mogul)
     (b) Review other alternatives
     (c) Review goals and assumptions
     (d) Obtain consensus on approach
     (e) Focus on details
     (f) What next?

MINUTES

This was the second meeting of the MTU Discovery Working Group.

We started with a quick presentation by Keith McCloghrie of the draft
that he and Rich Fox wrote based on the apparent consensus of the
December meeting.  Some attendees had not read the draft, and we tried
to ensure that everyone understood the basic outline.  [Summary:
senders occasionally attach an IP PMTU-Query Option to their datagrams.
Routers update the PMTU value in the option; the last-hop router returns
the PMTU to the sender using the ICMP Path-MTU message.  If the
destination host detects a change in the MTU (when a fragment is
received), it sends an ICMP Unexpected Fragment Report message.]
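
The router behavior in this summary can be sketched as follows.  This is
only an illustration: the minutes do not give the option format, so the
function name, its arguments, and the "take the minimum at each hop"
rule are assumptions about how a router would update the PMTU value.

```python
def query_path_mtu(first_hop_mtu, link_mtus):
    """Simulate one PMTU-Query option travelling along a path.

    first_hop_mtu: MTU of the sender's own interface (assumed to be the
                   initial value placed in the option).
    link_mtus:     MTUs of the links each router forwards onto, in order.
    Returns the value the last-hop router would report to the sender.
    """
    pmtu = first_hop_mtu
    for mtu in link_mtus:
        # Assumed rule: a router lowers the recorded value whenever its
        # outgoing link is smaller than anything seen so far.
        pmtu = min(pmtu, mtu)
    return pmtu


# Hypothetical path: Ethernet, then a 576-byte link, then a 1006-byte link.
print(query_path_mtu(1500, [1500, 576, 1006]))  # prints 576
```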

We also reviewed the "Steve Deering" proposal from last year, as there
was a realization that it might not be dead, after all.  Among other
things, we now know that there are not 1 but 4 spare bits in the IP
header (there are 3 unused in the TOS field), and that the powers that
be might therefore be likely to let us use one.  [Summary of Deering
proposal:  senders often send datagrams with "RF" (Report Fragmentation)
bit set in the IP header.  A host receiving fragment-0 of a datagram
with RF set sends an ICMP Fragmentation Occurred message.]
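
The receiving host's decision under the Deering proposal might look
like the sketch below.  The field names and the Datagram type are
hypothetical; the only rule taken from the summary is "fragment-0 with
RF set triggers the ICMP".  Fragment-0 is assumed to mean offset zero
with the More Fragments flag set.

```python
from dataclasses import dataclass


@dataclass
class Datagram:
    rf: bool              # proposed Report Fragmentation bit (hypothetical field)
    more_fragments: bool  # IP "More Fragments" flag
    fragment_offset: int  # IP fragment offset, in 8-byte units


def should_report_fragmentation(d):
    """Return True if the receiver should send Fragmentation Occurred."""
    # Fragment-0 is the piece at offset zero that has more pieces after it.
    is_fragment_zero = d.fragment_offset == 0 and d.more_fragments
    return d.rf and is_fragment_zero


# An unfragmented datagram never triggers a report, even with RF set.
print(should_report_fragmentation(Datagram(True, False, 0)))   # False
# The first fragment of an RF datagram does.
print(should_report_fragmentation(Datagram(True, True, 0)))    # True
```

Reporting only on fragment-0 means one report per fragmented datagram,
rather than one per fragment.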

We then started a fairly unstructured discussion comparing the costs and
benefits of the two approaches.

  1. Lifetime of protocol:  on the one hand, in principle MTU discovery
     should be obviated by the coming revolution in routing protocols.
     Within "a few" years, the routing protocols will provide path-MTU
     information, so MTU discovery will be unnecessary.  Of course, we
     all know about things that are supposed to happen "real soon now";
     we particularly all know about relatively new things that
     "everyone" implements.  Still, while avoiding the trap of assuming
     that the world will be perfect in just a couple of years, it may
     not be worth trying to solve the problem of MTU discovery for all
     time, since it may not be useful for that long.
  2. Rapidity of deployment:  Clearly, MTU discovery of any form only
     works for a sender if some subset of the other nodes (routers
     and/or destinations) support it.  Query-based schemes depend upon
     support from a large fraction of the routers; RF-style schemes only
     help if a large fraction of the end-hosts support it.  There was
     some debate about which population is more likely to upgrade soon
     (routers or end-hosts).  No consensus was reached.
  3. Connection lifetimes:  Van's data suggest that most non-local TCP
     connections are short (ca.  4 datagrams).  This makes some sense
     (mostly SMTP) although this is only one sample point, and we agreed
     that more data would be useful.  Van argued that this works against
     a query-based scheme, since by the time one has useful information,
     there's not much left to do with it.  His argument in favor of the
     RF scheme was that the right way to use it is to assume that you
     can send large datagrams (sized by your first-hop MTU, or perhaps
     some estimate of the NSFNET PMTU, ca.  1500), and let the
     destination tell you if you are screwing up.
     In general, we realize that fragmentation is not inherently evil.
     Although it might create some extra overhead for the routers, what
     we really have to avoid is the "deterministic fragment loss"
     problem which causes connections to stall.  Thus (I hope I am
     correctly paraphrasing Van's argument), MTU discovery is only worth
     doing for connections that last a while, either because they are
     carrying lots of data, or because they are stalled due to fragment
     loss.
     Query-based schemes waste router resources because processing IP
     options is expensive, and the payoff is unlikely.
     It was argued that, since the senders cache the MTU values learned
     by either scheme in the per-host routing entries, querying would
     not have to be done on every connection to be useful.  Again, Van
     drew on his traffic studies to suggest that (even over a 12-hour
     period) there was generally little correlation between connections
     ...  that is, just because one pair of hosts makes a connection
     does not mean that they will do so again any time soon.  Some of
     us did not believe that is necessarily true (for example, how much
     traffic comes from mail-hub machines like DECWRL and UUNET?).
     Again, we agreed that it would be nice to have more traffic data
     available.
  4. Complexity:  Now that the draft specification for the query-based
     scheme is done, we realized that it is a lot more complex than we
     thought.  One problem is the number of tunable parameters.  Since
     the RF scheme doesn't require the receiver to maintain any state
     about the sender [actually, this is not quite true, as noted
     later], doesn't require the sender to schedule when to send the
     option, doesn't cause the receiver to send notifications when
     intentional fragmentation occurs [NFS would probably not set RF],
     and it requires no support at all from the routers, it appears to
     be simpler [but keep reading].

After this discussion, it was pretty clear that the consensus had
shifted to trying to use the RF scheme.  We made the assumption that we
could get a header bit (Van argued that although the RF scheme could be
done using an option, the cost/benefit analysis might be against it).
The next step was to explore how well that would really work.

One problem that came up right away is that James VanBokkelen believes
there to exist many PC-based systems that (1) do not reassemble
fragments and (2) do advertise MSS values of 1500 to non-local peers.
Currently, these hosts function because the 576-if-nonlocal rule
observed by most non-PC hosts means that, given today's Internet, even
when they advertise an MSS of 1500 to a non-local host, the host at the
other end will not send datagrams big enough to be fragmented.  [I
suppose it is unlikely for two PCs to talk to each other over long
distances.]  However, if we use the simplest RF scheme, these hosts are
going to get fragmented datagrams.  Since we assume that any host which
implements MTU discovery is also in conformance with the other rules
(specifically, fragmentation reassembly), we therefore know that such
sub-standard PCs won't send the ICMP Fragmentation Occurred message, and
these connections would stall.

The obvious fix is to not invoke MTU discovery (i.e., not send segments
> 576 bytes) unless you are sure that the other end supports it.  This
means that you have to have seen a datagram with RF set coming back to
you from the destination before you can send large datagrams.

More subtly, since we don't want to mislead these stupid PCs (which
apparently don't follow the 576-byte rule in either direction) you
cannot even send an MSS > 576 to a non-local peer until you have seen an
RF bit from it.  Thus, since the TCP MSS option can only be sent on the
SYN datagram, a host initiating a TCP connection may not be able to use
MTU discovery (and large segments) unless it has talked with the other
end recently.  (The second host is in a better position; since it sees
the RF bit before it has to send its own MSS option, it can set a large
MSS immediately.  This is nice for FTP retrieves; it doesn't help for
SMTP, alas).
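
The handshake rule from the last two paragraphs can be sketched as a
small per-peer cache.  The class and method names are hypothetical, and
the peer address is only a placeholder; the rule itself (stay at 576
for a non-local peer until a datagram with RF set has arrived from it)
is the one described above.  Per the text, the same limit would apply
to the MSS a host advertises.

```python
CONSERVATIVE_SIZE = 576  # the "576-if-nonlocal" datagram size


class RFPeerCache:
    """Remembers which peers have sent us a datagram with RF set."""

    def __init__(self):
        self._rf_seen = set()

    def note_received(self, peer, rf_set):
        # Seeing RF from a peer is taken as proof that it supports the
        # scheme (and therefore reassembles fragments).
        if rf_set:
            self._rf_seen.add(peer)

    def max_datagram_size(self, peer, is_local, first_hop_mtu):
        if is_local or peer in self._rf_seen:
            return first_hop_mtu
        return min(CONSERVATIVE_SIZE, first_hop_mtu)


cache = RFPeerCache()
peer = "192.0.2.1"  # placeholder address
print(cache.max_datagram_size(peer, False, 1500))  # 576: nothing seen yet
cache.note_received(peer, rf_set=True)
print(cache.max_datagram_size(peer, False, 1500))  # 1500: RF seen from peer
```

The first call corresponds to the host initiating a connection "cold";
the second corresponds to the responding host, or to a later connection
to the same peer.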

The consensus was that this limitation was acceptable, since it erred on
the conservative side.  (Although it errs in the case of the most common
connection-type [SMTP], since SMTP connections are normally short we
wouldn't gain much anyway.)  When two connections are made in quick
succession, things work nicely (e.g., several mail messages, or the
control connection of an FTP session followed by the data connection.
The control connection will seldom carry large segments, but the
exchange of RF bits done then will allow the data connection to use
large segments right away.)

Mike Karels proposed (off-the-cuff, not necessarily believing that it
was right) that routers fragmenting a datagram with RF set could also
send the fragmentation-occurred ICMP. This seemed to create problems
given the requirement for handshaking imposed by the broken-PC crowd, so
Mike agreed to go off and think about this one.

One question arose about the use of a previously unused bit in the IP
header:  what would current implementations do if they see it set?  (We
know that we can safely add options, since by definition these are
ignored if not known.)  While the IP spec says these bits must be zero,
the "robustness principle" implies that routers and hosts should ignore
them.  Unfortunately, John Moy from Proteon admitted that Proteon
routers drop such datagrams, and Noel Chiappa says that this is true of
other implementations based on his old MIT "C-gateway" code.  We have to
find out just how bad this is going to be; perhaps Proteon will be able
to upgrade all of its customers before MTU discovery is widely
implemented.

[Side note:  Clearly, implementations contrary to the basic IP spec are
causing us serious grief.  How much do we twist the protocol to
accommodate them?]

An orthogonal issue is that in high-speed long-distance networks, there
might be lots of packets in flight when the route changes to one with a
lower MTU (e.g., a satellite link with a half-second RTT, 4 KB packets,
and a 100 Mbit/sec channel has roughly 1500 packets in flight per RTT!)
Since the source cannot react to a Fragmentation Occurred message sooner
than one RTT worth of packets after the one that triggered the message,
we are concerned that setting the RF bit on every packet could lead to
positive (i.e., anti-stability) feedback in a network that is losing
capacity.
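
The packets-in-flight figure can be checked with a short calculation.
The function name is ours, and "4 KB" is assumed to mean 4096 bytes;
using 4000 bytes instead gives a slightly larger number, still roughly
1500.

```python
def packets_in_flight(rtt_seconds, bits_per_second, packet_bytes):
    """Packets that fit in one round-trip time's worth of channel capacity."""
    bits_per_rtt = rtt_seconds * bits_per_second
    return bits_per_rtt / (packet_bytes * 8)


# Half-second RTT, 100 Mbit/sec channel, 4 KB (assumed 4096-byte) packets.
n = packets_in_flight(0.5, 100e6, 4096)
print(round(n))  # roughly 1500
```

Each of those packets could trigger its own report before the sender
has a chance to react to the first one.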

This could be attacked in two ways:  limit the rate at which the RF bit
is sent, or limit the rate at which the ICMP is sent.  The former could
be done "once per RTT", once per some constant time period, or perhaps
once per window.  It's not clear if there is a convenient way of marking
out the boundaries between windows.
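
The first approach (limiting how often the RF bit is set) might be
sketched as below, using the "once per some constant time period"
variant.  The class name and the choice of interval are assumptions;
nothing here was agreed at the meeting.  Passing an RTT estimate as the
interval would give the "once per RTT" variant.

```python
class RFRateLimiter:
    """Allow the RF bit on at most one datagram per min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_set = None  # time the RF bit was last set, if ever

    def should_set_rf(self, now):
        if self._last_set is None or now - self._last_set >= self.min_interval:
            self._last_set = now
            return True
        return False


limiter = RFRateLimiter(min_interval=0.5)
send_times = [0.0, 0.1, 0.4, 0.5, 0.6, 1.0]
print([limiter.should_set_rf(t) for t in send_times])
# [True, False, False, True, False, True]
```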

ACTION ITEMS


  1. Noel Chiappa and Van Jacobson were assigned to try to get the IESG
     to free up an IP header bit.
  2. Mike Karels was going to think more about having routers send ICMPs
     when they fragment.
  3. We need to determine how many routers will drop packets with RF
     set, and how hard it will be to fix this.  Is it any different if
     we use one of the bits in the TOS area?
  4. Ditto for end-hosts; are there any that drop such packets?
  5. The Router Requirements WG was known to be considering changing the
     way that fragmentation was done (fragment into equal-size pieces;
     currently, routers are supposed to send N maximal-size fragments
     and one smaller one).  This would make the RF scheme nearly
     useless.  [Phil Almquist says that the RRWG will work with us on
     this, so it shouldn't be a problem].
  6. Perhaps more traffic studies would be useful.
  7. Someone has to write the next draft.  Keith and Rich were thanked
     for their hard work on their draft, which is now tabled, and were
     not coerced into starting a different document.  Since Van was the
     fiercest proponent of RF at the meeting, he was given
     responsibility to see to it that the draft is written.  He agreed
     but said he was going to try to get Steve Deering to do the work
     (Steve was absent due to serious thesis time-pressure, so maybe Van
     is going to be stuck with it.)  The chair requested a draft within
     one month (7 March 1990).
  8. James VanBokkelen was going to see just how many hosts out there
     are unable to reassemble fragmented IP datagrams, how hard it would
     be to fix this, how many vendors are involved, etc.
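
The two fragmentation styles mentioned in item 5 can be compared with
the sketch below.  It assumes 20-byte IP headers (no options) and
fragment payloads in multiples of 8 bytes; the function names are ours.
One plausible reading of the concern, which the minutes do not spell
out, is that with equal-size pieces the size of the first fragment no
longer reflects the MTU of the link that forced the fragmentation.

```python
IP_HEADER = 20  # assumes no IP options


def _split(payload, per_fragment):
    """Cut payload into per_fragment-sized pieces; return datagram sizes."""
    sizes = []
    while payload > per_fragment:
        sizes.append(per_fragment + IP_HEADER)
        payload -= per_fragment
    sizes.append(payload + IP_HEADER)
    return sizes


def maximal_fragments(total_length, mtu):
    """Current practice: N maximal-size fragments and one smaller one."""
    largest = (mtu - IP_HEADER) // 8 * 8
    return _split(total_length - IP_HEADER, largest)


def equal_fragments(total_length, mtu):
    """Alternative: the same number of fragments, as equal as possible."""
    payload = total_length - IP_HEADER
    largest = (mtu - IP_HEADER) // 8 * 8
    count = max(1, -(-payload // largest))  # ceiling division
    share = -(-payload // count)            # ceiling of payload / count
    share = -(-share // 8) * 8              # round up to a multiple of 8
    return _split(payload, share)


# A 1500-byte datagram crossing a link with a 576-byte MTU.
print(maximal_fragments(1500, 576))  # [572, 572, 396]
print(equal_fragments(1500, 576))    # [516, 516, 508]
```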


IESG ACTION

On Thursday, February 8, at the open IESG meeting, the IESG was asked to
allow this bit to be used for MTU discovery.  I was not there, but I
understand that the IESG is willing to release this bit if we come to a
consensus on a protocol that they think is reasonable.

SCHEDULE

We expect to meet again at the May IETF meeting.

At that point, we will probably either adopt one of the schemes, or give
up.

ATTENDEES

    Ballard Bare             bare%hprnd@hplabs.hp.com
    Art Berggreen            art@sage.acc.com
    Richard Bosch            probe@mit.edu
    Ron Broersma             ron@nosc.mil
    John Cavanaugh           John.Cavanaugh@StPaul.ncr.com
    Noel Chiappa             jnc@LCS.MIT.EDU
    James Davin              jrd@ptt.lcs.mit.edu
    Farokh Deboo             sun!iruucp!ntrlink!fjd
    Rich Fox                 sytek!rfox@sun.com
    Van Jacobson             van@lbl-csam.arpa
    Mike Karels              karels@berkeley.edu
    Mike Marcinkevicz        mdm@gumby.dsd.trw.com
    Tony Mason               mason@transarc.com
    Keith McCloghrie         sytek!kzm@hplabs.HP.COM
    Bill Melohn              melohn@sun.com
    Jeff Mogul               mogul@decwrl.dec.com
    John Moy                 jmoy@proteon.com
    Drew Perkins             ddp@andrew.cmu.edu
    Michael Petry            petry@trantor.umd.edu
    Nuggehalli Pradeep       pradeep@orville.nas.nasa.gov
    Mark Rosenstein          mar@athena.mit.edu
    Tony Staw                staw@marvin.enet.dec.com
    James VanBokkelen        jbvb@ftp.com
    John Veizades            veizades@apple.com
    Steve Willis             swillis@wellfleet.com
    John Wobus               JMWobus@suvm.acs.syr.edu
    David Zimmerman          dpz@convex.com