MessageWay BOF (MSGWAY)

Reported by Danny Cohen/Myricom

The MSGWAY BOF, chaired by Danny Cohen, was held on Tuesday, 4 April, at
the 32nd IETF meeting in Danvers, MA. Sixteen people attended.

Danny presented the problem and a proposed approach (see slides).  A
discussion of MsgWay and of the working group followed.


The Problem

While the speed of computing circuits increases with time, the speed of
light is unchanged.  As a result, distances shrink.  For example, the
diameter of Ethernets has shrunk from 2 km to 0.2 km as their speed grew
from 10 to 100 Mbps.  Similarly, buses that used to dominate
inter-board communication (e.g., VME) are useful nowadays mainly for
intra-board communication (e.g., PCI).

Modern computing systems in general, and MPPs (Massively Parallel
Processing systems) in particular, use ``I/O fabric'' (MPP-networks)
instead of the traditional I/O buses.

Most MPP-networks are built to handle variable length packets, are made
of very short point-to-point FDX links of high performance (high data
rates, low latency, and very low BER), have error detection and flow
control, and use cut-thru (aka ``wormhole'') switches with source
routing.  In spite of these similarities, each MPP network is
typically an island unto itself, incapable of interoperating with other
MPP-networks.

There is a need to use several heterogeneous MPPs, and clusters (or
networks) of workstations, as a single MPP, without losing the high
performance communication native to them.


A Proposed Approach

The interoperability between heterogeneous computing systems should be
handled at Level 3, like IP, not just at the lower levels.

IP, which has successfully served the Internet for over 20 years as the
basic tool for interoperability among heterogeneous computers, is not
appropriate for MsgWay because many of its tradeoffs sacrificed high
performance for generality and scalability.  In addition, IP does not
address individual processors in MPPs.  (However, IPv6 could have been
modified to fix this problem.)

The MsgWay approach is to define a Level 3 protocol that is similar in
its philosophy to IP, but has implementation details geared toward
high performance, possibly at some cost in generality and wide-area
scalability.

MsgWay will be a Level 3 protocol, like IP, which could support IP (by
encapsulation).  The MsgWay protocol will have both an EEP (end-to-end
protocol, like IP) and an RRP (router-to-router protocol, like GGP).

Among the tradeoffs that made IP general but proved to be deficiencies
for high performance are:


   o Long addresses (32 going on 128 bits)
   o Addressing ``hosts'' only (not individual processors)
   o No support of source routing
   o Need for routers with global knowledge
   o Hierarchical de-muxing
   o No flow control
   o No error detection
   o No fault recovery
   o No support of DMA
   o No support of byte alignment
   o Fields not sorted by need


MsgWay will alleviate these deficiencies by using 16-bit addresses that
can be dynamically assigned for sessions.  MsgWay will support both
source routing (for Level 2 forwarding) and Level 3 addressing (for
Level 3 forwarding).  The use of source routing allows MsgWay switches
to operate without any routing knowledge having to be loaded into them.
MsgWay will have a format that supports zero-copy operation (i.e.,
direct copy from the network interface into the destination user
area).  MsgWay will have flow control based on the flow control of the
participating networks.  Similarly, MsgWay will carry error indication
in trailers, to allow the use of various CRC hardware.  Even though each
participating network may use a different technique for error detection,
MsgWay will have a uniform way to indicate errors.  MsgWay will address
the alignment issue, to allow computers with different chunk sizes (such
as the Paragon's 8B-chunks, RACEway's 4B-chunks, and Myrinet's
2B-chunks) to communicate efficiently.  In addition, in order to
minimize the wormhole latency, the fields in the MsgWay protocol header
will be sorted by need (e.g., starting with the destination address,
which is always needed).
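
The format ideas in the paragraph above can be illustrated with a small
sketch.  It is not the MsgWay format, which had not been defined at the
time of the BOF.  Only the 16-bit addresses, the destination-first
ordering, the trailer error indication, and the 8B/4B/2B chunk sizes come
from the text; the field layout, the source-address and length fields,
the pad bytes, and all names (HEADER, TRAILER, build_packet,
peek_destination, parse_packet) are assumptions made for illustration.

```python
import struct
from functools import reduce
from math import gcd

# Chunk sizes named in the text: Paragon 8 B, RACEway 4 B, Myrinet 2 B.
CHUNK_SIZES = (8, 4, 2)
# Pad to the least common multiple so every network sees whole chunks.
ALIGN = reduce(lambda a, b: a * b // gcd(a, b), CHUNK_SIZES)

# Hypothetical header, fields sorted by need: 16-bit destination address
# first, then 16-bit source address, payload length, and two pad bytes.
HEADER = struct.Struct(">HHHxx")
# Hypothetical trailer: seven pad bytes, then a one-byte error indication
# (0 is assumed to mean that no network along the path saw an error).
TRAILER = struct.Struct(">7xB")


def build_packet(dest, src, payload, error=0):
    """Return header + payload padded to ALIGN + trailer."""
    padding = bytes(-len(payload) % ALIGN)
    return (HEADER.pack(dest, src, len(payload))
            + payload + padding + TRAILER.pack(error))


def peek_destination(first_bytes):
    """Read the destination from the first two bytes only, as a
    cut-through switch could do before the rest of the packet arrives."""
    return struct.unpack_from(">H", first_bytes)[0]


def parse_packet(packet):
    """Return (dest, src, payload, error) from a packet built above."""
    dest, src, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    (error,) = TRAILER.unpack_from(packet, len(packet) - TRAILER.size)
    return dest, src, payload, error


if __name__ == "__main__":
    pkt = build_packet(0x0102, 0x0304, b"hello")
    print(len(pkt) % ALIGN == 0, hex(peek_destination(pkt[:2])))
    print(parse_packet(pkt))
```

Because the assumed header and trailer are each one 8-byte chunk and the
payload is padded to a multiple of 8 bytes, the whole packet is a whole
number of chunks for all three networks.  Placing the error indication
last lets a network interface set it after the data has streamed through
its own error-detection hardware.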

MsgWay will support the dynamic mapping and discovery required for
automatic fault recovery.

Like IP, MsgWay does not define performance figures, connectors,
communication media, address assignment, routing and discovery, APIs,
and so on.  Following the IP philosophy, all these issues will be
defined separately.

MsgWay defines a Level 3 protocol for interoperability of heterogeneous
multi-processors at high performance.



Discussion


A discussion about the proposed MsgWay activity followed the above
presentation.  Several questions were raised by the participants in the
BOF.


   o Why should MsgWay be an IETF activity?

         It is proposed to conduct the MsgWay activity as an IETF
         working group because of the firm belief that interoperability
         should be handled at Level 3 (not just at Levels 1 and 2), and
         because of the recognition that MPP-networks are computer
         communication networks with much in common with the networks
         that the IETF community is dealing with.  MsgWay is a small
         computer network, not an extended computer bus.


   o Why not use IP ``as is'' with slight modifications, as needed for
     high performance?

         It is believed that this is the proposed approach.


   o What about transport level issues, like reliability (a la TCP)?

         It is left for higher level protocols, as/if needed (note that
         this is exactly IP's approach).


   o Must MsgWay hosts use source routes?

         No.  MsgWay will support both Level 2 forwarding (by source
         routes) and Level 3 forwarding (by addresses).


   o Must processors in the same host (say a Paragon) use MsgWay among
     each other?

         No.  They may use their native communication system.  For
         generality, the API may look the same, but there is no need to
         use MsgWay for internal communication within a system.  This is
         similar to the use of IP between hosts on the same LAN.  (Hosts
         on the same Ethernet could communicate by raw Ethernet packets,
         without IP, but using IP has some advantages.)


   o How is the Source Route handled?

         It is consumed along the way (not an incremented pointer).
         This allows each network along the path to be presented with
         exactly the optimal bit pattern for its use.  Note that this
         requires recomputing the checksum.


   o MTU?

         The maximum packet size will be configured for the entire
         MsgWay (probably not exceeding a few KBytes).  It is assumed
         that each participating network can handle large packets.
         There is no need to legislate that all MsgWays always have the
         same MTU.  It is expected that the mapping process will
         automatically discover the MTU and disseminate it.


   o Interconnection of separate MsgWay-islands?

         MsgWay-islands could be interconnected via IP. They could be
         either (1) interconnected by using IP as a tunnel encapsulating
         MsgWay, or (2) connected by using IP and having the
         MsgWay-islands independent of each other (treating MsgWay as a
         LAN).  Once IP is used over WANs, the high performance of MsgWay
         is most likely to be lost.


   o Up to how many stages of source-route make sense (rather than
     addresses)?

         This is a runtime binding.  No need to decide at committee
         time.  MsgWay should be able to handle both.
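
The source-route handling described in the discussion above (the route is
consumed along the way, which requires recomputing the checksum) can be
sketched as follows.  This is an illustration under stated assumptions,
not the MsgWay format: each route step is assumed to be one byte naming
an output port, CRC-32 from Python's zlib module stands in for whatever
error-detection hardware a participating network uses, and the names
seal and switch_forward are hypothetical.

```python
import zlib

CRC_LEN = 4  # assumed trailer size: one CRC-32 value


def _crc(data):
    """CRC-32 of data as four big-endian bytes (stand-in checksum)."""
    return zlib.crc32(data).to_bytes(CRC_LEN, "big")


def seal(route, body):
    """Prefix body with the remaining route bytes and append a checksum
    that covers both the route and the body."""
    data = bytes(route) + body
    return data + _crc(data)


def switch_forward(packet):
    """Consume the leading route byte at one switch.

    Returns (output_port, packet_for_next_hop).  The route byte is
    removed rather than skipped with a pointer, so the checksum must be
    recomputed over the shortened packet.
    """
    data, trailer = packet[:-CRC_LEN], packet[-CRC_LEN:]
    if _crc(data) != trailer:
        raise ValueError("checksum mismatch on arrival")
    out_port, rest = data[0], data[1:]
    return out_port, rest + _crc(rest)


if __name__ == "__main__":
    packet = seal([3, 1, 2], b"data")
    for _ in range(3):
        port, packet = switch_forward(packet)
        print("forward on port", port)
    # After the last hop the route is gone; only body + checksum remain.
    print(packet[:-CRC_LEN])
```

Since every switch strips the byte it used, the next network along the
path always finds its own routing information at the very front of the
packet.  The cost is the per-hop checksum recomputation noted in the
answer above.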



The MSGWAY Working Group

Mailing list information for the MSGWAY group:


      General Discussion:    MsgWay@myri.com
      To Subscribe:          MsgWay-request@myri.com
      Archive:               ftp://ftp.isi.edu/msgway/msgway.mail


Danny will work with Frank Kastenholz, one of the Internet Area
co-Directors, on a draft charter for the proposed working group and will
post it on the mailing list.

The MSGWAY Working Group is expected to conduct its work over e-mail, to
meet at IETF meetings, and to possibly have additional meetings between
IETF meetings.

Danny reported that in addition to those who participated in the 32nd
IETF BOF meeting, there are about 20 other people from academia,
government, and industry (see list below) who expressed interest in
participating in defining MsgWay.  Most of them had already participated
in two meetings discussing MsgWay (January 1995 in Utah, and March 1995
in Florida).  Most of these people expressed interest in participating
in the IETF MSGWAY Working Group.  Given that both Jon Postel and Danny
Cohen were already scheduled to be in this IETF BOF meeting, the others
were advised that their presence at this meeting was not necessary.

Among those who participated in the earlier MsgWay meetings are people
from Intel, Mercury, and Myricom who are committed to implementing and
demonstrating interoperability among Intel's Paragon, Mercury's RACEway,
and Myricom's Myrinet, using the format that will be adopted for MsgWay
by the MSGWAY Working Group.



                                Academia

 Jon Postel       Postel@isi.edu              USC/ISI
 Tony Skjellum    tony@aurora.cs.msstate.edu  Mississippi State University
 Al Davis         ald@cs.utah.edu             Univ of Utah/CSD
 Barney Maccabe   maccabe@cs.unm.edu          UNM/CS + Sandia
 Stu Tewksbury    skt@msrc.wvu.edu            West Virginia University
 Andy White       abw@lanl.gov                Los Alamos National Lab

                                Government

 Mike Masters     mmaster@ariel.nswc.navy.mil Naval Surface Warfare Center
 Jose L. Munoz    munoz@arpa.mil              ARPA/CSTO
 Bob Parker       rparker@arpa.mil            ARPA/CSTO

                                 Industry

 Danny Cohen      Cohen@myri.com              Myricom
 Chuck Seitz      Chuck@myri.com              Myricom
 Craig Lund       clund@mc.com                Mercury Computer Systems
 Alan L. Pool     alp@mc.com                  Mercury Computer Systems
 Bob Graybill     graybill@mmlgrf.mml.mmc.com Martin Marietta Laboratories
 Greg Chesson     greg@sgi.com                Silicon Graphics
 Glenn Ladd       gladd@msmail4.hac.com       Hughes
 Lloyd Lewins     llewins@msmail4.hac.com     Hughes
 Phil Sementilli  sement@igate1.hac.com       Hughes Missiles
 Dave Dunning     ddunning@ssd.intel.com      Intel SSD
 Paul Pierce      prp@ssd.intel.com           Intel SSD
 Stephen Wheat    srwheat@ssd.intel.com       Intel SSD
 Joe Brewer       JoeEBrewer@aol.com          Westinghouse
 Bob Means        rwm@hnc.com                 HNC, Inc.
 Marc Campbell    campbellm@aol.com           Northrop-Grumman


Schedule

A rough draft of the minimal MsgWay protocol is expected by the 33rd
IETF meeting in mid-July.

It is expected that the first interoperability demonstration will take
place no later than October 1995.  Myrinet, RACEway, and Intel's Paragon
are expected to participate in that interoperability demonstration.


Legalities

Frank Kastenholz of FTP Software brought up legal issues.  It was
suggested that Danny should check with Carl Malamud about related
patents that may be in the way of MsgWay.  (Already done.)

It was reported that Myricom has trademarked both MessageWay and MsgWay,
for free use by this activity.

By including the slides and this text in the proceedings of the IETF we
are establishing MsgWay prior art at least as of April 1995.