CCAMP Working Group          Richard Rabbat (Fujitsu Labs. of America)
Internet Draft               Ching-Fong Su (Fujitsu Labs. of America)
Expires: December 2004       Vishal Sharma (Metanoia, Inc.)
                             June 2004

   Observations on the Applicability of the Fault Notification Protocol
                 draft-rabbat-fnp-applicability-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The Fault Notification Protocol (FNP) is a set of procedures designed
   to enable time-bounded failure notification in networks using an
   IP-based control plane. This document discusses the applicability of
   FNP in the context of optical transport networks. It highlights the
   protocol's principles of operation, and then describes the network,
   node, fault, and operational models in optical networks for which the
   protocol is designed. It also discusses the relationship to higher
   layers, and issues of scalability. Some guidelines for deployment are
   also provided.

Rabbat, et al            Expires - December 2004               [Page 1]
draft-rabbat-fnp-applicability-01.txt                         June 2004

Table of Contents

   1. Introduction
   2. Terminology
   3. Operational Overview of FNP
   4. FNP Applicability
      4.1 Network Model
      4.2 Node Architecture
      4.3 Fault Model (Types of faults supported)
      4.4 Network Layer at which FNP Applies
      4.5 Relationship to Higher (Packet) Layers
      4.6 Operational Model
      4.7 Framing and Data Plane Considerations
      4.8 Scalability Considerations
      4.9 Guidelines for Deployment
   5. Conclusion
   6. Acknowledgements
   7. Intellectual Property Considerations
   8. References
   9. Authors' Addresses
   10. Full Copyright Statement

1. Introduction

   As carriers move towards offering advanced services on their
   networks, with a tighter integration of the different network layers,
   the ability to provide rapid, scalable, and timely restoration is
   crucial for meeting agreed-upon SLAs, either between providers or
   between the end-customer and a provider. In this context,
   time-bounded fault notification will be a key component of the
   overall carrier restoration strategy.

   The Fault Notification Protocol (FNP) [2] is a protocol developed to
   meet this service provider requirement. It is designed to facilitate
   rapid recovery by enabling time-bounded fault notification in
   networks that use an IP-based control plane.

   The purpose of this memo is to discuss the applicability of FNP in
   the context of optical transport networks.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

3. Operational Overview of FNP

   In this section, we briefly review the basic operation of FNP while
   confining our discussion to optical transport networks.

   Fundamentally, FNP is a set of procedures designed to provide
   time-bounded fault notification in a network with shared protection,
   that is, a network where either the protection route between two
   nodes carries "extra traffic" from two or more disjoint trails, or
   the provider implements M:N type shared restoration.

   Once a network fault is detected at T-detect, and a set hold-off time
   T2 has expired at time T-start-ntf, the node detecting the fault
   sends out a fault notification message to each of its neighbors on
   the control plane. The message essentially identifies the resource
   (fiber, lambda, or node) at fault; this allows any network node
   receiving a fault notification message to determine whether it lies
   on the path of a backup LSP corresponding to a working LSP affected
   by the fault. The message also carries the time (per the local clock)
   at which the notification process started, T-start-ntf.

   Each network node, upon the receipt of a fault notification message,
   first transmits the message on each of its remaining outgoing
   interfaces, and then processes the message to determine whether it
   lies on the path of one or more backup LSPs that need to be activated
   as a result of that fault. If so, the node first drops any extra
   traffic that was using the resources originally reserved for this
   backup LSP, and reconfigures its cross-connect hardware so that the
   working traffic arriving on the backup LSP can be directed to the
   appropriate outgoing link/interface.
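   The forward-then-process behavior described above can be illustrated
   with a small simulation. This is only a sketch under stated
   assumptions, not the FNP message format or state machine: the class
   and function names, the string used to identify the failed resource,
   and the rule that a node discards copies of a notification it has
   already seen are all hypothetical. The last helper simply evaluates
   the waiting rule that this section states for switching nodes, and
   assumes that T-current and T-start-ntf are read from comparable
   clocks.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FaultNotification:
    resource: str        # failed fiber, lambda, or node (naming is hypothetical)
    t_start_ntf: float   # detecting node's local clock at notification start, ms


@dataclass
class Node:
    name: str
    neighbors: list = field(default_factory=list)   # control-plane adjacencies
    backup_for: set = field(default_factory=set)    # faults whose backup LSPs cross this node
    reconfigured: list = field(default_factory=list)

    def handle(self, msg, sender):
        # Step 1: forward on every remaining interface before local processing.
        forward_to = [n for n in self.neighbors if n != sender]
        # Step 2: if a backup LSP for this fault crosses the node, drop extra
        # traffic and set up the cross-connect (modeled here as a log entry).
        if msg.resource in self.backup_for:
            self.reconfigured.append(msg.resource)
        return forward_to


def flood(nodes, detector, msg):
    """Deliver msg hop by hop; return the hop count at which each node first hears it."""
    first_heard = {}
    queue = deque([(detector, None, 0)])
    while queue:
        name, sender, hops = queue.popleft()
        if name in first_heard:
            continue  # assumption: copies of an already-seen notification are discarded
        first_heard[name] = hops
        for neighbor in nodes[name].handle(msg, sender):
            queue.append((neighbor, name, hops + 1))
    return first_heard


def remaining_wait_ms(t_ntf, t_start_ntf, t_current):
    """Wait before switching: T-ntf - (T-current - T-start-ntf), all in ms."""
    return t_ntf - (t_current - t_start_ntf)
```

   In this model, calling flood() on a four-node ring with the fault
   detected at one node reaches the opposite node after two hops, and
   only the nodes listed as carrying a backup LSP for the failed
   resource record a reconfiguration.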
   The flooding mechanism ensures that information about the fault is
   propagated to each network node in the minimum number of hops on the
   control plane and, provided that the fault notification packet gets
   high priority in the transmission queues at each node, that the
   notification is also propagated in the shortest possible time.

   An ingress node, upon receiving the notification message, waits for
   the time remaining until the upper bound on the notification time
   (T-ntf), measured from the time at which the fault notification
   started (T-start-ntf), has elapsed, and then switches traffic from
   the affected working path(s) to the backup path(s). Note that this
   eliminates a phase of signaling that would typically be needed in a
   signaling-based approach to activate the nodes along the backup LSP.

   The key is to ensure that by the time a protection-switching node
   performs the switch, all intermediate nodes along the associated
   backup path(s) will have configured themselves. This is assured by
   selecting a backup path in such a way that for any fault on the
   corresponding working path, all of the nodes along the backup path
   will have been informed (and will have reconfigured themselves)
   within a time bound T-ntf following T-start-ntf. Therefore, each node
   on the recovery path performs the switch
   [T-ntf - (T-current - T-start-ntf)] milliseconds after learning of
   the fault via a fault notification message.

4. FNP Applicability

   Our objective in this section is to clearly specify how and where FNP
   applies in the context of optical transport networks, by discussing
   its applicability along several dimensions, as outlined below.

4.1 Network Model

   FNP is initially designed to operate within a single IGP area, where
   fine-grained signaling is used.
   In fine-grained signaling, the entire backup resource (link, lambda,
   and hence, label) is selected during the initial signaling phase for
   the backup path. FNP could also apply to coarse-grained signaling
   (where only a link bundle is selected during the signaling of the
   backup path, but the specific lambda and, hence, label, is selected
   upon the occurrence of a failure), but that requires coordination
   with signaling between adjacent nodes, and is left for further study.

   FNP is useful in contexts where either:

   (a) the provider implements 1:1 restoration and allows the bandwidth
       on the backup path to be shared by trails that originate and
       terminate at nodes other than the end nodes (s-d) of the backup
       path, or

   (b) the provider implements more general shared-mesh restoration,
       where multiple working LSPs with disjoint paths share backup
       resources.

4.2 Node Architecture

   FNP is designed to work in networks with OEO nodes. Its applicability
   to networks with OOO nodes (that is, fully transparent all-optical
   networks) depends on the monitoring capabilities of the OOO systems
   deployed, and is for further study.

   For a network with OEO nodes, the fault detection and correlation
   (which happens before FNP is activated, and is outside the scope of
   this document) occurs at the node closest to the fault. Once the
   detection procedure has determined that a bona fide fault has
   occurred, it activates FNP for fault notification.

4.3 Fault Model (Types of faults supported)

   FNP is designed to support three types of faults in an optical
   transport network: fiber cuts, transponder failures, and switch
   failures. These correspond, respectively, to link faults, lightpath
   or LSP faults, and node faults.

4.4 Network Layer at which FNP Applies

   In the case of optical transport networks, FNP is designed to operate
   at the fiber and optical lightpath layers.
   The protocol works in the context of an optical transport layer that
   is controlled by an IP-based control plane.

   The operation of FNP in a multi-layer context is a complex problem,
   and is for further study. (For example, in a multi-layer situation,
   the goal might be to perform notification both at the layer closest
   to the fault (as FNP currently does) and at the service layer (for
   example, at the level of a VT1.5 circuit that may typically be
   embedded inside a larger SONET/SDH circuit on a lightpath).)

4.5 Relationship to Higher (Packet) Layers

   A key aspect of using FNP at the optical transport layer to provide
   time-bounded notification (and hence recovery) is to be able to
   provide the higher (packet) layer some guarantees on how long the
   optical transport layer would take to respond to a failure. This
   allows carriers to implement appropriate hold-off timers at the
   higher layers, and to use this information to craft adequate SLAs
   with their customers.

   In the event that the client layers (higher (packet) layers) and the
   server layer (the optical transport layer) are under the control of
   different providers, it is reasonable to expect that the
   inter-provider agreements between the carriers would incorporate
   protection switching timing bounds. In that case, notification timing
   bound guarantees provided by the carrier owning/operating the server
   layer would be useful to enable the carrier owning/operating the
   client layer to, in turn, incorporate these in the SLAs it signs.
   This notion could be applied recursively between pairs of adjacent
   carriers.

4.6 Operational Model

   FNP is applicable in a hierarchical network layering model, for
   example, packet over SONET/SDH over lambda over fiber, with the
   recognition that the SONET/SDH layer is itself a layered architecture
   (for example, VT1.5 in STS-1 in STS-3 in STS-12/48).
   Note that FNP does not by itself impose any requirements on the
   policy that the provider uses to devise pre-emption schemes in the
   case where shared restoration and extra traffic are used. As a
   practical matter, however, the carrier (or carriers) involved would
   have to devise pre-emption schemes that are not susceptible to a
   domino effect (where the removal of some extra-traffic LSP causes a
   cascading effect, triggering the pre-emption of a series of LSPs). A
   carrier would be expected to ensure this simply to maintain network
   stability.

4.7 Framing and Data Plane Considerations

   FNP is a control-plane mechanism for disseminating fault information
   throughout a network. As explained in Section 3, in the context of
   transport networks, the flooding mechanism of FNP accomplishes both
   notification and node reconfiguration simultaneously. That is, it
   informs the intermediate nodes along a backup LSP corresponding to an
   affected working LSP of a fault, thus allowing them to reconfigure
   themselves, while at the same time notifying the edge nodes
   responsible for taking a restoration action to recover the affected
   LSP(s).

   When appropriate digital framing of the optical signal is available
   in the data plane (e.g., G.709 digital wrapper or SONET/SDH framing),
   and the optical transport nodes can process and interpret the framing
   overhead, FNP can interwork with the fault notification mechanisms
   available in the data plane (e.g., the Forward/Backward Defect
   Indication signals embedded in the framing overhead).

   In this case, even though notification of the end nodes may occur in
   the data plane, the notification of the nodes along the backup paths
   of the affected working paths is still needed so that they can
   reconfigure themselves. This can be accomplished via FNP.
4.8 Scalability Considerations

   FNP ensures that at most one message is exchanged on every control
   channel link, whereas fault notification using signaling may lead to
   a large number of signaling messages per link, as explained shortly.
   This leads to a scalability advantage for DWDM networks that have a
   large number of wavelengths or when there are numerous LSPs, each
   corresponding to a small-granularity SONET/SDH channel.

   Let us define the length of a control channel between two adjacent
   nodes to be the number of hops that a control message takes to go
   from one node to the other. Thus, an in-band control channel has
   length one. By extension, the length of a path in the control plane
   is the sum of the lengths of the control channels used in this path.

   In practice, the maximum number of messages using signaling per
   failed LSP is equal to the length of the path that the notification
   message takes from the detecting node to the protection switching
   point "s", plus twice the sum of the lengths of the control channels
   corresponding to each hop of the protection path from s to d (the s-d
   protection path on the control channel). For the set of affected
   LSPs, that value is multiplied by the number of LSPs affected by the
   fault that have unique sources (the assumption being that a bundled
   notification message can be sent to sources that originate multiple
   LSPs affected by the same fault). The number of messages, in the
   worst case, is thus directly proportional to the number of LSPs
   affected (assuming each affected LSP originates at a unique source).

   This compares to a maximum number of messages for FNP equal to the
   sum of the lengths of all control channels in the network.

4.9 Guidelines for Deployment

   While use of FNP can be appropriate in a variety of situations, we
   provide some initial thoughts on deployment considerations here.
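   As one input to such deployment decisions, the worst-case message
   counts of Section 4.8 can be compared numerically. The sketch below
   transcribes the two bounds directly. The function names, parameter
   names, and sample figures are hypothetical, and, as a simplifying
   assumption, every affected LSP is taken to have notification and
   protection paths of the same length.

```python
def signaling_messages(notify_path_len, protection_path_len, unique_sources):
    """Worst-case signaling messages for one fault (Section 4.8).

    notify_path_len:     control-plane length from the detecting node to s
    protection_path_len: summed control-channel lengths along the s-d
                         protection path (counted twice in the bound)
    unique_sources:      affected LSPs that originate at distinct sources
    """
    per_lsp = notify_path_len + 2 * protection_path_len
    return per_lsp * unique_sources


def fnp_messages(control_channel_lengths):
    """Worst-case FNP messages: the sum of all control-channel lengths."""
    return sum(control_channel_lengths)


# Hypothetical network: 30 in-band control channels (length one each) and
# 40 affected LSPs with distinct sources.
signaling_total = signaling_messages(notify_path_len=3,
                                     protection_path_len=5,
                                     unique_sources=40)
fnp_total = fnp_messages([1] * 30)
```

   With these illustrative figures, the signaling bound grows with the
   number of affected LSPs, while the FNP bound depends only on the
   control-channel topology.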
   FNP is expected to be very useful in core optical networks where the
   provider deploys a mesh-based topology and has a large number of
   active lambdas (or the possibility of having several lambdas turned
   on as the network grows). As explained earlier, this would save on
   the signaling overhead of individually activating each backup LSP.

   As explained in Section 4.5, FNP is applicable in situations where
   adjacent client and server layers are under the control of different
   providers. Although FNP does not impose a limit on how many providers
   may be involved in offering service to the end customer, practical
   considerations would dictate that this "recursion" of provider
   client-server relationships not be more than a few levels deep.

5. Conclusion

   This document has provided an overview of the domain of applicability
   of FNP in the context of optical transport networks. By outlining the
   network, node, and fault models to which FNP applies, the document
   has provided guidelines on where FNP is currently usable, and
   outlined areas of further work.

6. Acknowledgements

   We would like to thank the members of the CCAMP WG for on-line and
   off-line discussions that helped shape some of the ideas behind this
   document. In particular, we thank Adrian Farrel, Zafar Ali, Neil
   Harisson, Jonathan Sadler, Jonathan Lang, Fabio Ricciato and Roberto
   Albanese.

7. Intellectual Property Considerations

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.
   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

8. References

   [1] Bradner, S., "The Internet Standards Process -- Revision 3",
       BCP 9, IETF RFC 2026, October 1996.

   [2] Rabbat, R., and V. Sharma (Eds.), "Fault Notification Protocol
       for GMPLS-Based Recovery", Internet Draft, work in progress,
       draft-rabbat-fault-notification-protocol-05.txt, May 2004.

   [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, IETF RFC 2119, March 1997.

9. Authors' Addresses

   Richard Rabbat
   Fujitsu Labs of America, Inc.
   1240 E. Arques Ave, MS 345
   Sunnyvale, CA 94085
   United States of America
   Phone: +1-408-530-4537
   Email: rabbat@alum.mit.edu

   Ching-Fong Su
   Fujitsu Labs of America, Inc.
   1240 E. Arques Ave, MS 345
   Sunnyvale, CA 94085
   United States of America
   Phone: +1-408-530-4572
   Email: csu@fla.fujitsu.com

   Vishal Sharma
   Metanoia, Inc.
   888 Villa St, Suite 200B
   Mountain View, CA 94041
   United States of America
   Phone: +1-408-530-8313
   Email: v.sharma@ieee.org

10. Full Copyright Statement

   "Copyright (C) The Internet Society (2003). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."