Sub-graph routing: A generalized fault-tolerant strategy for link failures in WDM optical networks

doi:10.1016/j.comnet.2005.05.025

Computer Networks

Volume 50, Issue 2, 8 February 2006, Pages 181-199

https://doi.org/10.1016/j.comnet.2005.05.025 Get rights and content

Abstract

Single- and multiple-link failure resilience are of critical importance in survivable wavelength division multiplexed (WDM) optical network design. Networks are not made up simply of logical links, but also of links that are physically routed together or share a common node. Links that share a common physical duct are known in literature as shared risk link groups (SRLGs), and their impact on network design is widely studied.

This paper presents a complete survivability approach for WDM optical network design known as sub-graph fault tolerance. Sub-graphs are created by removing any combination of components in the network to accommodate specific fault scenarios. Connections in the network are accepted if they can be routed in all predetermined sub-graphs. In this manner, protection against 100% of all multiple- and single-link failure scenarios designated by the designer is achieved, and heuristic best-effort protection is provided against subsequent or simultaneous link failures for which the network is not specifically designed.

Introduction

The explosive growth of the Internet over the last decade has created a significant increase in bandwidth requirements for long-haul optical networks. Currently, optical networks primarily incorporate coarse wavelength division multiplexing (CWDM) as a means of increasing the 10 Gbps (OC-192) bit rate that each wavelength is capable of carrying. As optical networking technology continues to develop, dense wavelength division multiplexed (DWDM) systems capable of supporting data rates of 40 Gbps are likely to become commonplace. With such high capacity links being deployed to quench the ever-increasing thirst for bandwidth, the design of resilient optical networks is imperative. A fiber cut, possibly the result of an errant excavation, has been estimated to occur once every four days by TEN, a pan-European carrier network [1]. A single-component failure in a WDM network can mean serious data and revenue loss.

Fault tolerance in WDM optical networks is an important design consideration. The forfeiture of data and revenue due to the loss of even a single fiber is staggering, not to mention the magnitude of loss due to multiple simultaneous or sequential fiber failures. The motivation of the proposed work is to provide a strategy for protecting WDM optical networks against all types of multiple-link failures using sub-graph routing and to study its performance relative to an alternate protection strategy.

Different routing strategies, such as fixed-path routing, fixed alternate path routing, and wavelength assignment methods have been studied in [2], [3], [4]. Fixed alternate path routing works by provisioning an alternate link disjoint path for each request to tolerate a single-link failure along its primary or active path. An extension of alternate path routing, known as backup path multiplexing or backup multiplexing protection [5], allows backup connections with link disjoint primary connections to be multiplexed on the same wavelength. Assuming a single-link failure model, backup multiplexing guarantees 100% restoration for all primary connections. By allowing backup connections to share capacity in the network, the resources used by each connection are reduced and network performance is increased. Backup multiplexing is a proactive failure-independent path restoration scheme.

Sub-graph routing [6], is an alternative strategy for routing dependable connections in optical networks. In this approach, for single-link failure tolerance, each network is mapped into distinct sub-graphs resulting from the removal of links in the original network. A connection in this scheme becomes accepted only if it is accepted in all the sub-graphs. That way, in the event of the failure of a link the network state can be restored to the corresponding sub-graph where all connections are guaranteed 100% restoration for that single-link fault scenario.

Single-link fault tolerance is very useful, however the occurrence of multiple-link failures is not uncommon in a practical network. It might happen that two or more distinct physical links may be routed via the same common duct or physical channel. Such commonality might be only for a few hundred meters or less, but if any damage happens to this physical duct, it will cause simultaneous logical failures in two or more distinct logical links. Such instances where separate fiber optic links share a common failure structure is often referred to as an SRLG (shared-risk link group) [7]. Any physical failure of one of the ducts that defines an SRLG can invoke a simultaneous logical failure of multiple links.

Multiple-link failures can also arise out of independent overlapping sequential link failures. For example, a fiber cut occurring as the result of an excavation necessitates that fiber be repaired, which is hardly a quick process. In the meantime, an unrelated fiber may also fail, thus creating a sequential multiple-link failure scenario not provisioned for in the original network design.

Different double-link failure models in which any two links in the network may fail in an arbitrary order have been proposed in literature. The basic idea of these approaches is to pre-compute two backup paths for each link in the primary path and reserve resources on these paths [8]. A significant finding is that such a design for complete dual-failure restorability requires almost triple the amount of spare capacity [9].

Sub-graph fault tolerance, a path restoration technique that can accommodate both simultaneous multiple-link failures due to SRLGs and sequential overlapping link failures, is presented in this paper. The proposed scheme provides 100% protection for simultaneous multiple-link failures, and best-effort protection for overlapping sequential multiple-link failures.

The remainder of the paper is organized as follows: Section 2 describes the network and fault model. Section 3 presents the sub-graph fault tolerance for single-link failures. Section 3 presents the sub-graph fault tolerance for single-link and Section 4 presents SRLG failures. Section 5 discusses sub-graph fault tolerance for sequential overlapping failures. Section 6 presents the constrained sub-graph routing. Section 7 presents the performance evaluation and Section 8 presents the numerical simulation results, and Section 9 concludes the paper.

Section snippets

Network and fault model

A network is represented by a graph G = (V, E) where V is the set of nodes V = {v₁, v₂, … , v_N} and ∣V∣ = N, and E is the set of edges or links, E = {e₁, e₂, … , e_L} and ∣E∣ = L. A failure may be of a single-link e_l ∈ E or a single node v_n ∈ V, a group of multiple links or multiple nodes, or a combination of links, SRLGs, or nodes. Such groups are said to be shared-risk resource groups (SRRGs). A group failure is represented by s_m ∈ S. A set of all such groups is represented by set S, S = {s₁, s₂, … , s_M}, where ∣S∣ = M. Thus

Sub-graph fault tolerance for single-link failures

The sub-graph fault tolerance mechanism for accommodating single-link failures is demonstrated in Fig. 2. For the purposes of this example, assume that each edge in the base network (and its constituent sub-graphs) is bidirectional, has a total capacity of one unit, and the distance between any pair of adjacent vertices is one. The single-link failure scenario is denoted by the fault set $F = E$ . Let there be a request between nodes 1 and 2. This connection request attempts to find a path from 1 to

Sub-graph fault tolerance for single-link and SRLG failures

The sub-graph routing methodology for protecting against all single-link failures can be extended to tolerate specific multiple-link failure situations as shown in Fig. 4.

For this example, the same network is considered, except that 3 SRLGs of size 2 are introduced, as shown in Fig. 4 by the circled links. Each link in the network is assumed to be a bidirectional link of capacity one unit. The corresponding sub-graphs are generated through the removal of individual links, as well as links

Sub-graph fault tolerance for sequential overlapping component failures

In this section, the capability of the sub-graph routing scheme [6] to tolerate sequential overlapping component failures in the network is discussed. Sequential overlapping multiple-link failures happen when a component failure occurs while the network is already in a fault recovery state. The goal in this situation is to provide 100% fault tolerance for requests entering the network after the occurrence of the first failure while maximizing the protection against a second failure for all

Constrained sub-graph routing

A limitation of sub-graph based fault tolerance is the possibility of connections being reassigned during transition from the base network to the target sub-graph when a fault occurs even if they do not traverse the failed component [6]. The above-proposed scheme depends on the network’s ability to change to the state of a sub-graph during fault recovery. This potentially requires many connections in the network to be reassigned to different paths as defined by a sub-graph. To overcome this

Probability of path reassignment

Sub-graph fault tolerance depends on the network state’s ability to be changed during a recovery. This may require some connections in the network to be reassigned to different paths. In order to quantify the amount of reassignment taking place, path and path/trunk reassignment probabilities are measured. They are the probabilities that a connection’s path or path/trunk on the base network will have to change when the sub-graph state corresponding to the link failure is incorporated.

Results

Sub-graph routing results are labeled in the format “Topology-Component Protection (Routing Constraint)”. Topology is omitted when the chart is topology specific. Component protection specifies how the sub-graphs are formed, and routing constraint specifies whether unconstrained, path, or path/trunk constrained routing is used. For backup multiplexing results, only link protection is used, and for primary routing results, no protection is used.

The blocking probability results for all three

Conclusions

A methodology for tolerating single-link, SRLG, and node failures in an optical network is developed in this paper. The concept of shared-risk resource groups (SRRGs) is introduced to accommodate combinations of link, SRLG and node failures that might arise due to geographical, physical, or logical proximity. The sub-graph fault tolerant approach is shown to be able to accommodate all such scenarios more effectively than the existing backup multiplexing protection scheme from both a performance

Michael T. Frederick is a Ph.D. candidate at Iowa State University. Michael earned his Bachelor’s degree with a minor in Spanish in 2001 and his Master’s degree in 2002, each in Computer Engineering from Iowa State University. There he is part of the Dependable Computing and Networking Laboratory under the direction of Professor Arun K. Somani. His research interests include network fault tolerance, image processing, dynamically reconfigurable computing, and the design and implementation of

References (19)

P.F. Falcao, Pan-European multi-wavelength transport networks. Network: design, architecture, survivability and SDH...
S. Subramaniam et al.
All-optical networks with sparse-wavelength conversion
IEEE/ACM Transactions on Networking
(1996)
A. Mokhtar et al.
Adaptive wavelength routing in all-optical networks
IEEE/ACM Transactions on Networking
(1998)
H. Harai, M. Murata, H. Miyahara, Performance of alternate routing methods in all-optical switching networks, in:...
G. Mohan et al.
Efficient algorithms for routing dependable connections in WDM optical networks
IEEE/ACM Transactions on Networking
(2001)
M.T. Frederick, A.K. Somani, A single-fault recovery strategy for optical networks using sub-graph routing, in: 7th...
J. Doucette, W.D. Grover, Capacity design studies of span-restorable mesh transport networks with shared-risk link...
H. Choi, S. Subramaniam, H.A. Choi, On double link failure recovery in WDM optical networks, in: INFOCOM 2002,...
M. Clouqueur, W.D. Grover, Mesh-restorable networks with complete dual failure restorability and with selectively...

There are more references available in the full text version of this article.

Cited by (16)

An optimal protection and restoration scheme (OPARS) for optical networks
2013, Optik
Citation Excerpt :
Therefore only premium traffic, comprising a small fraction of the total load in an OBS network, would be afforded this type of protection. Frederick et al. [8] have presented an L + 1 fault tolerance which is used for the recovery of optical networks from single link failures without the allocation of valuable system resources. While the approach in its simplest form performs well against the existing schemes, the flexibility of L + 1 leave many options to examine possible ways to further increase performance.
Protection and restoration are critical network design issues for optical networks since even a single failure for a short duration may result in huge data loss due to the large capacity of optical fibers. However, few studies have been done on these issues for optical burst switching (OBS) networks. Protection and restoration are essential mechanisms for guaranteeing more reliable traffic delivery services. But it is not easy to apply existing mechanisms to optical burst switching (OBS) networks due to its one-way reservation signaling and the statistical burst multiplexing. Thus, to achieve the high transmission performance and reliability simultaneously, unique properties of OBS must be considered in the design of protection scheme. In this paper, an optimal protection and restoration scheme (OPARS) has been introduced that not only optimizes the number of provisioned protection wavelengths adaptively based on the traffic load as well as the quality of service (QoS) requirements of bursts in high speed networks but also minimizes burst loss rates. In addition, the proposed scheme has been used as an efficient contention resolution technique. The simulation results verify that the proposed scheme improve the network resource and channel utilization while guaranteeing the targeted protection reliability and QoS requirements of bursts.
Disaster survivability in optical communication networks
2013, Computer Communications
Citation Excerpt :
Generally, restoration approaches do not guarantee recovery of disrupted services. Ref. [82] proposed a sub-graph fault-tolerant routing that provides guaranteed recovery against multiple failures represented as SRLG. In this approach, a sub-graph is created by removing the edges of an SRLG from the original graph.
With the frequent occurrences of natural disasters damaging large portions of communication networks and the rising risk of intentional attacks, network vulnerability to multiple cascading, correlated, and collocated failures has become a major concern. Optical backbone networks provide highly-scalable connectivity across large distances. These networks exploit optical technology to carry huge aggregated data and can support “higher-layer” networks, such as SONET, Ethernet, IP, MPLS, ATM, etc. Given the high complexity and scale of backbone networks, multiple correlated failures can have a devastating impact on topological connectivity, which in turn can cause widespread “end-to-end” connection-level disruptions. These outages may affect many applications/services supported by the optical layer, irrespective of the importance of the service and/or sensitivity of the carried data. Hence, it is crucial to understand the vulnerability of optical backbone networks to disasters and design appropriate countermeasures. In this paper, we present a general classification of the existing research works on disaster survivability in optical networks and a survey on relevant works based on that classification. We also classify disasters based on their characteristics and impact on communication networks and discuss different ways to combat them. We conclude the paper with open issues and challenges.
Shared risk link group failure restoration with in-band approximate failure localization
2013, Optical Switching and Networking
Citation Excerpt :
Note that FIP requires an end-to-end SRLG-disjoint protection path for the targeted working path that could be very resource-consuming and possibly infeasible in the case of sparse topologies. Failure dependent protection (FDP) [1–10] was reported as a perspective alternative solution to FIP, where the switching node of an interrupted lightpath performs the restore of the lightpath according to which links failed in the network. With FDP, multiple protection paths (that may not be disjoint with the working path) are pre-planned, and upon a failure, the node which is responsible for traffic switchover (i.e., the source of the lightpath), identifies the failed SRLG and activates a set of protection paths for the restoration accordingly.
This paper proposes a novel failure recovery framework for multi-link shared risk link group (SRLG) failures in optical mesh networks, called failure presumed protection (FPP). The proposed framework is characterized by a failure dependent protection (FDP) mechanism where the optical layer in-band failure identification and restoration tasks for route selection are jointly considered. FPP employs in-band monitoring at each node to obtain on-off status of any working lightpath in case the lightpath is terminated at (or traversing through) the node. Since the locally available failure status at a node may not be sufficient for unambiguous failure localization, the proposed framework reroutes the interrupted lightpaths in such a way that all the suspicious links which do not have 100% restorability under any SRLG failure are kept away. We claim that this is the first study on FDP that considers both failure localization and FDP survivable routing. Extensive simulations are conducted to examine the proposed FPP method under various survivable routing architectures and implementations. The results are further compared with a large number of previously reported counterparts. We will show that the FPP framework can overcome the topological limitation which is critical to the conventional failure independent protection method (e.g., shared path protection). In addition, it can be served as a viable solution for FDP survivable routing where failure localization is considered.
Progressive datacenter recovery over optical core networks after a large-scale disaster
2016, Proceedings of the 2016 12th International Conference on the Design of Reliable Communication Networks, DRCN 2016
Progressive network recovery in optical core networks
2015, Proceedings of 2015 7th International Workshop on Reliable Networks Design and Modeling, RNDM 2015
Analysis and modeling of k-regular and k-connected protection structure in ultra-high capacity optical networks
2015, China Communications

View all citing articles on Scopus

Pallab Datta received his bachelors in Electronics Engineering from Regional Engineering College, Allahabad, India in 1999. He finished his Ph.D. degree in computer engineering in the Department of Electrical and Computer Engineering at Iowa State University, Ames, Iowa.

His research interests include survivable network design, design and implementation of network architectures, upgradeability issues, algorithms for next generation networks, and diverse routing for shared resource group (SRRG) failures. He is also the co-developer of ISTOS (Iowa State Optical Simulator), a simulation environment for performing simulations in optical networks. He is currently working as a visiting researcher in INRIA, Sophia-Antipolis in the project MASCOTTE.

Arun K. Somani is currently Jerry R. Junkins Endowed Chair Professor of Electrical and Computer Engineering at Iowa State University. He earned his MSEE and Ph.D. degrees in electrical engineering from the McGill University, Montreal, Canada, in 1983 and 1985, respectively. He worked as Scientific Officer for Govt. of India, New Delhi from 1974 to 1982 and as a faculty member at the University of Washington, Seattle, WA from 1985 to 1997 in electrical engineering and computer science and engineering departments where he was promoted to Full Professor in September 1995.

His research interests are in the area of fault tolerant computing, computer interconnection networks, WDM-based optical networking, and parallel computer system architecture. He is the chief architect of an anti-submarine warfare system (developed for Indian navy) and Meshkin fault-tolerant computer system architecture (developed for the Boeing Company). He has also developed several robust interconnection topologies, architected, designed, and implemented a 46-node multi-computer cluster-based system, Proteus, using a large grain message-passing model and separate data and control planes, and uses fiber optic communication links. His current research is in developing scalable architectures and algorithms to manage, control, and deliver dependable service efficiently for network employing optical fiber technology, wavelength division multiplexing, wavelength conversion, wavelength sharing, traffic grooming, access network design, Fault and Attack Management (FAM) in optical networking.

He has served on several program committees of various conferences in his research areas. He was the General Chair of IEEE Fault Tolerant Computing Symposium—1997 and Technical Program Committee Chair of International Conference on Computer Communications and Networks, 1999, and OPTICOMM 2003. He is serving as the General Chair of BroadNets 2005. He has served as IEEE distinguished visitor and IEEE distinguished tutorial speaker. He has been elected a Fellow of IEEE for his contributions to theory and applications of computer networks.

^☆: The research reported in this paper is funded in part by the National Science Foundation under grant ANI-9973102 and by Defense Advanced Research Projects Agency and National Security Agency under grant N6001-00-1-8949.

View full text

Sub-graph routing: A generalized fault-tolerant strategy for link failures in WDM optical networks☆

Abstract

Introduction

Section snippets

Network and fault model

Sub-graph fault tolerance for single-link failures

Sub-graph fault tolerance for single-link and SRLG failures

Sub-graph fault tolerance for sequential overlapping component failures

Constrained sub-graph routing

Probability of path reassignment

Results

Conclusions

All-optical networks with sparse-wavelength conversion

IEEE/ACM Transactions on Networking

Adaptive wavelength routing in all-optical networks

IEEE/ACM Transactions on Networking

Efficient algorithms for routing dependable connections in WDM optical networks

IEEE/ACM Transactions on Networking