Electrical and Computer Engineering ETDs
Publication Date
7-2-2012
Abstract
Wireline high-speed networks have become a critical part of modern cyberinfrastructures and provide the base substrate for a full range of higher-layer user services and applications. A wide range of technologies has been deployed in these domains, from ultra-fast Internet Protocol (IP) packet routing systems to multi-wavelength optical switching nodes. Today these systems offer immense traffic scalability, reaching well into the hundreds of gigabits per second and even terabits per second. Owing to this growth, network survivability is now a central concern, as even a single link or node failure can disrupt service for thousands of users or more. Over the years, a full range of survivability schemes has been developed for packet routing and optical switching networks. The open research literature lists many such solutions, broadly classified as pre-fault protection and post-fault restoration strategies. The former proactively set up backup (redundant) resource pools to overcome anticipated failure events, whereas the latter are reactive by design and attempt to re-establish connectivity after failures occur. By and large, the bulk of these solutions address only single-failure recovery, i.e., at either the link or the node level, since these are the most common types of fault events experienced in operational networks. However, recent developments are driving the need for more capable schemes that can recover from multiple failure events, e.g., those caused by natural disasters, massive power outages, and weapons of mass destruction (WMD) attacks. These scenarios are much more challenging, as they induce large numbers of correlated failures that can quickly overwhelm most traditional single-failure recovery schemes.
Along these lines, some recent studies have examined network recovery under massive correlated failures. The key idea is to introduce probabilistic risk information into the path provisioning (routing and protection) processes in order to minimize vulnerability to random failures. However, even though these schemes can reduce connection failure rates, they incur very high resource consumption. In turn, these concerns will inhibit their adoption in most practical network settings, as operators must balance the need for improved resiliency against revenue generation. To address this challenge, this thesis proposes a novel multi-failure survivability scheme that jointly incorporates risk mitigation and traffic engineering (TE) efficiency objectives. In particular, the approach leverages multi-path routing strategies to first compute a selection of diverse working/backup path pairs and then uses ranking methods to select the most balanced combination. The framework applies graph-theoretic principles and hence can readily be integrated into real-world traffic provisioning systems. The performance of the proposed solution is evaluated using discrete event simulation for a variety of network topologies and compared against several existing schemes. Overall, the findings show that the scheme yields notably improved survivability rates compared with vanilla traffic engineering policies, while also giving much better resource efficiency than existing probabilistic risk-reduction routing strategies. Hence, network carriers can leverage this new design to achieve much-improved reliability for critical data flows without sacrificing operational revenues.
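The two-step idea above (compute diverse working/backup path pairs, then rank them to balance risk against TE cost) can be illustrated with a minimal sketch. It is not the thesis's actual algorithm: the toy topology, the hypothetical `SCENARIOS` list of mutually exclusive correlated-failure events, the link-disjointness requirement, the exhaustive DFS path enumeration (a stand-in for a k-shortest-path routine), and the weighted-sum ranking controlled by a hypothetical `alpha` parameter are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy topology: undirected links mapped to a TE cost (e.g., hop weight).
LINKS = {
    ("A", "B"): 1, ("B", "D"): 1,
    ("A", "C"): 1, ("C", "D"): 1,
    ("A", "E"): 2, ("E", "D"): 2,
}

# Hypothetical correlated-failure model: mutually exclusive scenarios, each given as
# (probability, set of links that fail together), e.g., one disaster footprint.
SCENARIOS = [
    (0.10, {("A", "B"), ("A", "C")}),  # regional event near node A
    (0.05, {("B", "D")}),
    (0.02, {("C", "D"), ("E", "D")}),  # regional event near node D
]


def neighbors(node):
    """Yield (adjacent node, link) pairs for an undirected node."""
    for u, v in LINKS:
        if u == node:
            yield v, (u, v)
        elif v == node:
            yield u, (u, v)


def simple_paths(src, dst, max_hops=4):
    """Enumerate loop-free src->dst paths as link lists (DFS; fine for tiny graphs)."""
    stack = [(src, [src], [])]
    while stack:
        node, visited, links = stack.pop()
        if node == dst:
            yield links
            continue
        if len(links) >= max_hops:
            continue
        for nxt, link in neighbors(node):
            if nxt not in visited:
                stack.append((nxt, visited + [nxt], links + [link]))


def path_cost(path):
    return sum(LINKS[link] for link in path)


def pair_failure_prob(work, backup):
    """Probability that the scenario that occurs cuts both paths (scenarios are disjoint)."""
    w, b = set(work), set(backup)
    return sum(p for p, failed in SCENARIOS if failed & w and failed & b)


def rank_pairs(src, dst, alpha=0.5):
    """Rank link-disjoint (working, backup) pairs by a weighted sum of normalized
    TE cost and joint failure risk; lower scores are better."""
    paths = list(simple_paths(src, dst))
    candidates = []
    for p1, p2 in combinations(paths, 2):
        if set(p1) & set(p2):
            continue  # working and backup must not share a link
        work, backup = sorted((p1, p2), key=path_cost)  # cheaper path carries traffic
        total = path_cost(work) + path_cost(backup)
        candidates.append((work, backup, total, pair_failure_prob(work, backup)))
    if not candidates:
        return []
    max_cost = max(c[2] for c in candidates)
    max_risk = max(c[3] for c in candidates) or 1.0  # avoid dividing by zero
    ranked = [
        (alpha * total / max_cost + (1 - alpha) * risk / max_risk, work, backup, total, risk)
        for work, backup, total, risk in candidates
    ]
    return sorted(ranked, key=lambda entry: entry[0])


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        score, work, backup, total, risk = rank_pairs("A", "D", alpha)[0]
        print(f"alpha={alpha}: work={work} backup={backup} cost={total} risk={risk}")
```

With the default `alpha=0.5`, this sketch selects the A-B-D / A-E-D pair, which no modeled scenario can sever, while `alpha=1.0` (pure TE cost) picks the cheaper A-B-D / A-C-D pair that a single regional event near node A would cut entirely. Sweeping `alpha` in this way mirrors, in a simplified form, the risk-versus-efficiency trade-off the abstract describes.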
Keywords
Computer networks--Reliability, Fault-tolerant computing, Data recovery (Computer science), Packet switching (Data transmission)--Computer simulation.
Sponsors
DTRA: Defense Threat Reduction Agency
Document Type
Thesis
Language
English
Degree Name
Computer Engineering
Level of Degree
Masters
Department Name
Electrical and Computer Engineering
First Committee Member (Chair)
Hayat, Majeed M.
Second Committee Member
Pattichis, Marios S.
Recommended Citation
Díaz, Oscar A. "Design and evaluation of network survivability schemes for correlated multi-failure scenarios." (2012). https://digitalrepository.unm.edu/ece_etds/61