Computer Science ETDs
Publication Date
Spring 5-1-2017
Abstract
Networks are the backbone of modern HPC systems. They serve as a critical piece of infrastructure, tying together applications, analytics, storage and visualization. Despite this importance, we have not fully explored how evolving communication paradigms and network design will impact scientific workloads. As networks expand in the race towards Exascale (1×10^18 floating point operations a second), we need to reexamine this relationship so that the HPC community better understands (1) characteristics and trends in HPC communication; (2) how to best design HPC networks to save power or enhance the performance; (3) how to facilitate scalable, informed, and dynamic decisions within the network. My thesis is that one can improve application performance and system power usage by gaining a detailed understanding of HPC communication on both the network endpoints and fabric; specifically, I address the problem of network-induced memory contention, quantify the power/performance tradeoffs for dragonfly topologies in HPC networks, and increase the scalability/responsiveness of large-scale network monitoring. This dissertation highlights opportunities for improving network performance and power efficiency, while uncovering pitfalls and mitigation strategies brought about by shifting trends in HPC communication and fabric design. We begin by examining the communication characteristics of the network endpoints. We show how one-sided communication techniques can lead to contention in the memory subsystem with (3X increases to runtime) and how this can be avoided. Then, we move onto a macro level study of the network fabric, where we demonstrate the tradeoffs between power and performance when designing HPC network topology. Lastly, in order to facilitate dynamic and responsive solutions, we provide new methods for scalable network monitoring and improved models of data aggregation.
Language
English
Keywords
HPC, Network Congestion, Exascale, Simulation, NiMC, SAI
Document Type
Dissertation
Degree Name
Computer Science
Level of Degree
Doctoral
Department Name
Department of Computer Science
First Committee Member (Chair)
Dorian Arnold
Second Committee Member
Dave Ackley
Third Committee Member
Patrick Bridges
Fourth Committee Member
Wennie Shu
Recommended Citation
Groves, Taylor L.. "Characterizing and Improving Power and Performance in HPC Networks." (2017). https://digitalrepository.unm.edu/cs_etds/82