Computer Science ETDs

Publication Date

Fall 11-2-2017

Abstract

Compute capacity in high performance computing (HPC) systems is growing faster than other system resources such as memory capacity, network bandwidth, and I/O bandwidth. As a result, the cost of executing a floating-point operation is decreasing faster than the cost of moving the data it operates on. This widening performance gap wastes CPU cycles while processors wait for slower I/O operations in the memory hierarchy, network, and storage to complete. These bottlenecks lengthen application time to solution and increase energy consumption, leaving the system underutilized. In other words, data movement is becoming a key concern for future HPC system design. Data volume reduction techniques (e.g., lossless data compression, information hiding approaches, and difference-based patches) have been useful in many contexts for reducing data movement. In this thesis, I study the use of such techniques to reduce data movement in current and future HPC environments, trading additional computation for smaller data volumes so that I/O operations complete faster. I identify three key areas of data movement in HPC (intra-process, inter-process, and inter-application) and investigate the impact of various compression techniques on the data associated with each. Specifically, I introduce a compression-based paging system for HPC memory and demonstrate up to 78% capacity improvement with minimal runtime overhead (4%). Next, I propose and demonstrate a novel two-level diff-based approach that can reduce inter-process data movement by up to 99%, although potentially with large runtime overhead. Finally, using a checkpoint/restart-based fault tolerance protocol as a case study, I reduce inter-application data movement by up to 90%. In doing so, I show that checkpoint data compression can improve application runtime efficiency by more than 50% and reduce energy expenditure by up to 90%.
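The abstract's central trade-off is spending CPU cycles to shrink the bytes that must be moved. The minimal Python sketch below illustrates that idea. It is not the dissertation's implementation, and it rests on several assumptions. A checkpoint is treated as a flat byte buffer. zlib's DEFLATE serves as a stand-in lossless compressor. A byte-wise XOR against the previous checkpoint stands in for a difference-based patch. The function names (`compress_checkpoint`, `xor_diff`, `restore`, `reduction_percent`) are hypothetical.

```python
import os
import zlib


def compress_checkpoint(data: bytes, level: int = 6) -> bytes:
    """Losslessly compress a checkpoint buffer with DEFLATE (zlib)."""
    return zlib.compress(data, level)


def xor_diff(prev: bytes, curr: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers; unchanged bytes become zero."""
    if len(prev) != len(curr):
        raise ValueError("buffers must have equal length")
    x = int.from_bytes(prev, "little") ^ int.from_bytes(curr, "little")
    return x.to_bytes(len(prev), "little")


def restore(prev: bytes, compressed_diff: bytes) -> bytes:
    """Rebuild the current checkpoint from the previous one plus a compressed diff."""
    return xor_diff(prev, zlib.decompress(compressed_diff))


def reduction_percent(original: int, reduced: int) -> float:
    """Share of data volume removed: 100 * (1 - reduced / original)."""
    return 100.0 * (1.0 - reduced / original)


# Hypothetical application state: 1 MiB of random bytes (hard to compress,
# much like raw floating-point data), of which only 100 bytes change
# between two consecutive checkpoints.
prev = os.urandom(1 << 20)
curr_buf = bytearray(prev)
curr_buf[1000:1100] = b"\xff" * 100
curr = bytes(curr_buf)

full = compress_checkpoint(curr)                  # compress the whole checkpoint
diff = compress_checkpoint(xor_diff(prev, curr))  # compress only the difference

print(f"raw: {len(curr)} B, compressed: {len(full)} B, diff+compressed: {len(diff)} B")
print(f"diff-based reduction: {reduction_percent(len(curr), len(diff)):.2f}%")
```

In this example, compressing the raw buffer gains almost nothing because random data has no redundancy. The XOR diff, in contrast, is almost entirely zeros and collapses to a tiny fraction of its size. That contrast is why difference-based techniques can pay off when successive states change little. The cost is the extra CPU work needed to compute, compress, and later reapply the diff.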

Language

English

Keywords

High performance computing, Supercomputing, Data movement, Distributed computing, Fault tolerance

Document Type

Dissertation

Degree Name

Computer Science

Level of Degree

Doctoral

Department Name

Department of Computer Science

First Committee Member (Chair)

Dorian Arnold

Second Committee Member

Kurt Ferreira

Third Committee Member

Patrick Bridges

Fourth Committee Member

David Lowenthal
