Computer Science ETDs

Publication Date

Fall 10-19-2017


As high-performance computing (HPC) systems advance towards exascale (10^18 operations per second), they must leverage increasing levels of parallelism to achieve their performance goals. In addition to increased parallelism, machines of that scale will operate under strict power limitations. One direction currently being explored to address those issues is many-core processors, such as Intel’s Xeon Phi line. Many-core processors sacrifice clock speed and core complexity, such as out-of-order pipelining, to increase the number of cores on a die. While this increases floating-point throughput, it can reduce the performance of serialized, synchronized, and latency-sensitive code paths, such as those in traditional communication libraries.

In this thesis, I examine the impact of many-core processors on large-scale scientific applications and explore ways to improve performance for both future and legacy applications. I first characterize the performance and power tradeoffs across different core frequencies and network hardware. Then, I explore the viability of next-generation programming models by benchmarking the performance of communication libraries that use multi-threaded one-sided communication. Next, I improve communication library performance for legacy applications on many-core systems by optimizing the matching algorithm to leverage single instruction, multiple data (SIMD) vectors and caching behavior. Finally, I explore two other matching algorithm optimizations targeted at next-generation processors and applications.




Keywords

High Performance Computing, MPI, Many Core Processors, Networking

Document Type


Degree Name

Computer Science

Level of Degree


Department Name

Department of Computer Science

First Committee Member (Chair)

Patrick G. Bridges

Second Committee Member

Ryan E. Grant

Third Committee Member

Anthony Skjellum

Fourth Committee Member

Dorian C. Arnold

Project Sponsors

Sandia National Laboratories