Computer Science ETDs
Publication Date
2021
Abstract
Understanding the performance of parallel and distributed programs remains central to determining how compute systems can be optimized to achieve exascale performance. Lightweight statistical models allow developers to both characterize and predict performance trade-offs, especially as HPC systems become more heterogeneous with many-core CPUs and GPUs. This thesis presents a lightweight statistical approach to modeling performance variation that leverages extreme value theory by focusing on the maximum length of distributed workload intervals. This approach was implemented in MPI and evaluated on several HPC systems and workloads. I then present a performance model of partitioned communication that also uses an expected maximum value method; this model was validated against benchmark results from HPC systems. Together, these lightweight statistical models provide insight into the behavior of HPC applications and systems and allow developers to predict performance impacts as HPC systems evolve toward exascale.
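To illustrate why the maximum interval length matters, consider a bulk-synchronous step. Every process must finish its work interval before any process can proceed, so the step time is the maximum over all processes. The sketch below is a toy example and not the model from this thesis. It assumes that the n interval lengths are independent and exponentially distributed with a common mean. Under that assumption, the expected maximum has the closed form mean × H_n, where H_n is the n-th harmonic number. H_n grows roughly like ln n, which matches the Gumbel limit from extreme value theory. The function names are hypothetical, and a Monte Carlo simulation is included to check the closed form.

```python
import random


def expected_max_exponential(n: int, mean: float) -> float:
    """Closed-form E[max] of n i.i.d. exponential intervals: mean * H_n.

    Assumption (illustrative only): interval lengths are i.i.d. exponential.
    """
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return mean * harmonic


def simulate_step_time(n: int, mean: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of a bulk-synchronous step time.

    Each trial draws one interval per process and records the slowest one,
    because the step cannot complete until every process has finished.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # expovariate takes the rate (1 / mean), not the mean itself
        total += max(rng.expovariate(1.0 / mean) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    n, mean = 64, 1.0  # hypothetical: 64 processes, 1.0 ms mean interval
    print(f"closed form: {expected_max_exponential(n, mean):.3f} ms")
    print(f"simulated:   {simulate_step_time(n, mean, trials=10000):.3f} ms")
```

With 64 processes and a 1.0 ms mean interval, the closed form gives roughly 4.74 ms per step. So in this toy setting, synchronization alone makes the expected step time nearly five times the mean per-process work.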
Language
English
Keywords
Exascale, HPC, Performance Model, Performance Variability, Partitioned Communication, Extreme Value Theory, Statistics, MPI
Document Type
Thesis
Degree Name
Computer Science
Level of Degree
Masters
Department Name
Department of Computer Science
First Committee Member (Chair)
Patrick G. Bridges
Second Committee Member
Trilce Estrada
Third Committee Member
Amanda Bienz
Recommended Citation
Dominguez-Trujillo, Jered B. "Statistical Modeling of HPC Performance Variability and Communication." (2021). https://digitalrepository.unm.edu/cs_etds/111