Computer Science ETDs
Publication Date
Summer 7-29-2025
Abstract
Boundary exchanges dominate the cost of both stencil codes and codes that rely on sparse matrix operations. The performance of large boundary exchanges is constrained by synchronization overheads and injection bandwidth limits. Irregular boundary exchanges incur additional overhead due to the large number of messages they require. This thesis investigates multiple methods for improving the performance and scalability of both Cartesian and irregular boundary exchanges. Since boundary exchanges are typically performed iteratively, persistent communication presents an opportunity for optimization by sharing setup costs and amortizing them across iterations. Partitioned communication is also explored to increase asynchrony, reducing bottlenecks from synchronization overheads and data congestion. For irregular applications whose performance suffers from large numbers of messages, message aggregation can avoid many high-latency messages, and neighborhood collectives can provide these optimizations portably. Finally, increasing asynchrony in large irregular boundary exchanges can alleviate synchronization bottlenecks that are amplified by load imbalance. Synchronization is reduced with partitioned communication and an alternative compressed sparse column (CSC) matrix format, enabling increased overlap of communication and computation for better system utilization. For regular halo exchanges, measured timings show that persistent MPI communication can provide a speedup of up to 37% over the baseline MPI communication, and partitioned MPI communication can provide a speedup of up to 68%. Additionally, results from hypre's BoomerAMG show up to a 38% speedup in sparse matrix-vector multiplication within linear solvers when using aggregating neighborhood collectives. Lastly, benchmark tests using partitioned MPI and the CSC matrix format demonstrate improvements of up to 190% for sparse matrix-dense matrix multiplication on SuiteSparse matrices.
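The role of the CSC format in enabling overlap can be illustrated with a small serial sketch. It rests on stated assumptions: it is not the dissertation's implementation, the helper names `csc_from_dense` and `spmm_csc_on_arrival` are hypothetical, and the arrival of remote data is simulated with a reordered list rather than real partitioned MPI receives. In C = A·B with A stored column-wise, column j of A only multiplies row j of B, so each row of B can be consumed as soon as it arrives instead of waiting for the whole exchange to finish.

```python
# Hypothetical sketch (assumption: not the thesis code) of sparse matrix-dense
# matrix multiplication C = A @ B with A stored in CSC format. Rows of B are
# consumed in arrival order to mimic overlap with a partitioned exchange.

def csc_from_dense(dense):
    """Build CSC arrays (colptr, rowidx, vals) from a dense list of rows."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    colptr, rowidx, vals = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                rowidx.append(i)
                vals.append(dense[i][j])
        colptr.append(len(rowidx))  # end of column j's nonzeros
    return colptr, rowidx, vals


def spmm_csc_on_arrival(n_rows, colptr, rowidx, vals, arrivals, k):
    """Accumulate C = A @ B from (j, row_j_of_B) pairs in any order.

    `arrivals` stands in for rows of B received from neighboring processes
    (simulated here; no MPI is used).
    """
    c = [[0.0] * k for _ in range(n_rows)]
    for j, b_row in arrivals:
        # Only column j of A touches row j of B, so this update can run
        # as soon as row j is available.
        for p in range(colptr[j], colptr[j + 1]):
            i, a_ij = rowidx[p], vals[p]
            for col in range(k):
                c[i][col] += a_ij * b_row[col]
    return c


A = [[1, 0, 2], [0, 3, 0]]
B = [[1, 2], [3, 4], [5, 6]]
colptr, rowidx, vals = csc_from_dense(A)
arrivals = [(2, B[2]), (0, B[0]), (1, B[1])]  # simulated out-of-order arrival
C = spmm_csc_on_arrival(len(A), colptr, rowidx, vals, arrivals, k=2)
print(C)  # [[11.0, 14.0], [9.0, 12.0]]
```

One plausible reading of the design choice: with a row-oriented (CSR) layout, each output row typically depends on several rows of B, so computation tends to wait until all of them have been received, whereas the column-oriented loop above lets each received row be applied immediately.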
Language
English
Keywords
HPC, MPI, Boundary Exchanges, Irregular Communication, Sparse Matrix Operations
Document Type
Dissertation
Degree Name
Computer Science
Level of Degree
Doctoral
Department Name
Department of Computer Science
First Committee Member (Chair)
Amanda Bienz
Second Committee Member
Patrick Bridges
Third Committee Member
Anthony Skjellum
Fourth Committee Member
Rui Peng Li
Recommended Citation
Collom, Gerald. "Optimizing Distributed Boundary Exchanges for Benchmarks, Solvers and Sparse Matrix Operations." (2025). https://digitalrepository.unm.edu/cs_etds/134