Computer Science ETDs

Publication Date

Summer 6-17-2025

Abstract

High Performance Computing (HPC) applications increasingly rely on both process and thread-level parallelism to maximize performance across complex, multi-node systems. However, conventional bulk synchronous communication strategies often leave both compute and network resources underutilized due to synchronization delays. This dissertation systematically evaluates the potential of fine-grained, threaded inter-node communication as a strategy for reducing these inefficiencies. To this end, I design and develop two tools: the MiniMod modular application framework and the Configurable Messaging Benchmark (CMB), which together enable empirical, reproducible assessment of communication performance across varying application behaviors, threading models, and communication granularities. Through experiments across multiple systems and workloads, I analyze thread arrival distributions, quantify reclaimable compute time, and assess how early, asynchronous communication can overlap with computation to improve efficiency. My results demonstrate that performance benefits depend heavily on application structure, threading variability, and middleware design. This work establishes concrete criteria under which threaded fine-grained communication is advantageous, guiding future co-design of HPC applications and communication libraries.

Language

English

Document Type

Thesis

Degree Name

Computer Science

Level of Degree

Doctoral

Department Name

Department of Computer Science

First Committee Member (Chair)

Patrick G. Bridges

Second Committee Member

Amanda Bienz

Third Committee Member

Ryan Grant

Fourth Committee Member

Tony Skjellum
