Publication Date

Summer 8-1-2022


This study compared the performance of machine learning models in classifying COVID-19 patients using exhaled breath signals and simulated datasets. Ground-truth classification was determined by the results of the gold-standard Polymerase Chain Reaction (PCR) test. A residual bootstrap method generated the simulated datasets by fitting the signal data to Autoregressive Moving Average (ARMA) models. Classification models included neural networks, k-nearest neighbors, naïve Bayes, random forests, and support vector machines. A Recursive Feature Elimination (RFE) study, using Gini Importance scoring for the two classes, was performed to determine whether reducing the number of signal features would improve the classification models' performance. The top 25% of features ranked by Gini Importance score suggest that profiles of specific Volatile Organic Compounds (VOCs) in patient breath may contribute to model performance.
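
The residual bootstrap idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: for the sake of a self-contained example it fits an AR(1) model by least squares (a special case of ARMA with no moving-average term) rather than a full ARMA model, and the function names `fit_ar1` and `residual_bootstrap` are hypothetical. The simulated series are built by feeding resampled residuals back through the fitted recursion.

```python
import random


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t].

    Simplifying assumption: AR(1) stands in for the ARMA models
    named in the abstract. Returns (c, phi, residuals).
    """
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    if sxx == 0:
        raise ValueError("series is constant; AR(1) slope is undefined")
    sxy = sum((a - mean_prev) * (b - mean_next)
              for a, b in zip(x_prev, x_next))
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    residuals = [b - (c + phi * a) for a, b in zip(x_prev, x_next)]
    return c, phi, residuals


def residual_bootstrap(series, n_sim, seed=0):
    """Generate n_sim simulated series of the same length as `series`.

    Each simulated series starts at the observed first value and is
    extended with the fitted recursion plus a residual drawn with
    replacement from the centered fitted residuals.
    """
    rng = random.Random(seed)
    c, phi, residuals = fit_ar1(series)
    mean_res = sum(residuals) / len(residuals)
    centered = [r - mean_res for r in residuals]
    simulated = []
    for _ in range(n_sim):
        x = [series[0]]
        for _ in range(len(series) - 1):
            x.append(c + phi * x[-1] + rng.choice(centered))
        simulated.append(x)
    return simulated


# Toy signal (made-up values, not breath data) to show the call pattern.
toy_signal = [0.0, 1.2, 1.4, 1.9, 1.7, 2.1, 1.8, 2.0, 2.2, 1.9]
toy_simulations = residual_bootstrap(toy_signal, n_sim=5, seed=42)
```

Each simulated series keeps the length and starting value of the observed signal, so it can be passed to the same classifiers as the original data. With a library that fits ARMA models, the same pattern applies with the fitted ARMA recursion in place of the AR(1) step.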

Degree Name


Level of Degree


Department Name

Mathematics & Statistics

First Committee Member (Chair)

James Degnan

Second Committee Member

Mohammad Motamed

Third Committee Member

Justin Baca


COVID-19, machine learning, breath signals, simulation, autoregressive moving average

Document Type