Publication Date
Summer 7-23-2020
Abstract
In this thesis, I studied a newly developed variable selection method, SODA, and three commonly used variable selection methods, LASSO, elastic net, and random forest (RF), for environmental mixture data. The motivating datasets have neurodevelopmental status as responses and metal measurements and demographic variables as covariates. The challenges for variable selection include: (1) many measured metal concentrations are highly correlated; (2) there are many possible ways of modeling interactions among the metals; (3) the relationships between the outcomes and the explanatory variables are possibly nonlinear; and (4) the signal-to-noise ratio in the real data may be low. To compare these methods under these challenges, I simulated responses under various scenarios with covariates bootstrapped from real data and then compared the methods' percentages of false positives and false negatives. I conclude that no method achieves the lowest percentages of false positives and false negatives simultaneously across all scenarios. However, RF seems to perform moderately on both percentages compared to SODA, LASSO, and elastic net.
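The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the helper names, the variable names, and the choice of denominators (false positives as a share of truly inactive variables, false negatives as a share of truly active variables) are assumptions made here for illustration, since the abstract does not define them.

```python
import random


def bootstrap_rows(rows, rng):
    """Resample covariate rows with replacement, keeping the sample size."""
    return [rng.choice(rows) for _ in rows]


def error_percentages(selected, truly_active, all_variables):
    """Return (false positive %, false negative %) for one selection result.

    Assumed definitions (not taken from the thesis):
      false positive % = selected inactive variables / all inactive variables
      false negative % = missed active variables / all active variables
    """
    selected = set(selected)
    truly_active = set(truly_active)
    inactive = set(all_variables) - truly_active
    false_pos = len(selected & inactive)
    false_neg = len(truly_active - selected)
    fp_pct = 100.0 * false_pos / len(inactive) if inactive else 0.0
    fn_pct = 100.0 * false_neg / len(truly_active) if truly_active else 0.0
    return fp_pct, fn_pct


# Hypothetical variable names and a hypothetical selection result.
variables = ["As", "Pb", "Cd", "Mn", "age"]
active = ["As", "Pb"]            # variables used to simulate the response
chosen = ["As", "Cd", "Mn"]      # variables some method selected

fp_pct, fn_pct = error_percentages(chosen, active, variables)

# Bootstrapping covariate rows before simulating a response.
rng = random.Random(0)
covariate_rows = [(0.1, 1.2), (0.4, 0.9), (0.3, 1.5), (0.8, 0.7)]
resampled = bootstrap_rows(covariate_rows, rng)
```

In a full study, each method would be run on many such bootstrapped datasets per scenario, and the two percentages would be averaged over replicates.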
Degree Name
Statistics
Level of Degree
Masters
Department Name
Mathematics & Statistics
First Committee Member (Chair)
Li Li
Second Committee Member
James Degnan
Third Committee Member
Fletcher G. W. Christensen
Fourth Committee Member
Ronald Christensen
Language
English
Keywords
Variable selection, SODA, LASSO, elastic net, Random Forest, False Positives, False Negatives
Document Type
Thesis
Recommended Citation
Djamen, Paul-Yvann. "A comparison of variable selection methods using bootstrap samples from environmental metal mixture data." (2020). https://digitalrepository.unm.edu/math_etds/131
Included in
Applied Mathematics Commons, Engineering Commons, Mathematics Commons, Medicine and Health Sciences Commons, Statistics and Probability Commons