"A comparison of variable selection methods using bootstrap samples from environmental metal mixture data"
In this thesis, I studied a newly developed variable selection method, SODA, and three commonly used variable selection methods (LASSO, elastic net, and random forest) applied to environmental mixture data. The motivating datasets have neurodevelopmental status as the response and metal measurements and demographic variables as covariates. Variable selection in this setting faces four challenges: (1) many measured metal concentrations are highly correlated, (2) there are many possible ways of modeling interactions among the metals, (3) the relationships between the outcomes and the explanatory variables may be nonlinear, and (4) the signal-to-noise ratio in the real data may be low. To compare the methods under these challenges, I simulated responses under various scenarios, with covariates bootstrapped from real data, and then compared the methods' percentages of false positives and false negatives. I conclude that no method achieves the lowest percentage of both false positives and false negatives across all scenarios. However, random forest (RF) methods appear to perform moderately on both measures, compared with SODA, LASSO, and elastic net.
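The evaluation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's code: the function names, the placeholder covariates, the true model in `simulate_response`, and the exact definitions of the two percentages are all assumptions made for the example, and the "selected" set is hard-coded where a real run would call SODA, LASSO, elastic net, or a random forest.

```python
import random

def bootstrap_rows(rows, rng):
    """Resample whole rows with replacement, so the correlation
    structure among the covariates in each row is preserved."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def simulate_response(rows, rng, noise_sd=1.0):
    """Hypothetical true model (an assumption for this sketch): a main
    effect of column 0, an interaction of columns 0 and 1, and Gaussian
    noise whose standard deviation controls the signal-to-noise ratio."""
    return [2.0 * r[0] + r[0] * r[1] + rng.gauss(0.0, noise_sd) for r in rows]

def error_percentages(selected, truth, all_vars):
    """Return (false positive %, false negative %).
    Assumed definitions: FP% is the share of truly irrelevant variables
    that were selected; FN% is the share of truly relevant variables
    that were missed."""
    selected, truth = set(selected), set(truth)
    null = set(all_vars) - truth
    fp = 100.0 * len(selected & null) / len(null) if null else 0.0
    fn = 100.0 * len(truth - selected) / len(truth) if truth else 0.0
    return fp, fn

rng = random.Random(0)
# Placeholder covariates standing in for the real metal measurements.
covariates = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]

boot = bootstrap_rows(covariates, rng)
y = simulate_response(boot, rng)

# A real run would fit a selection method to (boot, y); here the
# selected set is hard-coded purely to show the scoring step.
fp, fn = error_percentages(selected=[0, 2], truth=[0, 1], all_vars=range(4))
```

Repeating the bootstrap, simulation, and scoring steps many times per scenario and averaging `fp` and `fn` would give per-method percentages of the kind the abstract compares.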
Level of Degree
Mathematics & Statistics
First Committee Member (Chair)
Second Committee Member
Third Committee Member
Fletcher G. W. Christensen
Fourth Committee Member
Variable selection, SODA, LASSO, elastic net, random forest, false positives, false negatives
Djamen, Paul-Yvann. "A comparison of variable selection methods using bootstrap samples from environmental metal mixture data." (2020). https://digitalrepository.unm.edu/math_etds/131
Applied Mathematics Commons, Engineering Commons, Mathematics Commons, Medicine and Health Sciences Commons, Statistics and Probability Commons