Publication Date

Summer 7-23-2020


In this thesis, I studied a newly developed variable selection method, SODA, and three commonly used variable selection methods, LASSO, elastic net, and random forest (RF), for environmental mixture data. The motivating datasets have neurodevelopmental status as the responses and metal measurements and demographic variables as the covariates. The challenges for variable selection include: (1) many of the measured metal concentrations are highly correlated, (2) there are many possible ways of modeling interactions among the metals, (3) the relationships between the outcomes and the explanatory variables are possibly nonlinear, and (4) the signal-to-noise ratio in the real data may be low. To compare the methods under these challenges, I simulated responses under various scenarios with covariates bootstrapped from real data and then compared the methods' percentages of false positives and false negatives. I conclude that no method has the lowest percentage of both false positives and false negatives across all scenarios. However, the RF methods seem to perform moderately on both percentages compared to SODA, LASSO, and elastic net.
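The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis code. The function names `bootstrap_rows` and `error_percentages` are hypothetical, and the definitions used here (false positives as a percentage of the truly inactive variables, false negatives as a percentage of the truly active variables) are assumptions; the thesis may define these percentages differently.

```python
import random


def bootstrap_rows(rows, rng):
    """Resample whole rows with replacement.

    Keeping each row intact preserves the correlations among the
    covariates (for example, among metal concentrations).
    """
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def error_percentages(selected, truth, all_vars):
    """Return (false positive %, false negative %) for one selection.

    Assumed definitions (not taken from the thesis):
      false positive % = selected inactive variables / all inactive variables
      false negative % = missed active variables / all active variables
    """
    selected, truth = set(selected), set(truth)
    inactive = set(all_vars) - truth
    fp_pct = 100.0 * len(selected & inactive) / len(inactive) if inactive else 0.0
    fn_pct = 100.0 * len(truth - selected) / len(truth) if truth else 0.0
    return fp_pct, fn_pct


# Toy illustration with made-up variable names and values.
rng = random.Random(0)
covariates = [(0.1, 1.2), (0.4, 0.9), (0.3, 1.1)]
resampled = bootstrap_rows(covariates, rng)

fp_pct, fn_pct = error_percentages(
    selected=["x1", "x3"],
    truth=["x1", "x2"],
    all_vars=["x1", "x2", "x3", "x4"],
)
```

In a full simulation, each scenario would generate responses from the resampled covariates, run each selection method, and average the two percentages over many replicates.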

Degree Name


Level of Degree


Department Name

Mathematics & Statistics

First Committee Member (Chair)

Li Li

Second Committee Member

James Degnan

Third Committee Member

Fletcher G. W. Christensen

Fourth Committee Member

Ronald Christensen




Keywords

Variable selection, SODA, LASSO, elastic net, Random Forest, False Positives, False Negatives

Document Type