Publication Date

Spring 2022


This study examined data from the United States Geological Survey Produced Water database, version 2.3 (USGS DB), and built models to estimate the concentration of radium-226 in produced water from the values of other predictor variables. Only about 254 observations in the dataset were usable. Although the USGS DB had up to 190 possible attributes, it also had extreme rates of missingness, and many of the candidate variables were highly correlated. Multiple imputation was performed with the mice, Hmisc, and rms packages for the R language to handle the missing data. One multiple linear regression model and two logistic regression main-effects models were fitted to the data. The bootstrap was used for internal validation of the models. The models indicated that log10(total dissolved solids) and log10(barium) appear to be significant predictors of log10(radium-226) and of radium exceedance probabilities.

Degree Name


Level of Degree


Department Name

Mathematics & Statistics

First Committee Member (Chair)

James Degnan

Second Committee Member

Yan Lu

Third Committee Member

Laura J. Crossey


Keywords

Radium-226, produced water, shale gas, tight gas, conventional hydrocarbon

Document Type