Publication Date

Fall 11-13-2018


Cluster randomized trials are increasingly popular in epidemiological and medical research. When analyzing data from such studies, it is imperative that the hierarchical structure of the data be taken into account. Multilevel logistic regression is used to analyze clustered data with binary outcomes. Previous literature shows that a greater number of clusters is more important than a large number of subjects per cluster. This paper investigates whether increasing the number of subjects per cluster can compensate for the increased bias in parameter estimates that arises when the number of clusters is decreased. A simulation study was conducted in which the absolute percent relative bias of each parameter estimate with 5 to 49 clusters and 10, 20, 30, 60, 90, 120, 150, 180, and 210 subjects per cluster was compared with the bias of the corresponding parameter estimate with 50 clusters and 10 subjects per cluster. Maximum Likelihood, Restricted Maximum Likelihood, and Generalized Estimating Equation methods were examined under multiple Intraclass Correlation Coefficients. For Maximum Likelihood estimates, results show that, for fixed effect parameter estimates, it is possible to offset the effects of few clusters by increasing the number of subjects per cluster. For variance components, it was not possible to fully compensate under all conditions, but in general, increasing the number of subjects per cluster either decreased the bias or left it at a plateau beyond a certain sample size. Further investigation is needed on Restricted Maximum Likelihood and Generalized Estimating Equation estimates, but results show that they do not perform well when the number of subjects per cluster is small. These results are informative for researchers who are limited to few clusters.
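As a rough illustration of the quantities named in the abstract, the following minimal Python sketch computes an absolute percent relative bias and generates clustered binary data from a random-intercept logistic model. It is not the simulation code used in the study. The bias formula (absolute difference between the mean estimate and the true value, divided by the absolute true value, times 100), the latent-scale relationship between the Intraclass Correlation Coefficient and the random-intercept variance, the single cluster-level treatment covariate, and all function names and parameter values are assumptions made for illustration.

```python
import math
import random
from statistics import mean


def abs_percent_relative_bias(estimates, true_value):
    """Assumed definition: |mean(estimates) - true_value| / |true_value| * 100."""
    return abs(mean(estimates) - true_value) / abs(true_value) * 100.0


def random_intercept_variance(icc):
    """Solve icc = var_u / (var_u + pi^2 / 3) for var_u.

    This is the latent-scale ICC of a random-intercept logistic model;
    whether the study defined the ICC this way is an assumption.
    """
    return icc * (math.pi ** 2 / 3.0) / (1.0 - icc)


def simulate_clustered_binary(n_clusters, n_per_cluster, beta0, beta1, icc, rng):
    """Return (cluster_id, treatment, outcome) rows for one simulated trial."""
    sd_u = math.sqrt(random_intercept_variance(icc))
    rows = []
    for j in range(n_clusters):
        treatment = j % 2            # whole clusters alternate between arms (illustrative)
        u_j = rng.gauss(0.0, sd_u)   # cluster-level random intercept
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * treatment + u_j)))
        for _ in range(n_per_cluster):
            rows.append((j, treatment, 1 if rng.random() < p else 0))
    return rows


# Hypothetical settings: 10 clusters, 30 subjects per cluster, ICC of 0.1.
rng = random.Random(2018)
data = simulate_clustered_binary(10, 30, beta0=-1.0, beta1=0.5, icc=0.1, rng=rng)
print(len(data))

# Bias of three made-up estimates of a parameter whose true value is 0.5.
print(abs_percent_relative_bias([0.45, 0.55, 0.62], true_value=0.5))
```

In a full study of this kind, each simulated data set would be fitted with the estimation method under examination, and the bias would be computed over the estimates collected across replications for every combination of cluster count, cluster size, and ICC.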

Degree Name


Level of Degree


Department Name

Mathematics & Statistics

First Committee Member (Chair)

Fares Qeadan

Second Committee Member

Helen Wearing

Third Committee Member

Yan Lu




Clustered Randomized Trials, CRT, Multilevel Logistic Regression, MLE, REML, GEE

Document Type