Biology ETDs

Publication Date



Abstract-I
Background: Heterogeneous cell populations have previously been described as noisy. However, recent studies have demonstrated that heterogeneity can be biologically significant. We present here an approach for rapid and complete identification of heterogeneous cell populations from high-throughput flow cytometry data. We have developed a novel measure, Slope Differentiation Identification (SDI), based on flow cytometry measurements of protein expression. SDI quantifies the rate of change in protein expression between two conditions (exponential and stationary phase) of yeast cells as a function of cell size or cell granularity.
Results: SDI showed superior Gene Ontology enrichment compared with other approaches, such as k-means clustering and an approach based on the bimodality of the fluorescence intensity distribution. Cell populations were also validated using gradient separation followed by microscopy, where proteins with a high SDI measure showed significant levels of differentiation between high- and low-density cells.
Conclusion: Overall, our approach has identified novel protein expression patterns that differentiate quiescent and non-quiescent cell populations.

Abstract-II
Background: With the advent of genomics, there has been a rapid increase in the use of two- and one-color microarrays to measure mRNA abundance for the entire genome. Variability in microarray analysis undermines its utility in identifying the entire subset of differentially expressed mRNAs. Although it is commonly assumed that variances are constant for every hybridized spot within a microarray, recent microarray studies have shown that variances may differ for each biological sample analyzed (Ritchie, Diyagama et al. 2006). Another common assumption is that log-intensity values for any given gene have a Normal distribution. For many datasets, both assumptions have been shown to be incorrect, resulting in distorted significance estimates when testing for differential expression of each gene (Bar-Even, Paulsson et al. 2006; Wentzell, Karakach et al. 2006).
Approach: To overcome the limitations of existing approaches in identifying significant, differentially expressed genes, we have developed a novel unsupervised statistical approach called Calibration Regression Analysis of Microarrays (CRAM) that uses a combination of empirical Bayes and regression calibration. The main novelty of our approach is the modeling of gene expression variances as a function of the log-intensity within each sample. A later version, CRAM-GS, was developed in which the association between genes is captured using an adjusted gene correlation measure.
Results: CRAM was compared to four existing approaches for identifying differentially expressed genes. Performance was based on the ability to identify co-regulated genes in the same Gene Ontology (GO) process. CRAM exhibited a marginal improvement in GO process enrichment compared with the other approaches. Three more datasets were then added to the original ones; on these, the later version, CRAM-GS, showed a significant improvement compared to CRAM, suggesting a major additional benefit of incorporating gene correlations into the model. All versions of CRAM were two orders of magnitude faster than the existing approaches. Overall, CRAM provides an adaptive, computationally efficient approach for accurate identification of differentially expressed genes.

Project Sponsors

National Science Foundation (NSF) grant MCB-0645854 and National Institutes of Health (NIH) grant GM-67593 (to M.W.W.) and NIH grant 1U54MH084690-01 (to L.S.). P.H.T. was supported by NIH/IMSD grant GM-060201. R.J. was supported by NIH grant GM-075149.




Empirical Bayes, Regression Calibration, Flow cytometry, Microarrays

Document Type


Degree Name


Level of Degree


Department Name

UNM Biology Department

First Advisor

Werner-Washburne, Maggie

First Committee Member (Chair)

Toolson, Eric

Second Committee Member

Natvig, Donald

Third Committee Member

Wearing, Helen