Chemistry ETDs

Publication Date



For my doctoral work, I have developed strategies to mine public databases for data that can be used to infer structural and functional information for the hotdog-fold and HADSF superfamilies. For the hotdog-fold superfamily, I used curated and automatically applied annotations of structure, taxonomic lineage, function, and subfamily membership from the UniProtKB, gene context and taxonomic information from the NCBI, and the results of several in-depth explorations of subfamily/function and structural class membership. Based on the distribution of these annotations mapped onto a sequence similarity network (SSN), I applied structural and/or specific function/subfamily assignments to ~143,000 sequences and general subfamily assignments to an additional ~61,000 sequences. I also identified 52 clusters containing nearly 9,000 uncharacterized sequences lacking any annotations whatsoever, as well as several probable instances of cross-domain gene transfer that would be of interest for further study. Within the thioesterase family of the hotdog-fold superfamily, I identified ~450 targets to undergo high-throughput screens (HTS) in Karen Allen's lab, the SSN-mapped results of which underscore widespread promiscuity across the family. I demonstrated the use of HTS and gene context results to infer functional identities for hotdog-fold superfamily members, though most gene contexts proved uninformative.
In the HADSF, I explored the diversity and function space of Firmicutes members, revealing the wide range of HADSF representatives even within members of the same genus. SSNs mapped according to taxonomic lineage, subfamily membership, and function revealed several instances of probable gene transfer among Firmicutes members and also across phyla. Related gene context, biological range, and HTS results revealed an HADSF member from Listeria innocua to be part of the PTS pathway and provided potentially useful information for other HADSF members.
Two groups of HADSF members were earlier identified as having interesting evolutionary histories. I provide biological range- and gene context-based evidence for the convergent evolution of FMN phosphatase activity in E. coli and Bacteroides thetaiotaomicron HADSF members, divergent evolution of the same in E. coli and Salmonella enterica members, and divergent evolution of yidA in E. coli and BT3352 in B. thetaiotaomicron.




Chemistry, Bioinformatics, Protein evolution

Document Type


Degree Name


Level of Degree


Department Name

Department of Chemistry and Chemical Biology

First Advisor

Dunaway-Mariano, Debra

First Committee Member (Chair)

Mariano, Patrick

Second Committee Member

Allen, Karen

Third Committee Member

Liang, Fu-Sen