Biomedical Sciences ETDs

Publication Date

Fall 12-1-2018

Abstract

Next-generation sequencing (NGS) technologies have undergone extensive improvements since the introduction of the 454 sequencing system in 2005. With tremendous progress in throughput and speed and a dramatic reduction in per-base cost, DNA sequencing is now widely used in basic science as well as translational research. However, acquiring a complete human genome remains a challenge. Because NGS reads are short, long-range information is often missing, which leaves many gaps between scaffolds rather than one contiguous sequence for each chromosome. Moreover, without long-range information, haplotype-resolved genome sequencing and structural variant detection are difficult, yet haplotype information is critical for understanding the genetic basis of complex phenotypes. These complex structural genomic variations are often involved in numerous diseases, such as cancer. Here we developed a novel method that provides a more complete human genome sequence and allows genome studies to accurately identify all variants and phase them to the appropriate homologous chromosome. Ultimately, our approach can decrease the cost of whole genome sequencing while dramatically increasing its accuracy and completeness.

In the first chapter, I reviewed the current DNA sequencing technologies, compared short-read and long-read sequencing, and illustrated their advantages and drawbacks. In chapter 2, I summarized the major haplotype-resolved DNA sequencing approaches, which include Hi-C, synthetic long reads and CPT-Seq. In chapter 3, I provided a detailed description of our novel methods to construct NGS libraries directly on a solid surface, which simplify the NGS pipeline significantly and can contribute to the goal of sequencing a genome for $100. In chapter 4, I described an approach to generate megabase-long linked reads. By combining DNA combing, surface tagmentation and a barcode-enabled DNA chip, the method would allow us to assemble and phase variants across entire chromosomes. In the last chapter, I discussed the potential applications of our technologies in epigenomics, RNA sequencing and genomic medicine. The technologies described in this dissertation will transform genomics and have impacts in the biological sciences, from personalized medicine to de novo sequencing of the human genome.

Document Type

Dissertation

Language

English

Degree Name

Biomedical Sciences

Level of Degree

Doctoral

Department Name

Biomedical Sciences Graduate Program

First Committee Member (Chair)

Jeremy S. Edwards

Second Committee Member

David S. Peabody

Third Committee Member

Darrell L. Dinwiddie

Fourth Committee Member

Payman Zarkesh-Ha
