Computer Science ETDs

Publication Date

Spring 4-15-2025

Abstract

Advancing personalized medicine depends on effectively integrating and interpreting the vast, heterogeneous landscape of biological data, from genomic sequences and transcriptomics to the insights embedded in the scientific literature. Current machine learning models often focus on a single data modality, limiting their capacity to capture the multifaceted nature of biological systems. We address this gap by developing three attention-based machine learning models that integrate diverse data modalities. First, DeepVul is a multi-task model that leverages cancer transcriptome data to predict genes critical for cancer survival and their corresponding drugs. Second, LitGene refines gene representations by integrating textual information from the scientific literature. Finally, Protein2Text is a large language model that translates protein sequences into natural-language descriptions, making complex biochemical data accessible and interpretable. Together, these models embody a comprehensive approach to integrating diverse data modalities, offering an alternative view of biological systems and paving the way for truly personalized medicine for everyone.
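
To make the abstract's "attention-based, multi-task" design concrete, the sketch below shows one plausible shape such a model could take in PyTorch: a shared self-attention backbone over gene-expression vectors feeding two task heads, loosely in the spirit of DeepVul. All names, dimensions, and task heads here are illustrative assumptions, not the dissertation's actual implementation.

```python
# Hypothetical sketch of an attention-based multi-task model over
# gene-expression data. Class name, sizes, and the two tasks
# (per-gene essentiality, sample-level drug response) are assumptions.
import torch
import torch.nn as nn

class MultiTaskAttentionModel(nn.Module):
    def __init__(self, n_genes=1000, d_model=128, n_heads=4):
        super().__init__()
        # Embed each gene's scalar expression value into d_model dims.
        self.embed = nn.Linear(1, d_model)
        # Shared self-attention backbone over the gene axis.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Two task-specific heads sharing the same backbone.
        self.essentiality_head = nn.Linear(d_model, 1)  # score per gene
        self.drug_head = nn.Linear(d_model, 1)          # score per sample

    def forward(self, expr):                     # expr: (batch, n_genes)
        x = self.embed(expr.unsqueeze(-1))       # (batch, n_genes, d_model)
        x, _ = self.attn(x, x, x)                # attention across genes
        ess = self.essentiality_head(x).squeeze(-1)  # (batch, n_genes)
        drug = self.drug_head(x.mean(dim=1))         # (batch, 1), pooled
        return ess, drug

model = MultiTaskAttentionModel()
expr = torch.randn(8, 1000)                      # toy batch of 8 samples
ess, drug = model(expr)
print(ess.shape, drug.shape)                     # (8, 1000) and (8, 1)
```

The design choice illustrated is that multi-task learning shares one representation (the attention output) across objectives, so signal from drug-response labels can regularize gene-essentiality predictions and vice versa; the actual architectures of DeepVul, LitGene, and Protein2Text are detailed in the dissertation itself.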

Language

English

Keywords

Transformer models, deep learning, attention mechanisms, gene function prediction, protein sequence analysis, biomedical AI

Document Type

Dissertation

Degree Name

Computer Science

Level of Degree

Doctoral

Department Name

Department of Computer Science

First Committee Member (Chair)

Abdullah Mueen

Second Committee Member

Avinash Sahu

Third Committee Member

Trilce Estrada

Fourth Committee Member

Bruna Jacobson
