Computer Science ETDs
Publication Date
Spring 4-15-2025
Abstract
Advancing personalized medicine depends on effectively integrating and interpreting the vast, heterogeneous landscape of biological data, from genomic sequences and transcriptomics to the insights embedded in scientific literature. Current machine learning models often focus on a single data modality, limiting their capacity to capture the multifaceted nature of biological systems. We address this gap by developing three attention-based machine learning models that integrate diverse data modalities. First, DeepVul is a multi-task model that leverages cancer transcriptome data to predict genes critical for cancer survival and their corresponding drugs. Second, LitGene refines gene representations by integrating textual information from the scientific literature. Third, Protein2Text is a large language model that translates protein sequences into natural language descriptions, making complex biochemical data accessible and interpretable. Together, these models reflect a comprehensive approach to integrating data modalities that provides an alternative view of biological systems, paving the way for truly personalized medicine for everyone.
Language
English
Keywords
Transformer models, deep learning, attention mechanisms, gene function prediction, protein sequence analysis, biomedical AI
Document Type
Dissertation
Degree Name
Computer Science
Level of Degree
Doctoral
Department Name
Department of Computer Science
First Committee Member (Chair)
Abdullah Mueen
Second Committee Member
Avinash Sahu
Third Committee Member
Trilce Estrada
Fourth Committee Member
Bruna Jacobson
Recommended Citation
Jararweh, Ala. "Leveraging Attention Mechanism to Unlock Gene and Protein Attributes." (2025). https://digitalrepository.unm.edu/cs_etds/132
Included in
Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Cancer Biology Commons, Computational Biology Commons, Data Science Commons, Genetics Commons