Computer Science ETDs

Publication Date

Spring 5-17-2025

Abstract

Modern drug discovery and chemical biology research rely heavily on the analysis of bioassay data. One of the many challenges in bioassay data analysis is identifying false trails, i.e., chemical compounds that initially appear to have desirable activity but prove problematic upon further investigation. Badapple (the BioAssay-Data Associative Promiscuity Pattern Learning Engine) was created over ten years ago to help researchers identify promiscuous compounds and thus avoid a common source of these false trails. Through an effort involving software engineering, cheminformatics, and biomedical data science, we have developed Badapple 2.0, which incorporates updated assay records and expanded data semantics. The expanded semantics offer additional insights into Badapple’s predictions and have supported novel, in-depth analyses that demonstrate the comprehensiveness of its data. Badapple 2.0 was developed as part of an ongoing anti-alphaviral discovery effort and has high potential for improving the efficiency of other early-stage drug discovery projects.

Language

English

Keywords

Cheminformatics, Data Science

Document Type

Thesis

Degree Name

Computer Science

Level of Degree

Master's

Department Name

Department of Computer Science

First Committee Member (Chair)

Xin Chen

Second Committee Member

Jeremy Yang

Third Committee Member

Christophe Lambert
