Computer Science ETDs

Publication Date

Summer 5-28-2017


Online data contains a wealth of information, but as with most user-generated content, it is full of noise, fraud, and automated behavior. The prevalence of "junk" and fraudulent text affects users, businesses, and researchers alike. To make matters worse, there is a lack of ground truth data for these types of text, and the appearance of the text is constantly changing as fraudsters adapt to pressures from hosting sites. The goal of my dissertation is therefore to extract high-quality content from and identify fraudulent and automated behavior in large, complex social media datasets in the absence of ground truth data. Specifically, in my dissertation I design a collection of data inspection, filtering, fusion, mining, and exploration algorithms to: automate data cleaning to produce usable data for mining algorithms, quantify the trustworthiness of business behavior in online e-commerce sites, and efficiently identify automated accounts in large and constantly changing social networks. The main components of this work include: noise removal, data fusion, multi-source feature generation, network exploration, and anomaly detection.




Bot detection, anomaly detection, unsupervised methods, spam, Twitter, review spam

Document Type


Degree Name

Computer Science

Level of Degree


Department Name

Department of Computer Science

First Committee Member (Chair)

Abdullah Mueen

Second Committee Member

Jedidiah Crandall

Third Committee Member

Shuang Luan

Fourth Committee Member

Michalis Faloutsos