Computer Science ETDs
Publication Date
Summer 5-28-2017
Abstract
Online data contains a wealth of information, but as with most user-generated content, it is full of noise, fraud, and automated behavior. The prevalence of "junk" and fraudulent text affects users, businesses, and researchers alike. To make matters worse, ground truth data for these types of text is lacking, and the appearance of the text changes constantly as fraudsters adapt to pressures from hosting sites. The goal of my dissertation is therefore to extract high-quality content from large, complex social media datasets, and to identify fraudulent and automated behavior in them, in the absence of ground truth data. Specifically, I design a collection of data inspection, filtering, fusion, mining, and exploration algorithms to automate data cleaning so that it produces usable data for mining algorithms, to quantify the trustworthiness of business behavior on online e-commerce sites, and to efficiently identify automated accounts in large and constantly changing social networks. The main components of this work are noise removal, data fusion, multi-source feature generation, network exploration, and anomaly detection.
Language
English
Keywords
Bot detection, anomaly detection, unsupervised methods, spam, Twitter, review spam
Document Type
Dissertation
Degree Name
Computer Science
Level of Degree
Doctoral
Department Name
Department of Computer Science
First Committee Member (Chair)
Abdullah Mueen
Second Committee Member
Jedidiah Crandall
Third Committee Member
Shuang Luan
Fourth Committee Member
Michalis Faloutsos
Recommended Citation
Minnich, Amanda Jean. "Spam, Fraud, and Bots: Improving the Integrity of Online Social Media Data." (2017). https://digitalrepository.unm.edu/cs_etds/85