Computer Science ETDs

Publication Date

Summer 7-15-2017

Abstract

In the era of new technologies, computer scientists work with massive datasets hundreds of terabytes in size. Smart cities, social networks, health care systems, large sensor networks, and similar sources constantly generate new data. Extracting knowledge from such datasets is non-trivial because traditional data mining algorithms become impractical at this scale. Distributed systems help address this problem, but they introduce new challenges in designing scalable algorithms. The transition from traditional algorithms to ones that run on a distributed platform should be made carefully, and researchers should design modern distributed algorithms around the problem domain. The main goal of this dissertation is to demonstrate the importance of domain-specific knowledge in developing scalable knowledge discovery algorithms on distributed systems. Data properties such as origin, type, context, and size play important roles in achieving speed, efficiency, and scalability. In this dissertation, I describe three domain-specific knowledge discovery systems in three diverse domains: a distributed algorithm to extract patterns from log messages generated by computers, a distributed algorithm to find abnormal behavior in social media, and a scalable algorithm for matching patterns in streaming time series data. I explain how to exploit data properties in a distributed knowledge discovery system to achieve scalability and speed. The algorithms achieve horizontal scalability for any data size, and the systems are currently deployed at the University of New Mexico.

Keywords

Data mining, distributed computing, big data, knowledge discovery, time series mining

Document Type


Degree Name

Computer Science

Level of Degree


Department Name

Department of Computer Science

First Committee Member (Chair)

Abdullah Mueen

Second Committee Member

Shuang Luan

Third Committee Member

Trilce Estrada

Fourth Committee Member

Amy Neel