Computer Science ETDs
Publication Date
Summer 7-15-2017
Abstract
In the era of new technologies, computer scientists deal with massive datasets that reach hundreds of terabytes in size. Smart cities, social networks, health care systems, large sensor networks, and similar sources constantly generate new data. Extracting knowledge from such datasets is non-trivial because traditional data mining algorithms become impractical to run at this scale. Distributed systems help address this problem, but they introduce new challenges in designing scalable algorithms. The transition from traditional algorithms to ones that run on a distributed platform should be made carefully, and researchers should design modern distributed algorithms around the problem domain. The main goal of this dissertation is to demonstrate the importance of domain-specific knowledge in developing scalable knowledge discovery algorithms on distributed systems. Data properties such as origin, type, context, and size play important roles in achieving speed, efficiency, and scalability. In this dissertation, I describe three domain-specific knowledge discovery systems in three diverse domains: a distributed algorithm to extract patterns from log messages generated by computers, a distributed algorithm to find abnormal behavior in social media, and a scalable algorithm for matching patterns in streaming time series data. I explain how to exploit data properties in a distributed knowledge discovery system to achieve scalability and speed. The algorithms achieve horizontal scalability for any data size, and the systems are currently deployed at the University of New Mexico.
Language
English
Keywords
Data mining, distributed computing, big data, knowledge discovery, time series mining
Document Type
Dissertation
Degree Name
Computer Science
Level of Degree
Doctoral
Department Name
Department of Computer Science
First Committee Member (Chair)
Abdullah Mueen
Second Committee Member
Shuang Luan
Third Committee Member
Trilce Estrada
Fourth Committee Member
Amy Neel
Recommended Citation
Hamooni, Hossein. "Distributed Knowledge Discovery for Diverse Data." (2017). https://digitalrepository.unm.edu/cs_etds/86
Included in
Numerical Analysis and Scientific Computing Commons, Other Computer Engineering Commons, Other Computer Sciences Commons