Electrical and Computer Engineering ETDs
Publication Date
9-1-2015
Abstract
Given the continuous growth of illicit activities on the Internet, there is a need for intelligent systems to identify malicious web pages. It has been shown that URL analysis is an effective tool for detecting phishing, malware, and other attacks. Previous studies have performed URL classification using a combination of lexical features, network traffic, hosting information, and other strategies. These approaches require time-intensive lookups that introduce significant delay in real-time systems. This paper describes a lightweight approach for classifying malicious web pages using URL lexical analysis alone. The goal is to explore the upper bound of the classification accuracy of a purely lexical approach. Another aim is to develop an approach that could be used in a real-time system. These goals culminate in the development of a classification system based on lexical analysis of URLs. It correctly classifies URLs of malicious web pages with 99.1% accuracy, a 0.4% false positive rate, and an F1-Score of 98.7, and requires 0.62 milliseconds on average. This method substantially outperforms previously published algorithms on out-of-sample data.
Keywords
Machine Learning, Malware Detection, Classification, Malicious Web Pages, Supervised Learning, Natural Language Processing
Sponsors
Amrita Center for CyberSecurity
Document Type
Thesis
Language
English
Degree Name
Computer Engineering
Level of Degree
Masters
Department Name
Electrical and Computer Engineering
First Committee Member (Chair)
Jordan, Ramiro
Second Committee Member
Lamb, Chris
Recommended Citation
Darling, Michael. "A Lexical Approach for Classifying Malicious URLs." (2015). https://digitalrepository.unm.edu/ece_etds/63