Publication Date

Fall 11-4-2022

Abstract

In today’s world, deep learning models are widely used in a variety of fields. Audio applications include speech recognition, audio classification, and music information retrieval. In this paper, we focus on the classification of music genres using an artificial neural network. The development of audio machine learning techniques has reduced the reliance on traditional, more time-consuming signal processing techniques. Starting with raw audio data, we first build an understanding of what audio is and how it is represented digitally. The focus then turns to obtaining frequency information from audio signals through the use of spectrograms. Transforming the spectrograms into the perceptually relevant mel scale allows us to extract mel frequency cepstral coefficients (MFCCs) from audio files. We then use our network architecture to process the MFCCs. A convolutional neural network, our network of choice here, is trained to classify audio files into one of nine musical genres with an accuracy of 89.1% using the GTZAN dataset, which is only about 4 percentage points below the state-of-the-art performance for this dataset.
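As a rough illustration of the mel-scale and cepstral steps summarized in the abstract, the sketch below converts frequencies to the mel scale and derives cepstral coefficients from a set of band energies. It is not taken from the thesis: the HTK-style mel formula, the unnormalized DCT-II, and all function names (`hz_to_mel`, `mel_band_centers`, `mfcc_from_band_energies`) are assumptions chosen for illustration, and the band energies are assumed to have already been computed from a spectrogram frame.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel conversion; one common convention, assumed here.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_centers(n_bands, f_min, f_max):
    # Center frequencies (Hz) of n_bands bands spaced evenly on the mel scale.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]

def dct_ii(x):
    # Unnormalized DCT-II, written out directly for clarity rather than speed.
    n = len(x)
    return [
        sum(x[k] * math.cos(math.pi * (k + 0.5) * j / n) for k in range(n))
        for j in range(n)
    ]

def mfcc_from_band_energies(energies, n_coeffs):
    # Log-compress the mel band energies of one frame, then keep the
    # first n_coeffs DCT coefficients as the cepstral features.
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies)[:n_coeffs]

# Hypothetical usage: 8 mel bands between 0 Hz and 8 kHz, flat band energies.
centers = mel_band_centers(8, 0.0, 8000.0)
coeffs = mfcc_from_band_energies([2.0] * 8, 4)
```

With flat band energies, only the zeroth coefficient is non-zero, which reflects the fact that the higher coefficients describe the shape of the spectral envelope rather than its overall level.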

Degree Name

Mathematics

Level of Degree

Masters

Department Name

Mathematics & Statistics

First Committee Member (Chair)

Mohammad Motamed

Second Committee Member

Jehanzeb Chaudhary

Third Committee Member

Jacob Schroder

Language

English

Document Type

Thesis
