Electrical and Computer Engineering ETDs

Publication Date

Summer 8-1-2023

Abstract

Speaker diarization from a single microphone is extremely challenging in noisy classroom environments. A new method based on simulating a microphone array has shown promising results while requiring very little training. It used a minimum distance classifier to identify the active speaker from a list of possible speakers. This thesis investigates machine learning methods for determining the speaker. The AOLME dataset used here contains 758 samples totaling 894.4 seconds. Each sample is taken from a noisy classroom environment with five speakers, any one of whom could be active in a given sample, and lasts an average of 1.2 seconds. Data augmentation effectively doubled the number of samples in the dataset. The machine learning schemes tested were a neural network, a support vector machine, k-nearest neighbors, random forests, gradient boosting, and a voting classifier integrating several of these. Our best performance, 86.4% classification accuracy, was achieved with random forests.
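
For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of a minimum distance classifier, not the thesis implementation. The abstract does not specify the features or the distance metric, so the two-dimensional feature vectors, the Euclidean distance, the speaker labels, and the helper names `fit_centroids` and `predict` are all assumptions made for illustration.

```python
import math


def fit_centroids(features, labels):
    """Average the feature vectors of each speaker into one centroid."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the speaker whose centroid is nearest (Euclidean distance, assumed)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical toy features for three speakers; these are NOT AOLME data.
train_x = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [2.0, 0.1], [2.1, 0.0]]
train_y = ["A", "A", "B", "B", "C", "C"]

centroids = fit_centroids(train_x, train_y)
prediction = predict(centroids, [0.95, 0.9])
```

A classifier of this kind needs only one averaged vector per speaker, which is consistent with the abstract's remark that the earlier method required very little training. The machine learning schemes studied in the thesis replace this nearest-centroid rule with learned decision boundaries.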

Document Type

Thesis

Language

English

Degree Name

Computer Engineering

Level of Degree

Masters

Department Name

Electrical and Computer Engineering

First Committee Member (Chair)

Marios Pattichis

Second Committee Member

Xiang Sun

Third Committee Member

Ali Bidram

Fourth Committee Member

Tyler Lovelly
