Speaker diarization from a single microphone is extremely challenging in noisy classroom environments. A new method based on simulating a microphone array has shown promising results while requiring very little training; it uses a minimum distance classifier to identify the active speaker from a list of possible speakers. This thesis investigates machine learning methods for determining the speaker. The AOLME dataset used in this work contains 758 samples totaling 894.4 seconds, so each sample lasts about 1.2 seconds on average. Every sample comes from a noisy classroom environment and involves five speakers, any one of whom could be active in a given sample. Data augmentation effectively doubled the number of samples in the dataset. The machine learning schemes tested were a neural network, a support vector machine, k-nearest neighbors, random forests, gradient boosting, and a voting classifier that combines several of these. The best performance, 86.4% classification accuracy, was achieved with random forests.
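To make the contrast between a minimum distance classifier and a voting classifier concrete, the following is a minimal, standard-library Python sketch. It assumes a minimum distance classifier means the nearest-centroid rule (Euclidean distance to each speaker's mean feature vector) and that the voting classifier uses hard majority voting. The feature vectors, the speaker labels `S1` to `S3`, and all class and function names are hypothetical. They do not reproduce the thesis's virtual-microphone features, its five-speaker setup, or its actual implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MinimumDistanceClassifier:
    """Nearest-centroid rule: predict the speaker whose mean feature vector is closest."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        # One centroid (per-class mean) for each speaker label.
        self.centroids = {label: [s / counts[label] for s in acc]
                          for label, acc in sums.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda label: euclidean(x, self.centroids[label]))
                for x in X]


class KNearestNeighbors:
    """Majority vote among the k training samples closest to each query."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            order = sorted(range(len(self.X)), key=lambda i: euclidean(x, self.X[i]))
            # Counter keeps first-seen order, so a tie goes to the label
            # that appears first among the nearest neighbors.
            votes = Counter(self.y[i] for i in order[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


class HardVotingClassifier:
    """Hard (majority) voting over the predictions of several base classifiers."""

    def __init__(self, classifiers):
        self.classifiers = classifiers

    def fit(self, X, y):
        for clf in self.classifiers:
            clf.fit(X, y)
        return self

    def predict(self, X):
        per_model = [clf.predict(X) for clf in self.classifiers]
        # zip(*per_model) yields each sample's predictions across all models;
        # a tie goes to the tied label proposed by the earliest classifier.
        return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted speaker matches the true speaker."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy features and labels, not data from the AOLME dataset.
X_train = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]]
y_train = ["S1", "S1", "S2", "S2", "S3", "S3"]
X_test = [[0.1, 0.0], [4.9, 5.1], [0.2, 4.8]]
y_test = ["S1", "S2", "S3"]

ensemble = HardVotingClassifier(
    [MinimumDistanceClassifier(), KNearestNeighbors(k=1), KNearestNeighbors(k=3)]
).fit(X_train, y_train)
predictions = ensemble.predict(X_test)
print(predictions)                    # ['S1', 'S2', 'S3']
print(accuracy(y_test, predictions))  # 1.0
```

The minimum distance classifier needs only one mean vector per speaker, which is consistent with the abstract's note that the earlier method requires very little training. The voting ensemble instead pools the decisions of several models. For real experiments, scikit-learn provides `NearestCentroid`, `KNeighborsClassifier`, `RandomForestClassifier`, and `VotingClassifier` (with `voting="hard"`), though this sketch does not claim those were the thesis's tools.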
Department: Electrical and Computer Engineering
Briggs, Richard. "Speaker Diarization of Noisy Classrooms from a Single Microphone Based on an Array of Virtual Microphones and Machine Learning." (2023). https://digitalrepository.unm.edu/ece_etds/605