Electrical and Computer Engineering ETDs

Publication Date

Spring 4-15-2025

Abstract

This thesis describes the development of a speech recognition system that classifies Navajo (Diné) words using Low Resource Language (LRL) datasets. At present, there are no recognized high-quality open-source datasets for the Diné language of the kind needed to train speech recognition models. A small, balanced dataset was therefore designed to train several models. To offset the scarcity of LRL data, the audio recordings were augmented with time-stretching, amplitude variations, time shifts, small amounts of white Gaussian noise, and SpecAugment. The models included a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), and a Long Short-Term Memory (LSTM) model, with a leave-one-out method applied to three of the ten speakers. The results compare nine optimization methods: SGD, Momentum, Nesterov, AdaGrad, RMSProp, Adam, Adamax, Nadam, and AdamW. They demonstrate excellent classification performance for LRL models in Navajo speech recognition. The system can reliably recognize a small number of Navajo words spoken by new speakers on whom it has not been trained.
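
The augmentation steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names, the parameter values, and the choice of plain linear-interpolation resampling for time-stretching (which also shifts pitch, unlike a phase-vocoder stretch) are all assumptions made for illustration.

```python
import numpy as np


def time_shift(x, shift):
    """Circularly shift the waveform by `shift` samples (hypothetical helper)."""
    return np.roll(x, shift)


def scale_amplitude(x, gain):
    """Scale the waveform by `gain` and clip to the [-1, 1] range."""
    return np.clip(x * gain, -1.0, 1.0)


def add_white_noise(x, snr_db, rng):
    """Add white Gaussian noise at an approximate signal-to-noise ratio in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise


def time_stretch(x, rate):
    """Naive stretch by linear-interpolation resampling.

    rate > 1 shortens the clip, rate < 1 lengthens it. Assumption: this
    simple resampling also changes pitch; it only stands in for whatever
    stretch method the thesis used.
    """
    n_out = int(round(len(x) / rate))
    old_idx = np.arange(len(x))
    new_idx = np.linspace(0, len(x) - 1, n_out)
    return np.interp(new_idx, old_idx, x)


def spec_mask(spec, max_f, max_t, rng):
    """SpecAugment-style masking on a (freq, time) spectrogram.

    Zeroes one random frequency band (up to `max_f` bins wide) and one
    random time band (up to `max_t` frames wide). Returns a copy.
    """
    out = spec.copy()
    n_f, n_t = out.shape
    f = rng.integers(0, max_f + 1)
    f0 = rng.integers(0, n_f - f + 1)
    out[f0:f0 + f, :] = 0.0
    t = rng.integers(0, max_t + 1)
    t0 = rng.integers(0, n_t - t + 1)
    out[:, t0:t0 + t] = 0.0
    return out
```

Each function returns a new array, so several augmented copies of one recording can be generated by calling the functions with different parameters, for example `add_white_noise(time_shift(x, 200), 20.0, np.random.default_rng(0))` for a waveform `x`.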

Keywords

Diné, Navajo Speech Recognition, Low Resource Language Datasets

Document Type

Thesis

Language

English

Degree Name

Electrical Engineering

Level of Degree

Masters

Department Name

Electrical and Computer Engineering

First Committee Member (Chair)

Dr. Mario Pattichis

Second Committee Member

Dr. Xiang Sun

Third Committee Member

Dr. Melvatha Chee

Fourth Committee Member

Dr. Ali Bidram
