Electrical and Computer Engineering ETDs

Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments

Mario J. Esparza Perez, University of New Mexico - Main Campus

Publication Date

Summer 7-14-2021

Abstract

Audio recordings of collaborative learning environments contain constant cross-talk and background noise. Dynamic speech recognition between Spanish and English is required in these environments. To eliminate the standard requirement of large-scale ground truth, the thesis develops a simulated dataset by transforming audio transcriptions into phonemes and using 3D speaker geometry and data augmentation to generate an acoustic simulation of Spanish and English speech. The thesis also develops a low-complexity neural network for recognizing Spanish and English phonemes (available at github.com/muelitas/keywordRec). When trained on 41 English phonemes, the network achieves a PER of 0.099 on Speech Commands. When trained on 36 Spanish phonemes and tested on real recordings of collaborative learning environments, it achieves an LER of 0.7208. This is slightly better than the 0.7272 LER of Google’s Speech-to-Text, which used anywhere from 15 to 1,635 times more parameters and was trained on 300 to 27,500 hours of real data, as opposed to 13 hours of simulated audio.
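
The PER figure above is a phoneme error rate. As a minimal sketch of how such a score is commonly computed (the thesis's own scoring code is not shown here, so this is an assumption about the usual definition, not the author's implementation), the rate is the Levenshtein edit distance between the reference and predicted phoneme sequences divided by the reference length. The function names and the ARPAbet-style symbols are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j symbols of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the number of reference phonemes."""
    if not ref:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution and one deletion over 4 phonemes.
reference = ["HH", "AH", "L", "OW"]
predicted = ["HH", "EH", "L"]
print(phoneme_error_rate(reference, predicted))  # 0.5
```

Under this definition a lower value is better, and the value can exceed 1.0 when the prediction contains many insertions.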

Keywords

Bilingual Speech Recognition, Neural Networks, Noisy Speech Recognition, Phonemes, CTC, Speech Synthesis and Simulation

Document Type

Thesis

Language

English

Degree Name

Computer Engineering

Level of Degree

Masters

Department Name

Electrical and Computer Engineering

First Committee Member (Chair)

Marios Pattichis

Second Committee Member

Ramiro Jordan

Third Committee Member

Sylvia Celedón-Pattichis

Fourth Committee Member

Balasubramaniam Santhanam

Recommended Citation

Esparza Perez, Mario J. "Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments." (2021). https://digitalrepository.unm.edu/ece_etds/563

Included in

Electrical and Computer Engineering Commons
