Electrical and Computer Engineering ETDs
Publication Date
8-25-2016
Abstract
In speech science, one problem of particular importance has been to develop a clear method for detecting hypernasality in speech. For speech pathologists, hypernasality is a critical diagnostic indicator used for judging the severity of velopharyngeal (nasal cavity/mouth separation) inadequacy in children with a cleft lip or cleft palate condition. For physicians, and particularly neurologists, these same velopharyngeal inadequacies are believed to be linked to nervous system disorders such as Alzheimer's disease and particularly Parkinson's disease. One can therefore envision the need not only to find a reliable method for detecting hypernasality, but also to quantify its level (severity). An integral component in the study of speech is the analysis of speech formants, i.e., vocal tract resonances. Traditional acoustical analysis methods that use a linear source model follow the premise that normal and hypernasal speech can be distinguished by shifts or power changes in the formant frequencies and/or the widening (or narrowing) of the formant bandwidths. Such a premise, however, has not been validated with consistency. Part of the reason is that traditional acoustical analysis methods such as one-third octave band, LPC (Linear Predictive Coding), and cepstral analysis are ill-equipped to deal with the nonlinear, non-stationary, and wideband characteristics of normal and nasal speech signals. Relatively newer DSP methods that employ group delay or energy separation overcome some of these problems, but have their own issues such as possible mode mixing, noise, and the aforementioned wideband problem. However, initial investigations into energy separation methods show promise as long as these issues can be resolved.
This thesis evaluates the success of a novel acoustical energy approach that deals with the mode mixing and wideband problems in two steps: (1) a DSP sifting algorithm known as EMD (Empirical Mode Decomposition) is first applied to decompose the voice signal into a number of IMFs (Intrinsic Mode Functions); (2) energy analysis is then performed on each IMF via the Teager-Kaiser Energy Operator. The proposed EMD energy approach is applied to voice samples taken from the American CLP Craniofacial database and is shown to produce a clear delineation between normal and nasal samples and between different levels of hypernasality.
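The energy step in (2) can be illustrated with the standard discrete form of the Teager-Kaiser Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The following is a minimal sketch and not the thesis implementation: the function name `teager_kaiser` and the synthetic test tone are assumptions introduced here for illustration, and the tone merely stands in for a single IMF (the EMD sifting step is not shown). For a pure tone A*cos(omega*n + phase), the operator returns the constant A^2 * sin^2(omega), which is why its output tracks both amplitude and frequency.

```python
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone standing in for one IMF (assumption, not thesis data).
amplitude = 2.0
omega = 0.3  # digital frequency in radians per sample
tone = [amplitude * math.cos(omega * n + 0.5) for n in range(200)]

energy = teager_kaiser(tone)

# Closed-form value for a pure tone: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
```

In the approach described above, such an operator would be applied to each IMF separately rather than to the raw wideband voice signal, since the operator is only meaningful for narrowband (single-component) inputs.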
Keywords
hypernasality, Teager-Kaiser, Empirical Mode Decomposition, formant
Document Type
Thesis
Language
English
Degree Name
Electrical Engineering
Level of Degree
Masters
Department Name
Electrical and Computer Engineering
First Committee Member (Chair)
Jordan, Ramiro
Second Committee Member
Neel, Amy
Third Committee Member
Santhanam, Bal
Recommended Citation
De La Cruz, Christopher. "Hypernasal Speech Analysis via Empirical Mode Decomposition and the Teager-Kaiser Energy Operator." (2016). https://digitalrepository.unm.edu/ece_etds/65