Electrical and Computer Engineering ETDs

Publication Date

8-25-2016

Abstract

In speech science, one important problem has been to develop a clear method for detecting hypernasality in speech. For speech pathologists, hypernasality is a critical diagnostic indicator used for judging the severity of velopharyngeal (nasal cavity/mouth separation) inadequacy in children with a cleft lip or cleft palate condition. For physicians, and particularly neurologists, these same velopharyngeal inadequacies are believed to be linked to nervous system disorders such as Alzheimer's disease and, in particular, Parkinson's disease. One can therefore envision the need not only to find a reliable method for detecting hypernasality, but also to quantify its level (severity). An integral component in the study of speech is the analysis of speech formants, i.e., vocal tract resonances. Traditional acoustical analysis methods that use a linear source model follow the premise that normal and hypernasal speech can be distinguished by shifts or power changes in the formant frequencies and/or by the widening (or narrowing) of the formant bandwidths. Such a premise, however, has not been validated consistently. Part of the reason is that traditional acoustical analysis methods such as one-third octave band analysis, LPC (Linear Predictive Coding), and cepstral analysis are ill-equipped to deal with the nonlinear, non-stationary, and wideband characteristics of normal and nasal speech signals. Relatively newer DSP methods that employ group delay or energy separation overcome some of these problems, but have their own issues, such as possible mode mixing, noise, and the aforementioned wideband problem. Nevertheless, initial investigations into energy separation methods show promise, provided these issues can be resolved.
This thesis evaluates the success of a novel acoustical energy approach that deals with the mode mixing and wideband problems in two steps: (1) a DSP sifting algorithm known as EMD (Empirical Mode Decomposition) is first applied to decompose the voice signal into a number of IMFs (Intrinsic Mode Functions); (2) energy analysis is then performed on each IMF via the Teager-Kaiser Energy Operator. The proposed EMD energy approach is applied to voice samples taken from the American CLP Craniofacial database and is shown to produce a clear delineation between normal and nasal samples and between different levels of hypernasality.
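
The energy step described above can be illustrated with a minimal sketch. It assumes the standard discrete form of the Teager-Kaiser Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and uses a synthetic pure tone as a stand-in for a single IMF; the EMD sifting step is not shown, and the function name, amplitude, and frequency are illustrative choices rather than values from the thesis.

```python
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two
    samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical stand-in for one IMF: a pure tone A*cos(omega*n).
# In the approach described in the abstract, the input would instead
# be an IMF produced by EMD from a recorded voice sample.
amplitude = 2.0
omega = 0.3  # digital frequency in radians per sample (illustrative)
imf = [amplitude * math.cos(omega * n) for n in range(200)]

energy = teager_kaiser(imf)

# For a pure tone the operator output is constant: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the operator output is constant and depends on both amplitude and frequency, which is why it is commonly applied to narrowband components such as IMFs rather than to the raw wideband signal.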

Keywords

hypernasality, Teager-Kaiser, Empirical Mode Decomposition, formant

Document Type

Thesis

Language

English

Degree Name

Electrical Engineering

Level of Degree

Masters

Department Name

Electrical and Computer Engineering

First Advisor

Santhanam, Bal

First Committee Member (Chair)

Jordan, Ramiro

Second Committee Member

Neel, Amy

Third Committee Member

Santhanam, Bal
