• Sujan Reddy Kotha
  • 26.05.2011
  • 25.11.2011
Automatic Speech Recognition (ASR) systems work very reliably if close-talking microphones are used for speech input. As the distance between speaker and microphone increases, however, recognition is often hampered by reverberation and other types of distortion. A major reason for this is that typical recognizers are based on first-order hidden Markov models (HMMs), which assume that the current speech feature vector is conditionally independent of the previous ones. Reverberation, however, has a dispersive effect on the feature vectors, which significantly increases the inter-frame correlation and thus limits the performance of such recognizers. Various techniques have already been investigated to model this inter-frame dependency, e.g., differential features and frame-wise model adaptation.
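The dispersive effect described above can be illustrated with a minimal numerical sketch: smearing a sequence of (here artificial, uncorrelated) feature values over subsequent frames with an exponentially decaying tail markedly raises the lag-1 inter-frame correlation. The decay factor and tail length are hypothetical illustration values, not taken from the thesis task.

```python
import random

random.seed(0)

def lag1_corr(x):
    """Sample correlation between consecutive elements of x."""
    n = len(x) - 1
    a, b = x[:-1], x[1:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

# Clean "feature" trajectory: white noise, so neighbouring frames
# are essentially uncorrelated.
clean = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Reverberation acts like a dispersive filter: each frame leaks into
# its successors with an exponentially decaying weight (hypothetical
# decay factor 0.6 over a 5-frame tail).
tail = [0.6 ** k for k in range(5)]
reverb = [
    sum(tail[k] * clean[t - k] for k in range(min(len(tail), t + 1)))
    for t in range(len(clean))
]

print(f"lag-1 correlation, clean : {lag1_corr(clean):+.3f}")
print(f"lag-1 correlation, reverb: {lag1_corr(reverb):+.3f}")
```

The smeared sequence exhibits a substantial lag-1 correlation (close to the theoretical value of the moving-average filter), which is exactly the inter-frame dependency that a first-order HMM cannot model.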

In this thesis, the concepts of first- and second-order HMMs shall be merged to form a “combined-order” HMM (CO-HMM). Such a CO-HMM is to be designed so that the transition probabilities remain of first order (i.e., independent of states further back than the immediate predecessor), whereas each state is composed of different output probability density functions (pdfs) depending on its predecessor state. An open theoretical issue to be treated in this work is the procedure at HMM boundaries, where no unique preceding state exists. For training, the Baum-Welch method shall be applied to initially set up a conventional first-order HMM. The ICEWIND approach can then be employed to estimate the predecessor-dependent output pdfs. For recognition, the Viterbi decoder has to be adapted accordingly. The practical implementation shall be realized by extending the routines of the ASR toolkit HTK. Finally, connected-digit recognition experiments based on the TIDIGITS corpus are to be carried out to assess the performance of the proposed concept.
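The required adaptation of the Viterbi decoder can be sketched as follows: the recursion becomes delta_t(j) = max_i delta_{t-1}(i) * a_ij * b_{j|i}(o_t), where the output pmf b_{j|i} of state j depends on its predecessor i while the transitions a_ij stay first-order. All model parameters below (a toy 2-state model with a binary observation alphabet) are hypothetical illustration values, and the handling of the first frame, where no predecessor exists, is resolved here by simply averaging over predecessors, one of several conceivable answers to the boundary issue mentioned above.

```python
# Hypothetical toy CO-HMM: N = 2 states, discrete observations {0, 1}.
N = 2
pi = [0.6, 0.4]                    # initial state probabilities
A = [[0.7, 0.3],                   # A[i][j] = P(s_t = j | s_{t-1} = i)
     [0.4, 0.6]]                   # (first-order transitions)
b = [                              # b[i][j][o] = P(o | s_t = j, s_{t-1} = i)
    [[0.9, 0.1], [0.2, 0.8]],      # (predecessor-dependent output pmfs)
    [[0.6, 0.4], [0.3, 0.7]],
]
# At t = 0 no predecessor exists; here we average the
# predecessor-dependent pmfs as one possible boundary treatment.
b0 = [[sum(b[i][j][o] for i in range(N)) / N for o in range(2)]
      for j in range(N)]

def viterbi_cohmm(obs):
    """Viterbi decoding with predecessor-dependent output pmfs:
    delta_t(j) = max_i delta_{t-1}(i) * A[i][j] * b[i][j][o_t]."""
    delta = [pi[j] * b0[j][obs[0]] for j in range(N)]
    psi = []
    for o in obs[1:]:
        scores = [[delta[i] * A[i][j] * b[i][j][o] for i in range(N)]
                  for j in range(N)]
        psi.append([max(range(N), key=lambda i: scores[j][i])
                    for j in range(N)])
        delta = [max(scores[j]) for j in range(N)]
    # Backtrack the most likely state sequence.
    path = [max(range(N), key=lambda j: delta[j])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return path, max(delta)

path, p = viterbi_cohmm([0, 1, 1, 0])
print("best path:", path, " probability: %.6f" % p)
```

Note that only the emission lookup changes relative to the standard first-order recursion, which is why an implementation by extending existing HTK decoding routines is plausible; an HTK-based implementation would of course operate on continuous-density pdfs rather than the discrete pmfs used in this sketch.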