

This paper presents results on speaker-dependent recognition of isolated whispered words using the Whi-Spe database. The word recognition rate is calculated for all speakers, four train/test scenarios, and three different numbers of mixture components, with modeling of context-independent monophones, context-dependent triphones, and whole words. Mel Frequency Cepstral Coefficients were used as the feature vector. HTK, a toolkit for building Hidden Markov Models, was used to implement the isolated word recognizer. In matched scenarios, the best results showed nearly equal recognition rates: 99.86% for normal speech and 99.90% for whispered speech. In mismatched scenarios, the best recognition rate was 64.80% when training on part of the normally phonated speech and testing on whispered speech; in the opposite case, training on whispered speech, the recognition rate for normal speech was 74.88%. © Springer International Publishing Switzerland 2014.
| Engineering controlled terms: | Audio signal processing; Hidden Markov models; Speech; Speech communication |
|---|---|
| Engineering uncontrolled terms: | Context dependent; Context independent; Mel frequency cepstral co-efficient; Mixture components; Speaker dependents; Speech signal processing; Whispered speech; Word recognition |
| Engineering main heading: | Speech recognition |
Galić, J.; School of Electrical Engineering, University of Belgrade, Telecommunications Department, Belgrade, Serbia