Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 8773, 2014, Pages 251-258
16th International Conference on Speech and Computer, SPECOM 2014; Novi Sad; Serbia; 5 October 2014 through 9 October 2014; Code 109219

HTK-based recognition of whispered speech (Conference Paper)

  • a School of Electrical Engineering, University of Belgrade, Telecommunications Department, Belgrade, Serbia
  • b University of Banja Luka, Department of Electronics and Telecommunications, Banja Luka, Bosnia and Herzegovina
  • c Life Activities Advancement Center, Laboratory for Psychoacoustics and Speech Perception, Belgrade, Serbia
  • d Čačak Technical College, Computing and Information Technology Department, Čačak, Serbia

Abstract

This paper presents results on whispered speech recognition of isolated words from the Whi-Spe database in speaker-dependent mode. The word recognition rate is calculated for all speakers, four train/test scenarios, and three values of the number of mixture components, with modeling of context-independent monophones, context-dependent triphones, and whole words. Mel Frequency Cepstral Coefficients were used as the feature vector. HTK, a toolkit for building Hidden Markov Models, was used to implement the isolated word recognizer. The best results in the matched scenarios showed nearly equal recognition rates: 99.86% for normal speech and 99.90% for whispered speech. In the mismatched scenarios, the best recognition rate was 64.80% when training on the normally phonated part of the speech and testing on whispered speech and, in the opposite case, 74.88% when training on whispered speech and testing on normal speech. © Springer International Publishing Switzerland 2014.
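As a rough illustration of the evaluation described in the abstract, the sketch below enumerates the four train/test pairings of the two speech modes and computes a word recognition rate for isolated words as the percentage of test tokens recognized correctly. The function names, the mode labels, and the toy word labels are assumptions made for this example; they are not taken from the paper or from HTK.

```python
from itertools import product

# Two speech modes; the labels are illustrative, not from the paper.
MODES = ("normal", "whisper")


def scenarios():
    """Return the four train/test pairings as (train, test, kind) tuples."""
    return [
        (train, test, "match" if train == test else "mismatch")
        for train, test in product(MODES, MODES)
    ]


def word_recognition_rate(reference, recognized):
    """Percentage of isolated-word tokens whose recognized label equals the reference."""
    if len(reference) != len(recognized):
        raise ValueError("reference and recognized must have the same length")
    if not reference:
        raise ValueError("at least one test token is required")
    correct = sum(ref == hyp for ref, hyp in zip(reference, recognized))
    return 100.0 * correct / len(reference)


if __name__ == "__main__":
    for train, test, kind in scenarios():
        print(f"train={train:8s} test={test:8s} ({kind})")
    # Toy labels only; these are not words from the Whi-Spe database.
    ref = ["w1", "w2", "w3", "w4"]
    hyp = ["w1", "w2", "w9", "w4"]
    print(word_recognition_rate(ref, hyp))
```

Because each test utterance holds exactly one word, a simple per-token comparison is enough here; continuous speech would also need an alignment step to account for insertions and deletions.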

Author keywords

HTK; Speech recognition; Speech signal processing; Whispered speech database

Indexed keywords

Engineering controlled terms: Audio signal processing; Hidden Markov models; Speech; Speech communication
Engineering uncontrolled terms: Context dependent; Context independent; Mel frequency cepstral co-efficient; Mixture components; Speaker dependents; Speech signal processing; Whispered speech; Word recognition
Engineering main heading: Speech recognition
  • ISSN: 03029743
  • ISBN: 978-331911580-1
  • Source Type: Book Series
  • Original language: English
  • DOI: 10.1007/978-3-319-11581-8_31
  • Document Type: Conference Paper
  • Volume Editors: Ronzhin Y., Ronzhin Y., Potapova R., Delic V.
  • Sponsors: AlfaNum Speech Technologies Ltd, International Speech Communication Association (ISCA), Speech Technology Center Ltd, University of Novi Sad
  • Publisher: Springer Verlag

  Galić, J.; School of Electrical Engineering, University of Belgrade, Telecommunications Department, Belgrade, Serbia

Cited by 10 documents

Galić, J. , Marković, B. , Grozdić, Đ.
Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering
(2024) Applied Sciences (Switzerland)
Babić, N. , Galić, J.
An Analysis of Speech Emotion Recognition Based on Hybrid DNN-HMM Framework
(2023) 2023 31st Telecommunications Forum, TELFOR 2023 - Proceedings
Galić, J. , Grozdić, Đ.
Exploring the Impact of Data Augmentation Techniques on Automatic Speech Recognition System Development: A Comparative Study
(2023) Advances in Electrical and Computer Engineering