Journal of Communications Technology and Electronics, Volume 62, Issue 11, 1 November 2017, Pages 1255-1261

Whispered speech recognition based on gammatone filterbank cepstral coefficients (Article)

  • Telecommunication Department, School of Electrical Engineering, University of Belgrade, Belgrade, 11000, Serbia

Abstract

This paper presents results on whispered speech recognition using gammatone filterbank cepstral coefficients in speaker-dependent mode. The isolated words used in the experiments are taken from the Whi-Spe database. Recognition is based on dynamic time warping and hidden Markov model methods. The experiments focus on the following modes: normal speech, whispered speech, and their combinations (normal/whispered and whispered/normal). The results show an important improvement in recognition after applying cepstral mean subtraction, especially in the mixed train/test scenarios. © 2017, Pleiades Publishing, Inc.
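As a rough illustration of why cepstral mean subtraction can help in mismatched train/test conditions, the sketch below applies it to toy feature sequences and compares them with a basic dynamic time warping distance. This is not the authors' implementation: the function names, the two-coefficient feature vectors, and the constant offset standing in for a train/test mismatch are all assumptions made for the example, and real gammatone filterbank cepstral coefficients would come from a separate feature-extraction step that is not shown.

```python
import math

def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, taken over all frames of one utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

def dtw_distance(a, b):
    """Basic DTW: Euclidean local cost, steps (i-1, j), (i, j-1), (i-1, j-1)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy "cepstral" sequences (3 frames, 2 coefficients each).
reference = [[1.0, 0.5], [2.0, 0.0], [3.0, -0.5]]

# Assumed mismatch model: the same utterance with a constant cepstral offset.
offset = [4.0, -2.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in reference]

raw = dtw_distance(reference, shifted)
normalized = dtw_distance(cepstral_mean_subtraction(reference),
                          cepstral_mean_subtraction(shifted))
print(raw > normalized)
```

In this toy case the offset is constant across frames, so mean subtraction removes it entirely and the normalized DTW distance drops to zero. The difference between normal and whispered speech is not a constant offset, so the example only shows the mechanism and says nothing about the size of the improvement reported in the paper.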

Indexed keywords

Engineering controlled terms: Filter banks; Hidden Markov models; Markov processes; Speech
Engineering uncontrolled terms: Cepstral coefficients; Cepstral mean subtraction; Gammatone filterbank; Isolated words; ON dynamics; Speaker dependents; Whispered speech
Engineering main heading: Speech recognition

Funding details

Funding sponsor: Ministarstvo Prosvete, Nauke i Tehnološkog Razvoja
Funding number: 32032, 178027, OI-178027, TR-32032
Acronym: MPNTR

    ACKNOWLEDGMENTS This research was financed in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia within the projects OI-178027 and TR-32032.

  • ISSN: 10642269
  • CODEN: JTELE
  • Source Type: Journal
  • Original language: English
  • DOI: 10.1134/S1064226917110134
  • Document Type: Article
  • Publisher: Maik Nauka-Interperiodica Publishing

  Marković, B.; Telecommunication Department, School of Electrical Engineering, University of Belgrade, Belgrade, Serbia
© Copyright 2017 Elsevier B.V., All rights reserved.
