Advances in Electrical and Computer Engineering, Volume 17, Issue 1, 2017, Pages 21-26

Comparison of cepstral normalization techniques in whispered speech recognition (Article) (Open Access)

  • (a) School of Electrical Engineering, University of Belgrade, Bulevar kralja Aleksandra 73, Belgrade, 11120, Serbia
  • (b) Faculty of Electrical Engineering, University of Banja Luka, Patre 5, Banja Luka, 78000, Bosnia and Herzegovina
  • (c) Čačak Technical College, Svetog Save 65, Čačak, 32000, Serbia

Abstract

This article presents an analysis of different cepstral normalization techniques in automatic recognition of whispered and bimodal speech (speech+whisper). The experiments used a conventional GMM-HMM speech recognizer as a speaker-dependent automatic speech recognition system. It was paired with the special Whi-Spe corpus, which contains utterance recordings in both normally phonated speech and whisper. The following normalization techniques were tested and compared: CMN (Cepstral Mean Normalization), CVN (Cepstral Variance Normalization), MVN (Cepstral Mean and Variance Normalization), CGN (Cepstral Gain Normalization), and quantile-based dynamic normalization techniques such as QCN and QCN-RASTA. The experimental results show to what extent each of these cepstral normalization techniques can improve whisper recognition accuracy in a mismatched train/test scenario. The best result is obtained using CMN in combination with inverse filtering. This combination provides an average improvement of 39.9 percent in whisper recognition accuracy across all tested speakers.
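The abstract names the compared normalization techniques but does not give their formulas. The following is a minimal illustrative sketch, not the authors' implementation. It applies utterance-level versions of CMN, CVN, MVN, CGN and a quantile-based QCN to a matrix of cepstral frames (one row per frame, one column per coefficient).

The formulas are commonly used definitions assumed here rather than taken from this article:

- CVN scales by the standard deviation without removing the mean.
- CGN removes the mean and divides by the max-min range.
- QCN centres each coefficient on the midpoint of its r-th and (100-r)-th percentiles and divides by their distance. An assumed default of r = 3 is applied identically to every coefficient, which is a simplification.

The names `normalize`, `METHODS` and `EPS` are hypothetical. The extra RASTA filtering of QCN-RASTA and the inverse filtering step are not shown.

```python
import math
from typing import Callable, Dict, List

Frames = List[List[float]]  # one row per frame, one column per cepstral coefficient
EPS = 1e-10  # assumed guard against division by zero on constant coefficients


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def _std(xs: List[float]) -> float:
    # Population standard deviation over the utterance, floored at EPS.
    m = _mean(xs)
    return max(math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs)), EPS)


def _percentile(xs: List[float], p: float) -> float:
    """Percentile with linear interpolation between sorted samples, p in [0, 100]."""
    s = sorted(xs)
    pos = (len(s) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def cmn(c: List[float]) -> List[float]:
    # CMN: subtract the utterance mean of this coefficient.
    m = _mean(c)
    return [x - m for x in c]


def cvn(c: List[float]) -> List[float]:
    # CVN (as assumed here): divide by the standard deviation only.
    s = _std(c)
    return [x / s for x in c]


def mvn(c: List[float]) -> List[float]:
    # MVN: zero mean and unit variance.
    m, s = _mean(c), _std(c)
    return [(x - m) / s for x in c]


def cgn(c: List[float]) -> List[float]:
    # CGN (as assumed here): zero mean, scaled by the dynamic range max - min.
    m = _mean(c)
    rng = max(max(c) - min(c), EPS)
    return [(x - m) / rng for x in c]


def qcn(c: List[float], r: float = 3.0) -> List[float]:
    # QCN (assumed formulation): centre on the midpoint of the r-th and
    # (100-r)-th percentiles, then divide by the distance between them.
    q_lo, q_hi = _percentile(c, r), _percentile(c, 100.0 - r)
    centre = (q_lo + q_hi) / 2.0
    span = max(q_hi - q_lo, EPS)
    return [(x - centre) / span for x in c]


METHODS: Dict[str, Callable[[List[float]], List[float]]] = {
    "CMN": cmn, "CVN": cvn, "MVN": mvn, "CGN": cgn, "QCN": qcn,
}


def normalize(frames: Frames, method: str) -> Frames:
    """Apply one normalization to each coefficient trajectory of a single utterance."""
    fn = METHODS[method]
    columns = [fn(list(col)) for col in zip(*frames)]  # per coefficient, over time
    return [list(row) for row in zip(*columns)]  # back to frames x coefficients


if __name__ == "__main__":
    utt = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    print(normalize(utt, "CMN"))  # [[-1.5, -15.0], [-0.5, -5.0], [0.5, 5.0], [1.5, 15.0]]
```

Here the statistics are computed over the whole utterance. A real-time recognizer would typically use running or windowed estimates instead, which is a separate design choice from the normalization formula itself.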

Author keywords

Automatic speech recognition; Cepstral analysis; Hidden Markov models; Speech analysis; Whisper
  • ISSN: 1582-7445
  • Source Type: Journal
  • Original language: English
  • DOI: 10.4316/AECE.2017.01004
  • Document Type: Article
  • Publisher: University of Suceava

  Grozdić, D.; School of Electrical Engineering, University of Belgrade, Bulevar kralja Aleksandra 73, Belgrade, Serbia;
© Copyright 2017 Elsevier B.V., All rights reserved.

Cited by 10 documents

Galić, J., Marković, B., Grozdić, Đ.
Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering
(2024) Applied Sciences (Switzerland)
Labied, M., Belangour, A., Banane, M.
Fine-Tuning Whisper for Speech Translation: A Case Study on Translating Darija to Arabic
(2024) 2024 International Conference on Decision Aid Sciences and Applications, DASA 2024
Alenizi, A.S., Al-Karawi, K.A.
Speaker Recognition with Deep Learning Approaches: A Review
(2024) Lecture Notes in Networks and Systems
