Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume 9811 LNCS, 2016, Pages 67-74
18th International Conference on Speech and Computer, SPECOM 2016; Budapest; Hungary; 23 August 2016 through 27 August 2016; Code 179989

A phonetic segmentation procedure based on hidden Markov models (Conference Paper)

  • a: Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
  • b: AlfaNum Speech Technologies, Novi Sad, Serbia
  • c: Speech Morphing Inc, Campbell, CA, United States

Abstract

This paper presents a novel variant of an automatic phonetic segmentation procedure that is especially useful when data is scarce. The procedure is built on the Kaldi speech recognition toolkit and combines and modifies several existing methods and Kaldi recipes. The specifics of both model training and test data alignment are explained in detail. The effectiveness of artificially extending the initial amount of manually labeled material during training is examined as well. Experimental results show that the proposed procedure achieves high overall correctness in the given test environment. Several variants of the procedure are compared, and speaker-adapted context-dependent triphone models trained without the expanded manually checked data are shown to produce the best results. Possible further improvements to the procedure, as well as future work, are also discussed. © Springer International Publishing Switzerland 2016.

Author keywords

Hidden Markov models; Kaldi; Phonetic segmentation

Indexed keywords

Engineering controlled terms: Linguistics; Markov processes; Speech recognition
Engineering uncontrolled terms: Context dependent; Kaldi; Model training; Phonetic segmentation; Test data; Test environment; Triphones
Engineering main heading: Hidden Markov models

Funding details

Funding sponsor: Ministarstvo Prosvete, Nauke i Tehnološkog Razvoja
Funding number: TR32035
Acronym: MPNTR
This research was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia, under Grant No. TR32035. The authors are grateful to the company “Speech Morphing, Inc.” from Campbell, CA, USA, for providing the speech corpora for the experiments.

  • ISSN: 0302-9743
  • ISBN: 978-3-319-43957-0
  • Source Type: Book Series
  • Original language: English
  • DOI: 10.1007/978-3-319-43958-7_7
  • Document Type: Conference Paper
  • Volume Editors: Ronzhin A., Potapova R., Nemeth G.
  • Publisher: Springer Verlag

  Popović, B.; Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
