Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 11096 LNAI, 2018, Pages 522-531
20th International Conference on Speech and Computer, SPECOM 2018; Leipzig, Germany; 18 September 2018 through 22 September 2018; Code 218179

A Comparison of Language Model Training Techniques in a Continuous Speech Recognition System for Serbian (Conference Paper)

  • aDepartment for Power, Electronic and Telecommunication Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, 21000, Serbia
  • bAlfaNum Speech Technologies, Bulevar Vojvode Stepe 40, Novi Sad, 21000, Serbia
  • cDepartment for Music Production and Sound Design, Academy of Arts, Alfa BK University, Nemanjina 28, Belgrade, 11000, Serbia
  • dIve Andrića 1A, Odžaci, 25250, Serbia

Abstract

In this paper, a number of language model training techniques are examined and utilized in a large-vocabulary continuous speech recognition system for the Serbian language (more than 120,000 words), namely the Mikolov and Yandex RNNLM toolkits, TensorFlow-based GPU approaches and the CUED-RNNLM approach. The baseline acoustic model is a chain sub-sampled time-delay neural network, trained using cross-entropy training and a sequence-level objective function on a database of about 200 hours of speech. The baseline language model is a 3-gram model trained on the training part of the database transcriptions and the Serbian journalistic corpus (about 600,000 utterances), using the SRILM toolkit and the Kneser-Ney smoothing method, with a pruning value of 10⁻⁷ (previous best). The results are analyzed in terms of word and character error rates and the perplexity of a given language model on the training and validation sets. A relative improvement of 22.4% (best word error rate of 7.25%) is obtained in comparison to the baseline language model. © 2018, Springer Nature Switzerland AG.
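As a point of reference for the evaluation criteria named in the abstract, the following Python sketch (illustrative only, not taken from the paper; all function names are ours) shows how perplexity, word and character error rates, and relative improvement over a baseline are conventionally computed. Plugging in the reported numbers recovers an implied baseline word error rate of roughly 9.3%.

# Illustrative sketch (not the authors' code): the evaluation metrics named in the
# abstract -- language model perplexity, word/character error rate, and relative
# WER improvement over a baseline.
import math
from typing import Sequence


def perplexity(log_probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_i log p(w_i | history)); natural-log inputs assumed."""
    return math.exp(-sum(log_probs) / len(log_probs))


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,                # deletion
                          curr[j - 1] + 1,            # insertion
                          prev[j - 1] + (r != h))     # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(ref: str, hyp: str) -> float:
    """WER = edit distance over word sequences / number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance over character sequences / number of reference characters."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative reduction of an error rate (e.g. WER) with respect to a baseline."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # The reported best WER of 7.25% at a 22.4% relative improvement implies a
    # baseline WER of roughly 7.25 / (1 - 0.224) ~= 9.34%.
    baseline_wer = 0.0725 / (1.0 - 0.224)
    print(f"implied baseline WER: {baseline_wer:.2%}")
    print(f"relative improvement: {relative_improvement(baseline_wer, 0.0725):.1%}")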

Author keywords

Language modeling; LSTM; LVCSR; RNNLM

Indexed keywords

Engineering controlled terms: Computational linguistics; Continuous speech recognition; Modeling languages; Natural language processing systems; Speech
Engineering uncontrolled terms: Language model; Large vocabulary continuous speech recognition; LSTM; LVCSR; Objective functions; RNNLM; Time-delayed neural networks; Word and characters
Engineering main heading: Long short-term memory

Funding details

Funding sponsors:
  • Ministarstvo Prosvete, Nauke i Tehnološkog Razvoja (MPNTR)
  • Provincial Secretariat for Higher Education and Scientific Research, Autonomous Province of Vojvodina, funding number 114-451-2570/2016-02

    Acknowledgments. The work described in this paper was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia, within the project “Development of Dialogue Systems for Serbian and Other South Slavic Languages”, EUREKA project DANSPLAT, “A Platform for the Applications of Speech Technologies on Smartphones for the Languages of the Danube Region”, id E! 9944, and the Provincial Secretariat for Higher Education and Scientific Research, within the project “Central Audio-Library of the University of Novi Sad”, No. 114-451-2570/2016-02.

  • ISSN: 0302-9743
  • ISBN: 978-3-319-99578-6
  • Source Type: Book Series
  • Original language: English
  • DOI: 10.1007/978-3-319-99579-3_54
  • Document Type: Conference Paper
  • Volume Editors: Potapova, R.; Jokisch, O.; Karpov, A.
  • Publisher: Springer Verlag

  Correspondence Address: Popović, B.; Department for Power, Electronic and Telecommunication Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, Serbia
© Copyright 2018 Elsevier B.V., All rights reserved.

Cited by 4 documents

Pakoci, E.T., Popović, B.Z.
Recurrent Neural Networks and Morphological Features in Language Modeling for Serbian
(2021) 2021 29th Telecommunications Forum, TELFOR 2021 - Proceedings

Popović, B.Z., Pakoci, E.T., Pekar, D.J.
Transfer Learning for Domain and Environment Adaptation in Serbian ASR
(2020) Telfor Journal

Popović, B., Pakoci, E., Pekar, D.
Transfer Learning in Automatic Speech Recognition for Serbian
(2019) 27th Telecommunications Forum, TELFOR 2019
{"topic":{"name":"Computational Linguistics; Modeling Language; Speech Recognition","id":10377,"uri":"Topic/10377","prominencePercentile":51.398098,"prominencePercentileString":"51.398","overallScholarlyOutput":0},"dig":"11c57b3db6cbcac537fce9ae5c10dfc1a2e9d7c4ec120f977323e62ab4b460be"}

SciVal Topic Prominence

Topic:
Prominence percentile: