Entropy, Volume 24, Issue 3, March 2022, Article number 414

Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech (Article, Open Access)

  • aFaculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, Novi Sad, 21000, Serbia
  • bFaculty of Electronic Engineering, University of Nis, Aleksandra Medvedeva 14, Nis, 18000, Serbia
  • cFaculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Ive Lole Ribara 29, Kosovska Mitrovica, 38220, Serbia

Abstract

Speaker recognition is an important classification task that can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task with solutions that provide excellent performance, the classification accuracy of such models decreases significantly when they are applied to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable for deployment on edge devices in Internet of Things systems for real-time detection. The aim of this paper is to propose a simple, constrained convolutional neural network for speaker recognition and to examine its robustness under emotional speech conditions. We examine three quantization methods for developing a constrained network: floating-point eight (FP8) format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
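For illustration only, the sketch below shows what binary and ternary scalar quantization of a CNN weight tensor can look like in NumPy. It assumes a per-tensor scale equal to the mean absolute weight and a mean-based ternary threshold (the heuristic from ternary weight networks); these choices are assumptions for the sketch, not necessarily the scheme the authors used.

```python
import numpy as np

def binary_quantize(w):
    """Binary scalar quantization: map each weight to {-a, +a}.
    The scale a is taken as the mean absolute value of the tensor
    (a common heuristic; the paper's exact choice may differ)."""
    a = np.abs(w).mean()
    return a * np.sign(w)

def ternary_quantize(w, delta_ratio=0.7):
    """Ternary scalar quantization: map each weight to {-a, 0, +a}.
    Weights with |w| below the threshold delta are zeroed; delta and
    the scale a follow a mean-based heuristic (an assumption here)."""
    delta = delta_ratio * np.abs(w).mean()
    mask = np.abs(w) > delta
    a = np.abs(w[mask]).mean() if mask.any() else 0.0
    return a * np.sign(w) * mask

# Toy usage: quantize a random 3x3 convolution kernel.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3)).astype(np.float32)
print(binary_quantize(kernel))
print(ternary_quantize(kernel))
```

In both schemes every surviving weight shares a single scale, so a quantized kernel can be stored as a sign (or sign-and-mask) tensor plus one float, which is what makes such constrained networks attractive for edge deployment.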

Author keywords

convolutional neural network; emotional speech; quantization; speaker recognition

Funding details

Funding sponsor: Science Fund of the Republic of Serbia
Funding numbers: 6524560 (AI—S ADAPT), 6527104 (AI-Com-in-AI)

Funding statement: This research was supported by the Science Fund of the Republic of Serbia (grant #6524560, AI—S ADAPT and grant #6527104, AI-Com-in-AI).

  • ISSN: 1099-4300
  • Source Type: Journal
  • Original language: English
  • DOI: 10.3390/e24030414
  • Document Type: Article
  • Publisher: MDPI

  Correspondence Address: Simić, N.; Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, Novi Sad, Serbia
© Copyright 2022 Elsevier B.V., All rights reserved.

Cited by 16 documents

Jahangir, R. , Alreshoodi, M. , Khaled Alarfaj, F.
Spectrogram Features-Based Automatic Speaker Identification For Smart Services
(2025) Applied Artificial Intelligence
Đurkić, T. , Simić, N. , Suzić, S.
Multimodal Emotion Recognition Using Compressed Graph Neural Networks
(2025) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Tomar, S. , Koolagudi, S.G.
Transformation of Emotional Speech to Anger Speech to Reduce Mismatches in Testing and Enrollment Speech for Speaker Recognition System
(2025) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
{"topic":{"name":"Speaker Identification; Speech Communication; Discriminant Analysis","id":832,"uri":"Topic/832","prominencePercentile":83.47038,"prominencePercentileString":"83.470","overallScholarlyOutput":0},"dig":"f928e1b938460db6c9eac6bafb80afd2f43527959e6ea6af1737e85615fdae9c"}

SciVal Topic Prominence

Topic:
Prominence percentile: