

As machines play an increasing role in people's daily lives, human-machine communication needs to become more similar to communication between two people, which has created a need for automatic emotion recognition from speech. This paper compares the performance of different machine learning algorithms in automatic emotion recognition on two corpora of expressive speech in the Serbian language, one containing speech samples delivered by professional actors and the other produced by amateurs. In both cases, acoustic features were extracted using the openSMILE toolkit. The machine learning algorithms under investigation are k-nearest neighbours, support vector machines and decision trees. The best performance was achieved by support vector machines with dimensionality reduced by principal component analysis. This approach was shown to achieve an accuracy of more than 80% for each of the 5 analyzed emotions (joy, sadness, fear, anger and neutral) on the amateur speech corpus. © 2021 IEEE.
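
To make the per-emotion evaluation concrete, the following is a minimal sketch of one of the compared classifiers, k-nearest neighbours, scored separately for each of the five emotion labels. It is not the authors' implementation: the feature vectors are synthetic, well-separated toy data standing in for the extracted acoustic features, the names `make_toy_data`, `knn_predict` and `per_emotion_accuracy` are hypothetical, and "accuracy per emotion" is assumed here to mean the fraction of test samples of that emotion that are classified correctly.

```python
import math
import random
from collections import Counter, defaultdict

# The five labels named in the abstract.
EMOTIONS = ["joy", "sadness", "fear", "anger", "neutral"]


def make_toy_data(n_per_class, n_features, rng):
    """Build synthetic feature vectors (NOT real speech features).

    Each emotion gets its own cluster centre; bounded uniform noise
    keeps the clusters well separated. This is an assumption made
    purely for illustration.
    """
    data = []
    for idx, emotion in enumerate(EMOTIONS):
        for _ in range(n_per_class):
            vec = [10.0 * idx + rng.uniform(-1.0, 1.0) for _ in range(n_features)]
            data.append((vec, emotion))
    return data


def knn_predict(train, query, k):
    """Return the majority label among the k training vectors nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def per_emotion_accuracy(train, test, k):
    """Fraction of test samples of each emotion that k-NN labels correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for vec, label in test:
        total[label] += 1
        if knn_predict(train, vec, k) == label:
            correct[label] += 1
    return {label: correct[label] / total[label] for label in total}


rng = random.Random(0)  # fixed seed so the toy run is repeatable
train_set = make_toy_data(n_per_class=20, n_features=8, rng=rng)
test_set = make_toy_data(n_per_class=5, n_features=8, rng=rng)
accuracy = per_emotion_accuracy(train_set, test_set, k=3)

for emotion in EMOTIONS:
    print(f"{emotion}: {accuracy[emotion]:.2f}")
```

Reporting one score per emotion, instead of a single overall figure, shows whether any individual class falls below a threshold such as the 80% mentioned above. Real acoustic features overlap far more than these toy clusters, so the numbers printed here say nothing about performance on actual speech.
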
| Engineering controlled terms: | Decision trees; Learning algorithms; Nearest neighbor search; Principal component analysis; Speech recognition; Support vector machines |
|---|---|
| Engineering uncontrolled terms: | Acoustic features; Automatic emotion recognition; Daily lives; Emotion recognition; Emotion recognition from speech; Expressive-speech; Human-machine communication; Machine learning algorithms; Performance; Support vectors machine |
| Engineering main heading: | Speech |