Applied Sciences (Switzerland), Volume 14, Issue 4, February 2024, Article number 1325

Enhancing Emotion Recognition through Federated Learning: A Multimodal Approach with Convolutional Neural Networks (Article) (Open Access)

  • a Faculty of Technical Sciences, University of Novi Sad, Novi Sad, 21000, Serbia
  • b Faculty of Sciences, University of Novi Sad, Novi Sad, 21000, Serbia

Abstract

Human–machine interaction covers a range of applications in which machines should understand humans’ commands and predict their behavior. Humans commonly change their mood over time, which affects the way they interact, particularly by changing speech style and facial expressions. As interaction requires quick decisions, low latency is critical for real-time processing. Edge devices, strategically placed near the data source, minimize processing time, enabling real-time decision-making. Edge computing allows us to process data locally, thus reducing the need to send sensitive information further through the network. Despite the wide adoption of audio-only, video-only, and multimodal emotion recognition systems, there is a research gap in terms of analyzing lightweight models and solving privacy challenges to improve model performance. This motivated us to develop a privacy-preserving, lightweight, CNN-based (CNNs are frequently used for processing audio and video modalities) audiovisual emotion recognition model, deployable on constrained edge devices. The model is further paired with a federated learning protocol to preserve the privacy of local clients on edge devices and improve detection accuracy. The results show that the adoption of federated learning improved classification accuracy by ~2%, and that the proposed federated learning-based model provides competitive performance compared to other baseline audiovisual emotion recognition models. © 2024 by the authors.
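
To make the abstract's setup concrete, below is a minimal illustrative sketch (not the authors' implementation) of how a lightweight audiovisual CNN on an edge client could be combined with server-side federated aggregation. The abstract does not disclose the network architecture or the aggregation protocol, so the two-branch CNN (log-mel spectrogram plus face-crop input with late fusion), the layer sizes, the seven-emotion output, and the FedAvg-style weighted averaging used here are all assumptions.

```python
# Hypothetical sketch: two-branch audiovisual CNN + FedAvg-style aggregation.
# Architecture, input shapes, and the choice of FedAvg are assumptions; the
# paper only states that a lightweight CNN and a federated protocol are used.
import copy
import torch
import torch.nn as nn


class AudioVisualCNN(nn.Module):
    def __init__(self, num_emotions: int = 7):
        super().__init__()
        # Audio branch: 1-channel log-mel spectrogram, e.g. (1, 64, 64).
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        # Video branch: 3-channel face crop, e.g. (3, 64, 64).
        self.video = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        # Late fusion: concatenate the two 32-dim embeddings, then classify.
        self.classifier = nn.Linear(64, num_emotions)

    def forward(self, spectrogram, frame):
        a = self.audio(spectrogram).flatten(1)
        v = self.video(frame).flatten(1)
        return self.classifier(torch.cat([a, v], dim=1))


def fedavg(client_states, client_sizes):
    """Weighted average of client state_dicts (FedAvg-style aggregation)."""
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state


if __name__ == "__main__":
    # One illustrative round: three edge clients train locally (omitted here),
    # then the server averages their weights in proportion to dataset size.
    clients = [AudioVisualCNN() for _ in range(3)]
    new_global = fedavg([c.state_dict() for c in clients],
                        client_sizes=[120, 80, 200])
    server_model = AudioVisualCNN()
    server_model.load_state_dict(new_global)
```

In such a setup, each client would run a few local training epochs on its private audiovisual recordings and share only model weights with the server, so raw audio and video never leave the edge device, which is the privacy property the abstract emphasizes.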

Author keywords

artificial intelligence; emotion recognition; federated learning; machine learning; multimodal

Funding details

Funding sponsor: European Commission (EC)
Funding number: 957337
  • 1

    This work was funded by the European Union’s Horizon 2020 research and innovation program MARVEL under grant agreement No 957337. This publication reflects the authors’ views only. The European Commission is not responsible for any use that may be made of the information it contains.

  • ISSN: 2076-3417
  • Source Type: Journal
  • Original language: English
  • DOI: 10.3390/app14041325
  • Document Type: Article
  • Publisher: Multidisciplinary Digital Publishing Institute (MDPI)

  Correspondence: Simić, N., Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
© Copyright 2024 Elsevier B.V., All rights reserved.

Cited by 11 documents

Vajrobol, V., Saxena, G.J., Pundir, A.
A Comprehensive Survey on Federated Learning Applications in Computational Mental Healthcare
(2025) CMES - Computer Modeling in Engineering and Sciences

Đurkić, T., Simić, N., Suzić, S.
Multimodal Emotion Recognition Using Compressed Graph Neural Networks
(2025) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Tan, C., Li, S., Cao, Y.
Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
(2024) Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
SciVal Topic Prominence

Topic: Speech Emotion Recognition; Neural Network; Speech Analysis
Prominence percentile: 99.392