Multimodal Emotion Recognition Using CNNs: A Web-Based Application for Facial and Vocal Analysis

  • Hind Mestouri, Cadi Ayyad University
  • Abdelilah Jraifi
  • Kamal Baraka
Keywords: Emotion Recognition, Convolutional Neural Networks (CNN), Artificial Intelligence (AI), Facial Expression, Vocal Analysis, Web-Based Application

Abstract

Understanding human emotions is essential for improving the quality of interaction between people and machines. Emotion recognition systems are gaining increasing importance across various fields such as healthcare, smart living environments, customer service, and affective computing. In this study, we present the development of a user-friendly web application designed to detect emotions from both facial expressions and vocal cues. Relying on deep learning, and more specifically on Convolutional Neural Networks (CNNs), our system is capable of identifying seven core emotions: happiness, sadness, anger, fear, surprise, disgust, and a neutral state. To train and evaluate the system, we used the FER2013 dataset for facial expression recognition and the widely known RAVDESS database for vocal emotion analysis. Audio features were extracted through spectrogram analysis and Mel-Frequency Cepstral Coefficients (MFCCs). The application enables users to upload audio or image files, or record directly, and receive real-time emotion classification. Initial tests have shown promising accuracy under controlled conditions, highlighting the system’s potential for deployment in smart and adaptive applications.
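
As an illustration of the final classification step, the sketch below shows one common way a network's seven raw output scores could be turned into an emotion label and a confidence value. It is not taken from the article: the class order in EMOTIONS, the function names softmax and classify, and the example scores are all assumptions made for this illustration.

```python
import math

# The seven classes named in the abstract. The index order is an assumption;
# a real model defines its own label order at training time.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} scores, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


if __name__ == "__main__":
    # Hypothetical scores standing in for a CNN's final-layer output.
    label, confidence = classify([0.2, -1.0, 0.1, 2.5, 0.3, 0.9, 1.1])
    print(label, round(confidence, 2))
```

With the hypothetical scores above, the largest score sits at index 3, so the predicted label is happiness. The same mapping would apply to either modality, provided the facial and vocal models share one label order.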
Published
2026-02-18
How to Cite
Mestouri, H., Jraifi, A., & Baraka, K. (2026). Multimodal Emotion Recognition Using CNNs: A Web-Based Application for Facial and Vocal Analysis. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3341
Section
Research Articles