Machine Learning-Based Emotion Classification from Voice Signals Using MFCC Central Tendency Features

Speech emotion recognition (SER) is a key challenge in affective computing, because subtle emotional cues are often carried not by the linguistic content of speech but by its acoustic features. This study proposes a machine learning approach that uses statistical descriptors of Mel-Frequency Cepstral Coefficients (MFCCs) to capture the central tendencies of voice signals for multiclass emotion classification. Raw voice recordings from the Toronto Emotional Speech Set (TESS) were processed into nine statistical features, of which six were retained after correlation-based filtering to reduce redundancy and improve generalization. Several classifiers were evaluated, with the Support Vector Machine (SVM) achieving the best performance: 84% accuracy, 83% macro-recall, and 83% macro-F1. The improvement after hyperparameter tuning was statistically significant (McNemar’s test, p = 1.606e-20), underscoring the importance of systematic optimization. In a comparative analysis, correlation-based feature selection preserved the discriminative power of the SVM better than PCA and LDA. Compared with related works that employ deep learning or multi-dataset setups, the proposed framework offers competitive performance while maintaining greater interpretability and computational efficiency. These findings support the hypothesis that compact, voice-centered statistical features, when optimized, form a reliable basis for robust and efficient emotion recognition systems.
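
Two steps named in the abstract, correlation-based feature filtering and McNemar's test on paired predictions, can be illustrated with a minimal sketch. The abstract does not state the correlation threshold, the filtering order, or which variant of McNemar's test was used, so the 0.9 threshold, the greedy keep-first rule, and the exact binomial form below are assumptions for illustration only, and the function names are hypothetical.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_filter(columns, threshold=0.9):
    """Greedy filter over feature columns.

    A column is kept only if its absolute correlation with every column
    already kept is below `threshold` (assumed value, not from the paper).
    Returns the indices of the retained columns.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact (binomial) McNemar test on paired predictions.

    b counts samples model A gets right and model B gets wrong;
    c counts the reverse. Only these discordant pairs enter the test.
    Returns (b, c, p_value).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

In this sketch `pred_a` and `pred_b` would be the predictions of the tuned and untuned classifier on the same test samples, and `columns` would hold one list of values per statistical MFCC feature.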

Details

Published: Mar 15, 2026
Vol/Issue: 9(1)
Pages: 21-33

Authors

Telkom University

Telkom University

Universitas Negeri Surabaya

Rifki Wijaya

Telkom University

Tjokorda Agung B. Wirayuda

Telkom University

Arfin Nurma Halida

State University of Surabaya

Asril Jarin

Research Center for Data and Information Science, National Research and Innovation Agency

Insan Ramadhan

Research Center for Artificial Intelligence and Cyber Security, National Research and Innovation Agency

Irgi Ahmad Maulana

Telkom University

Wandi Yusuf Kurniawan

Telkom University

Funding

Center of Excellent Human Centric Engineering (HUMIC), Telkom University and National Research and Innovation Agency Award: Decree Number 61/II.7/HK/2024 dated 24 December 2024 and Agreement/Contract Numbers 47/IV/KS/02/2025 and 052/SAM4/PPM/2025 with Telkom University dated 21 February 2025

Cite This Article

Putu Harry Gunawan, Yesy Diah Rosita, Yohana Wuri Satwika, et al. (2026). Machine Learning-Based Emotion Classification from Voice Signals Using MFCC Central Tendency Features. Sakarya University Journal of Computer and Information Sciences, 9(1), 21-33. https://doi.org/10.35377/saucis...1728490