An Automated Deep Learning based Speech Emotion Recognition System
Abstract
Speech Emotion Recognition (SER) is a challenging yet pivotal area with wide-ranging applications spanning psychology, speech therapy, and customer service. This paper introduces a novel deep learning approach to SER based on recurrent neural networks. The proposed model is trained on carefully labeled datasets containing diverse speech samples representing various emotional states. By analyzing key audio features such as pitch, rhythm, and prosody, the system aims to recognize emotions accurately in unseen speech data. The primary objective is to advance SER by improving accuracy and reliability and by fostering deeper insight into the intricate relationship between emotion and speech. Specifically, this study employs Long Short-Term Memory (LSTM) neural networks, known for their proficiency in capturing temporal dependencies, and trains and evaluates the LSTM model on a comprehensive dataset covering a spectrum of emotional states. Experimental results demonstrate the effectiveness of the approach, outperforming conventional methods and underscoring the potential of LSTM models for SER. This research contributes to the evolution of emotion recognition technology, with potential implications for domains such as human-computer interaction, mental health monitoring, and sentiment analysis.
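To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch in Python, assuming MFCC features as a common stand-in for the pitch, rhythm, and prosody cues mentioned above, and a small stacked-LSTM classifier built with Keras. The label set, feature dimensions, and training call are hypothetical illustrations, not the authors' exact configuration.

```python
# Minimal SER sketch (illustrative, not the paper's exact pipeline):
# MFCC feature extraction with librosa, then a stacked-LSTM classifier in Keras.
import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set
N_MFCC = 40        # MFCC coefficients per frame
MAX_FRAMES = 200   # pad/truncate every clip to a fixed frame count

def extract_features(wav_path: str) -> np.ndarray:
    """Load an audio clip and return a (MAX_FRAMES, N_MFCC) MFCC matrix."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T  # (frames, N_MFCC)
    # Zero-pad or truncate so every sample has the same sequence length.
    if mfcc.shape[0] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def build_model() -> keras.Model:
    """A compact LSTM classifier over the MFCC frame sequence."""
    model = keras.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.LSTM(128, return_sequences=True),  # capture temporal dependencies
        layers.LSTM(64),                          # summarize the whole sequence
        layers.Dropout(0.3),
        layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage, given features X of shape (num_clips, MAX_FRAMES, N_MFCC)
# and integer labels y indexing into EMOTIONS:
# model = build_model()
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=30, batch_size=32)
```

The two stacked LSTM layers mirror the abstract's emphasis on temporal dependencies: the first layer returns the full hidden-state sequence, and the second condenses it into a single vector for classification.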