Facial Expression Recognition Using Deep Learning
Abstract
Facial expressions play a vital role in communication, conveying subtle cues about a
person's emotional state and enriching social interaction. Accurate recognition
and interpretation of facial expressions are therefore crucial for a wide range of applications. Facial Expression
Recognition (FER) is used to assess candidate suitability for client-facing roles,
refine video games during beta testing, enhance marketing research, improve AI-human interaction, support mental health care, and evaluate audience engagement at
events. Although humans are proficient at FER, automating the task with computational
methods is difficult because of the intricate and variable nature of facial expressions. Deep
learning has emerged as a promising approach to this challenge, significantly improving
the accuracy and efficiency of FER systems. This thesis presents a deep-learning
approach built on the EfficientViT-M5 model, an efficient variant of the Vision Transformer (ViT)
architecture. ViTs have achieved notable success in computer vision tasks by using
self-attention mechanisms to capture complex patterns and relationships within images.
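For context, the self-attention that standard ViTs rely on is scaled dot-product attention, where Q, K, and V are the query, key, and value projections of the patch embeddings and d_k is the key dimension (EfficientViT replaces this with a more computationally efficient attention variant):

    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V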
EfficientViT improves upon this design by providing a more computationally efficient variant
that maintains high accuracy, making it well suited to real-time applications.
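As a minimal sketch of how such a backbone can be instantiated (the timm registry name, ImageNet pretraining, and the seven-class output head below are illustrative assumptions, not necessarily the exact configuration used in this thesis):

    import timm
    import torch

    # Instantiate an EfficientViT-M5 backbone via timm (assumed registry
    # name) with a 7-way head, e.g., for seven basic emotion classes.
    model = timm.create_model("efficientvit_m5", pretrained=True, num_classes=7)

    # Forward pass on a dummy batch of 224x224 RGB images.
    x = torch.randn(1, 3, 224, 224)
    logits = model(x)   # shape: (1, 7)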
The proposed approach entails training the EfficientViT-M5 model on three well-known facial
expression recognition datasets: FER2013+, AffectNet, and RAF-DB. To increase the variety
of the training data and strengthen the model's resilience, a thorough data augmentation
pipeline is used, incorporating random horizontal and vertical flipping, additive Gaussian
noise, Gaussian blur, and normalization.
These augmentations help the model generalize more effectively by emulating a
diverse array of real-world variation in facial expressions; a sketch of such a pipeline is given below.
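A minimal sketch of this kind of pipeline, assuming torchvision transforms (the flip probabilities, noise level, blur kernel, and normalization statistics are illustrative placeholders rather than the thesis's exact settings):

    import torch
    from torchvision import transforms

    class AddGaussianNoise:
        """Add zero-mean Gaussian noise to a tensor image (custom transform;
        the classic torchvision transforms API has no built-in equivalent)."""
        def __init__(self, std=0.05):
            self.std = std

        def __call__(self, img):
            return img + torch.randn_like(img) * self.std

    # Illustrative augmentation pipeline: flips, blur, noise, normalization.
    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomVerticalFlip(p=0.5),
        transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
        transforms.ToTensor(),          # PIL image -> float tensor in [0, 1]
        AddGaussianNoise(std=0.05),     # additive Gaussian noise
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])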
To further regularize training and avoid overfitting over the 30-epoch schedule, the model is
first trained for 15 epochs on a randomly chosen 80% subset of the training data, ensuring
that it is exposed to novel characteristics in each epoch. Afterward, the model is
trained on the whole training set to reinforce its learning; this schedule is sketched below.
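A minimal sketch of this two-stage schedule, assuming PyTorch and a subset redrawn each epoch (the batch size, optimizer, and resampling frequency are illustrative assumptions):

    import torch
    from torch.utils.data import DataLoader, Subset

    def make_subset_loader(dataset, fraction=0.8, batch_size=64):
        # Draw a fresh random 80% subset (redrawn each epoch here, so the
        # model sees a different sample of the data every pass).
        n = int(len(dataset) * fraction)
        idx = torch.randperm(len(dataset))[:n].tolist()
        return DataLoader(Subset(dataset, idx), batch_size=batch_size, shuffle=True)

    def train(model, train_set, criterion, optimizer, device, epochs=30):
        full_loader = DataLoader(train_set, batch_size=64, shuffle=True)
        for epoch in range(epochs):
            # Epochs 1-15: random 80% subset; epochs 16-30: full training set.
            loader = make_subset_loader(train_set) if epoch < 15 else full_loader
            model.train()
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()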
The training regimen is designed to exploit the capabilities of the EfficientViT-M5 architecture,
enabling it to learn discriminative features and patterns indicative of different facial emotions.
The trained model achieved accuracies of 94.28%, 94.69%, and 97.76% on
the FER2013+, AffectNet, and RAF-DB datasets, respectively. These findings underscore the model's
robustness and effectiveness in recognizing facial emotions across varied datasets and highlight
its potential for practical use in emotion-aware computing, security, and health diagnostics. This work advances FER by presenting a dependable, practical approach to
identifying emotions with state-of-the-art deep learning methods. The results point toward
richer, more adaptable interaction between humans and computers,
demonstrating the effectiveness of efficient transformer models such as EfficientViT-M5 in
tackling intricate computer vision tasks.