
dc.contributor.advisor: Huỳnh, Khả Tú
dc.contributor.author: Nguyễn, Võ Nhật Anh
dc.date.accessioned: 2025-02-17T02:22:10Z
dc.date.available: 2025-02-17T02:22:10Z
dc.date.issued: 2024
dc.identifier.uri: http://keep.hcmiu.edu.vn:8080/handle/123456789/6648
dc.description.abstract: Facial expressions play a vital role in communication, conveying subtle details about a person's emotional state and enriching social interaction. Precise recognition and interpretation of facial expressions are therefore crucial for many applications. Facial Expression Recognition (FER) is used to assess candidate suitability for client-facing roles, refine video games during beta testing, enhance marketing research, improve AI-human interaction, support mental health care, and evaluate audience engagement at events. Although humans are proficient at FER, automating the task with computational methods is difficult because facial expressions are intricate and highly variable. Deep learning has emerged as a promising way to address this difficulty, significantly improving the accuracy and efficiency of FER systems. This thesis applies a deep-learning approach built on the EfficientViT-M5 model, a variant of the Vision Transformer (ViT) architecture. ViTs have achieved notable success in computer vision tasks by using self-attention to capture complex patterns and relationships within images. EfficientViT improves on this design by offering a more computationally efficient variant that maintains high performance, making it well suited to real-time applications. The proposed approach trains the EfficientViT-M5 model on three well-known facial expression recognition datasets: FER2013+, AffectNet, and RAF-DB. To increase the variety of the training data and strengthen the model's resilience, a thorough data augmentation pipeline is used, incorporating random horizontal and vertical flipping, Gaussian noise, Gaussian blur, and normalization. These augmentations help the model generalize more effectively by emulating the diverse real-world variation in facial expressions.
To further improve training and avoid overfitting over the 30 epochs, the model is first trained on a randomly chosen 80% subset of the training data for the first 15 epochs, ensuring it is exposed to a novel mix of samples in each epoch. Afterward, the model is trained on the whole training dataset to consolidate its learning. The training procedure is designed to exploit the capabilities of the EfficientViT-M5 architecture, enabling it to learn discriminative features and patterns indicative of different facial emotions. The trained model achieved accuracy rates of 94.28%, 94.69%, and 97.76% on FER2013+, AffectNet, and RAF-DB, respectively. These findings underline the model's resilience and effectiveness in identifying facial emotions across datasets, showing its potential for practical use in emotion-aware computing, security, and health diagnostics. This work advances FER by presenting a dependable and practical approach to emotion recognition using state-of-the-art deep learning. The results point to more responsive and adaptable human-computer interaction, demonstrating the effectiveness of advanced models such as EfficientViT-M5 on intricate computer vision tasks. en_US
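The augmentation pipeline described in the abstract can be sketched as follows. This is an illustrative NumPy implementation, not code from the thesis: the flip probabilities, noise standard deviation, and blur kernel are assumptions, and a small separable kernel stands in for whatever Gaussian-blur implementation the author used.

```python
import numpy as np

def gaussian_blur(img, kernel=(0.25, 0.5, 0.25)):
    """Separable blur: convolve each row, then each column, with a small kernel."""
    k = np.asarray(kernel)
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    img = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)
    return img

def augment(image, rng):
    """Apply random flips, Gaussian noise, blur, and normalization
    to a float grayscale image of shape (H, W)."""
    if rng.random() < 0.5:                    # random horizontal flip
        image = image[:, ::-1]
    if rng.random() < 0.5:                    # random vertical flip
        image = image[::-1, :]
    image = image + rng.normal(0.0, 0.05, size=image.shape)  # Gaussian noise
    image = gaussian_blur(image)              # Gaussian blur (assumed 3-tap kernel)
    return (image - image.mean()) / (image.std() + 1e-8)     # normalization
```

In a PyTorch workflow the same steps would typically be expressed with `torchvision.transforms`; the sketch above only shows the per-image operations the abstract lists.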
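The two-phase schedule above (a fresh random 80% subset for each of the first 15 epochs, then the full dataset for the remaining 15) can be sketched as a per-epoch sampling plan. Function and parameter names here are illustrative assumptions, not identifiers from the thesis.

```python
import random

def two_phase_epochs(dataset, total_epochs=30, subset_epochs=15, frac=0.8, seed=0):
    """Return the list of samples to train on in each epoch:
    a fresh random `frac` subset for the first `subset_epochs` epochs,
    then the full dataset thereafter."""
    rng = random.Random(seed)
    per_epoch = []
    for epoch in range(total_epochs):
        if epoch < subset_epochs:
            # a new random subset each epoch exposes the model to novel sample mixes
            per_epoch.append(rng.sample(dataset, int(frac * len(dataset))))
        else:
            # full dataset in later epochs consolidates what was learned
            per_epoch.append(list(dataset))
    return per_epoch
```

Each epoch's list would then be fed to the training loop in place of the usual fixed training set.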
dc.subject: Facial Expression en_US
dc.subject: Recognition en_US
dc.subject: Deep Learning en_US
dc.title: Facial Expression Recognition Using Deep Learning en_US
dc.type: Thesis en_US

