dc.contributor.advisor: Huynh, Kha Tu
dc.contributor.author: Tran, Cong Man
dc.date.accessioned: 2024-03-19T02:15:25Z
dc.date.available: 2024-03-19T02:15:25Z
dc.date.issued: 2022
dc.identifier.uri: http://keep.hcmiu.edu.vn:8080/handle/123456789/4744
dc.description.abstract: Text recognition has been key to solving many of the problems associated with digitizing documents in the Industry 4.0 era. Most techniques first extract character features with a Convolutional Neural Network (CNN) and then pass those features to a Recurrent Neural Network (RNN) to generate character-level information. This thesis applies a different model to the Vietnamese Optical Character Recognition (OCR) problem and compares it with the models currently in use. According to a recent paper, the Transformer model has surpassed well-known CNN models on the classification task. It does so by treating an image as a sequence, much like a sentence, and the resulting model is considered state-of-the-art. In addition, as Natural Language Processing (NLP) has advanced for human languages in general and Vietnamese in particular, a research team from VinAI Research has built an NLP model for Vietnamese called PhoBERT. PhoBERT is derived from the widely used RoBERTa model. It is superior to the RNN model in many ways, including efficiency and training time. This study combines the two models described above to solve the Vietnamese OCR task and produces generally consistent results with an accuracy of up to 96.2 percent, demonstrating that the technique is effective. However, the technique has many drawbacks, from data preparation to training. [en_US]
dc.language.iso: en [en_US]
dc.subject: Optical character recognition [en_US]
dc.title: Vietnamese Optical Character Recognition Based On Transformer [en_US]
dc.type: Thesis [en_US]


