Vietnamese Optical Character Recognition Based On Transformer

File

140022007212 - Man, Tran Cong.pdf (1.544Mb)

Date

2022

Author

Tran, Cong Man

Abstract

Text recognition has been key to solving many of the problems involved in digitizing documents in the era of Industry 4.0. Most techniques first extract character features with a Convolutional Neural Network (CNN) and then feed those features into a Recurrent Neural Network (RNN) to produce character-level output. This thesis focuses on applying a different model to the Vietnamese Optical Character Recognition (OCR) problem and comparing it with the models currently in use. According to a recent paper, a Transformer model has surpassed well-known CNN models on the image classification task. It does so by treating an image as a sequence, much as a sentence is a sequence of words, and is considered state of the art. In addition, with the progress of Natural Language Processing (NLP) for human languages in general and Vietnamese in particular, a research team at VinAI Research has built an NLP model for Vietnamese called PhoBERT. PhoBERT is derived from the widely used RoBERTa model and outperforms RNN models in several respects, including efficiency and training time. This study combines these two models to solve the Vietnamese OCR task and achieves generally consistent results with an accuracy of up to 96.2%, showing that the approach is effective. However, the approach also has drawbacks, ranging from data preparation to training.
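The abstract describes treating an image as a sequence, the way a sentence is a sequence of words. As a minimal sketch, assuming a ViT-style encoder that cuts the image into fixed-size, non-overlapping square patches (the thesis's actual image size, patch size, and patching scheme are not stated on this page), the function below turns a grayscale image into a sequence of flattened patch vectors read left to right, top to bottom. The name `image_to_patch_sequence` and its `patch_size` parameter are hypothetical. A full model would then project each patch linearly and add position embeddings before the Transformer layers; those steps are deliberately omitted.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values
Patch = List[float]        # one flattened patch, i.e. one "token" of the sequence


def image_to_patch_sequence(image: Image, patch_size: int) -> List[Patch]:
    """Hypothetical helper: split an image into non-overlapping square patches,
    ordered left to right, top to bottom, and flatten each patch row by row.
    This mirrors ViT-style tokenization only up to (not including) the
    linear projection and position embeddings."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Assumption: the image is already resized so both sides divide evenly.
    if patch_size <= 0 or height % patch_size or width % patch_size:
        raise ValueError("image height and width must be multiples of a positive patch_size")
    sequence: List[Patch] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            sequence.append(patch)
    return sequence


# Usage: a 4x6 image with pixel values 0..23 and 2x2 patches gives 6 tokens.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
tokens = image_to_patch_sequence(image, patch_size=2)
print(len(tokens), tokens[0])  # 6 [0, 1, 6, 7]
```

For a wide, short text-line image, this ordering yields a sequence that roughly follows the reading direction, which is one reason the sequence view suits OCR.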

URI

http://keep.hcmiu.edu.vn:8080/handle/123456789/4744

Collections

Bachelor Thesis - Computer Science and Engineering