dc.description.abstract | Generating high-quality images from text descriptions is a challenging task in computer vision with many practical applications. Existing text-to-image (T2I) approaches can generate images that roughly match a given description, including Stacked Generative Adversarial Networks (StackGAN), Attentional Generative Adversarial Networks (AttnGAN), Conditional Generative Adversarial Networks (cGAN), MirrorGAN, and Variational Autoencoders (VAEs). Although these models have achieved significant results, there is still room for improvement in generating important details, producing realistic object features, and fully understanding the text description. To address this problem, I propose an experimental study of a new text embedding technique and investigate whether it can improve the original StackGAN model.
The methodology proposed in this thesis uses RoBERTa as the text embedding technique for the original StackGAN model. To do this, the RoBERTa model was fine-tuned for the text-to-image synthesis task on a dataset comparable to that used in the original StackGAN paper. In addition, the StackGAN model was modified to accept RoBERTa embeddings as input instead of the traditional character-level CNN-RNN embedding. The performance of the improved StackGAN model, with RoBERTa incorporated in the preprocessing stage, was then evaluated on a set of standard metrics and compared to that of the baseline StackGAN model.
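To make this modification concrete, the following is a minimal sketch, not the thesis implementation, of how a RoBERTa sentence embedding could stand in for StackGAN's char-CNN-RNN text embedding. It assumes the Hugging Face transformers library and PyTorch; the roberta-base checkpoint, the mean-pooling strategy, and the 1024-dimensional projection (chosen only to mirror the embedding size used by the original StackGAN) are illustrative assumptions.

    # Sketch: RoBERTa-based text encoder replacing StackGAN's char-CNN-RNN embedding.
    # Assumptions: Hugging Face `transformers`, PyTorch, roberta-base, 1024-dim output.
    import torch.nn as nn
    from transformers import RobertaModel, RobertaTokenizer

    class RobertaTextEncoder(nn.Module):
        def __init__(self, out_dim=1024, model_name="roberta-base"):
            super().__init__()
            self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
            self.roberta = RobertaModel.from_pretrained(model_name)
            # Project RoBERTa's hidden size (768) to the embedding size StackGAN expects.
            self.proj = nn.Linear(self.roberta.config.hidden_size, out_dim)

        def forward(self, captions):
            tokens = self.tokenizer(captions, padding=True, truncation=True,
                                    return_tensors="pt")
            outputs = self.roberta(**tokens)
            # Mean-pool token embeddings into one sentence embedding per caption.
            mask = tokens["attention_mask"].unsqueeze(-1).float()
            pooled = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
            return self.proj(pooled)

    # Usage: the pooled, projected embedding takes the place of the char-CNN-RNN
    # vector that StackGAN's Stage-I conditioning augmentation would normally receive.
    encoder = RobertaTextEncoder()
    embedding = encoder(["a small bird with a red head and a white belly"])
    print(embedding.shape)  # torch.Size([1, 1024])

In such a setup, fine-tuning would update the RoBERTa weights jointly with the projection layer so that the sentence embedding adapts to the conditioning interface of the generator.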
Finally, this thesis presents a new approach to assessing the performance of the StackGAN model when RoBERTa is added. My experimental results address the question of whether incorporating RoBERTa into the text-to-image pipeline can improve the original StackGAN model. This study has practical implications for the development of more accurate and descriptive text-to-image generation models, which could have applications in disciplines such as computer vision and natural language processing. | en_US |