Show simple item record

dc.contributor.advisor	Nguyen, Trung Ky
dc.contributor.author	Nguyen, Minh Trang
dc.date.accessioned	2024-09-25T06:34:40Z
dc.date.available	2024-09-25T06:34:40Z
dc.date.issued	2023
dc.identifier.uri	http://keep.hcmiu.edu.vn:8080/handle/123456789/6066
dc.description.abstract	Generating high-quality images from text descriptions is a challenging task in computer vision with many practical applications. Existing text-to-image (T2I) approaches, including Stacked Generative Adversarial Networks (StackGAN), Attentional Generative Networks (AttnGAN), Conditional Generative Adversarial Networks (CGAN), Mirror Generative Networks (MirrorGAN), and Variational Autoencoders (VAEs), can generate images that roughly match the given descriptions. Although these models have achieved significant results, there is still room for improvement in generating important details, rendering realistic object features, and understanding the text description itself. To address this problem, I propose an experimental approach that investigates whether a new text embedding technique can improve the original StackGAN model. The methodology proposed in this thesis applies RoBERTa as the text embedding technique for the original StackGAN model. To do this, the RoBERTa model was fine-tuned for the text-to-image synthesis task on a dataset comparable to that used in the original StackGAN paper. In addition, the StackGAN model was modified to accept RoBERTa embeddings as input in place of the traditional character-level CNN-RNN embeddings. The performance of the improved StackGAN model, with RoBERTa incorporated in the preprocessing stage, was then evaluated on a set of standard metrics and compared to that of the baseline StackGAN model. Finally, this thesis presents a new approach to assessing the StackGAN model's performance when RoBERTa is added. My experimental results on incorporating RoBERTa into the model answer the question of whether it can improve the original StackGAN.
This study has practical implications for the development of more accurate and descriptive image captioning models, which could have applications in disciplines such as computer vision and natural language processing.	en_US
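The modification described in the abstract, feeding a RoBERTa sentence embedding into StackGAN's conditioning stage instead of the char-CNN-RNN embedding, can be sketched as below. This is a minimal illustration, not the thesis's actual code: the 768-dimensional input (RoBERTa-base pooled output) and the 128-dimensional conditioning vector are assumptions, and the random tensor stands in for a real RoBERTa embedding.

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """StackGAN-style conditioning augmentation, here sized to accept a
    RoBERTa sentence embedding (768-d for roberta-base) rather than the
    original char-CNN-RNN embedding. Dimensions are illustrative."""
    def __init__(self, embed_dim=768, cond_dim=128):
        super().__init__()
        # One linear layer produces both the mean and the log-variance.
        self.fc = nn.Linear(embed_dim, cond_dim * 2)

    def forward(self, sent_emb):
        mu, logvar = self.fc(sent_emb).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        # Reparameterization trick: sample a conditioning vector c.
        c = mu + std * torch.randn_like(std)
        return c, mu, logvar

ca = ConditioningAugmentation()
fake_roberta_emb = torch.randn(4, 768)  # stand-in for a pooled RoBERTa output
c, mu, logvar = ca(fake_roberta_emb)
print(c.shape)  # torch.Size([4, 128])
```

The conditioning vector `c` would then be concatenated with the noise vector before the first-stage generator, so swapping the text encoder only requires changing `embed_dim`.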
dc.language.iso	en	en_US
dc.subject	Embedding technique	en_US
dc.title	Implementing RoBERTa-based text embedding technique for StackGAN model	en_US
dc.type	Thesis	en_US

