
dc.contributor.advisor: Tran, Thanh Tung
dc.contributor.author: Phan, Ngọc Đông Minh
dc.date.accessioned: 2025-02-21T08:14:24Z
dc.date.available: 2025-02-21T08:14:24Z
dc.date.issued: 2024
dc.identifier.uri: http://keep.hcmiu.edu.vn:8080/handle/123456789/6776
dc.description.abstract: Extracting valuable data from the vast resources of the web is crucial for business success. Insights derived from analysing consumer behaviour, market trends, and operational performance can inform strategic decision-making. However, traditional web scraping often requires significant programming expertise and adapts poorly to website changes. While no-code tools exist, they can be costly and have limited capabilities. The project addresses these challenges by developing a user-friendly, no-code web data extraction tool with a focus on generalizing the web scraping and data extraction process. The tool aims to extract data from a wide range of websites while ensuring meaningful, structured, and analysable output. The project will explore various methods of web scraping and extraction based on HTML tree parsing techniques. The tool uses scraping frameworks such as Playwright and BeautifulSoup to achieve robust capabilities. Additionally, the analysis module will integrate a Large Language Model (LLM) to further enhance the usability of the extracted data. The research contributes to the field by offering a comprehensive, user-friendly tool that supports the complete process of gathering, extracting, and analysing web data with no coding involved. The tool will enable users across various fields to efficiently derive insights from web data, facilitating informed decision-making and strategic planning. [en_US]
dc.subject: Web Data [en_US]
dc.subject: Natural Language Processing Model [en_US]
dc.subject: Extracting valuable data [en_US]
dc.subject: Large Language Model [en_US]
dc.subject: LLM [en_US]
dc.title: Optimizing Web Data Extraction Procedure With Natural Language Processing Model [en_US]
dc.type: Thesis [en_US]

