Keyword extraction for unstructured documents

dc.contributor.advisor	Quang, Nguyen Hong
dc.contributor.author	An, Doan Phu
dc.date.accessioned	2019-11-11T08:33:52Z
dc.date.available	2019-11-11T08:33:52Z
dc.date.issued	2018
dc.identifier.other	022004454
dc.identifier.uri	http://keep.hcmiu.edu.vn:8080/handle/123456789/3289
dc.description.abstract	Unstructured documents are free-form documents with no set structure, such as contracts, letters, articles, or memos. Keywords are the important words that describe the content of a document, and keyword extraction is their automatic identification. Keyword extraction for unstructured documents helps users search and classify document collections, especially large ones. Current research on keyword extraction focuses only on text documents and relies on different approaches, such as statistical, linguistic, or semantic analysis. These approaches produce relatively accurate results. However, used separately, they cannot fully exploit one another's advantages (the weight of each section or document, and linguistic or semantic features), so keyword identification does not achieve high precision. In this research, a text mining approach is proposed to better extract keywords from any unstructured document. Within this framework, an XML parser together with text mining and NLP (Natural Language Processing) techniques is used to preprocess the documents and resolve their linguistic problems so that all keywords can be extracted. The thesis then presents a method for ranking candidate keywords by importance and describes the application developed from this approach.	en_US
dc.language.iso	en_US	en_US
dc.publisher	International University - HCMC	en_US
dc.subject	Data mining; Unstructured documents	en_US
dc.title	Keyword extraction for unstructured documents	en_US
dc.type	Thesis	en_US
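
The abstract mentions statistical approaches and the ranking of candidate keywords by importance, but this record does not say which scoring scheme the thesis uses. As a rough illustration only, the sketch below ranks the words of one document with a smoothed TF-IDF score, a common statistical baseline. The function names, the stopword list, and the sample documents are assumptions made for this example and are not taken from the thesis.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
             "for", "that", "this", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def rank_keywords(documents, doc_index, top_k=5):
    """Return up to top_k (term, score) pairs for one document, best first.

    Scoring is plain TF-IDF with a smoothed IDF; this is a generic baseline,
    not the ranking method described in the thesis.
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    if total == 0:
        return []

    scores = {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
        for term, count in tf.items()
    }
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]
```

Calling `rank_keywords(docs, 0)` on a small list of strings returns the highest-scoring terms of the first document. Terms that are frequent in that document but rare across the collection score highest, which is the intuition behind weighting candidates by importance.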

Files in this item

Name: 022004454 - An, Phu Doan.pdf
Size: 1.546Mb
Format: PDF

This item appears in the following Collection(s)

Bachelor Thesis - Computer Science and Engineering
