Show simple item record

dc.contributor.advisor: Quang, Nguyen Hong
dc.contributor.author: An, Doan Phu
dc.date.accessioned: 2019-11-11T08:33:52Z
dc.date.available: 2019-11-11T08:33:52Z
dc.date.issued: 2018
dc.identifier.other: 022004454
dc.identifier.uri: http://keep.hcmiu.edu.vn:8080/handle/123456789/3289
dc.description.abstract: Unstructured documents are free-form documents without a set structure, such as contracts, letters, articles, or memos. Keyword extraction is the automatic identification of keywords, the important words that describe the contents of a document. Keyword extraction for unstructured documents can help users search and classify any dataset of documents, especially large ones. At present, research on keyword extraction focuses only on text documents and is based on different approaches such as statistics, linguistics, or semantic analysis. These approaches produce relatively accurate results. However, using them separately cannot fully exploit the advantages of each approach (the weight of each section or document, and linguistic or semantic features). As a result, keyword identification cannot achieve high precision. In this research, a text mining approach is proposed to better extract keywords from unstructured documents. Within this framework, an XML parser and several text mining and NLP (Natural Language Processing) techniques are used to preprocess documents and resolve their linguistic problems so that all keywords are extracted. A method for ranking candidate keywords according to their importance is then presented. The application developed from this approach is also described in this thesis. (en_US)
dc.language.iso: en_US (en_US)
dc.publisher: International University - HCMC (en_US)
dc.subject: Data mining; Unstructured documents (en_US)
dc.title: Keyword extraction for unstructured documents (en_US)
dc.type: Thesis (en_US)

