Building a drug dictionary

022003755 - Minh, Tran Ngoc Khanh.pdf (5.631Mb)

Date

2017

Author

Minh, Tran Ngoc Khanh

Abstract

The aim of this thesis is to build a drug dictionary that, whenever a user enters a keyword such as a drug name or a drug usage, presents two types of results: relevant drugs found by applying Text Classification, Text Clustering and the Vector Space Model; and keyword-matching results returned by SQL statements against the database. First, the drug information in the database is divided into k groups by Text Clustering, based on the similarity between objects. Next, SQL queries return all records matching the input keyword. From the results of this database search, the system also determines and presents the dominant group, i.e. the group that occurs most often among the matched records. Then, following the Text Classification approach, the K-Nearest Neighbor algorithm finds the K drugs in the dominant group that are most similar to the query. As a result, users receive both the keyword-matching results of the database search and the relevant drugs found by Text Mining. Each sentence of drug information is converted into a vector in the Vector Space Model using TF-IDF, so each element of a vector is a term weight. The similarity (or, equivalently, the distance) between two vectors is measured with Cosine Similarity. Data pre-processing, another important step performed before mining, is also described in detail. Finally, the tools and resources used to build the dictionary are introduced.
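
The Vector Space Model step can be illustrated with a small pure-Python sketch. It assumes a simple pre-processing pipeline (lowercasing, alphabetic tokenization and removal of a tiny, hypothetical stop-word list) and one common TF-IDF variant (term count divided by document length, multiplied by log(N/df)). The exact weighting and pre-processing used in the thesis may differ. Cosine similarity is then computed on the resulting sparse vectors.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "is", "used"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumed variant: tf = count / doc_length, idf = log(N / df).
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [
        "Paracetamol is used for pain and fever",
        "Ibuprofen relieves pain and inflammation",
        "Amoxicillin treats bacterial infection",
    ]
    vecs = tf_idf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # shares "pain": positive
    print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: 0.0
```

In the usage example, the Paracetamol and Ibuprofen entries share the term "pain", so their cosine similarity is positive, while Paracetamol and Amoxicillin share no terms and score 0. Under this IDF variant, a term that occurs in every document receives weight 0.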

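The search flow (SQL keyword matching, selecting the dominant group, then K-Nearest Neighbor inside that group) might look roughly as follows. This is a sketch under several assumptions. It uses a hypothetical SQLite table `drugs(id, name, usage, cluster)` whose `cluster` column already holds the group label produced by the clustering step. Keyword matching is done with `LIKE`. Raw term-count vectors stand in for TF-IDF weights to keep the example short. All function names are illustrative.

```python
import math
import re
import sqlite3
from collections import Counter


def vectorize(text: str) -> Counter:
    # Raw term counts: a simplification standing in for TF-IDF weights.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(c * v[t] for t, c in u.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def keyword_search(conn: sqlite3.Connection, keyword: str) -> list[tuple]:
    """Database-based search: rows whose name or usage contains the keyword."""
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT id, name, usage, cluster FROM drugs "
        "WHERE name LIKE ? OR usage LIKE ?",
        (pattern, pattern),
    ).fetchall()


def dominant_group(matches: list[tuple]):
    """The cluster label that occurs most often among the matched rows."""
    if not matches:
        return None
    return Counter(row[3] for row in matches).most_common(1)[0][0]


def k_nearest_in_group(conn, keyword: str, group, k: int) -> list[str]:
    """K-Nearest Neighbor: the k drugs in `group` most similar to the keyword."""
    query = vectorize(keyword)
    rows = conn.execute(
        "SELECT name, usage FROM drugs WHERE cluster = ?", (group,)
    ).fetchall()
    scored = [(cosine(query, vectorize(f"{name} {usage}")), name)
              for name, usage in rows]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [name for _score, name in scored[:k]]


def search(conn, keyword: str, k: int = 3) -> tuple[list[str], list[str]]:
    """Return (keyword-matching drug names, related drugs from the dominant group)."""
    matches = keyword_search(conn, keyword)
    group = dominant_group(matches)
    related = [] if group is None else k_nearest_in_group(conn, keyword, group, k)
    return [row[1] for row in matches], related
```

Ranking only the drugs in the dominant group, rather than the whole database, keeps the K-Nearest Neighbor step small; the trade-off is that relevant drugs assigned to other groups are never considered.
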
URI

http://keep.hcmiu.edu.vn:8080/handle/123456789/2741

Collections

Bachelor Thesis - Computer Science and Engineering