Processing missing value for rule induction algorithms

View/Open

022000190 - Nguyen, Tran Dang.pdf (1.671Mb)

Date

2009

Author

Nguyen, Tran Dang

Abstract

Missing values are very common in data acquisition. Classification methods have difficulty learning from data sets that contain missing values. Several pre-processing techniques, such as data replacement or data imputation, have been introduced to remove missing values before the data are passed to a classification method. However, when the percentage of missing values in the data set rises (sometimes to 60-70%), such pre-processing techniques can no longer be used successfully. This thesis introduces new versions of the CN2 and Rules6 algorithms, which represent the separate-and-conquer approach to rule induction, that can handle missing values directly and can process data sets with large percentages of missing values. Tested on benchmark data sets from UCI, the new algorithms performed better than the popular methods Decision Table and C4.5 at directly handling missing values. The new algorithms also achieved significant results when compared with one common pre-processing method for data sets with missing values: filling each missing entry with the mode or mean of the attribute's values.
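
The mode/mean filling baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not code from the thesis: the function names, the use of None to mark a missing value, and the toy table are all assumptions made for illustration. Numeric attributes are filled with the mean of their observed values and nominal attributes with the mode.

```python
from statistics import mean, mode

MISSING = None  # assumption: missing entries are encoded as None


def impute_column(values, numeric):
    """Fill missing entries with the mean (numeric) or mode (nominal)
    of the observed values in the same column."""
    observed = [v for v in values if v is not MISSING]
    if not observed:
        # Nothing to estimate from; return the column unchanged.
        return list(values)
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is MISSING else v for v in values]


def impute_table(rows, numeric_columns):
    """Apply impute_column to every column of a row-oriented table.
    numeric_columns is the set of column indices treated as numeric."""
    columns = list(zip(*rows))
    filled = [impute_column(col, i in numeric_columns)
              for i, col in enumerate(columns)]
    return [list(row) for row in zip(*filled)]


# Hypothetical toy data: one numeric and one nominal attribute.
rows = [
    [1.0, "red"],
    [None, "red"],
    [3.0, None],
    [None, "blue"],
]
filled = impute_table(rows, numeric_columns={0})
```

Every missing entry in an attribute receives the same fill value, so as the share of missing entries grows the filled attribute carries less and less information that distinguishes one example from another. This is consistent with the abstract's remark that such pre-processing stops being useful at high missing rates.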

URI

http://10.8.20.7:8080/xmlui/handle/123456789/180

Collections

Bachelor Thesis - Computer Science and Engineering