Optimizing Top-K High Utility Itemset Mining From Massive Data


dc.contributor.advisor	Nguyễn, Thị Thúy Loan
dc.contributor.author	Hà, Minh Khoa
dc.date.accessioned	2025-02-17T03:21:31Z
dc.date.available	2025-02-17T03:21:31Z
dc.date.issued	2024
dc.identifier.uri	http://keep.hcmiu.edu.vn:8080/handle/123456789/6658
dc.description.abstract	High utility itemset (HUI) mining has long been an active research topic in data mining, with applications in e-commerce, streaming analysis, bioinformatics, and more. HUI mining can be considered a generalization of frequent itemset mining (FIM), which extracts sets of items that frequently appear together (frequent itemsets) in a transactional database. Traditional FIM algorithms consider only the number of times a given itemset appears in a database and neglect other valuable information associated with its items, such as quantities or unit profits. As a result, they often report itemsets that appear often enough to qualify as frequent but generate low profits. Although HUI mining addresses this limitation, selecting an appropriate minimum utility threshold remains one of its biggest drawbacks. Top-k HUI mining (TKHUIM) was created to address this problem. This approach requires only the value k, the number of itemsets the user wants to find, and forgoes the need for a user-defined utility threshold. Using k, the algorithm not only determines the initial utility threshold but also updates the threshold at each stage of the mining process. However, like most traditional HUI mining algorithms, currently available top-k HUI mining algorithms still struggle with massive datasets. In this thesis, I propose a modified version of the existing TKHUIM algorithm that includes several additions and optimizations so that it can be used to mine HUIs in massive datasets. 
These modifications include transaction merging for the projected transactions in partitions, external sorting for the corrected input database, and storing the partitions on plentiful hard drive storage instead of relying on fast but limited main memory. I also compare the results of the proposed top-k HUI mining algorithm with those of existing algorithms in a later chapter of this thesis.	en_US
dc.subject	Top-K High	en_US
dc.subject	Mining	en_US
dc.subject	Massive Data	en_US
dc.title	Optimizing Top-K High Utility Itemset Mining From Massive Data	en_US
dc.type	Thesis	en_US
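
The abstract describes two ideas that a small example may make concrete: the utility of an itemset (quantity times unit profit, summed over the transactions that contain it) and a top-k search whose utility threshold rises as better itemsets are found. The following is a minimal brute-force sketch under stated assumptions, not the algorithm proposed in the thesis. The toy transactions, the unit profits, and the function names utility and top_k_high_utility_itemsets are all hypothetical, and none of the thesis's optimizations (transaction merging, external sorting, disk-based partitions) are shown.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def utility(itemset, transaction):
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * unit_profit[item] for item in itemset)


def top_k_high_utility_itemsets(k):
    """Enumerate every itemset and keep the k with the highest total utility.

    This is exhaustive enumeration for illustration only; real top-k HUI
    miners prune the search space instead of visiting every itemset.
    """
    items = sorted(unit_profit)
    heap = []  # min-heap of (utility, itemset); heap[0] is the weakest kept
    threshold = 0  # becomes the k-th best utility once k itemsets are kept
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            total = sum(utility(itemset, t) for t in transactions)
            if total == 0:
                continue  # itemset never occurs in the database
            if len(heap) < k:
                heapq.heappush(heap, (total, itemset))
            elif total > threshold:
                heapq.heapreplace(heap, (total, itemset))
            if len(heap) == k:
                threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold


if __name__ == "__main__":
    best, final_threshold = top_k_high_utility_itemsets(2)
    print(best, final_threshold)
```

In this toy database the item b occurs as often as a, yet its total utility is much lower, which is the gap between frequency and utility that the abstract points out. The user supplies only k; the threshold is derived from the itemsets kept so far rather than chosen in advance.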

Files in this item

Name: ITITIU19020 - HA MINH KHOA.pdf
Size: 4.809Mb
Format: PDF

This item appears in the following Collection(s)

Bachelor Thesis - Computer Science and Engineering
