
dc.contributor.advisor: Nguyễn, Thị Thúy Loan
dc.contributor.author: Hà, Minh Khoa
dc.date.accessioned: 2025-02-17T03:21:31Z
dc.date.available: 2025-02-17T03:21:31Z
dc.date.issued: 2024
dc.identifier.uri: http://keep.hcmiu.edu.vn:8080/handle/123456789/6658
dc.description.abstract: High utility itemset (HUI) mining has long been an active area of data mining research, with applications in e-commerce, streaming analysis, bioinformatics, and more. HUI mining can be considered a generalization of frequent itemset mining (FIM), which extracts sets of items that frequently appear together (frequent itemsets) in a transactional database. Traditional FIM algorithms consider only the number of times an itemset appears in the database and neglect other valuable information associated with its items, such as purchase quantities or unit profits. As a result, they often report itemsets that appear often enough to qualify as frequent but generate little profit. Although HUI mining improves on FIM by taking utility into account, selecting an appropriate minimum utility threshold remains one of its biggest drawbacks. Top-k HUI mining (topKHUIM) was created to address this problem. It requires only a value k, the number of itemsets the user wants to find, and forgoes the user-defined utility threshold. Using k, the algorithm derives an initial utility threshold and updates it at each stage of the mining process. However, like most traditional HUI mining algorithms, currently available top-k HUI mining algorithms still struggle with massive datasets. In this thesis, I propose a modified version of the existing TKHUIM algorithm with several additions and optimizations so that it can mine HUIs in massive datasets.
These modifications include transaction merging for the projected transactions in partitions, external sorting for the corrected input database, and the use of plentiful hard drive storage to hold the partitions instead of relying on fast but limited main memory. A later chapter of this thesis compares the results of my proposed top-k HUI mining algorithm with those of existing ones. [en_US]
dc.subject: Top-K High [en_US]
dc.subject: Mining [en_US]
dc.subject: Massive Data [en_US]
dc.title: Optimizing Top-K High Utility Itemset Mining From Massive Data [en_US]
dc.type: Thesis [en_US]
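
The abstract's two central ideas, utility as quantity times unit profit and a top-k search that updates its own threshold, can be illustrated with a small sketch. This is not the thesis's algorithm: it is a brute-force enumeration over a toy database, and the data, the function names (`utility`, `top_k_hui_bruteforce`), and the min-heap bookkeeping are all assumptions made for illustration only.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 2}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over the transactions containing every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def top_k_hui_bruteforce(transactions, unit_profit, k):
    """Return the k highest-utility itemsets and the final internal threshold.

    Illustrative only: it enumerates every itemset, which is exactly what
    practical top-k HUI miners avoid through pruning.
    """
    items = sorted(unit_profit)
    heap = []       # min-heap of (utility, itemset); heap[0] is the weakest kept
    threshold = 0   # stays 0 until k itemsets have been collected
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if u == 0:
                continue  # itemset never occurs in the database
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > threshold:
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == k:
                # The k-th best utility seen so far becomes the new threshold.
                threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold


top, final_threshold = top_k_hui_bruteforce(transactions, unit_profit, k=3)
for u, itemset in top:
    print(itemset, u)
print("final threshold:", final_threshold)
```

In this toy data, item `c` appears in all three transactions yet has the lowest utility of any single item, which is the gap between frequency and utility that the abstract describes. The user supplies only `k`; the threshold is derived from the itemsets found so far.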

