Mining High Occupancy Itemsets From Transaction Databases
Abstract
Frequent itemset (FI) mining has been widely studied in data mining in recent
years because of its vital role in many applications. Nevertheless, the traditional mining
framework used in prior research is not well suited to some modern applications,
such as travel landscape recommendation. In 2020, Deng proposed a new algorithm
for mining high occupancy itemsets (HO), the HEP algorithm (short for High-Efficient
algorithm for mining high occupancy itemsets), where occupancy is a support-based
measure. HEP finds all high occupancy itemsets faster than the traditional mining
framework. It uses an occupancy-list (OL) structure to store occupancy information
and prunes unpromising itemsets based on an upper-bound occupancy (UBO)
to mine all HO. However, the HEP algorithm does not optimize the k-itemset generation
process. In this thesis, we improve the HEP algorithm with two enhancements: adding more
conditions to prune unqualified itemsets and applying the equivalence class property to reduce
the runtime of the k-itemset generation process. Finally, we conducted
experiments on three datasets, which show that the enhancements outperform the
original HEP algorithm in terms of execution time and peak memory consumption.
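
To make the notion of occupancy concrete, the following is a minimal brute-force sketch, not the HEP algorithm or its enhancements. The abstract does not define occupancy, so the sketch assumes the commonly used definition: the occupancy of an itemset P is the sum of |P| / |T| over every transaction T that contains P. The function names, the toy transactions, and the use of an absolute threshold are all hypothetical choices made for illustration. No occupancy-list or UBO pruning is applied here.

```python
from itertools import combinations

def occupancy(itemset, transactions):
    """Assumed definition: sum of |P| / |T| over transactions T containing P."""
    items = frozenset(itemset)
    return sum(len(items) / len(t) for t in transactions if items <= t)

def high_occupancy_itemsets(transactions, min_occupancy):
    """Enumerate every candidate itemset and keep those meeting the threshold.

    Exhaustive enumeration is exponential in the number of items; it only
    illustrates the measure and does no pruning at all.
    """
    universe = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(universe) + 1):
        for candidate in combinations(universe, k):
            occ = occupancy(candidate, transactions)
            if occ >= min_occupancy:
                result[candidate] = occ
    return result

# Hypothetical toy database of four transactions.
transactions = [
    frozenset("ab"),
    frozenset("abc"),
    frozenset("bc"),
    frozenset("abcd"),
]

if __name__ == "__main__":
    for itemset, occ in high_occupancy_itemsets(transactions, 2.0).items():
        print(itemset, round(occ, 4))
```

Under this assumed definition, longer itemsets contribute a larger share per supporting transaction but are supported by fewer transactions, which is why occupancy is not anti-monotone and why an upper bound such as UBO is needed for pruning.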