Efficient Strategy For Minning High Utility Quanitative Itemsets
Abstract
As an extension from the traditional methods of Pattern Mining, High-Utility Quantita
tive Itemset Mining (HUQIM) has become an important research area that answers the
ever-growing need for useful information from the copious pool of data in reality. Since
the nature of this research category is to unveil all possible important patterns from a
database and manage both quantities (internal utility) with prices (external utility) of
their items, algorithms of HUQIM usually work with very large search spaces that could
heavily affect execution time. A recently proposed algorithm called Fast High-Utility
Quantitative Itemset Miner (FHUQI-Miner) has overcome these limits with its two novel
pruning strategies to narrow down space and subsequently outperformed other algorithms
that were created before.
While these strategies have been demonstrated to enhance the new algorithm’s perfor
mance compared to its predecessors, there were certain shortcomings that the algorithm
still faced. The frst limitation was how the proposed strategies would not operate as
efciently on dense datasets as it would on sparse datasets, deriving from the similar
ity in structure of the transactions, and thus increasing the number of join operations
in the progress. The second limitation was that the two newly proposed strategies of
FHUQI-Miner was based of a pruning strategy that had been developed for quite some
time ago, and up until now, there exists a number of later introduced strategies that were
proven to be more efcient at pruning undesirable items. This was also stated by the
research group in discussing future possible improvements to their proposed algorithm.
Therefore, the aim of this thesis would be two-fold: to refne the proposed pruning
strategies of the existing FHUQI-Miner algorithm and to attempt at a more efcient prun
ing strategy adapted from existing concepts that were proven to be better. The outcomes of these studies would consist of two modifed versions of the original FHUQI-Miner algo
rithm. The frst version will reduce the complexity of the base algorithm by uniting the
two pruning strategies into one, given the Q-item notation theory. The second algorithm
will introduce a novel adaptation of the transaction projection methodology to HUQIM
and enhance the pruning abilities of the base algorithm. These resulted alternatives have
been verifed to be capable of reducing the number of join operations compared to the base
algorithm on both sparse and dense datasets as they effectively remove more unpromising
items during the mining processes.