Customer Classification Using K-Means Clustering Method
Abstract
Customers are the key to the success of any business, especially in the
commercial industry. Retaining existing customers generally costs considerably less
than acquiring new ones. Thus, to sustain growth, vendors create competitive
advantages by making every effort to understand the shopping behavior of their
customers. This is easier said than done: it is difficult for merchants to know their
customers without the help of analytical techniques. One of the most popular
techniques is Customer Segmentation, a procedure that helps identify groups of
similar customers based on their characteristics.
Depending on the particulars of their business, vendors have their own criteria for
classifying customers. However, there is a popular classification standard called
RFM Segmentation. This criterion scores customers on three aspects: Recency (how
long has it been since the customer's last activity or transaction with the brand?);
Frequency (how often has the customer transacted with the brand during a certain
period?); and Monetary (how much has the customer spent with the brand during a
specific period?). For each of these aspects, every customer receives a score from 5
(best) to 1 (worst). After computing each customer's RFM scores, we apply K-means
clustering, a well-known cluster analysis method, to group customers based on their
scores. The final result of this process is expected to be a dataset of customers
labeled with the cluster each belongs to.
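The scoring and clustering steps described above can be sketched as follows. This is a minimal illustration and not the exact procedure of this research: the text does not say how the 1-to-5 scores are assigned, so the sketch assumes rank-based quintiles (ties broken by input order). The function names (quintile_scores, rfm_scores, kmeans) and the sample transactions are hypothetical. The clustering step is a plain Lloyd-style K-means written with the standard library only; in practice a library implementation would normally be used.

```python
import random
from datetime import date

def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 (worst) to 5 (best) by rank quintile.
    Assumption: ties are broken by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // n  # worst rank -> 1, best rank -> 5
    return scores

def rfm_scores(transactions, today):
    """transactions: iterable of (customer_id, purchase_date, amount)."""
    last, freq, money = {}, {}, {}
    for cid, day, amount in transactions:
        last[cid] = max(last.get(cid, day), day)   # most recent purchase
        freq[cid] = freq.get(cid, 0) + 1           # number of purchases
        money[cid] = money.get(cid, 0.0) + amount  # total spend
    ids = sorted(last)
    # Fewer days since the last purchase is better, so recency is inverted.
    r = quintile_scores([(today - last[c]).days for c in ids],
                        higher_is_better=False)
    f = quintile_scores([freq[c] for c in ids])
    m = quintile_scores([money[c] for c in ids])
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(ids)}

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [tuple(float(x) for x in p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

# Hypothetical sample data, for illustration only.
today = date(2024, 6, 30)
transactions = [
    ("A", date(2024, 6, 28), 200.0), ("A", date(2024, 6, 1), 150.0),
    ("A", date(2024, 5, 3), 150.0),
    ("B", date(2024, 6, 15), 180.0), ("B", date(2024, 4, 2), 120.0),
    ("C", date(2024, 5, 20), 90.0), ("C", date(2024, 2, 11), 60.0),
    ("D", date(2024, 3, 1), 80.0),
    ("E", date(2024, 1, 10), 20.0),
]
scores = rfm_scores(transactions, today)
labels, centroids = kmeans(list(scores.values()), k=2)
clusters = dict(zip(scores, labels))  # customer id -> cluster index
```

Here `clusters` plays the role of the final dataset mentioned above: each customer paired with the cluster it belongs to. The number of clusters (k=2) is chosen only to keep the example small.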
Although RFM Segmentation and K-means clustering are very helpful to vendors for
customer clustering, they become difficult to apply to a very large dataset. In
practice, vendors in the commercial industry usually work with customer datasets of
more than 50,000 rows. Moreover, the RFM scores must be recalculated over the entire
dataset whenever new data are added, which costs much time when done manually. To
solve this problem, in this research we try to find a suitable prediction method that
can accurately classify new customers based on an existing sample.
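As one possible illustration of classifying a new customer from an existing sample, the sketch below assigns a new customer to the nearest existing cluster centroid. This is only an assumed baseline, not the prediction method selected in this research, which the abstract does not name. The centroid values, the customer ids, and the function name nearest_centroid are hypothetical, and the sketch assumes that RFM scores for the new customers are already available.

```python
def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point
    (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )

# Hypothetical centroids from an earlier clustering of (R, F, M) scores.
centroids = [(4.5, 4.5, 4.5), (1.5, 1.5, 1.5)]

# Hypothetical new customers with already-computed RFM scores.
new_customers = {"N1": (5, 4, 5), "N2": (1, 2, 2)}

# Each new customer is labeled without re-clustering the whole dataset.
predicted = {cid: nearest_centroid(rfm, centroids)
             for cid, rfm in new_customers.items()}
```

The point of such an approach is that labeling a new customer costs one distance computation per cluster, rather than a full recalculation over the existing dataset.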