Author: Mustakim

Abstract

K-Means is a very popular clustering algorithm: it is computationally reliable, simple and flexible. However, K-Means has a weakness in how the initial centroids are determined: changing their values changes the resulting clusters. Principal Component Analysis (PCA) is a dimension reduction method that can address this main problem of K-Means by applying the PCA eigenvectors of the covariance matrix as the initial centroid values in K-Means. In experiments combining 4, 5 and 6 attributes and clusters, the Davies Bouldin Index (DBI), Silhouette Index (SI) and Dunn Index (DI) cluster validity values of PCA K-Means are better than those of the usual K-Means. When tested on 1,737 and 100,000 data records, the patterns formed by PCA K-Means lower the DBI value consistently, whereas for SI and DI the pattern formed is likely to change. This study concludes that the cluster validity index used as the reference for comparing the algorithms is DBI.
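The idea of seeding K-Means from the covariance eigenvectors can be sketched as follows. The abstract does not say how an eigenvector is turned into a centroid, so this is a minimal illustration under stated assumptions, not the author's implementation: each centroid is placed at the data mean plus one eigenvector scaled by the square root of its eigenvalue. The function names `pca_initial_centroids` and `kmeans` are hypothetical.

```python
import numpy as np

def pca_initial_centroids(X, k):
    """Build k deterministic initial centroids from covariance eigenvectors.

    Assumption (not from the paper): centroid i sits at the data mean plus
    the i-th principal direction scaled by one standard deviation.
    """
    n_samples, n_attributes = X.shape
    if k > n_attributes:
        # There are only as many eigenvectors as attributes.
        raise ValueError("k cannot exceed the number of attributes")
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return mean + (eigvecs[:, order] * scale).T  # shape (k, n_attributes)

def kmeans(X, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:                 # keep old centroid if cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Usage on synthetic data with 4 attributes and 4 clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
labels, centroids = kmeans(X, pca_initial_centroids(X, k=4))
```

Because the starting centroids are computed from the data rather than drawn at random, repeated runs on the same data start from the same positions, up to the sign convention of the eigenvector routine. The sketch also shows why the number of clusters is bounded by the number of attributes: a covariance matrix of d attributes has only d eigenvectors.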

Conclusion

Based on the results and analysis, and in line with the objectives of this research, it can be concluded that, of K-Means and PCA K-Means, PCA K-Means achieves the better cluster validity values. All experiments were conducted with 4, 5 and 6 clusters and attributes, and PCA K-Means has the advantage in every experiment. With 100,000 randomly generated data records, the DBI value is 0.5343, the SI value is 0.6264 and the DI value is 0.5689. It can therefore be inferred that PCA K-Means is capable of lowering the DBI value as the dataset grows. The SI and DI values, however, show no specific pattern in the experimental results for either small or large data, regardless of how many clusters and attributes are used. Therefore, PCA K-Means is an optimal algorithm for the above cases if the cluster validity index used is DBI. However, the PCA eigenvectors affect the formation of clusters in K-Means, so PCA K-Means can only form as many clusters as there are attributes used in the clustering process, just like FCM.
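Since DBI is the validity index the conclusion relies on, a small sketch of how it is commonly computed may help. This follows the standard Davies Bouldin definition (mean distance to the centroid as within-cluster scatter, Euclidean distance between centroids as separation); the paper's exact variant is not stated here, and the function name `davies_bouldin_index` is hypothetical. Lower values indicate more compact, better separated clusters.

```python
import numpy as np

def davies_bouldin_index(X, labels):
    """Standard Davies Bouldin Index; assumes at least two distinct clusters."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Within-cluster scatter: mean distance of members to their centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    k = len(ids)
    worst = np.zeros(k)
    for i in range(k):
        # Similarity of cluster i to every other cluster; keep the worst case.
        worst[i] = max(
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        )
    return worst.mean()

# Two clusters with scatter 1 each and centroids 10 apart: (1 + 1) / 10 = 0.2
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
dbi = davies_bouldin_index(X, labels)
```

In this toy case moving the two groups closer together, or spreading each group out, raises the index, which is why a lower DBI is read as a better clustering.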

Publish: Journal of Theoretical and Applied Information Technology Vol.95. No.15 August 2017