k-means++: a C++ implementation


k-means++: a C++ implementation



Yanjie Zhou
Department of Industrial Engineering
Pusan National University


              
S1                                     S2                                     S3                                     S4   
S-sets: Synthetic 2-d data with N=5000 vectors and k=15 Gaussian clusters with different degree of cluster overlapping .


Introduction


k-means++ clustering a classification of data, so that points assigned to the same cluster are similar. It is identical to the K-means algorithm, except for the selection of initial conditions. I implement k-means++ clustering algorithm by using C++. The implemented open source code can be used freely. If you have any questions , please conected y.j.zhou.g@gmail.com. The implemented code including the following features.

Features


  • Standard k-means clustering methed
  • Standard k-means clustering methed with parallel acceleratation
  • k-means++ clustering methed
  • k-means++ clustering methed with parallel acceleratation
  • Export the result as SVG format
  • Export the result as EPS format


  • Code and Dataset



    Download data set: Clustering benchmark datasets
    Download code [.rar] [Github]

    References

  • ARTHUR, David; VASSILVITSKII, Sergei. k-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007. p. 1027-1035.
  • FRÄNTI, Pasi; VIRMAJOKI, Olli. Iterative shrinking method for clustering problems. Pattern Recognition, 2006, 39.5: 761-775.