Scalable Initialization Methods for Large-Scale Clustering

Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo

arXiv:2007.11937 (cs)

[Submitted on 23 Jul 2020]

Abstract: In this work, two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means|| type of initialization strategy. The second proposal also utilizes multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means|| methods using an extensive set of reference and synthetic large-scale datasets. For the synthetic datasets, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art. We also observe that the currently most popular K-means++ initialization behaves like random initialization in very high-dimensional cases.
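
For readers unfamiliar with the K-means++ baseline named in the abstract, the following is a minimal sketch of its standard seeding step (D^2 sampling), written with the Python standard library only. It is not the authors' proposed method, and it does not reproduce K-means|| or the random projection variant. The function names `squared_distance` and `kmeans_pp_init`, the `seed` parameter, and the representation of points as tuples are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers from `points` by D^2 sampling.

    Illustrative sketch of standard K-means++ seeding, not the
    divide-and-conquer methods proposed in the paper.
    """
    rng = random.Random(seed)
    # The first center is drawn uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Draw the next center with probability proportional to d2[i].
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        new_center = points[idx]
        centers.append(new_center)
        # Update each nearest-center distance with the new center.
        d2 = [min(d, squared_distance(p, new_center)) for d, p in zip(d2, points)]
    return centers
```

Calling `kmeans_pp_init(points, k)` on a list of coordinate tuples returns `k` of those tuples as starting centers. Because each draw is weighted by squared distance to the nearest chosen center, the seeding relies on those distances differing meaningfully between points, which is the property the abstract's closing remark about very high-dimensional data concerns.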

Comments:	11 pages, submitted to IEEE Transactions on Big Data
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2007.11937 [cs.LG]
	(or arXiv:2007.11937v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2007.11937

Submission history

From: Joonas Hämäläinen
[v1] Thu, 23 Jul 2020 11:29:53 UTC (298 KB)
