Accelerating K-Means++

Edward Raff, Department of Computer Science and Electrical Engineering
Frank Ferraro, Department of Computer Science and Electrical Engineering
Cynthia Matuszek, Department of Computer Science and Electrical Engineering

K-Means++ and its distributed variant K-Means|| have become de facto tools for selecting the initial seeds of k-means. In the decade since their introduction, no uniformly superior algorithm has been developed. Rather than propose a replacement, we focus on accelerating these well-known methods. While producing exactly the same results, we obtain 10-100x speedups by using the triangle inequality to avoid redundant distance computations.
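To make the pruning idea concrete, here is a minimal sketch of one standard triangle-inequality test applied to K-Means++ seeding. It is an illustration under stated assumptions, not necessarily the scheme developed in this paper. If a point x lies at distance d(x) from its nearest seed c, and a new seed s satisfies d(c, s) >= 2 d(x), then d(x, s) >= d(c, s) - d(x) >= d(x), so the distance from x to s need not be computed. The function name `kmeanspp_seeds`, its parameters, and the distance counter are hypothetical and chosen only for this example, which assumes k does not exceed the number of distinct points.

```python
import math
import random


def kmeanspp_seeds(points, k, rng, prune=True):
    """Hypothetical sketch of K-Means++ seeding with optional pruning.

    Returns (seed indices, number of point-to-seed distances computed).
    Seed-to-seed distances used by the pruning test are not counted.
    Assumes k is at most the number of distinct points.
    """
    n = len(points)
    first = rng.randrange(n)
    seeds = [first]
    # nearest[i] is the position in `seeds` of the seed closest to point i;
    # dist[i] is the distance from point i to that seed.
    nearest = [0] * n
    dist = [math.dist(p, points[first]) for p in points]
    computed = n

    while len(seeds) < k:
        # D^2 sampling: pick the next seed with probability
        # proportional to the squared distance to the nearest seed.
        weights = [d * d for d in dist]
        new = rng.choices(range(n), weights=weights, k=1)[0]
        new_pos = len(seeds)
        seeds.append(new)

        # Distances from the new seed to every earlier seed.
        seed_gap = [math.dist(points[new], points[s]) for s in seeds[:-1]]

        for i in range(n):
            # Triangle inequality: if the new seed is at least twice as far
            # from point i's current seed as point i is, it cannot be closer.
            if prune and seed_gap[nearest[i]] >= 2.0 * dist[i]:
                continue
            d = math.dist(points[i], points[new])
            computed += 1
            if d < dist[i]:
                dist[i] = d
                nearest[i] = new_pos

    return seeds, computed
```

Calling the function twice with identically seeded `random.Random` instances, once with `prune=True` and once with `prune=False`, should select the same seeds, because a skipped distance could never have changed a point's nearest seed. Only the number of distances computed differs, and the savings depend on how well separated the data are.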