Saarland University
Department of Computer Science
Computational Complexity
Computational Complexity

Publications

Teaching

Workshops

Computational Complexity - Research Seminar

Bodo Manthey: Improved Smoothed Analysis of k-Means Clustering I

June 24, 2008, 10:15 a.m.
E 1 3, Room 415

The k-means algorithm is one of the most popular clustering algorithms. While this algorithm is known for its speed in practice, its worst-case performance is poor: Its running-time can be exponential in the number of points, and the worst-case upper bound of nO(kd) is way
off the observed running-time. Arthur and Vassilvitskii (2006) aimed at closing the gap between theoretical and practical performance, and they proved that the smoothed running-time of k-means is polynomial in nk and 1/σ, where σ is the standard
deviation of the Gaussian perturbation.

We improve on their result and show that the smoothed running-time is polynomial in nk and 1/σ. Furthermore, we show that the smoothed running-time is polynomial in n and 1/σ if the number k of clusters and the dimension d of the data points are small compared to the number of points. Finally, we prove a polynomial bound for the smoothed running-time of k-means in one dimension.