# Computational Complexity - Research Seminar

## Bodo Manthey: *Improved Smoothed Analysis of k-Means Clustering II*

July 8, 2008, 10:15 a.m.

E 1 3, Room 415

The *k*-means algorithm is one of the most popular clustering algorithms. While this algorithm is known for its speed in practice, its worst-case performance is poor: Its running-time can be exponential in the number of points, and the worst-case upper bound of *n*^{O(kd)} is way off the observed running-time. Arthur and Vassilvitskii (2006) aimed at closing the gap between theoretical and practical performance, and they proved that the smoothed running-time of *k*-means is polynomial in *n*^{k} and 1/σ, where σ is the standard

deviation of the Gaussian perturbation.

We improve on their result and show that the smoothed running-time is polynomial in *n*^{√k} and 1/σ. Furthermore, we show that the smoothed running-time is polynomial in *n* and 1/σ if the number *k* of clusters and the dimension *d* of the data points are small compared to the number of points. Finally, we prove a polynomial bound for the smoothed running-time of *k*-means in one dimension.