Fast Dual-Tree k-means with Bounded Single-Iteration Runtime

Published June 27, 2016, 21:25
k-means is a widely used clustering algorithm, but for k clusters and a dataset of size N, each iteration of Lloyd's algorithm costs O(kN) time. A handful of techniques exist to accelerate single Lloyd iterations. However, none of them is tailored to the case of large k, which is increasingly common as dataset sizes grow (vector quantization is one example), and none has a worst-case runtime bound below O(kN) per iteration. I present a dual-tree algorithm with a worst-case runtime bound of O(N + k log k) per iteration under certain assumptions on the dataset, and show that this algorithm outperforms all other alternatives as k and N grow. An implementation is readily available in mlpack.
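For reference, the O(kN) baseline mentioned above can be seen in a naive Lloyd iteration, sketched below. It illustrates only the standard algorithm in plain Python. It is not the dual-tree method from the talk and not mlpack's implementation. The names `lloyd_iteration` and `squared_distance` are hypothetical. Keeping the old centroid when a cluster becomes empty is an assumed convention. The assignment step compares every point with every centroid, so the distance-evaluation counter reaches exactly k·N.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_iteration(
    points: List[Point], centroids: List[Point]
) -> Tuple[List[Point], List[int], int]:
    """One naive Lloyd iteration (hypothetical helper, illustration only).

    Returns (new_centroids, assignments, number_of_distance_evaluations).
    """
    k = len(centroids)
    dim = len(points[0])
    assignments: List[int] = []
    evals = 0

    # Assignment step: each of the N points is compared with all k centroids,
    # which gives the k*N distance evaluations behind the O(kN) per-iteration cost.
    for p in points:
        best_j, best_d = 0, float("inf")
        for j, c in enumerate(centroids):
            d = squared_distance(p, c)
            evals += 1
            if d < best_d:
                best_d, best_j = d, j
        assignments.append(best_j)

    # Update step: a single O(N * dim) pass that accumulates per-cluster sums.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, assignments):
        counts[j] += 1
        for i, x in enumerate(p):
            sums[j][i] += x

    new_centroids: List[Point] = []
    for j in range(k):
        if counts[j] == 0:
            # Assumed convention: an empty cluster keeps its previous centroid.
            new_centroids.append(tuple(centroids[j]))
        else:
            new_centroids.append(tuple(s / counts[j] for s in sums[j]))
    return new_centroids, assignments, evals


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    cents = [(0.0, 0.0), (10.0, 10.0)]
    new_c, assign, n_evals = lloyd_iteration(pts, cents)
    print(assign)   # [0, 0, 1, 1]
    print(new_c)    # [(0.0, 0.5), (10.0, 10.5)]
    print(n_evals)  # 8, i.e. k * N = 2 * 4
```

The update step costs only O(N) per iteration. Accelerated approaches therefore focus on the assignment step, where the k·N distance evaluations occur.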