Fast Dual-Tree k-means with Bounded Single-Iteration Runtime

507
15.4
Следующее
27.06.16 – 7 81157:33
Toward Causal Machine Learning
Популярные
Опубликовано 27 июня 2016, 21:25
k-means is a widely used clustering algorithm, but for k clusters and a dataset size of N, each iteration of Lloyd's algorithm costs O(kN) time. Although there are a handful of techniques to accelerate single Lloyd iterations, none of these techniques are tailored to the case of large k, which is increasingly common as dataset sizes grow (one example: vector quantization), and none of these techniques have a worst-case runtime bound of less than O(kN) per iteration. I present a dual-tree algorithm with a worst-case runtime bound of O(N + k log k) under certain assumptions on the dataset, and show that this algorithm outperforms all other alternatives as k and N grow. An implementation is readily available in mlpack.
автотехномузыкадетское