Microsoft Research334 тыс
Следующее
Опубликовано 27 июня 2016, 21:25
k-means is a widely used clustering algorithm, but for k clusters and a dataset size of N, each iteration of Lloyd's algorithm costs O(kN) time. Although there are a handful of techniques to accelerate single Lloyd iterations, none of these techniques are tailored to the case of large k, which is increasingly common as dataset sizes grow (one example: vector quantization), and none of these techniques have a worst-case runtime bound of less than O(kN) per iteration. I present a dual-tree algorithm with a worst-case runtime bound of O(N + k log k) under certain assumptions on the dataset, and show that this algorithm outperforms all other alternatives as k and N grow. An implementation is readily available in mlpack.
2024's Best iPhone Case & Screen Saver - Ostand 360° Spin Case Review & Test (Most Stable 360 Stand)