Simple practical methods for estimating distances in large and sparse datasets like the web

44
Следующее
Популярные
Опубликовано 7 сентября 2016, 16:45
Random projections are a popular method for estimating distances. Instead of computing L2 distances over a big data matrix, we multiply the big matrix with a very sparse random matrix (over [-1 0 1]) to produce a smaller matrix. Distances in the smaller matrix are close to distances in the big matrix, but the smaller matrix is easier to work with. We then generalize very sparse random projections by introducing a novel data structure for estimating L1 and Lp distances as opposed to L2 distances only. Finally, we end with a new method, conditional random sampling. The new method estimates a variety of distance functions (L1, L2, Lp, Chi-square and more) from a single data structure. A single data structure is more convenient than different data structures for different distance functions.
автотехномузыкадетское