Neighbourhood Component Analysis

Published September 6, 2016, 5:00
Say I give you a dataset of N points {x_n} in D dimensions. Can you find for me (up to rotation and isotropic scaling) a projection matrix A (of size d by D) such that when you apply nearest neighbour classification to the point set {y_n = A x_n} you get the best possible performance?

Nearest neighbour classification is a very simple nonparametric method for supervised learning with several appealing properties: the decision surfaces are nonlinear, the quality of the predictions automatically improves as the amount of training data increases, and there is only a single hyperparameter to tune. However, there are two significant problems. First, we must define what we mean by "nearest"; in other words, we must specify a metric on the input space. Second, the computational load of the classifier is quite high at test time, since we must store and search through the entire training set to find the neighbours of a query point before we can classify it.

In this talk I will present an algorithm for linearly reducing the dimensionality of the input data in a way that preserves the performance of nearest neighbour classification as much as possible in the low-dimensional space. The algorithm simultaneously reduces the computational load of the classifier at test time and learns a (low-rank) metric on the input space. Results on a variety of datasets show that the performance of nearest neighbour classification after our dimensionality reduction technique consistently exceeds its performance after other linear dimensionality reduction methods such as PCA or LDA.
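The abstract stops short of stating the objective, but the standard NCA formulation (the method named in the title) makes the hard nearest-neighbour rule differentiable by letting point i pick point j as its neighbour with probability p_ij = exp(-||A x_i - A x_j||^2) / sum_{k != i} exp(-||A x_i - A x_k||^2), and then choosing A by gradient ascent to maximise the expected number of correctly classified points, f(A) = sum_i sum_{j : y_j = y_i} p_ij. Below is a minimal NumPy sketch of that objective; the function name, variable names, and toy data are illustrative, not taken from the talk.

    import numpy as np

    def nca_objective(A, X, y):
        # A: (d, D) projection matrix; X: (N, D) data; y: (N,) integer labels.
        Y = X @ A.T                                          # project to d dimensions
        # Pairwise squared Euclidean distances in the projected space.
        sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(sq, np.inf)                         # a point never picks itself
        sq = sq - sq.min(axis=1, keepdims=True)              # stabilise the softmax
        # Soft neighbour probabilities p_ij: softmax over negative distances.
        P = np.exp(-sq)
        P /= P.sum(axis=1, keepdims=True)
        # Expected number of correctly classified points under soft 1-NN.
        same_class = (y[:, None] == y[None, :])
        return (P * same_class).sum()

    # Toy usage on random data: score one candidate 2-by-5 projection.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = rng.integers(0, 3, size=100)
    A = rng.normal(size=(2, 5))
    print(nca_objective(A, X, y))    # higher is better; optimise A by gradient ascent

For an off-the-shelf version of this technique, scikit-learn (0.21 and later) provides sklearn.neighbors.NeighborhoodComponentsAnalysis, which learns such a projection and can be chained with a KNeighborsClassifier.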