Semi-supervised Learning in Gigantic Image Collections

Microsoft Research

Published September 7, 2016, 17:59

With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. "Clean labels" can be manually obtained on a small fraction, "noisy labels" may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. We combine this with a label sharing framework obtained from WordNet to propagate label information to classes lacking manual annotations. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images with 74 thousand classes. Joint work with Yair Weiss and Antonio Torralba.
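For readers unfamiliar with the graph-Laplacian formulation the abstract refers to, the following is a minimal small-scale sketch, not the authors' implementation. It builds a dense Gaussian affinity graph, takes the k smallest eigenvectors of the normalized graph Laplacian, and fits the labels in that reduced basis. The function name `laplacian_ssl` and the parameters `k`, `sigma`, and `lam` are assumptions chosen for illustration. The dense eigendecomposition is exactly the step that does not scale to millions of images, and the eigenfunction approximation described in the talk is not reproduced here.

```python
import numpy as np

def laplacian_ssl(X, y, labeled, k=2, sigma=1.0, lam=100.0):
    """Semi-supervised label fitting in the span of Laplacian eigenvectors.

    X:       (n, d) data points
    y:       (n,) labels in {-1, +1}; values at unlabeled points are ignored
    labeled: (n,) boolean mask marking the labeled points
    k, sigma, lam: illustrative hyperparameters (assumed, not from the source)
    """
    n = X.shape[0]
    # Dense Gaussian affinity matrix: O(n^2) memory, small n only.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # k smoothest eigenvectors (eigh returns eigenvalues in ascending order).
    evals, evecs = np.linalg.eigh(L)
    U, S = evecs[:, :k], evals[:k]

    # Minimize a^T diag(S) a + (U a - y)^T Lam (U a - y) over coefficients a,
    # where Lam is diagonal with weight lam on labeled points and 0 elsewhere.
    weights = np.where(labeled, lam, 0.0)
    A = np.diag(S) + U.T @ (weights[:, None] * U)
    b = U.T @ (weights * np.where(labeled, y, 0.0))
    alpha = np.linalg.solve(A, b)

    # Real-valued label function over all points; its sign is the prediction.
    return U @ alpha
```

Calling `laplacian_ssl(X, y, labeled)` with one labeled point per cluster returns a score per point whose sign can be read as the predicted class. Working in a k-dimensional eigenbasis reduces the final solve to a k-by-k system, which is the property the large-scale approximation relies on.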
