Microsoft Research335 тыс
Опубликовано 30 марта 2017, 22:04
Social Data Processing using Exchangeable Models: Recommendation Systems, Crowd-sourcing, and Graphon Estimation
Much of modern data is generated by humans and drives decisions made in a variety of settings, such as recommendations for online markets, analysis of social networks, or denoising crowdsourced labels. Due to the complexities of human behavior, the precise data model is often unknown, creating a need for for flexible models with minimal assumptions. A minimal property that is natural for many datasets is ""exchangeability"", i.e. invariant under relabeling of the dataset, which naturally leads to a nonparametric latent variable model. The corresponding inference problem can be formulated as matrix or graphon estimation.
We propose similarity-based inference algorithms for such nonparametric latent variable models, and we provide theoretical guarantees that bound the error. Our method can be computed in a distributed manner, lending to good scalability properties. As a byproduct, our analysis explains a longstanding mystery of why the collaborative filtering heuristic performs well in practice. While classical collaborative filtering typically requires a dense dataset, we propose a new method which compares larger radius neighborhoods of data to compute similarities, and show that the estimate converges even for very sparse datasets, which has implications towards sparse graphon estimation. For denoising crowd-sourced labels, our algorithm provides guarantees under flexible models allowing for heteregeneity of task and worker types.
See more on this video at microsoft.com/en-us/research/v...
Much of modern data is generated by humans and drives decisions made in a variety of settings, such as recommendations for online markets, analysis of social networks, or denoising crowdsourced labels. Due to the complexities of human behavior, the precise data model is often unknown, creating a need for for flexible models with minimal assumptions. A minimal property that is natural for many datasets is ""exchangeability"", i.e. invariant under relabeling of the dataset, which naturally leads to a nonparametric latent variable model. The corresponding inference problem can be formulated as matrix or graphon estimation.
We propose similarity-based inference algorithms for such nonparametric latent variable models, and we provide theoretical guarantees that bound the error. Our method can be computed in a distributed manner, lending to good scalability properties. As a byproduct, our analysis explains a longstanding mystery of why the collaborative filtering heuristic performs well in practice. While classical collaborative filtering typically requires a dense dataset, we propose a new method which compares larger radius neighborhoods of data to compute similarities, and show that the estimate converges even for very sparse datasets, which has implications towards sparse graphon estimation. For denoising crowd-sourced labels, our algorithm provides guarantees under flexible models allowing for heteregeneity of task and worker types.
See more on this video at microsoft.com/en-us/research/v...
Свежие видео