Directions in ML: Automating Dataset Comparison and Manipulation with Optimal Transport

Microsoft Research

Published November 24, 2020, 20:45

Machine learning research has traditionally been model-centric, focusing on architectures, parameter optimization, and model transfer. Much less attention has been given to the datasets on which these models are trained, which are often assumed to be fixed, or subject to extrinsic and inevitable change. However, applying ML successfully in practice often requires substantial effort in dataset preprocessing and manipulation, such as augmenting, merging, mixing, or reducing datasets.

In this talk, I will present some of our recent work that seeks to formalize and automate these and other flavors of dataset manipulation under a unified approach. First, I will introduce the Optimal Transport Dataset Distance, which provides a fundamental theoretical building block: a formal notion of similarity between labeled datasets. In the second part of the talk, I will discuss how this notion of distance can be used to formulate a general framework of dataset optimization by means of gradient flows in probability space. I will end by presenting various exciting potential applications of this dataset optimization framework.
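
To make the idea of a distance between labeled datasets concrete, here is a minimal toy sketch in pure Python. It is not the method presented in the talk. It assumes two datasets of equal size with uniform weights, so that an optimal transport plan can be found by searching over one-to-one matchings, and it assumes a ground cost made of the squared Euclidean distance between features plus the squared distance between class means as a simplified stand-in for a distance between labels. The names `class_means`, `sq_dist`, and `toy_dataset_distance` are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def class_means(dataset):
    """Map each label to the mean feature vector of its examples."""
    sums, counts = {}, {}
    for features, label in dataset:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def toy_dataset_distance(ds_a, ds_b):
    """Toy transport distance between two equal-size labeled datasets.

    Assumption of this sketch: the label term is the squared distance
    between class means, a simplification made for illustration.
    """
    if len(ds_a) != len(ds_b):
        raise ValueError("this sketch only handles datasets of equal size")
    means_a, means_b = class_means(ds_a), class_means(ds_b)
    n = len(ds_a)
    # Ground cost between every pair of labeled examples.
    cost = [
        [sq_dist(xa, xb) + sq_dist(means_a[ya], means_b[yb]) for xb, yb in ds_b]
        for xa, ya in ds_a
    ]
    # With uniform weights and equal sizes, an optimal plan is a permutation,
    # so brute force over matchings is exact (feasible only for tiny n).
    best = min(
        sum(cost[i][perm[i]] for i in range(n)) for perm in permutations(range(n))
    )
    return (best / n) ** 0.5


if __name__ == "__main__":
    source = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((3.0, 0.0), "b"), ((3.0, 1.0), "b")]
    # Same points and labels, shifted by one unit along the first axis.
    shifted = [((x + 1.0, y), label) for (x, y), label in source]
    print(toy_dataset_distance(source, source))
    print(toy_dataset_distance(source, shifted))
```

The brute-force search grows factorially with dataset size, so it is only meant for a handful of points; a larger experiment would need a proper assignment or optimal transport solver.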

Learn more about the 2020-2021 Directions in ML: AutoML and Automating Algorithms virtual speaker series: aka.ms/diml
