Connecting Vision and Language via Interpretation, Grounding and Imagination

Published July 31, 2018, 17:09
Understanding how to model vision and language jointly is a long-standing challenge in artificial intelligence. Vision is one of the primary senses we use to perceive the world, while language is the data structure we use to represent and communicate knowledge. In this talk, we will take up three lines of attack on this problem: interpretation, grounding, and imagination. In interpretation, the goal is to get machine learning models to understand an image and describe its contents in natural language in a contextually relevant manner. In grounding, we will connect natural language to referents in the physical world and show how this connection can help machines learn common sense. Finally, in imagination, we will study how to ‘imagine’ visual concepts completely and accurately across the full range of their visual attributes, including potentially unseen compositions of those attributes. We will study these problems from computational as well as algorithmic perspectives and suggest exciting directions for future work.

See more at microsoft.com/en-us/research/v...