Both Sides Now: Generating and Understanding Visually-Grounded Language

Published May 13, 2019, 16:21
From robots and cars to virtual assistants and voice-controlled drones, computing devices are increasingly expected to communicate naturally with people and to understand the visual context in which they operate. In this talk, I will present our latest work on generating and comprehending visually-grounded language. First, we will discuss the challenging task of describing an image (image captioning). I will introduce captioning models that leverage multiple data sources, including object detection datasets and unaligned text corpora, in order to learn about the long tail of visual concepts found in the real world. To support and encourage further efforts in this area, I will present the 'nocaps' benchmark for novel object captioning. In the second part of the talk, I will describe our recent work on developing agents that follow natural language instructions in reconstructed 3D environments, using the R2R dataset for vision-and-language navigation.

See more at microsoft.com/en-us/research/v...