Both Sides Now: Generating and Understanding Visually-Grounded Language

Published May 13, 2019, 16:21
From robots and cars to virtual assistants and voice-controlled drones, computing devices are increasingly expected to communicate naturally with people and to understand the visual context in which they operate. In this talk, I will present our latest work on generating and comprehending visually-grounded language. First, we will discuss the challenging task of describing an image (image captioning). I will introduce captioning models that leverage multiple data sources, including object detection datasets and unaligned text corpora, in order to learn about the long tail of visual concepts found in the real world. To support and encourage further efforts in this area, I will present the 'nocaps' benchmark for novel object captioning. In the second part of the talk, I will describe our recent work on developing agents that follow natural language instructions in reconstructed 3D environments, using the R2R dataset for vision-and-language navigation.

See more at microsoft.com/en-us/research/v...