Microsoft Research (335K)
Published January 25, 2018, 18:29
Bridging visual and natural language understanding is a fundamental requirement for intelligent agents. This talk will focus mainly on automatic image captioning and visual question answering (VQA). I will cover some recent advances in automatic image caption evaluation, visual attention modeling and generalization to images 'in the wild'. I will also introduce my recent work on vision-and-language navigation (VLN), in which we situate agents in a new RL environment constructed from dense RGB-D imagery of 90 real buildings.
See more at microsoft.com/en-us/research/v...