Structured Visual Understanding and Interaction with Human and Environment

Published October 14, 2019, 17:12
The visual world around us is highly structured. As 2D projections of our world, images are also structured. An image usually contains a background and some foreground objects (e.g., kites and birds in the sky, sheep and cows on the grass). Moreover, objects usually interact with each other in predictable ways (e.g., mugs are on tables, keyboards are below computer monitors, the sky is in the background). This structure in our world manifests itself in the visual data that captures the world around us. In this talk, I will discuss how to leverage this structure for visual understanding and for interactions with language and environment. Specifically, I will present: 1) how to learn to prune dense graphs and perform relational modeling for scene graph generation; 2) how to leverage structure in images for more grounded caption generation, and for question generation that actively acquires more information from humans; 3) how to learn a moving strategy for an embodied visual system in 3D environments to achieve better visual perception through actions. Finally, I will briefly talk about my ongoing and future work, which aims at connecting vision, language, and environment towards better visual understanding and interactions.

Talk slides: microsoft.com/en-us/research/u...

Learn more about this and other talks at Microsoft Research: microsoft.com/en-us/research/v...