Structure Visual Understanding and Interaction with Human and Environment

Microsoft Research

Published October 14, 2019, 17:12

The visual world around us is highly structured. As 2D projections of our world, images are also structured. An image usually contains a background and some foreground objects (e.g., kites and birds in the sky, sheep and cows on the grass). Moreover, objects usually interact with each other in predictable ways (e.g., mugs are on tables, keyboards are below computer monitors, the sky is in the background). This structure in our world manifests itself in the visual data that captures the world around us. In this talk, I will discuss how to leverage this structure for visual understanding and for interactions with language and the environment. Specifically, I will present: 1) how to learn to prune dense graphs and perform relational modeling for scene graph generation; 2) how to leverage structure in images for more grounded caption generation, and for question generation that actively acquires more information from humans; 3) how to learn a moving strategy for an embodied visual system in a 3D environment to achieve better visual perception through actions. Finally, I will briefly discuss my ongoing and future work, which aims at connecting vision, language, and environment towards better visual understanding and interactions.

Talk slides: microsoft.com/en-us/research/u...

Learn more about this and other talks at Microsoft Research: microsoft.com/en-us/research/v...
