Beyond Naming: Image Understanding via Physical, Functional and Causal Relationships

Published August 17, 2016, 1:01
What does it mean to 'understand' an image? One popular answer is simply naming the objects seen in the image. During the last decade, most computer vision researchers have focused on this 'object naming' problem. While there has been great progress in detecting things like 'cars' and 'people', this level of understanding still cannot answer even basic questions about an image, such as 'What is the geometric structure of the scene?', 'Where in the image can I walk?' or 'What is going to happen next?'. In this talk, I will show that it is beneficial to go beyond mere object naming and harness relationships between objects for image understanding. These relationships can provide crucial high-level constraints that help construct a globally consistent model of the scene, and they allow for powerful ways of understanding and interpreting the underlying image. Specifically, I will present image and video understanding systems that incorporate: (1) physical relationships between objects via a qualitative 3D volumetric representation; (2) functional relationships between objects and actions via data-driven physical interactions; and (3) causal relationships between actions via a storyline representation. I will demonstrate the importance of these relationships on a diverse set of real-world images and videos.