Microsoft Research
Published September 9, 2016, 19:26
How do you infer the 3D properties of the world from a single image? This question has challenged researchers in psychology and computer vision for decades, and enabling machines to accomplish the task remains an open problem. In this talk, I will present my research toward solving the 3D interpretation task. I will first talk about Data-Driven 3D Primitives, a new way of inferring surface normals and scene layout from a single image. These primitives are discovered from large-scale RGB-D data by optimizing two simple criteria: primitives should be visually discriminative and geometrically informative. I will show that a straightforward label-transfer inference approach on top of these primitives produces state-of-the-art results on a complex and cluttered dataset, as well as effective cross-dataset performance. Local cues, however, are inadequate by themselves: scenes are highly constrained in structure. I will therefore also talk about my work on constraining the 3D interpretation of scenes via physical and functional reasoning. Specifically, I will present work on mid-level physical constraints for layout estimation in the form of the convex and concave edges from the classic line-labeling literature. Finally, I will discuss how recognizing humans in scenes, even with imprecise and noisy pose estimators, can provide valuable cues for scene geometry via functional reasoning.
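To make the label-transfer idea concrete, here is a minimal toy sketch of the general pattern the abstract describes: match image patches against a bank of discovered primitives by appearance, then transfer each best match's stored surface normals. All names here (Primitive, extract_feature, infer_normals, the patch size, and the use of a plain normalized-intensity descriptor) are illustrative assumptions, not the actual system presented in the talk.

```python
# Toy label-transfer sketch, assuming primitives pair an appearance
# descriptor with the normal map observed in RGB-D training data.
import numpy as np

class Primitive:
    """A discovered primitive: appearance feature + surface-normal map."""
    def __init__(self, feature, normal_map):
        self.feature = feature        # 1-D appearance descriptor
        self.normal_map = normal_map  # (patch, patch, 3) unit normals

def extract_feature(patch):
    # Stand-in for a real descriptor (e.g. HOG): normalized intensities.
    f = patch.astype(np.float64).ravel()
    return f / (np.linalg.norm(f) + 1e-8)

def infer_normals(image, primitives, patch=16):
    """Slide a window over the image; for each window, find the most
    visually similar primitive and transfer its normal map."""
    h, w = image.shape[:2]
    normals = np.zeros((h, w, 3))
    counts = np.zeros((h, w, 1))
    feats = np.stack([p.feature for p in primitives])
    for y in range(0, h - patch + 1, patch // 2):
        for x in range(0, w - patch + 1, patch // 2):
            f = extract_feature(image[y:y+patch, x:x+patch])
            best = primitives[int(np.argmax(feats @ f))]  # cosine match
            normals[y:y+patch, x:x+patch] += best.normal_map
            counts[y:y+patch, x:x+patch] += 1
    normals /= np.maximum(counts, 1)
    # Re-normalize the averaged vectors back to unit length.
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return normals / np.maximum(norm, 1e-8)

# Toy usage: two synthetic primitives (floor-like and wall-like).
rng = np.random.default_rng(0)
prims = [
    Primitive(extract_feature(rng.random((16, 16))),
              np.tile([0.0, 1.0, 0.0], (16, 16, 1))),  # "floor": up
    Primitive(extract_feature(rng.random((16, 16))),
              np.tile([0.0, 0.0, 1.0], (16, 16, 1))),  # "wall": out
]
img = rng.random((64, 64))
print(infer_normals(img, prims).shape)  # (64, 64, 3)
```

The sketch deliberately omits the discovery step (choosing primitives that are both visually discriminative and geometrically informative) and any of the physical or functional constraints the talk goes on to discuss; it only illustrates the transfer-by-appearance-match mechanism.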