Microsoft Research
Published September 9, 2016, 19:26
How do you infer the 3D properties of the world from a single image? This question has challenged researchers in psychology and computer vision for decades, and enabling machines to accomplish the task remains an open problem. In this talk, I will present my research toward solving the 3D interpretation task. I will first talk about Data-Driven 3D Primitives, a new way of inferring surface normals and scene layout from a single image. These primitives are discovered from large-scale RGB-D data by optimizing two simple criteria: primitives should be visually discriminative and geometrically informative. I will show that a straightforward label-transfer inference approach on top of these primitives produces state-of-the-art results on a complex and cluttered dataset, as well as effective cross-dataset performance. Local cues, however, are inadequate by themselves: scenes are highly constrained in structure. I will therefore also talk about my work on constraining the 3D interpretation of scenes via physical and functional reasoning. Specifically, I will present work on mid-level physical constraints for layout estimation in the form of the convex and concave edges from the classic line-labeling literature. Finally, I will discuss how recognizing humans in scenes, even with imprecise and noisy pose estimators, can provide valuable cues for scene geometry via functional reasoning.
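To make the label-transfer idea concrete, here is a minimal toy sketch of the general pattern the abstract describes: match image patches against a bank of discovered primitives by appearance, then transfer each best match's stored surface normals. All names here (Primitive, extract_feature, infer_normals, the patch size, and the use of a plain normalized-intensity descriptor) are illustrative assumptions, not the actual system presented in the talk.

```python
# Toy label-transfer sketch, assuming primitives pair an appearance
# descriptor with the normal map observed in RGB-D training data.
import numpy as np

class Primitive:
    """A discovered primitive: appearance feature + surface-normal map."""
    def __init__(self, feature, normal_map):
        self.feature = feature        # 1-D appearance descriptor
        self.normal_map = normal_map  # (patch, patch, 3) unit normals

def extract_feature(patch):
    # Stand-in for a real descriptor (e.g. HOG): normalized intensities.
    f = patch.astype(np.float64).ravel()
    return f / (np.linalg.norm(f) + 1e-8)

def infer_normals(image, primitives, patch=16):
    """Slide a window over the image; for each window, find the most
    visually similar primitive and transfer its normal map."""
    h, w = image.shape[:2]
    normals = np.zeros((h, w, 3))
    counts = np.zeros((h, w, 1))
    feats = np.stack([p.feature for p in primitives])
    for y in range(0, h - patch + 1, patch // 2):
        for x in range(0, w - patch + 1, patch // 2):
            f = extract_feature(image[y:y+patch, x:x+patch])
            best = primitives[int(np.argmax(feats @ f))]  # cosine match
            normals[y:y+patch, x:x+patch] += best.normal_map
            counts[y:y+patch, x:x+patch] += 1
    normals /= np.maximum(counts, 1)
    # Re-normalize the averaged vectors back to unit length.
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return normals / np.maximum(norm, 1e-8)

# Toy usage: two synthetic primitives (floor-like and wall-like).
rng = np.random.default_rng(0)
prims = [
    Primitive(extract_feature(rng.random((16, 16))),
              np.tile([0.0, 1.0, 0.0], (16, 16, 1))),  # "floor": up
    Primitive(extract_feature(rng.random((16, 16))),
              np.tile([0.0, 0.0, 1.0], (16, 16, 1))),  # "wall": out
]
img = rng.random((64, 64))
print(infer_normals(img, prims).shape)  # (64, 64, 3)
```

The sketch deliberately omits the discovery step (choosing primitives that are both visually discriminative and geometrically informative) and any of the physical or functional constraints the talk goes on to discuss; it only illustrates the transfer-by-appearance-match mechanism.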