Microsoft Research334 тыс
Следующее
Опубликовано 26 мая 2020, 16:50
Computer vision has seen major success in learning to recognize objects from massive “disembodied” Web photo collections labeled by human annotators. Yet cognitive science tells us that perception develops in the context of acting the world---and without intensive supervision. Meanwhile, many realistic vision tasks require not only categorizing a well-composed human-taken photo, but also actively deciding where to look in the first place. In the context of these challenges, we are exploring how machine perception benefits from anticipating the sights and sounds an agent will experience as a function of its own actions. Based on this premise, we introduce methods for learning to look around intelligently in novel environments, learning from video how to interact with objects, and perceiving audio-visual streams for both semantic and spatial context. Together, these are steps towards first-person perception, where interaction with the world is itself a supervisory signal.
See more at microsoft.com/en-us/research/v...
See more at microsoft.com/en-us/research/v...
Свежие видео
Случайные видео
❄️Extreme Chill Challenge: Watch S200 rugged phone conquer the freeze#DoogeeS200 #testing #challenge