Video Understanding: From Tags to Language

1 059

58.8

Microsoft Research335 тыс

Следующее

08.11.17 – 77826:34

Project Catapult Academic Tutorial: OpenCL

Популярные

331 день – 6635:18

Research Forum: Closing Remarks and Announcements

06.12.23 – 1 72032:52

AI Forum 2023 | The Emergence of General AI for Medicine

Опубликовано 8 ноября 2017, 18:12

The increasing ubiquity of devices capable of capturing videos has led to an explosion in the amount of recorded video content. Instead of “eyeballing” the videos for potentially useful information, it has therefore been a pressing need to develop automatic video analysis and understanding algorithms for various applications. However, understanding videos on a large scale remains challenging: large variations and complexities, time-consuming annotations, and a wide range of involved video concepts. In light of these challenges, my research towards video understanding focuses on designing effective network architectures to learn robust video representations, learning video concepts from weak supervision and building a stronger connection between language and vision. In this talk, I will first introduce a Deep Event Network (DevNet) that can simultaneously detect pre-defined events and localize spatial-temporal key evidence. Then I will show how web crawled videos and images could be utilized for learning video concepts. Finally, I will present our recent efforts to connect visual understanding to language through attractive visual captioning and visual question segmentation.

See more on this video at microsoft.com/en-us/research/v...

Свежие видео