Deep Attention Mechanism for Multimodal Intelligence: Perception, Reasoning, & Expression

Published March 12, 2018, 15:27
We have long envisioned that machines will one day perform human-like perception, reasoning, and expression across multiple modalities, including vision and language, augmenting and transforming the ways humans communicate with each other and with the real world. With this vision, I'll use three tasks as examples to demonstrate recent progress in multimodal intelligence: image-to-language generation, visual question answering, and language-to-image synthesis. I'll discuss the open problems behind these tasks that we are eager to solve, including image and language understanding, joint reasoning across both modalities, and expressing abstract concepts through natural language or image generation. I'll also discuss the deep attention mechanisms recently developed to address these challenging problems, and analyze the interpretability and controllability of learning algorithms, which are of fundamental importance to general intelligence.
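
As a rough illustration of the kind of attention mechanism the talk refers to, the sketch below shows a minimal additive attention layer that weights image-region features by a language-side query (for example, a caption decoder state or an encoded question) and returns an inspectable attention distribution. This is an assumption-laden example, not the speaker's model: the class name, dimensions, and variable names are chosen for clarity only.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Minimal additive attention over image-region features.

    Illustrative sketch only; region_dim, query_dim, and hidden_dim are
    arbitrary assumptions, not values from the talk.
    """
    def __init__(self, region_dim, query_dim, hidden_dim):
        super().__init__()
        self.proj_regions = nn.Linear(region_dim, hidden_dim)
        self.proj_query = nn.Linear(query_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, query):
        # regions: (batch, num_regions, region_dim) -- e.g. CNN feature-map cells
        # query:   (batch, query_dim)               -- e.g. decoder state or question encoding
        scores = self.score(torch.tanh(
            self.proj_regions(regions) + self.proj_query(query).unsqueeze(1)
        )).squeeze(-1)                                 # (batch, num_regions)
        weights = torch.softmax(scores, dim=-1)        # attention distribution over regions
        context = torch.bmm(weights.unsqueeze(1), regions).squeeze(1)  # weighted sum
        return context, weights                        # weights can be visualized for interpretability


if __name__ == "__main__":
    attn = AdditiveAttention(region_dim=512, query_dim=256, hidden_dim=128)
    regions = torch.randn(2, 49, 512)   # e.g. a 7x7 grid of image features
    query = torch.randn(2, 256)
    context, weights = attn(regions, query)
    print(context.shape, weights.shape)  # torch.Size([2, 512]) torch.Size([2, 49])
```

Returning the attention weights alongside the context vector is what makes such models partially interpretable: the weights show which image regions the model attended to when generating a word or answering a question.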

See more at microsoft.com/en-us/research/v...