Deep Attention Mechanism for Multimodal Intelligence: Perception, Reasoning, & Expression

Published March 12, 2018, 15:27
We have long envisioned that machines will one day perform human-like perception, reasoning, and expression across multiple modalities, including vision and language, augmenting and transforming the ways humans communicate with each other and with the real world. With this vision, I'll use three tasks as examples to demonstrate recent progress in multimodal intelligence: image-to-language generation, visual question answering, and language-to-image synthesis. I'll discuss the open problems behind these tasks that we are eager to solve, including image and language understanding, joint reasoning across both modalities, and expressing abstract concepts through natural language or image generation. I'll also discuss the deep attention mechanisms recently developed to address these challenging problems, and analyze the interpretability and controllability of learning algorithms, which are of fundamental importance to general intelligence.
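
As a rough illustration of the kind of attention mechanism the talk refers to, the sketch below shows a minimal additive attention layer that weights image-region features by a language-side query (for example, a caption decoder state or an encoded question) and returns an inspectable attention distribution. This is an assumption-laden example, not the speaker's model: the class name, dimensions, and variable names are chosen for clarity only.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Minimal additive attention over image-region features.

    Illustrative sketch only; region_dim, query_dim, and hidden_dim are
    arbitrary assumptions, not values from the talk.
    """
    def __init__(self, region_dim, query_dim, hidden_dim):
        super().__init__()
        self.proj_regions = nn.Linear(region_dim, hidden_dim)
        self.proj_query = nn.Linear(query_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, query):
        # regions: (batch, num_regions, region_dim) -- e.g. CNN feature-map cells
        # query:   (batch, query_dim)               -- e.g. decoder state or question encoding
        scores = self.score(torch.tanh(
            self.proj_regions(regions) + self.proj_query(query).unsqueeze(1)
        )).squeeze(-1)                                 # (batch, num_regions)
        weights = torch.softmax(scores, dim=-1)        # attention distribution over regions
        context = torch.bmm(weights.unsqueeze(1), regions).squeeze(1)  # weighted sum
        return context, weights                        # weights can be visualized for interpretability


if __name__ == "__main__":
    attn = AdditiveAttention(region_dim=512, query_dim=256, hidden_dim=128)
    regions = torch.randn(2, 49, 512)   # e.g. a 7x7 grid of image features
    query = torch.randn(2, 256)
    context, weights = attn(regions, query)
    print(context.shape, weights.shape)  # torch.Size([2, 512]) torch.Size([2, 49])
```

Returning the attention weights alongside the context vector is what makes such models partially interpretable: the weights show which image regions the model attended to when generating a word or answering a question.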

See more at microsoft.com/en-us/research/v...