Connecting Vision and Language via Interpretation, Grounding and Imagination

Published July 31, 2018, 17:09
Understanding how to model vision and language jointly is a long-standing challenge in artificial intelligence. Vision is one of the primary senses we use to perceive the world, while language is the data structure we use to represent and communicate knowledge. In this talk, we will take up three lines of attack on this problem: interpretation, grounding, and imagination. In interpretation, the goal is to get machine learning models to understand an image and describe its contents in natural language in a contextually relevant manner. In grounding, we will connect natural language to referents in the physical world and show how this connection can help machines learn common sense. Finally, in imagination, we will study how to ‘imagine’ visual concepts completely and accurately across the full range of their visual attributes, including potentially unseen compositions of those attributes. We will study these problems from computational as well as algorithmic perspectives and suggest exciting directions for future work.

See more at microsoft.com/en-us/research/v...