Microsoft Research (352K)
Published August 20, 2025, 16:35
The video introduces MindJourney, a framework that enhances Vision-Language Models (VLMs), which excel at interpreting single images but struggle to infer the underlying three-dimensional world. Given a spatial reasoning question, MindJourney lets the VLM "imagine" moving through the scene: the model proposes trajectories in a simulated imagination space, and a world model generates novel views along these paths, expanding the available observations beyond the single input image. This richer 3D context lets the VLM answer questions that were previously difficult for it.
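The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not MindJourney's actual code: the names `View`, `imagine_and_answer`, `propose_paths`, `render_view`, and `answer` are hypothetical, and the toy functions merely stand in for a real VLM and world model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class View:
    image: str              # placeholder for image data (hypothetical)
    path: Tuple[str, ...]   # camera moves that produced this view


def imagine_and_answer(
    image: str,
    question: str,
    propose_paths: Callable[[str, str], List[Sequence[str]]],
    render_view: Callable[[str, Sequence[str]], str],
    answer: Callable[[str, List[View]], str],
) -> str:
    """Collect imagined views along proposed paths, then answer with all of them."""
    # Start from the single observed image (empty path).
    views = [View(image=image, path=())]
    # The VLM proposes trajectories; the world model renders a view for each.
    for path in propose_paths(image, question):
        views.append(View(image=render_view(image, path), path=tuple(path)))
    # The VLM answers using the original image plus the imagined views.
    return answer(question, views)


# Toy stand-ins (assumptions for illustration only).
def toy_propose_paths(image: str, question: str) -> List[Sequence[str]]:
    return [["forward"], ["turn_left", "forward"]]


def toy_render_view(image: str, path: Sequence[str]) -> str:
    return image + "+" + ">".join(path)


def toy_answer(question: str, views: List[View]) -> str:
    return f"answered using {len(views)} views"


result = imagine_and_answer(
    "kitchen.png",
    "What is behind the table?",
    toy_propose_paths,
    toy_render_view,
    toy_answer,
)
print(result)
```

Passing the three components in as callables keeps the sketch independent of any particular VLM or world model; how the real system selects or scores trajectories is not covered here.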
Publication: microsoft.com/en-us/research/p...