Microsoft Research
Published March 3, 2026, 17:59
We present Adaptively Rotated Optimization (ARO), a matrix optimizer that speeds up LLM training by applying updates in a rotated, geometry-aware coordinate system. Guided by new insights into the global structure of LLM loss landscapes, ARO treats rotation as a unifying principle for sample efficiency and proposes a new update policy applicable to all model weight matrices. In large-scale controlled experiments, ARO consistently outperforms AdamW and orthogonalization-based methods, maintaining its gains as models and training budgets scale.
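To make the idea of a "rotated, geometry-aware" update concrete, here is a minimal illustrative sketch: the gradient of a weight matrix is expressed in the coordinate system of its own singular vectors, rescaled there, and rotated back. This is an assumption-laden toy, not the ARO update policy from the paper; the function name `rotated_update` and the specific scaling rule are invented for illustration.

```python
import numpy as np

def rotated_update(W, G, lr=1e-3, eps=1e-8):
    """One hypothetical 'rotated' optimizer step for a weight matrix W.

    Illustrative sketch only -- NOT the ARO algorithm from the paper.
    The gradient G is decomposed via SVD, its singular values are
    normalized so every rotated direction receives a comparable step,
    and the update is rotated back into the original coordinates.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    s_scaled = s / (s.max() + eps)          # geometry-aware rescaling
    update = U @ np.diag(s_scaled) @ Vt     # rotate back and apply
    return W - lr * update

# Example: one step on a random gradient preserves the weight shape.
rng = np.random.default_rng(0)
W = np.ones((3, 4))
G = rng.standard_normal((3, 4))
W_next = rotated_update(W, G, lr=0.1)
```

The contrast with a plain elementwise optimizer like AdamW is that the rescaling here acts on singular directions of the whole matrix rather than on individual entries, which is the sense in which such updates are "matrix" optimizers.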
Paper: arxiv.org/abs/2602.09006
This session aired on March 3, 2026, at Microsoft Research Forum, Season 2 Episode 3.
Register for the series to learn about future episodes: events.microsoft.com/flow/ms/r...
Explore all previous episodes: aka.ms/researchforumYTplaylist