Efficient Distributed Orthonormal Optimizers for Large-Scale Training

Published on 6 Mar 2026, 15:16
Speaker: Kwangjun Ahn, Microsoft Research

I delivered a 50-minute technical talk on recent advances in orthonormal update methods for large-scale AI model training. Orthonormal optimizers have been rapidly gaining attention in the community, emerging as strong successors to AdamW following their success in training production-scale models such as Kimi-K2 and GLM-4.5.

The talk centered on the design and practice of orthonormal updates, with a focus on optimizers such as Muon and Dion2. While I briefly discussed their theoretical foundations, the emphasis was on practical usage: how to integrate these optimizers into modern training pipelines, interpret their algorithmic components, and leverage the implementation guidelines provided in our open-source codebase at github.com/microsoft/dion.
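To make the core idea concrete, here is a minimal NumPy sketch of a Muon-style orthonormal update. It orthogonalizes the momentum-averaged gradient with a quintic Newton-Schulz iteration, the polynomial approach popularized by Muon; the specific coefficients, step counts, and hyperparameters below are illustrative choices, not the exact settings of the Muon or Dion implementations in the repository above.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a 2-D matrix via a quintic
    Newton-Schulz iteration, driving all singular values toward 1.
    The (a, b, c) coefficients are the ones popularized by Muon;
    other stable choices exist."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)  # Frobenius-normalize so singular values lie in (0, 1]
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work with rows <= cols so x @ x.T is the smaller Gram matrix
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x

def muon_style_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One hypothetical Muon-style step: accumulate momentum on the raw
    gradient, then apply the orthonormalized direction. lr and beta are
    placeholder values for illustration."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    return param - lr * update, momentum
```

The key design point the talk emphasized is visible here: the update direction depends on the gradient only through its singular vectors, since the Newton-Schulz polynomial flattens the singular values toward 1, which is what makes the update "orthonormal."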