Building Production Inference for Trillion-Parameter Models on AMD Instinct MI355X

Published on 14 May 2026, 16:53
Speaker: Quentin Anthony, VP of AI Engineering, Zyphra
Talk Abstract: Zyphra built its foundation model training stack on AMD and is now extending that expertise to inference. In this session, Quentin Anthony walks through how Zyphra built Zyphra Cloud to serve frontier open-weight models — including DeepSeek V3.2, Kimi K2.6, and GLM 5.1 — on AMD Instinct™ MI355X GPUs. Topics include ROCm kernel optimization, parallelism, and execution strategies for trillion-parameter MoE models, as well as systems approaches to long-context inference under real-world latency and throughput constraints. Quentin shares lessons from training on AMD Instinct and how that co-design approach extends to inference, along with perspectives on the current optimization frontier for teams building on the AMD stack.

Find the resources you need to develop using AMD products: amd.com/en/developer.html

Join the Developer Community: devcommunity.amd.com

Join the Developer Discord server: discord.gg/amd-dev

***

© 2026 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Instinct, EPYC, ROCm, and combinations thereof are trademarks of Advanced Micro Devices, Inc.