SGLang: Open-Source Model Performance Optimization

Published November 10, 2025, 18:00
This talk introduces SGLang, a high-performance serving framework for large language models (LLMs) and vision-language models (VLMs), and reviews key advancements achieved in 2025. Yineng Zhang covers optimizations for DeepSeek V3 that improve throughput and latency, large-scale production deployments, and the integration of reinforcement learning to adapt serving policies under real workloads. The session details training acceleration via speculative decoding, hierarchical KV caching for memory efficiency at scale, and deterministic inference for reproducibility and compliance. He also highlights day-0 support for new model families, robust model deployment orchestration, and distributed inference on AMD platforms to unlock cost-effective performance.

Find the resources you need to develop using AMD products: amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, AMD Instinct, and combinations thereof are trademarks of Advanced Micro Devices, Inc.