SGLang: Open-Source Model Performance Optimization

AMD Developer Central

Published November 10, 2025, 18:00

This talk introduces SGLang, a high-performance serving framework for large language models (LLMs) and vision-language models (VLMs), and reviews key advancements achieved in 2025. Yineng Zhang covers optimizations for DeepSeek V3 that improve throughput and latency, large-scale production deployments, and the integration of reinforcement learning to adapt serving policies under real workloads. The session details training acceleration via speculative decoding, hierarchical KV caching for memory efficiency at scale, and deterministic inference for reproducibility and compliance. He also highlights day-0 support for new model families, robust model deployment orchestration, and distributed inference on AMD platforms to unlock cost-effective performance.

Find the resources you need to develop using AMD products: amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
