AMD Developer Central7.19 тыс
Следующее
Опубликовано 18 декабря 2024, 17:35
In this Advancing AI 2024 Luminary Developer Keynote, Dr. Lianmin Zheng introduces SGLang, a high-performance serving framework optimized for inference with LLMs and vision-language models.
SGLang’s core techniques include RadixAttention for improved KV cache reuse and jump-forward decoding for faster grammar-guided decoding. Additional optimizations, such as low-overhead CPU scheduling and torch native enhancements (e.g., torch.compile and torchao), further enhance efficiency. Benchmark results demonstrate that SGLang achieves superior performance compared to other state-of-the-art inference engines.
As an open-source project with broad adoption, SGLang is also deployed for production serving at xAI.
Speaker: Lianmin Zheng, xAI
Gain access to AMD developer tools and resources.
amd.com/en/developer.html#soft...
The information contained in this video represents the view of AMD or the third-party presenter as of the date presented. AMD and/or the third-party presenters have no obligation to update any forward-looking content in the above presentations. AMD is not responsible for the content of any third-party presentations and does not necessarily endorse the comments made therein. GD-84.
© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
SGLang’s core techniques include RadixAttention for improved KV cache reuse and jump-forward decoding for faster grammar-guided decoding. Additional optimizations, such as low-overhead CPU scheduling and torch native enhancements (e.g., torch.compile and torchao), further enhance efficiency. Benchmark results demonstrate that SGLang achieves superior performance compared to other state-of-the-art inference engines.
As an open-source project with broad adoption, SGLang is also deployed for production serving at xAI.
Speaker: Lianmin Zheng, xAI
Gain access to AMD developer tools and resources.
amd.com/en/developer.html#soft...
The information contained in this video represents the view of AMD or the third-party presenter as of the date presented. AMD and/or the third-party presenters have no obligation to update any forward-looking content in the above presentations. AMD is not responsible for the content of any third-party presentations and does not necessarily endorse the comments made therein. GD-84.
© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Свежие видео
Случайные видео
How To Use Premium Subscription AIs For Free! - 1Hub.ai Social Tasks Review (No Subscription Needed)