Efficient LLM Inference with SGLang, Lianmin Zheng, xAI

Published December 18, 2024, 17:35
In this Advancing AI 2024 Luminary Developer Keynote, Dr. Lianmin Zheng introduces SGLang, a high-performance serving framework optimized for inference with LLMs and vision-language models.

SGLang’s core techniques include RadixAttention for improved KV cache reuse and jump-forward decoding for faster grammar-guided decoding. Additional optimizations, such as low-overhead CPU scheduling and torch-native features (e.g., torch.compile and torchao), further improve efficiency. Benchmark results demonstrate that SGLang outperforms other state-of-the-art inference engines.
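To illustrate the prefix-reuse idea behind RadixAttention, here is a minimal sketch: served token sequences are stored in a radix-tree-like structure so a new request can reuse the cached KV entries of its longest matching prefix. All class and method names below are illustrative, not SGLang's actual API, and the KV entries are stand-ins for real tensors.

```python
class RadixNode:
    """One node per token along a cached sequence."""
    def __init__(self):
        self.children = {}   # token id -> RadixNode
        self.kv_slot = None  # placeholder for a cached KV entry


class PrefixCache:
    """Toy prefix cache sketching RadixAttention-style reuse."""
    def __init__(self):
        self.root = RadixNode()

    def insert(self, tokens):
        """Record a served sequence so later requests can reuse its prefix."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            node.kv_slot = object()  # stand-in for real KV cache tensors

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched


cache = PrefixCache()
cache.insert([1, 2, 3, 4])                 # e.g. a shared system prompt
reused = cache.match_prefix([1, 2, 3, 9])  # new request shares 3 tokens
print(reused)                              # prints 3
```

The payoff is that the first `reused` tokens of the new request skip prefill entirely; only the remaining tokens need fresh attention computation.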

As an open-source project with broad adoption, SGLang is also deployed for production serving at xAI.

Speaker: Lianmin Zheng, xAI

Gain access to AMD developer tools and resources.
amd.com/en/developer.html#soft...

The information contained in this video represents the view of AMD or the third-party presenter as of the date presented. AMD and/or the third-party presenters have no obligation to update any forward-looking content in the above presentations. AMD is not responsible for the content of any third-party presentations and does not necessarily endorse the comments made therein. GD-84.


© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.