vLLM: Easy, Fast, and Cheap LLM Serving, Woosuk Kwon, UC Berkeley

Published December 18, 2024, 17:33
In this Advancing AI 2024 Luminary Developer Keynote, Woosuk Kwon presents vLLM, an open-source, high-performance LLM inference engine. Starting as a research project at UC Berkeley, vLLM has become one of the fastest and most popular LLM inference solutions in the industry, reaching 27K+ stars and 560+ contributors.

In this talk, he covers how vLLM implements LLM inference optimizations and how it supports AI accelerators, including AMD GPUs.
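
For context, a minimal sketch of vLLM's offline inference API (not shown in the original description; the model name and prompts are only examples):

# Illustrative example of vLLM's offline inference API.
from vllm import LLM, SamplingParams

# Load a model; vLLM manages batching and KV-cache memory (PagedAttention) internally.
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["What is vLLM?", "Explain PagedAttention in one sentence."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)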

Gain access to AMD developer tools and resources.
amd.com/en/developer.html#soft...

The information contained in this video represents the view of AMD or the third-party presenter as of the date presented. AMD and/or the third-party presenters have no obligation to update any forward-looking content in the above presentations. AMD is not responsible for the content of any third-party presentations and does not necessarily endorse the comments made therein. GD-84.

© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.