vLLM: Easy, Fast, and Cheap LLM Serving, Woosuk Kwon, UC Berkeley

Published December 18, 2024, 17:33
In this Advancing AI 2024 Luminary Developer Keynote, Woosuk Kwon presents vLLM, an open-source, high-performance LLM inference engine. Starting as a research project at UC Berkeley, vLLM has become one of the fastest and most popular LLM inference solutions in the industry, reaching 27K+ stars and 560+ contributors.

In this talk, he covers how vLLM implements LLM inference optimizations and how it supports AI accelerators, including AMD GPUs.
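
For context, a minimal sketch of vLLM's offline inference API (not shown in the original description; the model name and prompts are only examples):

# Illustrative example of vLLM's offline inference API.
from vllm import LLM, SamplingParams

# Load a model; vLLM manages batching and KV-cache memory (PagedAttention) internally.
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["What is vLLM?", "Explain PagedAttention in one sentence."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)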

Gain access to AMD developer tools and resources.
amd.com/en/developer.html#soft...

The information contained in this video represents the view of AMD or the third-party presenter as of the date presented. AMD and/or the third-party presenters have no obligation to update any forward-looking content in the above presentations. AMD is not responsible for the content of any third-party presentations and does not necessarily endorse the comments made therein. GD-84.

© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.