Simon Mo on vLLM: Easy, Fast, and Cost-Effective LLM Serving for Everyone

Published June 30, 2025, 14:01
Join Simon Mo, a PhD student at the Berkeley Sky Computing Lab and co-leader of the vLLM project, as he shares insights at AMD Advancing AI 2025. This talk explores the vLLM project's journey to create the fastest and easiest-to-use open-source LLM inference and serving engine. Simon discusses the collaboration with AMD, highlighting performance enhancements on the AMD Instinct™ MI300X GPU. Learn about the innovative scheduling framework, piecewise device graph, and optimization techniques such as prefix caching and speculative decoding. Discover how vLLM integrates with the AMD ROCm™ software platform to achieve lower latency and higher throughput for LLMs.
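To give a flavor of one of the techniques mentioned, prefix caching lets requests that share a prompt prefix reuse already-computed KV-cache blocks instead of recomputing them. The following is a minimal, illustrative Python sketch of the idea only, not vLLM's actual implementation; the block size, hashing scheme, and class names are assumptions for the example.

```python
# Illustrative sketch of block-level prefix caching (NOT vLLM's real code):
# KV-cache blocks are keyed by a hash of the full token prefix they complete,
# so requests sharing a prompt prefix reuse the same cached blocks.
from hashlib import sha256

BLOCK_SIZE = 4  # tokens per cache block (assumed value for this example)

class PrefixCache:
    def __init__(self):
        self.blocks = {}   # prefix-hash -> stand-in for a KV-cache block
        self.hits = 0
        self.misses = 0

    def _key(self, prefix_tokens):
        return sha256(repr(prefix_tokens).encode()).hexdigest()

    def get_or_compute(self, prompt_tokens):
        """Return KV blocks for a prompt, reusing any cached prefix blocks."""
        kv = []
        for start in range(0, len(prompt_tokens), BLOCK_SIZE):
            end = min(start + BLOCK_SIZE, len(prompt_tokens))
            key = self._key(tuple(prompt_tokens[:end]))
            if key in self.blocks:
                self.hits += 1
            else:
                self.misses += 1
                # Stand-in for computing real KV tensors for this block.
                self.blocks[key] = f"kv{prompt_tokens[start:end]}"
            kv.append(self.blocks[key])
        return kv

cache = PrefixCache()
cache.get_or_compute(list(range(8)))           # cold run: 2 blocks computed
cache.get_or_compute(list(range(8)) + [9, 9])  # shared prefix: 2 blocks reused
print(cache.hits, cache.misses)  # prints "2 3"
```

The second request reuses both fully cached prefix blocks and only computes KV for its new tail, which is the source of the latency and throughput wins prefix caching provides at serving time.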

Find the resources you need to develop using AMD products: amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.