Simon Mo on vLLM: Easy, Fast, and Cost-Effective LLM Serving for Everyone

Published June 30, 2025, 14:01
Join Simon Mo, a PhD student at the Berkeley Sky Computing Lab and co-leader of the vLLM project, as he shares insights at AMD Advancing AI 2025. This talk traces the vLLM project's journey to create the fastest and easiest-to-use open-source LLM inference and serving engine. Simon discusses the collaboration with AMD, highlighting performance enhancements on the AMD Instinct™ MI300X GPU. Learn about the innovative scheduling framework, the piecewise device graph, and optimization techniques such as prefix caching and speculative decoding. Discover how vLLM integrates with the AMD ROCm™ software platform to achieve lower latency and higher throughput for LLMs.

Find the resources you need to develop using AMD products: amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.