Simon Mo on vLLM: Easy, Fast, and Cost-Effective LLM Serving for Everyone

Published June 30, 2025, 14:01
Join Simon Mo, a PhD student at the Berkeley Sky Computing Lab and co-leader of the vLLM project, as he shares insights at AMD Advancing AI 2025. This talk explores the vLLM project's journey to create the fastest and easiest-to-use open-source LLM inference and serving engine. Simon discusses the collaboration with AMD, highlighting performance enhancements on the AMD Instinct™ MI300X GPU. Learn about the innovative scheduling framework, piecewise device graph, and optimization techniques such as prefix caching and speculative decoding. Discover how vLLM integrates with the AMD ROCm™ software platform to achieve lower latency and higher throughput for LLMs.
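To give a flavor of one of the techniques mentioned, prefix caching lets requests that share a prompt prefix reuse already-computed KV-cache blocks instead of recomputing them. The following is a minimal, illustrative Python sketch of the idea only, not vLLM's actual implementation; the block size, hashing scheme, and class names are assumptions for the example.

```python
# Illustrative sketch of block-level prefix caching (NOT vLLM's real code):
# KV-cache blocks are keyed by a hash of the full token prefix they complete,
# so requests sharing a prompt prefix reuse the same cached blocks.
from hashlib import sha256

BLOCK_SIZE = 4  # tokens per cache block (assumed value for this example)

class PrefixCache:
    def __init__(self):
        self.blocks = {}   # prefix-hash -> stand-in for a KV-cache block
        self.hits = 0
        self.misses = 0

    def _key(self, prefix_tokens):
        return sha256(repr(prefix_tokens).encode()).hexdigest()

    def get_or_compute(self, prompt_tokens):
        """Return KV blocks for a prompt, reusing any cached prefix blocks."""
        kv = []
        for start in range(0, len(prompt_tokens), BLOCK_SIZE):
            end = min(start + BLOCK_SIZE, len(prompt_tokens))
            key = self._key(tuple(prompt_tokens[:end]))
            if key in self.blocks:
                self.hits += 1
            else:
                self.misses += 1
                # Stand-in for computing real KV tensors for this block.
                self.blocks[key] = f"kv{prompt_tokens[start:end]}"
            kv.append(self.blocks[key])
        return kv

cache = PrefixCache()
cache.get_or_compute(list(range(8)))           # cold run: 2 blocks computed
cache.get_or_compute(list(range(8)) + [9, 9])  # shared prefix: 2 blocks reused
print(cache.hits, cache.misses)  # prints "2 3"
```

The second request reuses both fully cached prefix blocks and only computes KV for its new tail, which is the source of the latency and throughput wins prefix caching provides at serving time.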

Find the resources you need to develop using AMD products: amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.