Deploying vLLM from AMD Infinity Hub with AMD ROCm™ Software Platform

Published January 28, 2025, 16:00
Learn how to run and serve LLMs using AMD ROCm™-enabled vLLM with Docker on AMD MI300X GPUs. This tutorial covers everything you need to optimize performance, set up inference pipelines, and serve AI applications efficiently.

Here’s what we’ll walk through:

• Verifying MI300X GPU availability with the AMD ROCm™ platform
• Pulling and running AMD ROCm™-enabled vLLM Docker images from AMD Infinity Hub
• Exploring vLLM directories, benchmarks, and the pre-installed conda environment
• Running offline inference with models from Hugging Face, including Llama 3.2 (a short Python sketch follows this list)
• Setting up vLLM’s API to serve AI inferences locally or across multiple GPUs (see the client sketch further below)
• Utilizing tensor parallel inference for large-scale AI workloads
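
For a taste of the offline path, here is a minimal Python sketch using vLLM’s LLM class. The model id (a Llama 3.2 checkpoint from Hugging Face), prompt, and sampling settings are illustrative assumptions, not the tutorial’s exact commands:

from vllm import LLM, SamplingParams

# Illustrative prompt and sampling settings.
prompts = ["Explain tensor parallelism in one sentence."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# tensor_parallel_size > 1 shards the model across that many GPUs;
# the model id below is an assumed Hugging Face checkpoint.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct", tensor_parallel_size=1)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)

The same tensor_parallel_size idea carries over to serving, where a larger model can be sharded across all visible MI300X GPUs.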

By the end of this tutorial, you’ll be ready to run advanced AI models and efficiently serve inferences on AMD GPUs using vLLM.
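
Once a vLLM OpenAI-compatible server is running inside the container, it can be queried with the standard openai Python client. This is a minimal sketch, assuming the server listens on localhost:8000, no API key is configured, and the same illustrative model id as above:

from openai import OpenAI

# Assumes a vLLM server is already running, for example launched with:
#   vllm serve meta-llama/Llama-3.2-1B-Instruct --tensor-parallel-size 8
# The base URL, API key, and model id below are assumptions for illustration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "Summarize ROCm in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)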

Explore the AMD AI Developer Hub: www.amd.com/gpu-ai-developer

Find the resources you need to develop using AMD products: amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
