Deploying vLLM from AMD Infinity Hub with AMD ROCm™ Software Platform

Published January 28, 2025, 16:00
Learn how to run and serve LLMs using AMD ROCm™-enabled vLLM with Docker on AMD MI300X GPUs. This tutorial covers everything you need to optimize performance, set up inference pipelines, and serve AI applications efficiently.

Here’s what we’ll walk through:

• Verifying MI300X GPU availability with the AMD ROCm™ platform
• Pulling and running AMD ROCm™-enabled vLLM Docker images from AMD Infinity Hub
• Exploring vLLM directories, benchmarks, and the pre-installed conda environment
• Running offline inference with models from Hugging Face, including Llama 3.2 (a short Python sketch follows this list)
• Setting up vLLM’s API to serve AI inferences locally or across multiple GPUs (see the client sketch further below)
• Utilizing tensor parallel inference for large-scale AI workloads
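
For a taste of the offline path, here is a minimal Python sketch using vLLM’s LLM class. The model id (a Llama 3.2 checkpoint from Hugging Face), prompt, and sampling settings are illustrative assumptions, not the tutorial’s exact commands:

from vllm import LLM, SamplingParams

# Illustrative prompt and sampling settings.
prompts = ["Explain tensor parallelism in one sentence."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# tensor_parallel_size > 1 shards the model across that many GPUs;
# the model id below is an assumed Hugging Face checkpoint.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct", tensor_parallel_size=1)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)

The same tensor_parallel_size idea carries over to serving, where a larger model can be sharded across all visible MI300X GPUs.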

By the end of this tutorial, you’ll be ready to run advanced AI models and efficiently serve inferences on AMD GPUs using vLLM.
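
Once a vLLM OpenAI-compatible server is running inside the container, it can be queried with the standard openai Python client. This is a minimal sketch, assuming the server listens on localhost:8000, no API key is configured, and the same illustrative model id as above:

from openai import OpenAI

# Assumes a vLLM server is already running, for example launched with:
#   vllm serve meta-llama/Llama-3.2-1B-Instruct --tensor-parallel-size 8
# The base URL, API key, and model id below are assumptions for illustration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "Summarize ROCm in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)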

Explore the AMD AI Developer Hub: www.amd.com/gpu-ai-developer

Find the resources you need to develop using AMD products: amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
