Deploying a GPU-powered LLM on Cloud Run

Published: October 6, 2025

Discover how to deploy your own GPU-powered Large Language Model (LLM) on Google Cloud Run. This video walks through taking an open-source model such as Gemma and deploying it as a scalable, serverless service with GPU acceleration. We explore the essential Dockerfile configuration and the `gcloud run deploy` command (both sketched below), highlighting the key flags for optimizing performance and keeping costs under control. This setup lets your AI agent's intelligent core scale independently of the rest of your application.
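As a rough sketch of the Dockerfile approach described in the video, the container can be built on the public Ollama image with the model weights baked in at build time. The model tag `gemma3:4b` below is an illustrative placeholder, not taken from the video; see the codelab linked under Resources for the exact configuration.

```dockerfile
FROM ollama/ollama:latest

# Listen on all interfaces on port 8080, Cloud Run's default container port.
ENV OLLAMA_HOST=0.0.0.0:8080

# Store weights inside the image and keep the loaded model resident
# in memory between requests (-1 disables the idle unload timer).
ENV OLLAMA_MODELS=/models
ENV OLLAMA_KEEP_ALIVE=-1

# Pull the model at build time so a cold-starting instance doesn't
# have to download several GB of weights before serving traffic.
# Placeholder model tag; substitute the Gemma variant you deploy.
RUN ollama serve & sleep 5 && ollama pull gemma3:4b

ENTRYPOINT ["ollama", "serve"]
```

Baking the weights into the image trades a larger container image for much faster cold starts, which matters on a scale-to-zero service.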

Chapters:
0:00 - Introduction
0:32 - Why deploy a separate LLM?
1:32 - Serving the LLM with Ollama and a Dockerfile
2:41 - Deploying to Cloud Run with `gcloud run deploy`
2:59 - Hardware configuration
3:20 - Performance and cost control
3:40 - Deployment summary
3:57 - Summary and next steps
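
For the deploy step (2:41) and the hardware and cost chapters that follow it, here is a hedged example of what the `gcloud run deploy` invocation can look like. The service name, image path, and region are placeholders; `--gpu` and `--gpu-type nvidia-l4` are the flags Cloud Run uses to attach an NVIDIA L4 GPU, which also requires always-allocated CPU (`--no-cpu-throttling`).

```bash
gcloud run deploy ollama-gemma \
  --image us-central1-docker.pkg.dev/PROJECT_ID/my-repo/ollama-gemma \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 8 \
  --memory 32Gi \
  --no-cpu-throttling \
  --concurrency 4 \
  --max-instances 1 \
  --timeout 600 \
  --no-allow-unauthenticated
```

Here `--max-instances 1` caps the number of billable GPU instances, and `--concurrency 4` limits simultaneous requests per instance so a single model server isn't overloaded; tune both to your traffic and budget.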

Resources:
Codelab → goo.gle/aaiwcr-3
GitHub Repository → goo.gle/4pYAmMi
Google Cloud Run GPU → goo.gle/46EYI6g
ADK Documentation → goo.gle/46Thw0d

Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #LLM #CloudRun

Speaker: Amit Maraj
Products Mentioned: Cloud GPUs, Cloud Run