Deploying a GPU-powered LLM on Cloud Run

Published: October 6, 2025

Discover how to deploy your own GPU-powered Large Language Model (LLM) on Google Cloud Run. This video walks through taking an open-source model such as Gemma and deploying it as a scalable, serverless service with GPU acceleration. We explore the essential Dockerfile configuration and the `gcloud run deploy` command (both sketched below), highlighting the key flags for optimizing performance and keeping costs under control. This setup lets your AI agent's intelligent core scale independently of the rest of your application.
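As a rough sketch of the Dockerfile approach described in the video, the container can be built on the public Ollama image with the model weights baked in at build time. The model tag `gemma3:4b` below is an illustrative placeholder, not taken from the video; see the codelab linked under Resources for the exact configuration.

```dockerfile
FROM ollama/ollama:latest

# Listen on all interfaces on port 8080, Cloud Run's default container port.
ENV OLLAMA_HOST=0.0.0.0:8080

# Store weights inside the image and keep the loaded model resident
# in memory between requests (-1 disables the idle unload timer).
ENV OLLAMA_MODELS=/models
ENV OLLAMA_KEEP_ALIVE=-1

# Pull the model at build time so a cold-starting instance doesn't
# have to download several GB of weights before serving traffic.
# Placeholder model tag; substitute the Gemma variant you deploy.
RUN ollama serve & sleep 5 && ollama pull gemma3:4b

ENTRYPOINT ["ollama", "serve"]
```

Baking the weights into the image trades a larger container image for much faster cold starts, which matters on a scale-to-zero service.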

Chapters:
0:00 - Introduction
0:32 - Why deploy a separate LLM?
1:32 - Serving the LLM with Ollama and a Dockerfile
2:41 - Deploying to Cloud Run with `gcloud run deploy`
2:59 - Hardware configuration
3:20 - Performance and cost control
3:40 - Deployment summary
3:57 - Summary and next steps
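
For the deploy step (2:41) and the hardware and cost chapters that follow it, here is a hedged example of what the `gcloud run deploy` invocation can look like. The service name, image path, and region are placeholders; `--gpu` and `--gpu-type nvidia-l4` are the flags Cloud Run uses to attach an NVIDIA L4 GPU, which also requires always-allocated CPU (`--no-cpu-throttling`).

```bash
gcloud run deploy ollama-gemma \
  --image us-central1-docker.pkg.dev/PROJECT_ID/my-repo/ollama-gemma \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 8 \
  --memory 32Gi \
  --no-cpu-throttling \
  --concurrency 4 \
  --max-instances 1 \
  --timeout 600 \
  --no-allow-unauthenticated
```

Here `--max-instances 1` caps the number of billable GPU instances, and `--concurrency 4` limits simultaneous requests per instance so a single model server isn't overloaded; tune both to your traffic and budget.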

Resources:
Codelab → goo.gle/aaiwcr-3
GitHub Repository → goo.gle/4pYAmMi
Google Cloud Run GPU → goo.gle/46EYI6g
ADK Documentation → goo.gle/46Thw0d

Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #LLM #CloudRun

Speaker: Amit Maraj
Products Mentioned: Cloud GPUs, Cloud Run