Google Cloud Platform · 1.32M subscribers
Published October 6, 2025, 23:19
Discover how to deploy your own GPU-powered Large Language Model (LLM) on Google Cloud Run. This video walks through taking an open model like Gemma and deploying it as a scalable, serverless service with GPU acceleration. We explore the essential Dockerfile configuration and the `gcloud run deploy` command, highlighting the key flags for optimizing performance and controlling costs. This setup lets your AI agent's intelligent core scale independently of the rest of the application.
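As a rough orientation before watching, here is a minimal Dockerfile sketch in the spirit of the Ollama setup the video describes. The `ollama/ollama` base image and the `ollama serve` / `ollama pull` commands are real Ollama conventions, but the model tag and environment values below are assumptions, not the video's exact file; see the linked codelab for the canonical version.

```dockerfile
# Sketch: serve a Gemma model with Ollama on Cloud Run (model tag assumed).
FROM ollama/ollama:latest

# Cloud Run sends traffic to port 8080 by default, so bind Ollama there.
ENV OLLAMA_HOST=0.0.0.0:8080

# Keep the model resident in GPU memory between requests (-1 = never unload).
ENV OLLAMA_KEEP_ALIVE=-1

# Bake the model weights into the image so instances don't download at startup.
RUN ollama serve & sleep 5 && ollama pull gemma3:4b

ENTRYPOINT ["ollama", "serve"]
```

And a hedged sketch of the deployment step: `--gpu`, `--gpu-type`, `--no-cpu-throttling`, `--max-instances`, and `--concurrency` are real `gcloud run deploy` flags, but the service name, image path, region, and resource values here are illustrative placeholders rather than the video's exact invocation.

```sh
# Sketch: deploy the Ollama image to Cloud Run with one NVIDIA L4 GPU.
# Service name, image path, and region below are placeholders.
# --no-cpu-throttling keeps the CPU always allocated, as GPU services require;
# --max-instances caps spend; --concurrency limits parallel requests per GPU.
gcloud run deploy gemma-ollama \
  --image us-central1-docker.pkg.dev/PROJECT_ID/repo/gemma-ollama \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 8 \
  --memory 32Gi \
  --no-cpu-throttling \
  --max-instances 1 \
  --concurrency 4 \
  --no-allow-unauthenticated
```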
Chapters:
0:00 - Introduction
0:32 - Why deploy a separate LLM?
1:32 - Serving the LLM with Ollama and Dockerfile
2:41 - Deploying to Cloud Run with `gcloud run deploy`
2:59 - Hardware configuration
3:20 - Performance and cost control
3:40 - Deployment summary
3:57 - Summary and next steps
Resources:
Codelab → goo.gle/aaiwcr-3
GitHub Repository → goo.gle/4pYAmMi
Google Cloud Run GPU → goo.gle/46EYI6g
ADK Documentation → goo.gle/46Thw0d
Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech
#GoogleCloud #LLM #CloudRun
Speakers: Amit Maraj
Products Mentioned: Cloud GPUs, Cloud Run