Published on 18 Apr 2026, 15:47
GCP credit → goo.gle/handson-ep7-lab1
Lab → goo.gle/guardians
In this episode, we deploy Google's Gemma 4 model to Cloud Run two completely different ways, each with real trade-offs you need to understand before choosing one for production.
🔨 Ollama — model baked into the container. Instant cold starts. Rebuild to update.
⚡ vLLM — model mounted from Cloud Storage via FUSE. Slower first boot, but swap models without redeploying.
Both use Cloud Run GPUs, scale to zero, and ship through automated CI/CD with Cloud Build.
We build both. You decide which fits. 👇
📦 CI/CD with Cloud Build
🖥️ GPU-accelerated serverless inference
🔄 Baked-in vs. decoupled model architecture
🚀 Scale to zero
⚖️ Cold start speed vs. production agility
Chapters:
0:00 - Intro
6:08 - Getting started with Agentverse lab
7:57 - Laying the foundations of the citadel
16:07 - Forging the power core: Self-hosted LLMs
28:02 - Forging the citadel's central core: Deploy vLLM
43:59 - Summary
More resources:
Cloud Run GPU documentation → goo.gle/4sEbTvG
Ollama documentation → goo.gle/3Qdi64w
vLLM documentation → goo.gle/4cvvxE9
Cloud Storage FUSE → goo.gle/4cQAb0V
Watch more Hands on AI → youtube.com/watch?v=qCBreTfjFH...
🔔 Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech
#Gemma4 #CloudRun
Speakers: Ayo Adedeji, Annie Wang
Products Mentioned: Agent Development Kit, Gemini API, Cloud Run