Autoscaling your AI agent under load

Published October 21, 2025, 22:54
This video shows how to autoscale an AI agent under heavy user load. We run a stress test against a decoupled architecture that pairs a GPU-powered Gemma LLM with a lightweight ADK agent, both on Google Cloud Run. See how Cloud Run provisions resources to meet high demand, scaling gracefully and keeping costs down by scaling only the bottleneck component.
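
To see why decoupling lets only the bottleneck scale, here is a rough back-of-the-envelope sketch in Python. It assumes instance count is roughly in-flight requests divided by per-instance concurrency, capped at a maximum; the real Cloud Run autoscaler weighs more signals than this. The concurrency and max-instance numbers, and the names `instances_needed` and `plan`, are hypothetical and not taken from the video.

```python
import math

def instances_needed(in_flight: int, concurrency: int, max_instances: int) -> int:
    """Simplified estimate: ceil(in-flight requests / per-instance concurrency), capped."""
    if in_flight <= 0:
        return 0  # assumes minimum instances is 0 (scale to zero when idle)
    return min(max_instances, math.ceil(in_flight / concurrency))

# Hypothetical settings: the lightweight agent handles many requests per
# instance, while the GPU-backed model handles only a few at a time.
AGENT = {"concurrency": 80, "max_instances": 10}
GEMMA = {"concurrency": 4, "max_instances": 5}

def plan(in_flight: int) -> dict:
    """Estimate instance counts for both services at a given load."""
    return {
        "agent": instances_needed(in_flight, AGENT["concurrency"], AGENT["max_instances"]),
        "gemma": instances_needed(in_flight, GEMMA["concurrency"], GEMMA["max_instances"]),
    }

if __name__ == "__main__":
    for users in (0, 16, 1000):
        print(users, plan(users))
```

Under these assumed numbers, 16 concurrent requests need one agent instance but four Gemma instances, so the GPU service is the component that scales out while the agent stays small.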

Chapters:
0:00 - Introduction: The Challenge of Load
0:19 - Load Testing with Locust
1:31 - Observing Autoscaling in Cloud Run
2:02 - Key Learnings: Decoupling and Cost Efficiency
2:31 - Conclusion

Resources:
Codelab → goo.gle/475sUpV
GitHub Repository → goo.gle/3KJVc1Y
Google Cloud Run GPU → goo.gle/48sn3NV
ADK Documentation → goo.gle/3LauFL8

Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #LLM #Gemma #ADK #CloudRun

Speakers: Amit Maraj
Products Mentioned: Cloud Run, Gemma, AI Infrastructure, Cloud GPUs