Autoscaling Your AI Agent Under Load

Published October 21, 2025, 22:54
This video demonstrates how to autoscale your AI agent under heavy user load. We run a stress test against a decoupled architecture that pairs a GPU-powered Gemma LLM with a lightweight ADK agent, both on Google Cloud Run. See how Cloud Run provisions instances as demand rises, scaling gracefully and staying cost-efficient by scaling only the bottleneck component.
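To illustrate why scaling only the bottleneck keeps costs down, here is a simplified Python model. It is not Cloud Run's actual scaling algorithm, which also weighs CPU utilization and instance startup time. It assumes each service needs roughly enough instances to keep every instance at or below its per-request concurrency limit. The service names and the GPU service's limits are hypothetical; the agent's concurrency of 80 mirrors Cloud Run's default per-instance concurrency.

```python
import math
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    max_concurrency: int  # concurrent requests one instance accepts (assumed value)
    max_instances: int    # upper bound on instances (assumed value)


def instances_needed(service: Service, concurrent_requests: int) -> int:
    """Simplified model: the fewest instances that keep each at or below
    max_concurrency, capped at max_instances. Zero load scales to zero."""
    if concurrent_requests <= 0:
        return 0
    needed = math.ceil(concurrent_requests / service.max_concurrency)
    return min(needed, service.max_instances)


# Hypothetical settings: the lightweight agent packs many requests per
# instance, while the GPU-backed LLM can serve only a few at once.
agent = Service("adk-agent", max_concurrency=80, max_instances=10)
llm = Service("gemma-gpu", max_concurrency=4, max_instances=5)

for users in (1, 10, 20):
    print(f"{users:>3} users -> agent: {instances_needed(agent, users)}, "
          f"llm: {instances_needed(llm, users)}")
```

In this model, 20 concurrent users still fit on one agent instance while the LLM service grows to its cap of 5. Additional capacity, and therefore cost, goes only to the component that is actually saturated.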


Chapters:
0:00 - Introduction: The Challenge of Load
0:19 - Load Testing with Locust
1:31 - Observing Autoscaling in Cloud Run
2:02 - Key Learnings: Decoupling and Cost Efficiency
2:31 - Conclusion


Resources:
Codelab → goo.gle/475sUpV
GitHub Repository → goo.gle/3KJVc1Y
Google Cloud Run GPU → goo.gle/48sn3NV
ADK Documentation → goo.gle/3LauFL8


Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #LLM #Gemma #ADK #CloudRun

Speakers: Amit Maraj
Products Mentioned: Cloud Run, Gemma, AI Infrastructure, Cloud GPUs