Model types and performance bottlenecks


Google Cloud Platform · 1.37M


Published November 12, 2025, 23:17

Learn why your powerful new AI model might be running slowly during inference. This video dives into the landscape of modern AI models, including Large Language Models (LLMs), Diffusion Models, Visual Language Models (VLMs), and Mixture of Experts (MoE). We uncover the four common performance bottlenecks—compute, memory capacity, memory bandwidth, and networking—and provide practical strategies for engineers to identify and address these issues, helping you achieve optimal performance for your AI applications.
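As a rough illustration of how two of these bottlenecks (compute and memory bandwidth) can be told apart, here is a minimal roofline-style sketch. It is not taken from the video; the function name and the hardware figures (100 TFLOP/s, 1 TB/s) are hypothetical placeholders, and the workload numbers are a back-of-the-envelope estimate for generating one token with a 7-billion-parameter model stored in 2 bytes per weight.

```python
def classify_bottleneck(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Roofline-style check: compare time spent on math vs. time moving data.

    flops          -- floating-point operations the step performs
    bytes_moved    -- bytes read from or written to device memory
    peak_flops     -- accelerator peak throughput in FLOP/s (assumed value)
    peak_bandwidth -- accelerator memory bandwidth in bytes/s (assumed value)
    """
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    # Whichever resource needs more time is the limiting one.
    bound = "compute" if compute_time >= memory_time else "memory bandwidth"
    return bound, compute_time, memory_time


# Hypothetical accelerator: 100 TFLOP/s, 1 TB/s.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 1e12

# One decode step of a 7B-parameter model at 2 bytes per weight:
# roughly 2 FLOPs per parameter, and every weight is read once.
params = 7e9
bound, t_compute, t_memory = classify_bottleneck(
    flops=2 * params,
    bytes_moved=2 * params,
    peak_flops=PEAK_FLOPS,
    peak_bandwidth=PEAK_BANDWIDTH,
)
print(bound, t_compute, t_memory)
```

Under these assumed numbers the step spends far longer moving weights than doing arithmetic, which is why single-token generation is often limited by memory bandwidth rather than by compute.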

Chapters:
0:00 - Introduction: Why is my AI model slow?
0:47 - The 4 types of modern AI models
2:03 - The four common bottlenecks
4:12 - Practical strategies for LLMs (Quantization)
4:52 - Practical strategies for Diffusion Models
5:24 - Practical strategies for Mixture of Experts (MoE)
5:53 - Conclusion: A playbook for performance
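
The quantization chapter concerns the memory capacity bottleneck. A minimal sketch of the underlying arithmetic follows, assuming weight memory is simply parameter count times bits per parameter; the helper name and the 7-billion-parameter model size are illustrative and not from the video, and activations and KV cache are ignored.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative model size (assumption): 7 billion parameters.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(7e9, bits):.1f} GB")
```

Halving the bits per weight halves both the memory the weights occupy and the bytes that must be read per token, so quantization can ease the capacity and bandwidth bottlenecks together.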

Resources:
AI Hypercomputer overview → goo.gle/3JMXNb2
Introduction to Cloud TPU → goo.gle/4nMA0WE

Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #LLM #VLM #AIModel

Speakers: Duncan Campbell
Products Mentioned: AI Infrastructure
