Model types and performance bottlenecks

Published November 12, 2025, 23:17
Learn why your powerful new AI model might be running slowly during inference. This video dives into the landscape of modern AI models, including Large Language Models (LLMs), Diffusion Models, Visual Language Models (VLMs), and Mixture of Experts (MoE). We uncover the four common performance bottlenecks—compute, memory capacity, memory bandwidth, and networking—and provide practical strategies for engineers to identify and address these issues, helping you achieve optimal performance for your AI applications.
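
As a rough illustration of how two of the four bottlenecks (compute and memory bandwidth) can be told apart, here is a minimal roofline-style sketch. It is not taken from the video: the function name `classify_bottleneck`, the hardware peaks, and the 7e9-parameter model are all hypothetical values chosen for the example, and the cost model (about 2 FLOPs per parameter per token, every weight read once per step) is a common simplification.

```python
def classify_bottleneck(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Compare the time needed for arithmetic with the time needed to move data.

    Whichever takes longer is the likely limiting factor for this step.
    """
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / peak_bytes_per_s
    return "compute" if compute_time >= memory_time else "memory bandwidth"


# Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BW = 1e12

# Hypothetical model: 7e9 parameters stored as 2-byte weights.
params = 7e9
weight_bytes = 2 * params

# Generating one token: about 2 FLOPs per parameter, all weights read once.
decode = classify_bottleneck(2 * params, weight_bytes, PEAK_FLOPS, PEAK_BW)

# Processing 512 tokens in one pass: the same weights are reused 512 times.
prefill = classify_bottleneck(2 * params * 512, weight_bytes, PEAK_FLOPS, PEAK_BW)

print("single-token decode:", decode)
print("512-token batch:", prefill)
```

Under these assumed numbers, the single-token step spends far longer reading weights than computing, while the 512-token pass does enough arithmetic per byte to become compute-limited. Memory capacity and networking limits need separate checks (for example, whether the weights fit on one device at all).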

Chapters:
0:00 - Introduction: Why is my AI model slow?
0:47 - The 4 types of modern AI models
2:03 - The four common bottlenecks
4:12 - Practical strategies for LLMs (Quantization)
4:52 - Practical strategies for Diffusion Models
5:24 - Practical strategies for Mixture of Experts (MoE)
5:53 - Conclusion: A playbook for performance
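
The chapter list names quantization as a strategy for LLMs. The description does not say which quantization scheme the video covers, so the following is only a back-of-the-envelope sketch of why lower-precision weights help with memory capacity and bandwidth. The helper `weight_memory_gib` and the 7e9-parameter model size are hypothetical, and the estimate ignores activations, caches, and any per-group scaling metadata.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory needed to store model weights, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 7e9-parameter model at three weight precisions.
num_params = 7e9
for bits in (16, 8, 4):
    gib = weight_memory_gib(num_params, bits)
    print(f"{bits:>2}-bit weights: {gib:.2f} GiB")
```

Halving the bits per weight halves both the storage footprint and the bytes that must be streamed from memory on each step, which is why quantization targets the capacity and bandwidth bottlenecks rather than raw compute.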

Resources:
AI Hypercomputer overview → goo.gle/3JMXNb2
Introduction to Cloud TPU → goo.gle/4nMA0WE


Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #LLM #VLM #AIModel

Speakers: Duncan Campbell
Products Mentioned: AI Infrastructure