How to autoscale a TGI deployment on GKE

Published November 26, 2024, 17:00
Tutorial: Configure autoscaling for TGI on GKE → goo.gle/3Z9a7WK
Learn more about observability on GKE → goo.gle/4951bWY
Hugging Face TGI (Text Generation Inference) → goo.gle/4hXScLk

Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs. TGI is production-ready, with built-in support for observability and metrics. Watch along as Googlers Wietse Venema and Abdel Sghiouar demonstrate how to autoscale TGI workloads on Google Kubernetes Engine (GKE) using TGI queue size as the scaling signal.
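The approach shown in the video can be sketched as a HorizontalPodAutoscaler manifest. This is a minimal illustration, not the tutorial's exact configuration: it assumes TGI's `tgi_queue_size` metric is scraped and made available to the autoscaler via Google Cloud Managed Service for Prometheus, and the deployment name `tgi-server` and the target value are illustrative.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tgi-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tgi-server        # illustrative name for the TGI deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          # TGI exposes a tgi_queue_size gauge; here it is assumed to be
          # exported through Google Cloud Managed Service for Prometheus
          name: prometheus.googleapis.com|tgi_queue_size|gauge
        target:
          type: AverageValue
          averageValue: "10"  # illustrative threshold: scale out when the
                              # average request queue per replica exceeds 10
```

Scaling on queue size rather than CPU or GPU utilization reacts directly to request backlog, which tracks LLM serving load more closely than hardware counters.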

More resources:
Learn more about the TGI architecture → goo.gle/3Oo8mzY
A deep dive into autoscaling LLM workloads on GKE → goo.gle/4fKpD2t

Watch more Google Cloud: Building with Hugging Face → goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #HuggingFace

Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Google Kubernetes Engine, Gemma