How to autoscale a TGI deployment on GKE

Published November 26, 2024, 17:00
Tutorial: Configure autoscaling for TGI on GKE → goo.gle/3Z9a7WK
Learn more about observability on GKE → goo.gle/4951bWY
Hugging Face TGI (Text Generation Inference) → goo.gle/4hXScLk

Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs. TGI is production-ready, with built-in support for observability and metrics. Watch along as Googlers Wietse Venema and Abdel Sghiouar demonstrate how to autoscale TGI workloads on Google Kubernetes Engine (GKE) using TGI queue size as the scaling signal.
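As a rough sketch of the approach described above: TGI exports a `tgi_queue_size` Prometheus metric, and once that metric is collected on GKE (for example via Google Cloud Managed Service for Prometheus), a HorizontalPodAutoscaler can target it. The Deployment name, replica bounds, and target value below are illustrative assumptions, not values from the video; see the linked tutorial for the exact setup.

```yaml
# Hypothetical HPA sketch: scale a TGI Deployment on average queue size.
# Assumes the tgi_queue_size metric is already exported to Managed Service
# for Prometheus and exposed through the custom metrics adapter.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tgi-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tgi-gemma-deployment   # assumed Deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: prometheus.googleapis.com|tgi_queue_size|gauge
      target:
        type: AverageValue
        averageValue: "10"       # assumed target: avg queued requests per replica
```

With an `AverageValue` target, the HPA adds replicas when the mean queue depth across TGI pods exceeds the target, and scales back down as the queue drains.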

More resources:
Learn more about the TGI architecture → goo.gle/3Oo8mzY
A deep dive into autoscaling LLM workloads on GKE → goo.gle/4fKpD2t

Watch more Google Cloud: Building with Hugging Face → goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #HuggingFace

Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Google Kubernetes Engine, Gemma