Google Cloud Platform
Published November 26, 2024, 17:00
Tutorial: Configure autoscaling for TGI on GKE → goo.gle/3Z9a7WK
Learn more about observability on GKE → goo.gle/4951bWY
Hugging Face TGI (Text Generation Inference) → goo.gle/4hXScLk
Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs. TGI is production-ready, with built-in support for observability and metrics. Watch along as Googlers Wietse Venema and Abdel Sghiouar demonstrate how to autoscale TGI workloads on Google Kubernetes Engine (GKE) using the TGI queue size as the scaling signal.
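The setup demonstrated in the video can be sketched as a HorizontalPodAutoscaler that targets a TGI Deployment and scales on queue size. This is a minimal illustrative sketch, not the tutorial's exact manifest: the Deployment name, replica bounds, and the metric path are assumptions, and it presumes TGI's Prometheus metrics have already been exported to Cloud Monitoring (the linked tutorial covers that wiring).

```yaml
# Minimal sketch: scale a TGI Deployment on queue size.
# Assumes TGI's Prometheus metrics are collected by Google Managed
# Prometheus and exposed to the HPA via the custom metrics adapter.
# All names and thresholds below are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tgi-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tgi-server            # hypothetical TGI Deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: prometheus.googleapis.com|tgi_queue_size|gauge
      target:
        type: AverageValue
        averageValue: "10"      # aim for ~10 queued requests per replica
```

Using queue size rather than CPU or GPU utilization lets the autoscaler react to request backlog directly, which tends to track LLM serving latency more closely.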
More resources:
Learn more about the TGI architecture → goo.gle/3Oo8mzY
A deep dive into autoscaling LLM workloads on GKE → goo.gle/4fKpD2t
Watch more Google Cloud: Building with Hugging Face → goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech
#GoogleCloud #HuggingFace
Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Google Kubernetes Engine, Gemma