Deploy Gemma 2 with multiple LoRA adapters on GKE

Published December 10, 2024, 17:00
Tutorial: Deploy Gemma 2 with multiple LoRA adapters using TGI on GKE → goo.gle/4f5KP1C
Video: Train a LoRA adapter with your own dataset → goo.gle/4gkBLar
Deep dive: A conceptual overview of Low-Rank Adaptation (LoRA) → goo.gle/4in4NrA

Learn how to serve multiple LoRA adapters from a single deployment on Google Kubernetes Engine. Low-Rank Adaptation, or LoRA, is a fine-tuning technique that adapts a base model to specific tasks without retraining the entire model. Watch along and learn how to use Gemma 2, a powerful open large language model, with TGI, an open-source LLM inference server from Hugging Face, to serve multiple LoRA adapters, each fine-tuned for a different task.
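For context on why one deployment can host many adapters: the standard LoRA formulation freezes the base model weights W_0 and learns only a small low-rank update, so every adapter can share the same copy of the base model in memory:

W = W_0 + B·A,  with B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), and rank r ≪ min(d, k)

Because r is small, each adapter adds only the two thin matrices B and A on top of the shared base weights.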

More resources:
Docs: Hugging Face Hub Inference client → goo.gle/3Zrwo2c (see the client sketch after this list)
Docs: An overview of the TGI command line interface flags → goo.gle/41Fs1nd
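As a rough sketch of what querying such a multi-adapter deployment can look like with the Hugging Face Hub Inference client: the endpoint URL and adapter IDs below are placeholders, and the server is assumed to have been started with TGI's multi-LoRA support as shown in the tutorial above.

from huggingface_hub import InferenceClient

# Placeholder endpoint: e.g. a kubectl port-forward to the TGI Service on GKE.
client = InferenceClient("http://localhost:8080")

# One TGI server, one shared Gemma 2 base model, several LoRA adapters.
# The adapter IDs below are hypothetical; use the ones you configured
# when starting TGI with its multi-LoRA adapter setting.
for adapter_id in ("my-org/gemma-2-sql-lora", "my-org/gemma-2-support-lora"):
    answer = client.text_generation(
        "List three ways to filter rows in SQL.",
        max_new_tokens=128,
        adapter_id=adapter_id,  # routes this request to that LoRA adapter
    )
    print(f"--- {adapter_id} ---\n{answer}\n")

Omitting adapter_id sends the request to the plain Gemma 2 base model.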

Watch more Google Cloud: Building with Hugging Face → goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #HuggingFace

Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Gemma, Gemini, Google Kubernetes Engine (GKE)