Deploy Gemma 2 with multiple LoRA adapters on GKE

Published December 10, 2024, 17:00
Tutorial: Deploy Gemma 2 with multiple LoRA adapters using TGI on GKE → goo.gle/4f5KP1C
Video: Train a LoRA adapter with your own dataset → goo.gle/4gkBLar
Deep dive: A conceptual overview of Low-Rank Adaptation (LoRA) → goo.gle/4in4NrA

Learn how to serve multiple LoRA adapters from a single deployment on Google Kubernetes Engine. Low-Rank Adaptation, or LoRA, is a fine-tuning technique that adapts a base model to specific tasks without retraining the entire model. Watch along and learn how to use Gemma 2, a powerful open large language model, with TGI, an open-source LLM inference server from Hugging Face, to serve multiple LoRA adapters, each fine-tuned for a different task.
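For context on why one deployment can host many adapters: the standard LoRA formulation freezes the base model weights W_0 and learns only a small low-rank update, so every adapter can share the same copy of the base model in memory:

W = W_0 + B·A,  with B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), and rank r ≪ min(d, k)

Because r is small, each adapter adds only the two thin matrices B and A on top of the shared base weights.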

More resources:
Docs: Hugging Face Hub Inference client → goo.gle/3Zrwo2c (see the client sketch after this list)
Docs: An overview of the TGI command line interface flags → goo.gle/41Fs1nd
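As a rough sketch of what querying such a multi-adapter deployment can look like with the Hugging Face Hub Inference client: the endpoint URL and adapter IDs below are placeholders, and the server is assumed to have been started with TGI's multi-LoRA support as shown in the tutorial above.

from huggingface_hub import InferenceClient

# Placeholder endpoint: e.g. a kubectl port-forward to the TGI Service on GKE.
client = InferenceClient("http://localhost:8080")

# One TGI server, one shared Gemma 2 base model, several LoRA adapters.
# The adapter IDs below are hypothetical; use the ones you configured
# when starting TGI with its multi-LoRA adapter setting.
for adapter_id in ("my-org/gemma-2-sql-lora", "my-org/gemma-2-support-lora"):
    answer = client.text_generation(
        "List three ways to filter rows in SQL.",
        max_new_tokens=128,
        adapter_id=adapter_id,  # routes this request to that LoRA adapter
    )
    print(f"--- {adapter_id} ---\n{answer}\n")

Omitting adapter_id sends the request to the plain Gemma 2 base model.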

Watch more Google Cloud: Building with Hugging Face → goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

#GoogleCloud #HuggingFace

Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Gemma, Gemini, Google Kubernetes Engine (GKE)