Word clustering in a Dataflow ML Pipeline: Part 1

Published on 17 May 2023, 23:31
In this video, Aniket Agrawal, Strategic Cloud Engineer at Google, shows how to run ML and NLP operations at scale in a data processing pipeline. In this first part of the demo, he applies Dataflow ML to a well-known NLP application: word clustering. Working in a Vertex AI user-managed notebook, he runs a spaCy model and a scikit-learn model in sequence to group 300-dimensional word embedding vectors into four BIRCH clusters. If you are an NLP or ML enthusiast, this video is for you!
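
For a feel of the clustering step before watching, here is a minimal sketch of fitting four BIRCH clusters on 300-dimensional vectors with scikit-learn. It is not the code from the video: the vectors are synthetic stand-ins for spaCy word embeddings (an assumption made so the example runs without downloading a model), and it runs as plain Python rather than inside a Dataflow pipeline. The names vectors, centers, and true_groups are hypothetical.

```python
import numpy as np
from sklearn.cluster import Birch

# Assumption: synthetic data stands in for spaCy word embeddings.
# 40 vectors of dimension 300, drawn around 4 well-separated centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(4, 300))
true_groups = np.repeat(np.arange(4), 10)
vectors = centers[true_groups] + rng.normal(scale=0.05, size=(40, 300))

# Fit BIRCH and ask for four final clusters, as in the description.
birch = Birch(n_clusters=4)
labels = birch.fit_predict(vectors)

# Show which cluster label each synthetic group received.
for group in range(4):
    print(group, labels[true_groups == group])
```

In a real run, each row of vectors would be the embedding of one word, and the labels would indicate which of the four clusters that word belongs to.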

Check out my Google Cloud Medium blog → goo.gle/3W7Rd0L
Learn more about RunInference → goo.gle/3Mzyc31
Bring your own ML model to Beam RunInference → goo.gle/3oefcPm
Study the next generation of Dataflow → goo.gle/3Wtu49j
Explore the powers of Vertex AI Workbench → goo.gle/41IZQzT

Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech