Democratizing Dataproc (Cloud Next '19)

Published 12 April 2019, 21:05
dunnhumby uses Dataproc as a data platform on which our data science and product teams run ETL and machine learning routines. We encourage product teams to spin up clusters autonomously, only when they need them, and to use Apache Airflow to coordinate workloads. We share a Hive metastore across those many short-lived clusters and isolate workloads following the principle of least privilege. We provide JupyterLab and other utilities for data engineers and scientists to work with. Come and learn how we do it.


Watch more:
Next '19 Data Analytics Sessions here → bit.ly/Next19DataAnalytics
Next '19 All Sessions playlist → bit.ly/Next19AllSessions

Subscribe to the GCP Channel → bit.ly/GCloudPlatform


Speaker(s): Jamie Thomson


Session ID: DA210
product: Cloud - Data Analytics - Dataproc; fullname: Jamie Thomson; event: Google Cloud Next 2019;