Google Cloud Platform (1.17M subscribers)
Published April 12, 2019, 21:05
dunnhumby uses Dataproc as a data platform on which our data science and product teams run ETL and machine learning routines. We encourage product teams to spin up clusters autonomously, only when they need them, and to use Apache Airflow to coordinate workloads. We share a Hive metastore across those many short-lived clusters and isolate workloads following the principle of least privilege. We provide JupyterLab and other utilities for data engineers and scientists to work with. Come and learn how we do it.
Watch more:
Next '19 Data Analytics Sessions here → bit.ly/Next19DataAnalytics
Next '19 All Sessions playlist → bit.ly/Next19AllSessions
Subscribe to the GCP Channel → bit.ly/GCloudPlatform
Speaker(s): Jamie Thomson
Session ID: DA210
product: Cloud - Data Analytics - Dataproc; fullname: Jamie Thomson; event: Google Cloud Next 2019;