Scaling Verily's Whole Genome Sequencing with Apache Beam (Cloud Next '18)

Published July 26, 2018, 22:48
Whole genome sequencing data processing is a long-running pipeline that moves large amounts of data per sample through data quality and statistical tools. Verily Life Sciences runs its whole genome sequencing data processing pipeline entirely on Google Cloud. In this session, we will talk about how we switched parts of the pipeline to Apache Beam on Dataflow to gain flexibility in trading cost against throughput. We saw our speed triple. We will discuss the APIs we used, the ease of use of the new system, and what we learned.

DA201

Event schedule → g.co/next18

Watch more Data Analytics sessions here → bit.ly/2KXMtcJ
Next ’18 All Sessions playlist → bit.ly/Allsessions

Subscribe to the Google Cloud channel! → bit.ly/NextSub


re_ty: Publish; product: Cloud - General; fullname: Jean-Philippe Martin; event: Google Cloud Next 2018;