AWS Summit Tel Aviv 2016: Data Science & Best Practices for Apache Spark on Amazon EMR

Amazon Web Services

Published July 14, 2016, 15:40

Learn More: amzn.to/29yWmWW

- - - - - - - - - - - - - - - - -
Session Language: Hebrew

Organizations need to perform increasingly complex analysis on their data (streaming analytics, ad-hoc querying, and predictive analytics) to gain better customer insights and actionable business intelligence. However, growing data volumes, higher data velocity, and the complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice for addressing these challenges.

Spark is a general-purpose processing framework that follows a directed acyclic graph (DAG) execution model and provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its in-memory datasets (Resilient Distributed Datasets, or RDDs), built-in libraries, fault tolerance, and support for several programming languages, Apache Spark lets developers implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations, and predictive analytics.
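To make the DAG and RDD ideas above a little more concrete, here is a toy, single-process Python model. It is a sketch under stated assumptions, not Spark itself: the class `ToyRDD` and its methods are hypothetical and only mimic the shape of Spark's `map`/`filter`/`cache`/`collect`. It shows three properties: transformations are lazy and only extend the lineage (the DAG), actions trigger computation, and a lost in-memory copy can be rebuilt by replaying the lineage, which is the basic idea behind RDD fault tolerance.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy, single-process model of an RDD (hypothetical; not Spark's API).

    Each dataset stores only how to derive itself from its parent (its
    lineage), so transformations are lazy and lost data can be recomputed.
    """

    compute_calls = 0  # counts how many times any dataset is actually computed

    def __init__(self, compute: Callable[[], List], lineage: List[str]):
        self._compute = compute      # how to (re)build this dataset from its parent
        self.lineage = lineage       # record of the transformations applied so far
        self._persist = False        # whether to keep the result in memory
        self._cached: Optional[List] = None

    @staticmethod
    def parallelize(data: Iterable) -> "ToyRDD":
        items = list(data)
        return ToyRDD(lambda: list(items), ["parallelize"])

    # Transformations: add a node to the DAG, compute nothing yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [f(x) for x in self._materialize()],
                      self.lineage + ["map"])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [x for x in self._materialize() if pred(x)],
                      self.lineage + ["filter"])

    def cache(self) -> "ToyRDD":
        # Keep this dataset in memory after its first computation.
        self._persist = True
        return self

    def drop_cache(self) -> None:
        # Simulate losing the in-memory copy (e.g. an executor failure).
        self._cached = None

    def _materialize(self) -> List:
        if self._cached is not None:
            return self._cached
        ToyRDD.compute_calls += 1
        result = self._compute()     # walks back through the lineage as needed
        if self._persist:
            self._cached = result
        return result

    # Actions: force evaluation of the whole lineage.
    def collect(self) -> List:
        return list(self._materialize())

    def count(self) -> int:
        return len(self._materialize())


if __name__ == "__main__":
    nums = ToyRDD.parallelize(range(10))
    evens_squared = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
    print(ToyRDD.compute_calls)      # 0: nothing computed until an action runs
    print(evens_squared.collect())   # [0, 4, 16, 36, 64]
    evens_squared.drop_cache()       # lose the cached copy...
    print(evens_squared.collect())   # ...and rebuild it from the lineage
```

In real Spark the lineage is split into stages across partitions on many executors; this sketch deliberately keeps everything in one list to isolate the lazy-evaluation and recompute-from-lineage behavior.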

In this session, we present a technical deep dive on Spark running on Amazon EMR. You will learn why Spark is well suited to ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable Spark clusters on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for running Spark on Amazon EMR.

Свежие видео