AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ecosystem (BDM306)

Published December 1, 2016, 21:26
Amazon S3 is the central data hub for Netflix's big data ecosystem. We currently have over 1.5 billion objects and 60+ PB of data stored in S3. As we ingest, transform, transport, and visualize data, we find this data naturally weaving in and out of S3. Amazon S3 gives us the flexibility to use an interoperable set of big data processing tools like Spark, Presto, Hive, and Pig. It serves as the hub for transporting data to additional data stores and engines like Teradata, Redshift, and Druid, as well as for exporting data to reporting tools like MicroStrategy and Tableau. Over time, we have built an ecosystem of services and tools to manage our data on S3. We have a federated metadata catalog service that keeps track of all our data. We have a set of data lifecycle management tools that expire data based on business rules and compliance requirements. We also have a portal that allows users to see the cost and size of their data footprint. In this talk, we'll dive into these major uses of S3, as well as many smaller cases where S3 smoothly addresses an important data infrastructure need. We will also provide solutions and methodologies for building your own S3 big data hub.
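As one illustration of rule-based expiration on S3 (a minimal sketch only, not Netflix's actual lifecycle tooling; the bucket name, rule ID, prefix, and retention period below are hypothetical), a lifecycle rule that expires objects under a prefix after a set number of days can be applied with boto3:

    import boto3

    s3 = boto3.client("s3")

    # Apply a lifecycle rule that expires objects under the "staging/"
    # prefix 90 days after creation. Bucket and values are illustrative.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-data-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-staging-data",
                    "Filter": {"Prefix": "staging/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 90},
                }
            ]
        },
    )

In practice, a fleet-wide lifecycle service would derive the prefixes and retention periods from business rules and compliance metadata rather than hard-coding them as shown here.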