High-Throughput Data-Intensive Computing: Shared-Scan Scheduling in Scientific Databases & the Cloud

Published August 17, 2016, 2:22
Data-intensive computing consists of batch-processing workloads that scan massive data sets in parallel. Because these workloads are dominated by data access, data movement, data ingest, and data production, they overwhelm the network and I/O capabilities of data centers and supercomputers. Major throughput improvements are available by co-scheduling tasks that access the same data, so that multiple tasks complete their processing while the data is read and transferred only once. Co-scheduled tasks share I/O, network data transfer, cache space, and even computation via SIMD or vector processing. This talk reviews the evolution of co-scheduling in data-intensive computing systems, including shared-scan scheduling for map/reduce workloads (Agrawal et al., VLDB 2008), data-driven batch processing for scientific databases (LifeRaft and JAWS), shared streaming I/O for spatial workloads, and shared join processing for Pig programs and Nova workflows.
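The core idea of the abstract can be illustrated with a minimal sketch: group pending tasks by the data set they scan, read each data set once, and hand every chunk to all tasks that need it, amortizing I/O across the batch. This is a hypothetical illustration, not the LifeRaft/JAWS or map/reduce implementation discussed in the talk; the function names and data are invented for the example.

```python
# Hypothetical sketch of shared-scan scheduling (not the systems from the talk).
from collections import defaultdict

def shared_scan(tasks, read_chunks):
    """tasks: list of (dataset_id, process_fn) pairs.
    read_chunks: callable mapping a dataset_id to an iterable of chunks."""
    # Group tasks by the data set they need, so each set is scanned once.
    by_dataset = defaultdict(list)
    for dataset_id, fn in tasks:
        by_dataset[dataset_id].append(fn)

    results = defaultdict(list)
    for dataset_id, fns in by_dataset.items():
        for chunk in read_chunks(dataset_id):   # single pass over the data
            for fn in fns:                      # every co-scheduled task sees the chunk
                results[fn].append(fn(chunk))
    return results

# Example: two tasks over the same data set share one scan.
data = {"survey": [1, 2, 3, 4]}
total = lambda c: c
square = lambda c: c * c
out = shared_scan([("survey", total), ("survey", square)],
                  lambda ds: iter(data[ds]))
```

Without co-scheduling, each task would trigger its own pass over `data["survey"]`; here both tasks complete after one traversal, which is the source of the throughput gains the talk describes.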