Compiler Optimizations for Graph Algorithms on GPUs

1 292

16.6

Microsoft Research335 тыс

Следующее

16.08.16 – 1 30034:51

Data Science Summer School 2016 - Airbrb: Predicting Loyalty

Популярные

202 дня – 8821:02:50

Making Sentence Embeddings Robust to User-Generated Content

02.11.23 – 3101:24

Generative AI & Plural Governance: Mitigating Challenges & Surfacing Opportunities | Madeleine Daepp

Опубликовано 16 августа 2016, 16:35

Graphics Processing Units (GPUs) are an attractive target for graph algorithms because they support massively parallel execution and possess much higher memory bandwidths than CPUs and FPGAs. However, graph programs are also highly irregular and low-level GPU programming models make writing high-performance implementations very challenging. I will present IrGL, an explicitly parallel intermediate representation for irregular algorithms, and its associated compiler that produces optimized CUDA code for NVIDIA GPUs. Evaluated on eight graph algorithms, the IrGL compiler produces code that outperforms all current state-of-the-art handwritten CUDA implementations. To achieve this required recognizing that three optimizations, which we call throughput optimizations, are fundamental to high-performance GPU implementations of graph algorithms. Crucially, all three optimizations are well within the capabilities of a compiler. Graph program performance strongly depends on the inputs, and this makes picking the right set of optimizations very difficult. I will show how combining queuing network models with our compiler has helped to generate sound performance models of graph programs on GPUs and has led us to conclude that a one-size-fits-all approach to optimization not only does not suffice for graph programs, it can actually harm performance.

Свежие видео