Microsoft Research
Published August 16, 2016, 16:35
Graphics Processing Units (GPUs) are an attractive target for graph algorithms because they support massively parallel execution and possess much higher memory bandwidths than CPUs and FPGAs. However, graph programs are highly irregular, and low-level GPU programming models make writing high-performance implementations very challenging. I will present IrGL, an explicitly parallel intermediate representation for irregular algorithms, and its associated compiler, which produces optimized CUDA code for NVIDIA GPUs. Evaluated on eight graph algorithms, the IrGL compiler produces code that outperforms all current state-of-the-art handwritten CUDA implementations. Achieving this required recognizing that three optimizations, which we call throughput optimizations, are fundamental to high-performance GPU implementations of graph algorithms. Crucially, all three optimizations are well within the capabilities of a compiler. Graph program performance depends strongly on the inputs, which makes picking the right set of optimizations very difficult. I will show how combining queuing network models with our compiler has helped to generate sound performance models of graph programs on GPUs, and has led us to conclude that a one-size-fits-all approach to optimization not only does not suffice for graph programs but can actually harm performance.
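To give a flavor of the low-level code the abstract refers to, below is a minimal CUDA sketch of an irregular, worklist-driven graph kernel (level-synchronous BFS over a CSR graph). It is not IrGL output or code from the talk; the kernel name, the tiny example graph, and the launch parameters are illustrative assumptions. Hand-writing and tuning kernels of this shape is exactly the burden the IrGL compiler is meant to remove.

```cuda
// Minimal sketch (not IrGL-generated): level-synchronous BFS over a CSR graph
// using an explicit worklist. Graph and parameters are illustrative only.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void bfs_level(const int *row_off, const int *col_idx, int *dist,
                          const int *in_wl, int in_size,
                          int *out_wl, int *out_size, int level) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= in_size) return;
    int u = in_wl[i];
    // Irregular work: each vertex has a different number of neighbors.
    for (int e = row_off[u]; e < row_off[u + 1]; ++e) {
        int v = col_idx[e];
        // Claim unvisited neighbors atomically and push them onto the
        // next-level worklist.
        if (atomicCAS(&dist[v], -1, level + 1) == -1) {
            int pos = atomicAdd(out_size, 1);
            out_wl[pos] = v;
        }
    }
}

int main() {
    // Tiny example graph in CSR form: edges 0->1, 0->2, 1->3, 2->3.
    const int n = 4, m = 4;
    int h_row[] = {0, 2, 3, 4, 4};
    int h_col[] = {1, 2, 3, 3};

    int *row, *col, *dist, *wl[2], *wl_size;
    cudaMalloc(&row, (n + 1) * sizeof(int));
    cudaMalloc(&col, m * sizeof(int));
    cudaMalloc(&dist, n * sizeof(int));
    cudaMalloc(&wl[0], n * sizeof(int));
    cudaMalloc(&wl[1], n * sizeof(int));
    cudaMalloc(&wl_size, sizeof(int));
    cudaMemcpy(row, h_row, (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(col, h_col, m * sizeof(int), cudaMemcpyHostToDevice);

    int h_dist[n];
    for (int i = 0; i < n; ++i) h_dist[i] = -1;
    h_dist[0] = 0;                           // source vertex 0
    cudaMemcpy(dist, h_dist, n * sizeof(int), cudaMemcpyHostToDevice);
    int src = 0;
    cudaMemcpy(wl[0], &src, sizeof(int), cudaMemcpyHostToDevice);

    // Host-side loop: launch one kernel per BFS level, swapping worklists.
    int in_size = 1, level = 0;
    while (in_size > 0) {
        cudaMemset(wl_size, 0, sizeof(int));
        int threads = 128, blocks = (in_size + threads - 1) / threads;
        bfs_level<<<blocks, threads>>>(row, col, dist, wl[level % 2], in_size,
                                       wl[(level + 1) % 2], wl_size, level);
        cudaMemcpy(&in_size, wl_size, sizeof(int), cudaMemcpyDeviceToHost);
        ++level;
    }

    cudaMemcpy(h_dist, dist, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("dist[%d] = %d\n", i, h_dist[i]);
    return 0;
}
```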