Generation of dense linear algebra software for shared memory and multicore architectures

Published September 6, 2016, 17:01
When writing scientific computing software, programmers often need to identify which algorithm will perform best in a given situation. In dense linear algebra, the answer depends on many factors, ranging from processor type and architectural features to matrix size and the performance signature of the underlying BLAS library. In this talk I will show that when targeting shared-memory and multicore processors, one must take into account not only different algorithms but also different types of parallelism. I will illustrate two approaches: one uses blocking and careful scheduling to attain high performance, while the other leverages a multithreaded BLAS. In addition, I will discuss how the generation of algorithms and code can be automated in both scenarios.