Tutorial: High-Performance Hardware for Machine Learning

Published June 6, 2016, 23:00
This tutorial will survey the state of the art in high-performance hardware for machine learning, with an emphasis on hardware for training and deployment of deep neural networks (DNNs). We establish a baseline by characterizing the performance and efficiency (perf/W) of DNNs implemented on conventional CPUs. GPU implementations of DNNs improve substantially on this baseline and perform best at moderate batch sizes; we examine the sensitivity of performance to batch size. Training of DNNs can be accelerated further using both model and data parallelism, at the cost of inter-processor communication, and we examine common parallel formulations and the communication traffic they induce. Training and deployment can also be accelerated by using reduced precision for weights and activations, and we examine the tradeoff between accuracy and precision in these networks. We close with a discussion of dedicated hardware for machine learning: we survey recent publications on this topic and make some general observations about the relative importance of arithmetic and memory bandwidth in such dedicated hardware.
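Since the abstract weighs data-parallel communication traffic against arithmetic and memory bandwidth, a short back-of-the-envelope sketch in Python may help make those quantities concrete. It is illustrative only: the model size, layer dimensions, batch size, worker count, and byte widths below are assumed values, not figures from the tutorial.

# Back-of-the-envelope estimates of two quantities the tutorial discusses:
# (1) gradient traffic per training step under data parallelism, and
# (2) arithmetic intensity (FLOPs per byte) of a fully connected layer,
#     which suggests whether hardware is limited by arithmetic or by
#     memory bandwidth. All sizes below are assumed, illustrative values.

num_weights     = 60e6   # hypothetical model size (parameters)
bytes_per_value = 4      # FP32; reduced precision (e.g. 16-bit) halves this
num_workers     = 8      # data-parallel workers

# Data parallelism: each step every worker exchanges a full gradient.
# A ring all-reduce moves roughly 2*(N-1)/N times the model size per worker.
grad_bytes = 2 * (num_workers - 1) / num_workers * num_weights * bytes_per_value
print(f"gradient traffic per worker per step: {grad_bytes / 1e6:.0f} MB")

# Arithmetic intensity of one fully connected layer, y = W x, over a batch.
batch, n_in, n_out = 128, 4096, 4096
flops = 2 * batch * n_in * n_out                            # multiply-accumulates
bytes_moved = bytes_per_value * (n_in * n_out               # weights
                                 + batch * (n_in + n_out))  # activations in/out
print(f"arithmetic intensity: {flops / bytes_moved:.1f} FLOPs/byte")

# Larger batches amortize weight traffic and raise arithmetic intensity;
# reduced precision shrinks both the gradient and the memory traffic.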

research.microsoft.com