Speed Up Inference with Mixed Precision | AI Model Optimization with Intel® Neural Compressor
Intel Software
Published July 26, 2023, 14:00
Learn the simplest model optimization technique for speeding up AI inference. Mixed precision, often used to speed up training, can also speed up inference without a significant loss of accuracy.
Mixed precision is a popular technique for speeding up the training of large AI models. It can also be a simple way to reduce model size and inference latency. The approach combines lower-precision floating-point formats, such as FP16 and bfloat16, with the original 32-bit floating-point parameters. Choosing how to mix formats requires assessing the effect on accuracy and knowing which formats a given device supports and which layers the model uses.
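To make the trade-off between these formats concrete, here is a minimal sketch using only the Python standard library. It is not taken from the video. The helper names (to_fp16, to_bf16, fits_fp16) are hypothetical, and bfloat16 is simulated by keeping the top 16 bits of the FP32 encoding, which is valid for ordinary finite values only.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (5 exponent bits, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Simulate bfloat16 (8 exponent bits, 7 mantissa bits).

    Assumption: x is an ordinary finite value; NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, before dropping the low 16 bits.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fits_fp16(x: float) -> bool:
    """Return True if x is within the finite range of FP16."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


def relative_error(exact: float, approx: float) -> float:
    """Relative rounding error introduced by a lower-precision format."""
    return abs(exact - approx) / abs(exact)


# Storage per parameter: FP32 uses 4 bytes, the 16-bit formats use 2 bytes.
BYTES_FP32 = struct.calcsize("<f")
BYTES_FP16 = struct.calcsize("<e")

small_weight = 0.1
large_activation = 70000.0

print("fp16(0.1) error:", relative_error(small_weight, to_fp16(small_weight)))
print("bf16(0.1) error:", relative_error(small_weight, to_bf16(small_weight)))
print("70000 fits in fp16:", fits_fp16(large_activation))
print("bf16(70000):", to_bf16(large_activation))
```

FP16 keeps more mantissa bits, so it rounds small weights more accurately, while bfloat16 keeps the FP32 exponent range and can represent values that overflow FP16. This is one reason the right mix depends on the layers involved and on the formats the device supports.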
Intel® Neural Compressor automatically mixes in lower-precision formats supported by the hardware and the model’s layers. This video shows how to get started, whether you’re using PyTorch*, TensorFlow*, or ONNX* Runtime. It also shows how to automatically assess the accuracy effects of lower precisions.
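As a rough illustration of the workflow described above, the sketch below wraps the mixed-precision entry point of the Intel® Neural Compressor 2.x Python API (mix_precision.fit with a MixedPrecisionConfig). It assumes that API version; later releases may organize the API differently. The wrapper name convert_to_mixed_precision and its arguments are hypothetical and are not from the video.

```python
from typing import Any, Callable, Optional


def convert_to_mixed_precision(
    model: Any,
    eval_func: Optional[Callable[[Any], float]] = None,
) -> Any:
    """Convert an FP32 model to mixed precision with Intel Neural Compressor.

    Assumptions (not confirmed by the source):
      * neural_compressor 2.x is installed;
      * `model` is a framework model object or path that the library accepts;
      * `eval_func`, if given, takes a model and returns a single accuracy
        score where higher is better.
    """
    # Imported lazily so this module can be loaded without the library installed.
    from neural_compressor import mix_precision
    from neural_compressor.config import MixedPrecisionConfig

    # Default configuration; the library decides which operators can run
    # in a lower-precision format on the current hardware.
    conf = MixedPrecisionConfig()

    if eval_func is None:
        # No accuracy check: convert every supported operator.
        return mix_precision.fit(model, conf=conf)

    # With an evaluation function, the library can compare accuracy against
    # the FP32 baseline while choosing which operators to convert.
    return mix_precision.fit(model, conf=conf, eval_func=eval_func)
```

A call might look like converted = convert_to_mixed_precision(fp32_model, eval_func=my_accuracy), where fp32_model and my_accuracy are placeholders for your own model and validation routine.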
Intel® Neural Compressor: bit.ly/3Nl6pVj
Intel® Neural Compressor GitHub: bit.ly/3NlBgkH
About Intel Software:
Intel® Developer Zone is committed to empowering and assisting software developers in creating applications for Intel hardware and software products. The Intel Software YouTube channel is an excellent resource for those seeking to enhance their knowledge. Our channel provides the latest news, helpful tips, and engaging product demos from Intel and our numerous industry partners. Our videos cover various topics; you can explore them further by following the links.
Connect with Intel Software:
INTEL SOFTWARE WEBSITE: intel.ly/2KeP1hD
INTEL SOFTWARE on FACEBOOK: bit.ly/2z8MPFF
INTEL SOFTWARE on TWITTER: bit.ly/2zahGSn
INTEL SOFTWARE GITHUB: bit.ly/2zaih6z
INTEL DEVELOPER ZONE LINKEDIN: bit.ly/2z979qs
INTEL DEVELOPER ZONE INSTAGRAM: bit.ly/2z9Xsby
INTEL GAME DEV TWITCH: bit.ly/2BkNshu
#intelsoftware #ai