Published on 14 May 2026, 16:53
Speakers
Gaël Delalleau, Founder and CEO, Kog
Augustin Verneuil, GPU engineer, Kog
Talk Abstract: In this talk, we share our vision for real-time generative AI and the techniques we developed to achieve the fastest LLM inference on GPU ever, with a generation speed of 2500 tokens/s per request. In the first part, we showcase our end-to-end stack optimized for minimal latency on AMD hardware, spanning model re-architecting, a single monokernel implementation, and topology-aware algorithms. In the second part, we focus on one of the defining challenges of megakernels: intra-GPU grid synchronization barriers and reduce/gather primitives. Using a chiplet-aware approach grounded in deep hardware insight, we decrease the overhead from 1.5 µs to 600 ns.
Find the resources you need to develop using AMD products: amd.com/en/developer.html
Join the Developer Community: devcommunity.amd.com
Join the Developer Discord server: discord.gg/amd-dev
***
© 2026 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.