KV Cache as the New AI Memory Abstraction

Published on 14 May 2026, 16:53
Speaker: Junchen Jiang, CEO & Co-Founder, Tensormesh; Faculty Lead, LMCache Lab

Talk Abstract: Modern AI agents increasingly operate over long contexts (reading documents, executing multi-step plans, and maintaining evolving state), but today's inference engines reprocess this growing context from scratch on every call, which leads to low throughput, high cost, and latency bottlenecks. The key to tackling these bottlenecks is to elevate the KV cache to a first-class memory layer for agentic systems, allowing LLMs to reuse their past reasoning instead of recomputing it. I present LMCache, an open-source, industry-adopted KV caching library that provides agents with fast, persistent, and addressable AI-native memory across steps, tasks, and model invocations, which dramatically reduces prefill time and GPU cost. I will also highlight the roadmap that brings more research into industry adoption, in particular KV cache compression and KV cache blending. Together, they make the KV cache much smaller and reusable in many more cases. In short, the KV cache is the missing substrate for scalable, efficient, long-context AI agents, and building KV-aware memory infrastructure will be essential to the next generation of AI systems.
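
To make the reuse idea in the abstract concrete, here is a minimal Python sketch of prefix-keyed KV cache lookup. It is an illustration only, not LMCache's API or implementation: the names `ToyKVStore`, `chunk_keys`, and `prefill_plan`, the chunk size of 4 tokens, and the use of strings as stand-ins for KV tensors are all assumptions made for this example. The sketch shows how a later request that shares a prefix with an earlier one would only need to prefill the tokens after the cached prefix.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK = 4  # tokens per chunk (hypothetical value chosen for the example)


def chunk_keys(tokens: List[int]) -> List[str]:
    """Return one key per full chunk; each key hashes the whole prefix so far."""
    keys = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % CHUNK  # ignore a trailing partial chunk
    for i in range(0, full, CHUNK):
        h.update(repr(tokens[i:i + CHUNK]).encode())
        keys.append(h.copy().hexdigest())
    return keys


class ToyKVStore:
    """Hypothetical store mapping prefix hashes to placeholder KV chunks."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def put(self, tokens: List[int]) -> None:
        # A real system would store KV tensors; a string stands in for them here.
        for i, key in enumerate(chunk_keys(tokens)):
            self.store.setdefault(key, f"kv-chunk-{i}")

    def reusable_prefix(self, tokens: List[int]) -> int:
        """Number of leading tokens whose KV entries are already cached."""
        hit = 0
        for key in chunk_keys(tokens):
            if key not in self.store:
                break
            hit += CHUNK
        return hit


def prefill_plan(store: ToyKVStore, tokens: List[int]) -> Tuple[int, int]:
    """Return (tokens reused from cache, tokens that still need prefill)."""
    hit = store.reusable_prefix(tokens)
    return hit, len(tokens) - hit


if __name__ == "__main__":
    store = ToyKVStore()
    store.put(list(range(10)))                    # first agent step
    follow_up = list(range(8)) + [99, 100, 101, 102]
    print(prefill_plan(store, follow_up))         # second step shares a prefix
```

Because each key covers the entire prefix up to its chunk, a hit on chunk k implies the tokens before it match as well. This sketch only covers exact-prefix reuse; the compression and blending directions mentioned in the abstract are not modeled.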

Find the resources you need to develop using AMD products: amd.com/en/developer.html

Join the Developer Community: devcommunity.amd.com

Join the Developer Discord server: discord.gg/amd-dev

***

© 2026 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.