Microsoft Research
Published March 6, 2026, 15:37
Artificial Intelligence (AI) is driving a new industrial revolution, increasingly transforming human workflows into digital tokens, i.e., tokenizing the entire world. However, this transformation exposes sensitive data at an unprecedented scale, leading to privacy breaches that have stalled AI's adoption. Homomorphic Encryption (HE) provides strong data privacy for cloud services, but at the cost of prohibitive computational overhead. While GPUs have emerged as a practical platform for accelerating HE, an order-of-magnitude energy-efficiency gap remains compared to specialized (but expensive) HE ASICs. This talk explores an alternate direction: leveraging existing AI accelerators, like Google's TPUs, to accelerate homomorphic encryption and, more broadly, cryptographic primitives. The key focus is the advanced compilation techniques that transform any application with static scheduling of modular arithmetic into kernels natively supported by AI ASICs such as TPUs, without any hardware modification. Our evaluation shows that CROSS achieves SoTA throughput on NTT and HE operators and SoTA energy efficiency among commodity devices, including CPUs, GPUs, and FPGAs.
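To make the mapping concrete: the number-theoretic transform (NTT) at the core of HE operators is, mathematically, a matrix-vector product over Z_q, which is exactly the shape that AI matrix units are built for. The pure-Python sketch below is illustrative only (the tiny parameters q = 17, n = 4, omega = 4 are chosen for the example and are not the paper's); it checks an NTT-based cyclic convolution against a schoolbook reference:

```python
# Illustrative sketch: the NTT as a matrix-vector product over Z_q.
# Parameters are toy values (q = 17, n = 4, omega = 4), not those from CROSS.

q = 17          # small NTT-friendly prime: q ≡ 1 (mod n)
n = 4           # transform size
omega = 4       # primitive n-th root of unity mod q (4^4 = 256 ≡ 1 mod 17)

def ntt_matrix(n, omega, q):
    """Twiddle matrix W with W[i][j] = omega^(i*j) mod q."""
    return [[pow(omega, i * j, q) for j in range(n)] for i in range(n)]

def matvec_mod(W, x, q):
    """Matrix-vector product followed by one modular reduction,
    mirroring how a matmul unit plus reduction step would compute an NTT."""
    return [sum(w * v for w, v in zip(row, x)) % q for row in W]

def cyclic_convolution(a, b, q):
    """Reference: schoolbook cyclic convolution mod (x^n - 1, q)."""
    out = [0] * len(a)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] = (out[(i + j) % n] + ai * bj) % q
    return out

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
W = ntt_matrix(n, omega, q)
W_inv = ntt_matrix(n, pow(omega, -1, q), q)   # inverse twiddle matrix
n_inv = pow(n, -1, q)                         # scale factor for inverse NTT

# Convolution theorem: pointwise multiply in NTT domain, then inverse NTT.
fa, fb = matvec_mod(W, a, q), matvec_mod(W, b, q)
prod = [(x * y) % q for x, y in zip(fa, fb)]
res = [(v * n_inv) % q for v in matvec_mod(W_inv, prod, q)]

assert res == cyclic_convolution(a, b, q)
```

On a real TPU lowering, the matrix product would run on the matrix unit in low precision with the modular reduction decomposed accordingly; this sketch only shows the algebraic structure being exploited.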
- Paper: arxiv.org/pdf/2501.07047v3
- Code: github.com/EfficientPPML/CROSS
- Tutorial: github.com/EfficientPPML/CROSS...
TL;DR: CROSS is the first project to demonstrate that Homomorphic Encryption operators with static scheduling of modular arithmetic can be transformed into kernels suitable for TPUs, inheriting the SoTA energy efficiency and throughput of modern AI ASICs without any hardware modification. This paves the way for accelerating broad cryptographic primitives on AI ASICs like Google's TPU, sparking a new direction of hardware-friendly protocol design.
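One reason such operators lower cleanly is that modular reduction itself can be expressed without division or data-dependent control flow. As an illustration (a textbook technique shown here for intuition, not necessarily CROSS's exact lowering), Barrett reduction computes x mod q with two multiplies, a shift, and a small correction:

```python
# Hedged sketch: Barrett reduction, a standard division-free way to compute
# x mod q. This is illustrative of statically schedulable modular arithmetic,
# not a claim about the specific reduction CROSS emits.

def barrett_setup(q, k):
    """Precompute mu = floor(2^(2k) / q) for a modulus q < 2^k."""
    return (1 << (2 * k)) // q

def barrett_reduce(x, q, mu, k):
    """Reduce 0 <= x < q^2 modulo q using multiplies and shifts only."""
    t = (x * mu) >> (2 * k)   # underestimate of x // q
    r = x - t * q             # remainder estimate, at most a few q too large
    while r >= q:             # bounded correction (at most two subtractions)
        r -= q
    return r

q, k = 65537, 17              # e.g. a 17-bit NTT-friendly prime (illustrative)
mu = barrett_setup(q, k)
for x in [0, 1, q - 1, q, 123456789, q * q - 1]:
    assert barrett_reduce(x, q, mu, k) == x % q
```

Because the correction is bounded, the whole reduction has a fixed, data-independent schedule, which is the property the abstract's phrase "static scheduling of modular arithmetic" refers to.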
Speaker Bio:
Jianming Tong (jianmingtong.github.io) is a 5th-year PhD candidate at Georgia Tech, advised by Tushar Krishna (GT). He is a computer architect focusing on systems for AI and cryptography, i.e., enabling today's AI systems to work in a privacy-preserving manner without sacrificing performance.
A few representative highlights:
- CROSS Compiler (HPCA’26 with Google, MLSys’24): Converts Homomorphic Encryption workloads into AI workloads to be executed efficiently on TPUs, enabling immediate, scalable, and low-cost privacy-preserving AI on existing AI accelerators without hardware modifications.
- Reconfigurable Accelerator (ISCA'24): Proposes next-gen computer architecture with the capability of dataflow-layout co-switching to sustain high compute utilization for workloads with irregular shapes.
- His work has been deployed at NVIDIA (NV Labs) and Google (Jaxite), and recognized with 2nd place in the University Demo at DAC, the Qualcomm Innovation Fellowship, a Machine Learning and Systems Rising Star, and the GT NEXT Award.