NVIDIA · 2.13M subscribers
Published February 12, 2026, 1:49
As AI moves into the era of real-time reasoning, performance alone is no longer enough. The true measure of AI at scale is tokenomics: how much it costs to generate each token of intelligence.
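Cost per token comes down to simple arithmetic: the all-in cost of running a system for an hour divided by the number of tokens it produces in that hour. The following is a minimal sketch under assumed inputs. The function name and the dollar and throughput figures are hypothetical illustrations, not NVIDIA or partner data.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD cost to generate one million tokens.

    hourly_cost_usd: assumed all-in cost of running the system for one hour
    tokens_per_second: assumed sustained aggregate output throughput
    """
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical comparison: the second system costs 1.5x more per hour
# but sustains 3x the throughput.
baseline = cost_per_million_tokens(100.0, 10_000)  # about $2.78 per 1M tokens
faster = cost_per_million_tokens(150.0, 30_000)    # about $1.39 per 1M tokens
print(f"baseline: ${baseline:.2f}/1M tokens, faster: ${faster:.2f}/1M tokens")
```

With these made-up numbers, the pricier but faster system halves cost per token. That is the kind of trade-off system-level co-design aims for: raising throughput faster than cost.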
Reasoning models, including those built on mixture-of-experts (MoE) architectures, generate massive volumes of tokens. This improves answer quality but places simultaneous pressure on compute, memory, networking, storage, and software. In this new paradigm, the hidden costs of communication and routing matter just as much as raw FLOPS.
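To see where routing turns into communication, consider top-k gating. A router scores every expert for each token and sends the token to its k highest-scoring experts. When experts are sharded across GPUs, every selected expert that lives on a different GPU forces the token's activations across the interconnect. The following is a hypothetical plain-Python sketch, assuming a fixed number of experts per device and a "home" device for each token. It counts those cross-device dispatches and is not how any particular serving framework implements routing.

```python
def top_k_experts(scores: list[float], k: int) -> list[int]:
    # Indices of the k highest-scoring experts for a single token.
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def count_remote_dispatches(
    router_scores: list[list[float]],
    token_device: list[int],
    experts_per_device: int,
    k: int,
) -> int:
    # Assumption: expert e is hosted on device e // experts_per_device.
    remote = 0
    for scores, home in zip(router_scores, token_device):
        for expert in top_k_experts(scores, k):
            if expert // experts_per_device != home:
                remote += 1  # activations must cross the interconnect
    return remote


# Toy setup: 2 devices, 4 experts (2 per device), top-2 routing.
scores = [
    [0.9, 0.1, 0.8, 0.0],  # token 0, home device 0: picks experts 0 and 2
    [0.7, 0.6, 0.1, 0.2],  # token 1, home device 1: picks experts 0 and 1
]
remote = count_remote_dispatches(scores, token_device=[0, 1], experts_per_device=2, k=2)
print(f"{remote} of {len(scores) * 2} dispatches are cross-device")
```

In this toy example, 3 of the 4 expert dispatches leave the token's home GPU. If routing is roughly uniform, the share of remote dispatches grows as experts are spread over more devices, which is why interconnect bandwidth and latency weigh so heavily on MoE inference.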
This video explores why extreme co-design—engineering the entire stack as one unified system—is the key to lowering cost per token and maximizing AI ROI.
You’ll learn:
- Why cost per token is becoming the defining metric for reasoning AI
- The networking and communication challenges behind MoE inference
- How rack-scale systems like GB200 NVL72 deliver breakthrough token efficiency
- How Blackwell and Rubin integrate silicon, interconnect, networking, and software to power AI at scale
Featuring insights from NVIDIA, Signal65, Microsoft Azure, and CoreWeave, this discussion makes one thing clear: end-to-end system design is the most powerful lever for delivering efficient tokenomics and scaling reasoning AI.