AWS and Cerebras are teaming up to build the fastest possible AI inference | Amazon Web Services

Published March 13, 2026, 21:25
AWS and Cerebras announced a collaboration to set a new standard for AI inference speed and performance in the cloud. The solution will be available through Amazon Bedrock.

The solution combines AWS Trainium3-powered servers, Cerebras CS-3 systems, and Elastic Fabric Adapter (EFA) networking. The Trainium3 + CS-3 solution enables “inference disaggregation,” a technique that separates AI inference into two stages: prompt processing, or “prefill,” and output generation, or “decode.”

These two stages have profoundly different computational characteristics. Prefill is naturally parallel and computationally intensive, and it requires only moderate memory bandwidth. Decode, on the other hand, is inherently serial, computationally light, and memory-bandwidth intensive. Decode typically accounts for the majority of inference time because each output token must be generated sequentially.

Together, we're leveraging the fastest system for each stage of inference. Trainium3 handles compute-intensive prefill, and Cerebras's wafer-scale CS-3 handles memory-intensive decode, so each stage runs on the hardware it excels at. The result is the fastest inference in Amazon Bedrock. Later this year, AWS will also offer leading open-source LLMs and Amazon Nova using Cerebras hardware.
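To illustrate why prefill and decode stress hardware so differently, here is a rough back-of-the-envelope sketch. It is not AWS or Cerebras code. It assumes a common approximation (about 2 FLOPs per model parameter per token, with every weight read once per forward pass) and ignores attention and KV-cache traffic. The function name, the 2-byte weight size, and the 1024-token prompt are hypothetical values chosen for illustration.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs performed per byte of weights read in one forward pass.

    Assumption: ~2 FLOPs per parameter per token, and each weight is read
    from memory once per pass. Attention and KV-cache traffic are ignored.
    """
    flops_per_param = 2.0 * tokens_per_pass
    return flops_per_param / bytes_per_param


prompt_tokens = 1024  # hypothetical prompt length

# Prefill: the whole prompt is processed in a single parallel pass.
prefill = arithmetic_intensity(prompt_tokens)

# Decode: one new token per pass (batch size 1), so weights are re-read every step.
decode = arithmetic_intensity(1)

print(f"prefill: {prefill:.0f} FLOPs per byte")
print(f"decode:  {decode:.0f} FLOPs per byte")
```

Under these assumptions, prefill does far more arithmetic per byte moved than decode does, which is consistent with the description above of prefill as compute-bound and decode as memory-bandwidth-bound.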

Learn more: go.aws/47uR5z6

Subscribe to AWS: go.aws/subscribe

Create a free AWS account: go.aws/signup
Try AWS for free: go.aws/free
Connect with an expert: go.aws/contact
Explore more: go.aws/more

Next steps:
Explore on AWS in Analyst Research: go.aws/reports
Discover, deploy, and manage software that runs on AWS: go.aws/marketplace
Join the AWS Partner Network: go.aws/partners
Learn more on how Amazon builds and operates software: go.aws/library

Do you have technical AWS questions?
Ask the community of experts on AWS re:Post: go.aws/3lPaoPb

Why AWS?
Amazon Web Services is the world’s most comprehensive and broadly adopted cloud, enabling customers to build anything they can imagine. We offer the greatest choice of innovative cloud capabilities and expertise, on the most extensive global infrastructure with industry-leading security, reliability, and performance.

#AWS #Cerebras #ArtificialIntelligence #MachineLearning #CloudComputing #AmazonBedrock #AIInference #Innovation #TechNews #Trainium #CostOptimization #AmazonWebServices