Back to Basics: Best Practices for Selecting Inference Options to Deploy SageMaker ML Models
Amazon Web Services
Published September 12, 2024, 14:59
Learn how to choose the best Amazon SageMaker inference option for deploying your machine learning models, based on requirements such as latency, throughput, payload size, and traffic patterns.
In this episode, join Jyoti as she discusses four deployment options:
1️⃣ SageMaker Real-Time Inference: Ideal for low latency, high throughput use cases like fraud detection, ad serving, and personalized recommendations. Supports payloads up to 6MB and 60s processing time.
2️⃣ SageMaker Serverless Inference: Best for intermittent or unpredictable traffic when the workload can tolerate cold starts. Scales resources automatically. Supports payloads up to 4MB and 60s processing time.
3️⃣ SageMaker Asynchronous Inference: Queues requests with large payloads (up to 1GB) or long processing times (up to 15 mins). Cost-effective because endpoints can scale to zero. A good fit for computer vision and object detection.
4️⃣ SageMaker Batch Transform: For offline processing of large datasets (GBs in size) or jobs with processing times of up to days. Highest throughput option, suited to data pre-processing, churn prediction, and predictive maintenance.
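
The limits listed above can be read as a rough decision procedure. The following is a minimal sketch of that reading, not an official AWS selection tool: the `Workload` fields, the function name, the order of the checks, and the choice of 1MB = 1024 * 1024 bytes are all assumptions made for illustration. Only the payload and processing-time limits come from the list above.

```python
from dataclasses import dataclass

# Assumption: binary megabytes/gigabytes; the description does not say which.
MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    payload_bytes: int          # size of a single request payload
    processing_seconds: float   # expected processing time per request
    offline: bool               # True if a whole dataset is processed with no caller waiting
    intermittent_traffic: bool  # True if traffic is spiky or unpredictable
    tolerates_cold_start: bool  # True if occasional cold-start latency is acceptable


def choose_inference_option(w: Workload) -> str:
    """Map a workload to one of the four options using the limits quoted above."""
    # Offline jobs, or requests beyond the Asynchronous limits (1GB, 15 mins).
    if w.offline or w.payload_bytes > 1 * GB or w.processing_seconds > 15 * 60:
        return "Batch Transform"
    # Beyond the Real-Time limits (6MB, 60s) but within the Asynchronous limits.
    if w.payload_bytes > 6 * MB or w.processing_seconds > 60:
        return "Asynchronous Inference"
    # Spiky traffic that tolerates cold starts, within the Serverless limit (4MB).
    if w.intermittent_traffic and w.tolerates_cold_start and w.payload_bytes <= 4 * MB:
        return "Serverless Inference"
    # Default: steady, latency-sensitive traffic.
    return "Real-Time Inference"


# Example: a small, latency-sensitive fraud check on steady traffic.
fraud_check = Workload(2_000, 0.05, offline=False,
                       intermittent_traffic=False, tolerates_cold_start=False)
print(choose_inference_option(fraud_check))
```

In practice cost, instance type, and scaling behaviour also matter, so treat the result as a starting point for the comparison rather than a final answer.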
Using a real-world fraud detection example, we'll walk through how to set up a SageMaker Real-Time Inference endpoint, make requests, and get predictions in real time to meet low latency and high throughput needs.
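
As a rough illustration of the "make requests and get predictions" step, here is a minimal sketch of calling an already-deployed real-time endpoint. It is not the code from the episode. It assumes the model accepts one CSV row of numeric features and returns a single fraud score as plain text; the helper names, the 0.5 threshold, and the endpoint name in the comment are hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
def to_csv_row(features):
    """Serialize a list of numeric features as a single CSV line."""
    return ",".join(str(x) for x in features)


def predict_fraud(runtime_client, endpoint_name, features, threshold=0.5):
    """Send one transaction to a real-time endpoint and interpret the score.

    Assumption: the endpoint responds with a single numeric score as text.
    `threshold` is an illustrative cut-off, not a recommended value.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_row(features),
    )
    # The response body is a stream; read and decode it to get the score.
    score = float(response["Body"].read().decode("utf-8").strip())
    return {"score": score, "is_fraud": score >= threshold}


# With AWS credentials configured, the client would typically be created as:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   predict_fraud(runtime, "fraud-detection-endpoint", [120.5, 3, 0])
# where "fraud-detection-endpoint" is a hypothetical endpoint name.
```

Because each request is synchronous, the caller blocks until the score comes back, which is why this option suits the payload and processing-time limits quoted for Real-Time Inference.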
Additional Resources:
docs.aws.amazon.com/sagemaker/...
Check out more resources for architecting in the #AWS cloud:
amzn.to/3qXIsWN
#AWS #AmazonWebServices #CloudComputing #BackToBasics #AmazonSageMaker #SagemakerDeployments #AIML