🤖 Amazon SageMaker AI Async Inference now supports inline request payloads
Amazon SageMaker AI Async Inference now supports inline request payloads, letting users send inference data directly in the InvokeEndpointAsync API request body. Small payloads no longer have to be uploaded to Amazon S3 first, which simplifies client-side code and reduces latency by removing a network round trip. The feature is aimed at workloads with small inputs (up to 128,000 bytes) that need longer processing times than real-time inference allows.
IMPACT: Simplifies ML inference workflows by reducing latency and operational overhead for workloads with small inputs and long processing times.
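
On the client side, the 128,000-byte figure suggests a simple routing decision: send the payload inline when it fits, otherwise fall back to the S3 upload path. The sketch below illustrates only that size check. The names `INLINE_LIMIT_BYTES` and `choose_delivery` are hypothetical, the limit is assumed to be inclusive, and the actual InvokeEndpointAsync call is left out because the summary does not give the request parameter names.

```python
import json

# Size limit quoted in the summary above; treating it as inclusive is an assumption.
INLINE_LIMIT_BYTES = 128_000


def choose_delivery(payload: bytes, limit: int = INLINE_LIMIT_BYTES) -> str:
    """Return "inline" if the encoded payload fits within the limit, else "s3"."""
    return "inline" if len(payload) <= limit else "s3"


# A small JSON request fits inline; an oversized blob needs the S3 path.
small = json.dumps({"inputs": "summarize this paragraph"}).encode("utf-8")
large = b"x" * (INLINE_LIMIT_BYTES + 1)

print(choose_delivery(small))
print(choose_delivery(large))
```

The check uses the encoded byte length rather than the character count, since multi-byte UTF-8 characters can push a short-looking string over the limit.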