This week's Fully Connected podcast episode dives into the practicalities of AI inference, focusing on how trained models are put to use. Key discussions include Amazon's new machine learning chip designed for inference and NVIDIA's decision to open-source TensorRT for GPU-optimized inference. The conversation also touches on performing inference at the edge and within web browsers, highlighting projects like ONNX.js and the Snapdragon Neural Processing Engine SDK.
Summary written by gemini-2.5-flash-lite from 1 source.