Researchers have developed Tempus, a framework that optimizes General Matrix Multiplication (GEMM) for edge AI deployments on AMD Versal SoCs. Unlike existing spatial scaling methods, which fail on resource-constrained devices, Tempus keeps a fixed compute block and scales temporally through iterative execution and data tiling. The approach delivers 607 GOPS at 10.677 W while consuming fewer resources and less power than prior state-of-the-art methods.
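The core idea, temporal rather than spatial scaling, can be sketched in plain Python: instead of instantiating more parallel hardware, one fixed-size compute block is invoked repeatedly over data tiles until the full GEMM is covered. This is an illustrative sketch only, not the Tempus implementation; the tile size and loop structure here are hypothetical stand-ins for hardware-bound parameters.

```python
# Illustrative sketch (NOT the Tempus implementation): temporal scaling reuses
# one fixed-size compute block across many tiles instead of adding parallel
# hardware. TILE is a hypothetical stand-in for the hardware block size.
TILE = 2

def gemm_block(a, b, c, i0, j0, k0, n):
    """One invocation of the fixed compute block on a TILE x TILE sub-problem."""
    for i in range(i0, min(i0 + TILE, n)):
        for j in range(j0, min(j0 + TILE, n)):
            acc = 0
            for k in range(k0, min(k0 + TILE, n)):
                acc += a[i][k] * b[k][j]
            c[i][j] += acc  # accumulate partial products across k-tiles

def gemm_temporal(a, b, n):
    """Cover the full n x n GEMM by iterating the fixed block over data tiles."""
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, TILE):          # iterate over output-row tiles
        for j0 in range(0, n, TILE):      # iterate over output-column tiles
            for k0 in range(0, n, TILE):  # iterate over reduction tiles
                gemm_block(a, b, c, i0, j0, k0, n)
    return c
```

The same fixed block handles every tile in sequence, so the hardware footprint stays constant while larger problems simply take more iterations.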
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Enables more efficient LLM inference on resource-constrained edge devices by optimizing core matrix multiplication operations.
RANK_REASON Academic paper detailing a new framework for optimizing AI inference on edge hardware.