SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling
Researchers have developed SOLARIS, a framework designed to make large recommendation models efficient enough for real-time serving. SOLARIS speculatively precomputes user-item interaction embeddings: it generates foundation-model representations ahead of time for requests it predicts will arrive. Deployed within Meta's advertising system, the method has shown a 0.67% gain in revenue-driving metrics by decoupling expensive inference from the critical serving path.
AI IMPACT: Enables real-time serving of complex recommendation models, potentially improving user experience and revenue for large-scale systems.
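
The speculative precomputation described above can be illustrated with a minimal Python sketch. It is not the SOLARIS implementation: the class `SpeculativeEmbeddingCache`, the helpers `toy_encode` and `serve`, and the use of a fallback embedding on a cache miss are all hypothetical names and assumptions introduced here for illustration. The sketch only shows the general idea of running an expensive encoder offline for predicted (user, item) pairs so that the serving path performs a lookup instead of inference.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Embedding = List[float]
Key = Tuple[str, str]  # (user_id, item_id); hypothetical key layout


@dataclass
class SpeculativeEmbeddingCache:
    """Hypothetical cache of embeddings computed ahead of request time."""

    encode: Callable[[str, str], Embedding]  # stand-in for the expensive model
    store: Dict[Key, Embedding] = field(default_factory=dict)
    encode_calls: int = 0
    hits: int = 0
    misses: int = 0

    def precompute(self, predicted_pairs: Iterable[Key]) -> None:
        """Offline step: run the expensive encoder for predicted requests."""
        for user_id, item_id in predicted_pairs:
            self.store[(user_id, item_id)] = self.encode(user_id, item_id)
            self.encode_calls += 1

    def lookup(self, user_id: str, item_id: str) -> Optional[Embedding]:
        """Serving step: a dictionary read, with no model inference."""
        emb = self.store.get((user_id, item_id))
        if emb is None:
            self.misses += 1
        else:
            self.hits += 1
        return emb


def toy_encode(user_id: str, item_id: str) -> Embedding:
    """Toy deterministic encoder; a real system would call a large model."""
    return [float(len(user_id)), float(len(item_id))]


def serve(cache: SpeculativeEmbeddingCache, user_id: str, item_id: str,
          fallback: Embedding) -> Embedding:
    """Return the precomputed embedding, or a fallback on a miss.

    Using a fallback is an assumption of this sketch; the source does not
    describe how mispredicted requests are handled.
    """
    emb = cache.lookup(user_id, item_id)
    return emb if emb is not None else fallback


if __name__ == "__main__":
    cache = SpeculativeEmbeddingCache(encode=toy_encode)
    cache.precompute([("u1", "ad42"), ("u2", "ad7")])  # predicted requests
    print(serve(cache, "u1", "ad42", fallback=[0.0, 0.0]))  # predicted: hit
    print(serve(cache, "u3", "ad42", fallback=[0.0, 0.0]))  # unpredicted: miss
    print(f"hits={cache.hits} misses={cache.misses}")
```

In this sketch the encoder runs only inside `precompute`, so the cost of a serving-time request does not depend on the size of the model. The quality of the request predictions determines how often the lookup hits.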