RT @perplexity_ai: Today we announce that hybrid agentic inference is coming to Perplexity Computer. Computer can perform tasks between a local
A new technique called AirLLM runs 70-billion-parameter large language models on a 4 GB GPU by using layer-wise inference: instead of loading the entire model at once, it loads and computes the model's layers one at a time. Separately, Perplexity AI is rolling out hybrid agentic inference for Perplexity Computer, which allows tasks to be distributed between local and cloud resources.
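The layer-wise idea can be illustrated with a toy sketch. This is not AirLLM's code or API: the helper names (`save_layers`, `layerwise_forward`, `full_forward`), the pickle-per-layer storage, and the tiny ReLU network are all assumptions made for illustration. It only shows that streaming one layer's weights at a time from disk gives the same result as holding every layer in memory, while peak weight memory stays at roughly one layer.

```python
import os
import pickle
import tempfile


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def relu(vec):
    return [max(0.0, x) for x in vec]


def save_layers(layers, directory):
    """Write each layer's weights to its own file; return the paths in order."""
    paths = []
    for i, weight in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(weight, f)
        paths.append(path)
    return paths


def layerwise_forward(layer_paths, activations):
    """Forward pass that holds only one layer's weights in memory at a time."""
    for path in layer_paths:
        with open(path, "rb") as f:
            weight = pickle.load(f)  # load just this layer
        activations = relu(matvec(weight, activations))
        del weight  # drop the reference before loading the next layer
    return activations


def full_forward(layers, activations):
    """Reference forward pass with all layers resident in memory."""
    for weight in layers:
        activations = relu(matvec(weight, activations))
    return activations


# Hypothetical 3-layer toy model standing in for a large transformer.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, -1.0], [0.5, 0.5]],
    [[1.0, 1.0], [-1.0, 1.0]],
]
x = [1.0, 2.0]

with tempfile.TemporaryDirectory() as tmp:
    paths = save_layers(layers, tmp)
    streamed = layerwise_forward(paths, x)

reference = full_forward(layers, x)
print(streamed == reference)
```

The trade-off in this kind of scheme is speed: every forward pass re-reads each layer from storage, so lower memory use comes at the cost of extra I/O.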
IMPACT: Enables running large models on consumer hardware and improves AI agent efficiency.