Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 6d · [2 sources]

E2LLM: Towards Efficient LLM Serving in Heterogeneous Edge/Fog Environments

Researchers have developed E2LLM, a framework for serving large language models (LLMs) efficiently in resource-constrained edge and fog environments. Unlike traditional methods that assume each model is hosted on a single device, E2LLM replicates the model across groups of devices and uses model parallelism within each group. Each replica is assigned a specialized role, PREFILL or DECODER, according to how efficiently it handles input tokens versus output tokens, which exploits the differences between these two inference phases. The framework uses a Genetic Algorithm to cluster devices and Dynamic Programming to find the optimal partitioning. Under high demand, it reduces waiting times by more than 50% compared to the Splitwise baseline.
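To make the Dynamic Programming step more concrete, here is a minimal sketch of one way a model could be partitioned across the devices in a group. The brief does not give E2LLM's actual formulation, so everything here is an assumption made for illustration: the function name `partition_layers`, the inputs `layer_costs` and `device_speeds`, the fixed device order, and the objective of minimizing the time of the slowest pipeline stage.

```python
from itertools import accumulate


def partition_layers(layer_costs, device_speeds):
    """Split contiguous layers across devices (kept in the given order).

    Hypothetical objective (not taken from the paper): minimize the time of
    the slowest stage, where a stage's time is the sum of its layer costs
    divided by its device's speed. Every device receives at least one layer.

    Returns (bottleneck_time, [(start, end), ...]) with end exclusive.
    """
    n, k = len(layer_costs), len(device_speeds)
    if k == 0 or n < k:
        raise ValueError("need at least one device and one layer per device")

    # prefix[i] is the total cost of the first i layers.
    prefix = [0.0, *accumulate(layer_costs)]
    inf = float("inf")

    # best[d][i]: smallest bottleneck when the first d devices host the first i layers.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[d][i]: index where device d's stage starts in that optimal solution.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for d in range(1, k + 1):
        for i in range(d, n + 1):
            for j in range(d - 1, i):
                stage_time = (prefix[i] - prefix[j]) / device_speeds[d - 1]
                candidate = max(best[d - 1][j], stage_time)
                if candidate < best[d][i]:
                    best[d][i] = candidate
                    cut[d][i] = j

    # Walk the cut table backwards to recover each device's layer range.
    bounds, i = [], n
    for d in range(k, 0, -1):
        j = cut[d][i]
        bounds.append((j, i))
        i = j
    bounds.reverse()
    return best[k][n], bounds
```

For example, `partition_layers([4, 2, 2, 4], [2.0, 1.0])` returns `(4.0, [(0, 3), (3, 4)])`: the device that is twice as fast takes the first three layers, so both stages finish in 4 time units. A real deployment would also need to account for memory limits and inter-device communication, which this sketch ignores.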

IMPACT Optimizes LLM deployment in constrained environments, potentially enabling wider use of AI on edge devices.
