SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models
Researchers have developed three distinct methods to enhance privacy in large language models (LLMs). SharedRequest offers a model-agnostic framework that mixes prompts with noisy variants to obscure sensitive information at the batch level, improving utility and reducing inference costs. Echelon provides a boundary-first training architecture that enforces device-level model-state non-export, enabling auditable, aggregate-only adaptation across privacy boundaries. Privacy-Aware Decoding (PAD) is a lightweight, inference-time defense that injects calibrated noise into token logits during generation, specifically for Retrieval-Augmented Generation (RAG) systems, to mitigate private information leakage while preserving response utility.
AI IMPACT: These advancements offer improved privacy guarantees for LLM users and developers, potentially enabling wider adoption in sensitive domains.
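To make the PAD description more concrete, the following is a minimal sketch of adding noise to token logits before sampling. It is an illustration only, not the authors' implementation: the summary does not say how the noise is calibrated, so this sketch assumes plain Gaussian noise with a fixed scale `sigma`. The function names (`noisy_logits`, `softmax`, `sample_token`) and the example logits are hypothetical.

```python
import math
import random


def noisy_logits(logits, sigma, rng):
    """Return a copy of `logits` with independent Gaussian noise added.

    Assumption: a fixed noise scale `sigma`. How PAD actually calibrates
    its noise is not described in the summary above.
    """
    return [x + rng.gauss(0.0, sigma) for x in logits]


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, sigma, rng):
    """Perturb the logits, then sample one token index from the result."""
    probs = softmax(noisy_logits(logits, sigma, rng))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits over a four-token vocabulary.
rng = random.Random(0)
logits = [2.0, 1.0, 0.1, -1.0]
token_id = sample_token(logits, sigma=0.5, rng=rng)
```

A larger `sigma` flattens the effective sampling distribution, which would be expected to reduce verbatim reproduction of retrieved text at some cost to response quality. That is the privacy and utility trade-off the summary refers to.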