Researchers have developed a new method to recover input text from the hidden states of decoder-only language models. The approach treats inversion as a continuous optimization in embedding space: a soft proxy embedding is driven toward the leaked hidden-state target, and projection onto hard tokens is deferred until the end. The study finds that content-bearing tokens are recovered almost perfectly, while space-prefixed, high-frequency function words in dense embedding regions are where reconstructions most often break. Because the formulation is continuous, the optimization can be observed and failures can be detected. The results indicate that the last-layer hidden states of GPT-2 are as sensitive as the original text.
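The optimize-then-project idea described above can be illustrated with a deliberately tiny sketch. This is not the paper's code and does not use GPT-2: the vocabulary, the 2-d embeddings, the matrix `W`, and every function name are hypothetical, and the "model" is a single invertible linear map applied independently at each position. The sketch only shows the shape of the procedure, which is gradient descent on a soft embedding followed by one nearest-neighbour projection at the end.

```python
# Toy stand-in for the inversion idea; NOT GPT-2 and not the paper's method.
# Hypothetical vocabulary with made-up 2-d embeddings.
VOCAB = {
    "the":  [0.1, 0.0],
    " the": [0.2, 0.1],   # close neighbour of "the": a small "dense" region
    "cat":  [1.0, -1.0],
    "sat":  [-1.0, 1.0],
    "mat":  [1.5, 1.5],
}

# Hypothetical "model": one fixed invertible linear map per position.
W = [[2.0, 1.0],
     [1.0, 3.0]]


def forward(x):
    """Map one embedding x (length 2) to its toy hidden state W @ x."""
    return [W[0][0] * x[0] + W[0][1] * x[1],
            W[1][0] * x[0] + W[1][1] * x[1]]


def invert_position(target, steps=300, lr=0.05):
    """Gradient descent on a soft embedding so forward(x) matches target."""
    x = [0.0, 0.0]  # soft proxy; never snapped to a token inside the loop
    for _ in range(steps):
        h = forward(x)
        r = [h[0] - target[0], h[1] - target[1]]  # residual W x - target
        # Gradient of ||W x - target||^2 with respect to x is 2 * W^T r.
        g = [2 * (W[0][0] * r[0] + W[1][0] * r[1]),
             2 * (W[0][1] * r[0] + W[1][1] * r[1])]
        x = [x[0] - lr * g[0], x[1] - lr * g[1]]
    return x


def nearest_token(x):
    """Hard projection: pick the vocabulary entry closest to x."""
    return min(VOCAB, key=lambda t: (VOCAB[t][0] - x[0]) ** 2
                                    + (VOCAB[t][1] - x[1]) ** 2)


def invert(hidden_states):
    """Recover tokens from leaked hidden states, projecting only at the end."""
    soft = [invert_position(h) for h in hidden_states]
    return [nearest_token(x) for x in soft]


secret = [" the", "cat", "sat"]
leaked = [forward(VOCAB[t]) for t in secret]   # what an attacker observes
recovered = invert(leaked)
print(recovered)
```

In this toy setting the loss is a convex quadratic, so the soft proxy converges and the final projection returns the original tokens. The closely spaced `"the"` and `" the"` entries hint at why dense embedding regions are fragile: a small residual error in the proxy is enough to flip the nearest neighbour. A real decoder-only model is nonlinear and mixes positions, so convergence there is not guaranteed.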
IMPACT Highlights potential vulnerabilities in LLM privacy and security by demonstrating input text recovery from hidden states.
RANK_REASON Academic paper detailing a new method for recovering input text from language model hidden states.
AI-generated summary · Google Gemini · from 1 source.