New method recovers input text from LLM hidden states

By PulseAugur Editorial · [1 source] · 2026-07-01 12:18

Researchers have developed a method for recovering input text from the last-layer hidden states of decoder-only language models. The approach treats inversion as a continuous optimization in embedding space: a soft proxy sequence is driven toward the leaked target hidden states, and projection onto hard tokens is deferred until the end. The study finds that content-bearing tokens are recovered almost perfectly, whereas space-prefixed, high-frequency function words that sit in dense regions of the embedding space are the most likely to break a reconstruction. Because the formulation is continuous, the optimization can be observed as it runs and failures can be detected. The results show that the last-layer hidden states of GPT-2 are as sensitive as the original text.
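To make the idea concrete, here is a minimal sketch of gradient-based inversion on a toy problem. It is not the paper's implementation and does not use GPT-2: the six-token vocabulary, the hand-built embedding table `EMB`, the mixing matrix `W`, and the tiny causal "model" in `forward` are all assumptions invented for illustration. What it does share with the summary above is the shape of the procedure: optimize a continuous (soft) embedding sequence until its hidden states match the leaked ones, record the loss so progress is observable, and project onto hard tokens only once at the end.

```python
import math

# Hypothetical toy vocabulary and embedding table (not GPT-2's).
VOCAB = ["the", " cat", " sat", " on", " mat", " dog"]
EMB = [
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0],
]
# Hypothetical fixed "model weights".
W = [[0.5, 0.25, 0.0], [0.0, 0.5, 0.25], [0.25, 0.0, 0.5]]
D = 3  # embedding / hidden size


def forward(xs):
    """Toy causal model: c_t = x_t + 0.5 * x_{t-1}, h_t = tanh(W c_t)."""
    hs = []
    for t, x in enumerate(xs):
        prev = xs[t - 1] if t > 0 else [0.0] * D
        c = [x[k] + 0.5 * prev[k] for k in range(D)]
        hs.append([math.tanh(sum(W[j][k] * c[k] for k in range(D)))
                   for j in range(D)])
    return hs


def loss_and_grad(xs, target):
    """Squared error between hidden states, with a hand-derived gradient."""
    hs = forward(xs)
    steps = len(xs)
    loss = 0.0
    grad_c = []
    for t in range(steps):
        dz = []
        for j in range(D):
            r = hs[t][j] - target[t][j]
            loss += 0.5 * r * r
            dz.append(r * (1.0 - hs[t][j] ** 2))  # chain rule through tanh
        grad_c.append([sum(W[j][k] * dz[j] for j in range(D))
                       for k in range(D)])
    grad_x = []
    for t in range(steps):
        # x_t feeds c_t (weight 1) and c_{t+1} (weight 0.5).
        nxt = grad_c[t + 1] if t + 1 < steps else [0.0] * D
        grad_x.append([grad_c[t][k] + 0.5 * nxt[k] for k in range(D)])
    return loss, grad_x


def invert(target, steps=3000, lr=0.3):
    """Gradient descent on a soft embedding sequence; no token projection here."""
    xs = [[0.0] * D for _ in target]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(xs, target)
        history.append(loss)  # the loss curve makes the optimization observable
        xs = [[x[k] - lr * g[k] for k in range(D)] for x, g in zip(xs, grad)]
    return xs, history


def project(xs):
    """Hard-token projection, applied once at the end: nearest embedding row."""
    ids = []
    for x in xs:
        ids.append(min(
            range(len(EMB)),
            key=lambda v: sum((x[k] - EMB[v][k]) ** 2 for k in range(D)),
        ))
    return ids


secret_ids = [0, 1, 2, 3]                       # "the cat sat on"
leaked = forward([EMB[i] for i in secret_ids])  # what the attacker observes
soft, history = invert(leaked)
recovered_ids = project(soft)
print("recovered:", "".join(VOCAB[i] for i in recovered_ids))
print("loss: first", history[0], "last", history[-1])
```

In this toy the embedding rows are far apart, so the final nearest-neighbour projection is forgiving. The summary's observation about function words in dense embedding regions corresponds to the opposite situation, where several rows lie close together and a small residual error in the soft proxy is enough to select the wrong token.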

IMPACT Highlights potential vulnerabilities in LLM privacy and security by demonstrating input text recovery from hidden states.

RANK_REASON Academic paper detailing a new method for recovering input text from language model hidden states.

GPT-2

paper
safety

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Maciej Witold Majewski · 2026-07-01 12:18

Recovering Input Text from Hidden States: Study of Gradient-Based Inversion of Decoder-Only Language Models

This work studies the hidden-state inversion problem: recovering the original input token sequence of a decoder-only language model from its last-layer hidden states. Rather than treating inversion as a one-shot reconstruction, we study it as a continuous embedding-space optimisa…
