On-device PII substitution pipeline uses locale-prompting to fix regurgitation

By PulseAugur Editorial · [2 sources] · 2026-05-13 13:47

Researchers have developed an on-device pipeline that replaces Personally Identifiable Information (PII) with consistent, type-preserving fake values so that the redacted text stays useful downstream. A small language model (SLM) generates the surrogates, but early versions suffered from demonstration regurgitation, copying the few-shot examples into the output. The authors address this with locale-conditioned, rotating few-shot prompting, which enabled PII substitution across multiple locales. The study also found a trade-off: SLM surrogates produce more natural text but less varied training data, which lowers downstream Named Entity Recognition (NER) performance compared with simpler methods.
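To make the prompting idea concrete, here is a minimal sketch of locale-conditioned, rotating few-shot prompting with a regurgitation check. It is an illustration under stated assumptions, not the paper's implementation: the demonstration pools, the prompt wording, and the names `DEMO_POOLS`, `rotating_demos`, `build_prompt`, and `is_regurgitated` are all hypothetical, and the SLM call itself is left out.

```python
# Hypothetical demonstration pools keyed by (locale, entity type).
# The pools and prompt format used in the paper are not given in this summary.
DEMO_POOLS = {
    ("en_US", "PERSON"): ["Alice Monroe", "Brian Keller", "Carla Reyes", "Derek Olson"],
    ("de_DE", "PERSON"): ["Anke Vogel", "Bernd Schuster", "Clara Weiss", "Dieter Brandt"],
}


def rotating_demos(locale, entity_type, call_index, k=2):
    """Pick k demonstrations from the locale's pool, shifting the window on each call."""
    pool = DEMO_POOLS[(locale, entity_type)]
    start = (call_index * k) % len(pool)
    return [pool[(start + i) % len(pool)] for i in range(k)]


def build_prompt(locale, entity_type, call_index, k=2):
    """Assemble a few-shot prompt conditioned on the locale (assumed format)."""
    demos = rotating_demos(locale, entity_type, call_index, k)
    lines = [f"Locale: {locale}", f"Generate a fictional {entity_type} value."]
    lines += [f"Example: {demo}" for demo in demos]
    lines.append("New value:")
    return "\n".join(lines)


def is_regurgitated(candidate, demos):
    """Return True when the candidate merely copies one of the demonstrations."""
    normalized = candidate.strip().casefold()
    return any(normalized == demo.strip().casefold() for demo in demos)
```

Rotating the window means no single example appears in every prompt, and the check lets a caller reject and retry any output that simply repeats a demonstration.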

IMPACT The research offers a way to handle PII on-device while preserving text utility, though it highlights a trade-off: more natural SLM surrogates come with lower downstream NER performance.
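The "consistent, type-preserving" property can be illustrated with a short sketch. This version swaps in a fixed surrogate list and a hash-based pick in place of the SLM generator described in the paper, purely to show the two properties; `SURROGATES`, `surrogate_for`, and `substitute` are hypothetical names, and the entity spans are assumed to come from an upstream detector.

```python
import hashlib

# Hypothetical surrogate pools; the paper generates surrogates with an SLM instead.
SURROGATES = {
    "PERSON": ["Jordan Ellis", "Morgan Hale", "Riley Carter"],
    "CITY": ["Springfield", "Riverton", "Lakewood"],
}


def surrogate_for(value, entity_type, cache):
    """Return the same fake value for repeated mentions of the same entity."""
    key = (entity_type, value.casefold())
    if key not in cache:
        pool = SURROGATES[entity_type]  # type-preserving: PERSON stays PERSON
        digest = hashlib.sha256(f"{entity_type}:{key[1]}".encode("utf-8")).digest()
        cache[key] = pool[int.from_bytes(digest[:4], "big") % len(pool)]
    return cache[key]


def substitute(text, spans, cache):
    """Replace non-overlapping (start, end, entity_type) spans with surrogates."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, entity_type in sorted(spans, reverse=True):
        out = out[:start] + surrogate_for(text[start:end], entity_type, cache) + out[end:]
    return out
```

Unlike a `[PERSON]` placeholder, the output still reads as ordinary text with entities of the right type. With pools this small, two different real values can map to the same surrogate, which is one reason a generator with a larger output space is attractive.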

RANK_REASON The cluster describes a research paper detailing a novel method for PII substitution using small language models and a specific prompting technique.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Deepak Kumar · 2026-05-13 13:47

Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models

Personally Identifiable Information (PII) redaction usually replaces detected entities with placeholder tokens such as [PERSON], destroying the downstream utility of the redacted text for retrieval and Named Entity Recognition (NER) training. We propose a fully on-device pipeline…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-13 13:47

Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models

Personally Identifiable Information (PII) redaction usually replaces detected entities with placeholder tokens such as [PERSON], destroying the downstream utility of the redacted text for retrieval and Named Entity Recognition (NER) training. We propose a fully on-device pipeline…
