PulseAugur

LLM privacy research tackles Japanese data, multi-modal risks, and DP adaptation

Researchers are exploring privacy risks in large language models (LLMs) and in the ways those models are adapted. One study targets sensitive personal information in Japanese pre-training corpora, developing a classifier that detects special care-required personal information (SCPI) as defined under Japan's Act on the Protection of Personal Information (APPI). Another paper investigates privacy vulnerabilities in multi-modal LLMs, showing how they can leak sensitive data from images and memory, and introduces a dataset for evaluating these risks. A third study benchmarks how well differential privacy (DP) protects LLMs during adaptation. It finds that shifts in data distribution significantly affect privacy risk, and that parameter-efficient fine-tuning methods such as LoRA offer better protection for out-of-distribution data.
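For readers unfamiliar with how DP is applied during adaptation, the core of the standard DP-SGD recipe works like this. Each example's gradient is clipped to a fixed L2 norm, and the clipped gradients are summed. Gaussian noise scaled to that clipping bound is then added before averaging. What follows is a minimal pure-Python sketch, assuming gradients arrive as plain lists of floats. The helper names `clip`, `dp_sgd_update`, and `sgd_step` are hypothetical. Privacy accounting (computing epsilon) is omitted, and this is not the method or code used in the benchmarking paper.

```python
import math
import random
from typing import List, Sequence


def clip(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_update(per_example_grads: Sequence[Sequence[float]],
                  max_norm: float,
                  noise_multiplier: float,
                  rng: random.Random) -> List[float]:
    """Return the noisy averaged gradient for one DP-SGD step (hypothetical helper)."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip(g, max_norm) for g in per_example_grads]
    # Sum the clipped gradients coordinate by coordinate.
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Clipping bounds each example's contribution, so noise is scaled to max_norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # Average over the batch size.
    return [x / n for x in noisy]


def sgd_step(params: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """Plain gradient-descent update using the privatized gradient."""
    return [p - lr * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three toy per-example gradients for a 2-parameter model (illustrative values).
    grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
    update = dp_sgd_update(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
    print(sgd_step([0.0, 0.0], update, lr=0.1))
```

In a LoRA-style adaptation, the same step would typically be applied only to the small set of adapter parameters. The per-example gradients being clipped and noised are therefore much lower-dimensional than the full model's.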

IMPACT These studies highlight critical privacy challenges in LLMs, informing developers about data handling, multi-modal risks, and effective privacy-protection techniques during model adaptation.

RANK_REASON The cluster consists of multiple academic papers discussing LLM privacy research.


AI-generated summary · Google Gemini · from 7 sources.

COVERAGE [7]

  1. arXiv cs.CL TIER_1 English(EN) · Rei Minamoto, Yusuke Oda, Daisuke Kawahara ·

    Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

    arXiv:2606.12114v1 Announce Type: new Abstract: Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and preven…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

    Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and prevent unintended information leakage. However, in co…

  3. arXiv cs.CL TIER_1 English(EN) · Daisuke Kawahara ·

    Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

    Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and prevent unintended information leakage. However, in co…

  4. arXiv cs.AI TIER_1 English(EN) · Tiejin Chen, Pingzhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei ·

    Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

    arXiv:2606.09125v1 Announce Type: cross Abstract: Privacy risks in text-only Large Language Models (LLMs) are well studied, particularly their tendency to memorize and leak sensitive information. However, Multi-modal Large Language Models (MLLMs), which process both text and imag…

  5. arXiv cs.LG TIER_1 English(EN) · Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, Adam Dziedzic ·

    Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

    arXiv:2606.09401v1 Announce Type: new Abstract: Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining…

  6. arXiv cs.LG TIER_1 English(EN) · Adam Dziedzic ·

    Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

    Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adap…

  7. Hugging Face Daily Papers TIER_1 English(EN) ·

    Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

    Privacy risks in text-only Large Language Models (LLMs) are well studied, particularly their tendency to memorize and leak sensitive information. However, Multi-modal Large Language Models (MLLMs), which process both text and images, introduce unique privacy challenges that remai…