New method uses LLMs to bound missing data in statistics

By PulseAugur Editorial · [1 source] · 2026-06-09 04:00

Researchers have developed a new statistical framework for estimating population quantities, such as mean outcomes, when data are missing not at random (MNAR), for example when users with stronger opinions are more likely to respond. The method uses predictions from pretrained models, including large language models (LLMs), as 'weak shadow variables' to tighten partial-identification bounds. In the authors' experiments, the approach shrinks identification intervals by up to 83%, offering a tighter, more informative way to handle non-randomly missing data.
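The summary does not spell out the paper's estimator, so the sketch below does not implement the weak-shadow-variable method. It illustrates what an "identification interval" is by computing the assumption-free, worst-case (Manski-style) bounds on the mean of a bounded outcome when some responses are missing and nothing is assumed about why. Under those conditions the interval's width is the missing share times the outcome range; methods like the one described aim to shrink intervals of this kind. The function name `worst_case_mean_bounds` and the 1-to-5 rating scale in the example are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def worst_case_mean_bounds(
    responses: Sequence[Optional[float]],
    y_min: float,
    y_max: float,
) -> Tuple[float, float]:
    """Worst-case bounds on the population mean of an outcome known to lie in
    [y_min, y_max], where None marks a missing response. No assumption is made
    about the missingness mechanism (it may be MNAR).

    Hypothetical helper for illustration; not the paper's method.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one unit")
    observed = [y for y in responses if y is not None]
    for y in observed:
        if not (y_min <= y <= y_max):
            raise ValueError("observed outcome outside [y_min, y_max]")

    p_obs = len(observed) / n   # share of units that responded
    p_mis = 1.0 - p_obs         # share of units with missing outcomes
    mean_obs = sum(observed) / len(observed) if observed else 0.0

    # The missing units could all sit at the bottom or all at the top of the
    # scale, so the mean is only pinned down to this interval.
    lower = p_obs * mean_obs + p_mis * y_min
    upper = p_obs * mean_obs + p_mis * y_max
    return lower, upper


if __name__ == "__main__":
    # Ten users rate on a 1-5 scale; four did not respond.
    ratings = [5, 5, 4, None, None, 1, 5, None, None, 4]
    print(worst_case_mean_bounds(ratings, 1, 5))
```

In this example six of ten users respond with an average rating of 4.0, so the bounds come out at about (2.8, 4.4): an interval of width 0.4 × (5 − 1) = 1.6. Extra information, such as model predictions that are informative about the missing outcomes, is what allows an interval like this to be narrowed.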

IMPACT Provides a more robust statistical method for analyzing datasets with non-randomly missing user feedback, potentially improving platform evaluation and social science research.




AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv stat.ML TIER_1 English(EN) · Hongyu Chen, David Simchi-Levi, Ruoxuan Xiong · 2026-06-09 04:00

Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models

arXiv:2602.16061v2 (Announce Type: replace). Abstract: Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely…
