Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
Researchers have developed a statistical framework for estimating population quantities when data are missing not at random, for example when users with stronger opinions are more likely to respond. The method uses predictions from pretrained models, including large language models (LLMs), as "weak shadow variables" to tighten identification bounds. In the reported experiments, the approach shrinks identification intervals by up to 83%, offering a more robust way to handle non-randomly missing data.
AI IMPACT: Provides a more robust statistical method for analyzing datasets with non-randomly missing user feedback, potentially improving platform evaluation and social science research.
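
To make "identification interval" concrete, the sketch below computes the classical worst-case (Manski-style) bounds on a population mean when some outcomes are missing and the outcome is known to lie in a fixed range. This is not the framework summarized above, whose estimator is not described here; it only illustrates the kind of baseline interval that extra information such as shadow variables is meant to tighten. The function name, the 1-to-5 rating scale, and the sample data are all hypothetical.

```python
import numpy as np


def worst_case_mean_bounds(y_observed, n_total, y_min, y_max):
    """Worst-case bounds on the population mean of a bounded outcome.

    Assumes nothing about why values are missing: every missing outcome
    is set to y_min for the lower bound and to y_max for the upper bound.
    """
    y_observed = np.asarray(y_observed, dtype=float)
    n_obs = y_observed.size
    if n_obs == 0 or n_obs > n_total:
        raise ValueError("need 0 < number of observed values <= n_total")

    response_rate = n_obs / n_total
    observed_mean = y_observed.mean()

    # Mean = P(respond) * E[Y | respond] + P(missing) * E[Y | missing],
    # where E[Y | missing] is only known to lie in [y_min, y_max].
    lower = response_rate * observed_mean + (1 - response_rate) * y_min
    upper = response_rate * observed_mean + (1 - response_rate) * y_max
    return lower, upper


# Hypothetical data: 4 of 10 users left a rating on a 1-to-5 scale.
lower, upper = worst_case_mean_bounds([5, 1, 5, 5], n_total=10, y_min=1, y_max=5)
print(f"identification interval: [{lower:.2f}, {upper:.2f}]")
```

The width of this interval is the missing fraction times the outcome range, so it cannot shrink without additional assumptions or auxiliary information, which is the gap that tighter partial-identification methods aim to close.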