Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

New framework validates large language models as surrogates for A/B testing

By PulseAugur Editorial Team · [1 source] · 2026-06-17 04:00

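A new statistical framework addresses the use of large language models (LLMs) in place of human participants in A/B tests. Drawing on surrogate endpoint theory, it assesses when LLM outcomes accurately recover the treatment effects measured in a human population. It introduces conditions under which the average treatment effect is identified and provides diagnostics for falsifying surrogacy using past experiments, while stressing that human experiments remain essential for novel interventions.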

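Impact: Provides a statistical framework for validating LLM outcomes as surrogates in A/B testing, which could make experimentation more efficient, while underscoring the need for ongoing human validation.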
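To make the idea of falsifying surrogacy on past experiments concrete, the sketch below shows one generic diagnostic. It is not the authors' method: it assumes the simplest form of surrogacy, in which the LLM-estimated treatment effect should equal the human-estimated effect. It also assumes that every past experiment has independent, approximately normal human and LLM estimates with known standard errors. The framework's actual identification conditions may be weaker or different. The names `PastExperiment` and `falsification_check` and all numbers are hypothetical and purely illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class PastExperiment:
    """Hypothetical record of one past A/B test run on both humans and an LLM."""
    name: str
    human_ate: float  # estimated average treatment effect among human participants
    human_se: float   # standard error of human_ate
    llm_ate: float    # estimated average treatment effect from LLM-simulated participants
    llm_se: float     # standard error of llm_ate


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def falsification_check(experiments, alpha=0.05):
    """Test H0: human ATE == LLM ATE in every past experiment.

    Runs a z-test on the difference of the two estimates in each experiment
    (assuming independent samples), with a Bonferroni correction across
    experiments. A rejection is evidence against surrogacy; failing to
    reject does not prove surrogacy holds, least of all for new interventions.
    """
    threshold = alpha / len(experiments)  # Bonferroni-adjusted per-test level
    flagged = []
    for e in experiments:
        diff = e.human_ate - e.llm_ate
        se = math.sqrt(e.human_se ** 2 + e.llm_se ** 2)
        p = two_sided_p(diff / se)
        if p < threshold:
            flagged.append((e.name, diff, p))
    return len(flagged) > 0, flagged


# Illustrative, made-up data: two past tests with human and LLM estimates.
past = [
    PastExperiment("checkout_copy", 0.020, 0.005, 0.022, 0.004),
    PastExperiment("onboarding_flow", 0.010, 0.004, 0.045, 0.006),
]
falsified, flagged = falsification_check(past)
print(falsified, [name for name, _, _ in flagged])
```

In this toy data the LLM closely tracks humans on `checkout_copy` but overstates the effect on `onboarding_flow`, so the check rejects surrogacy for that experiment. The Bonferroni correction is a deliberately conservative choice that keeps the overall false-rejection rate at or below `alpha`.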

Ranking rationale: Academic paper proposing a new statistical method for LLM-based A/B testing.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Joel Persson, Mårten Schultzberg, Sebastian Ankargren · 2026-06-17 04:00

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

arXiv:2606.17165v1 · Announce Type: cross
Abstract: Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect esti…
