A new research paper investigates whether large language models are disproportionately skeptical of entertainment news. It finds that some frontier models are more likely to misclassify legitimate entertainment articles as fake than legitimate hard news articles. Specifically, DeepSeek-V3.2 and GPT-5.2 showed significant genre asymmetries in false positives, while Claude Opus 4.6 and Gemini 3 Flash did not. The study suggests that LLMs may not only assess truth claims but also recognize the legitimacy of journalistic genres unevenly, and it advocates genre-stratified analysis in evaluations.
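As a rough illustration of what genre-stratified analysis could look like, the sketch below computes a false positive rate separately for each genre, counting only legitimate articles that a model labeled as fake. The function name `false_positive_rate_by_genre`, the record layout, and the toy data are all hypothetical; none of them come from the paper.

```python
from collections import defaultdict


def false_positive_rate_by_genre(records):
    """Return {genre: false positive rate}.

    records: iterable of (genre, is_real, predicted_fake) tuples.
    This layout is an assumption made for illustration only.
    """
    real = defaultdict(int)     # legitimate articles seen per genre
    flagged = defaultdict(int)  # legitimate articles labeled fake per genre
    for genre, is_real, predicted_fake in records:
        if not is_real:
            continue  # false positives are defined over legitimate articles only
        real[genre] += 1
        if predicted_fake:
            flagged[genre] += 1
    return {genre: flagged[genre] / real[genre] for genre in real}


# Made-up toy data, not results from the study.
sample = [
    ("entertainment", True, True),
    ("entertainment", True, False),
    ("hard_news", True, False),
    ("hard_news", True, False),
    ("hard_news", False, True),
]
rates = false_positive_rate_by_genre(sample)
```

Comparing the per-genre rates, rather than one pooled rate, is what would expose an asymmetry of the kind the summary describes; a real evaluation would also need a significance test.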
Impact: Highlights potential biases in LLM news credibility assessment, suggesting a need for genre-specific evaluation methods.
Ranking rationale: Academic paper analyzing LLM behavior on news credibility assessment.
AI-generated summary · Google Gemini · From 1 source.