LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

LLMs show promise for annotating credibility assessments in asylum decisions

By the PulseAugur editorial team · [1 source] · 2026-05-13 12:07

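Researchers explored using large language models (LLMs) to annotate credibility assessments in Danish asylum decisions, a novel legal NLP task. They introduce the RAB-Cred dataset, which includes expert annotations and metadata, and use it to evaluate 21 open-weight models and a range of prompt combinations in zero-shot and few-shot settings. The study finds that although LLMs show promise for cost-effective annotation, their annotations are imperfect and inconsistent, so they need careful scrutiny and the predictions of a single model should not be relied on alone.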
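The caution against relying on a single model's predictions can be illustrated with a minimal sketch. It is not the paper's method: the label names, case identifiers, and helper functions below are all hypothetical, chosen only to show how predictions from several models might be combined and how disagreement between them might be surfaced for expert review.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label, or None when the top two labels tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority; leave the case for a human annotator
    return counts[0][0]

def agreement_rate(labels):
    """Fraction of models that chose the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical predictions: one entry per decision, one label per model.
predictions = {
    "case_1": ["credible", "credible", "credible"],
    "case_2": ["credible", "not_credible", "not_credible"],
    "case_3": ["credible", "not_credible", "unclear"],
}

summary = {
    case: (majority_vote(labels), agreement_rate(labels))
    for case, labels in predictions.items()
}

for case, (label, rate) in summary.items():
    print(f"{case}: label={label}, agreement={rate:.2f}")
```

Cases with no majority or a low agreement rate are natural candidates for manual checking, which is one simple way to act on the finding that model annotations are inconsistent.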

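Impact: Demonstrates the utility of LLMs in a specialized legal domain, while underscoring the need to validate their output carefully.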

Ranking rationale: Academic paper detailing a new dataset and an LLM evaluation for a specific NLP task.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Thomas B. Moeslund · 2026-05-13 12:07

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

Off-the-shelf large language models (LLMs) are increasingly used to automate text annotation, yet their effectiveness remains underexplored for underrepresented languages and specialized domains where the class definition requires subtle expert understanding. We investigate LLM-b…

