Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese

New benchmark 'Prosa' evaluates LLMs on Brazilian Portuguese chats

By the PulseAugur editorial team · [2 sources] · 2026-05-05 04:00

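Researchers have introduced Prosa, a new benchmark designed to evaluate large language models (LLMs) on real user conversations in Brazilian Portuguese. The benchmark uses rubric-based scoring combined with multi-judge filtering to mitigate the biases common in holistic LLM-as-a-judge evaluation. Prosa contains 1,000 WildChat conversations and aims to make LLM evaluation more discriminative by widening the score gaps between models.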
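
As a rough illustration of what binary rubric scoring with multi-judge filtering could look like, here is a minimal Python sketch. It is not the Prosa implementation: the data layout, the function names (`filter_by_agreement`, `rubric_score`), the example criteria, and the choice of unanimous agreement as the filter are all assumptions made for illustration, since the summary does not state the actual filtering rule.

```python
from typing import Dict, List

# Hypothetical data layout: verdicts[criterion] holds one binary
# pass/fail vote per judge model. All names here are illustrative.
Verdicts = Dict[str, List[bool]]


def filter_by_agreement(verdicts: Verdicts) -> Dict[str, bool]:
    """Keep only criteria on which every judge gave the same verdict.

    Assumption: unanimity is the filter. The source does not say
    which filtering rule Prosa actually uses.
    """
    kept: Dict[str, bool] = {}
    for criterion, votes in verdicts.items():
        if votes and all(v == votes[0] for v in votes):
            kept[criterion] = votes[0]
    return kept


def rubric_score(verdicts: Verdicts) -> float:
    """Fraction of agreed-upon criteria that the response satisfies."""
    kept = filter_by_agreement(verdicts)
    if not kept:
        return 0.0
    return sum(kept.values()) / len(kept)


example = {
    "answers_in_portuguese": [True, True, True],
    "addresses_user_question": [True, True, True],
    "cites_a_source": [False, False, False],
    "polite_tone": [True, False, True],  # judges disagree, so it is dropped
}
print(rubric_score(example))
```

In this sketch each criterion is a yes/no question, so a single judge's overall taste has less room to shift the final score than it would with one holistic rating, and criteria the judges cannot agree on are simply excluded.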

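Impact: Introduces a new evaluation benchmark for LLMs in Brazilian Portuguese, with the potential to improve model evaluation and comparison.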

Ranking rationale: The cluster contains an academic paper introducing a novel benchmark for LLM evaluation.

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.CL · TIER_1 · English (EN) · Roseval Malaquias Junior, Giovana Kerche Bonás, Thales Sales Almeida, Hugo Abonizio, Thiago Laitz, Ramon Pires, Marcos Piau, Celio Larcher, Rodrigo Nogueira · 2026-05-05 04:00

Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese

arXiv:2605.01630v1. Abstract: Rankings produced by holistic LLM-as-a-judge scoring are sensitive to the bias of the chosen judge model. We show that switching to binary rubric scoring with multi-judge filtering removes this sensitivity: decomposing the judgement…

Mastodon · mastodon.social · TIER_1 · English (EN) · [email protected] · 2026-05-05 19:10

How LLMs Work - A complete Walkthrough of how Large Language Models like ChatGPT are built: from raw Internet Text to a conversational Assistant. Based on Andrej Karpathy's technical deep dive. #AI #LLM https://ynarwal.github.io/how-llms-work/

Link: ynarwal.github.io/how-llms-work
