PulseAugur
Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese

New benchmark 'Prosa' evaluates LLMs on Brazilian Portuguese chats

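Researchers have introduced Prosa, a new benchmark for evaluating large language models (LLMs) on real user conversations in Brazilian Portuguese. The benchmark uses rubric-based scoring combined with multi-judge filtering to mitigate the bias common in holistic LLM-as-a-judge evaluation. Prosa comprises 1,000 WildChat conversations and aims to make LLM evaluation more discriminative by widening the score gaps between models.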
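As a rough illustration of binary rubric scoring with multi-judge filtering, the sketch below scores one response against a handful of yes/no criteria. Everything here is assumed for illustration: the function names `agreement` and `rubric_score`, the `min_agreement` threshold, the example criteria, and the filtering rule itself (dropping criteria on which judges disagree). The summary does not specify how Prosa filters, so this may differ from the paper's procedure.

```python
def agreement(votes):
    """Share of judges that side with the majority verdict (0.5 to 1.0)."""
    yes = sum(votes)
    return max(yes, len(votes) - yes) / len(votes)


def rubric_score(verdicts, min_agreement=1.0):
    """Fraction of retained criteria judged satisfied, or None if none remain.

    verdicts maps each rubric criterion to a list of binary verdicts,
    one per judge. Criteria whose judge agreement falls below
    min_agreement are filtered out (hypothetical filtering rule).
    """
    kept = {c: v for c, v in verdicts.items() if agreement(v) >= min_agreement}
    if not kept:
        return None
    # A kept criterion counts as passed when a strict majority says yes.
    passed = sum(1 for v in kept.values() if sum(v) * 2 > len(v))
    return passed / len(kept)


# Hypothetical verdicts from three judges on four made-up criteria.
example = {
    "answers in Brazilian Portuguese": [True, True, True],
    "addresses the user's question": [True, True, True],
    "cites a source": [True, False, True],  # judges disagree, filtered out
    "stays under the length limit": [False, False, False],
}

score = rubric_score(example)  # 2 of the 3 retained criteria pass
```

Because each criterion is a separate yes/no decision, a judge's overall stylistic preference has less room to move the final score than it would in a single holistic rating, which is the intuition the summary points to.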

Impact: Introduces a new evaluation benchmark for LLMs in Brazilian Portuguese, which could improve model evaluation and comparison.

Ranking rationale: This cluster contains an academic paper introducing a novel benchmark for LLM evaluation.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · based on 2 sources.


Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Roseval Malaquias Junior, Giovana Kerche Bonás, Thales Sales Almeida, Hugo Abonizio, Thiago Laitz, Ramon Pires, Marcos Piau, Celio Larcher, Rodrigo Nogueira ·

    Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese

    arXiv:2605.01630v1 Announce Type: new Abstract: Rankings produced by holistic LLM-as-a-judge scoring are sensitive to the bias of the chosen judge model. We show that switching to binary rubric scoring with multi-judge filtering removes this sensitivity: decomposing the judgement…

  2. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    How LLMs Work - A complete Walkthrough of how Large Language Models like ChatGPT are built: from raw Internet Text to a conversational Assistant. Based on Andrej Karpathy's technical deep dive. #AI #LLM https://ynarwal.github.io/how-llms-work/