Researchers have introduced Prosa, a new benchmark designed to evaluate Large Language Models (LLMs) on real user conversations in Brazilian Portuguese. The benchmark uses a rubric-based scoring system with multi-judge filtering to mitigate the bias often found in holistic LLM-as-a-judge evaluations. Prosa comprises 1,000 WildChat conversations and aims to improve the discriminative power of LLM evaluations by widening the score gaps between models.
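To make the mechanism concrete, the sketch below shows one way rubric-based scoring with multi-judge filtering might be combined: each judge scores a conversation per rubric criterion, per-criterion outlier judgments are dropped, and the rest are averaged. The criterion names, score scale, and median-deviation filter here are illustrative assumptions, not Prosa's actual procedure.

```python
from statistics import mean, median

def filter_and_score(judge_scores, max_dev=2.0):
    """Combine per-criterion rubric scores from several LLM judges.

    judge_scores: list of dicts mapping criterion name -> score (e.g. 1-5).
    A judge's score for a criterion is kept only if it lies within
    `max_dev` of the median across judges; surviving scores are averaged.
    Illustrative sketch only -- not Prosa's published method.
    """
    criteria = judge_scores[0].keys()
    result = {}
    for criterion in criteria:
        scores = [judgment[criterion] for judgment in judge_scores]
        med = median(scores)
        kept = [s for s in scores if abs(s - med) <= max_dev]
        result[criterion] = mean(kept)
    return result

# Hypothetical rubric scores from three judges on one conversation.
judges = [
    {"fluency": 4, "helpfulness": 5},
    {"fluency": 4, "helpfulness": 4},
    {"fluency": 1, "helpfulness": 5},  # fluency outlier, filtered out
]
print(filter_and_score(judges))
```

Filtering per criterion rather than per judge keeps a judge's reasonable scores even when one of its ratings is anomalous, which is one plausible way to reduce single-judge bias.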
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT: Introduces a new evaluation benchmark for LLMs in Brazilian Portuguese, potentially improving model assessment and comparison.
RANK_REASON: The cluster contains a new academic paper introducing a novel benchmark for LLM evaluation.