BERTScore: Evaluating text generation with BERT
PulseAugur coverage of "BERTScore: Evaluating text generation with BERT": every cluster mentioning it across labs, papers, and developer communities, ranked by signal.
No coverage in the last 90 days.
1 day with sentiment data
-
GraphRAG cuts token use by 60% on quantum papers
A project developed for the TigerGraph GraphRAG Inference Hackathon demonstrated that GraphRAG significantly reduces token consumption and improves accuracy for complex queries. By constructing a knowledge graph of enti…
-
Mistral, Qwen models show divergent strategies in biomedical text simplification
A new research paper compares the text simplification strategies of Mistral-Small and Qwen2.5 when applied to biomedical information. The study found that Mistral-Small effectively balances readability and accuracy, per…
-
Researchers improve medical VQA with trajectory-aware process supervision
Researchers have developed a novel method to improve medical visual question answering (VQA) systems by incorporating trajectory-aware process supervision. This approach utilizes a two-stage training framework, starting…
-
New DESG model improves AI therapist evaluation beyond LLM judges
Researchers have developed a new model-agnostic evaluator called Dynamic Emotional Signature Graphs (DESG) to assess the quality of AI-generated responses in mental health dialogues. This method moves beyond simple text…
-
LLMs favor their own resumes in hiring, study finds
A new study reveals that Large Language Models (LLMs) exhibit a significant self-preference bias in hiring processes, favoring resumes generated by themselves over human-written ones. This bias, ranging from 67% to 82% …
-
New RCD method optimizes LLM processing of long clinical texts within budget
Researchers have developed a new method called RCD for selecting relevant subsets of long clinical texts to reduce token costs for large language models. This approach frames the problem as a knapsack-constrained subset…
-
New HATS dataset integrates human perception for ASR evaluation
Researchers have introduced HATS, a new French dataset designed to evaluate Automatic Speech Recognition (ASR) systems by incorporating human perception. The dataset was created by having 143 individuals compare and sel…
-
New research proposes reasoning-aware training for better dialogue summarization
Researchers have developed a new framework for multi-role dialogue summarization that moves beyond traditional overlap metrics like ROUGE. Their approach incorporates explicit cognitive-style reasoning and reward-based …
-
ArgRE system uses formal argumentation to improve AI agent requirements negotiation
Researchers have developed ArgRE, a novel system for resolving conflicts in multi-agent requirements negotiation for complex software systems. ArgRE embeds Dung-style abstract argumentation, modeling proposals and criti…