BERTScore: Evaluating text generation with BERT

PulseAugur coverage of "BERTScore: Evaluating text generation with BERT": every cluster that mentions it across labs, papers, and developer communities, ranked by signal.

Total: 19 over 90d

Releases: 0 over 90d

Papers: 18 over 90d

SENTIMENT · 30D

8 day(s) with sentiment data

RECENT · PAGE 1/1 · 19 TOTAL

RESEARCH · CL_109576 · Jun 24 · 03:54

New AI models tackle low-resource Tangkhul-English translation

Researchers have developed two neural machine translation systems for the low-resource Tangkhul-English language pair. The primary system, utilizing ByT5-large fine-tuned on over 38,000 parallel sentences, achieved a BL…
RESEARCH · CL_107685 · Jun 22 · 20:25

LLM attribution metrics lack transferability across datasets, study finds

A new research paper investigates the reliability of automatic metrics used to evaluate attribution in retrieval-augmented generation (RAG) systems. The study found that common attribution metrics, including lexical, em…
TOOL · CL_104724 · Jun 20 · 23:23

LLMs struggle with Hausa and Fongbe translation, metrics unreliable

A new study evaluated the machine translation capabilities of four large language models (LLMs) for Hausa and Fongbe, two West African languages. The research found that while Hausa achieved acceptable translation quali…
RESEARCH · CL_98102 · Jun 17 · 15:55

New RECOM dataset reveals metric tradeoff in LLM evaluation

Researchers have introduced RECOM, a new evaluation dataset designed to assess automatic metrics for open-ended question answering, particularly for LLM-generated text. The dataset, comprising 15,000 r/AskReddit questio…
TOOL · CL_93412 · Jun 16 · 04:00

Researchers caution on synthetic data quality after fine-tuning Mistral 7B

Researchers have developed a method to fine-tune a 7B language model on free-tier GPUs by using an adapter-handoff technique. This approach allows for multi-epoch fine-tuning by checkpointing only the small LoRA adapter…
TOOL · CL_84920 · Jun 11 · 04:00

New geometric framework measures semantic information in text

Researchers have developed a new geometric framework to measure the semantic information contained within a text. This framework, detailed in a recent paper, offers a three-coordinate semantic profile that captures nove…
TOOL · CL_74383 · Jun 6 · 04:00

AI uses curriculum learning and multiple models for better medical text generation

Researchers have developed a new framework for medical text generation that uses a severity-aware curriculum learning approach with multiple large language models. This method trains models sequentially on cases of incr…
TOOL · CL_72640 · Jun 5 · 04:00

New framework uses multiple models for better text summarization

Researchers have developed a Multi-Model Adaptive Summarization Framework (MASF) to enhance abstractive text summarization. This framework integrates multiple fine-tuned transformer models, each generating a summary for…
RESEARCH · CL_53567 · May 26 · 17:47

New MATCHA metric improves LLM text evaluation by penalizing contradictions

Researchers have developed MATCHA, a new metric designed to more accurately evaluate the semantic similarity of text generated by large language models. Unlike existing metrics like ROUGE and BERTScore, which can incorr…
RESEARCH · CL_51284 · May 25 · 16:06

Medical QA RAG trainability hinges on checker output distribution, not accuracy

A new research paper explores the trainability of medical question-answering systems that use retrieval-augmented generation (RAG) guided by a Natural Language Inference (NLI) checker. The study reveals that the checker…
TOOL · CL_29008 · May 12 · 19:43

GraphRAG cuts token use by 60% on quantum papers

A project developed for the TigerGraph GraphRAG Inference Hackathon demonstrated that GraphRAG significantly reduces token consumption and improves accuracy for complex queries. By constructing a knowledge graph of enti…
TOOL · CL_20626 · May 7 · 04:00

Mistral, Qwen models show divergent strategies in biomedical text simplification

A new research paper compares the text simplification strategies of Mistral-Small and Qwen2.5 when applied to biomedical information. The study found that Mistral-Small effectively balances readability and accuracy, per…
TOOL · CL_20382 · May 7 · 04:00

Researchers improve medical VQA with trajectory-aware process supervision

Researchers have developed a novel method to improve medical visual question answering (VQA) systems by incorporating trajectory-aware process supervision. This approach utilizes a two-stage training framework, starting…
RESEARCH · CL_18258 · May 5 · 07:56

New DESG model improves AI therapist evaluation beyond LLM judges

Researchers have developed a new model-agnostic evaluator called Dynamic Emotional Signature Graphs (DESG) to assess the quality of AI-generated responses in mental health dialogues. This method moves beyond simple text…
RESEARCH · CL_13212 · May 2 · 15:28

LLMs favor their own resumes in hiring, study finds

A new study reveals that Large Language Models (LLMs) exhibit a significant self-preference bias in hiring processes, favoring resumes generated by themselves over human-written ones. This bias, ranging from 67% to 82% …
RESEARCH · CL_14134 · May 1 · 01:34

New RCD method optimizes LLM processing of long clinical texts within budget

Researchers have developed a new method called RCD for selecting relevant subsets of long clinical texts to reduce token costs for large language models. This approach frames the problem as a knapsack-constrained subset…
RESEARCH · CL_11448 · Apr 30 · 07:48

New HATS dataset integrates human perception for ASR evaluation

Researchers have introduced HATS, a new French dataset designed to evaluate Automatic Speech Recognition (ASR) systems by incorporating human perception. The dataset was created by having 143 individuals compare and sel…
RESEARCH · CL_08628 · Apr 29 · 04:00

New research proposes reasoning-aware training for better dialogue summarization

Researchers have developed a new framework for multi-role dialogue summarization that moves beyond traditional overlap metrics like ROUGE. Their approach incorporates explicit cognitive-style reasoning and reward-based …
RESEARCH · CL_06982 · Apr 28 · 04:00

ArgRE system uses formal argumentation to improve AI agent requirements negotiation

Researchers have developed ArgRE, a novel system for resolving conflicts in multi-agent requirements negotiation for complex software systems. ArgRE embeds Dung-style abstract argumentation, modeling proposals and criti…
