PulseAugur

TriviaQA

PulseAugur coverage of TriviaQA — every cluster mentioning TriviaQA across labs, papers, and developer communities, ranked by signal.

Total: 8 over 30d · 8 over 90d
Releases: 0 over 30d · 0 over 90d
Papers: 8 over 30d · 8 over 90d
RECENT · 8 TOTAL
  1. TOOL · CL_20411

    New method quantifies LLM uncertainty using semantic entropy and conformal calibration

    Researchers have developed a new method called Adaptive Conformal Semantic Entropy (ACSE) to better estimate the uncertainty of Large Language Models (LLMs). This approach focuses on the semantic dispersion of different…
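    The ACSE method itself is not detailed in the summary; a minimal sketch of the underlying semantic-entropy idea it builds on — sample several answers, cluster them by meaning, and take the entropy over clusters — might look like the following, where the equivalence judge (an NLI model in practice) is replaced by a caller-supplied function as an assumption:

    ```python
    import math

    def semantic_entropy(answers, equivalent):
        """Entropy over semantic clusters of sampled answers.

        answers: list of strings sampled from the model for one question.
        equivalent: callable deciding whether two answers mean the same
        thing (in practice an NLI model; here a toy stand-in).
        """
        clusters = []  # each cluster holds semantically equivalent answers
        for a in answers:
            for c in clusters:
                if equivalent(a, c[0]):
                    c.append(a)
                    break
            else:
                clusters.append([a])
        n = len(answers)
        # probability mass of each meaning = fraction of samples in its cluster
        return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

    # Toy usage: case-insensitive exact match stands in for an NLI judge.
    samples = ["Paris", "paris", "Lyon", "Paris"]
    h = semantic_entropy(samples, lambda x, y: x.lower() == y.lower())
    ```

    High dispersion of meanings (many small clusters) yields high entropy, i.e. high uncertainty; conformal calibration, per the summary, would then turn such scores into coverage-guaranteed thresholds.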

  2. RESEARCH · CL_15929

    New methods like SMF and SAM reduce catastrophic forgetting in LLMs

    Two new research papers explore methods to mitigate catastrophic forgetting in language models during fine-tuning. One paper introduces Sparse Memory Finetuning (SMF), which adds memory layers and updates only heavily a…

  3. RESEARCH · CL_08278

    Researchers release Faithfulness-QA dataset to train context-faithful RAG models

    Researchers have developed Faithfulness-QA, a new dataset containing nearly 100,000 samples designed to train Retrieval-Augmented Generation (RAG) models to prioritize retrieved context over their internal knowledge. Th…

  4. RESEARCH · CL_07004

    S2G-RAG improves multi-hop QA by judging evidence sufficiency and gaps

    Researchers have introduced S2G-RAG, a novel iterative framework designed to improve retrieval-augmented generation (RAG) for multi-hop question answering. The system features a controller, S2G-Judge, which determines i…
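    The controller loop described above — retrieve, judge whether the evidence suffices, and if not, retrieve again against the identified gap — can be sketched roughly as follows; the `retrieve`, `judge`, and `answer` callables are assumptions standing in for the paper's components, not its actual interfaces:

    ```python
    def iterative_rag(question, retrieve, judge, answer, max_rounds=3):
        """Iterative retrieve-judge loop in the spirit of S2G-RAG (sketch).

        retrieve(query) -> list of passages.
        judge(question, evidence) -> (sufficient: bool, gap_query: str),
        playing the role the summary assigns to S2G-Judge.
        answer(question, evidence) -> final answer string.
        """
        evidence = retrieve(question)
        for _ in range(max_rounds):
            sufficient, gap_query = judge(question, evidence)
            if sufficient:
                break
            # Evidence is insufficient: retrieve against the identified gap.
            evidence += retrieve(gap_query)
        return answer(question, evidence)
    ```

    For multi-hop questions, each round targets the missing hop rather than re-querying the original question, which is what distinguishes gap-directed iteration from naive re-retrieval.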

  5. RESEARCH · CL_06290

    Gemma 3 4B LLM confidence training shows mixed results, improves accuracy post-hoc

    A study on the Gemma 3 4B model investigated methods to improve its verbal confidence in responses. Initial attempts using a filtered dataset for confidence-conditioned supervised fine-tuning (CSFT) yielded negative res…

  6. RESEARCH · CL_13525

    S2G-RAG framework improves multi-hop QA by judging evidence sufficiency

    Researchers have introduced S2G-RAG, an iterative framework designed to improve retrieval-augmented question answering, particularly for multi-hop queries. The system features a controller called S2G-Judge that determin…

  7. RESEARCH · CL_05078

    LLMs use internal confidence signals to detect and correct errors

    Researchers have investigated how large language models can identify and correct their own mistakes without external input, drawing parallels to second-order confidence models in decision neuroscience. Their findings su…

  8. RESEARCH · CL_04990

    Study finds 3-9B LLMs fail verbal confidence tests, impacting uncertainty estimates

    A new study examined the verbal confidence of seven instruction-tuned, open-weight large language models (LLMs) with 3-9 billion parameters. Researchers found that these models failed to meet minimal validity criteria f…
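    The study's specific validity criteria are not given in the summary; one standard check of the kind such evaluations use is expected calibration error, which measures how far a model's stated confidence drifts from its actual accuracy. A minimal sketch (ECE here is a stand-in assumption, not the paper's metric):

    ```python
    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE over equal-width confidence bins.

        confidences: stated confidences in [0, 1], one per answer.
        correct: 1 if the corresponding answer was right, else 0.
        """
        n = len(confidences)
        ece = 0.0
        for b in range(n_bins):
            lo, hi = b / n_bins, (b + 1) / n_bins
            # Bin membership: (lo, hi], with 0.0 folded into the first bin.
            idx = [i for i, c in enumerate(confidences)
                   if lo < c <= hi or (b == 0 and c == 0)]
            if not idx:
                continue
            avg_conf = sum(confidences[i] for i in idx) / len(idx)
            accuracy = sum(correct[i] for i in idx) / len(idx)
            # Weight each bin's confidence-accuracy gap by its share of samples.
            ece += len(idx) / n * abs(avg_conf - accuracy)
        return ece
    ```

    A model that says "95% sure" but is right only half the time in that bin contributes a large gap, which is the failure mode the study reports for small open-weight models.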