PulseAugur

TriviaQA

PulseAugur coverage of TriviaQA — every cluster mentioning TriviaQA across labs, papers, and developer communities, ranked by signal.

Total: 8 over 30d · 8 over 90d
Releases: 0 over 30d · 0 over 90d
Papers: 8 over 30d · 8 over 90d
RECENT · 8 TOTAL
  1. TOOL · CL_20411

    New method quantifies LLM uncertainty using semantic entropy and conformal calibration

    Researchers have developed a new method called Adaptive Conformal Semantic Entropy (ACSE) to better estimate the uncertainty of Large Language Models (LLMs). This approach focuses on the semantic dispersion of different…
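    The ACSE method itself is not detailed in the summary; a minimal sketch of the underlying semantic-entropy idea it builds on — sample several answers, cluster them by meaning, and take the entropy over clusters — might look like the following, where the equivalence judge (an NLI model in practice) is replaced by a caller-supplied function as an assumption:

    ```python
    import math

    def semantic_entropy(answers, equivalent):
        """Entropy over semantic clusters of sampled answers.

        answers: list of strings sampled from the model for one question.
        equivalent: callable deciding whether two answers mean the same
        thing (in practice an NLI model; here a toy stand-in).
        """
        clusters = []  # each cluster holds semantically equivalent answers
        for a in answers:
            for c in clusters:
                if equivalent(a, c[0]):
                    c.append(a)
                    break
            else:
                clusters.append([a])
        n = len(answers)
        # probability mass of each meaning = fraction of samples in its cluster
        return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

    # Toy usage: case-insensitive exact match stands in for an NLI judge.
    samples = ["Paris", "paris", "Lyon", "Paris"]
    h = semantic_entropy(samples, lambda x, y: x.lower() == y.lower())
    ```

    High dispersion of meanings (many small clusters) yields high entropy, i.e. high uncertainty; conformal calibration, per the summary, would then turn such scores into coverage-guaranteed thresholds.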

  2. RESEARCH · CL_15929

    New methods like SMF and SAM reduce catastrophic forgetting in LLMs

    Two new research papers explore methods to mitigate catastrophic forgetting in language models during fine-tuning. One paper introduces Sparse Memory Finetuning (SMF), which adds memory layers and updates only heavily a…

  3. RESEARCH · CL_08278

    Researchers release Faithfulness-QA dataset to train context-faithful RAG models

    Researchers have developed Faithfulness-QA, a new dataset containing nearly 100,000 samples designed to train Retrieval-Augmented Generation (RAG) models to prioritize retrieved context over their internal knowledge. Th…

  4. RESEARCH · CL_07004

    S2G-RAG improves multi-hop QA by judging evidence sufficiency and gaps

    Researchers have introduced S2G-RAG, a novel iterative framework designed to improve retrieval-augmented generation (RAG) for multi-hop question answering. The system features a controller, S2G-Judge, which determines i…
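    The controller loop described above — retrieve, judge whether the evidence suffices, and if not, retrieve again against the identified gap — can be sketched roughly as follows; the `retrieve`, `judge`, and `answer` callables are assumptions standing in for the paper's components, not its actual interfaces:

    ```python
    def iterative_rag(question, retrieve, judge, answer, max_rounds=3):
        """Iterative retrieve-judge loop in the spirit of S2G-RAG (sketch).

        retrieve(query) -> list of passages.
        judge(question, evidence) -> (sufficient: bool, gap_query: str),
        playing the role the summary assigns to S2G-Judge.
        answer(question, evidence) -> final answer string.
        """
        evidence = retrieve(question)
        for _ in range(max_rounds):
            sufficient, gap_query = judge(question, evidence)
            if sufficient:
                break
            # Evidence is insufficient: retrieve against the identified gap.
            evidence += retrieve(gap_query)
        return answer(question, evidence)
    ```

    For multi-hop questions, each round targets the missing hop rather than re-querying the original question, which is what distinguishes gap-directed iteration from naive re-retrieval.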

  5. RESEARCH · CL_06290

    Gemma 3 4B LLM confidence training shows mixed results, improves accuracy post-hoc

    A study on the Gemma 3 4B model investigated methods to improve its verbal confidence in responses. Initial attempts using a filtered dataset for confidence-conditioned supervised fine-tuning (CSFT) yielded negative res…

  6. RESEARCH · CL_13525

    S2G-RAG framework improves multi-hop QA by judging evidence sufficiency

    Researchers have introduced S2G-RAG, an iterative framework designed to improve retrieval-augmented question answering, particularly for multi-hop queries. The system features a controller called S2G-Judge that determin…

  7. RESEARCH · CL_05078

    LLMs use internal confidence signals to detect and correct errors

    Researchers have investigated how large language models can identify and correct their own mistakes without external input, drawing parallels to second-order confidence models in decision neuroscience. Their findings su…

  8. RESEARCH · CL_04990

    Study finds 3-9B LLMs fail verbal confidence tests, impacting uncertainty estimates

    A new study examined the verbal confidence of seven instruction-tuned, open-weight large language models (LLMs) with 3-9 billion parameters. Researchers found that these models failed to meet minimal validity criteria f…
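    The study's specific validity criteria are not given in the summary; one standard check of the kind such evaluations use is expected calibration error, which measures how far a model's stated confidence drifts from its actual accuracy. A minimal sketch (ECE here is a stand-in assumption, not the paper's metric):

    ```python
    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE over equal-width confidence bins.

        confidences: stated confidences in [0, 1], one per answer.
        correct: 1 if the corresponding answer was right, else 0.
        """
        n = len(confidences)
        ece = 0.0
        for b in range(n_bins):
            lo, hi = b / n_bins, (b + 1) / n_bins
            # Bin membership: (lo, hi], with 0.0 folded into the first bin.
            idx = [i for i, c in enumerate(confidences)
                   if lo < c <= hi or (b == 0 and c == 0)]
            if not idx:
                continue
            avg_conf = sum(confidences[i] for i in idx) / len(idx)
            accuracy = sum(correct[i] for i in idx) / len(idx)
            # Weight each bin's confidence-accuracy gap by its share of samples.
            ece += len(idx) / n * abs(avg_conf - accuracy)
        return ece
    ```

    A model that says "95% sure" but is right only half the time in that bin contributes a large gap, which is the failure mode the study reports for small open-weight models.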