HotpotQA
PulseAugur coverage of HotpotQA: every cluster mentioning HotpotQA across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
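New audit protocol tests the evidence dependence of NLP benchmarks
Researchers have developed a new audit protocol for weakly labeled benchmarks in natural language processing. The protocol distinguishes outputs that can be predicted from metadata alone from outputs that genuinely depend on the provided evidence. By combining a metadata-prior dominance score with evidence-intervention statistics, the method aims to give a more robust assessment of benchmark reliability.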
-
PersonalAI 2.0 enhances LLMs with knowledge graphs and planning
Researchers have developed PersonalAI 2.0 (PAI-2), a new framework that improves large language model (LLM) systems by integrating external knowledge graphs. PAI-2 employs a dynamic, multistage query processing pipeline…
-
CANTANTE framework optimizes LLM multi-agent systems via credit attribution
Researchers have developed CANTANTE, a new framework designed to optimize the configuration of large language model-based multi-agent systems. This system addresses the challenge of assigning credit for performance when…
-
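Self-consistency shows diminishing returns for modern large language models
A new study suggests that self-consistency, a technique that improves large language model accuracy by generating multiple reasoning paths, is becoming less effective and more costly. The researchers found that on benchmarks such as HotpotQA and MATH-500, increasing the number of samples yields only marginal accuracy gains while token cost grows linearly. In some cases more samples even degrade performance, which suggests that for more modern, more capable models, self-consistency may introduce noise rather than signal.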
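For readers unfamiliar with the technique, here is a minimal sketch of self-consistency as majority voting over sampled answers, with the token cost tracked alongside. It is an illustration under stated assumptions, not the study's code: `self_consistency`, `make_stub_sampler`, and the fixed `tokens_per_sample` value are hypothetical names and numbers, and the stub sampler stands in for a real model call.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Tuple


def make_stub_sampler(answers: List[str], tokens_per_sample: int) -> Callable[[], Tuple[str, int]]:
    """Hypothetical stand-in for an LLM call: cycles through fixed answers.

    Each call returns (final_answer, tokens_used). A real sampler would
    generate a reasoning path at nonzero temperature and extract the answer.
    """
    stream = cycle(answers)

    def sample() -> Tuple[str, int]:
        return next(stream), tokens_per_sample

    return sample


def self_consistency(sample: Callable[[], Tuple[str, int]], n_samples: int) -> Tuple[str, int]:
    """Draw n_samples answers and return (majority answer, total tokens spent)."""
    answers: List[str] = []
    total_tokens = 0
    for _ in range(n_samples):
        answer, tokens = sample()
        answers.append(answer)
        total_tokens += tokens  # cost accumulates once per sample
    majority_answer, _count = Counter(answers).most_common(1)[0]
    return majority_answer, total_tokens


if __name__ == "__main__":
    for n in (3, 6):
        sampler = make_stub_sampler(["Paris", "Lyon", "Paris"], tokens_per_sample=100)
        print(n, self_consistency(sampler, n))
```

In this sketch the total token count is proportional to `n_samples`, while the voted answer stops changing once a stable majority exists, which is the cost-versus-gain trade-off the summary describes.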
-
ROZA Graphs improve RAG accuracy and efficiency via evidence-centric feedback
Researchers have developed ROZA Graphs, a novel approach to enhance Retrieval-Augmented Generation (RAG) systems by incorporating evidence-centric feedback. This method stores per-evidence chains of thought as structure…
-
New RAG methods aim to boost AI factuality and reduce hallucinations
Several research papers published on arXiv in May 2026 introduce novel methods to enhance Retrieval-Augmented Generation (RAG) systems. These approaches focus on improving the robustness and trustworthiness of RAG by ad…
-
NeocorRAG framework optimizes retrieval quality for RAG models, achieving SOTA performance
Researchers have introduced NeocorRAG, a novel framework designed to enhance Retrieval-Augmented Generation (RAG) systems by focusing on retrieval quality rather than just recall. This new approach utilizes "Evidence Ch…
-
S2G-RAG improves multi-hop QA by judging evidence sufficiency and gaps
Researchers have introduced S2G-RAG, a novel iterative framework designed to improve retrieval-augmented generation (RAG) for multi-hop question answering. The system features a controller, S2G-Judge, which determines i…
-
New RAG research tackles tabular data, cost, and cross-lingual knowledge
Several recent research papers explore advancements in Retrieval-Augmented Generation (RAG) systems. One paper introduces Orthogonal Subspace Decomposition (OSD) to separate task-specific behavior from document knowledg…
-
S2G-RAG framework improves multi-hop QA by judging evidence sufficiency
Researchers have introduced S2G-RAG, an iterative framework designed to improve retrieval-augmented question answering, particularly for multi-hop queries. The system features a controller called S2G-Judge that determin…