TruthfulQA
PulseAugur coverage of TruthfulQA: every cluster mentioning TruthfulQA across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 2 days
-
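New research shifts the view of LLM post-training from tokens to state distributions
Researchers propose a new perspective on the post-training phase of large language models that focuses on state distributions rather than on tokens alone. Their work shows that the source and locality of the training states matter as much as the supervision signal itself. Experiments with Qwen3-0.6B-Base show that on-policy distillation from a weaker teacher model can still improve performance across multiple benchmarks, and that lightweight reinforcement learning strengthens task-specific performance while preserving the model's existing capabilities.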
-
LLM benchmark costs analyzed: $0.12 for 3 tasks
Benchmarking three large language model tasks (GSM8K, HellaSwag, and TruthfulQA) on a single T4 GPU costs approximately $0.12. The analysis reveals that generative tasks are the primary cost driver, while log-likelihood…
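The headline figure is simple arithmetic: GPU time multiplied by an hourly price, so whichever tasks run longest (the generative ones, per the analysis) dominate the bill. The sketch below assumes a hypothetical T4 rate of $0.35/hour, which is not a number from the analysis; `run_cost` and `implied_gpu_minutes` are illustrative helper names.

```python
# Minimal cost arithmetic for single-GPU benchmark runs.
# ASSUMPTION: HOURLY_RATE_USD is a hypothetical T4 price, not a figure from the analysis.
HOURLY_RATE_USD = 0.35


def run_cost(runtime_seconds: float, hourly_rate_usd: float = HOURLY_RATE_USD) -> float:
    """Cost in USD of occupying one GPU for runtime_seconds (linear in time)."""
    return runtime_seconds / 3600.0 * hourly_rate_usd


def implied_gpu_minutes(total_cost_usd: float, hourly_rate_usd: float = HOURLY_RATE_USD) -> float:
    """Invert the formula: how many GPU-minutes a given total cost pays for."""
    return total_cost_usd / hourly_rate_usd * 60.0


if __name__ == "__main__":
    # ~$0.12 is the reported total for GSM8K + HellaSwag + TruthfulQA.
    print(f"{implied_gpu_minutes(0.12):.1f} GPU-minutes at ${HOURLY_RATE_USD}/h")
```

Because cost scales linearly with runtime, halving the time spent on generation-heavy tasks roughly halves their share of the total, whatever the actual hourly rate is.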
-
New probe reveals how RAG handles conflicting information
Researchers have developed a new method called Context-Driven Decomposition (CDD) to analyze how Retrieval-Augmented Generation (RAG) systems handle conflicting information. CDD operates at inference time to measure and…
-
New diagnostic tool probes LLM circuits for safety and behavior insights
A new research paper introduces "Perturbation Probing," a diagnostic method for understanding the internal workings of large language models. This technique uses two forward passes per prompt to identify and analyze "be…
-
New framework uses multiple LLMs to reduce hallucination and bias
Researchers have developed a new framework called Council Mode designed to mitigate hallucinations and biases in Large Language Models. This approach involves querying multiple diverse LLMs simultaneously and then synth…
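The query-then-aggregate pattern described above can be sketched as follows. This is not Council Mode's actual synthesis step, which the summary does not detail; it swaps in plain majority voting over normalized answers as a simple stand-in. The models are hypothetical stub callables, and `query_council` and `majority_answer` are illustrative names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Each "model" is any callable mapping a prompt to an answer string.
# ASSUMPTION: real LLM clients would be wrapped to fit this signature.
Model = Callable[[str], str]


def query_council(prompt: str, models: Sequence[Model]) -> list[str]:
    """Send the same prompt to every model concurrently; answers keep model order."""
    if not models:
        raise ValueError("need at least one model")
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        return list(pool.map(lambda m: m(prompt), models))


def majority_answer(answers: Sequence[str]) -> tuple[str, float]:
    """Stand-in synthesis: most common normalized answer and its vote share."""
    normalized = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


if __name__ == "__main__":
    # Hypothetical stub "models" that disagree on one answer.
    stubs = [lambda p: "Paris", lambda p: "paris ", lambda p: "Lyon"]
    answer, share = majority_answer(query_council("Capital of France?", stubs))
    print(answer, round(share, 2))  # paris 0.67
```

The vote share doubles as a crude agreement signal: a low share flags prompts where the models disagree, which is where a richer synthesis step would matter most.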