ruler
PulseAugur coverage of ruler — every cluster mentioning ruler across labs, papers, and developer communities, ranked by signal.
1 day of sentiment data
-
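New EXACT method improves long-context understanding in LLMs
Researchers have developed EXACT, a new supervised objective for improving the long-context adaptation of language models. It addresses a mismatch in packed training by assigning extra weight to targets that depend on a longer effective context. Experiments on Qwen and LLaMA models show significant improvements on benchmarks such as NoLiMa and RULER, especially when the evidence lies thousands of tokens away, while performance on standard question-answering and reasoning tasks is preserved.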
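The summary does not say how EXACT measures effective context or how it sets its weights. As a rough illustration only, the sketch below shows a per-token weighted cross-entropy in which a hypothetical per-token `effective_context` estimate raises that token's weight. The weight form `1 + alpha * log1p(ctx / scale)` and the parameters `alpha` and `scale` are assumptions, not the paper's formula.

```python
import math

def weighted_nll(token_logprobs, effective_context, alpha=0.5, scale=1024):
    """Weighted mean negative log-likelihood over target tokens.

    token_logprobs: log p(target_t | prefix) for each target token.
    effective_context: hypothetical per-token estimate of how many preceding
        tokens the target actually depends on.
    The weighting function is an illustrative assumption, not EXACT's.
    """
    if len(token_logprobs) != len(effective_context):
        raise ValueError("inputs must have the same length")
    # Longer effective context -> larger weight (grows slowly via log1p).
    weights = [1.0 + alpha * math.log1p(c / scale) for c in effective_context]
    total = sum(w * -lp for w, lp in zip(weights, token_logprobs))
    return total / sum(weights)
```

When every target has zero effective context, all weights are 1 and the loss reduces to the ordinary mean NLL. Targets with long-range dependencies pull the average toward their own loss.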
-
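FocuSFT improves long-context understanding in LLMs through bi-level optimization
Researchers have developed FocuSFT, a novel bi-level optimization framework designed to improve how large language models handle long contexts. The method targets "attention dilution", the tendency of models during fine-tuning to attend to privileged tokens rather than semantically relevant ones. By using parametric memory to concentrate attention on key content, FocuSFT significantly improves performance on long-context benchmarks such as BABILong and RULER, and also yields gains in agentic tool use on GPQA.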
-
New paper proposes residual-mass accounting for partial-KV decoding
Researchers have developed a novel method for partial-KV decoding, which improves the efficiency of large language models by computing exact softmax contributions for only a subset of tokens. This approach uses learned…
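The summary is truncated before it describes the learned component, but the accounting it names rests on a simple identity: full softmax attention splits into exact terms for a chosen subset of keys plus one residual term (total exp-score mass and weighted mean value) for everything else. The sketch below, a generic single-query illustration with scalar values, checks that identity. The function names are hypothetical, and how the paper estimates the residual terms is not stated in the summary.

```python
import math

def full_attention(scores, values):
    """Exact single-query softmax attention (scalar values for brevity)."""
    m = max(scores)  # shift for numerical stability
    w = [math.exp(s - m) for s in scores]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

def partial_attention(scores, values, subset, residual_mass, residual_mean):
    """Exact terms for keys in `subset` plus one residual term for the rest.

    residual_mass: sum of exp(score) over the keys NOT in `subset`.
    residual_mean: softmax-weighted mean of those keys' values.
    A partial-KV decoder would have to estimate these two numbers.
    Unshifted exp is used for clarity, so this is only safe for moderate scores.
    """
    num = sum(math.exp(scores[i]) * values[i] for i in subset)
    den = sum(math.exp(scores[i]) for i in subset)
    return (num + residual_mass * residual_mean) / (den + residual_mass)

def oracle_residual(scores, values, subset):
    """The exact residual terms that an estimator would try to approximate."""
    chosen = set(subset)
    rest = [i for i in range(len(scores)) if i not in chosen]
    mass = sum(math.exp(scores[i]) for i in rest)
    if mass == 0.0:
        return 0.0, 0.0
    mean = sum(math.exp(scores[i]) * values[i] for i in rest) / mass
    return mass, mean
```

With the oracle residual terms, `partial_attention` matches `full_attention` up to rounding. Passing `residual_mass=0.0` recovers plain subset-only attention, which renormalizes over the selected keys and therefore over-weights them.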
-
Subquadratic debuts 12M-token context window with linear scaling architecture
Subquadratic, a startup with 11 PhD researchers, has launched a new model featuring its Subquadratic Selective Attention (SSA) architecture, which claims to scale linearly with context length. This innovation allows for…
-
Q-RAG method enables efficient multi-step retrieval for LLMs up to 10M tokens
Researchers have introduced Q-RAG, a novel method for enhancing Retrieval-Augmented Generation (RAG) systems. This approach utilizes reinforcement learning to fine-tune the embedder model for multi-step retrieval, a mor…
-
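Understanding and improving length generalization in hierarchical sparse attention models
Researchers have identified three design principles that are critical for length generalization in hierarchical sparse attention models: an expressive chunk encoder with a CLS token for representation, a bypass residual path that integrates global information without overwriting local context, and enforced selection sparsity during pretraining. With these components, models trained at a 4K context length generalize to 32 million tokens on benchmarks such as RULER and BABILong, setting a new state of the art for training-free length extrapolation.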
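To make the "hierarchical sparse" part concrete, here is a minimal sketch of generic block-level selection for one query: keys are split into fixed-size chunks, each chunk gets a summary vector, the query keeps only its top-k chunks, and softmax attention runs over the tokens inside them. Mean pooling stands in for the learned CLS-token chunk encoder, and the bypass residual path and training-time sparsity enforcement are not modeled. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_summaries(keys, block_size):
    """Mean-pool each chunk of keys (stand-in for a learned CLS-token encoder)."""
    summaries = []
    for start in range(0, len(keys), block_size):
        chunk = keys[start:start + block_size]
        dim = len(chunk[0])
        summaries.append([sum(k[d] for k in chunk) / len(chunk) for d in range(dim)])
    return summaries

def select_blocks(query, keys, block_size, top_k):
    """Indices of the top_k chunks whose summaries best match the query."""
    summaries = block_summaries(keys, block_size)
    ranked = sorted(range(len(summaries)),
                    key=lambda b: dot(query, summaries[b]), reverse=True)
    return sorted(ranked[:top_k])

def sparse_attention(query, keys, values, block_size, top_k):
    """Softmax attention restricted to tokens in the selected chunks.

    Returns (output_vector, selected_chunk_indices).
    """
    chosen = select_blocks(query, keys, block_size, top_k)
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[i]) * scale for i in idx]
    m = max(scores)  # shift for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(values[0])
    out = [sum(wi * values[i][d] for wi, i in zip(w, idx)) / z for d in range(dim)]
    return out, chosen
```

Because the cost of the token-level step depends on `top_k * block_size` rather than on the full sequence length, this kind of selection is what lets such models run at contexts far longer than they were trained on. Whether the selection generalizes is exactly what the three principles above are meant to address.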
-
SIEVES method boosts multimodal LLM coverage on visual tasks with evidence scoring
Researchers have developed SIEVES, a novel method for improving the reliability of multimodal large language models (MLLMs) in out-of-distribution scenarios. SIEVES works by learning to estimate the quality of visual ev…
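The summary is cut off before it explains how SIEVES turns its evidence estimates into decisions. Assuming "coverage" carries its usual selective-prediction meaning (the fraction of inputs the model answers rather than abstains on), the sketch below computes the largest coverage achievable under a risk budget when examples are answered in descending order of a per-example score. Here the score is a hypothetical stand-in for an evidence-quality estimate; this is the standard metric, not SIEVES itself.

```python
def coverage_at_risk(scores, correct, max_risk):
    """Largest coverage whose selective risk (error rate on answered items) <= max_risk.

    scores: hypothetical per-example confidence, e.g. an evidence-quality estimate.
    correct: whether the model's answer on each example is right.
    Examples are answered in descending score order; the rest are abstained on.
    Tied scores are broken by input order, a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = 0.0
    errors = 0
    for n, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        if errors / n <= max_risk:
            best = n / len(scores)
    return best
```

A better scoring function ranks wrong answers lower, so more examples can be answered before the risk budget is exceeded. That is the sense in which better evidence scoring can raise coverage.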