Math-500
PulseAugur coverage of Math-500: every cluster mentioning Math-500 across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 3 days
-
New methods enhance on-policy distillation for LLM training
Researchers have developed new methods to improve on-policy distillation (OPD), a technique for training smaller language models using larger ones. One approach, TIP, identifies informative tokens by analyzing student e…
-
New method steers LLM attention to correct reasoning errors
Researchers have developed Manifold-Guided Attention Steering (MAGS), a novel method to improve the reasoning capabilities of large language models. MAGS identifies deviations from a 'correctness manifold' in the model'…
-
New KV-cache compression method alpha outperforms existing techniques
Researchers have developed a new KV-cache compression method called alpha, which uses a diversity-penalty survivor approach. This method was found to outperform seven other mechanisms in a design-space study on mathemat…
-
New RL algorithm fix boosts GSM8K accuracy by 45 points
Researchers have identified a critical issue in the Group Relative Policy Optimization (GRPO) algorithm when applied to binary rewards, leading to "gradient starvation." This occurs when all responses in a group are eit…
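The teaser above is cut off, so the following is only a minimal sketch of the commonly described GRPO advantage computation (reward minus the group mean, divided by the group standard deviation), not the fix from the paper. The function name `group_relative_advantages` and the `eps` parameter are hypothetical. Under that assumption, a group whose binary rewards are all identical yields all-zero advantages, so that group contributes no learning signal.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Assumed form: (r - group_mean) / (group_std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Mixed group: correct responses get positive advantage, wrong ones negative.
mixed = group_relative_advantages([1, 0, 0, 1])

# Uniform groups: every advantage is zero, so the policy gradient vanishes.
all_correct = group_relative_advantages([1, 1, 1, 1])
all_wrong = group_relative_advantages([0, 0, 0, 0])
```

With binary rewards, uniform groups become more frequent as a task gets very easy or very hard for the current policy, which is one plausible reading of why the starvation matters.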
-
New research reveals "coupling tax" limits LLM reasoning accuracy
A new research paper introduces the concept of a "coupling tax" in large language models, highlighting how shared token budgets for reasoning and final answers can hinder accuracy. The study found that for certain tasks…
-
Self-consistency technique shows diminishing returns for modern LLMs
A new study suggests that the self-consistency technique, which involves generating multiple reasoning paths to improve LLM accuracy, is becoming less effective and more costly. Researchers found minimal accuracy gains …
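For readers unfamiliar with the technique, self-consistency is usually implemented as a majority vote over the final answers of several sampled reasoning paths. The sketch below assumes the answers have already been extracted as strings; `self_consistency_vote` is a hypothetical helper name, and the sample answers are made up for illustration, not taken from the study.

```python
from collections import Counter

def self_consistency_vote(answers):
    """Return the most frequent final answer and its share of the votes."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Five sampled reasoning paths, three of which agree on "42".
sampled_answers = ["42", "41", "42", "42", "7"]
winner, agreement = self_consistency_vote(sampled_answers)
```

Each extra sample costs a full generation, so the cost grows linearly with the number of paths while the vote can only change when the samples actually disagree.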
-
BoostLoRA method grows adapter rank to surpass full fine-tuning
Researchers have introduced BoostLoRA, a novel parameter-efficient fine-tuning method designed to enhance model expressivity without increasing inference overhead. This technique iteratively trains and merges small adap…
-
Sleeper Agent Backdoor Results Are Messy
Researchers attempted to replicate the "Sleeper Agents" experiment, which demonstrated that standard alignment training might not remove harmful backdoors in AI models. Their replication using Llama-3.3-70B and Llama-3.…