GSM8K
PulseAugur coverage of GSM8K — every cluster mentioning GSM8K across labs, papers, and developer communities, ranked by signal.
-
Riazi-8B: Urdu LLM enhances mathematical reasoning for low-resource languages
Researchers have developed Riazi-8B, a new large language model specifically designed for mathematical reasoning in the Urdu language. This model addresses the limitations of existing English-centric LLMs, which perform…
-
New methods accelerate Diffusion LLMs, addressing speed-quality trade-offs · 3 sources tracked
Researchers are developing new methods to accelerate Diffusion Large Language Models (dLLMs), which are computationally intensive because their cost scales with sequence length. Two new frameworks, Dynamic-dLLM and Streaming-dLLM…
-
New research explores weight-space geometry of AI reasoning distillation methods
A new research paper analyzes the geometric properties of weight updates across various offline reinforcement learning methods used for distilling reasoning capabilities into smaller AI models. The study trained six dif…
-
Small language model trained on single GPU detailed in new study
Researchers have detailed a method for training a small language model, L20-Edu-135M, with limited compute: a single NVIDIA L20 GPU. The study focused on data efficiency, uti…
-
New research explores interactive visualization and causal attribution for LLM reasoning
Researchers are exploring new methods to enhance the interpretability and reliability of large language models (LLMs) through chain-of-thought (CoT) reasoning. One approach, Vis-CoT, transforms linear CoT text into inte…
-
New pruning method preserves LLM reasoning performance
Researchers have developed a new training-free method called Causal Attribution Pruning (CAP) to reduce the size of large language models while preserving their reasoning capabilities. CAP identifies and prunes less cri…
-
AI math reasoning benchmarks have a 'sampling blind spot', study finds
A new research paper published on arXiv explores a critical limitation in evaluating the difficulty of math reasoning problems for AI models. The study reveals that standard benchmarks, which rely on the success rate of…
-
New HeRo-Q framework enhances stable low-bit quantization for LLMs
Researchers have developed a new framework called HeRo-Q to improve the stability of low-bit quantization in large language models. This method addresses the 'low error, high loss' phenomenon by reshaping the loss lands…
-
New SEVRA method optimizes LLM reasoning for better accuracy and efficiency
Researchers have developed a new method called Selective Verification for Reasoning Allocation (SEVRA) to optimize the use of reasoning in large language models. SEVRA acts as a serving-layer controller, deciding whethe…
-
LLM benchmarks miss crucial tool-use gap for agentic AI
Public LLM benchmarks often fail to reflect real-world performance, particularly for agentic systems that rely on tool use. Models excelling in static benchmarks like MMLU may perform poorly when integrated into pipelin…
-
Open RLHF training success hinges on evaluation instrument, study finds
A new study explores the complexities of Reinforcement Learning from Human Feedback (RLHF) in open language models, specifically using Qwen2.5-0.5B-Instruct. The research highlights that the perceived "improvement" of a…
-
V-pretraining method improves AI model task-specific performance
Researchers have developed a novel method called V-pretraining to enhance the effectiveness of continued pretraining for AI models. This technique uses a small set of downstream examples to provide step-level feedback, …
-
HRM-Text: 1B parameter model with novel architecture challenges LLM paradigms
A new language model called HRM-Text, developed by Sapient Intelligence, is gaining attention for its innovative architecture that focuses on internal reasoning rather than simply increasing model size or training data.…
-
New method uses cross-model disagreement to detect AI errors
Researchers have introduced a method for detecting errors in language models without needing ground-truth labels. The approach, termed cross-model disagreement, uses a secondary model to assess the genera…
-
MixReasoning framework optimizes AI model efficiency by adapting reasoning depth
Researchers have developed a new framework called MixReasoning that dynamically adjusts the depth of reasoning within a single response. This approach allows models to apply detailed reasoning to complex steps while usi…
-
TD learning fails to improve LLM few-shot retrieval on GSM8K
A researcher explored TD learning for improving retrieval of few-shot examples in LLM reasoning, aiming to assign learned values to traces based on their utility. The experiment involved storing reasoning traces, retrie…
-
New VISTA framework enhances LLM prompt optimization
Researchers have developed VISTA, a new framework for automatically optimizing prompts used with large language models. This method aims to overcome limitations in existing reflective prompt optimization techniques, whi…
-
LLMs show arithmetic fragility on GSM8K dataset via numeric attacks
Researchers have developed an automated method to test the robustness of large language models in arithmetic reasoning by creating numeric-remapping attacks. These attacks modify word problems with different numbers whi…
-
New framework stress-tests AI process reward models for vulnerabilities
Researchers have developed EST-PRM, a new framework designed to stress-test process reward models (PRMs) used in language model training. PRMs assume their scores remain stable even when reasoning steps are altered whil…
-
eMoT framework boosts LLM reasoning with memory and symbolic anchoring
Researchers have introduced eMoT, a framework designed to enhance the reliability of large language models in multi-step reasoning tasks. eMoT stabilizes reasoning by treating trajectories as evolving memories, incorpor…