GSM8K

PulseAugur coverage of GSM8K — every cluster mentioning GSM8K across labs, papers, and developer communities, ranked by signal.

Total · 30d: 70 (70 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 68 (68 over 90d)

TIER MIX · 90D

research 27
tool 42
commentary 1

SENTIMENT · 30D

15 days with sentiment data

RECENT · PAGE 1/4 · 70 TOTAL

RESEARCH · CL_109564 · Jun 24 · 08:44

Riazi-8B: Urdu LLM enhances mathematical reasoning for low-resource languages

Researchers have developed Riazi-8B, a new large language model specifically designed for mathematical reasoning in the Urdu language. This model addresses the limitations of existing English-centric LLMs, which perform…

RESEARCH · CL_108093 · Jun 24 · 04:00

New methods accelerate Diffusion LLMs, addressing speed-quality trade-offs · 3 sources tracked

Researchers are developing new methods to accelerate Diffusion Large Language Models (dLLMs), which are computationally intensive due to their sequence length scaling. Two new frameworks, Dynamic-dLLM and Streaming-dLLM…

TOOL · CL_107973 · Jun 24 · 04:00

New research explores weight-space geometry of AI reasoning distillation methods

A new research paper analyzes the geometric properties of weight updates across various offline reinforcement learning methods used for distilling reasoning capabilities into smaller AI models. The study trained six dif…

TOOL · CL_104732 · Jun 20 · 18:42

Small language model trained on single GPU detailed in new study

Researchers have detailed a method for training a small language model, L20-Edu-135M, with significantly less compute, using a single NVIDIA L20 GPU. The study focused on data efficiency, uti…

RESEARCH · CL_104693 · Jun 20 · 01:18

New research explores interactive visualization and causal attribution for LLM reasoning

Researchers are exploring new methods to enhance the interpretability and reliability of large language models (LLMs) through chain-of-thought (CoT) reasoning. One approach, Vis-CoT, transforms linear CoT text into inte…

TOOL · CL_100162 · Jun 19 · 04:00

New pruning method preserves LLM reasoning performance

Researchers have developed a new training-free method called Causal Attribution Pruning (CAP) to reduce the size of large language models while preserving their reasoning capabilities. CAP identifies and prunes less cri…

TOOL · CL_100107 · Jun 19 · 04:00

AI math reasoning benchmarks have a 'sampling blind spot', study finds

A new research paper published on arXiv explores a critical limitation in evaluating the difficulty of math reasoning problems for AI models. The study reveals that standard benchmarks, which rely on the success rate of…

TOOL · CL_98076 · Jun 18 · 04:00

New HeRo-Q framework enhances stable low-bit quantization for LLMs

Researchers have developed a new framework called HeRo-Q to improve the stability of low-bit quantization in large language models. This method addresses the 'low error, high loss' phenomenon by reshaping the loss lands…
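
For context, the sketch below shows plain round-to-nearest symmetric quantization and the reconstruction error it produces. It is a generic baseline, not HeRo-Q: the function names, the 4-bit default, and the per-tensor scale are assumptions made for illustration. The summary's "low error, high loss" phrase presumably refers to cases where an error of this kind is small yet the model's loss still degrades.

```python
def quantize(weights: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Symmetric round-to-nearest quantization with one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit signed codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared reconstruction error between two weight lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Reconstruction error as measured by `mse` says nothing by itself about how the quantized model performs on a task, which is the gap the summary points at.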

RESEARCH · CL_99535 · Jun 18 · 00:00

New SEVRA method optimizes LLM reasoning for better accuracy and efficiency

Researchers have developed a new method called Selective Verification for Reasoning Allocation (SEVRA) to optimize the use of reasoning in large language models. SEVRA acts as a serving-layer controller, deciding whethe…

COMMENTARY · CL_94706 · Jun 16 · 13:24

LLM benchmarks miss crucial tool-use gap for agentic AI

Public LLM benchmarks often fail to reflect real-world performance, particularly for agentic systems that rely on tool use. Models excelling in static benchmarks like MMLU may perform poorly when integrated into pipelin…

TOOL · CL_92574 · Jun 15 · 19:56

Open RLHF training success hinges on evaluation instrument, study finds

A new study explores the complexities of Reinforcement Learning from Human Feedback (RLHF) in open language models, specifically using Qwen2.5-0.5B-Instruct. The research highlights that the perceived "improvement" of a…

TOOL · CL_91396 · Jun 15 · 04:00

V-pretraining method improves AI model task-specific performance

Researchers have developed a novel method called V-pretraining to enhance the effectiveness of continued pretraining for AI models. This technique uses a small set of downstream examples to provide step-level feedback, …

RESEARCH · CL_89191 · Jun 13 · 12:40

HRM-Text: 1B parameter model with novel architecture challenges LLM paradigms

A new language model called HRM-Text, developed by Sapient Intelligence, is gaining attention for its innovative architecture that focuses on internal reasoning rather than simply increasing model size or training data.…

TOOL · CL_86812 · Jun 12 · 04:00

New method uses cross-model disagreement to detect AI errors

Researchers have introduced a novel method for detecting errors in language models without needing ground truth labels. This new approach, termed cross-model disagreement, utilizes a secondary model to assess the genera…
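
The summary is cut off before it says how the secondary model scores a generation, so the sketch below swaps in the simplest possible reading: flag a question whenever the two models' final answers differ. The `normalize` and `flag_disagreements` helpers and the exact-match rule are assumptions for illustration, not the paper's method.

```python
def normalize(answer: str) -> str:
    """Strip whitespace, a trailing period and thousands separators."""
    return answer.strip().rstrip(".").replace(",", "").lower()


def flag_disagreements(primary: list[str], secondary: list[str]) -> list[int]:
    """Return indices of questions where the two models' answers differ.

    No ground-truth labels are used: disagreement alone is the signal.
    """
    if len(primary) != len(secondary):
        raise ValueError("both models must answer the same questions")
    return [
        i
        for i, (p, s) in enumerate(zip(primary, secondary))
        if normalize(p) != normalize(s)
    ]
```

A flagged index only says that at least one of the two models is likely wrong; it does not say which one.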

TOOL · CL_79919 · Jun 9 · 04:00

MixReasoning framework optimizes AI model efficiency by adapting reasoning depth

Researchers have developed a new framework called MixReasoning that dynamically adjusts the depth of reasoning within a single response. This approach allows models to apply detailed reasoning to complex steps while usi…

TOOL · CL_75680 · Jun 7 · 03:45

TD learning fails to improve LLM few-shot retrieval on GSM8K

A researcher explored TD learning for improving retrieval of few-shot examples in LLM reasoning, aiming to assign learned values to traces based on their utility. The experiment involved storing reasoning traces, retrie…
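
The idea of assigning learned values to stored traces can be sketched with a tabular TD(0) update. This is a sketch under assumptions, not the researcher's code: the `TraceValues` class, the use of a reward such as 1.0 for a correct final answer, and the `alpha` and `gamma` defaults are all hypothetical.

```python
from collections import defaultdict
from typing import Optional


class TraceValues:
    """Tabular value estimates for stored reasoning traces (hypothetical helper)."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9):
        self.alpha = alpha                  # learning rate
        self.gamma = gamma                  # discount for the next trace's value
        self.values = defaultdict(float)    # unseen traces start at 0.0

    def update(self, trace_id: str, reward: float,
               next_trace_id: Optional[str] = None) -> float:
        """TD(0): move V(trace) toward reward + gamma * V(next)."""
        next_value = self.values[next_trace_id] if next_trace_id is not None else 0.0
        td_error = reward + self.gamma * next_value - self.values[trace_id]
        self.values[trace_id] += self.alpha * td_error
        return td_error

    def top_k(self, candidates: list[str], k: int) -> list[str]:
        """Pick the k candidate traces with the highest learned value."""
        return sorted(candidates, key=lambda t: self.values[t], reverse=True)[:k]
```

As the headline states, values learned with TD learning did not improve few-shot retrieval on GSM8K in this experiment, so the sketch illustrates the mechanism rather than a recommended approach.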

RESEARCH · CL_74171 · Jun 5 · 23:11

New VISTA framework enhances LLM prompt optimization

Researchers have developed VISTA, a new framework for automatically optimizing prompts used with large language models. This method aims to overcome limitations in existing reflective prompt optimization techniques, whi…

RESEARCH · CL_68172 · Jun 2 · 13:09

LLMs show arithmetic fragility on GSM8K dataset via numeric attacks

Researchers have developed an automated method to test the robustness of large language models in arithmetic reasoning by creating numeric-remapping attacks. These attacks modify word problems with different numbers whi…
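
A minimal sketch of the kind of numeric remapping described here, assuming a templated word problem whose answer can be recomputed from its numbers. The template, the helper names, and the value range are hypothetical illustrations; the summary is truncated, so whatever properties the authors hold fixed when changing numbers are not reproduced.

```python
import random

# Hypothetical GSM8K-style template; the slots are the numbers to remap.
TEMPLATE = (
    "Sara has {a} apples and buys {b} bags with {c} apples each. "
    "How many apples does she have now?"
)


def solve(a: int, b: int, c: int) -> int:
    """Ground-truth answer, recomputed from whichever numbers are in use."""
    return a + b * c


def remap(rng: random.Random, low: int = 2, high: int = 99) -> dict:
    """Draw fresh numbers for every slot (assumed range, for illustration)."""
    return {name: rng.randint(low, high) for name in ("a", "b", "c")}


def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return a remapped problem and its recomputed answer."""
    values = remap(rng)
    return TEMPLATE.format(**values), solve(**values)


def is_fragile(answer_original: int, answer_variant: int,
               truth_original: int, truth_variant: int) -> bool:
    """Flag a model that solves the original problem but not the variant."""
    return answer_original == truth_original and answer_variant != truth_variant
```

Because the reasoning steps are identical across variants, a model that only fails after the numbers change is showing sensitivity to the arithmetic rather than to the problem structure.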

TOOL · CL_65916 · Jun 2 · 04:00

New framework stress-tests AI process reward models for vulnerabilities

Researchers have developed EST-PRM, a new framework designed to stress-test process reward models (PRMs) used in language model training. PRMs assume their scores remain stable even when reasoning steps are altered whil…

TOOL · CL_65389 · Jun 2 · 04:00

eMoT framework boosts LLM reasoning with memory and symbolic anchoring

Researchers have introduced eMoT, a framework designed to enhance the reliability of large language models in multi-step reasoning tasks. eMoT stabilizes reasoning by treating trajectories as evolving memories, incorpor…