PulseAugur

GSM8K

PulseAugur coverage of GSM8K: every cluster mentioning GSM8K across labs, papers, and developer communities, ranked by signal.

Total · 30d: 70 (70 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 68 (68 over 90d)
SENTIMENT · 30D: 15 days with sentiment data

RECENT · PAGE 1/4 · 70 TOTAL
  1. RESEARCH · CL_109564

    Riazi-8B: Urdu LLM enhances mathematical reasoning for low-resource languages

    Researchers have developed Riazi-8B, a new large language model specifically designed for mathematical reasoning in the Urdu language. This model addresses the limitations of existing English-centric LLMs, which perform…

  2. RESEARCH · CL_108093

    New methods accelerate Diffusion LLMs, addressing speed-quality trade-offs · 3 sources tracked

    Researchers are developing new methods to accelerate diffusion large language models (dLLMs), which are computationally intensive because their cost scales with sequence length. Two new frameworks, Dynamic-dLLM and Streaming-dLLM…

  3. TOOL · CL_107973

    New research explores weight-space geometry of AI reasoning distillation methods

    A new research paper analyzes the geometric properties of weight updates across various offline reinforcement learning methods used for distilling reasoning capabilities into smaller AI models. The study trained six dif…

  4. TOOL · CL_104732

    Small language model trained on single GPU detailed in new study

    Researchers have detailed a method for training a small language model, L20-Edu-135M, using significantly fewer computational resources, specifically on a single NVIDIA L20 GPU. The study focused on data efficiency, uti…

  5. RESEARCH · CL_104693

    New research explores interactive visualization and causal attribution for LLM reasoning

    Researchers are exploring new methods to enhance the interpretability and reliability of large language models (LLMs) through chain-of-thought (CoT) reasoning. One approach, Vis-CoT, transforms linear CoT text into inte…

  6. TOOL · CL_100162

    New pruning method preserves LLM reasoning performance

    Researchers have developed a new training-free method called Causal Attribution Pruning (CAP) to reduce the size of large language models while preserving their reasoning capabilities. CAP identifies and prunes less cri…

  7. TOOL · CL_100107

    AI math reasoning benchmarks have a 'sampling blind spot', study finds

    A new research paper published on arXiv explores a critical limitation in evaluating the difficulty of math reasoning problems for AI models. The study reveals that standard benchmarks, which rely on the success rate of…

  8. TOOL · CL_98076

    New HeRo-Q framework enhances stable low-bit quantization for LLMs

    Researchers have developed a new framework called HeRo-Q to improve the stability of low-bit quantization in large language models. This method addresses the 'low error, high loss' phenomenon by reshaping the loss lands…

  9. RESEARCH · CL_99535

    New SEVRA method optimizes LLM reasoning for better accuracy and efficiency

    Researchers have developed a new method called Selective Verification for Reasoning Allocation (SEVRA) to optimize the use of reasoning in large language models. SEVRA acts as a serving-layer controller, deciding whethe…

  10. COMMENTARY · CL_94706

    LLM benchmarks miss crucial tool-use gap for agentic AI

    Public LLM benchmarks often fail to reflect real-world performance, particularly for agentic systems that rely on tool use. Models excelling in static benchmarks like MMLU may perform poorly when integrated into pipelin…

  11. TOOL · CL_92574

    Open RLHF training success hinges on evaluation instrument, study finds

    A new study explores the complexities of Reinforcement Learning from Human Feedback (RLHF) in open language models, specifically using Qwen2.5-0.5B-Instruct. The research highlights that the perceived "improvement" of a…

  12. TOOL · CL_91396

    V-pretraining method improves AI model task-specific performance

    Researchers have developed a novel method called V-pretraining to enhance the effectiveness of continued pretraining for AI models. This technique uses a small set of downstream examples to provide step-level feedback, …

  13. RESEARCH · CL_89191

    HRM-Text: 1B parameter model with novel architecture challenges LLM paradigms

    A new language model called HRM-Text, developed by Sapient Intelligence, is gaining attention for its innovative architecture that focuses on internal reasoning rather than simply increasing model size or training data.…

  14. TOOL · CL_86812

    New method uses cross-model disagreement to detect AI errors

    Researchers have introduced a novel method for detecting errors in language models without needing ground truth labels. This new approach, termed cross-model disagreement, utilizes a secondary model to assess the genera…

  15. TOOL · CL_79919

    MixReasoning framework optimizes AI model efficiency by adapting reasoning depth

    Researchers have developed a new framework called MixReasoning that dynamically adjusts the depth of reasoning within a single response. This approach allows models to apply detailed reasoning to complex steps while usi…

  16. TOOL · CL_75680

    TD learning fails to improve LLM few-shot retrieval on GSM8K

    A researcher explored TD learning for improving retrieval of few-shot examples in LLM reasoning, aiming to assign learned values to traces based on their utility. The experiment involved storing reasoning traces, retrie…

  17. RESEARCH · CL_74171

    New VISTA framework enhances LLM prompt optimization

    Researchers have developed VISTA, a new framework for automatically optimizing prompts used with large language models. This method aims to overcome limitations in existing reflective prompt optimization techniques, whi…

  18. RESEARCH · CL_68172

    LLMs show arithmetic fragility on GSM8K dataset via numeric attacks

    Researchers have developed an automated method to test the robustness of large language models in arithmetic reasoning by creating numeric-remapping attacks. These attacks modify word problems with different numbers whi…

  19. TOOL · CL_65916

    New framework stress-tests AI process reward models for vulnerabilities

    Researchers have developed EST-PRM, a new framework designed to stress-test process reward models (PRMs) used in language model training. PRMs assume their scores remain stable even when reasoning steps are altered whil…

  20. TOOL · CL_65389

    eMoT framework boosts LLM reasoning with memory and symbolic anchoring

    Researchers have introduced eMoT, a framework designed to enhance the reliability of large language models in multi-step reasoning tasks. eMoT stabilizes reasoning by treating trajectories as evolving memories, incorpor…