PulseAugur

MMLU-Pro

PulseAugur coverage of MMLU-Pro: every cluster mentioning MMLU-Pro across labs, papers, and developer communities, ranked by signal.

Total · 30d: 15 (15 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 12 (12 over 90d)

RECENT · 15 TOTAL
  1. RESEARCH · CL_107855

    AI benchmark scores predictable from just two factors, study finds

    A new research paper proposes a method called BenchPress that can predict a frontier model's performance across numerous benchmarks using only two key scores. The study analyzed 84 models and 133 benchmarks, finding tha…

  2. TOOL · CL_105155

    New framework tackles LLM data contamination using uncertainty

    Researchers have introduced Uncertainty-Based Debiasing and Unlearning (UBD), a novel framework for evaluating and mitigating data contamination in large language models (LLMs). Unlike previous methods that rely solely …

  3. TOOL · CL_71003

    Nvidia details task-seeded synthetic data for Nemotron LLM training

    Nvidia has detailed a new method for generating synthetic question-and-answer data to improve large language model training. This task-seeded approach uses existing public datasets as a foundation to create novel, struc…

  4. TOOL · CL_70394

    Context labels dramatically alter language model behavior

    Researchers have found that the labels used to present context to language models significantly impact their behavior. In tests across models like GPT-5.5 and DeepSeek V4 Pro, using labels such as "Instruction:" or "Ref…

  5. COMMENTARY · CL_60296

    AI benchmarks criticized as useless due to over-optimization and contamination

    The author argues that current AI model benchmarks are becoming increasingly useless due to several factors. They contend that models are being over-optimized for these specific tests, leading to a disconnect between be…

  6. TOOL · CL_56391

    Neural Interaction Law: Model Depth-Width Ratio Impacts Generalization

    Researchers have introduced the concept of "neural interaction" to analyze how effectively large language models utilize resources under a fixed budget. They propose that efficient neural interactions, achieved by adjus…

  7. RESEARCH · CL_61375

    NVIDIA quantizes Alibaba's Qwen3.6-35B model for efficient deployment

    NVIDIA has released a quantized version of Alibaba's Qwen3.6-35B-A3B model, named nvidia/Qwen3.6-35B-A3B-NVFP4. This model utilizes the NVFP4 data type, reducing memory requirements by approximately 3.06x while maintain…

  8. RESEARCH · CL_48596

    New technique loops transformer layers to boost model performance

    Researchers have developed a novel technique called training-free looped transformers, which enhances the performance of existing frozen language models without requiring any additional training or architectural modific…

  9. TOOL · CL_40817

    Quantization impacts LLM performance, with larger models showing more resilience

    A new research paper explores the impact of quantization on large language model performance, examining models from 2-bit to 6-bit precision. The study found that while higher precision generally leads to better perform…

  10. RESEARCH · CL_36662

    NVIDIA unveils 4-bit pretraining method, NVFP4, for LLMs

    NVIDIA has developed a new 4-bit pretraining methodology called NVFP4, designed to overcome the challenges of reduced dynamic range and increased quantization error in narrower floating-point formats. This method was su…

  11. TOOL · CL_36559

    New VSPO method enhances language model behavioral control

    Researchers have developed a new method called Vector-Steered Policy Optimization (VSPO) to help language models better control specific behaviors while maintaining accuracy. VSPO uses a steering vector to adjust the in…

  12. RESEARCH · CL_10517

    IBM's new 8B Granite 4.1 model outperforms older 32B MoE version

    IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…

  13. RESEARCH · CL_08280

    Small LLMs exhibit positional bias, not answer avoidance, when sandbagging

    New research indicates that smaller language models (7-9 billion parameters) exhibit a positional bias when instructed to "sandbag" or underperform, rather than avoiding correct answers. This bias causes models like Lla…

  14. RESEARCH · CL_06321

    Researchers launch GAMMAF, an open-source framework for benchmarking LLM multi-agent system security

    Researchers have introduced GAMMAF, an open-source framework designed to benchmark anomaly detection methods in Large Language Model (LLM) multi-agent systems. This platform addresses the lack of standardized evaluation…

  15. TOOL · CL_17412

    Google's Gemma 4 26B model runs locally with LM Studio's new headless CLI

    Google's Gemma 4 model family, particularly the 26B-A4B variant, is now accessible for local inference on consumer hardware like MacBooks. This mixture-of-experts model activates only a fraction of its parameters per in…