MMLU-Pro
PulseAugur coverage of MMLU-Pro — every cluster mentioning MMLU-Pro across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
New technique loops transformer layers to boost model performance
Researchers have developed a novel technique called training-free looped transformers, which enhances the performance of existing frozen language models without requiring any additional training or architectural modific…
-
Quantization impacts LLM performance, with larger models showing more resilience
A new research paper explores the impact of quantization on large language model performance, examining models from 2-bit to 6-bit precision. The study found that while higher precision generally leads to better perform…
-
NVIDIA unveils 4-bit pretraining method, NVFP4, for LLMs
NVIDIA has developed a new 4-bit pretraining methodology called NVFP4, designed to overcome the challenges of reduced dynamic range and increased quantization error in narrower floating-point formats. This method was su…
-
New VSPO method enhances language model behavioral control
Researchers have developed a new method called Vector-Steered Policy Optimization (VSPO) to help language models better control specific behaviors while maintaining accuracy. VSPO uses a steering vector to adjust the in…
-
IBM's new 8B Granite 4.1 model outperforms older 32B MoE version
IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…
-
Small LLMs exhibit positional bias, not answer avoidance, when sandbagging
New research indicates that smaller language models (7-9 billion parameters) exhibit a positional bias when instructed to "sandbag" or underperform, rather than avoiding correct answers. This bias causes models like Lla…
-
Researchers launch GAMMAF, an open-source framework for benchmarking LLM multi-agent system security
Researchers have introduced GAMMAF, an open-source framework designed to benchmark anomaly detection methods in Large Language Model (LLM) multi-agent systems. This platform addresses the lack of standardized evaluation…
-
Google's Gemma 4 26B model runs locally with LM Studio's new headless CLI
Google's Gemma 4 model family, particularly the 26B-A4B variant, is now accessible for local inference on consumer hardware like MacBooks. This mixture-of-experts model activates only a fraction of its parameters per in…