BIG-Bench Hard
PulseAugur coverage of BIG-Bench Hard: every story cluster that mentions BIG-Bench Hard across labs, papers, and developer communities, ranked by signal.
- New framework optimizes LLM fine-tuning by modeling task relationships
Researchers have developed a new framework called TaskPGM to optimize the fine-tuning process for large language models. This method uses an energy-based model over tasks, representing them as a Markov random field to c…
- New research reveals ML benchmarks are vulnerable to manipulation
Researchers have analyzed the susceptibility of machine learning benchmarks to manipulation, treating datasets as voters and models as candidates. They found that strategically including benchmark data in a model's trai…
- New research reveals "coupling tax" limits LLM reasoning accuracy
A new research paper introduces the concept of a "coupling tax" in large language models, highlighting how shared token budgets for reasoning and final answers can hinder accuracy. The study found that for certain tasks…
- SCALE-LoRA framework audits and composes Low-Rank Adaptation adapters for reliable AI outputs
Researchers have developed SCALE-LoRA, a framework designed to improve the reuse of Low-Rank Adaptation (LoRA) adapters from open pools for new tasks. This system addresses challenges in adapter compatibility and output…