PulseAugur

BIG-Bench Hard

PulseAugur coverage of BIG-Bench Hard — every cluster mentioning BIG-Bench Hard across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 3 (90 days: 3)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 3 (90 days: 3)
Tier distribution · 90 days
Sentiment · 30 days

2 days with sentiment data

Recent · Page 1 of 1 · 3 items
  1. RESEARCH · CL_48926 ·

    New research shows machine learning benchmarks are vulnerable to manipulation

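    Researchers analyzed how susceptible machine learning benchmarks are to manipulation, treating datasets as voters and models as candidates. They found that strategically including benchmark data in a model's training set to reach the top of a leaderboard is an NP-hard problem, analogous to election bribery. The study introduces "instance-level robustness" to quantify the minimal dataset required for manipulation, and evaluates it on the MMLU and BIG-Bench Hard leaderboards.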

  2. TOOL · CL_25616 ·

    New research reveals "coupling tax" limits LLM reasoning accuracy

    A new research paper introduces the concept of a "coupling tax" in large language models, highlighting how shared token budgets for reasoning and final answers can hinder accuracy. The study found that for certain tasks…

  3. TOOL · CL_16166 ·

    SCALE-LoRA framework audits and composes Low-Rank Adaptation adapters for reliable AI outputs

    Researchers have developed SCALE-LoRA, a framework designed to improve the reuse of Low-Rank Adaptation (LoRA) adapters from open pools for new tasks. This system addresses challenges in adapter compatibility and output…