AlpacaEval
PulseAugur coverage of AlpacaEval — every cluster mentioning AlpacaEval across labs, papers, and developer communities, ranked by signal.
-
New DoubtProbe defense significantly reduces LLM jailbreaks
Researchers have developed DoubtProbe, a novel defense mechanism designed to counter jailbreaking attempts on large language models (LLMs) in black-box scenarios. This dual-branch framework combines structural verificat…
-
EvoDefense uses LLMs to co-evolve defenses against black-box attacks
Researchers have developed EvoDefense, a novel approach to protect large language models (LLMs) from attacks in black-box scenarios. This system uses a guard LLM and an experience memory to continuously refine defense s…
-
IBM's new 8B Granite 4.1 model outperforms older 32B MoE version
IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…
-
Researchers develop new methods to debias and improve reward models for LLMs
Researchers have developed new methods to improve the reliability and interpretability of reward models (RMs) used in aligning large language models (LLMs). One approach introduces a causally motivated intervention tech…
-
New DPO methods enhance LLM alignment with adaptive techniques
Researchers have developed several advancements to Direct Preference Optimization (DPO), a method for aligning large language models (LLMs) with human preferences. AdaDPO introduces self-adaptive coefficients to balance…