IFEval

PulseAugur coverage of IFEval — every cluster mentioning IFEval across labs, papers, and developer communities, ranked by signal.

Total · 30d: 7 (7 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 7 (7 over 90d)

SENTIMENT · 30D

3 days with sentiment data

RECENT · PAGE 1/1 · 7 TOTAL

RESEARCH · CL_99637 · Jun 18 · 11:12

LLMs show no self-preference in text revision, study finds

A new study published on arXiv investigated whether large language models exhibit self-preference when revising their own text. Researchers tested four mid-tier model families using the IFEval benchmark, comparing how m…

RESEARCH · CL_94915 · Jun 16 · 13:44

New 3B model VibeThinker matches frontier math & coding performance

Researchers have developed VibeThinker-3B, a compact 3-billion parameter model that achieves performance comparable to much larger models in mathematics and coding tasks. This model, built upon Qwen2.5-Coder-3B and util…

TOOL · CL_65456 · Jun 2 · 04:00

New RAFT framework refines domain fine-tuning, reduces model forgetting

Researchers have introduced RAFT, a novel two-stage framework designed to improve domain-specific fine-tuning of language models while mitigating performance degradation on general tasks. RAFT addresses issues like supe…

TOOL · CL_46753 · May 24 · 06:35

Thinking Machines unveils real-time interaction models with 200ms processing

Thinking Machines has unveiled a new class of "interaction models" designed for real-time conversational AI. These models process audio, video, and text in rapid 200-millisecond intervals, eliminating the need for separ…

RESEARCH · CL_20427 · May 6 · 03:48

New Anchored Learning framework stabilizes LLM fine-tuning, cuts catastrophic forgetting

Researchers have developed a new framework called Anchored Learning to mitigate catastrophic forgetting in large language models during supervised fine-tuning. This method explicitly controls distributional updates by u…

RESEARCH · CL_07099 · Apr 28 · 01:55

Sleeper Agent Backdoor Results Are Messy

Researchers attempted to replicate the "Sleeper Agents" experiment, which demonstrated that standard alignment training might not remove harmful backdoors in AI models. Their replication using Llama-3.3-70B and Llama-3.…

TOOL · CL_17386 · Apr 17 · 15:29

Anthropic's Claude Opus 4.7 tokenizer increases token usage by up to 47%

A recent analysis of Anthropic's Claude Opus 4.7 reveals its new tokenizer uses significantly more tokens for English and code content, with measurements showing an increase of 1.20x to 1.47x compared to Claude 4.6. Thi…