PulseAugur

IFEval

PulseAugur coverage of IFEval — every cluster mentioning IFEval across labs, papers, and developer communities, ranked by signal.

Total · 30d: 7 (7 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 7 (7 over 90d)
SENTIMENT · 30D: 3 day(s) with sentiment data

RECENT · PAGE 1/1 · 7 TOTAL
  1. RESEARCH · CL_99637

    LLMs show no self-preference in text revision, study finds

    A new study published on arXiv investigated whether large language models exhibit self-preference when revising their own text. Researchers tested four mid-tier model families using the IFEval benchmark, comparing how m…

  2. RESEARCH · CL_94915

    New 3B model VibeThinker matches frontier math & coding performance

    Researchers have developed VibeThinker-3B, a compact 3-billion parameter model that achieves performance comparable to much larger models in mathematics and coding tasks. This model, built upon Qwen2.5-Coder-3B and util…

  3. TOOL · CL_65456

    New RAFT framework refines domain fine-tuning, reduces model forgetting

    Researchers have introduced RAFT, a novel two-stage framework designed to improve domain-specific fine-tuning of language models while mitigating performance degradation on general tasks. RAFT addresses issues like supe…

  4. TOOL · CL_46753

    Thinking Machines unveils real-time interaction models with 200ms processing

    Thinking Machines has unveiled a new class of "interaction models" designed for real-time conversational AI. These models process audio, video, and text in rapid 200-millisecond intervals, eliminating the need for separ…

  5. RESEARCH · CL_20427

    New Anchored Learning framework stabilizes LLM fine-tuning, cuts catastrophic forgetting

    Researchers have developed a new framework called Anchored Learning to mitigate catastrophic forgetting in large language models during supervised fine-tuning. This method explicitly controls distributional updates by u…

  6. RESEARCH · CL_07099

    Sleeper Agent Backdoor Results Are Messy

    Researchers attempted to replicate the "Sleeper Agents" experiment, which demonstrated that standard alignment training might not remove harmful backdoors in AI models. Their replication using Llama-3.3-70B and Llama-3.…

  7. TOOL · CL_17386

    Anthropic's Claude 4.7 tokenizer increases token usage by up to 47%

    A recent analysis of Anthropic's Claude Opus 4.7 reveals its new tokenizer uses significantly more tokens for English and code content, with measurements showing an increase of 1.20x to 1.47x compared to Claude 4.6. Thi…