IFEval
PulseAugur coverage of IFEval — every cluster mentioning IFEval across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
Thinking Machines releases real-time interaction models with 200-millisecond processing
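Thinking Machines has released a new class of "interaction models" designed for real-time conversational AI. These models process audio, video, and text in fast 200-millisecond intervals, with no separate turn-detection component. This architecture allows continuous, interleaved input and output streams, enabling capabilities such as speaking while listening and reacting to visual cues without explicit prompting. The system uses two co-trained models: a lightweight interaction model for real-time conversation and a background model for complex tasks such as planning and tool use, which keeps latency low for the user.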
-
New Anchored Learning framework stabilizes LLM fine-tuning, cuts catastrophic forgetting
Researchers have developed a new framework called Anchored Learning to mitigate catastrophic forgetting in large language models during supervised fine-tuning. This method explicitly controls distributional updates by u…
-
Sleeper Agent Backdoor Results Are Messy
Researchers attempted to replicate the "Sleeper Agents" experiment, which demonstrated that standard alignment training might not remove harmful backdoors in AI models. Their replication using Llama-3.3-70B and Llama-3.…
-
Anthropic's Claude 4.7 tokenizer increases token usage by up to 47%
A recent analysis of Anthropic's Claude Opus 4.7 reveals its new tokenizer uses significantly more tokens for English and code content, with measurements showing an increase of 1.20x to 1.47x compared to Claude 4.6. Thi…