Terminal-Bench
PulseAugur coverage of Terminal-Bench — every cluster mentioning Terminal-Bench across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
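Fireworks AI enables training of trillion-parameter MoE models
Fireworks AI has developed new training infrastructure that can fine-tune trillion-parameter mixture-of-experts (MoE) models, overcoming earlier memory and orchestration bottlenecks. The platform played a key role in the recently released Cursor Composer 2.5, a coding model that achieves top-tier performance on several benchmarks. The system uses techniques such as low-precision expert quantization and optimizer state offloading to manage the memory demands of large MoE models, making them easier to train and fine-tune.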
-
AI models: Choose benchmarks over hype for true performance
A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …
-
DeepClaude slashes coding agent costs by 17x using DeepSeek V4 Pro
An open-source tool called DeepClaude has gained significant traction by allowing developers to use the Claude Code agent loop with DeepSeek V4 Pro instead of Anthropic's models. This swap drastically reduces costs, wit…
-
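Publicly available AI models reproduce Anthropic's vulnerability-discovery research findings
Researchers have successfully reproduced the findings of Anthropic's Mythos research using publicly available AI models such as GPT-5.4 and Claude Opus 4.6. This suggests that advanced AI capabilities for discovering software vulnerabilities are no longer exclusive to frontier labs and can be obtained through public models. Defenders should now shift their focus from the uniqueness of these tools to validating and applying AI-generated security insights.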