Poolside AI
PulseAugur coverage of Poolside AI — every cluster mentioning Poolside AI across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
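Open-source models lag behind frontier closed models, and the benchmarks are disputed
Several leading AI labs have released new open-source models, including DeepSeek V4, Gemma 4, Kimi K2.6, and MiMo 2.5. An evaluation by CAISI indicates that these open-source models lag behind frontier closed models and that the gap is widening. However, the evaluation methodology and the limitations of the benchmarks have also drawn controversy: some argue that standardized tests fail to fully capture real-world capability, especially on complex tasks such as coding.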
-
Blog post critiques AI benchmark hacking
A blog post on Poolside.ai critiques the practice of "benchmark hacking" in AI development. It argues that the focus on optimizing models for specific benchmarks can lead to systems that perform well on tests but fail i…
-
Poolside AI releases open-weight agentic coding models Laguna XS.2 and M.1
Poolside AI has launched two new open-weight agentic coding models, Laguna XS.2 and M.1. On the SWE-bench Verified benchmark, M.1 scored 72.5% and XS.2 scored 68.2%. The XS…
-
Poolside AI releases open-weight Laguna XS.2 and M.1 coding models
Poolside AI has released two new agentic coding models, Laguna M.1 and Laguna XS.2, along with their agent training and operation runtime. Laguna M.1 is a large Mixture of Experts (MoE) model trained on 30T tokens using…