PulseAugur

MiniMax M2.5

PulseAugur coverage of MiniMax M2.5 — every cluster mentioning MiniMax M2.5 across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 10 (90 days: 10)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 7 (90 days: 7)
Charts: tier distribution (90 days); relations; sentiment (30 days, 3 days with sentiment data)

Recent · Page 1 of 1 · 10 items
  1. RESEARCH · CL_48041 ·

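    Fireworks AI: the bottleneck for AI agents is reliability, not intelligence

    A new benchmark from Fireworks AI shows that the reliability of model execution, not just intelligence, is the key bottleneck for agentic AI systems. Across 720 browser-automation tasks, one model failed to produce valid output nearly 20% of the time, which significantly increased retry rates, latency, and cost. The study introduces an "agentic execution tax" to quantify this overhead and stresses that, in production, models with consistent, reliable output are more valuable than models that only post high reasoning scores.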

  2. RESEARCH · CL_47631 ·

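    New agent framework improves LLM clinical reasoning through active evidence seeking

    Researchers have developed ClinSeekAgent, a new framework designed to strengthen the clinical reasoning of large language models by letting them actively seek out and synthesize multimodal evidence. Unlike earlier approaches that rely on pre-selected data, ClinSeekAgent dynamically queries medical knowledge bases, navigates electronic health records, and uses imaging tools to gather information. This active evidence-seeking process significantly improves the performance of models such as Claude Opus 4.6 and MiniMax M2.5 on text-only and multimodal clinical tasks, as shown on the ClinSeek-Bench benchmark created for the study.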

  3. TOOL · CL_37611 ·

    LLM benchmark shows routing strategy outperforms single model selection

    A recent benchmark tested 15 LLMs on 38 real-world coding tasks, revealing that a routing strategy combining different models is more effective than selecting a single top-tier model. The study found that cheaper models…

  4. TOOL · CL_24306 ·

    LLM benchmarking issues fixed by adjusting 'thinking mode' parameters

    A developer encountered issues benchmarking three large language models, Kimi K2.5, MiniMax M2.5, and Gemma 4, initially deeming them broken due to low scores or errors. The root cause was identified as a default "think…

  5. TOOL · CL_23871 ·

    Low-cost AI model beats top performers on coding benchmark with new context engine

    A new method called Xanther Context Engine (XCE) has enabled the MiniMax M2.5 model to achieve a 78.2% score on the SWE-bench Verified benchmark, outperforming all other models. This achievement is notable because MiniM…

  6. COMMENTARY · CL_20705 ·

    AI models: Choose benchmarks over hype for true performance

    A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …

  7. RESEARCH · CL_16506 ·

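    Hugging Face blog posts cover VLMs on Intel CPUs, MiniMax M2 agents, and custom Gradio frontends

    This cluster highlights three separate technical blog posts from Hugging Face, shared via Mastodon. The first details how to run vision-language models (VLMs) on Intel CPUs using OpenVINO. The second explores agent generalization in the context of MiniMax M2. The third focuses on building custom frontends on top of Gradio's backend capabilities.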

  8. TOOL · CL_17917 ·

    IonRouter launches AI inference service with custom IonAttention engine

    IonRouter has launched a new inference service designed for high throughput and low cost, utilizing its proprietary IonAttention engine. This engine is capable of multiplexing multiple models on a single GPU, enabling r…

  9. RESEARCH · CL_01008 ·

    Chinese AI Labs Release Frontier Models Qwen 3.5, GLM 5, and MiniMax 2.5

    Several Chinese AI labs have released new flagship open-weight models, including Qwen 3.5, GLM 5, and MiniMax 2.5. These releases represent a significant push in the frontier of AI development from these organizations. …

  10. FRONTIER RELEASE · CL_01763 ·

    new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5

    Google DeepMind has released Gemini 3 Deep Think V2, a new reasoning mode for Google AI Ultra subscribers and available via API early access. This model achieves new state-of-the-art results on benchmarks like ARC-AGI-2…