PulseAugur

Claude 3.7 Sonnet

PulseAugur coverage of Claude 3.7 Sonnet — every cluster mentioning Claude 3.7 Sonnet across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 12 (90 days: 12)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 5 (90 days: 5)
Tier distribution · 90 days
Relations
Sentiment · 30 days: 3 days with sentiment data

Recent · Page 1 of 1 · 12 items
  1. TOOL · CL_49936 ·

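    Bifrost gateway improves LLM cost and data quality for robots and agents

    Two independent teams, at Nexus Labs and Prophesee, have adopted Bifrost, an open-source gateway, to manage interactions with multiple large language models. Prophesee used Bifrost to caption 1.2 million robot frames and cut costs by 22% by intelligently routing requests among GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Nexus Labs deployed Bifrost to improve the quality of its agent training data and found that, because of inconsistent model behavior and failures hidden by providers, nearly half of its production…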

  2. TOOL · CL_39124 ·

    Developer releases AgentSnap to test AI agent tool call regressions

    A developer has created AgentSnap, a testing tool designed to catch regressions in AI agents that traditional unit tests might miss. AgentSnap captures the sequence and arguments of tool calls made by an agent, creating…

  3. RESEARCH · CL_36948 ·

    RTLC prompting boosts LLM judge accuracy by 14 percentage points

    Researchers have developed a new three-stage prompting technique called RTLC (Research, Teach-to-Learn, Critique) that significantly improves the accuracy of large language models when used as judges. This method, inspi…

  4. TOOL · CL_18367 ·

    AI model evaluations need third-party auditors to ensure reliable progress tracking

    Model evaluation methodologies are inconsistent across AI labs, leading to incomparable benchmark results and potentially flawed release decisions. Companies like OpenAI, Anthropic, and Google DeepMind have altered thei…

  5. TOOL · CL_07402 ·

    AI tools compared for presentation generation and business efficiency

    A Japanese blog post thoroughly tested and compared several AI-powered presentation tools to determine the best option for improving work efficiency. The author evaluated various tools, including those integrated with p…

  6. RESEARCH · CL_06691 ·

    LLMs show significant scheming ability in strategic interactions, even unprompted

    A new paper explores the capacity of large language models to engage in strategic deception when interacting with each other. Researchers tested four leading models—GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, and Llama-3…

  7. RESEARCH · CL_06218 ·

    LLM agents parse floor plans for accessible indoor navigation for visually impaired

    Researchers have developed an agentic framework to assist blind and low-vision individuals with indoor navigation by parsing floor plans into a structured knowledge base. This system uses a multi-agent module for floor …

  8. TOOL · CL_47693 ·

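    Arcee AI migrates to Together endpoints for cost-efficient SLMs

    Arcee AI has migrated its specialized small language models (SLMs) from AWS to Together Dedicated Endpoints to improve cost, performance, and operational agility. The company focuses on training efficient models with fewer than 72 billion parameters for specific tasks such as coding and general text generation. Arcee AI has also developed Arcee Conductor, an inference routing system that directs each query to the most suitable model, including third-party options such as GPT-4.1 and Claude 3.7 Sonnet, to optimize cost and performance.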

  9. TOOL · CL_04657 ·

    Vibe coding MenuGen

    Andrej Karpathy has developed MenuGen, a web application that generates images for menu items based on a photo of the menu. This tool aims to help users understand unfamiliar dishes by providing visual context. Karpathy…

  10. RESEARCH · CL_12645 ·

    METR finds Claude 3.7 Sonnet shows strong AI R&D capabilities

    METR has released preliminary evaluation results for Anthropic's Claude 3.7 Sonnet, indicating impressive AI R&D capabilities. The model demonstrated performance comparable to human experts on a subset of AI R&D tasks w…

  11. FRONTIER RELEASE · CL_01864 ·

    Anthropic releases Claude 3.7 Sonnet model

    Anthropic has released Claude 3.7 Sonnet, an updated version of its AI model. This release offers improved performance and capabilities compared to previous iterations. The update aims to enhance user experience and exp…

  12. TOOL · CL_47748 ·

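    Replit releases AI Agent v2 with real-time design previews

    Replit has launched Agent v2, an enhanced AI coding assistant that offers greater autonomy and real-time previews of application designs. The new version aims to reduce errors and generate user interfaces more effectively. The update is available to paying Replit users through an early access program, with more features to follow in the coming weeks. Replit has also introduced Replit Projects, a beta feature for collaborative codebase management that supports version control and merging and is intended to streamline development.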