PulseAugur
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

New frameworks improve AI agent reliability through self-analysis

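Researchers have developed two new frameworks, Retrospective Harness Optimization (RHO) and HarnessFix, aimed at improving the reliability and performance of AI agents. RHO takes a self-supervised approach to optimizing an agent's harness: it analyzes past trajectories and selects the most effective update according to self-preference. HarnessFix, by contrast, diagnoses and repairs defects in the agent harness by compiling execution traces into a specialized intermediate representation, which enables targeted repairs. Both methods significantly improve agent performance on a range of benchmarks, including software engineering tasks, without requiring external validation data.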
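The selection step described for RHO (compare rollouts produced under different candidate harness updates, then keep the preferred update) can be sketched roughly as below. This is a minimal illustration, not the paper's algorithm: the `select_update` function, the `Trajectory` type, the pairwise win-counting rule, and the toy `shorter_successful_first` judge are all assumptions introduced here. In a real system the preference function would presumably be the agent's own model comparing two rollouts.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical representation: a rollout is a list of step descriptions.
Trajectory = List[str]


def select_update(
    rollouts: Dict[str, List[Trajectory]],
    prefer: Callable[[Trajectory, Trajectory], int],
) -> str:
    """Return the candidate update whose rollouts win the most pairwise comparisons.

    rollouts maps a candidate-update name to the trajectories produced with it.
    prefer(a, b) returns 1 if a is preferred, -1 if b is preferred, 0 for a tie.
    """
    wins = {name: 0 for name in rollouts}
    for left, right in combinations(rollouts, 2):
        for a in rollouts[left]:
            for b in rollouts[right]:
                verdict = prefer(a, b)
                if verdict > 0:
                    wins[left] += 1
                elif verdict < 0:
                    wins[right] += 1
    # max() keeps the first maximum, so ties favor the first-listed candidate.
    return max(wins, key=wins.get)


def shorter_successful_first(a: Trajectory, b: Trajectory) -> int:
    """Toy stand-in for a self-preference judge (assumption, not from the paper):
    prefer rollouts that end in "done", then shorter ones."""
    def score(t: Trajectory):
        return (t[-1] == "done", -len(t))
    return (score(a) > score(b)) - (score(a) < score(b))


# Hypothetical rollouts for the current harness and one candidate update.
rollouts = {
    "current": [["plan", "edit", "error"], ["plan", "edit", "test", "done"]],
    "add_test_step": [["plan", "test", "done"], ["plan", "edit", "test", "done"]],
}
best = select_update(rollouts, shorter_successful_first)
print(best)
```

Listing the current harness first means an update is adopted only when it is strictly preferred, which is one conservative way to avoid churn when no labeled validation set is available.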

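Impact: By enabling self-improvement without external supervision and targeted defect repair, these methods offer new ways to improve the performance and reliability of AI agents.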

Ranking rationale: Two academic papers introducing new methods for improving AI agents.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.CL TIER_1 English(EN) · Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia ·

    Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

    arXiv:2606.05922v1. AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-trut…

  2. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Qing Wang ·

    From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Defects

    LLM-based agents increasingly rely on harnesses that provide execution environments, tool interfaces, context, lifecycle orchestration, observability, verification, and governance. Existing self-improving agents and automatic harness evolution methods mainly improve agents throug…

  3. arXiv cs.CL TIER_1 English(EN) · Xiaohua Jia ·

    Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

    AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is diffic…

  4. Hugging Face Daily Papers TIER_1 English(EN) ·

    Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

    AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is diffic…