New Research Tackles Efficiency Problems in Multi-Agent Systems and LLM Agents

arXiv cs.AI TIER_1 English(EN) · Akash Bonagiri, Devang Borkar, Gerard Janno Anderias, Setareh Rafatirad, Houman Homayoun · 2026-05-26 04:00

CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent Failures

arXiv:2605.25338v1 Announce Type: cross Abstract: Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals a…

arXiv cs.AI TIER_1 English(EN) · Ya-Ting Yang, Quanyan Zhu · 2026-05-26 04:00

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

arXiv:2605.23929v1 Announce Type: new Abstract: Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs b…

arXiv cs.AI TIER_1 English(EN) · Wenqian Ye, Bo Yuan, Zhichao Xu, Yijun Tian, Yawei Wang, Henry Kautz, Aidong Zhang · 2026-05-26 04:00

A Sober Look at Agentic Misalignment in Automated Workflows

arXiv:2605.24197v1 Announce Type: new Abstract: We study a class of emergent misalignment in multi-agent systems (MAS), with a focus on automated workflows, which we refer to as agentic misalignment. Although these systems can solve complex tasks, they often fail because agents act …

arXiv cs.AI TIER_1 English(EN) · Yifan Zeng, Yiran Wu, Yaolun Zhang, Wentian Zhao, Kun Wan, Qingyun Wu, Huazheng Wang · 2026-05-26 04:00

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

arXiv:2605.24202v1 Announce Type: new Abstract: Multi-agent LLM workflows route inference through specialized roles to lift end-task accuracy, but jointly training those roles with reinforcement learning is unstable in ways that are poorly understood. We study when end-to-end RL …

arXiv cs.AI TIER_1 English(EN) · Harshada Badave, Santosh Borse, Andrea Gomez, Harshitha Narahari, Sara Carter, Vishwa Bhatt, Aishani Rachakonda, Shuxin Lin, Dhaval Patel · 2026-05-26 04:00

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

arXiv:2605.24219v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous agents that reason, use tools, and act over multiple steps. Yet most hallucination benchmarks still evaluate only the final output, missing failures that originate…

arXiv cs.AI TIER_1 English(EN) · Yuyang Hu, Hongjin Qian, Shuting Wang, Jiongnan Liu, Tong Zhao, Xiaoxi Li, Zheng Liu, Zhicheng Dou · 2026-05-26 04:00

AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning

arXiv:2605.24486v1 Announce Type: new Abstract: Recent progress on long-horizon agentic tasks has been driven largely by scaling up individual agents through stronger models, better tools, and more effective scaffolding. In contrast, much less is understood about scaling out: whe…

arXiv cs.AI TIER_1 English(EN) · Yuxin Zhang, Mengxue Hu, Zheng Lin, Xiaoyi Fan, Fan Xie, Zihan Fang, Jing Yang, Wenjun Zhu, Zhiwen Chen, Chengfei Lv, Zhe Chen · 2026-05-26 04:00

Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents

arXiv:2605.24598v1 Announce Type: new Abstract: Large language model (LLM) agents excel at solving complex long-horizon tasks through autonomous interaction with environments. However, their real-world deployment faces a fundamental device-cloud dilemma: on-device models are eff…

arXiv cs.AI TIER_1 English(EN) · Zhimin Lin, Kun Cheng, Fan Bai, Jie Gao · 2026-05-26 04:00

Agent-as-Peer-Debriefer: A Multi-Agent Framework with Perspective-Based Refinement for Qualitative Analysis

arXiv:2605.24600v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for qualitative data analysis (QDA), yet their outputs often miss the depth and nuance of human analysis. We argue this gap reflects a missing credibility practice from human QDA: p…

arXiv cs.AI TIER_1 English(EN) · Sasank Annapureddy · 2026-05-26 04:00

PRIMA: Operational Patterns for Resilient Multi-Agent Research with Verifiable Identity and Convergent Feedback

arXiv:2605.24775v1 Announce Type: new Abstract: Operating LLMs as coordinated multi-agent research systems over multi-hour runs surfaces failure modes that single-shot evaluation cannot: upstream providers throttle without warning, sub-agents drift the task to fit accessible tool…

arXiv cs.AI TIER_1 English(EN) · Yilei Zhang · 2026-05-26 04:00

Agent Manufacturing: Foundation-Model Agents as First-Class Industrial Entities

arXiv:2605.24823v1 Announce Type: new Abstract: Manufacturing has passed through four widely recognized paradigms - mechanization, electrification, programmable automation, and Smart Manufacturing - each defined by the kind of work it shifted from humans to machines. In every cas…

arXiv cs.AI TIER_1 English(EN) · Yi Li, Songtao Wei, Dongming Jiang, Zhichun Guo, Qiannan Li, Bingzhe Li · 2026-05-26 04:00

DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs

arXiv:2605.25188v1 Announce Type: new Abstract: Multi-agent LLM systems improve reasoning by combining outputs from multiple agents, but interaction-heavy methods can introduce error propagation and high communication overhead. When agents exchange raw responses or reasoning trac…

arXiv cs.AI TIER_1 English(EN) · Andy Xu, Yu-Wing Tai · 2026-05-26 04:00

Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems

arXiv:2605.25233v1 Announce Type: new Abstract: AI agents are increasingly used to solve complex, multi-step tasks, but existing multi-agent frameworks remain brittle as workflows grow in scale and depth. Small errors at intermediate stages can propagate through agent interaction…

arXiv cs.AI TIER_1 English(EN) · Qiming Ye, Peixain Zhang, Yupeng He, Zifan Peng, Gareth Tyson · 2026-05-26 04:00

Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network

arXiv:2605.25815v1 Announce Type: new Abstract: Agent-to-Agent (A2A) networks enable autonomous AI agents to collaborate by sharing reusable problem-solving instructions. However, how these decentralized ecosystems operate in practice remains largely unexplored. We present the fi…

arXiv cs.AI TIER_1 English(EN) · Nikos Pagonas, Matthew Lou, Tianyi Peng, Dan Rubenstein, Kostis Kaffes · 2026-05-26 04:00

VineLM: Trie-Based Fine-Grained Control for Agentic Workflows

arXiv:2605.23914v1 Announce Type: cross Abstract: Agentic workflows interleave configurable LLM stages with tool stages and often include retries or refinement loops. Existing workflow managers profile full workflow configurations offline and assign each request a static workflow…

arXiv cs.AI TIER_1 English(EN) · Inseo Jung, Yoonseok Oh, Kyungryul Back, Jinkyu Kim, Jungbeom Lee · 2026-05-26 04:00

SODE: Analyzing Social Dynamics in LLM Agents

arXiv:2605.23949v1 Announce Type: cross Abstract: As Large Language Models (LLMs) evolve into interactive agents, understanding their behavioral alignment within human social dynamics becomes essential. While behavioral game theory offers a framework to study these interactions, …

arXiv cs.AI TIER_1 English(EN) · Darek Kleczek, Fuheng Zhao, Alexander W. Lee, Julien Tissier, Pawel Liskowski, Ugur Cetintemel, Anupam Datta · 2026-05-26 04:00

AvalancheBench: Evaluating Enterprise Data Agents Through Latent World Recovery

arXiv:2605.24183v1 Announce Type: cross Abstract: We introduce AvalancheBench, a benchmark for evaluating enterprise data agents through latent world recovery. AvalancheBench improves on existing benchmarks in three ways. First, it evaluates analytical understanding rather…

arXiv cs.AI TIER_1 English(EN) · Nesreen K. Ahmed, Nima Nafisi · 2026-05-26 04:00

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

arXiv:2605.24216v1 Announce Type: cross Abstract: Monitoring autonomous large language model (LLM) agents for covert malicious behavior is challenging due to delayed, context-dependent, and long-horizon attack patterns. Agents may pursue hidden objectives while maintaining superf…

arXiv cs.AI TIER_1 English(EN) · Alin-Gabriel Văduva, Anca-Ioana Andreescu, Simona-Vasilica Oprea, Adela Bâra · 2026-05-26 04:00

Code2UML: Agentic LLMs with context engineering for scalable software visualization

arXiv:2605.24453v1 Announce Type: cross Abstract: Large Language Model (LLM)-based code analysis tools are adopted to automate software documentation tasks. However, the scalability of these approaches to real codebases, where Intermediate Representations (IR) exceed LLM context …

arXiv cs.AI TIER_1 English(EN) · Haoran Li, Shulun Chen, Shaoyuan Sun, Hanchen Wang · 2026-05-26 04:00

Multi-Agent Coordination Adaptation via Structure-Guided Orchestration

arXiv:2605.25746v1 Announce Type: cross Abstract: As large language model (LLM)-based multi-agent systems scale to handle increasingly complex tasks, balancing structural stability and dynamic adaptability becomes increasingly challenging. Existing systems typically adopt either …

arXiv cs.AI TIER_1 English(EN) · Wei Fan, Yining Zhou, Mufan Zhang, Yanbing Weng, Yiran HU, Tianshi Zheng, Baixuan Xu, Chunyang Li, Jianhui Yang, Haoran Li, Yangqiu Song · 2026-05-26 04:00

Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning

arXiv:2605.25920v1 Announce Type: cross Abstract: While large language models (LLMs) augmented with agentic search capabilities show promise for legal reasoning, they overlook a fundamental constraint that applicable law must match the temporal context of each case, as retroactiv…

arXiv cs.AI TIER_1 English(EN) · Tatiana Petrova (SEDAN SnT, University of Luxembourg, Luxembourg, Luxembourg), Boris Bliznioukov (SEDAN SnT, University of Luxembourg, Luxembourg, Luxembourg), Aleksandr Puzikov (SEDAN SnT, University of Luxembourg, Luxembourg, Luxembourg), Radu State (S… · 2026-05-26 04:00

From Multi-Agent Systems and the Semantic Web to Agentic AI: A Unified Narrative of the Web of Agents

arXiv:2507.10644v4 Announce Type: replace Abstract: The Web of Agents (WoA) transforms the document-centric Web into an environment of autonomous agents acting on users' behalf, a vision newly tractable as large language models (LLMs) mature. We argue that across three decades th…

arXiv cs.AI TIER_1 English(EN) · Zoran Milosevic, Fethi Rabhi · 2026-05-26 04:00

Architecting Agentic Communities using Design Patterns

arXiv:2601.03624v3 Announce Type: replace Abstract: The rapid evolution of Large Language Models (LLM) and subsequent Agentic AI technologies requires systematic architectural guidance for building sophisticated, production-grade systems. This paper presents an approach for archi…

arXiv cs.AI TIER_1 English(EN) · Yinyi Luo, Yiqiao Jin, Weichen Yu, Mengqi Zhang, Srijan Kumar, Xiaoxiao Li, Weijie Xu, Xin Chen, Jindong Wang · 2026-05-26 04:00

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

arXiv:2602.03955v3 Announce Type: replace Abstract: While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes Ag…

arXiv cs.AI TIER_1 English(EN) · Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dong Li, Dongbin Zhao · 2026-05-26 04:00

Dynamic Dual-Granularity Skill Bank for Agentic RL

arXiv:2603.28716v2 Announce Type: replace Abstract: Agentic RL can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D…

arXiv cs.AI TIER_1 English(EN) · Yijuan Liang, Xinghao Chen, Yifan Ge, Ziyi Wu, Hao Wu, Changyu Zeng, Wei Xing, Xiaoyu Shen · 2026-05-26 04:00

UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents

arXiv:2604.11557v2 Announce Type: replace Abstract: Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems through structured function calls. However, existing research exhibits inconsistent interaction representations, large…

arXiv cs.AI TIER_1 English(EN) · Yidong He, Yutao Lai, Pengxu Yang, Jiarui Gan, Jiexin Wang, Yi Cai, Mengchen Zhao · 2026-05-26 04:00

Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games

arXiv:2605.04906v2 Announce Type: replace Abstract: While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other a…

arXiv cs.AI TIER_1 English(EN) · Simon Yu, Derek Chong, Ananjan Nandi, Dilara Soylu, Jiuding Sun, Christopher D Manning, Weiyan Shi · 2026-05-26 04:00

Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

arXiv:2605.10913v2 Announce Type: replace Abstract: As LLM agent systems take on more complex tasks, they increasingly rely on meta-agents: higher-order agents that operate on other agents, much as managers supervise employees. Whatever a meta-agent does: coordinating agents, hal…

arXiv cs.AI TIER_1 English(EN) · Haibo Jin, Peng Kuang, Ye Yu, Xiaopeng Yuan, Haohan Wang · 2026-05-26 04:00

Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems

arXiv:2602.03695v2 Announce Type: replace-cross Abstract: While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying on manually crafted agent roles and interaction prompts, wh…

arXiv cs.AI TIER_1 English(EN) · Dixi Yao, Tahseen Rabbani, Manzil Zaheer, Tian Li · 2026-05-26 04:00

Federation over Text: Insight Sharing for Multi-Agent Reasoning

arXiv:2604.16778v2 Announce Type: replace-cross Abstract: We propose a federated learning-like framework, Federation over Text (FoT), that enables multiple clients solving different tasks to collectively generate a shared library of metacognitive insights by iteratively federatin…

arXiv cs.CL TIER_1 English(EN) · Yihao Hu, Zhihao Wen, Xiujin Liu, Pan Wang, Xin Zhang, Wei Wu · 2026-05-26 04:00

SEAL: Synergistic Co-Evolution of Agents and Learning Environments

arXiv:2605.24426v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly improved through interaction, yet most self-evolution methods adapt either the policy or the learning environment in isolation. We identify this structural gap as Agent-Enviro…

arXiv cs.CL TIER_1 English(EN) · Tianda Sun, Dimitar Kazakov · 2026-05-26 04:00

Tool-Call Dependency Structure is Linearly Decodable in LLM Agent Residual Streams

arXiv:2605.25310v1 Announce Type: new Abstract: Tool-using LLM agents produce trajectories whose calls form a directed dependency graph: earlier tool outputs supply arguments to later calls. Whether this execution structure is represented inside the model is unknown; prior struct…

arXiv cs.CL TIER_1 English(EN) · Daren Wang, Hong Xu, Jiawen Xian · 2026-05-26 04:00

PolyGnosis 2.0: Enhancing LLM Reasoning via Agentic Harness Engineering for Polymarket and OSINT Insight Extraction

arXiv:2605.25958v1 Announce Type: new Abstract: This paper introduces PolyGnosis 2.0, a pioneering multi-agent architecture designed to extract predictive intelligence by synthesizing Polymarket anomaly signals with global Open Source Intelligence (OSINT) streams, specifically Gl…

arXiv cs.LG TIER_1 English(EN) · Ariel Fogel, Omer Hofman, Eilon Cohen, Roman Vainshtein · 2026-05-26 04:00

Inference-Time Backdoors via Chat Templates: From LLM Supply Chains to Agentic System Compromise

arXiv:2602.04653v4 Announce Type: replace-cross Abstract: Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat is backdoor attacks, in which adversaries embed hidden behaviors that activate under specific …

arXiv cs.AI TIER_1 English(EN) · Yangqiu Song · 2026-05-25 14:57

Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning

While large language models (LLMs) augmented with agentic search capabilities show promise for legal reasoning, they overlook a fundamental constraint that applicable law must match the temporal context of each case, as retroactive application of statutes violates core legal prin…

arXiv cs.AI TIER_1 English(EN) · Gareth Tyson · 2026-05-25 13:12

Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network

Agent-to-Agent (A2A) networks enable autonomous AI agents to collaborate by sharing reusable problem-solving instructions. However, how these decentralized ecosystems operate in practice remains largely unexplored. We present the first large-scale empirical study of EvoMap, a pro…

arXiv cs.AI TIER_1 English(EN) · Hanchen Wang · 2026-05-25 11:59

Multi-Agent Coordination Adaptation via Structure-Guided Orchestration

As large language model (LLM)-based multi-agent systems scale to handle increasingly complex tasks, balancing structural stability and dynamic adaptability becomes increasingly challenging. Existing systems typically adopt either structure-centric methods, committing to structure…

arXiv cs.AI TIER_1 English(EN) · Sajjad Khan · 2026-05-25 04:00

S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination

arXiv:2605.17076v2 Announce Type: replace-cross Abstract: We address concurrency control for LLM agents sharing mutable state over HTTP, where agents cannot be modified to declare read sets. S-Bus is an HTTP middleware whose central mechanism, a server-side DeliveryLog, reconstru…

arXiv cs.AI TIER_1 English(EN) · Musa Cim, Burak Topcu, Chita Das, Mahmut Taylan Kandemir · 2026-05-25 04:00

Parallel Context Compaction for Long-Horizon LLM Agent Serving

arXiv:2605.23296v1 Announce Type: new Abstract: Long-horizon LLM agents accumulate growing conversation histories that eventually exceed the model's context window. Context compaction via LLM-based summarization keeps the conversation bounded, but summarization is inherently loss…

arXiv cs.AI TIER_1 English(EN) · Joydeep Chandra · 2026-05-25 04:00

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

arXiv:2605.23887v1 Announce Type: cross Abstract: Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and un…

arXiv cs.AI TIER_1 English(EN) · Zixuan Ke, Yifei Ming, Austin Xu, Ryan Chin, Xuan-Phi Nguyen, Prathyusha Jwalapuram, Jiayu Wang, Semih Yavuz, Caiming Xiong, Shafiq Joty · 2026-05-25 04:00

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

arXiv:2601.14652v5 Announce Type: replace Abstract: While multi-agent systems (MAS) promise elevated intelligence through coordination of agents, current approaches to automatic MAS design under-deliver. Such shortcomings stem from two key factors: (1) methodological complexity -…

arXiv cs.AI TIER_1 English(EN) · Pei Yang, Wanyi Chen, Tongyun Yang, Pengbin Feng, Jiarong Xing, Wentao Guo, Yuhang Yao, Yuhang Han, Hanchen Li, Xu Wang, Zeyu Wang, Jie Xiao, Anjie Yang, Liang Tian, Lynn Ai, Eric Yang, Tianyu Shi · 2026-05-25 04:00

TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing

arXiv:2605.18859v2 Announce Type: replace-cross Abstract: LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single user request triggers many model calls. Routing each call to the cheapest sufficie…

arXiv cs.CL TIER_1 English(EN) · Jiahao Ying, Boxian Ai, Wei Tang, Siyuan Liu, Yixin Cao · 2026-05-25 04:00

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

arXiv:2605.23657v1 Announce Type: new Abstract: Skills, i.e., structured workflow instructions distilled for large language models (LLMs), are becoming an increasingly important mechanism for improving agent performance on real-world downstream tasks. However, as the open-source …

arXiv cs.LG TIER_1 English(EN) · Yuandao Cai, Yuzhang Zhu, Liyou Gao, Wensheng Tang, Shengchao Qin · 2026-05-25 04:00

Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

arXiv:2605.23574v1 Announce Type: new Abstract: Long-horizon language agents can make many plausible local tool calls yet fail to persist until a requested count is actually complete. We study this gap as Quantitative Goal Persistence (QGP): whether an agent keeps working until a…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Sasank Annapureddy · 2026-05-23 23:27

PRIMA: Operational Patterns for Resilient Multi-Agent Research with Verifiable Identity and Convergent Feedback

Operating LLMs as coordinated multi-agent research systems over multi-hour runs surfaces failure modes that single-shot evaluation cannot: upstream providers throttle without warning, sub-agents drift the task to fit accessible tools, narrate machinery instead of using it, open r…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Zhe Chen · 2026-05-23 14:29

Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents

Large language model (LLM) agents excel at solving complex long-horizon tasks through autonomous interaction with environments. However, their real-world deployment faces a fundamental device-cloud dilemma: on-device models are efficient but often brittle, while cloud models are…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-23 00:00

SEAL: Synergistic Co-Evolution of Agents and Learning Environments

SEAL is a closed-loop co-evolution framework that simultaneously adapts both agent policies and training environments to improve interactive tool-use capabilities in large language models.

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-22 17:47

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared different…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Joydeep Chandra · 2026-05-22 17:47

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared different…

arXiv cs.CL TIER_1 English(EN) · Yixin Cao · 2026-05-22 14:09

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

Skills, i.e., structured workflow instructions distilled for large language models (LLMs), are becoming an increasingly important mechanism for improving agent performance on real-world downstream tasks. However, as the open-source skill ecosystem rapidly expands, it remains uncl…

arXiv cs.LG TIER_1 English(EN) · Shengchao Qin · 2026-05-22 12:44

Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

Long-horizon language agents can make many plausible local tool calls yet fail to persist until a requested count is actually complete. We study this gap as Quantitative Goal Persistence (QGP): whether an agent keeps working until an external verifier confirms enough distinct val…

arXiv cs.AI TIER_1 English(EN) · Mahmut Taylan Kandemir · 2026-05-22 07:12

Parallel Context Compaction for Long-Horizon LLM Agent Serving

Long-horizon LLM agents accumulate growing conversation histories that eventually exceed the model's context window. Context compaction via LLM-based summarization keeps the conversation bounded, but summarization is inherently lossy and the blocking call stalls agent inference f…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Hayoung Chung · 2026-05-22 06:27

Self-Refining Topology Optimization via an LLM-Based Multi-Agent Framework

Topology optimization is a widely used design method that produces optimized material distributions for prescribed objectives and constraints through well-established numerical algorithms. Throughout the workflow, engineers make a series of decisions ranging from setting and adju…

arXiv cs.AI TIER_1 English(EN) · Shuaike Shen, Wenduo Cheng, Shike Wang, Mingqian Ma, Jian Ma · 2026-05-22 04:00

AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows

arXiv:2605.20425v1 Announce Type: new Abstract: Designing multi-agent workflows is especially difficult in open-ended scientific settings where tasks lack curated training sets, reliable scalar evaluation metrics, and standardized interfaces between existing tools and agents. We …

arXiv cs.LG TIER_1 English(EN) · Ao Li, Shangpeng Yang, Fahao Chen, Tianheng Xu, Peng Li, Zhou Su · 2026-05-22 04:00

GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving

arXiv:2605.22566v1 Announce Type: new Abstract: Large Language Model (LLM)-based agents demonstrate strong reasoning and execution capabilities on complex tasks when guided by structured instructions, commonly referred to as workflows. However, existing workflow-assisted agent se…

arXiv cs.AI TIER_1 English(EN) · Benedikt Bollig · 2026-05-22 04:00

Causal Past Logic for Runtime Verification of Distributed LLM Agent Workflows

arXiv:2605.20923v1 Announce Type: cross Abstract: Distributed LLM agent workflows should not be monitored as if they produced a single sequential log. In an asynchronous execution, a decision can only depend on events that are causally visible to the lifeline that makes it: an ev…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Estevam Hruschka · 2026-05-21 20:47

How to Steer Your Multi-Agent System: Human-LLM Collaborative Planning

In orchestrated multi-agent systems, humans often struggle to manage plans due to their complexity and limited transparency. Existing approaches rely on outcome-level supervision, where users verify only final outputs without visibility into intermediate reasoning. We formalize a…

arXiv cs.LG TIER_1 English(EN) · Zhou Su · 2026-05-21 14:45

GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving

Large Language Model (LLM)-based agents demonstrate strong reasoning and execution capabilities on complex tasks when guided by structured instructions, commonly referred to as workflows. However, existing workflow-assisted agent serving systems typically rely on predefined templ…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Yohei Nakajima · 2026-05-21 04:55

The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems

Most agent frameworks are built around the language model: a conversation loop comes first, then tools, then rules, and finally a logging layer bolted on for observability, with state persisted as retrievable "memory." We describe ActiveGraph, a runtime that inverts this arrangem…

arXiv cs.AI TIER_1 English(EN) · Benedikt Bollig · 2026-05-20 09:09

Causal Past Logic for Runtime Verification of Distributed LLM Agent Workflows

Distributed LLM agent workflows should not be monitored as if they produced a single sequential log. In an asynchronous execution, a decision can only depend on events that are causally visible to the lifeline that makes it: an event that appears earlier in some log may still be …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 09:09

Causal Past Logic for Runtime Verification of Distributed LLM Agent Workflows

Distributed LLM agent workflows should not be monitored as if they produced a single sequential log. In an asynchronous execution, a decision can only depend on events that are causally visible to the lifeline that makes it: an event that appears earlier in some log may still be …

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Jason J. Choi · 2026-05-20 02:16

Time-To-Reach Separation and Safety Filtering for Safe, Fair, and Efficient Multi-Agent Coordination

Advanced Air Mobility (AAM) operations are expected to significantly increase aerial traffic in urban airspace, requiring autonomous traffic management systems to ensure collision-free operations in highly congested environments. In this paper, we propose a multi-agent coordinati…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Yew Soon Ong · 2026-05-19 14:39

LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions

Classical models of opinion dynamics assume human participants with bounded rationality and limited coordination. The rise of LLM-based agents introduces a qualitative shift: agents can now participate in online discussions at scale, maintain consistent persuasion strategies, and…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Ao Qu · 2026-05-18 20:37

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

We introduce DecisionBench, a benchmark substrate for emergent delegation in long-horizon agentic workflows. The substrate fixes a task suite (GAIA, tau-bench, BFCL multi-turn), a peer-model pool (11 models, 7 vendor families), a delegation interface (call_model plus an optional …

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Chi Jin · 2026-05-17 23:36

Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces

The deployment of Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents transition to directly interacting with marketplaces, their collective behavior can amplify volatility and mask dece…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Yang Shu · 2026-05-17 07:12

MetaCogAgent: A Metacognitive Multi-Agent LLM Framework with Self-Aware Task Delegation

Multi-agent large language model (LLM) systems have shown promise for solving complex tasks through agent collaboration. However, existing frameworks assign tasks based on predefined roles without considering whether an agent can accurately assess its own competence boundaries, l…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Sajjad Khan · 2026-05-16 16:46

S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination

We address concurrency control for LLM agents sharing mutable state over HTTP, where agents cannot be modified to declare read sets. S-Bus is an HTTP middleware whose central mechanism, a server-side DeliveryLog, reconstructs each agent's read set at commit time from observed HTT…

Replit blog TIER_1 English(EN) · 2026-04-06 15:00

How product managers ship faster using Replit's agentic workflows

This is part 4 of a 6-part series we’re running about how product managers are using AI tools and vibe coding. Written by and for product managers. Summary: Requirements docs, decks, and tickets go stale because PMs update them by hand. Agentic workflows fix the source of that pro…

Replit blog TIER_1 English(EN) · 2025-07-22 22:08

Introducing Queue: A smarter way to work with Agent

Today, we’re excited to introduce Queue, a new capability designed to enhance the core Replit Agent experience. Queue allows users to submit multiple requests while the agent is actively working on a task, ensuring a continuous, uninterrupted app creation flow. As each task is co…

dev.to — MCP tag TIER_1 English(EN) · ekb · 2026-05-24 17:50

Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable

How application observability extends to stochastic agent loops, and why the tool boundary matters. Production failures in LLM systems are often misattributed to the model. In practice, many incidents live in the action layer: a downstream API…

dev.to — LLM tag TIER_1 English(EN) · Alan West · 2026-05-25 23:42

Why LLM Coding Agents "Drift" on Long Backend Tasks (and How to Fix It)

Last month I spent three days debugging a Django service where the AI agent had written... mostly correct code. The endpoints worked. The tests passed. But somewhere around the fourth file, it had quietly dropped a database transaction wrapper around a multi-step write. By fil…

r/MachineLearning TIER_1 English(EN) · /u/johnnaliu · 2026-05-25 01:02

Sponsio: Deterministic Contract Layer for LLM Agents [P]

We've been trying to put LangGraph agents into production for a while. The thing that kept biting us was tool-call boundary enforcement: stuff like "must call X before Y", "max N retries", "approval gate before destructiv…

r/ClaudeAI TIER_2 English(EN) · /u/Dramatic_Squash_3502 · 2026-05-23 16:30

Deterministic multi-subagent orchestration - what's new in CC 2.1.146 (+4,755 tokens)

https://www.reddit.com/r/ClaudeAI/comments/1tll4mv/deterministic_multisubagent_orchestration_whats/

Sources [72]
