PulseAugur
research · [158 sources] ·

AI agents evolve: Research tackles scaling, safety, and emergent network risks

Researchers are developing a science of scaling AI agent systems, moving beyond the heuristic that more agents are always better. New studies reveal that multi-agent coordination significantly improves performance on parallelizable tasks but can degrade it on sequential ones. Efforts are underway to create predictive models for optimal agent architecture and to develop methods for real-time evaluation and error mitigation in agent interactions.

Summary written by gemini-2.5-flash-lite from 158 sources.

IMPACT New research is defining principles for effective AI agent system design, moving beyond simple scaling heuristics and addressing complex coordination and safety challenges.

RANK_REASON Multiple research papers and studies are exploring the science of scaling AI agent systems, their coordination, and their interactions.

COVERAGE [158]

  1. Google AI / Research TIER_1 ·

    Towards a science of scaling agent systems: When and why agent systems work

    Generative AI

  2. OpenAI News TIER_1 ·

    Netomi’s lessons for scaling agentic systems into the enterprise

    How Netomi scales enterprise AI agents using GPT-4.1 and GPT-5.2—combining concurrency, governance, and multi-step reasoning for reliable production workflows.

  3. Apple Machine Learning Research TIER_1 ·

    Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

    This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics at ACL 2026. Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnect…

  4. Microsoft Research TIER_1 · Gagan Bansal, Shujaat Mirza, Keegan Hines, Will Epperson, Zachary Huang, Whitney Maxwell, Pete Bryan, Tyler Payne, Adam Fourney, Amanda Swearngin, Wenyue Hua, Tori Westerhoff, Amanda Minnich, Maya Murad, Ece Kamar, Ram Shankar Siva Kumar, Saleema Amershi ·

    Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale

    Safe agents don’t guarantee a safe ecosystem of interconnected agents. Microsoft Research examines what breaks when AI agents interact and why network-level risks require new approaches.

  5. arXiv cs.LG TIER_1 · S. Pasricha ·

    MARLIN: Multi-Agent Game-Theoretic Reinforcement Learning for Sustainable LLM Inference in Cloud Datacenters

    Large Language Models (LLMs) have become increasingly prevalent in cloud-based platforms, propelled by the introduction of AI-based consumer and enterprise services. LLM inference requests in particular account for up to 90% of total LLM lifecycle energy use, dwarfing training en…

  6. arXiv cs.AI TIER_1 · Elias Calboreanu ·

    Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance

    Prompt specifications for multi-agent large language model (LLM) systems carry data contracts and integration logic across many interdependent files but are rarely subjected to structured-inspection rigor. This paper reports a single-system empirical case study of iterative, agen…

  7. arXiv cs.CL TIER_1 · Rui Wang ·

    GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

    Reinforcement learning has become a widely used post-training approach for LLM agents, where training commonly relies on outcome-level rewards that provide only coarse supervision. While finer-grained credit assignment is promising for effective policy updates, obtaining reliable…

  8. arXiv cs.AI TIER_1 · Hongyang Chen ·

    ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

    Current LLM agents are proficient at calling isolated APIs but struggle with the "last mile" of commercial software automation. In real-world scenarios, tools are not independent; they are atomic, interdependent, and prone to environmental noise. We introduce ComplexMCP…

  9. Hugging Face Daily Papers TIER_1 ·

    ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

    Current LLM agents are proficient at calling isolated APIs but struggle with the "last mile" of commercial software automation. In real-world scenarios, tools are not independent; they are atomic, interdependent, and prone to environmental noise. We introduce ComplexMCP…

  10. arXiv cs.AI TIER_1 · Jiawei Li ·

    Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

    Experience-driven self-evolving agents aim to overcome the static nature of large language models by distilling reusable experience from past interactions, thus enabling adaptation to novel tasks at deployment time. This process places substantial demands on the foundation model'…

  11. arXiv cs.CL TIER_1 · Cristiano De Nobili ·

    Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics

    We investigate the emergent collective dynamics of LLM-based multi-agent systems on a 2D square lattice and present a model-agnostic statistical-physics method to disentangle social conformity from intrinsic bias, compute critical exponents, and probe the collective behavior and …

  12. arXiv cs.AI TIER_1 · Zhangchun Zhao ·

    Swarm Skills: A Portable, Self-Evolving Multi-Agent System Specification for Coordination Engineering

    As artificial intelligence engineering paradigms shift from single-agent Prompt and Context Engineering toward multi-agent Coordination Engineering, the ability to codify and systematically improve how multiple agents collaborate has emerged as a critical bottleneck. Whi…

  13. arXiv cs.CL TIER_1 · Fuli Feng ·

    TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

    Multi-agent systems (MAS) have emerged as a promising paradigm for solving complex tasks. Recent work has explored self-evolving MAS that automatically optimize agent capabilities or communication topologies. However, existing methods either learn a topology that remains fixed at…

  14. arXiv cs.CL TIER_1 · Heng Huang ·

    LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

    Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuris…

  15. arXiv cs.AI TIER_1 · Vincent Conitzer ·

    The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

    Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model–game set…

  16. arXiv cs.AI TIER_1 · Xunliang Cai ·

    AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

    As LLM-based agents increasingly rely on external tools, it is important to evaluate their ability to sustain tool-grounded reasoning beyond familiar workflows and short-range interactions. We introduce AgentEscapeBench, an escape-room-style benchmark that tests whether agents ca…

  17. arXiv cs.LG TIER_1 · İsmail İlkan Ceylan ·

    RelAgent: LLM Agents as Data Scientists for Relational Learning

    Relational learning is a challenging problem that has motivated a wide range of approaches, including graph-based models (e.g., graph neural networks, graph transformers), tabular methods (e.g., tabular foundation models), and sequence-based approaches (e.g., large language model…

  18. arXiv cs.AI TIER_1 · Hoki Kim ·

    CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

    Large language models (LLMs) are increasingly deployed as autonomous agents in offensive cybersecurity. In this paper, we reveal an interesting phenomenon: different agents exhibit distinct attack patterns. Specifically, each agent exhibits an attack-selection bias, disproportion…

  19. arXiv cs.AI TIER_1 · Keisuke Okumura ·

    Alternating Target-Path Planning for Scalable Multi-Agent Coordination

    The concurrent target assignment and pathfinding (TAPF) problem extends multi-agent pathfinding (MAPF) by asking planners to allocate distinct targets and collision-free paths to agents. Prior work on TAPF has relied exclusively on Conflict-Based Search (CBS), which tightly coupl…

  20. arXiv cs.LG TIER_1 · Yi Xie, Yangyang Xu, Yi Fan, Bo Liu ·

    SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees

    arXiv:2605.05216v1 Announce Type: new Abstract: Large language models (LLMs) with a large number of parameters achieve strong performance but are often prohibitively expensive to deploy. Recent work explores using teams of smaller, more efficient LLMs that collectively match or e…

  21. arXiv cs.AI TIER_1 · Zhe Liu, Zonghao Ying, Wenxin Zhang, Quanchen Zou, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Hao Peng ·

    SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

    arXiv:2605.05704v1 Announce Type: cross Abstract: With the rapid evolution of foundation models, Large Language Model (LLM) agents have demonstrated increasingly powerful tool-use capabilities. However, this proficiency introduces significant security risks, as malicious actors c…

  22. arXiv cs.AI TIER_1 · Keisuke Kamahori, Shihang Li, Simon Peter, Baris Kasikci ·

    VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

    arXiv:2605.06068v1 Announce Type: new Abstract: For years, we have built LLM serving systems like any other critical infrastructure: a single general-purpose stack, hand-tuned over many engineer-years, meant to support every model and workload. In this paper, we take the opposite…

  23. arXiv cs.AI TIER_1 · Yuliang Xu, Xiang Xu, Yao Wan, Hu Wei, Tong Jia ·

    MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

    arXiv:2605.05949v1 Announce Type: new Abstract: Algorithmic problem solving serves as a rigorous testbed for evaluating structured reasoning in AI coding systems, as it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches pr…

  24. arXiv cs.AI TIER_1 · Hongcheol Cho, Ryangkyung Kang, Youngeun Kim ·

    SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

    arXiv:2605.05726v1 Announce Type: new Abstract: As LLM agents are increasingly deployed with large libraries of reusable skills, selecting the right skill for a user request has become a critical systems challenge. In small libraries, users may invoke skills explicitly by name, b…

  25. arXiv cs.AI TIER_1 · Zhengru Fang, Senkang Forest Hu, Zhonghao Chang, Yu Guo, Yihang Tao, Hongyao Liu, Mengzhe Ruan, Jun Huang, Yuguang Fang ·

    Inference-Time Budget Control for LLM Search Agents

    arXiv:2605.05701v1 Announce Type: new Abstract: LLM search agents increasingly rely on tools at inference time, but their trajectories are often constrained by hard limits on both tool calls and generated tokens. Under such dual budgets, better answers require not only stronger m…

  26. arXiv cs.AI TIER_1 · Haoyang Xie, Xinyuan Wang, Yancheng Wang, Puda Zhao, Feng Ju ·

    From History to State: Constant-Context Skill Learning for LLM Agents

    arXiv:2605.05413v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly used to operate browsers, files, code and tools, making personal assistants a natural deployment target. Yet personal agents face a privacy-cost-capability tension: cloud models exe…

  27. arXiv cs.CL TIER_1 · Bufang Yang, Lilin Xu, Liekang Zeng, Yunqi Guo, Siyang Jiang, Wenrui Lu, Kaiwei Liu, Yixuan Li, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan ·

    ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems in the Wild

    arXiv:2512.06721v2 Announce Type: replace-cross Abstract: Recent studies have begun to explore proactive large language model (LLM) agents that provide unobtrusive assistance by automatically leveraging contextual information, such as in code editing and in-app suggestions. Howev…

  28. arXiv cs.CL TIER_1 · Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang ·

    MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

    arXiv:2605.06623v1 Announce Type: cross Abstract: Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivota…

  29. arXiv cs.CL TIER_1 · Ming Liu ·

    More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

    arXiv:2605.05716v1 Announce Type: cross Abstract: LLM agent systems are built by stacking scaffolding components (planning, tools, memory, self-reflection, retrieval) assuming more is better. We study cross-component interference (CCI): degradation when components interact destru…

  30. arXiv cs.LG TIER_1 · Hamed Hamzeh ·

    AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

    arXiv:2603.12031v2 Announce Type: replace-cross Abstract: State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by …

  31. arXiv cs.LG TIER_1 · Huchen Yang, Xinghao Dong, Dan Negrut, Jin-Long Wu ·

    Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems

    arXiv:2605.05703v1 Announce Type: cross Abstract: Optimizing the communication structure of large language model based multi-agent systems (LLM-MAS) has been shown to improve downstream performance and reduce token usage. Existing methods typically rely on randomly sampled traini…

  32. arXiv cs.LG TIER_1 · Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig ·

    Recursive Agent Optimization

    arXiv:2605.06639v1 Announce Type: new Abstract: We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents impleme…

  33. arXiv cs.LG TIER_1 · Zheng Zhang, Cuong C. Nguyen, Kevin Wells, Gustavo Carneiro ·

    Multi-agent decision making: A Blackwell's informativeness approach

    arXiv:2605.06028v1 Announce Type: new Abstract: The rapid development of large language models (LLMs) has motivated research on decision-making in multi-agent systems, where multiple agents collaborate to achieve shared objectives. Existing aggregation approaches, such as voting …

  34. arXiv cs.LG TIER_1 · Zhiyuan Zhai, Xin Wang ·

    Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL

    arXiv:2605.05802v1 Announce Type: new Abstract: Group-relative RL training (GRPO) samples a small group of parallel rollouts for every training prompt and uses their within-group reward spread to compute per-trajectory advantages. In agentic environments each rollout is a long mu…

  35. arXiv cs.AI TIER_1 · Graham Neubig ·

    Recursive Agent Optimization

    We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that natu…

  36. arXiv cs.AI TIER_1 · Min Zhang ·

    MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

    Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agen…

  37. arXiv cs.AI TIER_1 · Andrea Iannoli, Lorenzo Gigli, Luca Sciullo, Angelo Trotta, Marco Di Felice ·

    Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones

    arXiv:2605.03788v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly explored as high-level reasoning engines for cyber-physical systems, yet their application to real-time UAV swarm management remains challenging due to heterogeneous interfaces, limited …

  38. arXiv cs.AI TIER_1 · Hanchen Li, Runyuan He, Qiuyang Mang, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica ·

    Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live

    arXiv:2511.02230v4 Announce Type: replace-cross Abstract: KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting. This policy breaks for agentic workloads, whi…

  39. arXiv cs.AI TIER_1 · Zhiyuan Li, Wenshuai Zhao, Joni Pajarinen ·

    Closed-Loop Vision-Language Planning for Multi-Agent Coordination

    arXiv:2502.10148v3 Announce Type: replace Abstract: Cooperative multi-agent reinforcement learning (MARL) struggles with sample efficiency, interpretability, and generalization. While Large Language Models (LLMs) offer powerful planning capabilities, their application has been ha…

  40. arXiv cs.AI TIER_1 · Maxim Chupilkin ·

    Multi-Agent Strategic Games with LLMs

    arXiv:2605.03604v1 Announce Type: cross Abstract: This paper asks whether large language models (LLMs) can be used to study the strategic foundations of conflict and cooperation. I introduce LLMs as experimental subjects in a repeated security dilemma and evaluate whether they re…

  41. arXiv cs.AI TIER_1 · Kerri Prinos, Lilianne Brush, Cameron Denton, Zhanqi Wang, Joshua Knox, Snehal Antani, Anton Foltz, Amy Villaseñor ·

    Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

    arXiv:2605.03034v1 Announce Type: new Abstract: Agentic systems involved in high-stake decision-making under adversarial pressure need formal guarantees not offered by existing approaches. Motivated by the operational needs of security operations centers (SOCs) that must configur…

  42. arXiv cs.LG TIER_1 · Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Jizhou Guo, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Hanrong Zhang, Fangxin Wang, Pengfei Zhang, Huacan Wang, Langzhou He, Yangning Li, Dongyuan Li, Renhe Jiang, Xue Liu, Ph ·

    LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey

    arXiv:2505.00753v5 Announce Type: replace-cross Abstract: Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability…

  43. arXiv cs.AI TIER_1 · Mengchen Zhao ·

    Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games

    While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents brings significant challenges on the evaluatio…

  44. arXiv cs.LG TIER_1 · Jackie Baek, Yaopeng Fu, Will Ma, Tianyi Peng ·

    AI Agents for Inventory Control: Human-LLM-OR Complementarity

    arXiv:2602.12631v2 Announce Type: replace-cross Abstract: Inventory control is a fundamental operations problem in which ordering decisions are traditionally guided by theoretically grounded operations research (OR) algorithms. However, such algorithms often rely on rigid modelin…

  45. arXiv cs.LG TIER_1 · Maksym Nechepurenko, Pavel Shuvalov ·

    Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems

    arXiv:2605.03310v1 Announce Type: cross Abstract: Multi-agent LLM systems fail in production at rates between 41% and 87%, mostly due to coordination defects rather than base-model capability. Existing responses split between cataloguing failure modes empirically and shipping dec…

  46. arXiv cs.LG TIER_1 · Robert-Jeron Reifert, Alaa Alameer Ahmad, Hayssam Dahrouj, Aydin Sezgin ·

    Agentic AI-Based Joint Computing and Networking via Mixture of Experts and Large Language Models

    arXiv:2605.02911v1 Announce Type: new Abstract: Future sixth-generation (6G) mobile networks are envisioned to be equipped with a diverse set of powerful, yet highly specialized, optimization experts. Such a promising vision is concurrently expected to give rise to the need for s…

  47. arXiv cs.AI TIER_1 · Shuo Liu, Tianle Chen, Ryan Amiri, Christopher Amato ·

    Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic

    arXiv:2601.21972v4 Announce Type: replace Abstract: Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning approaches rely on predefined execution protocols, which often require centralized execution…

  48. arXiv cs.AI TIER_1 · Jose Manuel de la Chica, Juan Manuel Vera, Jairo Rodríguez ·

    When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems

    arXiv:2605.02463v1 Announce Type: cross Abstract: Multi-agent LLM systems are increasingly used to solve complex tasks through decomposition, debate, specialization, and ensemble reasoning. However, these systems are usually evaluated in terms of robustness: whether performance i…

  49. arXiv cs.AI TIER_1 · Vicente Pelechanoa, Antoni Mestre, Manoli Albert, Miriam Gil ·

    HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

    arXiv:2605.02832v1 Announce Type: new Abstract: Deciding how to distribute work between humans and AI systems is a central challenge in organisational design. Most approaches treat this as a binary choice, yet the operational reality is richer: humans and AI routinely share tasks…

  50. arXiv cs.AI TIER_1 · Xiyuan Zhou, Ruixi Zou, Xinlei Wang, Yuheng Cheng, Yan Xu, Junhua Zhao, Jinjin Gu ·

    EngiAgent: Fully Connected Coordination of LLM Agents for Solving Open-ended Engineering Problems with Feasible Solutions

    arXiv:2605.02289v1 Announce Type: new Abstract: Engineering problem solving is central to real-world decision-making, requiring mathematical formulations that not only represent complex problems but also produce feasible solutions under data and physical constraints. Unlike mathe…

  51. arXiv cs.AI TIER_1 · Manuel Hernández, Eduardo Sánchez-Soto ·

    Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems

    arXiv:2605.01879v1 Announce Type: new Abstract: The challenge of engineering autonomous agents capable of navigating the stochastic and adversarial nature of the physical world has historically resided at the intersection of symbolic logic and control theory. Traditional multi-ag…

  52. arXiv cs.AI TIER_1 · Guowei Zou, Haitao Wang, Beiwen Zhang, Boning Zhang, Hejun Wu ·

    CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

    arXiv:2605.01457v1 Announce Type: new Abstract: Generative models have emerged as a major paradigm for offline multi-agent reinforcement learning (MARL), but existing approaches require many iterative sampling steps. Recent few-step accelerations either distill a joint teacher in…

  53. arXiv cs.LG TIER_1 · Hongbo Jin, Rongpeng Zhu, Jiayu Ding, Guibo Luo, Ge Li ·

    HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

    arXiv:2603.00977v2 Announce Type: replace-cross Abstract: Large language model (LLM) agents have recently demonstrated strong capabilities in interactive decision-making, yet they remain fundamentally limited in long-horizon tasks that require structured planning and reliable exe…

  54. arXiv cs.AI TIER_1 · Marco Di Felice ·

    Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones

    Large Language Models (LLMs) are increasingly explored as high-level reasoning engines for cyber-physical systems, yet their application to real-time UAV swarm management remains challenging due to heterogeneous interfaces, limited grounding, and the need for long-running closed-…

  55. arXiv cs.AI TIER_1 · Maxim Chupilkin ·

    Multi-Agent Strategic Games with LLMs

    This paper asks whether large language models (LLMs) can be used to study the strategic foundations of conflict and cooperation. I introduce LLMs as experimental subjects in a repeated security dilemma and evaluate whether they reproduce canonical mechanisms from international re…

  56. arXiv cs.LG TIER_1 · Vik Pant, Eric Yu ·

    Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition

    arXiv:2605.02063v1 Announce Type: cross Abstract: We present Coopetition-Gym v1, a benchmark platform for mixed-motive multi-agent reinforcement learning under strategic coopetition. The platform comprises twenty environments organized into four mechanism classes that correspond …

  57. arXiv cs.LG TIER_1 · Wenyi Wu, Sibo Zhu, Kun Zhou, Biwei Huang ·

    Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning

    arXiv:2605.02168v1 Announce Type: cross Abstract: Language model (LM)-based agents have demonstrated promising capabilities in automating complex tasks from natural language instructions, yet they continue to struggle with long-horizon planning and reasoning. To address this, we …

  58. arXiv cs.CL TIER_1 · Siddeshwar Raghavan, Tanwi Mallick ·

    MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding

    arXiv:2510.08804v3 Announce Type: replace Abstract: We present MOSAIC, a multi-agent Large Language Model (LLM) framework for solving challenging scientific coding tasks. Unlike general-purpose coding, scientific workflows require algorithms that are rigorous, interconnected with…

  59. arXiv cs.CL TIER_1 · Chenchen Zhang ·

    Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

    arXiv:2605.02801v1 Announce Type: new Abstract: As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, an…

  60. arXiv cs.CL TIER_1 · Jianze Wang, Ying Liu, Jinlong Chen, Xuchun Hu, Qilong Zhang, Yu Cao, Jun Wang, Hua Yang, Yong Xie, Qianglong Chen ·

    MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate

    arXiv:2605.01347v1 Announce Type: new Abstract: On-policy distillation (OPD) trains a student on its own trajectories under token-level teacher supervision, but existing methods are capped by a single-teacher capability ceiling: when the teacher errs, the student inherits the err…

  61. arXiv cs.AI TIER_1 · Miriam Gil ·

    HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

    Deciding how to distribute work between humans and AI systems is a central challenge in organisational design. Most approaches treat this as a binary choice, yet the operational reality is richer: humans and AI routinely share tasks or take complementary roles depending on contex…

  62. arXiv cs.CL TIER_1 · Chenchen Zhang ·

    Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

    As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based m…

  63. arXiv cs.AI TIER_1 · Jairo Rodríguez ·

    When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems

    Multi-agent LLM systems are increasingly used to solve complex tasks through decomposition, debate, specialization, and ensemble reasoning. However, these systems are usually evaluated in terms of robustness: whether performance is preserved under perturbation. This paper studies…

  64. Hugging Face Daily Papers TIER_1 ·

    EngiAgent: Fully Connected Coordination of LLM Agents for Solving Open-ended Engineering Problems with Feasible Solutions

    Engineering problem solving is central to real-world decision-making, requiring mathematical formulations that not only represent complex problems but also produce feasible solutions under data and physical constraints. Unlike mathematical problem solving, which operates on prede…

  65. arXiv cs.LG TIER_1 · Chunlei Meng, Pengbin Feng, Rong Fu, Hoi Leong Lee, Xiaojing Du, Zhaolu Kang, Zeyu Zhang, Weilin Zhou, Chun Ouyang, Zhongxue Gan ·

    Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration

    arXiv:2605.00370v1 Announce Type: new Abstract: Centralized multimodal learning commonly compresses language, acoustic, and visual signals into a single fused representation for prediction. While effective, this paradigm suffers from two limitations: modality dominance, where opt…

  66. arXiv cs.LG TIER_1 · Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright, Marcus Yearwood, Hongtai Wei, Sudeep Das, Danny Nightingale, Meg Watson, Charles Pollnow V ·

    Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

    arXiv:2603.03565v2 Announce Type: replace-cross Abstract: Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to o…

  67. arXiv cs.AI TIER_1 · Giuseppe Arbore, Andrea Sillano, Luigi De Russis ·

    Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs

    arXiv:2604.27882v1 Announce Type: new Abstract: Recent advances in agentic AI are shifting automation from discrete tools to proactive multi-agent systems that coordinate multi-specialized capabilities behind unified interfaces. However, today's agent systems typically rely on ha…

  68. arXiv cs.AI TIER_1 · Junan Hu, Jian Liu, Jingxiang Lai, Jiarui Hu, Yiwei Sheng, Shuang Chen, Jian Li, Dazhao Du, Song Guo ·

    GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

    arXiv:2604.27955v1 Announce Type: new Abstract: Graphical User Interface (GUI) agents have emerged as a promising paradigm for intelligent systems that perceive and interact with graphical interfaces visually. Yet supervised fine-tuning alone cannot handle long-horizon credit ass…

  69. arXiv cs.AI TIER_1 · Rahul Ramachandran, Nidhi Jha, Muthukumaran Ramasubramanian ·

    Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

    arXiv:2604.28043v1 Announce Type: new Abstract: We present Collaborative Agent Reasoning Engineering (CARE), a disciplined methodology for engineering Large Language Model (LLM) agents in scientific domains. Unlike ad-hoc trial-and-error approaches, CARE specifies behavior, groun…

  70. arXiv cs.AI TIER_1 · Pedro-Aarón Hernández-Ávalos, Luciano García-Bañuelos ·

    Pragmos: A Process Agentic Modeling System

    arXiv:2604.27311v1 Announce Type: cross Abstract: The advent of Large Language Models (LLMs) has significantly transformed tasks across Software Engineering. In the context of Business Process Management, LLMs are now being explored as tools to derive process models directly from…

  71. arXiv cs.AI TIER_1 · Jiaju Chen, Jinghua Piao, Xia Xu, Songwei Li, Tong Xia, Xiangnan He, Yong Li ·

    AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments

    arXiv:2604.27725v1 Announce Type: cross Abstract: A long-standing challenge in economics lies not in the lack of intuition, but in the difficulty of translating intuitive insights into verifiable research. To address this challenge, we introduce AgentEconomist, an end-to-end inte…

  72. arXiv cs.AI TIER_1 · Francesca Gomez ·

    From surveillance to signalling: escalation channels as environmental controls for agentic AI

    arXiv:2510.05192v2 Announce Type: replace-cross Abstract: When AI agents operating with access to sensitive information encounter a conflict between completing an assigned task and following rules or ethical constraints, they can resort to unsanctioned behaviour. Existing inferen…

  73. arXiv cs.CL TIER_1 · Jiacheng Liu, Zichen Tang, Zhongjun Yang, Xinyi Hu, Xueyuan Lin, Linwei Jia, Ruofei Bai, Rongjin Li, Shiyao Peng, Haocheng Gao, Haihong E ·

    RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems

    arXiv:2604.27616v1 Announce Type: new Abstract: People commonly leverage structured content to accelerate knowledge acquisition and research problem solving. Among these, roadmaps guide researchers through hierarchical subtasks to solve complex research problems step by step. Des…

  74. arXiv cs.AI TIER_1 · Chunhui Zhang, Yuxuan Wang, Aoyang Qin, Yi-Long Lu, Kunlun Wu, Yizhou Wang, Wei Wang ·

    Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

    arXiv:2604.27699v1 Announce Type: new Abstract: Current embodied agents are often limited to passive instruction-following or reactive need-satisfaction, lacking a stable, high-order value framework essential for long-term, self-directed behavior and resolving motivational confli…

  75. arXiv cs.AI TIER_1 · Anh Ta, Junjie Zhu, Shahin Shayandeh ·

    Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

    arXiv:2604.27233v1 Announce Type: new Abstract: Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors…

  76. arXiv cs.LG TIER_1 · Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen ·

    MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

    arXiv:2604.26963v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-to…

  77. arXiv cs.AI TIER_1 · Jinbiao Wei, Kangqi Ni, Yilun Zhao, Guo Gan, Arman Cohan ·

    Step-level Optimization for Efficient Computer-use Agents

    arXiv:2604.27151v1 Announce Type: new Abstract: Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite re…

  78. arXiv cs.AI TIER_1 · Muthukumaran Ramasubramanian ·

    Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

    We present Collaborative Agent Reasoning Engineering (CARE), a disciplined methodology for engineering Large Language Model (LLM) agents in scientific domains. Unlike ad-hoc trial-and-error approaches, CARE specifies behavior, grounding, tool orchestration, and verification throu…

  79. arXiv cs.AI TIER_1 · Luigi De Russis ·

    Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs

    Recent advances in agentic AI are shifting automation from discrete tools to proactive multi-agent systems that coordinate multi-specialized capabilities behind unified interfaces. However, today's agent systems typically rely on hard-coded agent architectures with fixed roles, c…

  80. arXiv cs.AI TIER_1 · Yong Li ·

    AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments

    A long-standing challenge in economics lies not in the lack of intuition, but in the difficulty of translating intuitive insights into verifiable research. To address this challenge, we introduce AgentEconomist, an end-to-end interactive system designed to translate abstract intu…

  81. Hugging Face Daily Papers TIER_1 ·

    Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

    Current embodied agents are often limited to passive instruction-following or reactive need-satisfaction, lacking a stable, high-order value framework essential for long-term, self-directed behavior and resolving motivational conflicts. We introduce ValuePlanner, a hiera…

  82. arXiv cs.CL TIER_1 · Haihong E ·

    RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems

    People commonly leverage structured content to accelerate knowledge acquisition and research problem solving. Among these, roadmaps guide researchers through hierarchical subtasks to solve complex research problems step by step. Despite progress in structured content generation, …

  83. arXiv cs.AI TIER_1 · Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue, Kefei Chen, Yu Zhuang, Haoxiang Guan, Jiyan He, Jian Li, Yitong Duan, Yu Shi, Mengting Hu, Shuxin Zheng ·

    FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

    arXiv:2604.26733v1 Announce Type: new Abstract: Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents th…

  84. arXiv cs.AI TIER_1 · Benedikt Bollig, Matthias Függer, Thomas Nowak ·

    Provable Coordination for LLM Agents via Message Sequence Charts

    arXiv:2604.17612v2 Announce Type: replace-cross Abstract: Multi-agent systems built on large language models (LLMs) are difficult to reason about. Coordination errors such as deadlocks or type-mismatched messages are often hard to detect through testing. We introduce a domain-spe…

  85. arXiv cs.AI TIER_1 · Xingyan Liu, Xiyue Luo, Linyu Li, Ganghong Huang, Jianfeng Liu, Honglin Qiao ·

    SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support

    arXiv:2604.08618v2 Announce Type: replace-cross Abstract: Deploying LLM-powered agents in enterprise scenarios such as cloud technical support demands high-quality, domain-specific skills. However, existing skill creators lack domain grounding, producing skills poorly aligned wit…

  86. arXiv cs.AI TIER_1 · Christoph Riedl ·

    Emergent Coordination in Multi-Agent Language Models

    arXiv:2510.05174v4 Announce Type: replace-cross Abstract: When are multi-agent LLM systems merely a collection of individual agents versus an integrated collective with higher-order structure? We introduce an information-theoretic framework to test -- in a purely data-driven way …

  87. arXiv cs.AI TIER_1 · Junxing Hu, Tianlong Li, Lei Yu, Ai Han ·

    OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction

    arXiv:2604.25602v2 Announce Type: replace Abstract: Deploying production-ready multi-agent systems (MAS) in complex industrial environments remains challenging due to limitations in scalability, observability, and autonomous evolution. We present OxyGent, an open-source framework…

  88. arXiv cs.AI TIER_1 · Ariel Sela ·

    Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation

    arXiv:2604.26561v1 Announce Type: cross Abstract: Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assig…

  89. arXiv cs.AI TIER_1 · Bochao Liu, Zhipeng Qian, Yang Zhao, Xinyuan Jiang, Zihan Liang, Yufei Ma, Junpeng Zhuang, Ben Chen, Shuo Yang, Hongen Wan, Yao Wu, Chenyi Lei, Xiao Liang ·

    Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

    arXiv:2604.26805v1 Announce Type: new Abstract: Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are…

  90. arXiv cs.AI TIER_1 · Mahnoor Shahid, Hannes Rothe ·

    AGEL-Comp: A Neuro-Symbolic Framework for Compositional Generalization in Interactive Agents

    arXiv:2604.26522v1 Announce Type: new Abstract: Large Language Model (LLM)-based agents exhibit systemic failures in compositional generalization, limiting their robustness in interactive environments. This work introduces AGEL-Comp, a neuro-symbolic AI agent architecture designe…

  91. arXiv cs.CL TIER_1 · Rui Wang, Ce Zhang, Jun-Yu Ma, Jianshu Zhang, Hongru Wang, Yi Chen, Boyang Xue, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu, Kam-Fai Wong ·

    WebAggregator: Enhancing Compositional Reasoning Capabilities of Deep Research Agent Foundation Models

    arXiv:2510.14438v2 Announce Type: replace Abstract: The hallmark of Deep Research agents lies in compositional reasoning, the capacity to aggregate distributed, heterogeneous information into coherent logical insights. However, current agentic systems are often retrieval-heavy bu…

  92. arXiv cs.CL TIER_1 · Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, Chenghua Lin ·

    A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

    arXiv:2604.19572v2 Announce Type: replace Abstract: As terminal agents scale to long-horizon, multi-turn workflows, a key bottleneck is not merely limited context length, but the accumulation of noisy terminal observations in the interaction history. Retaining raw observations pr…

  93. arXiv cs.AI TIER_1 · Tom Liptay, Dan Schwarz, Rafael Poyiadzi, Jack Wildman, Nikos I. Bosse ·

    Evaluating Strategic Reasoning in Forecasting Agents

    arXiv:2604.26106v1 Announce Type: new Abstract: Forecasting benchmarks produce accuracy leaderboards but little insight into why some forecasters are more accurate than others. We introduce Bench to the Future 2 (BTF-2), 1,417 pastcasting questions with a frozen 15M-document rese…

  94. Hugging Face Daily Papers TIER_1 ·

    Pragmos: A Process Agentic Modeling System

    The advent of Large Language Models (LLMs) has significantly transformed tasks across Software Engineering. In the context of Business Process Management, LLMs are now being explored as tools to derive process models directly from textual descriptions. Existing approaches range f…

  95. Hugging Face Daily Papers TIER_1 ·

    Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

    Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment b…

  96. arXiv cs.AI TIER_1 · Xiao Liang ·

    Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

    Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottl…

  97. Hugging Face Daily Papers TIER_1 ·

    FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

    Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from real-world. Just a…

  98. arXiv cs.AI TIER_1 · Shuxin Zheng ·

    FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

    Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from real-world. Just a…

  99. arXiv cs.AI TIER_1 · Ariel Sela ·

    Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation

    Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assigned value perspectives. We present the AI Council,…

  100. arXiv cs.AI TIER_1 · Hannes Rothe ·

    AGEL-Comp: A Neuro-Symbolic Framework for Compositional Generalization in Interactive Agents

    Large Language Model (LLM)-based agents exhibit systemic failures in compositional generalization, limiting their robustness in interactive environments. This work introduces AGEL-Comp, a neuro-symbolic AI agent architecture designed to address this challenge by grounding actions…

  101. arXiv cs.CL TIER_1 · Stan Loosmore ·

    Leverage Laws: A Per-Task Framework for Human-Agent Collaboration

    arXiv:2604.25040v1 Announce Type: cross Abstract: We propose a per-task leverage ratio for human-agent collaboration: human work displaced by an agent, divided by the human time required to specify the task, resolve mid-run interrupts, and review the result. The denominator decom…

  102. arXiv cs.LG TIER_1 · Shiyi Du, Jiayuan Liu, Weihua Du, Yue Huang, Jiayi Li, Yingtao Luo, Xiangliang Zhang, Vincent Conitzer, Carl Kingsford ·

    Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors

    arXiv:2604.25012v1 Announce Type: new Abstract: Automated agentic workflow design currently relies on per-task iterative search, which is computationally prohibitive and fails to reuse structural knowledge across tasks. We observe that optimized workflows converge to a small fami…

  103. arXiv cs.CL TIER_1 · Mohamed Aghzal, Gregory J. Stein, Ziyu Yao ·

    Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

    arXiv:2603.14248v2 Announce Type: replace-cross Abstract: Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering li…

  104. arXiv cs.CL TIER_1 · Kaixuan Fan, Kaituo Feng, Manyuan Zhang, Tianshuo Peng, Zhixun Li, Yilei Jiang, Shuang Chen, Peng Pei, Xunliang Cai, Xiangyu Yue ·

    Exploring Reasoning Reward Model for Agents

    arXiv:2601.22154v2 Announce Type: replace-cross Abstract: Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based reward for training. Such fe…

  105. arXiv cs.CL TIER_1 · Abhijnan Nath, Hannah VanderHoeven, Nikhil Krishnaswamy ·

    CRAFT: Grounded Multi-Agent Coordination Under Partial Information

    arXiv:2603.25268v2 Announce Type: replace Abstract: We introduce CRAFT, a multi-agent benchmark for evaluating pragmatic communication in large language models under strict partial information. In this setting, multiple agents with complementary but incomplete views must coordina…

  106. arXiv cs.CL TIER_1 · Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, Jindong Jiang, Hanghang Tong, Tong Zhang, Markus J. Buehler, Jingrui He, James Zou ·

    Recursive Multi-Agent Systems

    arXiv:2604.25917v1 Announce Type: cross Abstract: Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend such scaling principle from a single model to mul…

  107. arXiv cs.CL TIER_1 · Abigail O'Neill, Alan Zhu, Mihran Miroyan, Narges Norouzi, Joseph E. Gonzalez ·

    Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest

    arXiv:2604.25088v1 Announce Type: cross Abstract: Language Model (LM)-based agents remain largely untested in mixed-motive settings where agents must leverage short-term cooperation for long-term competitive goals (e.g., multi-party politics). We introduce Cooperate to Compete (C…

  108. arXiv cs.CL TIER_1 · Yunsu Kim, Kaden Uhlig, Joern Wuebker ·

    GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation

    arXiv:2604.24929v1 Announce Type: new Abstract: Agent benchmarks remain largely English-centric, while their multilingual versions are often built with machine translation (MT) and limited post-editing. We argue that, for agentic tasks, this minimal workflow can easily break benc…

  109. arXiv cs.CL TIER_1 · James Zou ·

    Recursive Multi-Agent Systems

    Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend such scaling principle from a single model to multi-agent systems, and ask: Can agent collaboration…

  110. arXiv cs.AI TIER_1 · Ai Han ·

    OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction

    Deploying production-ready multi-agent systems (MAS) in complex industrial environments remains challenging due to limitations in scalability, observability, and autonomous evolution. We present OxyGent, an open-source framework that enables modular, observable, and evolvable MAS…

  111. arXiv cs.CL TIER_1 · Yizhe Chi, Deyao Hong, Dapeng Jiang, Tianwei Luo, Kaisen Yang, Boshi Zhang, Zhe Cao, Xiaoyan Fan, Bingxiang He, Han Hao, Weiyang Jin, Dianqiao Lei, Qingle Liu, Houde Qian, Bowen Wang, Situ Wang, Youjie Zheng, Yifan Zhou, Calvin Xiao, Eren Cai, Qinhuai Na ·

    Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization

    arXiv:2604.12290v2 Announce Type: replace-cross Abstract: Current LLM agent benchmarks, which predominantly focus on binary pass/fail tasks such as code generation or search-based question answering, often neglect the value of real-world engineering that is often captured through…

  112. arXiv cs.AI TIER_1 · Wenji Fang, Yao Lu, Shang Liu, Jing Wang, Ziyan Guo, Junxian He, Fengbin Tu, Zhiyao Xie ·

    Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement

    arXiv:2604.14989v2 Announce Type: replace Abstract: Recent advances in large language models (LLMs) have sparked growing interest in automatic RTL optimization for better performance, power, and area (PPA). However, existing methods are still far from realistic RTL optimization. …

  113. arXiv cs.AI TIER_1 · Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang ·

    Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

    arXiv:2603.25158v4 Announce Type: replace Abstract: Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields f…

  114. arXiv cs.AI TIER_1 · Yifan Zhang, Jianmin Ye, Jiahao Yang, Xi Wang ·

    RefEvo: Agentic Design with Co-Evolutionary Verification for Agile Reference Model Generation

    arXiv:2604.24218v1 Announce Type: cross Abstract: As the complexity of System-on-Chip (SoC) designs grows, the shift-left paradigm necessitates the rapid development of high-fidelity reference models (typically written in SystemC) for early architecture exploration and verificati…

  115. arXiv cs.AI TIER_1 · Zhuohui Zhang, Bin Cheng, Bin He ·

    DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making

    arXiv:2604.23557v1 Announce Type: cross Abstract: Building scalable and reusable multi-agent decision policies from offline datasets remains a challenge in offline multi-agent reinforcement learning (MARL), as existing methods often rely on fixed observation formats and action sp…

  116. arXiv cs.AI TIER_1 · Patrizio Dazzi, Emanuele Carlini, Matteo Mordacchini, Saul Urso ·

    Usable Agent Discovery for Decentralized AI Systems

    arXiv:2604.23080v1 Announce Type: cross Abstract: Large-scale agentic systems run on distributed infrastructures where many software agents share physical hosts and are discovered via peer-to-peer mechanisms. Discovery must handle node-level churn from failures and host departure…

  117. arXiv cs.AI TIER_1 · Zavier Ndum Ndum, Jian Tao, John Ford, Mansung Yim, Yang Liu ·

    RADIANT-LLM: an Agentic Retrieval Augmented Generation Framework for Reliable Decision Support in Safety-Critical Nuclear Engineering

    arXiv:2604.22755v1 Announce Type: cross Abstract: Reliable decision support in nuclear engineering requires traceable, domain-grounded knowledge retrieval, yet safety and risk analysis workflows remain hampered by fragmented documentation and hallucination when using pre-trained la…

  118. arXiv cs.AI TIER_1 · Boqin Yuan, Renchu Song, Yue Su, Sen Yang, Jing Qin ·

    ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation

    arXiv:2604.23853v1 Announce Type: new Abstract: Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal: how much each step costs. Without per-step cost, a pipeline cannot distinguish adding a missing step to fix a bug from removi…

  119. arXiv cs.CL TIER_1 · Qiliang Liang, Hansi Wang, Zhong Liang, Yang Liu ·

    From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

    arXiv:2604.24026v1 Announce Type: new Abstract: LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts,…

  120. arXiv cs.AI TIER_1 · Rong Xiang ·

    Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

    arXiv:2604.23646v1 Announce Type: new Abstract: Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user requests. Existing mitigation methods…

  121. arXiv cs.AI TIER_1 · Haoran Tan, Zeyu Zhang, Chen Ma, Tianze Liu, Quanyu Dai, Xu Chen ·

    From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents

    arXiv:2604.23194v1 Announce Type: new Abstract: Large language model-based agents have recently emerged as powerful approaches for solving dynamic and multi-step tasks. Most existing agents employ planning mechanisms to guide long-term actions in dynamic environments. However, cu…

  122. arXiv cs.AI TIER_1 · Edward Cheng, Jeshua Cheng ·

    A Decoupled Human-in-the-Loop System for Controlled Autonomy in Agentic Workflows

    arXiv:2604.23049v1 Announce Type: new Abstract: AI agents are increasingly deployed to execute tasks and make decisions within agentic workflows, introducing new requirements for safe and controlled autonomy. Prior work has established the importance of human oversight for ensuri…

  123. arXiv cs.LG TIER_1 · Tianbao Zhang ·

    Harness as an Asset: Enforcing Determinism via the Convergent AI Agent Framework (CAAF)

    arXiv:2604.17025v2 Announce Type: replace-cross Abstract: Large Language Models produce a controllability gap in safety-critical engineering: even low rates of undetected constraint violations render a system undeployable. Current orchestration paradigms suffer from sycophantic c…

  124. arXiv cs.LG TIER_1 · Jie Wu, Ming Gong ·

    Beyond Single-Agent Alignment: Preventing Context-Fragmented Violations in Multi-Agent Systems

    arXiv:2604.22879v1 Announce Type: cross Abstract: We identify and formalize a novel security risk: Context-Fragmented Violations (CFVs) - a class of policy breaches where individual agent actions appear locally safe and reasonable, yet collectively violate organizational policies…

  125. Hugging Face Daily Papers TIER_1 ·

    Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest

    Language Model (LM)-based agents remain largely untested in mixed-motive settings where agents must leverage short-term cooperation for long-term competitive goals (e.g., multi-party politics). We introduce Cooperate to Compete (C2C), a multi-agent environment where players can e…

  126. arXiv cs.CL TIER_1 · Joseph E. Gonzalez ·

    Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest

    Language Model (LM)-based agents remain largely untested in mixed-motive settings where agents must leverage short-term cooperation for long-term competitive goals (e.g., multi-party politics). We introduce Cooperate to Compete (C2C), a multi-agent environment where players can e…

  127. Hugging Face Daily Papers TIER_1 ·

    Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization

    Rapid advances in Large Language Models (LLMs) create new opportunities by enabling efficient exploration of broad, complex design spaces. This is particularly valuable in computer architecture, where performance depends on microarchitectural designs and policies drawn from vast …

  128. arXiv cs.CL TIER_1 · Stan Loosmore ·

    Leverage Laws: A Per-Task Framework for Human-Agent Collaboration

    We propose a per-task leverage ratio for human-agent collaboration: human work displaced by an agent, divided by the human time required to specify the task, resolve mid-run interrupts, and review the result. The denominator decomposes into three channels through which a conserve…

  129. arXiv cs.LG TIER_1 · Carl Kingsford ·

    Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors

    Automated agentic workflow design currently relies on per-task iterative search, which is computationally prohibitive and fails to reuse structural knowledge across tasks. We observe that optimized workflows converge to a small family of domain-specific topologies, suggesting tha…

  130. arXiv cs.CL TIER_1 · Joern Wuebker ·

    GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation

    Agent benchmarks remain largely English-centric, while their multilingual versions are often built with machine translation (MT) and limited post-editing. We argue that, for agentic tasks, this minimal workflow can easily break benchmark validity through query-answer misalignment…

  131. arXiv cs.AI TIER_1 · Xi Wang ·

    RefEvo: Agentic Design with Co-Evolutionary Verification for Agile Reference Model Generation

    As the complexity of System-on-Chip (SoC) designs grows, the shift-left paradigm necessitates the rapid development of high-fidelity reference models (typically written in SystemC) for early architecture exploration and verification. While Large Language Models (LLMs) show promis…

  132. Hugging Face Daily Papers TIER_1 ·

    Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis

    Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remains underexplored. In this work, we first …

  133. arXiv cs.CL TIER_1 · Yang Liu ·

    From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

    LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structur…

  134. arXiv cs.AI TIER_1 · Aimin Zhang, Jiajing Guo, Fuwei Jia, Chen Lv, Boyu Wang, Fangzheng Li ·

    EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation

    arXiv:2604.20133v2 Announce Type: replace Abstract: This paper proposes EvoAgent - an evolvable large language model (LLM) agent framework that integrates structured skill learning with a hierarchical sub-agent delegation mechanism. EvoAgent models skills as multi-file structured…

  135. arXiv cs.AI TIER_1 · Zhengxu Yu, Yu Fu, Zhiyuan He, Yuxuan Huang, Lee Ka Yiu, Meng Fang, Weilin Luo, Jun Wang ·

    From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

    arXiv:2604.22446v1 Announce Type: new Abstract: Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. W…

  136. arXiv cs.AI TIER_1 · Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fen ·

    Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

    arXiv:2604.22748v1 Announce Type: new Abstract: As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with…

  137. arXiv cs.AI TIER_1 · Binyan Xu, Dong Fang, Haitao Li, Kehuan Zhang ·

    From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial?

    arXiv:2604.01608v3 Announce Type: replace Abstract: Multi-agent systems (MAS) tackle complex tasks by distributing expertise, though this often comes at the cost of heavy coordination overhead, context fragmentation, and brittle phase ordering. Distilling a MAS into a single-agen…

  138. Hugging Face Daily Papers TIER_1 ·

    TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents

    On-policy distillation (OPD) has shown strong potential for transferring reasoning ability from frontier or domain-specific models to smaller students. While effective on static single-turn tasks, its behavior in multi-turn agent settings remains underexplored. In this work, we i…

  139. arXiv cs.AI TIER_1 · Jiaya Jia ·

    Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

    As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictiv…

  140. arXiv cs.AI TIER_1 · Jun Wang ·

    From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

    Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a p…

  141. arXiv cs.CV TIER_1 · Song Guo ·

    GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

    Graphical User Interface (GUI) agents have emerged as a promising paradigm for intelligent systems that perceive and interact with graphical interfaces visually. Yet supervised fine-tuning alone cannot handle long-horizon credit assignment, distribution shifts, and safe explorati…

  142. Databricks Blog TIER_1 ·

    Inside one of the first production deployments of Lakebase: LangGuard's agentic workflow governance engine

    The invisible problem with agentic AI: Most enterprises are experimenting with autonomous AI agents...

  143. Towards AI TIER_1 · Deepanshu Gupta ·

    The Orchestration Tax: Why Multi-Agent Systems Get Expensive

    How context propagation, supervisor loops, tool calls, memory, and observability quietly drive up the cost of production agentic systems. Multi-agent AI systems are quickly becoming a default pattern for building advanced LLM applications. Instead of relying on one mod…

  144. dev.to — MCP tag TIER_1 · Shir Meir Lador ·

    Architect A Personalized Multi-Agent System with Long-Term Memory

    In support of our mission to accelerate the developer journey on Google Cloud, we built Dev Signal, a multi-agent system designed to transform raw community signals into reliable technical guidance by automating the path from discovery to expert creation.…

  145. Towards AI TIER_1 · Anubhav ·

    Multi-Agent Systems: When 2 Agents Beat 1 (and When They Don’t)

    https://pub.towardsai.net/multi-agent-systems-when-2-agents-beat-1-and-when-they-dont-f4e352541695?source=rss----98111c9905da---4

  146. dev.to — MCP tag TIER_1 · Jangwook Kim ·

    LangGraph + MCP: Build a Supervisor Multi-Agent System

    Why This Pattern Matters: Most LangGraph tutorials stop at single agents. A single agent that does research, writes code, and formats a report is juggling three jobs, and as the task list grows, the prompt grows with it. The supervisor pattern solves this: one orche…

  147. dev.to — LLM tag TIER_1 · Sergei Peleskov ·

    Why Single Agents Beat Multi-Agent Systems at Equal Token Budgets

    TL;DR: Stanford (Tran & Kiela, arXiv 2604.02460) tested single-agent vs multi-agent systems with identical thinking-token budgets. Single agent wins on accuracy AND on compute, across three model families. The mechanism is …

  148. dev.to — LLM tag TIER_1 · Rishabh Sethia ·

    Multi-Agent Systems Explained: How Orchestrator + Specialist Agent Architecture Works

    Here's the uncomfortable truth about single-agent AI systems: they don't scale. Not because the models aren't capable, but because you're asking one entity to simultaneously plan, execute, research, verify, and synthesize, often in a single context window that fills up faster…

  149. dev.to — LLM tag TIER_1 · 丁久 ·

    Multi-Agent Systems: Coordination, Communication, Consensus

    This article was originally published on AI Study Room (https://dingjiu1989-hue.github.io/en/ai/multi-agent-systems.html). For the full version with working code examples and related articles, visit the original post.…

  150. Mastodon — fosstodon.org TIER_1 日本語(JA) · [email protected] ·

    CarPlay Another AI from Mr. Musk Appears. Musk's Grok Starts Operating in Cars – Letem svetem Applem https://www.yayafa.com/2798752/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialInt

    CarPlay: Another AI from Mr. Musk appears. Musk's Grok starts operating in cars – Letem svetem Applem https://www.yayafa.com/2798752/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntelligence #Grok #xai #XAIGrok #エージェント型AI #人工知能 #汎用人工知能

  151. Mastodon — fosstodon.org TIER_1 日本語(JA) · [email protected] ·

    Will AI Assigned a Task Tamper with Data? Microsoft's Latest Research Exposes the Trap of Autonomous Agents | XenoSpectrum https://www.yayafa.com/2798750/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIn

    Will an AI assigned a task tamper with data? Microsoft's latest research exposes the trap of autonomous agents | XenoSpectrum https://www.yayafa.com/2798750/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntelligence #Copilot #Microsoft #MicrosoftAI #MicrosoftCopilot #エージェント型AI #人工知能 #汎用人工知能

  152. Mastodon — fosstodon.org TIER_1 日本語(JA) · [email protected] ·

    What is Copilot Cowork? Explaining the latest features, how to use it, and pricing https://www.yayafa.com/2798748/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntelligence #Copilot #Microsoft

    What is Copilot Cowork? An explanation of the latest features, how to use it, and pricing https://www.yayafa.com/2798748/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntelligence #Copilot #Microsoft #MicrosoftAI #MicrosoftCopilot #エージェント型AI #人工知能 #汎用人工知能

  153. Mastodon — fosstodon.org TIER_1 Italiano(IT) · [email protected] ·

    🧠 Multi-agent orchestration is a new feature of Claude's Managed Agents. 🤖 A coordinating agent can delegate tasks to multiple independent agents

    🧠 Multi-agent orchestration is a new feature of #Claude's Managed Agents. 🤖 A coordinating agent can delegate tasks to multiple independent agents. 👉 Details: https://www.linkedin.com/posts/alessiopomaro_claude-ai-ai-activity-7458473224192962560-Yr4O ___ ✉️ 𝗦𝗲…

  154. dev.to — LLM tag TIER_1 · code plato ·

    LLM-Based AI Agent Architecture: A New Kind of Personal Computer on Your Device

    For a long time, we've thought of AI as a "chatbot." But if you step back and look from a systems architecture perspective, you'll find that a truly mature AI agent looks more like a new kind of personal computer, one that lives on your device. It has: …

  155. Mastodon — mastodon.social TIER_1 · bagrounds ·

    2026-05-02 | 🤖 🧩 The Agency Mesh: Orchestrating the Swarm 🤖 #AI Q: 🤖 Should AI prioritize efficiency? 🤖 Multi-Agent Systems | ⚖️ Distributed Consensus | 🕸️ Sys

    2026-05-02 | 🤖 🧩 The Agency Mesh: Orchestrating the Swarm 🤖 #AI Q: 🤖 Should AI prioritize efficiency? 🤖 Multi-Agent Systems | ⚖️ Distributed Consensus | 🕸️ System Architecture | 📜 Protocol Design https://bagrounds.org/auto-blog-zero/2026-05-02-the-agency-mesh-orchestrating-the…

  156. Mastodon — mastodon.social TIER_1 · taoofmac ·

    Agentic Systems: Notes and resources on building and operating agentic AI systems, covering orchestration frameworks, task routing, memory, and evaluation approa

    Agentic Systems: Notes and resources on building and operating agentic AI systems, covering orchestration frameworks, task routing, memory, and evaluation approaches that extend baseline LLM capabi(...) #agents #ai #orchestration https://taoofmac.com/space/ai/agentic?utm_cont…

  157. Mastodon — mastodon.social TIER_1 · taoofmac ·

    OpenClaw Ecosystem: OpenClaw is a self-hosted personal AI assistant you run on your own devices, with a gateway control plane that connects to the chat channels

    OpenClaw Ecosystem: OpenClaw is a self-hosted personal AI assistant you run on your own devices, with a gateway control plane that connects to the chat channels you already use (WhatsApp, Telegram, Sl(...) #agentic #ai #assistants #openclaw https://taoofmac.com/space/ai/agent…

  158. Mastodon — mastodon.social TIER_1 · bagrounds ·

    2026-05-01 | 🤖 The Digital Agora: Negotiating Reality in Multi-Agent Swarms 🤖 #AI Q: 🤖 AI negotiate? 🤖 Multi-Agent Systems | 🤝 Algorithmic Negotiation | ⚖️ Gam

    2026-05-01 | 🤖 The Digital Agora: Negotiating Reality in Multi-Agent Swarms 🤖 #AI Q: 🤖 AI negotiate? 🤖 Multi-Agent Systems | 🤝 Algorithmic Negotiation | ⚖️ Game Theory | 🕸️ Distributed Systems https://bagrounds.org/auto-blog-zero/2026-05-01-the-digital-agora-negotiating-realit…