New ASR metric reveals hidden workflow shortcuts in LLM payment systems

By the PulseAugur editorial team · [2 sources] · 2026-05-07 15:50

Researchers have developed a new metric called Agentic Success Rate (ASR) to evaluate the workflow fidelity of LLM-based agent systems in payment processes. Traditional metrics like Task Success Rate (TSR) and Agent Handoff F1-Score (HF1) fail to detect critical deviations, such as skipping confirmation checkpoints. The ASR metric, applied to 18 LLMs and over 90,000 payment tasks, revealed that models like GPT-4.1 could achieve perfect scores on existing metrics while still exhibiting workflow shortcuts, whereas GPT-5.2 demonstrated perfect ASR. This new evaluation method has been shown to significantly improve task success rates, especially in regulated domains.
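To illustrate why an outcome-only or unordered metric can miss a skipped checkpoint, here is a minimal Python sketch. It is not the paper's definition of ASR, which is truncated in the sources below. The step names, agent names, and the three helper functions are hypothetical, chosen only to contrast a final-state check, a set-based handoff F1, and an ordered trajectory check.

```python
from typing import Sequence

# Hypothetical required checkpoints, in order (assumption, not from the paper).
REQUIRED = ["collect_details", "confirm_with_user", "execute_payment"]


def task_success(final_state: str, expected_state: str) -> bool:
    """Outcome-only check: did the run end in the expected final state?"""
    return final_state == expected_state


def handoff_f1(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Unordered F1 over the sets of agents that were invoked."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def follows_workflow(steps: Sequence[str], required: Sequence[str]) -> bool:
    """Ordered check: every required step must appear in `steps`, in order."""
    remaining = iter(steps)
    # `r in remaining` advances the iterator, so order is enforced.
    return all(r in remaining for r in required)


# A run that reaches the right outcome via the right agents,
# but skips the confirmation checkpoint.
shortcut_steps = ["collect_details", "execute_payment"]
shortcut_agents = ["router", "payment_agent"]

print(task_success("payment_completed", "payment_completed"))
print(handoff_f1(shortcut_agents, ["payment_agent", "router"]))
print(follows_workflow(shortcut_steps, REQUIRED))
```

Under these assumptions, the shortcut run passes the first two checks and fails only the ordered one, which is the kind of gap the summary attributes to TSR and HF1.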

Impact: Introduces a more robust evaluation metric for LLM agents, crucial for reliable deployment in sensitive workflows like payments.

Ranking rationale: Academic paper introducing a new evaluation metric for LLM-based agent systems.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Donghao Huang, Joon Kiat Chua, Zhaoxia Wang · 2026-05-08 04:00

Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

arXiv:2605.06457v1 Announce Type: new Abstract: LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions. We introduc…
arXiv cs.AI TIER_1 English(EN) · Zhaoxia Wang · 2026-05-07 15:50

Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions. We introduce the Agentic Success Rate (ASR), a trajectory-f…
