Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

LLM Dialogue Agents Show Low Accuracy in NFR Assessment, Affecting User Satisfaction

By PulseAugur Editorial Team · [2 sources] · 2026-06-23 17:15

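A new research paper examines how effectively LLM-based dialogue assistants assess non-functional requirements (NFRs) in software development, specifically in the context of HIPAA compliance. In the study, 49 programmers used GitHub Copilot to assess how well the iTrust codebase matched 148 HIPAA-derived NFRs. The results show that although developers generally agreed with the LLM's assessments, actual accuracy measured against an expert ground truth was low. In addition, user satisfaction was affected by longer system responses and by a larger number of information-providing turns, while proactive interaction tended to increase satisfaction.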

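Impact: Highlights the limitations of current LLM dialogue agents in critical NFR assessment and indicates that interaction design needs to improve in order to raise accuracy and user satisfaction.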

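Ranking rationale: This cluster contains a research paper detailing findings on the accuracy of LLMs and on user satisfaction in a specific software development scenario.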

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Ali Pourghasemi Fatideh, Wilder Baldwin, Maria Dhakal, Collin McMillan, Sepideh Ghanavati · 2026-06-24 04:00

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

arXiv:2606.24834v1 Announce Type: new Abstract: LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of t…

arXiv cs.AI TIER_1 English(EN) · Sepideh Ghanavati · 2026-06-23 17:15

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of these conversations when handling Non-Functional …
