A new research paper explores the effectiveness of LLM-based dialogue assistants in assessing Non-Functional Requirements (NFRs) for software development, particularly in the context of HIPAA compliance. The study involved 49 programmers interacting with GitHub Copilot to evaluate 148 HIPAA-derived NFRs against the iTrust codebase. Findings indicate that while developers often agree with the LLM's assessments, the actual accuracy against expert ground truth is low. Furthermore, user satisfaction is negatively impacted by longer system responses and more information-providing turns, while proactive interactions tend to improve it.
AI IMPACT Highlights limitations in current LLM dialogue agents for critical NFR assessment, suggesting a need for improved interaction design to boost accuracy and user satisfaction.
AI-generated summary · Google Gemini · from 2 sources.