SWE-QA benchmark tests LLMs on repository-level code questions

By PulseAugur Editorial · [1 source] · 2026-04-28 04:00

Researchers have introduced SWE-QA, a benchmark for evaluating how well language models answer questions about entire software repositories. It addresses limitations of earlier datasets by focusing on complex, real-world code scenarios that require understanding multi-file dependencies and software architecture. SWE-QA comprises 576 question-answer pairs derived from GitHub issues and has been used to test several large language models, with a proposed agentic framework showing promise.

IMPACT This benchmark could drive the development of more capable AI assistants for software development by testing repository-level code understanding.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Weihan Peng, Yuling Shi, Yuhang Wang, Xinyun Zhang, Beijun Shen, Xiaodong Gu · 2026-04-28 04:00

SWE-QA: Can Language Models Answer Repository-level Code Questions?

arXiv:2509.14635v2 · Announce Type: replace · Abstract: Understanding and reasoning about entire software repositories is an essential capability for intelligent software engineering tools. While existing benchmarks such as CoSQA and CodeQA have advanced the field, they predominantly…
