Researchers have introduced SWE-QA, a new benchmark designed to evaluate language models' ability to answer questions about entire software repositories. It addresses the limitations of previous datasets by focusing on complex, real-world scenarios that require understanding multi-file dependencies and software architecture. SWE-QA comprises 576 question-answer pairs derived from GitHub issues and has been used to test several large language models, with a proposed agentic framework showing promise; a minimal evaluation sketch follows below.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT This benchmark could drive the development of more capable AI assistants for software development by testing repository-level code understanding.
RANK_REASON The cluster describes a new academic benchmark for evaluating LLMs on software engineering tasks.
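To make concrete what repository-level question answering involves, here is a minimal sketch of an evaluation loop over SWE-QA-style question-answer pairs. The dataset field names (`repo`, `question`, `answer`), the file `swe_qa_pairs.json`, the `answer_question` callable, and the exact-match scoring are all illustrative assumptions, not the actual SWE-QA schema or harness.

```python
# Hypothetical sketch: scoring a model on repository-level QA pairs.
# Field names, file paths, and exact-match scoring are assumptions for
# illustration; the real benchmark may use a different schema and metric.
import json
from pathlib import Path
from typing import Callable

def evaluate(qa_file: Path, answer_question: Callable[[str, Path], str]) -> float:
    """Run a model over each (repository, question) pair and report accuracy."""
    pairs = json.loads(qa_file.read_text())
    correct = 0
    for pair in pairs:
        repo_path = Path(pair["repo"])  # local checkout of the repository under test
        prediction = answer_question(pair["question"], repo_path)
        correct += prediction.strip() == pair["answer"].strip()
    return correct / len(pairs)

if __name__ == "__main__":
    # Trivial stand-in "model" that always returns the same string,
    # showing where an LLM- or agent-backed callable would plug in.
    score = evaluate(Path("swe_qa_pairs.json"), lambda question, repo: "unknown")
    print(f"exact-match accuracy: {score:.2%}")
```

In practice, the callable would retrieve relevant files from the repository (or let an agent navigate it) before producing an answer, which is where multi-file dependency understanding is exercised.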