PulseAugur

SWE-QA benchmark tests LLMs on repository-level code questions

Researchers have introduced SWE-QA, a benchmark that evaluates how well language models answer questions about entire software repositories. It addresses limitations of earlier datasets by focusing on complex, real-world code scenarios that require understanding multi-file dependencies and software architecture. SWE-QA comprises 576 question-answer pairs derived from GitHub issues and has been used to test several large language models; the authors also propose an agentic framework that shows promise.

IMPACT This benchmark could drive the development of more capable AI assistants for software development by testing repository-level code understanding.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL · TIER_1 · English (EN) · Weihan Peng, Yuling Shi, Yuhang Wang, Xinyun Zhang, Beijun Shen, Xiaodong Gu

    SWE-QA: Can Language Models Answer Repository-level Code Questions?

    arXiv:2509.14635v2 · Abstract: Understanding and reasoning about entire software repositories is an essential capability for intelligent software engineering tools. While existing benchmarks such as CoSQA and CodeQA have advanced the field, they predominantly…