Researchers have introduced CoQuIR, a new benchmark designed to evaluate code retrieval systems on software quality dimensions beyond functional relevance. The benchmark includes fine-grained quality annotations across correctness, efficiency, security, and maintainability for over 42,000 queries and 134,000 code snippets in 11 programming languages. Initial testing of 23 retrieval models revealed that even top performers often fail to distinguish between buggy and robust code, highlighting a significant gap in current systems. The research also explores training methods to improve quality-aware retrieval, showing promising results without compromising semantic relevance.
AI IMPACT Highlights the need for AI systems to consider software quality beyond functional relevance, potentially improving developer tools.
RANK_REASON The cluster contains an academic paper introducing a new benchmark for evaluating AI systems.
AI-generated summary · Google Gemini · from 1 source.
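
The finding that retrievers "often fail to distinguish between buggy and robust code" can be made concrete with a pairwise check: for each query, does the retriever score the higher-quality snippet above its lower-quality counterpart? The sketch below is only an illustration of that idea, not the benchmark's actual metric or data. The function names, the toy `token_overlap` scorer, and the example pair are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# (query, higher-quality snippet, lower-quality snippet); hypothetical format
Pair = Tuple[str, str, str]


def pairwise_preference_accuracy(
    pairs: Sequence[Pair], score: Callable[[str, str], float]
) -> float:
    """Fraction of pairs where the higher-quality snippet gets a strictly
    higher retrieval score than the lower-quality one."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    wins = sum(1 for query, good, bad in pairs if score(query, good) > score(query, bad))
    return wins / len(pairs)


def token_overlap(query: str, code: str) -> float:
    """Toy relevance-only scorer (an assumption for illustration):
    Jaccard overlap of lowercase whitespace-separated tokens."""
    q_tokens = set(query.lower().split())
    c_tokens = set(code.lower().split())
    union = q_tokens | c_tokens
    return len(q_tokens & c_tokens) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical pair: the second snippet silently skips the first element.
    demo_pairs = [
        (
            "sum a list of numbers",
            "def total(xs): return sum(xs)",
            "def total(xs): return sum(xs[1:])",
        ),
    ]
    print(pairwise_preference_accuracy(demo_pairs, token_overlap))
```

A scorer that only measures surface similarity to the query has no signal about which snippet is buggy, so a value near 0.5 or below on such pairs would indicate that it is not quality-aware.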