A new paper evaluates the performance of commercial and open-source large language models on Arabic Islamic inheritance reasoning tasks. The study found that commercial models generally outperform open-source models, showing greater reliability in identifying heirs, applying exclusion rules, and maintaining consistency. Gemini 2.5 Flash achieved the best performance among the evaluated models, with a Mean Reciprocal Error (MRE) of 0.989.
AI IMPACT Highlights the current limitations of open-source models in complex legal and numerical reasoning, suggesting areas for future development.
RANK_REASON This is a research paper evaluating LLM performance on a specific reasoning task.
AI-generated summary · Google Gemini · from 1 source.