MortarBench: Evaluating Mortgage Loan Origination Agents
MortarBench is a new benchmark for evaluating AI agents on mortgage loan origination. The researchers found that current state-of-the-art large language models struggle with the task, reaching at most 77.1% exact-match accuracy and exhibiting biases related to non-English names. To address these limitations, they introduce CRIT, a confidence calibration framework that raises accuracy to 80.5% while also improving risk management and reducing bias.
IMPACT: Highlights limitations of current LLMs in specialized financial tasks and introduces a method to improve accuracy and reduce bias.