Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 10h

MortarBench: Evaluating Mortgage Loan Origination Agents

MortarBench is a new benchmark for evaluating AI agents on mortgage loan origination. The researchers found that current state-of-the-art large language models struggle with the task: they reach at most 77.1% exact-match accuracy and exhibit biases related to non-English names. To address these limitations, the researchers introduce CRIT, a confidence calibration framework that raises accuracy to 80.5% while also improving risk management and reducing bias.

IMPACT Highlights the limitations of current LLMs on specialized financial tasks and introduces a calibration method that improves accuracy and reduces bias.
