Mitigating Label Bias with Interpretable Rubric Embeddings
Researchers have developed a method called interpretable rubric embeddings to address label bias in AI models trained on historical human evaluations. Instead of standard black-box embeddings, the approach represents each input with features derived from expert-defined criteria, so that the model is less likely to inherit biases present in past decisions. In empirical evaluations on a dataset of master's program applications, the method reduced group disparities while improving cohort quality, which suggests it is a practical option for learning from biased labels.
IMPACT: Offers a novel approach to mitigating bias in AI systems trained on historical data, potentially improving fairness in applications such as hiring and admissions.
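To make the idea of a rubric embedding concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the criteria names, the application fields (`math_courses`, `research_projects`, `statement_score`), the scoring rules, and the weights are all hypothetical and chosen only for illustration. The point it shows is that each dimension of the feature vector corresponds to a named, expert-defined criterion, so every contribution to a score can be read off directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An application is modeled as a flat dict of numeric fields (hypothetical schema).
Application = Dict[str, float]


@dataclass(frozen=True)
class Criterion:
    """One expert-defined rubric item that maps an application to a score in [0, 1]."""
    name: str
    score: Callable[[Application], float]


# Hypothetical rubric; a real one would be written by domain experts.
RUBRIC: List[Criterion] = [
    Criterion("quantitative_preparation", lambda a: min(a["math_courses"] / 4.0, 1.0)),
    Criterion("research_experience", lambda a: 1.0 if a["research_projects"] > 0 else 0.0),
    Criterion("statement_clarity", lambda a: a["statement_score"] / 5.0),
]


def rubric_embedding(app: Application) -> List[float]:
    """Return one feature per rubric criterion, in rubric order."""
    return [criterion.score(app) for criterion in RUBRIC]


def explain(app: Application, weights: List[float]) -> Dict[str, float]:
    """Per-criterion contribution to a linear score (weight times feature)."""
    features = rubric_embedding(app)
    return {c.name: w * f for c, w, f in zip(RUBRIC, weights, features)}


if __name__ == "__main__":
    applicant = {"math_courses": 2, "research_projects": 1, "statement_score": 4}
    weights = [0.5, 0.3, 0.2]  # illustrative weights, not learned from data
    print(rubric_embedding(applicant))
    print(explain(applicant, weights))
```

In this sketch a downstream model would be trained on the output of `rubric_embedding` rather than on an opaque text embedding. How the criteria are actually scored and how the model is fit in the original work is not described in this summary, so those parts are left as assumptions.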