Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
Researchers have developed a method called trust functions to improve weak-to-strong generalization, in which a stronger student model is trained on labels produced by a weaker teacher. The technique assigns a trust score to each weak label in a dataset so that unreliable supervision can be filtered out. The approach has been demonstrated across knowledge, reasoning, and strategy-game tasks, where students trained this way match or even surpass the performance obtained with ground-truth supervision. Trust functions also support an iterative process in which a trained student is reused as the teacher for the next training cycle, compounding the performance gains.
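The filter-then-retrain loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not say how trust scores are learned, what the student model is, or which threshold is used, so `make_neighbor_trust` (a neighbor-agreement heuristic standing in for a learned trust function), the 1-nearest-neighbour student `train_nearest_neighbor`, the helpers `filter_weak_labels` and `weak_to_strong`, the 0.5 threshold, and the toy data are all hypothetical.

```python
# Hypothetical sketch of trust-based filtering and student-as-teacher iteration.
# Not the paper's method: the trust function and student below are toy stand-ins.
from typing import Callable, List, Sequence, Tuple

Teacher = Callable[[int], int]                  # maps an input to a (possibly wrong) label
TrustFn = Callable[[int, int, Teacher], float]  # scores one weak label in [0, 1]


def make_neighbor_trust(domain: Sequence[int]) -> TrustFn:
    """Hypothetical stand-in for a learned trust function: scores a weak label
    by the fraction of the teacher's labels on adjacent inputs that agree with it."""
    known = set(domain)

    def trust(x: int, y: int, teacher: Teacher) -> float:
        neighbors = [n for n in (x - 1, x + 1) if n in known]
        if not neighbors:
            return 1.0
        return sum(teacher(n) == y for n in neighbors) / len(neighbors)

    return trust


def train_nearest_neighbor(data: List[Tuple[int, int]]) -> Teacher:
    """Toy 'student': a 1-nearest-neighbour classifier over the kept labels."""
    if not data:
        raise ValueError("no trusted labels to train on")

    def predict(x: int) -> int:
        # min() returns the first closest pair, so ties go to the pair listed first in data
        _, label = min(data, key=lambda pair: abs(pair[0] - x))
        return label

    return predict


def filter_weak_labels(inputs: Sequence[int], teacher: Teacher,
                       trust_fn: TrustFn, threshold: float) -> List[Tuple[int, int]]:
    """Label every input with the teacher, then keep only the labels whose
    trust score reaches the threshold."""
    kept = []
    for x in inputs:
        y = teacher(x)
        if trust_fn(x, y, teacher) >= threshold:
            kept.append((x, y))
    return kept


def weak_to_strong(inputs: Sequence[int], teacher: Teacher, trust_fn: TrustFn,
                   threshold: float, rounds: int) -> Teacher:
    """Each round: filter the teacher's labels, train a student on the kept
    labels, then reuse that student as the next round's teacher."""
    for _ in range(rounds):
        kept = filter_weak_labels(inputs, teacher, trust_fn, threshold)
        teacher = train_nearest_neighbor(kept)
    return teacher


# Toy usage: ground truth is "x >= 5"; the weak teacher is wrong on inputs 2 and 7.
def truth(x: int) -> int:
    return int(x >= 5)


def weak_teacher(x: int) -> int:
    return 1 - truth(x) if x in (2, 7) else truth(x)


inputs = list(range(10))
trust = make_neighbor_trust(inputs)
print([x for x, _ in filter_weak_labels(inputs, weak_teacher, trust, 0.5)])
# [0, 1, 3, 4, 5, 6, 8, 9]
student = weak_to_strong(inputs, weak_teacher, trust, threshold=0.5, rounds=2)
print([student(x) for x in inputs])
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy run the heuristic drops the two flipped labels, and the student fills in those inputs from their trusted neighbours; in the second round, with the student acting as teacher, every label is kept. The example shows only the control flow of filtering and reusing the student as teacher; it makes no claim about how the learned trust functions behave.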
IMPACT: Enables AI models to reach high performance from weaker, less reliable supervision, potentially reducing data labeling costs.