Revitalizing Dense Material Segmentation: Stabilized Vision Transformers and the Generalization Paradox
Researchers have revived the Apple Dense Material Segmentation (DMS) benchmark by establishing a new Vision Transformer baseline. They found that standard training methods struggle with amorphous textures because of high-variance gradients, which led them to develop a stabilized training recipe. The new approach achieved a state-of-the-art mIoU of 0.4572 on the original dataset split, surpassing previous convolutional models. However, the study also uncovered a "Generalization Paradox": a data-rich split inflated metrics but degraded real-world performance, highlighting ongoing challenges in physically grounded AI.
IMPACT: Establishes a new SOTA for material segmentation and highlights critical generalization challenges for physically grounded AI.
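For readers unfamiliar with the headline metric, here is a minimal sketch of how mean Intersection-over-Union (mIoU) is commonly computed for a segmentation map. It is illustrative only: the summary does not say how the study aggregates the score (per image or over the whole dataset), and the function name `mean_iou`, the `ignore_index` parameter, and the toy label maps are assumptions, not details from the work.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    Hypothetical helper for illustration; not the study's evaluation code.
    pred and target are integer label maps with values in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Optionally drop pixels whose ground-truth label is marked "ignore".
    if ignore_index is not None:
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection

    # Classes absent from both maps have an empty union and are skipped.
    present = union > 0
    if not present.any():
        return float("nan")
    return float((intersection[present] / union[present]).mean())


# Toy 2x2 example with three possible classes; class 2 never appears.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 4))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 0.5833. Because every present class counts equally, rare material classes weigh as much as common ones. A score on one split is therefore not directly comparable to a score on a split with a different class balance.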