Ensemble of Vision Encoders Wins Second Place in ICRA 2026 Segmentation Challenge

By PulseAugur Editorial · [1 source] · 2026-06-22 09:57

Researchers have built a pretraining-diverse ensemble of foundation vision encoders for the ICRA 2026 GOOSE 2D Fine-Grained Semantic Segmentation Challenge. The approach pairs encoders such as DINOv3, SigLIP2, and InternImage with a Mask2Former decoder and relies on extensive training schedules and augmentation. The ensemble placed second in the challenge with a composite mIoU of 75.40%, and the report identifies the pretraining recipe, rather than model size or decoder design, as the main driver of accuracy.
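For readers unfamiliar with the terms, the sketch below illustrates one common way to fuse segmentation models and score the result. It is an assumption for illustration only: the summary does not say how the team combined its encoders or how the challenge's composite mIoU is defined, so the probability-averaging rule, the plain per-class mIoU, and the names `ensemble_argmax` and `mean_iou` are hypothetical, not the authors' method.

```python
from typing import List, Sequence


def ensemble_argmax(prob_maps: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-pixel class probabilities across models, then take the argmax.

    prob_maps[m][p][c] is model m's probability for class c at pixel p.
    Assumed fusion rule; the report may combine models differently.
    """
    n_models = len(prob_maps)
    n_pixels = len(prob_maps[0])
    n_classes = len(prob_maps[0][0])
    labels = []
    for p in range(n_pixels):
        mean = [
            sum(prob_maps[m][p][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


def mean_iou(pred: Sequence[int], target: Sequence[int], n_classes: int) -> float:
    """Mean intersection-over-union over classes that occur in pred or target."""
    ious = []
    for c in range(n_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: two models, four pixels, three classes (made-up numbers).
model_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.2, 0.7], [0.3, 0.3, 0.4]]
model_b = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.2, 0.2, 0.6], [0.1, 0.6, 0.3]]
target = [0, 1, 2, 1]

pred = ensemble_argmax([model_a, model_b])
print(pred, mean_iou(pred, target, n_classes=3))
```

In the toy data the two models disagree on the second and fourth pixels, and averaging their probabilities settles each disagreement in favor of the more confident model. That is the basic appeal of combining encoders whose pretraining differs.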

IMPACT Demonstrates effective ensemble techniques for robust outdoor scene understanding in computer vision.

RANK_REASON Technical report detailing a solution for a specific academic challenge.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Zhun Zhong · 2026-06-22 09:57

Technical Report for the ICRA 2026 GOOSE 2D Fine-Grained Semantic Segmentation Challenge: Pretraining-Diverse Ensemble of Foundation Vision Encoders for Robust Outdoor Scene Understanding

This report presents our solution for the ICRA 2026 GOOSE 2D Fine-Grained Semantic Segmentation Challenge, which requires parsing unstructured outdoor scenes from four camera platforms into 56 fine-grained categories. Our approach pairs foundation vision encoders (including DINOv…
