New FOSSA architecture cuts errors by up to 55.7% in zero-shot depth from defocus

By PulseAugur Editorial · [1 sources] · 2026-06-30 04:00

Researchers have developed FOSSA, a Transformer-based architecture for Depth from Defocus (DfD), the task of estimating a dense metric depth map from a focus stack. The model targets zero-shot generalization: it is meant to perform well on unseen datasets rather than overfit to a single one. To support this, a new benchmark named ZEDD was created, featuring significantly more scenes and higher-quality data than previous benchmarks. FOSSA incorporates a stack attention layer with focus distance embedding for efficient cross-stack information exchange, and experiments show it reduces errors by up to 55.7% on the ZEDD benchmark.
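To make the idea of attention across a focus stack with a focus distance embedding more concrete, here is a minimal sketch in plain Python. It is not the FOSSA implementation: the sinusoidal embedding, the identity query/key/value projections, the per-pixel treatment, and the function names `focus_distance_embedding` and `stack_attention` are all assumptions chosen for illustration.

```python
import math

def focus_distance_embedding(distance_m, dim):
    """Sinusoidal embedding of a scalar focus distance.

    Assumption: the summary does not say how the embedding is built;
    a sinusoidal encoding is used here only as a familiar stand-in.
    `dim` must be even.
    """
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.append(math.sin(distance_m * freq))
        emb.append(math.cos(distance_m * freq))
    return emb

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stack_attention(features, focus_distances):
    """Self-attention across the slices of a focus stack at one pixel.

    features: list of S feature vectors, one per focus slice, each of even length D.
    focus_distances: list of S focus distances (metres), one per slice.
    Assumption: identity query/key/value projections keep the sketch short;
    a trained layer would use learned projections.
    """
    dim = len(features[0])
    # Tag each slice's features with the distance at which it was focused.
    tokens = [
        [f + e for f, e in zip(feat, focus_distance_embedding(d, dim))]
        for feat, d in zip(features, focus_distances)
    ]
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        # Scaled dot-product scores of this slice against every slice.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted mix of all slices' tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)])
    return out
```

Calling `stack_attention` with three 4-dimensional feature vectors and three focus distances returns three 4-dimensional vectors, each mixing information from every slice in the stack.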

IMPACT Introduces a novel architecture and benchmark for depth estimation, potentially improving scene understanding in computer vision applications.

RANK_REASON This is a research paper detailing a new model architecture and benchmark for a computer vision task.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng · 2026-06-30 04:00

Zero-Shot Depth from Defocus

arXiv:2603.26658v2. Abstract: Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous works overfitting to a certain dataset, this paper focuses on the challenging and practical setting of zero-shot gen…
