PulseAugur

New FOSSA architecture cuts errors by up to 55.7% in zero-shot depth from defocus

Researchers have developed FOSSA, a Transformer-based architecture for Depth from Defocus (DfD), the task of estimating a dense metric depth map from a focus stack. Unlike previous work that overfits to a particular dataset, the model targets zero-shot generalization: performing well on datasets it was not trained on. To support this, a new benchmark named ZEDD was created, featuring a significantly larger number of scenes and higher-quality data than previous benchmarks. FOSSA incorporates a stack attention layer with a focus distance embedding for efficient information exchange across the focus stack, and experiments show it reduces errors by up to 55.7% on the ZEDD benchmark.
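To make the idea of attention across a focus stack more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the sinusoidal encoding, the single attention head, the tensor layout, and every name in it (`sinusoidal_embedding`, `stack_attention`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration only. It shows one plausible reading of the summary: each focus slice is tagged with an embedding of its focus distance, and attention then runs only along the stack axis, separately at each spatial location.

```python
import numpy as np

def sinusoidal_embedding(focus_distances, dim):
    """Map each scalar focus distance to a dim-dimensional vector.
    The sinusoidal form is an assumption; dim must be even."""
    d = np.asarray(focus_distances, dtype=np.float64)[:, None]   # (S, 1)
    freqs = 2.0 ** np.arange(dim // 2)[None, :]                   # (1, dim/2)
    angles = d * freqs                                            # (S, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (S, dim)

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def stack_attention(tokens, focus_distances, w_q, w_k, w_v):
    """Single-head attention along the stack axis (hypothetical layout).

    tokens: (S, N, C) with S focus slices, N spatial tokens, C channels.
    Returns the attended tokens (S, N, C) and attention weights (N, S, S).
    """
    S, N, C = tokens.shape
    emb = sinusoidal_embedding(focus_distances, C)    # (S, C)
    x = tokens + emb[:, None, :]                      # tag every slice with its focus distance
    x = x.transpose(1, 0, 2)                          # (N, S, C): one short sequence per location
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # each (N, S, C)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(C)    # (N, S, S)
    weights = softmax(scores, axis=-1)                # each row sums to 1 over the S slices
    out = weights @ v                                 # (N, S, C)
    return out.transpose(1, 0, 2), weights

# Toy usage with random data: 5 focus slices, 16 spatial tokens, 8 channels.
rng = np.random.default_rng(0)
S, N, C = 5, 16, 8
tokens = rng.standard_normal((S, N, C))
focus_distances = np.linspace(0.3, 3.0, S)            # made-up focus distances in metres
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, weights = stack_attention(tokens, focus_distances, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

In this sketch each spatial location attends over only S slices, so the score matrix has N·S² entries rather than the (S·N)² that full attention over all tokens would need, which is one way such a layer could keep cross-stack exchange cheap.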

IMPACT Introduces a novel architecture and benchmark for depth estimation, potentially improving scene understanding in computer vision applications.

RANK_REASON This is a research paper detailing a new model architecture and benchmark for a computer vision task.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 English(EN) · Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng ·

    Zero-Shot Depth from Defocus

    arXiv:2603.26658v2 Announce Type: replace Abstract: Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous works overfitting to a certain dataset, this paper focuses on the challenging and practical setting of zero-shot gen…