PulseAugur

RayDer transformer scales novel view synthesis with real-world video

Researchers have introduced RayDer, a unified feed-forward transformer for self-supervised novel view synthesis (NVS) from real-world video. The model consolidates camera estimation, scene reconstruction, and rendering into a single backbone, which enables stable training on dynamic video content. RayDer shows predictable power-law scaling with data and compute and achieves competitive zero-shot performance on various benchmarks.

IMPACT Enables more scalable and robust novel view synthesis from general video data, with potential applications in 3D reconstruction and content creation.

RANK_REASON The cluster contains an academic paper detailing a new model and its performance.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. arXiv cs.AI TIER_1 English(EN) · Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer

    RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

    arXiv:2605.31535v1 Announce Type: cross Abstract: Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network…

  2. arXiv cs.AI TIER_1 English(EN) · Björn Ommer ·

    RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

    Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We introduce RayDer, a unified, f…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

    RayDer is a unified feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering for self-supervised novel view synthesis, enabling stable training on real-world video through dynamic state absorption and demonstrating clean scaling behavior.
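
For readers unfamiliar with the term, "power-law scaling" means that a quantity such as validation loss follows roughly loss = a * compute^(-b), which appears as a straight line on a log-log plot. The sketch below fits such a curve by least squares in log space. It is an illustration only: the function name `fit_power_law` and every data point are made up for this example and are not taken from the RayDer paper.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) via ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx              # slope of the log-log line equals -b
    intercept = my - slope * mx    # intercept equals log(a)
    return math.exp(intercept), -slope

# Synthetic, hypothetical points generated from a = 2.0, b = 0.1
# (compute is expressed in arbitrary relative units).
compute = [1.0, 10.0, 100.0, 1000.0]
loss = [2.0 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A fit like this is what makes scaling "predictable": once a and b are estimated from small runs, the same formula can be extrapolated to larger budgets, assuming the trend holds.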