PulseAugur

Computer vision research advances multimodal understanding and robust segmentation

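Researchers have developed WeatherSeg, a semi-supervised segmentation framework that aims to improve perception for autonomous driving in adverse weather by combining a Dual Teacher-Student model for knowledge distillation with a classifier weight update mechanism. Separately, a new pose-only geometric constraint has been proposed for multi-camera systems to improve the computational efficiency of bundle adjustment in visual navigation and 3D scene reconstruction. Another advance breaks the scalability limit of multi-projector calibration by embedding cameras in the calibration target, which allows the projector parameters to be estimated simultaneously. In addition, DeepTaxon provides a retrieval-augmented multimodal framework for unified species identification and discovery in biodiversity research, while TSMNet combines text supervision with visual representations for open-vocabulary semantic segmentation in remote sensing.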

Impact: New computer vision research offers improved methods for autonomous driving, biodiversity research, and remote sensing.

Ranking rationale: Multiple arXiv papers were published detailing new computer vision methods.


AI-generated summary · Google Gemini · from 13 sources.


Sources [13]

  1. arXiv cs.CV TIER_1 English(EN) · Chang Liu, Henghui Ding, Nikhila Ravi, Yunchao Wei, Shuting He, Song Bai, Philip Torr, Leilei Cao, Jinrong Zhang, Deshui Miao, Xusheng He, Dengxian Gong, Zhiyu Wang, Mingqi Gao, Jihwan Hong, Canyang Wu, Weili Guan, Jianlong Wu, Liqiang Nie, Xingsen Huang, ·

    The 5th PVUW Challenge Report: Towards More Diverse Modalities in Pixel-Level Understanding

    arXiv:2604.26031v1. This report summarizes the objectives, datasets, and top-performing methodologies of the 2026 Pixel-level Video Understanding in the Wild (PVUW) Challenge, hosted at CVPR 2026, which evaluates state-of-the-art models under highly un…

  2. arXiv cs.CV TIER_1 English(EN) · Zhang Zhang, Yifeng Zeng, Jinquan Pan, Yinghui Pan ·

    WeatherSeg: Robust Adverse-Weather Image Segmentation with Teacher-Student Dual Learning and Classifier-Updating Attention

    arXiv:2604.22824v1. WeatherSeg, an advanced semi-supervised segmentation framework, addresses autonomous driving's environmental perception challenges in adverse weather while reducing annotation costs. This framework integrates a Dual Teacher-Student …

  3. arXiv cs.CV TIER_1 English(EN) · Shunkun Liang, Banglei Guan, Bin Li, Qifeng Yu, Yang Shang ·

    Pose-Only Geometric Constraints for Multi-Camera Pose Adjustment

    arXiv:2604.23704v1. Multi-camera systems offer rich observation capabilities for visual navigation and 3D scene reconstruction; however, the resulting feature redundancy often compromises computational efficiency. This challenge is particularly pronoun…

  4. arXiv cs.CV TIER_1 English(EN) · Takumi Kawano, Kohei Miura, Daisuke Iwai ·

    Embedded Cameras Break the Scalability Limit of Multi-Projector Calibration

    arXiv:2604.24024v1. Conventional multi-projector calibration requires projecting and capturing structured light patterns for each projector sequentially, causing calibration time and effort to increase linearly with the number of projectors. This scala…

  5. arXiv cs.CV TIER_1 English(EN) · Jiawei Wang, Ming Lei, Yaning Yang, Xinyan Lin, Yuquan Le, Qiwei Ma, Zhiwei Xu, Zheqi Lv, Yuchen Ang, Zhe Quan, Tat-Seng Chua ·

    DeepTaxon: An Interpretable Retrieval-Augmented Multimodal Framework for Unified Species Identification and Discovery

    arXiv:2604.24029v1. Identifying species in biology among tens of thousands of visually similar taxa while discovering unknown species in open-world environments remains a fundamental challenge in biodiversity research. Current methods treat identificat…

  6. arXiv cs.CV TIER_1 English(EN) · Jinkun Dai, Yuanxin Ye, Peng Tang, Tengfeng Tang, Xianping Ma, Jing Xiao, Mi Wang ·

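    An Open-Vocabulary Semantic Segmentation Network Integrating Object-Level Labels and Scene-Level Semantic Features for Multi-Modal Remote Sensing Images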

    arXiv:2604.24125v1. Semantic segmentation of multi-modal remote sensing imagery plays a pivotal role in land use/land cover (LULC) mapping, environmental monitoring, and precision earth observation. Current multi-modal approaches mainly focus on integr…

  7. arXiv cs.CV TIER_1 English(EN) · Guillaume Perez, Janarbek Matai, Takahiro Harada ·

    PEPS: Positional Encoding Projected Sampling -- Extended Version

    arXiv:2604.24167v1. Implicit neural representations (INRs) are increasingly being used as tools to map coordinates to signals, encompassing applications from neural fields to texture compression, shape representations, and beyond. Most INR methods are …

  8. arXiv cs.CV TIER_1 English(EN) · Takahiro Harada ·

    PEPS: Positional Encoding Projected Sampling -- Extended Version

    Implicit neural representations (INRs) are increasingly being used as tools to map coordinates to signals, encompassing applications from neural fields to texture compression, shape representations, and beyond. Most INR methods are based on using high-dimensional projections of t…

  9. arXiv cs.CV TIER_1 English(EN) · Mi Wang ·

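    An Open-Vocabulary Semantic Segmentation Network Integrating Object-Level Labels and Scene-Level Semantic Features for Multi-Modal Remote Sensing Images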

    Semantic segmentation of multi-modal remote sensing imagery plays a pivotal role in land use/land cover (LULC) mapping, environmental monitoring, and precision earth observation. Current multi-modal approaches mainly focus on integrating complementary visual modalities, yet negle…

  10. arXiv cs.CV TIER_1 English(EN) · Tat-Seng Chua ·

    DeepTaxon: An Interpretable Retrieval-Augmented Multimodal Framework for Unified Species Identification and Discovery

    Identifying species in biology among tens of thousands of visually similar taxa while discovering unknown species in open-world environments remains a fundamental challenge in biodiversity research. Current methods treat identification and discovery as separate problems, with cla…

  11. arXiv cs.CV TIER_1 English(EN) · Daisuke Iwai ·

    Embedded Cameras Break the Scalability Limit of Multi-Projector Calibration

    Conventional multi-projector calibration requires projecting and capturing structured light patterns for each projector sequentially, causing calibration time and effort to increase linearly with the number of projectors. This scalability bottleneck has long limited the deploymen…

  12. arXiv cs.CV TIER_1 English(EN) · Hanyu Chen, Ruojin Cai, Steve Marschner, Noah Snavely ·

    ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild

    arXiv:2604.22202v1. Symmetry detection is a fundamental problem in computer vision, and symmetries serve as powerful priors for downstream tasks. However, existing learning-based methods for detecting 3D symmetries from single images have been almost e…

  13. arXiv cs.CV TIER_1 English(EN) · Noah Snavely ·

    ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild

    Symmetry detection is a fundamental problem in computer vision, and symmetries serve as powerful priors for downstream tasks. However, existing learning-based methods for detecting 3D symmetries from single images have been almost exclusively trained and evaluated on object-centr…