Computer vision research advances multimodal understanding and robust segmentation

By PulseAugur Editorial · [13 sources] · 2026-04-24 04:12

Researchers have introduced WeatherSeg, a semi-supervised segmentation framework that aims to improve autonomous-driving perception in adverse weather while reducing annotation costs; it combines a dual teacher-student model for knowledge distillation with a classifier weight-updating mechanism. Separately, a pose-only geometric constraint for multi-camera systems has been proposed to make bundle adjustment more computationally efficient for visual navigation and 3D scene reconstruction. A third paper tackles the scalability limit of multi-projector calibration, where time and effort grow linearly with the number of projectors, by embedding cameras in the calibration targets so that projector parameters can be estimated simultaneously. In addition, DeepTaxon offers a retrieval-augmented multimodal framework for unified species identification and discovery in biodiversity research, and TSMNet integrates textual supervision with visual representations for open-vocabulary semantic segmentation of remote sensing imagery.

IMPACT New research in computer vision offers improved methods for autonomous driving, biodiversity research, and remote sensing.

RANK_REASON Multiple research papers published on arXiv detailing new methods in computer vision.

AI-generated summary · Google Gemini · from 13 sources.

COVERAGE [13]

arXiv cs.CV TIER_1 English(EN) · Chang Liu, Henghui Ding, Nikhila Ravi, Yunchao Wei, Shuting He, Song Bai, Philip Torr, Leilei Cao, Jinrong Zhang, Deshui Miao, Xusheng He, Dengxian Gong, Zhiyu Wang, Mingqi Gao, Jihwan Hong, Canyang Wu, Weili Guan, Jianlong Wu, Liqiang Nie, Xingsen Huang, · 2026-04-30 04:00

Report of the 5th PVUW Challenge: Towards More Diverse Modalities in Pixel-Level Understanding

arXiv:2604.26031v1 Announce Type: new Abstract: This report summarizes the objectives, datasets, and top-performing methodologies of the 2026 Pixel-level Video Understanding in the Wild (PVUW) Challenge, hosted at CVPR 2026, which evaluates state-of-the-art models under highly un…
arXiv cs.CV TIER_1 English(EN) · Zhang Zhang, Yifeng Zeng, Jinquan Pan, Yinghui Pan · 2026-04-28 04:00

WeatherSeg: Weather-Robust Image Segmentation using Teacher-Student Dual Learning and Classifier-Updating Attention

arXiv:2604.22824v1 Announce Type: new Abstract: WeatherSeg, an advanced semi-supervised segmentation framework, addresses autonomous driving's environmental perception challenges in adverse weather while reducing annotation costs. This framework integrates a Dual Teacher-Student …
arXiv cs.CV TIER_1 English(EN) · Shunkun Liang, Banglei Guan, Bin Li, Qifeng Yu, Yang Shang · 2026-04-28 04:00

A Pose-only Geometric Constraint for Multi-Camera Pose Adjustment

arXiv:2604.23704v1 Announce Type: new Abstract: Multi-camera systems offer rich observation capabilities for visual navigation and 3D scene reconstruction; however, the resulting feature redundancy often compromises computational efficiency. This challenge is particularly pronoun…
arXiv cs.CV TIER_1 English(EN) · Takumi Kawano, Kohei Miura, Daisuke Iwai · 2026-04-28 04:00

Breaking the Scalability Limit of Multi-Projector Calibration with Embedded Cameras

arXiv:2604.24024v1 Announce Type: new Abstract: Conventional multi-projector calibration requires projecting and capturing structured light patterns for each projector sequentially, causing calibration time and effort to increase linearly with the number of projectors. This scala…
arXiv cs.CV TIER_1 English(EN) · Jiawei Wang, Ming Lei, Yaning Yang, Xinyan Lin, Yuquan Le, Qiwei Ma, Zhiwei Xu, Zheqi Lv, Yuchen Ang, Zhe Quan, Tat-Seng Chua · 2026-04-28 04:00

DeepTaxon: An Interpretable Retrieval-Augmented Multimodal Framework for Unified Species Identification and Discovery

arXiv:2604.24029v1 Announce Type: new Abstract: Identifying species in biology among tens of thousands of visually similar taxa while discovering unknown species in open-world environments remains a fundamental challenge in biodiversity research. Current methods treat identificat…
arXiv cs.CV TIER_1 English(EN) · Jinkun Dai, Yuanxin Ye, Peng Tang, Tengfeng Tang, Xianping Ma, Jing Xiao, Mi Wang · 2026-04-28 04:00

Open-Vocabulary Semantic Segmentation Network Integrating Object-Level Label and Scene-Level Semantic Features for Multimodal Remote Sensing Images

arXiv:2604.24125v1 Announce Type: new Abstract: Semantic segmentation of multi-modal remote sensing imagery plays a pivotal role in land use/land cover (LULC) mapping, environmental monitoring, and precision earth observation. Current multi-modal approaches mainly focus on integr…
arXiv cs.CV TIER_1 English(EN) · Guillaume Perez, Janarbek Matai, Takahiro Harada · 2026-04-28 04:00

PEPS: Positional Encoding Projected Sampling -- Extended

arXiv:2604.24167v1 Announce Type: new Abstract: Implicit neural representations (INRs) are increasingly being used as tools to map coordinates to signals, encompassing applications from neural fields to texture compression, shape representations, and beyond. Most INR methods are …
arXiv cs.CV TIER_1 English(EN) · Takahiro Harada · 2026-04-27 08:23

PEPS: Positional Encoding Projected Sampling -- Extended

Implicit neural representations (INRs) are increasingly being used as tools to map coordinates to signals, encompassing applications from neural fields to texture compression, shape representations, and beyond. Most INR methods are based on using high-dimensional projections of t…
arXiv cs.CV TIER_1 English(EN) · Mi Wang · 2026-04-27 07:23

Open-Vocabulary Semantic Segmentation Network Integrating Object-Level Label and Scene-Level Semantic Features for Multimodal Remote Sensing Images

Semantic segmentation of multi-modal remote sensing imagery plays a pivotal role in land use/land cover (LULC) mapping, environmental monitoring, and precision earth observation. Current multi-modal approaches mainly focus on integrating complementary visual modalities, yet negle…
arXiv cs.CV TIER_1 English(EN) · Tat-Seng Chua · 2026-04-27 04:30

DeepTaxon: An Interpretable Retrieval-Augmented Multimodal Framework for Unified Species Identification and Discovery

Identifying species in biology among tens of thousands of visually similar taxa while discovering unknown species in open-world environments remains a fundamental challenge in biodiversity research. Current methods treat identification and discovery as separate problems, with cla…
arXiv cs.CV TIER_1 English(EN) · Daisuke Iwai · 2026-04-27 04:11

Breaking the Scalability Limit of Multi-Projector Calibration with Embedded Cameras

Conventional multi-projector calibration requires projecting and capturing structured light patterns for each projector sequentially, causing calibration time and effort to increase linearly with the number of projectors. This scalability bottleneck has long limited the deploymen…
arXiv cs.CV TIER_1 English(EN) · Hanyu Chen, Ruojin Cai, Steve Marschner, Noah Snavely · 2026-04-27 04:00

ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild

arXiv:2604.22202v1 Announce Type: new Abstract: Symmetry detection is a fundamental problem in computer vision, and symmetries serve as powerful priors for downstream tasks. However, existing learning-based methods for detecting 3D symmetries from single images have been almost e…
arXiv cs.CV TIER_1 English(EN) · Noah Snavely · 2026-04-24 04:12

ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild

Symmetry detection is a fundamental problem in computer vision, and symmetries serve as powerful priors for downstream tasks. However, existing learning-based methods for detecting 3D symmetries from single images have been almost exclusively trained and evaluated on object-centr…
