Brief

last 24h

[15/15] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
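The brief does not publish its scoring formula, so the following is only a minimal sketch of how four such factors could be combined into a 0–100 score. The weights, the 24-hour half-life, and the function name `score_story` are all hypothetical, chosen for illustration.

```python
# Hypothetical weights; the brief does not state its actual formula.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay factor


def score_story(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] with exponential time decay; returns 0-100."""
    signals = (authority, cluster_strength, headline_signal)
    if any(not 0.0 <= s <= 1.0 for s in signals):
        raise ValueError("signals must lie in [0, 1]")
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    # Weighted sum of the three static signals (weights sum to 1).
    base = (WEIGHTS["authority"] * authority
            + WEIGHTS["cluster_strength"] * cluster_strength
            + WEIGHTS["headline_signal"] * headline_signal)
    # Time decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumptions a story with perfect signals scores 100 when fresh and 50 after one half-life.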

RESEARCH · Hugging Face Daily Papers English(EN) · 2d · [4 sources]

Surflo: Consistent 3D Surface Flow Model with Global State

Researchers have introduced Surflo, a 3D surface reconstruction model that encodes unposed RGB views into a global latent state. Oriented 3D surface points are decoded from that state through flow matching, which allows arbitrary output resolutions, from a few thousand to over a million points, in a single pass. Surflo performs competitively with existing feed-forward methods while running significantly faster than optimization-based techniques, combining a global latent representation with flexible decoding.

IMPACT Enables flexible and efficient 3D surface reconstruction from multiple views, potentially impacting fields like computer graphics and robotics.
TOOL · arXiv cs.AI English(EN) · 5d

AxisGuide: Grounding Robot Action Coordinate System in RGB Observations for Robust Visuomotor Manipulation

Researchers have developed AxisGuide, a method that improves robot visuomotor manipulation by grounding the action coordinate system in visual observations. It renders the robot's base-frame axes into the camera views, providing explicit cues for motion in image space. Experiments show that AxisGuide improves performance and generalization in both simulated and real-world tasks, particularly under distribution shift.

IMPACT Enhances robot generalization and robustness in manipulation tasks by improving action understanding from visual input.
- AxisGuide
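The summary above does not describe how AxisGuide draws its overlay. As a rough illustration of what rendering base-frame axes in a camera view involves, here is a sketch using a standard pinhole camera model. The function names, the base-to-camera transform convention (p_cam = R · p_base + t), and the axis length are assumptions, not details from the paper.

```python
def project_point(point_base, rotation, translation, fx, fy, cx, cy):
    """Project a 3D base-frame point to pixel coordinates with a pinhole model.

    rotation (3x3 nested lists) and translation (3-vector) are assumed to map
    base-frame coordinates into the camera frame: p_cam = R @ p_base + t.
    Returns None if the point lies behind the camera.
    """
    x, y, z = (
        sum(rotation[i][j] * point_base[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def base_axes_in_image(rotation, translation, fx, fy, cx, cy, length=0.1):
    """Return pixel endpoints (origin, tip) for the base-frame x, y, z axes."""
    tips = {"x": (length, 0.0, 0.0), "y": (0.0, length, 0.0), "z": (0.0, 0.0, length)}
    origin_px = project_point((0.0, 0.0, 0.0), rotation, translation, fx, fy, cx, cy)
    return {
        name: (origin_px, project_point(tip, rotation, translation, fx, fy, cx, cy))
        for name, tip in tips.items()
    }
```

Each returned segment could then be drawn onto the RGB frame in a distinct color, which is one plausible way to give a policy an image-space cue for each motion direction.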
RESEARCH · arXiv cs.CV English(EN) · 1w · [2 sources]

Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection

Researchers have developed a framework for multi-modal video representation alignment that improves self-supervised learning for driver distraction detection. It handles noisy or faulty data from multiple sensors by jointly modeling unreliable positives and negatives. Using soft targets and a similarity-based weighting mechanism, the method achieves principled global multi-modal alignment and outperforms existing baselines on the Drive&Act dataset.

IMPACT Enhances robustness of AI systems in real-world multi-modal video understanding tasks like driver safety.
RESEARCH · arXiv cs.CV English(EN) · 1w · [2 sources]

DisPlace: Discriminative Place Projections for Multi-Reference Visual Place Recognition

Two new research papers explore advances in Visual Place Recognition (VPR), a critical technology for robot localization and SLAM. The first, "One Channel to Rule Them All," suggests that grayscale imagery is sufficient for VPR and can even outperform RGB under severe appearance shifts, with practical savings in storage and bandwidth. The second, "DisPlace," introduces a framework that fuses multiple reference descriptors into a more discriminative and compact place representation, outperforming existing multi-reference baselines in various challenging conditions.

IMPACT These papers advance core AI research in computer vision and robotics, potentially improving robot navigation and localization systems.
RESEARCH · Hugging Face Daily Papers English(EN) · 3w · [3 sources]

RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

Researchers have developed a framework for 3D scene graph generation that operates using only RGB cameras, eliminating the need for depth sensors like LiDAR. This allows more flexible deployment across robotic platforms and in environments where depth sensors are not feasible. The system also incorporates an active exploration strategy in which the robot selects viewpoints to gather information, significantly improving object detection and scene understanding compared to traditional methods.

IMPACT Enables robots to build detailed 3D maps using only standard cameras, expanding applications beyond specialized hardware.
TOOL · arXiv cs.CV English(EN) · 1mo

iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning

Researchers have developed iPay, a framework for recognizing payment actions in transit surveillance footage. The system uses a multimodal mixture-of-experts architecture that combines RGB and skeleton data streams through a dual-attention fusion mechanism. An additional Spatial Difference Discriminator explicitly models hand-to-anchor motion to enhance discriminability. iPay achieved 83.45% recognition accuracy on a dataset of over 500 payment clips collected from real onboard transit surveillance, demonstrating its suitability for edge deployment.

IMPACT This multimodal AI framework offers improved accuracy for automated transit payment analysis, potentially enhancing fare auditing and passenger analytics in real-world surveillance scenarios.
RESEARCH · arXiv cs.CV English(EN) · 1mo · [2 sources]

Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models

Researchers have developed FusionProxy, a module that integrates thermal imaging with standard RGB vision systems for real-time perception. This plug-and-play component addresses the latency of existing high-fidelity fusion methods, making it suitable for edge deployment. FusionProxy shows improved performance in static recognition and greater robustness in dynamic tasks like autonomous driving, and it runs efficiently across various hardware platforms.

IMPACT Enables real-time, all-day perception for systems like autonomous vehicles by fusing thermal and RGB data efficiently.
- arXiv
- FusionProxy
TOOL · arXiv cs.CV English(EN) · 1mo

InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making

Researchers have developed InterFuserDVS, an enhanced sensor fusion model for autonomous driving that integrates Dynamic Vision Sensors (DVS) with traditional RGB cameras and LiDAR. It uses a token-based fusion strategy within a transformer architecture to incorporate event-based data, which excels in high-dynamic-range and high-speed scenarios where conventional sensors struggle with motion blur and latency. In evaluations on the CARLA Leaderboard, InterFuserDVS achieved a Driving Score of 77.2 and a Route Completion of 100%, highlighting the potential of event cameras for improving driving safety and performance in challenging conditions.

IMPACT Event-based vision integration could enhance the safety and robustness of autonomous driving systems in adverse conditions.
- LiDAR
- CARLA
- InterFuserDVS
- InterFuser
- Leaderboard
RESEARCH · arXiv cs.CV English(EN) · 1mo · [2 sources]

3D Human Face Reconstruction with 3DMM face model from RGB image

Researchers have developed a pipeline for reconstructing detailed 3D human face models from single RGB images. The method uses convolutional neural networks and a 3D Morphable Model (3DMM) to overcome limitations in generating photorealistic data with fine details like wrinkles. The pipeline comprises face detection, landmark detection, parameter regression for the 3DMM, and soft rendering.

IMPACT This research advances 3D face reconstruction techniques, potentially improving applications in areas like virtual reality and digital avatars.
- arXiv
- CNN
- NYU
- 3DMM
- Zhipeng Fan
RESEARCH · arXiv cs.CV English(EN) · 1mo · [2 sources]

REALM: An RGB and Event Aligned Latent Manifold for Cross-Modal Perception

Researchers have developed REALM, a cross-modal framework that aligns RGB and event camera data within a shared latent manifold. It projects event representations into the latent space of pre-trained RGB foundation models, using low-rank adaptation (LoRA) to bridge the modality gap. REALM enables zero-shot application of image-trained decoders to event streams for tasks like depth estimation and semantic segmentation, and it achieves state-of-the-art results in wide-baseline feature matching.

IMPACT Enables zero-shot transfer of image-trained models to event camera data, potentially broadening applications in robotics and autonomous systems.
- arXiv
- LoRA
- ViT
- REALM
- MASt3R
- Vincenzo Polizzi
RESEARCH · arXiv cs.CV English(EN) · 1mo · [2 sources]

MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video

Researchers have developed MAEPose, a self-supervised approach to human pose estimation on mmWave radar video. The method processes spectrogram videos directly and learns spatiotemporal representations from unlabeled data; because it relies on radar rather than RGB cameras, it is more privacy-preserving than RGB methods. MAEPose outperforms existing baselines by up to 22.1% and maintains accuracy even with bystander interference.

IMPACT Introduces a privacy-preserving, self-supervised method for human pose estimation using mmWave radar, potentially impacting surveillance and healthcare applications.
- mmWave
- MAEPose
RESEARCH · arXiv cs.CV English(EN) · 1mo · [2 sources]

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

Researchers have developed OmniRobotHome, a platform for studying multiadic human-robot interaction in realistic home environments. The system addresses the challenge of real-time 3D tracking in cluttered spaces by using 48 synchronized RGB cameras for occlusion-robust perception of humans and objects. The platform pairs this sensing with two Franka robotic arms, enabling coordinated actions and the study of long-horizon human behavior modeling. Initial findings show that real-time perception and accumulated behavior memory significantly improve safety and human-anticipatory robotic assistance.

IMPACT Enables more complex and safer human-robot collaboration in shared spaces, advancing research in anticipatory robotic assistance.
RESEARCH · arXiv cs.CV English(EN) · 1mo · [2 sources]

Towards All-Day Perception for Off-Road Driving: A Large-Scale Multispectral Dataset and Comprehensive Benchmark

Researchers have introduced the IRON dataset, a large-scale collection of infrared and RGB images designed for off-road autonomous driving perception, particularly under nighttime conditions. The dataset includes over 24,000 annotated images and supports the development of new algorithms for temporal freespace detection. The team also proposed IRONet, a framework that uses a memory-attention mechanism to improve consistency across image frames, achieving state-of-the-art results on the IRON dataset and generalizing to other datasets.

IMPACT Establishes a new benchmark and dataset for improving all-day perception in off-road autonomous driving systems.
- IRON dataset
- IRONet
TOOL · Tom's Hardware English(EN) · 1mo

SteelSeries Aerox 3 Wireless Gen 2 Review: The Bright and Bold

The SteelSeries Aerox 3 Wireless Gen 2 is a lightweight gaming mouse notable for its vibrant RGB lighting and semi-translucent design, available in three colors for $110. It features a comfortable grip, satisfyingly loud switches rated for 80 million clicks, and an IPX4 water-resistant rating despite its honeycomb construction. However, the mouse experiences tracking issues on its 2.4GHz connection and requires an email address for its software.
TOOL · Mastodon — mastodon.social English(EN) · 1mo

Google Home is stripping RGB color controls from smart bulbs, but this workaround can bring it back. Your lights are still smart, they just need a little reminde

Google Home is reportedly removing RGB color control functionality from certain smart bulbs. This issue prevents users from adjusting the colors of their smart lighting through the Google Home app. A workaround has been identified that allows users to restore these color controls, ensuring their smart bulbs retain their full functionality.

IMPACT Minor inconvenience for users of Google Home and smart bulbs; workaround available.
- Google
- Google Home