Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 1d · [2 sources]

Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis

A new research paper compares transformer-based and LLM-based multimodal approaches for classifying visually-rich documents. The study evaluated LayoutLMv3, Donut, Qwen3-VL-32B-Instruct, and Qwen3-32B on the RVL-CDIP benchmark. Results indicate that specialized multimodal transformers perform better on documents with complex layouts, and that image information is the most critical input for classification.

IMPACT Provides guidance on selecting effective multimodal architectures and feature combinations for document classification tasks.

RESEARCH · arXiv cs.CV English(EN) · 1mo · [2 sources]

FILTR: Extracting Topological Features from Pretrained 3D Models

Researchers have developed FILTR, a framework for extracting topological features from pretrained 3D models. It adapts a transformer decoder to generate persistence diagrams, which summarize a shape's multiscale structure, directly from the outputs of frozen encoders. Although existing 3D encoders carry only limited global topological signal, FILTR uses their outputs to approximate these diagrams, enabling data-driven extraction from raw point clouds.

IMPACT Enables data-driven extraction of topological features from 3D point clouds, potentially improving shape analysis and understanding in computer vision.
- arXiv
- Point-MAE
- Point-BERT
- DONUT
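
For readers unfamiliar with the persistence diagrams mentioned in the FILTR summary, here is a minimal sketch of the simplest case: the 0-dimensional diagram of a point cloud, which records the distance scale at which separate clusters of points merge. This is the classical union-find computation, not FILTR's learned approach. The function name `h0_persistence` and the toy points are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a point cloud.

    Assumption: scales are measured as pairwise Euclidean distance, so every
    component is born at 0 and dies at the distance where it merges into another.
    """
    n = len(points)
    if n == 0:
        return []

    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairs from closest to farthest (Kruskal-style).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    diagram = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge, so one of them dies here
            diagram.append((0.0, dist))

    # The last surviving component never dies.
    diagram.append((0.0, math.inf))
    return diagram


# Toy cloud: two nearby points and one far away.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(h0_persistence(toy))
```

On the toy cloud, the short-lived pair reflects the two nearby points merging early, while the long-lived pair reflects the distant point staying separate until a much larger scale. Higher-dimensional features such as loops and voids need a full persistent homology library and are not covered by this sketch.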
