Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
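The feed does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could be combined into a 0–100 score. The `Story` class, the weights, and the 24-hour half-life are all assumptions made for illustration, not the site's actual method.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical story record; each signal is assumed to be normalized to 0..1."""
    authority: float         # trustworthiness of the sources
    cluster_strength: float  # how many independent sources report the same story
    headline_signal: float   # how informative the headline is
    age_hours: float         # time since first publication


def score(story: Story,
          weights: tuple = (0.4, 0.3, 0.3),   # assumed weights, not from the source
          half_life_hours: float = 24.0) -> float:
    """Weighted mean of the three signals, scaled by exponential time decay."""
    w_a, w_c, w_h = weights
    base = (w_a * story.authority
            + w_c * story.cluster_strength
            + w_h * story.headline_signal) / (w_a + w_c + w_h)
    base = max(0.0, min(1.0, base))            # clamp in case inputs are out of range
    decay = 0.5 ** (story.age_hours / half_life_hours)
    return round(100.0 * base * decay, 1)
```

Under these assumptions a story with perfect signals scores 100 when fresh and loses half its score every `half_life_hours`. A multiplicative decay is one design choice among several; an additive penalty would let strong stories stay visible longer.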

TOOL · arXiv cs.LG English(EN) · 1d · [2 sources]

DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection

Researchers have developed DFIR-DETR, an approach to small object detection in complex visual scenes. The method addresses limitations of existing network designs, such as uniform attention distribution and the suppression of high-frequency details by spatial convolutions. It specifically targets norm drift in upsampled features and the loss of critical edge components. The model is reported to improve performance on the NEU-DET and VisDrone datasets, achieving high mAP50 scores with a relatively small parameter count and computational cost.

IMPACT Improves detection of small objects, which could benefit applications such as autonomous driving and surveillance.
- RT-DETR
- VisDrone
- NEU-DET
- Xingsheng Chen
- DFIR-DETR

RESEARCH · Hugging Face Daily Papers English(EN) · 6d · [2 sources]

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

Researchers have developed a method to improve how Vision-Language Models (VLMs) understand document layouts, particularly for documents whose structures were not seen during training. The approach pre-resolves layout information with a lightweight detector and injects it into the VLM's prompt, so the model can keep layout interpretation separate from content processing. The technique is reported to boost performance on out-of-distribution benchmarks, reducing errors and improving structural accuracy with only a minor increase in latency.

IMPACT Improves VLM robustness for document analysis, potentially enabling better information extraction from diverse document types.
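
The summary does not say how the layout information is serialized, so the following is a rough sketch of the general idea only: regions found by a detector are written out in reading order and prepended to the prompt. The `Region` class, the `build_prompt` helper, and the text format are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical detector output: a layout label and a pixel bounding box."""
    label: str    # e.g. "title", "table", "paragraph"
    bbox: tuple   # (x0, y0, x1, y1)


def build_prompt(regions: list, question: str) -> str:
    """Prepend a plain-text layout description to the task prompt."""
    # Approximate reading order: top to bottom, then left to right.
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    lines = [f"[{i}] {r.label} at {r.bbox}" for i, r in enumerate(ordered, start=1)]
    return "Detected layout:\n" + "\n".join(lines) + "\n\nTask: " + question


prompt = build_prompt(
    [Region("table", (40, 300, 560, 620)), Region("title", (40, 30, 560, 80))],
    "Extract the table as CSV.",
)
```

In this sketch the model receives the structure as text before it reads the page content, which is one simple way to pass a layout prior without retraining the model.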