New Video Anomaly Model 'CaC' Improves Detection Accuracy

By PulseAugur Editorial · [1 source] · 2026-05-29 04:00

Researchers have introduced Concentrate and Concentrate (CaC), a coarse-to-fine anomaly reward model for video built on Vision-Language Models. At inference, CaC first scans the whole video to anchor anomalous time windows, then performs detailed spatial localization within those windows. Training follows a three-stage progressive paradigm that incorporates supervised fine-tuning and reinforcement learning with custom temporal and spatial IoU rewards. According to the paper, CaC achieves a 25.7% accuracy improvement on fine-grained anomaly benchmarks and reduces anomalies in generated videos by 11.7%.
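The temporal and spatial IoU rewards mentioned in the summary can be illustrated with the standard intersection-over-union formulas. This is a minimal sketch, not the paper's reward definition: the function names `temporal_iou` and `spatial_iou`, the `(start, end)` interval format, and the `(x1, y1, x2, y2)` box format are all assumptions made for illustration.

```python
from typing import Tuple

Interval = Tuple[float, float]            # (start, end), assumed format
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed format


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """IoU of a predicted and a ground-truth time window."""
    # Overlap is the gap between the later start and the earlier end.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def spatial_iou(pred: Box, gt: Box) -> float:
    """IoU of a predicted and a ground-truth axis-aligned box."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0
```

Both functions return a value in [0, 1], which is what makes IoU convenient as a reward signal: 1 for a perfect match and 0 for no overlap. How CaC weights or combines the two terms is not stated in the summary.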

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Jiyuan Wang, Huan Ouyang, Jiuzhou Lin, Chunyu Lin, Dewen Fan, Boheng Zhang, Haonan Fan, Fei Zuo, Jia Sun, Huaiqing Wang, Honglie Wang, Yiyang Fan, Zhenlong Yuan, Zijun Li, Yongrui Heng, Guosheng Lin, Fan Yang, Tingting Gao · 2026-05-29 04:00

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

arXiv:2605.11723v2 Announce Type: replace-cross Abstract: In this paper, we propose Concentrate and Concentrate (CaC), a coarse-to-fine anomaly reward model based on Vision-Language Models. During inference, it first conducts a global temporal scan to anchor anomalous time window…
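The "global temporal scan" step can be pictured as turning a sequence of per-frame anomaly scores into candidate time windows that a later stage inspects spatially. The sketch below is only an illustration under stated assumptions: the abstract does not say how CaC produces or thresholds scores, so the per-frame `scores` list, the `threshold` parameter, and the function name `anomalous_windows` are all hypothetical.

```python
from typing import List, Tuple


def anomalous_windows(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring >= threshold into inclusive (start, end) index windows.

    `scores` is a hypothetical per-frame anomaly score; CaC itself relies on a
    Vision-Language Model for this step, which is not reproduced here.
    """
    windows: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                         # a new window opens here
        elif score < threshold and start is not None:
            windows.append((start, i - 1))    # the window closed on the previous frame
            start = None
    if start is not None:
        windows.append((start, len(scores) - 1))  # window runs to the last frame
    return windows


print(anomalous_windows([0.1, 0.7, 0.9, 0.2, 0.6]))
```

In a coarse-to-fine pipeline, only the frames inside the returned windows would be passed to the finer spatial localization stage, which keeps the expensive step focused on a small part of the video.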
