Researchers have introduced Concentrate and Concentrate (CaC), a video anomaly detection model built on Vision-Language Models. CaC takes a coarse-to-fine approach: it first identifies anomalous time windows across the whole video, then performs detailed spatial localization within those windows. The model is trained with a three-stage progressive paradigm that combines supervised fine-tuning with reinforcement learning driven by custom temporal and spatial IoU rewards. In experiments, CaC achieves a 25.7% accuracy improvement on fine-grained anomaly benchmarks and reduces anomalies in generated videos by 11.7%.
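The summary does not say how the temporal and spatial IoU rewards are computed. As a rough illustration only, the sketch below uses the standard intersection-over-union definitions for a time window and a bounding box; the function names, the `(start, end)` and `(x1, y1, x2, y2)` conventions, and the use of raw IoU as the reward value are assumptions, not details taken from the paper.

```python
from typing import Tuple

# Hypothetical conventions, not confirmed by the source:
Interval = Tuple[float, float]            # (start, end), e.g. in seconds
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), corners


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Standard IoU between two time windows; 0.0 if they do not overlap."""
    pred_len = max(0.0, pred[1] - pred[0])
    gt_len = max(0.0, gt[1] - gt[0])
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = pred_len + gt_len - inter
    return inter / union if union > 0 else 0.0


def spatial_iou(pred: Box, gt: Box) -> float:
    """Standard IoU between two axis-aligned boxes; 0.0 if disjoint."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    pred_area = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    gt_area = max(0.0, gt[2] - gt[0]) * max(0.0, gt[3] - gt[1])
    union = pred_area + gt_area - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A predicted window overlapping half of the ground-truth window.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    # Two 2x2 boxes overlapping in a 1x1 corner.
    print(spatial_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

Under this sketch, a coarse-to-fine pipeline would score the predicted time window first and the predicted box within that window second; how CaC actually weights or combines the two rewards is not stated in the summary.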
- Concentrate and Concentrate (CaC)
- Group Relative Policy Optimization (GRPO)
- JiYuan Wang
- Vision-Language Models
AI-generated summary · Google Gemini · from 1 source.