ShotCrop$^3$: Cropping Human-Centric Images into Cinematic Triple-Shot Compositions
Researchers have developed ShotCrop$^3$, a system that automatically generates cinematic triple-shot compositions from a single human-centric image. It produces three crops (establishing, medium, and close-up), each paired with a descriptive caption to aid visual storytelling. ShotCrop$^3$ is trained in three stages: Chain-of-Thought fine-tuning, semi-supervised learning with pseudo-labels, and Group Relative Policy Optimization (GRPO-S), which together aim to improve the aesthetic and narrative quality of its crops.
AI IMPACT: This research could make content creation workflows more efficient by automating the generation of varied shots for visual storytelling.