PulseAugur / Brief

last 24h
[1/1] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

    Researchers have developed Video-o3, a framework for long-video understanding that iteratively seeks out relevant visual clues and inspects key segments at fine granularity. To handle tool invocation in multimodal models, it uses Task-Decoupled Attention Masking, which separates reasoning from tool-calling while preserving global context. To manage context length and improve efficiency, it employs a Verifiable Trajectory-Guided Reward mechanism. Training is supported by a data synthesis pipeline that produced Seeker-173K, a dataset of 173,000 tool-interaction trajectories, and the authors report significant performance gains on benchmarks such as MLVU and Video-Holmes.

    IMPACT Introduces a novel framework for long video understanding, potentially improving AI's ability to process and reason over extensive video content.