PulseAugur / Brief

last 24h
[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
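The four scoring factors named above could be combined as in the minimal sketch below. It is illustrative only and not PulseAugur's actual formula: the weights, the 12-hour half-life, and the names `ClusterSignals` and `score_cluster` are all hypothetical. It assumes each signal is normalized to [0, 1] and that time decay acts as an exponential multiplier.

```python
from dataclasses import dataclass


@dataclass
class ClusterSignals:
    """Hypothetical inputs for one story cluster (all names are assumptions)."""
    authority: float         # source authority, assumed normalized to [0, 1]
    cluster_strength: float  # how many/how varied the sources are, assumed [0, 1]
    headline_signal: float   # headline informativeness, assumed [0, 1]
    age_hours: float         # hours since the story first appeared


# Hypothetical weights (chosen to sum to 1) and decay half-life.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0


def _clamp01(x: float) -> float:
    """Keep a signal inside [0, 1] so the final score stays inside 0-100."""
    return max(0.0, min(1.0, x))


def score_cluster(s: ClusterSignals) -> float:
    """Weighted sum of the three signals, scaled by exponential time decay."""
    base = (
        WEIGHTS["authority"] * _clamp01(s.authority)
        + WEIGHTS["cluster_strength"] * _clamp01(s.cluster_strength)
        + WEIGHTS["headline_signal"] * _clamp01(s.headline_signal)
    )
    # The score halves every HALF_LIFE_HOURS; negative ages are treated as 0.
    decay = 0.5 ** (max(0.0, s.age_hours) / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumptions, a cluster with strong signals loses half its score every 12 hours, which is one simple way for older stories to fall down a "last 24h" ranking.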

  1. baidu/NAVA

    Baidu has released NAVA, a 6.3-billion-parameter model that generates synchronized audio and video from a single text prompt. It uses an Align-then-Fuse MMDiT architecture and reports state-of-the-art results on audio-visual synchronization benchmarks. NAVA can produce a one-minute 720p video with stereo audio in approximately one minute and offers precise control over speaker voice timbre.

    IMPACT Sets a new SOTA on audio-visual synchronization benchmarks at a comparatively small 6.3B parameters.

  2. LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

    Researchers have introduced LongAV-Compass, a benchmark for evaluating minute-long audio-visual generation under text, image, and video conditioning (T2AV, I2AV, V2AV), assessing consistency and alignment over extended durations. Related work in the same cluster includes OmniCustom, a DiT-based framework that customizes video identity and audio timbre simultaneously from reference images and audio, using separate LoRA modules for identity and timbre control together with a contrastive learning objective. Separately, the 6.3B-parameter NAVA framework offers native audio-visual alignment for joint generation, improving synchronization and timbre controllability.

    IMPACT New models and a new benchmark improve both control over and evaluation of synchronized audio-visual generation, including at minute-long durations.