Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers English (EN) · 1w · [14 sources]

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

Researchers have introduced OmniCustom, a framework that jointly customizes visual identity in generated video and voice timbre in generated audio, using reference images and reference audio. The DiT-based model uses separate LoRA modules for identity and timbre control, reinforced by a contrastive learning objective. Separately, the NAVA framework performs native audio-visual alignment for joint generation, improving synchronization and timbre controllability with a 6.3B-parameter model. Finally, LongAV-Compass is a new benchmark for evaluating minute-scale audio-visual generation across text-, image-, and video-conditioned settings (T2AV, I2AV, V2AV), assessing consistency and alignment over extended durations.
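The idea of "separate LoRA modules for identity and timbre" can be illustrated with a minimal sketch: a frozen base weight plus two independent low-rank adapters whose updates are added to the output and can be switched on or off individually. This is not OmniCustom's implementation; the class names `LoRAAdapter` and `DualLoRALinear`, the `use_identity`/`use_timbre` flags, the rank, and the scaling are hypothetical, and the contrastive objective is not shown. It follows the common LoRA convention of scaling by `alpha / rank` and zero-initialising `B`, so each adapter starts as a no-op.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    return [ai + bi for ai, bi in zip(a, b)]


def scale(s: float, v: Vector) -> Vector:
    return [s * vi for vi in v]


class LoRAAdapter:
    """Low-rank update: delta(x) = (alpha / rank) * B @ (A @ x)."""

    def __init__(self, d_in: int, d_out: int, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        # A starts small and random; B starts at zero, so delta(x) == 0 initially.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def delta(self, x: Vector) -> Vector:
        return scale(self.scaling, matvec(self.B, matvec(self.A, x)))


class DualLoRALinear:
    """Hypothetical layer: frozen weight plus separate identity and timbre adapters."""

    def __init__(self, weight: Matrix, rank: int = 2, alpha: float = 4.0):
        self.weight = weight  # frozen pretrained weight; only adapters would be trained
        d_out, d_in = len(weight), len(weight[0])
        self.identity = LoRAAdapter(d_in, d_out, rank, alpha, seed=1)
        self.timbre = LoRAAdapter(d_in, d_out, rank, alpha, seed=2)

    def forward(self, x: Vector, use_identity: bool = True, use_timbre: bool = True) -> Vector:
        y = matvec(self.weight, x)
        if use_identity:
            y = add(y, self.identity.delta(x))
        if use_timbre:
            y = add(y, self.timbre.delta(x))
        return y


layer = DualLoRALinear([[1.0, 0.0], [0.0, 1.0]])
print(layer.forward([3.0, 4.0]))  # both B matrices are zero, so this equals the base output
```

Keeping the two adapters as separate parameter sets means each control signal can be trained, disabled, or swapped without touching the other or the frozen base weight.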

IMPACT New models and benchmarks improve controllability and evaluation of audio-visual generation, advancing synchronized audio-video synthesis.

DINO-v2
LoRA
MMDiT
NAVA
ImageBind
FlatSounds
Verse-Bench
Seed-TTS
OmniCustom
LongAV-Compass
ArcFace