ByteDance has introduced Lance, a novel AI model capable of understanding, generating, and editing both images and videos within a single architecture. Unlike previous systems that often separate these functions, Lance was jointly trained from the outset to handle diverse tasks including captioning, visual question answering, text-to-image, text-to-video, and complex editing operations. The model achieves this by unifying all input modalities into a shared sequence and employing decoupled expert pathways for understanding and generation, enhanced by a new Modality-Aware Rotary Positional Encoding (MaPE) to manage different token types. AI
Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →
IMPACT Sets a new precedent for unified multimodal AI, potentially simplifying development for applications requiring cross-modal understanding and generation.
RANK_REASON New multimodal model release from a major AI lab (ByteDance) with a novel architecture and capabilities. [lever_c_demoted from frontier_release: ic=1 ai=1.0]