DeepSeek has released Janus, a multimodal model that separates the visual encoding used for image understanding from the one used for image generation. Concurrently, Meta AI has introduced Spirit-LM, a language model that mixes text and speech and can generate expressive voice output. Together, the two releases target richer visual and spoken interaction with AI systems.