Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
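As a rough illustration of how a 0–100 score might combine the four signals named above, here is a minimal sketch. The weights, the 24-hour half-life, the 0–1 input ranges, and every name in it (`Cluster`, `WEIGHTS`, `HALF_LIFE_HOURS`, `score`) are assumptions made for illustration; the feed's actual formula is not stated.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical per-story signals, each assumed normalized to 0..1."""
    authority: float         # how authoritative the reporting sources are
    cluster_strength: float  # how many independent sources cover the story
    headline_signal: float   # strength of the headline wording
    age_hours: float         # time since the story first appeared


# Assumed weights (they sum to 1.0) and an assumed decay half-life.
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0


def score(c: Cluster) -> float:
    """Weighted sum of the three signals, scaled by exponential time decay."""
    base = (
        WEIGHTS["authority"] * c.authority
        + WEIGHTS["cluster_strength"] * c.cluster_strength
        + WEIGHTS["headline_signal"] * c.headline_signal
    )
    base = max(0.0, min(1.0, base))                  # clamp to 0..1
    decay = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)   # halves every half-life
    return round(100.0 * base * decay, 1)


if __name__ == "__main__":
    fresh = Cluster(authority=0.9, cluster_strength=0.8, headline_signal=0.6, age_hours=2)
    stale = Cluster(authority=0.9, cluster_strength=0.8, headline_signal=0.6, age_hours=120)
    print(score(fresh), score(stale))
```

Under these assumptions, a story with identical signals loses half its score for every 24 hours of age, so older items drop down the list even when their sources are strong.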

RESEARCH · Hugging Face Daily Papers English(EN) · 5d · [13 sources]

Latent Spatial Memory for Video World Models

Researchers have introduced "ImageTime," a benchmark that evaluates how well image generation models understand and represent temporal change. It assesses spatiotemporal consistency by requiring models to generate four ordered key states of an action, going beyond single-image quality metrics. Separately, a framework called BiWM targets open-source interactive video world models, using bidirectional autoregression to improve generation quality and inference speed. A third paper proposes "latent spatial memory" for video world models, which stores scene information directly in the diffusion latent space to significantly speed up generation and reduce memory footprint.

IMPACT Advances in video world modeling benchmarks and frameworks could accelerate progress in generative AI for video and simulation.
- RealEstate10K
- Mirage
- WorldScore
- Yume-1.5
- Wan2.1-1.3B
- HunyuanVideo-1.5-8B
- ImageTime
- GPT-5.5
- CALVIN
- Matrix-Game-3.0
- minWM
- Wan2.2-5B
- LTX-2.3-22B

TOOL · arXiv cs.CV English(EN) · 1mo

From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation

Researchers have developed MoLA (Mixture of Latent Actions), a method that improves robot manipulation by making better use of predicted future video frames. MoLA converts these imagined futures into executable actions using a mixture of pretrained inverse dynamics models. The mixture captures a range of visual cues to infer physically grounded actions, bridging the gap between video generation and policy execution. Evaluations on simulated and real-world tasks show that MoLA improves task success, temporal consistency, and generalization.

IMPACT Improves robot control by turning generated video into more precise executable actions.
- LIBERO
- MoLA
- robot manipulation
- CALVIN
- LIBERO-Plus
