A new 800M-parameter version of a previous model can turn an image into a controllable character. It extends the context window to 12 latent frames, which improves stability over its predecessor, although consistency remains a challenge. The architecture is similar to the prior version, but with an expanded MLP and a denoising component trained from scratch using diffusion forcing. Generation is causal: each frame is denoised in its own loop and then added to a KV cache, which stores past frames so that later frames can condition on them.
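The summary describes causal, frame-by-frame diffusion with a KV cache but includes no code. The Python below is a minimal sketch under stated assumptions, not the model's actual implementation: `generate_frames`, `toy_denoise_step`, and `diffusion_forcing_noise_levels` are hypothetical names; a toy function stands in for the trained denoiser; conditioning each frame on an integer action is an assumed stand-in for character control; `num_steps=4` is arbitrary; and the "cache" holds clean past latents rather than per-layer attention keys and values. Only the 12-frame context window comes from the summary.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence

Latent = List[float]

# Context window from the summary: at most 12 latent frames stay in context.
CONTEXT_FRAMES = 12

# Hypothetical denoiser signature: (noisy latent, step index, past frames, action) -> less noisy latent.
Denoiser = Callable[[Latent, int, List[Latent], int], Latent]


def diffusion_forcing_noise_levels(num_frames: int, num_steps: int, rng: random.Random) -> List[int]:
    """Training-time sketch: diffusion forcing gives every frame its own
    independently sampled noise level instead of one level per clip."""
    return [rng.randrange(num_steps) for _ in range(num_frames)]


def generate_frames(
    first_latent: Latent,
    actions: Sequence[int],
    denoise_step: Denoiser,
    num_steps: int = 4,
    seed: int = 0,
) -> List[Latent]:
    """Inference-time sketch of causal, frame-by-frame diffusion."""
    rng = random.Random(seed)
    # Stand-in for the KV cache (assumption: stores clean past latents).
    # The bounded deque drops the oldest frame once CONTEXT_FRAMES is reached.
    cache: Deque[Latent] = deque([first_latent], maxlen=CONTEXT_FRAMES)
    frames: List[Latent] = []
    for action in actions:
        # Each new frame starts from pure Gaussian noise.
        x = [rng.gauss(0.0, 1.0) for _ in first_latent]
        # Denoising loop for this frame only, from the noisiest step down to 0.
        for t in reversed(range(num_steps)):
            x = denoise_step(x, t, list(cache), action)
        # The finished frame joins the cache so later frames can condition on it.
        cache.append(x)
        frames.append(x)
    return frames


def toy_denoise_step(x: Latent, t: int, context: List[Latent], action: int) -> Latent:
    """Hypothetical toy denoiser: pulls x toward the latest cached frame,
    shifted by 0.1 * action. At t == 0 it returns that target exactly."""
    target = [v + 0.1 * action for v in context[-1]]
    alpha = 1.0 / (t + 1)
    return [(1.0 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]


if __name__ == "__main__":
    frames = generate_frames([0.0, 0.0], actions=[1, 1, -1], denoise_step=toy_denoise_step)
    print(len(frames), frames)
```

The bounded `deque` mirrors the idea that only the most recent 12 latent frames remain in context; the summary does not say how the real model evicts older frames, so that policy is also an assumption.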
IMPACT Enables new forms of interactive content and character generation for users with consumer hardware.
RANK_REASON The item describes a specific application of AI models for creating controllable characters from images, which falls under AI tooling.
AI-generated summary · Google Gemini · from 1 source.