New model turns images into controllable characters with expanded context

By PulseAugur Editorial · [1 source] · 2026-06-28 23:55

A new model, an 800M-parameter version of a previous iteration, turns an image into a controllable character. It extends the context window to 12 latent frames, which improves stability over its predecessor, though consistency remains a challenge. The architecture is similar to the prior version but has an expanded MLP and a denoising component trained from scratch with diffusion forcing. Generation is causal: each frame goes through its own denoising loop and is then added to a KV cache, which effectively stores past frames.
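The frame-by-frame loop described above can be illustrated with a toy sketch. This is not the model's code: the denoiser, the step count, the latent size, and the scalar action input are all hypothetical stand-ins, and the cache here holds whole latent frames rather than attention keys and values. Only the 12-frame context window and the order of operations (denoise a frame, then append it to the cache) come from the summary.

```python
from collections import deque
import random

CONTEXT_FRAMES = 12  # context window size, from the summary
DENOISE_STEPS = 4    # hypothetical: the source gives no step count
LATENT_DIM = 8       # hypothetical toy latent size


def toy_denoiser(noisy, context, action, step, total_steps):
    """Hypothetical stand-in for the network: pulls the noisy latent toward
    the mean of the cached frames, shifted by the control input."""
    target = [
        sum(frame[i] for frame in context) / len(context) + action
        for i in range(len(noisy))
    ]
    blend = 1.0 / (total_steps - step)  # the final step lands on the target
    return [n + blend * (t - n) for n, t in zip(noisy, target)]


def generate(first_latent, actions, seed=0):
    """Causal generation: one denoising loop per frame, then cache the result."""
    rng = random.Random(seed)
    # A bounded deque drops the oldest frame once 12 frames are stored.
    cache = deque([first_latent], maxlen=CONTEXT_FRAMES)
    frames = []
    for action in actions:
        latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # start from noise
        for step in range(DENOISE_STEPS):
            latent = toy_denoiser(latent, list(cache), action, step, DENOISE_STEPS)
        cache.append(latent)  # the finished frame becomes context for later frames
        frames.append(latent)
    return frames, cache


frames, cache = generate([0.0] * LATENT_DIM, actions=[0.1] * 20)
print(len(frames), len(cache))
```

A bounded cache is one simple way to model a fixed context window: once more than 12 frames exist, the oldest falls out, which may be one reason long-run consistency is hard for a model of this kind.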

IMPACT Enables new forms of interactive content and character generation for users with consumer hardware.

RANK_REASON The item describes a specific application of AI models for creating controllable characters from images, which falls under AI tooling.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/lucidml_lover · 2026-06-28 23:55

Locally running mode turns an Image into a Cute Controllable Character you can Play as

https://www.reddit.com/r/LocalLLaMA/comments/1uicq8x/locally_running_mode_turns_an_image_into_a_cute/
