PulseAugur

New model turns images into controllable characters with expanded context

A new 800M-parameter version of an earlier model can turn an image into a controllable character. It increases the context window to 12 latent frames, which improves stability over its predecessor, though consistency remains a challenge. The architecture is similar to the prior version, with an expanded MLP and a denoiser trained from scratch using diffusion forcing. Generation is causal: each new frame goes through its own denoising loop and is then added to a KV cache, which acts as the model's store of past frames.
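
The generation loop described above can be sketched roughly as follows. This is a toy illustration, not the model's actual code: `toy_denoiser`, `generate_frame`, `rollout`, `DENOISE_STEPS`, and `LATENT_DIM` are hypothetical names and values, the cache holds raw latent frames as a stand-in for real attention keys and values, and dropping the oldest frame once 12 are stored is an assumption about how the 12-frame context window is enforced.

```python
import random
from collections import deque

CONTEXT_FRAMES = 12  # context window size, from the summary
DENOISE_STEPS = 4    # hypothetical: the source gives no step count
LATENT_DIM = 8       # hypothetical toy latent size


def toy_denoiser(noisy, context, action, t):
    """Hypothetical stand-in for the trained network.

    Predicts a clean latent as the mean of the cached frames plus an
    action offset. `noisy` and `t` mirror a real denoiser's inputs but
    are ignored by this toy version.
    """
    if context:
        mean = [sum(f[i] for f in context) / len(context)
                for i in range(LATENT_DIM)]
    else:
        mean = [0.0] * LATENT_DIM
    return [m + action for m in mean]


def generate_frame(cache, action, rng):
    """Denoise one new frame conditioned on the cache, then store it."""
    x = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # start from noise
    context = list(cache)  # causal: only past frames are visible
    for step in range(DENOISE_STEPS, 0, -1):
        t = step / DENOISE_STEPS        # current noise level
        s = (step - 1) / DENOISE_STEPS  # next, lower noise level
        pred = toy_denoiser(x, context, action, t)
        # Shrink the remaining noise; at s == 0 the frame equals pred.
        x = [p + (s / t) * (xi - p) for xi, p in zip(x, pred)]
    cache.append(x)  # deque(maxlen=...) evicts the oldest frame when full
    return x


def rollout(actions, seed=0):
    """Generate one frame per control input, in order."""
    rng = random.Random(seed)
    cache = deque(maxlen=CONTEXT_FRAMES)
    frames = [generate_frame(cache, a, rng) for a in actions]
    return frames, cache
```

Calling `rollout([1.0] * 20)` produces 20 frames while the cache never holds more than 12, which mirrors the bounded memory the summary describes. In a real system the cache would hold per-layer attention tensors rather than whole frames.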

IMPACT Enables new forms of interactive content and character generation for users with consumer hardware.

RANK_REASON The item describes a specific application of AI models for creating controllable characters from images, which falls under AI tooling.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/lucidml_lover

    Locally running mode turns an Image into a Cute Controllable Character you can Play as

    https://www.reddit.com/r/LocalLLaMA/comments/1uicq8x/locally_running_mode_turns_an_image_into_a_cute/