LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

New research tackles control, consistency, and efficiency challenges in video generation

By PulseAugur Editorial · [59 sources] · 2026-06-01 00:00

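Researchers are developing advanced video generation techniques focused on improving control, consistency, and efficiency. CineOrchestra aims to unify control over subjects, cameras, and shot transitions in cinematic video. TetherCache addresses drift and quality degradation in long-form autoregressive video generation by managing cache memory. ARGUS uses a novel identity-injection method to improve subject preservation under a range of challenging conditions. MilliVid uses a hierarchical latent space to achieve long-range consistency, while RhymeFlow accelerates diffusion transformers by decoupling denoising trajectories. Echo-Infinity introduces a learnable evolving memory for real-time infinite video generation, and MBench provides a benchmark for evaluating the memory capabilities of video world models.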

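Impact: Advances in video generation models are improving control, consistency, and efficiency, paving the way for more sophisticated applications.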

Ranking rationale: Multiple research papers detailing new methods and benchmarks for video generation were published on arXiv and Hugging Face.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 59 sources. How we write summaries →

Sources [59]

arXiv cs.AI TIER_1 English(EN) · Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu · 2026-06-18 04:00

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

arXiv:2502.07531v5 Announce Type: replace-cross Abstract: Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential f…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-17 05:27

UniTemp: Unlocking Video Generation in Arbitrary Temporal Order via Bidirectional Distillation

Autoregressive video diffusion models have emerged as a promising approach for long video generation, achieving strong performance in streaming settings. However, existing methods are restricted to forward temporal generation, whereas practical video creation often requires flexi…
arXiv cs.AI TIER_1 English(EN) · Zhenyu Yang, Kairui Zhang, Bing Wang, Shengsheng Qian, Changsheng Xu · 2026-06-17 04:00

LiveStarPro: Proactive Streaming Video Understanding with Hierarchical Memory for Long-Horizon Streams

arXiv:2606.17798v1 Announce Type: cross Abstract: Despite the remarkable progress of Video Large Language Models (Video-LLMs), current online architectures still struggle to simultaneously process continuous video streams, decide autonomously when to respond, and preserve long-ho…
arXiv cs.LG TIER_1 English(EN) · Adnan El Assadi, Roman Solomatin, Isaac Chung, Chenghao Xiao, Deep Shah, Manan Dey, Shriya Sudhakar, Zacharie Bugaud, Wissam Siblini, Ayush Sunil Munot, Yashwanth Devavarapu, Rakshitha Ireddi, Michelle Yang, Márton Kardos, Niklas Muennighoff, Kenneth E… · 2026-06-16 04:00

MVEB: Massive Video Embedding Benchmark

arXiv:2606.14958v1 Announce Type: cross Abstract: We introduce the Massive Video Embedding Benchmark (MVEB), a 23-task benchmark for video embeddings spanning classification, zero-shot classification, clustering, pair classification, retrieval, and video-centric question answerin…
arXiv cs.AI TIER_1 English(EN) · Sharath Girish, Tsai-Shien Chen, Zhikang Dong, Mukesh Singhal, Hao Chen, Sergey Tulyakov, Aliaksandr Siarohin · 2026-06-15 04:00

CineOrchestra: Unified Entity-Centric Conditioning for Cinematic Video Generation

arXiv:2606.13768v1 Announce Type: cross Abstract: Cinematic video depicts multiple subjects acting or interacting at specific moments, captured with deliberate camera movement, and stitched together by shot transitions. Together, these elements demand a level of fine-grained cont…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-15 00:00

PermaVid: Consistent Video Generation Across Edits via Decoupled Contextual Memory

PermaVid addresses long-term video consistency after edits by using multi-modal memory banks that separate appearance and geometric structure, enabling coherent video generation across time and viewpoints.
arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Kenneth Enevoldsen · 2026-06-12 21:06

MVEB: Massive Video Embedding Benchmark

We introduce the Massive Video Embedding Benchmark (MVEB), a 23-task benchmark for video embeddings spanning classification, zero-shot classification, clustering, pair classification, retrieval, and video-centric question answering. We evaluate 33 models and find that no single m…
arXiv cs.AI TIER_1 English(EN) · Yu Meng, Xiangyang Luo, Letian Li, Wenyuan Jiang, Chen Gao, Xinlei Chen, Yong Li, Xiao-Ping Zhang · 2026-06-12 04:00

TetherCache: Stabilizing Autoregressive Long Video Generation via Gated Recall and Trusted Alignment

arXiv:2606.13035v1 Announce Type: cross Abstract: Autoregressive video diffusion models provide a natural formulation for streaming and variable-length video generation by conditioning newly generated frames on previously generated content. However, extending these models to minu…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-12 00:00

MVEB: Massive Video Embedding Benchmark

A large-scale video embedding benchmark evaluates diverse models across multiple video understanding tasks, revealing that different model architectures excel in specific domains and demonstrating the nuanced impact of audio on performance based on dataset characteristics.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-12 00:00

Memento: Reconstruct to Remember for Consistent Long Video Generation

Memento is a subject-reconstruction-guided framework that improves long-form video generation by preserving recurring subjects through memory-based reconstruction and dual-query mechanisms.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 08:16

TetherCache: Stabilizing Autoregressive Long Video Generation via Gated Recall and Trusted Alignment

Autoregressive video diffusion models provide a natural formulation for streaming and variable-length video generation by conditioning newly generated frames on previously generated content. However, extending these models to minute-level generation remains challenging: the limit…
arXiv cs.AI TIER_1 English(EN) · Zijie Meng, Jiwen Liu, Yufei Liu, Chengzhuo Tong, Xiaoqiang Liu, Yuanxing Zhang, Yulong Xu, Pengfei Wan · 2026-06-11 04:00

ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation

arXiv:2606.11670v1 Announce Type: cross Abstract: Subject-preserving video generation is not solved by frontal-face similarity alone: a generated person must remain recognizable across motion, large viewpoint changes, expression shifts, occlusion, scale variation, and conflicts a…
arXiv cs.LG TIER_1 English(EN) · Ishaan Preetam Chandratreya, David Charatan, Basile Van Hoorick, Sergey Zakharov, Vitor Guizilini, Phillip Isola, Vincent Sitzmann · 2026-06-09 04:00

MilliVid: Hierarchical Latent Representations for Long-Range Consistency in Video Generation

arXiv:2606.09056v1 Announce Type: cross Abstract: Video generative models have become increasingly powerful, but long-range consistency remains challenging to achieve because even a few dozen frames require impractically long transformer sequence lengths. We show that this issue …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 00:00

MBench: A Comprehensive Memory Capability Benchmark for Video World Models

A new benchmark called MBench is introduced to evaluate the memory capabilities of video world models, focusing on entity, environment, and causal consistency over extended temporal horizons.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 00:00

MilliVid: Hierarchical Latent Representations for Long-Range Consistency in Video Generation

Video generative models achieve improved long-range consistency through coarse-to-fine token generation using a multi-scale autoencoder and diffusion model architecture.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 00:00

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

LoomVideo presents an efficient 5B-parameter unified architecture for video generation and editing that reduces computational overhead through novel conditioning mechanisms and multi-modal alignment techniques.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 00:00

RhymeFlow: Training-Free Acceleration of Video Generation with Asynchronous Denoising Flow Scheduling

RhymeFlow accelerates diffusion transformers for video generation by decoupling denoising trajectories across frames, using keyframe anchoring and latent trajectory projection to maintain visual quality while reducing computational overhead.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-03 00:00

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Echo Infinity enables real-time infinite video generation using learnable evolving memory and unified relative RoPE to overcome limitations in existing autoregressive methods.
arXiv cs.AI TIER_1 English(EN) · Chenxu Wang, Mingda Chen · 2026-06-02 04:00

Knowledge-Intensive Video Generation

arXiv:2606.01285v1 Announce Type: cross Abstract: Text-to-video generation has advanced rapidly in visual quality, but remains under-evaluated for factuality and practical usefulness. We introduce knowledge-intensive video generation (KIVI), where models generate videos from shor…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-01 00:00

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

LongLive-RAG addresses long-video generation challenges by using retrieval-augmented generation to overcome error accumulation from sliding-window attention, enabling better temporal coherence and quality.
arXiv cs.CV TIER_1 English(EN) · Siyi Chen, Shaowei Liu, Yixuan Jia, Zian Wang, Huan Ling, Qing Qu, Jun Gao · 2026-06-24 04:00

Data Forcing Distillation: Recovering Diversity and Fidelity in Few-Step Video Generation

arXiv:2606.18478v2 Announce Type: replace Abstract: Recent progress has shown promise in distilling multi-step video diffusion models into efficient few-step students. Among them, Distribution Matching Distillation (DMD) and its successor DMD2 achieved strong generation quality a…
arXiv cs.CV TIER_1 English(EN) · Hui Ren, Yuval Alaluf, Omer Bar Tal, Alexander Schwing, Antonio Torralba, Yael Vinker · 2026-06-19 04:00

VideoSketcher: Leveraging Video Model Priors for Sequential Sketch Generation

arXiv:2602.15819v2 Announce Type: replace Abstract: Sketching is inherently sequential: strokes are drawn progressively to explore and refine ideas. Yet most generative approaches treat sketches as static images, ignoring the temporal process underlying creative exploration. Mode…
arXiv cs.CV TIER_1 English(EN) · Yang Tan, Junlong Tong, Linan Yue, Hao Wu, Pengfei Fang, Xiaoyu Shen · 2026-06-19 04:00

ViCoStream: Stage-Wise Collaborative Inference Enables Video LLMs to Stream at Over 100 FPS

arXiv:2606.19849v1 Announce Type: new Abstract: Streaming VideoLLMs must continuously process incoming video while maintaining low query latency, making both video-ingestion throughput and query-time responsiveness critical for real-time deployment. Existing methods largely focus…
arXiv cs.CV TIER_1 English(EN) · Yin Li · 2026-06-17 05:27

UniTemp: Unlocking Video Generation in Arbitrary Temporal Order via Bidirectional Distillation

Autoregressive video diffusion models have emerged as a promising approach for long video generation, achieving strong performance in streaming settings. However, existing methods are restricted to forward temporal generation, whereas practical video creation often requires flexi…
arXiv cs.CV TIER_1 English(EN) · Zhe Zhao · 2026-06-17 01:39

Bridging Creative Intent and Visual Quality: Creator-Driven, In-the-Loop Video Generation via Agentic Feedback Loops

Generative AI has made content creation increasingly accessible, but many AI-generated videos lack narrative coherence and creative direction, issues that become more substantial at longer durations. Unlike coding, where AI generation benefits from reliable feedback and technique…
arXiv cs.CV TIER_1 English(EN) · Jun Gao · 2026-06-16 20:38

Data Forcing Distillation: Recovering Diversity and Fidelity in Few-Step Video Generation

Recent progress has shown promise in distilling multi-step video diffusion models into efficient few-step students. Among them, Distribution Matching Distillation (DMD) and its successor DMD2 achieved strong generation quality and fast convergence. However, due to the nature of t…
arXiv cs.CV TIER_1 English(EN) · Changsheng Xu · 2026-06-16 11:18

LiveStarPro: Proactive Streaming Video Understanding with Hierarchical Memory for Long-Horizon Streams

Despite the remarkable progress of Video Large Language Models (Video-LLMs), current online architectures still struggle to simultaneously process continuous video streams, decide autonomously when to respond, and preserve long-horizon contextual memory. These obstacles undermine…
arXiv cs.CV TIER_1 English(EN) · Yizhou Zhao, Yifan Wang, Xiaoyuan Wang, Yushu Wu, Hao Zhang, Moayed Haji-Ali, Rameen Abdal, Ashkan Mirzaei, Yanyu Li, Willi Menapace, Laszlo Jeni, Sergey Tulyakov, Peter Wonka, Chaoyang Wang · 2026-06-16 04:00

GeoStream: Towards Accurate Camera-Controlled Streaming Video Generation

arXiv:2606.15162v1 Announce Type: new Abstract: Accurate interactive camera control is essential for video-based world models, but most existing approaches learn camera motion implicitly, leading to inaccurate control under out-of-distribution trajectories. Explicit geometric con…
arXiv cs.CV TIER_1 English(EN) · Xinlei Yin, Xiulian Peng, Xiao Li, Zhiwei Xiong, Yan Lu · 2026-06-16 04:00

Closed-Loop Triplet Collaboration for Long Video Generation

arXiv:2606.16184v1 Announce Type: new Abstract: Multi-shot long-form video generation remains challenging due to identity drift and compounding inconsistencies across shots. While storyboard-driven pipelines improve controllability, they are often executed in a feed-forward manne…
arXiv cs.CV TIER_1 English(EN) · Shuai Yang, Bingjie Gao, Ziwei Liu, Jiaqi Wang, Dahua Lin, Tong Wu · 2026-06-16 04:00

PermaVid: Consistent Video Generation Across Edits via Decoupled Contextual Memory

arXiv:2606.16449v1 Announce Type: new Abstract: Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs stru…
arXiv cs.CV TIER_1 English(EN) · Chaoyu Li, Tianzhi Li, Fei Tao, Zhenyu Zhao, Ziqian Wu, Maozheng Zhao, Juntong Song, Cheng Niu, Pooyan Fazli · 2026-06-16 04:00

FrameOracle: Learning What to See and How Much to See in Videos

arXiv:2510.03584v3 Announce Type: replace Abstract: Vision-language models (VLMs) advance video understanding but operate under tight computational budgets, making performance dependent on selecting a small, high-quality subset of frames. Existing frame sampling strategies, such …
arXiv cs.CV TIER_1 English(EN) · Tong Wu · 2026-06-15 09:20

PermaVid: Consistent Video Generation Across Edits via Decoupled Contextual Memory

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after suc…
arXiv cs.CV TIER_1 English(EN) · Xuan Wei, Longbin Ji, Guan Wang, Xiangrui Liu, Zhenyu Zhang, Shuohuan Wang, Yu Sun, Qingqi Hong · 2026-06-15 04:00

Memento: Reconstruct to Remember for Consistent Long Video Generation

arXiv:2606.14667v1 Announce Type: new Abstract: Long-form video generation requires recurring subjects to remain consistent across various shots, viewpoints, motions, and scene transitions. Existing temporal decomposition methods improve scalability by generating videos shot by s…
arXiv cs.CV TIER_1 English(EN) · Qingqi Hong · 2026-06-12 17:37

Memento: Reconstruct to Remember for Consistent Long Video Generation

Long-form video generation requires recurring subjects to remain consistent across various shots, viewpoints, motions, and scene transitions. Existing temporal decomposition methods improve scalability by generating videos shot by shot. However, they mainly focus on optimizing pl…
arXiv cs.CV TIER_1 English(EN) · Xiao-Ping Zhang · 2026-06-11 08:16

TetherCache: Stabilizing Autoregressive Long Video Generation via Gated Recall and Trusted Alignment

Autoregressive video diffusion models provide a natural formulation for streaming and variable-length video generation by conditioning newly generated frames on previously generated content. However, extending these models to minute-level generation remains challenging: the limit…
arXiv cs.CV TIER_1 English(EN) · Jingxu Zhang, Yuqian Hong, Daneul Kim, Kai Qiu, Qi Dai, Jianmin Bao, Yifan Yang, Xiaoyan Sun, Chong Luo · 2026-06-11 04:00

A Comprehensive Ecosystem for Open-Domain Customized Video Generation

arXiv:2606.11783v1 Announce Type: new Abstract: Recent progress in video generation has shown impressive visual synthesis capabilities. However, open-domain customized video generation remains limited by the lack of large-scale, annotated datasets capturing diverse identity-speci…
arXiv cs.CV TIER_1 English(EN) · Chong Luo · 2026-06-10 08:15

A Comprehensive Ecosystem for Open-Domain Customized Video Generation

Recent progress in video generation has shown impressive visual synthesis capabilities. However, open-domain customized video generation remains limited by the lack of large-scale, annotated datasets capturing diverse identity-specific attributes. To address this, we introduce Pe…
arXiv cs.CV TIER_1 English(EN) · Cong Wang, Zhentao Yu, Hongmei Wang, Weicong Liang, Zixiang Zhou, Zilin Yang, Jiarong Ou, Rui Chen, Yuan Zhou, Qinglin Lu · 2026-06-10 04:00

HarmoView: Harmonizing Multi-View Constraints for Identity-Consistent Video Generation

arXiv:2606.10839v1 Announce Type: new Abstract: Current identity-consistent video generation methods struggle to preserve appearance fidelity under large viewpoint changes. While introducing multi-view reference input offers a natural solution, progress remains constrained by the…
arXiv cs.CV TIER_1 English(EN) · Qinglin Lu · 2026-06-09 13:26

HarmoView: Harmonizing Multi-View Constraints for Identity-Consistent Video Generation

Current identity-consistent video generation methods struggle to preserve appearance fidelity under large viewpoint changes. While introducing multi-view reference input offers a natural solution, progress remains constrained by the lack of effective frameworks for multi-view inp…
arXiv cs.CV TIER_1 English(EN) · Xinshuang Liu, Runfa Blark Li, Truong Nguyen · 2026-06-08 04:00

Consistency-Preserving Diverse Video Generation

arXiv:2602.15287v2 Announce Type: replace Abstract: Text-to-video generation is expensive, so only a few samples are typically produced per prompt. In this low-sample regime, maximizing the value of each batch requires high cross-video diversity. Recent methods improve diversity …
arXiv cs.CV TIER_1 English(EN) · Tao Liu, Leela Krishna, Gouti Pavan Kumar, Sreeja K, Vishav Garg · 2026-06-05 04:00

V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation

arXiv:2606.05665v1 Announce Type: new Abstract: Video-to-video (V2V) generation is difficult to evaluate because outputs must both follow editing instructions and preserve frame-level correspondence with the source video, which existing T2V and I2V metrics do not capture. We intr…
arXiv cs.CV TIER_1 English(EN) · Jianzong Wu, Hao Lian, Jiongfan Yang, Dachao Hao, Ye Tian, Yunhai Tong, Jingyuan Zhu, Biaolong Chen, Qiaosong Qi, Aixi Zhang, Wanggui He, Mushui Liu, Jinlong Liu, Hao Jiang · 2026-06-05 04:00

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

arXiv:2606.06042v1 Announce Type: new Abstract: Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly rely on massive models (typically …
arXiv cs.CV TIER_1 English(EN) · Hao Jiang · 2026-06-04 11:35

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source v…
arXiv cs.CV TIER_1 English(EN) · Yuxuan Bian, Zeyue Xue, Songchun Zhang, Shiyi Zhang, Weiyang Jin, Yaowei Li, Junhao Zhuang, Haoran Li, Jie Huang, Haoyang Huang, Nan Duan, Qiang Xu · 2026-06-04 04:00

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

arXiv:2606.04527v1 Announce Type: cross Abstract: We present Echo Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Exi…
arXiv cs.CV TIER_1 English(EN) · Qiang Xu · 2026-06-03 07:09

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

We present Echo Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Existing methods mainly curate memory with predefined…
arXiv cs.CV TIER_1 English(EN) · Zhengxuan Wei, Xu Guo, Xinghui Li, Xunzhi Xiang, Min Wei, Yiran Zhu, Qiulin Wang, Xintao Wang, Pengfei Wan, Xiangwang Hou, Qi Fan · 2026-06-02 04:00

Geometry-Aware Implicit Memory for Video World Models

arXiv:2606.02436v1 Announce Type: new Abstract: Video world models aim to simulate controllable visual environments, but long-horizon rollouts depend on what the model remembers after observations leave its native context window. Explicit memories retain frames or online 3D recon…
arXiv cs.CV TIER_1 English(EN) · Lei Zhu, Xing Cai, Yingjie Chen, Yiheng Li, Binxin Yang, Hao Liu, Jie Chen, Chen Li, Jing LYu · 2026-06-02 04:00

OmniHuman: A Large-Scale Dataset and Benchmark for Human-Centric Video Generation

arXiv:2604.18326v2 Announce Type: replace Abstract: Recent advancements in audio-video joint generation models have demonstrated impressive capabilities in content creation. However, generating high-fidelity human-centric videos in complex, real-world physical scenes remains a si…
arXiv cs.CV TIER_1 English(EN) · Qixin Hu, Shuai Yang, Wei Huang, Song Han, Yukang Chen · 2026-06-02 04:00

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

arXiv:2606.02553v1 Announce Type: new Abstract: Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention du…
arXiv cs.CV TIER_1 English(EN) · Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee, Hyunwoo J. Kim · 2026-06-02 04:00

Retrieving What's Missing: Maximizing Coverage for Consistent Long Video Generation

arXiv:2606.02479v1 Announce Type: new Abstract: Maintaining long-term geometric consistency remains challenging for long-horizon autoregressive video generation. Memory-augmented generative models address this by retrieving historical frames, but their effectiveness depends on tw…
arXiv cs.CV TIER_1 English(EN) · Shengjun Zhang, Zhang Zhang, Simin Huang, Zhenyu Tang, Hanyang Wang, Chensheng Dai, Min Chen, Yifan Li, Yuxin Li, Yingjie Chen, Hao Liu, Chen Li, Yueqi Duan · 2026-06-02 04:00

MBench: A Comprehensive Memory Capability Benchmark for Video World Models

arXiv:2606.00793v1 Announce Type: new Abstract: Recent advancements in video-based world models have demonstrated an unprecedented ability to synthesize high-fidelity visual sequences. However, a fundamental gap persists between visually plausible video generation and the functio…
arXiv cs.CV TIER_1 English(EN) · Yukang Chen · 2026-06-01 17:50

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible ge…
arXiv cs.CV TIER_1 English(EN) · Hyunwoo J. Kim · 2026-06-01 16:49

Retrieving What's Missing: Maximizing Coverage for Consistent Long Video Generation

Maintaining long-term geometric consistency remains challenging for long-horizon autoregressive video generation. Memory-augmented generative models address this by retrieving historical frames, but their effectiveness depends on two key design choices: what 3D-geometric evidence…
arXiv cs.CV TIER_1 English(EN) · Qi Fan · 2026-06-01 16:08

Geometry-Aware Implicit Memory for Video World Models

Video world models aim to simulate controllable visual environments, but long-horizon rollouts depend on what the model remembers after observations leave its native context window. Explicit memories retain frames or online 3D reconstructions, which can suffer from heuristic retr…
arXiv cs.CV TIER_1 English(EN) · Lin Zhao, Yushu Wu, Yifan Gong, Yanzhi Wang, Pu Zhao · 2026-06-01 04:00

OmniMem: Scalable and Adaptive Memory Retrieval for Long Video Generation

arXiv:2605.30519v1 Announce Type: new Abstract: Autoregressive (AR) video generation extends videos by producing latent chunks sequentially, but scaling to long videos requires repeated access to a growing historical KV cache. Existing methods reduce this cost by truncating the K…
dev.to — Claude Code tag TIER_1 English(EN) · Aliaksei Zelianouski · 2026-06-13 20:46

The Self-Generating Video Pipeline I Built

Let me show you something cool. This two-minute video was built by Claude Code from a single prompt. Okay, one prompt and about thirty follow-ups. And then twenty more after Claude Code fumbled a git command and wiped out half of my video-editing material (don…
r/StableDiffusion TIER_2 English(EN) · /u/Sporeboss · 2026-06-22 03:04

PermaVid: Consistent Video Generation Across Edits via Decoupled Contextual Memory (GitHub link in description, includes 400GB training dataset)

https://ys-imtech.github.io/projects/PermaVid/ https://huggingface.co/datasets/ysmikey/PermaVid_datasets …
r/StableDiffusion TIER_2 Dansk(DA) · /u/hpyfox · 2026-06-18 16:23

Finding old video generation models

https://www.reddit.com/r/StableDiffusion/comments/1u9ava9/finding_old_video_generation_models/ …
r/StableDiffusion TIER_2 English(EN) · /u/DesireForDopamine · 2026-06-16 18:36

SCAIL-2 Infinity: a single node for unlimited-length video (no more chaining samplers) + Pusa LoRA integration

https://www.reddit.com/r/StableDiffusion/comments/1u7m255/scail2_infinity_a_single_node_for_unlimitedlength/ …
r/StableDiffusion TIER_2 Dansk(DA) · /u/DoskvolDenizen · 2026-06-08 10:31

Advice on an overall image-generation pipeline for video keyframes

Been learning stable diffusion text to image and image to video with comfyui for a few months. Now I have so many tools at my disposal that I'm feeling a bit lost, so I'm hoping that people in here won't mind sharing some advice on an over…
