Hacker News with Generative AI: Video Generation

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation (lllyasviel.github.io)
Diffuse thousands of frames at a full 30 fps with 13B models using 6 GB of laptop GPU memory. Finetune a 13B video model at batch size 64 on a single 8xA100/H100 node for personal/lab experiments. A personal RTX 4090 generates at 2.5 seconds/frame (unoptimized) or 1.5 seconds/frame (with TeaCache). No timestep distillation. Video diffusion, but feels like image diffusion.
Tom and Jerry One-Minute Video Generation with Test-Time Training (test-time-training.github.io)
Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos with strong temporal consistency and motion smoothness.
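The TTT idea can be illustrated with a deliberately minimal sketch: the layer's hidden state is a weight matrix that is updated at every timestep by one gradient step on a self-supervised reconstruction loss, and the layer output is the updated weights applied to the current input. The zero initialization, learning rate, and plain reconstruction loss here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def ttt_linear_layer(xs: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Minimal linear test-time-training (TTT) layer sketch.

    xs: (seq_len, dim) input sequence.
    The hidden state W is trained online: one gradient step per token on
    the loss 0.5 * ||W x - x||^2, then W is applied to produce the output.
    """
    d = xs.shape[1]
    W = np.zeros((d, d))            # illustrative init; real TTT layers differ
    outs = []
    for x in xs:
        err = W @ x - x             # reconstruction residual
        W -= lr * np.outer(err, x)  # gradient of 0.5*||W x - x||^2 w.r.t. W
        outs.append(W @ x)          # output uses the freshly updated state
    return np.stack(outs)
```

Because the state update is itself learning, the layer's "memory" grows with sequence length without the quadratic cost of full attention, which is what makes minute-long generation tractable.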
Show HN: VaporVibe – auto-generate video demos for vibe-coded projects (influme.ai)
Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io)
TL;DR: Video generation with DiTs is painfully slow: HunyuanVideo takes 16 minutes to generate just a 5-second video on an H100 with FlashAttention-3. Our sliding tile attention (STA) slashes this to 5 minutes with zero quality loss, no extra training required. Specifically, STA accelerates attention alone by 2.8–17x over FlashAttention-2 and 1.6–10x over FlashAttention-3.
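The core restriction behind tile-based sliding attention can be sketched with a block-sparse mask: queries in tile i attend only to keys in tiles within a fixed window of i. This is a 1D illustrative toy, not the paper's kernel (STA tiles the 3D video volume inside a fused GPU kernel); the tile size and window here are arbitrary assumptions.

```python
import numpy as np

def sliding_tile_mask(n_tiles: int, window: int) -> np.ndarray:
    """Boolean mask: query tile i may attend to key tile j iff |i - j| <= window."""
    idx = np.arange(n_tiles)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def tile_attention(q, k, v, tile: int, window: int):
    """Naive dense attention restricted to a sliding window of tiles.

    q, k, v: (seq_len, dim) arrays; seq_len must be a multiple of `tile`.
    A real implementation skips masked tiles entirely instead of
    computing and discarding them, which is where the speedup comes from.
    """
    n = q.shape[0] // tile
    # expand the tile-level mask to token level
    mask = np.repeat(np.repeat(sliding_tile_mask(n, window), tile, 0), tile, 1)
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v
```

Keeping the sparsity pattern aligned to tiles (rather than per-token windows) is what lets the attention kernel stay dense within each tile pair and map efficiently onto GPU hardware.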
Goku: Flow-Based Video Generative Foundation Models (github.com/Saiyan-World)
Goku is a new family of joint image-and-video generation models based on rectified flow Transformers.
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation (hila-chefer.github.io)
Despite tremendous recent progress, generative video models still struggle to capture real-world motion, dynamics, and physics.
Veo 2: Our video generation model (deepmind.google)
Veo creates videos with realistic motion and high-quality output, up to 4K. Explore different styles and find your own with extensive camera controls.
Veo and Imagen 3: Announcing new video and image generation models on Vertex AI (cloud.google.com)
Generative AI is leading to real business growth and transformation. Among enterprise companies with gen AI in production, 86% report an increase in revenue, with an estimated 6% growth. That’s why Google is investing in its AI technology with new models like Veo, our most advanced video generation model, and Imagen 3, our highest quality image generation model.
OpenAI's Sora has been leaked (techcrunch.com)
A group appears to have leaked access to Sora, OpenAI’s video generator, in protest of what they’re calling duplicity and “art washing” on OpenAI’s part.
The Matrix: a foundation world model for generating infinite-length videos (twitter.com)
Mochi 1 preview. A new SOTA in open-source video generation. Apache 2.0 (twitter.com)
Sora-like text-to-video model from Chinese startup Minimax, 10 examples (twitter.com)
CogVideoX: A Cutting-Edge Video Generation Model (medium.com)
Show HN: AnimeGenAi – AI-powered anime style image and video generator (animegenai.com)
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation (stevenlsw.github.io)
Open-Sora does pretty good video generation on consumer GPUs (backprop.co)
Gen-3 Alpha: A New Frontier for Video Generation (runwayml.com)
Highly realistic talking head video generation (github.com/fudan-generative-vision)
Google announces Veo, their Sora-competing text/image-to-video model (aitestkitchen.withgoogle.com)
StoryDiffusion: Long-range image and video generation (storydiffusion.github.io)
China's VIDU Video Generation AI Competes with OpenAI's Sora (medium.com)