Hacker News with Generative AI: Generative Models

GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting (ucwxb.github.io)
Gaze estimation encounters generalization challenges when dealing with out-of-distribution data.
The geometry of data: the missing metric tensor and the Stein score [Part II] (christianperone.com)
I’m writing this second part of the series because I couldn’t find any formalisation of the metric tensor that naturally arises from the Stein score (especially when used with learned models), much less blog posts or articles about it, which is surprising given its deep connection to score-based generative models, diffusion models and the geometry of the data manifold.
GenXD: Generating Any 3D and 4D Scenes (arxiv.org)
Recent developments in 2D visual generation have been remarkably successful. However, 3D and 4D generation remain challenging in real-world applications due to the lack of large-scale 4D data and effective model design.
Lotus: Diffusion-Based Visual Foundation Model for High-Quality Dense Prediction (lotus3d.github.io)
We present Lotus, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. In the accompanying comparison figure, "Avg. Rank" indicates the average ranking across all metrics, where lower values are better, and bar length represents the amount of training data used.
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency (loopyavatar.github.io)
Degas: Detailed Expressions on Full-Body Gaussian Avatars (initialneil.github.io)
The Path to StyleGan2 – Implementing the Progressive Growing GAN (ym2132.github.io)
Stable Diffusion 3 Medium Released (huggingface.co)
Level of Gaussians: Real-Time View Synthesis for Millions of Square Meters (zju3dv.github.io)
VideoGigaGAN: Towards Detail-Rich Video Super-Resolution (videogigagan.github.io)