Hacker News with Generative AI: Deep Learning

AI image recognition detects bubble-like structures in the universe (phys.org)
To learn more about the deepest reaches of our own galaxy and the mysteries of star formation, Japanese researchers have created a deep learning model.
The Matrix Calculus You Need for Deep Learning (explained.ai)
Most of us last saw calculus in school, but derivatives are a critical part of machine learning, particularly deep neural networks, which are trained by optimizing a loss function.
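As an illustration (not from the article), this is what the loss-derivative machinery looks like in PyTorch, with the hand-derived gradient checked against autograd:

```python
# Minimal illustration: gradient of a squared-error loss w.r.t. a weight vector,
# derived by hand and checked against PyTorch autograd.
import torch

x = torch.tensor([1.0, 2.0, 3.0])            # one input example
y = torch.tensor(10.0)                        # target
w = torch.tensor([0.1, 0.2, 0.3], requires_grad=True)

loss = (w @ x - y) ** 2                       # L(w) = (w.x - y)^2
loss.backward()

manual_grad = 2 * (w.detach() @ x - y) * x    # dL/dw = 2(w.x - y) x
print(torch.allclose(w.grad, manual_grad))    # True
```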
Self-Supervised Learning from Images with JEPA (2023) (arxiv.org)
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations.
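In broad strokes, JEPA-style training predicts the representations of masked target regions from a context view rather than reconstructing pixels; a much-simplified sketch of that objective (the encoder, target encoder, and predictor modules are placeholders, not the paper's architecture):

```python
# Much-simplified sketch of a JEPA-style objective (illustrative, not the paper's code):
# predict target-patch representations from a context view, in representation space.
import torch
import torch.nn.functional as F

def jepa_loss(context_encoder, target_encoder, predictor, ctx_patches, tgt_patches):
    # Target representations come from a frozen/EMA "teacher" encoder, no gradients.
    with torch.no_grad():
        tgt_repr = target_encoder(tgt_patches)
    # Predict those representations from the context encoding.
    pred = predictor(context_encoder(ctx_patches))
    # The loss lives in representation space, not pixel space.
    return F.mse_loss(pred, tgt_repr)
```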
How DeepSeek Rewrote the Transformer [video] (youtube.com)
Physics-Based Deep Learning v4 (arxiv.org)
This document is a hands-on, comprehensive guide to deep learning in the realm of physical simulations.
Optimizing ML training with metagradient descent (arxiv.org)
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space.
VGGT: Visual Geometry Grounded Transformer (github.com/facebookresearch)
DeepSeek V3 is now the highest scoring non-reasoning model (twitter.com)
The Original 2012 AlexNet Is Open Source Now (github.com/computerhistory)
This package contains the original 2012 AlexNet code.
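Not part of the release, but for comparison, the modern torchvision re-implementation of the same architecture can be inspected in a couple of lines:

```python
# For comparison with the 2012 release: torchvision's AlexNet re-implementation.
import torchvision.models as models

model = models.alexnet(weights=None)  # architecture only, no pretrained weights
print(model)  # 5 conv layers + 3 fully connected layers, with ReLU and dropout
```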
Attention is NOT all you need (twitter.com)
MIT 6.S191: Deep Generative Modeling [video] (youtube.com)
Deepseek V3-0324 (huggingface.co)
Mac Studio M3 Ultra can run Deepseek R1 671B in memory using <200W (techradar.com)
PyTorch Internals: Ezyang's Blog (ezyang.com)
This post is a long-form essay version of a talk on PyTorch internals that I gave at the PyTorch NYC meetup on May 14, 2019.
Quantitative Finance: Kronecker-Factored Approximate Curvature Deep Hedging (arxiv.org)
This paper advances the computational efficiency of Deep Hedging frameworks through the novel integration of Kronecker-Factored Approximate Curvature (K-FAC) optimization.
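For orientation, K-FAC in general approximates each layer's Fisher matrix as a Kronecker product of two small covariances and preconditions the gradient with their inverses; a toy sketch of that general idea (not the paper's hedging code):

```python
# Toy sketch of the general K-FAC idea (not the paper's implementation):
# approximate a linear layer's Fisher block as S (x) A and precondition the gradient.
import torch

def kfac_precondition(grad_W, acts, grad_out, damping=1e-3):
    # grad_W: (out, in) weight gradient; acts: (batch, in) inputs;
    # grad_out: (batch, out) pre-activation gradients.
    A = acts.T @ acts / acts.shape[0] + damping * torch.eye(acts.shape[1])
    S = grad_out.T @ grad_out / grad_out.shape[0] + damping * torch.eye(grad_out.shape[1])
    left = torch.linalg.solve(S, grad_W)   # S^{-1} grad_W
    return torch.linalg.solve(A, left.T).T # (S^{-1} grad_W) A^{-1}, since A is symmetric
```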
Physics-based Deep Learning Book (v0.3, the GenAI edition) (physicsbaseddeeplearning.org)
Welcome to the Physics-based Deep Learning Book (v0.3, the GenAI edition) 👋
Running DeepSeek R1 on my desk (twitter.com)
Nvidia Blackwell Delivers World-Record DeepSeek-R1 Inference Performance (nvidia.com)
NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025.
Implementing a GPT Model from Scratch [video] (youtube.com)
Deep Learning Is Not So Mysterious or Different (arxiv.org)
Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization.
Transformers Without Normalization (jiachenzhu.github.io)
This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple technique.
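The technique in question is, per the paper, Dynamic Tanh (DyT): an elementwise squashing layer used in place of LayerNorm. A minimal sketch of the idea (initialization values here are assumptions):

```python
# Minimal sketch of the Dynamic Tanh (DyT) idea: a drop-in replacement for
# normalization layers in Transformer blocks (init values are assumptions).
import torch
import torch.nn as nn

class DyT(nn.Module):
    def __init__(self, dim, alpha_init=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))            # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))            # per-channel shift

    def forward(self, x):
        # Squash with tanh instead of normalizing by feature statistics.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta
```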
DeepSeek-R1 now available as a managed serverless model in Amazon Bedrock (amazon.com)
As of January 30, DeepSeek-R1 models became available in Amazon Bedrock through the Amazon Bedrock Marketplace and Amazon Bedrock Custom Model Import. Since then, thousands of customers have deployed these models in Amazon Bedrock. Customers value the robust guardrails and comprehensive tooling for safe AI deployment. Today, we’re making it even easier to use DeepSeek in Amazon Bedrock through an expanded range of options, including a new serverless solution.
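A hypothetical invocation sketch using boto3's Converse API; the model ID below is a placeholder to be replaced with the serverless DeepSeek-R1 ID listed in the Bedrock console:

```python
# Hypothetical sketch of calling a serverless Bedrock model via the Converse API.
# The modelId is a placeholder, not a verified identifier.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.converse(
    modelId="<deepseek-r1-serverless-model-id>",  # placeholder
    messages=[{"role": "user", "content": [{"text": "Explain KV caching briefly."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```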
Understanding Transformers (beyond the Math) – kalomaze's kalomazing blog (bearblog.dev)
Maybe you don't want to attempt the conventional approaches for understanding the transformer architecture for language models. If you're anything like me, an informal approach is what you'd prefer - one that helps you reason about what's happening with these models in the abstract, without requiring mastery on the technical level to begin with.
RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals (arxiv.org)
Transformers have achieved great success in effectively processing sequential data such as text.
InstantStyle: Free Lunch Towards Style-Preserving in Text-to-Image Generation (github.com/instantX-research)
InstantStyle is a general framework that employs two straightforward yet potent techniques for achieving an effective disentanglement of style and content from reference images.
Paper Review: Variational Lossy Auto-Encoders (theahura.substack.com)
This post is part of a series of paper reviews covering the ~30 papers Ilya Sutskever sent to John Carmack to learn about AI.
Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf] (arxiv.org)
Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.
ChatGPT is made from 100M of these [The Perceptron] [video] (youtube.com)
DiffRhythm: Fast End-to-End Full-Length Song Generation with Latent Diffusion (aslp-lab.github.io)
Recent advancements in music generation have garnered significant attention, yet existing approaches face critical limitations.
Cautious Optimizers: Improving Training with One Line of Code (arxiv.org)
AdamW has been the default optimizer for transformer pretraining. For many years, our community searched for faster and more stable optimizers with only limited positive outcomes. In this work, we propose a single-line modification in PyTorch to any momentum-based optimizer, which we rename the cautious optimizer, e.g. C-AdamW and C-Lion.
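A minimal sketch of the masking idea as described in the abstract (illustrative, not the authors' exact code): zero out the components of the update whose sign disagrees with the current gradient, and rescale the rest.

```python
# Illustrative sketch of "cautious" masking: keep only update components whose
# sign agrees with the current gradient, rescaling to preserve update magnitude.
import torch

def cautious(update, grad, eps=1e-8):
    mask = (update * grad > 0).to(update.dtype)        # sign-agreement mask
    mask = mask * (mask.numel() / (mask.sum() + eps))  # compensate for masked entries
    return update * mask
```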