Hacker News with Generative AI: Deep Learning

Titans: Learning to Memorize at Test Time (arxiv.org)
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention.
ε, a Nuisance No More (zna.do)
For a while now I have been advocating for tuning ε in various parts of the modern deep learning stack, and in this post I’ll explain why.
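One place ε shows up is the denominator of the Adam update, where it guards against division by zero but also silently caps the effective step size. A minimal single-parameter sketch (not the post's code) showing exactly where ε enters:

```python
def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter. eps keeps the
    denominator away from zero; when sqrt(v_hat) is of the same
    order as eps, its value visibly changes the step size."""
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v
```

With a unit gradient on the first step, the bias-corrected moments are both 1, so the update is essentially lr / (1 + ε).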
Intrinsic Dimensions: How Learning in Large Models Is Driven by a Few Parameters (medium.com)
Learned over-parameterized models inherently exist within a low intrinsic dimension (Li et al.¹ and Aghajanyan et al.³). To understand this concept better, let’s delve into the following questions:
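The intrinsic-dimension claim is usually operationalized (as in Li et al.) by training only a low-dimensional vector z and mapping it into the full parameter space through a fixed random matrix: θ = θ₀ + Pz. A pure-Python sketch of that reparameterization (function name illustrative):

```python
def subspace_params(theta0, P, z):
    """Reparameterize a D-dim parameter vector through a low-dim vector z:
    theta = theta0 + P @ z, where P is a fixed random D x d matrix.
    Only z is trained; d is the candidate intrinsic dimension."""
    D, d = len(P), len(z)
    return [theta0[i] + sum(P[i][j] * z[j] for j in range(d))
            for i in range(D)]
```

If a model can reach good accuracy with d far smaller than D, that d bounds its intrinsic dimension.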
Show HN: DeepFace – A lightweight deep face recognition library for Python (github.com/serengil)
DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for Python.
1.58-Bit Flux (chenglin-yang.github.io)
We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images.
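For intuition, ternary ("1.58-bit") weight quantization with a per-tensor absmean scale, in the style popularized by BitNet b1.58, can be sketched as follows (FLUX's exact recipe may differ):

```python
def ternary_quantize(weights, eps=1e-8):
    """Round weights to {-1, 0, +1} after dividing by the mean
    absolute value (absmean scaling), clamping to the ternary range."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate real-valued weights from ternary codes."""
    return [v * scale for v in q]
```

Each weight then needs only log2(3) ≈ 1.58 bits, while a single float scale per tensor preserves overall magnitude.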
Tenstorrent Wormhole Series (corsix.org)
A company called Tenstorrent designs and sells PCIe cards for AI acceleration. At the time of writing, they've recently started shipping their Wormhole n150s and Wormhole n300s cards.
Beyond Gradient Averaging in Parallel Optimization (arxiv.org)
We introduce Gradient Agreement Filtering (GAF) to improve on gradient averaging in distributed deep learning optimization.
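As a rough sketch of the idea (the paper's exact filtering rule may differ): compare each worker's gradient against a running average by cosine similarity, and fold in only those that agree beyond a threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def agreement_filtered_average(grads, tau=0.9):
    """Average worker gradients, skipping any whose cosine similarity
    with the current running average falls below tau (illustrative)."""
    avg, kept = list(grads[0]), 1
    for g in grads[1:]:
        if cosine(avg, g) >= tau:
            avg = [(a * kept + b) / (kept + 1) for a, b in zip(avg, g)]
            kept += 1
    return avg
```

Plain averaging would let the disagreeing gradient cancel the others; filtering keeps the consensus direction intact.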
Hallucination of closed repeat proteins containing central pockets (2023) (nature.com)
Inspired by these proteins, we devised a deep-learning-based approach to broadly exploring the space of closed repeat proteins starting from only a specification of the repeat number and length.
The Structure of Neural Embeddings (seanpedersen.github.io)
A small collection of insights on the structure of embeddings (latent spaces) produced by deep neural networks.
DeepSeek-V3 (deepseek.com)
DeepSeek-v3 Technical Report [pdf] (github.com/deepseek-ai)
Exploring LoRA – Part 1: The Idea Behind Parameter Efficient Fine-Tuning (medium.com)
Pre-trained large language models undergo extensive training on vast internet data, yielding exceptional performance across a broad spectrum of tasks. Nonetheless, most real-world scenarios require the model to have expertise in a particular, specialized domain.
No More Adam: Learning Rate Scaling at Initialization Is All You Need (arxiv.org)
In this work, we question the necessity of adaptive gradient methods for training deep neural networks.
FastVideo: a lightweight framework for accelerating large video diffusion models (github.com/hao-ai-lab)
FastVideo is a lightweight framework for accelerating large video diffusion models.
Veo 2: Our video generation model (deepmind.google)
Veo creates videos with realistic motion and high quality output, up to 4K. Explore different styles and find your own with extensive camera controls.
Sequence to sequence learning with neural networks: what a decade (youtube.com)
Founder who built Snap's AI launches a snappy new take on video chatbots (techcrunch.com)
A deep learning scientist whose last startup was acquired by Snap to build its My AI chatbot has raised seed funding for his latest venture: a platform for building and operating real-time, video-based conversational AI agents.
From Unemployment to Lisp: Running GPT-2 on a Teen's Deep Learning Compiler (github.com/hikettei)
This repository is still in the early stages of development. Additionally, it includes many experimental approaches. Please consider this as a place to experiment with my ideas. Do not use it in a product under any circumstances.
Ways to Use Torch.compile (ezyang.com)
On the surface, the value proposition of torch.compile is simple: compile your PyTorch model and it runs X% faster. But after having spent a lot of time helping users from all walks of life use torch.compile, I have found that actually understanding how this value proposition applies to your situation can be quite subtle! In this post, I want to walk through the ways to use torch.compile, and within these use cases, what works and what doesn't.
Convolutional Neural Network Visualization [video] (youtube.com)
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness (arxiv.org)
Optimizing deep learning algorithms currently requires slow, manual derivation, potentially leaving much performance untapped.
Google DeepMind team develops AI weather model (technologyreview.com)
Google DeepMind has unveiled an AI model that’s better at predicting the weather than the current best systems.
Google's DeepMind tackles weather forecasting, with great performance (arstechnica.com)
By some measures, AI systems are now competitive with traditional computing methods for generating weather forecasts.
Laser: Attention with Exponential Transformation (arxiv.org)
Transformers have had tremendous impact for several sequence related tasks, largely due to their ability to retrieve from any part of the sequence via softmax based dot-product attention.
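The retrieval mechanism referenced here, single-query softmax dot-product attention, in a self-contained pure-Python form:

```python
import math

def attention(q, keys, values):
    """Weight each value by softmax(q . k / sqrt(d)) and sum:
    a soft lookup over every position in the sequence."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(weights[i] * values[i][j] for i in range(len(values)))
            for j in range(dim)]
```

Because the softmax weights are strictly positive, every position contributes to the output, which is what lets the model retrieve from any part of the sequence.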
A Deep Dive into DDPMs (magic-with-latents.github.io)
DDPMs - Part 3
The Machine and Deep Learning Compendium (gitbook.io)
Hi! When I created the Machine & Deep Learning Compendium, it was a personal list of resources curated in a private Google document, for my own education.
Grant Sanderson: Visualizing transformers and attention [video] (youtube.com)
Using uv with PyTorch (astral.sh)
The PyTorch ecosystem is a popular choice for deep learning research and development. You can use uv to manage PyTorch projects and PyTorch dependencies across different Python versions and environments, even controlling for the choice of accelerator (e.g., CPU-only vs. CUDA).
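A minimal pyproject.toml sketch following uv's documented PyTorch guidance, pinning torch to the CPU-only index (project name and version illustrative):

```toml
[project]
name = "example"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["torch"]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cpu" }
```

Swapping the index URL (e.g., to a CUDA wheel index) switches the accelerator without touching the dependency list.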
PyTorch 101: Understanding Graphs, Automatic Differentiation and Autograd (digitalocean.com)
PyTorch is one of the foremost Python deep learning libraries. It's the go-to choice for deep learning research, and with each passing day more companies and research labs are adopting it.