Hacker News with Generative AI: Deep Learning

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch (arxiv.org)
CUDA Graphs -- a recent hardware feature introduced for NVIDIA GPUs -- aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG. However, deploying CUDA Graphs today faces several challenges due to the static structure of a graph, and they incur performance overhead from extra data copies. In fact, we show a counter-intuitive result -- deploying CUDA Graphs hurts performance in many cases.
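For context, a minimal sketch of the mechanism the paper builds on, using PyTorch's public `torch.cuda.CUDAGraph` API (the model, shapes, and warmup count here are illustrative). The static input buffer also makes visible where the data-copy overhead mentioned above comes from.

```python
import torch

model = torch.nn.Linear(64, 64).to("cuda")
static_input = torch.randn(8, 64, device="cuda")

# Warm up on a side stream (required before capture).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture: every kernel launched inside this block is recorded into a DAG.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# Replay: fresh data must first be copied into the captured buffer --
# this copy is the kind of overhead the abstract refers to.
static_input.copy_(torch.randn(8, 64, device="cuda"))
g.replay()
print(static_output.shape)
```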
Three things everyone should know about Vision Transformers (arxiv.org)
After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and video analysis.
Double Descent Demystified: size of smallest non-zero singular value of X (arxiv.org)
Double descent is a surprising phenomenon in machine learning in which, as the number of model parameters grows relative to the number of data points, test error drops as models grow ever larger into the highly overparameterized (data-undersampled) regime.
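A small NumPy illustration of the quantity in the title (the sizes and noise level are arbitrary): near the interpolation threshold n ≈ d, the data matrix X tends to have a tiny smallest non-zero singular value, and the minimum-norm least-squares fit amplifies noise by roughly its inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 49  # near the interpolation threshold n ≈ d
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Smallest non-zero singular value of X.
sigma = np.linalg.svd(X, compute_uv=False)
sigma_min = sigma[sigma > 1e-10].min()

# Minimum-norm interpolating solution w = X^+ y; its norm blows up
# as sigma_min shrinks, which is the peak of the double-descent curve.
w = np.linalg.pinv(X) @ y
print(f"smallest non-zero singular value: {sigma_min:.4f}")
print(f"norm of fitted weights: {np.linalg.norm(w):.2f}")
```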
Improving Deep Learning with a Little Help from Physics (quantamagazine.org)
Rose Yu has a plan for how to make AI better, faster and smarter — and it’s already yielding results.
Show HN: Keep your PyTorch model in VRAM by hot swapping code (github.com/valine)
This is an example of how to hotswap PyTorch training code without unloading your model weights from VRAM.
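The general trick, sketched with a hypothetical `train_step` module (this is the generic `importlib.reload` pattern, not necessarily the repo's exact mechanism): keep the model object alive in the driver process so its CUDA allocations persist, and reload only the module holding the training-step code.

```python
import importlib
import torch
import train_step  # hypothetical module defining step(model, optimizer)

model = torch.nn.Linear(512, 512).to("cuda")   # weights land in VRAM once
optimizer = torch.optim.AdamW(model.parameters())

for i in range(1000):
    importlib.reload(train_step)               # pick up code edits from disk
    loss = train_step.step(model, optimizer)   # model tensors never reloaded
    if i % 100 == 0:
        print(i, loss)
```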
Sparsely-Gated Mixture of Experts (MoE) (thegreenplace.net)
In transformer models, the attention block is typically followed by a feed-forward layer (FF): a simple fully connected network with one hidden layer and a nonlinearity.
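As a reference point, a minimal sketch of the sparsely-gated MoE idea the post builds toward (the dimensions and top-k choice are illustrative): the single FF block is replaced by several expert FF blocks, and a learned gate routes each token to its top-k experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        # Each expert is the usual FF block: linear -> nonlinearity -> linear.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.gate(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1) # keep top-k experts/token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(16, 64)          # 16 tokens
print(SparseMoE()(x).shape)      # torch.Size([16, 64])
```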
A curated blog for learning LLM internals: tokenize, attention, PE, and more (ycombinator.com)
I've been diving deep into the internals of Large Language Models (LLMs) and started documenting my findings.
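As a taste of the "attention" item on that list, a minimal single-head scaled dot-product attention sketch (shapes are illustrative; no masking or learned projections):

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V -- the core op behind transformer attention
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 10, 64)   # (batch, tokens, dim), illustrative
print(attention(q, k, v).shape)      # torch.Size([1, 10, 64])
```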
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation (lllyasviel.github.io)
Diffuse thousands of frames at a full 30 fps with 13B models using 6GB of laptop GPU memory. Finetune a 13B video model at batch size 64 on a single 8xA100/H100 node for personal/lab experiments. A personal RTX 4090 generates at 2.5 seconds/frame (unoptimized) or 1.5 seconds/frame (with TeaCache). No timestep distillation. Video diffusion, but it feels like image diffusion.
Show HN: I built a deep learning engine from scratch in Python (github.com/whitegra)
It implements deep learning architectures and training logic without relying on NumPy, PyTorch, or any external libraries. Every operation -- tensor arithmetic, backpropagation, attention, and optimization -- is executed through hand-written, minimal Python logic.
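For flavor, a generic pure-Python scalar autograd sketch in the same spirit (this is not the repo's code; the `Scalar` class and single `__mul__` op are illustrative):

```python
class Scalar:
    # Each value remembers its parents and a rule for pushing gradients back.
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents = parents
        self._backward = lambda: None

    def __mul__(self, other):
        out = Scalar(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backprop(self):
        # Topological order, then apply each local rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, w = Scalar(3.0), Scalar(2.0)
y = x * w
y.backprop()
print(x.grad, w.grad)  # 2.0 3.0
```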
Understanding Some Limits of DeepSeek (ycombinator.com)
Recently I asked DeepSeek how to use JavaScript to extract data from Moodle and make computations with it in an educational setting. I noted that the program did not consider two crucial points: 1) it is of utmost importance that the answer and the grade of the answer be in the same row; 2) it must not modify the student's answer. In this case the answers are ten letters, given in response to a test with ten questions.
The path to open-sourcing the DeepSeek inference engine (github.com/deepseek-ai)
A few weeks ago, during Open Source Week, we open-sourced several libraries. The response from the community has been incredibly positive - sparking inspiring collaborations, productive discussions, and valuable bug fixes. Encouraged by this, we’ve decided to take another step forward: contributing our internal inference engine back to the open-source community.
NoProp: Training neural networks without back-propagation or forward-propagation (arxiv.org)
The canonical deep learning approach for learning requires computing a gradient term at each layer by back-propagating the error signal from the output towards each learnable parameter.
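Concretely, the canonical recursion the paper sets out to avoid: with pre-activations $z^{(l)} = W^{(l)} h^{(l-1)}$ and activations $h^{(l)} = \sigma(z^{(l)})$, the error signal $\delta^{(l)} := \partial \ell / \partial z^{(l)}$ is propagated backward layer by layer.

```latex
\delta^{(L)} = \sigma'\!\big(z^{(L)}\big) \odot \nabla_{h^{(L)}} \ell,
\qquad
\delta^{(l)} = \sigma'\!\big(z^{(l)}\big) \odot \big(W^{(l+1)}\big)^{\!\top} \delta^{(l+1)},
\qquad
\frac{\partial \ell}{\partial W^{(l)}} = \delta^{(l)} \, \big(h^{(l-1)}\big)^{\!\top}.
```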
Universal photonic artificial intelligence acceleration (nature.com)
Over the past decade, photonics research has explored accelerated tensor operations, foundational to artificial intelligence (AI) and deep learning, as a path towards enhanced energy efficiency and performance.
Tom and Jerry One-Minute Video Generation with Test-Time Training (test-time-training.github.io)
Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos with strong temporal consistency and motion smoothness.
Deep Learning, Deep Scandal (garymarcus.substack.com)
Deep learning is indeed finally hitting a wall, in the sense of reaching a point of diminishing returns. That’s been clear for months. One of the clearest signs of this is the saga of the just-released Llama 4, the latest failed billion (?) dollar attempt by one of the majors to create what we might call GPT-5 level AI.
AI masters Minecraft: DeepMind program finds diamonds without being taught (nature.com)
An artificial intelligence (AI) system has for the first time figured out how to collect diamonds in the hugely popular video game Minecraft — a difficult task requiring multiple steps — without being shown how to play.
AI image recognition detects bubble-like structures in the universe (phys.org)
To learn more about the deepest reaches of our own galaxy and the mysteries of star formation, Japanese researchers have created a deep learning model.
The Matrix Calculus You Need for Deep Learning (explained.ai)
Most of us last saw calculus in school, but derivatives are a critical part of machine learning, particularly deep neural networks, which are trained by optimizing a loss function.
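A representative worked example of the kind of derivative the guide builds up to: the gradient of a least-squares loss with respect to a weight vector $\mathbf{w}$.

```latex
L(\mathbf{w}) = \lVert X\mathbf{w} - \mathbf{y} \rVert_2^2
             = (X\mathbf{w}-\mathbf{y})^{\top}(X\mathbf{w}-\mathbf{y}),
\qquad
\nabla_{\mathbf{w}} L = 2\, X^{\top} (X\mathbf{w} - \mathbf{y}).
```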
Self-Supervised Learning from Images with JEPA (2023) (arxiv.org)
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations.
How DeepSeek Rewrote the Transformer [video] (youtube.com)
Physics-Based Deep Learning v4 (arxiv.org)
This document is a hands-on, comprehensive guide to deep learning in the realm of physical simulations.
Optimizing ML training with metagradient descent (arxiv.org)
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space.
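For intuition, a toy sketch of what a metagradient is (this is generic unrolled differentiation, not the paper's algorithm -- their contribution is making such gradients scale; the data sizes and the choice of learning rate as the meta-parameter are illustrative): run a few differentiable inner SGD steps, then backpropagate a validation loss through the whole run to the training-setup parameter.

```python
import torch

torch.manual_seed(0)
X, y = torch.randn(32, 4), torch.randn(32, 1)     # train split
Xv, yv = torch.randn(16, 4), torch.randn(16, 1)   # validation split

lr = torch.tensor(0.05, requires_grad=True)       # meta-parameter
wt = torch.zeros(4, 1, requires_grad=True)        # model weights

for _ in range(10):                               # differentiable inner loop
    loss = ((X @ wt - y) ** 2).mean()
    (g,) = torch.autograd.grad(loss, wt, create_graph=True)
    wt = wt - lr * g                              # SGD step stays in the graph

val_loss = ((Xv @ wt - yv) ** 2).mean()
val_loss.backward()                               # backprop through training
print(lr.grad)                                    # d(val loss) / d(learning rate)
```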
VGGT: Visual Geometry Grounded Transformer (github.com/facebookresearch)
DeepSeek V3 is now the highest scoring non-reasoning model (twitter.com)
The Original 2012 AlexNet Is Open Source Now (github.com/computerhistory)
This package contains the original 2012 AlexNet code.
Attention is NOT all you need (twitter.com)
MIT 6.S191: Deep Generative Modeling [video] (youtube.com)
Deepseek V3-0324 (huggingface.co)
Mac Studio M3 Ultra can run Deepseek R1 671B in memory using <200W (techradar.com)
PyTorch Internals: Ezyang's Blog (ezyang.com)
This post is a long form essay version of a talk about PyTorch internals, that I gave at the PyTorch NYC meetup on May 14, 2019.