Hacker News with Generative AI: Transformers

SUS backprop: linear backpropagation algorithm for long inputs in transformers (arxiv.org)
It is straightforward to design an unbiased gradient estimator that stochastically cuts the backpropagation flow through any part of a computational graph.

Artificial Intelligence, Machine Learning, Transformers, Deep Learning

9 points by brandonb 201 days ago | 0 comments

You could have designed state of the art positional encoding (huggingface.co)
This post walks you through the step-by-step discovery of state-of-the-art positional encoding in transformer models.

Transformers, Machine Learning, Deep Learning

5 points by FL33TW00D 203 days ago | 0 comments

Vision Transformers Need Registers (arxiv.org)
Transformers have recently emerged as a powerful tool for learning visual representations.

Computer Vision, Machine Learning, Artificial Intelligence, Transformers

94 points by felineflock 224 days ago | 9 comments

Three things everyone should know about Vision Transformers (arxiv.org)
After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and video analysis.

Computer Vision, Transformers, Artificial Intelligence, Deep Learning

71 points by reqo 228 days ago | 17 comments

Sparsely-Gated Mixture of Experts (MoE) (thegreenplace.net)
In transformer models, the attention block is typically followed by a feed forward layer (FF), which is a simple fully-connected NN with a hidden layer and nonlinearity.

Transformers, Deep Learning, Machine Learning, Computer Science

15 points by mfrw 233 days ago | 0 comments

The Solid-State Shift: Reinventing the Transformer for Modern Grids (powermag.com)
Transformers have been the backbone of power grids for over a century, but today’s demands for renewable energy, electric vehicles, and smarter grids are exposing their limits.

Energy, Grids, Transformers, Renewable Energy, Electric Vehicles

58 points by JumpCrisscross 240 days ago | 24 comments

Former Tesla exec Drew Baglino's new startup is rethinking the transformer (techcrunch.com)
Former Tesla executive Drew Baglino has a new startup developing solid-state transformers for the electric grid, Axios reported.

Tesla, Startups, Electric Grid, Renewable Energy, Transformers

10 points by plun9 245 days ago | 2 comments

Real-Time Introspective Compression for Transformers (github.com/Dicklesworthstone)
This article proposes a novel approach to address both problems simultaneously.

Transformers, Compression, Artificial Intelligence, Machine Learning

14 points by eigenvalue 250 days ago | 11 comments

How DeepSeek Rewrote the Transformer [video] (youtube.com)

Transformers, Deep Learning, Computer Vision, Artificial Intelligence, Video

5 points by chetangole 255 days ago | 0 comments

VGGT: Visual Geometry Grounded Transformer (github.com/facebookresearch)

Computer Vision, Transformers, Artificial Intelligence, Deep Learning, Facebook Research

190 points by xnx 258 days ago | 42 comments

Transformers as Support Vector Machines (2023) (arxiv.org)
Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP.

Transformers, Machine Learning, Computer Science, Artificial Intelligence

3 points by TaurenHunter 266 days ago | 0 comments

Transformers Without Normalization (jiachenzhu.github.io)
This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple technique.

Transformers, Deep Learning, Machine Learning

260 points by hellollm 269 days ago | 32 comments

Understanding Transformers (beyond the Math) – kalomaze's kalomazing blog (bearblog.dev)
Maybe you don't want to attempt the conventional approaches for understanding the transformer architecture for language models. If you're anything like me, an informal approach is what you'd prefer - one that helps you reason about what's happening with these models in the abstract, without requiring mastery on the technical level to begin with.

Machine Learning, Deep Learning, Language Models, Transformers

8 points by bilsbie 274 days ago | 2 comments

RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals (arxiv.org)
Transformers have achieved great success in effectively processing sequential data such as text.

Transformers, Machine Learning, Deep Learning, Computer Science

3 points by PaulHoule 276 days ago | 0 comments

Some thoughts on autoregressive models (wonderfall.dev)
Most generative AI models nowadays are autoregressive. That means they’re following the concept of next token prediction, and the transformer architecture is the current implementation that has been used for years now thanks to its computational efficiency.

Generative AI, Artificial Intelligence, Machine Learning, Transformers

79 points by Wonderfall 280 days ago | 58 comments

Go-attention: A full attention mechanism and transformer in pure Go (github.com/takara-ai)
From the Frontier Research Team at takara.ai we present the first pure Go implementation of attention mechanisms and transformer layers, designed for high performance and ease of use.

Go, Attention Mechanism, Transformers, Machine Learning

168 points by PaulHoule 280 days ago | 85 comments

Show HN: A Transformer model that preserves logical equivalence (huggingface.co)

Transformers, Artificial Intelligence

9 points by snowkylin 281 days ago | 0 comments

Goku Flow Based Video Generative Foundation Models (github.com/Saiyan-World)
Goku is a new family of joint image-and-video generation models based on rectified flow Transformers.

Generative AI, Video Generation, Deep Learning, Transformers

34 points by lastdong 300 days ago | 11 comments

How has DeepSeek improved the Transformer architecture? (epoch.ai)
DeepSeek has recently released DeepSeek v3, which is currently state-of-the-art in benchmark performance among open-weight models, alongside a technical report describing in some detail the training of the model. Impressively, they’ve achieved this SOTA performance by only using 2.8 million H800 hours of training hardware time—equivalent to about 4e24 FLOP if we assume 40% MFU. This is about ten times less training compute than the similarly performing Llama 3.1 405B.

Transformers, Generative AI, Machine Learning

258 points by superasn 314 days ago | 68 comments

RWKV Language Model (rwkv.com)
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose".

RNNs, GPT, Transformers

183 points by simonpure 343 days ago | 52 comments

A transformer supply crisis bottlenecks energy projects (ieee.org)

Energy, Supply Chain, Infrastructure, Technology, Transformers

115 points by TaurenHunter 361 days ago | 108 comments

Laser: Attention with Exponential Transformation (arxiv.org)
Transformers have had tremendous impact for several sequence related tasks, largely due to their ability to retrieve from any part of the sequence via softmax based dot-product attention.

Transformers, Machine Learning, Attention Mechanisms, Deep Learning

13 points by PaulHoule 375 days ago | 0 comments

You could have designed state of the art positional encoding (fleetwood.dev)
This post walks you through the step-by-step discovery of state-of-the-art positional encoding in transformer models. We will achieve this by iteratively improving our approach to encoding position, arriving at Rotary Positional Encoding (RoPE) used in the latest Llama 3.2 release and most modern transformers. This post intends to limit the mathematical knowledge required to follow along, but some basic linear algebra, trigonometry and understanding of self attention is expected.

Transformers, Machine Learning, Computer Science, Deep Learning

216 points by Philpax 386 days ago | 46 comments

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization (rccchoudhury.github.io)
We present Run-Length Tokenization (RLT), a simple and efficient approach to speed up video transformers by removing redundant tokens from the input.

Computer Vision, Machine Learning, Video Processing, Deep Learning, Transformers

75 points by jasondavies 388 days ago | 16 comments

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arxiv.org)

Deep Learning, Machine Learning, Artificial Intelligence, Transformers

174 points by og_kalu 402 days ago | 33 comments

Oasis: A Universe in a Transformer (decart.ai)
We're excited to announce Oasis, the first playable, realtime, open-world AI model — it's an interactive video game, but generated end-to-end by a transformer on a frame-by-frame basis.

Artificial Intelligence, Video Games, Transformers

11 points by EvgeniyZh 403 days ago | 1 comment

Transformers Utilization in Chart Understanding: A Review of Advances and Future (arxiv.org)
In recent years, interest in vision-language tasks has grown, especially those involving chart interactions.

Transformers, Machine Learning, Computer Vision, Data Visualization

39 points by sandwichsphinx 413 days ago | 2 comments

New Transformer architecture modifications from Nvidia researchers (twitter.com)

Transformers, Artificial Intelligence, Computer Science, Research, Nvidia

4 points by amichail 415 days ago | 0 comments

Trap – Transformers in APL (github.com/BobMcDear)
trap is an implementation of autoregressive transformers - namely, GPT2 - in APL. In addition to containing the complete definition of GPT, it also supports backpropagation and training with Adam, achieving parity with the PyTorch reference code.

Transformers, APL, Language Models, Machine Learning, Open Source

95 points by tlack 427 days ago | 30 comments

Transformers 2.0: What Ilya and Sam Might Have Missed (ycombinator.com)

Transformers, Artificial Intelligence

5 points by metehan777 429 days ago | 1 comment