Hacker News with Generative AI: Transformers

Real-Time Introspective Compression for Transformers (github.com/Dicklesworthstone)
This article proposes a novel approach to address both problems simultaneously.
How DeepSeek Rewrote the Transformer [video] (youtube.com)
VGGT: Visual Geometry Grounded Transformer (github.com/facebookresearch)
Transformers as Support Vector Machines (2023) (arxiv.org)
Since its inception in "Attention Is All You Need", the transformer architecture has led to revolutionary advances in NLP.
Transformers Without Normalization (jiachenzhu.github.io)
This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple technique.
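The excerpt doesn't name the technique, but as a loose illustration (assuming it amounts to an elementwise, learnable squashing in place of LayerNorm, which is an assumption here), a sketch might look like:

```python
import numpy as np

def squash_layer(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Hypothetical normalization-free layer: y = gamma * tanh(alpha * x) + beta,
    dropped in where a LayerNorm would normally sit. In practice alpha, gamma,
    and beta would be learned (typically per channel)."""
    return gamma * np.tanh(alpha * x) + beta

# Usage: apply to the residual-stream activations before attention or the MLP.
x = np.random.randn(4, 8)   # (tokens, hidden)
y = squash_layer(x)
```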
Understanding Transformers (beyond the Math) – kalomaze's kalomazing blog (bearblog.dev)
Maybe you don't want to attempt the conventional approaches to understanding the transformer architecture for language models. If you're anything like me, you'd prefer an informal approach - one that helps you reason about what's happening with these models in the abstract, without requiring technical mastery to begin with.
RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals (arxiv.org)
Transformers have achieved great success in effectively processing sequential data such as text.
Some thoughts on autoregressive models (wonderfall.dev)
Most generative AI models today are autoregressive. That means they work by next-token prediction, and the transformer architecture has been the dominant implementation for years now thanks to its computational efficiency.
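Next-token prediction reduces to a simple loop: feed in the context, sample one token, append it, repeat. A minimal sketch (the `model` interface is hypothetical, standing in for any decoder-only transformer that returns next-token logits):

```python
import numpy as np

def generate(model, prompt_ids, max_new_tokens=32, temperature=1.0):
    """Autoregressive decoding: each new token is sampled from the model's
    distribution conditioned on everything generated so far."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)                  # hypothetical call: logits for the next token
        logits = logits - logits.max()       # numerical stability before softmax
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        next_id = int(np.random.choice(len(probs), p=probs))
        ids.append(next_id)                  # the sampled token becomes part of the context
    return ids
```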
Go-attention: A full attention mechanism and transformer in pure Go (github.com/takara-ai)
From the Frontier Research Team at takara.ai we present the first pure Go implementation of attention mechanisms and transformer layers, designed for high performance and ease of use.
Show HN: A Transformer model that preserves logical equivalence (huggingface.co)
Goku: Flow-Based Video Generative Foundation Models (github.com/Saiyan-World)
Goku is a new family of joint image-and-video generation models based on rectified flow Transformers.
How has DeepSeek improved the Transformer architecture? (epoch.ai)
DeepSeek has recently released DeepSeek v3, which is currently state-of-the-art in benchmark performance among open-weight models, alongside a technical report describing in some detail the training of the model. Impressively, they’ve achieved this SOTA performance by only using 2.8 million H800 hours of training hardware time—equivalent to about 4e24 FLOP if we assume 40% MFU. This is about ten times less training compute than the similarly performing Llama 3.1 405B.
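The 4e24 FLOP figure follows directly from the quoted 2.8 million GPU-hours and 40% MFU; as a back-of-the-envelope check, assuming roughly 989 TFLOPS of dense BF16 peak per H800 (an assumption, since the excerpt doesn't state the peak throughput used):

```python
gpu_hours = 2.8e6            # H800 hours quoted in the article
peak_flops = 989e12          # assumed dense BF16 peak per H800, in FLOP/s
mfu = 0.40                   # model FLOPs utilization assumed in the article

total_flop = gpu_hours * 3600 * peak_flops * mfu
print(f"{total_flop:.2e}")   # ~4.0e24 FLOP, matching the article's estimate
```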
RWKV Language Model (rwkv.com)
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose".
A transformer supply crisis bottlenecks energy projects (ieee.org)
Laser: Attention with Exponential Transformation (arxiv.org)
Transformers have had tremendous impact on several sequence-related tasks, largely due to their ability to retrieve from any part of the sequence via softmax-based dot-product attention.
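That retrieval mechanism is a softmax over query-key dot products, used to weight the values; here is a minimal NumPy sketch of standard scaled dot-product attention (the baseline the paper modifies, not its exponential-transformation variant):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Each query attends over every key,
    so any position in the sequence can be retrieved."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (seq, seq) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values
```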
You could have designed state of the art positional encoding (fleetwood.dev)
This post walks you through the step-by-step discovery of state-of-the-art positional encoding in transformer models. We will achieve this by iteratively improving our approach to encoding position, arriving at Rotary Positional Encoding (RoPE), used in the latest Llama 3.2 release and most modern transformers. This post intends to limit the mathematical knowledge required to follow along, but some basic linear algebra, trigonometry, and an understanding of self-attention are expected.
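For reference, RoPE rotates each pair of query/key dimensions by an angle proportional to the token's position, so relative offsets show up as phase differences in the dot product. A minimal sketch using the split-half pairing convention (one of the common layouts; the post itself derives the method step by step):

```python
import numpy as np

def apply_rope(x, pos, base=10000.0):
    """x: (d,) query or key vector for a token at integer position `pos`.
    Dimensions are paired as (i, i + d/2) and rotated by pos * base**(-2i/d)."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])
```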
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization (rccchoudhury.github.io)
We present Run-Length Tokenization (RLT), a simple and efficient approach to speed up video transformers by removing redundant tokens from the input.
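The idea, as described, is to drop tokens that repeat over time and remember how long each run lasted; an illustrative sketch (not the paper's exact method) that collapses consecutive near-identical patch embeddings into (token, run_length) pairs:

```python
import numpy as np

def run_length_tokenize(patches, tol=1e-3):
    """patches: (num_frames, d) embeddings of the same spatial patch over time.
    Consecutive patches closer than `tol` are merged into one token whose
    run length records how many frames it stood in for."""
    kept, run_lengths = [patches[0]], [1]
    for p in patches[1:]:
        if np.linalg.norm(p - kept[-1]) < tol:
            run_lengths[-1] += 1      # redundant frame: extend the current run
        else:
            kept.append(p)            # new content: start a new token
            run_lengths.append(1)
    return np.stack(kept), run_lengths
```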
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arxiv.org)
Oasis: A Universe in a Transformer (decart.ai)
We're excited to announce Oasis, the first playable, realtime, open-world AI model — it's an interactive video game, but generated end-to-end by a transformer on a frame-by-frame basis.
Transformers Utilization in Chart Understanding: A Review of Advances and Future (arxiv.org)
In recent years, interest in vision-language tasks has grown, especially those involving chart interactions.
New Transformer architecture modifications from Nvidia researchers (twitter.com)
Trap – Transformers in APL (github.com/BobMcDear)
trap is an implementation of autoregressive transformers - namely, GPT2 - in APL. In addition to containing the complete definition of GPT, it also supports backpropagation and training with Adam, achieving parity with the PyTorch reference code.
Transformers 2.0: What Ilya and Sam Might Have Missed (ycombinator.com)
Transformers for Ruby (github.com/ankane)
Transformers in music recommendation (research.google)
Synthesizing Abstract Transformers for Reduced-Product Domains (arxiv.org)
Transformer Explainer (poloclub.github.io)
Transformer Explainer: An Interactive Explainer of the Transformer Architecture (poloclub.github.io)
1.5M-Pound Trailers Haul Transformers to New Wyoming Wind Substation (cowboystatedaily.com)
The Engineer’s Guide to Deep Learning: Understanding the Transformer Model (interdb.jp)