Hacker News with Generative AI: Transformers

RWKV Language Model (rwkv.com)
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose".
A transformer supply crisis bottlenecks energy projects (ieee.org)
Laser: Attention with Exponential Transformation (arxiv.org)
Transformers have had tremendous impact on several sequence-related tasks, largely due to their ability to retrieve from any part of the sequence via softmax-based dot-product attention.
You could have designed state of the art positional encoding (fleetwood.dev)
This post walks you through the step-by-step discovery of state-of-the-art positional encoding in transformer models. We will achieve this by iteratively improving our approach to encoding position, arriving at Rotary Positional Encoding (RoPE), used in the latest Llama 3.2 release and most modern transformers. This post intends to limit the mathematical knowledge required to follow along, but some basic linear algebra, trigonometry and understanding of self-attention is expected.
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization (rccchoudhury.github.io)
We present Run-Length Tokenization (RLT), a simple and efficient approach to speed up video transformers by removing redundant tokens from the input.
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arxiv.org)
Oasis: A Universe in a Transformer (decart.ai)
We're excited to announce Oasis, the first playable, realtime, open-world AI model — it's an interactive video game, but generated end-to-end by a transformer on a frame-by-frame basis.
Transformers Utilization in Chart Understanding: A Review of Advances and Future (arxiv.org)
In recent years, interest in vision-language tasks has grown, especially those involving chart interactions.
New Transformer architecture modifications from Nvidia researchers (twitter.com)
Trap – Transformers in APL (github.com/BobMcDear)
trap is an implementation of autoregressive transformers - namely, GPT2 - in APL. In addition to containing the complete definition of GPT, it also supports backpropagation and training with Adam, achieving parity with the PyTorch reference code.
Transformers 2.0: What Ilya and Sam Might Have Missed (ycombinator.com)
Transformers for Ruby (github.com/ankane)
Transformers in music recommendation (research.google)
Synthesizing Abstract Transformers for Reduced-Product Domains (arxiv.org)
Transformer Explainer (poloclub.github.io)
Transformer Explainer: An Interactive Explainer of the Transformer Architecture (poloclub.github.io)
1.5M-Pound Trailers Haul Transformers to New Wyoming Wind Substation (cowboystatedaily.com)
The Engineer’s Guide to Deep Learning: Understanding the Transformer Model (interdb.jp)
Transformer Layers as Painters (arxiv.org)
Exploring the Limits of Transfer Learning with a Unified Transformer (2019) (arxiv.org)
Training a time series model using transformers at Datadog (arxiv.org)
The Illustrated Transformer (2018) (jalammar.github.io)
Sohu – first specialized chip (ASIC) for transformer models (twitter.com)
Sohu: The First Transformer ASIC (etched.com)
Shape Rotation 101: An Intro to Einsum and Jax Transformers (bearblog.dev)
Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability (neuralblog.github.io)
NanoDO: A minimal Transformer decoder-only language model implementation (github.com/google-deepmind)
Transformers Represent Belief State Geometry in Their Residual Stream (lesswrong.com)
Transformers Can Do Arithmetic with the Right Embeddings (arxiv.org)
Grokked Transformers Are Implicit Reasoners (arxiv.org)