The Matrix Calculus You Need for Deep Learning (explained.ai) Most of us last saw calculus in school, but derivatives are a critical part of machine learning, particularly deep neural networks, which are trained by optimizing a loss function.
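One matrix-calculus identity that comes up constantly in this setting: for a squared-error loss L(w) = ||Xw - y||², the gradient is ∂L/∂w = 2Xᵀ(Xw - y). A small sketch (the data and shapes here are illustrative, not from the article) checks the analytic gradient against finite differences:

```python
import numpy as np

# For L(w) = ||Xw - y||^2, matrix calculus gives dL/dw = 2 X^T (Xw - y).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)

analytic = 2 * X.T @ (X @ w - y)

# Central finite-difference check of each gradient component.
eps = 1e-6
numeric = np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = eps
    f_plus = np.sum((X @ (w + e) - y) ** 2)
    f_minus = np.sum((X @ (w - e) - y) ** 2)
    numeric[i] = (f_plus - f_minus) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```

The same "derivative of a scalar loss with respect to a vector or matrix" pattern is what backpropagation applies layer by layer.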
Tied Crosscoders: Tracing How Chat LLM Behavior Emerges from Base Model (lesswrong.com) We are interested in model-diffing: finding what is new in the chat model compared to the base model. One way of doing this is to train a crosscoder, which amounts to training an SAE on the concatenation of the activations at a given layer of the base and chat models. When training this crosscoder, we find some latents whose decoder vector mostly helps reconstruct the base model's activation while contributing little to the reconstruction of the chat model's activation.
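A hedged sketch of the crosscoder setup described above: a sparse autoencoder over the concatenated base and chat activations, with a per-latent "base-ness" score computed from how the decoder norm splits across the two halves. All shapes, initializations, and names are illustrative assumptions, not the authors' code.

```python
import numpy as np

d_model, n_latents = 8, 16
rng = np.random.default_rng(1)

base_act = rng.normal(size=d_model)   # activation from the base model
chat_act = rng.normal(size=d_model)   # same-layer activation from the chat model
x = np.concatenate([base_act, chat_act])  # SAE input, shape (2*d_model,)

W_enc = rng.normal(size=(n_latents, 2 * d_model)) * 0.1
W_dec = rng.normal(size=(2 * d_model, n_latents)) * 0.1
b_enc = np.zeros(n_latents)

z = np.maximum(0.0, W_enc @ x + b_enc)  # sparse latent code (ReLU)
x_hat = W_dec @ z                        # joint reconstruction of both halves

# A latent whose decoder norm concentrates on the base half mostly helps
# reconstruct the base activation -- the kind of latent the post highlights.
base_norm = np.linalg.norm(W_dec[:d_model], axis=0)
chat_norm = np.linalg.norm(W_dec[d_model:], axis=0)
relative_base = base_norm / (base_norm + chat_norm + 1e-9)
```

In an actual run these weights would be trained with a reconstruction-plus-sparsity objective; the point here is only the concatenated input and the decoder-norm diagnostic.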
CHM Releases AlexNet Source Code (computerhistory.org) In partnership with Google, CHM has released the source code to AlexNet, the neural network that in 2012 kick-started today’s prevailing approach to AI. It is available as open source here.
Deriving Muon (jeremybernste.in) We recently proposed Muon, a new neural net optimizer. Muon has garnered attention for its excellent practical performance: it was used to set NanoGPT speed records, leading to interest from the big labs.
Get Started with Neural Rendering Using Nvidia RTX Kit (Vulkan) (nvidia.com) Neural rendering is the next era of computer graphics. By integrating neural networks into the rendering process, we can take dramatic leaps forward in performance, image quality, and interactivity to deliver new levels of immersion.
A Gentle Introduction to Graph Neural Networks (2021) (distill.pub) Neural networks have been adapted to leverage the structure and properties of graphs. We explore the components needed for building a graph neural network - and motivate the design choices behind them.
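The core component most GNNs share is a message-passing layer: each node aggregates its neighbors' features, then applies a shared learned update. A minimal sketch, assuming sum aggregation and a single linear-plus-nonlinearity update (one common design choice among the variants the article surveys; all names and sizes here are illustrative):

```python
import numpy as np

n_nodes, d = 4, 3
rng = np.random.default_rng(2)

# Adjacency matrix of a 4-cycle: node i is connected to its two neighbors.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.normal(size=(n_nodes, d))   # node feature matrix
W = rng.normal(size=(d, d)) * 0.1   # shared weight (random here, learned in practice)

messages = A @ H                    # row i = sum of node i's neighbors' features
H_next = np.tanh(messages @ W)      # updated node representations
```

Stacking several such layers lets information propagate across multi-hop neighborhoods.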
Neuroevolution of augmenting topologies (NEAT algorithm) (wikipedia.org) NeuroEvolution of Augmenting Topologies (NEAT) is a genetic algorithm (GA) for the generation of evolving artificial neural networks (a neuroevolution technique), developed by Kenneth Stanley and Risto Miikkulainen in 2002 while at the University of Texas at Austin.
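NEAT grows network topology through structural mutations; the classic one is "add node", which splits an existing connection by disabling it and inserting a new node, tracked by innovation numbers. A minimal sketch under assumed representations (the tuple-based genome and function names here are illustrative, not NEAT's reference implementation):

```python
import random

random.seed(0)

# Genome: connection genes as (in_node, out_node, weight, enabled, innovation).
genome = [(0, 2, 0.5, True, 1), (1, 2, -0.3, True, 2)]
next_node = 3
next_innovation = 3

def add_node_mutation(genome, next_node, next_innov):
    """Split a random connection: disable it, insert a node in its place."""
    i = random.randrange(len(genome))
    src, dst, w, enabled, innov = genome[i]
    genome[i] = (src, dst, w, False, innov)   # disable the old connection
    # Convention from the NEAT paper: the in-connection gets weight 1.0,
    # the out-connection inherits the old weight, preserving behavior.
    genome.append((src, next_node, 1.0, True, next_innov))
    genome.append((next_node, dst, w, True, next_innov + 1))
    return next_node + 1, next_innov + 2

next_node, next_innovation = add_node_mutation(genome, next_node, next_innovation)
```

Innovation numbers let NEAT line up genes from different genomes during crossover and measure compatibility for speciation.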