Hacker News with Generative AI: Machine Learning

Neuralatex: A machine learning library written in pure LATEX (neuralatex.com)
Neuralatex is a scalar-valued autograd library, similar to micrograd, written entirely in LaTeX! As part of your LaTeX document you can specify the architecture of a neural network and its loss functions, how to generate or load training data, and the training hyperparameters and experiments to run. When the document is compiled, the LaTeX compiler will generate or load the training data, train the network, run the experiments, and generate figures.
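Neuralatex itself is implemented in LaTeX macros, but the underlying idea is the same as micrograd's: a scalar value type that records the operations that produced it and replays them in reverse to compute gradients. A minimal Python sketch of that idea (illustrative only, not Neuralatex's API):

    class Value:
        """A scalar that records how it was computed, so gradients can flow back."""
        def __init__(self, data, _parents=()):
            self.data = data
            self.grad = 0.0
            self._parents = _parents
            self._backward = lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            # Topologically sort the graph, then apply the chain rule in reverse.
            topo, seen = [], set()
            def build(v):
                if v not in seen:
                    seen.add(v)
                    for p in v._parents:
                        build(p)
                    topo.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(topo):
                v._backward()

    # d(x*y + x)/dx = y + 1 = 4 at x=2, y=3
    x, y = Value(2.0), Value(3.0)
    z = x * y + x
    z.backward()
    print(x.grad)  # 4.0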
Attention is NOT all you need: Qwerky-72B trained using only 8 AMD MI300X GPUs (recursal.ai)
We are proud to announce the updated Qwerky-72B and 32B.
Foundation Model for Personalized Recommendation (netflixtechblog.com)
Netflix’s personalized recommender is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs such as “Continue Watching” and “Today’s Top Picks for You” (refer to our recent overview for more details). However, as we expanded our set of personalization algorithms to meet increasing business needs, maintaining the recommender system became quite costly.
Jargonic: Industry-Tunable ASR Model (aiola.ai)
Automatic Speech Recognition (ASR) has made significant strides over the last decade, but most ASR models on the market offer general-purpose transcription. They perform well in clean, controlled environments but break down when handling:
Show HN: Neuronpedia, an open source platform for AI interpretability (neuronpedia.org)
Neuronpedia is an open source interpretability platform.
Aim: Supercharged open-source experiment tracker (github.com/aimhubio)
Aim logs your training runs and any AI Metadata, provides a beautiful UI to compare and observe them, and offers an API to query them programmatically.
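A minimal logging sketch using Aim's documented Run/track interface (the experiment name, hyperparameters, and loss here are illustrative, and exact signatures may vary across versions):

    from aim import Run  # pip install aim

    run = Run(experiment="mnist-baseline")          # experiment name is illustrative
    run["hparams"] = {"lr": 1e-3, "batch_size": 64}  # attach arbitrary metadata

    for step in range(100):
        loss = 1.0 / (step + 1)  # stand-in for a real training loss
        run.track(loss, name="loss", step=step, context={"subset": "train"})

Runs logged this way show up in the Aim UI (`aim up`) for comparison, and can be pulled back out programmatically through Aim's query API.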
Launch HN: Augento (YC W25) – Fine-tune your agents with reinforcement learning (ycombinator.com)
Hi HN, we’re the cofounders of Augento (https://augento.ai/). We’re building Deepseek R1-like fine-tuning as a service. You connect your agent, tell us when it’s right or wrong, and we deliver an LLM optimized for that agent.
RLHF Is Cr*P, It's a Paint Job on a Rusty Car: Geoffrey Hinton (officechai.com)
RLHF, or Reinforcement Learning from Human Feedback, is behind some of the recent advances in AI, but one of the pioneers of the field doesn’t think highly of it.
Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic) (transformer-circuits.pub)
We introduce a method to uncover mechanisms underlying behaviors of language models. We produce graph descriptions of the model’s computation on prompts of interest by tracing individual computational steps in a “replacement model”. This replacement model substitutes a more interpretable component (here, a “cross-layer transcoder”) for parts of the underlying model (here, the multi-layer perceptrons) that it is trained to approximate.
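A rough PyTorch sketch of the replacement idea: train a sparse, more interpretable module to mimic a frozen MLP block, then inspect its feature activations. This is a simplified single-layer version with illustrative names; the paper's cross-layer transcoder reads from one layer and writes to several.

    import torch
    import torch.nn as nn

    class TranscoderMLP(nn.Module):
        """Sparse, interpretable stand-in trained to mimic a frozen MLP block."""
        def __init__(self, d_model, d_features):
            super().__init__()
            self.encode = nn.Linear(d_model, d_features)
            self.decode = nn.Linear(d_features, d_model)

        def forward(self, x):
            f = torch.relu(self.encode(x))  # sparse feature activations
            return self.decode(f), f

    d_model, d_features = 512, 4096
    mlp = nn.Sequential(nn.Linear(d_model, 2048), nn.GELU(), nn.Linear(2048, d_model))
    transcoder = TranscoderMLP(d_model, d_features)
    opt = torch.optim.Adam(transcoder.parameters(), lr=1e-4)

    x = torch.randn(256, d_model)   # stand-in for residual-stream activations
    target = mlp(x).detach()        # what the original MLP computes
    recon, f = transcoder(x)
    # Objective: reconstruction fidelity plus a sparsity penalty on features.
    loss = ((recon - target) ** 2).mean() + 1e-3 * f.abs().mean()
    loss.backward(); opt.step()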
Unsupervised Learning of Browser Agents via Environment Interaction in the Wild (arxiv.org)
We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agents.
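One plausible shape of such a pipeline, heavily simplified: explore a site without supervision, then retroactively label each trajectory with the instruction it happens to accomplish. The stub functions below are placeholders, not NNetNav's actual interfaces:

    import random

    def random_action(page):   # stub: pick a link/button visible on the page
        return random.choice(page["actions"])

    def apply(page, action):   # stub: simulate following an action
        return {"url": page["url"] + "/" + action, "actions": ["a", "b"]}

    def relabel(trajectory):   # stub: an LLM writes the instruction this
        return f"Navigate to {trajectory[-1]['url']}"  # trajectory accomplishes

    page = {"url": "https://example.com", "actions": ["a", "b"]}
    trajectory = [page]
    for _ in range(3):                      # explore the site
        page = apply(page, random_action(page))
        trajectory.append(page)

    demo = {"instruction": relabel(trajectory), "trajectory": trajectory}
    print(demo["instruction"])  # a synthetic demonstration for agent training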
Matrix Calculus (For Machine Learning and Beyond) (arxiv.org)
This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions.
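A classic example from this setting is the derivative of the matrix inverse, obtained by differentiating X X^{-1} = I (a standard identity, stated here for orientation):

    % dX · X^{-1} + X · d(X^{-1}) = 0, hence:
    \[
    d(X^{-1}) = -\,X^{-1}\,(dX)\,X^{-1},
    \qquad\text{so}\qquad
    \frac{d}{dt}\,X(t)^{-1} = -\,X(t)^{-1}\,\frac{dX}{dt}\,X(t)^{-1}.
    \]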
The Matrix Calculus You Need for Deep Learning (explained.ai)
Most of us last saw calculus in school, but derivatives are a critical part of machine learning, particularly deep neural networks, which are trained by optimizing a loss function.
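As a one-layer taste of what the article covers: for a linear layer y = Wx + b with squared-error loss, the chain rule yields the gradients used in training (standard results, shown for orientation):

    \[
    L = \lVert y - t \rVert^{2}, \qquad
    \frac{\partial L}{\partial y} = 2\,(y - t), \qquad
    \frac{\partial L}{\partial W} = \frac{\partial L}{\partial y}\, x^{\top}, \qquad
    \frac{\partial L}{\partial b} = \frac{\partial L}{\partial y}.
    \]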
I Built Faster Reinforcement Learning in C# Solo Than Teams Did with Python (rlmatrix.com)
The question comes relentlessly: “Why build reinforcement learning in C#?” Behind this query lies an unspoken assumption that serious machine learning happens exclusively in Python. This perspective reveals a fundamental disconnect between academic ML researchers with their sprawling Python scripts and those of us solving real industrial problems.
Physics-Based Deep Learning v4 (arxiv.org)
This document is a hands-on, comprehensive guide to deep learning in the realm of physical simulations.
FFN Fusion: Rethinking Sequential Computation in Large Language Models (arxiv.org)
We introduce FFN Fusion, an architectural optimization technique that reduces sequential computation in large language models by identifying and exploiting natural opportunities for parallelization.
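A toy PyTorch illustration of the idea as described in the abstract (not the paper's actual criterion for when fusion is safe): two consecutive FFN blocks can be evaluated in parallel on the same input when the second block's dependence on the first's output is negligible.

    import torch
    import torch.nn as nn

    def ffn(d):
        return nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    d = 512
    ffn1, ffn2 = ffn(d), ffn(d)
    x = torch.randn(8, d)

    # Sequential (original): each block reads the previous block's output.
    seq = x + ffn1(x)
    seq = seq + ffn2(seq)

    # Fused (parallel): both blocks read the same input and their outputs are
    # summed into the residual stream -- exact only when ffn2's dependence on
    # ffn1's output is negligible, which is the opportunity the paper exploits.
    fused = x + ffn1(x) + ffn2(x)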
Learning Theory from First Principles [pdf] (di.ens.fr)
Show HN: Xorq – open-source Python-first Pandas-style pipelines (github.com/xorq-labs)
xorq is a deferred computational framework that brings the replicability and performance of declarative pipelines to the Python ML ecosystem.
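xorq's own API aside, deferred execution itself is easy to illustrate: build an expression graph now, run it only when asked. A toy sketch of the concept (not xorq's interface):

    class Deferred:
        """Toy deferred expression: record operations now, execute on demand."""
        def __init__(self, fn, deps=()):
            self.fn, self.deps = fn, deps

        def map(self, f):
            return Deferred(lambda *vals: f(*vals), (self,))

        def execute(self):
            return self.fn(*(d.execute() for d in self.deps))

    source = Deferred(lambda: [1, 2, 3])               # nothing runs yet
    doubled = source.map(lambda xs: [2 * x for x in xs])
    print(doubled.execute())                           # [2, 4, 6] -- runs only here

Because the pipeline is a declarative graph rather than eagerly-evaluated code, the same definition can be re-run reproducibly or pushed down to different backends.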
Matrix Profiles (aneksteind.github.io)
Lately I’ve been thinking about time series analysis to aid in Reflect’s insights features. Toward that end, I’d had a Hacker News thread about anomaly detection bookmarked in Later. When I finally got around to reading it, a comment mentioned that the article left out matrix profiles, which I had never heard of, so I decided to look into them.
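For the curious, the core object is simple: for every length-m window of a series, the matrix profile stores the distance to that window's nearest non-overlapping neighbor, so repeated motifs show up as low values and anomalies as high ones. A naive O(n²) NumPy sketch (real implementations use fast algorithms such as STOMP):

    import numpy as np

    def matrix_profile(ts, m):
        """For each length-m window, distance to its nearest non-trivial neighbor."""
        n = len(ts) - m + 1
        windows = np.array([ts[i:i + m] for i in range(n)])
        # z-normalize each window so matches are shape-based, not level-based
        windows = (windows - windows.mean(axis=1, keepdims=True)) \
                  / windows.std(axis=1, keepdims=True)
        profile = np.full(n, np.inf)
        for i in range(n):
            dists = np.linalg.norm(windows - windows[i], axis=1)
            # exclude trivial matches: windows overlapping window i
            lo, hi = max(0, i - m // 2), min(n, i + m // 2 + 1)
            dists[lo:hi] = np.inf
            profile[i] = dists.min()
        return profile

    sig = np.sin(np.linspace(0, 20 * np.pi, 1000))
    sig[500:520] += 2.0                       # inject an anomaly
    print(matrix_profile(sig, 50).argmax())   # peak lands near the anomaly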
Tao: Using test-time compute to train efficient LLMs without labeled data (databricks.com)
Low responsiveness of ML models to critical or deteriorating health conditions (nature.com)
Machine learning (ML) based mortality prediction models can be immensely useful in intensive care units.
Optimizing ML training with metagradient descent (arxiv.org)
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space.
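A tiny JAX sketch of the metagradient idea, under the simplifying assumption that we differentiate a validation loss through a short inner SGD run with respect to a single hyperparameter, the learning rate (all names and sizes are illustrative):

    import jax
    import jax.numpy as jnp

    def inner_loss(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    def val_loss_after_training(lr, w0, x_tr, y_tr, x_val, y_val, steps=20):
        """Run SGD with learning rate `lr`, return validation loss.
        Differentiable end-to-end, so we can take a metagradient w.r.t. lr."""
        w = w0
        for _ in range(steps):
            w = w - lr * jax.grad(inner_loss)(w, x_tr, y_tr)
        return inner_loss(w, x_val, y_val)

    key = jax.random.PRNGKey(0)
    k1, k2 = jax.random.split(key)
    x_tr, x_val = jax.random.normal(k1, (64, 5)), jax.random.normal(k2, (32, 5))
    w_true = jnp.arange(1.0, 6.0)
    y_tr, y_val = x_tr @ w_true, x_val @ w_true
    w0 = jnp.zeros(5)

    lr = 0.01
    metagrad = jax.grad(val_loss_after_training)(lr, w0, x_tr, y_tr, x_val, y_val)
    lr = lr - 0.5 * metagrad  # one metagradient descent step on the learning rate

Scaling this from a toy learning rate to the full design space of a large training run is exactly the challenge the paper addresses.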
Ask HN: If you're an organization doing AI/ML what are you using? (ycombinator.com)
Really a question about vendors and services, less about techniques or research.
RNA function follows form – why is it so hard to predict? (nature.com)
AlphaFold’s highly accurate structural models transformed protein biology, but RNA lags behind.
New DeepSeek V3 0324 with MIT license (huggingface.co)
DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects.
The Original 2012 AlexNet Is Open Source Now (github.com/computerhistory)
This package contains the original 2012 AlexNet code.
Yann LeCun "Mathematical Obstacles on the Way to Human-Level AI" [video] (youtube.com)
Every Flop Counts: Scaling a 300B LLM Without Premium GPUs (arxiv.org)
In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems.
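For readers new to MoE: the cost savings come from routing each token to a small subset of experts, so parameter count grows while per-token compute stays roughly flat. A minimal top-1 routing sketch in PyTorch (generic MoE, not the report's system):

    import torch
    import torch.nn as nn

    class TinyMoE(nn.Module):
        """Minimal top-1 mixture-of-experts layer: a router picks one expert
        per token, so compute per token stays constant as experts are added."""
        def __init__(self, d_model, n_experts):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts))

        def forward(self, x):                      # x: (tokens, d_model)
            gate = self.router(x).softmax(dim=-1)  # routing probabilities
            weight, idx = gate.max(dim=-1)         # top-1 expert per token
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] = weight[mask, None] * expert(x[mask])
            return out

    x = torch.randn(16, 128)
    print(TinyMoE(128, 8)(x).shape)  # torch.Size([16, 128])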
Show HN: Formal Verification for Machine Learning Models Using Lean 4 (github.com/fraware)
Welcome to the Formal Verification of Machine Learning Models in Lean project. This repository provides a framework for specifying and proving properties—such as robustness, fairness, and interpretability—of machine learning models using Lean 4.
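A minimal flavor of what such a specification can look like in Lean 4 (illustrative definitions, not this repository's actual ones): robustness says that small input perturbations cannot change the prediction.

    import Mathlib

    -- Robustness at radius ε: inputs within ε of each other classify the same.
    def Robust (f : ℝ → ℝ) (classify : ℝ → Bool) (ε : ℝ) : Prop :=
      ∀ x x' : ℝ, |x - x'| ≤ ε → classify (f x) = classify (f x')

    -- A constant model is trivially robust at any radius.
    theorem const_robust (c : ℝ) (classify : ℝ → Bool) (ε : ℝ) :
        Robust (fun _ => c) classify ε :=
      fun _ _ _ => rfl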
Tied Crosscoders: Tracing How Chat LLM Behavior Emerges from Base Model (lesswrong.com)
We are interested in model-diffing: finding what is new in the chat model when compared to the base model. One way of doing this is training a crosscoder, which would just mean training an SAE on the concatenation of the activations in a given layer of the base and chat model. When training this crosscoder, we find some latents whose decoder vector mostly helps reconstruct the base model activation and does not affect the reconstruction for the chat model activation.
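Concretely, such a crosscoder can be sketched as an SAE whose encoder reads the concatenated activations and whose decoder has a separate half for each model, so a latent's per-model decoder norms reveal whether it is base-only, chat-only, or shared. A simplified PyTorch sketch with illustrative names:

    import torch
    import torch.nn as nn

    class Crosscoder(nn.Module):
        """SAE over [base; chat] activations: one shared latent code, with
        separate decoder halves for the base and chat reconstructions."""
        def __init__(self, d_model, d_latent):
            super().__init__()
            self.encode = nn.Linear(2 * d_model, d_latent)
            self.decode_base = nn.Linear(d_latent, d_model)
            self.decode_chat = nn.Linear(d_latent, d_model)

        def forward(self, a_base, a_chat):
            z = torch.relu(self.encode(torch.cat([a_base, a_chat], dim=-1)))
            return self.decode_base(z), self.decode_chat(z), z

    d_model = 512
    cc = Crosscoder(d_model, 8192)
    a_base, a_chat = torch.randn(64, d_model), torch.randn(64, d_model)
    rb, rc, z = cc(a_base, a_chat)
    # Reconstruction fidelity for both models plus a sparsity penalty.
    loss = ((rb - a_base) ** 2).mean() + ((rc - a_chat) ** 2).mean() \
           + 1e-3 * z.abs().mean()
    # A "base-only" latent is one whose chat decoder column norm is near zero:
    chat_norms = cc.decode_chat.weight.norm(dim=0)  # per-latent decoder norms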