Real-Time Introspective Compression for Transformers
(github.com/Dicklesworthstone)
This article proposes a novel approach to address both problems simultaneously.
How Google built its Gemini robotics models
(google)
Powered by Gemini Robotics models, robots can learn complex tasks such as preparing salads, playing Tic-Tac-Toe, and even folding an origami fox.
Kai Scheduler: Kubernetes Native scheduler for AI workloads at large scale
(github.com/NVIDIA)
KAI Scheduler is a robust, efficient, and scalable Kubernetes scheduler that optimizes GPU resource allocation for AI and machine learning workloads.
Neuralatex: A machine learning library written in pure LaTeX
(neuralatex.com)
Neuralatex is a scalar-valued autograd library similar to MicroGrad, but written entirely in LaTeX! As part of your LaTeX document, you can specify the architecture of a neural network and its loss functions, how to generate or load training data, and the training hyperparameters and experiments. When the document is compiled, the LaTeX compiler will generate or load the training data, train the network, run the experiments, and generate the figures.
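For readers who haven't met the MicroGrad pattern, here is a minimal Python sketch of scalar reverse-mode autograd, the technique the excerpt describes. It illustrates the general idea only; Neuralatex implements this in pure LaTeX, and none of the names below come from its code.

```python
# Minimal scalar reverse-mode autograd in the MicroGrad style (illustrative only).

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad        # d(a+b)/da = 1
            other.grad += out.grad       # d(a+b)/db = 1
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph, then propagate gradients backwards.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward_fn()

a, b = Value(2.0), Value(3.0)
loss = a * b + a        # loss = 2*3 + 2 = 8
loss.backward()
print(a.grad, b.grad)   # d(loss)/da = b + 1 = 4, d(loss)/db = a = 2
```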
Attention is NOT all you need: Qwerky-72B trained using only 8 AMD MI300X GPUs
(recursal.ai)
We are proud to announce the updated Qwerky-72B and 32B.
Foundation Model for Personalized Recommendation
(netflixtechblog.com)
Netflix’s personalized recommender is a complex system boasting a variety of specialized machine-learned models, each catering to distinct needs, including “Continue Watching” and “Today’s Top Picks for You” (refer to our recent overview for more details). However, as we expanded our set of personalization algorithms to meet increasing business needs, maintaining the recommender system became quite costly.
Jargonic: Industry-Tunable ASR Model
(aiola.ai)
Automatic Speech Recognition (ASR) has made significant strides over the last decade, but most ASR models on the market offer general-purpose transcription. They perform well in clean, controlled environments but break down when handling industry-specific jargon and noisy, real-world audio.
Show HN: Neuronpedia, an open source platform for AI interpretability
(neuronpedia.org)
Neuronpedia is an open source interpretability platform.
Aim: Supercharged open-source experiment tracker
(github.com/aimhubio)
Aim logs your training runs and any AI metadata, provides a beautiful UI to compare and observe them, and offers an API to query them programmatically.
Launch HN: Augento (YC W25) – Fine-tune your agents with reinforcement learning
(ycombinator.com)
Hi HN, we’re the cofounders of Augento (https://augento.ai/). We’re building DeepSeek-R1-style fine-tuning as a service. You connect your agent, tell us when it’s right or wrong, and we deliver an LLM optimized for that agent.
RLHF Is Cr*P, It's a Paint Job on a Rusty Car: Geoffrey Hinton
(officechai.com)
RLHF, or Reinforcement Learning from Human Feedback, is behind some of the recent advances in AI, but one of the pioneers of the field doesn’t think highly of it.
Apple's Cubify Anything: Scaling Indoor 3D Object Detection
(github.com/apple)
This repository includes the public implementation of Cubify Transformer and the associated CA-1M dataset.
Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic)
(transformer-circuits.pub)
We introduce a method to uncover mechanisms underlying behaviors of language models. We produce graph descriptions of the model’s computation on prompts of interest by tracing individual computational steps in a “replacement model”. This replacement model substitutes a more interpretable component (here, a “cross-layer transcoder”) for parts of the underlying model (here, the multi-layer perceptrons) that it is trained to approximate.
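To make the substitution idea concrete, here is a hedged PyTorch sketch of training a wide, sparsely-activating transcoder to imitate a frozen MLP block. All sizes and the L1 coefficient are assumptions for illustration; the paper's cross-layer transcoder, which reads from one residual-stream layer and writes to all later ones, is considerably more involved.

```python
import torch
import torch.nn as nn

# Hedged sketch: train an interpretable transcoder to approximate a frozen
# MLP block, then run analysis with the transcoder substituted in.

d_model, d_features = 512, 4096

# Stand-in for a frozen MLP block from the underlying model.
mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                    nn.Linear(4 * d_model, d_model))

encoder = nn.Linear(d_model, d_features)   # wide dictionary of candidate features
decoder = nn.Linear(d_features, d_model)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(200):
    x = torch.randn(64, d_model)           # stand-in for residual-stream activations
    with torch.no_grad():
        target = mlp(x)                     # the behavior to approximate
    feats = torch.relu(encoder(x))          # interpretable feature activations
    recon = decoder(feats)
    # Reconstruction loss plus an L1 penalty that encourages sparse features.
    loss = (recon - target).pow(2).mean() + 1e-4 * feats.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At analysis time, substitute decoder(relu(encoder(x))) for mlp(x) and
# trace which sparse features fire on a prompt of interest.
```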
Unsupervised Learning of Browser Agents via Environment Interaction in the Wild
(arxiv.org)
We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agents.
Matrix Calculus (For Machine Learning and Beyond)
(arxiv.org)
This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions.
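As a taste of the material, the matrix-inverse example named in the abstract falls out of applying the product rule to A A^{-1} = I:

```latex
% Differentiate A A^{-1} = I with the product rule:
%   (dA)\,A^{-1} + A\,d(A^{-1}) = 0,
% then solve for the differential of the inverse:
d(A^{-1}) = -A^{-1}\,(dA)\,A^{-1}
```

Setting A to a scalar recovers the familiar d(1/a) = -da/a^2.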
The Matrix Calculus You Need for Deep Learning
(explained.ai)
Most of us last saw calculus in school, but derivatives are a critical part of machine learning, particularly deep neural networks, which are trained by optimizing a loss function.
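A worked example of the kind of derivative the article builds toward: the gradient of a least-squares loss with respect to a weight vector, which is exactly the direction gradient descent follows.

```latex
% Least-squares loss for weights w, data matrix X, and targets y:
%   L(w) = \tfrac{1}{2}\,\lVert Xw - y \rVert^2
% Expanding L(w + dw) to first order gives dL = (Xw - y)^{\top} X\,dw, so
\nabla_w L = X^{\top}(Xw - y)
```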
I Built Faster Reinforcement Learning in C# Solo Than Teams Did with Python
(rlmatrix.com)
The question comes relentlessly: “Why build reinforcement learning in C#?” Behind this query lies an unspoken assumption that serious machine learning happens exclusively in Python. This perspective reveals a fundamental disconnect between academic ML researchers with their sprawling Python scripts and those of us solving real industrial problems.
Physics-Based Deep Learning v4
(arxiv.org)
This document is a hands-on, comprehensive guide to deep learning in the realm of physical simulations.
FFN Fusion: Rethinking Sequential Computation in Large Language Models
(arxiv.org)
We introduce FFN Fusion, an architectural optimization technique that reduces sequential computation in large language models by identifying and exploiting natural opportunities for parallelization.
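As I read the abstract, the opportunity is that a run of consecutive FFN blocks updates the residual stream sequentially, yet when the inter-block dependency is weak, the blocks can all read the same input and execute in parallel. A toy PyTorch sketch of that approximation (made-up sizes, not the paper's implementation):

```python
import torch
import torch.nn as nn

# Two consecutive FFN blocks on a residual stream (attention omitted).
d_model, d_ff = 512, 2048
ffn1 = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
ffn2 = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

def sequential(x):
    x = x + ffn1(x)     # block i updates the residual stream...
    x = x + ffn2(x)     # ...and block i+1 reads the updated stream
    return x

def fused(x):
    # Both blocks read the same input, so they can run in parallel,
    # or equivalently as one FFN with concatenated weight matrices.
    return x + ffn1(x) + ffn2(x)

x = torch.randn(4, d_model)
print((sequential(x) - fused(x)).abs().max())  # the approximation error being traded away
```

Concatenating the two blocks' weight matrices yields a single wider FFN that computes the fused form in one pass, which is where the latency win would come from.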
Show HN: Xorq – open-source Python-first Pandas-style pipelines
(github.com/xorq-labs)
xorq is a deferred computational framework that brings the replicability and performance of declarative pipelines to the Python ML ecosystem.
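For readers new to deferred computation, here is a toy illustration of the general pattern (emphatically not xorq's actual API): pipeline steps build an expression graph, and nothing executes until the graph is run.

```python
# Toy deferred-execution pattern: steps are recorded, not run.

class Deferred:
    def __init__(self, fn, *parents):
        self.fn, self.parents = fn, parents

    def map(self, fn):
        # Record the transformation instead of applying it now.
        return Deferred(fn, self)

    def execute(self):
        # Only here does the recorded graph actually run.
        args = [p.execute() for p in self.parents]
        return self.fn(*args)

source = Deferred(lambda: list(range(10)))          # leaf node: load data
pipeline = (source
            .map(lambda xs: [x * x for x in xs])    # transform, still deferred
            .map(lambda xs: sum(xs)))               # aggregate, still deferred

print(pipeline.execute())  # nothing runs until now -> 285
```

Because the whole graph exists before anything runs, a framework built this way can optimize, cache, or ship it to different engines, which is where the claimed replicability and performance come from.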
Matrix Profiles
(aneksteind.github.io)
Lately I’ve been thinking about time series analysis to aid in Reflect’s insights features. Towards this end, I’d had a Hacker News thread about anomaly detection bookmarked in Later. I finally got around to reading it, and a comment mentioned that the article left out matrix profiles, which I had never heard of, so I decided to look into them.
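For anyone else meeting the term for the first time: a matrix profile records, for each sliding window of a time series, the distance to its nearest non-overlapping neighbor, so anomalies (discords) show up as peaks and repeated patterns (motifs) as valleys. A naive O(n^2) sketch with NumPy (real implementations such as STOMP are far faster):

```python
import numpy as np

def matrix_profile(ts, m):
    # For each length-m window, the z-normalized distance to its nearest
    # non-overlapping neighbor. Naive quadratic version for illustration.
    n = len(ts) - m + 1
    windows = np.array([ts[i:i + m] for i in range(n)])
    # z-normalize so matches are shape-based, not scale-based
    windows = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)
    profile = np.full(n, np.inf)
    for i in range(n):
        dists = np.linalg.norm(windows - windows[i], axis=1)
        dists[max(0, i - m // 2): i + m // 2 + 1] = np.inf  # exclusion zone: skip trivial self-matches
        profile[i] = dists.min()
    return profile  # peaks = discords (anomalies), valleys = motifs (repeats)

ts = np.sin(np.linspace(0, 20 * np.pi, 500))
ts[250:260] += 3.0                        # inject an anomaly
mp = matrix_profile(ts, m=25)
print("most anomalous window starts at", int(mp.argmax()))
```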
Low responsiveness of ML models to critical or deteriorating health conditions
(nature.com)
Machine learning (ML)-based mortality prediction models can be immensely useful in intensive care units.
Optimizing ML training with metagradient descent
(arxiv.org)
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space.
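The abstract suggests treating the training configuration itself as something to differentiate. A hedged toy version of that idea in PyTorch, differentiating the post-training validation loss with respect to the learning rate by unrolling SGD (a sketch of the general technique, not the paper's method):

```python
import torch

# Toy metagradient: differentiate the post-training validation loss with
# respect to a training hyperparameter (here, the learning rate).

torch.manual_seed(0)
X_tr, y_tr = torch.randn(32, 4), torch.randn(32)
X_val, y_val = torch.randn(32, 4), torch.randn(32)

lr = torch.tensor(0.05, requires_grad=True)    # hyperparameter to meta-optimize
w = torch.zeros(4, requires_grad=True)         # model weights

for _ in range(10):                            # unrolled inner training loop
    train_loss = ((X_tr @ w - y_tr) ** 2).mean()
    g = torch.autograd.grad(train_loss, w, create_graph=True)[0]
    w = w - lr * g                             # the update stays on the autograd graph

val_loss = ((X_val @ w - y_val) ** 2).mean()
metagrad = torch.autograd.grad(val_loss, lr)[0]
print("d(val_loss)/d(lr) =", metagrad.item())  # descend on this to tune the setup
```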
Ask HN: If you're an organization doing AI/ML what are you using?
(ycombinator.com)
Really a question about vendors and services, less about techniques or research.
RNA function follows form – why is it so hard to predict?
(nature.com)
AlphaFold’s highly accurate structural models transformed protein biology, but RNA lags behind.
New DeepSeek V3 0324 with MIT license
(huggingface.co)
DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects.