Hacker News with Generative AI: Machine Learning

FlowTSE: Target Speaker Extraction with Flow Matching (arxiv.org)
Target speaker extraction (TSE) aims to isolate a specific speaker's speech from a mixture using speaker enrollment as a reference.
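A minimal sketch of the training step implied by "flow matching" here, assuming linear interpolation paths and a generic conditional velocity network `v_theta` (hypothetical; the paper's actual architecture and conditioning details differ):

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(v_theta, target, mixture, enroll_emb):
    """Conditional flow matching: regress a velocity field toward the
    constant velocity of a straight noise-to-target path."""
    b = target.size(0)                 # target: (batch, channels, samples)
    x0 = torch.randn_like(target)      # noise endpoint of the path
    t = torch.rand(b, 1, 1)            # random time in [0, 1]
    xt = (1 - t) * x0 + t * target     # linear probability path
    v_true = target - x0               # path velocity is constant in t
    v_pred = v_theta(xt, t, mixture, enroll_emb)  # condition on mix + speaker
    return F.mse_loss(v_pred, v_true)
```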
Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning (ycombinator.com)
I built AutoThink, a technique that makes local LLMs reason more efficiently by adaptively allocating computational resources based on query complexity.
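AutoThink's exact mechanism isn't spelled out in the blurb; as a purely hypothetical sketch, adaptive allocation could be as simple as mapping a learned complexity score to a thinking-token budget (all names below are illustrative):

```python
def reasoning_budget(query: str, classify_complexity) -> int:
    """Map an estimated complexity score in [0, 1] to a token budget."""
    score = classify_complexity(query)   # e.g. a small learned classifier
    low, high = 256, 4096                # assumed min/max thinking budgets
    return int(low + score * (high - low))

# Usage: cap the model's reasoning trace at the returned budget, e.g.
# max_thinking_tokens = reasoning_budget(user_query, clf)
```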
OpenTPU: Open-Source Reimplementation of Google Tensor Processing Unit (TPU) (github.com/UCSBarchlab)
OpenTPU is an open-source re-implementation of Google's Tensor Processing Unit (TPU) by the UC Santa Barbara ArchLab.
Show HN: Maestro – A Framework to Orchestrate and Ground Competing AI Models (ycombinator.com)
Show HN: Free mammogram analysis tool combining deep learning and vision LLM (neuralrad.com:5300)
Show HN: Meteosource – Hyper-local weather API based on improved ML models (meteosource.com)
At an affordable price, you receive accurate and reliable data that you can easily integrate into your website or application. We also help you optimise weather-dependent activities.
Outcome-Based Reinforcement Learning to Predict the Future (arxiv.org)
Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting.
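The "verifiable" part is what makes forecasting tractable for RLVR: once the event resolves, a forecast can be scored mechanically. A minimal sketch of one plausible reward (a Brier-style score; the paper's reward design may differ):

```python
def brier_reward(p_forecast: float, outcome: int) -> float:
    """Score a predicted probability against a resolved binary outcome.
    1.0 for a confident correct forecast, 0.0 for a confident wrong one."""
    return 1.0 - (p_forecast - outcome) ** 2
```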
Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces (arxiv.org)
Automatic program repair seeks to generate correct code from buggy programs, with most approaches searching the correct program in a discrete, symbolic space of source code tokens.
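To make the contrast with discrete search concrete, here is an illustrative sketch (not the paper's exact method) of relaxing token choices into a continuous distribution and descending a differentiable surrogate of test failure:

```python
import torch

vocab_size, prog_len = 50, 12
logits = torch.randn(prog_len, vocab_size, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def surrogate_test_loss(soft_tokens):
    # Placeholder for a smooth measure of "how badly do the tests fail".
    return soft_tokens.pow(2).mean()

for _ in range(100):
    soft_prog = torch.softmax(logits, dim=-1)  # continuous program space
    loss = surrogate_test_loss(soft_prog)
    opt.zero_grad(); loss.backward(); opt.step()

repaired_tokens = logits.argmax(dim=-1)        # project back to discrete code
```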
Some signs of AI model collapse begin to reveal themselves (theregister.com)
Prediction: General-purpose AI could start getting worse
Scaling RNNs to Billions of Parameters with Zero Order (arxiv.org)
During inference, Recurrent Neural Networks (RNNs) require constant FLOPs and GPU memory as context length grows, since they compress all prior tokens into a fixed-size memory.
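The constant-cost claim is easy to see in code: the whole history lives in one fixed-size hidden state, so step 10,000 costs the same as step 10 (a generic GRU for illustration, not the paper's architecture):

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=64, hidden_size=128)
h = torch.zeros(1, 128)                  # fixed-size memory, never grows
for x in torch.randn(10_000, 1, 64):     # 10k-token context
    h = cell(x, h)                       # O(1) FLOPs and memory per token
```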
You could have invented Transformers (gwern.net)
‘You Could Have Invented Transformers’ tutorial proposal
Gemma 3n Architectural Innovations – Speculation and poking around in the model (reddit.com)
Gemma 3n is a new member of the Gemma family with free weights that was released during Google I/O. It's dedicated to on-device (edge) inference and supports image and text input, as well as audio input. Google has released an app that can be used for inference on the phone.
Direct Preference Optimization vs. RLHF (together.ai)
We're excited to announce that the Together Fine-Tuning Platform now supports Direct Preference Optimization (DPO)! This technique allows developers to align language models with human preferences, creating more helpful, accurate, and tailored AI assistants. In this deep-dive blog post, we provide details of what DPO is, how it works, when to use it, and code examples. If you'd like to jump straight into code, have a look at our code notebook.
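The core DPO objective is compact enough to state directly (the standard formulation from Rafailov et al.; Together's platform wraps this for you). Given sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Push up the policy's implicit reward margin for chosen over rejected."""
    chosen_margin = beta * (logp_chosen - ref_logp_chosen)
    rejected_margin = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()
```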
Neural Thermodynamic Laws for Large Language Model Training (arxiv.org)
Beyond neural scaling laws, little is known about the laws underlying large language models (LLMs). We introduce Neural Thermodynamic Laws (NTL) -- a new framework that offers fresh insights into LLM training dynamics.
Show HN: I made an OSS alternative to Weights and Biases (github.com/mlop-ai)
mlop is a Machine Learning Operations (MLOps) framework. It provides superior, self-hostable experiment tracking and lifecycle management for training ML models. To get started, try out our introductory notebook or get an account with us today!
Ask HN: AI Reading List (ycombinator.com)
In the thread about the John Carmack presentation, somebody mentioned the reading list he got from Ilya, which was crucial for understanding what matters and the state of knowledge at the time.

After some googling, it seems like this list is plausible, although not confirmed: https://github.com/dzyim/ilya-sutskever-recommended-reading?tab=readme-ov-file

What would an up-to-date list look like today?
Model-Based Machine Learning (2023) (mbmlbook.com)
Attention Wasn't All We Needed (stephendiehl.com)
There are a lot of modern transformer techniques that have been developed since the original Attention Is All You Need paper. Let's look at some of the most important ones developed over the years and try to implement the basic ideas as succinctly as possible. We'll use the PyTorch framework for most of the examples.
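One example of such a technique is rotary position embeddings (RoPE), sketched here in a simplified, non-interleaved layout (the post itself covers many more):

```python
import torch

def rope(x):
    """Rotate query/key feature pairs by a position-dependent angle.
    x: (seq, dim) with even dim; simplified non-interleaved output layout."""
    seq, dim = x.shape
    pos = torch.arange(seq, dtype=torch.float32).unsqueeze(1)
    freqs = 10000.0 ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = pos * freqs                       # (seq, dim/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    return torch.cat([x1 * angles.cos() - x2 * angles.sin(),
                      x1 * angles.sin() + x2 * angles.cos()], dim=-1)
```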
You Don't Need Re-Ranking: Understanding the Superlinked Vector Layer (superlinked.com)
Vector search is not just about matching words; capturing the meaning behind them matters just as much. But a single embedding struggles to balance competing signals: textual relevance, popularity, and recency can pull results in different directions, and nearest-neighbor matches alone aren't always precise.
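The underlying idea, folding several signals into the score at retrieval time rather than re-ranking afterwards, can be sketched like this (illustrative only, not Superlinked's actual API; the weights and the 30-day decay are assumptions):

```python
import numpy as np

def combined_score(similarity, popularity, age_days,
                   w_sim=0.7, w_pop=0.2, w_rec=0.1):
    """Blend semantic similarity with popularity and recency in one score."""
    recency = np.exp(-age_days / 30.0)   # assumed 30-day freshness decay
    return w_sim * similarity + w_pop * popularity + w_rec * recency
```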
KumoRFM: A Foundation Model for In-Context Learning on Relational Data (kumo.ai)
Foundation Models (FMs) have completely taken over unstructured data domains like natural language and images, delivering significant advances in performance across tasks with little to no task-specific training. Yet structured and semi-structured relational data, which represent some of the most valuable information assets, largely miss out on this AI wave.
The Annotated Kolmogorov-Arnold Network (KAN) (alexzhang13.github.io)
Deep neural networks have been the driving force of developments in AI in the last decade. However, they currently suffer from several known issues such as a lack of interpretability, scaling issues, and data inefficiency – in other words, while they are powerful, they are not a perfect solution.
Datadog open-sources a SOTA time series model and 350M-point benchmark (datadoghq.com)
We are excited to announce a new open-weights release of Toto, our state-of-the-art time series foundation model (TSFM), and BOOM, a new public observability benchmark that contains 350 million observations across 2,807 real-world time series.
SUS backprop: linear backpropagation algorithm for long inputs in transformers (arxiv.org)
It is straightforward to design an unbiased gradient estimator that stochastically cuts the backpropagation flow through any part of a computational graph.
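The basic construction is simple to sketch: keep a gradient path with probability p and rescale by 1/p, so the expected gradient matches full backpropagation while most paths are cut (a generic illustration of the idea, not the paper's variance-tuned scheme):

```python
import torch

def stochastic_grad_cut(x, p=0.1):
    """Forward value is exactly x; gradient flows with probability p,
    scaled by 1/p so the estimator stays unbiased."""
    keep = (torch.rand(()) < p).float()
    return x.detach() + (keep / p) * (x - x.detach())
```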
µPC: Scaling Predictive Coding to 100 Layer Networks (arxiv.org)
The biological implausibility of backpropagation (BP) has motivated many alternative, brain-inspired algorithms that attempt to rely only on local information, such as predictive coding (PC) and equilibrium propagation. However, these algorithms have notoriously struggled to train very deep networks, preventing them from competing with BP in large-scale settings. Indeed, scaling PC networks (PCNs) has recently been posed as a challenge for the community (Pinchetti et al., 2024).
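For readers unfamiliar with PC, here is a minimal vanilla version (not the paper's µPC parameterization): activities settle by descending a sum of local prediction errors, then weights update from purely local quantities:

```python
import torch

torch.manual_seed(0)
W1, W2 = torch.randn(16, 8) * 0.1, torch.randn(8, 4) * 0.1
x, y = torch.randn(1, 16), torch.randn(1, 4)   # input and target
a = torch.randn(1, 8, requires_grad=True)      # latent layer activity

for _ in range(20):                            # inference: settle activities
    e1, e2 = a - x @ W1, y - a @ W2            # local prediction errors
    energy = (e1 ** 2).sum() + (e2 ** 2).sum()
    energy.backward()
    with torch.no_grad():
        a -= 0.1 * a.grad
        a.grad = None

with torch.no_grad():                          # learning: purely local updates
    e1, e2 = a - x @ W1, y - a @ W2
    W1 += 0.01 * x.t() @ e1
    W2 += 0.01 * a.t() @ e2
```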
Harnessing the Universal Geometry of Embeddings (arxiv.org)
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches.
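One ingredient that makes unpaired translation conceivable is cycle consistency, sketched as a toy below (the paper's actual method also uses adversarial objectives and a shared latent space; these dimensions and architectures are made up):

```python
import torch
import torch.nn as nn

d_a, d_b = 384, 768                                   # two embedding spaces
f = nn.Sequential(nn.Linear(d_a, 512), nn.ReLU(), nn.Linear(512, d_b))  # A->B
g = nn.Sequential(nn.Linear(d_b, 512), nn.ReLU(), nn.Linear(512, d_a))  # B->A
opt = torch.optim.Adam([*f.parameters(), *g.parameters()], lr=1e-3)

ea, eb = torch.randn(32, d_a), torch.randn(32, d_b)   # unpaired batches
cycle = ((g(f(ea)) - ea) ** 2).mean() + ((f(g(eb)) - eb) ** 2).mean()
opt.zero_grad(); cycle.backward(); opt.step()         # one training step
```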
An upgraded dev experience in Google AI Studio (googleblog.com)
Google AI Studio is the fastest place to start building with the Gemini API, with access to our most capable models, including Gemini 2.5 preview models, and generative media models like Imagen, Lyria RealTime, and Veo. At Google I/O, we announced new features to help you build and deploy complete applications, new model capabilities, and new features in the Google Gen AI SDK.
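A minimal call with the Google Gen AI SDK looks like this (the model identifier below is a guess; check AI Studio for the current preview names):

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
resp = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",   # assumed preview model id
    contents="Summarize flow matching in two sentences.",
)
print(resp.text)
```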
Depth Anything V2 (depth-anything-v2.github.io)
Depth Anything V2 is trained on 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features: more fine-grained detail than Depth Anything V1; more robustness than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard); greater efficiency (10x faster) and a lighter footprint than SD-based models; and impressive fine-tuned performance with our pre-trained models. We also release six metric depth models across three scales for indoor and outdoor scenes, respectively.
PlainsightAI Releases OpenFilter: Framework For Universal Vision Workloads (github.com/PlainsightAI)
OpenFilter is a universal abstraction for building and running vision workloads in modular image/video processing pipelines.
Show HN: KVoiceWalk – Voice cloning for Kokoro TTS using random walk algorithms (github.com/RobViren)
KVoiceWalk tries to create new Kokoro voice style tensors that clone target voices, using a random walk algorithm and a hybrid scoring method that combines Resemblyzer similarity, feature extraction, and self-similarity.
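The core loop is easy to sketch (the scoring function is hypothetical here; KVoiceWalk's real score blends the three signals above):

```python
import torch

def random_walk_voice(style, score_fn, steps=1000, step_size=0.01):
    """Greedy random walk over a voice style tensor, keeping improvements."""
    best, best_score = style, score_fn(style)
    for _ in range(steps):
        candidate = best + step_size * torch.randn_like(best)
        s = score_fn(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```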
Show HN: AI Baby Monitor – local Video-LLM that beeps when safety rules break (github.com/zeenolife)
Your second pair of eyes, powered by local video LLMs. Because, you know... it does take a village.