Hacker News with Generative AI: Machine Learning

Radiation Tolerant Software Framework for Space Applications (github.com/r0nlt)
A C++ framework for implementing machine learning models that can operate reliably in radiation environments, such as space. This framework implements industry-standard radiation tolerance techniques validated against NASA and ESA reference models.
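The summary does not spell out the specific techniques, but a standard building block for radiation tolerance is triple modular redundancy (TMR): store each value three times and majority-vote on every read. A minimal Python sketch of the idea (the class and method names are illustrative, not the framework's API):

```python
class TMRFloat:
    """Store a value in three copies and majority-vote on read,
    masking a single-copy upset (a common radiation-tolerance pattern)."""

    def __init__(self, value: float):
        self._copies = [value, value, value]

    def write(self, value: float) -> None:
        self._copies = [value, value, value]

    def read(self) -> float:
        a, b, c = self._copies
        # If any two copies agree, trust that pair.
        if a == b or a == c:
            return a
        if b == c:
            return b
        # All three differ: fall back to the median as a best effort.
        return sorted(self._copies)[1]

# Usage: a model weight wrapped in a TMR cell.
w = TMRFloat(0.42)
w._copies[1] = 123.0      # simulate a single-event upset in one copy
assert w.read() == 0.42   # the vote masks the corrupted copy
```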
Why do LLMs have emergent properties? (johndcook.com)
Large language models display emergent behaviors: when the parameter count is scaled past a certain point, the LLM suddenly becomes capable of performing a new task that was not possible at smaller sizes.
Block Diffusion: Interpolating Autoregressive and Diffusion Language Models (m-arriola.com)
Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models.
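Conceptually, generation proceeds block by block: each block is denoised over several diffusion steps while conditioning on all previously committed blocks, which restores autoregressive-style variable-length generation. A rough sketch of that loop, with `denoise_step` standing in for a model call (hypothetical, not the paper's code):

```python
import random

def denoise_step(block, context, step):
    """Placeholder for one reverse-diffusion step over a block of tokens,
    conditioned on previously generated context. Stands in for a model call."""
    return [random.randint(0, 999) if tok is None and random.random() < 0.5 else tok
            for tok in block]

def block_diffusion_generate(num_blocks=4, block_size=8, num_steps=10):
    context = []                                 # committed tokens so far
    for _ in range(num_blocks):                  # outer loop: autoregressive over blocks
        block = [None] * block_size              # start from a fully masked/noised block
        for step in range(num_steps):            # inner loop: diffusion within the block
            block = denoise_step(block, context, step)
        block = [tok if tok is not None else 0 for tok in block]  # finalize leftovers
        context.extend(block)                    # condition the next block on this one
    return context

print(block_diffusion_generate())
```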
System lets robots identify an object's properties through handling (news.mit.edu)
With a novel simulation method, robots can guess the weight, softness, and other physical properties of an object just by picking it up.
Jargonic Sets New SOTA for Japanese ASR (aiola.ai)
Automatic Speech Recognition (ASR) systems often excel in lab conditions but struggle in real-world enterprise environments—especially when it comes to linguistically complex languages like Japanese.
Alignment is not free: How model upgrades can silence your confidence signals (variance.co)
Post-training can bias how language models behave when they encounter content that violates their safety guidelines. As noted in OpenAI’s GPT-4 system card, model calibration rarely survives post-training, leaving models extremely confident even when they are wrong.¹ For our use case, this often shows up as outputs biased toward flagging violations, which wastes human reviewers’ time in an LLM-powered content moderation system.
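One way to see the effect is to measure the calibration of the model's confidence signal before and after an upgrade; if the reliability curve collapses toward overconfidence, thresholds tuned on the old model stop being meaningful. A minimal sketch with scikit-learn (the labels and probabilities below are placeholders, not real moderation data):

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Placeholder data: 1 = actual policy violation, prob = model's confidence that it is one.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
prob_old = np.clip(y_true * 0.7 + rng.normal(0.15, 0.2, 1000), 0, 1)  # roughly calibrated
prob_new = np.clip(prob_old * 0.3 + 0.7, 0, 1)                        # post-upgrade: everything looks confident

for name, prob in [("old model", prob_old), ("new model", prob_new)]:
    frac_pos, mean_pred = calibration_curve(y_true, prob, n_bins=10)
    gap = np.abs(frac_pos - mean_pred).mean()   # average calibration error across bins
    print(f"{name}: mean calibration gap = {gap:.3f}")
```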
Sutton and Barto book implementation (github.com/ivanbelenky)
This repository contains code that implements algorithms and models from Sutton and Barto's book on reinforcement learning.
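For a flavor of the kind of algorithm covered, here is an epsilon-greedy k-armed bandit with incremental value updates, one of the first methods in the book (a standalone sketch, not code from the linked repo):

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1):
    """Sutton & Barto, Ch. 2: estimate action values Q incrementally and
    act greedily except for an epsilon fraction of exploratory pulls."""
    k = len(true_means)
    Q = [0.0] * k          # value estimates
    N = [0] * k            # pull counts
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: Q[i])     # exploit
        reward = random.gauss(true_means[a], 1.0)     # noisy reward
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]                # incremental mean update
        total_reward += reward
    return Q, total_reward / steps

Q, avg = epsilon_greedy_bandit([0.2, 0.5, 1.0, 0.1])
print("estimated values:", [round(q, 2) for q in Q], "avg reward:", round(avg, 2))
```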
DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats (arxiv.org)
We present DoomArena, a security evaluation framework for AI agents.
Show HN: Plexe – ML Models from a Prompt (github.com/plexe-ai)
plexe lets you create machine learning models by describing them in plain language. Simply explain what you want, and the AI-powered system builds a fully functional model through an automated agentic approach. Also available as a managed cloud service.
Show HN: OpenRouter Model Price Comparison (pages.dev)
Compare pricing across different AI models available on OpenRouter
How linear regression works intuitively and how it leads to gradient descent (briefer.cloud)
Learning, to a computer, is just turning bad guesses into better ones. In this post, we’ll see how that starts with a straight line: how linear regression makes the first guess, and gradient descent keeps improving it.
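A compact version of that loop: start with an arbitrary slope and intercept, measure the squared error, and nudge both parameters along the negative gradient (a sketch of the general idea, not code from the post):

```python
import numpy as np

# Synthetic data scattered around the line y = 3x + 2.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3 * x + 2 + rng.normal(0, 0.1, 200)

w, b = 0.0, 0.0            # initial (bad) guess
lr = 0.1                   # learning rate
for _ in range(500):
    y_hat = w * x + b
    error = y_hat - y
    grad_w = 2 * np.mean(error * x)   # d(MSE)/dw
    grad_b = 2 * np.mean(error)       # d(MSE)/db
    w -= lr * grad_w                  # step downhill
    b -= lr * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")    # should approach 3 and 2
```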
RUKA: Rethinking the Design of Humanoid Hands with Learning (ruka-hand.github.io)
This work presents RUKA, a tendon-driven humanoid hand that is compact, affordable, and capable. Made from 3D-printed parts and off-the-shelf components, RUKA has 5 fingers with 15 underactuated degrees of freedom enabling diverse human-like grasps. Its tendon-driven actuation allows powerful grasping in a compact, human-sized form factor. To address control challenges, we learn joint-to-actuator and fingertip-to-actuator models from motion-capture data collected by the MANUS glove, leveraging the hand's morphological accuracy.
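The control idea is a learned forward mapping from desired joint (or fingertip) targets to tendon actuator commands, fit on motion-capture pairs. A small regression sketch of such a mapping (the array shapes and MLP here are illustrative, not the RUKA training code):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder mocap dataset: joint targets (15 DoF) -> actuator commands (e.g., 11 motors).
rng = np.random.default_rng(0)
joint_targets = rng.uniform(0, 1, size=(5000, 15))
actuator_cmds = joint_targets @ rng.uniform(-1, 1, size=(15, 11))  # stand-in for real pairs

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300)
model.fit(joint_targets, actuator_cmds)

# At control time: ask for a grasp pose, get motor commands.
desired_pose = rng.uniform(0, 1, size=(1, 15))
print(model.predict(desired_pose).shape)   # (1, 11)
```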
Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs (arxiv.org)
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
Dummy's Guide to Modern LLM Sampling (rentry.co)
Large Language Models (LLMs) work by taking a piece of text (e.g. a user prompt) and predicting the next word, or, in more technical terms, the next token. LLMs have a vocabulary, a dictionary of valid tokens, which they reference during both training and inference (the process of generating text). More on that below; first, you need to understand why we use tokens (sub-words) instead of words or letters.
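To make the later sampling discussion concrete: once the model produces a score (logit) per vocabulary token, samplers simply reshape that distribution before drawing from it. A minimal temperature plus top-k sketch over toy logits (not code from the guide):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50):
    """Turn raw logits into a probability distribution, keep the k most
    likely tokens, and sample one token id from what remains."""
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)
    top = np.argsort(logits)[-top_k:]           # indices of the k largest logits
    masked = np.full_like(logits, -np.inf)      # everything else gets zero probability
    masked[top] = logits[top]
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()                        # softmax over the surviving tokens
    return int(np.random.choice(len(probs), p=probs))

# Toy vocabulary of 10 tokens with made-up logits.
print(sample_next_token([2.0, 1.5, 0.3, -1.0, 0.0, 0.1, -2.0, 1.2, 0.4, -0.5], top_k=5))
```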
Physics of Language Models: Architecture Design and the Magic of Canon Layers (ssrn.com)
Understanding architectural differences between large language models (LLMs) remains challenging, particularly at academic-scale pretraining (e.g., 1.3B parameters on 100B tokens), where results are often dominated by noise and randomness.
TScale – Distributed training on consumer GPUs (github.com/Foreseerr)
This repo contains transformer training and inference code written in C++ and CUDA.
ProbOnto – The Ontology and Knowledge Base of Probability Distributions (sites.google.com)
Welcome to ProbOnto - the Ontology and Knowledge Base of Probability Distributions, v2.5
Low-Latency Bayesian Inference: Deploying Models with PyTorch and ONNX (world.hey.com)
Deploying Bayesian models in production often requires balancing predictive accuracy with low-latency inference.
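A common recipe for the latency side is to export the trained PyTorch model (for Bayesian models, often a distilled posterior-predictive network or a fixed set of posterior samples) to ONNX and serve it with onnxruntime. The network below is a stand-in, not the post's model:

```python
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in predictive network; a Bayesian deployment might export several
# weight samples or a distilled posterior-predictive model this way.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1)).eval()
dummy = torch.randn(1, 8)

torch.onnx.export(model, dummy, "predictive.onnx",
                  input_names=["features"], output_names=["prediction"],
                  dynamic_axes={"features": {0: "batch"}})

session = ort.InferenceSession("predictive.onnx")
out = session.run(None, {"features": np.random.randn(4, 8).astype(np.float32)})
print(out[0].shape)   # (4, 1)
```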
Show HN: Kinematic Hand Skeleton Optimization in Jax (github.com/rerun-io)
A repo for exploring robot training with Pi0 and Lerobot, as well as human pose motion retargeting.
Step-by-step reasoning verifiers that think (arxiv.org)
Step-by-step verifiers -- also known as process reward models (PRMs) -- are a key ingredient for test-time scaling.
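In test-time scaling, a PRM scores each intermediate step of every sampled solution, and the per-step scores are aggregated (often by the minimum or the product) to pick the best candidate. A sketch of that reranking step, with `score_step` standing in for the verifier model (hypothetical, not the paper's code):

```python
def score_step(problem, steps_so_far, step):
    """Placeholder for a process reward model: returns the probability
    that this individual reasoning step is correct."""
    return 0.9 if "correct" in step else 0.4

def best_of_n(problem, candidate_solutions):
    """Rerank sampled solutions by their weakest step (min aggregation)."""
    def solution_score(steps):
        scores = [score_step(problem, steps[:i], s) for i, s in enumerate(steps)]
        return min(scores)
    return max(candidate_solutions, key=solution_score)

candidates = [
    ["correct setup", "correct algebra", "sloppy final step"],
    ["correct setup", "correct algebra", "correct final answer"],
]
print(best_of_n("2x + 3 = 7", candidates))
```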
Show HN: GPT-2 implemented using graphics shaders (github.com/nathan-barry)
A browser-based, WebGL2 implementation of GPT-2 with transformer block and attention matrix visualization
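The attention matrices it visualizes are just softmax-normalized query-key similarities; in plain numpy (shown here rather than shader code) one attention head looks like this:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention for one head: the softmaxed score
    matrix is the kind of per-layer, per-head map the demo renders."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq, seq) similarity matrix
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V, weights                         # output and the attention matrix

seq_len, d_model = 6, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d_model)) for _ in range(3))
out, attn = attention(Q, K, V)
print(out.shape, attn.shape)   # (6, 16) (6, 6)
```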
Distributed Continuous GPU Profiling (zymtrace.com)
Identify performance bottlenecks in CUDA kernels, optimize inference batch size, and eliminate idle GPU cycles, with zero friction.
Towards the Cutest Neural Network (kevinlynagh.com)
I recently needed to use a microcontroller to estimate the pose (translation and orientation) of an object using readings from six different sensors.
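The shape of the problem is a very small regression network: six sensor readings in, a handful of pose values out, tiny enough that the forward pass fits in plain arithmetic on a microcontroller. A numpy sketch of such a network's forward pass (the sizes are illustrative, not the post's final architecture):

```python
import numpy as np

def forward(sensors, W1, b1, W2, b2):
    """Forward pass of a tiny 6 -> 16 -> 7 MLP: six sensor readings in,
    a pose out (3 translation values + 4 quaternion components)."""
    h = np.maximum(0.0, sensors @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 7)) * 0.1, np.zeros(7)

pose = forward(rng.normal(size=6), W1, b1, W2, b2)
print(pose.shape)   # (7,): small enough to port to fixed-point C on a microcontroller
```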
Llasa: Llama-Based Speech Synthesis (llasatts.github.io)
Recent advances in text-based large language models (LLMs), particularly in the GPT series and the o1 model, have demonstrated the effectiveness of scaling both training-time and inference-time compute.
Show HN: Hyperparam: OSS tools for exploring datasets locally in the browser (hyperparam.app)
Hyperparam was founded to address a critical gap in the machine learning ecosystem: the lack of a user-friendly, scalable UI for exploring and curating massive datasets.
Mixture of Tunable Experts: DeepSeek R1 Behavior Modification at Inference Time (huggingface.co)
Show HN: mlop – open-source ML Experiment Tracking (github.com/mlop-ai)
mlop is a fully open source dev tool for machine learning engineers that traces key metrics of model training, all the way down to the parameter and gradient level.
Show HN: Create your own finetuned AI model using Google Sheets (promptrepo.com)
Finetune AI using data in Google Sheets
OCaml's Wings for Machine Learning (github.com/raven-ml)
Raven is a comprehensive ecosystem of libraries, frameworks, and tools that brings machine learning and data science capabilities to OCaml.