Hacker News with Generative AI: Reinforcement Learning

The State of Reinforcement Learning for LLM Reasoning (sebastianraschka.com)
A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to these releases were relatively muted. Why? One reason could be that GPT-4.5 and Llama 4 remain conventional models, which means they were trained without explicit reinforcement learning for reasoning.
Does RL Incentivize Reasoning in LLMs Beyond the Base Model? (limit-of-rlvr.github.io)
Recent breakthroughs in reasoning-focused large language models (LLMs) like OpenAI-o1, DeepSeek-R1, and Kimi-1.5 have largely relied on Reinforcement Learning with Verifiable Rewards (RLVR), which replaces human annotations with automated rewards (e.g., verified math solutions or passing code tests) to scale self-improvement. While RLVR enhances reasoning behaviors such as self-reflection and iterative refinement, we challenge a core assumption:
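To make the RLVR idea concrete (automated, checkable rewards in place of human annotation), here is a minimal sketch of a verifiable reward for a math task with a known final answer. The boxed-answer convention and function name are assumptions for illustration, not any specific paper's implementation:

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Illustrative RLVR-style reward: 1.0 if the model's final boxed
    answer matches the reference exactly, else 0.0. No human judge needed."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

# Usage: score sampled completions, then feed the rewards to the RL update.
print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
```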
Welcome to the Era of Experience [pdf] (googleapis.com)
Skywork-OR1: new SOTA 32B thinking model with open weights (github.com/SkyworkAI)
✊ Unleashing the Power of Reinforcement Learning for Math and Code Reasoners 🤖
DeepCoder: An Open-Source 14B Coder at O3-Mini Level (together.ai)
Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model fine-tuned from DeepSeek-R1-Distilled-Qwen-14B via distributed RL. It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters. We’ve open-sourced our dataset, code, training logs, and systems optimizations for everyone to progress on scaling and accelerating intelligence with RL.
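For reference, Pass@1 with multiple samples per problem is commonly estimated with the unbiased pass@k estimator of Chen et al. (2021); whether DeepCoder uses exactly this estimator is an assumption here. A quick sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    n = samples drawn per problem, c = samples that passed the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the plain pass rate c/n:
print(pass_at_k(8, 5, 1))  # 0.625
```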
Can reinforcement learning for LLMs scale beyond math and coding tasks? Probably (arxiv.org)
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are accessible for verification.
DeepSeek: Inference-Time Scaling for Generalist Reward Modeling (arxiv.org)
Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale.
Search-R1: Training LLMs to Reason and Leverage Search Engines with RL (arxiv.org)
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs).
Scaling Up Reinforcement Learning for Traffic Smoothing (bair.berkeley.edu)
We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.
Launch HN: Augento (YC W25) – Fine-tune your agents with reinforcement learning (ycombinator.com)
Hi HN, we’re the cofounders of Augento (https://augento.ai/). We’re building Deepseek R1-like fine-tuning as a service. You connect your agent, tell us when it’s right or wrong, and we deliver an LLM optimized for that agent.
I Built Faster Reinforcement Learning in C# Solo Than Teams Did with Python (rlmatrix.com)
The question comes relentlessly: “Why build reinforcement learning in C#?” Behind this query lies an unspoken assumption that serious machine learning happens exclusively in Python. This perspective reveals a fundamental disconnect between academic ML researchers with their sprawling Python scripts and those of us solving real industrial problems.
A (Long) Peek into Reinforcement Learning (lilianweng.github.io)
Several exciting things have happened in Artificial Intelligence (AI) in recent years. AlphaGo defeated the best professional human player in the game of Go. Soon after, the extended algorithm AlphaGo Zero beat AlphaGo 100-0 without any supervised learning on human knowledge. Top professional game players lost to the bot developed by OpenAI in DOTA2 1v1 competition. Knowing all this, it is hard not to be curious about the magic behind these algorithms: Reinforcement Learning (RL).
Understanding R1-Zero-Like Training: A Critical Perspective (github.com/sail-sg)
To understand R1-Zero-like training, we critically examine two core components: base models and reinforcement learning. We highlight our findings below.
Hunyuan T1 Mamba Reasoning model beats R1 on speed and metrics (tencent.github.io)
Reinforcement learning has pioneered a new scaling paradigm in the post-training phase of large language models, a breakthrough that is attracting growing attention from industry.
Legged Locomotion Meets Skateboarding (umich-curly.github.io)
This paper introduces Discrete-time Hybrid Automata Learning (DHAL), a framework using on-policy Reinforcement Learning to identify and execute mode-switching without trajectory segmentation or event function learning.
Mathematical Foundations of Reinforcement Learning (github.com/MathFoundationRL)
This textbook has received 5,000+ stars! Glad that it is helpful to many readers.
Reinforcement Learning in less than 400 lines of C (github.com/antirez)
This code implements a neural network that learns to play tic-tac-toe through reinforcement learning, simply by playing against a random adversary, in under 400 lines of C without any external libraries.
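As a language-neutral sketch of the same training-loop idea (tabular Monte-Carlo value updates rather than antirez's neural net; the reward scheme below is an assumption): play whole games against a random opponent, then nudge the value of every visited state toward the final outcome.

```python
ALPHA = 0.1   # learning rate
values = {}   # board position (tuple of 9 cells) -> estimated value for the agent

def update_episode(visited_states, outcome):
    """Monte-Carlo update: outcome is +1 for an agent win, -1 for a loss, 0 for a draw."""
    for s in visited_states:
        v = values.get(s, 0.0)
        values[s] = v + ALPHA * (outcome - v)

# After one won game that passed through a single (toy) position:
update_episode([("X", "", "", "", "O", "", "", "", "X")], +1)
print(values)  # that position's value moves from 0.0 toward +1
```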
Show HN: Llama-8B Teaches Itself Baby Steps to Deep Research Using RL (github.com/dCaples)
Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification.
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning (arxiv.org)
From a first-principles perspective, it may seem odd that the strongest results in foundation model fine-tuning (FT) are achieved via a relatively complex, two-stage training procedure.
Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue” (openpipe.ai)
In this post we’ll discuss how we used Group Relative Policy Optimization (GRPO) to surpass R1, o1, and o3-mini, and come within a couple of percentage points of Sonnet 3.7, on a reasoning-heavy game called “Temporal Clue”, while being over 100x cheaper to run at inference time. We’ll include specific lessons learned about task design and the hyperparameters we’ve found to work well. And finally, we share the training recipe we used to achieve these results, built on top of torchtune.
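A minimal sketch of the group-relative advantage at the heart of GRPO (the PPO-style clipped policy loss that consumes these advantages is omitted, and the example reward values are made up): sample a group of completions per prompt, then use each completion's z-scored reward as its advantage, with no learned value function.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """GRPO core idea: score a group of completions for the same prompt,
    then use the z-scored reward of each completion as its advantage.
    rewards: shape (group_size,), one scalar reward per sampled completion."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions for one puzzle, reward = fraction of clues solved.
print(grpo_advantages(torch.tensor([0.0, 0.25, 1.0, 0.75])))
```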
QwQ-32B: Embracing the Power of Reinforcement Learning (qwenlm.github.io)
Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods.
RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning (2023) (kzakka.com)
We train anthropomorphic robot hands to play the piano using deep RL and release a simulated benchmark and dataset to advance high-dimensional control.
Competitive Programming with Large Reasoning Models (arxiv.org)
We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks.
DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL (notion.site)
DeepScaleR is an open-source project to fully democratize reinforcement learning (RL) for LLMs and reproduce DeepSeek R1 and OpenAI O1/O3 at scale on real tasks.
Craftax: (Crafter and NetHack) RL Environment in Jax (github.com/MichaelTMatthews)
Craftax is an RL environment written entirely in JAX. Craftax reimplements and significantly extends the game mechanics of Crafter, taking inspiration from roguelike games such as NetHack.
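The exact Craftax API is best taken from the repo; as a toy illustration of why a pure-JAX environment matters, the sketch below (with entirely hypothetical dynamics and reward) shows environment steps being jit-compiled and vmapped across thousands of parallel instances:

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a pure-JAX environment step (Craftax's real API differs).
# Because the step is a pure function, it can be jit-compiled and vmapped
# across thousands of parallel environments on a single accelerator.
def toy_step(state: jnp.ndarray, action: jnp.ndarray):
    new_state = state + action          # hypothetical dynamics
    reward = -jnp.abs(new_state).sum()  # hypothetical reward
    return new_state, reward

batched_step = jax.jit(jax.vmap(toy_step))
states = jnp.zeros((4096, 8))           # 4096 parallel environments
actions = jnp.ones((4096, 8))
states, rewards = batched_step(states, actions)
print(rewards.shape)  # (4096,)
```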
The Differences Between Direct Alignment Algorithms Are a Blur (arxiv.org)
Direct Alignment Algorithms (DAAs) simplify language model alignment by replacing reinforcement learning (RL) and reward modeling (RM) in Reinforcement Learning from Human Feedback (RLHF) with direct policy optimization.
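For concreteness, DPO is the canonical DAA. A minimal sketch of its loss, assuming per-response summed token log-probabilities are already computed (the tensor values in the usage line are made up):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta: float = 0.1):
    """Direct Preference Optimization: instead of fitting a reward model and
    running RL, maximize the margin between the policy's and the reference
    model's log-probabilities on preferred vs. dispreferred responses.
    Inputs are summed token log-probs per response."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Example with made-up log-probabilities for one preference pair:
print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.0])))
```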
There may not be an aha moment in R1-Zero-like training (notion.site)
R1 Computer Use (github.com/agentsea)
r1-computer-use is an experimental project that applies large-scale Reinforcement Learning techniques similar to DeepSeek-R1 to computer usage scenarios.