
Reinforcement Learning Papers
Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning
Discover how reinforcement learning enables hierarchical reasoning in LLMs and how HICRA improves on GRPO…
Microsoft’s Reinforcement Pre-Training (RPT) – A New Direction in LLM Training?
In this post we break down Microsoft’s Reinforcement Pre-Training, which scales up reinforcement learning with next-token reasoning…
GRPO Reinforcement Learning Explained (DeepSeekMath Paper)
DeepSeekMath is the paper that introduced GRPO, the reinforcement learning method used in DeepSeek-R1. Dive in to understand how it works…
DAPO: Enhancing GRPO For LLM Reinforcement Learning
Explore DAPO, an innovative open-source reinforcement learning paradigm for LLMs that rivals DeepSeek-R1’s GRPO method…
SWE-RL by Meta — Reinforcement Learning for Software Engineering LLMs
Dive into SWE-RL by Meta, a DeepSeek-R1 style recipe for training LLMs for software engineering with reinforcement learning…
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI?
Dive into the groundbreaking DeepSeek-R1 research paper, which introduces open-source reasoning models that rival the performance of OpenAI’s o1!…
Generative Reward Models: Merging the Power of RLHF and RLAIF for Smarter AI
In this post we dive into Stanford research presenting Generative Reward Models, a hybrid human and AI RL approach for improving LLMs…





