
Reinforcement Learning Papers
Microsoft’s Reinforcement Pre-Training (RPT) – A New Direction in LLM Training?
In this post we break down Microsoft’s Reinforcement Pre-Training, which scales up reinforcement learning with next-token reasoning…
GRPO Reinforcement Learning Explained (DeepSeekMath Paper)
DeepSeekMath is the paper that introduced GRPO, the reinforcement learning method used in DeepSeek-R1. Dive in to understand how it works…
DAPO: Enhancing GRPO For LLM Reinforcement Learning
Explore DAPO, an innovative open-source reinforcement learning paradigm for LLMs that rivals DeepSeek-R1’s GRPO method…
SWE-RL by Meta — Reinforcement Learning for Software Engineering LLMs
Dive into SWE-RL by Meta, a DeepSeek-R1 style recipe for training LLMs for software engineering with reinforcement learning…
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI?
Dive into the groundbreaking DeepSeek-R1 research paper, which introduces open-source reasoning models that rival the performance of OpenAI’s o1!…
Generative Reward Models: Merging the Power of RLHF and RLAIF for Smarter AI
In this post we dive into Stanford research presenting Generative Reward Models, a hybrid human and AI RL approach to improving LLMs…