Reinforcement Learning Papers

Looking for a specific paper or subject?


HICRA

Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning

Discover how reinforcement learning enables hierarchical reasoning in LLMs and how HICRA improves on top of GRPO…
RPT

Microsoft’s Reinforcement Pre-Training (RPT) – A New Direction in LLM Training?

In this post we break down Microsoft’s Reinforcement Pre-Training, which scales up reinforcement learninng with next-token reasoning…
DeepSeekMath

GRPO Reinforcement Learning Explained (DeepSeekMath Paper)

DeepSeekMath is the fundamental GRPO paper, the reinforcement learning method used in DeepSeek-R1. Dive in to understand how it works…
DAPO teaser

DAPO: Enhancing GRPO For LLM Reinforcement Learning

Explore DAPO, an innovative open-source Reinforcement Learning paradigm for LLMs that rivals DeepSeek-R1 GRPO method…
SWE-RL Teaser

SWE-RL by Meta — Reinforcement Learning for Software Engineering LLMs

Dive into SWE-RL by Meta, a DeepSeek-R1 style recipe for training LLMs for software engineering with reinforcement learning…
DeepSeek-R1 teaser

DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI?

Dive into the groundbreaking DeepSeek-R1 research paper, introduces open-source reasoning models that rivals the performance OpenAI’s o1!…
GenRM preview image

Generative Reward Models: Merging the Power of RLHF and RLAIF for Smarter AI

In this post we dive into a Stanford research presenting Generative Reward Models, a hybrid Human and AI RL to improve LLMs…
Scroll to Top