NLP Archives - AI Papers Academy

GDPO Explained: How NVIDIA Fixes GRPO for Multi-Reward LLM Reinforcement Learning

GDPO is NVIDIA’s solution to GRPO’s limitations in multi-reward RL for large language models. We break down the paper in this post.

Manifold-Constrained Hyper-Connections (mHC) explained: How DeepSeek rewires residual connections in LLMs for next-gen AI

Discover how reinforcement learning enables hierarchical reasoning in LLMs and how HICRA improves on top of GRPO.

In this post, we break down the TRM paper. TRM is a simpler version of HRM that beats both HRM and top reasoning LLMs with a tiny 7M-parameter model.

In this post, we break down the Hierarchical Reasoning Model (HRM), a new model that rivals top LLMs on reasoning benchmarks with only 27M parameters!

In this post, we break down Microsoft’s Reinforcement Pre-Training, which scales up reinforcement learning with next-token reasoning.

In this post, we explain the Darwin Gödel Machine, a novel method by Sakana AI for building self-improving AI agents.

Dive into Continuous Thought Machines, a novel architecture that strives to bring AI closer to how the human brain works.

Dive into Perception Language Models by Meta, a family of fully open SOTA vision-language models with detailed visual understanding.

DeepSeekMath is the foundational GRPO paper, introducing the reinforcement learning method later used in DeepSeek-R1. Dive in to understand how it works.