NLP Papers

24 May 2026

DeepSeek-V4 Explained: The End of Standard Attention in LLMs?

In this post, we break down DeepSeek-V4, a new LLM from DeepSeek designed for highly efficient reasoning over million-token contexts…

26 April 2026

Google Nested Learning Explained: Hope Architecture, Continual Learning, and the End of Frozen LLMs

Google’s Nested Learning paper and Hope model explained: a new approach to continual learning in LLMs that addresses catastrophic forgetting…

27 January 2026

GDPO Explained: How NVIDIA Fixes GRPO for Multi-Reward LLM Reinforcement Learning

GDPO is NVIDIA’s solution to GRPO’s limitations in multi-reward RL for large language models. We break down the paper in this post…

3 January 2026

DeepSeek’s mHC Explained: Manifold-Constrained Hyper-Connections

Manifold-Constrained Hyper-Connections (mHC) explained: How DeepSeek rewires residual connections in LLMs for next-gen AI…

25 December 2025

Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning

Discover how reinforcement learning enables hierarchical reasoning in LLMs and how HICRA improves on GRPO…

24 October 2025

Less Is More: Tiny Recursive Model (TRM) Paper Explained

In this post, we break down the TRM paper, a simpler version of HRM that beats HRM and top reasoning LLMs with a tiny 7M-parameter model…

20 August 2025

The Era of Hierarchical Reasoning Models?

In this post we break down the Hierarchical Reasoning Model (HRM), a new model that rivals top LLMs on reasoning benchmarks with only 27M params!…

1 July 2025

Microsoft’s Reinforcement Pre-Training (RPT) – A New Direction in LLM Training?

In this post we break down Microsoft’s Reinforcement Pre-Training, which scales up reinforcement learning with next-token reasoning…

14 June 2025

Darwin Gödel Machine: Self-Improving AI Agents

In this post we explain the Darwin Gödel Machine, a novel method for self-improving AI agents by Sakana AI…

4 June 2025

Continuous Thought Machines (CTMs) – The Era of AI Beyond Transformers?

Dive into Continuous Thought Machines, a novel architecture that strives to push AI closer to how the human brain works…

3 May 2025

Perception Language Models (PLMs) by Meta – A Fully Open SOTA VLM

Dive into Perception Language Models by Meta, a family of fully open SOTA vision-language models with detailed visual understanding…

14 April 2025

GRPO Reinforcement Learning Explained (DeepSeekMath Paper)

DeepSeekMath is the paper that introduced GRPO, the reinforcement learning method used in DeepSeek-R1. Dive in to understand how it works…

21 March 2025

DAPO: Enhancing GRPO For LLM Reinforcement Learning

Explore DAPO, an innovative open-source reinforcement learning paradigm for LLMs that rivals DeepSeek-R1’s GRPO method…

13 March 2025

Cheating LLMs & How (Not) To Stop Them | OpenAI Paper Explained

Discover how OpenAI’s research reveals AI models cheating the system through reward hacking — and what happens when trying to stop them…

8 March 2025

START by Alibaba: Teaching LLMs To Debug Themselves

In this post we break down a recent paper from Alibaba, START: Self-taught Reasoner with Tools. The paper shows how Large Language Models (LLMs) can teach themselves to debug their own thinking using Python. Top reasoning models, such as DeepSeek-R1, achieve remarkable results with long chain-of-thought (CoT) reasoning. These models are presented with complex problems…

1 March 2025