
NLP Papers
Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning
Discover how reinforcement learning enables hierarchical reasoning in LLMs and how HICRA improves on top of GRPO…
Less Is More: Tiny Recursive Model (TRM) Paper Explained
In this post we break down the TRM paper, a simpler alternative to HRM that beats HRM and top reasoning LLMs with a tiny 7M-parameter model…
The Era of Hierarchical Reasoning Models?
In this post we break down the Hierarchical Reasoning Model (HRM), a new model that rivals top LLMs on reasoning benchmarks with only 27M parameters!…
Microsoft’s Reinforcement Pre-Training (RPT) – A New Direction in LLM Training?
In this post we break down Microsoft’s Reinforcement Pre-Training, which scales up reinforcement learning with next-token reasoning…
Darwin Gödel Machine: Self-Improving AI Agents
In this post we explain the Darwin Gödel Machine, a novel method for self-improving AI agents by Sakana AI…
Continuous Thought Machines (CTMs) – The Era of AI Beyond Transformers?
Dive into Continuous Thought Machines, a novel architecture that strives to push AI closer to how the human brain works…
Perception Language Models (PLMs) by Meta – A Fully Open SOTA VLM
Dive into Perception Language Models by Meta, a family of fully open SOTA vision-language models with detailed visual understanding…
GRPO Reinforcement Learning Explained (DeepSeekMath Paper)
DeepSeekMath is the paper that introduced GRPO, the reinforcement learning method used in DeepSeek-R1. Dive in to understand how it works…
DAPO: Enhancing GRPO For LLM Reinforcement Learning
Explore DAPO, an innovative open-source Reinforcement Learning paradigm for LLMs that rivals DeepSeek-R1’s GRPO method…
Cheating LLMs & How (Not) To Stop Them | OpenAI Paper Explained
Discover how OpenAI’s research reveals AI models cheating the system through reward hacking, and what happens when you try to stop them…
START by Alibaba: Teaching LLMs To Debug Themselves
In this post we break down a recent paper from Alibaba, START: Self-taught Reasoner with Tools, which shows how Large Language Models (LLMs) can teach themselves to debug their own thinking using Python. Top reasoning models, such as DeepSeek-R1, achieve remarkable results with long chain-of-thought (CoT) reasoning. These models are presented with complex problems…
SWE-RL by Meta — Reinforcement Learning for Software Engineering LLMs
Dive into SWE-RL by Meta, a DeepSeek-R1 style recipe for training LLMs for software engineering with reinforcement learning…
Large Language Diffusion Models: The Era Of Diffusion LLMs?
Discover Large Language Diffusion Models (LLaDA), a novel diffusion-based approach to language modeling that challenges traditional LLMs…
CoCoMix by Meta AI – The Future of LLMs Pretraining?
Discover CoCoMix by Meta AI – a new approach for LLM pretraining using Continuous Concept Mixing, enriching word tokens with latent concepts!…
s1: Simple Test-Time Scaling – Can 1k Samples Rival o1-Preview?
Discover s1: a simple yet powerful approach to test-time scaling for LLMs, rivaling o1-preview with just 1k samples!…
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI?
Dive into the groundbreaking DeepSeek-R1 research paper, which introduces open-source reasoning models that rival the performance of OpenAI’s o1!…
Titans by Google: The Era of AI After Transformers?
Dive into Titans, a new AI architecture by Google, showing promising results compared to Transformers! Paving the way for a new era in AI?…
rStar-Math by Microsoft: Can SLMs Beat OpenAI o1 in Math?
Discover how System 2 thinking through Monte Carlo Tree Search enables rStar-Math to rival OpenAI’s o1 in math, using Small Language Models!…