
NLP Papers
Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning
Discover how reinforcement learning enables hierarchical reasoning in LLMs and how HICRA improves on top of GRPO…
Less Is More: Tiny Recursive Model (TRM) Paper Explained
In this post we break down the TRM paper, a simpler alternative to HRM that beats HRM and top reasoning LLMs with a tiny 7M-parameter model…
The Era of Hierarchical Reasoning Models?
In this post we break down the Hierarchical Reasoning Model (HRM), a new model that rivals top LLMs on reasoning benchmarks with only 27M parameters!…
Microsoft’s Reinforcement Pre-Training (RPT) – A New Direction in LLM Training?
In this post we break down Microsoft’s Reinforcement Pre-Training, which scales up reinforcement learning with next-token reasoning…
Darwin Gödel Machine: Self-Improving AI Agents
In this post we explain the Darwin Gödel Machine, a novel method for self-improving AI agents by Sakana AI…
Continuous Thought Machines (CTMs) – The Era of AI Beyond Transformers?
Dive into Continuous Thought Machines, a novel architecture that strives to push AI closer to how the human brain works…
Perception Language Models (PLMs) by Meta – A Fully Open SOTA VLM
Dive into Perception Language Models by Meta, a family of fully open SOTA vision-language models with detailed visual understanding…
GRPO Reinforcement Learning Explained (DeepSeekMath Paper)
DeepSeekMath is the paper that introduced GRPO, the reinforcement learning method used in DeepSeek-R1. Dive in to understand how it works…
DAPO: Enhancing GRPO For LLM Reinforcement Learning
Explore DAPO, an innovative open-source Reinforcement Learning paradigm for LLMs that rivals DeepSeek-R1’s GRPO method…
Cheating LLMs & How (Not) To Stop Them | OpenAI Paper Explained
Discover how OpenAI’s research reveals AI models cheating the system through reward hacking, and what happens when you try to stop them…
START by Alibaba: Teaching LLMs To Debug Themselves
In this post we break down a recent paper from Alibaba, START: Self-taught Reasoner with Tools, which shows how Large Language Models (LLMs) can teach themselves to debug their own thinking using Python. Top reasoning models, such as DeepSeek-R1, achieve remarkable results with long chain-of-thought (CoT) reasoning. These models are presented with complex problems…
SWE-RL by Meta — Reinforcement Learning for Software Engineering LLMs
Dive into SWE-RL by Meta, a DeepSeek-R1 style recipe for training LLMs for software engineering with reinforcement learning…
Large Language Diffusion Models: The Era Of Diffusion LLMs?
Discover Large Language Diffusion Models (LLaDA), a novel diffusion-based approach to language modeling that challenges traditional LLMs…
CoCoMix by Meta AI – The Future of LLMs Pretraining?
Discover CoCoMix by Meta AI – a new approach for LLM pretraining using Continuous Concept Mixing, enriching word tokens with latent concepts!…
s1: Simple Test-Time Scaling – Can 1k Samples Rival o1-Preview?
Discover s1: a simple yet powerful approach to test-time scaling for LLMs, rivaling o1-preview with just 1k samples!…
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI?
Dive into the groundbreaking DeepSeek-R1 research paper, which introduces open-source reasoning models that rival the performance of OpenAI’s o1!…
Titans by Google: The Era of AI After Transformers?
Dive into Titans, a new AI architecture by Google, showing promising results compared to Transformers! Paving the way for a new era in AI?…
rStar-Math by Microsoft: Can SLMs Beat OpenAI o1 in Math?
Discover how System 2 thinking through Monte Carlo Tree Search enables rStar-Math to rival OpenAI’s o1 in math, using Small Language Models!…