Darwin Gödel Machine: Self-Improving AI Agents
In this post we explain the Darwin Gödel Machine, a novel method for self-improving AI agents by Sakana AI.
Continuous Thought Machines (CTMs) – The Era of AI Beyond Transformers?
Dive into Continuous Thought Machines, a novel architecture that strives to push AI closer to how the human brain works.
Perception Language Models (PLMs) by Meta – A Fully Open SOTA VLM
Dive into Perception Language Models by Meta, a family of fully open SOTA vision-language models with detailed visual understanding.
GRPO Reinforcement Learning Explained (DeepSeekMath Paper)
DeepSeekMath is the paper that introduced GRPO, the reinforcement learning method used in DeepSeek-R1. Dive in to understand how it works.
DAPO: Enhancing GRPO For LLM Reinforcement Learning
Explore DAPO, an innovative open-source reinforcement learning paradigm for LLMs that rivals DeepSeek-R1's GRPO method.
Cheating LLMs & How (Not) To Stop Them | OpenAI Paper Explained
Discover how OpenAI's research reveals AI models cheating the system through reward hacking, and what happens when trying to stop them.
START by Alibaba: Teaching LLMs To Debug Themselves
In this post we break down a recent paper by Alibaba, START: Self-taught Reasoner with Tools, which shows how Large Language Models (LLMs) can teach themselves to debug their own thinking using Python. Top reasoning models, such as DeepSeek-R1, achieve remarkable results with long chain-of-thought (CoT) reasoning.
SWE-RL by Meta — Reinforcement Learning for Software Engineering LLMs
Dive into SWE-RL by Meta, a DeepSeek-R1-style recipe for training LLMs for software engineering with reinforcement learning.
Large Language Diffusion Models: The Era Of Diffusion LLMs?
Discover Large Language Diffusion Models (LLaDA), a novel diffusion-based approach to language modeling that challenges traditional LLMs.
CoCoMix by Meta AI – The Future of LLMs Pretraining?
Discover CoCoMix by Meta AI – a new approach for LLM pretraining using Continuous Concept Mixing, enriching word tokens with latent concepts!