In this post, we break down the paper “Less is More: Recursive Reasoning with Tiny Networks”, which introduces the Tiny Recursive Model (TRM), a simpler take on the Hierarchical Reasoning Model (HRM) that outperforms HRM, and even top reasoning LLMs, on challenging reasoning benchmarks with only 7 million parameters.

Introduction
A couple of months back, we reviewed a new architecture called the Hierarchical Reasoning Model (HRM), which, with just 27 million parameters, was able to beat top large language models (LLMs) on some of the most challenging reasoning benchmarks.
Now, a new paper titled Less is More: Recursive Reasoning with Tiny Networks introduces a new model architecture inspired by the Hierarchical Reasoning Model, called the Tiny Recursive Model (TRM).
With only 7 million parameters and a simpler architecture, TRM achieves significant improvements over HRM.
Chain-of-Thought Reasoning vs Latent Reasoning

As a quick reminder, current Transformer-based large language models rely on chain-of-thought reasoning. Given an input prompt, the model generates tokens that represent its reasoning process; these tokens are fed back into the model, and the process repeats until a final answer is produced.
However, sometimes that’s not enough. On ARC-AGI 2, for example, Gemini 2.5 Pro achieves only 4.9% accuracy. Generating long chain-of-thought traces also comes with a cost: many forward passes over a growing context window make the process slow and expensive.
Instead, the Hierarchical Reasoning Model and Tiny Recursive Model take a different approach, based on latent reasoning.
Given a prompt, the model performs its entire reasoning process internally and outputs only the final answer, without any reasoning traces. While this sounds like a standard RNN, we’ll see how it differs when we dive into the architecture. But before that, let’s look at the results.
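To make the contrast concrete, here is a minimal, purely conceptual sketch of the two inference styles. The callables `llm_next_token` and `trm_step` are hypothetical stand-ins for a real LLM and a real recursive model, not APIs from the paper or any library.

```python
# Conceptual sketch only: `llm_next_token` and `trm_step` are hypothetical
# stand-ins for a real LLM and a real recursive model.

def chain_of_thought(llm_next_token, prompt_tokens, max_steps=1024, eos=0):
    """Explicit reasoning: generate tokens one at a time and feed them back."""
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        nxt = llm_next_token(tokens)   # one forward pass per generated token
        tokens.append(nxt)             # the context grows at every step
        if nxt == eos:
            break
    return tokens                      # reasoning trace + final answer

def latent_reasoning(trm_step, x_embed, y_init, z_init, n_cycles=3):
    """Implicit reasoning: refine fixed-size internal states, output the answer."""
    y, z = y_init, z_init
    for _ in range(n_cycles):
        y, z = trm_step(x_embed, y, z) # no token trace, no growing context
    return y                           # decoded into the final answer only
```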

TRM and HRM Evaluation Benchmarks

Since the Tiny Recursive Model is evaluated on the same benchmarks as the Hierarchical Reasoning Model, we’ll use the above figure from the HRM paper to illustrate the types of reasoning tasks used for evaluation.
- ARC-AGI: an IQ-test–like puzzle where the model needs to figure out rules from a few examples.
- Sudoku-Extreme: a dataset the researchers compiled from existing Sudoku datasets and made significantly harder.
- Maze solving: where the model must find the optimal path between two points.
TRM vs HRM and Top LLMs


In the tables above, taken from the paper, we can see the results of the Tiny Recursive Model on these benchmarks compared to the Hierarchical Reasoning Model and to top reasoning models such as Gemini 2.5 Pro and Claude 3.7.
On the Sudoku benchmark, we see an accuracy jump of more than 30% with a model of only 5 million parameters.
The difference between the two versions of the model (TRM-Att and TRM-MLP) is that one uses self-attention, while the other replaces it with an MLP block inspired by the MLP-Mixer paper. For Sudoku, the MLP variant works well since the context length is small, fixed, and smaller than the model dimension.
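To give a feel for what such a block looks like, here is a rough PyTorch sketch of an MLP-Mixer-style layer that mixes information across positions instead of using attention. The class name, layer sizes, and norm placement are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class TokenMixingBlock(nn.Module):
    """MLP-Mixer-style block that replaces self-attention (illustrative sketch).

    Assumes a small, fixed sequence length (e.g. the 81 cells of a Sudoku grid).
    """
    def __init__(self, seq_len: int, dim: int, expansion: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Mixes information *across positions*, playing the role of attention.
        self.token_mlp = nn.Sequential(
            nn.Linear(seq_len, seq_len * expansion), nn.GELU(),
            nn.Linear(seq_len * expansion, seq_len),
        )
        self.norm2 = nn.LayerNorm(dim)
        # Standard per-position MLP, as in a Transformer block.
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim * expansion), nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq_len, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x
```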
For maze solving, TRM shows another 10.8% improvement, and a noticeable improvement is also shown on ARC-AGI 1 and 2.
Once again, TRM does better than top reasoning models that are orders of magnitude larger. Even though it is a task-specific model, that’s still impressive.
So, how does it actually work? Let’s look at the architecture.
Reminder For The HRM Architecture

First, the input goes through a trainable embedding layer, which turns it into a representation the model can work with.
The Hierarchical Reasoning Model used two coupled recurrent modules, working together at different time scales, and used an analogy where the high-level module is the planner and the low-level module is the doer. The high-level module handles abstract reasoning and sets the overall direction, while the low-level module runs fast, detailed computations to follow the high-level plan and work out the specifics.
While this analogy helps to understand the overall idea, it may not be an entirely correct interpretation.
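As a rough picture of that coupling, here is a minimal sketch of HRM's nested update scheme. `f_low` and `f_high` are hypothetical stand-ins for the two recurrent modules, the step counts are arbitrary, and the real model also has an output head and halting logic; this is not the paper's code.

```python
# Simplified sketch of HRM's two coupled modules at different timescales.

def hrm_forward(f_low, f_high, x_embed, z_low, z_high, n_high=2, n_low=2):
    for _ in range(n_high):                # slow timescale: the "planner"
        for _ in range(n_low):             # fast timescale: the "doer"
            z_low = f_low(z_low, z_high, x_embed)
        z_high = f_high(z_high, z_low)     # update the plan from the details
    return z_high                          # later decoded by an output head
```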
The Tiny Recursive Model (TRM) Architecture

TRM simplifies the HRM architecture by using a single module that learns to handle both roles. Note that although the diagram shows multiple boxes, they all share the same weights. It’s a single module applied repeatedly to refine the reasoning.
The names of the latent features change as well.
- Instead of z_H, which represented the latent features of the high-level module, we now have y, which represents the embedding of the current solution. We’ll see that it is refined by the model until it is finally projected into the output.
- Instead of z_L, which represented the latent features of the low-level module, we now have z, which represents the model’s latent reasoning.
As we’ll now see, both latent features y and z are recursively refined by the model.
TRM Inference Flow
We start by refining the latent reasoning feature z. As input, the module takes the input embeddings, the current latent reasoning z, and the current output embedding y. This allows the module to update its internal reasoning z based on what it currently believes, what it has produced so far, and the original problem input.
This step is repeated for several recurrent steps, denoted T, a hyperparameter of the model. At each step, the module consumes its latent reasoning z from the previous step, together with the original input embeddings and the output embedding y, which at this point is still the initial one, since we haven’t updated it yet.
Once T steps are done, we run a single step with a different role. The model now processes the refined latent reasoning z, along with the current output embedding y, to refine the output embedding. So here we update y.
An important observation is that when updating y, the model doesn’t use the input embedding. This separation helps the model clearly distinguish between its two roles: reasoning refinement and output refinement. It is one of the main reasons there is no need for two separate modules.
We then repeat the same process. The model runs another T steps of refining the latent reasoning z, now with the new output embedding y, and afterwards again refines y using the new latent reasoning z.
The diagram only shows 2 cycles, but in practice this nested loop runs for n cycles, another hyperparameter of the model. Finally, the output embedding y is fed into a trainable output layer that produces the final tokens.
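Putting the whole flow together, here is a minimal sketch of inference, following this post's naming (T latent-reasoning steps per cycle, n cycles). `net`, `input_head`, and `output_head` are hypothetical stand-ins for the shared module and the trainable embedding and output layers; the exact signatures may differ from the released code.

```python
import torch

@torch.no_grad()
def trm_inference(net, input_head, output_head, tokens, y, z, T=6, n=3):
    """Sketch of the recursive inference flow described above.

    The hypothetical `net` accepts both call patterns below, e.g. by summing
    the embeddings it is given.
    """
    x = input_head(tokens)            # embed the problem once
    for _ in range(n):                # n cycles of refinement
        for _ in range(T):            # refine the latent reasoning z ...
            z = net(x, y, z)          # ... from the input, z, and the current y
        y = net(y, z)                 # refine the solution y (note: no x here)
    return output_head(y)             # project y into the final answer tokens
```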
TRM Differences From Standard RNN
One thing that’s different from standard recurrent networks is the use of two latent features for different purposes.
Another difference lies in how the model is trained. Normally, recurrent models are trained with Backpropagation Through Time (BPTT), where the loss is backpropagated through every single step. This requires huge amounts of memory and often becomes unstable when the reasoning chain is long.
Neither TRM nor HRM uses BPTT. Let’s start with how HRM handles this, and then see what TRM improves.
HRM Training with One-Step Gradient Approximation
Instead of using full Backpropagation Through Time, the Hierarchical Reasoning Model applies a one-step gradient approximation. That means that instead of backpropagating through all the recursive steps across the cycles, it only updates parameters based on the final step’s computation, shown in blue in the HRM architecture diagram.
This has two major benefits. First, no matter how many reasoning steps we have, memory stays the same. Second, training becomes more stable, since it avoids exploding or vanishing gradients from long backpropagation chains.
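Sketched in code, the idea looks roughly like the following, with `f_low` and `f_high` again as hypothetical stand-ins for HRM's two modules and the step counts chosen arbitrarily; only the last update of each module is computed with gradients enabled.

```python
import torch

def hrm_training_step(f_low, f_high, output_head, x_embed, z_low, z_high,
                      n_high=2, n_low=2):
    """Sketch of the one-step gradient approximation (illustrative names only).

    Every recursive update except the final low-level and high-level ones runs
    under no_grad, so the backward pass covers just one step of each module and
    memory stays constant no matter how many reasoning steps were run.
    """
    with torch.no_grad():                         # no activations kept here
        for i in range(n_high * n_low - 1):       # all but the last L-step
            z_low = f_low(z_low, z_high, x_embed)
            if (i + 1) % n_low == 0:              # every n_low L-steps ...
                z_high = f_high(z_high, z_low)    # ... update the plan
    z_low = f_low(z_low, z_high, x_embed)         # gradients flow through this
    z_high = f_high(z_high, z_low)                # ... and this
    return output_head(z_high)                    # the loss is computed on this
```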
TRM Training Process Improvement
The HRM’s one-step gradient approximation relies on a strong mathematical assumption. In short, it assumes that the model converged to a fixed point. As the Tiny Recursive Model paper points out, that’s rarely guaranteed in practice, and HRM never actually checks if the model converged before applying the update.
To remove this assumption, the Tiny Recursive Model takes a different approach.
The Tiny Recursive Model backpropagates through a single full cycle, which includes one step of output refinement and T steps of latent reasoning refinement. This costs a bit more than the Hierarchical Reasoning Model’s single-step update, but it is still far more efficient than Backpropagation Through Time, since gradients never have to flow across the many cycles the model runs.
By training over a full cycle, the model learns to improve its reasoning iteratively. In other words, it starts each cycle with some initial y and z inputs, and learns to make them better. This doesn’t require a strong mathematical assumption. During inference, multiple forward cycles push it progressively closer to the correct solution.
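A minimal sketch of that training step, again following this post's naming, might look like the following. `net` and `output_head` are hypothetical stand-ins, and returning detached states to seed the next pass is an assumption of this sketch rather than the paper's exact code.

```python
import torch

def trm_training_step(net, output_head, x, y, z, T=6, n=3):
    """Sketch of TRM's training step: backprop through one full cycle only."""
    with torch.no_grad():                  # earlier cycles: no memory cost
        for _ in range(n - 1):
            for _ in range(T):
                z = net(x, y, z)
            y = net(y, z)
    for _ in range(T):                     # the one full cycle we backprop through
        z = net(x, y, z)
    y = net(y, z)
    logits = output_head(y)                # e.g. cross-entropy loss on this
    return logits, y.detach(), z.detach()  # detached states can seed the next pass
```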
Adaptive Halting
One more missing piece here is adaptive halting. After n cycles are completed, the final output embedding y is not simply passed to the output layer as drawn here. Instead, it first passes through a halting head that decides whether the model should stop or reason for another n cycles.
This way, the model can dynamically adjust its thinking time depending on task complexity. For harder problems, it can continue reasoning for more cycles, while for easier ones, it stops early to save compute.
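Conceptually, the halting logic might look like the sketch below. The linear halting head, mean-pooling, fixed threshold, and `run_n_cycles` wrapper are illustrative assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

# Illustrative halting head: maps the pooled solution embedding y to a single
# "stop now" score. The hidden size 512 is an arbitrary assumption.
halt_head = nn.Linear(512, 1)

def reason_with_halting(run_n_cycles, y, z, max_rounds=8, threshold=0.5):
    """Run blocks of n cycles until the halting head is confident enough.

    `run_n_cycles` is a hypothetical callable wrapping the n-cycle refinement
    loop described above; batch size 1 is assumed for simplicity.
    """
    for _ in range(max_rounds):
        y, z = run_n_cycles(y, z)                         # another n cycles
        p_halt = torch.sigmoid(halt_head(y.mean(dim=1)))  # pooled halting score
        if p_halt.item() > threshold:                     # confident: stop here
            break
    return y                                              # goes to the output layer
```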
Final Note
Before wrapping up, there’s an important nuance to mention. The paper’s title, Less is More, suggests that smaller might be better for this architecture.
However, these results may actually reflect a fight against overfitting. Since the model was trained on very limited data, a smaller network was possibly necessary to prevent overfitting.
So in this case, less really means just enough. Small enough to generalize from scarce data, yet still powerful thanks to its recursive reasoning design.
References & Links
- The TRM paper
- TRM Code
- Our HRM breakdown
- Join our newsletter to receive concise 1-minute read summaries for the papers we review – Newsletter
All credit for the research goes to the researchers who wrote the paper we covered in this post.
