Latest Reviews

Google Nested Learning Explained: Hope Architecture, Continual Learning, and the End of Frozen LLMs
GDPO Explained: How NVIDIA Fixes GRPO for Multi-Reward LLM Reinforcement Learning
DeepSeek’s mHC Explained: Manifold-Constrained Hyper-Connections
Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning
Less Is More: Tiny Recursive Model (TRM) Paper Explained

Soft Mixture of Experts

In this post, we review Google DeepMind’s paper introducing Soft Mixture of Experts (Soft MoE), a fully-differentiable sparse Transformer.
