Soft MoE

In this post, we review the Google DeepMind paper that introduces Soft Mixture of Experts (Soft MoE), a fully differentiable sparse Transformer.
