Fast Inference of Mixture-of-Experts Language Models with Offloading
In this post, we dive into a new research paper titled “Fast Inference of Mixture-of-Experts Language Models with Offloading”.

Motivation

LLMs Are Getting Larger

In recent years, large language models have driven remarkable advances in AI, with closed-source models such as GPT-3 and GPT-4, and with open-source models such […]