Looking for a specific paper or subject?
-
What is YOLO-NAS and How it Was Created
In this post we dive into YOLO-NAS, an improved version in the YOLO models family for object detection which was precented earlier this year by Deci. YOLO models have been around for a while now, presented in 2015 with the paper You Only Look Once, which is what the shortcut YOLO stands for, and over…
-
Generative Reward Models: Merging the Power of RLHF and RLAIF for Smarter AI
Introduction In recent years, we’ve witnessed tremendous progress in AI, primarily due to the rise of large language models (LLMs) such as LLaMA-3.1. To further enhance LLM capabilities, extensive research focuses on improving their training processes. In this post, we review a recent research paper by Stanford University and SynthLabs that suggests a potential improvement, which may significantly advance AI. The paper is titled “Generative Reward Models”, authored by some of the same individuals behind the widely-used DPO method. LLM Training Process Before diving to Generative Reward Models (GenRMs), let’s do a quick recap for how large language models are trained. Pre-training Stage LLMs are first pre-trained on huge amount of text, to learn general purpose knowledge. This step helps the LLM to be good at predicting the next token in a…
-
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Motivation In recent years, we use AI for more and more use cases, interacting with models that provide us with remarkable outputs. As we move forward, the models we use are getting larger and larger, and so, an important research domain is to improve the efficiency of using and training AI models. Standard MoE Is…
-
Introduction to Mixture-of-Experts (MoE)
In recent years, large language models are in charge of remarkable advances in AI, with models such as GPT-3 and 4 which are closed source and with open-source models such as LLaMA 2 and 3, and many more. However, as we moved forward, these models got larger and larger and it became important to find…
-
Mixture-of-Agents Enhances Large Language Model Capabilities
Motivation In recent years we witness remarkable advancements in AI and specifically in natural language understanding, which are driven by large language models. Today, there are various different LLMs out there such as GPT-4, Llama 3, Qwen, Mixtral and many more. In this post we review a recent paper, titled: “Mixture-of-Agents Enhances Large Language Model…
-
Arithmetic Transformers with Abacus Positional Embeddings
Introduction In the recent years, we witness remarkable success driven by large language models (LLMs). While LLMs perform well in various domains, such as natural language problems and code generation, there is still a lot of room for improvement with complex multi-step and algorithmic reasoning. To do research about algorithmic reasoning capabilities without pouring significant…
-
CLLMs: Consistency Large Language Models
In this post we dive into Consistency Large Language Models, or CLLMs in short, which were introduced in a recent research paper that goes by the same name. Before diving in, if you prefer a video format then check out the following video: Motivation Top LLMs such as GPT-4, LLaMA3 and more, are pushing the…
-
ReFT: Representation Finetuning for Language Models
In this post we dive into a recent research paper which presents a promising novel direction for fine-tuning LLMs, achieving remarkable results when considering both parameters count and performance. Before diving in, if you prefer a video format then check out the following video: Motivation – Finetuning a Pre-trained Transformer is Expensive A common method…
-
Stealing Part of a Production Language Model
Many of the top large language models today such as GPT-4, Claude 3 and Gemini are closed source, so a lot about the inner workings of these models is not known to the public. One justification for this is usually the competitive landscape, since companies are investing a lot of money and effort to create…