NLP Papers
-
LLaMA-Mesh by Nvidia: LLM for 3D Mesh Generation
Large language models (LLMs) are now used almost everywhere. Their native data modality is text. However, given their power, an active research area is harnessing their strong capabilities for other data modalities, and we already see LLMs that can understand images. Today we’re…
-
Generative Reward Models: Merging the Power of RLHF and RLAIF for Smarter AI
Introduction: In recent years, we’ve witnessed tremendous progress in AI, primarily due to the rise of large language models (LLMs) such as LLaMA-3.1. To further enhance LLM capabilities, extensive research focuses on improving their training processes. In this post, we review a recent research paper by Stanford University and SynthLabs that suggests a potential improvement, which may significantly advance AI. The paper is titled “Generative Reward Models”, authored by some of the same individuals behind the widely used DPO method. LLM Training Process: Before diving into Generative Reward Models (GenRMs), let’s do a quick recap of how large language models are trained. Pre-training Stage: LLMs are first pre-trained on a huge amount of text to learn general-purpose knowledge. This step helps the LLM become good at predicting the next token in a…
-
Introduction to Mixture-of-Experts (MoE)
In recent years, large language models have been responsible for remarkable advances in AI, with closed-source models such as GPT-3 and GPT-4, open-source models such as LLaMA 2 and 3, and many more. However, as we moved forward, these models got larger and larger and it became important to find…
-
Mixture-of-Agents Enhances Large Language Model Capabilities
Motivation: In recent years, we have witnessed remarkable advancements in AI, and specifically in natural language understanding, driven by large language models. Today, there are many LLMs available, such as GPT-4, Llama 3, Qwen, Mixtral and many more. In this post we review a recent paper titled “Mixture-of-Agents Enhances Large Language Model…
-
Arithmetic Transformers with Abacus Positional Embeddings
Introduction: In recent years, we have witnessed remarkable success driven by large language models (LLMs). While LLMs perform well in various domains, such as natural language problems and code generation, there is still a lot of room for improvement in complex multi-step and algorithmic reasoning. To research algorithmic reasoning capabilities without pouring significant…
-
CLLMs: Consistency Large Language Models
In this post we dive into Consistency Large Language Models, or CLLMs for short, which were introduced in a recent research paper of the same name. Motivation: Top LLMs such as GPT-4, LLaMA3 and more, are pushing the…
-
ReFT: Representation Finetuning for Language Models
In this post we dive into a recent research paper that presents a promising new direction for fine-tuning LLMs, achieving remarkable results in both parameter count and performance. Motivation – Finetuning a Pre-trained Transformer is Expensive: A common method…
-
Stealing Part of a Production Language Model
Many of today’s top large language models, such as GPT-4, Claude 3 and Gemini, are closed source, so much about their inner workings is not known to the public. A common justification for this is the competitive landscape, since companies are investing a lot of money and effort to create…
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
In this post, we dive into a new and exciting research paper by Microsoft, titled: “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. In recent years, we’ve seen tremendous success of large language models, with models such as GPT, LLaMA and more. As we move forward, we see that the…