NLP Papers


  • LLaMA-Mesh by Nvidia: LLM for 3D Mesh Generation

    Large language models (LLMs) are now in use almost everywhere. Their native data modality is text. However, given their power, an active research area is harnessing their capabilities for other data modalities, and we already see LLMs that can understand images. Today we’re…

  • Generative Reward Models: Merging the Power of RLHF and RLAIF for Smarter AI

    Introduction: In recent years, we’ve witnessed tremendous progress in AI, primarily due to the rise of large language models (LLMs) such as LLaMA-3.1. To further enhance LLM capabilities, extensive research focuses on improving their training processes. In this post, we review a recent research paper by Stanford University and SynthLabs that suggests a potential improvement to that process. The paper is titled “Generative Reward Models”, authored by some of the same individuals behind the widely used DPO method. LLM Training Process: Before diving into Generative Reward Models (GenRMs), let’s do a quick recap of how large language models are trained. Pre-training Stage: LLMs are first pre-trained on huge amounts of text to learn general-purpose knowledge. This step helps the LLM become good at predicting the next token in a…

  • Introduction to Mixture-of-Experts (MoE)

    In recent years, large language models have driven remarkable advances in AI, with closed-source models such as GPT-3 and GPT-4, open-source models such as LLaMA 2 and 3, and many more. However, as we moved forward, these models got larger and larger and it became important to find…

  • Mixture-of-Agents Enhances Large Language Model Capabilities

    Motivation: In recent years, we have witnessed remarkable advancements in AI, and specifically in natural language understanding, driven by large language models. Today, there are many different LLMs out there, such as GPT-4, Llama 3, Qwen, Mixtral and many more. In this post we review a recent paper, titled: “Mixture-of-Agents Enhances Large Language Model…

  • Arithmetic Transformers with Abacus Positional Embeddings

    Introduction: In recent years, we have witnessed remarkable success driven by large language models (LLMs). While LLMs perform well in various domains, such as natural language problems and code generation, there is still a lot of room for improvement in complex multi-step and algorithmic reasoning. To research algorithmic reasoning capabilities without pouring significant…

  • CLLMs: Consistency Large Language Models

    In this post we dive into Consistency Large Language Models, or CLLMs for short, which were introduced in a recent research paper of the same name. Motivation: Top LLMs such as GPT-4, LLaMA3 and more are pushing the…

  • ReFT: Representation Finetuning for Language Models

    In this post we dive into a recent research paper which presents a promising novel direction for fine-tuning LLMs, achieving remarkable results when considering both parameter count and performance. Motivation – Finetuning a Pre-trained Transformer is Expensive: A common method…

  • Stealing Part of a Production Language Model

    Many of today's top large language models, such as GPT-4, Claude 3 and Gemini, are closed source, so much about the inner workings of these models is not known to the public. A common justification for this is the competitive landscape, since companies are investing a lot of money and effort to create…

  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

    In this post, we dive into a new and exciting research paper by Microsoft, titled: “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. In recent years, we’ve seen tremendous success of large language models, with models such as GPT, LLaMA and more. As we move forward, we see that the…

  • Self-Rewarding Language Models by Meta AI

    On January 18, Mark Zuckerberg announced that the long-term goal of Meta AI is to build general intelligence and open-source it responsibly. So Meta AI is officially working on building an open-source AGI. On the same day, Meta AI released a new research paper titled “Self-Rewarding Language Models”, which may be a step that…
