NLP Papers

  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

    In this post, we dive into a new and exciting research paper by Microsoft, titled “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. In recent years, we’ve seen tremendous success of large language models such as GPT, LLaMA, and more. As we move forward, we see that the…

  • Self-Rewarding Language Models by Meta AI

    On January 18, Mark Zuckerberg announced that the long-term goal of Meta AI is to build general intelligence and open-source it responsibly, so Meta AI is officially working on building an open-source AGI. On the same day, Meta AI released a new research paper titled “Self-Rewarding Language Models”, which can be a step that…

  • Fast Inference of Mixture-of-Experts Language Models with Offloading

    In this post, we dive into a new research paper, titled “Fast Inference of Mixture-of-Experts Language Models with Offloading”. In recent years, large language models have driven remarkable advances in AI, with closed-source models such as GPT-3 and GPT-4, and with open-source models such…

  • LLM in a flash: Efficient Large Language Model Inference with Limited Memory

    In this post we dive into a new research paper from Apple, titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory”. Before diving in, if you prefer a video format, check out our video review for this paper. In recent years, we’ve seen tremendous success of large language…

  • Orca 2: Teaching Small Language Models How to Reason

    Several months ago, Microsoft released the first version of Orca, which achieved remarkable results, even surpassing ChatGPT on the BigBench-Hard dataset, and the ideas from Orca 1 helped create better language models released since then. The Orca 2 model, presented in the paper we review in this post, achieves significantly better…

  • CODEFUSION: A Pre-trained Diffusion Model for Code Generation

    CODEFUSION is a new code generation model introduced in a research paper from Microsoft, titled “CODEFUSION: A Pre-trained Diffusion Model for Code Generation”. Recently, we’ve seen significant progress in AI code generation, mostly based on large language models (LLMs), so we refer to them as code LLMs. With a…

  • Table-GPT: Empower LLMs To Understand Tables

    Nowadays, we are witnessing tremendous progress with large language models (LLMs) such as ChatGPT, Llama, and more, where we can feed an LLM a text instruction or question and, most of the time, get an accurate response from the model. However, if we try to feed the model with tabular data, in…

  • Large Language Models As Optimizers – OPRO by Google DeepMind

    OPRO (Optimization by PROmpting) is a new approach that leverages large language models as optimizers, introduced by Google DeepMind in a research paper titled “Large Language Models As Optimizers”. Large language models are very good at taking a prompt, such as an instruction or a question, and yielding a useful response that match…

  • Code Llama Paper Explained

    Code Llama is a new family of open-source large language models for code by Meta AI that includes three types of models, each released with 7B, 13B, and 34B parameters. In this post, we explain the research paper behind them, titled “Code Llama: Open Foundation Models for Code”, to understand how these models…

  • WizardMath – Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

    Welcome WizardMath, a new open-source large language model contributed by Microsoft. While top large language models such as GPT-4 have demonstrated remarkable capabilities in various tasks, including mathematical reasoning, they are not open-source. For open-source large language models such as LLaMA-2, the situation is different: until now, they have not demonstrated strong math…
