  • Code Llama Paper Explained

    Code Llama is a new family of open-source large language models for code by Meta AI that includes three types of models. Each type was released in 7B, 13B and 34B parameter sizes. In this post we’ll explain the research paper behind them, titled “Code Llama: Open Foundation Models for Code”, to understand how these models…

  • DINOv2 from Meta AI – Finally a Foundational Model in Computer Vision

    DINOv2 is a computer vision model from Meta AI that claims to finally provide a foundational model in computer vision, closing some of the gap with natural language processing, where such models have been common for a while now. In this post, we’ll explain what it means to be a foundational model in computer vision…

  • I-JEPA – A Human-Like Computer Vision Model

    I-JEPA, Image-based Joint-Embedding Predictive Architecture, is an open-source computer vision model from Meta AI, and the first AI model based on Yann LeCun’s vision for a more human-like AI, which he presented last year in a 62-page paper titled “A Path Towards Autonomous Machine Intelligence”. In this post we’ll dive into the research paper that…

  • What is YOLO-NAS and How it Was Created

    In this post we dive into YOLO-NAS, an improved version in the YOLO family of object detection models, which was presented earlier this year by Deci. YOLO models have been around for a while now, first presented in 2015 in the paper You Only Look Once, which is what the acronym YOLO stands for, and over…

  • Introduction to Mixture-of-Experts (MoE)

    In recent years, large language models have been responsible for remarkable advances in AI, with closed-source models such as GPT-3 and GPT-4 and open-source models such as LLaMA 2 and 3, and many more. However, as we moved forward, these models got larger and larger and it became important to find…

  • Mixture-of-Agents Enhances Large Language Model Capabilities

    Motivation: In recent years we have witnessed remarkable advancements in AI, and specifically in natural language understanding, driven by large language models. Today, there are many LLMs out there such as GPT-4, Llama 3, Qwen, Mixtral and many more. In this post we review a recent paper, titled: “Mixture-of-Agents Enhances Large Language Model…

  • Arithmetic Transformers with Abacus Positional Embeddings

    Introduction: In recent years, we have witnessed remarkable success driven by large language models (LLMs). While LLMs perform well in various domains, such as natural language problems and code generation, there is still a lot of room for improvement in complex multi-step and algorithmic reasoning. To research algorithmic reasoning capabilities without pouring significant…

  • CLLMs: Consistency Large Language Models

    In this post we dive into Consistency Large Language Models, or CLLMs for short, which were introduced in a recent research paper of the same name. Motivation: Top LLMs such as GPT-4, LLaMA3 and more are pushing the…

  • ReFT: Representation Finetuning for Language Models

    In this post we dive into a recent research paper which presents a promising novel direction for fine-tuning LLMs, achieving remarkable results when considering both parameter count and performance. Motivation – Finetuning a Pre-trained Transformer is Expensive: A common method…

  • Stealing Part of a Production Language Model

    Many of the top large language models today such as GPT-4, Claude 3 and Gemini are closed source, so a lot about the inner workings of these models is not known to the public. One justification for this is usually the competitive landscape, since companies are investing a lot of money and effort to create…

  • How Meta AI’s Human-Like V-JEPA Works?

    In this post, we dive into V-JEPA, which stands for Video Joint-Embedding Predictive Architecture, a new collection of vision models by Meta AI. V-JEPA is another step in Meta AI’s implementation of Yann LeCun’s vision of a more human-like AI. Several months back, we covered Meta AI’s I-JEPA model, which is a JEPA model…

  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

    In this post, we dive into a new and exciting research paper by Microsoft, titled: “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. In recent years, we’ve seen tremendous success of large language models, with models such as GPT, LLaMA and more. As we move forward, we see that the…

  • Self-Rewarding Language Models by Meta AI

    On January 18, Mark Zuckerberg announced that the long-term goal of Meta AI is to build general intelligence, and open-source it responsibly. So Meta AI is officially working on building an open-source AGI. On the same day, Meta AI released a new research paper titled “Self-Rewarding Language Models”, which can be a step that…

  • Fast Inference of Mixture-of-Experts Language Models with Offloading

    In this post, we dive into a new research paper, titled: “Fast Inference of Mixture-of-Experts Language Models with Offloading”. Motivation: LLMs Are Getting Larger. In recent years, large language models have been responsible for remarkable advances in AI, with closed-source models such as GPT-3 and GPT-4 and open-source models such…
