Author: aipapersacademy.com

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

In this post, we dive into a new and exciting research paper from Microsoft, titled “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. In recent years, we’ve seen tremendous success from large language models such as GPT, LLaMA, and more. As we move forward, we see that the …
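A quick aside on the title: the paper’s BitNet b1.58 restricts every weight to one of three values, {-1, 0, 1}, and a three-valued weight carries log2(3) ≈ 1.58 bits of information, which is where the 1.58 in the title comes from.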

Fast Inference of Mixture-of-Experts Language Models with Offloading

In this post, we dive into a new research paper titled “Fast Inference of Mixture-of-Experts Language Models with Offloading”. Motivation: LLMs Are Getting Larger. In recent years, large language models have driven remarkable advances in AI, with closed-source models such as GPT-3 and GPT-4, and with open-source models such …

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

In this post, we dive into TinyGPT-V, a new multimodal large language model introduced in a research paper titled “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”. Before diving in, if you prefer a video format, check out our video review for this paper. Motivation: In recent years we’ve seen a …

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

In this post, we dive into a new research paper from Apple, titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory”. Before diving in, if you prefer a video format, check out our video review for this paper. Motivation: In recent years, we’ve seen tremendous success from large language …

How Do Vision Transformers Work?

Until vision transformers were invented, the dominant model architecture in computer vision was the convolutional neural network (CNN), invented in 1989 by renowned researchers including Yann LeCun and Yoshua Bengio. In 2017, transformers were introduced by Google and took the natural language processing domain by storm, but were not adapted successfully to computer …

Orca 2: Teaching Small Language Models How to Reason

Several months ago, Microsoft released the first version of Orca, which achieved remarkable results, even surpassing ChatGPT on the BigBench-Hard dataset, and the ideas from Orca 1 helped create better language models released since then. The Orca 2 model, presented in the paper we review in this post, achieves significantly better …

From Diffusion Models to LCM-LoRA

Recently, a new research paper was released, titled “LCM-LoRA: A Universal Stable-Diffusion Acceleration Module”, which presents a method to generate high-quality images with large text-to-image generation models, specifically SDXL, dramatically faster. Not only can it run SDXL much faster, it can also do so for a fine-tuned SDXL, say for …
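To give a flavor of how such an adapter is used in practice, here is a minimal sketch following the public Hugging Face diffusers pattern (the model IDs, prompt, and parameter values are illustrative assumptions, not code from this post):

    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    # Load the base SDXL pipeline; a fine-tuned SDXL checkpoint works the same way.
    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model ID
        torch_dtype=torch.float16,
    ).to("cuda")

    # Swap in the LCM scheduler and attach the LCM-LoRA acceleration adapter.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")  # assumed adapter repo

    # With LCM-LoRA, a handful of denoising steps and low guidance suffice.
    image = pipe(
        prompt="a photo of an astronaut riding a horse",
        num_inference_steps=4,
        guidance_scale=1.0,
    ).images[0]
    image.save("astronaut.png")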

CODEFUSION: A Pre-trained Diffusion Model for Code Generation

CODEFUSION is a new code generation model introduced in a research paper from Microsoft, titled “CODEFUSION: A Pre-trained Diffusion Model for Code Generation”. Recently, we’ve observed significant progress in AI-based code generation, mostly driven by large language models (LLMs), which we therefore refer to as code LLMs. With a …
