
Vision Transformers Need Registers – Fixing a Bug in DINOv2?
In this post we cover the paper “Vision Transformers Need Registers” by Meta AI, which explains an interesting behavior in DINOv2 features…
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
In this post we dive into Emu, a text-to-image generation model by Meta AI, which is quality-tuned to generate highly aesthetic images…
NExT-GPT: Any-to-Any Multimodal LLM
In this post we dive into NExT-GPT, a multimodal large language model (MM-LLM) that can both understand and respond in multiple modalities…
Large Language Models As Optimizers – OPRO by Google DeepMind
In this post we dive into the Large Language Models As Optimizers paper by Google DeepMind, which introduces OPRO (Optimization by PROmpting)…
FACET: Fairness in Computer Vision Evaluation Benchmark
In this post we cover FACET, a new dataset created by Meta AI to benchmark the fairness of computer vision models…
Code Llama Paper Explained
Discover an in-depth review of the Code Llama paper, a specialized version of the Llama 2 model designed for coding tasks…
WizardMath – Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Diving into WizardMath, an LLM for mathematical reasoning from Microsoft, which surpasses models such as WizardLM and LLaMA-2…
Orca Research Paper Explained
In this post we dive into Orca’s paper, which shows how to do imitation tuning effectively, outperforming ChatGPT at about 7% of its size!…
LongNet: Scaling Transformers to 1B Tokens with Dilated Attention
In this post we dive into the LongNet research paper, which introduces the Dilated Attention mechanism, and explain how it works…
DINOv2 from Meta AI – A Foundational Model in Computer Vision
DINOv2 by Meta AI finally gives us a foundational model for computer vision. We’ll explain what that means and why DINOv2 qualifies as one…
I-JEPA: The First Human-Like Computer Vision Model
Dive into I-JEPA, the Image-based Joint-Embedding Predictive Architecture, the first model based on Yann LeCun’s vision for a more human-like AI…
ImageBind: One Embedding Space To Bind Them All
ImageBind is a multimodal model by Meta AI. In this post, we dive into the ImageBind research paper to understand what it is and how it works…
Consistency Models – Optimizing Diffusion Models Inference
Consistency models are a new type of generative model introduced by OpenAI, and in this post we will dive into how they work…
LIMA from Meta AI – Less Is More for Alignment of LLMs
In this post we explain LIMA, an LLM by Meta AI which was fine-tuned on only 1,000 samples, yet achieves results competitive with top LLMs…
Shepherd: A Critic for Language Model Generation
Dive into Shepherd, an LLM from Meta AI built to critique responses from other LLMs, a step toward resolving LLM hallucinations…
Universal and Transferable Adversarial LLM Attacks
LLMs are aligned for safety to avoid generating harmful content. In this post we review a paper that successfully attacks aligned LLMs…
Meta-Transformer: A Unified Framework for Multimodal Learning
In this post we dive into Meta-Transformer, a unified framework for multimodal learning, which can process information from 12(!) modalities…
From Sparse to Soft Mixture of Experts
In this post we review Google DeepMind’s paper that introduces Soft Mixture of Experts, a fully-differentiable sparse Transformer…