All Papers - AI Papers Academy

Looking for a specific paper or subject?

NExT-GPT can both read inputs and generate outputs from multiple modalities

18 September 2023Multimodality

NExT-GPT: Any-to-Any Multimodal LLM

In this post we dive into NExT-GPT, a multimodal large language model (MM-LLM), that can both understand and respond with multiple modalities…

9 September 2023NLP

Large Language Models As Optimizers – OPRO by Google DeepMind

In this post we dive into the Large Language Models As Optimizers paper by Google DeepMind, which introduces OPRO (Optimization by PROmpting)…

7 September 2023Computer Vision

FACET: Fairness in Computer Vision Evaluation Benchmark

In this post we cover FACET, a new dataset created by Meta AI in order to evaluate a benchmark for fairness of computer vision models…

26 August 2023NLP

Code Llama Paper Explained

Discover an in-depth review of Code Llama paper, a specialized version of the Llama 2 model designed for coding tasks…

21 August 2023NLP

WizardMath – Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Diving into WizardMath, a LLM for mathematical reasoning contributed by Microsoft, surpassing models such as WizardLM and LLaMA-2…

17 August 2023NLP

Orca Research Paper Explained

In this post we dive into Orca’s paper which shows how to do imitation tuning effectively, outperforms ChatGPT with about 7% of its size!…

16 August 2023NLP

LongNet: Scaling Transformers to 1B Tokens with Dilated Attention

In this post we dive into the LongNet research paper which introduced the Dilated Attention mechanism and explain how it works…

15 August 2023Computer Vision

DINOv2 from Meta AI – A Foundational Model in Computer Vision

DINOv2 by Meta AI finally gives us a foundational model for computer vision. We’ll explain what it means and why DINOv2 can count as such…

14 August 2023Computer Vision

I-JEPA: The First Human-Like Computer Vision Model

Dive into I-JEPA, Image-based Joint-Embedding Predictive Architecture, the first model based on Yann LeCun’s vision for a more human-like AI…

14 August 2023Multimodality

ImageBind: One Embedding Space To Bind Them All

ImageBind is a multimodality model by Meta AI. In this post, we dive into ImageBind research paper to understand what it is and how it works…

13 August 2023Computer Vision

Consistency Models – Optimizing Diffusion Models Inference

Consistency models are a new type of generative models which were introduced by Open AI, and in this post we will dive into how they work…

12 August 2023NLP

LIMA from Meta AI – Less Is More for Alignment of LLMs

In this post we explain LIMA, a LLM by Meta AI which was fine-tuned on only 1000 samples, yet it achieves competitive results with top LLMs…

11 August 2023NLP

Shepherd: A Critic for Language Model Generation

Dive into Shepherd, a LLM from Meta AI which is purposed to critique responses from other LLMs, a step in resolving LLMs hallucinations…

10 August 2023NLP

Universal and Transferable Adversarial LLM Attacks

LLMs are aligned for safety to avoid generation of harmful content. In this post we review a paper that is able to successfully attack LLMs…

9 August 2023Multimodality

Meta-Transformer: A Unified Framework for Multimodal Learning

In this post we dive into Meta-Transformer, a unified framework for multimodal learning, which can process information from 12(!) modalities…

9 August 2023Computer Vision,NLP

From Sparse to Soft Mixture of Experts

In this post we review Google DeepMind’s paper that introduces Soft Mixture of Experts, a fully-differentiable sparse Transformer…

8 August 2023Computer Vision

What is YOLO-NAS and How it Was Created

YOLO-NAS is an object detection model with the best accuracy-latency tradeoff to date. In this post we explain how it was created…