
Vision Transformers Need Registers – Fixing a Bug in DINOv2?
In this post we cover the paper “Vision Transformers Need Registers” by Meta AI, which explains an interesting behavior in DINOv2 features…
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
In this post we dive into Emu, a text-to-image generation model by Meta AI, which is quality-tuned to generate highly aesthetic images…
NExT-GPT: Any-to-Any Multimodal LLM
In this post we dive into NExT-GPT, a multimodal large language model (MM-LLM) that can both understand and respond in multiple modalities…
Large Language Models As Optimizers – OPRO by Google DeepMind
In this post we dive into the Large Language Models As Optimizers paper by Google DeepMind, which introduces OPRO (Optimization by PROmpting)…
FACET: Fairness in Computer Vision Evaluation Benchmark
In this post we cover FACET, a new dataset created by Meta AI to benchmark the fairness of computer vision models…
Code Llama Paper Explained
Discover an in-depth review of the Code Llama paper, a specialized version of the Llama 2 model designed for coding tasks…
WizardMath – Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Diving into WizardMath, an LLM for mathematical reasoning from Microsoft, which surpasses models such as WizardLM and LLaMA-2…
Orca Research Paper Explained
In this post we dive into Orca’s paper, which shows how to do imitation tuning effectively, outperforming ChatGPT at about 7% of its size!…
LongNet: Scaling Transformers to 1B Tokens with Dilated Attention
In this post we dive into the LongNet research paper, which introduces the Dilated Attention mechanism, and explain how it works…
DINOv2 from Meta AI – A Foundational Model in Computer Vision
DINOv2 by Meta AI finally gives us a foundational model for computer vision. We’ll explain what that means and why DINOv2 qualifies as one…
I-JEPA: The First Human-Like Computer Vision Model
Dive into I-JEPA, the Image-based Joint-Embedding Predictive Architecture, the first model based on Yann LeCun’s vision for a more human-like AI…
ImageBind: One Embedding Space To Bind Them All
ImageBind is a multimodal model by Meta AI. In this post, we dive into the ImageBind research paper to understand what it is and how it works…
Consistency Models – Optimizing Diffusion Models Inference
Consistency models are a new type of generative model introduced by OpenAI, and in this post we will dive into how they work…
LIMA from Meta AI – Less Is More for Alignment of LLMs
In this post we explain LIMA, an LLM by Meta AI which was fine-tuned on only 1,000 samples, yet achieves results competitive with top LLMs…
Shepherd: A Critic for Language Model Generation
Dive into Shepherd, an LLM from Meta AI built to critique responses from other LLMs, a step toward resolving LLM hallucinations…
Universal and Transferable Adversarial LLM Attacks
LLMs are aligned for safety to avoid generating harmful content. In this post we review a paper that successfully attacks aligned LLMs…
Meta-Transformer: A Unified Framework for Multimodal Learning
In this post we dive into Meta-Transformer, a unified framework for multimodal learning, which can process information from 12(!) modalities…
From Sparse to Soft Mixture of Experts
In this post we review Google DeepMind’s paper that introduces Soft Mixture of Experts, a fully-differentiable sparse Transformer…