Computer Vision Papers

Sapiens overview

Sapiens: Foundation for Human Vision Models

In this post we dive into Sapiens, a new family of computer vision models by Meta AI that shows remarkable advancement in human-centric tasks…
MoNE Architecture Overview

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

In this post we dive into Mixture of Nested Experts, a new method presented by Google that can dramatically reduce the computational cost of AI models…
V-JEPA Training Process

How Meta AI's Human-Like V-JEPA Works

Explore V-JEPA, which stands for Video Joint-Embedding Predictive Architecture, another step in Meta AI's journey toward human-like AI…
Vision Transformer (ViT) Architecture

Introduction to Vision Transformers | Original ViT Paper Explained

In this post we go back to the original Vision Transformers paper to understand how ViT adapted transformers to computer vision…
Overview of LCM-LoRA

From Diffusion Models to LCM-LoRA

Following the release of LCM-LoRA, in this post we explore the evolution of diffusion models up to latent consistency models with LoRA…
ViT registers description - adding tokens in addition to the image patches

Vision Transformers Need Registers – Fixing a Bug in DINOv2?

In this post we cover the paper “Vision Transformers Need Registers” by Meta AI, which explains an interesting behavior in DINOv2 features…
Emu motivation - text-to-image models are not consistent in generating high-quality images

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

In this post we dive into Emu, a text-to-image generation model by Meta AI, which is quality-tuned to generate highly aesthetic images…
FACET data curation pipeline

FACET: Fairness in Computer Vision Evaluation Benchmark

In this post we cover FACET, a new dataset created by Meta AI to serve as a benchmark for evaluating the fairness of computer vision models…