Computer Vision

How Do Vision Transformers Work?

Up until vision transformers were invented, the dominant model architecture in computer vision was the convolutional neural network (CNN), which was introduced in 1989 by famous researchers including Yann LeCun and Yoshua Bengio. In 2017, transformers were introduced by Google and took the natural language processing domain by storm, but were not adapted successfully to computer …

From Diffusion Models to LCM-LoRA

Recently, a new research paper was released, titled “LCM-LoRA: A Universal Stable-Diffusion Acceleration Module”, which presents a method to generate high-quality images with large text-to-image generation models, specifically SDXL, dramatically faster. And not only can it run SDXL much faster, it can also do so for a fine-tuned SDXL, say for …
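As a taste of what this looks like in practice, here is a minimal sketch of applying an LCM-LoRA on top of SDXL with the Hugging Face diffusers library; the checkpoint names are the public Hub IDs, and exact arguments may vary between diffusers versions.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load the base SDXL pipeline (a fine-tuned SDXL checkpoint would work the same way).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and attach the LCM-LoRA acceleration weights.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# A handful of inference steps and low guidance, instead of the usual 25-50 steps.
image = pipe(
    "a cat trying to catch a fish",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("cat.png")
```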

[Image: ViT registers - adding tokens in addition to the image patches]

Vision Transformers Need Registers – Fixing a Bug in DINOv2?

In this post we will discuss vision transformer registers, a concept introduced in a research paper by Meta AI titled “Vision Transformers Need Registers”, written by authors who were part of the DINOv2 release, a successful foundational computer vision model by Meta AI which we covered before in the …
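To make the idea concrete, here is a minimal PyTorch sketch (illustrative, not the paper's code) of what adding register tokens to a ViT could look like: learnable extra tokens are appended to the patch tokens, participate in attention like any other token, and are simply discarded at the output. All names and sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Toy ViT encoder with register tokens (illustrative sketch)."""

    def __init__(self, dim=768, num_patches=196, num_registers=4, depth=2, heads=12):
        super().__init__()
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        # The extra learnable "register" tokens proposed by the paper.
        self.registers = nn.Parameter(torch.randn(1, num_registers, dim) * 0.02)
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches + 1, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens):  # patch_tokens: (B, num_patches, dim)
        b = patch_tokens.size(0)
        x = torch.cat([self.cls_token.expand(b, -1, -1), patch_tokens], dim=1)
        x = x + self.pos_embed
        # Registers are appended after positional embedding; they take part in
        # attention like any other token and act as scratch space for global
        # computation.
        x = torch.cat([x, self.registers.expand(b, -1, -1)], dim=1)
        x = self.encoder(x)
        # The register outputs are discarded; only CLS + patch tokens are kept.
        return x[:, : -self.num_registers]

model = ViTWithRegisters()
out = model(torch.randn(2, 196, 768))  # -> (2, 197, 768)
```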

[Image: Emu motivation - text-to-image models are not consistent in generating high-quality images]

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Emu is a new text-to-image generation model by Meta AI, which was presented in a research paper titled “Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack”. Text-to-image models take a prompt as input, such as “a cat trying to catch a fish” in the example image above, and yield …

DINOv2 from Meta AI – Finally a Foundational Model in Computer Vision

DINOv2 is a computer vision model from Meta AI that claims to finally provide a foundational model in computer vision, closing some of the gap with natural language processing, where foundational models have been common for a while now. In this post, we’ll explain what it means to be a foundational model in computer vision …
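For a sense of what “foundational” means in practice, here is a small sketch of using DINOv2 as a frozen, general-purpose feature extractor through torch.hub (the entrypoint name comes from the facebookresearch/dinov2 repo); only a lightweight task-specific head is then trained on top of the frozen features.

```python
import torch

# Load a small DINOv2 backbone from the official repo via torch.hub.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()

# A dummy batch of 224x224 RGB images; real images should be resized to a
# multiple of the 14-pixel patch size and normalized with ImageNet statistics.
images = torch.randn(2, 3, 224, 224)

with torch.no_grad():
    features = backbone(images)  # (2, 384) global image embeddings

# The same frozen embeddings can feed many downstream heads, e.g. a linear
# classifier for a 10-class task; only this small head gets trained.
classifier = torch.nn.Linear(features.shape[1], 10)
logits = classifier(features)
```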

I-JEPA – A Human-Like Computer Vision Model

I-JEPA, Image-based Joint-Embedding Predictive Architecture, is an open-source computer vision model from Meta AI, and the first AI model based on Yann LeCun’s vision for a more human-like AI, which he presented last year in a 62-page paper titled “A Path Towards Autonomous Machine Intelligence”. In this post we’ll dive into the research paper that …

Consistency Models – Optimizing Diffusion Models Inference

Consistency models are a new type of generative model, introduced by OpenAI in a paper titled “Consistency Models”. In this post we will discuss why consistency models are interesting, what they are, and how they are created. Let’s start by asking why we should care about consistency models. If you prefer a video …
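As a preview of the core idea, the sketch below (illustrative code, not OpenAI’s) shows how a trained consistency model would be used: because the model maps any point on a diffusion trajectory straight back to clean data, sampling can take a single network evaluation instead of tens of diffusion steps. The function f and the noise range are assumptions for illustration.

```python
import torch

def one_step_sample(f, shape, sigma_max=80.0, device="cpu"):
    # `f(x_t, t)` is an assumed trained consistency model: it maps a noisy
    # sample at noise level t directly to the clean sample on the same
    # trajectory (self-consistency: f(x_t, t) == f(x_t', t') along a trajectory).
    x = torch.randn(shape, device=device) * sigma_max  # pure noise at t = sigma_max
    t = torch.full((shape[0],), sigma_max, device=device)
    return f(x, t)  # one network evaluation instead of tens of diffusion steps

# Stand-in model just to show the call shape:
dummy_f = lambda x, t: x / (1.0 + t.view(-1, 1, 1, 1))
samples = one_step_sample(dummy_f, (4, 3, 64, 64))
```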
