Computer Vision


Vision Transformers Need Registers – Fixing a Bug in DINOv2?

In this post we will discuss vision transformer registers, a concept introduced by Meta AI in the research paper “Vision Transformers Need Registers”. The paper was written by authors who were part of the DINOv2 release, a successful foundational computer vision model by Meta AI which we covered before in the …
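The core idea, as described in the paper, is to append a few extra learnable tokens ("registers") to the input sequence alongside the image patch tokens. They take part in self-attention like any other token, but their outputs are discarded at the end, so the model gets dedicated slots for global computation instead of repurposing patch tokens. The sketch below is a toy illustration under stated assumptions, not the paper's implementation: a single head of self-attention with identity projections stands in for a full ViT, the registers are random rather than learned, and the names `self_attention` and `forward_with_registers` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (toy stand-in for a ViT block)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all tokens, registers included
    return weights @ x

def forward_with_registers(patch_tokens, cls_token, registers):
    """Hypothetical helper: run attention with registers, then drop the register outputs."""
    num_patches = patch_tokens.shape[0]
    # Input sequence: [CLS] + patch tokens + register tokens
    x = np.concatenate([cls_token[None, :], patch_tokens, registers], axis=0)
    x = self_attention(x)  # registers attend and are attended to like any other token
    # Register outputs are discarded; only the CLS and patch outputs are kept
    return x[0], x[1:1 + num_patches]

dim, num_patches, num_registers = 8, 16, 4
patches = rng.normal(size=(num_patches, dim))
cls = rng.normal(size=dim)
regs = rng.normal(size=(num_registers, dim))  # learnable parameters in a real model; random here
cls_out, patch_out = forward_with_registers(patches, cls, regs)
print(cls_out.shape, patch_out.shape)  # (8,) (16, 8)
```

Because the registers in this sketch carry no positional information and self-attention is permutation-equivariant, where they sit in the sequence does not change the CLS or patch outputs; appending them after the patches is just a convenient choice.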



Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Emu is a new text-to-image generation model by Meta AI, presented in the research paper “Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack”. Text-to-image models take a text prompt as input, such as “a cat trying to catch a fish”, and yield …



I-JEPA – A Human-Like Computer Vision Model

I-JEPA (Image-based Joint-Embedding Predictive Architecture) is an open-source computer vision model from Meta AI and the first AI model based on Yann LeCun’s vision for a more human-like AI, which he presented last year in a 62-page paper titled “A Path Towards Autonomous Machine Intelligence”. In this post we’ll dive into the research paper that …

