Sapiens: Foundation for Human Vision Models
In this post we dive into Sapiens, a new family of computer vision models by Meta AI that show remarkable advances in human-centric tasks!
Sapiens: Foundation for Human Vision Models Read More »
Motivation: In recent years, we have been using AI for more and more use cases, interacting with models that provide us with remarkable outputs. As we move forward, the models we use are getting larger and larger, and so an important research direction is improving the efficiency of using and training AI models. Standard MoE Is
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Read More »
In this post, we dive into V-JEPA, which stands for Video Joint-Embedding Predictive Architecture, a new collection of vision models by Meta AI. V-JEPA is another step in Meta AI's implementation of Yann LeCun's vision of a more human-like AI. Several months back, we covered Meta AI's I-JEPA model, which is a JEPA model
How Meta AI's Human-Like V-JEPA Works Read More »
Up until vision transformers were invented, the dominant model architecture in computer vision was the convolutional neural network (CNN), which was invented in 1989 by famous researchers including Yann LeCun and Yoshua Bengio. In 2017, transformers were introduced by Google and took the natural language processing domain by storm, but were not adapted successfully to computer
How Do Vision Transformers Work? Read More »
Recently, a new research paper was released, titled “LCM-LoRA: A Universal Stable-Diffusion Acceleration Module”, which presents a method to generate high-quality images with large text-to-image generation models, specifically SDXL, dramatically faster. And not only can it run SDXL much faster, it can also do so for a fine-tuned SDXL, say for
From Diffusion Models to LCM-LoRA Read More »
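The post above covers how LCM-LoRA plugs into SDXL as an acceleration module. As a hedged, minimal sketch of what that looks like in practice with Hugging Face's diffusers library (the model IDs below are the publicly released checkpoints; the step count and guidance value are illustrative, not taken from the post):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load the SDXL base pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the LCM scheduler and load the LCM-LoRA acceleration weights.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# 4 denoising steps with low guidance instead of the usual ~50 steps.
image = pipe(
    "a cat trying to catch a fish",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("cat.png")
```

The same load_lora_weights call works on a fine-tuned SDXL pipeline, which is what makes the module “universal”.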
In this post we discuss vision transformer registers, a concept introduced in a research paper by Meta AI titled “Vision Transformers Need Registers”, written by authors who were part of the DINOv2 release, a successful foundational computer vision model by Meta AI which we covered before in the
Vision Transformers Need Registers – Fixing a Bug in DINOv2? Read More »
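To make the idea concrete: registers are extra learnable tokens appended to the patch tokens, attended to by the encoder, and discarded at the output. Here is a minimal PyTorch sketch of that mechanism (our own simplified illustration, not the paper's implementation; the dimensions and register count are arbitrary):

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Simplified transformer encoder with learnable register tokens."""

    def __init__(self, dim=384, num_registers=4, depth=2, num_heads=6):
        super().__init__()
        # Learnable register tokens, shared across the batch.
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        self.num_registers = num_registers
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patch_tokens):  # patch_tokens: (B, N, dim)
        regs = self.registers.expand(patch_tokens.shape[0], -1, -1)
        tokens = torch.cat([patch_tokens, regs], dim=1)  # (B, N + R, dim)
        tokens = self.encoder(tokens)
        return tokens[:, : -self.num_registers]  # drop registers at the output

# Usage: 196 patch tokens (a 14x14 grid); registers are added internally.
out = ViTWithRegisters()(torch.randn(2, 196, 384))
print(out.shape)  # torch.Size([2, 196, 384])
```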
Emu is a new text-to-image generation model by Meta AI, which was presented in a research paper titled “Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack”. Text-to-image models take a prompt as input, such as “a cat trying to catch a fish” in the example image above, and yield
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Read More »
In this post we cover FACET, a new benchmark dataset created by Meta AI to evaluate the fairness of computer vision models. Computer vision models are known to have biases that can impact their performance. For example, as we can see in the image below, given an image classification model, if we
FACET: Fairness in Computer Vision Evaluation Benchmark Read More »
DINOv2 is a computer vision model from Meta AI that claims to finally provide a foundational model in computer vision. In this post, we'll explain what it means to be a foundational model in computer vision and why DINOv2 can count as such.
DINOv2 from Meta AI – Finally a Foundational Model in Computer Vision Read More »
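As a small, hedged illustration of the “foundational” claim: the pretrained DINOv2 backbones are published as torch.hub entry points in the facebookresearch/dinov2 repository, so their embeddings can be reused for downstream tasks without fine-tuning. A minimal sketch (the random tensor stands in for a preprocessed image batch):

```python
import torch

# Load the pretrained DINOv2 ViT-S/14 backbone via torch.hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Dummy batch: 1 x 3 x 224 x 224 (side lengths must be multiples of 14).
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    features = model(x)  # (1, 384) global image embedding

# These frozen features can feed a linear classifier, k-NN retrieval, etc.
print(features.shape)
```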
I-JEPA, Image-based Joint-Embedding Predictive Architecture, is an open-source computer vision model from Meta AI, and the first AI model based on Yann LeCun's vision for a more human-like AI, which he presented last year in a 62-page paper titled “A Path Towards Autonomous Machine Intelligence”. In this post we'll dive into the research paper that
I-JEPA – A Human-Like Computer Vision Model Read More »