Overview of LCM-LoRA

From Diffusion Models to LCM-LoRA

Following the LCM-LoRA release, in this post we explore the evolution of diffusion models up to latent consistency models with LoRA…

CODEFUSION: A Pre-trained Diffusion Model for Code Generation

In this post we dive into Microsoft’s CODEFUSION, an approach that uses diffusion models for code generation and achieves remarkable results…
An LLM yields an accurate response for a text prompt but an inaccurate response for table data input

Table-GPT: Empower LLMs To Understand Tables

In this post we dive into Table-GPT, novel research by Microsoft that empowers LLMs to understand tabular data…
ViT registers description - adding tokens in addition to the image patches

Vision Transformers Need Registers – Fixing a Bug in DINOv2?

In this post we explain the paper “Vision Transformers Need Registers” by Meta AI, which examines an interesting behavior in DINOv2 features…
Emu motivation - text-to-image models are not consistent in generating high-quality images

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

In this post we dive into Emu, a text-to-image generation model by Meta AI, which is quality-tuned to generate highly aesthetic images…
NExT-GPT can both read inputs and generate outputs in multiple modalities

NExT-GPT: Any-to-Any Multimodal LLM

In this post we dive into NExT-GPT, a multimodal large language model (MM-LLM) that can both understand and respond with multiple modalities…
OPRO framework overview

Large Language Models As Optimizers – OPRO by Google DeepMind

In this post we dive into the Large Language Models As Optimizers paper by Google DeepMind, which introduces OPRO (Optimization by PROmpting)…
FACET data curation pipeline

FACET: Fairness in Computer Vision Evaluation Benchmark

In this post we cover FACET, a new benchmark dataset created by Meta AI to evaluate the fairness of computer vision models…