Multimodality Papers


Large Concept Models (LCMs) by Meta: The Era of AI After LLMs?

Explore Meta’s Large Concept Models (LCMs), an AI architecture that processes concepts instead of tokens. Can it become the next LLM architecture?…

LLaMA-Mesh by Nvidia: LLM for 3D Mesh Generation

Dive into Nvidia’s LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models, an LLM adapted to understand 3D objects…

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

In this post we dive into TinyGPT-V, a small but mighty multimodal LLM that brings Phi-2’s success to vision-language tasks…

NExT-GPT: Any-to-Any Multimodal LLM

In this post we dive into NExT-GPT, a multimodal large language model (MM-LLM) that can both understand and respond with multiple modalities…

ImageBind: One Embedding Space To Bind Them All

ImageBind is a multimodal model by Meta AI. In this post, we dive into the ImageBind research paper to understand what it is and how it works…

Meta-Transformer: A Unified Framework for Multimodal Learning

In this post we dive into Meta-Transformer, a unified framework for multimodal learning that can process information from 12 (!) modalities…