LLaMA-Mesh by Nvidia: LLM for 3D Mesh Generation
Dive into Nvidia’s LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models, an LLM adapted to understand and generate 3D objects.

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
In this post we dive into TinyGPT-V, a small but mighty multimodal LLM that brings Phi-2’s success to vision-language tasks.

NExT-GPT: Any-to-Any Multimodal LLM
In this post we dive into NExT-GPT, a multimodal large language model (MM-LLM) that can both understand and respond with multiple modalities.

ImageBind: One Embedding Space To Bind Them All
ImageBind is a multimodal model by Meta AI. In this post, we dive into the ImageBind research paper to understand what it is and how it works.

Meta-Transformer: A Unified Framework for Multimodal Learning
In this post we dive into Meta-Transformer, a unified framework for multimodal learning that can process information from 12(!) modalities.