Multimodality

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

In this post we dive into TinyGPT-V, a new multimodal large language model introduced in the research paper “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”. Before diving in, if you prefer a video format, check out our video review of this paper. Motivation: In recent years we’ve seen a …

Meta-Transformer

Meta-Transformer: A Unified Framework for Multimodal Learning

In this post we dive into Meta-Transformer, a multimodal learning method presented in the research paper “Meta-Transformer: A Unified Framework for Multimodal Learning”. In the paper, the researchers show they were able to process information from 12(!) different modalities, including image, text, audio, infrared, …
