BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Junnan Li, et al.

2022-05-31

vision-language, pretraining

Abstract

This paper introduces and evaluates BLIP (Bootstrapping Language-Image Pre-training), a framework for unified vision-language understanding and generation, and reports empirical results that helped shape subsequent work in vision-language pretraining.
