
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, et al.
2022-05-31
vision-language, pretraining
Abstract
This paper introduces and evaluates BLIP, a bootstrapped language-image pre-training approach for unified vision-language understanding and generation, and reports empirical results that helped shape subsequent work in vision-language pretraining.