An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, et al.

2020-10-22
vit, vision

Abstract

This paper introduces the Vision Transformer (ViT), which applies a standard Transformer directly to sequences of 16x16 image patches for image recognition, and reports empirical results that helped shape subsequent work on Transformer-based vision models.