ViT

  1. 2012 AlexNet
  2. 2021 ICLR Vision Transformer
  3. People want to use attention in vision models
  4. Transformers lack the inductive biases of CNNs (e.g., locality, translation equivariance).