Contrastive learning in CV

```mermaid
mindmap
  root((Self-Supervised Learning))
    First Phase
      InstDisc
        note: Proposed instance discrimination, using a memory bank for negative samples
      InvaSpread
        note: End-to-end learning without external structures, limited by small batch size
      CPCv1
        note: Introduced the InfoNCE objective, versatile across images, audio, video, text, and RL, see the code sketch below the diagram
      CMC
        note: Extended two-view tasks to multiple views, laying groundwork for multi-view or multi-modal contrastive learning
    Second Phase
      MoCov1
        note: Extended InstDisc, replacing the memory bank with a queue of keys produced by a momentum encoder, see the code sketch below the diagram
      SimCLRv1
        note: Similar to InvaSpread but with a larger batch size, stronger data augmentations, an MLP projection head, and longer training
      CPCv2
        note: Applied new techniques to CPCv1, significantly improved performance
      Infomin
        note: Introduced the InfoMin principle, good views should share only task-relevant mutual information, neither too much nor too little
      MoCov2
        note: Applied SimCLRv1 techniques to MoCov1, achieving better results
      SimCLRv2
        note: Enhanced SimCLR for semi-supervised learning
      SwAV
        note: Combined clustering and contrastive learning, introduced multi-crop technique
    Third Phase
      BYOL
        note: Removed negative samples, predicting one view from the other with a simple MSE loss
      BN Blog
        note: Suggested BYOL works due to batch norm providing implicit negative samples
      BYOLv2
        note: Refuted the BN Blog, showing BYOL still works without batch statistics when the model is initialized carefully
      SimSiam
        note: Simplified design showing the stop-gradient operation is what prevents model collapse, see the code sketch below the diagram
      Barlow Twins
        note: New objective, pushing the cross-correlation matrix of the two views toward the identity matrix, see the code sketch below the diagram
    Fourth Phase
      note: Backbone changed from ResNet to Vision Transformers, addressing training stability issues
      MoCov3
        note: Froze the patch projection layer to stabilize training
      DINO
        note: Applied centering to the teacher network output to avoid collapse
```
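
A few of the ideas in the map are easier to see in code. First, a minimal sketch of the InfoNCE objective that CPCv1 introduced and SimCLR-style methods reuse, in PyTorch. It assumes `z_i` and `z_j` are batch-aligned embeddings of two views of the same images, so positives sit on the diagonal of the similarity matrix and every other entry serves as a negative; the function name and default temperature are illustrative, not from any specific codebase.

```python
import torch
import torch.nn.functional as F

def info_nce(z_i: torch.Tensor, z_j: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # Normalize so the dot product is cosine similarity.
    z_i = F.normalize(z_i, dim=1)
    z_j = F.normalize(z_j, dim=1)
    logits = z_i @ z_j.t() / temperature                     # (batch, batch) similarities
    labels = torch.arange(z_i.size(0), device=z_i.device)    # positives on the diagonal
    # Cross-entropy over each row = InfoNCE: pick the positive among all negatives.
    return F.cross_entropy(logits, labels)
```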
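Next, the two mechanisms MoCov1 adds on top of InstDisc: a momentum update that keeps the key encoder slowly trailing the query encoder, and a fixed-size queue of past keys that replaces the memory bank. This sketch assumes `encoder_q` and `encoder_k` are architecturally identical networks and that the queue length is divisible by the batch size, as in the original implementation.

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m: float = 0.999):
    # Key encoder follows the query encoder slowly:
    # theta_k <- m * theta_k + (1 - m) * theta_q
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)

@torch.no_grad()
def dequeue_and_enqueue(queue: torch.Tensor, keys: torch.Tensor, ptr: int) -> int:
    # queue is a (dim, K) buffer of past keys used as negatives;
    # overwrite the oldest slots with the new batch of keys.
    batch = keys.size(0)
    queue[:, ptr:ptr + batch] = keys.t()
    return (ptr + batch) % queue.size(1)  # advance the circular pointer
```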
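The stop-gradient operation that SimSiam identifies as critical fits in a few lines. Here `f` is assumed to be the encoder (backbone plus projection MLP) and `h` the small prediction head, both defined elsewhere; `.detach()` plays the role of stop-gradient, and removing it is what allows the model to collapse.

```python
import torch.nn.functional as F

def simsiam_loss(f, h, x1, x2):
    z1, z2 = f(x1), f(x2)   # projections of the two augmented views
    p1, p2 = h(z1), h(z2)   # predictions from the small MLP head

    def d(p, z):
        # Negative cosine similarity against a stop-gradient target.
        return -F.cosine_similarity(p, z.detach(), dim=1).mean()

    # Symmetrized loss over both view orderings.
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```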
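Finally, the Barlow Twins objective compares a cross-correlation matrix against the identity instead of scoring individual sample pairs. The sketch assumes `z1` and `z2` are batch-by-dim embeddings of the two views; `lam` weighs the off-diagonal redundancy-reduction term, with 5e-3 the value used in the paper.

```python
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor, lam: float = 5e-3) -> torch.Tensor:
    n = z1.size(0)
    z1 = (z1 - z1.mean(0)) / z1.std(0)  # standardize each embedding dimension
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = z1.t() @ z2 / n                 # (dim, dim) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance: diagonal -> 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction: off-diagonal -> 0
    return on_diag + lam * off_diag
```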

The "BN Blog" in the diagram refers to the imbue post: Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL).