
05-08-2022 | Trends and Surveys

Contrastive self-supervised learning: review, progress, challenges and future research directions

Authors: Pranjal Kumar, Piyush Rawat, Siddhartha Chauhan

Published in: International Journal of Multimedia Information Retrieval | Issue 4/2022

Abstract

In the last decade, deep supervised learning has had tremendous success. However, its shortcomings, such as its dependence on costly manual annotation of large datasets and its susceptibility to attacks, have prompted researchers to look for alternative models. Incorporating contrastive learning (CL) into self-supervised learning (SSL) has turned out to be an effective alternative. In this paper, a comprehensive review of CL methodology is provided in terms of its approaches, encoding techniques and loss functions. It discusses the applications of CL in various domains such as natural language processing (NLP), computer vision, and speech and text recognition and prediction. The paper presents an overview of and background on SSL to establish the introductory ideas and concepts. A comparative study of the works that use CL methods for various downstream tasks in each domain is performed. Finally, it discusses the limitations of current methods, as well as the additional techniques and future directions needed to make meaningful progress in this area.
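To make the contrastive-loss idea mentioned above concrete, the sketch below shows a minimal SimCLR-style NT-Xent (normalized temperature-scaled cross-entropy) loss in PyTorch. It is an illustrative sketch only: the function name, tensor shapes, batch layout (two augmented views stacked along the batch dimension) and temperature value are assumptions, not reference code from the reviewed works.

```python
# Minimal sketch of a SimCLR-style NT-Xent contrastive loss (illustrative only;
# names, shapes and the temperature value are assumptions, not reference code).
import torch
import torch.nn.functional as F


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two augmented views of the same N samples."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit length
    sim = z @ z.t() / temperature                       # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # a sample is never its own negative
    # Row i's positive is the other augmented view of the same sample.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)


# Example usage with random stand-in embeddings (normally produced by an encoder
# plus projection head applied to two augmentations of the same image batch).
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
print(loss.item())
```

In practice the embeddings come from an encoder and projection head applied to two augmentations of the same batch, and the temperature controls how strongly hard negatives are penalized relative to easy ones.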

189.
go back to reference Sun M, Xing J, Wang H, Chen B, Zhou J, “Mocl: Contrastive learning on molecular graphs with multi-level domain knowledge,” arXiv preprint arXiv:2106.04509, (2021) Sun M, Xing J, Wang H, Chen B, Zhou J, “Mocl: Contrastive learning on molecular graphs with multi-level domain knowledge,” arXiv preprint arXiv:​2106.​04509, (2021)
190.
go back to reference Sun F-Y, Hoffmann J, Verma V, Tang J (2019) Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv:1908.01000 Sun F-Y, Hoffmann J, Verma V, Tang J (2019) Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv:​1908.​01000
191.
go back to reference Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L (2021) Graph contrastive learning with adaptive augmentation. Proc Web Conf 2021:2069–2080 Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L (2021) Graph contrastive learning with adaptive augmentation. Proc Web Conf 2021:2069–2080
193.
go back to reference Alayrac J-B, Recasens A, Schneider R, Arandjelović R, Ramapuram J, De Fauw J, Smaira L, Dieleman S, Zisserman A (2020) Self-supervised multimodal versatile networks. Adv Neural Inf Process Syst 33:25–37 Alayrac J-B, Recasens A, Schneider R, Arandjelović R, Ramapuram J, De Fauw J, Smaira L, Dieleman S, Zisserman A (2020) Self-supervised multimodal versatile networks. Adv Neural Inf Process Syst 33:25–37
194.
go back to reference Liu Y, Yi L, Zhang S, Fan Q, Funkhouser T, Dong H (2020) P4contrast: contrastive learning with pairs of point-pixel pairs for RGB-D scene understanding. arXiv:2012.13089 Liu Y, Yi L, Zhang S, Fan Q, Funkhouser T, Dong H (2020) P4contrast: contrastive learning with pairs of point-pixel pairs for RGB-D scene understanding. arXiv:​2012.​13089
195.
go back to reference Chuang C-Y, Robinson J, Lin Y-C, Torralba A, Jegelka S (2020) Debiased contrastive learning. Adv Neural Inf Process Syst 33:8765–8775 Chuang C-Y, Robinson J, Lin Y-C, Torralba A, Jegelka S (2020) Debiased contrastive learning. Adv Neural Inf Process Syst 33:8765–8775
196.
go back to reference Ho C-H, Nvasconcelos N (2020) Contrastive learning with adversarial examples. Adv Neural Inf Process Syst 33:17081–17093 Ho C-H, Nvasconcelos N (2020) Contrastive learning with adversarial examples. Adv Neural Inf Process Syst 33:17081–17093
197.
go back to reference Tian Y, Sun C, Poole B, Krishnan D, Schmid C, Isola P (2020) What makes for good views for contrastive learning? Adv Neural Inf Process Syst 33:6827–6839 Tian Y, Sun C, Poole B, Krishnan D, Schmid C, Isola P (2020) What makes for good views for contrastive learning? Adv Neural Inf Process Syst 33:6827–6839
198.
go back to reference Wu M, Zhuang C, Mosse M, Yamins D, Goodman N (2020) On mutual information in contrastive learning for visual representations. arXiv:2005.13149 Wu M, Zhuang C, Mosse M, Yamins D, Goodman N (2020) On mutual information in contrastive learning for visual representations. arXiv:​2005.​13149
199.
go back to reference Asano Y, Patrick M, Rupprecht C, Vedaldi A (2020) Labelling unlabelled videos from scratch with multi-modal self-supervision. Adv Neural Inf Process Syst 33:4660–4671 Asano Y, Patrick M, Rupprecht C, Vedaldi A (2020) Labelling unlabelled videos from scratch with multi-modal self-supervision. Adv Neural Inf Process Syst 33:4660–4671
200.
go back to reference Morgado P, Vasconcelos N, Misra I (2021) Audio-visual instance discrimination with cross-modal agreement. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12475–12486 Morgado P, Vasconcelos N, Misra I (2021) Audio-visual instance discrimination with cross-modal agreement. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12475–12486
201.
go back to reference Patrick M, Asano YM, Kuznetsova P, Fong R, Henriques JF, Zweig G, Vedaldi A (2020) Multi-modal self-supervision from generalized data transformations. arXiv:2003.04298 Patrick M, Asano YM, Kuznetsova P, Fong R, Henriques JF, Zweig G, Vedaldi A (2020) Multi-modal self-supervision from generalized data transformations. arXiv:​2003.​04298
202.
203.
go back to reference Gan C, Huang D, Zhao H, Tenenbaum JB, Torralba A (2020) Music gesture for visual sound separation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10478–10487 Gan C, Huang D, Zhao H, Tenenbaum JB, Torralba A (2020) Music gesture for visual sound separation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10478–10487
204.
go back to reference Yang K, Russell B, Salamon J (2020) Telling left from right: learning spatial correspondence of sight and sound. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9932–9941 Yang K, Russell B, Salamon J (2020) Telling left from right: learning spatial correspondence of sight and sound. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9932–9941
205.
go back to reference Lin Y-B, Tseng H-Y, Lee H-Y, Lin Y-Y, Yang M-H (2021) Unsupervised sound localization via iterative contrastive learning. arXiv:2104.00315 Lin Y-B, Tseng H-Y, Lee H-Y, Lin Y-Y, Yang M-H (2021) Unsupervised sound localization via iterative contrastive learning. arXiv:​2104.​00315
206.
go back to reference Nagrani A, Chung JS, Albanie S, Zisserman A (2020) Disentangled speech embeddings using cross-modal self-supervision. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6829–6833 Nagrani A, Chung JS, Albanie S, Zisserman A (2020) Disentangled speech embeddings using cross-modal self-supervision. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6829–6833
207.
209.
210.
go back to reference Bui N D, Yu Y, Jiang L (2021) Self-supervised contrastive learning for code retrieval and summarization via semantic-preserving transformations. In: Proceedings of the 44th International ACM SIGIR conference on research and development in information retrieval, pp 511–521 Bui N D, Yu Y, Jiang L (2021) Self-supervised contrastive learning for code retrieval and summarization via semantic-preserving transformations. In: Proceedings of the 44th International ACM SIGIR conference on research and development in information retrieval, pp 511–521
211.
go back to reference Li Y, Hu P, Liu Z, Peng D, Zhou JT, Peng X (2021) Contrastive clustering. In: 2021 AAAI conference on artificial intelligence (AAAI) Li Y, Hu P, Liu Z, Peng D, Zhou JT, Peng X (2021) Contrastive clustering. In: 2021 AAAI conference on artificial intelligence (AAAI)
212.
go back to reference Lin Y, Gou Y, Liu Z, Li B, Lv J, Peng X (2021) Completer: incomplete multi-view clustering via contrastive prediction. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11174–11183 Lin Y, Gou Y, Liu Z, Li B, Lv J, Peng X (2021) Completer: incomplete multi-view clustering via contrastive prediction. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11174–11183
213.
go back to reference Pan E, Kang Z (2021) Multi-view contrastive graph clustering. Adv Neural Inf Process Syst 34 Pan E, Kang Z (2021) Multi-view contrastive graph clustering. Adv Neural Inf Process Syst 34
214.
go back to reference Trosten DJ, Lokse S, Jenssen R, Kampffmeyer M (2021) Reconsidering representation alignment for multi-view clustering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1255–1265 Trosten DJ, Lokse S, Jenssen R, Kampffmeyer M (2021) Reconsidering representation alignment for multi-view clustering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1255–1265
215.
go back to reference Wu L, Lin H, Tan C, Gao Z, Li SZ (2021) Self-supervised learning on graphs: contrastive, generative, or predictive. IEEE Trans Knowl Data Eng Wu L, Lin H, Tan C, Gao Z, Li SZ (2021) Self-supervised learning on graphs: contrastive, generative, or predictive. IEEE Trans Knowl Data Eng
216.
217.
go back to reference Albelwi S (2022) Survey on self-supervised learning: auxiliary pretext tasks and contrastive learning methods in imaging. Entropy 24(4):551CrossRef Albelwi S (2022) Survey on self-supervised learning: auxiliary pretext tasks and contrastive learning methods in imaging. Entropy 24(4):551CrossRef