Abstract
Deep learning has achieved remarkable success in computer vision, natural language processing (NLP), audio processing and machine translation. Accordingly, a number of classical deep learning models have been designed for these tasks. In this chapter, the convolutional neural network (CNN), long short-term memory (LSTM) network, autoencoder (AE) and generative adversarial network (GAN) are discussed briefly. These models are well suited to processing images (CNN), time series such as video and text (LSTM), representation learning (AE) and image generation (GAN), and they serve as the foundation of the models proposed in this book. Recently, more advanced deep learning models and principles have emerged, such as attention mechanisms (e.g., non-local, squeeze-and-excitation (SE), global context (GC), and the widely used Transformer), the graph convolutional network (GCN), self-supervised learning and contrastive learning. They can further boost model performance, extend the range of applications, and alleviate the limitations imposed by scarce labelled data and noisy data.