Introduction
Architecture | Pros | Cons |
---|---|---|
DNN | - Can learn high-level feature representations and supports transfer learning. - Applicable to healthcare and visual recognition. | - Requires a substantial volume of training data. - Needs significant computational power. - The learning process is slow. |
DBM | - Graphical model with undirected links between a set of visible nodes and a set of hidden nodes. - Used mainly for dimensionality reduction and classification. | - Higher time complexity for inference than DBN. - Learned information does not reach the lower layers. - Tends to overfit. |
DBN | - Easy to code and works sufficiently well with just a few layers. - High performance gain from adding layers compared to a multilayer perceptron. - Robust in classification. | - Must be trained greedily, one layer at a time. - Hard to infer the posterior distribution over configurations of hidden causes. |
DA | - Learns data encoding, reconstruction and generation at the same time. - Training is stable without labelled data. - Variants: sparse, denoising and contractive DA. | - Requires a pretraining stage because of the risk of vanishing errors. - Each application requires the model to be redesigned and retrained. - Sensitive to input errors. |
GAN | - The main benefit is data augmentation. - Performs unsupervised learning. - Learns the density distribution of data. | - Difficult to train, as optimizing the loss function is hard and requires much trial and error. |
RNN | - Can process inputs of any length. - Uses internal memory and performs well on streaming time-series data. | - Computation is slow and training can be difficult. - Processing long sequences is difficult. - Prone to exploding and vanishing gradients. |
CNN | - Captures hierarchical information. - Pretrained weights can be shared, which enables transfer learning. - Requires fewer connections than a DNN. | - A large labelled dataset is required for training. - The working mechanism of CNN is not transparent. |
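To make the DA row above concrete, the following is a minimal denoising-autoencoder sketch written against the Keras API; the layer sizes, noise level, and placeholder data are illustrative assumptions rather than values from any cited study.

```python
# Minimal denoising autoencoder sketch (Keras); sizes and noise are illustrative.
import numpy as np
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))           # e.g., flattened 28x28 images
noisy = layers.GaussianNoise(0.3)(inputs)     # corrupt inputs during training
encoded = layers.Dense(64, activation="relu")(noisy)         # compressed code
decoded = layers.Dense(784, activation="sigmoid")(encoded)   # reconstruction

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 784).astype("float32")  # placeholder unlabeled data
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)    # targets = inputs
```

Note that, as the table states, no labels are needed: the network is trained to reproduce its own (clean) input from the corrupted version.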
Overview of Deep Learning
Deep Neural Network (DNN)
[Restricted] Boltzmann Machines ([R]BM)
Deep Boltzmann Machine (DBM)
Deep Belief Network (DBN)
Deep Autoencoder (DA)
Generative Adversarial Network (GAN)
Recurrent Neural Network (RNN)
Convolutional Neural Network (CNN)
Architecture | Network Design | Parameters | Key points |
---|---|---|---|
LeNet (1998) | LeNet-5 is the first CNN architecture, with 2 convolutional and 3 fully connected layers. | 0.06 million | - Feedforward NN. - Connections between layers are sparse to reduce computational complexity. |
AlexNet (2012) | AlexNet has 8 layers: 5 convolutional and 3 fully connected. | 60 million | - Deeper than LeNet; aliasing artifacts appear in the learned feature maps due to large filter sizes. |
VGG-16 (2014) | VGG-16 has 13 convolutional layers (with max-pooling layers) and 2 fully connected layers, followed by 1 output layer with softmax activation. | 138 million | - Roughly twice as deep as AlexNet. - A deeper variant of VGG-16 is VGG-19. - Computationally expensive and unsuitable for low-resource systems. |
Inception-v1 (2014) | Also known as GoogLeNet, it has 22 layers with parameters (27 when pooling layers are included). Towards the end, it employs average pooling. | 5 million | - Uses sparse connections to overcome the redundant-information problem and omits irrelevant feature maps. - High accuracy at a reduced computational cost. - Its heterogeneous topology requires customization. |
Inception-v3 (2015) | Inception-v3 has 48 layers with a number of inception modules (each consisting of pooling layers and convolutional filters with activation functions), concatenation layers and fully connected layer(s), along with dropout and softmax. | 23 million | - Higher accuracy and lower computational complexity than Inception-v1. - Reduces the representational bottleneck. - Replaces large filters with smaller ones. - Its architecture is complex and lacks homogeneity. |
ResNet-50 (2015) | ResNet-50 has 50 layers with initial convolutional and max-pooling layers, and final average pooling and fully connected layers. In between, there are 3, 4, 6 and 3 residual blocks separated into 4 stages, where each block contains 3 convolutional layers. | 25.5 million | - Provides accelerated training speed. - Reduces the effect of the vanishing gradient problem. - Classifies images with high accuracy. |
Xception (2016) | The Xception architecture has 36 convolutional layers forming the feature extraction base of the network. The 36 convolutional layers are structured into 14 modules, all of which have linear residual connections around them, except for the first and last modules. | 22.8 million | - Xception shows small gains in classification performance on the ImageNet dataset and large gains on the JFT dataset when compared to Inception-v3. |
Inception-v4 (2016) | Inception-v4 consists of two main sections: a feature extractor and a fully connected layer. The feature extractor includes various convolutional blocks: 1 stem block, 14 inception blocks, 2 reduction blocks and a pooling layer. The inception blocks are divided into three categories, namely A, B, and C, with 4, 7, and 3 blocks, respectively. | 43 million | - Deep hierarchies of features; multilevel feature representation. - Learning speed is slow. |
Inception-ResNet-v2 (2016) | Inception-ResNet-v2 consists of 164 layers with several convolutional blocks, which include 1 stem block, 20 residual inception blocks, 2 reduction blocks and a pooling layer. The residual inception blocks are divided into three categories, namely A, B, and C, with 5, 10, and 5 blocks, respectively. | 56 million | - Improves training speed. - Deep hierarchies of features; multilevel feature representation. |
ResNeXt-50 (2016) | ResNeXt-50 has initial convolutional and max-pooling layers, and final average pooling and fully connected layers. In between, there are 3, 4, 6 and 3 residual blocks separated into 4 stages, where each block contains 3 convolutional layers. In comparison to ResNet-50, it scales up the number of parallel towers (cardinality = 32) within each residual block. | 25 million | - Homogeneous topology. - Performs grouped convolution. |
DenseNet-121 (2016) | The DenseNet architecture includes 4 dense blocks. Each layer in a dense block is connected to every other layer. The dense blocks, consisting of convolution, pooling, batch normalization and activation, are separated by transition layers. | 8 million | - Introduces a depth/cross-layer dimension. - Ensures maximum data flow between the layers in the network. - Avoids relearning redundant feature maps. |
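The parameter counts listed above can be cross-checked directly, since most of these architectures ship with reference implementations in common frameworks. A minimal sketch, assuming a TensorFlow/Keras installation (the constructors shown are the standard `tf.keras.applications` entries):

```python
# Sketch: instantiate reference implementations and report parameter counts.
from tensorflow.keras import applications

archs = {
    "VGG-16": applications.VGG16,
    "Inception-v3": applications.InceptionV3,
    "ResNet-50": applications.ResNet50,
    "Xception": applications.Xception,
    "DenseNet-121": applications.DenseNet121,
}
for name, ctor in archs.items():
    m = ctor(weights=None)  # weights="imagenet" would download pretrained weights
    print(f"{name}: {m.count_params() / 1e6:.1f} M parameters")
```

The printed figures should agree, to rounding, with the table; the `weights="imagenet"` option is what enables the transfer-learning usage discussed earlier.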
Type | Data [base/set] | DL architecture | Task |
---|---|---|---|
Images | ABIDE | DNN [95] | Autism disorder identification |
 | ADHD-200 dataset | DBN [96] | ADHD detection |
 | ADNI dataset | | AD/MCI diagnosis |
 | BRATS dataset | CNN [100] | Brain pathology segmentation |
 | CT dataset | CNN [101] | Fast segmentation of 3D medical images |
 | DRIVE, STARE datasets | GAN [102] | Retinal blood vessel segmentation |
 | EM segmentation challenge dataset | CNN [103] | Segmentation of neuronal membranes |
 | | LSTM [104] | Biomedical volumetric image segmentation |
 | IBSR, LPBA40, and OASIS datasets | CNN [105] | Skull stripping |
 | LIDC-IDRI dataset | CNN [106] | Lung nodule malignancy classification |
 | MICCAI 2009 LV dataset | DBN [107] | Heart LV segmentation |
 | MITOS dataset | CNN [108] | Mitosis detection in breast cancer |
 | PACS dataset | CNN [106] | Medical image classification |
 | TBI dataset | CNN [109] | Brain lesion segmentation |
Signals | BCI competition IV | | Motion action decoding |
 | DEAP dataset | | Affective state recognition |
 | | CNN [116] | Emotion classification |
 | DECAF | GAN [117] | |
 | Freiburg dataset | CNN [118] | Seizure prediction |
 | MAHNOB-HCI | DA [119] | Emotion recognition |
 | MIT-BIH arrhythmia database | | ECG arrhythmia classification |
 | MIT-BIH, INCART, and SVDB | CNN [122] | Movement decoding |
 | NinaPro database | | Motion action decoding |
Sequences | CullPDB, CB513, CASP datasets, CAMEO | CNN [124] | 2ps prediction |
 | DREAM | CNN [125] | DNA/RNA sequence prediction |
 | | DNN [126] | Prediction of effective drug combinations |
 | ENCODE database | | Gene expression identification |
 | ENCODE DGF dataset | CNN [129] | Prediction of noncoding gene variants |
 | GEO database | GAN [130] | Gene expression data augmentation |
 | GWH and UCSC datasets | DBN [131] | Splice junction prediction |
 | JASPAR database and ENCODE | CNN [132] | Prediction of DNA-binding proteins |
 | miRBoost | RNN [133] | MicroRNA prediction |
 | miRNA-mRNA pairing data repository | LSTM [134] | MicroRNA target prediction |
 | Protein Data Bank (PDB) | DA [135] | Protein structure reconstruction |
 | SRBCT, prostate tumour, and MLL GE | DBN [136] | Gene/miRNA feature selection |
 | sbv IMPROVER | DBN [137] | Human diseases and drug development |
 | TCGA database | DA [138] | Cancer detection and gene identification |
 | | DBM [139] | |
 | | DNN [140] | Drug combination estimation |
 | UCSC, CGHV Data, SPIDEX database | CNN [141] | Genetic variant identification |
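Sequence models such as the CNNs cited above typically consume one-hot-encoded nucleotide matrices rather than raw strings. A minimal sketch of that conventional preprocessing step (the encoding scheme is the community convention, not taken from any one cited study):

```python
# Sketch: one-hot encode a DNA sequence into a (length, 4) matrix for a CNN.
import numpy as np

BASES = "ACGT"

def one_hot_dna(seq: str) -> np.ndarray:
    """Map A/C/G/T to unit rows; ambiguous bases (e.g., N) become all-zero rows."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in idx:
            out[pos, idx[base]] = 1.0
    return out

x = one_hot_dna("ACGTN")
print(x)   # 5x4 matrix; the last row is all zeros for the ambiguous base
```

The resulting matrix can be fed to a 1D convolutional layer, where filters sliding along the sequence axis play the role that image filters play in the vision tasks above.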
Deep Learning and Biological Data
Images
Signals
Sequences
Open Access Biological Data Sources
Application | Name | Description | Ref. |
---|---|---|---|
Bio/medical image processing and analysis | CCDB | High-resolution 2/3/4-D light and electron microscope images | [162] |
 | CIL | Cell image datasets and cell library application | [163] |
 | Euro Bioimaging | Biological and biomedical imaging data | [164] |
 | HAPS | Microscopic images of human cells and tissues | [165] |
 | IDR | Viewing, analysis, and sharing of multidimensional image data | [166] |
 | SMIR | Post-mortem CT scans of the whole body | [167] |
 | TCIA | CT, MRI, and PET images of cancer patients | [168] |
 | TMA | Microscopic human tissue images | [169] |
 | UCSB BioSeg | 2D/3D cellular, subcellular and tissue images | [170] |
Disease detection and diagnosis | ABIDE | Autism brain imaging datasets | [171] |
 | ADHD-200 | fMRI/anatomical datasets fused over the 8 imaging sites | [172] |
 | ADNI | MCI, early AD and elderly control subjects’ diagnosis data | [173] |
 | BCDR | Multimodal mammography and ultrasound scan data | [174] |
 | Kaggle CXRayP | Chest X-ray scans for pneumonia | [175] |
 | MITOS | Breast cancer histological images | [176] |
 | NAMIC | Lupus, brain, and prostate MRI scans | [177] |
 | nCOV-CXray | COVID-19 cases with chest X-ray/CT images | [178] |
 | Neurosynth | fMRI datasets and synthesis platform | [179] |
 | NIH | Labelled chest X-ray images with diagnoses | [180] |
 | OASIS | MRI datasets and XNAT data management platform | [181] |
 | Open NI | Imaging modalities and brain disease data | [182] |
 | SMIR | CT of human temporal bones | [183] |
Neuroimage processing and analysis | IXI | Neuroimaging data and toolkit software | [184] |
 | LPBA40 | Maps of brain regions and a set of whole-head MRI | [185] |
 | NeuroVault.org | API for collecting and sharing statistical maps of the brain | [186] |
 | NITRC | MRI, PET, SPECT, CT, MEG/EEG and optical imaging | [187] |
 | OpenfMRI | Multimodal MRI and EEG datasets | [188] |
 | UK Data Service | fMRI dataset | [189] |
Segmentation | DRIVE | Digital retinal images of diabetic patients | [190] |
 | IBSR | Segmentation results of MRI data | [191] |
 | STARE | Raw/labelled retinal images | [192] |
Images
Bio/Medical Image Processing and Analysis
Disease Detection and Diagnosis
Neuroimage Processing and Analysis
Segmentation
Application | Name | Description | Ref. |
---|---|---|---|
Anomaly detection | SAD mc-EEG | Multichannel EEG data for a sustained-attention driving task | [193] |
 | TUH EEG Corpus | Repository of EEG datasets, tools and documents | [194] |
 | MIT-BIH-ARH | ECG database containing 48 recordings | [195] |
 | PTB D-ECG | ECG database containing 549 recordings | [196] |
 | TELE ECG | 250 ECG recordings with annotated QRS and artifact masks | [197] |
Human–machine interfacing | BNCI | Various BMI signal datasets | [198] |
 | EMG DataRep | Various EMG datasets | [199] |
 | Facial sEMG | EMG data from 15 participants | [200] |
 | NinaPro database | Kinematic and sEMG data of 27 subjects | [201] |
Emotion/affective state detection | DEAP | Simultaneously recorded EMG/EEG data | [202] |
 | DECAF | MEG, hEOG, ECG, trapezius-muscle EMG, and face video data | [203] |
 | Imagine | EEG datasets of 31 subjects listening to voices | [204] |
 | MAHNOB-HCI | EMG, ECG, respiration and skin temperature data | [205] |
 | SEED | EEG dataset for emotion and vigilance | [206] |
Motor imagery classification | EEG-BCI-MI | EEG signals from 13 subjects with 60,000 MI examples | [207] |
 | EEG-MI-BCI | EEG data from BCI for MI tasks | [208] |
 | EEG-MMI | EEG data from PhysioNet for MI tasks | [209] |
Neurological condition evaluation | V-P300 BCI | 16-electrode dry EEG from 71 subjects (SP mode) | [210] |
 | | 32-electrode wet EEG from 50 subjects (SP mode) | [211] |
 | | 32-electrode wet EEG from 38 subjects (MPC mode) | [212] |
 | | 32-electrode wet EEG from 44 subjects (MPCC mode) | [213] |
Signal processing and classification | BCI competition | EEG, ECoG, and MEG data from a range of BCI applications | [214] |
 | BCI-NER challenge | 56-channel EEG dataset decoded by a P300 speller | [215] |
 | DRYAD | EEG datasets of 13 subjects recorded under various conditions | [216] |
 | PhysioNet | Various EEG, ECG, EMG and sEMG datasets | [217] |
 | UCI ML | Various ECG, EMG, and sEMG datasets | [218] |
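Several of the PhysioNet-hosted sets in this table (e.g., MIT-BIH) can be pulled programmatically. A minimal sketch using the `wfdb` Python package, assuming it is installed and that record `100` (one of the 48 MIT-BIH recordings) is the target:

```python
# Sketch: fetch one MIT-BIH arrhythmia recording directly from PhysioNet.
# Requires: pip install wfdb
import wfdb

record = wfdb.rdrecord("100", pn_dir="mitdb")   # signals plus metadata
ann = wfdb.rdann("100", "atr", pn_dir="mitdb")  # per-beat annotations

print(record.fs, record.sig_name)   # sampling rate and channel names
print(record.p_signal.shape)        # (n_samples, n_channels)
print(ann.symbol[:10])              # first ten beat labels
```

The annotation symbols are what the arrhythmia-classification studies cited earlier use as ground-truth beat labels.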
Signals
Anomaly Detection
Human–Machine Interfacing
Emotion/Affective State Detection
Motor Imagery Classification
Neurological Condition Evaluation
Signal Processing and Classification
Application | Name | Description | Ref. |
---|---|---|---|
Bioassay analysis and drug design | COVID-19 | Gene sequence, pathway, and bioassay datasets of COVID-19 | [220] |
 | PubChem | Compound structures, molecular datasets, and tools | [221] |
Genetic disorder analysis | Cancer GeEx | Different cancer genome datasets | [222] |
 | IGDD | Mutation data on common genetic diseases | [223] |
 | TCGA | Cancer genome data | [224] |
 | BDTNP | 3D gene expression, DNA-binding data and ChAcD | [225] |
Nucleic acid research | ENCODE | Human genome dataset | [226] |
 | ESP | Sequencing data | [227] |
 | GEO | High-throughput gene expression and functional genomics datasets | [228] |
 | gnomAD | Large-scale exome and genome sequencing data | [229] |
 | GTEx | Gene expression datasets | [230] |
 | Harmonizome | Collection of gene and protein datasets | [231] |
 | INSDC | Nucleotide sequence data | [232] |
 | IGSR | Genome data across ethnicities, ages and sexes | [233] |
 | JASPAR | Transcription factor DNA-binding preference dataset | [234] |
 | NIHREM | Human genome datasets | [235] |
 | NSD | Omics and health science data | [236] |
 | SysGenSim | Bioinformatics tools and gene sequence datasets | [237] |
Protein structure analysis | PDB | Proteins, nucleic acids, and complex assemblies data | [238] |
 | SCOP2 | Structural classification of proteins | [239] |
 | SCOPe | | [240] |
 | UCI MB | 2ps and splice-junction gene sequences | [241] |
Signal transduction pathway study | NCI Nature | Molecular interactions and reactions of cells | [242] |
 | NetPath | Signal transduction pathways in humans | [243] |
 | Reactome | Database of reactions, pathways and biological processes | [244] |
Single-cell omics | miRBoost | Genomes of eukaryotes containing at least 100 miRNAs | [245] |
 | SGD | Biological data and analysis tools for budding yeast | [246] |
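As an example of programmatic access to these repositories, entries in the Protein Data Bank (PDB) row above can be fetched and parsed with Biopython. A minimal sketch, assuming Biopython is installed and the legacy PDB file format is still served; the identifier `1crn` is purely illustrative:

```python
# Sketch: download and parse a PDB entry with Biopython.
# Requires: pip install biopython
from Bio.PDB import PDBList, PDBParser

pdb_id = "1crn"  # illustrative entry (crambin), not taken from the survey
path = PDBList().retrieve_pdb_file(pdb_id, pdir=".", file_format="pdb")

structure = PDBParser(QUIET=True).get_structure(pdb_id, path)
n_residues = sum(1 for _ in structure.get_residues())
print(f"{pdb_id}: {n_residues} residues")
```

Atom coordinates extracted this way are the usual raw input for the protein-structure reconstruction tasks cited earlier.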
Sequences
Bioassay Analysis and Drug Design
Genetic Disorder Analysis
Nucleic Acid Research
Protein Structure Analysis
Signal Transduction Pathway Study
Single-cell Omics
Open-Source Deep Learning Tools
Tool | Platform | Language(s) | Stars | Forks | Contrib. | Supported DL Architecture |
---|---|---|---|---|---|---|
Caffe | L, M, W, A | Py, C++, Ma | 30100 | 18200 | 266 | CNN, RNN, GAN |
Chainer | L | Py | 5300 | 1400 | 251 | DA, CNN, RNN, GAN |
DL4j | L, M, W | Ja | 11500 | 4800 | 32 | DA, CNN, RNN, RBM, LSTM, GAN |
DyNet | L | C++ | 3000 | 687 | 117 | CNN, RNN, LSTM |
H2O | L, M, W | Ja, Py, R | 4700 | 1700 | 132 | CNN, RNN |
Keras | L, M, W | Py | 47500 | 18000 | 816 | CNN, RNN, DBN, GAN |
Lasagne | L, M | Py | 3700 | 980 | 68 | CNN, RNN, LSTM, GAN |
MCT | W | C++ | 16720 | 4400 | 197 | CNN, DBN, RNN, LSTM |
MXNet | L, M, W, A, I | C++ | 18500 | 6600 | 780 | DA, CNN, RNN, LSTM, GAN |
Neon | L, M | Py | 3800 | 846 | 78 | DA, CNN, RNN, LSTM, GAN |
PyTorch | L, M | Py | 37400 | 9500 | 1345 | CNN, RNN, LSTM, GAN |
Singa | L, M, W | Py, C++, Ja | 2000 | 499 | 46 | CNN, RNN, RBM, DBM |
TensorFlow | L, M, W | Py, C++ | 143000 | 80600 | 2450 | CNN, RNN, RBM, LSTM, GAN |
TF.Learn | L, M | Py, C++ | 9400 | 2400 | 120 | CNN, BRNN, RNN, LSTM, GAN |
Theano | L, M, W | Py | 9103 | 2500 | 332 | CNN, RNN, RBM, LSTM, GAN |
Torch | L, M, W, A, I | Lu, C, C++ | 8495 | 2400 | 130 | CNN, RNN, RBM, LSTM, GAN |
Veles | L, M, W, A | Py | 891 | 185 | 10 | DA, CNN, RNN, LSTM, RBM |
Caffe
Pros:
- Easy to deploy;
- Pretrained models are available;
- Fast training speed;
- Well suited to feedforward networks.
Cons:
- Requires writing code to generate new layers;
- Limited support for recurrent networks;
- No support for distributed training.
Chainer
Pros:
- One of the leading tools for dynamic computation graphs/networks;
- Notably faster than other Python-oriented frameworks.
Cons:
- The Open Computing Language (OpenCL) framework and the Open Multi-Processing (OpenMP) API are not supported.
DeepLearning4j
Pros:
- Integrates with the Big Data frameworks Apache Spark and Hadoop;
- Supports distributed GPU and CPU platforms and can work with tensors;
- GUI support for workflow and visualization.
Cons:
- The Open Computing Language (OpenCL) framework is not supported.
DyNet
Pros:
- Designed to run efficiently on CPU or GPU;
- Dynamic computation graphs, like PyTorch and Chainer.
Cons:
- Limited functionality compared to TensorFlow.
H2O
Pros:
- Its in-memory, distributed, parallel processing capacity makes it suitable for real-time data;
- GUI support (called Flow) for workflow and visualization;
- GPU support for Deep Water and NVIDIA;
- Fast training and memory-efficient DataFrame manipulation;
- Easy-to-use algorithms and good documentation.
Cons:
- Lacks the data manipulation capabilities of R and Pandas DataFrames;
- Slow learning and supports a limited number of models running at a time.
Keras
Pros:
- Rich documentation;
- A high-level API for neural networks (sketched below);
- Runs on top of state-of-the-art deep learning libraries/frameworks such as TensorFlow, CNTK, or Theano.
Cons:
- Cannot utilize multiple GPUs directly;
- Requires Theano as the backend for OpenMP support, and Theano/TensorFlow/PlaidML as the backend for OpenCL.
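A minimal sketch of that high-level API, with illustrative layer sizes and placeholder data:

```python
# Sketch: the few lines Keras needs to define, compile, and train a classifier.
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(20,)),               # 20 input features (illustrative)
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),   # 3 output classes (illustrative)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(128, 20).astype("float32")  # placeholder features
y = np.random.randint(0, 3, size=(128,))       # placeholder integer labels
model.fit(x, y, epochs=3, batch_size=16, verbose=0)
```

Note that `fit` hides the entire training loop, which is the contrast with PyTorch drawn later in this list.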
Lasagne
Pros:
- A lightweight library for building and training DL algorithms in Theano;
- Layers, regularizers, and optimizers can be used independently;
- Clear documentation;
- Supports training networks on a GPU.
Cons:
- Smaller community than TensorFlow.
Microsoft Cognitive Toolkit
Pros:
- A framework for feedforward DNNs, CNNs and RNNs;
- Trains production systems very fast;
- Achieves state-of-the-art performance on benchmark tasks;
- Allows directed-graph visualization.
Cons:
- Less community support;
- Difficult to install;
- Draws less interest from the research community.
MXNet
Pros:
- A DL framework with a high-performance imperative API;
- Rich language support;
- Advanced GPU support;
- Highly scalable.
Cons:
- Smaller community than TensorFlow;
- Poor API documentation;
- Less popular with the research community.
Neon
Pros:
- Better visualization properties than other frameworks;
- Applies optimization at the data-loading level.
Cons:
- Smaller community than TensorFlow;
- Less popular with the research community.
PyTorch
Pros:
- Pretrained models are available;
- OpenCL support via a separately maintained package;
- Modular pieces are easy to combine;
- Easy to create a layer and run it on a GPU.
Cons:
- Requires writing its own training code (see the sketch below);
- Limited documentation.
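A minimal sketch of the explicit training loop PyTorch expects the user to write (sizes and data are placeholders):

```python
# Sketch: the explicit training loop that Keras-style fit() APIs hide.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 20)           # placeholder features
y = torch.randint(0, 3, (128,))    # placeholder integer labels

for epoch in range(3):             # forward, backward, and update, by hand
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(float(loss))
```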
Singa
Pros:
- Pretrained models are available;
- Supports model, data, or hybrid partitioning, and synchronous/asynchronous/hybrid training;
- A distributed deep learning system that can handle Big Data;
- Widely used for healthcare data analytics.
Cons:
- No Open Multi-Processing (OpenMP) support.
TensorFlow
Pros:
- Handles large-scale data and operates in heterogeneous environments;
- Faster compile time than Theano;
- Computational graph abstraction;
- Supports parallelism;
- TensorBoard for workflow and visualization.
Cons:
- Large memory footprint;
- Fewer pretrained models are available;
- The computational graph can be slow;
- No support for matrix operations;
- Difficult to debug.
TF.Learn
Pros:
- A modular and transparent DL library built on top of TensorFlow;
- Provides a higher-level API to TensorFlow.
Cons:
- Slower than its competitors.
Theano
Pros:
- High flexibility;
- High computational stability;
- Well suited to tensor-based mathematical expressions;
- Open-source libraries such as Keras, Lasagne and Blocks are built on top of Theano;
- Can visualize convolutional filters, images, and graphs;
- High-level wrappers like Keras and Lasagne increase usability.
Cons:
- Difficult to learn;
- Difficult to deploy;
- Deploys only on a single GPU;
- Slower compilation than TensorFlow.
Torch
Pros:
- User friendly;
- Convenient to use with GPUs;
- Pretrained models are available;
- Highly modular;
- Easy to create a layer and run it on a GPU.
Cons:
- Uses a special data format that requires conversion;
- Requires writing its own training code;
- Less documentation available.
Veles
Pros:
- Distributed platform support;
- Supports Jupyter Notebook;
- Supports OpenCL for cross-platform parallel programming.
Cons:
- Less community support;
- Draws less interest from the research community.
Relative Comparison of DL Tools
Trend
Community
Interoperability
Scalability
Performance of Tools and Benchmark
ESN | Processor | Memory |
---|---|---|
1 | CPU: E5-1650 @ 3.50 GHz | 32 GB |
 | GPU: Nvidia GeForce GTX Titan X | |
2 | CPU: E5-2630 @ 2.20 GHz | 128 GB |
 | GPU: Nvidia GeForce GTX 980 | |
 | GPU: Nvidia GeForce GTX 1080 | |
 | GPU: Tesla K80 accelerator with GK210 GPUs | |
3 | CPU: E5-2690 @ 2.60 GHz | 256 GB |
 | GPU: Tesla P100 accelerator | |
 | GPU: Tesla M40 accelerator | |
 | GPU: Tesla K80 accelerator with GK210 GPUs | |