BARTMAP: A viable structure for biclustering

doi:10.1016/j.neunet.2011.03.020

Neural Networks

Volume 24, Issue 7, September 2011, Pages 709-716

https://doi.org/10.1016/j.neunet.2011.03.020

Abstract

Clustering has been used extensively in the analysis of high-throughput messenger RNA (mRNA) expression profiling with microarrays. Furthermore, clustering has proven fundamental in microRNA expression profiling, which demonstrates enormous promise in the areas of cancer diagnosis and treatment, gene function identification, therapy development and drug testing, and genetic regulatory network inference. However, such a practice is inherently limited due to the existence of many uncorrelated genes with respect to sample or condition clustering, or many unrelated samples or conditions with respect to gene clustering. Biclustering offers a solution to such problems by performing simultaneous clustering on both dimensions, or automatically integrating feature selection into clustering without any prior information, so that the relations between clusters of genes (generally, features) and clusters of samples or conditions (data objects) are established. However, the NP-complete complexity of the problem poses a great challenge to computational methods for identifying such local relations. Here, we propose and demonstrate that a neural-based classifier, ARTMAP, can be modified to perform biclustering efficiently, leading to a biclustering algorithm called Biclustering ARTMAP (BARTMAP). Experimental results on multiple human cancer data sets show that BARTMAP can achieve clustering structures of higher quality than those achieved with other commonly used biclustering or clustering algorithms, and with fast run times.

Introduction

Clustering (Bezdek, 1981, Oliveira and Pedrycz, 2007, Sato-Ilic and Jain, 2006, Gokcay and Principe, 2002, Xu and Wunsch, 2005, Xu and Wunsch, 2009) has been used extensively in the analysis of high-throughput messenger RNA (mRNA) or microRNA expression profiling with microarrays, which demonstrates enormous promise in the areas of cancer diagnosis and treatment, gene function identification, therapy development and drug testing, and genetic regulatory network inference (Eisen et al., 1998, Golub et al., 1999, Jiang et al., 2004, McLachlan et al., 2004, Moreau et al., 2002, Shamir and Sharan, 2002, Xu and Wunsch, 2010, Xu and Wunsch, 2009, Xu and Wunsch, 2005). Usually, the expression levels of many genes are measured across a relatively small set of conditions or samples, and the obtained gene expression data are organized as a data matrix with rows corresponding to genes and columns corresponding to samples or conditions.

Let a gene expression data set be represented as a data matrix $E = (G, S)$, with $G = \{g_{1}, \dots, g_{N}\}$ representing a set of $N$ genes or rows and $S = \{s_{1}, \dots, s_{M}\}$ representing a set of $M$ samples (conditions) or columns (see Table 1). An element $e_{ij} \in E$ then represents the expression level of gene $i$ under condition $j$. A gene or row cluster is a subset of rows defined on all columns, denoted as $C_{XS} = (X, S)$, where $X \subseteq G$ is a subset of genes. Similarly, a sample or column cluster is a subset of columns defined across all rows, denoted as $C_{GY} = (G, Y)$, where $Y \subseteq S$ is a subset of samples.
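As a small illustration of these definitions, the following sketch builds a toy expression matrix with NumPy and extracts a row cluster and a column cluster by index selection. The matrix values and the helper names `row_cluster` and `column_cluster` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical toy expression matrix E: N = 4 genes (rows), M = 3 samples (columns).
E = np.array([
    [2.0, 4.0, 6.0],
    [1.0, 3.0, 5.0],
    [9.0, 0.5, 7.0],
    [3.0, 5.0, 7.0],
])


def row_cluster(E, X):
    """Return C_XS = (X, S): the genes indexed by X, measured on all samples."""
    return E[np.asarray(X), :]


def column_cluster(E, Y):
    """Return C_GY = (G, Y): the samples indexed by Y, measured on all genes."""
    return E[:, np.asarray(Y)]


gene_cluster = row_cluster(E, [0, 1, 3])    # 3 genes x all 3 samples
sample_cluster = column_cluster(E, [0, 2])  # all 4 genes x 2 samples
```

Both selections keep the full extent of the other dimension, which is exactly the property that biclustering relaxes.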

Such a data matrix and the corresponding row and column cluster definitions generalize to many other applications. Clustering one dimension while using the other dimension in its entirety, however, is inherently limited. Our general understanding of cellular processes indicates that only a subset of genes is involved in a specific cellular process, which becomes active only under some experimental conditions. Moreover, microarrays generally are not designed to meet the requirements of a particular experiment. Consider, for example, that in gene expression profile-based cancer diagnosis, only a subset of genes is related to some cancer type while numerous genes are considered irrelevant. In this case, the inclusion of all genes in sample clustering or all samples in gene clustering not only increases the computational burden, but could also impair clustering performance because the unrelated genes or samples act as noise.

Biclustering, first used by Cheng and Church (2000) in the community of bioinformatics, addresses this problem by performing clustering simultaneously on both row (gene) and column (sample) dimensions instead of clustering these two dimensions separately (Busygin et al., 2008, Madeira and Oliveira, 2004). In essence, biclustering can be regarded as a combination of clustering and automatic feature selection if we treat one dimension (e.g., column) as data objects and the other dimension (e.g., row) as description features. This task becomes particularly challenging without ground truth.

Note that the feature selection in biclustering is different from the feature selection that we usually consider in supervised classification. This difference lies in the fact that biclustering selects different subsets of features for different clusters of data objects, while standard feature selection chooses a subset of features from the candidate pool for all data objects. As such, the local relationship between subsets of genes and subsets of samples or conditions is identified with biclustering. Biclustering indicates gene groups that display similar patterns across a set of conditions, which is important to gene functional annotation and coregulated gene identification (DiMaggio et al., 2008a, Lazzeroni and Owen, 2002, Li et al., 2009, Madeira et al., 2010, Segal et al., 2001, Tanay et al., 2002), or gene groups that are related to certain cancer types, which supports cancer classification discovery and diagnosis (Getz et al., 2003, Kluger et al., 2003, Murali and Kasif, 2003, Tang and Zhang, 2005, Tchagang et al., 2008). In contrast, clustering focuses on uncovering global relationships between genes and samples or conditions. In fields other than bioinformatics, biclustering is also known as subspace clustering, co-clustering, or block clustering, among other names (Busygin et al., 2008, Kriegel, Kröger, and Zimek, 2009, Madeira and Oliveira, 2004). See the survey by Kriegel et al. (2009).

Specifically, let $I \subseteq G$ and $J \subseteq S$ be subsets of the rows and columns of the gene expression data matrix $E$; $E_{IJ} = (I, J)$ is then the submatrix with rows $I$ and columns $J$. A bicluster corresponds to such a submatrix that exhibits certain homogeneity. The goal of a biclustering algorithm, then, is to identify a set of biclusters with pairs of row and column subsets. The biclustering problem has been shown to be NP-complete (Madeira and Oliveira, 2004, Peeters, 2003), which has led to many heuristics falling into the five major categories summarized in Table 2, following Madeira and Oliveira (2004). For example, Cheng and Church (2000) used the mean squared residue to measure the coherence of the rows and columns in a bicluster, which is defined as

$$H(I, J) = \frac{1}{|I||J|} \sum_{i \in I, j \in J} \left(e_{ij} - \bar{e}_{iJ} - \bar{e}_{Ij} + \bar{e}_{IJ}\right)^{2},$$

where $\bar{e}_{iJ}$ is the mean of the $i$th row over the columns in $J$, $\bar{e}_{Ij}$ is the mean of the $j$th column over the rows in $I$, and $\bar{e}_{IJ}$ is the mean of the bicluster. A submatrix $E_{IJ}$ is called a $\delta$-bicluster if $H(I, J) \leq \delta$ for some $\delta \geq 0$. Possible ways to find the largest $\delta$-biclusters, which should also have relatively high row variance (Cheng and Church, 2000), include the evolutionary algorithm (Divina & Aguilar-Ruiz, 2006), multi-objective particle swarm optimization (Liu, Li, Hu, & Chen, 2009), and other greedy iterative search approaches, such as the FLOC (FLexible Overlapped biClustering) algorithm (Yang, Wang, Wang, & Yu, 2003). Lazzeroni and Owen (2002) developed a plaid model treating the matrix as the sum of additive layers. This model was further extended to combine external grouping information and to generate biclusters of profiles of repeated measures (Turner, Bailey, Krzanowski, & Hemingway, 2005). Gu and Liu (2007) proposed a Bayesian biclustering model with a Gibbs sampling procedure for statistical inference. DiMaggio et al. (2008b) considered modeling biclustering as either a network flow problem or a traveling salesman problem. A systematic comparison of five greedy search-based biclustering algorithms, together with a reference method (Bimax), was presented in Prelić et al. (2006).
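The mean squared residue of Cheng and Church (2000) can be computed directly from its definition. The following is a minimal NumPy sketch, not an implementation of any of the search algorithms listed above; the function names `mean_squared_residue` and `is_delta_bicluster` and the toy matrix are hypothetical.

```python
import numpy as np


def mean_squared_residue(E, I, J):
    """H(I, J) for the submatrix of E with row indices I and column indices J."""
    sub = E[np.ix_(I, J)]                        # submatrix E_IJ
    row_means = sub.mean(axis=1, keepdims=True)  # mean of each row over J
    col_means = sub.mean(axis=0, keepdims=True)  # mean of each column over I
    overall = sub.mean()                         # mean of the whole bicluster
    residue = sub - row_means - col_means + overall
    return float(np.mean(residue ** 2))


def is_delta_bicluster(E, I, J, delta):
    """True if E_IJ satisfies H(I, J) <= delta."""
    return mean_squared_residue(E, I, J) <= delta


# Hypothetical toy data: rows 0, 1, and 3 differ only by an additive shift.
E = np.array([
    [2.0, 4.0, 6.0],
    [1.0, 3.0, 5.0],
    [9.0, 0.5, 7.0],
    [3.0, 5.0, 7.0],
])

h_coherent = mean_squared_residue(E, [0, 1, 3], [0, 1, 2])
h_all_rows = mean_squared_residue(E, [0, 1, 2, 3], [0, 1, 2])
```

In this toy example the three shifted rows form a perfectly additive pattern, so their residue is zero, while including the unrelated row 2 gives a strictly positive residue.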

Early cancer diagnosis is critical for prolonging human life. As noted above, biclustering overcomes the disadvantage of clustering in dealing with unrelated genes and focuses only on the cancer type-related genes, thus providing a more effective method of cancer classification and diagnosis. Biclustering algorithms that are used for this purpose include the interrelated two-way clustering (ITWC) algorithm (Tang & Zhang, 2005), the coupled two-way clustering (CTWC) algorithm (Getz et al., 2003), xMOTIF (Murali & Kasif, 2003), Binary State Pattern Clustering (BSPC) (Beattie & Robinson, 2006), the qualitative biclustering algorithm (QUBIC) (Li et al., 2009), the spectral biclustering algorithm (Kluger et al., 2003), the double conjugated clustering (DCC) algorithm (Busygin, Jacobsen, & Kramer, 2002), and fuzzy adaptive subspace iteration-based two-way clustering (FASIC) (Shaik & Yeasin, 2009), among others. For example, ITWC generates biclusters by iteratively combining clustering results from each dimension of the data matrix (Tang & Zhang, 2005). Within each iteration, a set of gene clusters is first created using standard clustering algorithms, e.g., $K$-means or self-organizing maps, followed by the independent clustering of the sample dimension based on each gene cluster. The clustering results from the previous steps are then combined, and heterogeneous groups, which are pairs of columns that do not share gene features used for clustering, are identified. Finally, the genes are sorted in descending order of the cosine distance, and only the first one third of the sorted genes is kept, so that a reduced gene set is obtained for each heterogeneous group. The above steps are then repeated using the reduced gene set until some stopping conditions are satisfied.
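The gene-reduction step at the end of an ITWC iteration can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Tang and Zhang (2005): it assumes that a heterogeneous group is given as two lists of sample indices, that each gene is scored by its cosine similarity to a 1/0 indicator pattern separating the two lists, and that "the first one third" means integer division of the gene count by three. The function name `reduce_genes` and its parameters are hypothetical.

```python
import numpy as np


def reduce_genes(E, group_a, group_b):
    """Return indices of the top third of genes for one heterogeneous group.

    Assumption: genes are ranked by cosine similarity between their
    expression over the two sample groups and an indicator pattern that
    is 1 on group_a and 0 on group_b.
    """
    cols = list(group_a) + list(group_b)
    sub = E[:, cols]
    pattern = np.array([1.0] * len(group_a) + [0.0] * len(group_b))

    norms = np.linalg.norm(sub, axis=1) * np.linalg.norm(pattern)
    # Genes with an all-zero profile get similarity 0 instead of a division error.
    cos = np.divide(sub @ pattern, norms, out=np.zeros(len(sub)), where=norms > 0)

    order = np.argsort(-cos, kind="stable")  # descending similarity
    n_keep = max(1, E.shape[0] // 3)         # keep the first one third
    return order[:n_keep]


# Hypothetical toy data: 6 genes, 4 samples split into two groups.
E = np.array([
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 5.0],
    [1.0, 1.0, 1.0, 1.0],
    [4.0, 4.0, 1.0, 1.0],
    [1.0, 1.0, 4.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
])
kept = reduce_genes(E, group_a=[0, 1], group_b=[2, 3])
```

Under these assumptions the two genes whose expression is concentrated in the first sample group are retained, and the next iteration would cluster on that reduced gene set.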

CTWC uses hierarchical clustering with the Euclidean distance (Getz et al., 2003), while DCC relies on self-organizing maps with the dot product for similarity (Busygin et al., 2002). FASIC adopts concepts from fuzzy set theory and applies the fuzzy-adaptive subspace iteration algorithm to generate gene clusters under a progressive clustering framework. Those clusters are then scored to generate $p$-values indicating the significance of the differential expression (Shaik & Yeasin, 2009). Tchagang et al. (2008) demonstrated the effectiveness of biclustering for detecting and diagnosing ovarian cancer. Mitra, Bandyopadhyay, Maulik, and Zhang (2010) showed a biclustering application for microRNA expression profile-based classification of multiple cancer types, including melanoma, ovarian, renal, colon, and leukemia.

Here, we present BARTMAP (Biclustering ARTMAP) to perform biclustering on gene expression data, particularly for cancer classification discovery, although BARTMAP can be applied to other types of high-dimensional data, e.g., in document clustering. BARTMAP is adapted from Fuzzy ARTMAP (Carpenter, Grossberg, Markuzon, Reynolds, & Rosen, 1992), a neural-based classifier for supervised classification. Fuzzy ARTMAP, together with its variants, such as Ellipsoid ARTMAP (Anagnostopoulos & Georgiopoulos, 2001) and default ARTMAP (Carpenter, 2003), has already shown great effectiveness in gene expression data analysis and other biomedical applications (Xu et al., 2007, Xu et al., 2009). Similar to Fuzzy ARTMAP, BARTMAP is based on Adaptive Resonance Theory (ART) (Carpenter and Grossberg, 1987, Grossberg, 1976). ART is a learning theory that hypothesizes that resonance in neural circuits can trigger fast learning; it was developed as a solution to the plasticity-stability dilemma.

BARTMAP displays many attractive characteristics that are inherent in the ART family. First, BARTMAP scales well to large data sets while maintaining efficiency. As the computational complexity of its ART modules is $O(N \log N)$, or $O(N)$ for a one-pass variant (Mulder & Wunsch, 2003), the overall computational cost of BARTMAP is relatively low. Second, each ART module can dynamically and adaptively generate clusters without requiring the number of clusters to be specified in advance, as many other commonly used clustering algorithms do, such as $K$-means and self-organizing feature maps. Furthermore, BARTMAP is an exemplar-based, transparent learning model. During learning, the architecture summarizes data via exemplars in order to accomplish its learning objective. This contrasts with other, opaque neural network architectures, for which it is generally difficult to explain why an input produces a particular output. Another important feature of BARTMAP is its ability to detect atypical patterns during learning. Such patterns are detected with a match-based criterion that determines the degree to which a particular pattern matches the characteristics of an already-formed category in BARTMAP. Finally, BARTMAP is far simpler to implement than, for example, backpropagation for feed-forward neural networks or the training algorithm of support vector machines.
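The match-based criterion and the dynamic creation of categories can be illustrated with a minimal sketch of a single Fuzzy ART module, following the standard formulation of Carpenter, Grossberg, and Rosen (1991): complement coding, a choice function, a vigilance test, and fast learning. This is not the BARTMAP algorithm itself, which couples two such modules; the class name `FuzzyART`, the method `train_one`, and the parameter values are assumptions made for illustration, and inputs are assumed to be scaled to $[0, 1]$.

```python
import numpy as np


class FuzzyART:
    """Minimal single-module Fuzzy ART sketch (illustrative only)."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho        # vigilance: minimum match required for resonance
        self.alpha = alpha    # choice parameter
        self.beta = beta      # learning rate (1.0 = fast learning)
        self.weights = []     # one exemplar (weight vector) per category

    @staticmethod
    def complement_code(a):
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    def train_one(self, a):
        """Present one pattern; return the index of the category it joins."""
        x = self.complement_code(a)
        # Choice function T_j = |x ^ w_j| / (alpha + |w_j|), with ^ the fuzzy AND.
        scores = [np.minimum(x, w).sum() / (self.alpha + w.sum())
                  for w in self.weights]
        for j in np.argsort(scores)[::-1]:
            w = self.weights[j]
            match = np.minimum(x, w).sum() / x.sum()
            if match >= self.rho:  # vigilance test passed: resonance
                self.weights[j] = (self.beta * np.minimum(x, w)
                                   + (1.0 - self.beta) * w)
                return int(j)
        # No existing category matches well enough: create a new one.
        self.weights.append(x.copy())
        return len(self.weights) - 1


art = FuzzyART(rho=0.8)
patterns = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.90], [0.88, 0.92]]
labels = [art.train_one(p) for p in patterns]
```

With these toy patterns, the third input fails the vigilance test against the first category and therefore starts a new one, so the number of categories is determined by the data and the vigilance `rho` rather than fixed in advance.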

The remainder of this paper is organized as follows. Section 2 begins with a description of Fuzzy ART and Fuzzy ARTMAP, followed by how BARTMAP performs biclustering. The experimental data, design, and results are presented and discussed in Section 3. We conclude the paper in Section 4.

Fuzzy ART

Because Fuzzy ART (Carpenter, Grossberg, & Rosen, 1991) serves as the basic module for both Fuzzy ARTMAP and BARTMAP, we commence this section with a brief introduction of Fuzzy ART.

The basic Fuzzy ART architecture consists of two layers of neurons, the feature representation field $F_{1}$ and the category representation field $F_{2}$, as shown in Fig. 1. The neurons in layer $F_{1}$ are activated by the input pattern, which is normalized with the complement coding rule to avoid category proliferation (Carpenter

Data sets

The proposed method is applied to three benchmark data sets in gene expression profile-based cancer research and one data set in microRNA profile-based cancer research.

The first data set, the leukemia data set, consists of 72 samples, including bone marrow samples, peripheral blood samples, and childhood AML cases (Golub et al., 1999). Twenty-five of these samples are acute myeloid leukemia (AML), and 47 are acute lymphoblastic leukemia (ALL), which is composed of two subcategories due to the

Conclusions

Biclustering has been demonstrated to be a promising method for the analysis of high-dimensional gene expression data, and of other types of high-dimensional data, because it automatically combines feature selection with clustering without any prior information. Here, we show that a well-known supervised classification system, ARTMAP, can be modified to achieve biclustering in a fast, stable, and adaptive way. Experimental results on benchmark gene expression or microRNA expression data sets indicate the

Acknowledgements

Partial support for this research from the National Science Foundation, the Missouri University of Science & Technology Intelligent Systems Center, and the M.K. Finley Missouri endowment, is gratefully acknowledged.

References (59)

S. Busygin et al. (2008). Biclustering in data mining. Computers & Operations Research.
G. Carpenter et al. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing.
G. Carpenter et al. (1991). Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks.
S. Mulder et al. (2003). Million city traveling salesman problem solution by divide and conquer clustering with adaptive resonance neural networks. Neural Networks.
R. Peeters (2003). The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics.
R. Xu et al. (2009). MicroRNA expression profile-based cancer classification using default ARTMAP. Neural Networks.
E. Yeoh et al. (2002). Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell.
Anagnostopoulos, G., & Georgiopoulos, M. (2001). Ellipsoid ART and ARTMAP for incremental unsupervised and supervised...
B. Beattie et al. (2006). Binary state pattern clustering: a digital paradigm for class and biomarker discovery in gene microarray studies of cancer. Journal of Computational Biology.
Ben-Dor, A., Chor, B., Karp, R., & Yakhini, Z. (2002). Discovering local structure in gene expression data: the...
J. Bezdek (1981). Pattern recognition with fuzzy objective function algorithms.
Busygin, S., Jacobsen, G., & Kramer, E. (2002). Double conjugated clustering applied to leukemia microarray data. In:...
Carpenter, G. (2003). Default ARTMAP. In Proceedings of the international conference on neural networks (pp....
G. Carpenter et al. (1992). Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks.
Cheng, Y., & Church, G. (2000). Biclustering of expression data. In Proceedings of eighth international conference on...
P. DiMaggio et al. (2008). Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies. BMC Bioinformatics.
P. DiMaggio et al. (2008). Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies. BMC Bioinformatics.
F. Divina et al. (2006). Biclustering of expression data with evolutionary computation. IEEE Transactions on Knowledge and Data Engineering.
D. Duffy et al. (1991). A permutation based algorithm for block clustering. Journal of Classification.
M. Eisen et al. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences USA.
G. Getz et al. (2003). Coupled two-way clustering analysis of breast cancer and colon cancer gene expression data. Bioinformatics.
E. Gokcay et al. (2002). Information theoretic clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence.
T. Golub et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science.
S. Grossberg (1976). Adaptive pattern recognition and universal encoding II: feedback, expectation, olfaction, and illusions. Biological Cybernetics.
J. Gu et al. (2007). Bayesian biclustering of gene expression data. BMC Genomics.
J. Hartigan (1972). Direct clustering of a data matrix. Journal of the American Statistical Association.
L. Hubert et al. (1985). Comparing partitions. Journal of Classification.
D. Jiang et al. (2004). Cluster analysis for gene expression data: a survey. IEEE Transactions on Knowledge and Data Engineering.
J. Khan et al. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine.

