BARTMAP: A viable structure for biclustering
Introduction
Clustering Bezdek (1981), Oliveira and Pedrycz (2007), Sato-Ilic and Jain (2006), Gokcay and Principe (2002), Xu and Wunsch (2005) and Xu and Wunsch (2009) has been used extensively in the analysis of high-throughput messenger RNA (mRNA) or microRNA expression profiling with microarrays, which demonstrate enormous promise in the areas of cancer diagnosis and treatment, gene function identification, therapy development and drug testing, and genetic regulatory network inference (Eisen et al., 1998, Golub et al., 1999, Jiang et al., 2004, McLachlan et al., 2004, Moreau et al., 2002, Shamir and Sharan, 2002, Xu and Wunsch, 2010, Xu and Wunsch, 2009, Xu and Wunsch, 2005). Usually, the expression levels of many genes are measured across a relatively small set of conditions or samples, and the obtained gene expression data are organized as a data matrix with rows corresponding to genes and columns corresponding to samples or conditions.
Let a gene expression data set be represented as a data matrix , with representing a set of genes or rows and representing a set of samples (conditions)1 or columns (see Table 1). An element then represents the expression level of gene under condition . A gene or row cluster is a subset of rows defined on all columns, denoted as , where is a subset of genes. Similarly, a sample or column cluster is a subset of columns defined across all rows, denoted as , where is a subset of samples.
Such a data matrix and the corresponding row and column cluster definition can be generalized as a data matrix for many other applications. Such a practice, however, is inherently limited in analysis. This is because our general understanding of cellular processes elucidates that only a subset of genes is involved with a specific cellular process that becomes active only under some experimental conditions. However, microarrays generally are not specifically designed to meet the requirements of a particular experiment. Consider, for example, that in gene expression profile-based cancer diagnosis, only a subset of genes is related to some cancer type while numerous genes are considered irrelevant. In this case, the inclusion of all genes in sample clustering or all samples in gene clustering not only increases the computational burden, but could impair clustering performance due to the effect of these unrelated genes or samples, which are treated as noise.
Biclustering, first used by Cheng and Church (2000) in the community of bioinformatics, addresses this problem by performing clustering simultaneously on both row (gene) and column (sample) dimensions instead of clustering these two dimensions separately (Busygin et al., 2008, Madeira and Oliveira, 2004). In essence, biclustering can be regarded as a combination of clustering and automatic feature selection if we treat one dimension (e.g., column) as data objects and the other dimension (e.g., row) as description features. This task becomes particularly challenging without ground truth.
Note that the feature selection in biclustering is different from the feature selection that we usually consider in supervised classification. This difference lies in the fact that biclustering selects different subsets of features for different clusters of data objects, while standard feature selection chooses a subset of features from the candidate pool for all data objects. As such, the local relationship between subsets of genes and subsets of samples or conditions is identified with biclustering. Biclustering indicates gene groups that display similar patterns across a set of conditions (important to gene functional annotation and coregulated gene identification DiMaggio et al., 2008a, Lazzeroni and Owen, 2002, Li et al., 2009, Madeira et al., 2010, Segal et al., 2001, Tanay et al., 2002) or gene groups that are related to certain cancer types (for cancer classification discovery and diagnosis Getz et al., 2003, Kluger et al., 2003, Murali and Kasif, 2003, Tang and Zhang, 2005, Tchagang et al., 2008). In contrast, clustering focuses on uncovering global relationships between genes and samples or conditions. In fields other than bioinformatics, biclustering is also known as subspace clustering, co-clustering, or block clustering, among other names Busygin et al. (2008), Kriegel, Kröger, and Zimek (2009) and Madeira and Oliveira (2004). See the informative survey paper Kriegel et al. (2009).
Specifically, let and be subsets of the rows and columns for the gene expression data matrix is then the submatrix with rows and columns . A bicluster corresponds to such a submatrix that exhibits certain homogeneity. The goal of a biclustering algorithm, then, is to identify a set of biclusters with pairs of row and column subsets. The complexity of the biclustering problem has been shown to be NP-complete (Madeira and Oliveira, 2004, Peeters, 2003), which leads to many heuristics falling into the five major categories summarized in Table 2 according to Madeira and Oliveira (2004). For example, Cheng and Church (2000) used the mean squared residue to measure the coherence of the rows and columns in a bicluster, which is defined as, where is the mean of the th row, is the mean of the th column, and is the mean of the bicluster. A submatrix is called a -bicluster if for some . Possible ways to find the largest -biclusters, which should also have relatively high row variance Cheng and Church (2000), include the evolutionary algorithm (Divina & Aguilar-Ruiz, 2006), multi-objective particle swarm optimization (Liu, Li, Hu, & Chen, 2009), and other greedy iterative search approaches, such as the FLOC (FLexible Overlapped biClustering) algorithm (Yang, Wang, Wang, & Yu, 2003). Lazzeroni and Owen (2002) developed a plaid model treating the matrix as the sum of additive layers. This model was further extended to combine external grouping information and to generate biclusters of profiles of repeated measures (Turner, Bailey, Krzanowski, & Hemingway, 2005). Gu and Liu (2007) proposed a Bayesian biclustering model with a Gibbs sampling procedure for statistical inference. DiMaggio et al. (2008b) considered modeling biclustering as either a network flow problem or a traveling salesman problem. A systematic comparison of five greedy search-based biclustering algorithms, together with a reference method (Bimax), was presented in Prelić et al. (2006).
Early cancer diagnosis is critical for prolonging human life. As aforementioned, biclustering overcomes the disadvantage of clustering in dealing with unrelated genes and focuses only on the cancer type-related genes, thus providing a more effective method of cancer classification and diagnosis. Biclustering algorithms that are used for this purpose include the interrelated two-way clustering (ITWC) algorithm (Tang & Zhang, 2005), the coupled two-way clustering (CTWC) algorithm (Getz et al., 2003), xMOTIF (Murali & Kasif, 2003), Binary State Pattern Clustering (BSPC) (Beattie & Robinson, 2006), the qualitative biclustering algorithm (QUBIC) (Li et al., 2009), the spectral biclustering algorithm (Kluger et al., 2003), the double conjugated clustering (DCC) algorithm (Busygin, Jacobsen, & Kramer, 2002), and fuzzy adaptive subspace iteration-based two-way clustering (FASIC) (Shaik & Yeasin, 2009), among others. For example, ITWC generates biclusters by combining clustering results from each dimension of the data matrix in an iterative way (Tang & Zhang, 2005). Within each iteration, a set of gene clusters is first created using standard clustering algorithms, e.g., -means or self-organizing maps, followed by the independent clustering of the sample dimension based on each gene cluster. The clustering results from the previous steps are then combined, and heterogeneous groups, which are pairs of columns that do not share gene features used for clustering, are identified. Finally, the genes are sorted in descending order of the cosine distance. Only the first one third of sorted genes is kept so that a reduced gene set is obtained for each heterogeneous group. The above steps are then repeated using the reduced gene set until some stopping conditions are satisfied.
CTWC uses hierarchical clustering with the Euclidean distance (Getz et al., 2003), while DCC relies on self-organizing maps with the dot product for similarity (Busygin et al., 2002). FASIC adopts the concept of the fuzzy set theory and applies the fuzzy-adaptive subspace iteration algorithm to generate gene clusters under a progressive clustering framework. Those clusters are then scored to generate -values indicating the significance of the differential expression (Shaik & Yeasin, 2009). Tchagang et al. demonstrated the effectiveness of biclustering for detecting and diagnosing ovarian cancer (Tchagang et al., 2008). Mitra et al. showed a biclustering application for microRNA expression profile-based cancer classification of multiple types, including melanoma, ovarian, renal, colon, leukemia, and so on (Mitra, Bandyopadhyay, Maulik, & Zhang, 2010).
Here, we present BARTMAP (Biclustering ARTMAP) to perform biclustering on gene expression data, particularly for cancer classification discovery, although BARTMAP can be used to other types of data that have high dimensionalities, e.g., document clustering. BARTMAP is adapted to and modified from a neural-based classifier, Fuzzy ARTMAP (Carpenter, Grossberg, Markuzon, Reynolds, & Rosen, 1992), for supervised classification. Fuzzy ARTMAP, together with its other variants, such as Ellipsoid ARTMAP (Anagnostopoulos & Georgiopoulos, 2001) and default ARTMAP (Carpenter, 2003), already have shown great effectiveness in gene expression data analysis and other biomedical applications (Xu et al., 2007, Xu et al., 2009). Similar to Fuzzy ARTMAP, BARTMAP is based on Adaptive Resonance Theory (ART) (Carpenter and Grossberg, 1987, Grossberg, 1976). ART is a learning theory that hypothesizes that resonance in neural circuits can trigger fast learning; it was developed as a solution to the plasticity-stability dilemma.
BARTMAP displays many attractive characteristics that are also inherent and general in the ART family. First, BARTMAP scales very well with large-scale data analysis while maintaining efficiency. As the computational complexity for its ART modules is or for one-pass variant (Mulder & Wunsch, 2003), the overall computational cost for BARTMAP is relatively low. Each ART module can dynamically and adaptively generate clusters without the requirement of specifying the number of clusters in advance as is required in many other commonly used clustering algorithms, such as -means and self-organizing feature maps. Furthermore, BARTMAP is an exemplar-based, transparent learning model. During its learning, the architecture summarizes data via the use of exemplars in order to accomplish its learning objective. This ability contrasts with other, opaque neural network architectures for which it is generally difficult to explain why an input produces a particular output. Another important feature of BARTMAP is its ability to detect atypical patterns during its learning. The detection of such patterns is accomplished via the employment of a match-based criterion that decides to which degree a particular pattern matches the characteristics of an already-formed category in BARTMAP. Finally, BARTMAP is far simpler to implement than, for example, backpropagation for feed-forward neural networks and the training algorithm of support vector machines.
The remainder of this paper is organized as follows. Section 2 begins with a description of Fuzzy ART and Fuzzy ARTMAP, followed by how BARTMAP performs biclustering. The experimental data, design, and results are presented and discussed in Section 3. We conclude the paper in Section 4.
Section snippets
Fuzzy ART
Because Fuzzy ART (Carpenter, Grossberg, & Rosen, 1991) serves as the basic module for both Fuzzy ARTMAP and BARTMAP, we commence this section with a brief introduction of Fuzzy ART.
The basic Fuzzy ART architecture consists of two-layer neurons, the feature representation field , and the category representation field , as shown in Fig. 1. The neurons in layer are activated by the input pattern that is normalized with the complement coding rule to avoid category proliferation (Carpenter
Data sets
The proposed method is applied to three benchmark data sets in gene expression profile-based cancer research and one data set in microRNA profile-based cancer research.
The first data set is the leukemia data set consists of 72 samples, including bone marrow samples, peripheral blood samples and childhood AML cases (Golub et al., 1999). Twenty-five of these samples are acute myeloid leukemia (AML), and 47 are acute lymphoblastic leukemia (ALL), which is composed of two subcategories due to the
Conclusions
Biclustering has been demonstrated to be a promising method of high dimensional gene expression data analysis, or other types of high-dimensional data analysis, by automatically combining feature selection with clustering without any prior information. Here, we show that a well-known supervised classification system, ARTMAP, can be modified to achieve biclustering in a fast, stable, and adaptive way. Experimental results on benchmark gene expression or microRNA expression data sets indicate the
Acknowledgements
Partial support for this research from the National Science Foundation, the Missouri University of Science & Technology Intelligent Systems Center, and the M.K. Finley Missouri endowment, is gratefully acknowledged.
References (59)
- et al.
Biclustering in data mining
Computers & Operations Research
(2008) - et al.
A massively parallel architecture for a self-organizing neural pattern recognition machine
Computer Vision, Graphics, and Image Processing
(1987) - et al.
Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system
Neural Networks
(1991) - et al.
Million city traveling salesman problem solution by divide and conquer clustering with adaptive resonance neural networks
Neural Networks
(2003) The maximum edge biclique problem is NP-complete
Discrete Applied Mathematics
(2003)- et al.
MicroRNA expression profile-based cancer classification using default ARTMAP
Neural Networks
(2009) - et al.
Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling
Cancer Cell
(2002) - Anagnostopoulos, G., & Georgiopoulos, M. (2001). Ellipsoid ART and ARTMAP for incremental unsupervised and supervised...
- et al.
Binary state pattern clustering: a digital paradigm for class and biomarker discovery in gene microarray studies of cancer
Journal of Computational Biology
(2006) - Ben-Dor, A., Chor, B., Karp, R., & Yakhini, Z. (2002). Discovering local structure in gene expression data: the...
Pattern recognition with fuzzy objective function algorithms
Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps
IEEE Transactions on Neural Networks
Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
BMC Bioinformatics
Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
BMC Bioinformatics
Biclustering of expression data with evolutionary computation
IEEE Transactions on Knowledge and Data Engineering
A permutation based algorithm for block clustering
Journal of Classification
Cluster analysis and display of genome-wide expression patterns
Proceedings of National Academic Science USA
Coupled two-way clustering analysis of breast cancer and colon cancer gene expression data
Bioinformatics
Information theoretic clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
Science
Adaptive pattern recognition and universal encoding II: feedback, expectation, olfaction, and illusions
Biological Cybernetics
Bayesian biclustering of gene expression data
BMC Genomics
Direct clustering of a data matrix
Journal of American Statistical Association
Comparing partitions
Journal of Classification
Cluster analysis for gene expression data: a survey
IEEE Transactions on Knowledge and Data Engineering
Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks
Nature Medicine
Cited by (41)
Type2 soft biclustering framework for Alzheimer microarray
2024, Applied Soft ComputingJoint feature selection and optimal bipartite graph learning for subspace clustering
2023, Neural NetworksTopological biclustering ARTMAP for identifying within bicluster relationships
2023, Neural NetworksCitation Excerpt :Such visualization aims to provide a way to observe associations between different biclusters. This section presents a compendium on TopoART (Tscherepanow, 2010) and BARTMAP (Xu & Wunsch II, 2011) while briefly introducing Adaptive Resonance Theory thus furnishing principal details for explicating TopoBARTMAP. The reader is referred to Brito da Silva, Elnabarawy, and Wunsch I.I. (2019), Carpenter and Grossberg (2010), Carpetner, Grossberg, and Rosen (1991), Grossberg (2013), Wunsch II (2019) for a rigorous exploration of “Winner-Take-All” (WTA) learning models based on Adaptive Resonance Theory (ART) and their applications.
Self-organizing subspace clustering for high-dimensional and multi-view data
2020, Neural NetworksCitation Excerpt :They take advantage of a neural network’s capacity for performing a relevant number of categorization tasks. Considering a timeline, the first algorithms are Generalized Relevance Learning Vector Quantization (GLVQ) (Hammer & Villmann, 2002) and Biclustering ARTMAP (BARTMAP) (Xu & Wunsch I.I., 2011). Then several SOM-based approaches were proposed: The Dimension Selective Self-organizing Map (DSSOM) (Bassani & Araújo, 2012) and the Local Adaptive Receptive Field Dimension Selective Self-organizing Map (LARFDSSOM) (Bassani & Araújo, 2015).
Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence
2020, Neural NetworksCitation Excerpt :ART templates have specific properties and governing equations based on their internal representation, e.g., hyperboxes (Carpenter, Grossberg and Rosen, 1991); Gaussians (Vigdor & Lerner, 2007; Williamson, 1996); hyperspheres (Anagnostopoulos & Georgiopulos, 2000); hyperellipses (Anagnostopoulos & Georgiopoulos, 2001); and others. Numerous ART-based architectures have been conceived, such as predictive ART (ARTMAP) for supervised mappings (Carpenter, Grossberg, Markuzon, Reynolds, & Rosen, 1992; Carpenter, Grossberg and Reynolds, 1991); fusion ART (Tan, Carpenter, & Grossberg, 2007), whose variants have been effectively used for semi-supervised (Meng, Tan, & Xu, 2014), supervised (Tan, 1995), and reinforcement learning applications (Tan, 2004, 2006; Tan, Lu, & Xiao, 2008); Biclustering ARTMAP (BARTMAP) (Xu & Wunsch II, 2011) for biclustering applications, such as gene expression analysis (Xu & Wunsch II, 2011) and collaborative filtering (Elnabarawy, Wunsch II, & Abdelbar, 2016); and ART networks endowed with multiple vigilance tests (Brito da Silva & Wunsch II, 2017; Gomez-Sanchez, Dimitriadis, Cano-Izquierdo, & Lopez-Coronado, 2001; Huang, Cheng, Shih, & Chen, 2014; Seiffertt & Wunsch II, 2010). A brief review of ART networks related to the contributions of this work is provided next, where emphasis is given to the architectures used for comparison purposes, thereby making this paper self-contained.