Elsevier

Pattern Recognition

Volume 48, Issue 12, December 2015, Pages 3941-3958
Pattern Recognition

A structured view on pattern mining-based biclustering

https://doi.org/10.1016/j.patcog.2015.06.018Get rights and content

Highlights

  • Pattern mining (PM) searches enable flexible, exhaustive and efficient biclustering

  • Integration of existing dispersed PM-inspired contributions for biclustering.

  • Principles for guided design and evaluation of new PM-based biclustering approaches

  • PM-based biclustering solutions have parameterizable coherency and quality.

Abstract

Mining matrices to find relevant biclusters, subsets of rows exhibiting a coherent pattern over a subset of columns, is a critical task for a wide-set of biomedical and social applications. Since biclustering is a challenging combinatorial optimization task, existing approaches place restrictions on the allowed structure, coherence and quality of biclusters. Biclustering approaches relying on pattern mining (PM) allow an exhaustive yet efficient space exploration together with the possibility to discover flexible structures of biclusters with parameterizable coherency and noise-tolerance. Still, state-of-the-art contributions are dispersed and the potential of their integration remains unclear.

This work proposes a structured and integrated view of the contributions of state-of-the-art PM-based biclustering approaches, makes available a set of principles for a guided definition of new PM-based biclustering approaches, and discusses their relevance for applications in pattern recognition. Empirical evidence shows that these principles guarantee the robustness, efficiency and flexibility of PM-based biclustering.

Introduction

The clustering of data matrices groups rows according to their overall values across columns. However, in real-world contexts, the correlation of a subset of rows is typically only significant and meaningful for a subset of the overall columns [114]. Biclustering seeks to find sub-matrices (biclusters), subsets of rows with a coherent pattern across subsets of columns. Illustrating, given a matrix that captures the expression of a set of genes (rows) across a set of conditions (columns), a bicluster defines a group of genes with coherent expression for a subset of conditions. The biclustering task in this domain is critical for the discovery of putative transcriptional modules of genes that participate in a cellular process that is only active in specific conditions [46], [40]. Table 1 provides additional applications in biomedical and social domains, synthesizing the meaning and relevance of discovering biclusters for pattern recognition.

Recent findings from biomedical domains show that exhaustive and flexible approaches to biclustering provide an unprecedented opportunity for an unbiased assessment of the native structure and modular organization of biological networks [14], new insights on the molecular units involved in cellular functions [60], [111], and discriminative high-order combinations of single-nucleotide polymorphisms (SNPs) [42]. However, due to the complexity of the biclustering task1, most of the existing algorithms are either based on greedy or stochastic approaches, potentially producing sub-optimal and constrained biclustering solutions [82], [67]. Illustrative constraints that prevent the flexibility of the biclustering task include the search for a fixed number of biclusters, non-overlapping structures and biclusters with differential-values only (binary settings) or sequential constraints [81], [82], [119]. In this context, the survey of efficient optimal searches for flexible biclustering scenarios is the target task in this work.

The attempts to perform biclustering based on pattern mining (PM) techniques [86], [111], [97], referred in this work as PM-based biclustering, show solid results for efficient and flexible exhaustive searches. In fact, since pattern mining research is driven by scalability requirements [54], its integration with biclustering defines a new promising direction. Contributions of PM-based approaches for biclustering include:

  • efficient exhaustive searches: PM algorithms as-is allow for the efficient analysis of large matrices (over 10.000 × 400 elements). Additional PM principles can be used to foster scalability, including searches in distributed/partitioned data settings or targeting approximate patterns [54], [52].

  • dealing with missing and noisy values [62], [63]: PM methods can mine transactions with varying length, and therefore a specific element from the input matrix can be associated with zero or multiple values, allowing the removal or bounded estimations of a missing or noisy value.

  • inherent orientation to learn constant models, yet recently extended to also learn additive, multiplicative, symmetric, order-preserving and plaid models [62], [60], [63];

  • capturing biclusters from patterns with multiple levels of expression [96], [101]. This contrasts with the majority of existing approaches that rely on differential values or fixed coherency strength [119];

  • flexible structures of biclusters (arbitrary positioning of biclusters) and searches (no need to fix the number of biclusters apriori) [96], [111];

  • annotating the significance of biclusters with PM principles to assess the relevance of patterns [72];

  • easy extension for multi-class settings using discriminative PM or classification rules [43], [95];

  • easy incorporation of PM-based constraints that can be effectively used to guide the search, promoting both efficiency, by pruning the search space, and a focus on non-trivial biclusters [116].

These properties of PM-based biclustering approaches are critical to tackle the problems highlighted in Table 1. Although the latest biclustering advances for pattern recognition are increasingly deterministic [89], [110], [128], [137], [47], [35], [131], they fail to meet several of the enumerated properties of PM-based biclustering. Table 2 pinpoints the benefits of using PM-based biclustering for pattern recognition.

Despite these listed potentialities, recent surveys on biclustering [46], [40], [28], [114] fail to explore the opportunities associated with PM-based biclustering. Additionally, the existing efforts towards PM-based biclustering provide critical principles that are not yet integrated [14], [86], [111]. As such, there is still space for new approaches that benefit from the integration of principles provided by these existing contributions as well as from other fields of research. In this context, this work provides three major contributions:

  • motivates, formalizes and provides a qualitative and quantitative assessment of the state-of-the-art algorithms for PM-based biclustering;

  • offers a structured view on how to define, parameterize and extend PM-based biclustering by coherently integrating the available yet dispersed contributions;

  • further surveys PM principles as well as adequate preprocessing and postprocessing criteria to guarantee the robustness, flexibility and scalability of PM-based biclustering across domains.

The paper is organized as follows. The remainder of this section provides background on pattern mining and biclustering, and surveys the contributions from existing PM-based biclustering approaches. Section 2 introduces a consistent set of principles to guide the definition of PM-based biclustering approaches. In particular, 2.1 Mining options: discovery of biclusters using pattern mining, 2.2 Mapping options: preprocessing input data, 2.3 Closing options: postprocessing biclustering solutions cover principles according to three major decision dimensions (mining, mapping and closing), and Section 2.4 compares the behavior of state-of-the-art PM-based biclustering approaches and proposes a set of principles to address their current challenges. Section 3 provides initial empirical evidence of the relevance of the proposed principles. Finally, the implications of this work are synthesized.

Pattern mining: Frequent patterns are itemsets, rules, subsequences, or substructures that appear in a dataset with frequency no less than a user-specified threshold. Let L be a finite set of items, and P be an itemset PL. A transaction t is a pair (tid,P) with idN. An itemset database D over L is a finite set of transactions {t1,,tn}. A transaction (id,P) contains P, denoted P(tid,P), if PP. The coverage ΦP of an itemset P is the set of all transactions in D in which the itemset P occurs: ΦP={tDPt}. The support of an itemset P in D, denoted supP, can either be absolute, being its coverage size ΦP, or a relative threshold given by |ΦP|/|D|.

An association rule is defined as an implication of the form PP, where P,PL and PP=. The left-hand side of the rule is named antecedent and the right-hand side consequent. Given an itemset database D, the support of a rule, supPP, is given by sup(PP), and the confidence of a rule, confPP, is given by sup(PP)sup(P). Confidence reveals the strength of the rule (the conditional probability that a transaction that contains the items in the antecedent also contains the items in consequent).

Definition 1.1

Given an itemset database D and a minimum support and confidence thresholds, θ and δ:

  • frequent itemset mining (FIM) problem consists of computing the set {PPL,supPθ};

  • association rule mining aims to compute {(P,P)PL,PL,supPPθ,confPPδ}.

A frequent itemset or a pattern is an itemset with supPθ. To illustrate these concepts, consider the following itemset database, Dex={(t1,{B,E,G}),(t2,{A,B,C,E,H,J}),(t3,{A,B,D,H,J}),(t4,{D,H,J}),(t5,{A,H,J}),(t6,{A,G})}), with L=12. We have Φ{B,J}={t2,t3} and sup{B,J}={t2,t3}/6=0.(3). An illustrative rule in Dex is R1:{H,J}{A} with supR1=0.5 and confR1=0.75. For θ=4, the FIM tasks returns {{A},{H},{J},{H,J}}.

Consider two itemsets P and P, where PP, and a predicate M. M is monotonic when M(P)M(P) and anti-monotonic when ¬M(P)¬M(P). FIM approaches rely on these properties: the support of P is bounded by the support of P and, if P is not frequent, then P is also not frequent. Table 3 shows three major search variants that rely on these properties.

Since FIM proposal [2], multiple extensions have been proposed, including principles to enhance the scalability of pattern miners, and condensed and approximate pattern representations [24], [54].

Pattern mining has been additionally applied over structured datasets, leading to contributions in different fields, including sequential pattern mining [79], graph mining [129] and cube computation [55].

Biclustering: Biclustering allows the discovery of subspaces, each defining a subset of rows that show a coherent pattern that is observed for a subset of the overall columns.

Definition 1.2

Given a matrix, A=(X,Y), with a set of rows X={x1,,xn}, a set of columns Y={y1,,ym}, and elements aijR relating row i and column j:

  • A bicluster B=(I,J) is a r×s submatrix of A, where I=(i1,,ir)X is a subset of rows and J=(j1,,js)Y is a subset of columns;

  • The biclustering task is to identify a structure of biclusters B={B1,,Bp} such that each bicluster Bk=(Ik,Jk) satisfies specific criteria of homogeneity and significance.

The homogeneity criteria is commonly guaranteed through the use of a merit function to guide the search [98]. An illustrative merit function is the variance of values in the rows or columns in the bicluster. Merit functions can either define the homogeneity of each bicluster (intra-bicluster homogeneity) or the homogeneity of a set of biclusters (inter-bicluster homogeneity), allowing some biclusters to deviate from the expected homogeneity as long as the overall criterion is preserved. The merit function is the simplest way to affect the coherency, quality and structure. The coherency of a bicluster is defined by the observed correlation of values (Definition 1.3). Biclusters can follow dense, constant, additive, multiplicative, plaid or order-preserving coherencies, either across rows or columns [82]. The quality of a bicluster is defined by the type and amount of accommodated noise. The structure is defined by the number,2 size and positioning of biclusters. Flexible structures are characterized by an arbitrary-high set of (possibly overlapping) biclusters. The statistical significance of a bicluster determines how its probability of occurrence deviates from expectations. Following the taxonomy proposed by Madeira and Oliveira [82], Table 4 synthesizes the main biclustering approaches acccording to their search paradigm.

Definition 1.3

Let the elements in a bicluster aij(I,J) have coherency across rows given by aij=kj+γi+ηij, where kj is the expected value for column j, γi is the adjustment for row i, and ηij is the noise factor. Given a dataset A and a specific coherency strength δ[0,maxAminA], aij=kj+γi+ηij where ηij[kjδ/2,kj+δ/2]. The γ factors define the coherency assumption: constant when γ=0, multiplicative if aij is better described by kjγi+ηij, and additive otherwise. A plaid assumption considers the cumulative contributions from multiple biclusters on areas where their rows and columns overlap.

PM-based biclustering: While traditional biclustering approaches rely on flexible merit functions to guide the space exploration, PM-based approaches require these functions to be defined in terms of support and, eventually, confidence or other interestingness metrics. This restriction enables a scalable exhaustive space search that produces an arbitrarily high number of biclusters within a flexible structure.

Definition 1.4

Let A be a matrix whose values in R are assigned to a set of items L. A bicluster under a constant model can either follow: an overall orientation where aijL; a column-based orientation where aij=kj and kjL; or a row-based orientation where aij=ki and kiL. A bicluster following an additive (or multiplicative) model has aij=kj+γi (or aij=ki×γj), where kiR and γjR define the column and row contributions. A bicluster under a symmetric model either considers symmetries on rows ci×aij or columns cj×aij, where ci{1,1}.

Definition 1.5

Given a matrix A whose elements are the concatenation of the observed values aijL with their column (or row) indexes. Let ΨP of an itemset P in A be its set of indexes. set of biclusters k(Ik,Jk) can be derived from a set of frequent itemsets kPk by mapping (Ik,Jk)=Bk, where Bk=(ΦPk,ΨPk), to compose biclusters with coherency across rows, or (Ik,Jk)=(ΨPk,ΦPk) for column-coherency.

Two classes of PM-based biclustering approaches can be considered: (1) a first class targeting discrete matrices by using as-is pattern miners, and (2) a second class targeting numeric matrices by extending methods based on the introduced monotonic (or Apriori) property [2]. The first class of methods rely on an itemization step followed by the application of FIM under a low support threshold. The itemization step maps a real-value or discrete matrix into an itemset database. For real-value matrices, normalization and discretization procedures are applied. Then, the discrete value of each element is concatenated with its column index. Each transaction of the target itemset database corresponds to a row with these new values. FIM is then applied over this database to mine frequent patterns for composing biclusters with coherency across rows. The second class of methods relies on variants of the FIM task to learn frequent patterns directly from the real-valued matrix. In both classes, the coherency strength δ is implicitly defined by the number of items or the maximum allowed distance. Biclusters with coherency across columns can be mined using the transpose matrix. Finally, biclusters with coherent values overall can be discovered by mining one item (or range of values) at a time. Fig. 1 illustrates how to deliver these different types of biclusters using frequent patterns when considering the constant model.

To our knowledge, BicPAM [62], BiModule [96], DeBi [111], Bellay׳s et al. [14], GenMiner [86] and BiP [60] are the state-of-the-art methods for the first class of PM-based biclustering. BiModule [96], [97] allows a parameterized multi-value itemization of the input matrix to discover constant biclusters derived from (closed) frequent patterns using the LCM miner [125]. DeBi [111] derives biclusters from (maximal) frequent patterns mined over binarized matrices using the MAFIA miner [22], and places key post-processing principles to adjust them in order to guarantee their statistical significance. The recently proposed BicPAM [62], parameterized with the F2G miner [65] by default, extends the constant assumption of previous approaches to find biclusters with symmetric, additive and multiplicative factors by performing iterative corrections on the input matrix. BicPAM also surpasses discretization problems by introducing the possibility to assign multiple discrete values to a single element, and offers new strategies to robustly handle noise and missing values. Bellay׳s et al. method [14] uses the Apriori miner [2] with additional principles to evaluate the functional coherency of the discovered biclusters against the background noise. This is one of diverse PM-based attempts to exhaustively discover dense biclusters in either unweighted networks [13], [90], [133], [80] or, more interestingly, in scored networks [32], [30]. GenMiner [86] includes external knowledge within the input matrix to derive biclusters from association rules that relate annotations (external grouping of rows or columns) with clusters derived from (closed) frequent patterns using CLOSE [102]. BiP [60] is prepared to discover plaid models by relying on noise-tolerant association rules for the recovery of apparent noisy areas due to the presence of cumulative effects on the overlapping areas between biclusters.

The itemization step is optional for the second class of methods [8]. To our knowledge, RAP [101], RCB discovery [8] and ET-bicluster [52] are state-of-the-art methods here. RAP [101] plugs an adapted range-based metric to mine constant biclusters on rows (or columns), while RCB discovery targets biclusters with constant values overall [8]. ET-bicluster extends the previous approaches to discover noisy biclusters, although an exhaustive enumeration of biclusters is not guaranteed [52]. Alternative support metrics with dedicated Apriori-based searches have been additionally proposed [69], [115], [53].

Section snippets

PM-based biclustering

We propose a structured view of PM-based biclustering according to a set of dimensions of decision. We rely on state-of-the-art literature to characterize each dimension. These dimensions gather principles on different steps with impact on the biclusters’ type, structure and quality, as illustrated in Fig. 2, Fig. 3. Throughout this paper we define a set of principles for each step.

Different options for PM-based biclustering can be grouped according to its three major steps: mapping

Performance evaluation of PM-based biclustering approaches

This section evaluates the performance of PM-based biclustering approaches. We first describe the quality evaluation methodology and then present preliminary results on synthetic and real data.

Conclusions

This work provides a structure view on pattern mining-based approaches to biclustering as they are increasingly positioned as the means to perform exhaustive searches under relaxed conditions (flexible structures of biclusters with parameterizable coherency and quality) with heightened efficiency. In this context, this work surveys and integrates the contributions of existing PM-based biclustering approaches, evaluates their performance, and discusses their relevance for pattern recognition

Conflict of interest

None declared.

Acknowledgments

This work was supported by Fundação para a Ciência e a Tecnologia under the projects UID/CEC/50021/2013 and the PhD grant SFRH/BD/75924/2011 to RH.

Rui Henriques received a M.Sc. degree in computer science and engineering from Instituto Superior Tècnico (IST), Universidade de Lisboa. He is developing his Ph.D. studies in the field of learning from high-dimensional and structured data at IST and INESC-ID. He had received distinctions for his academic achievements by IST between 2006 and 2008, and a National Award for his merits by Caixa Geral de Depósitos, in 2009. He has also been a Business Analyst at McKinsey with wide exposure to

References (137)

  • I. Assent, R. Krieger, E. Muller, T. Seidl, DUSC: Dimensionality unbiased subspace clustering, in: ICDM,...
  • Assent Ira et al.

    Machine learning and knowledge discovery in databases

  • Gowtham Atluri, Jeremy Bellay, Gaurav Pandey, Chad Myers, Vipin Kumar, Discovering coherent value bicliques in genetic...
  • R. Rathipriya et al.

    Binary particle swarm optimization based biclustering of web usage data

    CoRR

    (2011)
  • Simon Barkow et al.

    Bicata biclustering analysis toolbox

    Bioinformatics

    (2006)
  • Roberto J. Bayardo

    Efficiently mining long patterns from databases

    SIGMOD Rec.

    (1998)
  • Gürkan Bebek, Jiong Yang, Pathfinder: mining signal transduction pathway segments from protein–protein interaction...
  • Jeremy Bellay, Gowtham Atluri, Tina L. Sing, Kiana Toufighi, Michael Costanzo, Philippe Souza Moraes Ribeiro, Gaurav...
  • Jeremy Bellay

    Putting genetic interactions in context through a global modular decomposition

    Genome Res.

    (2011)
  • Amir Ben-Dor et al.

    Discovering local structure in gene expression datathe order-preserving submatrix problem

    RECOMB

    (2002)
  • G.F. Berriz et al.

    Characterizing gene sets with FuncAssociate

    Bioinformatics

    (2003)
  • Manuele Bicego, Pietro Lovato, Alberto Ferrarini, Massimo Delledonne, Biclustering of expression microarray data with...
  • Sylvain Blachon et al.

    Clustering formal concepts to discover biologically relevant knowledge from gene expression data

    In Silico Biol.

    (2007)
  • Jean-François Boulicaut, Jérémy Besson, Actionability and formal concepts: a data mining perspective, in: IC on Formal...
  • Jean-François Boulicaut, Inductive databases and multiple uses of frequent itemsets: The cInQ approach, in: Rosa Meo,...
  • Doruk Bozdağ et al.

    Comparative analysis of biclustering algorithms

    Bioinformatics and Computational Biology

    (2010)
  • Douglas Burdick, Manuel Calimlim, Johannes Gehrke, Mafia: a maximal frequent itemset algorithm for transactional...
  • Stanislav Busygin et al.

    Biclustering EEG data from epileptic patients treated with vagus nerve stimulation

    (2007)
  • Toon Calders, Bart Goethals, Mining all non-derivable frequent itemsets, in: PKDD, Springer-Verlag, London, UK, 2002,...
  • Toon Calders, Bart Goethals, Szymon Jaroszewicz, Mining rank-correlated sets of numerical attributes, In: ACM SIGKDD,...
  • Pedro Carmona-Saez et al.

    Integrated analysis of gene expression by association rules discovery

    BMC Bioinform.

    (2006)
  • André Valério Carreiro et al.

    Towards a classification approach using meta-biclusteringimpact of discretization in the analysis of expression time series

    J. Integr. Bioinf.

    (2012)
  • Malika Charrad et al.

    Simultaneous clustering: a survey, Pattern Recognition and Machine Intelligence

  • Yizong Cheng et al.

    Biclustering of expression data

    Intelligent Systems for Molecular Biology

    (2000)
  • Recep Colak et al.

    Module discovery by exhaustive search for densely connected, co-expressed regions in biomolecular interaction networks

    PLoS One

    (2010)
  • Chad Creighton et al.

    Mining gene expression databases for association rules

    Bioinformatics

    (2003)
  • Phuong Dao et al.

    Inferring cancer subnetwork markers using density-constrained biclustering

    Bioinformatics

    (2010)
  • P.A.D. de Castro et al.

    Applying biclustering to perform collaborative filtering

    Intell. Syst. Des. Appl.

    (2007)
  • M.C.P. de Souto, D.S.A. de Araujo, I.G. Costa, R. Soares, T.B. Ludermir, A. Schliep, Comparative study on normalization...
  • Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha, Information-theoretic co-clustering, in: KDD, ACM, New...
  • Chris Ding, Ya Zhang, Tao Li, Stephen R. Holbrook, Biclustering protein complex interactions with a biclique finding...
  • E. Elhamifar, R. Vidal, Sparse subspace clustering, in: Computer Vision and Pattern Recognition, June 2009, pp....
  • Kemal Eren et al.

    A comparative analysis of biclustering algorithms for gene expression data

    Brief. Bioinf.

    (2013)
  • Nikita Boyko. Neng Fan et al.
  • Gang Fang et al.

    High-order SNP combinations associated with complex diseasesefficient discovery, statistical power and functional interactions

    Plos One

    (2012)
  • Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers, Vipin Kumar, Subspace differential coexpression...
  • Paolo Favaro, René Vidal, Paolo Favaro, Avinash Ravichandran, A closed form solution to robust subspace estimation and...
  • Usama M. Fayyad, Keki B. Irani, Multi-interval discretization of continuous-valued attributes for classification...
  • Adelaide Freitas et al.

    Survey on biclustering of gene expression data

    Biological Knowledge Discovery Handbook

    (2012)
  • Elisabeth Georgii et al.

    Analyzing microarray data using quantitative association rules

    Bioinformatics

    (2005)
  • Cited by (0)

    Rui Henriques received a M.Sc. degree in computer science and engineering from Instituto Superior Tècnico (IST), Universidade de Lisboa. He is developing his Ph.D. studies in the field of learning from high-dimensional and structured data at IST and INESC-ID. He had received distinctions for his academic achievements by IST between 2006 and 2008, and a National Award for his merits by Caixa Geral de Depósitos, in 2009. He has also been a Business Analyst at McKinsey with wide exposure to real-life projects.

    Cláudia Antunes received her Ph.D. from Instituto Superior Tècnico (IST, University of Lisbon, Portugal) in the domain of data mining and machine learning, proposing new methods to deal with temporal data, in particular for mining event sequential patterns. She is currently a Professor at DEI department at IST and the scientific coordinator of two projects funded by FCT in the areas of domain-driven data mining and educational data mining. Clàudia has been working on methods for general pattern mining, from transactional to structured data. Her main interests are centered on mining complex knowledge from complex data, with emphasis on the incorporation of background knowledge in the pattern mining process.

    Sara C. Madeira received a (5-year) B.Sc. degree in computer science from the University of Beira Interior, Covilh, Portugal, in 2000, and the M.Sc. and Ph.D. degrees in computer science and engineering (CSE) at Instituto Superior Tcnico (IST), Technical University of Lisbon, in 2002 and 2008. She is currently an Assistant Professor, at the CSE department at IST, and a Senior Researcher at INESC-ID, Lisbon. Her research interests include algorithms and data structures, data mining, machine learning, bioinformatics and medical informatics.

    View full text