Published in: International Journal of Machine Learning and Cybernetics 11/2018

7 March 2018 | Original Article

A new FCA-based method for identifying biclusters in gene expression data

Authors: Amina Houari, Wassim Ayadi, Sadok Ben Yahia

Abstract

Biclustering has proved highly relevant to gene expression data analysis. Its main strength lies in its ability to identify groups of genes that behave in the same way under a subset of samples (conditions). However, the pioneering algorithms in the literature have shown some limits in terms of the quality of the biclusters they unveil. In this paper, we introduce a new algorithm, called BiFCA+, for biclustering microarray data. BiFCA+ relies heavily on the mathematical background of formal concept analysis to extract the set of biclusters. In addition, the Bond correlation measure is used to filter out overlapping biclusters. Extensive experiments, carried out on real-life datasets, shed light on BiFCA+'s ability to identify statistically and biologically significant biclusters.
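The two ingredients named in the abstract can be illustrated on a toy example. The sketch below is not the BiFCA+ algorithm: it assumes the expression matrix has already been binarised into a formal context (gene to set of conditions), enumerates formal concepts by brute force, and computes the bond of a set of conditions as conjunctive support divided by disjunctive support. All names (`context`, `extent`, `intent`, `formal_concepts`, `bond`) and the toy data are hypothetical, and how BiFCA+ actually applies the bond measure to overlapping biclusters is not specified in this abstract.

```python
from itertools import combinations

# Toy formal context (assumed, already binarised): gene -> conditions it is "on" in.
context = {
    "g1": frozenset({"c1", "c2"}),
    "g2": frozenset({"c1", "c2", "c3"}),
    "g3": frozenset({"c3"}),
}

def extent(ctx, attrs):
    """Genes that have every condition in attrs."""
    return frozenset(g for g, has in ctx.items() if attrs <= has)

def intent(ctx, genes):
    """Conditions shared by every gene in genes (all conditions if genes is empty)."""
    shared = frozenset().union(*ctx.values())
    for g in genes:
        shared &= ctx[g]
    return shared

def formal_concepts(ctx):
    """Brute-force enumeration: close every subset of conditions.

    Exponential in the number of conditions, so only suitable for toy data.
    """
    all_attrs = sorted(frozenset().union(*ctx.values()))
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            genes = extent(ctx, frozenset(subset))
            concepts.add((genes, intent(ctx, genes)))
    return concepts

def bond(ctx, attrs):
    """Bond of a condition set: conjunctive support / disjunctive support."""
    conj = sum(1 for has in ctx.values() if attrs <= has)  # genes with all conditions
    disj = sum(1 for has in ctx.values() if attrs & has)   # genes with at least one
    return conj / disj if disj else 0.0

concepts = formal_concepts(context)
bond_c1_c2 = bond(context, frozenset({"c1", "c2"}))
bond_c1_c3 = bond(context, frozenset({"c1", "c3"}))
```

Each formal concept pairs a maximal set of genes with the maximal set of conditions they share, which is why concepts are natural bicluster candidates. A bond close to 1 means the conditions almost always occur together across genes, so a threshold on it is one plausible way to rank or prune candidates.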

Footnotes
1
We use a separator-free abbreviated form for the sets; e.g., \(\{I_{1}I_{2}I_{3}\}\) stands for the set of items \(\{I_{1}, I_{2}, I_{3}\}\).
 
2
This may be either monotone increasing, monotone decreasing, up–down or down–up, etc.
 
7
The human B-cell lymphoma dataset version that we have does not contain the names of genes to perform other tests.
 
11
The adjusted significance scores assess genes in each bicluster, which indicates how well they match with the different GO categories.
 
Metadata
Title
A new FCA-based method for identifying biclusters in gene expression data
Authors
Amina Houari
Wassim Ayadi
Sadok Ben Yahia
Publication date
7 March 2018
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 11/2018
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-018-0794-9