
2018 | Original Paper | Book Chapter

An Empirical Study of Strategies Boosts Performance of Mutual Information Similarity

Written by: Ole Kristian Ekseth, Svein-Olav Hvasshovd

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing


Abstract

In recent years, measures based on mutual information have become widely popular. The MINE mutual information measure is asserted to be the best strategy for identifying relationships in challenging data sets. A major weakness of the MINE similarity metric is its high execution time. To address this performance issue, numerous approaches have been suggested, covering both improved software implementations and simplified heuristics. However, none of these approaches manages to resolve the high execution time of MINE computation.
In this work, we address the latter issue. This paper presents a novel MINE implementation that achieves a performance increase of more than 530x compared to established approaches. The novel high-performance approach is the result of a structural evaluation of more than 30 different MINE software implementations, none of which makes use of simplified heuristics. Hence, the proposed strategy for computing MINE mutual information is both accurate and fast. The novel MINE software is available at https://bitbucket.org/oekseth/mine-data-analysis/downloads/. To broaden its applicability, the high-performance MINE metric is integrated into the hpLysis machine learning library (https://bitbucket.org/oekseth/hplysis-cluster-analysis-software).
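
For readers unfamiliar with the quantity being accelerated, the following is a minimal sketch of a grid-based mutual information score, assuming a single fixed equal-width grid. It is not the authors' implementation: the function names `grid_mutual_information` and `normalized_grid_score` are hypothetical, and the full MINE measure additionally searches over grid placements and resolutions, which is where most of the execution time is spent.

```python
import math
from collections import Counter


def grid_mutual_information(xs, ys, nx, ny):
    """Mutual information (in bits) of two equal-length samples after
    equal-width binning into an nx-by-ny grid (illustrative only)."""

    def bin_index(v, lo, hi, n):
        # Map a value to one of n equal-width bins; the maximum lands in the last bin.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    # Count how many points fall into each occupied grid cell.
    cells = Counter(
        (bin_index(x, x_lo, x_hi, nx), bin_index(y, y_lo, y_hi, ny))
        for x, y in zip(xs, ys)
    )

    # Marginal counts per grid row and column.
    n = len(xs)
    row, col = Counter(), Counter()
    for (i, j), c in cells.items():
        row[i] += c
        col[j] += c

    # I(X;Y) = sum over cells of p(i,j) * log2(p(i,j) / (p(i) * p(j))).
    mi = 0.0
    for (i, j), c in cells.items():
        mi += (c / n) * math.log2(c * n / (row[i] * col[j]))
    return mi


def normalized_grid_score(xs, ys, nx, ny):
    """Grid mutual information divided by log2(min(nx, ny)), so that the
    score lies in [0, 1] for this one grid."""
    return grid_mutual_information(xs, ys, nx, ny) / math.log2(min(nx, ny))
```

A perfectly monotone relationship such as `normalized_grid_score([0, 1, 2, 3], [0, 1, 2, 3], 2, 2)` scores 1.0 on a 2-by-2 grid, while four points spread evenly over all four cells score 0.0.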

Footnotes
1
Use of low-level assembly instructions for hardware-parallel computation (SSE) to reduce execution time.
 
Metadata
Title
An Empirical Study of Strategies Boosts Performance of Mutual Information Similarity
Written by
Ole Kristian Ekseth
Svein-Olav Hvasshovd
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91262-2_29