Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 5/2016

01.09.2016

Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models

verfasst von: Michiel Stock, Krzysztof Dembczyński, Bernard De Baets, Willem Waegeman

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 5/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Many complex multi-target prediction problems that concern large target spaces are characterised by a need for efficient prediction strategies that avoid the computation of predictions for all targets explicitly. Examples of such problems emerge in several subfields of machine learning, such as collaborative filtering, multi-label classification, dyadic prediction and biological network inference. In this article we analyse efficient and exact algorithms for computing the top-K predictions in the above problem settings, using a general class of models that we refer to as separable linear relational models. We show how to use those inference algorithms, which are modifications of well-known information retrieval methods, in a variety of machine learning settings. Furthermore, we study the possibility of scoring items incompletely, while still retaining an exact top-K retrieval. Experimental results in several application domains reveal that the so-called threshold algorithm is very scalable, performing often many orders of magnitude more efficiently than the naive approach.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
We refer to the ICML 2013 tutorial and the ECML-PKDD 2014 workshop on multi-target prediction for an overview.
 
4
This dataset (Ahn et al. 2011) is strictly speaking not a collaborative filtering benchmark dataset, but it can be treated using a similar workflow. Here the goal would be to find related recipes for a particular ingredient combination. See De Clercq et al. (2015) for an recommender system built using this dataset.
 
Literatur
Zurück zum Zitat Agarwal D, Gurevich M (2012) Fast top-k retrieval for model based recommendation. In: Proceedings of the fifth ACM international conference on web search and data mining, pp 483–492 Agarwal D, Gurevich M (2012) Fast top-k retrieval for model based recommendation. In: Proceedings of the fifth ACM international conference on web search and data mining, pp 483–492
Zurück zum Zitat Agrawal R, Gupta A, Prabhu Y, Varma M (2013) Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages. In: Proceedings of the 22nd international conference on world wide web, pp 13–24 Agrawal R, Gupta A, Prabhu Y, Varma M (2013) Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages. In: Proceedings of the 22nd international conference on world wide web, pp 13–24
Zurück zum Zitat Ahn YY, Ahnert SE, Bagrow JP, Barabási AL (2011) Flavor network and the principles of food pairing. Sci Rep 1:196CrossRef Ahn YY, Ahnert SE, Bagrow JP, Barabási AL (2011) Flavor network and the principles of food pairing. Sci Rep 1:196CrossRef
Zurück zum Zitat Basilico J, Hofmann T (2004) Unifying collaborative and content-based filtering. In: Proceedings of the 21st international conference on machine learning, pp 9–16 Basilico J, Hofmann T (2004) Unifying collaborative and content-based filtering. In: Proceedings of the 21st international conference on machine learning, pp 9–16
Zurück zum Zitat Ben-David S, Schuller R (2003) Exploiting task relatedness for multiple task learning. In: Proceedings of the 16th annual conference on computational learning theory and 7th kernel workshop, pp 567–580 Ben-David S, Schuller R (2003) Exploiting task relatedness for multiple task learning. In: Proceedings of the 16th annual conference on computational learning theory and 7th kernel workshop, pp 567–580
Zurück zum Zitat Ben-Hur A, Noble WS (2005) Kernel methods for predicting protein-protein interactions. Bioinformatics 21(Suppl 1):i38–i46CrossRef Ben-Hur A, Noble WS (2005) Kernel methods for predicting protein-protein interactions. Bioinformatics 21(Suppl 1):i38–i46CrossRef
Zurück zum Zitat Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18:509–517CrossRefMATH Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18:509–517CrossRefMATH
Zurück zum Zitat Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbor. In: Proceedings of the 23rd international conference on machine learning, pp 97–104 Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbor. In: Proceedings of the 23rd international conference on machine learning, pp 97–104
Zurück zum Zitat Blockeel H, De Raedt L, Ramon J (1998) Top-down induction of clustering trees. In: Proceedings of the 15th international conference on machine learning, pp 55–63 Blockeel H, De Raedt L, Ramon J (1998) Top-down induction of clustering trees. In: Proceedings of the 15th international conference on machine learning, pp 55–63
Zurück zum Zitat Chu W, Park ST (2009) Personalized recommendation on dynamic content using predictive bilinear models. In: Proceedings of the 18th international conference on world wide web, pp 691–700 Chu W, Park ST (2009) Personalized recommendation on dynamic content using predictive bilinear models. In: Proceedings of the 18th international conference on world wide web, pp 691–700
Zurück zum Zitat De Clercq M, Stock M, De Baets B, Waegeman W (2015) Data-driven recipe completion using machine learning methods. Trends Food Sci Technol 49:1–13CrossRef De Clercq M, Stock M, De Baets B, Waegeman W (2015) Data-driven recipe completion using machine learning methods. Trends Food Sci Technol 49:1–13CrossRef
Zurück zum Zitat De Paepe A, Van Peer G, Stock M, Volders PJ, Vandesompele J, De Baets B, Waegeman W (2015) miRNA target prediction through modeling quantitative and qualitative miRNA binding site information in a stacked model structure. Nucleic Acid Res De Paepe A, Van Peer G, Stock M, Volders PJ, Vandesompele J, De Baets B, Waegeman W (2015) miRNA target prediction through modeling quantitative and qualitative miRNA binding site information in a stacked model structure. Nucleic Acid Res
Zurück zum Zitat Dembczynski K, Waegeman W, Cheng W, Hüllermeier E (2012) On label dependence and loss minimization in multi-label classification. Mach Learn 88(1–2):5–45MathSciNetCrossRefMATH Dembczynski K, Waegeman W, Cheng W, Hüllermeier E (2012) On label dependence and loss minimization in multi-label classification. Mach Learn 88(1–2):5–45MathSciNetCrossRefMATH
Zurück zum Zitat Ding H, Takigawa I, Mamitsuka H, Zhu S (2013) Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Brief Bioinform 14(5):734–747 Ding H, Takigawa I, Mamitsuka H, Zhu S (2013) Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Brief Bioinform 14(5):734–747
Zurück zum Zitat Drineas P, Mahoney M (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J Mach Learn Res 6:2153–2175MathSciNetMATH Drineas P, Mahoney M (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J Mach Learn Res 6:2153–2175MathSciNetMATH
Zurück zum Zitat Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th international conference on machine learning, pp 147–153 Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th international conference on machine learning, pp 147–153
Zurück zum Zitat Evgeniou T, Pontil M (2004) Regularized multi-task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, pp 109–117 Evgeniou T, Pontil M (2004) Regularized multi-task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, pp 109–117
Zurück zum Zitat Goel S, Langford J, Strehl A (2009) Predictive indexing for fast search. In: Advances in neural information processing systems, pp. 505–512 Goel S, Langford J, Strehl A (2009) Predictive indexing for fast search. In: Advances in neural information processing systems, pp. 505–512
Zurück zum Zitat Gönen M (2012) Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics 28(18):2304–10CrossRef Gönen M (2012) Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics 28(18):2304–10CrossRef
Zurück zum Zitat Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New YorkCrossRefMATH Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New YorkCrossRefMATH
Zurück zum Zitat Hue M, Riffle M, Vert JP, Noble WS (2010) Large-scale prediction of protein-protein interactions from structures. BMC Bioinform 11(144):1–10 Hue M, Riffle M, Vert JP, Noble WS (2010) Large-scale prediction of protein-protein interactions from structures. BMC Bioinform 11(144):1–10
Zurück zum Zitat Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11CrossRef Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11CrossRef
Zurück zum Zitat Jacob L, Hoffmann B, Stoven V, Vert JP (2008) Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinform 9(1):1–16CrossRef Jacob L, Hoffmann B, Stoven V, Vert JP (2008) Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinform 9(1):1–16CrossRef
Zurück zum Zitat Jacob L, Vert JP (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19):2149–2156CrossRef Jacob L, Vert JP (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19):2149–2156CrossRef
Zurück zum Zitat Jalali A, Sanghavi S, Ravikumar P, Ruan C (2010) A dirty model for multi-task learning. In: Neural Information processing symposium, pp 964–972 Jalali A, Sanghavi S, Ravikumar P, Ruan C (2010) A dirty model for multi-task learning. In: Neural Information processing symposium, pp 964–972
Zurück zum Zitat Koenigstein N, Ram P, Shavitt Y (2012) Efficient retrieval of recommendations in a matrix factorization framework. In: Proceedings of the 21st ACM international conference on information and knowledge management, pp 535–544 Koenigstein N, Ram P, Shavitt Y (2012) Efficient retrieval of recommendations in a matrix factorization framework. In: Proceedings of the 21st ACM international conference on information and knowledge management, pp 535–544
Zurück zum Zitat Lee J, Sun M, Lebanon G (2011) A comparative study of collaborative filtering algorithms. ACM Trans Web 5(1):1–27CrossRef Lee J, Sun M, Lebanon G (2011) A comparative study of collaborative filtering algorithms. ACM Trans Web 5(1):1–27CrossRef
Zurück zum Zitat Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New YorkCrossRefMATH Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New YorkCrossRefMATH
Zurück zum Zitat Mineiro P, Karampatziakis N (2014) Fast label embeddings for extremely large output spaces. CoRR abs/1412.6 Mineiro P, Karampatziakis N (2014) Fast label embeddings for extremely large output spaces. CoRR abs/1412.6
Zurück zum Zitat Omohundro SM (1989) Five balltree construction algorithms. Science 51:1–22 Omohundro SM (1989) Five balltree construction algorithms. Science 51:1–22
Zurück zum Zitat Pahikkala T, Airola A, Stock M, De Baets B, Waegeman W (2013) Efficient regularized least-squares algorithms for conditional ranking on relational data. Mach Learn 93(2–3):321–356MathSciNetCrossRefMATH Pahikkala T, Airola A, Stock M, De Baets B, Waegeman W (2013) Efficient regularized least-squares algorithms for conditional ranking on relational data. Mach Learn 93(2–3):321–356MathSciNetCrossRefMATH
Zurück zum Zitat Pahikkala T, Stock M, Airola A, Aittokallio T, De Baets B, Waegeman W (2014) A two-step learning approach for solving full and almost full cold start problems in dyadic prediction. Lect Notes Comput Sci 8725:517–532CrossRef Pahikkala T, Stock M, Airola A, Aittokallio T, De Baets B, Waegeman W (2014) A two-step learning approach for solving full and almost full cold start problems in dyadic prediction. Lect Notes Comput Sci 8725:517–532CrossRef
Zurück zum Zitat Partalas I, Kosmopoulos A, Baskiotis N, Artieres T, Paliouras G, Gaussier E, Androutsopoulos I, Amini MR, Galinari P (2015) LSHTC: a benchmark for large-scale text classification. submitted to CoRR pp 1–9 Partalas I, Kosmopoulos A, Baskiotis N, Artieres T, Paliouras G, Gaussier E, Androutsopoulos I, Amini MR, Galinari P (2015) LSHTC: a benchmark for large-scale text classification. submitted to CoRR pp 1–9
Zurück zum Zitat Sarwar B, Karypis G, Konstan J, Reidl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on world wide web—WWW ’01. ACM Press, New York, pp 285–295 Sarwar B, Karypis G, Konstan J, Reidl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on world wide web—WWW ’01. ACM Press, New York, pp 285–295
Zurück zum Zitat Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, CambridgeCrossRefMATH Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, CambridgeCrossRefMATH
Zurück zum Zitat Shrivastava A, Li P (2014) Asymmetric lsh (ALSH) for sublinear time maximum inner product search (MIPS). Adv Neural Inf Process Syst 27:2321–2329 Shrivastava A, Li P (2014) Asymmetric lsh (ALSH) for sublinear time maximum inner product search (MIPS). Adv Neural Inf Process Syst 27:2321–2329
Zurück zum Zitat Shrivastava A, Li P (2015) Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS). In: Proceedings of the conference on uncertainty in artificial intelligence Shrivastava A, Li P (2015) Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS). In: Proceedings of the conference on uncertainty in artificial intelligence
Zurück zum Zitat Stock M, Fober T, Hüllermeier E, Glinca S, Klebe G, Pahikkala T, Airola A, De Baets B, Waegeman W (2014) Identification of functionally related enzymes by learning-to-rank methods. IEEE Trans Comput Biol Bioinform 11(6):1157–1169CrossRef Stock M, Fober T, Hüllermeier E, Glinca S, Klebe G, Pahikkala T, Airola A, De Baets B, Waegeman W (2014) Identification of functionally related enzymes by learning-to-rank methods. IEEE Trans Comput Biol Bioinform 11(6):1157–1169CrossRef
Zurück zum Zitat Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009:1–19CrossRef Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009:1–19CrossRef
Zurück zum Zitat Takács G, Pilászy I, Németh B, Tikk D (2008) Matrix factorization and neighbor based algorithms for the netflix prize problem. In: Proceedings of the 2008 ACM conference on recommender systems. ACM Press, New York, pp 267–274 Takács G, Pilászy I, Németh B, Tikk D (2008) Matrix factorization and neighbor based algorithms for the netflix prize problem. In: Proceedings of the 2008 ACM conference on recommender systems. ACM Press, New York, pp 267–274
Zurück zum Zitat Tipping M, Bishop C (1997) Probabilistic principal component analysis. J Roy Stat Soc 3:611–622MathSciNetMATH Tipping M, Bishop C (1997) Probabilistic principal component analysis. J Roy Stat Soc 3:611–622MathSciNetMATH
Zurück zum Zitat Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Warehouse Min 3(3):1–13CrossRef Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Warehouse Min 3(3):1–13CrossRef
Zurück zum Zitat Vert JP, Qiu J, Noble WS (2007) A new pairwise kernel forbiological network inference with support vector machines. BMC Bioinform 8(10):S8CrossRef Vert JP, Qiu J, Noble WS (2007) A new pairwise kernel forbiological network inference with support vector machines. BMC Bioinform 8(10):S8CrossRef
Zurück zum Zitat Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 8:30–37 Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 8:30–37
Zurück zum Zitat Waegeman W, Pahikkala T, Airola A, Salakoski T, Stock M, De Baets B (2012) A kernel-based framework for learning graded relations from data. IEEE Trans Fuzzy Syst 20(6):1090–1101CrossRef Waegeman W, Pahikkala T, Airola A, Salakoski T, Stock M, De Baets B (2012) A kernel-based framework for learning graded relations from data. IEEE Trans Fuzzy Syst 20(6):1090–1101CrossRef
Zurück zum Zitat Wang C, Liu J, Luo F, Deng Z, Hu QN (2015) Predicting target-ligand interactions using protein ligand-binding site and ligand substructures. BMC Syst Biol 9(S–1):S2CrossRef Wang C, Liu J, Luo F, Deng Z, Hu QN (2015) Predicting target-ligand interactions using protein ligand-binding site and ligand substructures. BMC Syst Biol 9(S–1):S2CrossRef
Zurück zum Zitat Weston J, Chapelle O, Elisseeff A, Schölkopf B, Vapnik V (2006) Kernel dependency estimation. Adv Neur Inf Process Syst 39:440–450 Weston J, Chapelle O, Elisseeff A, Schölkopf B, Vapnik V (2006) Kernel dependency estimation. Adv Neur Inf Process Syst 39:440–450
Zurück zum Zitat Yamanishi Y, Kotera M, Kanehisa M, Goto S (2010) Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics 26(12):i246–54CrossRef Yamanishi Y, Kotera M, Kanehisa M, Goto S (2010) Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics 26(12):i246–54CrossRef
Zurück zum Zitat Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2):6CrossRef Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2):6CrossRef
Metadaten
Titel
Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models
verfasst von
Michiel Stock
Krzysztof Dembczyński
Bernard De Baets
Willem Waegeman
Publikationsdatum
01.09.2016
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 5/2016
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-016-0456-z

Weitere Artikel der Ausgabe 5/2016

Data Mining and Knowledge Discovery 5/2016 Zur Ausgabe

Premium Partner