nach oben

Data Mining and Knowledge Discovery

Erschienen in:

01.09.2016

Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models

verfasst von: Michiel Stock, Krzysztof Dembczyński, Bernard De Baets, Willem Waegeman

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 5/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Many complex multi-target prediction problems that concern large target spaces are characterised by a need for efficient prediction strategies that avoid the computation of predictions for all targets explicitly. Examples of such problems emerge in several subfields of machine learning, such as collaborative filtering, multi-label classification, dyadic prediction and biological network inference. In this article we analyse efficient and exact algorithms for computing the top-K predictions in the above problem settings, using a general class of models that we refer to as separable linear relational models. We show how to use those inference algorithms, which are modifications of well-known information retrieval methods, in a variety of machine learning settings. Furthermore, we study the possibility of scoring items incompletely, while still retaining an exact top-K retrieval. Experimental results in several application domains reveal that the so-called threshold algorithm is very scalable, performing often many orders of magnitude more efficiently than the naive approach.

Vorheriger Artikel An efficient exact algorithm for triangle listing in large graphs

Nächster Artikel Optimizing network robustness by edge rewiring: a general framework

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

We refer to the ICML 2013 tutorial and the ECML-PKDD 2014 workshop on multi-target prediction for an overview.

http://grouplens.org/.

http://www-etud.iro.umontreal.ca/~bergstrj/audioscrobbler_data.html.

This dataset (Ahn et al. 2011) is strictly speaking not a collaborative filtering benchmark dataset, but it can be treated using a similar workflow. Here the goal would be to find related recipes for a particular ingredient combination. See De Clercq et al. (2015) for an recommender system built using this dataset.

http://www.uniprot.org.

https://www.kaggle.com/c/lshtc.

Agarwal D, Gurevich M (2012) Fast top-k retrieval for model based recommendation. In: Proceedings of the fifth ACM international conference on web search and data mining, pp 483–492

Agrawal R, Gupta A, Prabhu Y, Varma M (2013) Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages. In: Proceedings of the 22nd international conference on world wide web, pp 13–24

Ahn YY, Ahnert SE, Bagrow JP, Barabási AL (2011) Flavor network and the principles of food pairing. Sci Rep 1:196CrossRef

Basilico J, Hofmann T (2004) Unifying collaborative and content-based filtering. In: Proceedings of the 21st international conference on machine learning, pp 9–16

Ben-David S, Schuller R (2003) Exploiting task relatedness for multiple task learning. In: Proceedings of the 16th annual conference on computational learning theory and 7th kernel workshop, pp 567–580

Ben-Hur A, Noble WS (2005) Kernel methods for predicting protein-protein interactions. Bioinformatics 21(Suppl 1):i38–i46CrossRef

Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18:509–517CrossRefMATH

Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbor. In: Proceedings of the 23rd international conference on machine learning, pp 97–104

Blockeel H, De Raedt L, Ramon J (1998) Top-down induction of clustering trees. In: Proceedings of the 15th international conference on machine learning, pp 55–63

Caruana R (1997) Multitask learning. Mach Learn 75:41–75CrossRef

Chu W, Park ST (2009) Personalized recommendation on dynamic content using predictive bilinear models. In: Proceedings of the 18th international conference on world wide web, pp 691–700

De Clercq M, Stock M, De Baets B, Waegeman W (2015) Data-driven recipe completion using machine learning methods. Trends Food Sci Technol 49:1–13CrossRef

De Paepe A, Van Peer G, Stock M, Volders PJ, Vandesompele J, De Baets B, Waegeman W (2015) miRNA target prediction through modeling quantitative and qualitative miRNA binding site information in a stacked model structure. Nucleic Acid Res

Dembczynski K, Waegeman W, Cheng W, Hüllermeier E (2012) On label dependence and loss minimization in multi-label classification. Mach Learn 88(1–2):5–45MathSciNetCrossRefMATH

Ding H, Takigawa I, Mamitsuka H, Zhu S (2013) Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Brief Bioinform 14(5):734–747

Drineas P, Mahoney M (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J Mach Learn Res 6:2153–2175MathSciNetMATH

Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th international conference on machine learning, pp 147–153

Evgeniou T (2005) Learning multiple tasks with kernel methods. J Mach Learn Res 6:615–637MathSciNetMATH

Evgeniou T, Pontil M (2004) Regularized multi-task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, pp 109–117

Fagin R (1999) Combining fuzzy information from multiple systems. J Comput Syst Sci 58(1):83–99MathSciNetCrossRefMATH

Fagin R, Lotem A, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656MathSciNetCrossRefMATH

Fan W, Huai JP (2014) Querying big data: bridging theory and practice. J Comput Sci Technol 29(5):849–869MathSciNetCrossRef

Goel S, Langford J, Strehl A (2009) Predictive indexing for fast search. In: Advances in neural information processing systems, pp. 505–512

Gönen M (2012) Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics 28(18):2304–10CrossRef

Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New YorkCrossRefMATH

Hue M, Riffle M, Vert JP, Noble WS (2010) Large-scale prediction of protein-protein interactions from structures. BMC Bioinform 11(144):1–10

Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11CrossRef

Jacob L, Hoffmann B, Stoven V, Vert JP (2008) Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinform 9(1):1–16CrossRef

Jacob L, Vert JP (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19):2149–2156CrossRef

Jalali A, Sanghavi S, Ravikumar P, Ruan C (2010) A dirty model for multi-task learning. In: Neural Information processing symposium, pp 964–972

Koenigstein N, Ram P, Shavitt Y (2012) Efficient retrieval of recommendations in a matrix factorization framework. In: Proceedings of the 21st ACM international conference on information and knowledge management, pp 535–544

Lee J, Sun M, Lebanon G (2011) A comparative study of collaborative filtering algorithms. ACM Trans Web 5(1):1–27CrossRef

Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New YorkCrossRefMATH

Mineiro P, Karampatziakis N (2014) Fast label embeddings for extremely large output spaces. CoRR abs/1412.6

Omohundro SM (1989) Five balltree construction algorithms. Science 51:1–22

Pahikkala T, Airola A, Stock M, De Baets B, Waegeman W (2013) Efficient regularized least-squares algorithms for conditional ranking on relational data. Mach Learn 93(2–3):321–356MathSciNetCrossRefMATH

Pahikkala T, Stock M, Airola A, Aittokallio T, De Baets B, Waegeman W (2014) A two-step learning approach for solving full and almost full cold start problems in dyadic prediction. Lect Notes Comput Sci 8725:517–532CrossRef

Partalas I, Kosmopoulos A, Baskiotis N, Artieres T, Paliouras G, Gaussier E, Androutsopoulos I, Amini MR, Galinari P (2015) LSHTC: a benchmark for large-scale text classification. submitted to CoRR pp 1–9

Sarwar B, Karypis G, Konstan J, Reidl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on world wide web—WWW ’01. ACM Press, New York, pp 285–295

Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, CambridgeCrossRefMATH

Shrivastava A, Li P (2014) Asymmetric lsh (ALSH) for sublinear time maximum inner product search (MIPS). Adv Neural Inf Process Syst 27:2321–2329

Shrivastava A, Li P (2015) Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS). In: Proceedings of the conference on uncertainty in artificial intelligence

Stock M, Fober T, Hüllermeier E, Glinca S, Klebe G, Pahikkala T, Airola A, De Baets B, Waegeman W (2014) Identification of functionally related enzymes by learning-to-rank methods. IEEE Trans Comput Biol Bioinform 11(6):1157–1169CrossRef

Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009:1–19CrossRef

Takács G, Pilászy I, Németh B, Tikk D (2008) Matrix factorization and neighbor based algorithms for the netflix prize problem. In: Proceedings of the 2008 ACM conference on recommender systems. ACM Press, New York, pp 267–274

Tipping M, Bishop C (1997) Probabilistic principal component analysis. J Roy Stat Soc 3:611–622MathSciNetMATH

Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Warehouse Min 3(3):1–13CrossRef

Vert JP, Qiu J, Noble WS (2007) A new pairwise kernel forbiological network inference with support vector machines. BMC Bioinform 8(10):S8CrossRef

Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 8:30–37

Waegeman W, Pahikkala T, Airola A, Salakoski T, Stock M, De Baets B (2012) A kernel-based framework for learning graded relations from data. IEEE Trans Fuzzy Syst 20(6):1090–1101CrossRef

Wang C, Liu J, Luo F, Deng Z, Hu QN (2015) Predicting target-ligand interactions using protein ligand-binding site and ligand substructures. BMC Syst Biol 9(S–1):S2CrossRef

Weston J, Chapelle O, Elisseeff A, Schölkopf B, Vapnik V (2006) Kernel dependency estimation. Adv Neur Inf Process Syst 39:440–450

Yamanishi Y, Kotera M, Kanehisa M, Goto S (2010) Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics 26(12):i246–54CrossRef

Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2):6CrossRef

Titel: Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models
verfasst von: Michiel Stock
Krzysztof Dembczyński
Bernard De Baets
Willem Waegeman
Publikationsdatum: 01.09.2016
Verlag: Springer US
Erschienen in: Data Mining and Knowledge Discovery / Ausgabe 5/2016
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-016-0456-z

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 5/2016

Top-k overlapping densest subgraphs

An efficient exact algorithm for triangle listing in large graphs

Ensembles of label noise filters: a ranking approach

Guest editors’ introduction to the EcmlPkdd 2016 journal track special issue of Machine Learning

Skopus: Mining top-k sequential patterns under leverage

Bayesian Wishart matrix factorization

Premium Partner