Skip to main content
Erschienen in: Journal of Intelligent Information Systems 2/2014

01.04.2014

Link classification with probabilistic graphs

verfasst von: Nicola Di Mauro, Claudio Taranto, Floriana Esposito

Erschienen in: Journal of Intelligent Information Systems | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The need to deal with the inherent uncertainty in real-world relational or networked data leads to the proposal of new probabilistic models, such as probabilistic graphs. Every edge in a probabilistic graph is associated with a probability whose value represents the likelihood of its existence, or the strength of the relation between the entities it connects. The aim of this paper is to propose two machine learning techniques for the link classification problem in relational data exploiting the probabilistic graph representation. Both the proposed methods will exploit a language-constrained reachability method to infer the probability of possible hidden relationships that may exists between two nodes in a probabilistic graph. Each hidden relationships between two nodes may be viewed as a feature (or a factor), and its corresponding probability as its weight, while an observed relationship is considered as a positive instance for its corresponding link label. Given a training set of observed links, the first learning approach is to use a propositionalization technique adopting a L2-regularized Logistic Regression to learn a model able to predict unobserved link labels. Since in some cases the edges’ probability may be not known in advance or they could not be precisely defined for a classification task, the second xposed approach is to exploit the inference method and to use a mean squared technique to learn the edges’ probabilities. Both the proposed methods have been evaluated on real world data sets and the corresponding results proved their validity.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Sometimes called certain graph.
 
2
In the rest of the paper, if not otherwise specified, \(\mathbb{I} {C}\) denotes the indicator function returning 1 if the condition C is true, and 0 otherwise.
 
Literatur
Zurück zum Zitat Baccianella, S., Esuli, A., Sebastiani, F. (2009). Evaluation measures for ordinal regression. In Proceedings of the 9th international conference on intelligent systems design and applications. IEEE, (pp. 283–287). Baccianella, S., Esuli, A., Sebastiani, F. (2009). Evaluation measures for ordinal regression. In Proceedings of the 9th international conference on intelligent systems design and applications. IEEE, (pp. 283–287).
Zurück zum Zitat Bottou, L. (1998). Online algorithms and stochastic approximations. In D. Saad (Ed.), Online Learning and Neural Networks. Cambridge: Cambridge University Press. Bottou, L. (1998). Online algorithms and stochastic approximations. In D. Saad (Ed.), Online Learning and Neural Networks. Cambridge: Cambridge University Press.
Zurück zum Zitat Cantador, I., Brusilovsky, P., Kuflik, T. (eds.) (2011). 2nd Workshop on information heterogeneity and fusion. Recommender Systems (HetRec 2011), ACM. Cantador, I., Brusilovsky, P., Kuflik, T. (eds.) (2011). 2nd Workshop on information heterogeneity and fusion. Recommender Systems (HetRec 2011), ACM.
Zurück zum Zitat Colbourn, C.J. (1987). The Combinatorics of Network Reliability. Oxford University Press. Colbourn, C.J. (1987). The Combinatorics of Network Reliability. Oxford University Press.
Zurück zum Zitat Craven, M., & Slattery, S. (2001). Relational learning with statistical predicate invention: better models for hypertext. Machine Learning, 43(1–2), 97–119.CrossRefMATH Craven, M., & Slattery, S. (2001). Relational learning with statistical predicate invention: better models for hypertext. Machine Learning, 43(1–2), 97–119.CrossRefMATH
Zurück zum Zitat Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and roc curve. In Proceedings of the 23rd international conference on machine learning (pp. 233–240). Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and roc curve. In Proceedings of the 23rd international conference on machine learning (pp. 233–240).
Zurück zum Zitat De Raedt, L., Frasconi, P., Kersting, K. (2008). Probabilistic Inductive Logic Programming. In S. Muggleto, (Ed.) Theory and Applications, LNCS, (vol 4911). Springer. De Raedt, L., Frasconi, P., Kersting, K. (2008). Probabilistic Inductive Logic Programming. In S. Muggleto, (Ed.) Theory and Applications, LNCS, (vol 4911). Springer.
Zurück zum Zitat Desrosiers, C., & Karypis, G. (2011). A comprehensive survey of neighborhood-based recommendation methods. In F. Ricci, L. Rokach, B. Shapira, P. B. Kantor (Eds.) Recommender Systems Handbook (pp. 107–144). Springer. Desrosiers, C., & Karypis, G. (2011). A comprehensive survey of neighborhood-based recommendation methods. In F. Ricci, L. Rokach, B. Shapira, P. B. Kantor (Eds.) Recommender Systems Handbook (pp. 107–144). Springer.
Zurück zum Zitat Domingos, P., & Lowd, D. (2009). Markov Logic: an interface layer for artificial intelligence, 1st edn. Morgan and Claypool Publishers. Domingos, P., & Lowd, D. (2009). Markov Logic: an interface layer for artificial intelligence, 1st edn. Morgan and Claypool Publishers.
Zurück zum Zitat Duchi, J.C., Hazan, E., Singer, Y. (2010). Adaptive subgradient methods for online learning and stochastic optimization. In A. T. Kalai & M. Mohri (Eds.) The 23rd Conference on Learning Theory, Omnipress, (pp. 257–269). Duchi, J.C., Hazan, E., Singer, Y. (2010). Adaptive subgradient methods for online learning and stochastic optimization. In A. T. Kalai & M. Mohri (Eds.) The 23rd Conference on Learning Theory, Omnipress, (pp. 257–269).
Zurück zum Zitat Georgiev, K., & Nakov, P. (2013). A non-iid framework for collaborative filtering with restricted Boltzmann machines. In S. Dasgupta & D. McAllester (Eds.) Proceedings of the 30th international conference on machine learning, JMLR workshop and conference proceedings (Vol. 28, pp. 1148–1156). Georgiev, K., & Nakov, P. (2013). A non-iid framework for collaborative filtering with restricted Boltzmann machines. In S. Dasgupta & D. McAllester (Eds.) Proceedings of the 30th international conference on machine learning, JMLR workshop and conference proceedings (Vol. 28, pp. 1148–1156).
Zurück zum Zitat Getoor, L., & Diehl, C.P. (2005). Link mining: a survey. SIGKDD Explorations, 7(2), 3–12.CrossRef Getoor, L., & Diehl, C.P. (2005). Link mining: a survey. SIGKDD Explorations, 7(2), 3–12.CrossRef
Zurück zum Zitat Getoor, L., & Taskar, B. (2007). Introduction to Statistical Relational Learning Adaptive Computation and Machine Learning. The MIT Press. Getoor, L., & Taskar, B. (2007). Introduction to Statistical Relational Learning Adaptive Computation and Machine Learning. The MIT Press.
Zurück zum Zitat Goldberg, D.S., & Roth, F.P. (2003). Assessing experimentally derived interactions in a small world. Proceedings of the National Academy of Sciences, 100(8), 4372–4376.CrossRefMATHMathSciNet Goldberg, D.S., & Roth, F.P. (2003). Assessing experimentally derived interactions in a small world. Proceedings of the National Academy of Sciences, 100(8), 4372–4376.CrossRefMATHMathSciNet
Zurück zum Zitat Gutmann, B., Kimmig, A., Kersting, K., Raedt, L. (2008). Parameter learning in probabilistic databases: a least squares approach. In Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I (pp. 473–488). Springer. Gutmann, B., Kimmig, A., Kersting, K., Raedt, L. (2008). Parameter learning in probabilistic databases: a least squares approach. In Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I (pp. 473–488). Springer.
Zurück zum Zitat Gutmann, B., Thon, I., De Raedt, L. (2011). Learning the parameters of probabilistic logic programs from interpretations. In Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Part I (pp. 581–596). Springer. Gutmann, B., Thon, I., De Raedt, L. (2011). Learning the parameters of probabilistic logic programs from interpretations. In Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Part I (pp. 581–596). Springer.
Zurück zum Zitat He, J., & Chu, W.W. (2010). A social network-based recommender system (snrs). In N. Memon, J. J. Xu, D. L. Hicks, H. Chen (Eds.) Data Mining for Social Network Data, Annals of Information Systems (Vol. 12, pp. 47–74). Springer. He, J., & Chu, W.W. (2010). A social network-based recommender system (snrs). In N. Memon, J. J. Xu, D. L. Hicks, H. Chen (Eds.) Data Mining for Social Network Data, Annals of Information Systems (Vol. 12, pp. 47–74). Springer.
Zurück zum Zitat Jin, R., Liu, L., Ding, B., Wang, H. (2011). Distance-constraint reachability computation in uncertain graphs. Proceedings of the VLDB Endownment, 4, 551–562. Jin, R., Liu, L., Ding, B., Wang, H. (2011). Distance-constraint reachability computation in uncertain graphs. Proceedings of the VLDB Endownment, 4, 551–562.
Zurück zum Zitat Kramer, S., Lavrač N., Flach, P. (2000). In Relational data mining,chap propositionalization approaches to relational data mining, (pp. 262–286). Berlin: Springer-Verlag. Kramer, S., Lavrač N., Flach, P. (2000). In Relational data mining,chap propositionalization approaches to relational data mining, (pp. 262–286). Berlin: Springer-Verlag.
Zurück zum Zitat Langseth, H., & Nielsen, T.D. (2012). A latent model for collaborative filtering. International Journal of Approximate Reasoning, 53(4), 447–466.CrossRefMathSciNet Langseth, H., & Nielsen, T.D. (2012). A latent model for collaborative filtering. International Journal of Approximate Reasoning, 53(4), 447–466.CrossRefMathSciNet
Zurück zum Zitat Lin, C.J., Weng, R.C., Keerthi, S.S. (2008). Trust region newton method for logistic regression. Journal of Machine Learning Research, 9, 627–650.MATHMathSciNet Lin, C.J., Weng, R.C., Keerthi, S.S. (2008). Trust region newton method for logistic regression. Journal of Machine Learning Research, 9, 627–650.MATHMathSciNet
Zurück zum Zitat Macskassy, S.A. (2011). Relational classifiers in a non-relational world: using homophily to create relations. In X. Chen, T. S. Dillon, H. Ishbuchi, J. Pei, H. Wang, M. A. Wani (Eds.) 10th International Conference on Machine Learning and Applications and Workshops, IEEE, (pp. 406–411). Macskassy, S.A. (2011). Relational classifiers in a non-relational world: using homophily to create relations. In X. Chen, T. S. Dillon, H. Ishbuchi, J. Pei, H. Wang, M. A. Wani (Eds.) 10th International Conference on Machine Learning and Applications and Workshops, IEEE, (pp. 406–411).
Zurück zum Zitat Newman, M.E.J. (2001a). Clustering and preferential attachment in growing networks. Physical Review E, 64. Newman, M.E.J. (2001a). Clustering and preferential attachment in growing networks. Physical Review E, 64.
Zurück zum Zitat Newman, M.E.J. (2001b) The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98(2), 404–409.CrossRefMATH Newman, M.E.J. (2001b) The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98(2), 404–409.CrossRefMATH
Zurück zum Zitat Pfeiffer, I.J.J., & Neville, J. (2011). Methods to determine node centrality and clustering in graphs with uncertain structure. In Proceedings of the Fifth International Conference on Weblogs and Social Media, The AAAI Press. Pfeiffer, I.J.J., & Neville, J. (2011). Methods to determine node centrality and clustering in graphs with uncertain structure. In Proceedings of the Fifth International Conference on Weblogs and Social Media, The AAAI Press.
Zurück zum Zitat Popescul, A., & Ungar, L.H. (2003). Statistical relational learning for link prediction. In IJCAI03 Workshop on Learning Statistical Models from Relational Data. Popescul, A., & Ungar, L.H. (2003). Statistical relational learning for link prediction. In IJCAI03 Workshop on Learning Statistical Models from Relational Data.
Zurück zum Zitat Potamias, M., Bonchi, F., Gionis, A., Kollios, G. (2010). K-nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment, 3, 997–1008. Potamias, M., Bonchi, F., Gionis, A., Kollios, G. (2010). K-nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment, 3, 997–1008.
Zurück zum Zitat Sato, T. (1995). A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th International Conference on Logic Programming. MIT Press (pp. 715–729). Sato, T. (1995). A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th International Conference on Logic Programming. MIT Press (pp. 715–729).
Zurück zum Zitat Taranto, C., Di Mauro, N., Esposito, F. (2011). Probabilistic inference over image networks. Italian Research 7 Conference on Digital Libraries 2011 (Vol 249, pp. 1-13). CCIS. Taranto, C., Di Mauro, N., Esposito, F. (2011). Probabilistic inference over image networks. Italian Research 7 Conference on Digital Libraries 2011 (Vol 249, pp. 1-13). CCIS.
Zurück zum Zitat Taranto, C., Di Mauro, N., Esposito, F. (2012a). Uncertain graphs meet collaborative filtering. In 3rd Italian Information Retrieval Workshop. Taranto, C., Di Mauro, N., Esposito, F. (2012a). Uncertain graphs meet collaborative filtering. In 3rd Italian Information Retrieval Workshop.
Zurück zum Zitat Taranto, C., Di Mauro, N., Esposito, F. (2012b). Uncertain (multi)graphs for personalization services in digital libraries. In M. Agosti, F. Esposito, S. Ferilli, N. Ferro (Eds.) 8th Italian Research Conference on Digital Libraries, Vol. 354. Berlin: Springer, CCIS. Taranto, C., Di Mauro, N., Esposito, F. (2012b). Uncertain (multi)graphs for personalization services in digital libraries. In M. Agosti, F. Esposito, S. Ferilli, N. Ferro (Eds.) 8th Italian Research Conference on Digital Libraries, Vol. 354. Berlin: Springer, CCIS.
Zurück zum Zitat Taranto, C., Di Mauro, N., Esposito, F. (2013). Learning in probabilistic graphs exploiting language-constrained patterns. In A. Appice, M. Ceci, C. Loglisci, G. Manco, E. Masciari, Z. W. Ras (Eds.) New Frontiers in Mining Complex Patterns, LNCS (Vol. 7765, pp. 155–169). Berlin: Springer.CrossRef Taranto, C., Di Mauro, N., Esposito, F. (2013). Learning in probabilistic graphs exploiting language-constrained patterns. In A. Appice, M. Ceci, C. Loglisci, G. Manco, E. Masciari, Z. W. Ras (Eds.) New Frontiers in Mining Complex Patterns, LNCS (Vol. 7765, pp. 155–169). Berlin: Springer.CrossRef
Zurück zum Zitat Taskar, B., Wong, M.F., Abbeel, P., Koller, D. (2003). Link prediction in relational data. In S. Thrun, L. K. Saul, B. Schölkopf (Eds.) Advances in Neural Information Processing Systems (p. 16). Taskar, B., Wong, M.F., Abbeel, P., Koller, D. (2003). Link prediction in relational data. In S. Thrun, L. K. Saul, B. Schölkopf (Eds.) Advances in Neural Information Processing Systems (p. 16).
Zurück zum Zitat von Luxburg, U., Radl, A., Hein, M. (2011). Hitting and commute times in large graphs are often misleading. CORR. von Luxburg, U., Radl, A., Hein, M. (2011). Hitting and commute times in large graphs are often misleading. CORR.
Zurück zum Zitat Vozalis, M.G., Markos, A., Margaritis, K.G. (2010). Collaborative filtering through svd-based and hierarchical nonlinear pca. In Proceedings of the 20th international conference on Artificial neural networks. Part I, (pp. 395–400). Berlin: Springer. Vozalis, M.G., Markos, A., Margaritis, K.G. (2010). Collaborative filtering through svd-based and hierarchical nonlinear pca. In Proceedings of the 20th international conference on Artificial neural networks. Part I, (pp. 395–400). Berlin: Springer.
Zurück zum Zitat Witsenburg, T., & Blockeel, H. (2011). Improving the accuracy of similarity measures by using link information. In M. Kryszkiewicz, H. Rybinski, A. Skowron, Z. W. Ras (Eds.) Proceedings of the 19th International conference on Foundations of Intelligent Systems (Vol. 6804, pp. 501512). Springer: LNCS Witsenburg, T., & Blockeel, H. (2011). Improving the accuracy of similarity measures by using link information. In M. Kryszkiewicz, H. Rybinski, A. Skowron, Z. W. Ras (Eds.) Proceedings of the 19th International conference on Foundations of Intelligent Systems (Vol. 6804, pp. 501512). Springer: LNCS
Zurück zum Zitat Zan, H., Xin, L., Hsinchun, C. (2005). Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries ACM Press (pp. 141–142). Zan, H., Xin, L., Hsinchun, C. (2005). Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries ACM Press (pp. 141–142).
Zurück zum Zitat Zhu, J. (2003). Mining web site link structures for adaptive web site navigation and search. PhD thesis. Zhu, J. (2003). Mining web site link structures for adaptive web site navigation and search. PhD thesis.
Zurück zum Zitat Zou, Z., Gao, H., Li, J. (2010a). Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM (pp. 633–642). Zou, Z., Gao, H., Li, J. (2010a). Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM (pp. 633–642).
Zurück zum Zitat Zou, Z., Li, J., Gao, H., Zhang, S. (2010b). Finding top-k maximal cliques in an uncertain graph. International Conference on Data Engineering, 649–652. Zou, Z., Li, J., Gao, H., Zhang, S. (2010b). Finding top-k maximal cliques in an uncertain graph. International Conference on Data Engineering, 649–652.
Metadaten
Titel
Link classification with probabilistic graphs
verfasst von
Nicola Di Mauro
Claudio Taranto
Floriana Esposito
Publikationsdatum
01.04.2014
Verlag
Springer US
Erschienen in
Journal of Intelligent Information Systems / Ausgabe 2/2014
Print ISSN: 0925-9902
Elektronische ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-013-0293-0

Weitere Artikel der Ausgabe 2/2014

Journal of Intelligent Information Systems 2/2014 Zur Ausgabe

Premium Partner