Published in: Journal of Intelligent Information Systems 1/2018

28.01.2017

HINMINE: heterogeneous information network mining with information retrieval heuristics

Abstract

The paper presents an approach to mining heterogeneous information networks by decomposing them into homogeneous networks. The proposed HINMINE methodology builds on previous work that classifies nodes in a heterogeneous network in two steps. In the first step, the heterogeneous network is decomposed into one or more homogeneous networks using different connecting nodes. We improve this step with new methods inspired by the weighting schemes for bag-of-words vectors that are common in information retrieval. These methods assign larger weights to nodes that are more informative and more characteristic of a specific class of nodes. In the second step, the resulting homogeneous networks are used to classify data by either network propositionalization or label propagation. We propose an adaptation of the label propagation algorithm that handles imbalanced data, and we test several classification algorithms in propositionalization. The methodology is tested on three data sets with different properties. For each data set, we run a series of experiments comparing the heuristics used in the first step and the classifiers that can be applied in the second step when performing network propositionalization. Our results show that HINMINE, using different network decomposition methods, can significantly improve the performance of the resulting classifiers, and that the modified label propagation algorithm is beneficial when the data set is imbalanced.
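
The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `decompose` and `propagate_labels`, the toy author/paper data, and the parameter values are all hypothetical. The weighting shown is a plain idf-style heuristic, assumed here as one example of an information retrieval weight, and the propagation step is the standard normalised iteration without the paper's adaptation for imbalanced data.

```python
import math
from collections import defaultdict
from itertools import combinations


def decompose(links, idf_weighting=True):
    """Step 1 (sketch): turn a two-mode network into a weighted homogeneous one.

    `links` maps each connecting node (e.g. an author) to the set of base
    nodes (e.g. papers) it is linked to. Two base nodes become neighbours
    when they share a connecting node. With idf weighting (an assumption),
    a connecting node linked to few base nodes contributes more weight.
    """
    base_nodes = set().union(*links.values())
    n = len(base_nodes)
    weights = defaultdict(float)
    for neighbours in links.values():
        w = math.log(n / len(neighbours)) if idf_weighting else 1.0
        for u, v in combinations(sorted(neighbours), 2):
            weights[(u, v)] += w
    return dict(weights)


def propagate_labels(weights, seeds, alpha=0.6, iterations=50):
    """Step 2 (sketch): iterate F <- alpha * S * F + (1 - alpha) * Y,
    where S is the symmetrically normalised weight matrix and Y encodes
    the known seed labels."""
    nodes = sorted({node for edge in weights for node in edge})
    classes = sorted(set(seeds.values()))
    degree = defaultdict(float)
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
    neighbours = defaultdict(list)
    for (u, v), w in weights.items():
        if w > 0:  # zero-weight edges carry no information
            s = w / math.sqrt(degree[u] * degree[v])
            neighbours[u].append((v, s))
            neighbours[v].append((u, s))
    initial = {
        node: [1.0 if seeds.get(node) == c else 0.0 for c in classes]
        for node in nodes
    }
    scores = {node: list(initial[node]) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            spread = [0.0] * len(classes)
            for other, s in neighbours[node]:
                for k in range(len(classes)):
                    spread[k] += s * scores[other][k]
            updated[node] = [
                alpha * spread[k] + (1 - alpha) * initial[node][k]
                for k in range(len(classes))
            ]
        scores = updated
    return {
        node: classes[max(range(len(classes)), key=lambda k: scores[node][k])]
        for node in nodes
    }


# Toy data (hypothetical): author a3 wrote every paper, so its idf weight is 0.
net = decompose({
    "a1": {"p1", "p2"},
    "a2": {"p3", "p4"},
    "a3": {"p1", "p2", "p3", "p4"},
})
labels = propagate_labels(net, {"p1": "ml", "p3": "db"})
```

In this toy example the author linked to every paper receives weight zero, so only the two more specific authors shape the resulting paper network, and the unlabeled papers inherit the label of the seed paper they share a specific author with.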

Metadata
Title
HINMINE: heterogeneous information network mining with information retrieval heuristics
Publication date
28.01.2017
Published in
Journal of Intelligent Information Systems / Issue 1/2018
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-017-0444-9
