2018 | OriginalPaper | Book chapter

W-PathSim: Novel Approach of Weighted Similarity Measure in Content-Based Heterogeneous Information Networks by Applying LDA Topic Modeling

Authors: Phu Pham, Phuc Do, Chien D. C. Ta

Published in: Intelligent Information and Database Systems

Publisher: Springer International Publishing

Abstract

In information retrieval, similarity measures and top-k similarity search have been studied extensively. Similarity search helps find the most relevant information in large-scale collections of data, especially in large-scale heterogeneous information networks (HINs), which are composed of multiple types of objects and relations. Among the studies on similarity search in HINs, PathSim by Sun et al. is a notable work that uses meta-paths to compute the similarity between objects in multi-typed linked information networks. However, PathSim has a shortcoming: it does not weight the path instances of the defined meta-paths when scoring the similarity between two objects. Failing to evaluate the weight of the links between objects might reduce output quality. In this paper, we present the W-PathSim model, which applies Latent Dirichlet Allocation (LDA) topic modeling to generate a weighting attribute for the links between objects. We conduct experiments on real DBLP and Aminer datasets to demonstrate the effectiveness of the proposed model.
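
To make the role of link weights concrete, the following is a minimal sketch of meta-path-based similarity on a toy Author-Paper-Author meta-path. The unweighted case follows the standard PathSim definition for a symmetric meta-path, s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]), where M counts path instances. The weighted case is only an assumption for illustration: the abstract does not give the W-PathSim formula, so the sketch simply replaces 0/1 links with real-valued weights and reuses the same normalization. The function names and all weight values are hypothetical; in practice such weights might be derived from LDA topic distributions, which is not shown here.

```python
from typing import List

Matrix = List[List[float]]


def commuting_matrix(adj: Matrix) -> Matrix:
    """Return M = adj * adj^T for a symmetric meta-path such as Author-Paper-Author.

    adj[i][k] is the link between author i and paper k (1/0, or a weight).
    M[i][j] sums, over all path instances i -> k -> j, the product of the two link values.
    """
    n = len(adj)
    return [
        [sum(a * b for a, b in zip(adj[i], adj[j])) for j in range(n)]
        for i in range(n)
    ]


def pathsim(adj: Matrix) -> Matrix:
    """Compute PathSim-style scores 2*M[i][j] / (M[i][i] + M[j][j]) for all pairs."""
    m = commuting_matrix(adj)
    n = len(m)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = m[i][i] + m[j][j]
            # Objects with no path instances at all get a score of 0.
            scores[i][j] = 2.0 * m[i][j] / denom if denom > 0 else 0.0
    return scores


# Toy data: 3 authors (rows) x 3 papers (columns), unweighted links.
unweighted: Matrix = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]

# Hypothetical link weights (assumption: e.g. how strongly a paper's
# topics match the author's overall topics). Not taken from the paper.
weighted: Matrix = [
    [0.9, 0.2, 0.0],
    [0.8, 0.1, 0.5],
    [0.0, 0.0, 0.7],
]

if __name__ == "__main__":
    print("unweighted:", pathsim(unweighted))
    print("weighted:  ", pathsim(weighted))
```

With 0/1 links every path instance contributes equally, whereas with real-valued links a path through a weakly related paper contributes less to the score. The normalization keeps scores in [0, 1] and gives each object a self-similarity of 1 in both cases.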

References
1. Han, J., et al.: Mining knowledge from databases: an information network analysis approach. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 1251–1252, ACM (2010)
2. Shi, C., et al.: A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 17–37 (2017)
3. Ji, M., Han, J., Danilevsky, M.: Ranking-based classification of heterogeneous information networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1298–1306, ACM (2011)
4. Sun, Y., et al.: Community evolution detection in dynamic heterogeneous information networks. In: Proceedings of the Eighth Workshop on Mining and Learning with Graphs, pp. 137–146, ACM (2010)
5. Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543, ACM (2002)
6. Jeh, G., Widom, J.: Scaling personalized web search. In: Proceedings of the 12th International Conference on World Wide Web, pp. 271–279, ACM (2003)
7. Xu, X., et al.: SCAN: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 824–833, ACM (2007)
8. Sun, Y., et al.: Pathsim: meta path-based top-k similarity search in heterogeneous information networks. In: Proceedings of the VLDB Endowment, pp. 992–1003 (2011)
9. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 993–1022 (2003)
10. Blei, D.M.: Probabilistic topic models. Commun. ACM 77–84 (2012)
12. Chodpathumwan, Y., et al.: Towards representation independent similarity search over graph databases. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 2233–2238, ACM (2016)
13. Steyvers, M., Griffiths, T.: Probabilistic topic models. In: Handbook of Latent Semantic Analysis, pp. 424–440 (2007)

Metadata
Title
W-PathSim: Novel Approach of Weighted Similarity Measure in Content-Based Heterogeneous Information Networks by Applying LDA Topic Modeling
Authors
Phu Pham
Phuc Do
Chien D. C. Ta
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-75417-8_51