2018 | OriginalPaper | Book chapter

Hierarchical Expert Profiling Using Heterogeneous Information Networks

Authors: Jorge Silva, Pedro Ribeiro, Fernando Silva

Published in: Discovery Science

Publisher: Springer International Publishing

Abstract

Linking experts to their knowledge areas remains a challenging research problem. The task is usually divided into two steps: identifying the knowledge areas (topics) in a text corpus and assigning them to the experts. Common approaches to expert profiling are based on the Latent Dirichlet Allocation (LDA) algorithm. As a result, they require the number of topics to be defined in advance, which is not ideal in most cases. Furthermore, LDA generates a list of independent topics with no relationships between them. Expert profiles built from such flat topic lists have been reported as highly redundant and often either too specific or too general.
In this paper we propose a methodology that addresses these limitations by creating hierarchical expert profiles, in which the knowledge areas of a researcher are mapped across different granularity levels, from broad areas to more specific ones. For this purpose, we explore the rich structure and semantics of Heterogeneous Information Networks (HINs). Our strategy is divided into two parts. First, we introduce a novel algorithm that can fully use the rich content of an HIN to create a topical hierarchy by discovering overlapping communities and ranking the nodes inside each community. We then present a strategy to map the knowledge areas of an expert along all the levels of the hierarchy, exploiting the information we have about the expert to obtain a hierarchical profile of topics.
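To make the idea of a hierarchical profile concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a topic hierarchy is already available as a child-to-parent map, that the most specific topics carry (possibly overlapping) keyword sets, and that the evidence about an expert is a plain keyword count. The names `PARENT`, `TOPIC_KEYWORDS` and `hierarchical_profile` and all of the toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child topic -> parent topic (None marks the root).
PARENT = {
    "cs": None,
    "cs/ml": "cs",
    "cs/networks": "cs",
    "cs/ml/topic-models": "cs/ml",
    "cs/ml/graph-mining": "cs/ml",
}

# Hypothetical keyword sets for the most specific topics.
# The sets may overlap, mirroring the overlapping communities in the abstract.
TOPIC_KEYWORDS = {
    "cs/ml/topic-models": {"lda", "topic model"},
    "cs/ml/graph-mining": {"community detection", "topic model"},
    "cs/networks": {"routing"},
}


def hierarchical_profile(keyword_counts):
    """Return {topic: score}; a topic's score includes its descendants' scores."""
    scores = defaultdict(float)
    for topic, keywords in TOPIC_KEYWORDS.items():
        # Evidence for a specific topic: how often the expert used its keywords.
        # A keyword shared by two topics is counted for both of them.
        weight = sum(keyword_counts.get(k, 0) for k in keywords)
        # Propagate the evidence from the specific topic up to the root.
        node = topic
        while node is not None and weight > 0:
            scores[node] += weight
            node = PARENT[node]
    return dict(scores)


profile = hierarchical_profile(
    {"lda": 2, "topic model": 1, "community detection": 3}
)
```

Reading `profile` from the root downwards gives the broad-to-specific view described above: the expert scores on `cs`, then on `cs/ml`, and finally on the two specific topics, while `cs/networks` does not appear because no evidence supports it.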
To test the proposed methodology, we used a computer science bibliographic dataset to create a star-schema HIN containing publications as star-nodes and authors, keywords and ISI fields as attribute-nodes. We use heterogeneous pointwise mutual information to demonstrate the quality and coherence of the hierarchies we create. Furthermore, we use manually labelled data as ground truth to evaluate our hierarchical expert profiles, showing that our strategy is capable of building accurate profiles.
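The star-schema HIN and the role of pointwise mutual information can be illustrated with a small sketch. It assumes a toy dataset in which every publication (star-node) links to sets of authors, keywords and ISI fields (attribute-nodes), and it computes plain pairwise PMI from publication co-occurrence. This is a simplification for illustration: it is not the heterogeneous PMI measure used in the chapter, and `PUBLICATIONS`, `publications_with` and `pmi` are hypothetical names.

```python
import math

# Hypothetical toy data: each publication (star-node) links to attribute-nodes.
PUBLICATIONS = {
    "p1": {"author": {"a1", "a2"}, "keyword": {"lda"}, "isi_field": {"ai"}},
    "p2": {"author": {"a1"}, "keyword": {"lda", "hin"}, "isi_field": {"ai"}},
    "p3": {"author": {"a3"}, "keyword": {"routing"}, "isi_field": {"networks"}},
    "p4": {"author": {"a2"}, "keyword": {"hin"}, "isi_field": {"ai"}},
}


def publications_with(node_type, node):
    """Return the star-nodes linked to the given attribute-node."""
    return {p for p, attrs in PUBLICATIONS.items() if node in attrs[node_type]}


def pmi(type_x, x, type_y, y):
    """Pairwise PMI of two attribute-nodes, estimated over publications."""
    n = len(PUBLICATIONS)
    with_x = publications_with(type_x, x)
    with_y = publications_with(type_y, y)
    joint = len(with_x & with_y)
    if joint == 0:
        return float("-inf")  # the two nodes never share a publication
    # log( p(x, y) / (p(x) * p(y)) ), with probabilities as publication shares
    return math.log((joint / n) / ((len(with_x) / n) * (len(with_y) / n)))
```

Because attribute-nodes are only connected through publications, co-occurrence is counted on the star-nodes. A positive value such as `pmi("author", "a1", "keyword", "lda")` indicates that the two nodes appear together more often than independence would predict, which is the intuition behind using PMI as a coherence signal.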

Footnotes
3. Research areas created by the Institute for Scientific Information.
4. For simplicity, consider that all links have the same weight.
5. As illustrated in Fig. 2.
6. For clarification, a '-' symbol denotes a different level of the hierarchy.
7. Through experimentation, we determined that four levels yielded the most comprehensible topical hierarchy.
8. Following the idea of [21], we set \(k=5\) for ISI fields, since there are only 120 of them in the HIN. In these cases, the factor \(\frac{1}{k^2}\) in the formula changes to \(\frac{1}{5k}\).

References
1. Balog, K., Fang, Y., de Rijke, M., Serdyukov, P., Si, L.: Expertise retrieval. Found. Trends® Inf. Retriev. 6(2–3), 127–256 (2012)
2. Berendsen, R., Rijke, M., Balog, K., Bogers, T., Bosch, A.: On the assessment of expertise profiles. J. Assoc. Inf. Sci. Technol. 64(10), 2024–2044 (2013)
3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
4. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008(10), P10008 (2008)
5. Daud, A.: Using time topic modeling for semantics-based dynamic research interest finding. Knowl.-Based Syst. 26, 154–163 (2012)
6. De Campos, L.M., Fernández-Luna, J.M., Huete, J.F.: Committee-based profiles for politician finding. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 25(Suppl. 2), 21–36 (2017)
7. Duan, D., Li, Y., Li, R., Lu, Z., Wen, A.: Mei: Mutual enhanced infinite community-topic model for analyzing text-augmented social networks. Comput. J. 56(3), 336–354 (2012)
9. bin Jamaludin, N.A., Annamalai, M., Jamil, N., Bakar, Z.A.: A model for keyword profile creation using extracted keywords and terminological ontology. In: 2013 IEEE Conference on e-Learning, e-Management and e-Services (IC3e), pp. 136–141. IEEE (2013)
10. Jeong, Y.S., Lee, S.H., Gweon, G.: Discovery of research interests of authors over time using a topic model. In: 2016 International Conference on Big Data and Smart Computing (BigComp), pp. 24–31. IEEE (2016)
11. Karimzadehgan, M., White, R.W., Richardson, M.: Enhancing expert finding using organizational hierarchies. In: European Conference on Information Retrieval, pp. 177–188. Springer (2009)
12. Li, C., Cheung, W.K., Ye, Y., Zhang, X., Chu, D., Li, X.: The author-topic-community model for author interest profiling and community discovery. Knowl. Inf. Syst. 44(2), 359–383 (2015)
13. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
14. Newman, M.E.: Modularity and community structure in networks. Proc. Natl Acad. Sci. 103(23), 8577–8582 (2006)
15. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 487–494. AUAI Press (2004)
16.
17. Shi, C., Li, Y., Zhang, J., Sun, Y., Yu, P.S.: A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 29(1), 17–37 (2017)
18. Sun, Y., Han, J., Zhao, P., Yin, Z., Cheng, H., Wu, T.: Rankclus: integrating clustering with ranking for heterogeneous information network analysis. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 565–576. ACM (2009)
19. Sun, Y., Yu, Y., Han, J.: Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806. ACM (2009)
20. Tang, J., Jin, R., Zhang, J.: A topic modeling approach and its integration into the random walk framework for academic search. In: Eighth IEEE International Conference on Data Mining (ICDM 2008), pp. 1055–1060. IEEE (2008)
21. Wang, C., Liu, J., Desai, N., Danilevsky, M., Han, J.: Constructing topical hierarchies in heterogeneous information networks. Knowl. Inf. Syst. 44(3), 529–558 (2015)
22. Wang, J., Hu, X., Tu, X., He, T.: Author-conference topic-connection model for academic network search. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 2179–2183. ACM (2012)
Metadata
Title
Hierarchical Expert Profiling Using Heterogeneous Information Networks
Authors
Jorge Silva
Pedro Ribeiro
Fernando Silva
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01771-2_22