Published in: International Journal on Digital Libraries 1/2020

16.11.2018

A pragmatic approach to hierarchical categorization of research expertise in the presence of scarce information

Authors: Gustavo Oliveira de Siqueira, Sérgio Canuto, Marcos André Gonçalves, Alberto H. F. Laender


Abstract

Throughout the history of science, different knowledge areas have collaborated to overcome major research challenges. Associating researchers with such areas makes a series of tasks feasible, such as the organization of digital repositories, expertise recommendation, and the formation of research groups for complex problems. In this article, we propose a simple yet effective automatic classification model capable of categorizing research expertise according to a knowledge area classification scheme. Our proposal relies on discriminative evidence provided by the titles of academic works, which are the minimum information capable of relating a researcher to their knowledge area. Our experiments show that, using supervised machine learning methods trained with manually labeled information, it is possible to produce effective classification models.
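The abstract describes training supervised classifiers on the titles of a researcher's works to assign a knowledge area from a hierarchical scheme. The Python sketch below illustrates that idea under explicit assumptions: the two-level hierarchy, the area names and the toy training data are hypothetical; each researcher is represented by pooling the words of all their titles into one bag of words; and the learner is a plain multinomial Naive Bayes applied top-down (major area first, then a sub-area classifier trained only on researchers of that area). These choices are ours for illustration and are not necessarily the classifiers or hierarchy strategy evaluated in the article.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and keep only ASCII alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z]+", title.lower())


def profile_tokens(titles):
    """Assumption: a researcher is one bag of words built from all work titles."""
    return [tok for title in titles for tok in tokenize(title)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):  # sorted so ties resolve deterministically
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


class TopDownClassifier:
    """Hypothetical two-level top-down hierarchy: predict the major area, then
    a sub-area with a classifier trained only on researchers of that area."""

    def fit(self, researchers):
        docs = [profile_tokens(titles) for titles, _ in researchers]
        areas = [area for _, (area, _) in researchers]
        self.root = NaiveBayes().fit(docs, areas)
        self.children = {}
        for area in set(areas):
            idx = [i for i, a in enumerate(areas) if a == area]
            subareas = [researchers[i][1][1] for i in idx]
            self.children[area] = NaiveBayes().fit([docs[i] for i in idx], subareas)
        return self

    def predict(self, titles):
        tokens = profile_tokens(titles)
        area = self.root.predict(tokens)
        return area, self.children[area].predict(tokens)


# Hypothetical toy data (not from the article): (titles, (major area, sub-area)).
train = [
    (["Query optimization in relational databases",
      "Indexing structures for database systems"], ("Computer Science", "Databases")),
    (["Neural networks for image classification",
      "Learning to rank with gradient boosting"], ("Computer Science", "Machine Learning")),
    (["Enzyme kinetics and protein folding",
      "Structure of membrane proteins"], ("Biological Sciences", "Biochemistry")),
    (["Population dynamics of tropical birds",
      "Species distribution in fragmented forests"], ("Biological Sciences", "Ecology")),
]

model = TopDownClassifier().fit(train)
print(model.predict(["Query optimization for distributed databases"]))
# → ('Computer Science', 'Databases')
```

A top-down design keeps each sub-area classifier small and focused, but an error at the first level cannot be corrected at the second, so a flat classifier over the leaf categories is a common alternative worth comparing against.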

Metadata
Title
A pragmatic approach to hierarchical categorization of research expertise in the presence of scarce information
Authors
Gustavo Oliveira de Siqueira
Sérgio Canuto
Marcos André Gonçalves
Alberto H. F. Laender
Publication date
16.11.2018
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 1/2020
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-018-0260-z
