Published in: Soft Computing 8/2020

12-10-2018 | Focus

Information retrieval methodology for aiding scientific database search

Authors: Samuel Marcos-Pablos, Francisco J. García-Peñalvo

Abstract

During literature reviews, and especially systematic literature reviews, finding and screening relevant papers may involve managing and processing large amounts of unstructured text data. When the search topic is difficult to establish or has fuzzy limits, researchers need to broaden the scope of the search and, as a consequence, the data retrieved from scientific publications may become huge and uncorrelated. However, through a suitable analysis of these data, the researcher may be able to discover new knowledge hidden within the search output, thus exploring the limits of the search and enhancing the review scope. With that aim, this paper presents an iterative methodology that applies text mining and machine learning techniques to a corpus of abstracts downloaded from scientific databases. It combines automatic processing algorithms with tools for supervised decision-making in an iterative process guided by the researchers’ judgement, so as to adapt, screen and tune the search output. The paper ends with a working example that employs a set of scripts developed to implement the different stages of the proposed methodology.
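The abstract does not name the specific algorithms or scripts used, so the following is only a minimal sketch of one iteration of such a process, assuming TF-IDF term weighting and cosine similarity as the text mining step. All function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_by_seeds`) and the toy corpus are hypothetical illustrations, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many abstracts each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed IDF variant: log(N / df) + 1, so ubiquitous terms keep a small weight.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_seeds(docs, seed_ids):
    """Rank non-seed abstracts by similarity to the researcher-approved seeds."""
    vectors = tfidf_vectors(docs)
    centroid = Counter()
    for i in seed_ids:
        centroid.update(vectors[i])  # sums weights term by term
    scored = [
        (cosine(vectors[i], centroid), i)
        for i in range(len(docs))
        if i not in seed_ids
    ]
    return [i for _, i in sorted(scored, reverse=True)]


# Hypothetical toy corpus standing in for downloaded abstracts.
docs = [
    "Text mining supports screening in systematic literature reviews",
    "Machine learning for citation screening in systematic reviews",
    "Thermal analysis of diesel engine pistons",
    "Visual text mining to build search strings for literature reviews",
]
print(rank_by_seeds(docs, seed_ids={0}))
```

In an iterative setting, the researcher would inspect the top-ranked abstracts, add the relevant ones to `seed_ids`, and rerun the ranking, so that human judgement steers each pass over the corpus.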

Metadata
Title
Information retrieval methodology for aiding scientific database search
Authors
Samuel Marcos-Pablos
Francisco J. García-Peñalvo
Publication date
12-10-2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 8/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3568-0
