2021 | OriginalPaper | Chapter

Techniques, Applications, and Issues in Mining Large-Scale Text Databases

Authors: Sandhya Avasthi, Ritu Chauhan, Debi P. Acharjya

Published in: Advances in Information Communication Technology and Computing

Publisher: Springer Singapore

Abstract

Discovering knowledge from large-scale text data or semi-structured data is very difficult. In text mining, useful information that fulfills a user's current information need is extracted from such a large text corpus. Various organizations exploit this process for quality improvement, business needs, and understanding user behavior. Text in unstructured and semi-structured form can come from sources such as medical, financial, market, scientific, and other documents. Text mining applies a quantitative approach to analyze massive amounts of textual data and tries to solve the information overload problem. The main objective of this chapter is to review text mining techniques, application areas, and existing issues.

Title: Techniques, Applications, and Issues in Mining Large-Scale Text Databases
Authors: Sandhya Avasthi, Ritu Chauhan, Debi P. Acharjya
Publisher: Springer Singapore
Book: Advances in Information Communication Technology and Computing
Print ISBN: 978-981-15-5420-9

Electronic ISBN: 978-981-15-5421-6

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-981-15-5421-6_39
