2021 | OriginalPaper | Chapter

Techniques, Applications, and Issues in Mining Large-Scale Text Databases

Authors: Sandhya Avasthi, Ritu Chauhan, Debi P. Acharjya

Published in: Advances in Information Communication Technology and Computing

Publisher: Springer Singapore

Abstract

Discovering knowledge from large-scale text data or semi-structured data is difficult. Text mining extracts useful information from such large text corpora to meet a user's current information need. Organizations use this process for quality improvement, business needs, and understanding user behavior. Unstructured and semi-structured text comes from sources such as medical, financial, market, scientific, and other documents. Text mining applies a quantitative approach to analyzing massive amounts of textual data and tries to solve the information overload problem. The main objective of this chapter is to review text mining techniques, application areas, and existing issues.

Metadata

Title: Techniques, Applications, and Issues in Mining Large-Scale Text Databases
Authors: Sandhya Avasthi, Ritu Chauhan, Debi P. Acharjya
Copyright Year: 2021
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-15-5421-6_39