2018 | OriginalPaper | Chapter

Text Document Analysis Using Map-Reduce Framework

Authors: K. V. Kanimozhi, P. Prabhavathy, M. Venkatesan

Published in: Advanced Computational and Communication Paradigms

Publisher: Springer Singapore

Abstract

With the advance of the Internet and increasing globalization, electronic forms of information are growing rapidly. Extracting useful hidden information from these many documents is a current challenge. An efficient, automated clustering algorithm that is effective at identifying topics therefore plays a central role in information retrieval. In this paper, we analyze a large unstructured text document corpus using our proposed map-reduce algorithm. The results show the advantage of the proposed method: it detects clusters of document features in less computation time and offers a way to increase the precision rate of retrieval in information extraction.
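
The abstract does not spell out the steps of the proposed algorithm, so the following is only a generic illustration of the map-reduce pattern applied to a small text corpus, not the authors' method. It is a minimal in-memory sketch: the map step emits one (term, document) pair per distinct term, a shuffle groups the pairs by term, and the reduce step collapses each group into a sorted document list. The function names (`map_document`, `shuffle`, `reduce_term`, `term_postings`) and the three-document corpus are hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit (term, doc_id) once per distinct term in a document."""
    terms = set(re.findall(r"[a-z]+", text.lower()))
    return [(term, doc_id) for term in terms]


def shuffle(pairs):
    """Group mapped values by key, as a map-reduce runtime does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_term(term, doc_ids):
    """Reduce step: collapse the values for one term into a sorted document list."""
    return term, sorted(doc_ids)


def term_postings(corpus):
    """Run map, shuffle, and reduce over a {doc_id: text} corpus."""
    mapped = [
        pair
        for doc_id, text in corpus.items()
        for pair in map_document(doc_id, text)
    ]
    return dict(reduce_term(term, ids) for term, ids in shuffle(mapped).items())


# Hypothetical toy corpus, for illustration only.
corpus = {
    "d1": "Hadoop clusters process text",
    "d2": "Text clustering groups documents",
    "d3": "Hadoop jobs process documents",
}
postings = term_postings(corpus)
print(postings["hadoop"])
```

In this sketch, terms that map to the same document list (here `hadoop` and `process`) could be treated as one candidate document feature. Whether the chapter groups features this way is an assumption; on a real cluster the map and reduce functions would run in parallel across partitions of the corpus rather than in a single process.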

DOI: https://doi.org/10.1007/978-981-10-8237-5_57