Published in: Social Network Analysis and Mining 1/2021

01-12-2021 | Original Article

Text documents streams with improved incremental similarity

Authors: Rui Portocarrero Sarmento, Douglas O. Cardoso, Kemmily Dearo, Pavel Brazdil, João Gama

Abstract

The research community has devoted significant effort to methods for organizing documents with the help of Information Retrieval techniques. In this paper, we present several experiments that apply stream analysis methods to streams of text documents. We also present possible architectures for Text Document Stream Organization that use incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that this architecture significantly improves the efficiency of grouping similar documents. These improvements matter because large-scale text analysis is widely recognized as a high-dimensional and complex problem in data analysis.
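The abstract names the algorithms but does not describe them, so the following is only a generic sketch of the underlying idea: keeping sparse TF-IDF statistics up to date as documents arrive and comparing documents by cosine similarity. It is not the authors' Incremental Sparse TF-IDF or Incremental Similarity. The class name `IncrementalTfidf`, the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Illustrative only: updates document frequencies one document at a time."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()   # term -> number of documents containing it
        self.term_counts = []       # one sparse term-count Counter per document

    def add(self, text):
        """Register a new document from the stream and return its index."""
        counts = Counter(text.lower().split())  # assumed tokenizer: whitespace
        self.term_counts.append(counts)
        self.doc_freq.update(counts.keys())     # each term counted once per doc
        self.n_docs += 1
        return self.n_docs - 1

    def vector(self, idx):
        """Sparse TF-IDF vector (dict) for a document under current statistics."""
        counts = self.term_counts[idx]
        total = sum(counts.values())
        if total == 0:
            return {}
        return {
            # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1
            term: (c / total)
            * (math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0)
            for term, c in counts.items()
        }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: feed documents one at a time, then compare any pair.
model = IncrementalTfidf()
a = model.add("stream of text documents")
b = model.add("stream of text documents")
c = model.add("community detection in networks")
print(cosine(model.vector(a), model.vector(b)))
print(cosine(model.vector(a), model.vector(c)))
```

Because only the per-term counters change when a document arrives, nothing has to be recomputed for the whole corpus; vectors are derived on demand from the current statistics. Storing vectors as dictionaries keeps the representation sparse, which matters when the vocabulary is large.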

Metadata
Title: Text documents streams with improved incremental similarity
Authors: Rui Portocarrero Sarmento, Douglas O. Cardoso, Kemmily Dearo, Pavel Brazdil, João Gama
Publication date: 01-12-2021
Publisher: Springer Vienna
Published in: Social Network Analysis and Mining, Issue 1/2021
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-021-00826-z