2019 | OriginalPaper | Book chapter

Statistical Processing of Stopwords on SNS

Authors: Yuta Nezu, Takao Miura

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

For text classification and information retrieval, texts are usually preprocessed with techniques such as stemming and stopword removal. Almost all of these techniques work well only on well-formed text such as textbooks and news articles; this does not hold for social network services (SNS) or other text on the Internet. In this investigation, we propose a way to extract stopwords in the context of social network services. To do so, we first discuss what stopwords mean and how they differ from conventional ones, and we propose two statistical filters, TFIG and TFCHI, to identify them. We examine categorical estimation to extract characteristic values, focusing on Kullback-Leibler divergence (KLD) over temporal sequences of SNS data. Moreover, we apply several preprocessing steps to manage unknown words and to improve morphological analysis.
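To make the KLD idea concrete, the following is a minimal sketch of comparing the word distributions of two time windows. It is an illustration under stated assumptions, not the authors' procedure: the helper names `word_distribution` and `kl_divergence`, the additive smoothing parameter `alpha`, and the toy token lists are all hypothetical.

```python
import math
from collections import Counter

def word_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed relative frequencies over a fixed vocabulary.

    Smoothing (an assumption of this sketch) keeps every probability
    positive, so the logarithm in the divergence is always defined.
    """
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def kl_divergence(p, q):
    """D_KL(P || Q) = sum over w of P(w) * log(P(w) / Q(w)), in nats."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# Hypothetical toy data: tokens observed in two consecutive time windows.
window_1 = ["lol", "movie", "lol", "kevin", "lol"]
window_2 = ["lol", "movie", "lol", "trap", "lol"]
vocabulary = sorted(set(window_1) | set(window_2))

p = word_distribution(window_1, vocabulary)
q = word_distribution(window_2, vocabulary)
divergence = kl_divergence(p, q)
```

In this toy data, words whose share is the same in both windows ("lol", "movie") contribute nothing to the sum, while words that appear in only one window account for the whole divergence.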

Footnotes
1
One exception is the predicate, which should appear as the last verb in each sentence.
 
2
In Japanese, morphological analysis covers both word segmentation and part-of-speech tagging. For example, "sumomo/mo/momo/mo/momo/no/uchi" means "both plums and peaches are kinds of peach", a typical tongue twister in which "mo" is said many times. It contains two nouns, "sumomo" (plum) and "momo" (peach). There is no delimiter between words (no space, no comma, and no slash), so everything runs together into one string: "sumomomomomomomomonouchi".
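The segmentation ambiguity in this example can be sketched with a small brute-force search. This is only an illustration, assuming a hypothetical five-word toy lexicon; real morphological analyzers rely on large dictionaries and statistical models, and the function name `segmentations` is made up for this sketch.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words drawn from `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results

# Hypothetical toy lexicon containing only the words of the tongue twister.
lexicon = {"sumomo", "momo", "mo", "no", "uchi"}

intended = ["sumomo", "mo", "momo", "mo", "momo", "no", "uchi"]
text = "".join(intended)  # the undelimited input string

candidates = segmentations(text, lexicon)
```

Even with this tiny lexicon, `candidates` contains the intended reading alongside several others, because every run of "mo" can be read as particles, as parts of "momo", or as a mix of both. Word segmentation therefore needs more than dictionary lookup.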
 
4
We use 1/IG instead of IG because we want smaller values to be better. The same holds for 1/CHI.
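As a rough illustration of the reciprocal, the sketch below computes the information gain of a term's presence with respect to the class label and then inverts it. The function names, the zero-handling in `inverse_score`, and the document counts are assumptions made for this example; how TFIG combines term frequency with 1/IG is not stated here, so that step is left out.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(with_term, without_term):
    """Information gain of a term's presence for the class label.

    with_term[i] / without_term[i]: number of class-i documents that
    do / do not contain the term.
    """
    n_with, n_without = sum(with_term), sum(without_term)
    n = n_with + n_without
    class_totals = [a + b for a, b in zip(with_term, without_term)]
    conditional = 0.0
    if n_with:
        conditional += n_with / n * entropy(with_term)
    if n_without:
        conditional += n_without / n * entropy(without_term)
    return entropy(class_totals) - conditional

def inverse_score(value, eps=1e-12):
    """Reciprocal of a score; treated as infinite when the score is ~0 (assumption)."""
    return math.inf if value <= eps else 1.0 / value

# Hypothetical counts for two classes of 10 documents each.
ig_common = information_gain([5, 5], [5, 5])     # term spread evenly over classes
ig_topical = information_gain([10, 0], [0, 10])  # term occurs in one class only
```

Taking the reciprocal reverses the ordering: the evenly spread term has zero information gain and an unbounded 1/IG, while the class-specific term has the largest gain and the smallest 1/IG.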
 
5
For instance, when we collect Twitter documents using the keyword "Home Alone", we assign the class "Home Alone" to the collection.
 
References
1.
Manning, C., Raghavan, P.: Introduction to Information Retrieval, 1st edn. Cambridge University Press, Cambridge (2008)
4.
Saif, H., Fernandez, M., Alani, H.: Automatic stopword generation using contextual semantics for sentiment analysis of Twitter. In: The 13th International Semantic Web Conference (ISWC) (2014)
5.
Sonoda, T., Miura, T.: Mining Japanese collocation by statistical indicators. In: 15th International Conference on Enterprise Information Systems (ICEIS), Angers, France (2013)
6.
Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of International Conference on Machine Learning (ICML), pp. 412–420 (1997)
7.
Zou, F., Wang, F.L., Deng, X., Han, S., Wang, L.S.: Automatic construction of Chinese stop word list. In: Proceedings of the 5th WSEAS International Conference on Applied Computer Science (2006)
8.
Nezu, Y., Miura, T.: Extracting stopwords on social network service. In: The 29th International Conference on Information Modelling and Knowledge Bases (EJC) (2019)
Metadata
Title
Statistical Processing of Stopwords on SNS
Authors
Yuta Nezu
Takao Miura
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-27615-7_9