2019 | Original Paper | Book Chapter

Statistical Processing of Stopwords on SNS

Authors: Yuta Nezu, Takao Miura

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

For text classification and information retrieval, texts are usually preprocessed with techniques such as stemming and stopword removal. Almost all of these techniques work well only on well-formed text such as textbooks and news articles; this is not the case for social network services (SNS) or other texts on the internet. In this investigation, we propose a way to extract stopwords in the context of social network services. To do so, we first discuss what stopwords mean and how they differ from conventional ones, and we propose two statistical filters, TFIG and TFCHI, to identify them. We examine categorical estimation to extract characteristic values, focusing on Kullback-Leibler divergence (KLD) over temporal sequences of SNS data. Moreover, we apply several preprocessing steps to handle unknown words and to improve morphological analysis.
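As a rough illustration of the KLD measure mentioned in the abstract, the following sketch compares the word distributions of two time windows. It is not the authors' procedure: the helper names `word_distribution` and `kl_divergence`, the add-one smoothing, and the toy token lists are all assumptions made for this example.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, alpha=1.0):
    """Smoothed relative frequencies over a fixed vocabulary (alpha is an assumed smoothing constant)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share the same keys and q must be strictly positive."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


# Hypothetical token lists for two consecutive time windows.
window_a = ["the", "movie", "the", "kevin"]
window_b = ["the", "snow", "the", "the"]

vocabulary = sorted(set(window_a) | set(window_b))
p = word_distribution(window_a, vocabulary)
q = word_distribution(window_b, vocabulary)

print(kl_divergence(p, q))  # positive: the two windows use words differently
print(kl_divergence(p, p))  # zero: a distribution does not diverge from itself
```

Smoothing keeps every probability in `q` positive, so the logarithm is always defined even when a word is missing from one window.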

One exception is the predicate: it should appear as the last verb in each sentence.

In Japanese, morphological analysis covers both word segmentation and part-of-speech tagging. For example, "sumomo/mo/momo/mo/momo/no/uchi" means "Both plums and peaches are kinds of peach", a typical tongue twister in which "mo" is repeated many times. It contains two nouns, "sumomo" (plum) and "momo" (peach). There is no delimiter between words (no space, no comma, and no slash), so everything runs together into one string: "sumomomomomomomomonouchi".
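To make the segmentation problem concrete, here is a minimal sketch that enumerates every way to split the unsegmented tongue twister using a tiny lexicon. The function `segmentations` and the five-word lexicon are hypothetical and chosen only for this example; a real morphological analyzer uses a full dictionary and a statistical model to choose among candidates.

```python
def segmentations(text, lexicon):
    """Return every way to split text into a sequence of lexicon words."""
    if not text:
        return [[]]
    results = []
    for word in lexicon:
        if text.startswith(word):
            for rest in segmentations(text[len(word):], lexicon):
                results.append([word] + rest)
    return results


# Intended reading, taken from the example above.
intended = ["sumomo", "mo", "momo", "mo", "momo", "no", "uchi"]
unsegmented = "".join(intended)

# Hypothetical toy lexicon containing only the words of the example.
lexicon = ["sumomo", "momo", "mo", "no", "uchi"]

candidates = segmentations(unsegmented, lexicon)
print(len(candidates))         # more than one candidate split
print(intended in candidates)  # the intended reading is only one of them
```

Even with five dictionary entries the string admits several valid splits, which is why segmentation and part-of-speech tagging have to be resolved together.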

See https://twitter.com/?lang=ja.

We use 1/IG instead of IG because we prefer that smaller values be better. The same holds for 1/CHI.
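The sketch below shows how a reciprocal reverses the ordering of information gain. It uses the textbook definition IG = H(C) - H(C | term present or absent) on a hypothetical four-document corpus; it does not reproduce the TFIG filter, whose exact formula is not given here. The names `information_gain` and `reciprocal`, the class "Other", and the mapping of IG = 0 to infinity are assumptions.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, term):
    """IG of a term over docs given as (class_label, set_of_words) pairs."""
    classes = sorted({label for label, _ in docs})

    def class_counts(subset):
        return [sum(1 for label, _ in subset if label == c) for c in classes]

    with_term = [d for d in docs if term in d[1]]
    without_term = [d for d in docs if term not in d[1]]
    n = len(docs)
    conditional = (len(with_term) / n) * entropy(class_counts(with_term)) \
        + (len(without_term) / n) * entropy(class_counts(without_term))
    return entropy(class_counts(docs)) - conditional


def reciprocal(value, eps=1e-12):
    """1/value, with (near-)zero mapped to infinity (an assumed convention)."""
    return math.inf if value <= eps else 1.0 / value


# Hypothetical corpus: two classes, two documents each.
docs = [
    ("Home Alone", {"the", "kevin"}),
    ("Home Alone", {"the", "trap"}),
    ("Other", {"the", "rain"}),
    ("Other", {"the", "train"}),
]

for term in ("the", "kevin"):
    ig = information_gain(docs, term)
    print(term, ig, reciprocal(ig))
```

Here "the" occurs in every document, so it carries no class information (IG = 0), while "kevin" has a positive IG; taking the reciprocal flips which of the two terms receives the larger score.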

For instance, when we collect Twitter documents using the keyword "Home Alone", we assign the class "Home Alone" to the collection.

Title: Statistical Processing of Stopwords on SNS
Authors: Yuta Nezu, Takao Miura
Publisher: Springer International Publishing
Book: Database and Expert Systems Applications
Print ISBN: 978-3-030-27614-0

Electronic ISBN: 978-3-030-27615-7

Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-27615-7_9
