
2017 | Original Paper | Book Chapter

Extracting Various Types of Informative Web Content via Fuzzy Sequential Pattern Mining

Authors: Ting Huang, Ruizhang Huang, Bowei Liu, Yingying Yan

Published in: Web and Big Data

Publisher: Springer International Publishing

Abstract

In this paper, we present a web content extraction method that extracts different types of informative content from news web pages. A fuzzy sequential pattern mining method, FSP, is developed to gradually discover fuzzy sequential patterns for the various types of informative web content. Because the usage of HTML tags may change as web technology develops, the fuzzy sequential patterns are mined using a stable feature: the number of tokens in each line of source code. We conducted extensive experiments and observed good clustering properties for the discovered sequential patterns. Experimental results demonstrate that the FSP method is effective compared with state-of-the-art content extraction methods. Besides the main articles of web pages, it can also effectively find other types of interesting web content, such as article recommendations and article titles.
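To make the feature named in the abstract concrete, the following is a minimal sketch of computing the number of tokens in each line of a page's source code. It is not the FSP method itself, and the abstract does not specify how tokens are defined. The sketch assumes a token is a whitespace-separated word that remains after HTML tags are stripped from the line. The function name `token_counts_per_line` and the regular expression are hypothetical, chosen only for illustration.

```python
import re

# Assumption: anything between "<" and ">" on a single line is markup, not content.
TAG_RE = re.compile(r"<[^>]*>")


def token_counts_per_line(html_source: str) -> list[int]:
    """Return one token count per line of the given source code.

    Assumption (not from the paper): a token is a whitespace-separated
    word left over once tags have been removed from the line.
    """
    counts = []
    for line in html_source.splitlines():
        text = TAG_RE.sub(" ", line)      # replace tags with a space
        counts.append(len(text.split()))  # count the remaining words
    return counts


if __name__ == "__main__":
    sample = (
        "<div>\n"
        "<p>Breaking news about the local election results today</p>\n"
        "<a href='/x'>More</a>\n"
        "</div>"
    )
    print(token_counts_per_line(sample))
```

Under these assumptions, a line holding article text yields a high count, while structural or navigation lines yield counts near zero. A sequence of such counts is the kind of input over which sequential patterns could then be mined.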

Title: Extracting Various Types of Informative Web Content via Fuzzy Sequential Pattern Mining
Authors: Ting Huang, Ruizhang Huang, Bowei Liu, Yingying Yan
Publisher: Springer International Publishing
Book: Web and Big Data
Print ISBN: 978-3-319-63578-1
Electronic ISBN: 978-3-319-63579-8
Copyright year: 2017
DOI: https://doi.org/10.1007/978-3-319-63579-8_18
