2018 | OriginalPaper | Chapter

Web Usage Data Cleaning

A Rule-Based Approach for Weblog Data Cleaning

Authors: Amine Ganibardi, Chérif Arab Ali

Published in: Big Data Analytics and Knowledge Discovery

Publisher: Springer International Publishing

Abstract

This paper addresses weblog data cleaning within the scope of Web Usage Mining. Weblog data record end-user clicks and the underlying user-agent hits logged by web servers. Because Web Usage Mining is concerned with end-user behavior, user-agent hits are treated as noise to be removed before mining. The most referenced and implemented cleaning methods are conventional cleaning and advanced cleaning. Both are content-centric filtering heuristics based on the requested-resource attribute of the weblog database. In the context of the dynamic and responsive web, these methods are limited in relevancy, workability, and cost. To address these constraints, this contribution introduces a rule-based cleaning method built on the rules of the logging structure. Experiments with the rule-based method demonstrate significant advantages over the content-centric methods.
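
To make the idea of a content-centric filtering heuristic concrete, here is a minimal sketch in Python. It is not the authors' rule-based method, which the abstract does not specify. It only illustrates filtering on the requested-resource attribute, under several assumptions: the log is in Common Log Format, noise is recognized by file extension, and the names `LOG_PATTERN`, `NOISE_EXTENSIONS`, `is_noise`, `clean`, and the sample lines are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: entries follow the Common Log Format. The chapter abstract
# does not name a log format; this pattern is for illustration only.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: an illustrative list of extensions treated as user-agent hits.
# It is not taken from the paper.
NOISE_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico"}


def is_noise(resource: str) -> bool:
    """Return True if the requested resource looks like an embedded asset."""
    # Strip the query string so "/logo.png?v=2" is still seen as a PNG.
    path = urlsplit(resource).path.lower()
    return any(path.endswith(ext) for ext in NOISE_EXTENSIONS)


def clean(lines):
    """Keep parsed entries whose requested resource is not flagged as noise."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not parse
        if not is_noise(match.group("resource")):
            kept.append(match.groupdict())
    return kept


# Hypothetical sample entries, invented for this sketch.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2018:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2018:10:00:01 +0000] "GET /static/site.css HTTP/1.1" 200 128',
    '10.0.0.1 - - [01/Jan/2018:10:00:01 +0000] "GET /img/logo.png?v=2 HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2018:10:00:05 +0000] "GET /products?page=2 HTTP/1.1" 200 900',
]

cleaned = clean(SAMPLE)
for entry in cleaned:
    print(entry["time"], entry["resource"])
```

The filter decides purely from what the resource string looks like, which is the property the abstract calls content-centric. A resource that carries no recognizable extension passes through this sketch regardless of whether an end user or a user agent requested it.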

Metadata
Title: Web Usage Data Cleaning
Authors: Amine Ganibardi, Chérif Arab Ali
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-98539-8_15