Published in: Journal of Intelligent Information Systems 1/2016

01-08-2016

Improving the prediction of page access by using semantically enhanced clustering

Authors: Erman Sen, I. Hakki Toroslu, Pinar Karagoz

Abstract

Many parameters may affect the navigation behaviour of web users. Predicting the next page that a web user is likely to visit is important, since this information can be used for prefetching or for personalizing the page for that user. One of the successful methods for determining the next web page is to construct behaviour models of the users by clustering. The success of clustering depends heavily on the similarity measure used to compare navigation sequences. This work proposes a new approach for determining the next web page by extending standard clustering with a content-based semantic similarity method. The semantics of web pages are represented as sets of concepts, and thus user sessions are modelled as sequences of sets. As a result, session similarity is defined as an alignment of two sequences of sets. The success of the proposed method has been shown by applying it to real-life web log data.
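To make the idea of aligning two sequences of sets concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Needleman-Wunsch-style global alignment in which the match score for two pages is the Jaccard overlap of their concept sets. The gap penalty of -0.5, the normalisation by the longer session, and the concept names in the usage lines are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import FrozenSet, Sequence

ConceptSet = FrozenSet[str]


def jaccard(a: ConceptSet, b: ConceptSet) -> float:
    """Overlap of two concept sets in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def session_similarity(
    s: Sequence[ConceptSet],
    t: Sequence[ConceptSet],
    gap: float = -0.5,  # assumed gap penalty, not taken from the paper
) -> float:
    """Global alignment score of two sessions, divided by the longer length.

    The result is at most 1.0 and can be negative when gaps dominate.
    """
    n, m = len(s), len(t)
    if n == 0 and m == 0:
        return 1.0
    # score[i][j] holds the best score for aligning s[:i] with t[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + jaccard(s[i - 1], t[j - 1]),  # align two pages
                score[i - 1][j] + gap,  # skip a page in s
                score[i][j - 1] + gap,  # skip a page in t
            )
    return score[n][m] / max(n, m)


# Hypothetical sessions: each page is represented by a set of concept labels.
session_a = [frozenset({"course", "syllabus"}), frozenset({"staff"})]
session_b = [frozenset({"course"}), frozenset({"staff"})]
session_c = [frozenset({"news"})]

sim_ab = session_similarity(session_a, session_b)
sim_ac = session_similarity(session_a, session_c)
```

In this sketch, sessions that visit pages with overlapping concepts in the same order score higher than sessions with unrelated content, which is the property a clustering step would rely on. Any other set similarity, for example one based on distances in an ontology, could be substituted for `jaccard` without changing the alignment logic.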

Footnotes
1
CLUTO (scluster, gcluto): a cross-platform tool for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters, http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
 
2
One year of web access logs of the METU C.Eng. website (http://www.ceng.metu.edu.tr)
 
Metadata
Title
Improving the prediction of page access by using semantically enhanced clustering
Authors
Erman Sen
I. Hakki Toroslu
Pinar Karagoz
Publication date
01-08-2016
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2016
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-016-0398-3
