Published in: International Journal on Digital Libraries 3-4/2014

01-08-2014

Evaluating distance-based clustering for user (browse and click) sessions in a domain-specific collection

Authors: Jeremy Steinhauer, Lois M. L. Delcambre, Marianne Lykke, Marit Kristine Ådland



Abstract

We seek to improve information retrieval in a domain-specific collection by clustering user sessions from a click log and then classifying later user sessions in real time. As a preliminary step, we explore the main assumption of this approach: whether user sessions in such a site are related to the question that they are answering. Since a large class of machine learning algorithms use a distance measure at their core, we evaluate the suitability of common machine learning distance measures to distinguish sessions of users searching for the answer to the same or different questions. We found that two distance measures work very well for our task and three others do not. As a further step, we then investigate how effective the distance measures are when used in clustering. For our dataset, we conducted a user study in which multiple users answered the same set of questions. This data, grouped by question, was used as our gold standard for evaluating the clusters produced by the clustering algorithms. We found that the observed difference between the two classes of distance measures affected the quality of the clusterings, as expected. We also found that one of the two distance measures that worked well to differentiate sessions worked significantly better than the other when clustering. Finally, we discuss why some distance measures performed better than others in the two parts of our work.
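
The first evaluation step described above, checking whether a distance measure separates same-question sessions from different-question sessions, can be illustrated with a minimal sketch. The abstract does not name the measures that were tested, so the Jaccard distance used here is an assumption chosen only for illustration, and the session data, page names, and function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of visited pages."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sessions are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_distances(sessions, distance=jaccard_distance):
    """Return (mean same-question distance, mean different-question distance).

    `sessions` is a list of (question_label, visited_pages) pairs; the
    question label plays the role of the gold standard.
    """
    same, diff = [], []
    for (q1, s1), (q2, s2) in combinations(sessions, 2):
        (same if q1 == q2 else diff).append(distance(s1, s2))
    return sum(same) / len(same), sum(diff) / len(diff)


# Hypothetical toy sessions: two users per question.
sessions = [
    ("q1", ["home", "a", "b"]),
    ("q1", ["home", "a", "c"]),
    ("q2", ["home", "x", "y"]),
    ("q2", ["home", "x", "z"]),
]

same_mean, diff_mean = mean_distances(sessions)
print(same_mean, diff_mean)
```

A measure that suits the task should give a clearly smaller mean distance for same-question pairs than for different-question pairs. This toy comparison only conveys the idea; it does not reproduce the study's data or its statistical analysis.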


Metadata
Title
Evaluating distance-based clustering for user (browse and click) sessions in a domain-specific collection
Authors
Jeremy Steinhauer
Lois M. L. Delcambre
Marianne Lykke
Marit Kristine Ådland
Publication date
01-08-2014
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 3-4/2014
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-014-0117-z
