Published in: Data Mining and Knowledge Discovery 1/2023

13-10-2022

Category tree distance: a taxonomy-based transaction distance for web user analysis

Authors: Yinjia Zhang, Qinpei Zhao, Yang Shi, Jiangfeng Li, Weixiong Rao

Abstract

With the emergence of webpage services, huge amounts of customer transaction data now flood cyberspace and are becoming increasingly useful for profiling users and making recommendations. Because web user transaction data are usually multi-modal, heterogeneous and large-scale, traditional data analysis methods face new challenges. One of these challenges is defining a distance between two transactions or between two web users. The distance definition plays an important role in further analysis, such as cluster analysis or k-nearest-neighbor queries. In this paper we introduce a category tree distance, which uses product taxonomy information to convert user transaction data into vectors. The similarity between web users can then be evaluated from the vectors derived from their transaction data. Properties of the distance, such as its upper and lower bounds, and a complexity analysis are also given. To investigate the performance of the proposal, we conduct experiments on real web user transaction data. The results show that the proposed distance outperforms the other distances on user transaction analysis.
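
The abstract does not spell out how the vectors are built, so the following Python sketch only illustrates the general idea of a taxonomy-based transaction distance. It is not the category tree distance defined in the paper. The toy taxonomy `PARENT`, the helper names, the count normalization and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (an assumption, not from the paper):
# each category maps to its parent; the root maps to None.
PARENT = {
    "root": None,
    "electronics": "root",
    "books": "root",
    "phones": "electronics",
    "laptops": "electronics",
    "fiction": "books",
}


def ancestors(category):
    """Yield the category itself and every ancestor up to the root."""
    while category is not None:
        yield category
        category = PARENT[category]


def to_vector(transaction):
    """Convert a list of purchased categories into a vector over all
    taxonomy nodes: each purchase adds weight to its category and to
    every ancestor, and weights are divided by the transaction size."""
    if not transaction:
        raise ValueError("transaction must contain at least one item")
    counts = Counter()
    for category in transaction:
        for node in ancestors(category):
            counts[node] += 1
    total = len(transaction)
    return {node: counts[node] / total for node in PARENT}


def distance(t1, t2):
    """Euclidean distance between the taxonomy vectors of two transactions
    (an illustrative choice, not the paper's formula)."""
    v1, v2 = to_vector(t1), to_vector(t2)
    return math.sqrt(sum((v1[n] - v2[n]) ** 2 for n in PARENT))


if __name__ == "__main__":
    print(distance(["phones"], ["laptops"]))  # siblings under "electronics"
    print(distance(["phones"], ["fiction"]))  # different top-level branches
```

Under these assumptions, two transactions whose items share an ancestor category end up closer than two transactions drawn from different branches of the taxonomy, which a plain item-overlap measure such as Jaccard distance would not capture.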


Metadata
Title
Category tree distance: a taxonomy-based transaction distance for web user analysis
Authors
Yinjia Zhang
Qinpei Zhao
Yang Shi
Jiangfeng Li
Weixiong Rao
Publication date
13-10-2022
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 1/2023
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-022-00874-9
