
2021 | OriginalPaper | Chapter

Identifying Optimal Clusters in Purchase Transaction Data

Authors: L. Cleofas-Sanchez, A. Pineda-Briseño, J. S. Sanchez

Published in: Advances in Computational Intelligence

Publisher: Springer International Publishing


Abstract

Clustering transaction databases can reveal potentially useful patterns that give insight into the structure of the data and support effective decision-making. However, one of the critical tasks in clustering is identifying the appropriate number of clusters, which determines the performance of any process subsequently applied to the transaction database. This paper presents a methodology to discover the optimal structure of purchase transaction data: the Davies-Bouldin and Calinski-Harabasz validity indices are used to obtain the number of clusters, and the clusters are formed with the farthest-first traversal algorithm. The quality of the resulting structures is evaluated with data complexity measures such as F1, F2, F3, N1 and IR. In this work, we use the support vector machine and multi-layer perceptron classification algorithms to assess recognition ability in classification problems with more than two classes, in the context of the class separability and class imbalance present in the groups previously obtained. The experimental results show the viability of the proposed methodology for decision-making.
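The cluster-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each transaction has already been encoded as a numeric vector, uses Euclidean distance, and runs on hypothetical toy points. The function names (`farthest_first_centers`, `cluster`, `choose_k`) are illustrative only. A lower Davies-Bouldin value and a higher Calinski-Harabasz value both indicate a better partition.

```python
import math

def farthest_first_centers(points, k, start=0):
    """Pick k centers: seed with one point, then repeatedly add the point
    that lies farthest from its nearest already-chosen center."""
    centers = [points[start]]
    gap = [math.dist(p, centers[0]) for p in points]  # distance to nearest center
    while len(centers) < k:
        far = max(range(len(points)), key=gap.__getitem__)
        centers.append(points[far])
        gap = [min(g, math.dist(p, points[far])) for g, p in zip(gap, points)]
    return centers

def cluster(points, k):
    """Assign every point to its nearest farthest-first center."""
    centers = farthest_first_centers(points, k)
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
        groups[nearest].append(p)
    return groups

def centroid(group):
    return tuple(sum(axis) / len(group) for axis in zip(*group))

def davies_bouldin(groups):
    """Mean, over clusters, of the worst scatter-to-separation ratio."""
    cents = [centroid(g) for g in groups]
    scatter = [sum(math.dist(p, c) for p in g) / len(g)
               for g, c in zip(groups, cents)]
    k = len(groups)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

def calinski_harabasz(groups):
    """Ratio of between-cluster to within-cluster dispersion."""
    points = [p for g in groups for p in g]
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = sum(len(g) * math.dist(centroid(g), overall) ** 2 for g in groups)
    within = sum(math.dist(p, centroid(g)) ** 2 for g in groups for p in g)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(points, candidates):
    """Score each candidate k with both indices and return the preferred k of each."""
    parts = {k: cluster(points, k) for k in candidates}
    db = {k: davies_bouldin(g) for k, g in parts.items()}
    ch = {k: calinski_harabasz(g) for k, g in parts.items()}
    return min(db, key=db.get), max(ch, key=ch.get), db, ch

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

best_db, best_ch, db, ch = choose_k(points, range(2, 6))
print("k preferred by Davies-Bouldin:", best_db)
print("k preferred by Calinski-Harabasz:", best_ch)
```

On this toy data both indices favour k = 3. With real transaction data the two indices may disagree, and the sketch does not guard against empty clusters or coincident centroids, which can arise when the data contain duplicate points.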

Metadata
Title
Identifying Optimal Clusters in Purchase Transaction Data
Authors
L. Cleofas-Sanchez
A. Pineda-Briseño
J. S. Sanchez
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-89817-5_1
