2021 | OriginalPaper | Chapter

Identifying Optimal Clusters in Purchase Transaction Data

Authors: L. Cleofas-Sanchez, A. Pineda-Briseño, J. S. Sanchez

Published in: Advances in Computational Intelligence

Publisher: Springer International Publishing


Clustering in transaction databases can reveal potentially useful patterns and give insight into the structure of the data, which in turn supports effective decision-making. However, one of the critical tasks in clustering is identifying the appropriate number of clusters, because this choice determines the performance of any process subsequently applied to the transaction database. This paper presents a methodology for discovering the optimal structure of purchase transaction data: the Davies-Bouldin and Calinski-Harabasz validity indices are used to obtain the number of clusters, and the clusters are then formed with the farthest-first traversal algorithm. The quality of the resulting structures is evaluated with data complexity measures such as F1, F2, F3, N1 and IR. In this work, we use the support vector machine and multi-layer perceptron classification algorithms to determine recognition ability in classification problems with more than two classes, taking into account the separability and imbalance of the classes present in the groups previously obtained. The experimental results show the viability of the proposed methodology for decision-making.
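
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D points, the Euclidean distance, the fixed starting point, the candidate range k = 2..5, and all function names are assumptions made for illustration only. It forms clusters with a farthest-first traversal and scores each candidate k with the Davies-Bouldin index (lower is better) and the Calinski-Harabasz index (higher is better).

```python
import math


def farthest_first(points, k, start=0):
    """Choose k centers: begin at points[start] (an assumed, fixed start), then
    repeatedly add the point whose distance to its nearest center is largest."""
    centers = [points[start]]
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for p, d in zip(points, nearest)]
    return centers


def assign(points, centers):
    """Label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def split(points, labels, k):
    """Group points by label; assumes every label 0..k-1 is used."""
    return [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]


def davies_bouldin(points, labels, k):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = split(points, labels, k)
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def calinski_harabasz(points, labels, k):
    """Ratio of between-cluster to within-cluster dispersion, each scaled by
    its degrees of freedom."""
    clusters = split(points, labels, k)
    overall = centroid(points)
    cents = [centroid(c) for c in clusters]
    between = sum(len(c) * math.dist(m, overall) ** 2 for c, m in zip(clusters, cents))
    within = sum(math.dist(p, m) ** 2 for c, m in zip(clusters, cents) for p in c)
    return (between / (k - 1)) / (within / (len(points) - k))


# Toy data (hypothetical): three well-separated groups of four distinct points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

scores = {}
for k in range(2, 6):
    labels = assign(points, farthest_first(points, k))
    scores[k] = (davies_bouldin(points, labels, k),
                 calinski_harabasz(points, labels, k))

best_k_db = min(scores, key=lambda k: scores[k][0])  # lowest Davies-Bouldin
best_k_ch = max(scores, key=lambda k: scores[k][1])  # highest Calinski-Harabasz
print(best_k_db, best_k_ch)
```

On this toy data both indices agree on three clusters. The sketch assumes distinct points, so that no cluster is empty and no two centroids coincide; real transaction data would also need an encoding and a distance measure suited to it, which the abstract does not specify.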
