2019 | OriginalPaper | Chapter

Parallel and Distributed Map-Reduce Models for External Clustering Validation Indexes

Abstract

Procedures that evaluate the results of clustering algorithms are known as cluster validation (CV) indexes. CV indexes are usually divided into two broad classes, external and internal, depending on whether or not a ground truth or optimal clustering solution is known in advance. Traditional CV indexes can become impractical, or even impossible, to compute when the data set is very large. To address this issue, we propose parallel and distributed external clustering validation models based on MapReduce for three indexes: F-measure, Normalized Mutual Information, and Variation of Information. The experimental results show that these models scale very well with increasing data set size and provide accurate results.
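
As a rough illustration of why these three indexes suit a map/reduce decomposition, the sketch below builds the cluster-versus-class contingency table from independent data splits and then derives the indexes from the merged counts. It is not the authors' model, which the abstract does not detail. It runs in a single Python process rather than on Hadoop, the function names (`map_partition`, `reduce_counts`, `external_indexes`) are hypothetical, and the formulas are common textbook forms (natural-log entropies, NMI normalized by the geometric mean of the entropies, F-measure averaged over clusters) that are assumed here and may differ from the chapter's definitions.

```python
from collections import Counter
from functools import reduce
from math import log, sqrt


def map_partition(records):
    """Map step: count (cluster_label, true_label) pairs within one split."""
    return Counter(records)


def reduce_counts(a, b):
    """Reduce step: merge the partial contingency counts of two splits."""
    return a + b


def external_indexes(splits):
    """Compute F-measure, NMI and VI from the merged contingency table."""
    table = reduce(reduce_counts, map(map_partition, splits), Counter())
    n = sum(table.values())

    # Marginal sizes of each cluster and each ground-truth class.
    cluster_sizes, class_sizes = Counter(), Counter()
    for (c, t), n_ct in table.items():
        cluster_sizes[c] += n_ct
        class_sizes[t] += n_ct

    def entropy(sizes):
        return -sum(s / n * log(s / n) for s in sizes.values())

    h_c, h_t = entropy(cluster_sizes), entropy(class_sizes)
    mi = sum(
        n_ct / n * log(n_ct * n / (cluster_sizes[c] * class_sizes[t]))
        for (c, t), n_ct in table.items()
    )

    # For each cluster, find the class it overlaps most (assumed convention).
    best = {}
    for (c, t), n_ct in table.items():
        if c not in best or n_ct > best[c][1]:
            best[c] = (t, n_ct)
    f_measure = sum(
        2 * n_best / (cluster_sizes[c] + class_sizes[t])
        for c, (t, n_best) in best.items()
    ) / len(best)

    # Assumed convention: NMI is reported as 0 when either entropy is zero.
    nmi = mi / sqrt(h_c * h_t) if h_c > 0 and h_t > 0 else 0.0
    vi = h_c + h_t - 2 * mi
    return {"f_measure": f_measure, "nmi": nmi, "vi": vi}
```

Because the map step only emits label-pair counts, the data exchanged between workers is bounded by the number of distinct (cluster, class) pairs rather than by the number of records, which is one plausible reason such indexes can scale with data set size.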

Metadata
Title
Parallel and Distributed Map-Reduce Models for External Clustering Validation Indexes
Authors
Soumeya Zerabi
Souham Meshoul
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-319-97719-5_15