
2016 | Original Paper | Book Chapter

A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data

Authors: Rayner Alfred, Gabriel Jong Chiye, Yuto Lim, Chin Kim On, Joe Henry Obit

Published in: Soft Computing in Data Science

Publisher: Springer Singapore


Abstract

In learning relational data, the Dynamic Aggregation of Relational Attributes algorithm can transform a multi-relational database into a vector space representation, to which a traditional clustering algorithm can then be applied directly to summarize the relational data. However, the performance of the algorithm depends heavily on the quality of the clusters produced. A small change in the initialization of the clustering algorithm's parameters may adversely affect the quality of the resulting clusters. To optimize cluster quality, a Genetic Algorithm is used to find the combination of initializations that produces the best clusters. The proposed method searches for the best initialization with respect to the number of clusters, proximity distance measurements, fitness functions, and classifiers used for the evaluation. In the results obtained, clustering coupled with Euclidean distance performs better in the classification stage than clustering coupled with Cosine similarity. Cluster entropy is found to be the best fitness function, followed by the multi-objective fitness function used in the genetic algorithm. This is most probably because an external measurement that takes the class label into consideration is involved in optimizing the structure of the clusters. In short, this paper shows the influence of varying the initialization values on the predictive performance.
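To make the notion of an external, class-aware fitness function concrete, the following is a minimal sketch of cluster entropy in its common textbook form: the size-weighted average of the class-label entropy inside each cluster, where lower values mean purer clusters. The abstract does not give the exact formula used in the chapter, so the definition, the function name `cluster_entropy`, and its parameters are assumptions made for illustration only.

```python
import math
from collections import Counter


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average entropy (in bits) of class labels per cluster.

    Assumed textbook definition, not necessarily the chapter's formula:
        E = sum_j (n_j / n) * ( - sum_i p_ij * log2(p_ij) )
    where p_ij is the fraction of cluster j's members that have class i.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("at least one record is required")

    # Group the class labels by the cluster each record was assigned to.
    members = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members.setdefault(cluster, []).append(label)

    total = 0.0
    for labels in members.values():
        size = len(labels)
        counts = Counter(labels)
        # Entropy of the class distribution inside this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        # Weight by the share of all records that fall in this cluster.
        total += (size / n) * h
    return total


# Two pure clusters versus two evenly mixed clusters.
pure = cluster_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])
mixed = cluster_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])
```

A genetic algorithm that minimizes a score of this kind would favor initializations whose clusters line up with the class labels, which is consistent with the explanation the abstract offers for the result.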


Metadata
Title
A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data
Authors
Rayner Alfred
Gabriel Jong Chiye
Yuto Lim
Chin Kim On
Joe Henry Obit
Copyright year
2016
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-2777-2_10
