
2021 | OriginalPaper | Chapter

Mining Feature Relationships in Data

Author: Andrew Lensen

Published in: Genetic Programming

Publisher: Springer International Publishing


Abstract

When faced with a new dataset, most practitioners begin by performing exploratory data analysis to discover interesting patterns and characteristics within data. Techniques such as association rule mining are commonly applied to uncover relationships between features (attributes) of the data. However, association rules are primarily designed for use on binary or categorical data, due to their use of rule-based machine learning. A large proportion of real-world data is continuous in nature, and discretisation of such data leads to inaccurate and less informative association rules. In this paper, we propose an alternative approach called feature relationship mining (FRM), which uses a genetic programming approach to automatically discover symbolic relationships between continuous or categorical features in data. To the best of our knowledge, our proposed approach is the first such symbolic approach with the goal of explicitly discovering relationships between features. Empirical testing on a variety of real-world datasets shows the proposed method is able to find high-quality, simple feature relationships which can be easily interpreted and which provide clear and non-trivial insight into data.
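The point about discretisation can be illustrated with equal-width binning. This is one common way of turning a continuous feature into categories before association rule mining. The following is a minimal Python sketch, not the chapter's method. The function name `equal_width_bins`, the number of bins and the example ages are illustrative assumptions, and the chapter does not say which discretisation scheme it has in mind.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1]
    using equal-width intervals between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything falls into a single bin.
        return [0] * len(values)
    bins = []
    for v in values:
        idx = int((v - lo) / width)
        # The maximum lies exactly on the upper edge; clamp it into the last bin.
        bins.append(min(idx, n_bins - 1))
    return bins


# Hypothetical continuous feature (ages); with 3 bins the edges are 34, 47 and 60.
ages = [21.0, 33.9, 34.1, 46.9, 60.0]
print(equal_width_bins(ages, 3))  # [0, 0, 1, 1, 2]
```

Values only 0.2 apart (33.9 and 34.1) land in different bins. Values 12.9 apart (21.0 and 33.9) share a bin. Any rule built on these bins therefore cannot see the variation inside a bin, and it treats a boundary as if it were a real change.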


Footnotes
1
For example, mutating the 0.71 node of \(x = f_1 \times (f_0 + 0.71)\) using a traditional mutation would replace it with a new value drawn from U[0, 1]. While local-search approaches can be used to optimise constants more cautiously, it is best if constants can be avoided completely.
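The mutation example in this footnote can be made concrete with a short sketch. It assumes a simple nested-tuple tree representation and uses hypothetical helper names (`constant_paths`, `replace_at`, `mutate_constant`). It shows how a conventional point mutation discards the constant 0.71 and resamples it from U[0, 1], rather than adjusting it near its old value.

```python
import random

# Hypothetical representation: (operator, left, right) tuples; leaves are
# feature names (str) or constants (float). This encodes x = f1 * (f0 + 0.71).
TREE = ("mul", "f1", ("add", "f0", 0.71))


def constant_paths(node, path=()):
    """Yield the path (tuple of child indices) to every constant leaf."""
    if isinstance(node, float):
        yield path
    elif isinstance(node, tuple):
        # Index 0 is the operator, so children start at index 1.
        for i, child in enumerate(node[1:], start=1):
            yield from constant_paths(child, path + (i,))


def replace_at(node, path, new_value):
    """Return a copy of the tree with the leaf at `path` replaced."""
    if not path:
        return new_value
    i = path[0]
    children = list(node)
    children[i] = replace_at(node[i], path[1:], new_value)
    return tuple(children)


def mutate_constant(tree, rng):
    """Traditional point mutation of a constant: throw the old value away and
    draw a fresh one from U[0, 1], regardless of how good the old value was."""
    target = rng.choice(list(constant_paths(tree)))
    return replace_at(tree, target, rng.uniform(0.0, 1.0))


rng = random.Random(0)
print(mutate_constant(TREE, rng))  # same shape, 0.71 replaced by a random value
```

The new constant is unrelated to 0.71, so a relationship that was nearly right can be badly disrupted by a single mutation.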
 
2
Two features are defined to be linearly correlated if the absolute value of their Pearson correlation coefficient is greater than 0.95.
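The threshold test in this footnote can be written out directly. This is a minimal sketch with hypothetical function names. The strict `>` comparison follows the footnote's wording, "greater than 0.95".

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (sx * sy)


def linearly_correlated(x, y, threshold=0.95):
    """True if |r| is strictly greater than the threshold."""
    return abs(pearson(x, y)) > threshold


f0 = [1, 2, 3, 4, 5]
f1 = [-1, -3, -5, -7, -9]  # f1 = -2 * f0 + 1, so r = -1
f2 = [1, 3, 2, 5, 4]       # r = 0.8 with f0
print(linearly_correlated(f0, f1), linearly_correlated(f0, f2))  # True False
```

The absolute value matters. A perfectly negative relationship such as `f1` counts as linearly correlated, exactly like a positive one.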
 
3
Note that \(\text {Fitness}=\text {Cost}+ \alpha \times \text {Nodes}\), but we also list the fitness separately for completeness.
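As this footnote notes, the reported fitness can be derived from the cost and node count. A minimal sketch of that relation follows. The value of α (0.01) and the two candidates are hypothetical. The assumption that lower fitness is better is inferred from the penalty form and is not stated in the chapter.

```python
import math


def fitness(cost, nodes, alpha):
    """Fitness = Cost + alpha * Nodes. Assumed to be minimised, so each extra
    node adds a parsimony penalty of alpha."""
    return cost + alpha * nodes


ALPHA = 0.01  # hypothetical penalty weight

small = fitness(cost=0.30, nodes=5, alpha=ALPHA)   # about 0.35
large = fitness(cost=0.28, nodes=15, alpha=ALPHA)  # about 0.43
print(math.isclose(small, 0.35), math.isclose(large, 0.43))  # True True
```

Under these assumed numbers, the smaller tree wins despite its slightly higher cost. The size penalty outweighs the 0.02 difference in cost.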
 
4
Only the top five feature relationships (FRs) are considered, to make the plots easier to analyse.
 
Metadata
Title: Mining Feature Relationships in Data
Author: Andrew Lensen
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-72812-0_16
