High Energy Physics event selection with Gene Expression Programming

https://doi.org/10.1016/j.cpc.2007.10.003Get rights and content

Abstract

Gene Expression Programming is a new evolutionary algorithm that overcomes many limitations of the more established Genetic Algorithms and Genetic Programming. Its application to event selection in high energy physics data analysis is presented using as an example application the selection of KS particles produced in e+e interactions at 10 GeV and reconstructed in the decay mode KSπ+π. The algorithm was used for automatic identification of classification criteria for signal/background separation. For the problem studied and for data samples with signal to background ratios between 0.25 and 5, the classification accuracy obtained with the criteria developed by the GEP algorithm was in the range of 92–95%.

Introduction

High Energy Physics, due to the vast and complex data expected from current and future experiments, is in need of powerful and efficient techniques for various analysis tasks. While methods like Artificial Neural Networks, Fisher Discriminant or Likelihood Estimators have already been used in the past or current experiments, Evolutionary Algorithms are less explored.

Evolutionary Algorithms (EA) are computer algorithms inspired by biological evolutionary theories. They encode the candidate solution to the problem at hand into an entity understood by the computer, called a chromosome. The optimal solution is sought through an optimisation process guided by an objective function, called the fitness function, which quantifies how good the candidate solution is. The optimisation process uses a generate-and-test cycle. An initial set of candidate solutions (called a generation) is created, usually randomly. Then new solutions are generated from the current ones applying a set of operators on the chromosomes. These operators, called genetic operators, simulate the genetic reproduction from biological evolutionary theories. The new solutions are then tested with the fitness function and the best ones are selected for further modifications (reproduction).

The EAs contain, traditionally, four types of algorithms: Genetic Algorithms (GA) [1], Genetic Programming (GP) [2], Evolutionary Strategies (ES) [3] and Evolutionary Programming (EP) [4]. The main differences among them are in the chromosome representation and in the reproduction methods. ES and EP represent the chromosome as a vector of real numbers, while in GA the chromosome is usually a fixed length binary string and in GP a LISP expression translated graphically into a tree. The common reproduction method is based on the mutation and cross-over operators that have different definitions in each algorithm. These operators are applied directly on the candidate solution, limiting the performance of the algorithms. In GP, for example, many syntactically invalid structures are produced through genetic variation, wasting computational resources.

Gene Expression Programming (GEP) [5] is a new evolutionary algorithm which overcomes some of the limitations of GA and GP. It works with two entities called the chromosome and the expression tree. The chromosome is the encoder of the candidate solution which is translated into an expression tree (the actual candidate solution). This translation process of the chromosome into an expression tree can be seen as similar to the expression of the biological genes encoded in DNA into proteins. The GEP genetic operators are applied on the chromosome, not directly on the candidate solution (expression tree). This reproduction method together with the structural organisation of the chromosome and its translation process into an expression tree allows unconstrained genetic modifications, always producing valid expression trees. Previous studies [6] showed that these characteristics allow GEP to outperform GP by two to four orders of magnitude in terms of convergence speed for benchmark symbolic regression and classification problems.

The potential of GEP has started to be exploited in different fields such as data and text mining [7], [8], [9], [10], electrical circuit modelling [11], and various engineering applications [12].

In high energy physics, EAs were successfully tested but not largely used, mainly due to their high computational needs. GAs were applied mainly to problems such as discrimination and parameter optimisation in both experimental and theoretical studies (for example, see [13], [14], [15], [16]). GP was recently applied to event selection type problems [17], [18], [19]. ES were tested for optimisation of event selection criteria [20].

This paper presents the application of GEP to particle physics data analysis, extending the first analysis presented in [21] for more datasets and more complex chromosome configurations. The algorithm is used for signal/background separation in the off-line event selection stage of the data analysis process. The purpose of the study is the evaluation of the potential of the algorithm for solving such a problem rather than the extraction of particular physics results.

Section snippets

Algorithm

The GEP algorithm is described in detail in [5] and [6]. The main ideas are summarised here.

The first steps, and the most difficult ones, of the application of GEP (and of any EA) are the problem definition, the encoding of the candidate solution and the definition of the fitness function. The encoding and the fitness functions are specific to each problem. Adequate choices are crucial for the success of the algorithm. In making these choices knowledge about the problem and about the expected

Methodology

In this study GEP was applied to an event selection problem using a supervised statistical learning approach. The algorithm was used to extract selection criteria for the signal/background classification from training data samples for which it was known to which class the event belonged. The generalisation power of the extracted selection criteria was tested by applying the criteria on similar but independent data samples (test data) and comparing the classification provided by these criteria

Analysis and results

Two analyses, with two sets of input functions, were performed on datasets with the signal to background ratio (S/B) equal to 0.25, 1 and 5. The results presented below required little computational resources. For the most complex configurations, approximately 60 minutes were needed in order to reach the convergence of the search process, on a computer with a 2.8GHz processor.

Conclusions

A newly developed Evolutionary Algorithm, Gene Expression Programming, and its application to an event selection problem in high energy physics data analysis were introduced in this study. In order to evaluate the potential of the algorithm for solving such a problem, the selection of KSπ+π signal events from background events produced in the e+e interaction at 10 GeV was chosen for investigation. This decay channel, being well understood theoretically and experimentally, was accurately

Acknowledgements

Support from the UK Particle Physics and Astronomy Research Council (PPARC) is gratefully acknowledged.

References (30)

  • J.R. Koza

    Genetic Programming: On the Programming of the Computers by Means of Natural Selection

    (1992)
  • H.-P. Schwefel

    Numerical Optimization of Computer Models

    (1981)
  • L.J. Fogel et al.

    Artificial Intelligence through Simulated Evolution

    (1966)
  • C. Ferreira

    Gene expression programming: A new adaptive algorithm for solving problems

    Complex Systems

    (2001)
  • C. Ferreira

    Gene Expression Programming: Mathematical Modelling by an Artificial Intelligence

    (2002)
  • Cited by (181)

    • Determining the dependence of solar cell power conversion efficiency on fabrication parameters in inorganic perovskite types by programming gene expression

      2022, Physica E: Low-Dimensional Systems and Nanostructures
      Citation Excerpt :

      Fig. 3 shows the main steps of the GEP algorithm. It should be noted that the output of the genetic algorithm (GEP) depends on the fitting function and individuals are selected as a new or modified generation based on it [66,73,74]. In this study, GeneXproTools 5.0 is used to derive the PCE prediction model as a function of perovskite layer construction variables.

    View all citing articles on Scopus
    View full text