
Open Access 2023 | Original Paper | Book Chapter

7. Randomized Methods


Abstract

This chapter is devoted to random and memory-free methods that repeatedly construct solutions or modify them. Among the most popular techniques are simulated annealing, threshold accepting, great deluge and demon algorithms, noising methods, late acceptance hill climbing, variable neighborhood search, and GRASP.
By applying the principles presented in the previous chapters, we can build a solution and improve it to find a local optimum. In addition, if the problem is complicated or large, we have seen how to decompose it into sub-problems that are easier to solve.
What if, using these techniques, we obtain solutions whose quality is not good enough? Let us suppose that we can devote more computational time to finding better solutions. The first option that comes to mind is to introduce a learning process. That is the subject of subsequent chapters.
A second option, a priori simpler to implement, is to incorporate random components into an "improvement" method, allowing it to select solutions of lower quality than the starting one. Although we no longer have a strict improvement at each iteration, this is still a local search, since such methods are based on locally modifying solutions. Several very similar methods follow this second option: simulated annealing (SA), threshold accepting (TA), great deluge, demon algorithms, and the noising method. Interestingly, the latter framework can be seen as a generalization of the previous methods. Late acceptance hill climbing shares similarities with these methods but incorporates a form of self-tuning of its parameter. Variable neighborhood search (VNS) alternates intensification and diversification phases: it improves a solution with a basic neighborhood and degrades it by randomly selecting moves in other neighborhoods.
A third option is to repeat constructions with random choices, possibly followed by local improvements. This is the way followed by the greedy randomized adaptive search procedure (GRASP).

7.1 Simulated Annealing

The simulated annealing method is one of the first local search techniques that does not strictly improve the quality of the solution at each iteration. This method is inspired by a process of physics, annealing, which minimizes the internal energy of the molecules of a material. Some materials, like metals, have their internal structure modified depending on the temperature to which they are heated. If the material is cooled rapidly, the molecules do not have time to arrange themselves into the structure that is usual at low temperatures; instead, they form grains, small crystals whose orientation differs from one grain to the next. This is the quenching process, which is used in particular to harden some steels.
On the contrary, if the cooling is slow, the molecules manage to form much larger crystals, corresponding to their minimum energy state. By repeating the method, one can further increase the size of the crystals or even obtain a monocrystal. This is the annealing process. These two processes are illustrated in Fig. 7.1.
Černý [1] and Kirkpatrick et al. [8] independently had the idea to simulate this process in combinatorial optimization, making the analogy between the objective function to minimize and the energy of the molecules. At high temperatures, a molecule has enough energy to fill a gap in the crystal lattice or change its configuration. However, at low temperatures, it has a significantly lower probability of doing so. Translating this in terms of combinatorial optimization means changing a solution locally and randomly and accepting its degradation with a certain probability. The latter must be low if the degradation is significant.
Expressed in terms of local search, this corresponds to generating a random move m ∈ M and calculating the cost difference Δ between the initial and modified solution: Δ = f(s ⊕ m) − f(s). If Δ < 0, the move m improves s, and it is accepted. The new solution becomes s ⊕ m. Otherwise, the move m may nevertheless be accepted, with probability \(e^{-\Delta/T}\), where T is a parameter simulating the temperature. At each step, the temperature T is decreased. Several formulas have been proposed to adjust the temperature. Among the most frequently encountered are \(T \leftarrow \alpha \cdot T\) and \(T \leftarrow \frac{T}{1 + \alpha T}\), where 0 < α < 1 is the parameter adjusting the speed of the temperature decrease. The method comprises at least two other parameters: \(T_{init}\) and \(T_{end}\), the initial and final temperatures. Algorithm 7.1 provides the framework for basic simulated annealing.
Algorithm 7.1: Elementary simulated annealing. Countless variants of algorithms based on this framework have been proposed. Practical implementations do not return the solution s of the last iteration but the best solution found throughout the search
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figa_HTML.png
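This framework translates almost line for line into Python. In the sketch below, `f`, `random_move`, and `apply_move` are hypothetical callables standing for the problem-specific ingredients, and the toy run that follows is purely illustrative; this is a minimal sketch of Algorithm 7.1, not an excerpt from the book's codes.

```python
import math
import random

def simulated_annealing(f, random_move, apply_move, s, t_init, t_end,
                        alpha=0.95, rng=random.Random(0)):
    """Sketch of Algorithm 7.1: geometric cooling, Metropolis acceptance.
    Returns the best solution met during the search, not the last one."""
    best, best_val = s, f(s)
    t = t_init
    while t > t_end:
        m = random_move(s, rng)
        delta = f(apply_move(s, m)) - f(s)
        # Improvements are always accepted; a degradation of delta is
        # accepted with probability exp(-delta / t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            s = apply_move(s, m)
            if f(s) < best_val:
                best, best_val = s, f(s)
        t *= alpha  # the temperature is decreased at each iteration here
    return best

# Toy run: minimize x^2 over the integers with random +/-1 moves.
best = simulated_annealing(lambda x: x * x,
                           lambda s, rng: rng.choice([-1, 1]),
                           lambda s, m: s + m,
                           s=10, t_init=10.0, t_end=0.01)
```

At high temperatures, the toy run accepts many degrading steps; as t falls, the acceptance test degenerates into a pure descent.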
This framework is generally modified. First, the parameters defining the initial and final temperatures have a very different effect depending on the unit in which the fitness function is measured. Indeed, if f measures the length of a TSP tour, these parameters should be adapted depending on whether the unit is meters or kilometers. To make the algorithm more robust, the user is not asked to provide temperatures directly. Instead, the user specifies acceptance rates for degrading moves, \(\tau_{init}\) and \(\tau_{end}\), which is much more intuitive. The temperatures are then computed automatically from these rates. A random walk performing a few hundred or a few thousand steps can record statistics on the average Δ values.
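For instance, knowing the average degradation \(\bar{\Delta}\) recorded on such a random walk, a temperature achieving a target acceptance rate τ for degrading moves can be obtained by solving \(e^{-\bar{\Delta}/T} = \tau\) for T. A minimal sketch of this calibration (the numerical values are illustrative):

```python
import math

def temperature_for_rate(mean_degradation, target_rate):
    """Temperature T such that a move degrading the objective by
    mean_degradation is accepted with probability target_rate,
    i.e., exp(-mean_degradation / T) = target_rate."""
    return -mean_degradation / math.log(target_rate)

# Suppose the random walk measured an average degradation of 50:
t_init = temperature_for_rate(50.0, 0.5)    # one degrading move in two accepted
t_end = temperature_for_rate(50.0, 0.01)    # degrading moves almost never accepted
```

The same formula works for any measurement unit of f, since the temperature scales with the observed degradations.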
Frequently, the temperature is not decreased at each iteration (Line 8). Another parameter is introduced, defining the number of iterations performed with a given temperature.
Figure 7.2 illustrates the evolution of the tour length for a TSP with 225 cities. Code 7.1 was executed with an initial temperature of \(5 \cdot d_{max}/n\), a final temperature of \(20 \cdot d_{max}/n^2\), and α = 0.99, where \(d_{max}\) is the largest distance between two cities. The algorithm was provided with a relatively good initial tour. There is a significant deterioration of the latter during the first iterations, at high temperatures. This degradation is necessary to alter the structure of the starting solution and discover better solutions. About half of the iterations in this run are carried out unnecessarily, as the value of the best solution found no longer evolves.
Code 7.1 implements a very basic simulated annealing for the TSP. It is based on the 2-opt neighborhood structure. Algorithm 7.1 is adapted to decrease the temperature only every \(n^2\) iterations rather than at each iteration. Thus, a value of α between 0.8 and 0.99 produces satisfactory results, regardless of the instance size. Finally, the user must provide the absolute values of the initial and final temperatures. As mentioned above, \(T_{init} = 5 \cdot d_{max}/n\) and \(T_{end} = 20 \cdot d_{max}/n^2\) are values that can be suitable for starting a parameter tuning.
Code 7.1 tsp_SA.py: A basic simulated annealing implementation for the TSP
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figaaa_HTML.png

7.2 Threshold Accepting

Threshold accepting, proposed by Dueck and Scheuer [4], is a pure local search. It only moves from a solution to one of its neighbors. As in simulated annealing, demon, and great deluge algorithms, the moves are randomly chosen but are not systematically applied to the current solution. An improving move is always accepted. A move that deteriorates the solution is only accepted if the deterioration is lower than a given threshold. The latter is gradually decreased to reach zero, so that the method stops in a local optimum. Algorithm 7.2 provides the threshold accepting framework.
Algorithm 7.2: Threshold accepting. The values of the thresholds \(\tau_1, \ldots, \tau_T\) are not necessarily explicitly provided but calculated, for instance, by providing only the initial threshold and multiplying it by another parameter, α, at each round of R iterations
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figb_HTML.png
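A Python sketch of this framework follows; the helper callables and the toy run are illustrative assumptions, not an excerpt from the book's codes.

```python
import random

def threshold_accepting(f, random_move, apply_move, s, thresholds, r,
                        rng=random.Random(1)):
    """Sketch of Algorithm 7.2: for each threshold tau (decreasing, the
    last being 0), try r random moves and accept any move whose
    degradation is below tau; improving moves always pass the test."""
    best, best_val = s, f(s)
    for tau in thresholds:
        for _ in range(r):
            m = random_move(s, rng)
            if f(apply_move(s, m)) - f(s) < tau:
                s = apply_move(s, m)
                if f(s) < best_val:
                    best, best_val = s, f(s)
    return best

# Toy run: minimize x^2 over the integers with random +/-1 moves.
best = threshold_accepting(lambda x: x * x,
                           lambda s, rng: rng.choice([-1, 1]),
                           lambda s, m: s + m,
                           s=10, thresholds=[5, 2, 0], r=50)
```

With the final threshold at 0, the last round only accepts strict improvements, so the method ends in a local optimum.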
Gilli et al. [6] proposed an automated method for setting the thresholds. First, random moves are performed, recording the fitness difference of neighbor solutions. This allows determining the empirical distribution function F of the amplitude of the moves. Then, the T thresholds are fixed using the inverse of this function (see Fig. 7.3): \(\tau_t = F^{-1}(0.8 \cdot (T - t)/T)\). Put differently, the first threshold \(\tau_1\) accepts about 80% of the degrading moves, while the last, \(\tau_T = 0\), only allows improvements.
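Assuming the empirical distribution is estimated from the observed degradations only, and forcing the last threshold to 0 as stated above, this schedule can be sketched as:

```python
def thresholds_from_sample(deltas, n_thresholds):
    """Thresholds tau_t = F^{-1}(0.8 * (T - t) / T), where F is the
    empirical distribution of the observed degradations `deltas`."""
    ordered = sorted(d for d in deltas if d > 0)
    taus = []
    for t in range(1, n_thresholds + 1):
        q = 0.8 * (n_thresholds - t) / n_thresholds
        taus.append(ordered[int(q * (len(ordered) - 1))])
    taus[-1] = 0.0   # the last threshold only allows improvements
    return taus

# Example: degradations uniformly spread over 1..100, five thresholds.
taus = thresholds_from_sample(list(range(1, 101)), 5)
```

On this uniform sample, the thresholds decrease regularly from about the 64th percentile down to 0, and the first one indeed accepts roughly 80% of the degrading moves over the first round.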

7.3 Great Deluge Algorithm

The great deluge algorithm, proposed by Dueck [3], has similarities with the previous one. However, the absolute value of the fitness function limits the search progression instead of the amplitude of the moves. The name of this method comes from the legend that, as a result of incessant rain, all terrestrial beings eventually drowned, except for those on Noah’s Ark. The animals on the ground panicked and ran in all directions, everywhere except where there was water.
The analogy with a maximization process is made by considering random moves that are accepted as long as they do not lead to a solution whose quality is lower than a threshold L. The latter is the water level, which increases by a value of P at each iteration. This parameter simulates the strength of the rain. The process stops when the value of the current solution is less than L. Algorithm 7.3 provides the framework of this method. Its operating principle is illustrated in Fig. 7.4.
Algorithm 7.3: Great deluge algorithm. The algorithm can adapt to minimization problems. Hence, simulating the behavior of fish when the water level drops!
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figc_HTML.png
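A Python sketch of this framework in its maximization form follows; the toy objective, with its optimum at 7, and all parameter values are illustrative assumptions.

```python
import random

def great_deluge(f, random_move, apply_move, s, level, rain,
                 rng=random.Random(2)):
    """Sketch of Algorithm 7.3 (maximization): a random neighbor is
    accepted if its value stays above the water level, which rises by
    `rain` at every iteration; the search stops when the current
    solution is flooded."""
    best, best_val = s, f(s)
    while f(s) >= level:
        m = random_move(s, rng)
        neighbor = apply_move(s, m)
        if f(neighbor) > level:   # never step into the water
            s = neighbor
            if f(s) > best_val:
                best, best_val = s, f(s)
        level += rain             # the incessant rain
    return best

# Toy run: maximize -(x - 7)^2 over the integers with random +/-1 moves.
best = great_deluge(lambda x: -(x - 7) ** 2,
                    lambda s, rng: rng.choice([-1, 1]),
                    lambda s, m: s + m,
                    s=0, level=-100.0, rain=0.05)
```

The rising level progressively confines the random walk around the optimum; a stronger rain shortens the run at the risk of trapping the search early.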

7.4 Demon Algorithm

The demon algorithm simulates the behavior of a compulsive gambler who always bets larger amounts of money. The demon of gambling pushes the gambler to squander any winnings beyond a certain threshold, \(D_{max}\); once this threshold is reached, the gambler keeps on playing without accumulating more. The gambler enters the casino with a sum of D in pocket and stops playing when exhausted, after \(I_{max}\) bets. This last parameter appears in many iterative local searches to directly adjust the computational effort. Translated into local search terms, a bet consists of allowing a degrading move, but the loss cannot exceed the available budget. If the move improves the solution, the budget is increased accordingly, up to the maximum threshold. Algorithm 7.4 provides the framework of this method.
Algorithm 7.4: Demon algorithm. This algorithm is relatively simple to implement, but, as for threshold accepting, its parameters must be adjusted according to the numerical value of the data
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figd_HTML.png
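A Python sketch of this framework for minimization; the credit management follows the description above, and the helper callables and toy run are illustrative assumptions.

```python
import random

def demon_algorithm(f, random_move, apply_move, s, credit, credit_max,
                    i_max, rng=random.Random(3)):
    """Sketch of Algorithm 7.4 (minimization): a degrading move is
    accepted only if its cost increase fits within the demon's credit;
    improving moves refill the credit, capped at credit_max."""
    best, best_val = s, f(s)
    for _ in range(i_max):
        m = random_move(s, rng)
        delta = f(apply_move(s, m)) - f(s)
        if delta <= credit:                  # the loss stays affordable
            s = apply_move(s, m)
            credit = min(credit - delta, credit_max)
            if f(s) < best_val:
                best, best_val = s, f(s)
    return best

# Toy run: minimize x^2 over the integers with random +/-1 moves.
best = demon_algorithm(lambda x: x * x,
                       lambda s, rng: rng.choice([-1, 1]),
                       lambda s, m: s + m,
                       s=10, credit=5, credit_max=10, i_max=200)
```

Note that the credit is expressed in the same unit as f, which is why, as the caption notes, the parameters must be adjusted to the numerical values of the data.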

7.5 Noising Methods

The noising methods, proposed by Charon and Hudry [2], make the assumption that the data of the problem are not known with infinite precision. Under these conditions, even if the problem is convex and simple, an improvement method can be trapped in local optima resulting from artifacts. To make the improvement method more robust, a random noise is added, either to the data or to the move evaluation. For instance, the coordinates of a Euclidean TSP can be slightly changed. Taking these stochastic values into account, the resulting fitness function has no local optima. The latter are somehow erased by the random noise. The framework of noising methods is provided by Algorithm 7.5.
Algorithm 7.5: Noising method. At each move evaluation, random noise is generated according to the probability distribution noise(i), whose variance generally decreases with i
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Fige_HTML.png
A parameter of the method is a probability distribution, parameterized by the iteration number. Each time a solution is evaluated, a random noise occurrence is generated. Generally, its expectation is zero, and its variance decreases with the iteration number (see Fig. 7.5).
Toward the end of the run, the method gets closer and closer to an improvement method. The variance of the noise must naturally depend on the numerical data of the problem. To achieve this, one can incorporate the evaluation of the objective function in the probability distribution. The choice of the latter can transform a noising method into simulated annealing, threshold accepting, or other techniques described above. Code 7.2 implements a kind of noising method for the TSP.
Code 7.2 tsp_noising.py: Implementation of a noising method for the TSP. The noise distribution is uniform, and its amplitude decreases exponentially with the number of iterations performed
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figaab_HTML.png
To evaluate and perform a move in constant time, the data structure introduced in Section 5.1.3.3 has been used. However, this data structure does not allow evaluating a random move (i, j) in constant time. Indeed, when traversing the tour in a given direction, knowing a city i and its successor \(s_i\), we cannot immediately identify the city \(s_j\) that succeeds j.
The artifice used in this code is to systematically scan the whole neighborhood instead of randomly drawing the move. The noise added to the evaluation of a move is such that the acceptance criterion is identical to that of a simulated annealing.
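Independently of this implementation artifice, the general noising scheme can be sketched as follows; here the noise is uniform with a linearly decreasing amplitude, a simpler assumption than the exponential decrease used in Code 7.2, and the helpers and toy run are illustrative.

```python
import random

def noising_descent(f, random_move, apply_move, s, i_max, noise_init,
                    rng=random.Random(4)):
    """Sketch of Algorithm 7.5: a move is accepted when its *noised*
    cost difference is negative; the uniform noise amplitude decreases
    linearly to 0, so the method ends as a pure improvement method."""
    best, best_val = s, f(s)
    for i in range(i_max):
        amplitude = noise_init * (1 - i / i_max)   # variance shrinks with i
        m = random_move(s, rng)
        delta = f(apply_move(s, m)) - f(s)
        if delta + rng.uniform(-amplitude, amplitude) < 0:
            s = apply_move(s, m)
            if f(s) < best_val:
                best, best_val = s, f(s)
    return best

# Toy run: minimize x^2 over the integers with random +/-1 moves.
best = noising_descent(lambda x: x * x,
                       lambda s, rng: rng.choice([-1, 1]),
                       lambda s, m: s + m,
                       s=10, i_max=300, noise_init=30.0)
```

Replacing the uniform noise by another distribution changes the acceptance behavior; as noted above, a suitable choice reproduces simulated annealing or threshold accepting.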

7.6 Late Acceptance Hill Climbing

Another technique is similar to simulated annealing, threshold accepting, and great deluge. The core idea is to defer the acceptance decision for a neighbor solution. Instead of comparing its value with that of the current solution, it is compared with the value obtained h iterations earlier. Thus, a neighbor solution is accepted either if it is at least as good as the current solution or if it is better than the solution visited h iterations previously. The derived method is called Late Acceptance Hill Climbing (LAHC).
"Hill climbing" refers to an improvement method seeking to maximize an objective. Naturally, a descent method is obtained by changing the acceptance criterion of a neighbor solution. Implementing LAHC requires storing a list L of the values of the h previously visited solutions. This strategy allows a self-calibration of the criterion for accepting a worse solution. Unlike the methods presented above, LAHC is insensitive to the order of magnitude of the fitness values.
The framework of LAHC, given by Algorithm 7.6, does not specify a stopping criterion. A possibility is to set the probability p of being in a local optimum relative to the neighborhood M. The stopping criterion corresponding to this probability is to have performed \(|M|(\ln |M| - \ln (1 - p))\) iterations without improving the current solution.
Algorithm 7.6: Late acceptance hill climbing
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figf_HTML.png
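A Python sketch of this framework for minimization; overwriting, at each iteration, the value stored h iterations earlier with the current value is one common variant, and the helpers and toy run are illustrative assumptions.

```python
import random

def lahc(f, random_move, apply_move, s, h, i_max, rng=random.Random(5)):
    """Sketch of Algorithm 7.6 (minimization): a neighbor is accepted if
    it is at least as good as the current solution, or better than the
    solution visited h iterations earlier."""
    history = [f(s)] * h                 # values of the h previous solutions
    best, best_val = s, f(s)
    for i in range(i_max):
        m = random_move(s, rng)
        candidate = apply_move(s, m)
        if f(candidate) <= f(s) or f(candidate) < history[i % h]:
            s = candidate
            if f(s) < best_val:
                best, best_val = s, f(s)
        history[i % h] = f(s)            # replace the oldest stored value
    return best

# Toy run: minimize x^2 over the integers with random +/-1 moves.
best = lahc(lambda x: x * x,
            lambda s, rng: rng.choice([-1, 1]),
            lambda s, m: s + m,
            s=10, h=5, i_max=300)
```

The acceptance criterion only compares fitness values with each other, which is why the method is insensitive to their order of magnitude.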

7.7 Variable Neighborhood Search

Variable neighborhood search (VNS) [7] implements an idea called strategic oscillations. The search alternates intensification and diversification phases. The chief idea of this method is to rely on several neighborhoods \(M_1, \ldots, M_p\). A first neighborhood, \(M_1\), is exploited as usual to find local optima. This is one of the most elementary ways to intensify the search. The other neighborhoods allow escaping from these optima by performing random moves, which increasingly destroy the structure of the local optimum. Performing random moves is also one of the simplest ways to diversify the search and explore other portions of the solution space. The framework of a basic VNS is provided by Algorithm 7.7.
Algorithm 7.7: Variable neighborhood search. When a limited number p of neighborhoods are available, the algorithm is repeated several times
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figg_HTML.png
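A Python sketch of this framework; the one-dimensional toy objective, its steepest descent, and the shaking procedure are illustrative assumptions, far simpler than the TSP ingredients of Code 7.3.

```python
import random

def vns(f, local_search, shake, s, p, rounds, rng=random.Random(6)):
    """Sketch of Algorithm 7.7: improve with the first neighborhood
    (local_search); on failure, perturb the solution with increasingly
    disruptive neighborhoods M_1, ..., M_p."""
    s = local_search(s)
    for _ in range(rounds):          # repeat when all p neighborhoods fail
        k = 1
        while k <= p:
            candidate = local_search(shake(s, k, rng))
            if f(candidate) < f(s):  # success: back to the first neighborhood
                s, k = candidate, 1
            else:
                k += 1
    return s

# Toy objective with local optima at every multiple of 5; global optimum at 0.
f = lambda x: (x % 5) ** 2 + abs(x) // 5

def descend(x):
    """Steepest descent with +/-1 moves on the integers."""
    while True:
        nb = min((x - 1, x + 1), key=f)
        if f(nb) >= f(x):
            return x
        x = nb

# Shaking in M_k: jump k units in a random direction.
shake = lambda x, k, rng: x + rng.choice([-1, 1]) * k

result = vns(f, descend, shake, s=7, p=6, rounds=3)
```

Since only improving candidates are accepted, the result is never worse than the first local optimum; larger k values give the shake enough amplitude to leave its basin of attraction.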
A very limited number of neighborhoods are generally used. In this case, the method is repeated several times (see Problem 7.4). Several variants of this framework have been proposed: VNS descent, VNS decomposition, skewed VNS, and reduced VNS.
Code 7.3 provides a very simple VNS implementation for the TSP.
Code 7.3 tsp_VNS.py: VNS implementation for the TSP. The neighborhood \(M_k\) consists in swapping two cities k times. The repair method is an ejection chain. In addition to its extreme simplicity, this implementation requires no parameters
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figaac_HTML.png

7.8 GRASP

The greedy randomized adaptive search procedure (GRASP) was proposed by Feo and Resende [5]. It repeatedly improves, with a local search, a solution obtained with a greedy constructive method. The latter incorporates a random component so that it produces varied solutions. The method comprises two parameters: \(I_{max}\), the number of repetitions of the outer loop of the algorithm, and α, which adjusts the degree of randomization. The framework of the method is provided by Algorithm 7.8.
Algorithm 7.8: GRASP. The programmer must initially design a method, local_search, for improving a solution. The user must provide two parameters: \(I_{max}\), which sets the computational effort, and α, which sets the level of random choice. The differences from the greedy constructive Algorithm 4.2 are highlighted
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figh_HTML.png
Code 7.4 implements the construction of a solution using GRASP. In practice, this code is repeated several times, and the best solutions produced are retained.
Code 7.4 tsp_GRASP.py: GRASP implementation for the TSP. The randomized greedy construction is based on the nearest neighbor criterion. The improvement procedure is the ejection chain Code 12.3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-13714-3_7/MediaObjects/522503_1_En_7_Figaad_HTML.png
Setting the parameter α = 0 leads to a purely greedy constructive method. Unless many elements have the same incremental cost, repeating the constructions then makes no sense. Setting α = 1 leads to a purely random construction. The method then amounts to repeated local searches starting from random solutions. This technique is often used because it produces better solutions than a single local search run and requires negligible coding effort. To benefit from the advantages of the GRASP method, the α parameter must be tuned. Usually, the method reaches its full potential for values of α close to 0. It should additionally be noted that the initialization of the partial solution may include a random component. For the TSP, it can be the departure city. The incremental cost function may correspond to the nearest neighbor criterion.
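The randomized construction at the core of the method can be sketched as follows: an element enters the restricted candidate list when its incremental cost is at most \(c_{min} + \alpha(c_{max} - c_{min})\). The tiny one-dimensional "TSP" below is an illustrative assumption, not Code 7.4.

```python
import random

def grasp_construction(elements, incremental_cost, alpha,
                       rng=random.Random(7)):
    """Randomized greedy construction: at each step, pick uniformly
    among the candidates whose incremental cost is at most
    c_min + alpha * (c_max - c_min)."""
    solution, remaining = [], set(elements)
    while remaining:
        costs = {e: incremental_cost(solution, e) for e in remaining}
        c_min, c_max = min(costs.values()), max(costs.values())
        rcl = [e for e in remaining
               if costs[e] <= c_min + alpha * (c_max - c_min)]
        chosen = rng.choice(rcl)     # uniform choice in the restricted list
        solution.append(chosen)
        remaining.remove(chosen)
    return solution

# Tiny illustration: 5 "cities" on a line, nearest neighbor incremental cost.
pos = [0, 1, 3, 6, 10]
cost = lambda sol, e: abs(pos[e] - pos[sol[-1]]) if sol else pos[e]
greedy_tour = grasp_construction(range(5), cost, alpha=0.0)   # pure greedy
random_tour = grasp_construction(range(5), cost, alpha=1.0)   # pure random
```

With α = 0 the restricted list degenerates to the cheapest elements, reproducing the purely greedy construction; with α = 1 every remaining element is a candidate, giving the purely random construction discussed above.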
Problems
7.1
SA Duration
How many iterations does a simulated annealing perform if it starts with an initial temperature \(T_0\) and ends at temperature \(T_f\), knowing that the temperature is multiplied by α at each iteration?
7.2
Tuning GRASP
Try to tune the α parameter of the GRASP code on the TSPLIB problem instance tsp225. Do good values of α depend on the number of iterations \(I_{max}\) the method performs?
7.3
VNS with a Single Neighborhood
VNS requires p different neighborhoods and uses a particular neighborhood \(M_1\) to find local optima. If we only have \(M_1\), how can we build the other neighborhoods?
7.4
Record to Record
In tsp_VNS, n different neighborhoods are used, leading to a method without parameters. Could a more efficient algorithm be obtained by limiting the number of neighborhoods and repeating the search several times? How many neighborhoods, and how many times should the search be repeated?
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
References
6. Gilli, M., Këllezi, E., Hysi, H.: A data-driven optimization heuristic for downside risk minimization. J. Risk 8(3), 1–19 (2006)
7. Hansen, P., Mladenović, N.: An introduction to variable neighborhood search. In: Voß, S., Martello, S., Osman, I.H., Roucairol, C. (eds.) Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization, pp. 422–458. Kluwer, Dordrecht (1999). https://doi.org/10.1007/978-1-4615-5775-3_30
Metadata
Title: Randomized Methods
Author: Éric D. Taillard
Copyright year: 2023
DOI: https://doi.org/10.1007/978-3-031-13714-3_7
