Optimization Methods

There are several methods available to perform optimizations. The following provides a brief overview of the approach the methods use and, where appropriate, references to published material related to the methods. Options specific to the different methods are also described here.

Set up an Optimization by first Defining Payoffs and then setting up parameters to be search over using the Optimization Specs. It is in this panel that you select a method to use

Powell

Powell is an efficient conjugate gradient search as described in: Powell, M. J. D. (June 2009). The BOBYQA algorithm for bound constrained optimization without derivatives (Report). Department of Applied Mathematics and Theoretical Physics, Cambridge University (www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf retrieved September, 2017). It is a robust approach to finding the maximum value of the payoff in a small number of simulations for payoff surfaces that are reasonably well behaved. it is a local search technique, and may miss global optima for payoff surfaces that have multiple maxima or large areas of unchanging payoff values.

The additional global options for Powell are:

Tolerance is amount of a parameter change that is considered significant. This defaults to 10E-5 which is reasonable for parameters that are in the 1 to 100 range. Larger tolerance values will tend to speed convergence at the cost of precision.

Initial Step is the approximate expected size of change that should be made in the parameters. It is 1 by default. Use larger values if the parameters are large number, smaller if they are fractional numbers. This influences the efficiency of the early search process.

Max Sims specified the maximum number of simulations before the method will stop if it have not yet converged. This is a backstop to prevent excessive computation. Use -1 to not constrain the number of simulations. If you have specified additional starts each start will use up to this number of simulations.

The additional Per Parameter option for Powell is:

Scaling is the amount by which the parameter should be multiplied before applying the global tolerance and initial step criteria. Use this when some parameters are much larger in scale than others. For example is all parameters were fractions except one that was expected to be in the millions, you would apply a scaling of 1e-6 to the parameter in the millions. If all parameters are of the same magnitude, the default value of 1.0 will be fine.

Grid

Grid provides a mechanism for methodically searching parameters across their ranges. It works by changing each parameter across a sequence of values The total number of searches performed in a grid search is the product of the number of values searched for each parameter. If there are many parameters, this can be a very big number, so there are options to make the grid search more contained, or more progressive.

Grid ignores initial parameter values (and therefore does not ever perform additional starts) except when Sequential is selected as the search type .

The additional global option for Grid is:

Search Type: lets you choose between different ways to execute the grid search.

Exhaustive goes through all possible combinations. The number of searches required is the product of the number of steps for each of the parameters which can be a very big number.

Bisecting starts from the min and max for each parameter and then does the middle point of that, then the middle points of the middle points and so on. When a bisection would give a smaller interval for a parameter than the parameter's tolerance, the bisecting for that parameter stops. Bisecting has the advantage of testing extremes early, and then refining the evaluations. It can be used with large tolerances on some parameters to quicken the search, or simply canceled to review results.

Sequential goes through the parameters in sequence, instead of all at once. The first parameter is searched with the others left at their model values. The best value from the first is then used and the second parameter is searched and so on. The total number of simulations is thus the sum, rather than the product, of the number for each parameter. Sequential works well when there is strong independence among the parameter effects (for example, as occurs if they are additive to the payoff).

Latin Hypercube goes through all values by only testing each one a single time along each dimension. It can be useful when there are one or two parameters that contribute most strongly to the payoff, but these parameters are not known in advance.

The additional per parameter options for Grid are:

Number of steps is an positive integer between 2 and 101.This is the number of steps between min and max that will be taken for the parameter. The default is 11.

Tolerance is a positive number (bigger than 1e-12). It is used only if Bisecting is selected and determines when the bisecting process will stop for this parameter.

Differential Evolution

Differential evolution (DE) is a population-based evolutionary method for real-valued global optimization that creates new solutions using rapidly computed differences in existing solutions (Price, K.V., Storn, R.M., and Lampinen, J.A., 2005. Differential evolution: A practical approach to global optimization. Springer-Verlag, Berlin, Germany and Storn, R., Price, K., 1997. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4) 341-359). This enables DE to follow the contours of the fitness landscape at less computational cost than other methods.

DE uses a real-valued representation and operates by combining existing solutions with weighted difference vectors formed from other solutions. By using weighted differences of existing solutions, it automatically adapts its step size and its orientation as convergence occurs, shifting from a global search (exploration) to a local search (exploitation) method (Price et al., 2005). It tends to converge to the global optimum faster (in fewer fitness function evaluations) than other real-valued approaches.

There are a number of DE variants, named according to the template DE/mutation/diffs/crossover where mutation is the method used to mutate the parent vector, diffs is the number of pairs of difference vectors used in mutation (i.e., how many differences, usually only 1 or 2), and crossover, if present, is the method used to cross the new vector with the parent vector.

Multiobjective optimization uses fast non-dominated sorting to select values on the non-dominated front. It includes several other improvements designed to speed convergence to the global optimum and give a uniformly-spaced non-dominated front. For more information, see: Chichakly, K. J., & Eppstein, M. J. (2013, July). Improving uniformity of solution spacing in biobjective evolution. In Proceedings of the 15th annual conference companion on Genetic and evolutionary computation (pp. 87-88). ACM.

Mutation Type

DE has specific names for the members of the parent population used in mutation. The parent, i.e., the member of the population that will be replaced if the child is better, is known as the target vector. The member of the population that is being mutated to form the child is known as the base vector. Finally, the members used to create the difference (perturbation) for the mutation are known as the difference vectors. This implementation supports the five standard types of DE mutation:

rand: Favors exploration with the base and difference vectors randomly chosen. When combined with binomial crossover, this is known as classic DE.

best: Favors exploitation with the base vector always being the best solution from the parent population, modulated by randomly chosen difference vectors. Compared to rand, it “usually speeds convergence, reduces the odds of stagnation, and lowers the probability of success,” (Price et al., 2005, p. 73) where success refers to converging to the global optimum rather than a local one.

target-to-rand: Compromise between exploration and exploitation leaning toward exploration with the base vector always being the target vector, modulated by both randomly chosen difference vectors and the difference between a randomly chosen vector and the target vector.

target-to-best: Compromise between exploration and exploitation leaning toward exploitation with the base vector always being the target vector, modulated by both randomly chosen difference vectors and the difference between the best solution from the parent population and the target vector.

either-or: Alternates between DE/rand/1 and a recombinant version of DE/rand/2 based on a probability. The main motivation behind this formula is to fill discovered gaps in the mutation/recombination space (Price et al., 2005). It is considered more effective than rand when the payoff functions cannot be decomposed.

Note It is usually best to start with rand as the mutation type.

Crossover Type

Use the Crossover Type to specify how the characteristics of the next generation population are derived from the current generation.

bin: A separate binomial trial is performed for each gene, with the child gene replacing the target vector gene only if the uniformly drawn random number is below the crossover probability. This is also known as uniform crossover and is the method used in Classic DE.

exp: Using an exponential distribution and starting at a random gene, genes are copied from the child to the target as long as uniformly drawn random numbers remain below the crossover probability.

Note At least one gene is always copied with the above methods to ensure the new potential solution is not the same as the base vector.

Note Use bin unless you have a specific need for exp.

Minimization, maximization, constraints and payoff definitions

The differential evolution method allows constrained multicriteria optimization. When you activate a payoff you can set the optimizer to minimize it, maximize it, or treat it as a constraint.

Note Using negative payoff weights and maximize will give the same results as using positive payoff weights and minimize.

Generations specifies the number of generations to run the evolutionary algorithm. Keep in mind that (generations + 1)*population_size is the total number of times that your model will be run and plan accordingly. There is always a trade-off between generations, population size, and convergence.

Population size specifies the number of individuals to include in each generation. For single-objective problems, this can often be small, say 5 or 10. As you increase the number of objectives, you need to quickly increase this. For example, while 5 may be enough for one objective, you may need a minimum of 50 for two objectives and 500 for three objectives. This affects both convergence speed and the number of values on the final non-dominated front.

Tolerance is optional and specifies a convergence criteria that enables the optimization to terminate before the specified number of generations if the relative change in the mean between successive generations for all objectives is below this value.

Scaling factor specifies the mutation scaling factor applied to difference vectors. It is typically in the range of 0 to 1, though some problems are solved more readily with values up to 2. Smaller values favor exploration and larger values favor exploitation. The default value is typical.

Crossover prob is the crossover probability used with binomial or exponential crossover. The best results occur in the range of 0 to 0.2 or 0.9 to 1 (Price et al., 2005). Lower values favor exploitation while larger values favor exploration. The default value is a good place to start.

K/rand prob is optional and specifies either the recombination scaling factor for mutation types target-to-rand and target-to-best or the probability of selecting rand (mutation) over the recombinant version rand/2 in mutation type either-or. It is not used by any other mutation type. If not specified, the default recombination scaling factor is the same as the mutation scaling factor and the probability of choosing rand is set to 0.5. These are both good starting places, so, in general, leave this blank.

Seed is the random number seed used for the optimization. Set this if you want to be able to replicate your results at a later date. Leave it set to 0 to randomly choose a different seed each time the optimization is run.

Stepper

Stepper is a very simple optimization that steps multiples of the specified parameter tolerances and completes when the step size is decreased to the parameter tolerance. The method is useful because it tends to converge relatively quickly and in a predicable manner. The quick convergence is dependent on good selections for both the parameter and payoff tolerances and can easily be premature for payoffs with large flat areas. This makes the method a good choice when performing additional starts.

The additional global options for Stepper are:

Step Multiplier is an integer between 2 and 512. It specified how big an initial step to attempt relative to the individual parameter tolerances. The default value is 32. If the step is too big (in that it does not improve the payoff) the multiplier is cut in half. When the multiplier hits 1, the method stops. The default value for this is 32. It does not have to be a power of 2, but given the way the method works that makes the most sense.

Payoff Tolerance is a positive number (bigger than 1e-8). It specifies the change in the payoff that is not significant when an individual parameter is changed by its tolerance. If changing a parameter falls below this threshold, that parameter is dropped from the stepping process. If no parameter has a significant effect on the payoff the method stops. This value needs to be appropriate for the payoff being used. The default value of 0.01 is arbitrary.

The additional per parameter options for Stepper are:

Tolerance is a positive number (bigger than 1e-12). It specifies the smallest change in a parameter that is significant. Changes smaller than this will not be considered.