Subjects: Evolutionary computation. Information technology–Mathematics.1
A Smörgåsbord of Search Algorithms
In their chapter 3.8, "A Smörgåsbord of Search Algorithms", DEM present Table 3.2, a list of some search algorithms. Two observations:
- Knuth's definition of a search covers only a finite search space:
"In general, we shall suppose that a set of N records has been stored, and the problem is to locate the appropriate one. As in the case of sorting, we assume that each record includes a special field called its key."
- some of the methods were developed after 1973, the year in which, according to DEM, Knuth's book was published
method | source | search problem |
---|---|---|
active set method | J. Nocedal and S. Wright, Numerical Optimization (Springer Science & Business Media, 2006). | optimization problem: $\min_{x \in \mathbb{R}^n}f(x)$ subject to $\begin{cases}c_i(x) = 0, &i \in \mathcal{E}\\c_i(x) \ge 0,& i \in \mathcal{I} \end{cases}$ |
adaptive coordinate descent | I. Loshchilov, M. Schoenauer, and M. Sebag, “Adaptive Coordinate Descent.” In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (ACM, 2011), pp. 885–892. | (separable) continuous optimization problems |
alpha–beta pruning | Donald E. Knuth and Ronald W. Moore, “An analysis of alpha-beta pruning.” Artif Intel, 6(4), pp. 293–326 (1975). | "searching a large tree of potential continuations" (p. 234) |
ant colony optimization | M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: optimization by a colony of cooperating agents.” IEEE Transactions on Systems, Man, and Cybernetics — Part B, 26(1), pp. 29–41 (1996). | stochastic combinatorial optimization (here) |
artificial immune system optimization | Leandro N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach (Springer, 2002), pp. 57–58. | |
auction algorithm | Dimitri P. Bertsekas, “A distributed asynchronous relaxation algorithm for the assignment problem.” Proceedings of the IEEE International Conference on Decision and Control, pp. 1703–1704 (1985). | |
Berndt–Hall–Hall–Hausman algorithm | Ernst R. Berndt, Bronwyn H. Hall, Robert E. Hall, and Jerry A. Hausman, “Estimation and inference in nonlinear structural models.” Annals of Economic and Social Measurement, 3(4), pp. 653–665 (1974). | non-linear least squares problems |
blind search | ||
branch and bound | Patrenahalli M. Narendra and K. Fukunaga, “A branch and bound algorithm for feature subset selection.” IEEE Transactions on Computers, C-26(9), pp. 917–922 (1977). | |
branch and cut | M. Padberg and G. Rinaldi, “A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems.” SIAM Rev, 33(1), pp. 60–100 (1991). | |
branch and price | Cynthia Barnhart, Ellis L. Johnson, George L. Nemhauser, Martin W.P. Savelsbergh, and Pamela H. Vance, “Branch-and-price: Column generation for solving huge integer programs.” Operations Research, 46(3), pp. 316–329 (1998). | |
Broyden–Fletcher–Goldfarb–Shanno (BFGS) method | J. Nocedal and Stephen J. Wright, Numerical Optimization, 2nd edition (Springer-Verlag, Berlin, New York, 2006). | |
Constrained optimization by linear approximation (COBYLA) | Thomas A. Feo and Mauricio G.C. Resende, “A probabilistic heuristic for a computationally difficult set covering problem.” Op Res Lett, 8(2), pp. 67–71 (1989). | |
conjugate gradient method | A.V. Knyazev and I. Lashuk, “Steepest descent and conjugate gradient methods with variable preconditioning.” SIAM J Matrix Anal Appl, 29(4), pp. 1267–1280 (2007). | linear system with a real symmetric positive definite matrix of coefficients A |
CMA-ES (covariance matrix adaptation evolution strategy) | Y. Akimoto, Y. Nagata, I. Ono, and S. Kobayashi. “Bidirectional relation between CMA evolution strategies and natural evolution strategies.” Parallel Problem Solving from Nature, PPSN XI, pp. 154–163 (Springer, Berlin Heidelberg, 2010). | |
criss-cross algorithm | Dick den Hertog, C. Roos, and T. Terlaky, “The linear complementarity problem, sufficient matrices, and the criss-cross method.” Linear Algebra Appl, 187, pp. 1–14 (1993). | |
cross-entropy optimization | R.Y. Rubinstein, “Optimization of computer simulation models with rare events.” Eur J Ops Res, 99, pp. 89–112 (1997). R.Y. Rubinstein and D.P. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning (Springer-Verlag, New York, 2004). | given a set $\mathcal{X}$ and an $\mathbb{R}$-valued function $S$ on $\mathcal{X}$, determine $\max_{\mathbf{x} \in \mathcal{X}} S(\mathbf{x})$ (here p. 4) |
cuckoo search | X.S. Yang and S. Deb, “Cuckoo search via Lévy flights.” World Congress on Nature & Biologically Inspired Computing (NaBIC 2009). IEEE Publications, pp. 210–214. arXiv:1003.1594v1. | |
Davidon’s variable metric method | W. C. Davidon, “Variable metric method for minimization.” AEC Research Development Rept. ANL-5990 (Rev.) (1959). | |
differential evolution | P. Rocca, G. Oliveri, and A. Massa, “Differential evolution as applied to electromagnetics.” Antennas and Propagation Magazine, IEEE, 53(1), pp. 38–49 (2011). | |
eagle strategy | Xin-She Yang and Suash Deb, “Eagle strategy using Lévy walk and firefly algorithms for stochastic optimization.” Nature Inspired Cooperative Strategies for Optimization (NICSO 2010) (Springer Berlin Heidelberg, 2010), pp. 101–111. | |
evolutionary programs | Jacek M. Zurada, R.J. Marks II and C.J. Robinson; Editors, Computational Intelligence: Imitating Life (IEEE Press, 1994). M. Palaniswami, Y. Attikiouzel, Robert J. Marks II, D. Fogel, and T. Fukuda; Editors, Computational Intelligence: A Dynamic System Perspective (IEEE Press, 1995). | |
evolutionary strategies | ||
exhaustive search | ||
Fibonacci search | David E. Ferguson, “Fibonaccian searching.” Communications of the ACM, 3(12), p. 648 (1960). J. Kiefer, “Sequential minimax search for a maximum.” Proceedings of the American Mathematical Society, 4(3), pp. 502–506 (1953). | |
firefly algorithm | Xin-She Yang, “Firefly algorithms for multimodal optimization.” In Stochastic Algorithms: Foundations and Applications (Springer Berlin Heidelberg, 2009), pp. 169–178. | |
Fletcher–Powell method | R. Fletcher and M.J.D. Powell, “A rapidly convergent descent method for minimization.” Computer J. (6), pp. 163–168 (1963). | |
genetic algorithms | David E. Goldberg, Genetic Algorithms in Search Optimization and Machine Learning (Addison Wesley, 1989). R. Reed and R.J. Marks II, “Genetic Algorithms and Neural Networks: An Introduction.” Northcon/92 Conference Record (Western Periodicals Co., Ventura, CA, Seattle WA, October 19–21, 1992), pp. 293–301. | |
glowworm swarm optimization | K.N. Krishnanand and D. Ghose. “Detection of multiple source locations using a glowworm metaphor with applications to collective robotics.” Proceedings of the 2005 IEEE Swarm Intelligence Symposium (SIS 2005), pp. 84–91 (2005). | |
golden section search | M. Avriel and Douglass J. Wilde, “Optimality proof for the symmetric Fibonacci search technique.” Fibonacci Quarterly, 4, pp. 265–269 (1966). | |
gradient descent | Jan A. Snyman, Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms (Springer Publishing, 2005). | constrained optimization problem: $\min_{\mathbf{x}} f(\mathbf{x})$, $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$, subject to $\begin{cases}g_j(\mathbf{x}) \le 0, & j=1,2,\ldots,m \\ h_j(\mathbf{x}) = 0, & j=1,2,\ldots,r\end{cases}$ |
great deluge algorithm | Gunter Dueck, “New optimization heuristics: the great deluge algorithm and the record-to-record travel.” J Comp Phys, 104(1), pp. 86–92 (1993). | |
harmony search | Zong Woo Geem, “Novel derivative of harmony search algorithm for discrete design variables.” Applied Mathematics and Computation, 199(1), pp. 223–230 (2008). | |
imperialist competitive algorithm | Esmaeil Atashpaz-Gargari and Caro Lucas, “Imperialist competitive algorithm: an algorithm for optimization inspired by imperialistic competition.” 2007 IEEE Congress on Evolutionary Computation (CEC 2007), pp. 4661–4667 (2007). | |
intelligent water drop optimization | Hamed Shah-Hosseini, “The intelligent water drops algorithm: a nature-inspired swarm-based optimization algorithm.” Int J Bio-Inspired Comp, 1(1/2), pp. 71–79 (2009). | |
Karmarkar’s algorithm | Narendra Karmarkar, “A new polynomial-time algorithm for linear programming.” Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pp. 302–311 (1984). | |
Levenberg–Marquardt algorithm | Kenneth Levenberg, “A Method for the Solution of Certain Non-Linear Problems in Least Squares.” Quart App Math, 2, pp. 164–168 (1944). | |
Linear, Quadratic, Integer and Convex Programming | Alexander Schrijver, Theory of Linear and Integer Programming (John Wiley & Sons, 1998). Yurii Nesterov and Arkadii Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, Vol. 13 (Society for Industrial and Applied Mathematics, Philadelphia, 1994). | given a subset $\Pi \subset \Sigma^* \times \Sigma^*$, where $\Sigma$ is some alphabet, then the search problem is: given string $z \in \Sigma^*$, find a string $y$ such that $(z,y) \in \Pi$, or decide that no such string $y$ exists. |
Nelder–Mead method | K.I.M. McKinnon, “Convergence of the Nelder–Mead simplex method to a non-stationary point.” SIAM J Optimization, 9, pp. 148–158 (1999). | |
Newton–Raphson method | E. Süli and D. Mayers, An Introduction to Numerical Analysis (Cambridge University Press, 2003). | |
one-at-a-time search | A.H. Boas, “Modern mathematical tools for optimization,” Chem Engrg (1962). | |
particle swarm optimization | J. Kennedy and R. Eberhart, “Particle Swarm Optimization.” Proceedings of IEEE International Conference on Neural Networks IV, pp. 1942–1948 (1995). | |
pattern search | A. W. Dickinson, “Nonlinear optimization: Some procedures and examples.” Proceedings of the 19th ACM National Conference (ACM, 1964), pp. 51–201. | |
POCS (alternating projections onto convex sets) | Robert J. Marks II, Handbook of Fourier Analysis & its Applications (Oxford University Press, 2009). | |
razor search | J.W. Bandler and P.A. Macdonald, “Optimization of microwave networks by razor search.” IEEE Trans. Microwave Theory Tech., 17(8), pp. 552–562 (1969). | |
Rosenbrock methods | H.H. Rosenbrock, “An automatic method for finding the greatest or least value of a function.” Comp. J., 3, pp. 175–184 (1960). | |
sequential unconstrained minimization technique (SUMT) | John W. Bandler, “Optimization methods for computer-aided design.” IEEE Transactions on Microwave Theory and Techniques, 17(8), pp. 533–552 (1969). | |
shuffled frog-leaping algorithm | Muzaffar Eusuff, Kevin Lansey, and Fayzul Pasha, “Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization.” Engineering Optimization, 38(2), pp. 129–154 (2006). | |
simplex methods | M.J. Box, “A new method of constrained optimization and a comparison with other methods.” Computer J., (8), pp. 42–52 (1965). J.A. Nelder and R. Mead, “A simplex method for function minimization.” Computer J., 7, pp. 308–313 (1965). | |
simulated annealing | S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, “Optimization by simulated annealing.” Science, 220(4598), pp. 671–680 (1983). | |
social cognitive optimization | X.-F. Xie, W. Zhang, and Z. Yang, “Social cognitive optimization for nonlinear programming problems.” Proceedings of the First International Conference on Machine Learning and Cybernetics, 2, pp. 779–783 (Beijing, 2002). | |
stochastic gradient search | James C. Spall, Introduction to Stochastic Search and Optimization (2003). | Find the value(s) of a vector $\boldsymbol{\theta} \in \Theta$ that minimize a scalar-valued loss function $L(\boldsymbol{\theta})$, or: find the value(s) of $\boldsymbol{\theta} \in \Theta$ that solve the equation $\mathbf{g}(\boldsymbol{\theta}) = \mathbf{0}$ for some vector-valued function $\mathbf{g}(\boldsymbol{\theta})$ |
stochastic hill climbing | Brian P. Gerkey, Sebastian Thrun, and Geoff Gordon, “Parallel stochastic hillclimbing with small teams.” Multi-Robot Systems. From Swarms to Intelligent Automata, Volume III, pp. 65–77. (Springer Netherlands, 2005). | |
Tabu search | F. Glover, “Tabu Search — Part I.” ORSA J Comput, 1(3), pp. 190–206 (1989). “Tabu Search — Part II”, ORSA J Comput, 2(1), pp. 4–32 (1990). | |
Tree search | Athanasios K. Sakalidis, “AVL-Trees for Localized Search.” Inform Control, 67, pp. 173–194 (1985). R. Seidel and C.R. Aragon, “Randomized search trees.” Algorithmica, 16(4–5), pp. 464–497 (1996). | |
Zionts–Wallenius method | S. Zionts and J. Wallenius, “An interactive programming method for solving the multiple criteria problem.” Manage Sci, 22(6), pp. 652–663 (1976). |
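Most of the sources in this table are papers that introduce or analyze one particular method. To make at least one row concrete, here is a minimal Python sketch of the golden section search - my own illustration, not taken from any of the cited sources - under the usual assumption that $f$ is unimodal on the interval $[a, b]$:

```python
import math

def golden_section_search(f, a, b, tol=1e-8):
    """Minimize a unimodal f on [a, b]: shrink the interval by 1/phi per step.
    For clarity, f is re-evaluated at both interior points in every step;
    the classical version reuses one of the two evaluations."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    while abs(b - a) > tol:
        c = b - invphi * (b - a)     # left interior point
        d = a + invphi * (b - a)     # right interior point
        if f(c) < f(d):
            b = d                    # the minimum lies in [a, d]
        else:
            a = c                    # the minimum lies in [c, b]
    return (a + b) / 2

# Example: the minimum of (x - 2)^2 on [0, 5] is found near x = 2.
print(golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0))
```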
But there are some textbooks, too: I tried to look them up and to quote what their authors define as their search problem. Nocedal and Snyman both describe the classical optimization problem: here, the search space is a subset of an $n$-dimensional vector space $V$ over $\mathbb{R}$, obtained by restricting $V$ with some equations and inequalities. Finding the target means minimizing an $\mathbb{R}$-valued function on this set.
On the other hand, Wolpert and Macready - in their classical paper "No Free Lunch Theorems for Optimization"3 - look at two finite sets: $\mathcal{X}$ and an ordered $\mathcal{Y}$. They wish to minimize a function $f: \mathcal{X} \rightarrow \mathcal{Y}$ by finding an $x \in \mathcal{X}$ such that $f(x)$ is minimal.
What do all these approaches have in common? A set to search on and a function which can be optimized. In most cases, the range of the function is $\mathbb{R}$, $\mathbb{Z}$, or $\mathbb{N}_0$, but for some problems (as mentioned in section 3.7.2 "Pareto optimization and optimal sub-optimality" of Introduction to Evolutionary Informatics), another partially ordered set is used. I will ignore this for the time being and just choose an ordered set for my definition of an optimization problem, which should cover virtually all the cases discussed above:
**General Optimization Problem**

given:

- a set $\Omega$,
- an ordered set $\mathcal{Y}$, and
- a function $f: \Omega \rightarrow \mathcal{Y}$,

find $x \in \Omega$ such that $f(x) = \min f$.

As said before, optimizing and searching are virtually the same, but to stress the character of a search I introduce a target - something which is mentioned in all the searches of DEM. So, my search problem is:
**General Search Problem**

given:

- a set $\Omega$,
- an ordered set $\mathcal{Y}$,
- a target $T \subset \Omega$, and
- a function $f: \Omega \rightarrow \mathcal{Y}$ such that $T = \{\tau \in \Omega \mid f(\tau) = \min f\}$,

find $x \in T$.

Nothing substantial has changed; the definition just became a little more verbose. I am quite sure that most authors of the papers in the table would accept this as a good attempt at a definition - but is it the search problem which DEM have in mind?
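Both of my definitions are constructive, at least for a finite $\Omega$: a toy sketch in Python (my own illustration, not DEM's) shows that an exhaustive search recovers the target $T$ from $f$ alone:

```python
def exhaustive_search(omega, f):
    """General Search Problem, finite case: since T is defined as the set of
    minimizers of f, evaluating f everywhere identifies T exactly."""
    fmin = min(f(x) for x in omega)            # min f
    return [x for x in omega if f(x) == fmin]  # T = {tau in Omega : f(tau) = min f}

# Toy example: Omega = {0, ..., 9} and f(x) = (x - 7)^2, hence T = {7}.
print(exhaustive_search(range(10), lambda x: (x - 7) ** 2))  # [7]
```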
On page 48, they provide an example of a search credited to Walter Bradley:
> Kirk is an armadillo foraging for grubs when he is bitten by a spider that makes him blind. Kirk wants to return to his armadillo hole, but is disoriented. He knows, though, that his hole is at the lowest elevation in the immediate area, so he balls up and rolls downhill to his hole. When Kirk does this, he is not explicitly seeking his hole. His surroundings are fortuitously designed to take him there. Kirk’s target is thus implicit in the sense it is not specifically sought, but is a result of the environment’s action on him. He can bounce off of trees and be kicked around by playful kids. And repeated trials of rolling down the hill might take drastically different paths. But ultimately, Kirk will end up in his hole at the bottom of the hill. Kirk reaches his home because of information he acquires from his environment. The environment must be designed correctly for this to happen.

Here, $\Omega$ is Kirk's habitat and $f$ is given by the elevation. What is surprising is that DEM make a distinction between the minimum of the function $f$ and Kirk's intended target $T$, his burrow. Luckily, both coincide, but DEM imply that this is not necessarily the case!
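Kirk's rolling is, in effect, a stochastic descent on the elevation function: noise changes the path, not the endpoint. A toy simulation (the landscape and all parameters are invented purely for illustration):

```python
import random

def roll_downhill(elevation, x0, steps=10_000, kick=1.0):
    """Kirk as stochastic descent: try a random step and keep it only if it
    does not lead uphill; repeated trials take different paths but end up
    at the same lowest point."""
    x = x0
    for _ in range(steps):
        candidate = x + random.uniform(-kick, kick)
        if elevation(candidate) <= elevation(x):
            x = candidate
    return x

# A single valley whose lowest point (the hole) is at x = 0.
print(roll_downhill(lambda x: x * x, x0=50.0))  # close to 0 on every trial
```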
Next, they revisit their "pancake search example": here, the taste of the pancake is a function that depends smoothly on a variety of factors - amounts of ingredients, oven temperature, baking time, etc. - the possible combinations of which make up $\Omega$. On this $\Omega$, a cook looks for the best taste by optimizing the taste function. Now, they restrict $\Omega$ by additional conditions to a subset $\Omega'$, such that the original extremum of $f$ does not lie in the new, restricted set.
For the definitions of optimization/search problem above, this does not pose a problem: there is now the set $\Omega'$ to search on, looking for the optimum of $f|_{\Omega'}$. Though the new solution will taste worse than the original one, the new target is the solution of the new restricted problem.
Not so for DEM: "If, however, the environment is constrained in a negative way, the target may never be found even if it was available prior to the alteration."
That is the great difference between the problems which all the other scientists discuss and those of DEM: DEM have decoupled the optimum of the function from the target, arriving at quite another version of a search problem:
**DEM's Search Problem**

given:

- a set $\Omega$,
- an ordered set $\mathcal{Y}$,
- a target $T \subset \Omega$, and
- a function $f: \Omega \rightarrow \mathcal{Y}$,

find $x \in T$.
The Problems with DEM's Search Problem
First, there is of course the problem of applicability: it is not clear how any of DEM's results is relevant to the problems in the table, as those concern fundamentally different problems.

Then there is a problem of procedure: for a search or optimization algorithm, generally some information about $\Omega$ is given, and (a finite number of) values of $f$ can be obtained. If $T$ is independent of $f$, how is it ever possible to say that a target was hit? This additional information can only be given ex cathedra afterwards!
Not every one of the search algorithms listed in the table will always identify the target, but in many cases this is possible - at least theoretically: where it is feasible, an exhaustive search will always give you the target. Not so for DEM: even if you have calculated $f$ for all elements of $\Omega$ and found the optimum, this does not have to be the intended target, which still has to be revealed.
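To put this problem of procedure into code: under DEM's definition, even an exhaustive search needs an additional oracle for $T$; knowing $f$ completely is not enough. In this toy sketch, the oracle `is_in_target` is my hypothetical stand-in for the information that, as argued above, can only be supplied afterwards:

```python
def exhaustive_search_dem(omega, f, is_in_target):
    """DEM's Search Problem: T is independent of f, so even after evaluating
    f on all of Omega, an external oracle must confirm whether we hit T."""
    optimum = min(omega, key=f)            # the best point according to f ...
    return optimum, is_in_target(optimum)  # ... which may or may not lie in T

# Toy example: f points at 7, but the independently chosen target is {3}.
x, hit = exhaustive_search_dem(range(10), lambda x: (x - 7) ** 2,
                               lambda x: x == 3)
print(x, hit)  # 7 False - the optimum of f misses the target
```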
Why do DEM use their definition?
I would like to answer this question using Dawkins's weasel. Then

- $\Omega$ is the set of strings of 28 characters chosen from the alphabet ABCDEFGHIJKLMNOPQRSTUVWXYZ plus *, a sign indicating a space,
- $T=$METHINKS*IT*IS*LIKE*A*WEASEL
- $f$ is given by the number of correct letters - a number from 0 to 28.
What happens if the intended target is changed - say, to I*REALLY*DO*NOT*LIKE*WEASELS?

- My answer would be fantastic: if I*REALLY*DO*NOT*LIKE*WEASELS is the target, then it is the optimum of $f$, so the $f$ for finding this phrase is the number of letters in common with I*REALLY*DO*NOT*LIKE*WEASELS...
- DEM's answer would be abysmal: though the target is I*REALLY*DO*NOT*LIKE*WEASELS, $f$ is still defined as the number of letters in common with METHINKS*IT*IS*LIKE*A*WEASEL. The algorithm would result in METHINKS*IT*IS*LIKE*A*WEASEL - and miss the target, as the sketch below illustrates.
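A minimal weasel sketch (population size and mutation rate are my guesses, not values from Dawkins or DEM) makes this tangible: the algorithm converges to whatever phrase $f$ encodes, and renaming the "intended" target changes nothing:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*"
PHRASE = "METHINKS*IT*IS*LIKE*A*WEASEL"  # f counts agreements with this phrase
TARGET = "I*REALLY*DO*NOT*LIKE*WEASELS"  # a DEM-style target, decoupled from f

def f(s):
    """Number of positions at which s agrees with PHRASE (0 to 28)."""
    return sum(a == b for a, b in zip(s, PHRASE))

def weasel(pop=100, rate=0.05):
    """Hill climbing on f: mutate, keep the fittest string, repeat."""
    parent = "".join(random.choice(ALPHABET) for _ in range(len(PHRASE)))
    while f(parent) < len(PHRASE):
        children = ["".join(random.choice(ALPHABET) if random.random() < rate
                            else c for c in parent) for _ in range(pop)]
        parent = max(children + [parent], key=f)
    return parent

result = weasel()
print(result)            # METHINKS*IT*IS*LIKE*A*WEASEL
print(result == TARGET)  # False: the search optimized f, not the "target"
```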
How does DEM's search problem work for evolution?
Some interesting characters make a cameo in DEM's textbook: not only Bob and Monica, the pirates X, Y, and Z, and Melody and Maverick, but also God and Abraham. In this spirit, I would like to invent a dialogue between God and Darwin's Bulldog:

Bulldog: "The horse is a marvellous creature: fully adapted to its niche, really, survival of the fittest at play."

God: "Oh no, that one is a total disaster - it may function better than any other creature in its environment, but I was aiming for pink unicorns."
Perhaps these are all strawmen?
It could be that I have erected an elaborate strawman, and that the search problem which I attributed to DEM has nothing to do with their ideas. In this case, it should be easy for DEM - or their apologists - to come forward with their definition. Or perhaps - if I am right - they may just wish to explain why their model is not horrible.

1. Still thankful for the nice header, Tom!↩
2. Obviously, the work is not completed yet. I will look up more in the future - and I will be grateful for any contribution to this project!↩
3. David H. Wolpert, William G. Macready "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation Vol. 1, No. 1, April 1997, p. 68↩
Hi DiEb,
Two areas where target sets are detached from objective function extrema are surrogate functions (i.e., when you want to optimize one function, but it is intractable, so you settle for a surrogate whose extrema are hopefully close to the true target), and unreliable objective functions (such as noisy evaluations, which means you may actually be optimizing towards the wrong target). These are ever present issues in applied data science and I'm sure you can think of other cases without too much effort. So, allowing different possible relationships between target sets and information resources like objective functions makes sense. If nothing else, it is a more flexible model that includes the usual state of affairs where the target set is fixed to be the minimum/maximum of the function as a special case. What matters, in the end, is the degree of dependence between the target set and the objective function, namely, how much information about the target can you gain by exploring the objective function?
Since DEM have evolutionary concerns in mind, it also makes sense that their targets need to be defined independently from the objective function, or else when we say "Evolution can find functional small targets in large spaces, such as eyes, echolocators, wings, and flagella" we are essentially making a tautological statement, because the targets of evolution are defined simply to be the things it produces. Of course evolution can produce what it produces. However, these targets are identifiable independently of the fitness functions of nature, and there is no a priori guarantee that such fitness functions will have maxima that coincide with these targets (and have smooth monotonically increasing fitness slopes leading up to them). If you want to say the targets are exactly those places where the fitness functions are maximized, there is no theoretical guarantee that the targets will exactly overlap with the set of living things. Unless you can show otherwise. I'm open to being persuaded.