## Sunday, October 11, 2009

### Evolutionary Algorithms in Conservation of Information in Search

work in progress :-)

In their paper Conservation of Information in Search: Measuring the Cost of Success, William A. Dembski, Senior Member, IEEE, and Robert J. Marks II, Fellow, IEEE present three algorithms which could loosely be described as evolutionary. Idiosyncratically, they are labeled
1. Partitioned Search
2. Optimization by Mutation, and
3. Optimization by Mutation With Elitism
These procedures are discussed section II, part E and F (pp. 1055-1057), and some additional math is done in the appendix.

#### The algorithms and the examples given

The three algorithms are introduced or illustrated with an example, for which the parameters can be found in this table:
Algorithmclassalphabetlength of targetMutationfitness function
Partitioned Searchnone{*,A,B,...Z} (27 el.)28every incorrect letter is mutatedposition of correct letters
Optimization by Mutation(1,2) ES{0,1} (2 el.)100each letter is flipped with probability µ=0.00005number of correct letters
Optimization by Mutation With Elitism(1+1) ES{0,1} (2 el.)131*exactly one letter is flippednumber of correct letters

*Why 131b for the last algorithm? I could understand 133 ~ 28* log227, but 131?

#### References

Four references are given in the section under discussion:
• [3] M. Aigner, Discrete Mathematics. Providence, RI: Amer. Math. Soc., 2007.
• [12] R. Dawkins, The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe Without Design. New York: Norton, 1996.
• [32] D. J. C.MacKay, Information Theory, Inference and Learning Algorithms. Cambridge, U.K.: Cambridge Univ. Press, 2002.
• [34] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York: McGraw-Hill, 1991, pp. 537–542.
The first one is unproblematic - though superfluous: it is given as a reference for the Lth harmonic number. The three other references are either problematic or at least indicating sloppy work:

1. Partitioned search [12] is a “divide and conquer” procedure best introduced by example. Richard Dawkins doesn't use the expression "partitioned search" in his book.

2. When μ << 1, the search is a simple Markov birth process [34] that converges to the target. The page numbers of the reference don't fit: it should be pp. 647-650, not pp. 537–542. But at least, page numbers are given!

3. To illustrate, consider a single-parent version of a simple mutation algorithm analyzed by McKay [32].David MacKay analyzes explicitly multi-parented algorithms, there is not much of a resemblance with Dembski's and Marks's procedure. And annoyingly, again, page numbers are omitted. MacKay's book has over 600 pages, only a few in Chapter III, 19 : Why have Sex? Information Acquisition and Evolution are dedicated to evolutionary algorithms.

But one has to be grateful for small things: At least these references weren't garbled like
[7] S. Christensen and F. Oppacher, “What can we learn from no free lunch? A ﬁrst attempt to characterize the concept of a searchable,” in Proc. Genetic Evol. Comput., 2001, pp. 1219–1226.

Of course, it's about the concept of a searchable function. Not the first time, R. Marks and W. Dembski got that wrong.
Ann.: Tom English explained that errors in references are common in the engineering literature. So, I'll stop classifying the references of Dembski's and Marks's article as either superfluous or erroneous for now.

1. I misread "Optimization by Mutation With Elitism," with help from the authors. The section begins, "Optimization by mutation with elitism is the same as optimization by mutation in Section III-F2 with the following [one?] change. One mutated child is generated. If the child is better than the parent, it replaces the parent." I said to myself, "OK, they've just gone from (1,2) to (1+1)." I recall seeing "Markov birth process," and stopping to consider whether that term is correct for a discrete-time process with multiple births possible in a single time step. When I returned to the paper, I evidently skipped over the last sentence of the paragraph, which gives a second change: "We will also assume that there is a single bit flip per generation." To echo your prior post, this truly is annoying.

I intentionally skip over irrelevancies in D&M's papers, not caring to spend my time determining just how bad they really are. There is no need for D&M to analyze the (1+1)-ES. The solution for the exact probability that the algorithm solves the ONEMAX problem in n or fewer steps appears in the literature. If D&M had time to waste embedding the "mas evolutionary informatics" plug in their paper, they had time to spend acquainting themselves with the theory of evolutionary computation.

For the conventional (1+1)-ES, the random number of bit-flips in the offspring is binomially distributed. D&M implicitly stipulate that the mutation rate corresponds to a mean of one bit-flip per offspring, and replace the random number of bit-flips with the mean. (The substitution is infamous, and I believe that it should always be justified.) Dembski obviously had to get his pathetic idée fixe into the peer-reviewed literature.

2. It appears that other folks would like to discuss the article here.

3. AFAIK, the classification of the ES is quite independent of the way of recombination or mutation.

The one-letter-per-string-mutation is surprising: it's the same as in the first weasel algorithm which Oxfodensis mailed to Dembsky...