Saturday, October 3, 2009

Which parameter should Dembski have used?

In the previous post I talked about Dawkins's weasel algorithm as understood by virtually everyone. Even William Dembski and Robert Marks describe something similar in their paper Conservation of Information in Search: Measuring the Cost of Success. But it's not the algorithm Partioned Search introduced on p. 1055 (and claimed by William Dembski to represent Dawkins's weasel), it's an algorithm called Random Search of p. 1056. The differences to the weasel of Dawkins:
1. the alphabet consists of two letters, not 27.
2. the target length is 100, not 28
3. the size of a generation is quite low: it's two
4. the mutation probability per letter is very small: µ = 0.00005
The value S=2 for the population size and µ = 0.00005 for the effective mutation rate are chosen to be able to do a little math - but they result in quite a great number of generations (~55,000) and queries (~110,000) on average.
Which values would a real programmer have used? Fooling around with his program, tweaking the parameter S and µ a little bit, he would soon have found out, that he should alter the original values considerably:

The greater the size of the population, the greater the optimal size of the mutation rate.

The smallest number of queries (1576) on average is achieved with S = 9 and µ = 0.009. Any practitioner would have come up with values near to these, as they make the program considerably faster. Sadly, these values doesn't allow for an application of Dembski's and Marks's elegant estimation of the number of correct letters as a function of generations.

What's interesting: While Dawkins's weasel had a range of population sizes which preferred virtually the same mutation probability, this doesn't happen in this realization. We'll see that this effect depends mainly on the size of the alphabet...