Sunday, September 27, 2009

The Original Weasels?

W. Dembski introduces two programs in his post The original weasels at his blog Uncommon Descent. These programs were sent to him by someone called Oxfordensis, and W. Dembski states
These are by far the best candidates [for being Dawkins's original program] we have received to date.

Let's have a short look:

Weasel 1

Here's how this program works for the target "METHINKS IT IS LIKE A WEASEL":
  1. Chose random string
  2. copy string 100 times, each time changing exactly one letter at random
  3. select string with most correct letters in place
  4. is the number of correct letters 28? Then STOP. Otherwise:
  5. goto 2
The mutation procedure is interesting: While most programs I've seen mutate the parent string by changing each letter with a certain probability, here, exactly one letter is changed - but I've seen implementations of this version, too. The letter-based mutation seems to be more natural, while the other one has mathematical advantages: the neighbourhood of a string s, i.e., the strings s can mutate into, becomes small: it isn't the whole set of size 2728 ≈ 1.20 * 1040 anymore, but only of size 27 * 28 = 756...
Another advantage: you only have to call your random generator twice for each child, while a straight-forward implementation of the letter-mutating process take (1+μ)*28 of these costly calls per child - though this number can be reduced to (1+28*μ) calls per child....
So, one parameter less to tweak: though the number of children is set to 100 in this program, this can be changed easily.

But which value to chose for the remaining parameter - the size of the population? The bigger the size of the population, the less generations (black line) it takes to complete the search. So, make it big? No, not necessarily: commonly, you want your program to be fast. Therefore, you should try to reduce the number of queries (blue line), i.e, the number of mutated strings - which is the number of evaluations of the fitness function, too. Mutating at random, and calculating the fitness function, that's what's taking time.
I suppose if you fool around with the program for a while, you'll quickly find out that the optimal size of generations in the sense above is about 50 - and an exact calculation leads to an average of 3931 queries for a population size of 49 as an optimum (σ = 648.04).

Weasel 2

Well, this beast is boring. BORING. Look how it works for the target "METHINKS IT IS LIKE A WEASEL":
  1. Chose random string
  2. make new string by copying old string one time, changing exactly one letter at random
  3. chose string from old and new string with most correct letters in place
  4. is the number of correct letters 28? Then STOP. Otherwise:
  5. goto 2

It's just a randomized version of the hangman game, taking 2969 steps on average (σ = 959.2). And it's what W. Dembski and R. Marks call Optimization by Mutation With Elitism in their current paper:
3) Optimization by Mutation With Elitism: Optimization by
mutation with elitism is the same as optimization by mutation
in Section III-F2 with the following change. One mutated
child is generated. If the child is better than the parent, it
replaces the parent. If not, the child dies, and the parent
tries again. Typically, this process gives birth to numerous
offspring, but we will focus attention on the case of a single
child. We will also assume that there is a single bit flip per


Could these be the programs used by R. Dawkins in his book The Blind Watchmaker? The first one would fit his algorithm, and with a population size, it would take 47.6 generations on average (σ= 11.7). And what's about the BBC video? If the second version was used for this, I'd regard it at best as a visual effect, as it doesn't match Dawkins's algorithm. And yes, it is (implicitly) latching :-)

Wednesday, September 23, 2009

Random Mutation

In their paper Conservation of Information in Search: Measuring the Cost of Success William Dembski and Robert Marks present an algorithm which resembles Dawkins's weasel (the algorithm I talked about before - at length :-)
It isn't the algorithm Partioned Search introduced on p. 1055, it's the algorithm Random Mutation on p. 1056. The differences
  1. the alphabet consists of two letters, not 27.
  2. the target length is 100, not 28
  3. the size of a generation is quite low: it's two
  4. the mutation probability per letter is very small: µ = 0.00005
Many programmers have implemented weasel-like programs, and the runs I've seen usually use a µ between 0.04 and 0.05, while the generation size is greater than 20.

So, why do W. Dembski and R. Marks use their values? Well, in the appendix of their paper, they calculated an approximation for the number of correct letters depending on the number of generations. This approximation discards terms quadratic in µ, so to use it, not only has µ << 1, but the size of a generation has to be small, too.

But should they use these values? I don't think so: As usual, I modeled their algorithm as a Markov chain and could calculate some exact values. The expected number of generations to get to the target for their choice of parameters is 55,500. (see pic 1)

Some fooling around with their program should have shown that this number can be reduced dramatically by using an more appropriate value for µ. The optimum lies near to µ ~ 0.0005, which takes only 10,600 generations.

And though their approximation doesn't work as fantastically as for the parameter ten times smaller, the numbers are not more than 2.5% off. (see pic 2)

Saturday, September 19, 2009

A New Weasel

At Uncommon Descent, W. Dembski presents us with a new weasel: Someone with the name Oxfordensis programmed years ago two Pascal versions.
The interesting aspect: In contrast to the usual weasels - as discussed in the earlier posts - not every letter is changed with a certain probability in this program, but exactly one letter is changed in each child. Here, the word change is used loosely, as with a probability of 1/27, no letter is changed at all.
Mathematically, this is easier to handle: the transition matrix of the corresponding Markov process gets sparser...
And you have only one parameter which can be changed: the size of the generation.
If I had any reader he may ask himself: Does this algorithm latch? Well, the pic provides the answer: it depends :-)
And here are some values for the probability that the program will show a change to the worse during a run of the program, depending on the size of the population:
  1. 99.999 %
  2. 98.485 %
  3. 83.879 %
  4. 55.944 %
  5. 30.008 %
  6. 14.071 %
  7. 6.168 %
  8. 2.649 %
  9. 1.137 %
  10. 0.491 %
  11. 0.214 %
  12. 0.094 %

So, even with a generation size of one hundred children, one in two hundred runs will show that this algorithm doesn't latch - it's not what W. Dembski and R. Marks describe as a partitioned search.