Friday, February 23, 2018

An Amazon Review: Still waiting for the ultimate book on Intelligent Design

I wrote a review on Amazon of Introduction to Evolutionary Informatics (1st Edition) by Dr. Robert J. Marks II, Dr. Dr. William A. Dembski, and Dr. Winston Ewert:
We are all waiting for the ultimate book on Intelligent Design, written by R. Marks and W. Dembski. Instead we get a "textbook", another attempt to explain the concepts to laymen. I got the impression that the authors used this setting to avoid the necessary rigour: they just do not define terms like "search" which they use hundreds of times. This allows for a lot of hand-waving, like the following sentence on p. 174:

"We note, however, the choice of an algorithm along with its parameters and initialization imposes a probability distribution over the search space"

That unsubstantiated claim is essential for their following proofs on "The Search for a Search"!

And then there are details like this one:

p. 130: "For the Cracker Barrel puzzle [we got] an endogenous information of I = 7.15 bits"
p. 138: "We return now to the Cracker Barrel puzzle. We showed that the endogenous information [...] is I = 7.4 bits"

I tried to solve this conundrum, but I came up with I = 7.8 bits. I contacted the authors, but got no reply.
Not surprisingly, I gave it only two stars.

Some Details on the Cracker Barrel Puzzle

A more complete quote from p. 130 is:
For the Cracker Barrel puzzle, all of the 15 holes are filled with pegs and, at random, a single peg is removed. This starts the game. Using random initialization and random moves, simulation of four million games using a computer program resulted in an estimated win probability $p = 0.007\,0$ and an endogenous information of $$I_\Omega = -\log_2 p = 7.15\ \text{bits}.$$
They did not calculate the exact value; instead, they simulated the puzzle 4,000,000 times. A simulation is the easiest approach to program, but how good is it? It should be pretty good: each simulated game is a Bernoulli trial with probability of success $p_t$, the theoretical probability of winning a single game by chance. Repeating the trial 4,000,000 times gives a binomial experiment $B(4,000,000; p_t)$, so the standard deviation of the estimated win probability is $\sigma \approx 0.000\,042$. That is why stating four digits after the decimal point is not overconfident: assuming that there is no systematic error, the probability that the actual value $p_t$ lies within $0.007\,00 \pm 0.000\,05$ is $77\%$.
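The two figures in this paragraph can be reproduced in a few lines of R. This is a minimal sketch that assumes the normal approximation to the binomial distribution; the variable names are illustrative and do not come from the book.

```r
# Standard error of a simulated win probability (normal approximation).
n     <- 4e6      # number of simulated games
p.hat <- 0.0070   # win probability reported on p. 130

# standard deviation of the estimated win probability
sigma <- sqrt(p.hat * (1 - p.hat) / n)

# probability that the true value lies within +/- 0.00005 of the estimate
half.width  <- 0.00005
prob.within <- pnorm(half.width / sigma) - pnorm(-half.width / sigma)

print(signif(sigma, 2))       # about 4.2e-05
print(round(prob.within, 2))  # about 0.77
```

The interval of plus or minus 0.00005 corresponds to rounding the estimate to four decimal places, which is roughly 1.2 standard deviations on either side.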

Giving three significant digits for $I_\Omega$ oversells the power of their experiment slightly: it implies that they expect $p_t$ to lie in the interval $[0.007\,017;0.007\,065]$ (the values that round to $7.15$ bits) with reasonable probability, but that probability is at best about $44\%$.

Confining themselves to only two significant digits on p. 138, $I_\Omega = 7.4\;bits$, yields much more reliable results: again assuming that there is nothing systematically wrong with their calculation, they can say that $p_t$ is in $[0.005\,72;0.006\,13]$ with a probability of more than $99.999\,99\%$! Well done...
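The intervals behind a rounded value of $I_\Omega$ follow directly from $p = 2^{-I}$. The sketch below is a hedged illustration, not the calculation used for the review: it assumes the normal approximation, 4,000,000 games, and the best case in which the true value sits in the middle of the interval. The function name rounding.check is hypothetical.

```r
# Which win probabilities round to a given value of I = -log2(p)?
n <- 4e6  # number of simulated games

rounding.check <- function(I, digits) {
  half.step <- 0.5 * 10^(-digits)          # half of the last printed digit
  p.lo  <- 2^-(I + half.step)              # larger I means smaller p
  p.hi  <- 2^-(I - half.step)
  p.mid <- (p.lo + p.hi) / 2
  sigma <- sqrt(p.mid * (1 - p.mid) / n)   # standard error near the interval
  # best case: the true value lies in the middle of the interval
  z <- (p.hi - p.lo) / (2 * sigma)
  list(p.lo = p.lo, p.hi = p.hi, prob = 1 - 2 * pnorm(-z))
}

r715 <- rounding.check(7.15, digits = 2)
r74  <- rounding.check(7.4,  digits = 1)

print(signif(c(r715$p.lo, r715$p.hi), 4))  # 0.007017 0.007065
print(signif(c(r74$p.lo,  r74$p.hi),  3))  # 0.00572 0.00613
print(round(r715$prob, 2))                 # about 0.44
```

The narrower interval for 7.15 bits is only a little more than one standard deviation wide, whereas the interval for 7.4 bits spans more than ten.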

Or not: it is very improbable that both values are correct. Very, very, very, very improbable: even using the most favourable estimates, the second result should occur with a probability of less than $10^{-98}$ if the first experiment was correctly implemented. It is even worse the other way around: $10^{-112}$.
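A rough sketch of how bounds of this magnitude can be obtained. It assumes the normal approximation and takes the two win probabilities closest to each other that are still compatible with the printed roundings. The exact calculation behind the figures above is not shown here, so the exponents this sketch prints need not match them exactly; the function name log10.tail is hypothetical.

```r
# How improbable is one reported value if the other one is the true value?
n <- 4e6  # number of simulated games

log10.tail <- function(p.true, p.observed) {
  sigma <- sqrt(p.true * (1 - p.true) / n)
  z <- abs(p.observed - p.true) / sigma
  # log.p = TRUE avoids numerical underflow for such extreme tails
  pnorm(-z, log.p = TRUE) / log(10)
}

p.first.lo  <- 2^-7.155   # smallest p that still rounds to 7.15 bits
p.second.hi <- 2^-7.35    # largest p that still rounds to 7.4 bits

# base-10 exponent of the one-sided tail probability
print(log10.tail(p.first.lo,  p.second.hi))  # first value true, second observed
print(log10.tail(p.second.hi, p.first.lo))   # second value true, first observed
```

Both exponents come out below the bounds quoted in the text, and the second is the smaller of the two because the standard error shrinks with the assumed true probability.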

Which value is correct?

Not surprisingly, the answer is that both are wrong: the three authors somehow botched the implementation of even the easiest way to approach the question, a simulation. How can I be so cock-sure? I simulated it myself, 4,000,000 times, and got a value of $p = 0.004\,5$. Then I calculated the theoretical value by enumerating all possible games and their respective probabilities: again, $p = 0.004\,5$. Then I published part of my code at The Skeptical Zone, and thankfully, Roy and Corneel also implemented a simulation, which gave compatible results. Lastly, Tom English programmed the problem much more cleverly, getting exactly the same results as I did (I just had to wait much longer for mine...).

Why didn't the authors do the same?

Monday, January 29, 2018

The Search Problem of William Dembski, Winston Ewert, and Robert Marks

Introduction to Evolutionary Informatics, by Robert J. Marks II, the “Charles Darwin of Intelligent Design”; William A. Dembski, the “Isaac Newton of Information Theory”; and Winston Ewert, the “Charles Ingram of Active Information.” World Scientific, 332 pages.
Classification: Engineering mathematics. Engineering analysis. (TA347)
Subjects: Evolutionary computation. Information technology–Mathematics.
Search is a central term in the work of Dr. Dr. William Dembski Jr., Dr. Winston Ewert, and Dr. Robert Marks II (DEM): it appears in the titles of a couple of papers written by at least two of the authors, and it is mentioned hundreds of times in their textbook "Introduction to Evolutionary Informatics". Strangely, and unlike the other central term, information, it is not defined in this textbook, and neither is search problem or search algorithm. Luckily, dozens of examples of searches are given. I took a closer look to find out what DEM see as the search problem in the "Introduction to Evolutionary Informatics" and how their model differs from those used by other mathematicians and scientists.

Thursday, January 18, 2018

Prof. Marks gets lucky at Cracker Barrel

Introduction to Evolutionary Informatics, by Robert J. Marks II, the “Charles Darwin of Intelligent Design”; William A. Dembski, the “Isaac Newton of Information Theory”; and Winston Ewert, the “Charles Ingram of Active Information.” World Scientific, 332 pages.
Classification: Engineering mathematics. Engineering analysis. (TA347)
Subjects: Evolutionary computation. Information technology–Mathematics.
Yesterday, I looked through "Introduction to Evolutionary Informatics" again and spotted the Cracker Barrel puzzle in section 5.4.1.2, Endogenous information of the Cracker Barrel puzzle (p. 128). The rules of this variant of triangular peg solitaire are described in the text (or can be found in Wikipedia's article on the subject). The humble authors then describe a simulation of the game to estimate how probable it is to solve the puzzle using random moves:
A search typically requires initialization. For the Cracker Barrel puzzle, all of the 15 holes are filled with pegs and, at random, a single peg is removed. This starts the game. Using random initialization and random moves, simulation of four million games using a computer program resulted in an estimated win probability p = 0.0070 and an endogenous information of $$I_\Omega = -\log_2 p = 7.15\ \text{bits}.$$ Winning the puzzle using random moves with a randomly chosen initialization (the choice of the empty hole at the start of the game) is thus a bit more difficult than flipping a coin seven times and getting seven heads in a row.
Naturally, I created such a simulation in R for myself: I encoded all thirty-six moves that could occur in a matrix cb.moves, each row indicating the jumping peg, the peg which is jumped over, and the place on which the peg lands. And here is the little function which simulates a single random game:
cb.simul <- function(pos){
  # pos: boolean vector of length 15 indicating position of pegs
  # a move is allowed if there is a peg at the start position & on the field which is
  # jumped over, but not at the final position
  allowed.moves <- pos[cb.moves[,1]] & pos[cb.moves[,2]] & (!pos[cb.moves[,3]])
  # if no move is allowed, return number of pegs left
  if(sum(allowed.moves)==0) return(sum(pos))
  # otherwise, choose an allowed move at random
  number.of.move <- ((1:36)[allowed.moves])[sample(1:sum(allowed.moves),1)]
  # apply the move: empty the start and the jumped-over hole, fill the landing hole
  pos[cb.moves[number.of.move,]] <- c(FALSE,FALSE,TRUE)
  return(cb.simul(pos))
}
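The matrix cb.moves is not shown in the post. The following is a minimal sketch of how it could be built and how the simulation could be driven. It assumes the holes are numbered row by row from the tip (1) to the bottom row (11 to 15); this numbering, the helper names hole and on.board, the seed, and the reduced number of games are assumptions made for illustration.

```r
# Hypothetical reconstruction of cb.moves: one row per jump (from, over, to).
hole     <- function(r, k) r * (r - 1) / 2 + k            # row-by-row numbering
on.board <- function(r, k) r >= 1 && r <= 5 && k >= 1 && k <= r

# the six jump directions on a triangular grid, as (row step, column step)
dirs <- rbind(c(0, 1), c(0, -1), c(1, 0), c(-1, 0), c(1, 1), c(-1, -1))

cb.moves <- NULL
for (r in 1:5) for (k in 1:r) for (d in 1:6) {
  to.r <- r + 2 * dirs[d, 1]
  to.k <- k + 2 * dirs[d, 2]
  if (on.board(to.r, to.k)) {
    cb.moves <- rbind(cb.moves,
                      c(hole(r, k),
                        hole(r + dirs[d, 1], k + dirs[d, 2]),
                        hole(to.r, to.k)))
  }
}
stopifnot(nrow(cb.moves) == 36)   # thirty-six possible jumps

# single random game, same logic as cb.simul above
cb.simul <- function(pos) {
  allowed.moves <- pos[cb.moves[, 1]] & pos[cb.moves[, 2]] & (!pos[cb.moves[, 3]])
  if (sum(allowed.moves) == 0) return(sum(pos))
  number.of.move <- ((1:36)[allowed.moves])[sample(1:sum(allowed.moves), 1)]
  pos[cb.moves[number.of.move, ]] <- c(FALSE, FALSE, TRUE)
  cb.simul(pos)
}

# play n.games games, each with a randomly chosen empty hole
set.seed(2018)
n.games <- 20000   # far fewer than 4,000,000, to keep the run short
pegs.left <- replicate(n.games, {
  pos <- rep(TRUE, 15)
  pos[sample(15, 1)] <- FALSE
  cb.simul(pos)
})
p.e <- mean(pegs.left == 1)   # estimated win probability
print(p.e)
```

With only 20,000 games the estimate is noisy (its standard error is about 0.0005), so it can only be expected to land in the neighbourhood of the value obtained from 4,000,000 games.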
I ran the simulation 4,000,000 times, changing the start position at random. But as a result, my estimated win probability was $p_e=0.0045$, only about two thirds of the number in the text. How can this be? Why were Prof. Marks et al. so much luckier than I? I re-ran the simulation, checked the code, washed, rinsed, repeated: no fundamental change. So I decided to take a look at all possible games and at the probability with which each of them occurs. The result was this little routine:
cb.eval <- function(pos, prob){
  # pos: boolean vector of length 15 indicating position of pegs
  # prob: the probability with which this state occurs
  # a move is allowed if there is a peg at the start position & on the field which is
  # jumped over, but not at the final position
  allowed.moves <- pos[cb.moves[,1]] & pos[cb.moves[,2]] & (!pos[cb.moves[,3]])
  if(sum(allowed.moves)==0){
    # end of a game: prob now holds the probability that this game is played
    # number of remaining pegs
    nr.of.pecks <- sum(pos)
    # the number of games ending with this many pegs is counted in a global variable
    cb.number[nr.of.pecks] <<- cb.number[nr.of.pecks]+1
    # the probability of this game is added to the appropriate place of the global variable
    cb.prob[nr.of.pecks] <<- cb.prob[nr.of.pecks] + prob
    return()
  }
  for(k in 1:sum(allowed.moves)){
    # moves are still possible, for each move the next stage will be calculated
    d <- pos
    number.of.move <- ((1:36)[allowed.moves])[k]
    d[cb.moves[number.of.move,]] <- c(FALSE,FALSE,TRUE)
    # each allowed move is equally likely, so the probability is split evenly
    cb.eval(d,prob/sum(allowed.moves))
  }
}
I now calculated the probabilities for solving the puzzle for each of the fifteen possible starting positions. The result was $$p_s=0.0045.$$ This fits my simulation, but not the one of our esteemed and humble authors! What had happened?
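The exact win probability can also be computed without visiting every game separately, by caching the result for each board position that has already been evaluated. The sketch below is an illustration of that idea and is not taken from any of the implementations mentioned in these posts. The row-by-row hole numbering and the names hole, on.board, win.prob, and memo are assumptions.

```r
# Exact win probability under uniformly random moves, with memoisation.
hole     <- function(r, k) r * (r - 1) / 2 + k            # row-by-row numbering
on.board <- function(r, k) r >= 1 && r <= 5 && k >= 1 && k <= r
dirs <- rbind(c(0, 1), c(0, -1), c(1, 0), c(-1, 0), c(1, 1), c(-1, -1))

# hypothetical reconstruction of cb.moves: one row per jump (from, over, to)
cb.moves <- NULL
for (r in 1:5) for (k in 1:r) for (d in 1:6) {
  to.r <- r + 2 * dirs[d, 1]
  to.k <- k + 2 * dirs[d, 2]
  if (on.board(to.r, to.k)) {
    cb.moves <- rbind(cb.moves, c(hole(r, k),
                                  hole(r + dirs[d, 1], k + dirs[d, 2]),
                                  hole(to.r, to.k)))
  }
}

memo <- new.env()   # cache: board position -> win probability

win.prob <- function(pos) {
  key <- paste(as.integer(pos), collapse = "")
  if (!is.null(memo[[key]])) return(memo[[key]])
  allowed <- which(pos[cb.moves[, 1]] & pos[cb.moves[, 2]] & !pos[cb.moves[, 3]])
  if (length(allowed) == 0) {
    p <- as.numeric(sum(pos) == 1)     # game over: won only if one peg is left
  } else {
    p <- 0
    for (m in allowed) {               # every allowed move is equally likely
      d <- pos
      d[cb.moves[m, ]] <- c(FALSE, FALSE, TRUE)
      p <- p + win.prob(d) / length(allowed)
    }
  }
  memo[[key]] <- p
  p
}

# win probability for each of the fifteen possible empty holes
p.start <- sapply(1:15, function(empty) {
  pos <- rep(TRUE, 15)
  pos[empty] <- FALSE
  win.prob(pos)
})
p.s <- mean(p.start)   # each empty hole is chosen with probability 1/15
print(round(p.start, 5))
print(round(p.s, 4))   # the post reports 0.0045
```

Because at most $2^{15}$ board positions exist, each one is evaluated only once, which is why this variant finishes much faster than enumerating every game.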

An educated guess

I found it odd that the authors ran 4,000,000 simulations: 1,000,000 or 10,000,000 seem to be more commonly used numbers. But when you look at the puzzle, you see that it was not necessary for me to look at all fifteen possible starting positions. Whether the first peg is missing in position 1 or position 11 does not change the nature of the game: you could rotate the board and perform the same moves. Using symmetries, you find that there are only four essentially different starting positions: the black, red, and blue groups with three positions each, and the green group with six positions. For each group, you get a different probability of success:
| group | black | green | red | blue |
|---|---|---|---|---|
| prob. of choosing this group | 0.2 | 0.4 | 0.2 | 0.2 |
| prob. of success | 0.00686 | 0.00343 | 0.00709 | 0.001726 |
One quite obvious explanation for the authors' result is that they did not run one simulation 4,000,000 times with a random starting position, but simulated the game 1,000,000 times for each of the four groups. Unfortunately, either they did not combine their results and reported only the result of the black or the red group (or both), or they only thought they switched starting positions from one batch of simulations to the next, but in fact always used the black or the red one.
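The group values can be checked against both the overall result and the figure in the book. A small sketch using only the numbers from the table (the variable names are illustrative): the weighted average reproduces the overall win probability, while the black and red groups taken on their own give roughly 7.19 and 7.14 bits, on either side of the 7.15 bits printed in the book.

```r
# Win probabilities per group of starting positions, as given in the table.
p.group <- c(black = 0.00686, green = 0.00343, red = 0.00709, blue = 0.001726)
w.group <- c(black = 0.2,     green = 0.4,     red = 0.2,     blue = 0.2)

# overall win probability for a uniformly random starting hole
p.overall <- sum(w.group * p.group)
print(round(p.overall, 4))        # 0.0045

# endogenous information I = -log2(p) for each group on its own
I.group <- -log2(p.group)
print(round(I.group, 2))
```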

Is it a big deal?

It is easily corrected: instead of "For the Cracker Barrel puzzle, all of the 15 holes are filled with pegs and, at random, a single peg is removed." they could write "For the Cracker Barrel puzzle, all of the 15 holes are filled with pegs and one peg at the tip of the triangle is removed." If the book were actually used as a textbook, the simulation of the Cracker Barrel puzzle would be an obvious exercise. I doubt that it is used that way anywhere, so no pupils were annoyed. It is somewhat surprising that such an error occurs: it seems that the program was written by a single contributor and not checked. That seems to have been the case in previous publications, too. Perhaps the authors thought that the program was too simple to deserve their full attention, and the more complicated stuff is properly vetted. OTOH, it could be a pattern.... Well, it will certainly be changed in the next edition.

Monday, January 8, 2018

UD in 2017

Just a few pics: