
Monday, May 25, 2015

The Natural Probability on M(Ω)

Two weeks ago, Dr. Winston Ewert announced at Uncommon Descent a kind of open mic. He put up a page at Google Moderator and asked for questions. Unfortunately, not many took advantage of this offer, but I added three questions off the top of my head. The experience made me revisit the paper A General Theory of Information Cost Incurred by Successful Search again, and when I tried - as usual - to construct simple examples, I ran into further questions - so, here is another one:

In their paper, the authors W. Dembski, W. Ewert, and R. Marks (DEM) talk about something they call the natural probability:

Processes that exhibit stochastic behavior arise from what may be called a natural probability. The natural probability characterizes the ordinary stochastic behavior of the process in question. Often the natural probability is the uniform probability. Thus, for a perfect cube with distinguishable sides composed of a rigid homogenous material (i.e., an ordinary die), the probability of any one of its six sides landing on a given toss is 1/6. Yet, for a loaded die, those probabilities will be skewed, with one side consuming the lion’s share of probability. For the loaded die, the natural probability is not uniform.
This natural probability on the search space translates through their idea of lifting to the space of measures $\mathbf{M}(\Omega)$:
As the natural probability on $\Omega$, $\mu$ is not confined simply to $\Omega$ but lifts to $\mathbf{M}(\Omega)$, so that its lifting, namely $\overline{\mu}$, becomes the natural probability on $\mathbf{M}(\Omega)$ (this parallels how the uniform probability $\mathbf{U}$, when it is the natural probability on $\Omega$, lifts to the uniform probability $\overline{\mathbf{U}}$ on $\mathbf{M}(\Omega)$, which then becomes the natural probability for this higher-order search space).
As usual, I look at an easy example: a loaded coin which always shows heads. So $\Omega=\{H,T\}$ and $\mu=\delta_H$ is the natural measure on $\Omega$. What happens on $\mathbf{M}(\Omega)= \{h\cdot\delta_H + t\cdot\delta_T|0 \le h,t \le 1; h+t=1 \}$? Luckily, $$(\mathbf{M}(\{H,T\}),\mathbf{U}) \cong ([0,1],\lambda).$$ Let's jump through the hoops:
  1. The Radon-Nikodym derivative of $\delta_H$ with respect to $\mathbf{U}$ is $f(H) = \frac{d\delta_H}{d\mathbf{U}}(H) = 2$, $f(T) = \frac{d\delta_H}{d\mathbf{U}}(T) = 0$
  2. Let $\theta \in \mathbf{M}(\{H,T\})$, i.e., $\theta= h\delta_H + t\delta_T$. Then$$\overline{f}{(\theta)} = \int_{\Omega} f(x)d\theta(x)$$ $$=f(H)\cdot\theta(\{H\}) + f(T) \cdot\theta(\{T\})$$ $$=2 \cdot h$$
Here, I have the density of my natural measure on $\mathbf{M}(\Omega)$ with regard to $\overline{\mathbf{U}}$, $$d\overline{\delta_H}(h\cdot\delta_H + t\cdot\delta_T) = 2 \cdot h \cdot d\overline{\mathbf{U}}(h\cdot\delta_H + t\cdot\delta_T).$$ But what is it good for? For the uniform probability, DEM showed the identity $$\mathbf{U}=\int_{\mathbf{M}(\Omega)}\theta d\overline{\mathbf{U}} .$$ Unfortunately, for $\int_{\mathbf{M}(\Omega)}\theta d\overline{\delta_H}$, I get nothing similar: $$\int_{\mathbf{M}(\Omega)}\theta d\overline{\delta_H} = \frac{2}{3}\delta_H + \frac{1}{3}\delta_T$$
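A quick Monte Carlo sanity check of this computation (a Python sketch; I identify a measure $h\cdot\delta_H + (1-h)\cdot\delta_T$ with the number $h$, so that $\overline{\mathbf{U}}$ becomes the uniform distribution on $[0,1]$):

```python
import random

# Estimate the masses of H and T under the integral of theta d(lifted delta_H):
# draw theta = h*delta_H + (1-h)*delta_T from U-bar (h uniform on [0,1]) and
# weight it with the density f-bar(theta) = 2h computed above.
random.seed(1)
N = 200_000
mass_H = mass_T = 0.0
for _ in range(N):
    h = random.random()
    w = 2 * h                # density of the lifted delta_H w.r.t. U-bar
    mass_H += w * h          # theta({H}) = h
    mass_T += w * (1 - h)    # theta({T}) = 1 - h
mass_H /= N
mass_T /= N
print(mass_H, mass_T)        # close to 2/3 and 1/3
```

The estimates match the analytic result $\frac{2}{3}\delta_H + \frac{1}{3}\delta_T$.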

So, again, what does this mean? Wouldn't the Dirac delta function be a more natural measure on $\mathbf{M}(\Omega)$?

I hope that Dr. Winston Ewert reacts to all of the questions before Google Moderator shuts down for good on June 30, 2015...

Monday, May 11, 2015

Five Years of "The Search for a Search"

The Journal of Advanced Computational Intelligence and Intelligent Informatics published the paper The Search for a Search: Measuring the Information Cost of Higher Level Search of William A. Dembski and Robert J. Marks II (DM) in its July 2010 edition. With the fifth anniversary of the publication coming up, it seems appropriate to revisit a pet peeve of mine...

(Shell game performed on Karl-Liebknecht-Straße in Berlin, photograph by E.asphys)

Imagine a shell game. You have observed the con artist for a while, and now you know:

  1. The pea ends up under each of the three shells (left, middle, and right) with the same probability, i.e., $$P(Pea=left)=P(Pea=middle)=P(Pea=right)=1/3$$
  2. If the pea ends up under the left or the middle shell, you are able to track its way. So, in these cases, you will find the pea with probability 1 $$P(Finding\,Pea|Pea=left)=P(Finding\,Pea|Pea=middle)=1$$
  3. However, if the pea ends up under the right shell, then 999 times out of 1000 you make a mistake during your tracking and become convinced that it is under the left or the middle shell - so the probability of finding this pea is 1/1000 $$P(Finding\,Pea|Pea=right)=1/1000$$

You are invited to play the game. Should you use your knowledge (method $M_1$), or should you choose a shell at random (method $M_2$)?
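For the record, both success probabilities follow from the law of total probability over the pea's position - a few lines of Python (a sketch, not part of anyone's paper) make the comparison explicit:

```python
from fractions import Fraction

# Success probability of method M1 (use the tracking knowledge),
# conditioning on where the pea ends up: left, middle, right.
p_m1 = Fraction(1, 3) * 1 + Fraction(1, 3) * 1 + Fraction(1, 3) * Fraction(1, 1000)
# Method M2: pick one of the three shells at random.
p_m2 = Fraction(1, 3)
print(p_m1, float(p_m1))   # 667/1000, i.e., 0.667
print(p_m2, float(p_m2))   # 1/3, i.e., about 0.333
```

So the imperfect knowledge still doubles your chances.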

Sunday, June 23, 2013

The Ithaca Papers

William A. Dembski announces in the CV/Resumé on his web site Design Inference - Education in Culture and Worldview some books which are still in preparation. At the top of the list is
Biological Information: New Perspectives (co-edited with Robert J. Marks II, John Sanford, Michael Behe, and Bruce Gordon). Under contract with Springer Verlag.
Well, rejoice: the electronic version of this book has been published (and is free for download!), and the hard copy is announced for August 2013. Although the publisher switched from Springer to World Scientific, the announcement hasn't changed:
In the spring of 2011, a diverse group of scientists gathered at Cornell University to discuss their research into the nature and origin of biological information. This symposium brought together experts in information theory, computer science, numerical simulation, thermodynamics, evolutionary theory, whole organism biology, developmental biology, molecular biology, genetics, physics, biophysics, mathematics, and linguistics. This volume presents new research by those invited to speak at the conference.
While the publication of Stephen C. Meyer's new book Darwin's Doubt is hailed with great fanfare at the Discovery Institute's news outlet Evolution News, the appearance of this volume hasn't made their news yet - though Dembski and Meyer are both fellows of the Discovery Institute's Center for Science and Culture (granted, Meyer is its director). Only at Dembski's (former) blog, Uncommon Descent, are there two posts about the book. Instantly, a discussion arose about Denyse O'Leary's (commenting under the nom de guerre "News") choice of title, in which the usual combatants switched sides: the evolutionists claimed the title was designed to mislead the average reader into thinking that Cornell University was somehow involved in the conference, while the apologists of Intelligent Design argued that this was just chance. Unfortunately, no one answered my comment:
In the interest of discussing the data and the evidence, could we have posts on various articles of the book? I’d be quite interested in a thread on Chapter 1.1.2 “A General Theory of Information Cost Incurred by Successful Search” by William A. Dembski, Winston Ewert and Robert J. Marks II.
I hope that the authors are still reading this blog: this way, we could have a productive discussion, and perhaps some questions could be answered by the people involved!
And for the sake of a swift exchange of ideas: could someone please release me from the moderation queue?
Maybe there is no interest in such a discussion at Uncommon Descent. Maybe no one read the comment - it was held in the moderation queue for five days, and when it appeared, the article was no longer on the front page. Therefore, I'll start a number of posts on "A General Theory of Information Cost Incurred by Successful Search" here at my blog: I just can't believe that this peer-edited article would have passed peer review at Springer...

Monday, April 8, 2013

Dembski and Marks's "The Search for a Search": Trying to work out one of their examples

Three years ago, on April 1st, 2010, W. Dembski and R. Marks's paper The Search for a Search[1] was accepted for publication. Recently, W. Dembski referred to it as a seminal paper in his article Before They've Even Seen Stephen Meyer's New Book, Darwinists Waste No Time in Criticizing Darwin's Doubt at the Discovery Institute's Evolution News and Views. The first result of their paper is the Horizontal No Free Lunch Theorem, which shows that average relative performance of [assisted] searches never exceeds unassisted or blind searches. Hopefully this paper is covered at the 2013 Summer Seminar on Intelligent Design in the Natural Sciences from July 12-20, 2013 in Seattle, WA - at least in the past, Marks and Dembski have been speakers at this seminar. Wouldn't it be a nice warm-up exercise for all those bright students gathered in Seattle to work out an example of an assisted search and see for themselves how the Horizontal No Free Lunch theorem works? I ask because I haven't seen any such example calculated by Marks, Dembski, or one of their students. They have mentioned a few kinds of assisted searches - like Dawkins's weasel, the Easter egg hunt, etc. - but they have never shown how those fit their model.

So I tried to work out a kind of simplified Easter egg hunt in the language of Dembski and Marks - and I invite everyone to try to elaborate similar examples. My claim: any search which uses some kind of feedback (an oracle, a fitness function, etc.) cannot be described meaningfully in the terms of Marks and Dembski - which casts doubt on their far-reaching conclusions about assisted searches in general. But see for yourself:

Friday, May 25, 2012

Is the average active information a suitable measure of search performance?

(Though the following doesn't include any maths, the reader is expected to be familiar with William Dembski's and Robert Marks's paper The Search for a Search and should have glanced at On a Remark by Robert J. Marks and William A. Dembski)

One of my problems with the modeling of searches by William Dembski and Robert Marks is that I don't see how every assisted search can be described as a probability measure on the space of the feasible searches. Nevertheless, Winston Ewert insisted that

All assisted search, irrespective of the manner in which they are assisted, can be modeled as a probably [sic] distribution biased towards selecting elements in the target.
Marks and Dembski claim that the average active information is a measure of search performance - at least they write in their remark:
If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search.
Their erratum does indeed seem to prove the remark in a slightly modified way:
Given a uniform distribution over targets of cardinality k, and baseline uniform distribution, the average active information will be non-positive
(The proof of this statement in the erratum is correct - at least as far as I can see...)
So, let's play a game: from a deck of cards, one is chosen at random. If you want to play, you have to pay 1\$, and you get 10\$ if you are able to guess the card correctly. But you are not alone: there are three other people A, B, and (surprisingly) C who'll announce their guesses first. They use the following search strategies:
  • A: he will announce a card according to the uniform distribution
  • B: he will always announce ♦2
  • C: He has access to a very powerful oracle, which gives him the right card. Unfortunately - due to an old superstition - he is unable to say ♦2, so every time this card appears he will announce another one at random
Which strategy will you follow? Siding with player A or B gives you a chance of 1/52 for a correct guess, so you will lose on average ca. 81¢ per game. However, if you place your bet with C, you will win 8.81$ a game in the long run! That's because the probability of a correct guess is 1/52 for both players A and B, while C's chance of success is 51/52.
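The expected values can be checked with a few lines of Python (a sketch; exact fractions avoid rounding issues):

```python
from fractions import Fraction

# Expected winnings per game: stake 1$, payout 10$ for a correct guess.
def expected_win(p_correct):
    return 10 * p_correct - 1

p = {"A": Fraction(1, 52),    # uniform random guess
     "B": Fraction(1, 52),    # always announces the 2 of diamonds
     "C": Fraction(51, 52)}   # oracle, but never says the 2 of diamonds
for name, prob in p.items():
    print(name, float(expected_win(prob)))
# A and B: about -0.81$ per game; C: about +8.81$ per game.
```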

But what would Marks, Dembski, or Ewert do? They calculate the average active information according to the formula given in the erratum. This E[I+] is 0 for player A, but -∞ for B and C. As, in their words, negative average active information renders the search performance worse than random search, they would have to stick with player A.

So, either average active information is not a good measure of search performance, or not every assisted search, "irrespective of the manner in which they are assisted", can be modeled as a probability distribution. Or it is a bit of both...

Thursday, May 24, 2012

A new erratum for Dembski's and Marks's The Search for a Search

Last month, I made a post, On a Remark by Robert J. Marks and William A. Dembski, where I addressed errors in a section of Dembski and Marks's paper The Search for a Search. I exchanged emails about this with Winston Ewert, one of the people at The Evolutionary Informatics Lab (a former?/grad? student of Marks). He informed me:
You also make brief mention that the HNFLT proof assumes a partition of the search space and thus cannot handle the overlapping targets. This is problematic because any target on the original search space will become an overlapping target on the multi-query search space. I think you are correct on this point. Thanks for bringing it to our attention. We've added a description of the problem as well as an alternate proof for the uniform case on overlapping targets to the pdf: http://evoinfo.org/papers/2010_TheSearchForASearch.pdf.
Here are some thoughts:

Saturday, April 21, 2012

On a Remark by Robert J. Marks and William A. Dembski

Abstract

In their 2010 paper The Search for a Search: Measuring the Information Cost of Higher Level Search, the authors William A. Dembski and Robert J. Marks II present as one of two results their so-called Horizontal No Free Lunch Theorem. One of the consequences of this theorem is their remark: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search. This is quite surprising, as - in the tradition of the No Free Lunch theorems - one would expect the performances to be equally good (or bad). Using only very basic elements of probability theory, this essay shows that their remark is wrong - as is their theorem.

Definitions

The situation is quite simple: there is a set $\Omega_1$ - the search space - with $N$ elements, so w.l.o.g. $\Omega_1 = \{1,\,\ldots,\,N\}$. Let $T_1 \subset \Omega_1$ denote the target. At first, assume that $T_1$ consists of a single element. To find this element you have $K$ guesses, and you aren't allowed to guess the same element twice. Such a sequence of $K$ different guesses can be described as an ordered $K$-tuple $(n_1,\,n_2,\,\ldots,\,n_K)$ of $K$ pairwise different numbers and is called a search. The $\frac{N!}{(N-K)!}$ different searches build the space $\Omega_K$.

A search strategy is a probability measure $\mathbf{Q}$ on $\Omega_K$. Such a measure can be given by assigning a probability $\mathbf{Q}((n_1,\,n_2,\,\ldots,\,n_K)) = q_{(n_1,\,n_2,\,\ldots,\,n_K)}$ to each of the (pure) strategies $(n_1,\,n_2,\,\ldots,\,n_K)$ in $\Omega_K$: $q_{(n_1,\,n_2,\,\ldots,\,n_K)}$ is the probability of choosing the strategy $(n_1,\,n_2,\,\ldots,\,n_K)$ from $\Omega_K$. So obviously, $q_{(n_1,\,n_2,\,\ldots,\,n_K)} \ge 0\; \forall (n_1,\,n_2,\,\ldots,\,n_K)\in\Omega_K$ and $\sum_{(n_1,\,n_2,\,\ldots,\,n_K) \in \Omega_K} q_{(n_1,\,n_2,\,\ldots,\,n_K)} =1 $.

Random search (i.e., the random search strategy) corresponds to the uniform distribution $\mathbf{U}$ on $\Omega_K$, i.e., $\mathbf{U}((n_1,\,n_2,\,\ldots,\,n_K)) = \frac{(N-K)!}{N!}$. Any other search strategy is called an assisted search. This shows the small scope of the concept of an assisted search in the papers of Dembski and Marks: it is not possible to encode the influence of an oracle or a fitness function into the measure $\mathbf{Q}$, so most examples presented by the authors (like partitioned search or the Easter egg hunt) are not covered by this model.

The performance $\mathcal{P}(\mathbf{Q},\delta_{T_1})$ of a search strategy for finding an element $T_1 \in \Omega_1$ is defined as the probability of naming $T_1$ in one of the $K$ guesses of the search.
Some calculations
Using the characteristic function $\chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(T_1)= \left\lbrace \begin{array}{cl} 1 & if\;T_1 \in \{n_1,\,n_2,\,\ldots,\,n_K\} \\ 0 & otherwise \end{array} \right. $, this probability can be written as: $\mathcal{P}{(\mathbf{Q},\delta_{T_1})} = \sum_{(n_1,\,n_2,\,\ldots,\,n_K) \in \Omega_K} q_{(n_1,\,n_2,\,\ldots,\,n_K)} \cdot \chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(T_1).$ The characteristic function is independent of the order of the elements $n_1, n_2, \ldots, n_K$. So the expression can be streamlined by using, instead of the space $\Omega_K$ of ordered $K$-tuples, the space $\mathfrak{P}_K(\Omega_1)$ of subsets $\{n_1,\,n_2,\,\ldots,\,n_K\}$ of $\Omega_1$ with $K$ elements. This space has ${N \choose K}$ elements. $\mathbf{Q}$ on $\Omega_K$ instantly gives rise to a measure on $\mathfrak{P}_K(\Omega_1)$ - again called $\mathbf{Q}$ - by setting: $ \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) = \sum_{\sigma \in S_K} q_{(\sigma_1(n_1),\,\sigma_2(n_2),\,\ldots,\,\sigma_K(n_K))}.$ Here, $\sigma = (\sigma_1,\sigma_2,\ldots,\sigma_K)$ runs through the $K!$ elements of $S_K$, the group of permutations of $K$ elements. Thus, the probability of searching successfully for $T_1$ is $\mathcal{P}{(\mathbf{Q},\delta_{T_1})} = \sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \cdot \chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(T_1).$ But how does this search perform on average - assuming that the underlying measure is uniform? The underlying measure gives the probability with which a target element is chosen: if there is a uniform underlying measure $\mathbf{U}_{\Omega}$, each element $1,2,\ldots, N$ is the target with a probability of $\frac{1}{N}$.
The average performance is given by: $\mathcal{P}{(\mathbf{Q},\mathbf{U}_{\Omega})} =\sum_{l=1}^N \frac{1}{N}\sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \cdot \chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(l)$ $= \frac{1}{N}\sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \cdot \left(\sum_{l=1}^N \chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(l)\right)$ $= \frac{1}{N}\sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \cdot K $ $=\frac{K}{N} \sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\})$ $=\frac{K}{N}.$ This holds for every measure $\mathbf{Q}$ on the space of the searches, especially for the uniform measure, leading to the
Special Remark: If no information about a search for a single element exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in the same search performance as random search.
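The Special Remark lends itself to a brute-force check for small parameters (a Python sketch; the measure $\mathbf{Q}$ below is just a randomly generated one, and any other choice works as well):

```python
from itertools import combinations
import random

# Brute-force check for small N, K: whatever measure Q we put on the
# K-element subsets of {1,...,N}, the success probability averaged over
# a uniformly chosen single-element target is always K/N.
random.seed(42)
N, K = 6, 3
subsets = list(combinations(range(1, N + 1), K))
weights = [random.random() for _ in subsets]
Q = [w / sum(weights) for w in weights]   # an arbitrary search strategy

avg = sum((1 / N) * sum(q for s, q in zip(subsets, Q) if t in s)
          for t in range(1, N + 1))
print(avg, K / N)   # numerically equal: 0.5
```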
But perhaps this result depends on the special condition of an elementary target? What happens for $T_m = \{t_1, t_2, \ldots, t_m\} \in \mathfrak{P}_m(\Omega _1)$? Then a search is counted as a success if it identifies at least one element of the target $T_m$, thus $\mathcal{P}{(\mathbf{Q},\delta_{T_m})} = \sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \cdot \max_{t \in T_m}\{\chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(t)\}.$ Calculating the average over all elements of $\mathfrak{P}_m(\Omega_1)$ results in: $\mathcal{P}{(\mathbf{Q},\mathbf{U}_{\mathfrak{P}_m(\Omega_1)})} = \frac{1}{{N \choose m}} \sum_{T \in \mathfrak{P}_m(\Omega_1)} \sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \cdot \max_{t \in T}\{\chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(t)\}$ $=\frac{1}{{N \choose m}} \sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \sum_{T \in \mathfrak{P}_m(\Omega_1)} \max_{t \in T}\{\chi_{\{n_1,\,n_2,\,\ldots,\,n_K\}}(t)\} $ $=\frac{1}{{N \choose m}} \sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) \cdot \left({N \choose m} - {N-K \choose m}\right) $ $= \frac{{N \choose m} - {N-K \choose m}}{{N \choose m}} \sum_{\{n_1,\,n_2,\,\ldots,\,n_K\} \in \mathfrak{P}_K(\Omega_1)} \mathbf{Q}(\{n_1,\,n_2,\,\ldots,\,n_K\}) $ $= \frac{{N \choose m} - {N-K \choose m}}{{N \choose m}} = 1-\frac{(N-K)(N-K-1)\ldots (N-K-m+1)}{N(N-1)\ldots (N-m+1)}.$ Again, the probability of a successful search - and by definition the performance - is on average independent of the measure used on the space of the searches! (Note that the expression becomes $\frac{K}{N}$ for $m=1$ and $1$ for $N-K < m$.) The result of this calculation is the
Less Special Remark: If no information about a search for a target exists other than the number of its elements, so that the underlying measure is uniform, then, on average, any other assumed measure will result in the same search performance as random search.
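Again, a brute-force check for small parameters is straightforward (a Python sketch with an arbitrary randomly generated measure $\mathbf{Q}$):

```python
from itertools import combinations
from math import comb
import random

# Check of the formula for m-element targets: the average success
# probability equals 1 - C(N-K, m)/C(N, m) for any measure Q.
random.seed(7)
N, K, m = 7, 3, 2
searches = list(combinations(range(N), K))
weights = [random.random() for _ in searches]
Q = [w / sum(weights) for w in weights]   # an arbitrary search strategy

targets = list(combinations(range(N), m))
# A search succeeds on target T if it hits at least one element of T.
avg = sum(sum(q for s, q in zip(searches, Q) if set(s) & set(T))
          for T in targets) / len(targets)
print(avg, 1 - comb(N - K, m) / comb(N, m))   # numerically equal: 5/7
```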
As the performance of the random search and of any other search doesn't differ on average for targets of any fixed cardinality, the next remark is elementary:
General Remark: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in the same search performance as random search.
Obviously this General Remark negates the
Remark of Dembski and Marks: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search.
Consequences for the Horizontal No Free Lunch Theorem
There is much criticism of this theorem out there, e.g., A Note Regarding "Active Entropy". The previous paragraphs add to it: Dembski and Marks's remark is a direct consequence of the theorem, and as the remark is wrong, the theorem can't be right either. How is it possible for two mathematicians to err on such a crucial point? Perhaps they were seduced by the elegance of their argument: the calculations in the sections above are simple, but a little bit clumsy. Dembski and Marks smartly augmented the original search space $\Omega_1$ and identified it with the space of pure search strategies $\Omega_K$. For each original target $T_1$ a corresponding target $\tilde{T_1}$ can be found in $\Omega_K$, such that a simple search $(n_1, \ldots, n_K)$ is successful exactly when it is an element of $\tilde{T_1}$. So the problem of searching with $K$ guesses on $\Omega_1$ is reduced to a single guess on $\Omega_K$. The problem which they overlooked: the corresponding targets aren't nice subsets. In particular, if $K>1$, the corresponding sets for two different targets in the original space always have a non-empty intersection. Nevertheless, Dembski and Marks use an exhaustive partition of the augmented space in their proof of the Horizontal No Free Lunch theorem - but there is no such partition into subsets which makes sense in the framework of their argument.
Conclusion
Dembski and Marks have introduced their concept of Active Information using a great number of examples. Their model for an assisted search works for only one or two of these examples. The theoretical results presented in The Search for a Search don't seem to work at all. That leaves Active Information as a new performance measure, but it is hard to see how it improves the understanding of the concept of information in connection with search algorithms.

Sunday, October 10, 2010

Bernoulli vs. Clark Kent

When discussing the consequences of their Horizontal No Free Lunch theorem, the authors W. Dembski and R. Marks state:

Because the active entropy is strictly negative, any uninformed assisted search will on average perform worse than the baseline search.


Here, the baseline search is the blind search. But what is an assisted search? As an example, the authors have introduced the perfect search

an assisted search guaranteed to succeed


Let's look at an example: a shell game - three shells, one pea. The operator is a fair man, who will hide the pea with probability 1/3 under any of the three shells.

The players are Bernoulli and Clark Kent. Knowing the paper of W. Dembski and R. Marks on the HNFLT, Bernoulli makes a guess at random, using a uniform distribution on the shells.

Clark Kent just picks the shell which hides the pea (remember - Clark Kent is Superman...), thereby performing a perfect search.

Who fares better? According to W. Dembski and R. Marks, both do equally well: observing the picking pattern of Clark Kent, they conclude that Clark Kent uses a uniform measure on the search space - just as Bernoulli does. Unfortunately, the mathematics of their paper has no way to deal with a strategy which is influenced by the position of the target (as any assisted search would be).

Friday, September 24, 2010

Draft of a letter to JACIII

In their article The Search for a Search [1], the authors William Dembski and Robert Marks seem to assume that any search on a space induces a probability measure on the search space, as they write in section 3.1 (Active Information):

"Let U denote a uniform distribution on Ω characteristic of an unassisted search and φ the (nonuniform) measure of Ω for an assisted search. Let U(T) and φ(T) denote the probability over the target set T[sic]Ω."

Such an induced measure gives the probability to find a subset T of Ω when performing the corresponding search. But for the assisted searches mentioned in the text (a "perfect search" - "indicating an assisted search guaranteed to succeed" - and another search "indicating an assisted search guaranteed to fail"), such probability measures obviously do not exist! Thus these cases are not covered by their Theorem 1 (Horizontal No Free Lunch), as the Kullback-Leibler-distance is defined only for probability measures.

But perhaps the authors intended their theorem to be valid only for those searches which induce a measure on the search space - indeed, they use the term uninformed assisted searches when describing their results. Unfortunately, this term is not defined - neither in the current text nor in their previous publications Conservation of Information in Search - Measuring the Cost of Success [2] or Efficient Per Query Information Extraction from a Hamming Oracle [3].

Unfortunately, even in this case, the theorem does not work: assuming the easiest nontrivial case of guessing an element of a set with two elements (tossing a coin), Ω = {head, tail}, the two strategies of naming an element at random (uniform distribution on Ω) and of sticking with one element (Dirac distribution) give the same success rate for a fair coin. The same result occurs when averaging over the probability of showing head (assuming that this probability is uniformly distributed on [0,1]), contradicting the statement of William Dembski and Robert Marks:

"Because the active entropy is strictly negative, any uninformed assisted search (φ) will on average perform worse than the baseline search". (Highlighting mine)
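For what it's worth, the averaging claim about the coin example is easy to verify numerically (a Python sketch under the assumptions stated above: two outcomes, probability p of head uniform on [0, 1]):

```python
# Average success rates of the two strategies over the coin's bias p
# (probability of head), with p uniform on [0, 1].
n = 100_000
ps = [(i + 0.5) / n for i in range(n)]   # midpoint grid on [0, 1]
# Guessing head/tail uniformly at random succeeds with probability
# p*(1/2) + (1-p)*(1/2) = 1/2 for every p.
avg_uniform = sum(0.5 for _ in ps) / n
# Always naming "head" succeeds with probability p.
avg_stick = sum(ps) / n
print(avg_uniform, avg_stick)            # both 0.5
```

Both strategies perform equally well on average, in contradiction to the quoted statement.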

In my opinion, the fundamental problem stems from identifying the search space Ω(n) and the space of (deterministic) searches. This is doubtless elegant, but does not seem to work.

References:


[1] W. A. Dembski and R. J. Marks II, "The Search for a Search: Measuring the Information Cost of Higher Level Search," Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 14, No. 5, pp. 475-486, September 2010

[2] W. A. Dembski and R. J. Marks II, "Conservation of Information in Search: Measuring the Cost of Success," IEEE Trans. on Systems, Man and Cybernetics A, Systems and Humans, Vol. 39, No. 5, pp. 1051-1061, September 2009

[3] W. Ewert, G. Montanez, W. A. Dembski, and R. J. Marks II, "Efficient Per Query Information Extraction from a Hamming Oracle," Proc. of the 42nd Meeting of the Southeastern Symposium on System Theory, IEEE, University of Texas at Tyler, pp. 290-297, March 7-9, 2010

Wednesday, September 1, 2010

What is wrong with public debate?

R. Marks just informed me of his policy not to engage in correspondence with anyone publicly critical of him or his work and that independent of the validity or invalidity of the details of the exchange, these things are best discussed thoroughly before any public pronouncements.

Indeed, I wrote to R. Marks about my concerns regarding the Horizontal No Free Lunch theorem. And I sent the links to my blog entries. And these blog entries are critical of his work. But what is the problem?
  1. My blogging could be totally unsubstantiated. In this case, he really should ignore it - for any valid theory, you'll find thousands of idiots blogging about imagined problems.
  2. My entries could be well thought through, but ridiculously wrong. In this case, I made a fool of myself in public - and he could criticize me for doing so.
  3. There may be something true in my blog entries: if these grains of truth aren't buried in filthy language, insults, etc., why should they be ignored?


I just don't get it.

Tuesday, August 31, 2010

Horizontal No Free Lunch Theorem

In my previous post I criticized the Horizontal No Free Lunch theorem for searches consisting of at least two queries. But even in the case of a single assisted query, I can't get my head around it.

Dembski and Marks define an assisted query as any choice from Ω1 that provides more information about the search environment or candidate solutions than a blind search. As an example, they give: for instance, we might imagine an Easter egg hunt where one is told "warmer" if one is moving away from an egg and "colder" if one is moving toward it.

The ultimate assisted search is no search at all: the unveiled Easter egg. So, imagine a blind (A), a drunken (B), and a seeing, sober (C) chess player, and ask them to locate the White Queen (♕) on a chess board. (A) will find it with a probability of 1/64, (C) with a probability of 1, and let's assume for (B) a probability of 1/2.

Then (A) represents the blind search, (B) an assisted search and (C) what the authors call a perfect search on our search space Ω = {a1,a2,....,h8} (|Ω|=64).

Imagine that ♕ is positioned on a1, so T1 = {a1}. Then we have three probability measures, αa1, βa1, and γa1 for each one of our players, which allow us to calculate the probability for each one to find T1:
  • αa1(T1)= 1/64

  • βa1(T1) = 1/2

  • γa1(T1) = 1

And we can calculate - according to formula (10) in section 3.1 - the active information of these measures for various sets and combinations, e.g.
I+(βa1‖αa1)({a1}) = log2(1/2) - log2(1/64) = 5.

So, the drunken player has more active information on the whereabouts of ♕ than the blind one.

I+(γa1‖αa1)({b1}) = log2(0) - log2(1/64) = -∞

A more surprising result: if ♕ is on b1, but (C) sees it on a1, he is much less informed than (A).

But of course, γa1 isn't the right measure if ♕ is on b1; we would expect γb1: the idea of an assisted search seems to be that it assists in the right direction - for our Easter egg hunt: if your helper always directs you to the same place, whether there is an egg or not, he is no helper at all, he's just a nuisance:

Trivially, a search "helped" in this way can't be distinguished from an unassisted search.

But I'm afraid that's exactly what W. Dembski and R. Marks do in their Horizontal No Free Lunch Theorem: let's take the partition T~ = {{a1},{a2},...,{h8}} of our Ω1 above. What does

Σ αa1({Ti}) × I+(γa1‖αa1)({Ti})

mean? Nothing else than that our seeing, sober man errs 63 times out of 64 when looking for ♕. That just doesn't make sense.

The meaningful expression would have been:

Σi αTi(Ti) × I+(γTi)(Ti)

And while the first sum adds up to -∞, the latter one is not negative at all: it's just 6.
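The difference between the two sums can be checked numerically; a small sketch (all names mine):

```python
import math

squares = [c + str(r) for c in "abcdefgh" for r in range(1, 9)]
blind = 1 / 64  # alpha assigns 1/64 to every cell {square}

def i_plus(p):
    """Active information of a measure assigning p to the cell."""
    return float('-inf') if p == 0 else math.log2(p) - math.log2(blind)

# gamma_a1: the sober player's measure when he believes the Queen is on a1
fixed = sum(blind * i_plus(1.0 if s == "a1" else 0.0) for s in squares)

# gamma_Ti: for each cell, use the measure that actually points at it
adapted = sum(blind * i_plus(1.0) for s in squares)

print(fixed)    # -inf
print(adapted)  # 6.0
```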

nota bene: I think this is the idea behind Rob's (R0b?) comment to my previous post. So, thanks Rob!

Sunday, August 29, 2010

Search for Search Paper Finally Out

William A. Dembski informs us at Uncommon Descent that the paper A Search for a Search: Measuring the Information Cost of Higher Level Search (a collaboration with Robert Marks II) is finally published. Two things are remarkable about his post:


  • The comments are turned off
  • It's so restrained - wasn't this paper once announced as a nail in the coffin of Darwinism?

One obvious flaw I found in an earlier draft of the paper has been addressed: instead of talking about arbitrary searches, they now talk about searches without repetition.


But I still have a(t least one) problem with this paper, which may be resolved by more careful reading over the next few days. As I wrote to Robert Marks:


I was a little bit irritated that the proof of the HNFLT still uses the
Kullback-Leibler distance, as I can't see how a non-trivial search space
(i.e., a search space for a search consisting of at least two queries)
can be exhaustively partitioned in a meaningful way.

Here's an example I used earlier: imagine a shell game with three shells where you are allowed two guesses. To put it more formally:


  • The search space Ω1 is {1,2,3}

In section 2.1, Blind and Assisted Queries and Searches, Dembski and Marks describe how such searches can be construed as a single query when the search space is appropriately defined. They introduce an augmented search space ΩQ, consisting of the sequences without repetition of length Q, where Q is the number of queries. In our case:


  • The augmented search space Ω2 is {(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)}

Of course, the target has to be changed accordingly to TQ, where TQ ∈ [sic] ΩQ consists of all the elements containing the original target.


So, if in the original game the shell with the pea was No. 1, then in our new space T2 = {(1,2),(1,3),(2,1),(3,1)}. All of this seems to be sensible.


Let's have a look at a search strategy, for instance:


  • First, look under a random shell
  • Second, look under another random shell

By such a strategy, a probability measure is introduced on Ω2, and the probability of a successful search for T2 can be seen immediately: it's |T2|/|Ω2| = 4/6 = 2/3. No surprise here.
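The augmented space and the target can be enumerated directly; a quick check in Python (variable names mine):

```python
from itertools import permutations

omega1 = [1, 2, 3]
# Omega_2: sequences of two distinct guesses
omega2 = list(permutations(omega1, 2))
assert len(omega2) == 6

# pea under shell 1: T2 contains every sequence that mentions shell 1
T2 = [seq for seq in omega2 if 1 in seq]
print(sorted(T2))             # [(1, 2), (1, 3), (2, 1), (3, 1)]
print(len(T2) / len(omega2))  # 0.666... = 2/3
```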


In fact, any search strategy can be seen as a measure on ΩQ - and that's what Dembski and Marks are doing in the following. My problem: not every subset of ΩQ can be seen as a reasonable target for a search: the complement of T2 doesn't represent any target in the original space Ω1! And though this set can be measured by the measure introduced above (2/6), that measure doesn't make sense in the context of a search.


And that's what I don't understand about the Horizontal No Free Lunch Theorem (3.2). The first sentence is:


Let φ and ψ be arbitrary probability measures and let T~ = {Ti} (i = 1, ..., N) be an exhaustive partition of Ω all of whose partition elements have positive probability with respect to ψ

How can Ω2 be partitioned in a sensible way? And why should I try to compare two search strategies on sets for which they will never look?

Friday, October 23, 2009

A silent death of the Horizontal no Free Lunch theorem

The only reply I got to my concerns regarding W. Dembski's and R. Marks's article The Search for a Search - and its Horizontal No Free Lunch theorem - was an email from R. Marks, stating that he'd look into it.
Now, two weeks later, R. Marks has taken the article off his list of publications. The Evolutionary Informatics lab still has an announcement, basically the abstract of the article:
Many searches are needle-in-the-haystack problems, looking for small targets in large spaces. In such cases, blind search can stand no hope of success. Success, instead, requires an assisted search. But whence the assistance required for a search to be successful? To pose the question this way suggests that successful searches do not emerge spontaneously but need themselves to be discovered via a search. The question then naturally arises whether such a higher-level “search for a search” is any easier than the original search. We prove two results: (1) The Horizontal No Free Lunch Theorem, which shows that average relative performance of searches never exceeds unassisted or blind searches. (2) The Vertical No Free Lunch Theorem, which shows that the difficulty of searching for a successful search increases exponentially compared to the difficulty of the original search.
The given link no longer leads to the article, but to their earlier opus Conservation of Information in Search - neither the term vertical nor the term horizontal appears in that article.

In his first podcast with Casey Luskin, W. Dembski promised that this article would be something like a nail in the coffin of evolution. I assume that W. Dembski will explain how he was wrong with this assessment in his next interview...

Thursday, October 22, 2009

Horizontal No Free Lunch Theorem

You keep using that word. I do not think it means what you think it means
Inigo Montoya


In their upcoming paper The Search for a Search: Measuring the Information Cost of Higher Level Search, W. Dembski and R. Marks introduce a Horizontal No Free Lunch Theorem.

At Uncommon Descent, I tried to reach W. Dembski with the following post which is - of course - awaiting moderation at the moment.

The Horizontal No Free Lunch Theorem doesn't work for a search of length m > 1 for a target T ⊆ Ω:

Let Ω be a finite search space and T ⊆ Ω the target - a non-empty subset of Ω. A search is a (finite) sequence of Ω-valued random variables (φ1, φ2, ..., φm). A search is successful if φn ∈ T for some n, 1 ≤ n ≤ m.

I suppose we agree here. Now, we look at a search Φ as an Ωm-valued random variable, i.e., Φ := (φ1, φ2, ..., φm).

When is it successful? If we are still looking for a T ⊆ Ω we can say that we found T during our search if

Φ ∈ Ωm \ (Ω \ T)m

Let's define Θ as the subspace of Ωm which consists of the representations of targets in Ω, i.e.,

Θ := {Ωm \ (Ω \ T)m | T a non-empty subset of Ω}

Obviously, Θ is much smaller than Ωm.

But this Θ is the space of feasible targets. And if you take an exhaustive partition of Θ instead of Ωm in Theorem III.1 Horizontal No Free Lunch, you'll find that you can indeed have positive values for the active entropy as defined in the same theorem.
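For a small Ω, the space Θ of feasible targets can be enumerated and set against the full power set of Ωm. A sketch with Ω = {1,2,3} and m = 2 (names mine):

```python
from itertools import combinations, product

omega = {1, 2, 3}
m = 2
omega_m = set(product(omega, repeat=m))  # 9 pairs

def representation(T):
    """Omega^m \\ (Omega \\ T)^m: sequences that hit T at least once."""
    missed = set(product(omega - T, repeat=m))
    return frozenset(omega_m - missed)

targets = [set(c) for r in range(1, len(omega) + 1)
           for c in combinations(omega, r)]
theta = {representation(T) for T in targets}

print(len(theta))             # 7 feasible targets ...
print(2 ** len(omega_m) - 1)  # ... out of 511 non-empty subsets of Omega^2
```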

But that's not much of a surprise, as random sampling without repetition works better than random sampling with repetition.

But if you allow T to be any subset of Ωm, your results get somewhat trivial, as you are now looking at m independent searches of length 1 for different targets.

The searches which you state as examples in this paper and the previous one all work with a fixed target, i.e., elements of Θ. You never mention the possibility that the target changes between the steps of the search (one possible interpretation of taking arbitrary subsets of Ωm into account).

So, I'm faced with two possibilities:
  1. You didn't realize the switch from stationary targets to moving ones when you introduced searching for an arbitrary subset of Ωm
  2. You realized this switch to a very different concept, but chose not to stress the point.

Tuesday, October 20, 2009

W. Dembski's and R. Marks's The Search for a Search and my major problem with it

William Dembski and Robert Marks announced a new paper The Search for a Search: Measuring the Information Cost of Higher Level Search (draft) which should appear in a peer-reviewed journal at any moment.

My problem: when I read the term search for a target without any qualifier, I imagine that
  1. the target is fixed
  2. the search is over when the target is found for the first time


A model for this: a game of Battleship against a random generator, with these additional rules:
  1. all ships are of size one
  2. you don't know how many ships there are
  3. the game is over with the first hit


So, if your search space is Ω, then there are 2^|Ω| - 1 different targets.

But with this kind of target, the math of W. Dembski's and R. Marks's paper doesn't work: they claim that you can't find a strategy that works better on average than random search. In the Battleship example: there is no better strategy than shooting at random each turn until your first hit. But this is obviously wrong: a better strategy is not to hit a square twice.
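For the simplest case - a single hidden one-cell ship on an n-cell board - the gap is easy to quantify: shooting without repetition succeeds within m shots with probability m/n, purely random shooting only with 1 - (1 - 1/n)^m. A small sketch (names mine):

```python
n = 64  # cells on the board, one hidden single-cell ship

def p_with_repetition(m, n=n):
    """Chance of a hit within m purely random shots."""
    return 1 - (1 - 1 / n) ** m

def p_without_repetition(m, n=n):
    """Chance of a hit within m shots that never repeat a square."""
    return m / n

# shooting without repetition dominates for every number of shots
for m in (1, 10, 32, 64):
    print(m, p_without_repetition(m), p_with_repetition(m))
```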

But W. Dembski and R. Marks proved their claim - how so? Well, they redefined the concept of a target. The altered rules for Battleship:
  1. the position and the number of the ships changes each round
  2. to succeed, you have to hit a ship each round


For this version - and not for the one above - they proved all their results. The problem: that's not search as we know it...

The technical details: in section III.B, they formalize a search:
Suppose now Ω denotes an arbitrary search space. In general, assume Ω is a compact metric space with metric D, though for most theorems and calculations assume Ω is finite (in which case D is a discrete metric). Let X1, X2,...,Xm denote Ω-valued random variables that are independent and identically distributed with uniform probability distribution U. X1, X2,...,Xm therefore represents a blind search of Ω. Next, let Y1, Y2,...,Ym denote Ω-valued random variables that are arbitrarily distributed. Y1, Y2,...,Ym therefore characterizes an assisted search of Ω.


That's our first version of Battleship - I have no problem with this section. But the next paragraph states:
If we now let Ω~ = Ωm denote the m-fold Cartesian product of Ω, then X~ = (X1, X2,...,Xm) and Y~ = (Y1, Y2,...,Ym) are Ω~-valued random variables. Because the uniform probability on Ω~ = Ωm is just the m-fold product measure of U, X~ = (X1, X2,...,Xm) is uniformly distributed on Ω~ = Ωm. Thus, by attending solely to the probabilistic structure of searches, we represent the blind search X~ := (X1, X2,...,Xm) as having an m-dimensional uniform distribution U~ with respect to a metric D~ that suitably extends the metric D on Ω. Moreover, we represent the assisted search Y~ := (Y1, Y2,...,Ym), in searching for an arbitrary subset T~ of Ω~, as having distribution φ~ on Ω~.
But all these tildes are superfluous. Dropping them, we see that the probabilistic connection between a blind search and an assisted search can be entirely characterized in terms of a search space Ω, a metric D that induces a compact topology on Ω, a target T included in Ω, a uniform probability measure U on Ω induced by D, and an arbitrary probability measure φ on Ω. U represents the probabilistic structure of a blind search, and φ represents the probabilistic structure of an assisted search.


Now, T~ is a subset of Ωm - and there are 2^(|Ω|^m) - 1 non-empty subsets! A successful search means that Y~ = (Y1, Y2,...,Ym) ∈ T~; for a product target T~ = T1×T2×...×Tm, this means Yn ∈ Tn for all 1 ≤ n ≤ m. So, a stationary target would not be represented by

Tm = T×T×...×T

as one may naively expect, but by

T×Ω×...×Ω
∪ Ω×T×Ω×...×Ω
∪ Ω×Ω×T×Ω×...×Ω
∪...
∪ Ω×Ω×...×Ω×T

but most of the targets don't fit this pattern.
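That the union above really equals Ωm \ (Ω \ T)m can be verified for small spaces; a quick sanity check (names and parameters mine):

```python
from itertools import product

omega = set(range(1, 7))  # a small board
T = {1, 4}                # a stationary target
m = 3                     # number of queries

# union of m slabs: T in exactly one prescribed coordinate
union = set()
for i in range(m):
    factors = [T if j == i else omega for j in range(m)]
    union |= set(product(*factors))

# Omega^m \ (Omega \ T)^m: sequences hitting T at least once
rep = set(product(omega, repeat=m)) - set(product(omega - T, repeat=m))

print(union == rep)  # True
```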

Of course you can claim that knowing that your target doesn't move is additional information. But I haven't encountered a definition of search which expects you to be right on target every time...

P.S.: I tried to reach R. Marks via email - and I left an entry at W. Dembski's blog...

P.P.S.: some details.

Saturday, October 17, 2009

I wrote a letter to W. Dembski and R. Marks

After reading their article Conservation of Information in Search: Measuring the Cost of Success in some detail, I decided to write to W. Dembski and R. Marks, just to point them to some minor miscalculations - so they have the chance to get at least the math straight in their peer-reviewed paper.

At uncommondescent.com, I tried to draw the attention of William Dembski to two minor errors in your IEEE paper "Conservation of Information in Search - Measuring the Cost of Success":

Could you correct the sign errors in equation (27) and on page 1057, left column, for Q?

More problematic is the next line: you speak about the active information per query, but you calculate the active information per generation, as you drop the factor -2 from Q ~ -2log(L)/log(1-2µ) and so reach

I+ ~ L/log(L) * log(1-2µ) instead of
I+ ~ L/(2*log(L)) * log(1-2µ).

So, I+ should be µ*L/ln(L) and not 2*µ*L/ln(L).

(I stumbled upon this when I calculated my approximation I+ ~ µ*L/H_((1-b)L) for the average number of queries Q ~ H_((1-b)L)/µ )

And there is a little problem with Fig. 2: all the paths seem to start with no correct bits at all - while the math in ch. III-2 assumes that half of the bits are correct.

Let L=100, µ=0.00005:

1. if you use b(eta) = 0.5 (as in your calculation), Q ~ 92,100 and I+ ~ 0.00109. Then you should take other paths for your pic.

2. if you use b(eta) = 0 (as in your figure), Q ~ 106,000 and I+ ~ 0.000943.

The number in your text is I+ ~0.0022. That's wrong in any case.


Yours
Di***** Eb**


Don't get me wrong: I don't want to be just nit-picking. One sign error is no big deal - an even number of sign errors cancels itself out :-)
And the factor two? It's at least the same order of magnitude. (And no one knows what magnitude is to be expected for active information, so even a factor of 100 would not have been easily spotted.) And I didn't mention all the inaccuracies in their references section...
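The approximations from the letter are easy to recompute; a sketch using the harmonic-number formula quoted above (it does not reproduce the exact Markov values 92,100 and 106,000, only the rough magnitudes):

```python
import math

L, mu = 100, 0.00005

def harmonic(n):
    return sum(1 / k for k in range(1, n + 1))

# Q ~ H_((1-b)L) / mu, the approximate number of queries from the letter
Q_half = harmonic(L // 2) / mu  # b(eta) = 0.5: roughly 90,000
Q_zero = harmonic(L) / mu       # b(eta) = 0:   roughly 104,000

# active information per query, with and without the disputed factor 2
print(mu * L / math.log(L))      # ~ 0.00109
print(2 * mu * L / math.log(L))  # ~ 0.0022, the value printed in the paper
```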

Monday, October 12, 2009

Another approach

I completed this little project here.

I thought about different ways to present my ideas on W. Dembski's and R. Marks's article Conservation of Information in Search: Measuring the Cost of Success, and tried a few out in the last posts. The section which I'm interested in is the description of three evolutionary algorithms they give on pp. 1055-1057. Here's another approach for III E - G:


Original Text | Annotations
E. Partitioned Search
Partitioned search [12] is a “divide and conquer” procedure best introduced by example.
The name partitioned search seems to be an invention of R. Marks and W. Dembski. The reference is to Dawkins's book The Blind Watchmaker, in which the phrase can't be found. (See Tom English's blog)
Consider the L = 28 character phrase
METHINKS∗IT∗IS∗LIKE∗A∗WEASEL. (19)
This is indeed a phrase which is used in Dawkins's book - in an algorithm with which Dawkins explained the idea of cumulative selection.
Suppose that the result of our first query of L =28 characters is

SCITAMROFN∗IYRANOITULOVE∗SAM. (20)
An example of the egregious humorous skills of Dembski and Marks: backwards, we get:
mas*evolutionaryi*nformatics
That's no problem, as the first phrase of the algorithm can be arbitrarily chosen
Two of the letters {E, S} are in the correct position. They are
shown in a bold font. In partitioned search, our search for these
letters is finished.
At least, that's the way Dembski's and Marks's search works
For the incorrect letters, we select 26 new letters and obtain

OOT∗DENGISEDESEHT∗ERA∗NETSIL. (21)
listen*are*thesedesigned*too
hilarious. And a sign that we don't see the output of an actual program, but something imagined to be a run of their algorithm. BTW, the fitness function would have to encode the positions of the correct letters, and the possible outcomes of the function wouldn't form a totally ordered set, but only a partially ordered one (what's better: METHINKS*IT*ISXXXXXXXXXXXXXX or XXXXXXXXXXXXXX*LIKE*A*WEASEL?). That's at least unusual, and perhaps a reason that no one else uses partitioned search.
Five new letters are found, bringing the cumulative tally of discovered characters to {T, S, E, ∗, E, S, L}. All seven characters are ratcheted into place. The 19 new letters are chosen, and the process is repeated until the entire target phrase is found.
This ratcheting into place is special to this algorithm: the algorithm described in Dawkins's book doesn't show it.
Assuming uniformity, the probability of successfully identifying a specified letter with sample replacement at least once in Q queries is 1 − (1 − 1/N)^Q, and the probability of identifying all L characters in Q queries is
q = (1 − (1 − 1/N)^Q)^L (22)
Yep, that's true. But why not do a little math and give us the expected number of queries? That would be a little bit less banal. And it's helpful if you want to compare different algorithms.
For the alternate search using purely random queries of the entire phrase, a sequence of L letters is chosen. The result is either a success and matches the target phrase, or does not. If there is no match, a completely new sequence of letters is chosen. To compare partitioned search to purely random queries, we can rewrite (5) as
p = 1 − (1 − (1/N)^L)^Q (23)
yup.
For L = 28 and N = 27 and moderate values of Q, we have p << q, corresponding to a large contribution of active information. The active information is due to knowledge of partial solutions of the target phrase. Without this knowledge, the entire phrase is tagged as “wrong” even if it differs from the target by one character.
So, how big is this active information? For p, it was calculated in section III.A as I+(p) = log(Q), and using the same approximation, we get I+(q) = L·log(Q) (that's only true-ish for small values of Q and large alphabets...)
The enormous amount of active information provided by partitioned search is transparently evident when the alphabet is binary. Then, independent of L, convergence can always be performed in two steps. From the first query, the correct and incorrect bits are identified. The incorrect bits are then complemented to arrive at the correct solution. Generalizing to an alphabet of N characters, a phrase of arbitrary length L can always be identified in, at most, N − 1 queries. The first character is offered, and the matching characters in the phrase are identified and frozen in place. The second character is offered, and the process is repeated. After repeating the procedure N − 1 times, any phrase characters not yet identified must be the last untested element in the alphabet.
Wow, the hangman's game. In detail.
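The N − 1 bound is easy to demonstrate: offer one character per query and freeze every matching position. A sketch of this hangman-style procedure (implementation mine, not the authors' code):

```python
def partitioned_search(target, alphabet):
    """Offer each character in turn; freeze every matching position.
    Unresolved positions must hold the last, untested character."""
    found = [None] * len(target)
    queries = 0
    for ch in alphabet[:-1]:
        queries += 1
        for i, t in enumerate(target):
            if t == ch:
                found[i] = ch
        if None not in found:
            break
    guess = [c if c is not None else alphabet[-1] for c in found]
    return "".join(guess), queries

target = "METHINKS*IT*IS*LIKE*A*WEASEL"
alphabet = "*ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # N = 27
phrase, q = partitioned_search(target, alphabet)
print(phrase == target, q)  # True 24 - never more than N - 1 = 26 queries
```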
Partitioned search can be applied at different granularities. We can, for example, apply partitioned search to entire words rather than individual letters. Let there be W words with L/W characters each. Then, the partitioned search probability of success after Q queries is
pW = (1 − (1 − N^(−L/W))^Q)^W. (24)
What's that all about? Imagine an alphabet of 32 letters - including {A,B,...,Z,*} - and our weasel phrase. Then the phrase could also be encoded as 28 5-bit words. One 5-bit word is only correct if all 5 bits are correct. Therefore, we get the same expressions for N=32, L=28, W=28 and for N=2, L=140, W=28.
Equations (22) and (23) are special cases for W = L and W = 1. If N^(−L/W) << 1, we can make the approximation pW ≈ Q^W · N^(−L), from which it follows that the active information is
I+ ≈ W log2 Q. (25)
With reference to (6), the active information is that of W individual searches: one for each word.
So, for W = L, we get I+(q) = L·log(Q). My interpretation: virtually, this algorithm provides you with L fitness functions, one for each letter, each indicating whether that letter is correct or not. BTW: I think it is surprising how the fitness functions are ignored throughout the whole article...
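Equation (24), its special cases, and the 5-bit-encoding remark above can all be checked numerically (a quick sketch, names mine):

```python
def p_w(N, L, W, Q):
    """Eq. (24): success probability of partitioned search on W words."""
    return (1 - (1 - N ** (-L / W)) ** Q) ** W

N, L, Q = 27, 28, 50

q22 = (1 - (1 - 1 / N) ** Q) ** L         # eq. (22)
p23 = 1 - (1 - (1 / N) ** L) ** Q         # eq. (23)
assert abs(p_w(N, L, L, Q) - q22) < 1e-12  # W = L recovers (22)
assert abs(p_w(N, L, 1, Q) - p23) < 1e-12  # W = 1 recovers (23)

# a 32-letter alphabet and its 5-bit encoding give identical probabilities
assert p_w(32, 28, 28, Q) == p_w(2, 140, 28, Q)
print("ok")
```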

Sunday, October 11, 2009

Evolutionary Algorithms in Conservation of Information in Search

work in progress :-)

In their paper Conservation of Information in Search: Measuring the Cost of Success, William A. Dembski, Senior Member, IEEE, and Robert J. Marks II, Fellow, IEEE present three algorithms which could loosely be described as evolutionary. Idiosyncratically, they are labeled
  1. Partitioned Search
  2. Optimization by Mutation, and
  3. Optimization by Mutation With Elitism
These procedures are discussed in section III, parts E to G (pp. 1055-1057), and some additional math is done in the appendix.

The algorithms and the examples given



The three algorithms are introduced or illustrated with an example, for which the parameters can be found in this table:
Algorithm | class | alphabet | length of target | Mutation | fitness function
Partitioned Search | none | {*,A,B,...,Z} (27 el.) | 28 | every incorrect letter is mutated | positions of the correct letters
Optimization by Mutation | (1,2) ES | {0,1} (2 el.) | 100 | each letter is flipped with probability µ = 0.00005 | number of correct letters
Optimization by Mutation With Elitism | (1+1) ES | {0,1} (2 el.) | 131* | exactly one letter is flipped | number of correct letters

*Why 131 bits for the last algorithm? I could understand 133 ≈ 28 · log2(27), but 131?

References


Four references are given in the section under discussion:
  • [3] M. Aigner, Discrete Mathematics. Providence, RI: Amer. Math. Soc., 2007.
  • [12] R. Dawkins, The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe Without Design. New York: Norton, 1996.
  • [32] D. J. C.MacKay, Information Theory, Inference and Learning Algorithms. Cambridge, U.K.: Cambridge Univ. Press, 2002.
  • [34] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York: McGraw-Hill, 1991, pp. 537–542.
The first one is unproblematic - though superfluous: it is given as a reference for the Lth harmonic number. The other three references are either problematic or at least indicate sloppy work:

1. Partitioned search [12] is a “divide and conquer” procedure best introduced by example. Richard Dawkins doesn't use the expression "partitioned search" in his book.

2. When μ << 1, the search is a simple Markov birth process [34] that converges to the target. The page numbers of the reference don't fit: it should be pp. 647-650, not pp. 537–542. But at least, page numbers are given!

3. To illustrate, consider a single-parent version of a simple mutation algorithm analyzed by McKay [32]. David MacKay explicitly analyzes multi-parented algorithms; there is not much of a resemblance to Dembski's and Marks's procedure. And annoyingly, page numbers are again omitted. MacKay's book has over 600 pages; only a few, in Chapter III.19, Why have Sex? Information Acquisition and Evolution, are dedicated to evolutionary algorithms.

But one has to be grateful for small things: At least these references weren't garbled like
[7] S. Christensen and F. Oppacher, “What can we learn from no free lunch? A first attempt to characterize the concept of a searchable,” in Proc. Genetic Evol. Comput., 2001, pp. 1219–1226.

Of course, it's about the concept of a searchable function. Not the first time, R. Marks and W. Dembski got that wrong.
Ann.: Tom English explained that errors in references are common in the engineering literature. So, I'll stop classifying the references of Dembski's and Marks's article as either superfluous or erroneous for now.

Saturday, October 10, 2009

I'm annoyed.

Perhaps I'm easily annoyed. But every time I read the paper Conservation of Information in Search - Measuring the Cost of Success, I get, well, pretty much distressed. I wanted to write a concise rebuke of this article, but you can't be concise: there is so much that is disturbing in it, often little things. Take for instance this innocuous little sentence:
To illustrate, consider a single-parent version of a simple mutation algorithm analyzed by McKay [32].
The reader may be interested in McKay's analysis of a simple mutation algorithm, and follows the reference:
D. J. C.MacKay, Information Theory, Inference and Learning Algorithms. Cambridge, U.K.: Cambridge Univ. Press, 2002.
Thankfully, David J.C. MacKay (he seems to prefer this spelling) has put his book online. And yes, it is a book - a textbook with over 600 pages. It covers a wide range, so at the least, Dembski and Marks should have given the pages which they thought to be relevant. Which they thought to be relevant is a key phrase, because skimming the book doesn't turn up anything resembling an analysis of a simple mutation algorithm like the one in the article: yes, there is Chapter III.19, Why have Sex? Information Acquisition and Evolution - but this is explicitly about algorithms with multiple parents; it just doesn't fit a single-parent version. MacKay assumes, for instance, that the fitness of the parents is normally distributed, something you just can't have for a single-parent version. So why do Marks and Dembski give MacKay as a reference? I have no idea.

And it's going on


Just checking the next reference - [34] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York: McGraw-Hill, 1991, pp. 537-542 - for this little piece of marginal information:
When μ << 1, the search is a simple Markov birth process [34] that converges to the target.
You may say: at least they are giving a page range. But it's the wrong one! Take out your 3rd edition of Papoulis's very good introduction to the theory of probability, and you'll find Markov birth processes on pp. 647-650. Why can't they get at least the details right?

Wednesday, September 23, 2009

Random Mutation

In their paper Conservation of Information in Search: Measuring the Cost of Success, William Dembski and Robert Marks present an algorithm which resembles Dawkins's weasel (the algorithm I talked about before - at length :-)).
It isn't the algorithm Partitioned Search introduced on p. 1055, it's the algorithm Random Mutation on p. 1056. The differences:
  1. the alphabet consists of two letters, not 27.
  2. the target length is 100, not 28
  3. the size of a generation is quite low: it's two
  4. the mutation probability per letter is very small: µ = 0.00005
Many programmers have implemented weasel-like programs, and the runs I've seen usually use a µ between 0.04 and 0.05, while the generation size is greater than 20.

So, why do W. Dembski and R. Marks use these values? Well, in the appendix of their paper, they calculate an approximation for the number of correct letters depending on the number of generations. This approximation discards terms quadratic in µ, so to use it, not only must µ << 1 hold, but the size of a generation has to be small, too.

But should they use these values? I don't think so: as usual, I modeled their algorithm as a Markov chain and could calculate some exact values. The expected number of generations to reach the target for their choice of parameters is 55,500. (see pic 1)

Some fooling around with their program should have shown that this number can be reduced dramatically by using a more appropriate value for µ. The optimum lies near µ ~ 0.0005, which takes only 10,600 generations.
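A direct simulation supports this. The sketch below is my own implementation of the described two-offspring scheme, not their code; single runs scatter around the exact expectations of 55,500 and 10,600 computed from the Markov chain:

```python
import random

def generations_to_target(L=100, mu=0.00005, max_gen=1_000_000, seed=42):
    """One run of the described scheme: two offspring per generation,
    each bit flipped with probability mu, the fitter offspring survives."""
    rng = random.Random(seed)
    parent = [0] * L
    for gen in range(1, max_gen + 1):
        kids = [[b ^ (rng.random() < mu) for b in parent] for _ in range(2)]
        parent = max(kids, key=sum)
        if sum(parent) == L:
            return gen
    return None  # did not converge within max_gen

g_slow = generations_to_target(mu=0.00005)  # single run; expectation: 55,500
g_fast = generations_to_target(mu=0.0005)   # single run; expectation: 10,600
print(g_slow, g_fast)
```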


And though their approximation doesn't work as well as for the ten-times-smaller parameter, the numbers are no more than 2.5% off. (see pic 2)