Monday, April 8, 2013

Dembski's and Mark's "The Search for a Search": Trying to work out one of their examples

Three years ago, on April 1st, 2010, W. Dembski's and R. Marks's paper The Search for a Search[1] was accepted for publication. Recently, W. Dembski referred to it as a seminal paper in his article Before They've Even Seen Stephen Meyer's New Book, Darwinists Waste No Time in Criticizing Darwin's Doubt at the Discovery Institute's Evolution News and Views.. The first result of their paper is the Horizontal No Free Lunch Theorem which shows that average relative performance of [assisted] searches never exceeds unassisted or blind searches. Hopefully this paper is covered at the 2013 Summer Seminar on Intelligent Design In the natural sciences from July 12-20, 2013 in Seattle, WA - at least in the past, Marks and Dembski have been speakers at this seminar. Wouldn't it be a nice warm-up exercise for all these bright students gathered in Seattle to work out an example of an assisted search and see for themselves how the Horizontal No Free Lunch theorem works? I just ask because I haven't seen any such example calculated by Marks, Dembski or one of their students. They have mentioned a few kinds of assisted searches - like Dawkins's weasel, the Easter Egg hunt, etc., but they have never shown how those fit their model.

So I tried to work out a kind of simplified Easter Egg hunt in the language of Dembski and Marks - and I invite everyone to try to elaborate similar examples. My claim: any search which uses a kind of feedback (an oracle, a fitness function, etc.) cannot be described meaningfully in the terms of Marks and Dembski - which cast a doubt on their far-reaching conclusions about assisted searches in general. But see for yourself:

Dembski and Marks sometimes give as an example for an assisted search the Easter Egg Hunt, where someone calls out colder or warmer while you are searching for an egg. Let's make it even more simple: Our Easter Egg is hidden behind one of 32 enumerated doors, and after choosing one door you are told whether the egg is behind a door with a bigger number (return value: $+1$), a door with a smaller number (return value: $-1$) - or that you have found the egg (return value: $0$). Now, I'll try to realize this search according the model given in Dembski's and Marks's paper The Search for a Search.

Following 2.1. Blind and Assisted Queries and Searches, we set as the search space $\Omega_1 = \{1,2,...,32\}$ and as target one element $T_1$ of this space. If we are allowed to guess only once, the probability of success is $\frac{1}{32}$. But a simple guess isn't much fun, and indeed, the authors are prepared for multiple queries: Consider Q queries (samples) from $\Omega_1$ without replacement. Well, let's consider four queries, that sounds reasonable, so $Q=4$. We read Such searches can be construed as a single query when the search space is appropriately defined Following the construction, we get $\Omega_4$, a set containing all sequences $(a_1,a_2,a_3,a_4), a_i \in \Omega_1$ of length four with elements of $\Omega_1$ where no element appears more than once. Obviously $|\Omega_4|=32\cdot 31\cdot 30 \cdot 29 = 863,040$. We get a new target $T_4$, too, which consists of all elements containing the original target, $T_1$. How many elements does $T_4$ hold? $|T_4|= 4\cdot 31 \cdot 30 \cdot 29 = 71,920$. That's quite beautifully done - a query $A$, represented by a 4-tuple $(a_1,a_2,a_3,a_4)$ is successful if $A \in T_4$. Making four random guesses means to choose one 4-tuple out of the 863,040 possible. 71,920 of these tuples represent the finding of the egg, so our probability of success if $\frac{71,920}{863,040}=\frac{1}{8}$, exactly the result we would expect. But we don't have to rely on such a blind search, we certainly have an assisted search: after each guess we get a bigger or smaller, or even a well done, you found it. We'll use this to build a strategy
1. We always start with door 16, so the first element $a_1$ of our 4-tuple $A=(a_1,a_2,a_3,a_4)$ is 16.
2. If we hear bigger, we choose 24 next, is we hear lower, the number 8. So $a_2 \in \{8,24\}$
3. Now we go 4 steps in the indicated direction, so $a_3 \in \{4,12,20,28\}$ ...
4. ... and now 2 steps in the right direction, so $a_3 \in \{2,6,10,14,18,22,26,30\}$
Out of the 863,040 possible search strategies, we only use 8:
 (16,8,4,2) (16,8,4,6) (16,8,12,10) (16,8,12,14) (16,24,20,18) (16,24,20,22) (16,24,28,30) (16,24,28,26)
But what happens when we have found our target in the first, the second or the third guess? The model only excepts searches of length four, so we choose at random a fitting tuple. (For example, if we are informed after our third guess that the Easter Egg is indeed behind the door with number 12, we nevertheless open a fourth door, either number 10 or number 14 - both with a probability of 50%) What is the rate of success? If the egg is behind a door adorned with one of the fifteen even numbers smaller than 32 we will find it: so if the Easter Bunny hid it behind each door with the same probability of $\frac{1}{32}$, we will find it with a probability of $\frac{15}{32}$ - 3.75 times better than the random search! This is certainly an assisted search, so how will it be treated by Dembski and Marks? To answer this question, we eagerly look at the next section of the paper 3. Active Information and No Free Lunch - and we will follow it sentence for sentence. But first there is a change of notation, as Dembski and Marks say: Keeping track of Q subscripts and exhaustively contracted perturbation search spaces is distracting. So, $\Omega_4$ becomes $\Omega$, $T_4$ becomes $T$ and A single query to a multi-query space can then be considered a search. If a search is chosen randomly from such a multi-query space, we are still preforming a blind (or unassisted) search. O-kay, but now let's start at earnest:
• Define an assisted query as any choice from $\Omega_1$ that provides more information about the search environment or candidate solutions than a blind search. Our assisted search does so, we get information about our candidate solutions by the smaller and bigger shouts...
• Gauging the effectiveness of assisted search in relation to blind search is our next task. Let's do this!
• Random sampling with respect to the uniform probability U sets a baseline for the effectiveness of blind search. Does this mean, an element of $\Omega (=\Omega_4)$ is chosen, and each choice has the same probability?
• For a finite search space with $|\Omega|$ elements, the probability of locating the target $T$ has uniform probability given by [...] $p=\frac{|T|}{|Q|}$ Yes, that seems to be the meaning. So, in our case $p=\frac{71,920}{863,040}=\frac{1}{8}$
• We assume that we can always do at least as well as uniform random sampling. Indeed, our strategy with success probability $q=\frac{15}{32}$ would do at least as well...
• The question is, how much better can we do? We don't claim that our strategy is optimal, and we will wait for your numbers...
• Given a small target T in $\Omega$ and probability measures U and $\phi$ characterizing blind and assisted search respectively, assisted search will be more effective than blind search when $p < q$, as effective if $p = q$, and less effective if $p > q$. Wait, what? How do we find this probability measure $\phi$ which characterizes our assisted search? On what space does it live? We have to look in the next section to find the answer let [...] $\phi$ [denote] the (nonuniform) measure on $\Omega$ for an assisted search. But how do we fit in the bigger and smaller shouts? For someone who doesn't hear these shouts, but sees us doing the search over and over again, it must look as we choose one of our eight strategies at random, with the equal probability of $\frac{1}{8}$ - and nevertheless, we are able to find the target in 15 (instead of 4) out of 32 tries! Dembski's and Marks's random measure on $\Omega$ isn't able to model our simple assisted search in a meaningful way: when we get the information bigger three times, we choose the tuple $(16,24,28,30)$ with probability one - and no other! To model that, it's not enough to look at the search space, you have to keep track of the return values of your fitness function (remember the $-1, 0, 1$ above?) We look into this later, but for the moment, just assume as $\phi$ the measure on $\Omega$ which our hard-of-hearing observer has to deduce from our actions: each of our eight strategies above is taken with a probability of $\frac{1}{8}$. What would happen if we really searched this way, ignoring the additional information of the bigger and smaller shouts? We would never guess the 32 or any odd number. The probability to get the 16 right would be 1, to get 8 or 24 right 1/2 each, for 4,12,20 or 28 1/4 each, and for the rest of the even numbers 1/8 each. So, the probability of success when choosing randomly one of our eight strategies is $\frac{1}{8}$ - the same probability as for the blind search! Is this a surprise? No, not at all - all strategies realized as random measures on $\Omega$ will yield the same result.
• In other words, an assisted search can be more likely to locate T than blind search, equally likely, or less likely. If less likely, the assisted search is counterproductive, actually doing worse than blind search. One would think so. But again, each search modeled as a probability distribution on $\Omega$ is equally likely to find T as blind search
• For instance, we might imagine an Easter egg hunt where one is told "warmer" if one is moving away from an egg and "colder" if one is moving toward it. If one acts on this guidance, one will be less likely to find the egg than if one simply did a blind search. Pardon? Again, that is intuitively correct, but as for the opposite case which we elaborated above, it just doesn't fit the model: if you try to realize this behavior as a measure on $\Omega$, you get the same result as for the blind search...
• An instructive way to characterize the effectiveness of assisted search in relation to blind search is with the likelihood ratio $\frac{q}{p}$. Unfortunately, the likelihood ratio is always 1, as $p=q$
• This ratio achieves a maximum of $\frac{1}{p}$ when $q = 1$, in which case the assisted search is guaranteed to succeed. The assisted search is better at locating $T$ than blind search provided the ratio is bigger than 1, equivalent to blind search when it is equal to 1, and worse than blind search when it is less than 1. Only that in the first case $p=1$, too: the ration will never be unlike 1.
• Finally, this ratio achieves a minimum value of 0 when $q = 0$, indicating that the assisted search is guaranteed to fail. That won't happen - unless we don't take any guess.
We are left with something trivial: if we ignore the information we get during our search - the assistance - the search will work no better and nor worse than the random search. Everything what comes in the paper after this point is rather trivial: there is no assisted search guaranteed to succeed (i.e., a perfect search), the active information is always zero, as the endogenous information always equals the exogenous information. But at least, it sounds nice...

So, what is the problem with this model? It is obvious when we look at the classical paper No Free Lunch Theorems For Optimization[2] by David H. Wolpert and William G. Macready: Here a search is not only defined by the elements of the search-space, but their "cost" values (the output of an "objective function", oracle, etc.), too. In our case, our search strategy would not only include the 4-tuple (16,24,20,22), but ([16,+1],[24,-1],[20,+1],[22,0]) - here the second value indicates that we heard bigger, smaller, bigger, you found it. Obviously we would not include ([16,+1],[24,+1],[20,+1],[22,+1]) - and this important subtlety is lost in the model of Marks and Dembski.

1. William A. Dembski and Robert J. Marks II, "The Search for a Search: Measuring the Information Cost of Higher Level Search," Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.14, No.5, 2010, pp. 475-486.

2. Wolpert, D.H., Macready, W.G. (1997), "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation 1, 67.

16 comments:

1. This comment has been removed by the author.

2. Sorry to throw a wrench in the works, but...

You say that your sampling strategy will find the egg with probability 15/32. This holds if the Easter Bunny in fact chooses a hiding place uniformly at random. As best I can tell, though, Dembski and Marks treat the correct door ("target") as an unknown constant, thereby rendering the sampling process ("search") statistically independent of the fitness function (which is determined by the constant).

Section 2.1 starts by defining the target as a subset of the sample space. It seems to predicate that all targets are equally likely, but then explains that we must assume, in the absence of prior information, that they are. This foray into subjective probability justifies the use of uniform sampling as the baseline, zero-knowledge strategy. Dembski and Marks never treat the target as a random variable. There is no objective chance that the egg is behind any door but one.

With the correct door a constant, the chance of a sample depends only on the sampling strategy. That is why Dembski and Marks can model the strategy as a probability measure on samples. A deterministic strategy (nonrandom, like yours) draws a constant sample A, which categorically does or does not contain the correct door. That is, the chance of success is either 0 or 1. Certain success corresponds to 3 bits of active information, and certain failure to infinitely negative active information.

Dembski and Marks: We assume that we can always do at least as well as uniform random sampling.

Nope. They have specified a particular target. The chance of success for your well designed strategy is 0 for 17 of the possible targets, and 1 for the other 15. Belief that all targets are equally likely entails an expected chance of success of 15/32, but not an expected active information value of log((15/32) / (1/8)) = log(3.75).

You didn't get into active information, but I want to make it clear that I'm not rescuing Dembski and Marks. Considering how you designed your strategy, no one can say that it benefits maximally from prior information or suffers infinitely from prior disinformation. Yet the active information measure indicates one extreme or the other, depending on the arbitrary placement of the egg. As I've said before, active information is a performance ratio complicated needlessly by logarithmic transformation.

3. The target does vary in the misbegotten Horizontal NFL Theorem. So the probability-measure model of search strategies is inadequate in that context for your target-dependent fitness functions.

It may be that Dembski and Marks regard the fitness function as part of the search strategy. In the case of the Weasel program, Dembski was insisting 15 years ago that Dawkins had coded the fitness function to help the program find the target string. No one could get through to him the fact that the fitness function was logically external to the evolution strategy. Perhaps he and Marks think along the lines of setting the target to

I KNOW A HAWK FROM A HANDSAW

and running the Weasel program without changing the fitness function. "If one [the program] acts on this guidance, one will be less likely to find the egg [target string] than if one simply did a blind search. Because the assisted search is in this instance misleading, it actually undermines the success of the search" [Section 3].

Sorry if I'm wasting your time and cluttering your blog. But I believe that there must be some rhyme or reason to what Dembski and Marks have done.

4. Sorry if I'm wasting your time and cluttering your blog. I thank you for your thoughtful comments and the time you are taking. I'll try to be equally thorough in my answers to all of your points, but that takes a little time, too.

The target does vary in the misbegotten Horizontal NFL Theorem.: Indeed it does: they look at a partition of the target space. In the trivial case of a simple guess, I could use a partition into the elements of the space. As the partition isn't weighted in any way in the theorem, I have to assume that each element has the same probability to become a target. That is at least my interpretation...

It may be that Dembski and Marks regard the fitness function as part of the search strategy.
I thought about this, too, but I cannot imagine a way to encode this.

Perhaps he and Marks think along the lines of setting the target to I KNOW A HAWK FROM A HANDSAW and running the Weasel program without changing the fitness function. I got this impression too, and therefore, all of their observations become a little bit trivial: if the fitness function doesn't work most of the time, it's best to ignore it.

But I believe that there must be some rhyme or reason to what Dembski and Marks have done. I really hope so. But every time I try to work out their ideas on an example, I run into problems.

5. In the trivial case of a simple guess, I could use a partition into the elements of the space. As the partition isn't weighted in any way in the theorem, I have to assume that each element has the same probability to become a target. That is at least my interpretation...

Before exhuming the trivial, let's summarize what we know about the nontrivial case of Q ≥ 2. Dembski and Marks have failed to produce a sensible partition of the sample space Ω_Q. If we allow them a nonsense partition, then their model of sampling processes ("searches") as probability measures on Ω_Q is wrong.

Back to a single guess (Q = 1). Suppose that there is an unknown partition of the sample space Ω into possible targets, and that measure ψ on Ω gives the probabilities of targets. The average gain in performance (probability of hitting the target) if we guess according to probability measure φ instead of ψ is the sum of

ψ(T) ∙ φ(T) / ψ(T) = φ(T)

over all targets T. Looks like 1 to me. There is no performance gain on average, no matter how we choose φ. Let's say that we draw φ uniformly at random. For each T, the ratio of φ(T) to Ω(T) is obviously not due to prior information. But when Dembski and Marks refer to guessing as assisted search, that changes everything. What causes φ to perform better or worse than ψ in hitting a particular target T? Information. And how do we expose information? Write "-log" in front of probabilities. The average gain in log performance... er, gain in information... er, active information (yeah, that's the ticket) is the sum of

ψ(T) ∙ log(φ(T) / ψ(T)) = ψ(T) [log(φ(T)) - log(ψ(T))]

over all targets T. This "is immediately recognized as the negative of the Kullback-Leibler distance," also known as the relative entropy.

Am I interpreting the Horizontal NFL Theorem correctly? Active entropy is the expected value of active information over the targets in the partition, with the expectation in terms of ψ. Dembski and Marks introduce ψ as a mere baseline guess, but then make it the arbiter of probabilities of targets — the "true" measure. A ψ-guess has the maximum active entropy, zero. A φ-guess has negative active entropy unless φ(T) = ψ(T) for all targets. So what?

Dembski and Marks: Because the active entropy is strictly negative, any uninformed assisted search (φ) will on average perform worse than the baseline search.

Nonsense. To maximize average performance, set φ({ω}) = 1 for an element ω of a target T for which ψ(T) is maximal. For example, if there are just two targets in the partition, and ψ(T) = .9, then a φ-distributed guess hits T with probability .9, and a ψ-distributed guess hits T with probability .82. The active entropy for φ is negative infinity.

Dembski and Marks: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search.

They confuse subjective and objective probability. Not knowing ψ does not actually render the measure uniform on some space. And what is that space? The unknown partition may include targets of different sizes. Even if ψ is uniform on Ω, it does not follow that the targets are equiprobable.

1. Oops. I meant to change, "φ-distributed guess." Now I'll invoke the natural isomorphism of Ω-valued random variables and probability measures on Ω.

2. I agree wholeheartedly. At the moment, I'm trying to put my thoughts into a little article (perhaps something for BIO-complexity :-) The main problems I've identified are:

1) It generally doesn't work to identify the space of possible searches (which is their $\Omega$) with the space of targets - the letter isn't $\Omega$, but some subset of $\mathfrak{P}(\Omega)$. In the simplest case, it is $\mathfrak{P}_1(\Omega)$, so Dembski and Marks took it for $\Omega$ itself, but e.g., when we look at targets with two elements, it should be $\mathfrak{P}_2(\Omega)$. This is somewhat obvious if we think about how a target is determined - choosing a target of size two at random cannot be described by a random variable over $\Omega$ as $\mathfrak{P}_2(\Omega)$ -it's just a matter of sizes.
2) The measure used on the space of searches $\Omega$ cannot meaningfully be used on the space of targets:

3) In the simplest case, all possible searches have the same probability of success. That is normally Dembski's and Marks's benchmark for the performance of a search. As you said above, taking the logarithms just muddles the water. Their "remark" to the "HNFL" which you quoted above is patently absurd - if the "underlying measure is uniform" on the space of the targets (yes, they should of equal size, too (this can be a little bit loosened I think)), each "search strategy" in the sense of Dembski and Marks shows the same performance - the active entropy is just meaningless.

4) If the space of targets is identified correctly, the measure on the space of searches are no measures on the space of targets - and there is no Kullback-Leibler distance any more. You can try to crowbar it back in by taking a partition of $\Omega$ into elements of the space of targets and go on, but than you may get contradictory results:
Let $\Omega = \{1,2,3,4\}$, we are looking for an two-elementary subset, our search measure is given by $\mathbb(P)(\{1\})=\mathbb(P)(\{2\}) = 1/8$ and $\mathbb(P)(\{3\})=\mathbb(P)(\{4\}) = 3/8$. Then the partition $\{\{1,2\},\{3,4\}\}$ "shows" that the strategy is worse than random search, while on $\{\{1,3\},\{2,4\}\}$ it is equal to random search. And then there is the old conundrum whether there exists a partition of $\Omega$ into targets - generally there isn't.

5) Is anyone but me annoyed by the notion that they define a the success of a search as "obtaining an element in the target" instead of "identifying the target"? They wouldn't win a game of battle-ships that way! If you e.g., are performing a search for a single element of the English alphabet and you are allowed two guesses, and you are informed that your search $(a,b)$ was successful, you still don't know whether the letter you were looking for was the "a" or the "b"!

(updated version)

3. I have to repeat something I've said in the past, namely that we shouldn't allow Dembski and Marks to sweep the empty target under the rug. Even though they analyze only solved instances, the search-selecting agents (problem solvers) they predicate generally cannot know whether instances are solvable.

To see how limiting it is to assume that the solution sets of instances of a problem are mutually exclusive and nonempty, consider the canonical NP-complete problem 3-SAT. I suspect that Dembski and Marks partitioned the solution space into targets for no better reason than to enable the definition of active entropy. You may choose to show that the math does not pan out, even if we play by the rules of their game. But I hope you'll emphasize from the outset that the game is cockamamie.

Regarding your points 4 and 5.... Active entropy is related to binary description of a target-valued random variable T with Pr{T = T} = ψ(T) for all targets T. (The partition of the sample space into targets is known.) Suppose that you assign binary descriptions to minimize the expected description length, but assume that the measure is φ instead of ψ. Then the expected number of excess bits in your description of T is the negative of the active entropy.

At present, the best sense I can make of the Horizontal NFL Theorem is that Dembski and Marks are trying to relate performance in hitting the target to efficiency in describing the target. We've shown that the two are poorly related.

6. Dembski and Marks define as a success of a search "obtaining an element in the target" - in general that is not the same as identifying the target: in a game of battleship it's the difference between hitting a ship and sinking it.

This becomes a problem in their treatment of "$Q$ queries (samples) from $\Omega_1$ without replacement:" For a fixed $Q$, the augmented search space, $\Omega_Q$, consists of all sequences of length $Q$ chosen from $\Omega_1$ where no element appears more than once [...] $T_Q \in \Omega_Q$ [sic] consists of all the elements containing the original target, $T_1$.

Imagine that we are looking for a letter of the English alphabet (our $\Omega_1 = \{a,b,c, ..., z\}$), and we are allowed to guess twice. Then $\Omega_Q = \{(\omega_1,\omega_2) | \omega_1,\omega_2\in\Omega_1 \wedge \omega_1\neq\omega_2\}$. Assume that we are looking for the letter "t". Than the new target $T_2$ is given by $T_2 = \{(t,a),(t,b),...,(t,z),(a,t),(b,t),...,(z,t)\}$: these 50 elements contain the original target.

If we search successfully, we "obtain an element in the target" - for instance $(q,t)$. But knowing that $(q,t) \in T_2$ isn't sufficient to retrieve the original target: we don't know whether we have looked for the "q" or the "t", as $(q,t)$ is also a member of $\{(q,a),(q,b),...,(q,z),(a,q),(b,q),...,(z,q)\}$, the augmented target for finding "q"!

If we realize a search of 26 queries in this example, we end with the paradox result, that we always perform a successful search in the sense of Dembski and Marks, but nevertheless don't know what the original target is.

1. I began my last response above before you posted this. More later.

2. This comment has been removed by the author.

7. Dembski and Marks: $T_Q \in \Omega_Q$ [sic] consists of all the elements containing the original target, T_1.

I interpret this as saying (poorly) that elements of $T_1$ are recognized directly, and that an element of $\Omega_Q$ (the space of ordered $Q$-element samples of $\Omega_1$) is an element of the higher-order target $T_Q$ if and only if it contains an element of $T_1.$ So I don't have the problem here that you do. I recommend staying away from this, and emphasizing that a partition of $\Omega_{Q>2}$ contains at most one target. If $T_Q$ and $T^\prime_Q$ are blocks in a partition of $\Omega_Q$, then $T_1 = T^\prime_1.$ You know the proof.

1. That should have been $\Omega_{Q>1},$ not $\Omega_{Q>2}.$

2. Given a partition of $\Omega_{Q>1},$ you can determine by inspection which block, if any, is a target.

8. Back to single guesses ($Q = 1$). Suppose that we know the partition $\tilde{T} = \{T_i\}_{i=1}^N$ of $\Omega$ into possible targets. If I tell you that $\omega$ is in the target, then you know that the block $[\omega] \in \tilde{T}$ is the target. More interestingly, we can restrict the sample space to $N$ elements, with one representative of each target. For $i = 1, \ldots, N,$ let $t_i$ denote an element of $T_i.$ Then sample $\tilde{\Omega} = \{t_i\}_{i=1}^N$ instead of $\Omega.$ I don't know what to make of this. But it seems worth thinking about.

9. I don't know who beside you, DiEb, would read this far. Dembski and Marks should, and it would be nice if they did. What we have here is a record of honest consideration of the Horizontal No Free Lunch Theorem. It's obvious that we worked hard at it. We bounced ideas off one another. We came by unexpected observations along the way. I don't claim to be a great intellect. But I do know that this is what intellectual exchange looks like.

I'm signing off now. I'd be glad to review your article.