Comments on DiEbLog: Dembski's and Mark's "The Search for a Search": Trying to work out one of their examples

I don't know who beside you, DiEb, would read ...

2013-04-14T23:33:14.258-07:00

I don't know who beside you, DiEb, would read this far. Dembski and Marks should, and it would be nice if they did. What we have here is a record of honest consideration of the Horizontal No Free Lunch Theorem. It's obvious that we worked hard at it. We bounced ideas off one another. We came by unexpected observations along the way. I don't claim to be a great intellect. But I do know that this is what intellectual exchange looks like.

I'm signing off now. I'd be glad to review your article.

Given a partition of $\Omega_{Q>1},$ you can de...

2013-04-14T23:06:37.089-07:00

Given a partition of $\Omega_{Q>1},$ you can determine by inspection which block, if any, is a target.

That should have been $\Omega_{Q>1},$ not $\Ome...

2013-04-14T22:55:07.276-07:00

That should have been $\Omega_{Q>1},$ not $\Omega_{Q>2}.$

Back to single guesses ($Q = 1$). Suppose that we ...

2013-04-14T22:47:16.892-07:00

Back to single guesses ($Q = 1$). Suppose that we know the partition $\tilde{T} = \{T_i\}_{i=1}^N$ of $\Omega$ into possible targets. If I tell you that $\omega$ is in the target, then you know that the block $[\omega] \in \tilde{T}$ is the target. More interestingly, we can restrict the sample space to $N$ elements, with one representative of each target. For $i = 1, \ldots, N,$ let $t_i$ denote an element of $T_i.$ Then sample $\tilde{\Omega} = \{t_i\}_{i=1}^N$ instead of $\Omega.$ I don't know what to make of this. But it seems worth thinking about.

Dembski and Marks: $T_Q \in \Omega_Q$ [sic] consis...

2013-04-14T22:30:53.526-07:00

Dembski and Marks: $T_Q \in \Omega_Q$ [sic] consists of all the elements containing the original target, T_1.

I interpret this as saying (poorly) that elements of $T_1$ are recognized directly, and that an element of $\Omega_Q$ (the space of ordered $Q$-element samples of $\Omega_1$) is an element of the higher-order target $T_Q$ if and only if it contains an element of $T_1.$ So I don't have the problem here that you do. I recommend staying away from this, and emphasizing that a partition of $\Omega_{Q>2}$ contains at most one target. If $T_Q$ and $T^\prime_Q$ are blocks in a partition of $\Omega_Q$, then $T_1 = T^\prime_1.$ You know the proof.

2013-04-14T18:08:57.674-07:00

This comment has been removed by the author.

I began my last response above before you posted t...

2013-04-14T16:40:38.014-07:00

I began my last response above before you posted this. More later.

I have to repeat something I've said in the pa...

2013-04-14T16:26:36.155-07:00

I have to repeat something I've said in the past, namely that we shouldn't allow Dembski and Marks to sweep the empty target under the rug. Even though they analyze only solved instances, the search-selecting agents (problem solvers) they predicate generally cannot know whether instances are solvable.

To see how limiting it is to assume that the solution sets of instances of a problem are mutually exclusive and nonempty, consider the canonical NP-complete problem 3-SAT. I suspect that Dembski and Marks partitioned the solution space into targets for no better reason than to enable the definition of active entropy. You may choose to show that the math does not pan out, even if we play by the rules of their game. But I hope you'll emphasize from the outset that the game is cockamamie.

Regarding your points 4 and 5.... Active entropy is related to binary description of a target-valued random variable T with Pr{T = T} = ψ(T) for all targets T. (The partition of the sample space into targets is known.) Suppose that you assign binary descriptions to minimize the expected description length, but assume that the measure is φ instead of ψ. Then the expected number of excess bits in your description of T is the negative of the active entropy.

At present, the best sense I can make of the Horizontal NFL Theorem is that Dembski and Marks are trying to relate performance in hitting the target to efficiency in describing the target. We've shown that the two are poorly related.

Dembski and Marks define as a success of a search ...

2013-04-14T14:57:08.627-07:00

Dembski and Marks define as a success of a search "obtaining an element in the target" - in general that is not the same as identifying the target: in a game of battleship it's the difference between hitting a ship and sinking it.

This becomes a problem in their treatment of "$Q$ queries (samples) from $\Omega_1$ without replacement:" For a fixed $Q$, the augmented search space, $\Omega_Q$, consists of all sequences of length $Q$ chosen from $\Omega_1$ where no element appears more than once [...] $T_Q \in \Omega_Q$ [sic] consists of all the elements containing the original target, $T_1$.

Imagine that we are looking for a letter of the English alphabet (our $\Omega_1 = \{a,b,c, ..., z\}$), and we are allowed to guess twice. Then $\Omega_Q = \{(\omega_1,\omega_2) | \omega_1,\omega_2\in\Omega_1 \wedge \omega_1\neq\omega_2\}$. Assume that we are looking for the letter "t". Than the new target $T_2$ is given by $T_2 = \{(t,a),(t,b),...,(t,z),(a,t),(b,t),...,(z,t)\}$: these 50 elements contain the original target.

If we search successfully, we "obtain an element in the target" - for instance $(q,t)$. But knowing that $(q,t) \in T_2$ isn't sufficient to retrieve the original target: we don't know whether we have looked for the "q" or the "t", as $(q,t)$ is also a member of $\{(q,a),(q,b),...,(q,z),(a,q),(b,q),...,(z,q)\}$, the augmented target for finding "q"!

If we realize a search of 26 queries in this example, we end with the paradox result, that we always perform a successful search in the sense of Dembski and Marks, but nevertheless don't know what the original target is.

I agree wholeheartedly. At the moment, I'm try...

2013-04-14T02:13:14.295-07:00

I agree wholeheartedly. At the moment, I'm trying to put my thoughts into a little article (perhaps something for BIO-complexity :-) The main problems I've identified are:

1) It generally doesn't work to identify the space of possible searches (which is their $\Omega$) with the space of targets - the letter isn't $\Omega$, but some subset of $\mathfrak{P}(\Omega)$. In the simplest case, it is $\mathfrak{P}_1(\Omega)$, so Dembski and Marks took it for $\Omega$ itself, but e.g., when we look at targets with two elements, it should be $\mathfrak{P}_2(\Omega)$. This is somewhat obvious if we think about how a target is determined - choosing a target of size two at random cannot be described by a random variable over $\Omega$ as $\mathfrak{P}_2(\Omega)$ -it's just a matter of sizes.
2) The measure used on the space of searches $\Omega$ cannot meaningfully be used on the space of targets:

3) In the simplest case, all possible searches have the same probability of success. That is normally Dembski's and Marks's benchmark for the performance of a search. As you said above, taking the logarithms just muddles the water. Their "remark" to the "HNFL" which you quoted above is patently absurd - if the "underlying measure is uniform" on the space of the targets (yes, they should of equal size, too (this can be a little bit loosened I think)), each "search strategy" in the sense of Dembski and Marks shows the same performance - the active entropy is just meaningless.

4) If the space of targets is identified correctly, the measure on the space of searches are no measures on the space of targets - and there is no Kullback-Leibler distance any more. You can try to crowbar it back in by taking a partition of $\Omega$ into elements of the space of targets and go on, but than you may get contradictory results:
Let $\Omega = \{1,2,3,4\}$, we are looking for an two-elementary subset, our search measure is given by $\mathbb(P)(\{1\})=\mathbb(P)(\{2\}) = 1/8$ and $\mathbb(P)(\{3\})=\mathbb(P)(\{4\}) = 3/8$. Then the partition $\{\{1,2\},\{3,4\}\}$ "shows" that the strategy is worse than random search, while on $\{\{1,3\},\{2,4\}\}$ it is equal to random search. And then there is the old conundrum whether there exists a partition of $\Omega$ into targets - generally there isn't.

5) Is anyone but me annoyed by the notion that they define a the success of a search as "obtaining an element in the target" instead of "identifying the target"? They wouldn't win a game of battle-ships that way! If you e.g., are performing a search for a single element of the English alphabet and you are allowed two guesses, and you are informed that your search $(a,b)$ was successful, you still don't know whether the letter you were looking for was the "a" or the "b"!

(updated version)

Oops. I meant to change, "φ-distributed guess...

2013-04-13T00:17:58.111-07:00

Oops. I meant to change, "φ-distributed guess." Now I'll invoke the natural isomorphism of Ω-valued random variables and probability measures on Ω.

In the trivial case of a simple guess, I could use...

2013-04-12T23:59:08.343-07:00

In the trivial case of a simple guess, I could use a partition into the elements of the space. As the partition isn't weighted in any way in the theorem, I have to assume that each element has the same probability to become a target. That is at least my interpretation...

Before exhuming the trivial, let's summarize what we know about the nontrivial case of Q ≥ 2. Dembski and Marks have failed to produce a sensible partition of the sample space Ω_Q. If we allow them a nonsense partition, then their model of sampling processes ("searches") as probability measures on Ω_Q is wrong.

Back to a single guess (Q = 1). Suppose that there is an unknown partition of the sample space Ω into possible targets, and that measure ψ on Ω gives the probabilities of targets. The average gain in performance (probability of hitting the target) if we guess according to probability measure φ instead of ψ is the sum of

ψ(T) ∙ φ(T) / ψ(T) = φ(T)

over all targets T. Looks like 1 to me. There is no performance gain on average, no matter how we choose φ. Let's say that we draw φ uniformly at random. For each T, the ratio of φ(T) to Ω(T) is obviously not due to prior information. But when Dembski and Marks refer to guessing as assisted search, that changes everything. What causes φ to perform better or worse than ψ in hitting a particular target T? Information. And how do we expose information? Write "-log" in front of probabilities. The average gain in log performance... er, gain in information... er, active information (yeah, that's the ticket) is the sum of

ψ(T) ∙ log(φ(T) / ψ(T)) = ψ(T) [log(φ(T)) - log(ψ(T))]

over all targets T. This "is immediately recognized as the negative of the Kullback-Leibler distance," also known as the relative entropy.

Am I interpreting the Horizontal NFL Theorem correctly? Active entropy is the expected value of active information over the targets in the partition, with the expectation in terms of ψ. Dembski and Marks introduce ψ as a mere baseline guess, but then make it the arbiter of probabilities of targets — the "true" measure. A ψ-guess has the maximum active entropy, zero. A φ-guess has negative active entropy unless φ(T) = ψ(T) for all targets. So what?

Dembski and Marks: Because the active entropy is strictly negative, any uninformed assisted search (φ) will on average perform worse than the baseline search.

Nonsense. To maximize average performance, set φ({ω}) = 1 for an element ω of a target T for which ψ(T) is maximal. For example, if there are just two targets in the partition, and ψ(T) = .9, then a φ-distributed guess hits T with probability .9, and a ψ-distributed guess hits T with probability .82. The active entropy for φ is negative infinity.

Dembski and Marks: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search.

They confuse subjective and objective probability. Not knowing ψ does not actually render the measure uniform on some space. And what is that space? The unknown partition may include targets of different sizes. Even if ψ is uniform on Ω, it does not follow that the targets are equiprobable.

Sorry if I'm wasting your time and cluttering ...

2013-04-10T01:08:35.717-07:00

Sorry if I'm wasting your time and cluttering your blog. I thank you for your thoughtful comments and the time you are taking. I'll try to be equally thorough in my answers to all of your points, but that takes a little time, too.

The target does vary in the misbegotten Horizontal NFL Theorem.: Indeed it does: they look at a partition of the target space. In the trivial case of a simple guess, I could use a partition into the elements of the space. As the partition isn't weighted in any way in the theorem, I have to assume that each element has the same probability to become a target. That is at least my interpretation...

It may be that Dembski and Marks regard the fitness function as part of the search strategy.
I thought about this, too, but I cannot imagine a way to encode this.

Perhaps he and Marks think along the lines of setting the target to I KNOW A HAWK FROM A HANDSAW and running the Weasel program without changing the fitness function. I got this impression too, and therefore, all of their observations become a little bit trivial: if the fitness function doesn't work most of the time, it's best to ignore it.

But I believe that there must be some rhyme or reason to what Dembski and Marks have done. I really hope so. But every time I try to work out their ideas on an example, I run into problems.

The target does vary in the misbegotten Horizontal...

2013-04-09T18:24:54.101-07:00

The target does vary in the misbegotten Horizontal NFL Theorem. So the probability-measure model of search strategies is inadequate in that context for your target-dependent fitness functions.

It may be that Dembski and Marks regard the fitness function as part of the search strategy. In the case of the Weasel program, Dembski was insisting 15 years ago that Dawkins had coded the fitness function to help the program find the target string. No one could get through to him the fact that the fitness function was logically external to the evolution strategy. Perhaps he and Marks think along the lines of setting the target to

I KNOW A HAWK FROM A HANDSAW

and running the Weasel program without changing the fitness function. "If one [the program] acts on this guidance, one will be less likely to find the egg [target string] than if one simply did a blind search. Because the assisted search is in this instance misleading, it actually undermines the success of the search" [Section 3].

Sorry if I'm wasting your time and cluttering your blog. But I believe that there must be some rhyme or reason to what Dembski and Marks have done.

Sorry to throw a wrench in the works, but... You ...

2013-04-09T09:43:11.039-07:00

Sorry to throw a wrench in the works, but...

You say that your sampling strategy will find the egg with probability 15/32. This holds if the Easter Bunny in fact chooses a hiding place uniformly at random. As best I can tell, though, Dembski and Marks treat the correct door ("target") as an unknown constant, thereby rendering the sampling process ("search") statistically independent of the fitness function (which is determined by the constant).

Section 2.1 starts by defining the target as a subset of the sample space. It seems to predicate that all targets are equally likely, but then explains that we must assume, in the absence of prior information, that they are. This foray into subjective probability justifies the use of uniform sampling as the baseline, zero-knowledge strategy. Dembski and Marks never treat the target as a random variable. There is no objective chance that the egg is behind any door but one.

With the correct door a constant, the chance of a sample depends only on the sampling strategy. That is why Dembski and Marks can model the strategy as a probability measure on samples. A deterministic strategy (nonrandom, like yours) draws a constant sample A, which categorically does or does not contain the correct door. That is, the chance of success is either 0 or 1. Certain success corresponds to 3 bits of active information, and certain failure to infinitely negative active information.

Dembski and Marks: We assume that we can always do at least as well as uniform random sampling.

Nope. They have specified a particular target. The chance of success for your well designed strategy is 0 for 17 of the possible targets, and 1 for the other 15. Belief that all targets are equally likely entails an expected chance of success of 15/32, but not an expected active information value of log((15/32) / (1/8)) = log(3.75).

You didn't get into active information, but I want to make it clear that I'm not rescuing Dembski and Marks. Considering how you designed your strategy, no one can say that it benefits maximally from prior information or suffers infinitely from prior disinformation. Yet the active information measure indicates one extreme or the other, depending on the arbitrary placement of the egg. As I've said before, active information is a performance ratio complicated needlessly by logarithmic transformation.

2013-04-08T22:55:13.314-07:00

This comment has been removed by the author.