Friday, October 23, 2009

A silent death of the Horizontal no Free Lunch theorem

The only reply I got for my concerns regarding W. Dembski's and R. Marks's article The Search for a Search - and its Horizontal No Free Lunch theorem - was an email from R. Marks, stating that he'd look into it.
Now, two weeks later, R. Marks took the article from his list of publications. The Evolutionary Informatics lab has still an announcement, basically the abstract of the article:
Many searches are needle-in-the-haystack problems, looking for small targets in large spaces. In such cases, blind search can stand no hope of success. Success, instead, requires an assisted search. But whence the assistance required for a search to be successful? To pose the question this way suggests that successful searches do not emerge spontaneously but need themselves to be discovered via a search. The question then naturally arises whether such a higher-level “search for a search” is any easier than the original search. We prove two results: (1) The Horizontal No Free Lunch Theorem, which shows that average relative performance of searches never exceeds unassisted or blind searches. (2) The Vertical No Free Lunch Theorem, which shows that the difficulty of searching for a successful search increases exponentially compared to the difficulty of the original search.
The given link doesn't lead to the article any longer, but to their earlier opus Conservation of Information in Search - neither the term vertical nor the term horizontal appear in this article.

In his first podcast with Casey Luskin, W. Dembski promised that this article was something like a nail to the coffin of evolution. I assume that W. Dembski will explain how he was wrong with this assessment in his next interview...


  1. Thank you for looking into the Horizontal NFLT. I have previously ignored it, as it seems so wrong to me that I know I must be missing something. Maybe you can tell me what I'm missing.

    Assuming just one query (so we can set aside for now any problems with m>1), I don't understand the connection between relative entropy and the probability of success. M&D seem to be saying that, since the "active entropy" of φ with respect to U is negative, φ has a lower probability of success than U. But we can swap U and φ in III.6, and the "active entropy" is still negative. Using M&D's logic, do we then conclude that U has a lower probability of success than φ? Does their logic dictate that every distribution performs worse than every other distribution? Obviously, that's nonsense.

    Relative entropy is simply a weighted average of probability ratios, and I don't see how it could indicate the probability of the outcome matching an arbitrarily chosen target.

    I don't see why all distributions don't perform equally. It should make no difference what distribution we use, because it makes no difference what point gets queried. No point is any better than any other when all possible search structures and targets are averaged together.

    Help me out, Dieb. What am I missing?

  2. Regarding the problem of representing a search as a single event over a space of m-vectors, you make a good point regarding the extension of T into Ω^m. Indeed, knowing that the target doesn't move between queries constitutes some free "active information". I think that M&D's approach is to assume that m is so much smaller than |Ω| that the advantage of sampling without replacement is negligible.

    In their IEEE paper, they talk about active information per query, or else they express active info as a function of the number of queries. In their other two papers, in order to simplify the proofs of their theorems, they treat a search as a single event over an m-dimensional space. Indeed, they have no need to mention searches at all in those two papers, since their theorems apply to any event.

    I interpret section II.B in "Search for a Search" as M&D's argument that they can, without loss of generality, always assume that a search consists of a single event. My understanding is that we must put an implied tilde over everything in their theorems in order to apply their theorems to searches. M&D essentially wash their hands of iterative searches, and deal only with single events.

    Of course, I may not be understanding them correctly. Their intention isn't completely clear, and they don't show how their approach applies to any real searches.

  3. 1. They use the Kullback-Leibler divergence to define their active entropy. I don't think it works the way they intend it to do.

    2. It's not only that the target moves, but that you have to hit it at each step...

    3. You can interpret a search as a single event in Ω^m. This event however has to be of a special form. The problem: a search of m steps in Ω often doesn't work as a guess in Ω^m, and the math breaks down if you try to do this.

    4. For single events, most of the ideas are obvious....

  4. From my point of view, it's obvious that the Kullback-Leibler divergence is the wrong measure for their theorem. It's also obvious to me that all distributions have equal probability of success when averaged over all possible targets. In fact, these points seem so obvious to me that I feel I must be misunderstanding this section of the paper.

    Regarding your point 4, yes, their ideas are pretty obvious for single events. I think that they assume that proving their theorems for single events is sufficient. As you point out, this approach has some pitfalls when you apply it to actual searches.