Friday, May 25, 2012

Is the average active information a suitable measure of search performance?

(Though the following doesn't include any maths, the reader is expected to be familiar with William Dembski's and Robert Marks's paper The Search for a Search and should have glanced at On a Remark by Robert J. Marks and William A. Dembski)

One of my problems with the modeling of searches by William Dembski and Robert Marks is that I don't see how every assisted search can be described as a probability measure on the space of the feasible searches. But nevertheless, Winston Ewert insisted that

All assisted search, irrespective of the manner in which they are assisted, can be modeled as a probably distribution biased towards selecting elements in the target.
Marks and Dembski claim that the average active information is a measure of search performance - at least they write in their remark:
If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search.
Their erratum seems indeed to proof the remark in a slightly modified way:
Given a uniform distribution over targets of cardinality k, and baseline uniform distribution, the average active information will be non-positive
(The proof of this statement in the erratum is correct - at least as far as I can see...)
So, lets play a game: From a deck of cards one is chosen at random. If you want to play, you have to pay 1\$, and you get 10\$ if you are able to guess the card correctly. But you are not alone, there are three other people A, B and (surprisingly) C who'll announce their guesses first. They use the following search strategies:
  • A: he will announce a card according to the uniform distribution
  • B: he will always announce ♦2
  • C: He has access to a very powerful oracle, which gives him the right card. Unfortunately - due to an old superstition - he is unable to say ♦2, so every time this card appears he will announce another one at random
Which strategy will you follow? Siding with player A or B gives you a chance of 1/52 for a correct guess, so you will loose on average ca. 81¢ per game. However, if you pose your bet with C, you will win 8.81$ a game in the long run! That's because the probability of a correct guess is 1/52 for both our players A and B, while C's chance for success is 51/52.

But what would Marks, Dembski or Ewert do? They calculate the average active information according to the formula given in the erratum. This E[I+] is 0 for player A, but -∞ for B and C. As negative active information on average renders the search performance worse than random search, they have to stick with player A.

So, either average active information is not a good measure of search performance, or not every assisted search, irrespective of the manner in which they are assisted can be modeled as a probably distribution. Or it is a bit of both...

12 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Thanks Tom for making the main points clear. When Marks and Dembski (and Ewert) published their last paper at BIO-complexity, the "Search for a Search" - once hailed as a coffin-nail to Darwinism by W. Dembski - was suspiciously missing from the list of references...

    ReplyDelete
  3. As for the latter part of your post, I've shown that "active information" is a measure of performance gain made to look like a measure of information gain:

    http://boundedtheoretics.blogspot.com/2012/04/bob-marks-grossly-misunderstands-no.html

    So it should be possible to generate various examples like yours. And I'm all for seeing people do so. Thanks.

    ReplyDelete
  4. "Or it is a bit of both..."

    I would say it's both.

    Ewert is certainly mistaken in asserting that "all assisted search, irrespective of the manner in which they are assisted, can be modeled as a probably distribution biased towards selecting elements in the target." An "oracle"-assisted search can't be modeled as a single distribution. The distribution produced by the search's selection strategy depends on the information provided by the oracle, which in turn depends on the target location. Since the HNFLT assumes that the distribution is not dependent on the target location, it doesn't seem to apply to oracle-assisted searches such as your search strategy C above.

    And I personally don't see any use for the "average active information" measure, although that may be due a lack of imagination on my part. Under HNFLT conditions, the average active information is always non-positive because log(x) is a concave transformation. We could just as easily define a different information measure with a convex transformation, say x^2. If we define hyperactive information as square(ψ(T)/φ(T)), then the average hyperactive information will always be non-negative. So according to this measure, every search is better than random sampling, which just goes to show that inventing arbitrary measures allows us to reach any conclusion we want.

    ReplyDelete
  5. DiEb and Rob,

    I deleted the first comment because it contained an error that weakened the argument of the OP. Sorry for that.

    I'd like to see everyone deny D&M their loaded terminology. Our interpretation of sampling as "search" reflects nothing but our measure of performance on the sample. There is no gain of exploitable information in sampling, and the observations in a sample are data. "Assisted search" is biased sampling. Wolpert and Macready wrote of an "oracle" only to indicate that each function evaluation was completed in constant time.

    There is no free lunch under the conditions of the "erratum." The distribution of the sample does not depend on the choice of sampler. Thus average active information can say nothing about the overall distribution of performance. However, the expected performance of a non-uniform sampler may vary across "targets" of a given size, while the expected performance of the uniform sampler is constant. And the average active information reflects that variation. It does so very badly, if only because the log of the performance ratio is commonly undefined (D&M seem never to think of deterministic samplers), while the performance ratio itself is always defined.

    Dembski and Marks must treat the fitness function as a component of the "search," or their claim that Darwinian evolution is inherently teleological falls to pieces. Give them their madness, and they can, I think, embed fitness measure within probability measure. Denying them that should be, IMHO, at the top of our agenda.

    ReplyDelete
  6. P.S.--For the first time, Dembski and Marks properly allow for the existence of unsolvable problems. In the corollary, the summation includes targets of size 0, for which the "active information" is undefined (negative infinity). Thus the average "active information" in the general case is undefined. The upshot is that D&M have restricted themselves to "searches" that draw just one element of Ω, and now will have to restrict themselves to cases where there's magical knowledge that the problem is soluble.

    ReplyDelete
  7. Correction to my comment above: "...then the average hyperactive information will always be at least one."

    DiEb's example is a great illustration of the problem with "average active information", but a.a.i. is still an improvement over "active entropy". At least a.a.i. is anticommutative -- that is, the a.a.i. of an alternate search compared to a baseline is the negative a.a.i. of the baseline compared to the alternate. Active entropy is non-positive no matter what we compare to what.

    Note that Dembski and Marks haven't acknowledged any problem with their original HNFLT "proof" other than the issue of multiple queries resulting in overlapping targets. So presumably they still see their active entropy approach as legitimate in cases of a single query.

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. I, too, think that it was a fine idea to relate this "information" to gambling. After the fact, it seems an obvious thing to do, given the emphasis on gambling in some information theory textbooks. But coming up with simple, seemingly obvious responses to obfuscated math is actually difficult.

    The irrational gambling DiEb illustrates is due entirely to the gratuitous log transformation of the likelihood ratio. (Well, it's not gratuitous if you're engaged in "intelligence creates information" mythopoeia.) Remove "-log" from the average active information expression, and you get average likelihood ratios of 1 for both (A) and (B), and 51 for (C). And the expected payoff (not profit) of 9.81 for (C) is indeed 51 times greater than the expected payoff of 0.19 for the other two strategies.

    To relate this to what Dembski and Marks seem to have been trying to do with the Horizontal NFLT, let's say that you don't know the distribution of the drawn card on the set of 52 standard playing cards. Integrating over all distributions, the expected payoffs for the fixed guess (A) and the uniform guess (B) remain identical. However, the variance of the payoff is minimal for the uniform guess (or am I having yet another very bad brain day?). Now, if you actually know the non-uniform distribution of the drawn card, then you maximize expected payoff by guessing a most-probable card. Is the variance of the payoff not minimized by matching the distribution of your guess to the known distribution of the drawn card? Then the Kullback-Leibler divergence (negative active entropy) of the guessing distribution from the actual distribution of "target" cards would be zero. And the expected performance would be less than optimal.

    ReplyDelete
  10. Tom: "I'd like to see everyone deny D&M their loaded terminology."

    Amen. If a "search" involves only a single "query", as is the case in all of D&M's theorems, then it's nothing more than an Ω-valued random variable. So why not call it a random variable, and talk of the random variable being realized rather than the "search" being "executed"? And a "probablistic hierarchy" is nothing more than a chain of random variables, so why not call it a stochastic process? And D&M claim that the term "target" doesn't necessarily imply pre-identification or intent, but they never offer a clear or consistent definition of the term, other than telling us that it's a subset of Ω. So let's just call it an event.

    And don't even get me started on their usage of the term "information". In Life's Conservation Law, they say that information is created when an event is selected from a set of live possibilities, but they never say what is entailed by the term "live possibility". Non-zero frequentist probability? Non-zero epistemic probability? If so, according to what observer? And they seem to use the term "select" in a teleological sense. ("Selection, as understood before Darwin, had been an activity confined to intelligent agents." "...does nature really possess the power to select and thereby create information?") Then they seem to forget this definition when they introduce the term "active information", which is defined simply as bias, with no definitional requirement that the bias must be selected from a set of live possibilities.

    All of these problematic terms connote teleology to the ID crowd. Once the terms are replaced with standard unloaded terminology, the illusion that D&M's work is relevant to ID dissolves. So I agree with Tom's suggestion that we boycott D&M's terminology. D&M's arguments are rendered impotent simply by expressing them in unloaded language.

    ReplyDelete
  11. Ignore my remarks about variance in the last paragraph of the preceding comment. (Note to self: Think first, then write. Each is sufficiently challenging in itself.)

    ReplyDelete
  12. Rob wrote

    And they seem to use the term "select" in a teleological sense. ("Selection, as understood before Darwin, had been an activity confined to intelligent agents." "...does nature really possess the power to select and thereby create information?")

    It's fun (well, a little) to point out that Dembski's own published definition of "intelligence" fits the action of (non-teleological) natural selection operating on random variability. "Febble"--Elizabeth Liddle of The Skeptical Zone, a British neuroscientist--was banned from Dembski's Uncommon Descent for pointing that out. Story here. The only update to that story is that Febble is now an atheist, no longer a Christian theist.

    ReplyDelete