Over at
Uncommon Descent, Winston Ewert (one of the three authors of the paper
A General Theory of Information Cost Incurred by Successful Search) answers in the article "
Questioning Information Cost to "a number of questions and objections to the paper" I raised. He states fives points, which I will address in this post. Obviously, I'll give my reply at
Uncommon Descent, too, but their format doesn't allow for mathematical formulas, so it is easier to make a first draft here. I thank Winston Ewert for his answers, but I'd appreciate some further clarifications.
Firstly, Dieb objects that the quasi-Bayesian calculation on Page 56 is incorrect, although it obtains the correct result. However, the calculation is called a quasi-Bayesian calculation because it engages in hand-waving rather than presenting a rigorous proof. The text in question is shortly after a theorem and is intended to explicate the consequences of that theorem rather than rigorously prove its result. The calculation is not incorrect, but rather deliberately oversimplified.
Fair enough. So it's not a
quasi-Bayesian calculation, but a
Bayesian quasi-calculation. I will amend my post (
Please show all your work for full credit...) by Winston Ewert's explanation.
Secondly, Dieb objects that many quite different searches can be constructed which are represented by the same probability measure. However, if searches were represented as a mapping from the previously visited points to a new point (as in Wolpert and Macready’s original formulation), algorithms which derive the same queries in different ways will be represented the same way. Giving multiple searches the same representation is neither avoidable nor inherently problematic.
The problem is that Dembski's, Ewert's and Marks's construction of the representation does not only depend on the discriminator (see the next point), but on the target, too. Take $\Omega = \{1,2,3,4\}$ and two searches with two steps:
- The first search consist just of two random guesses, i.e., at each step, one of the numbers is given with probability $1/4$.
- The second search has two guesses, too. But at the first step, $1$ is taken with probability $7/16$ and each other number with $3/16$, while at the second step, one is omitted from the guess and each other number it guessed with a probability of $1/3$.
These two searches are quite different: the first may produce a query $(1,1)$ with probability $1/16$, while the second never will. Now take a discriminator $\Delta$ which returns the target if it is in the query and otherwise another element in the query at random. Such a discriminator seems to be quite natural and it is certainly within the range of the definition on pages 35--36.
Now, the distribution which $\Delta$ infers on $\Omega$ depends on the target: if we are looking for $\{1\}$, we get:
- First search: $\mu_{\{1\}}^1$ given by $\mu_{\{1\}}^1(\{1\}) = 7/16$, $\mu_{\{1\}}^1(\{2\})= \mu_{\{1\}}^1(\{3\})= \mu_{\{1\}}^1(\{4\})=3/16$
- Second search: $\mu_{\{1\}}^2 = \mu_{\{1\}}^1$
These are two algorithms which don't derive the same queries albeit in different ways, but nonetheless they will be represented the same way!
In fact, if our target is $\{2\}$, we get other distributions:
- First search: $\mu_{\{2\}}^1$ given by $\mu_{\{2\}}^1(\{2\}) = 7/16$, $\mu_{\{2\}}^1(\{1\})= \mu_{\{2\}}^1(\{3\})= \mu_{\{2\}}^1(\{4\})=3/16$
- Second search: $\mu_{\{2\}}^2\{1\} = 14/96$, $\mu_{\{2\}}^2\{2\} = 44/96$, $\mu_{\{2\}}^2\{3\} = \mu_{\{2\}}^2\{4\}= 19/96$.
Frankly, this seems to be "inherently problematic".
Thirdly, Dieb objects that a search will be biased by the discriminator towards selecting elements in the target, not a uniform distribution. However, Dieb’s logic depends on assuming that we have a good discriminator. As the paper states, we do not assume this to be the case. If choosing a random search, we cannot assume that we have a good discriminator (or any other component). The search for the search assumes that we have no prior information, not even the ability to identify points in the target.
This seems to be a little absurd. Shouldn't your representation work for any discriminator - even a good one? If we are following Wolpert's and Macready's formulation, a blind search means that we try to maximize a characteristic function. So, the natural discriminator should return this maximum if it is found in a query. If it doesn't, we build a discriminator which does: we have the output of the
inspector, so why not use it? If you are telling us that the output of the inspector may be false, then I'd use another inspector, one which gives us the output of the fitness function. If you say now that the output of the fitness function may be dubious, I'd say "tough luck: I maximize this function whether the function is right or wrong - what else is there to do?". These added layers of entities which have a hidden knowledge about the target which isn't inherent to the fitness function seem to be superfluous.
Fourthly, Dieb doesn’t see the point in the navigator’s output as it is can be seen as just the next element of the search path. However, the navigator produces information like a distance to the target. The distance will be helpful in determining where to query, but it does not determine the next element of the search path. So it cannot be seen as just the next element of the search path.
So, what is the difference between the
inspector and the
navigator? The navigator may take the output of the inspector into account, but nonetheless one could conflate both into a single pair of values - especially as you allow "different forms" for the inspector. So you could get rid of the third row of the search matrix.
Fifthly, Dieb objects that the inspector is treated inconsistently. However, the output of the inspector is not inconsistent but rather general. The information extracted by the inspector is the information relevant to whether or not a point is in the target. That information will take different forms depending on the search, it may be a fitness value, a probability, a yes/no answer, etc.
Sorry, I may have been confused by the phrase
"The inspector $O_{\alpha}$ is an oracle that, in querying a search-space entry, extracts information bearing on its probability of belonging to the target $T$": if we look at the Dawkins's Weasel and take the Hamming-distance as the fitness function, each returned value other than $0$ tells us that the
probability of belonging to the target $T$ for an element is zero itself, whether it is "METHINKS IT IS LIKE A WEASER" or "AAAAAAAAAAAAAAAAAAAAAAAAAAAA". I understand that you want to avoid the notion of proximity to a target, but your phrasing is misleading, too. Have you any example of a problem where the inspector returns a probability other than 0 or 1? In your examples, it seems to be always the output of a fitness function.
The authors of the paper conclude that Dieb’s objections derive from misunderstanding our paper. Despite five blog posts related to this paper, we find that Dieb has failed to raise any useful or interesting questions. Should Dieb be inclined to disagree with our assessment, we suggest that he organize his ideas and publish them as a journal article or in a similar venue.
It's always possible that I've misunderstood certain aspects of the paper. I would be grateful if you helped to clear up such misunderstanding. I hope that my comments above count as useful and at least a little bit interesting. I'm preparing an article, as I've promised earlier, but the work is quite tedious, and any clarification of the matters above. Furthermore, I'd like to know whether this "general framework" is still in use, or whether you have tried another way of representing searches as measures. Again, thank you Winston Ewert!