Sunday, July 14, 2013

Dembski's, Ewert's and Marks's Concept of a Search Applied to Exhaustive Searches

At Uncommon Descent, Winston Ewert, co-author of the paper A General Theory of Information Cost Incurred by Successful Search, writes:
"The search is defined to be a six-tuple consisting of the initiator, terminator, inspector, navigator, nominator, and discriminator. The paper studies the question of picking a search at random, and that would imply picking each of the six components at random. We did not consider it necessary to specifically state that each individual component was also selected at random. That would seem to be implied.
So, let $\Omega = \{\omega_1, \omega_2, \dots, \omega_N\}$ be our finite search space with $N$ elements. We are looking for a single element $\omega_k$, so we try to maximize the fitness function $f = \chi_{\omega_k}$. To keep everything finite, we don't allow repetitions, i.e., in our search each place can only be visited once. This is - as Macready and Wolpert observed - always possible by keeping a look-up table and thus doesn't change the set-up. Therefore, our search is completed in at most $N$ steps.
(BTW: The claim that "each of the six components [is picked] at random" seems not to apply to the inspector: this is a fixed function for a search - in our case, the inspector returns the value of the fitness function. Of course, you can say that we pick the inspector at random out of the set of the one possible inspector.)
Let's take a look at all the searches which are ended by their terminator only after the $N$-s step, i.e., the subset of all exhaustive searches. The price question: What is the probability to find the target in such an exhaustive search? Until now, everyone looking at such problems would have thought that this probability is one: we certainly visited $\omega_k$ and spotted that the function $f$ takes it maximum there. But in the world of Dembski, Ewert, and Marks it is not, as a random discriminator takes its toll - and discriminators aren't obliged to return the target if it was found and identified...
Counterintuitive? That is a flattering description: the discriminator's purpose seems to be to turn even a search which is successful by all human standards into a guess to fit the idée fixe that each search can be "represented" by a measure on the search space.
Addendum: We can drop the condition of not having repetitions in our searches and just look at those searches which are terminated only after the whole search space was visited: terminators with this property exist. Such searches may have length $N$, but can be much longer. The result is the same: the probability of finding the target during a complete enumeration of the search space is (much) less than one. I have to ask: What good is a model in which an exhaustive search doesn't fare much better than a single guess?

Saturday, July 13, 2013

Questioning Information Cost - A reply to Winston Ewert

Over at Uncommon Descent, Winston Ewert (one of the three authors of the paper A General Theory of Information Cost Incurred by Successful Search) answers in the article "Questioning Information Cost to "a number of questions and objections to the paper" I raised. He states fives points, which I will address in this post. Obviously, I'll give my reply at Uncommon Descent, too, but their format doesn't allow for mathematical formulas, so it is easier to make a first draft here. I thank Winston Ewert for his answers, but I'd appreciate some further clarifications.
Firstly, Dieb objects that the quasi-Bayesian calculation on Page 56 is incorrect, although it obtains the correct result. However, the calculation is called a quasi-Bayesian calculation because it engages in hand-waving rather than presenting a rigorous proof. The text in question is shortly after a theorem and is intended to explicate the consequences of that theorem rather than rigorously prove its result. The calculation is not incorrect, but rather deliberately oversimplified.
Fair enough. So it's not a quasi-Bayesian calculation, but a Bayesian quasi-calculation. I will amend my post (Please show all your work for full credit...) by Winston Ewert's explanation.
Secondly, Dieb objects that many quite different searches can be constructed which are represented by the same probability measure. However, if searches were represented as a mapping from the previously visited points to a new point (as in Wolpert and Macready’s original formulation), algorithms which derive the same queries in different ways will be represented the same way. Giving multiple searches the same representation is neither avoidable nor inherently problematic.
The problem is that Dembski's, Ewert's and Marks's construction of the representation does not only depend on the discriminator (see the next point), but on the target, too. Take $\Omega = \{1,2,3,4\}$ and two searches with two steps:
  • The first search consist just of two random guesses, i.e., at each step, one of the numbers is given with probability $1/4$.
  • The second search has two guesses, too. But at the first step, $1$ is taken with probability $7/16$ and each other number with $3/16$, while at the second step, one is omitted from the guess and each other number it guessed with a probability of $1/3$.
These two searches are quite different: the first may produce a query $(1,1)$ with probability $1/16$, while the second never will. Now take a discriminator $\Delta$ which returns the target if it is in the query and otherwise another element in the query at random. Such a discriminator seems to be quite natural and it is certainly within the range of the definition on pages 35--36.
Now, the distribution which $\Delta$ infers on $\Omega$ depends on the target: if we are looking for $\{1\}$, we get:
  • First search: $\mu_{\{1\}}^1$ given by $\mu_{\{1\}}^1(\{1\}) = 7/16$, $\mu_{\{1\}}^1(\{2\})= \mu_{\{1\}}^1(\{3\})= \mu_{\{1\}}^1(\{4\})=3/16$
  • Second search: $\mu_{\{1\}}^2 = \mu_{\{1\}}^1$
These are two algorithms which don't derive the same queries albeit in different ways, but nonetheless they will be represented the same way!
In fact, if our target is $\{2\}$, we get other distributions:
  • First search: $\mu_{\{2\}}^1$ given by $\mu_{\{2\}}^1(\{2\}) = 7/16$, $\mu_{\{2\}}^1(\{1\})= \mu_{\{2\}}^1(\{3\})= \mu_{\{2\}}^1(\{4\})=3/16$
  • Second search: $\mu_{\{2\}}^2\{1\} = 14/96$, $\mu_{\{2\}}^2\{2\} = 44/96$, $\mu_{\{2\}}^2\{3\} = \mu_{\{2\}}^2\{4\}= 19/96$.
Frankly, this seems to be "inherently problematic".
Thirdly, Dieb objects that a search will be biased by the discriminator towards selecting elements in the target, not a uniform distribution. However, Dieb’s logic depends on assuming that we have a good discriminator. As the paper states, we do not assume this to be the case. If choosing a random search, we cannot assume that we have a good discriminator (or any other component). The search for the search assumes that we have no prior information, not even the ability to identify points in the target.
This seems to be a little absurd. Shouldn't your representation work for any discriminator - even a good one? If we are following Wolpert's and Macready's formulation, a blind search means that we try to maximize a characteristic function. So, the natural discriminator should return this maximum if it is found in a query. If it doesn't, we build a discriminator which does: we have the output of the inspector, so why not use it? If you are telling us that the output of the inspector may be false, then I'd use another inspector, one which gives us the output of the fitness function. If you say now that the output of the fitness function may be dubious, I'd say "tough luck: I maximize this function whether the function is right or wrong - what else is there to do?". These added layers of entities which have a hidden knowledge about the target which isn't inherent to the fitness function seem to be superfluous.
Fourthly, Dieb doesn’t see the point in the navigator’s output as it is can be seen as just the next element of the search path. However, the navigator produces information like a distance to the target. The distance will be helpful in determining where to query, but it does not determine the next element of the search path. So it cannot be seen as just the next element of the search path.
So, what is the difference between the inspector and the navigator? The navigator may take the output of the inspector into account, but nonetheless one could conflate both into a single pair of values - especially as you allow "different forms" for the inspector. So you could get rid of the third row of the search matrix.
Fifthly, Dieb objects that the inspector is treated inconsistently. However, the output of the inspector is not inconsistent but rather general. The information extracted by the inspector is the information relevant to whether or not a point is in the target. That information will take different forms depending on the search, it may be a fitness value, a probability, a yes/no answer, etc.
Sorry, I may have been confused by the phrase "The inspector $O_{\alpha}$ is an oracle that, in querying a search-space entry, extracts information bearing on its probability of belonging to the target $T$": if we look at the Dawkins's Weasel and take the Hamming-distance as the fitness function, each returned value other than $0$ tells us that the probability of belonging to the target $T$ for an element is zero itself, whether it is "METHINKS IT IS LIKE A WEASER" or "AAAAAAAAAAAAAAAAAAAAAAAAAAAA". I understand that you want to avoid the notion of proximity to a target, but your phrasing is misleading, too. Have you any example of a problem where the inspector returns a probability other than 0 or 1? In your examples, it seems to be always the output of a fitness function.
The authors of the paper conclude that Dieb’s objections derive from misunderstanding our paper. Despite five blog posts related to this paper, we find that Dieb has failed to raise any useful or interesting questions. Should Dieb be inclined to disagree with our assessment, we suggest that he organize his ideas and publish them as a journal article or in a similar venue.
It's always possible that I've misunderstood certain aspects of the paper. I would be grateful if you helped to clear up such misunderstanding. I hope that my comments above count as useful and at least a little bit interesting. I'm preparing an article, as I've promised earlier, but the work is quite tedious, and any clarification of the matters above. Furthermore, I'd like to know whether this "general framework" is still in use, or whether you have tried another way of representing searches as measures. Again, thank you Winston Ewert!

Thursday, July 4, 2013

BI:NP - A General Theory of Information Cost Incurred by Successful Search

(This is an email I wrote to William Dembski, Winston Ewert and Robert Marks II)

Hi,

it's nice to be able to read the proceedings of the conference on Biological Information – New Perspectives for free. However, I have a few questions regarding your contribution "A General Theory of Information Cost Incurred by Successful Search":

1) Your quasi-Baysian calculation on p. 56 gets the right result, but IMO it isn't correct: Please see http://dieben.blogspot.de/2013/07/please-show-all-your-work-for-full.html for details.

2) You claim that you have found a representation for searches as measures on the original space. Again, this works for guesses, but seems to be quite problematic when it comes to searches: here, many quite different searches can be constructed which are "represented" by the same $\mu$ in $M(\Omega)$!

3) You are using the uniform measure on $M(\Omega)$. Again, fine with guesses - but when it comes to searches, this becomes questionable: if $\mu_{(X_1, X_2 \dots, X_n)}$ are measures representing searches $S(X_1, X_2 \dots, X_n$), where at each step an element of $\Omega$ is chosen according to a (uniformly random) chosen measure $\theta_k$, then the measures induced by a "discriminator" (which returns an element of $T$ if it was found, otherwise a random element of the first line of the search matrix) aren't again uniformly distributed on $M(\Omega)$. In fact, we will get that for n tending to infinity, the measures approach $\delta_T$!

4) For me, your description of a search is quite convoluted: I don't see the point of the "navigator"'s output, as this can be seen just as the next element of your search path. And then there is the output of the "inspector": you are treating it quite inconsistently - once, it is the probability of an element to be a member of the target, the next time it is the output of a fitness function...

I'd like to see you addressing these issues above. Denyse O'Leary promised a series of posts at Uncommon Descent, each one dedicated to an article of the proceedings. If you don't wish to answer via mail - or comment on my blog - perhaps we can discuss these questions there?

Yours

Di…Eb…

Wednesday, July 3, 2013

Please show all your work for full credit...

(I promised a chapter-by-chapter critique of "A General Theory of Information Cost incurred by Successful Search". This is quite tedious work, so I wanted to make this little point up front - for full reading pleasure you have to be acquainted with (some of) the definitions used in the paper.)
(Nota Bene: In a reply to this post, Winston Ewert wrote on Uncommon Descent: "Dieb objects that the quasi-Bayesian calculation on Page 56 is incorrect, although it obtains the correct result. However, the calculation is called a quasi-Bayesian calculation because it engages in hand-waving rather than presenting a rigorous proof. The text in question is shortly after a theorem and is intended to explicate the consequences of that theorem rather than rigorously prove its result. The calculation is not incorrect, but rather deliberately oversimplified.")

On page 55 of their article "A General Theory of Information Cost incurred by Successful Search" (free download as pdf), the authors W. A. Dembski, W. Ewert and R. J Marks II (in future I'll refer to them as DEM) write:

To see how the probability costs ossociated with null and alternative searches relate, it is instructive to consider the following two quasi-Bayesian ways of reckoning these costs:
$\mathbf{P}$(locating $T$ via null search)=$\mathbf{P}$(null search locates T & null search is available)
=$\mathbf{P}$(null search locates T|null search is avail.) × $\mathbf{P}$(null search is avail.)
=$\mathbf{U}(T) \times 1$ [because the availability of null search is taken for granted]
=$p$.

$\mathbf{P}$(locating $T$ via alt. search)=$\mathbf{P}$(alt. search locates T & alt. search is available)
=$\mathbf{P}$(alt. search locates T|alt. search is avail.) × $\mathbf{P}$(alt search is avail.)
=$\mu(T) \times \overline{\mathbf{U}}(\overline{T}_q)$
$\le q\,\times\,p/q$
=$p$.
I have no problems with the results - at least if we can assume that the uniform measure is apt to be used on $\mathbf{M}(\Omega)$. But the equations seems to be a little bit fishy. Let me explain what I mean, using the most simple setting possible: Let $\Omega = \{0,1\}$, a set with two elements, and let $T=1$ be our target. Then $\mathbf{M}(\Omega)$ can be represented by the interval $[0;1]$: for $x \in [0;1]$, $\mu_x = (1-x) \delta_0 + x \delta_1$ is the measure with $\mu_x(\{1\}) = x$. We can even introduce an associated search $S_x := S_(\mu_x)$, which is in fact just a single guess on $\Omega$ distributed according to $\mu_x$. Ergo $\mathbf{E}(S_x) = x$. Now we can perform an experiment in two steps:
  1. Choose a measure $\mu_x \in \mathbf{M}(\Omega)$ at random.
  2. Try to locate $T$ using $S_x$.
This experiment can be represented by choosing (X,Y) on $[0;1] \times [0;1]$ according to the uniform distribution on the unit square: We look up a number x, which represents our measure, then a number y: if $y \le x$, we have located our target using $S_x$, otherwise not.
The picture displays the situation and allows us two answer some questions easily:
  • What's the probability to locate our target using the process above? Well, it's $p = 1/2$, represented by the whole green area
  • For a fixed $q$, what is the probability to choose $\mu_x$ and find our target? That's zero (or nil): the red line symbolizes this event, which is a null-set.
Dembski, Ewert, and Marks (DEM) obviously don't want to have this, that's why they don't look at $\{\theta \in \mathbf{M}(Q) | \theta(T) = q \}$, but at $\overline{T}_q = \{\theta \in \mathbf{M}(Q) | \theta(T) \ge q \}$. (pp. 53-54)
  • What's the probability to choose a measure for which the associated guess finds the target with a probability of at least q, i.e., $\overline{\mathbf{U}}(\overline{T}_q) $? That would be (1-q), easily to be seen in our case, but much more difficult to calculate for more complicated arrangements. DEM give a upper limit for this probability of $p/q$.
  • What is the probability of finding our target when we have chosen a measure in $\overline{\mathbf{U}}(\overline{T}_q)$? That depends on the measure, but it is at least $q$. On average, it is $\frac{1+q}{2}$: we get this by examining the darker green area...
  • What is the probability of choosing a measure in $\overline{\mathbf{U}}(\overline{T}_q)$ and finding the target? This is given by the darker green area, ergo $(1-q)\frac{1+q}{2}=\frac{1-q^2}{2}$
Now, the darker green area will always be smaller than the whole green area, not only in this simple example, but for all others, too. Therefore the statement: $$\mathbf{P}\text{(locating }T\text{ via alt. search}) \le p$$ is absolutely (and trivially) correct, as $\mathbf{P}\text{(locating }T\text{ via alt. search})$ is the probability of choosing an element of $\overline{\mathbf{U}}(\overline{T}_q)$ and finding the target using that element. But there is a problem in the equality $$\mu(T) \times \overline{\mathbf{U}}(\overline{T}_q) \le q\,\times\,p/q$$ While $\overline{\mathbf{U}}(\overline{T}_q) \le p/q$ we find that $$\mu(T) \ge q:$$ A measure taken from $\overline{T}_q$ will result in a search which finds the target with a probability of at least $q$. Above, we have seen, that the probability is on average $\frac{1+q}{2} > q$. So we cannot say anything about the size of $\mu(T) \times \overline{\mathbf{U}}(\overline{T}_q)$!. The shaded area in the picture shows $q\,\times\,p/q$: it has nothing to do with the probabilities which one can see so neatly in the graphic, it just happens to have the right area of $p$...
BTW: I don't think that we can split $\mathbf{P}$(alt. search locates T & alt. search is available) neatly into a product used by DEM, a little integrating would be necessary...
So, I wouldn't give full marks for this exercise, but perhaps I'm wrong?