I do not agree that that is what I’m doing. I don’t know why my willingness to use Bayes’ Theorem commits me to SilasBarta::validity.
Because you’re apparently giving the same status (“SilasBarta::validity”) to Bayesian inferences that I’m giving to the disputed syllogism S1. In what sense is it true that Bob is “probably” the murderer, given that you only know he’s been accused, and that his prints were then found on the murder weapon? Okay: in that sense I say that the conclusion of S1 is valid.
Where do you think I’m saying something different?
I deny that I am permitting myself the same thing as you. I try to make my problems well-structured enough that I have grounds for using a given probability distribution. I remain unconvinced that probabilistic syllogisms not attached to any particular instance have enough structure to justify a probability distribution for their elements—too much is left unspecified.
What about the Bayes Theorem itself, which does exactly that (specify a probability distribution on variables not attached to any particular instance)?
In your problem, suppose that, for whatever reason, I prefer the floodle scale to the probability scale, where floodle = prob + sin(2*pi*prob)/(2.1*pi). Why do I not get to apply a Shannon-maxent derivation on the floodle scale?
Because a) your information was given with the probability metric, not the floodle metric, and b) a change in variable can never be informative, while this one allows you to give yourself arbitrary information that you can’t have, by concentrating your probability on an arbitrary hypothesis.
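To make the floodle example concrete, here is a minimal Python sketch. It takes the floodle formula given above at face value; the function names are hypothetical. Because d(floodle)/dp = 1 + 2cos(2πp)/2.1 never drops below 1 − 2/2.1 > 0, floodle is a smooth, one-to-one relabeling of [0, 1], yet a density that is uniform on the floodle scale is not uniform on the probability scale.

```python
import math

# The "floodle" scale from the thread: floodle = p + sin(2*pi*p) / (2.1*pi).
def floodle(p: float) -> float:
    return p + math.sin(2 * math.pi * p) / (2.1 * math.pi)

# Derivative d(floodle)/dp = 1 + 2*cos(2*pi*p) / 2.1.
# Its minimum is 1 - 2/2.1 > 0, so floodle is strictly increasing on [0, 1].
def floodle_slope(p: float) -> float:
    return 1 + 2 * math.cos(2 * math.pi * p) / 2.1

# Probability that p lies in [a, b] when the density is uniform on the
# floodle scale. floodle maps [0, 1] monotonically onto [0, 1], so this
# mass is just the length of the image interval [floodle(a), floodle(b)].
def mass_if_uniform_in_floodle(a: float, b: float) -> float:
    return floodle(b) - floodle(a)

# The same mass when the density is uniform on the probability scale.
def mass_if_uniform_in_p(a: float, b: float) -> float:
    return b - a

if __name__ == "__main__":
    min_slope = min(floodle_slope(i / 1000) for i in range(1001))
    print(f"min slope on [0, 1]: {min_slope:.4f}")
    print(f"P(0.25 < p < 0.75), uniform in p:       {mass_if_uniform_in_p(0.25, 0.75):.4f}")
    print(f"P(0.25 < p < 0.75), uniform in floodle: {mass_if_uniform_in_floodle(0.25, 0.75):.4f}")
```

The middle half of the probability scale gets mass 0.5 under a density uniform in p, but only 0.5 − 2/(2.1π), about 0.197, under a density uniform in floodle, which pushes mass toward p = 0 and p = 1. Both scales label the same states of knowledge, so a maxent argument needs some principled reason to prefer one of them; that reason is what the exchange is about.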
The link I gave specified that the uniform distribution maximizes entropy even for the Jaynes definition.
Because you’re apparently giving the same status (“SilasBarta::validity”) to Bayesian inferences that I’m giving to the disputed syllogism S1.
For me, the necessity of using Bayesian inference follows from Cox’s Theorem, an argument which invokes no meta-probability distribution. Even if Bayesian inference turns out to have SilasBarta::validity, I would not justify it on those grounds.
What about the Bayes Theorem itself, which does exactly that (specify a probability distribution on variables not attached to any particular instance)?
I wouldn’t say that Bayes’ Theorem specifies a probability distribution on variables not attached to any particular instance; rather it uses consistency with classical logic to eliminate a degree of freedom in how other methods can specify otherwise arbitrary probability distributions. That is, once I’ve somehow picked a prior and a likelihood, Bayes’ Theorem shows how consistency with logic forces my posterior distribution to be proportional to the product of those two factors.
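A minimal Python sketch of that division of labor, assuming a finite set of hypotheses: the prior and the likelihood are supplied from outside (the numbers below are hypothetical, chosen only for illustration), and Bayes’ Theorem only fixes how they combine, posterior ∝ prior × likelihood, followed by normalization.

```python
from fractions import Fraction

def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' Theorem on a finite hypothesis space.

    prior[h]      = P(h)          (picked by some other method)
    likelihood[h] = P(data | h)   (picked by some other method)
    returns         P(h | data), proportional to prior[h] * likelihood[h]
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data), the normalizing constant
    if evidence == 0:
        raise ValueError("data has zero probability under every hypothesis")
    return {h: w / evidence for h, w in unnormalized.items()}

# Hypothetical inputs, loosely echoing the Bob example; not from the thread.
prior = {"guilty": Fraction(1, 4), "not_guilty": Fraction(3, 4)}
likelihood = {"guilty": Fraction(4, 5), "not_guilty": Fraction(1, 5)}

print(posterior(prior, likelihood))
# prints {'guilty': Fraction(4, 7), 'not_guilty': Fraction(3, 7)}
```

Nothing inside `posterior` says how to choose `prior` or `likelihood`; change either input and the output changes with it. Exact fractions are used only so the arithmetic is easy to check by hand.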
Because a) your information was given with the probability metric, not the floodle metric, and b) a change in variable can never be informative, while this one allows you to give yourself arbitrary information that you can’t have, by concentrating your probability on an arbitrary hypothesis.
I’m going to let this go by, because it is predicated on what I believe to be a confusion about the significance of using Shannon entropy instead of Jaynes’s version.
The link I gave specified that the uniform distribution maximizes entropy even for the Jaynes definition.
We’re at the “is not! / is too!” stage in our dialogue, so absent something novel to the conversation, this will be my final reply on this point.
The link does not so specify: this old revision shows that the example refers specifically to the Shannon definition. I believe the more general Jaynes definition was added later in the usual Wikipedia mishmash fashion, without regard to the examples listed in the article.
In any event, at this point I can only direct you to the literature I regard as definitive: section 12.3 of PT:LOS (Jaynes, Probability Theory: The Logic of Science, pp. 374–378) (ETA: Added link; Google Books is my friend). (The math in the Wikipedia article Principle of maximum entropy follows Jaynes’s material closely. I ought to know: I wrote the bulk of it years ago.) Here’s some relevant text from that section:
The conclusions, evidently, will depend on which [invariant] measure we adopt. This is the shortcoming from which the maximum entropy principle has suffered until now, and which must be cleared up before we can regard it as a full solution to the prior probability problem.
Let us note the intuitive meaning of this measure. Consider the one-dimensional case, and suppose it is known that a < x < b but we have no other prior information. Then… [e]xcept for a constant factor, the measure m(x) is also the prior distribution describing ‘complete ignorance’ of x. The ambiguity is, therefore, just the ancient one which has always plagued Bayesian statistics: how do we find the prior representing ‘complete ignorance’? Once this problem is solved [emphasis added], the maximum entropy principle will lead to a definite, parameter-independent method of setting up prior distributions based on any testable prior information.
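For reference, the entropy with an invariant measure m(x) that the passage refers to is usually written as below (a standard statement of the idea, not a transcription of Jaynes’s pages). Maximizing it subject only to normalization gives p ∝ m, so the “complete ignorance” prior is m itself; Shannon’s differential entropy is the special case of constant m, which is why it singles out the uniform density.

$$
H_c[p] = -\int_a^b p(x)\,\ln\frac{p(x)}{m(x)}\,dx, \qquad \int_a^b p(x)\,dx = 1
$$

$$
\frac{\delta}{\delta p}\left[H_c[p] - \lambda\left(\int_a^b p(x)\,dx - 1\right)\right] = -\ln\frac{p(x)}{m(x)} - 1 - \lambda = 0
\;\Longrightarrow\; p(x) = \frac{m(x)}{\int_a^b m(x')\,dx'}
$$

Under a change of variables y = g(x), both p and m pick up the same Jacobian factor 1/|g′(x)|, so the ratio p/m, and hence H_c, is unchanged; the Shannon form −∫ p ln p dx has no such invariance.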
Y… you mean you were citing as evidence a Wikipedia article you had heavily edited? Bad Cyan! ;-)
Okay, I agree we’re at a standstill. I look forward to comments you may have after I finish the article I mentioned. FWIW, the article isn’t about this specific point I’ve been defending, but rather, about the Bayesian interpretation of standard fallacy lists, where my position here falls out as a (debatable) implication.