Is there some reason we can’t bypass this problem by using, say, the surreal numbers instead of the real numbers? This is a genuine question, not a rhetorical one.
I dunno, but I don’t see any reason why that should be the case—especially on continuous domains.
So problem one is that this is circular if someone values the truth.
The same criticism applies to Occam's Razor.
Not quite—Occam’s razor is arbitrary, but it isn’t circular. The circularity comes in from thinking both “it is good to believe true things” and “things that are good to believe are more likely to be true.”
For your stock market example, note that any theory which lets you predict the stock market would be high-utility. Some theories are complete opposites, like “the stock market will crash” and “the stock market will boom”. By symmetry I suspect that their weighted contributions to the sum of models will cancel out.
“By symmetry” seems to be misapplied in this particular case. Think of a single person with a single belief. What does that person do when faced with a choice of how to update their belief?
The same goes for Santa sentences: "it is high utility to believe that this sentence is true and that Santa Claus exists" and "it is high utility to believe that this sentence is true and that Santa Claus does not exist" simply cancel out.
There’s still a problematic contradiction between “it is high utility to believe that this sentence is true and that Santa Claus exists” and “Santa Claus exists.”
I dunno, but I don’t see any reason why that should be the case—especially on continuous domains.
Now that I consider it a bit more, since the number of deterministic programs for modeling an environment is countably infinite, it should only require a hyperreal infinitesimal weight to maintain conservation of probability. The surreals are completely overkill. And furthermore, that’s only in the ideal case—a practical implementation would only examine a finite subset of programs, in which case the theoretical difficulty doesn’t even arise.
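For concreteness, here is one standard way to cash out "infinitesimal weight on countably many programs" in nonstandard analysis; this is only a sketch of the hyperfinite-uniform-measure construction, not necessarily what the comment above intends.

```latex
% Hyperfinite uniform prior (sketch).  Assumes an enumeration H_1, H_2, ... of
% the programs and a fixed infinite hypernatural N.
\[
  w_i \;=\; \frac{1}{N}, \qquad i = 1, \dots, N,
  \qquad\Longrightarrow\qquad
  \sum_{i=1}^{N} w_i \;=\; N \cdot \frac{1}{N} \;=\; 1,
\]
\[
  \text{while for every standard index } i:\quad
  0 \;<\; w_i \;=\; \frac{1}{N} \;<\; \frac{1}{n}
  \quad\text{for all standard } n \in \mathbb{N},
\]
% i.e. each actual (standard) program receives an infinitesimal share of a
% prior whose internal (hyperfinite) sum is still exactly 1.
```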
Not quite—Occam’s razor is arbitrary, but it isn’t circular. The circularity comes in from thinking both “it is good to believe true things” and “things that are good to believe are more likely to be true.”
It doesn’t look circular to me. It just looks like a restatement. If you consider a specific model of an environment, the way you evaluate its posterior probability (i.e. whether it’s true) is from its predictions, and the way you get utility from the model is also by acting on its predictions. The truth and the goodness of a belief end up being perfectly dependent factors when your posterior probability is dominated by evidence, so it doesn’t seem problematic to me to also have the truth and goodness of a belief unified for evaluation of the prior probability.
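A toy numerical sketch of that point, with made-up numbers and models that are not from the discussion: the posterior weight of a model ("is it true?") and the expected utility of acting on it are both computed from the same predictive distribution P(o|H).

```python
# Toy sketch: the posterior probability of a model and the utility you get from
# it both flow through the same predictive distribution P(o | H).
# The two models, probabilities, and payoffs below are invented for illustration.

likelihood = {
    "H_boom":  {"up": 0.8, "down": 0.2},   # P(o | H) for each hypothetical model
    "H_crash": {"up": 0.1, "down": 0.9},
}
prior = {"H_boom": 0.5, "H_crash": 0.5}

observation = "up"                      # the evidence actually seen
payoff = {"up": 1.0, "down": -1.0}      # utility of betting on each outcome

# Posterior ("whether it's true") uses P(o | H).
unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}

# Expected utility of acting on a model's predictions uses the same P(o | H).
expected_utility = {
    h: sum(likelihood[h][o] * payoff[o] for o in payoff) for h in prior
}

print(posterior)          # which model the evidence favours
print(expected_utility)   # which model it pays to act on
```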
“By symmetry” seems to be misapplied in this particular case. Think of a single person with a single belief.
Hm, I think you and I are viewing these differently. I had in mind an analogy to the AIXI model of AI: it’s a single entity, but it doesn’t have just a single belief. AIXI keeps all the beliefs that fit and weights them according to the Solomonoff prior, then acts on the weighted combination of all those beliefs. Now obviously I haven’t done the math, so I could be way off here, but I suspect that the appeal to symmetry works for equal-and-opposite high-utility beliefs like the stock market and Santa Claus ones precisely because the analogous AI model with the Goldpan would keep all beliefs (in a weighted combination) instead of choosing just one.
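To illustrate the cancellation claim in miniature (this is just the weighted-mixture idea, not AIXI, and the two models, weights, and payoffs are invented):

```python
# Two equal-and-opposite high-utility beliefs held simultaneously in a weighted
# mixture, rather than one being chosen.  All numbers are made up.

models = {
    # P(market goes up) according to each model, and its prior weight.
    "boom":  {"p_up": 0.99, "weight": 0.5},
    "crash": {"p_up": 0.01, "weight": 0.5},
}

def mixture_p_up(models):
    """Probability of 'up' under the weight-averaged mixture of all models."""
    total_weight = sum(m["weight"] for m in models.values())
    return sum(m["weight"] * m["p_up"] for m in models.values()) / total_weight

# Expected utility of the action "bet on up" (payoff +1 if up, -1 if down).
p = mixture_p_up(models)
eu_bet_up = p * 1.0 + (1 - p) * (-1.0)

print(p)           # 0.5  -- the two opposite beliefs cancel in the mixture
print(eu_bet_up)   # 0.0  -- so neither extreme belief dominates the action
```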
It should only require a hyperreal infinitesimal weight to maintain conservation of probability.
Doing arithmetic on infinities is not the same as doing infinite sequences of arithmetic. You can talk about a hyperreal-valued uniform prior, but can you actually do anything with it that you couldn’t do with an ordinary limit?
lim_{n→∞} (1/n) Σ_{i=1}^{n} P(o|Hi)
The reasons that limit doesn’t suffice to specify a uniform prior are: (1) The result of the limit depends on the order of the list of hypotheses, which doesn’t sound very uniform to me (I don’t know if it’s worse than the choice of universal Turing machine in Solomonoff, but at least the pre-theoretic notion of simplicity comes with intuitions about which UTMs are simple). (2) For even more perverse orders, the limit doesn’t have to converge at all. (Even if utility is bounded, partial sums of EU can bounce around the bounded range forever.)
Hyperreal-valued expected utility doesn’t change (1). It does eliminate (2), but I think you have to sacrifice computability to do even that much: Construction of the hyperreals involves the axiom of choice, which prevents you from actually determining which real number is infinitesimally close to the hyperreal encoded by a given divergent sequence.
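A quick numerical sketch of objections (1) and (2), using made-up ±1 utilities rather than anything from the discussion:

```python
# (1) The running average depends on the enumeration order, and
# (2) for perverse orders it need not converge at all.
# Each hypothesis here just contributes a made-up utility of +1 or -1.

def running_averages(utilities):
    """Partial averages (1/n) * sum_{i<=n} U_i."""
    out, total = [], 0.0
    for n, u in enumerate(utilities, start=1):
        total += u
        out.append(total / n)
    return out

# (1) Two enumerations of the same countable collection (infinitely many +1's
#     and infinitely many -1's) settle at different values.
alternating = [+1 if i % 2 == 0 else -1 for i in range(10_000)]
two_to_one  = [+1 if i % 3 != 2 else -1 for i in range(10_000)]
print(running_averages(alternating)[-1])   # ~0.0
print(running_averages(two_to_one)[-1])    # ~0.33

# (2) Blocks of +1's and -1's whose lengths keep doubling make the running
#     average swing back and forth forever instead of settling.
bursty, sign, block = [], +1, 1
while len(bursty) < 10_000:
    bursty.extend([sign] * block)
    sign, block = -sign, block * 2
print(running_averages(bursty[:10_000])[::2000])   # oscillates, no limit
```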
Thanks! You wrote the above in regard to hyperreal infinities ("the hyperreal encoded by a given divergent sequence"). I’m under the impression that hyperreal infinitesimals are encoded by convergent sequences: specifically, sequences that converge to zero. The hyperreal [1, 1⁄2, 1⁄3, 1⁄4, …] is the one that corresponds to the limit you gave. Does that adequately dispel the computability issue you raised?
In any case, non-computability isn’t a major defect of the utilitarian prior vis-a-vis the also non-computable Solomonoff prior. It is an important caution, however.
Your first objection seems much more damaging to the idea of a utilitarian prior. Indeed, there seems little reason to expect max(U(o|Hi)) to vary in a systematic way with a useful enumeration of the hypotheses.
A non-constant sequence that converges to zero encodes an infinitesimal, and I think any infinitesimal has an encoding of that form. But a sequence that’s bounded in absolute value but doesn’t converge also encodes some real plus some infinitesimal. It’s this latter kind that involves the axiom of choice, to put it in an equivalence class with some convergent sequence.
[1, 1⁄2, 1⁄3, 1⁄4, …] is the infinitesimal in the proposed definition of a uniform prior, but the hyperreal outcome of the expected utility calculation is [U(o|H1), (1/2) Σ_{i=1}^{2} U(o|Hi), (1/3) Σ_{i=1}^{3} U(o|Hi), …], which might very well be the divergent kind.
Agreed that my first objection was more important.
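For completeness, here is the ultrapower construction being referenced, in sketch form (the details depend on the chosen nonprincipal ultrafilter, and it is the existence of that ultrafilter that requires a fragment of the axiom of choice):

```latex
% Hyperreals as an ultrapower: real sequences identified when they agree on an
% ultrafilter-large set of indices.  U is a nonprincipal ultrafilter on the
% naturals; its existence is where the non-constructive choice principle enters.
\[
  (a_n) \sim (b_n) \iff \{\, n : a_n = b_n \,\} \in \mathcal{U},
  \qquad
  {}^{*}\mathbb{R} \;=\; \mathbb{R}^{\mathbb{N}} \big/ \sim .
\]
% E.g. [1, 1/2, 1/3, ...] is a positive infinitesimal, while a bounded
% non-convergent sequence such as [0, 1, 0, 1, ...] equals 0 or 1 depending on
% which parity class of indices lies in U.
```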