Or time-discounting utility, which would make the calculations described in (CORRECTION: this later paper) converge, despite what endoself said. Brief summary of paper: You can’t compute the sum of an infinite series whose sum goes to positive infinity.
You can view time-discounting as what you have to do to get your utility function over the future to converge.
This is a mathematical claim; so please have the decency to evaluate it, and don’t downvote it unless you can read my analysis below and provide a mathematical explanation of why it is in error. Mathematical truth is not arrived at by vote.
Not true—It’s not like the expected utility grows polynomially in time and you need to damp it out. It grows in hypothesis-space, and you’d need to weight it with an uncomputable function to get it to behave, an option assumed unavailable for our AI.
Hypothesis space is not mysterious; the paper is only about computing the expected value of all of a possible set of infinite series. Look on page 4, at Theorem 1:
If U is completely unbounded from above on D, then E(U(x1..n psi(Q,p,y1..n, x1..n)) | gamma_Q(y1..n) = x1..n) is either undefined or positive infinity.
E is the expected value of the utility function U. x1..n psi(...) is a sequence of values of x ranging from one to infinity, where each x_t is a perception at time t. Gamma is what the agent has perceived of the past. U(x1..infinity | agent’s perception up to time n)), computes the sum of utility U(x1..t) as t goes to infinity for one particular infinite series. E(U) is the expected value over all possible such series.
I assume this means the expected value of the sum, rather than the expected point the series converges on. The latter would mean our decision-making agent cared only about the infinite future, and not at all about the present. Making it a sum means the agent does no time-discounting.
The reason this value is unbounded is because each U(xt) is unbounded. If we use the term “unbounded” to mean “completely free to vary in any way”, then the theorem’s results would hold regardless of time-discounting. But the paper does not use the term “unbounded” in that way—and it should not. It uses the term “computably unbounded from above” to mean (Definition 1),
A. For every sequence s=x1..n, for every infinite sequence d x1..infinity that it is the initial sequence of, U(s) ⇐ U(d). (The paper words this starting with possible infinite sequences d, and restricts s to initial subsets of some d. That is the excruciatingly correct way to do it; but that makes it wordier, and would require me to rearrange everything to debug A.)
B. For every integer m, there is some finite sequence s=x1..n such that U(s) > m.
Two observations about this definition:
Point A is strange, and I consider it a bug in the paper. It isn’t just saying that all sequences of events have “unbounded” utility; it requires, by definition, that U(d=x1..t) goes to positive infinity as t goes to infinity for all possible sequences of events. In other words, our utility always increases to infinity no matter what we do. This is a problem both because it rules out any cases where we care what our agent does, and also because it assumes something at least as strong as the result the entire paper is trying to prove.
If we correct this, to read that for every s in X^p there exists some d in X^N such that U(s) ⇐ U(d) (or if we don’t, but let’s fix it anyway), then we can construct a sequence that is “unbounded” according to the definition, but that is exponentially bounded, meaning that we can find some n, p such that U(x1..n) = c^n, and for all t > n, U(x1..t) < c^t. Given that all possible sequences U(x1), U(x2), … conform to this—and I expect that any real-world sequence of environments and utilities can be made to conform to that requirement—then, with exponential time-discounting with a base larger than c, the sum of the infinite series converges. That us, E(U(x1) + U(x2)/c + U(x3)/c^2 + …) converges.
I would rather have seen the paper address the more interesting and relevant question of under what conditions the expectation of a bounded utility function can converge or diverge (if you don’t do time-discounting, as in the paper). Even the expected utility of a utility function bounded between −1 and 1 can diverge when you’re computing the sum—and you should be computing either the sum, or a time-discounted sum; no other computations are relevant to decision theory.
You appear to be describing a different paper, probably this one by the same author. The paper cited by endoself doesn’t have a page 4 (its pages are numbered 0..3), has no theorem on its page 3, doesn’t use the notation you describe, doesn’t contain a Definition 1, and in fact seems to have little in common with whatever it is you’re talking about.
You are correct! The two papers are both by Peter, and have nearly the same titles. I’ll look at this earlier paper; but I expect the later paper, which I am responding to, is a generalization of it.
I guess bounded utility functions are where it’s at then. Or finite hypothesis spaces. Or, of course, making the busy beaver function illegal :3
Or time-discounting utility, which would make the calculations described in (CORRECTION: this later paper) converge, despite what endoself said. Brief summary of paper: You can’t compute the sum of an infinite series whose sum goes to positive infinity.
You can view time-discounting as what you have to do to get your utility function over the future to converge.
This is a mathematical claim; so please have the decency to evaluate it, and don’t downvote it unless you can read my analysis below and provide a mathematical explanation of why it is in error. Mathematical truth is not arrived at by vote.
Not true—It’s not like the expected utility grows polynomially in time and you need to damp it out. It grows in hypothesis-space, and you’d need to weight it with an uncomputable function to get it to behave, an option assumed unavailable for our AI.
Hypothesis space is not mysterious; the paper is only about computing the expected value of all of a possible set of infinite series. Look on page 4, at Theorem 1:
If U is completely unbounded from above on D, then E(U(x1..n psi(Q,p,y1..n, x1..n)) | gamma_Q(y1..n) = x1..n) is either undefined or positive infinity.
E is the expected value of the utility function U. x1..n psi(...) is a sequence of values of x ranging from one to infinity, where each x_t is a perception at time t. Gamma is what the agent has perceived of the past. U(x1..infinity | agent’s perception up to time n)), computes the sum of utility U(x1..t) as t goes to infinity for one particular infinite series. E(U) is the expected value over all possible such series.
I assume this means the expected value of the sum, rather than the expected point the series converges on. The latter would mean our decision-making agent cared only about the infinite future, and not at all about the present. Making it a sum means the agent does no time-discounting.
The reason this value is unbounded is because each U(xt) is unbounded. If we use the term “unbounded” to mean “completely free to vary in any way”, then the theorem’s results would hold regardless of time-discounting. But the paper does not use the term “unbounded” in that way—and it should not. It uses the term “computably unbounded from above” to mean (Definition 1),
A. For every sequence s=x1..n, for every infinite sequence d x1..infinity that it is the initial sequence of, U(s) ⇐ U(d). (The paper words this starting with possible infinite sequences d, and restricts s to initial subsets of some d. That is the excruciatingly correct way to do it; but that makes it wordier, and would require me to rearrange everything to debug A.)
B. For every integer m, there is some finite sequence s=x1..n such that U(s) > m.
Two observations about this definition:
Point A is strange, and I consider it a bug in the paper. It isn’t just saying that all sequences of events have “unbounded” utility; it requires, by definition, that U(d=x1..t) goes to positive infinity as t goes to infinity for all possible sequences of events. In other words, our utility always increases to infinity no matter what we do. This is a problem both because it rules out any cases where we care what our agent does, and also because it assumes something at least as strong as the result the entire paper is trying to prove.
If we correct this, to read that for every s in X^p there exists some d in X^N such that U(s) ⇐ U(d) (or if we don’t, but let’s fix it anyway), then we can construct a sequence that is “unbounded” according to the definition, but that is exponentially bounded, meaning that we can find some n, p such that U(x1..n) = c^n, and for all t > n, U(x1..t) < c^t. Given that all possible sequences U(x1), U(x2), … conform to this—and I expect that any real-world sequence of environments and utilities can be made to conform to that requirement—then, with exponential time-discounting with a base larger than c, the sum of the infinite series converges. That us, E(U(x1) + U(x2)/c + U(x3)/c^2 + …) converges.
I would rather have seen the paper address the more interesting and relevant question of under what conditions the expectation of a bounded utility function can converge or diverge (if you don’t do time-discounting, as in the paper). Even the expected utility of a utility function bounded between −1 and 1 can diverge when you’re computing the sum—and you should be computing either the sum, or a time-discounted sum; no other computations are relevant to decision theory.
You appear to be describing a different paper, probably this one by the same author. The paper cited by endoself doesn’t have a page 4 (its pages are numbered 0..3), has no theorem on its page 3, doesn’t use the notation you describe, doesn’t contain a Definition 1, and in fact seems to have little in common with whatever it is you’re talking about.
You are correct! The two papers are both by Peter, and have nearly the same titles. I’ll look at this earlier paper; but I expect the later paper, which I am responding to, is a generalization of it.