I mean the sort of “eventually approximately consistent over computable patterns” thing exhibited by logical inductors, which is stronger than limit-computability.
It’s not clear to me what this means in the context of a utility function.
Let’s talk first about non-embedded agents.
Say that I’m given the specification of a Turing machine, and I have a computable utility mapping from output states (including “does not halt”) to [0,1]. We presumably agree that this is possible.
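To make that concrete, here’s a toy sketch (Python; the choice of outputs and the particular utility values are made up purely for illustration):

```python
# Toy model: an outcome is either the bit string a machine printed before
# halting, or this sentinel standing for "does not halt".
NONHALT = object()

def utility(outcome) -> float:
    """A total, computable map from outcomes to [0, 1] (toy example)."""
    if outcome is NONHALT:
        return 0.0  # we simply stipulate some fixed value for non-halting
    # e.g. mildly prefer outputs that start with "1", more so the longer they are
    return min(1.0, len(outcome) / 100) if outcome.startswith("1") else 0.1
```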
I agree that it’s impossible to make a computable mapping from Turing machines to outcomes, so I cannot have a computable utility function from TMs to the reals that assigns the same value to any two TMs with identical output.
But I can have a logical inductor which, for each TM, produces a sequence of predictions about the utility of that TM’s output. Every TM that halts will eventually be assigned the correct utility, and every TM that doesn’t will converge to some utility in [0,1], with the usual logical-inductor properties guaranteeing that TMs easily proven to have the same output converge to the same number, etc.
That’s a computable sequence of utility functions over TMs with good asymptotic properties. At any stage, I could stop and tell you that I choose some particular TM as the one that looks best to me so far.
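For concreteness, here is the crudest such stage-indexed sequence in the toy model above (reusing `utility` and `NONHALT`). It is emphatically not a logical inductor — it only has the “every halting TM eventually gets its true utility” property, none of the nicer guarantees about provably-equivalent machines — but it shows the kind of computable object I mean:

```python
def utility_estimate(machine, n: int) -> float:
    """Stage-n estimate of the utility of `machine`'s output.

    Here a machine is modeled as a zero-argument generator function: each
    next() is one step of computation, and returning a string is halting
    with that output.  Every machine that halts is assigned its true utility
    from some stage onward, and every estimate lies in [0, 1].
    """
    steps = machine()
    try:
        for _ in range(n):
            next(steps)                 # simulate one more step
    except StopIteration as halt:
        return utility(halt.value)      # halted within n steps: exact utility
    return utility(NONHALT)             # still running at stage n: placeholder

def example_machine():
    """Runs for five steps, then halts with output "111"."""
    for _ in range(5):
        yield
    return "111"

# utility_estimate(example_machine, 3)  -> placeholder 0.0 (hasn't halted yet)
# utility_estimate(example_machine, 10) -> utility("111"), the exact value
```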
I haven’t really thought in a long while about questions like “do logical inductors’ good properties of self-prediction mean that they could avoid the procrastination paradox”, so I could be talking nonsense there.
I think we’re going to have to back up a bit. Call the space of outcomes O and the space of Turing machines M. It sounds like you’re talking about two functions, U:O→R and eval:M→O. I was thinking of U as the utility function we were talking about, but it seems you were thinking of U∘eval.
You suggested U should be computable but eval should not be. It seems to me that eval should certainly be computable (with the caveat that it might be a partial function rather than a total one), since computation is the only thing Turing machines do. And if non-halting is included in the space of outcomes (so that eval is total), it should be represented as some sort of limit of partial information, rather than represented explicitly, so that eval is continuous.
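In the toy model from earlier in the thread, the distinction I’m drawing looks roughly like this (a sketch only; `eval_machine` is just the obvious unbounded simulator, and its possible failure to return is exactly the partiality I mean):

```python
def eval_machine(machine):
    """eval : M -> O, computable only as a partial function: if `machine`
    never halts, this call never returns, so the "does not halt" outcome
    is never produced explicitly."""
    steps = machine()
    while True:
        try:
            next(steps)                 # one more step of simulation
        except StopIteration as halt:
            return halt.value           # the machine's output

def composed_utility(machine) -> float:
    """U∘eval: a total U composed with a partial eval is itself partial --
    on a non-halting machine this never returns, rather than returning
    utility(NONHALT)."""
    return utility(eval_machine(machine))
```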
In any case, a slight generalization of Rice’s theorem tells us that any computable function from Turing machines to the reals that depends only on the machine’s semantics must be constant. So I suppose I’m forced to agree that, if we want a utility function U∘eval that is defined on all Turing machines and depends only on their semantics, then at least one of U or eval must be uncomputable. But I guess I have to ask why we would want to assign utilities to Turing machines.
I’ve been using “computable” to mean a total computable function (the value on each individual input can be computed in finite time).
I’m thinking of an agent outside a universe about to take an action, and each action will cause that universe to run a particular TM. (You could maybe frame this as “the agent chooses the tape for the TM to run on”.) For me, this is analogous to acting in the world and causing the world to shift toward some outcomes over others.
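In the toy model, the picture is roughly this (the mapping from actions to machines is whatever structure the decision problem hands us; `utility_estimate` is the crude stage-n sequence sketched earlier):

```python
def choose_action(actions_to_machines: dict, n: int):
    """Pick the action whose induced machine looks best at stage n.

    `actions_to_machines` maps each available action to the machine (a
    generator function, as above) that the universe would run if that
    action were taken."""
    return max(actions_to_machines,
               key=lambda a: utility_estimate(actions_to_machines[a], n))
```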
By asserting that U should be the computable one, I’m asserting that “how much do I like this outcome” is a more tractable question than “which actions result in this outcome”.
An intuition pump in a human setting:
I can check whether given states of a Go board are victories for one player or the other, or if the game is not yet finished (this is analogous to U being a total computable function). But it’s much more difficult to choose, for an unfinished game where I’m told I have a winning strategy, a move such that I still have a winning strategy. The best I can really do as a human is calculate a bit and then guess at how the leaves will probably resolve if we go down them (this is analogous to eval being an enumerable but not necessarily computable function).
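To make the analogy concrete at a scale that fits in a comment, here’s the same asymmetry in tic-tac-toe rather than Go (purely illustrative: tic-tac-toe is small enough to solve exactly, so the depth cutoff and the random guess at unresolved leaves are only there to mirror my situation as a Go player):

```python
import random

# A board is a list of 9 cells, each "X", "O", or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def result(board):
    """Analogue of U: a total, cheap check of a position.
    Returns "X" or "O" for a win, "draw" for a full board, or None if the
    game is not yet finished."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if all(cell is not None for cell in board) else None

def guess_value(board, player, depth):
    """Analogue of how I actually play: search a little, then guess at how
    the unresolved leaves will probably go."""
    outcome = result(board)
    if outcome is not None:
        return 1.0 if outcome == player else (0.5 if outcome == "draw" else 0.0)
    if depth == 0:
        return random.random()      # a (bad) stand-in for leaf-node intuition
    opponent = "O" if player == "X" else "X"
    return max(1.0 - guess_value(board[:i] + [player] + board[i + 1:],
                                 opponent, depth - 1)
               for i, cell in enumerate(board) if cell is None)
```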
In general, individual humans are much better at figuring out what outcomes we want than we are at figuring out exactly how to achieve those outcomes. (It would be quite weird if the opposite were the case.) We’re not good at either in an absolute sense, of course.