Yeah, I explored this direction pretty thoroughly a few years ago. The simplest way is to assume that agents don’t have probabilities, only utility functions over combined outcomes, where a “combined outcome” is a combination of outcomes in all possible worlds. (That also takes care of updating on observations, we just follow UDT instead.) Then if we have two agents with utility functions U and V over combined outcomes, any Pareto-optimal way of merging them must behave like an agent with utility function aU+bV for some a and b. The theory sheds no light on choosing a and b, so that’s as far as it goes. Do you think there’s more stuff to be found?
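To make that claim concrete, here's a toy sketch with made-up policies and payoffs (nothing here is from the actual result, it's just an illustration of the Pareto/weighted-sum correspondence in a finite setting):

```python
# Finitely many candidate joint policies, each scored by two utility
# functions U and V over combined outcomes.  Maximizing a*U + b*V for
# positive weights always lands on a Pareto-optimal policy, and sweeping
# the weights traces out the (supported) Pareto frontier.  Covering
# *every* Pareto-optimal point this way needs convexity, e.g. allowing
# randomized policies.

policies = {           # policy -> (U, V), utilities over combined outcomes
    "p1": (10.0, 1.0),
    "p2": (7.0, 6.0),
    "p3": (2.0, 9.0),
    "p4": (4.0, 4.0),  # Pareto-dominated by p2
}

def dominates(x, y):
    return x[0] >= y[0] and x[1] >= y[1] and x != y

pareto = {name for name, score in policies.items()
          if not any(dominates(other, score) for other in policies.values())}

for a, b in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    best = max(policies, key=lambda p: a * policies[p][0] + b * policies[p][1])
    assert best in pareto   # weighted-sum maximizers are always Pareto-optimal
    print(f"a={a}, b={b}: merged agent picks {best}")   # p1, p2, p3 respectively
```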
It sounds like you considered a more general setting than the one I'm working in at the moment. I want to eventually move to that kind of “combined outcome” setting, but first I want to understand more classical preference structures and break things one at a time.
Do you think your version sheds any light on value learning in UDT? I had a discussion with Alex Appel about this, in which it seemed like you get a “nosy neighbors” problem, where a potential set of values may care about what happens even in worlds where different values hold; but this problem seemed to be bounded by such other-world preferences acting like beliefs. For example, you could imagine a UDT agent with world-models in which either vegetarianism or carnivorism is right (and which somehow make different predictions). Each set of preferences can either be “nosy” (it cares what happens regardless of which facts end up true) or “non-nosy” (it only cares about what happens in its own world: vegetarianism cares about the amount of meat eaten in veg-world, and carnivorism cares about the amount of meat eaten in carn-world).
The claim which seemed plausible was that nosiness has some kind of balancing behavior which acts like probability: putting some of your caring measure on other worlds reduces your caring measure on your own.
Anything structurally similar in your framework?
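To make the balancing claim concrete, here's a toy rendering of the “caring measure” picture (the hypotheses, worlds, and numbers are made up, and this isn't anyone's formal result):

```python
# Each value hypothesis distributes one unit of caring measure across
# worlds; the agent's effective utility in a world is the caring-weighted
# mix of the hypotheses' utilities there.  Because the measure is
# normalized, a hypothesis that spends caring on other worlds ("nosy")
# necessarily has less pull in its own world, which is the sense in which
# nosiness acts like a probability-style weight.

worlds = ["veg-world", "carn-world"]

# caring[hypothesis][world]: how much that hypothesis cares about that world.
caring_non_nosy = {
    "vegetarianism": {"veg-world": 1.0, "carn-world": 0.0},
    "carnivorism":   {"veg-world": 0.0, "carn-world": 1.0},
}
caring_nosy = {
    "vegetarianism": {"veg-world": 0.7, "carn-world": 0.3},  # cares even in carn-world
    "carnivorism":   {"veg-world": 0.3, "carn-world": 0.7},
}

# utility[hypothesis](meat): vegetarianism dislikes meat-eating, carnivorism likes it.
utility = {
    "vegetarianism": lambda meat: -meat,
    "carnivorism":   lambda meat: +meat,
}

def effective_utility(caring, world, meat_eaten):
    """Utility the mixture assigns to `meat_eaten` happening in `world`."""
    return sum(caring[h][world] * utility[h](meat_eaten) for h in caring)

for label, caring in [("non-nosy", caring_non_nosy), ("nosy", caring_nosy)]:
    for w in worlds:
        print(label, w, effective_utility(caring, w, meat_eaten=1.0))
# non-nosy: veg-world -1.0, carn-world +1.0 (each hypothesis rules its own world)
# nosy:     veg-world -0.4, carn-world +0.4 (each hypothesis partly intrudes on the other's world)
```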
By nosy preferences, do you mean something like this?
“I am grateful to Zeus for telling me that cows have feelings. Now I know that, even if Zeus had told me that cows are unfeeling brutes, eating them would still be wrong.”
But that just seems irrational and not worth modeling. Or do you have some other kind of situation in mind?
Pretty much that, actually. It doesn’t seem too irrational, though. Upon looking at a mathematical universe where torture was decided upon as a good thing, it isn’t an obvious failure of rationality to hope that a cosmic ray flips the sign bit of the utility function of an agent in there.
The practical problem with values that care about other mathematical worlds, however, is that if the agent you built has a UDT prior over values, it's an improvement (from the perspective of the prior) for the nosy neighbors, i.e. the values that care about other worlds, to dictate some of what happens in your world, since the marginal contribution of your world to the prior expected utility looks like a linear combination of the various utility functions, weighted by how much they care about your world. So, in practice, it'd be a bad idea to build a UDT value-learning prior containing utility functions that have preferences over all worlds, since running it would add a bunch of extra junk from different utility functions to our world.
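Here's a minimal toy version of that failure mode (the hypotheses, the actions like build_parks/build_shrines, the payoffs, and the prior weights are all hypothetical):

```python
# A UDT value-learning prior over two utility-function hypotheses, where
# one of them is "nosy" and has preferences about our world even though
# it is not the correct value here.  Maximizing prior expected utility
# lets that hypothesis dictate part of what happens in our world.

actions = ["build_parks", "build_shrines"]

# payoff[hypothesis][action]: how much each hypothesis values each action *in our world*.
payoff = {
    "our_values":  {"build_parks": 1.0, "build_shrines": 0.0},
    "nosy_values": {"build_parks": 0.0, "build_shrines": 2.0},  # cares about our world too
}
prior = {"our_values": 0.6, "nosy_values": 0.4}

def prior_score(action):
    # Marginal contribution of our world to prior expected utility:
    # a linear combination of the hypotheses, weighted by prior mass
    # (and implicitly by how strongly each cares about our world).
    return sum(prior[h] * payoff[h][action] for h in prior)

best = max(actions, key=prior_score)
print(best)  # "build_shrines": the nosy hypothesis's junk wins despite its lower prior mass
```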
Are you talking about something like this?
“I’m grateful to HAL for telling me that cows have feelings. Now I’m pretty sure that, even if HAL had a glitch and mistakenly told me that cows are devoid of feeling, eating them would still be wrong.”
That’s valid reasoning. The right way to formalize it is to have two worlds, one where eating cows is okay and another where eating cows is not okay, without any “nosy preferences”. Then you receive probabilistic evidence about which world you’re in, and deal with it in the usual way.
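Concretely, with made-up numbers for the prior, HAL's reliability, and the in-world utilities, the “even if HAL had glitched” intuition falls out of an ordinary Bayesian update:

```python
# Two worlds: one where cows feel and eating them is wrong, one where
# they don't and it's fine.  HAL's testimony is ordinary probabilistic
# evidence with some reliability; no nosy preferences anywhere.

prior = {"cows_feel": 0.5, "cows_dont": 0.5}
reliability = 0.9   # P(HAL reports the true world)
utility_of_eating = {"cows_feel": -10.0, "cows_dont": 1.0}   # harm only counts in-world

def posterior_given(report):
    """Bayes update on HAL reporting `report` ("cows_feel" or "cows_dont")."""
    likelihood = {w: reliability if w == report else 1 - reliability for w in prior}
    unnorm = {w: prior[w] * likelihood[w] for w in prior}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

for report in ["cows_feel", "cows_dont"]:
    post = posterior_given(report)
    eu_eat = sum(post[w] * utility_of_eating[w] for w in post)
    print(report, round(eu_eat, 2))
# cows_feel  -8.9
# cows_dont  -0.1   <- even if HAL had said cows don't feel, eating still has
#                      negative expected utility, purely because HAL might have
#                      glitched; no cross-world preferences needed.
```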
I’m not clear on whether it is rational or not. It seems like behavior we don’t want from a value learner, but I was curious about how “inevitable” it is from attempts to mix updatelessness with value learning. (Perhaps it is a really simple point, but I haven’t thought it entirely through, still.)
I have a recent result about value learning in UDT, it turns out to work very nicely and doesn’t suffer from the problem you describe.
Another way in which there might be something interesting in this direction is if we can further formalize Scott's argument about when Bayesian probabilities are appropriate and inappropriate, which is framed in terms of Pareto-style justifications of Bayesianism.
Well, the version of UDT I’m using doesn’t have probabilities, only a utility function over combined outcomes. It’s just a simpler way to think about things. I think you and Scott might be overestimating the usefulness of probabilities. For example, in the Sleeping Beauty problem, the coinflip is “spacelike separated” from you (under Scott’s peculiar definition), but it can be assigned different “probabilities” depending on your utility function over combined outcomes.
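For instance, here's the standard betting version of Sleeping Beauty (the bet and the numbers are made up; the setup is the usual one, where heads means one awakening and tails means two):

```python
# At each awakening Beauty is offered "pay `price`, get $1 if the coin was
# tails".  The break-even price, i.e. the "probability" her behavior
# reveals for the coinflip, depends only on how her utility function
# aggregates payoffs across the combined outcome.

def expected_gain(price, aggregate):
    # A combined outcome specifies what happens in both branches of the coin.
    heads_payoffs = [0.0 - price]                # one awakening, bet loses
    tails_payoffs = [1.0 - price, 1.0 - price]   # two awakenings, bet wins twice
    return 0.5 * aggregate(heads_payoffs) + 0.5 * aggregate(tails_payoffs)

def break_even(aggregate, lo=0.0, hi=1.0):
    for _ in range(60):                          # bisect for expected_gain == 0
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if expected_gain(mid, aggregate) > 0 else (lo, mid)
    return round(lo, 3)

print(break_even(sum))                           # 0.667: payoffs add per awakening ("thirder")
print(break_even(lambda xs: sum(xs) / len(xs)))  # 0.5:   payoffs are averaged per world ("halfer")
```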
That seems good to understand better in itself, but it isn’t a crux for the argument. Whether you’ve got “probabilities” or a “caring measure” or just raw utility which doesn’t reduce to anything like that, it still seems like you’re justifying it with Pareto-type arguments. Scott’s claim is that Pareto-type arguments won’t apply if you correctly take into account the way in which you have control over certain things. I’m not sure if that makes any sense, but basically the question is whether CCT can make sense in a logical setting where you may have self-referential sentences and so on.
That’s a great question. My current (very vague) idea is that we might need to replace first order logic with something else. A theory like PA is already updateful, because it can learn that a sentence is true, so trying to build updateless reasoning on top of it might be as futile as trying to build updateless reasoning on top of probabilities. But I have no idea what an updateless replacement for first order logic could look like.
Another part of the idea (not fully explained in Scott's post I referenced earlier) is that non-exploited bargaining (AKA bargaining away from the Pareto frontier, AKA cooperating with agents with different notions of fairness) provides a model of why agents should not just take Pareto improvements all the time, and may therefore be a seed of “non-Bayesian” decision theory (insofar as Bayes is about taking Pareto improvements).
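A very simplified sketch of that bargaining idea (my own numbers, and a cruder acceptance rule than anything in Scott's post, so treat it as a cartoon):

```python
# Two agents split a surplus of 10; agent A thinks 5/5 is fair.  If B
# demands more than A's fairness point, A accepts only with probability
# calibrated so that B's *expected* take never exceeds what A considers
# fair.  B then can't profit from demanding more, but when B does, the
# pair sometimes ends up at the disagreement point, i.e. below the
# Pareto frontier, on purpose.

SURPLUS = 10.0
A_FAIR_SHARE_FOR_B = 5.0   # what A considers a fair take for B

def accept_probability(b_demand):
    """A's probability of accepting B's demand for `b_demand` of the surplus."""
    if b_demand <= A_FAIR_SHARE_FOR_B:
        return 1.0
    return A_FAIR_SHARE_FOR_B / b_demand   # caps B's expected take at A's fairness point

for b_demand in [5.0, 6.0, 8.0]:
    q = accept_probability(b_demand)
    expected_b = q * b_demand
    expected_a = q * (SURPLUS - b_demand)
    print(b_demand, round(q, 3), round(expected_b, 2),
          round(expected_a, 2), round(expected_a + expected_b, 2))
# demand 5 -> q=1.0,   B gets 5.0, A gets 5.0,  total 10.0  (on the frontier)
# demand 6 -> q=0.833, B gets 5.0, A gets 3.33, total 8.33  (off the frontier)
# demand 8 -> q=0.625, B gets 5.0, A gets 1.25, total 6.25  (further off)
```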