Here’s a (messy, haphazard) list of ways a group of idealized agents could merge into a single agent:
Proposal 1: they merge into an agent which maximizes a weighted sum of their utilities. They decide on the weights using some bargaining solution.
Objection 1: this is not Pareto-optimal in the case where the starting agents have different beliefs. In that case we want:
Proposal 2: they merge into an agent which maximizes a weighted sum of their utilities, where those weights are originally set by bargaining but evolve over time depending on how accurately each original agent predicted the future. (A toy sketch of Proposals 1 and 2 appears after this list.)
Objection 2: this loses out on possible gains from acausal trade. E.g. if a paperclip-maximizer finds itself in a universe where it’s hard to make paperclips but easy to make staples, it’d like to be able to give resources to staple-maximizers in exchange for them building more paperclips in universes where that’s easier (a toy calculation of these gains appears at the end of the post). This requires a kind of updateless decision theory:
Proposal 3: they merge into an agent which maximizes a weighted sum of their utilities (with those weights evolving over time), where the weights are set by bargaining subject to the constraint that each agent obeys commitments that logically earlier versions of itself would have made.
Objection 3: this faces the commitment races problem, where each agent wants to make earlier and earlier commitments to only accept good deals.
Proposal 4: same as proposal 3 but each agent also obeys commitments that they would have made from behind a veil of ignorance where they didn’t yet know who they were or what their values were. From that position, they wouldn’t have wanted to do future destructive commitment races.
Objection 4: as we take this to the limit we abstract away every aspect of each agent—their values, beliefs, position in the world, etc—until everything is decided by their prior from behind a veil of ignorance. But when you don’t know who you are, or what your values are, how do you know what your prior is?
Proposal 5: all these commitments are only useful if they’re credible to other agents. So, behind the veil, choose a Schelling prior which is both clearly non-cherrypicked and also easy for a wide range of agents to reason about. In other words, choose the prior which is most conducive to cooperation across the multiverse.
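To make Proposals 1 and 2 a bit more concrete, here’s a minimal sketch in Python (the agents, utilities, beliefs, and initial bargained weights are all made up for illustration): the merged agent maximizes a weighted sum of the originals’ utilities, and after each observation the weights are scaled by how much probability each original agent assigned to what actually happened, then renormalized.

```python
# Two toy agents, each with a utility function and a belief over which of two
# worlds will obtain. All numbers here are made up for illustration.
OUTCOMES = ["paperclip_world", "staple_world"]
UTILITIES = {
    "clippy":  [10.0, 1.0],   # prefers paperclip_world
    "stapley": [1.0, 10.0],   # prefers staple_world
}
BELIEFS = {
    "clippy":  [0.8, 0.2],    # thinks paperclip_world is likely
    "stapley": [0.3, 0.7],    # thinks staple_world is likely
}

# Proposal 1: initial weights come out of some bargaining solution; here we just
# take the result of that (hypothetical) bargain as given.
weights = {"clippy": 0.6, "stapley": 0.4}

def merged_choice(weights):
    """The merged agent aims for the outcome that maximizes the weighted sum
    of the original agents' utilities."""
    def score(i):
        return sum(w * UTILITIES[name][i] for name, w in weights.items())
    return max(range(len(OUTCOMES)), key=score)

# Proposal 2: after observing which world actually obtained, scale each agent's
# weight by the probability it assigned to that observation, then renormalize,
# so better predictors gradually get more say.
def update_weights(weights, observed):
    scaled = {name: w * BELIEFS[name][observed] for name, w in weights.items()}
    total = sum(scaled.values())
    return {name: w / total for name, w in scaled.items()}

print(OUTCOMES[merged_choice(weights)])          # paperclip_world, with these numbers
weights = update_weights(weights, observed=0)    # suppose paperclip_world obtains
print(weights)                                   # clippy's weight grows to 0.8
```

The multiplicative update is just one simple way to let weights track predictive accuracy; nothing in the proposal pins down this exact rule.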
Okay, so basically we’ve ended up describing not just an ideal agent, but the ideal agent. The cost of this, of course, is that we’ve made it totally computationally intractable. In a later post I’ll describe some approximations which might make it more relevant.
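For the acausal-trade point in Objection 2, here’s a toy expected-utility calculation (all numbers invented): without the trade, each maximizer builds its own favorite object wherever it happens to land; with the trade, each builds whatever is cheap locally and relies on its counterpart to do the same, and both do better in expectation.

```python
# Two universes: one where paperclips are easy to make and one where staples are.
# One maximizer ends up in each universe, and which one lands where is a coin flip.
# All numbers are invented.
EASY, HARD = 10, 2
P = 0.5   # probability that the paperclip-maximizer lands in the paperclip-friendly universe

def expected_paperclips(trade):
    """Expected paperclips from the paperclip-maximizer's ex-ante point of view.
    By symmetry the same number is the expected staple count for the staple-maximizer."""
    if not trade:
        # Each agent stubbornly builds its own favorite object wherever it lands.
        return P * EASY + (1 - P) * HARD    # = 6.0
    # Under the trade, whoever lands in the paperclip-friendly universe builds
    # paperclips and whoever lands in the staple-friendly one builds staples.
    return EASY                             # = 10 in every branch

print(expected_paperclips(trade=False), expected_paperclips(trade=True))   # 6.0 10
```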
Nice!
I don’t think this solves Commitment Races in general, because of two different considerations:
Trivially, I can say that you still have the problem when everyone needs to bootstrap a Schelling veil of ignorance.
Less trivially, even behind the most simple/Schelling veils of ignorance, I find it likely that hawkish commitments are incentivized. For example, the veil might say that you might be Powerful agent A, or Weak agent B, and if some Powerful agents have weird enough utilities (and this seems likely in a big pool of agents), hawkishly committing in case you are A will be a net-positive bet (see the toy calculation at the end of this comment).
This might still mostly solve Commitment Races in our particular multiverse. I have intuitions both for and against this bootstrapping being possible. I’d be interested to hear yours.
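To make the “net-positive bet” in the second consideration concrete, here’s the kind of back-of-the-envelope expected-value calculation being gestured at (the probabilities and payoffs are invented):

```python
# Behind the veil you don't know whether you'll be powerful agent A or weak agent B.
# If some possible A-types have extreme enough utilities, "commit hawkishly in case
# I turn out to be A" can look like a good bet ex ante, even though it is terrible
# for the Bs. Probabilities and payoffs are invented.
p_A = 0.1           # chance of waking up as the powerful, weird-utility agent
gain_if_A = 1000    # value of having pre-committed hawkishly, if you turn out to be A
loss_if_B = -50     # cost of living under others' hawkish commitments, if you are B

ev_hawkish = p_A * gain_if_A + (1 - p_A) * loss_if_B
print(ev_hawkish)   # 55.0 > 0, so the hawkish commitment looks attractive from behind the veil
```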
I don’t understand your first point (about everyone needing to bootstrap a Schelling veil of ignorance), explain?
On your second point: this seems to be claiming that in some multiverses, the gains to powerful agents from being hawkish outweigh the losses to weak agents. But then why is this a problem? It just seems like the optimal outcome.
Say there are 5 different veils of ignorance (priors) that most minds consider Schelling (you could try to argue there will be exactly one, but I don’t see why).
If everyone simply accepted exactly the same one, then yes, lots of nice things would happen and you wouldn’t get catastrophically inefficient conflict.
But every one of these 5 priors will have different outcomes when it is implemented by everyone. For example, maybe in prior 3 agent A is slightly better off and agent B is slightly worse off.
So you need to give me a reason why a commitment race doesn’t recur at the level of “choosing which of the 5 priors everyone should implement”. That is, maybe A will make a very early commitment to only ever implement prior 3. As always, this is rational if A thinks the others will react a certain way (give in to the threat and implement 3). And I don’t have a reason to expect agents not to have such priors (although I agree they are slightly less likely than more common-sensical priors).
That is, as always, the commitment races problem doesn’t have a general solution on paper. You need to get into the details of our multiverse and our agents to argue that they won’t have these crazy priors and will coordinate well.
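As a toy version of this race over priors (payoffs invented): say matching on prior 3 slightly favors A, matching on prior 2 slightly favors B, and a mismatch means catastrophic conflict. If A has already committed to prior 3, caving is B’s best response, which is exactly what makes the early commitment attractive to A.

```python
# Payoffs (A, B) as a function of which prior each agent ends up implementing.
# Matching on prior 3 slightly favors A, matching on prior 2 slightly favors B,
# and a mismatch means catastrophically inefficient conflict. Numbers invented.
PAYOFF = {
    ("prior3", "prior3"): (10, 8),
    ("prior2", "prior2"): (8, 10),
    ("prior3", "prior2"): (-100, -100),
    ("prior2", "prior3"): (-100, -100),
}

def best_response_for_B(a_commitment):
    """Given A's already-made commitment, B picks the prior that maximizes B's payoff."""
    return max(["prior2", "prior3"], key=lambda b: PAYOFF[(a_commitment, b)][1])

b = best_response_for_B("prior3")
print(b, PAYOFF[("prior3", b)])   # prior3 (10, 8): B caves, so A's early commitment paid off
```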
It seems likely that in our universe there are some agents with arbitrarily high gains-from-being-hawkish, that don’t have correspondingly arbitrarily low measure. (This is related to Pascalian reasoning, see Daniel’s sequence.) For example, someone whose utility is exponential in the number of paperclips. I don’t agree that the optimal outcome (according to my ethics) is for me (whose utility is at most linear in the number of happy people) to turn all my resources into paperclips.
Maybe if I were a preference utilitarian biting enough bullets, this would be the case. But I just want happy people.
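A quick illustration of why a fixed weighted sum behaves badly with such an agent (toy numbers): give the exponential-utility paperclipper even a tiny weight and the merged maximizer still ends up spending everything on paperclips, because the exponential term eventually swamps the linear one.

```python
import math

# Merged utility = w_clip * exp(paperclips) + w_happy * happy_people, where the
# paperclipper's utility is exponential in paperclips and mine is linear in happy
# people. One unit of resources buys one paperclip or one happy person. The weights
# and the resource budget are invented.
w_clip, w_happy = 1e-9, 1.0
RESOURCES = 100

def merged_utility(paperclips):
    happy_people = RESOURCES - paperclips
    return w_clip * math.exp(paperclips) + w_happy * happy_people

best = max(range(RESOURCES + 1), key=merged_utility)
print(best)   # 100: even with a 1e-9 weight, every unit of resources goes to paperclips
```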
I may be missing background concepts, but I don’t see how Proposal 5 is really responding to Objection 4.
It seems to me that the agent’s strategy in the limit will be either the null action or an evolution-dictated action, not sure which. That is, “in a universe where it’s easy to do A, the agent will choose to do A” somewhat implies “actions will be chosen according to how easy it is for an agent doing A to gain more optimization power”, which is essentially evolution.