This idea that possible worlds can trade with each other seems to have fairly radical implications. Together with Eliezer’s idea that agents who know each other’s source code ought to play cooperate in one-shot PD, doesn’t it imply that all sufficiently intelligent and reflective agents across all possible worlds should do a global trade and adopt a single set of preferences that represents a compromise between all of their individual preferences? (Note: the resulting unified preferences are not necessarily characterized by expected utility maximization.)
Let me trace the steps of my logic here. First, take two agents in the same world who know each other’s source code. Clearly, each adopting a common set of preferences can be viewed as playing Cooperate in a one-shot PD. Now take an agent who has identified a counterfactual agent in another possible world (who has in turn identified it). These two should likewise adopt a common set of preferences, each in the expectation that the other will do so as well. Either by iterating this process or by doing a single global trade across all agents in all possible worlds, we should arrive at a common set of preferences shared by everyone.
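To picture the iteration concretely, here is a toy sketch in Python. All the weights and utility functions are made-up placeholders, and “adopting a common set of preferences” is modeled, purely for illustration, as taking a weight-normalized sum of utilities (as noted above, the real compromise need not be expected utility maximization at all):

```python
def merge(agents):
    """Merge (weight, utility) agents into one agent whose utility is the
    weight-normalized sum of its members' utilities, carrying the total
    weight forward so that later merges treat it correctly."""
    total_weight = sum(w for w, _ in agents)
    def combined_utility(outcome):
        return sum(w * u(outcome) for w, u in agents) / total_weight
    return (total_weight, combined_utility)

# Three illustrative agents with different preferences over an outcome x.
a = (1.0, lambda x: x)        # wants more x
b = (2.0, lambda x: -x)       # wants less x
c = (0.5, lambda x: x ** 2)   # wants extreme x

# Merging pairwise and then merging the result with the remaining agent
# yields the same compromise as a single global merge.
pairwise_then_global = merge([merge([a, b]), c])
single_global = merge([a, b, c])

print(pairwise_then_global[1](3.0))  # ~0.43
print(single_global[1](3.0))         # ~0.43 (same compromise either way)
```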
Hmm, maybe this is just what you meant by “one global decision”? Since my original interest was to figure out what probabilities mean in the context of indexical uncertainty, let me ask you, do probabilities have any role to play in your decision theory?
Agents don’t need to merge by changing anything in their individual preferences; merging is just a way of looking at the system, as in process algebra. Three agents can be considered as three separate agents cooperating with each other, or as two agents, one of them a merge of the first two of the original ones, or as one merged agent. All different perspectives on the same system, revealing its structure.
The crucial relation in this picture is that the global cooperation must be a Pareto improvement over cooperations (merges) among any subset of the agents. This is a possible origin for the structure of a fair cooperative strategy. More than that, if each agent that could otherwise be considered individual is divided in this manner into a set of elementary preferences, and all of these elementary preferences are then pooled in the global cooperation, this may provide all the detail that the precise choice of the fair cooperative strategy might need. The “weights” come from the control that each of the elementary agents has over the world.
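For concreteness, here is a minimal sketch of that Pareto condition in Python. The agent names and all payoff numbers are invented for illustration; the check is simply that no agent does better under the best cooperation confined to some subset of the agents than it does under the global cooperation:

```python
# Payoff each agent gets under the global cooperation (made-up numbers).
global_payoff = {"A": 5, "B": 4, "C": 6}

# Payoff each member of a coalition would get under the best cooperation
# restricted to that coalition alone (also made-up numbers).
coalition_payoffs = {
    ("A",): {"A": 2},
    ("B",): {"B": 1},
    ("C",): {"C": 3},
    ("A", "B"): {"A": 3, "B": 2},
    ("A", "C"): {"A": 4, "C": 4},
    ("B", "C"): {"B": 3, "C": 5},
}

def pareto_improves_on_all_subsets(global_payoff, coalition_payoffs):
    """True if no agent prefers some partial merge to the global cooperation."""
    for members in coalition_payoffs.values():
        for agent, payoff in members.items():
            if global_payoff[agent] < payoff:
                return False
    return True

print(pareto_improves_on_all_subsets(global_payoff, coalition_payoffs))  # True
```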
Are you familiar with Cooperative Game Theory? I’m just learning it now, but it sounds very similar to what you’re talking about, and maybe you can reuse some of its theory and math. (For some reason I’ve only paid attention to non-cooperative game theory until recently.) Here’s a quote from page 356 of “Handbook of Game Theory with Economic Applications, Vol 1”:
Of all solution concepts of cooperative games, the core is probably the easiest to understand. It is the set of all feasible outcomes (payoffs) that no player (participant) or group of participants (coalition) can improve upon by acting for themselves.
I couldn’t find anything that “clicked” with cooperation in PD. Above, I wasn’t talking about a kind of Nash equilibrium protected from coalition deviations. The correlated strategy needs to be a Pareto improvement over possible coalition strategies run by subsets of the agents, but it doesn’t need to be stable in any sense. It can be strictly dominated, for example, by either individual or coalition deviations.
The core in Cooperative Game Theory doesn’t have to be a Nash equilibrium. Take a PD game with payoffs (C,C)=(2,2), (C,D)=(-1,3), (D,C)=(3,-1), (D,D)=(0,0). In Cooperative Game Theory, (-1,3) and (3,-1) are not considered improvements that a player can make over (2,2) by acting for himself. Maybe one way to think about it is that there is an agreement phase and an action phase, and the core is the set of agreements that no subset of players can improve upon by publicly going off (and forming their own agreement) during the agreement phase. Once an agreement is reached, no deviation is allowed in the action phase.
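To make that concrete, here is a toy sketch in Python (my own reading, not something from the handbook) that treats this PD as a transferable-utility coalitional game: each player alone can only guarantee the mutual-defection payoff of 0, the two together can secure a total of 4, and the core is the set of splits of that 4 that no player or coalition can improve upon by acting for themselves:

```python
from itertools import combinations

players = [1, 2]

# Characteristic function for the PD above, read as a transferable-utility
# coalitional game: v(S) is the total payoff coalition S can guarantee
# regardless of what the players outside S do.
v = {
    frozenset(): 0,
    frozenset({1}): 0,       # defecting guarantees at least 0
    frozenset({2}): 0,
    frozenset({1, 2}): 4,    # (C,C) yields 2 + 2
}

def in_core(allocation):
    """Is this payoff split efficient and unblockable by any coalition?"""
    # Efficiency: the grand coalition's value is fully distributed.
    if sum(allocation[p] for p in players) != v[frozenset(players)]:
        return False
    # Coalitional rationality: no subset can improve on its share alone.
    for size in range(1, len(players) + 1):
        for coalition in combinations(players, size):
            if sum(allocation[p] for p in coalition) < v[frozenset(coalition)]:
                return False
    return True

print(in_core({1: 2, 2: 2}))   # True: (2,2) is in the core
print(in_core({1: -1, 2: 3}))  # False: totals only 2, and player 1 could guarantee 0 alone
```

Under this reading, (2,2) is exactly the kind of agreement that no player or coalition can improve upon on their own, which matches the handbook’s description.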
Again, I’m just learning Cooperative Game Theory, but that’s my understanding and it seems to correspond exactly to your concept.
Sounds interesting, thank you.
The following is an honest non-rhetorical question: Is it not misleading to use the word ‘cooperation’ as you seem to be using it here? Don’t you still get ‘cooperation’ in this sense if the subsets of agents are not causally interacting with each other (say) but have still semi-Platonically ‘merged’ via implicit logical interaction as compared to some wider context of decision algorithms that by logical necessity exhibit comparatively less merging? This sets up a situation where an agent can (even accidentally) engineer ‘Pareto improvements’ just by improving its decision algorithm (or more precisely replacing ‘its’ decision algorithm (everywhere ‘it’ is instantiated, of course...) with a new one that has the relevant properties of a new, possibly very different logical reference class). It’s a total bastardization of the concept of trade, but it seems to be enough to result in some acausal economy (er, that is, some positive-affect-laden mysterious timeless attractor simultaneously constructed and instantiated by timeful interaction) or ‘global cooperation’ as you put it, and yet despite all that timeless interaction there are many ways it could turn out that would not look to our flawed timeful minds like cooperation. I don’t trust my intuitions about what ‘cooperation’ would look like at levels of organization or intelligence much different from my own, so I’m hesitant to use the word.
(I realize this is ‘debating definitions’ but connotations matter a lot when everything is so fuzzily abstract and yet somewhat affect-laden, I think. And anyway I’m not sure I’m actually debating definitions because I might be missing an important property of Pareto improvements that makes their application to agents that are logical-property-shifting-over-time not only a useless analogy but a confused one.)
This question is partially prompted by your post about the use of the word ‘blackmail’ as if it was technically clear and not just intuitively clear which interactions are blackmail, trade, cooperation, et cetera, outside of human social perception (which is of course probably correlated with more-objectively-correct-than-modern-human meta-ethical truths but definitely not precisely so).
If the above still looks like word salad to you… sigh please let me know so I can avoid pestering you ’til I’ve worked more on making my concepts and sentences clearer. (If it still looks way too much like word salad but you at least get the gist, that’d be good to know too.)
Is it not misleading to use the word ‘cooperation’ as you seem to be using it here?
Yes, it’s better to just say that there is probably some acausal morally relevant interaction, wherein the agents work on their own goals.
(I don’t understand what you were saying about time/causality. I disagree with Nesov_2009’s treatment of preference as magical substance inherent in parts of things.)
Together with Eliezer’s idea that agents who know each other’s source code ought to play cooperate in one-shot PD, doesn’t it imply that all sufficiently intelligent and reflective agents across all possible worlds should do a global trade and adopt a single set of preferences that represents a compromise between all of their individual preferences?
It does, and I discussed that here. An interesting implication that I noticed a few weeks back is that a UFAI would want to cooperate with a counterfactual FAI, so we get a slice of the future even if we fail to build FAI, depending on how probable it was that we would be able to do so. A Paperclip maximizer might wipe out humanity, then catch up on its reflective consistency, look back, notice that there was a counterfactual future where a FAI is built, allot some of the collective preference to humanity, and restore it from the info remaining after the initial destruction (effectively constructing a FAI in the process). (I really should make a post on this. Some of the credit is due to Rolf Nelson for the UFAI deterrence idea.)
A Paperclip maximizer might wipe out humanity, then catch up on its reflective consistency, look back, notice that there was a counterfactual future where a FAI is built, allot some of the collective preference to humanity, and restore it from the info remaining after the initial destruction (effectively constructing a FAI in the process).
This seems fishy to me, given the vast space of possible preferences and the narrowness of the target. Assuming your idea of preference compromise as the convergent solution, what weighting might a reflective AI give to all of the other possible preference states, especially given the mutually exclusive nature of some preferences? If there’s any Occam prior involved at all, something horrifically complicated like human moral value just isn’t worth considering for Clippy.
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched. There is supposedly a nontrivial chance of getting to FAI, so it gets a nontrivial portion of the Paperclipper’s cooperation. FAI gets its share because of the (justified) efficacy of our efforts to create FAI, not because it occupies some special metaphysical place, and not even because of the relation of its values to their human origin, as humans in themselves claim no power.
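A toy illustration of that weighting, with entirely made-up numbers (a minimal sketch of the proportionality only, not of how the actual probabilities would be obtained):

```python
# Made-up numbers: probabilities with which each kind of AGI could have
# been launched from our world. The share each counterfactual AGI's
# preference gets in the cooperation is (very loosely) proportional to them.
launch_probability = {"paperclipper": 0.70, "FAI": 0.20, "other AGI": 0.10}

total = sum(launch_probability.values())
shares = {name: p / total for name, p in launch_probability.items()}
print(shares)  # FAI ends up with a nontrivial (~0.2) share of the compromise
```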
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched.
Mind projection fallacy? How are these probabilities calculated, and on what prior information? Even if the AI can look back on the past and properly say, in some sense, that there was such-and-such a probability of some FAI project succeeding, can’t it just as well look still further back and say there was such-and-such a probability of humanity never evolving in the first place? This just brings us back to the problem orthonormal mentions: our preferences are swamped by the vastness of the space of all possible counterfactual preferences.
You don’t care about counterfactual preferences; you only care about the bearers of these counterfactual preferences being willing to help you, in exchange for you helping them.
It might well be that, prior to the first AGI, the info about the world is too sparse or scrambled for our AGI to coordinate with counterfactual AGIs, that is, to discern what it should do for the others in order to improve the possible outcome for itself. Of those possibilities, most may remain averaged out to nothing specific. Only if the possibility of FAI is clear enough will the trade take form, and sharing a common history until recently is a help in getting that clear info.
I’d like to note a connection between Vladimir’s idea and Robin Hanson’s moral philosophy, which also involves taking into account the wants of counterfactual agents.
I’m also reminded of Eliezer’s Three Worlds Collide story. If Vladimir’s right, many more worlds (in the sense of possible worlds) will be colliding (i.e., compromising/cooperating).
I look forward to seeing the technical details when they’ve been worked out.
Ok, so I see that probability plays a role in determining one’s “bargaining power”, which makes sense. We still need a rule that outputs a compromise set of preferences when given a set of agents, their probabilities, individual preferences, and resources as input, right? Does the rule need to be uniquely fair or obvious, so that everyone can agree to it without discussion? Do you have a suggestion for what this rule should be?
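For concreteness, here is a toy sketch in Python of the shape such a rule might take. The agent names, the inputs, and the probability-and-resource-weighted sum are placeholders for illustration only, not a proposal for what the fair rule actually is:

```python
from typing import Callable, Dict

# Hypothetical interface for the kind of rule being asked about; everything
# below is a placeholder to show the inputs and output, not the rule itself.
Outcome = str
Utility = Callable[[Outcome], float]

def compromise_preferences(
    probabilities: Dict[str, float],  # each agent's "bargaining power"
    preferences: Dict[str, Utility],  # each agent's individual preferences
    resources: Dict[str, float],      # control each agent has over the world
) -> Utility:
    """Return a single compromise preference over outcomes."""
    def compromise(outcome: Outcome) -> float:
        return sum(
            probabilities[name] * resources[name] * preferences[name](outcome)
            for name in preferences
        )
    return compromise

# Illustrative use with made-up numbers and toy outcome descriptions.
u = compromise_preferences(
    probabilities={"clippy": 0.7, "fai": 0.3},
    preferences={"clippy": lambda o: o.count("paperclip"),
                 "fai": lambda o: o.count("human")},
    resources={"clippy": 1.0, "fai": 1.0},
)
print(u("paperclip paperclip human"))  # about 1.7 = 0.7*2 + 0.3*1
```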
Edit: I see you’ve answered some of my questions already in the other reply. This is really interesting stuff!