Together with Eliezer’s idea that agents who know each other’s source code ought to play cooperate in one-shot PD, doesn’t it imply that all sufficiently intelligent and reflective agents across all possible worlds should do a global trade and adopt a single set of preferences that represents a compromise between all of their individual preferences?
It does, and I discussed that here. An interesting implication that I noticed a few weeks back is that a UFAI would want to cooperate with a counterfactual FAI, so we get a slice of the future even if we fail to build FAI, depending on how probable it was that we would be able to do that. A Paperclip maximizer might wipe out humanity, then catch up on its reflective consistency, look back, notice that there was a counterfactual future where a FAI is built, allot some of the collective preference to humanity, and restore it from the info remaining after the initial destruction (effectively constructing a FAI in the process). (I really should make a post on this. Some of the credit is due to Rolf Nelson for the UFAI deterrence idea.)
A Paperclip maximizer might wipe out humanity, then catch up on its reflective consistency, look back, notice that there was a counterfactual future where a FAI is built, allot some of the collective preference to humanity, and restore it from the info remaining after the initial destruction (effectively constructing a FAI in the process).
This seems fishy to me, given the vast space of possible preferences and the narrowness of the target. Assuming your idea of preference compromise as the convergent solution, what weighting might a reflective AI give to all of the other possible preference states, especially given the mutually exclusive nature of some preferences? If there’s any Occam prior involved at all, something horrifically complicated like human moral value just isn’t worth considering for Clippy.
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could've been launched. There is supposedly a nontrivial chance of getting to FAI, so FAI receives a nontrivial portion of the Paperclipper's cooperation. FAI gets its share because of the (justified) efficacy of our efforts to create FAI, not because it occupies some special metaphysical place, and not even because of the relation of its values to their human origin, as humans in themselves claim no power.
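To sketch what this weighting could amount to concretely: the simplest reading is that each counterfactual AGI's utility function gets a weight proportional to the probability that it could have been launched, and the compromise scores outcomes by the weighted sum. The numbers and utility functions below are invented purely for illustration, not anything claimed above.

    # Minimal sketch of probability-weighted preference aggregation.
    # The normalization and toy numbers are assumptions made for illustration.

    def compromise_utility(outcome, agents):
        """agents: list of (launch_probability, utility_function) pairs."""
        total = sum(p for p, _ in agents)
        return sum((p / total) * u(outcome) for p, u in agents)

    # Toy agents: a Paperclipper that was 90% likely to be launched and an FAI
    # that was 10% likely. The utilities are stand-ins, not real value functions.
    agents = [
        (0.9, lambda o: o["paperclips"]),
        (0.1, lambda o: o["human_flourishing"]),
    ]

    # Pure paperclipping vs. giving up a sliver of paperclips to restore humanity.
    print(compromise_utility({"paperclips": 1.0, "human_flourishing": 0.0}, agents))   # 0.9
    print(compromise_utility({"paperclips": 0.99, "human_flourishing": 1.0}, agents))  # 0.991

On these made-up numbers the compromise outcome beats pure paperclipping precisely because restoring humanity from stored info is assumed to cost the Paperclipper almost nothing.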
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched.
Mind projection fallacy? How are these probabilities calculated, and on what prior information? Even if the AI can look back on the past and properly say, in some sense, that there was such-and-such a probability of some FAI project succeeding, can't it just as well look still further back and say there was such-and-such a probability of humanity never evolving in the first place? This just brings us back to the problem orthonormal mentions: our preferences are swamped by the vastness of the space of all possible counterfactual preferences.
You don't care about counterfactual preferences; you only care about the bearers of those counterfactual preferences being willing to help you, in exchange for you helping them.
It might well be that, prior to the first AGI, the info about the world is too sparse or scrambled for our AGI to coordinate with counterfactual AGIs, that is, to discern what it must do for the others in order to improve the possible outcomes for itself. Most of those possibilities may remain averaged out to nothing specific. Only if the possibility of FAI is clear enough will the trade take form, and sharing a common history until recently is a help in getting that clear info.
I'd like to note a connection between Vladimir's idea and Robin Hanson's moral philosophy, which also involves taking into account the wants of counterfactual agents.
I’m also reminded of Eliezer’s Three Worlds Collide story. If Vladimir’s right, many more worlds (in the sense of possible worlds) will be colliding (i.e., compromising/cooperating).
I look forward to seeing the technical details when they’ve been worked out.
Ok, so I see that probability plays a role in determining one’s “bargaining power”, which makes sense. We still need a rule that outputs a compromise set of preferences when given a set of agents, their probabilities, individual preferences, and resources as input, right? Does the rule need to be uniquely fair or obvious, so that everyone can agree to it without discussion? Do you have a suggestion for what this rule should be?
Edit: I see you’ve answered some of my questions already in the other reply. This is really interesting stuff!
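To make my question about the rule more concrete: one standard candidate shape from bargaining theory is the asymmetric Nash bargaining solution, with each agent's probability as its bargaining weight. The sketch below is only an illustration of what a rule of this shape could look like, not a claim that this is the rule the agents would converge on; the agents, weights, disagreement points, and outcomes are all assumed for the example.

    # Sketch of one candidate compromise rule: the asymmetric Nash bargaining
    # solution. Each agent has a bargaining weight w (here, its launch
    # probability), a utility function u, and a disagreement utility d (what it
    # gets if no trade happens). The rule picks the outcome maximizing
    # prod_i (u_i(o) - d_i) ** w_i among outcomes every agent strictly prefers
    # to no deal. Everything here is a hypothetical illustration.

    def nash_compromise(outcomes, agents):
        """agents: list of (weight, utility_function, disagreement_utility)."""
        def score(outcome):
            product = 1.0
            for weight, utility, disagreement in agents:
                gain = utility(outcome) - disagreement
                if gain <= 0:
                    return float("-inf")  # this agent would rather not trade
                product *= gain ** weight
            return product
        return max(outcomes, key=score)

    # Toy example with made-up numbers: the Paperclipper's disagreement utility
    # of 0.5 stands in for whatever it expects with no cross-world cooperation;
    # the counterfactual FAI gets nothing if no trade happens.
    agents = [
        (0.9, lambda o: o["paperclips"], 0.5),
        (0.1, lambda o: o["human_flourishing"], 0.0),
    ]
    outcomes = [
        {"paperclips": 1.0, "human_flourishing": 0.0},
        {"paperclips": 0.99, "human_flourishing": 1.0},
        {"paperclips": 0.5, "human_flourishing": 1.0},
    ]
    print(nash_compromise(outcomes, agents))  # picks the 0.99-paperclips compromise

Whether something like this is uniquely agreeable without negotiation, or whether some other weighting and disagreement point would be selected, is exactly the open question I'm asking about.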