We should assign some substantial probability to getting some weighting of our preferences (from bargaining with transparency, acausal trade, altruistic brain emulations, etc.). If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g., a ‘Friendly AI’ acting on idealized human preferences.
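A minimal numerical sketch of that claim, under the assumption (made only for illustration) that utility is concave, here logarithmic, in the share of resources devoted to our preferences; the functional form and all the numbers are mine, not anything argued for above:

```python
import math

# Toy model (illustrative only): utility as a function of the fraction of
# the available resources optimized for human preferences. A concave shape
# (here logarithmic) encodes "a moderate weighting gets most of the
# potential utility"; the functional form is an assumption.
def utility(share, total_resources=1e9):
    # +1 avoids log(0); units are arbitrary.
    return math.log(1 + share * total_resources)

u_full = utility(1.0)    # a 'Friendly AI' optimizing everything for us
u_tenth = utility(0.1)   # an inhuman AI that gives our preferences a 10% weighting
u_none = utility(0.0)    # our preferences get no weight at all

print(f"fraction of max utility from a 10% share: {u_tenth / u_full:.2f}")
print(f"fraction of max utility from a 0% share:  {u_none / u_full:.2f}")
# With log utility the 10% share captures roughly 89% of the maximum, so the
# gap between FAI and an inhuman AI that still weights us moderately is far
# from astronomical, whereas a zero share captures none of it.
```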
We should assign some substantial probability to getting some weighting of our preferences (from bargaining with transparency, acausal trade, altruistic brain emulations, etc.).

Game-theoretic considerations are only relevant when you have non-trivial control, not when your atoms are used for something else. If the singleton’s preference gives some weight to your preference, this is a case of having control directly through the singleton’s preference, but the origin of this control is not game-theoretic. If the singleton’s preference has sympathy for your preference, your explicit instantiation in the world doesn’t need to have any control in order to win through the implicit control exerted via the singleton’s preference.
Game-theoretic aggregation, on the other hand, doesn’t work through influence on another agent’s preference. You only get your slice of the world because you already control it. Another agent may perform trade, but this is a trade of control, rearranging what specifically each of you controls, without changing your preferences.
I assume that control will be winner-takes-all, so the preferences of other agents existing at the time only matter if the winner’s preference directly pays their preferences some attention, not because those agents held some limited control from the start.
If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g., a ‘Friendly AI’ acting on idealized human preferences.

My point is that an inhuman AI may give no weight to our preferences, while an FAI may give at least some weight to everyone’s preferences. Game-theoretic trade won’t matter here because agents other than the singleton have no control to bargain with. An FAI gives weight to other preferences not because of trade, but by construction from the start, even if the people it gives weight to don’t exist at all (an FAI giving them weight in its optimization might cause them to come into existence, or bring about some other event at least as good from their perspective).
You only get your slice of the world because you already control it.

This isn’t obviously the most natural way to describe a scenario in which an AI thinks it has a 90% chance of winning a conflict with humanity, but also has the ability to jointly create (with humanity) agents to enforce an agreement (and can do this quickly enough to be relevant), so cuts a deal splitting up the resources of the light cone at a 9:1 ratio.
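A back-of-the-envelope version of that scenario; the fraction of resources destroyed by open conflict is an assumed parameter, not something stated above:

```python
# Toy bargaining model for the 90%-win scenario above (illustrative only).
# Added assumption: open conflict burns a fraction `waste` of the light
# cone's resources, while an enforceable deal burns nothing.
RESOURCES = 1.0   # total resources, normalized
p_ai_wins = 0.9   # the AI's estimated chance of winning a conflict
waste = 0.2       # fraction of resources destroyed by fighting (assumed)

# Expected resources from fighting it out:
ai_fight = p_ai_wins * (1 - waste) * RESOURCES           # 0.72
human_fight = (1 - p_ai_wins) * (1 - waste) * RESOURCES  # 0.08

# A 9:1 split enforced by jointly created agents:
ai_deal, human_deal = 0.9 * RESOURCES, 0.1 * RESOURCES

print(f"AI:       fight {ai_fight:.2f} vs deal {ai_deal:.2f}")
print(f"Humanity: fight {human_fight:.2f} vs deal {human_deal:.2f}")
# Both sides do better under the deal whenever waste > 0 and the agreement
# can actually be enforced, which is why "winner takes all" need not be the
# right description even when one side is heavily favored.
```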
I assume that control will be winner-takes-all,

Given that there are plausible sets of parameter values where this assumption is false, we can’t use it to assess overall expected value to astronomical precision.
Game-theoretic considerations are only relevant when you have non-trivial control,

I specifically mentioned acausal trade, à la Rolf Nelson’s AI-deterrence scheme, which needs non-trivial control only in some region of the ensemble of possibilities the AI considers. Indeed, the AI might treat us well simply because of the chance that benevolent non-human aliens will respond positively if its algorithm has this output (as the benevolent aliens might be modeling the AI’s algorithm).
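A rough expected-utility sketch of that kind of consideration; the payoff structure and every number in it are assumptions made up for illustration:

```python
# Toy model of the deterrence/trade consideration (illustrative only).
# The AI weighs a small cost of treating humans well against the chance that
# some other powerful agent (benevolent aliens, or a counterfactual FAI) is
# modeling its algorithm and will reward that output.
cost_of_sparing_humans = 1e-6   # share of resources spent on us (assumed)
p_being_modeled = 1e-3          # chance such a modeler exists and responds (assumed)
reward_if_modeled = 1e-2        # share of resources granted in return (assumed)

eu_spare = -cost_of_sparing_humans + p_being_modeled * reward_if_modeled
eu_ignore = 0.0

print(f"EU(spare humans):  {eu_spare:+.2e}")
print(f"EU(ignore humans): {eu_ignore:+.2e}")
# With these made-up numbers, sparing humans is the higher-expected-utility
# output even for an AI that places zero terminal value on human preferences.
```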
Yes, I forgot about that (though I remain uncertain about how well this argument works, not having worked out a formal model). To summarize the arguments for why the future is still significantly more valuable than what we have now, even if we run into Unfriendly AI:
(1) If there is a non-negligible chance that we’ll have FAI in the future, or that we could’ve created FAI if some morally random facts in the past (such as the coin in counterfactual mugging) had been different, then we can estimate the present expected value of the world as pretty high, since a term for whole universes (counterfactual or merely probable) optimized towards your specific preference is present in the expected utility computation. This counterfactual value is present even if it’s certain that the future contains Unfriendly AI.
(2) It’s even better than that, because the unfriendly singletons will also optimize their worlds towards your preference a little for game-theoretic reasons, even if they don’t care at all about your preference. This game is not with you personally, a human who controls very little and whose control can’t compel a singleton to any significant extent, but with the counterfactual FAIs. The FAIs that could have been created, but weren’t, can act as Omega in counterfactual mugging, making it profitable for the indifferent singletons to pay the FAI a little, in the form of FAI-favored world-optimization. (A toy expected-utility version of (1) and (2) is sketched just after this comment.)
(3) Some singletons that don’t follow your preference in particular, but have a remotely human-like preference, will have a component of sympathy in their preference, and will dole out to your preference some fair portion of control in their world, one that is much greater than the portion of control you held originally. This sympathy seems to be godshatter of the game-theoretic considerations that compel even singletons with non-evolved (artificial, random) preferences, according to arguments (1) and (2).
The conclusion seems to be that creating an Unfriendly AI is significantly better than ending up with no rational singleton at all (an existential disaster that terminates civilization), but significantly worse than a small chance of FAI.
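A toy expected-utility version of (1) and (2), in the spirit of counterfactual mugging; the probabilities, payoffs, and the size of the tribute the UFAI pays are all illustrative assumptions:

```python
# Toy model of considerations (1) and (2) above (illustrative only).
# A fair coin (a "morally random fact") decides whether the world gets a FAI
# or a UFAI. The counterfactual FAI can reward, out of its own branch's
# resources, UFAIs that pay a small tribute to human preferences.
p_fai = 0.5          # chance the coin lands on FAI (assumed)
u_fai_world = 1.0    # value to us of a FAI-optimized world, normalized
u_nothing = 0.0      # value to us of a UFAI world that gives us no weight
tribute = 0.05       # share of a UFAI world optimized for us via the trade (assumed)

# (1) Even if our branch is certain to get the UFAI, the present expected
#     value, taken before the coin is flipped, already contains the FAI branch:
ev_no_trade = p_fai * u_fai_world + (1 - p_fai) * u_nothing

# (2) If the counterfactual FAI's leverage makes paying the tribute worthwhile
#     for the UFAI, the UFAI branch becomes worth a little more to us as well:
ev_with_trade = p_fai * u_fai_world + (1 - p_fai) * tribute

print(f"EV without the counterfactual trade: {ev_no_trade:.3f}")
print(f"EV with the counterfactual trade:    {ev_with_trade:.3f}")
# The first term is argument (1); the small boost to the UFAI branch is
# argument (2). Neither requires the humans in the UFAI branch to hold any
# bargaining power themselves.
```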
Your comments are mostly good, but I dispute the final assumption that no singleton ⇒ disaster. There has as yet been no investigation into the merits of a singleton vs. an economy (or ecosystem) of independent agents.
If we were living in the 18th century, it would be reasonable to suppose that the only stable situation is one where one agent is king. But we are not.
Yep, these are key considerations.
So there’s the utility difference between business-as-usual (no AI), and getting a small share of resources optimized for your preference, and the utility difference between getting small and large shares of resources. If the second difference is much larger than the first, then (1) is crucial, and (2) and (3) are not so good. But if the first difference is much bigger than the second, the pattern is the reverse.
And if we’re comparing expected utility conditional on no local FAI here with expected utility conditional on FAI here, moderate credences can suffice (depending on the shape of your utility function).
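A small sketch of the comparison between those two differences; the utility values and the credence in FAI are placeholders chosen only for illustration:

```python
# Toy comparison of the two utility gaps discussed above (illustrative only).
# Outcomes: no AI at all, a small share of resources optimized for us
# (e.g. via a UFAI that weights us a little), and a large share (FAI).
u_no_ai = 0.0
u_small_share = 0.8   # placeholder: a small share captures most of the value
u_large_share = 1.0

first_difference = u_small_share - u_no_ai         # business-as-usual -> small share
second_difference = u_large_share - u_small_share  # small share -> large share

p_fai = 0.2  # a "moderate credence" in getting FAI rather than UFAI (assumed)

# If UFAI still yields the small share (via the considerations above), the
# expected value conditional on getting *some* AI is already high:
eu_some_ai = p_fai * u_large_share + (1 - p_fai) * u_small_share

print(f"first difference (no AI -> small share):  {first_difference:.2f}")
print(f"second difference (small -> large share): {second_difference:.2f}")
print(f"EU given some AI, with p(FAI) = {p_fai}:  {eu_some_ai:.2f}")
# When the first difference dominates (as with these numbers), most of the
# value is secured by getting any share at all, and a moderate p(FAI)
# suffices; when the second dominates, almost everything hinges on FAI.
```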
Whether the FAI is local or not can’t matter; whether something is real or counterfactual is morally irrelevant. If we value small control, it means that the possible worlds with UFAI are significantly valuable, just as the worlds with FAI are, provided there are enough worlds with FAI to weakly control the UFAIs; and if we value only large control, it means that the possible worlds with UFAI are not as valuable, and it’s mostly the worlds with FAI that matter.
What do “small control” and “large control” mean?
But if the first difference is much bigger than the second, the pattern is the reverse.

It’s not literally the reverse, because if you don’t create those FAIs, nobody will, and so the UFAIs won’t have the incentive to give you your small share. It’s never good to increase the probability of UFAI at the expense of the probability of FAI. I’m not sure whether there is any policy guideline suggested by these considerations, conditional on the pattern in utility you discuss. What should we do differently depending on how much we value small vs. large control? In both cases, it’s still clearly preferable to have UFAI rather than no future AI at all, and FAI rather than UFAI.
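One way to make the two orderings in that last sentence explicit; the utility values are placeholders and the probability shifts are arbitrary:

```python
# Toy check of the two claims above (illustrative only): with any utilities
# ordered u_fai > u_ufai > u_no_ai, shifting probability from FAI to UFAI
# always lowers expected value, while UFAI still beats no future AI at all.
u_no_ai, u_ufai, u_fai = 0.0, 0.8, 1.0   # placeholder values, ordered

def expected_value(p_fai, p_ufai):
    """Expected value, with the residual probability going to 'no future AI'."""
    p_none = 1.0 - p_fai - p_ufai
    return p_fai * u_fai + p_ufai * u_ufai + p_none * u_no_ai

# Moving 0.1 of probability from FAI to UFAI lowers expected value...
print(round(expected_value(p_fai=0.3, p_ufai=0.3), 2))   # 0.54
print(round(expected_value(p_fai=0.2, p_ufai=0.4), 2))   # 0.52
# ...while moving 0.1 from 'no AI' to UFAI raises it:
print(round(expected_value(p_fai=0.2, p_ufai=0.5), 2))   # 0.6
```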
Worrying less about our individual (or national) shares, and being more cooperative with other humans or uploads seems like an important upshot.