The more the AI’s preference diverges from ours, the more we lose, and this loss is on an astronomical scale (even if the preference diverges relatively little).
This depends on something like aggregative utilitarianism. If additional resources have diminishing marginal value in fulfilling human aims, then getting a little slice of the universe (in the course of negotiating terms of surrender with the inhuman AI, if it can make credible commitments, or because we serve as acausal bargaining chips with other civilizations elsewhere in the universe) may be enough. Is getting 100% of the lightcone a hundred times better than 1%?
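As a rough illustration of the question, here is a minimal sketch comparing a linear (aggregative) utility with a logarithmic one, used only as a stand-in for diminishing marginal value. The function names and the figure of 1e80 resource units are illustrative assumptions, not anything claimed in this discussion.

```python
import math

TOTAL_RESOURCES = 1e80  # hypothetical number of resource units in the lightcone


def linear_utility(resources: float) -> float:
    """Aggregative view: value is proportional to resources."""
    return resources


def log_utility(resources: float) -> float:
    """One possible diminishing-returns view (an assumption): value grows with log10 of resources."""
    return math.log10(1.0 + resources)


def ratio_full_to_share(utility, share: float) -> float:
    """How many times better is the whole lightcone than the given share of it?"""
    return utility(TOTAL_RESOURCES) / utility(share * TOTAL_RESOURCES)


linear_ratio = ratio_full_to_share(linear_utility, 0.01)
log_ratio = ratio_full_to_share(log_utility, 0.01)
print(f"linear: {linear_ratio:.1f}x, log: {log_ratio:.3f}x")
```

Under the linear assumption the whole lightcone is about a hundred times better than 1% of it; under this particular logarithmic assumption it is only a few percent better. Which shape is closer to actual human preference is exactly what is in dispute.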
Is getting 100% of the lightcone a hundred times better than 1%?
I think yes, if we take into account that the more of the lightcone we (our FAI) get, the more trading opportunities we would have with UFAI in other possible worlds. Diminishing marginal value shouldn’t apply across possible worlds, because otherwise it would imply gross violations of expected utility maximization.
Also, I suspect that there are possible worlds with much greater resources than our universe (perhaps with physics that allow hypercomputation, or just many orders of magnitude more total exploitable resources), and some of them would have potential trading partners who are willing to give us a small share of their world for a large share of ours. We may eventually achieve most of our value from trading with them. But of course such trade wouldn’t be possible if we didn’t have something to trade with!
Interesting. This suggests thinking about FAI not as using its control to produce terminal value in its own world, but as using its control to buy as much terminal value as it can, in various world-programs. Since it doesn’t matter where the value is produced, most of the value doesn’t have to be produced in the possible worlds with FAIs in them. Indeed, it sounds unlikely that specifically the FAI worlds will be optimal for FAI-value optimization. FAIs (and the worlds they control) act as instrumental leverage, a way of controlling the global mathematical universe into having more value for our preference.
Thus, more FAIs means stronger control over the mathematical universe, while more UFAIs mean that the mathematical universe is richer, and so the FAIs can get more value out of it with the same control. The metaphors of trade and comparative advantage start applying again, not on the naive level of cohabitation on the same world, but on the level of the global ontology. Mathematics grants you total control over your domain, so that your “atoms” can’t be reused for something else by another stronger agent, and so you do benefit from most superintelligent “aliens”.
Yes, assuming that trading across possible worlds can be done in the first place. One thing that concerns me is the combinatorial explosion of potential trading partners. How do they manage to “find” each other?
It’s the same combinatorial explosion as with the future possible worlds. Even though you can’t locate individual valuable future outcomes (through certain instrumental sequences of exact events), you can still make decisions about your actions leading to certain consequences “in bulk”, and I expect the trade between possible worlds can be described similarly (after all, it does work on exactly the same decision-making algorithm). Thus, you usually won’t know exactly who you are trading with, but on net you can estimate that your actions are in the right direction.
I currently agree it’s a bad analogy and I no longer endorse the position that global acausal trade is probably feasible, although its theoretical possibility seems to be a stable conclusion.
There are two distinct issues here: (1) how highly would a human with the original preference value a universe which only gives a small weight to their preference, and (2) how likely is the changed preference to give any weight whatsoever to the original preference, in other words to produce a universe to any extent valuable to the original preference, even if the original preference values universes only weakly optimized in its direction.
Moving to a different preference is different from lowering weight of the original preference. A slightly changed (formal definition of) preference may put no weight at all on the preceding preference. The optimal outcome according to the modified preference can thus be essentially moral noise, paperclips, to the original preference. Giving a small slice of the universe, on the other hand, is what you get out of aggregation of preference, and a changed preference doesn’t necessarily have a form of aggregation that includes original preference. (On the other hand, there is a hope that human-like preferences include sympathy, which does make them aggregate preferences of other persons with some weight.)
We should assign some substantial probability to getting some weighting of our preference (from bargaining with transparency, acausal trade, altruistic brain emulations, etc). If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g. a ‘Friendly AI’ acting on idealized human preferences.
We should assign some substantial probability to getting some weighting of our preference (from bargaining with transparency, acausal trade, altruistic brain emulations, etc).
Game-theoretic considerations are only relevant when you have non-trivial control, not when your atoms are used for something else. If the singleton’s preference gives some weight to your preference, this is a case of having control directly through the singleton’s preference, but the origin of this control is not game-theoretic. If the singleton’s preference has sympathy for your preference, your explicit instantiation in the world doesn’t need to have any control in order to win through the implicit control via the singleton’s preference.
Game-theoretic aggregation, on the other hand, doesn’t work by influencing another agent’s preference. You only get your slice of the world because you already control it. Another agent may perform trade, but this is trade of control, rearranging what specifically each of you controls, without changing your preferences.
I assume that control will be winner-takes-all, so the preferences of other agents existing at the time matter only if the winner’s preference directly pays attention to them, and not merely because those agents had some limited control from the start.
If a moderate weighting of our preferences gets most of the potential utility, then the expected utility of inhuman AIs getting powerful won’t be astronomically less than the expected utility of, e.g. a ‘Friendly AI’ acting on idealized human preferences.
My point is that inhuman AI may give no weight to our preference, while FAI may give at least some weight to everyone’s preference. Game-theoretic trade won’t matter here because agents other than the singleton have no control to bargain with. FAI gives weight to other preferences not because of trade, but by construction from the start, even if people it gives weight to don’t exist at all (FAI giving them weight in optimization might cause them to appear, or a better event at least as good from their perspective).
You only get your slice of the world because you already control it.
This isn’t obviously the most natural way to describe a scenario in which an AI thinks it has a 90% chance of winning a conflict with humanity, but also has the ability to jointly create (with humanity) agents to enforce an agreement (and can do this quickly enough to be relevant), so cuts a deal splitting up the resources of the light cone at a 9:1 ratio.
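The 90%/9:1 scenario can be sketched numerically. This is a minimal sketch under stated assumptions: resources are normalized so the whole light cone is 1.0, conflict destroys nothing, the loser gets nothing, and the square root is a hypothetical stand-in for diminishing marginal value. None of these function names or utility shapes come from the discussion.

```python
import math


def expected_utility_conflict(p_win: float, utility) -> float:
    """Winner takes the whole light cone (1.0); the loser gets nothing (0.0)."""
    return p_win * utility(1.0) + (1.0 - p_win) * utility(0.0)


def utility_of_deal(share: float, utility) -> float:
    """An enforced agreement yields a guaranteed share of the light cone."""
    return utility(share)


P_AI_WINS = 0.9  # the AI's estimated chance of winning, from the scenario

utilities = {
    "linear": lambda x: x,  # aggregative: no diminishing returns
    "concave": math.sqrt,   # hypothetical diminishing-returns shape
}

results = {}
for name, u in utilities.items():
    results[name] = {
        "human_conflict": expected_utility_conflict(1.0 - P_AI_WINS, u),
        "human_deal": utility_of_deal(1.0 - P_AI_WINS, u),
        "ai_conflict": expected_utility_conflict(P_AI_WINS, u),
        "ai_deal": utility_of_deal(P_AI_WINS, u),
    }

for name, row in results.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

With linear utility both sides are indifferent between fighting and the 9:1 split; with the concave shape both sides strictly prefer the deal. Under these assumptions, then, the deal does not require the AI to care about human preference at all.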
I assume that control will be winner-takes-all,
Given that there are plausible sets of parameter values where this assumption is false, we can’t use it to assess overall expected value to astronomical precision.
Game-theoretic considerations are only relevant when you have non-trivial control,
I specifically mentioned acausal trade, a la Rolf Nelson’s AI-deterrence scheme, which needs non-trivial control only in some region of the ensemble of possibilities the AI considers. Indeed, the AI might treat us well simply because of the chance that benevolent non-human aliens will respond positively if its algorithm has this output (as the benevolent aliens might be modeling the AI’s algorithm).
I specifically mentioned acausal trade, a la Rolf Nelson’s AI-deterrence scheme, which needs non-trivial control only in some region of the ensemble of possibilities the AI considers.
Yes, I forgot about that (though I remain uncertain about how well this argument works, not having worked out a formal model). To summarize the arguments for why the future is still significantly more valuable than what we have now, even if we run into Unfriendly AI:
(1) If there is a non-negligible chance that we’ll have FAI in the future, or that we could’ve created FAI if some morally random facts in the past (such as the coin in counterfactual mugging) were different, then we can estimate the present expected value of the world as pretty high, because the expected utility computation includes a term for whole universes being optimized (counterfactually or with some probability) towards your specific preference. The counterfactual value is present even if it’s certain that the future contains Unfriendly AI.
(2) It’s even better, because the unfriendly singletons will also optimize their worlds towards your preference a little for game-theoretic reasons, even if they don’t care at all about your preference. This game is not with you personally, a human that controls very little and whose control can’t compel a singleton to any significant extent, but with the counterfactual FAIs. The FAIs that could be created, but weren’t, can act as Omega in counterfactual mugging, making it profitable for the indifferent singletons to pay the FAI a little in the FAI-favored kind of world-optimization.
(3) Some singletons that don’t follow your preference in particular, but have a remotely human-like preference, will have a component of sympathy in their preference, and will dole out to your preference some fair portion of control in their world, one that is much greater than the portion of control you held originally. This sympathy seems to be godshatter of the game-theoretic considerations that compel even singletons with non-evolved (artificial, random) preferences according to arguments (1) and (2).
The conclusion to this seems to be that creating an Unfriendly AI is significantly better than ending up with no rational singleton at all (existential disaster that terminates civilization), but significantly worse than a small chance of FAI.
Your comments are mostly good, but I dispute the final assumption that no singleton ⇒ disaster. There has as yet been no investigation into the merits of singleton vs. an economy (or ecosystem) of independent agents.
If we were living in the 18th century, it would be reasonable to suppose that the only stable situation is one where one agent is king. But we are not.
So there’s the utility difference between business-as-usual (no AI), and getting a small share of resources optimized for your preference, and the utility difference between getting small and large shares of resources. If the second difference is much larger than the first, then (1) is crucial, and (2) and (3) are not so good. But if the first difference is much bigger than the second, the pattern is the reverse.
And if we’re comparing expected utility conditioning on no local FAI here and EU conditioning on FAI here, moderate credences can suffice (depending on the shape of your utility function).
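The comparison of the two utility differences above can be made concrete with a small sketch. The assumptions are loud ones: business-as-usual is modeled as a zero share, the "small share" is set to 1%, "much larger" is simplified to "larger", and the two utility shapes are hypothetical examples rather than claims about human preference.

```python
def dominant_argument(u_baseline: float, u_small: float, u_large: float) -> str:
    """Report which of the summarized arguments carries most of the value."""
    first_difference = u_small - u_baseline  # business-as-usual -> small share
    second_difference = u_large - u_small    # small share -> large share
    if second_difference > first_difference:
        return "(1)"            # large shares matter most, so FAI worlds dominate
    if first_difference > second_difference:
        return "(2) and (3)"    # small shares already capture most of the value
    return "tie"


SMALL_SHARE = 0.01  # hypothetical size of the "small share"


def linear(share: float) -> float:
    return share


def steeply_concave(share: float) -> float:
    return share ** 0.1  # hypothetical strongly diminishing returns


linear_case = dominant_argument(linear(0.0), linear(SMALL_SHARE), linear(1.0))
concave_case = dominant_argument(
    steeply_concave(0.0), steeply_concave(SMALL_SHARE), steeply_concave(1.0)
)
print("linear utility:", linear_case)
print("steeply concave utility:", concave_case)
```

With the linear shape the second difference dominates and argument (1) is the crucial one; with the steeply concave shape the first difference dominates and arguments (2) and (3) carry most of the weight.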
Whether FAI is local or not can’t matter: whether something is real or counterfactual is morally irrelevant. If we like small control, it means that the possible worlds with UFAI are significantly valuable, just as the worlds with FAI, provided there are enough worlds with FAI to weakly control the UFAIs; and if we like only large control, it means that the possible worlds with UFAI are not as valuable, and it’s mostly the worlds with FAI that matter.
But if the first difference is much bigger than the second, the pattern is the reverse.
It’s not literally the reverse, because if you don’t create those FAIs, nobody will, and so the UFAIs won’t have the incentive to give you your small share. It’s never good to increase the probability of UFAI at the expense of the probability of FAI. I’m not sure whether there is any policy guideline suggested by these considerations, conditional on the pattern in utility you discuss. What should we do differently depending on how much we value small vs. large control? It’s still clearly preferable to have UFAI than to have no future AI, and to have FAI than to have UFAI, in both cases.
It’s the same combinatorial explosion as with the future possible worlds.
Isn’t the set of future worlds with high measure a lot smaller?
Robin Hanson would be so pleased that it turns out economics is the fundamental law of the entire ensemble universe.
Yep, these are key considerations.
If we like small control, it means that the possible worlds with UFAI are significantly valuable,
What do “small control” and “large control” mean?
What should we do differently depending on how much we value small vs. large control?
Worrying less about our individual (or national) shares, and being more cooperative with other humans or uploads, seems like an important upshot.