Guardian Angels: Discrete Extrapolated Volitions
Questions for discussion, with my tentative answers. Even assuming I am wrong about some things, there is something interesting to consider here. This is inspired by the recent SL4-type and CEV-centric topics in the discussion section.
Questions:
I
Is it easier to calculate the extrapolated volition of an individual or a group?
If it is easier to do for an individual, is that because it is strictly simpler, in the sense that calculating humanity’s CEV involves making at least every calculation that would be made for calculating the extrapolated volition of one individual?
How definitively can these questions be answered without knowing exactly how to calculate CEV?
II
Is it possible to create multiple AIs such that one AI does not prevent others from being created, such as by releasing equally powerful AIs simultaneously?
Is it possible to box AIs such that they reliably cannot escape before a certain, if short, period of time has passed, such as by giving them a low-cost way out with a calculable minimum and maximum time needed to exploit that route?
Is it likely there would be a cooperative equilibrium among unmerged AIs?
III
Assuming all of the following are possible: what would happen if every person had a superintelligent AI whose utility function was that person’s idealized, extrapolated utility function?
How would that compare to a scenario with a single AI embodying a successful calculation of CEV?
What would be different if one person or some few people did not have a superintelligence valuing what they would value, so that many but not all people had their own AI?
My Answers:
I
It depends on the error level tolerated. If only very low error is tolerated, it is easier to do it for a group.
N/A
Not sure.
II
Probably not.
Maybe, probably not, but impossible to know with high confidence.
Probably not. Throughout history, offense has often been a step ahead of defense, which often catches up to it. I think this is not particular to evolutionary biology or the technologies that happen to have been developed. It seems easier to break complicated things with many moving parts than to build and defend them. Also, specific technologies people plausibly speculate may exist are more powerful offensively than defensively. I would expect them to merge, probably peacefully.
III
Hard to say, as that would be trying to predict the actions of more intelligent beings in a dynamic environment.
It might be better, or worse. The chance of it being similar is notably high.
Not sure.
A singleton AI with individual CEVs for each human can do at least as well by simulating the negotiation of uniformly powerful individual AIs, one for each CEV. This is more stable, because the singleton’s simulation enforces uniform levels of power, whereas actual AIs could diverge in power.
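A minimal sketch of what that simulated negotiation could look like, under my own assumptions rather than anything stated in the comment: symmetric Nash bargaining stands in for “uniformly powerful” negotiation, and the outcomes, utility tables, and nash_bargain helper are all hypothetical toys.

```python
from math import prod  # Python 3.8+

def nash_bargain(outcomes, utilities, disagreement):
    """Pick the outcome maximizing the product of each agent's gain over
    its disagreement payoff; equal weights model uniform bargaining power."""
    def gains(outcome):
        return [u(outcome) - d for u, d in zip(utilities, disagreement)]
    # Keep only outcomes every agent weakly prefers to the disagreement point.
    feasible = [o for o in outcomes if all(g >= 0 for g in gains(o))]
    return max(feasible, key=lambda o: prod(gains(o)))

# Toy example: three candidate world-states, two simulated per-person AIs.
outcomes = ["A", "B", "C"]
u1 = {"A": 9, "B": 5, "C": 2}.get  # hypothetical extrapolated utilities
u2 = {"A": 1, "B": 5, "C": 8}.get
print(nash_bargain(outcomes, [u1, u2], disagreement=[0, 0]))  # -> B
```

The symmetry is what carries the stability claim above: because the weights are fixed inside the simulation, no simulated AI can grow more powerful than its peers the way real ones might.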
I don’t think “individual CEV” is proper. It’s like calling an ATM an “ATM organism”, which would be even worse than calling it an “ATM machine”, as is common. The “C” means individual extrapolated volitions are combined coherently.
I agree it would in theory be better to have a singleton. But that requires knowing how to cohere extrapolated volitions. My idea is that it might be possible to push off that task to superintelligences without destroying the world in the process.
While it would be useful to be able to split the ‘combine different agents’ wishes’ part from the ‘act as if the agents were smarter and wiser’ part, as CEV is currently described the ‘C’ is still necessary even for an individual, because most organisms, including (most importantly) humans, do not have coherent value systems as they stand. So as it stands we need to say things like CEV(humanity) and CEV(individual) for the label to make sense. The core of the problem here is that there are three important elements of the process that we are trying to represent with just two letters of the acronym:
1. Make smarter, wiser and generally more betterer (intended emphasis on the informality needed for this level of terseness)
2. Make internally coherent
3. Combine with others
Those three don’t neatly separate into ‘C’ and ‘E’.
From http://singinst.org/upload/CEV.html, I added some emphasis to explain why I understand it the way I do.
So coherence is something done after un-muddling.
In retrospect, it would be ridiculously easy for an AI under these conditions to secure early release and get out of the box before others.
One crazy nihilist with a destructive utility function would ruin the whole thing, by building a nuke or something. Offense wins decisively over defense.
Only if they were filtered to add restrictions or remove certain types of utility functions. And probably not even then, since AIs with evil utility functions could crop up randomly in that environment, from botched self-modifications or damage.
A single AI would be much better, since it could resolve all prisoners’ dilemmas, coordination games, and ultimatum games in a way that’s optimal, rather than merely Pareto efficient.
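To make the “optimal rather than merely Pareto efficient” distinction concrete, here is a toy illustration with made-up payoff numbers: in a one-shot prisoner’s dilemma, independent play lands on the Nash equilibrium, negotiation at best reaches some Pareto-efficient point, and a single coordinator can simply pick the total-welfare optimum.

```python
from itertools import product

# Prisoner's dilemma payoffs: (row player's utility, column player's utility).
ACTIONS = ["cooperate", "defect"]
PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(player, other_action):
    """Action maximizing `player`'s payoff against a fixed opponent action."""
    if player == 0:
        return max(ACTIONS, key=lambda a: PAYOFF[(a, other_action)][0])
    return max(ACTIONS, key=lambda a: PAYOFF[(other_action, a)][1])

def is_nash(profile):
    a0, a1 = profile
    return a0 == best_response(0, a1) and a1 == best_response(1, a0)

def is_pareto_efficient(profile):
    """No other profile makes someone better off and no one worse off."""
    u = PAYOFF[profile]
    return not any(
        all(v[i] >= u[i] for i in range(2)) and any(v[i] > u[i] for i in range(2))
        for v in (PAYOFF[p] for p in product(ACTIONS, repeat=2))
    )

profiles = list(product(ACTIONS, repeat=2))
print("Nash equilibria: ", [p for p in profiles if is_nash(p)])
print("Pareto efficient:", [p for p in profiles if is_pareto_efficient(p)])
print("Welfare optimum: ", max(profiles, key=lambda p: sum(PAYOFF[p])))
```

Three different profiles come out Pareto efficient here, but only one maximizes combined utility; that gap between the Pareto frontier and the welfare optimum is what a singleton could close.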
Releasing equally powerful AIs simultaneously is very risky, because it gives them an incentive to rush their self-improvements through, rather than take their time to check them for errors. Also, one of the AIs would probably succeed in destroying the others; cybersecurity so far has been a decisive win for offense.
Most people’s utility functions include some empathy, which would cover for many people being excluded from counting directly. However, if a person doesn’t have a superintelligence valuing what they would value, then some of their values will be excluded if no one else approves of them. This is mostly a good thing, since the values that would be excluded this way would probably be destructive ones. However, people who were not included directly would lose out in any contentions over scarce resources, which could turn into a serious problem for them if resources become scarce.
A more convenient possible world was alluded to when I asked about excluding some individuals.
No merging?
Maybe, but I had also asked about the relative difficulty of calculating CEV and DEV. If DEV is easier, perhaps possible rather than impossible, that’s an advantage of it.
War is a risk; it includes the possibility of mutual destruction, particularly if offense is more powerful. You don’t think they’d merge resources and values instead of risking it?
I agree that is the most likely scenario, but it is still less than probable.
Cyberwar is different from regular war in that all competently performed attacks are inherently anonymous. Attacks performed very competently are also undetectable. This is very destabilizing. And it gets worse: while AIs might try to get around this by all merging together, none of them would be able to prove they hadn’t hidden a copy of themselves somewhere.
I don’t think undetectability solves things. Offensive subsystems could survive their creator’s demise, like two people in a grenade-lobbing fight.
Suppose they all hid a copy: the merged AI would still be more powerful than any hidden copy, and if it were destroyed, everyone would be a small copy again. If there were many AIs, an individual would be banking on its ability to defeat a much larger entity. Offense is more powerful at most scales and technological levels, but not by incomprehensible orders of magnitude.