CEV is an attempt to route around the problem you illustrate here, but it might be impossible. Oracle AI might also be impossible. But, well, you know how I feel about doing the impossible. When it comes to saving the world, all we can do is try. Both routes are worth pursuing, and I like your new paper on Oracle AI.
EDIT: Stuart, I suspect you’re getting downvoted because you only repeated a point against which many arguments have already been given, instead of replying to those counter-arguments with something new.
When it comes to saving the world, all we can do is try.
If you really believe that it is nearly impossible to solve friendly AI, wouldn’t it be better to focus on another existential risk?
Say you believe that unfriendly AI will wipe us out with a probability of 60% and that there is another existential risk that will wipe us out with a probability of 10% even if unfriendly AI turns out to be no risk. Both risks have the same disutility x (if we don’t assume that an unfriendly AI could also wipe out aliens etc.). Thus .6x > .1x. But let a be the probability of solving friendly AI and b the probability of solving the second risk. If a is no more than b/6, then the expected utility of working on friendly AI is at best equal to that of working on the other existential risk, because .6ax ≤ .1bx.
(Note: I really suck at math, so if I made an embarrassing mistake I hope you understand what I am talking about anyway.)
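The comparison can be made concrete with a small sketch. Only the 60% and 10% figures come from the comment above; the solving probabilities b and the utility scale x are made-up numbers chosen to sit exactly at the a = b/6 threshold:

```python
# Illustrative expected-utility comparison; all numbers are made up
# except the 60% / 10% figures from the comment above.
p_ufai = 0.60   # probability that unfriendly AI wipes us out
p_other = 0.10  # probability that the other existential risk wipes us out
x = 1.0         # magnitude of the (dis)utility of extinction, same for both

b = 0.30        # assumed probability of solving the other risk
a = b / 6       # probability of solving friendly AI, at the 1/6 threshold

eu_fai = p_ufai * a * x      # expected value of working on friendly AI
eu_other = p_other * b * x   # expected value of working on the other risk

print(eu_fai, eu_other)      # the two are (approximately) equal at a = b/6
```

Below the threshold (a < b/6) the inequality flips in favor of working on the other risk, which is the point of the comment.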
If you really believe that it is nearly impossible to solve friendly AI, wouldn’t it be better to focus on another existential risk?
Solving other x-risks will not save us from uFAI. Solving FAI will save us from other x-risks. Solving Oracle AI might save us from other x-risks. I think we should be working on both FAI and Oracle AI.
Solving other x-risks will not save us from uFAI. Solving FAI will save us from other x-risks.
Good point. I will have to think about it further. Just a few thoughts:
Safe nanotechnology (unsafe nanotechnology being an existential risk) would also save us from various existential risks. Arguably from fewer of them than a fully-fledged friendly AI would. But assume that the disutility of both scenarios is about the same.
An evil AI (as opposed to an unfriendly AI) is as unlikely as a friendly AI. Both risks would probably simply wipe us out without causing extra disutility. If you consider the extermination of alien life you might get a higher amount of disutility. But I believe that can be outweighed by the negative effects of unsafe nanotechnology that doesn’t manage to wipe out humanity but instead causes various dystopian scenarios. Such scenarios are more likely than evil AI because nanotechnology is a tool used by humans, who can be deliberately unfriendly.
So let’s say that solving friendly AI has 10x the utility of ensuring safe nanotechnology because it can save us from more existential risks than the use of advanced nanotechnology could.
But one order of magnitude more utility could easily be outweighed by an underestimation of the complexity of friendly AI. Which is why I asked whether the difficulty of solving friendly AI might outweigh its utility and therefore justify disregarding friendly AI for now. If so, it might be better to focus on another existential risk, one that might wipe us out in all possible worlds where unfriendly AI either comes later or doesn’t pose a risk at all.
An evil AI (as opposed to an unfriendly AI) is as unlikely as a friendly AI.
Surely only if you completely ignore effects from sociology and psychology!
But one order of magnitude more utility could easily be outweighed by an underestimation of the complexity of friendly AI. Which is why I asked whether the difficulty of solving friendly AI might outweigh its utility and therefore justify disregarding friendly AI for now.
Machine intelligence may be distant or close. Nobody knows for sure—although there are some estimates. “Close” seems to have some non-negligible probability mass to many observers—so humans would be justified in paying a lot more attention than many currently are.
“AI vs nanotechnology” is rather a false dichotomy. Convergence means that machine intelligence and nanotechnology will spiral in together. Synergy means that each facilitates the production of the other.
If you were to develop safe nanotechnology before unfriendly AI then you should be able to suppress the further development of AGI. With advanced nanotechnology you could spy on and sabotage any research that could lead to existential risk scenarios.
You could also use nanotechnology to advance WBE (whole brain emulation) and use that to develop friendly AI.
Convergence means that machine intelligence and nanotechnology will spiral in together. Synergy means that each facilitates the production of the other.
Even in the possible worlds where uncontrollable recursive self-improvement is possible (which I doubt anyone would claim is a certainty, so there are possible outcomes where no amount of nanotechnology results in unfriendly AI), one of the two will still come first. If nanotechnology is going to come first then we won’t have to worry about unfriendly AI anymore because we will all be dead.
The question is not only about the utility associated with various existential risks and their probability but also the probability of mitigating the risk. It doesn’t matter if friendly AI can do more good than nanotechnology if nanotechnology comes first or if friendly AI is unsolvable in time.
Probably slightly. Most likely we will get machine intelligence before nanotech and good robots. To build an e-brain you just need a nanotech NAND gate. It is easier to build a brain than an ecosystem. Some lament the difficulties of software engineering—but their concerns seem rather overrated. Yes, software lags behind hardware—but not by a huge amount.
If nanotechnology is going to come first then we won’t have to worry about unfriendly AI anymore because we will all be dead.
That seems rather pessimistic to me.
Note that nanotechnology is just an example.
The “convergence” I mentioned also includes robots and biotechnology. That should cover any other examples you might have been thinking of.
The problem with CEV can be phrased by extending the metaphor: a CEV built from both Hitler and Gandhi means that the areas in which their values differ are not relevant to the final output. So attitudes to Jews and violence, for instance, will be unpredictable in that CEV (so we should model them now as essentially random).
Stuart, I suspect you’re getting downvoted because you only repeated a point against which many arguments have already been given, instead of replying to those counter-arguments with something new.
It’s interesting. Normally my experience is that metaphorical posts get higher votes than technical ones—nor could I have predicted the votes from reading the comments. Ah well; at least it seems to have generated discussion.
The problem with CEV can be phrased by extending the metaphor: a CEV built from both Hitler and Gandhi means that the areas in which their values differ are not relevant to the final output. So attitudes to Jews and violence, for instance, will be unpredictable in that CEV (so we should model them now as essentially random).
That’s not how I understand CEV. But, the theory is in its infancy and underspecified, so it currently admits of many variants.
Hum… If we got the combined CEV of two people, one of whom thought violence was ennobling and one of whom thought it was degrading, would you expect either or both of:
a) their combined CEV would be the same as if we had started with two people both indifferent to violence
b) their combined CEV would be biased in a particular direction that we can know ahead of time
The idea is that their extrapolated volitions would plausibly not contain such conflicts, though it’s not clear yet whether we can know what that would be ahead of time. Nor is it clear whether their combined CEV would be the same as the combined CEV of two people indifferent to violence.
So, to my ears, it sounds like we don’t have much of an idea at all where the CEV would end up—which means that it most likely ends up somewhere bad, since most random places are bad.
Well, if it captures the key parts of what you want, you can know it will turn out fine even if you’re extremely ignorant about what exactly the result will be.
Yes, as the Spartans answered Alexander the Great’s father when he said “You are advised to submit without further delay, for if I bring my army into your land, I will destroy your farms, slay your people, and raze your city”:
which means that it most likely ends up somewhere bad, since most random places are bad.
I don’t think that follows, at all. CEV isn’t a random walk. It will at the very least end up at a subset of human values. Maybe you meant something different here, by the word ‘bad’?
“If”.
Yup. So, perhaps, focus on that “if.”
Shouldn’t we be able to rule out at least some classes of scenarios? For instance, paperclip maximization seems like an unlikely CEV output.
Most likely we can rule out most scenarios that all humans agree are bad. So better than clippy, probably.
But we really need a better model of what CEV does! Then we can start to talk sensibly about it.