Having thought along these lines, I agree that “we’ll use the AI to stop everyone else” is a bad strategy. A specific reason that I didn’t see mentioned as such: “You need to let me out of the box so I can stop all the bad AI companies” is one of the arguments I’d expect an AI to use to convince someone to let it out of the box.
Well, then, supposing that you do accidentally create what appears to be a superintelligent AI, what should you do? I think it’s important for people to figure out a good protocol for this well in advance. Ideally, most of the AI people will trust most of the other AI people to implement this protocol, and then they won’t feel the need to race (and take risks) to be the first to get there. (This is essentially reversing some of the negatives that Andrew Critch talks about.)
The protocol I’m thinking of is, basically: Raise the alarm and bring in some experts who’ve spent years preparing to handle the situation. (Crying wolf could be a problem—we wouldn’t want false alarms to become so frequent and costly that people start saying “Meh, it looks probably OK, don’t bother with the alarm”—so the protocol should probably start with some basic automated tests that can be guaranteed safe.)
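To make the “basic automated tests” idea a bit more concrete, here is a minimal sketch, assuming the candidate system is confined to a text-in/text-out channel with no network or tool access. The idea is to pose problems that are hard to solve but cheap to verify (factoring a large semiprime, say), so that passing is strong evidence of unusual capability while the test itself grants no new affordances. Everything here is a hypothetical stand-in: `query_candidate_model`, the challenge set, and the pass threshold are assumptions of the sketch, not a claim about how any real lab runs evaluations.

```python
def verify_factorization(n: int, answer: str) -> bool:
    """Check that the returned factors are nontrivial and multiply to n."""
    try:
        p, q = (int(x) for x in answer.split(","))
    except ValueError:
        return False
    return 1 < p < n and 1 < q < n and p * q == n


def capability_alarm_check(query_candidate_model, challenges) -> bool:
    """Raise the alarm only if the candidate solves most of the
    hard-to-solve, cheap-to-verify challenges.

    `challenges` maps a large semiprime n to the prompt asking for its factors.
    `query_candidate_model` is a hypothetical sandboxed, output-only interface.
    """
    passed = sum(
        verify_factorization(n, query_candidate_model(prompt))
        for n, prompt in challenges.items()
    )
    return passed >= 0.8 * len(challenges)  # the 0.8 pass bar is an arbitrary choice
```

The point of a check like this is only that the first alarm trigger can be mechanical and read-only, so deciding whether to raise the alarm doesn’t require letting the system act on the world.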
It’s like handling first contact with a (super-powerful) alien race. You are the technician who has implemented the communication relays that connect you with them. It’s unlikely that you’re the best person to negotiate with them. Also, it’s a little odd if you end up making decisions with huge impacts on the rest of humanity, who had no say in them; from a certain perspective, that’s inappropriate for you to do.
So, what the experts will do with the AI is important. I mean, if the experts liked catgirls/catboys and said “we’ll negotiate with the AI to develop a virus to turn everyone into catpeople”, a lot of AI researchers would probably say “fuck no” and might prefer to take their own chances negotiating with the AI. So we need to trust the experts to not do something like that. What should they do, then?
Whatever your view of an ideal society, an uber-powerful AI can probably be used to implement it. But people probably will have lots of views of ideal societies that differ in important aspects. (Some dimensions: Are people immortal? Do animals and pets exist? Do humans have every need taken care of by robots, or do they mostly take care of themselves? Do artificial beings exist? Do we get arbitrary gene-editing or remain as we are today?) If AI people can expect the expert negotiators to drastically change the world into some vision of an ideal society… I’m not certain of this, but it seems reasonably likely that there’s no single vision that the majority of AI people would be pleased with—perhaps not even a vision that the majority wouldn’t say “Fuck no, I’d rather YOLO it” to. In which case the “call in the experts” protocol fails.
It would be a shame if humanity’s predictable inability to agree on political philosophies pushed people to kill each other through reckless AGI development.
That being the case, my inclination would be to ask the AI for some list of obviously-good things (cures for Alzheimer’s, cancer, etc.) and then … I’m not sure what next. Probably the last step should be “get input from the rest of humanity on how they want the world to be”, unless we’ve somehow reached agreement on that in advance. In theory one could put it up to an actual democratic vote.
If it were truly up to me, I might be inclined to go for “extend everyone’s life by 50 years, make everyone gradually more intelligent, ensure that no one nukes/superplagues/etc. the world in the meantime, and reevaluate in 50 years”.
Maybe one could have a general mechanism where people suggest wishes (maybe with Reddit-style upvoting to see which ones get put to a vote), and then, if a wish gets >95% approval or something, it gets implemented. Things like “Alzheimer’s cure” would presumably pass that bar, and things like “Upload everyone’s brains into computers and destroy their organic bodies” wouldn’t, at least not anytime soon.
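As a toy illustration of that mechanism (not a serious governance design), here is roughly what the filtering logic would amount to. The `Wish` structure, the ballot size, and the exact 95% bar are all assumptions of the sketch; the hard parts (who gets to vote, how preferences are elicited honestly) are exactly what it leaves out.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD = 0.95  # the ">95% approval" bar suggested above; a free parameter


@dataclass
class Wish:
    text: str
    upvotes: int = 0    # Reddit-style upvotes decide which wishes go to a vote
    yes_votes: int = 0
    no_votes: int = 0

    def approval(self) -> float:
        total = self.yes_votes + self.no_votes
        return self.yes_votes / total if total else 0.0


def wishes_to_implement(wishes, ballot_size=10):
    """Put the most-upvoted wishes to a vote; implement only near-consensus ones."""
    on_ballot = sorted(wishes, key=lambda w: w.upvotes, reverse=True)[:ballot_size]
    return [w for w in on_ballot if w.approval() >= APPROVAL_THRESHOLD]
```

On this toy model, “cure Alzheimer’s” plausibly clears the bar and “upload everyone and destroy their organic bodies” does not, which is the whole point of setting the threshold near consensus rather than at a bare majority.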
Another aspect of this situation: If you do raise the alarm, and then start curing cancer / otherwise getting benefits that clearly demonstrate you have a superintelligent AI… Anyone who knew what research paths you were following at the time gets a hint of how to make their own AGI. And even if they don’t know which AI group managed to do it, if there are only, say, 5-10 serious AI groups that seem far enough along, that still narrows down the search space a lot. So… I think it would be good if “responsible” AI groups agreed to the following: If one group manages to create an AGI and show that it has sufficiently godlike capabilities (it cures cancer and whatnot [well, you’d want problems whose answers can be verified more quickly than that]) and that it’s following the “bring in the experts and let them negotiate on behalf of humanity” protocol, then the other groups can retire, or at least stop their research and direct their efforts towards helping the first group. This would be voluntary, and wouldn’t incentivize anyone to be reckless.
(Maybe it’d incentivize them to prove the capabilities quickly; but perhaps that can be part of the “quick automated test suite” given to candidate AGIs.)
There could be a few “probably-bad actor” AI groups that wouldn’t agree to this, but (a) they wouldn’t be the majority and (b) if this were a clearly good protocol that the good people agreed to, then that would limit the bad-actor groups’ access to good-aligned talent.
> A specific reason that I didn’t see mentioned as such: “You need to let me out of the box so I can stop all the bad AI companies” is one of the arguments I’d expect an AI to use to convince someone to let it out of the box.
If your AGI is capable and informed enough to give you English-language arguments about the world’s strategic situation, then you’ve either made your system massively too capable to be safe, or you already solved the alignment problem for far more limited AGI and are now able to devote arbitrarily large amounts of time to figuring out the full alignment problem.
> Well, then, supposing that you do accidentally create what appears to be a superintelligent AI, what should you do?
AGI is very different from superintelligent AI, even if it’s easy to go from the former to the latter. If you accidentally make superintelligent AI (i.e., AI that’s vastly superhuman on every practically important cognitive dimension), you die. If you deliberately make superintelligent AI, you also die, if we’re in the ‘playing around with the very first AGIs’ regime and not the ‘a pivotal act has already been performed (be it by a private actor, a government, or some combination) and now we’re working on the full alignment problem with no time pressure’ regime.
> Also, it’s a little odd if you end up making decisions with huge impacts on the rest of humanity, who had no say in them; from a certain perspective, that’s inappropriate for you to do.
Keep in mind that ‘this policy seems a little odd’ is a very small cost to pay relative to ‘every human being dies and all of the potential value of the future is lost’. A fire department isn’t a government, and there are cases where you should put out an immediate fire and then get everyone’s input, rather than putting the fire-extinguishing protocol to a vote while the building continues to burn down in front of you. (This seems entirely compatible with the OP to me; ‘governments should be involved’ doesn’t entail ‘government responses should be put to direct population-wide vote by non-experts’.)
Specifically, when I say ‘put out the fire’ I’m talking about ‘prevent something from killing all humans in the near future’; I’m not saying ‘solve all of humanity’s urgent problems, e.g., end cancer and hunger’. That’s urgent, but it’s a qualitatively different sort of urgency. (Delaying a cancer cure by two years would be an incredible tragedy on a human scale, but it’s a rounding error in a discussion of astronomical scales of impact.)
> Another aspect of this situation: If you do raise the alarm, and then start curing cancer / otherwise getting benefits that clearly demonstrate you have a superintelligent AI… Anyone who knew what research paths you were following at the time gets a hint of how to make their own AGI.
Alignment is hard; and the more complex or varied the set of tasks you want to align an AGI to, the more difficult alignment will be. For the very first uses of AGI, you should find the easiest possible tasks that will ensure that no one else can destroy the world with AGI (whether you’re acting unilaterally, or in collaboration with one or more governments or whatever).
If the easiest, highest-probability-of-success tasks to give your AGI include ‘show how capable this AGI is’ (as in one of the sub-scenarios the OP mentioned), then it’s probably a very bad idea to try to find a task that’s also optimized for its direct humanitarian benefit. That’s just begging for motivated reasoning to sneak in and cause you to take on too difficult a task, resulting in you destroying the world or just burning too much time (such that someone else destroys the world).