Besides, if you could just ask an Oracle AI how to make it friendly, how is that different from an AI that's built to answer and implement that question? Given that such an AI is supposedly perfectly rational, wouldn't it be careful to answer the question correctly even if it was poorly defined? Wouldn't it try to answer the question carefully, so as not to diminish or obstruct the answer? If the answer is no, then how would an Oracle AI be any better at coming up with an adequate answer to a poorly formed, and therefore vague, question?
In other words, if you expect an Oracle AI to guess what you mean by friendliness and give a correct answer, why wouldn’t that work with an unbounded AI as well?
An AI just doesn’t care what you want. And even if it cared what you want, it wouldn’t know exactly what you want. And if it cared what you want and cared to figure out exactly what you want, then it would already be friendly.
The problem is that an AI doesn’t care and doesn’t care to care. Why would that be different with an Oracle AI? If you could just ask it to solve the friendly AI problem, then it is only a small step from there to asking it to actually implement the solution by making itself friendly.
It may not be possible to build a FAI at all—or we may end up with a limited oracle that can answer only easier questions, or only fully specified ones.
I know and I didn’t downvote your post either. I think it is good to stimulate more discussion about alternatives (or preliminary solutions) to friendly AI in case it turns out to be unsolvable in time.
...or we may end up with a limited oracle that can answer only easier questions, or only fully specified ones.
The problem is that you appear to be saying that it would somehow be “safe”. If you are talking about expert systems, then it would presumably not be a direct risk, but (if it is advanced enough to make real progress that humans alone can’t) a huge stepping stone towards fully general intelligence. That means that if you target Oracle AI instead of friendly AI, you will just increase the probability of uFAI.
Oracle AI has to be a last resort when the shit hits the fan.
(ETA: If you mean we should also work on solutions to keep a possible Oracle AI inside a box (a light version of friendly AI), then I agree. But one should first try to figure out how likely friendly AI is to be solved before allocating resources to Oracle AI.)
Oracle AI has to be a last resort when the shit hits the fan.
If we had infinite time, I’d agree with you. But I feel that we have little chance of solving FAI before the shit indeed does hit the fan, and us. The route safe Oracle → Oracle-assisted FAI design seems more plausible to me. Especially since we are so much better at correcting errors than at preventing them, a prediction Oracle (if safe) would play to our strengths.
But I feel that we have little chance of solving FAI before the shit indeed does hit the fan, and us.
If I assume a high probability of risks from AI and a short planning horizon, then I agree. But it is impossible to say. I take the same stance as Holden Karnofsky from GiveWell regarding the value of FAI research at this point:
I think that if you’re aiming to develop knowledge that won’t be useful until very very far in the future, you’re probably wasting your time, if for no other reason than this: by the time your knowledge is relevant, someone will probably have developed a tool (such as a narrow AI) so much more efficient in generating this knowledge that it renders your work moot.
I think the same applies for fail-safe mechanisms and Oracle AI, although to a lesser extent.
The route safe Oracle → Oracle-assisted FAI design...
What is your agenda for developing such a safe Oracle? Are you going to do AGI research first and try to come up with solutions for making it safe along the way? I think that would be a promising approach. But if you are trying to come up with ways to ensure the safety of a hypothetical Oracle whose nature is a mystery to you, then the argument mentioned above applies again.