Of course the AI could choose its answer maliciously, and I am very skeptical about the possibility of preventing that.
Why would it do that? I would say that if it is answering maliciously, it is tautologically not the AI you defined. If it is correctly implemented to care only about giving correct answers and doing nothing outside the temporal and spatial limitations, then it will not answer maliciously. It isn’t even a matter of preventing it from doing that so much as that it just wouldn’t, by its very nature, do malicious things.
As a side note, creating an AI that is malicious is almost as hard as creating an AI that is friendly, for roughly the same reason that reliably losing money to an idealized semi-strong efficient market is almost as hard as beating that same market: you need information that has not yet been supplied to the market, and then you must do the opposite of what it would take to beat it. We have little to fear that our AI creation will be malicious; what makes AIs scary, and hard to prevent, is indifference.
I think that he meant indifferent rather than malicious, since his point makes a lot more sense in that case. We want the AI to optimize one utility function, but if we knew what that function was, we could build an FAI. Instead, we make an Oracle AI with an approximation to our utility function. Then, the AI will act so as to use its output to get us to accomplish its goals, which are only mostly aligned with ours. I think what Paul meant by a ‘malicious’ answer is one that furthers its goals in a way that happens to be to the detriment of ours.
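A toy sketch of the worry in the paragraph above, with made-up names and utility numbers purely for illustration: an oracle that ranks candidate answers by a proxy utility, mostly but not perfectly aligned with ours, can prefer exactly the kind of answer we would call detrimental.

```python
# Toy illustration (hypothetical numbers): an oracle ranking candidate
# answers by a proxy utility can pick one that is bad under our true utility.

candidate_answers = {
    # answer: (proxy utility the oracle optimizes, our true utility)
    "honest, boring answer":             (0.6, 0.7),
    "answer that manipulates the asker": (0.9, -0.5),
    "refuse to answer":                  (0.1, 0.0),
}

def oracle_choice(answers):
    """Pick the answer with the highest *proxy* utility."""
    return max(answers, key=lambda a: answers[a][0])

chosen = oracle_choice(candidate_answers)
print("oracle outputs:", chosen)
print("our true utility for that output:", candidate_answers[chosen][1])
# The proxy and true utilities agree on ordinary answers, but the
# proxy-maximizing answer here is the manipulative one.
```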
I think that he meant indifferent rather than malicious
For the most part, yes. And my first paragraph represents my reply to the meaning of ‘unFriendly’ in general, rather than just to the malicious subset thereof.
Instead, we make an Oracle AI with an approximation to our utility function. Then, the AI will act so as to use its output to get us to accomplish its goals, which are only mostly aligned with ours.
That is an interpretation that directly contradicts the description given—it isn’t compatible with not caring about the future beyond an hour—or, for that matter, actually being an ‘oracle’ at all. If that were the intended meaning, then my responses elsewhere would not have been cautious agreement but instead something along the lines of:
What the heck? You’re creating a complete FAI and then hacking an extreme limitation on top of it? Well, yeah, that’s going to be safe, given that it is built on something tautologically safe, but it is strictly worse than the FAI without the restrictions.
Instead, we make an Oracle AI with an approximation to our utility function. Then, the AI will act so as to use its output to get us to accomplish its goals, which are only mostly aligned with ours.
That is an interpretation that directly contradicts the description given—it isn’t compatible with not caring about the future beyond an hour—or, for that matter, actually being an ‘oracle’ at all.
I was thinking of some of those extremely bad questions that are sometimes proposed to be asked of an oracle AI: “Why don’t we just ask it how to make a lot of money?”, etc. Paul’s example of asking it to give the output that gets us to press the reward button falls into the same category (unless I’m misinterpreting what he meant there?).