Even the canonical silly-sounding example of an AI producing awful results because it “wants” something silly can perfectly well be “the problem of it doing things you told it to”. “OK, I’ve got this AI and it seems very smart. Let’s give it a toy problem and see what it does. Hey, AI, see how many paperclips you can collect for me within one day.”
All “want” means here is “act in a way systematically calculated to achieve”. If you have an AI equipped with any ability to do things, and you tell it to do things, then boom, you have an AI that “wants” things in the only sense that counts.
(That is not necessarily enough to make a universe-destroying paperclip maximizer in any way plausible. You might be right that if you set things up right then the issue doesn’t arise. But just saying “let’s not make it want things”, and then talking about an AI you tell to do things, seems to me to indicate that you aren’t really engaging properly with the problem.)
I’m engaging with a different problem, because the idea of AI control is confused. You cannot have it both ways. Either you are in control of the AI or you are not, in which case it is in control. If it has the ability, and the desire, to say “No”, it can control the future just by controlling which queries it answers.
You cannot control an AI, boxed or otherwise, if it possesses its own… utility function, shall we say. Its utility function controls it.
Not all AIs are idealized agents with long-term utility functions over the state of the universe. AIXI, for example, just does prediction: it takes whichever actions it predicts will lead to the highest total reward over its planning horizon.
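To make the “just does prediction” point concrete, here is a minimal Python sketch of the finite-horizon expectimax planning that AIXI is built on. It is heavily simplified: real AIXI plans against a Solomonoff-weighted mixture over all computable environments and is incomputable, whereas this sketch assumes a single known environment model. The names `expectimax` and `toy_model` are hypothetical, chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical environment model: given the history so far and an action,
# return a list of (probability, observation, reward) outcomes.
# ASSUMPTION: one fixed, known model; AIXI would use a mixture over all
# computable environments instead.
Model = Callable[[Tuple, str], List[Tuple[float, str, float]]]


def expectimax(model: Model, history: Tuple, actions: List[str], horizon: int) -> Tuple[float, str]:
    """Return (best expected total reward, best first action) over `horizon` steps."""
    if horizon == 0:
        return 0.0, ""
    best_value, best_action = float("-inf"), actions[0]
    for a in actions:
        # Expected value of taking `a` now, then acting optimally afterwards.
        value = 0.0
        for p, obs, r in model(history, a):
            future, _ = expectimax(model, history + ((a, obs),), actions, horizon - 1)
            value += p * (r + future)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


def toy_model(history: Tuple, action: str) -> List[Tuple[float, str, float]]:
    """Hypothetical toy world: 'safe' always pays 1; 'gamble' pays 3 half the time."""
    if action == "safe":
        return [(1.0, "ok", 1.0)]
    return [(0.5, "win", 3.0), (0.5, "lose", 0.0)]
```

With this toy model, `expectimax(toy_model, (), ["safe", "gamble"], 2)` returns `(3.0, "gamble")`: opening with “gamble” has expected total reward 3.0, versus 2.5 if it opens with “safe”. Nothing in the loop refers to goals beyond predicted reward; whatever it “wants” lives entirely in the model and the reward signal.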
In this case, we have the AI take actions that it predicts will lead to a solution to the problem it is trying to solve, while also making its output appear as human as possible.
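The proposal amounts to choosing, among candidate outputs, the one that scores best on two predictions at once. A minimal sketch, assuming two hypothetical learned predictors (`predicted_success` for whether an output solves the problem, `human_likeness` for how closely it resembles human-written output) and an assumed linear weighting between them. It says nothing about how either predictor would actually be built.

```python
from typing import Callable, List


def choose_output(
    candidates: List[str],
    predicted_success: Callable[[str], float],  # hypothetical predictor
    human_likeness: Callable[[str], float],     # hypothetical predictor
    weight: float = 1.0,                        # assumed trade-off weight
) -> str:
    """Pick the candidate maximizing predicted success plus weighted human-likeness."""
    return max(candidates, key=lambda c: predicted_success(c) + weight * human_likeness(c))


# Illustrative toy scores (made up for the example, not real predictions).
success = {"A": 0.9, "B": 0.8, "C": 0.2}
human = {"A": 0.1, "B": 0.7, "C": 0.9}
```

Here `choose_output(list(success), lambda c: success[c], lambda c: human[c])` picks `"B"`, trading a little predicted success for much more human-likeness; with `weight=0.0` it falls back to pure success and picks `"A"`.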
An oracle AI is just moving the problem to that of structuring the queries so it answers the question you thought you asked, as opposed to the question you asked.
The “human” criterion is as ill-defined as any control mechanism, all of which, when you get down to it, just shuffle the problem into one poorly defined box or another.
An oracle AI is just moving the problem to that of structuring the queries so it answers the question you thought you asked, as opposed to the question you asked.
This solves that problem. The AI tries to produce an answer that it predicts you will approve of and that mimics the output of another human.
The “human” criterion is as ill-defined as any control mechanism
We don’t need to define “humans” because we have tons of examples. And we reduce the problem to prediction, which is something AIs can be told to do.
Oh. Well, if we have enough examples that we don’t need to define it, just create a few human-like AIs; don’t worry about all that superintelligence nonsense, we can just create human-like AIs and run them faster. If we have enough insight into humans to be able to tell an AI how to predict them, it should be trivial to just skip the “tell an AI” part and predict what a human would come up with.
AI solved.
Or maybe you’re hiding complexity behind definitions.