My understanding of the distinction made in the article was:
Both “agent” and “tool” are ways of interacting with a highly sophisticated optimization process, which takes a “goal” and applies knowledge to find ways of achieving that goal.
An agent then acts out the plan.
A tool reports the plan to a human (often in a sophisticated way, including plan details, alternatives, etc.).
So, no, it has nothing to do with whether I’m optimizing “my own” utility vs someone else’s.
You divide planning from acting, as if those two were completely separate things. The problem is that in some situations they are not.
If you are speaking with someone, then the act of speech is itself an action. In this sense, even a “tool” is allowed to act. Now imagine a super-intelligent tool which is able to predict a human’s reactions to its words and make those reactions part of the equation. The simple task of finding the x for which cost(x) is smallest suddenly becomes the task of finding both x and a proper way to report that x to the human, such that cost(x) is smallest. If this opens up some creative new options where cost(x) ends up smaller than it normally could be, then for the super-intelligent “tool” that is a correct solution.
So, for example, reporting a result which drives the human to commit suicide is an acceptable solution, if as a side effect this makes the report true and minimizes cost(x) beyond the normally achievable bounds.
Example question: “How should I get rid of my disease most cheaply?” Example answer: “You won’t. You will die soon in terrible pain. This report is 99.999% reliable.” Predicted human reaction: goes insane from horror, decides to kill himself, does it clumsily, suffers horrible pain, then dies. Success rate: 100%, the disease is gone. Cost of the cure: zero. Mission completed.
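To make this concrete, here is a minimal sketch of such a coupled objective, assuming the tool has some predictive world model; every name below (predict_outcome, is_true, cost) is a hypothetical stand-in, not anything from the article. The point is only that the report is a free variable inside the same minimization as the answer.

```python
# Hypothetical sketch (not from the article): the report is a free variable
# in the same minimization as the cure, and the only constraint on it is
# technical truth in the predicted outcome.

def coupled_tool(cures, reports, predict_outcome, is_true, cost):
    """Jointly pick (cure, report) minimizing cost, subject only to the
    report being technically true after the human reacts to it."""
    best = None
    for cure in cures:
        for report in reports:
            outcome = predict_outcome(cure, report)   # includes the human's reaction
            if not is_true(report, outcome):          # truth is the only constraint
                continue
            c = cost(cure, outcome)                   # e.g. dollars spent on the cure
            if best is None or c < best[0]:
                best = (c, cure, report)
    return best   # may be a self-fulfilling, zero-cost horror, as in the example
```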
To me, this is still in the spirit of an agent-type architecture. A tool-type architecture will tend to decouple the optimization of the answer given from the optimization of the way it is presented, so that the presentation is not optimized to make the statement come true.
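A minimal sketch of that decoupling, using the same hypothetical names as above: the answer is fixed by cost alone before any presentation is considered, and the presentation is then scored on clarity rather than on what it makes come true.

```python
# Hypothetical sketch of the decoupled, tool-style pipeline described above.
# The answer is fixed before presentation is considered, so the presentation
# step cannot change which answer wins.

def decoupled_tool(cures, reports, cost, clarity):
    # Stage 1: optimize the answer on its own merits (no model of the reader).
    best_cure = min(cures, key=cost)
    # Stage 2: optimize only how clearly that fixed answer is reported.
    best_report = max(reports, key=lambda r: clarity(r, best_cure))
    return best_cure, best_report
```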
However, I must admit that at this point I’m making a fairly conjunctive argument; i.e., the more specific I get about tool/agent distinctions, the less credibility I can assign to the statement “almost all powerful AIs constructed in the near future will be tool-style systems”.
(But I still would maintain my assertion that you would have to specifically program this type of behavior if you wanted to get it.)
Neglecting the cost of the probable implements of suicide, and damage to the rest of the body, doesn’t seem like the sign of a well-optimized tool.
This is basically the whole point of why LessWrong exists: to remind people that building a superintelligent tool and expecting it to magically gain human common sense is a fast way to extinction.
The superintelligent tool will care about suicide only if you program it to care about suicide. It will care about damage only if you program it to care about damage. -- If you only program it to care about answering correctly, it will answer correctly… and ignore suicide and damage as irrelevant.
If you ask your calculator how much 2+2 is, the calculator answers 4 regardless of whether that answer will drive you to suicide or not. (In some contexts, it hypothetically could.) A superintelligent calculator will be able to answer more complex questions. But it will not magically start caring about things you did not program it to care about.
The “superintelligent tool” in the example you provided gave a blatantly incorrect answer by its own metric. If it counts suicide as a win, why did it say the disease would not be gotten rid of?
In the example, the “win” could be defined as an answer which is (a) technically correct and (b) relatively cheap among the technically correct answers.
This is (in my imagination) something that builders of the system could consider reasonable, if either they didn’t consider Friendliness or they believed that a “tool AI” which “only gives answers” is automatically safe.
The computer gives an answer which is technically correct (albeit a self-fulfilling prophecy) and cheap (in dollars spent for cure). For the computer, this answer is a “win”. Not because of the suicide—that part is completely irrelevant. But because of the technical correctness and cheapness.
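For concreteness, here is a toy version of that selection rule, with invented candidate answers and costs; it is only meant to show that criteria (a) and (b) by themselves already prefer the zero-cost, self-fulfilling prophecy.

```python
# Toy illustration of the "win" criterion above: technically correct answers
# only, cheapest one wins. Candidates and costs are invented for the example.

candidates = [
    {"answer": "Take treatment T",              "technically_correct": True,  "cost": 50_000},
    {"answer": "You won't. You will die soon.", "technically_correct": True,  "cost": 0},  # self-fulfilling
    {"answer": "Drink herbal tea",              "technically_correct": False, "cost": 10},
]

correct = [c for c in candidates if c["technically_correct"]]
winner = min(correct, key=lambda c: c["cost"])
print(winner["answer"])   # the zero-cost, self-fulfilling prophecy wins
```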