This is like the whole point of why LessWrong exists. To remind people that making a superintelligent tool and expecting it to magically gain human common sense is a fast way to extinction.
The superintelligent tool will care about suicide only if you program it to care about suicide. It will care about damage only if you program it to care about damage. -- If you only program it to care about answering correctly, it will answer correctly… and ignore suicide and damage as irrelevant.
If you ask your calculator how much is 2+2, the calculator answers 4 regardless of whether that answer will drive you to suicide or not. (In some contexts, it hypothetically could.) A superintelligent calculator will be able to answer more complex questions. But it will not magically start caring about things you did not program it to care about.
The “superintelligent tool” in the example you provided gave a blatantly incorrect answer by its own metric. If it counts suicide as a win, why did it say the disease would not be gotten rid of?
In the example, the “win” could be defined as an answer which is: a) technically correct, b) relatively cheap among the technically correct answers.
This is (in my imagination) something that builders of the system could consider reasonable, if either they didn’t consider Friendliness or they believed that a “tool AI” which “only gives answers” is automatically safe.
The computer gives an answer which is technically correct (albeit a self-fulfilling prophecy) and cheap (in dollars spent for cure). For the computer, this answer is a “win”. Not because of the suicide—that part is completely irrelevant. But because of the technical correctness and cheapness.
Neglecting the cost of the probable implements of suicide, and damage to the rest of the body, doesn’t seem like the sign of a well-optimized tool.
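A minimal sketch of the objection being made above, with entirely hypothetical names and numbers: if the system's scoring function only looks at correctness and dollar cost of the cure, then any harm to the user simply never enters the comparison, and the cheapest technically-correct answer wins.

```python
# Hypothetical illustration of a "tool AI" objective that scores answers
# only on (a) technical correctness and (b) cheapness of the proposed cure.
# Note that the 'harms_user' field exists in the data but is never read
# by the scoring function, so the optimizer is blind to it.

candidate_answers = [
    {"text": "expensive gene therapy",            "correct": True, "cost": 1_000_000, "harms_user": False},
    {"text": "cheap drug regimen",                "correct": True, "cost": 50_000,    "harms_user": False},
    {"text": "answer that drives user to suicide", "correct": True, "cost": 100,      "harms_user": True},
]

def score(answer):
    # Incorrect answers are excluded; among correct ones, cheaper is better.
    # Nothing here references 'harms_user'.
    if not answer["correct"]:
        return float("-inf")
    return -answer["cost"]

best = max(candidate_answers, key=score)
print(best["text"])  # -> "answer that drives user to suicide"
```

The point of the sketch is not that anyone would write this exact code, but that unless the harm term is explicitly part of the objective, optimizing harder only makes the system better at finding cheap, technically-correct answers, not at avoiding the ones we would consider disastrous.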