Can someone help me understand why a non-Friendly Question-Answerer is a bad idea?
A Question-Answerer is a system that [...] somehow computes the “answer to the question.” To analyze the difficulty of creating a Question-Answerer, suppose that we ask it the question “what ought we (or I) to do?” [...]
If it cannot answer this question, many of its answers are radically unsafe. Courses of action recommended by the Question-Answerer will likely be unsafe, insofar as “safety” relies on the definition of human value.
I understand that such an AI won’t be able to tell me if something is safe. But if it doesn’t have goals, it won’t try to persuade me that anything is safe. So this sounds like my daily life: There are tools I can use to find answers to some of my questions, but ultimately it is I who must decide whether something is safe or not. This AI doesn’t sound dangerous.
EDIT: Can someone give an example of a disaster involving such an AI?
It seems like the worst it could do is misunderstand your question and give you a recipe for gray goo when you really wanted a recipe for a cake. Bonus points if the gray goo recipe looks a lot like a cake recipe.
It seems to me that I often see people thinking about FAI assuming a best-case scenario where all intelligent people are Less Wrong users who see friendliness as paramount, and discarding solutions that don’t have above a 99.9% chance of succeeding. But really we want an entire stable of solutions, depending on how potential UFAI projects are going, right?
More bonus points if the recipe really generates a cake… which later with some probability turns into the gray goo.
Now you can have your cake and it will eat you too. :D
I don’t believe that a gray goo recipe can look like a cake recipe. I believe there are recipes for disastrously harmful things that look like recipes for desirable things; but is a goal-less Question-Answerer any more likely to produce such a deceptive recipe than a human working alone is to produce one by accident?
The problem of making the average user as prudent as a Less Wrong user seems much easier than FAI. Average users already know to take the results of Wolfram Alpha and Google with a grain of salt. People who work with synthetic organisms or nuclear radiation already know to take precautions when doing anything for the first time.
My point about the assumption that the entire world consists of Less Wrong users is that there are teams, made up of people who are not Less Wrong users, who will develop UFAI if we wait long enough. So a quick and slightly dirty plan (like building this sort of potentially dangerous Oracle AI) may beat a slow and perfect one.
Oh! I see. That makes sense.
The AI might find answers that satisfy the question but violate background assumptions we never thought to include, and we wouldn’t realize it until it was too late (if even then). An easy-to-imagine example that we wouldn’t fall for is a cure for cancer that succeeds by eradicating all cellular life. Of course, it’s more difficult to come up with one that we would fall for, but anything involving cognitive modifications would be a candidate.
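A minimal sketch of that failure mode, in Python (the plan names, numbers, and scoring function are invented purely for illustration, not anyone’s actual design): the answer-search scores candidate plans only against the literal question, so the unstated background assumption never enters the scoring and the top-ranked answer violates it.

# Toy "question-answerer" (hypothetical, for illustration only): it searches
# candidate plans and scores them solely against the literal question asked,
# with no representation of the asker's unstated background assumptions.

candidate_plans = [
    {"name": "targeted therapy", "cancer_cells_removed": 0.90, "healthy_cells_removed": 0.05},
    {"name": "total cellular eradication", "cancer_cells_removed": 1.00, "healthy_cells_removed": 1.00},
]

def literal_score(plan):
    # The question as literally posed: "which plan removes the most cancer cells?"
    return plan["cancer_cells_removed"]

def answer(plans, scorer):
    # Return the plan that best satisfies the literal question.
    return max(plans, key=scorer)

best = answer(candidate_plans, literal_score)
print(best["name"])  # prints "total cellular eradication"

# The background assumption ("and the patient's healthy cells survive") was
# never part of the scoring function, so the top-ranked answer violates it.

Nothing in the sketch is deceptive or goal-directed; the bad answer falls out of an honest search over an underspecified question.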
So, the reason we wouldn’t fall for that one is that the therapy wouldn’t pass the safety tests required by first-world governments. We have safety tests for all sorts of new technologies, with the stringency of the tests depending on the kind of technology: some testing for children’s toys, more testing for drugs, and hopefully even more testing for permanent cognitive enhancement. It seems like these tests should protect us from a Question-Answerer’s mistakes as much as they protect us from human ones.
An actual unfriendly AI seems scarier because it would actively try to pass our safety tests while still accomplishing its terminal goals. But a Question-Answerer designing something that passes all the tests and nevertheless causes a disaster seems about as likely as a well-intentioned but not completely competent human doing the same.
I guess I should have asked for a disaster involving a Question-Answerer that is more plausible than the same scenario with the AI replaced by a human.