Maybe this has been discussed before—if so, please just answer with a link.
Has anyone considered the possibility that the only friendly AI may be one that commits suicide?
There’s great diversity in human values, but all of them have in common that they take as given the limitations of Homo sapiens: in particular, the fact that each Homo sapiens has roughly equal physical and mental capacities to every other. We have developed diverse systems of rules for interpersonal behavior, but all of them are built for dealing with groups of people like ourselves. (For instance, ideas like reciprocity only make sense if the things we can do to other people are similar to the things they can do to us.)
The decision function of a lone, far more powerful AI would not have this quality. So it would be very different from all human decision functions or principles. Maybe this difference should cause us to call it immoral.
Do you ever have a day when you log on and it seems like everyone is “wrong on the Internet”? (For values of “everyone” equal to 3, on this occasion.) Robin Hanson and Katja Grace both have posts (on teenage angst, on population) where something just seems off, elusively wrong; and now SarahC suggests that “the only friendly AI may be one that commits suicide”. Something about this conjunction of opinions seems obscurely portentous to me. Maybe it’s just a know-thyself moment; there’s some nascent opinion of my own that’s going to crystallize in response.
Now that my special moment of sharing is out of the way… Sarah, is the friendly AI allowed to do just one act of good before it kills itself? Make a child smile, take a few pretty photos from orbit, save someone from dying, stop a war, invent cures for a few hundred diseases? I assume there is some integrity of internal logic behind this thought of yours, but it seems to be overlooking so much about reality that there has to be a significant cognitive disconnect at work here.
I’ve noticed I get this feeling relatively often from Overcoming Bias. I think it comes with the contrarian blogging territory.
I get it from OB also (which I have not followed for some time), and from many other places. For me it is the suspicion that I am looking at thought gone wrong.
I would call it “pet theory syndrome.” Someone comes up with a way of “explaining” things and then suddenly the whole world is seen through that particular lens rather than having a more nuanced view; nearly everything is reinterpreted. In Hanson’s case, the pet theories are near/far and status.
Prediction markets also.
Is anyone worried that LW might have similar issues? If so, what would be the relevant pet theories?
On a related note: suppose a community of moderately rational people had one member who was a lot more informed than them on some subject, but wrong about it. Isn’t it likely they might all end up wrong together? Prediction markets were the original subject, but the same could go for a much wider range of topics: Many Worlds, Hansonian Medicine, near/far, Cryonics...
That’s where the scientific method comes in handy, though quite a few of Hanson’s posts sound like pop psychology rather than a testable hypothesis.
I don’t get this impression from OB at all. The thoughts at OB, even when I disagree with them, are far more coherent than the sort of examples given as thought gone wrong. I’m also not sure it is easy to distinguish between “thought gone wrong” in the sense of being outright nonsense, as described in the linked essay, and genuinely good but highly technical thought. For example, I could write something like:
Noetherianess of a ring is forced by being Artinian, but the reverse does not hold. The dual nature is puzzling given that Noetherianess is a property which forces ideals to have a real impact on the structure in a way that seems more direct than that of Artin even though Artinian is a stronger condition. One must ask what causes the breakdown in symmetry between the descending and ascending chain conditions.
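To make the aside concrete for readers without the ring-theory background, here is a minimal sketch, in LaTeX, of the standard facts being gestured at (assuming rings with unit):
% Forward direction is the Hopkins–Levitzki theorem; the integers witness the failure of the converse.
\[
  R \ \text{Artinian} \;\Longrightarrow\; R \ \text{Noetherian},
\]
\[
  \text{while } \mathbb{Z} \ \text{is Noetherian but not Artinian:}\quad
  (2) \supsetneq (4) \supsetneq (8) \supsetneq \cdots
\]
% The descending chain of ideals above never stabilizes, so the descending chain condition fails for the integers.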
Now, what I wrote above isn’t nonsense. It is just poorly written, poorly explained math. But if you don’t have some background, it likely looks as bad as the passages quoted in the linked essay. Even when the writing is not as poor as that, one can easily find sections of LW conversations about, say, CEV or Bayesianism that look about as nonsensical to someone who doesn’t know the terms. So without extensive investigation I don’t think one can easily judge whether a given passage is nonsense or not, which makes the linked essay less than compelling. (In fact, having studied many of its examples, I can safely say that they really are nonsensical, but it isn’t clear to me how you could tell that from the short passages given, with their complete lack of context. Edit: and it could very well be that I just haven’t thought about them enough or approached them correctly, just as someone who is very bad at math might consider the subject to be nonsense even after careful examination.) It does, however, seem that some disciplines run into this problem far more often than others: philosophy and theology both seem to parade nonsensical strings of words together more often than most other areas. I suspect that this is connected to the lack of anything resembling an experimental method.
The thoughts at OB, even when I disagree with them, are far more coherent than the sort of examples given as thought gone wrong. I’m also not sure it is easy to distinguish between “thought gone wrong” in the sense of being outright nonsense, as described in the linked essay, and genuinely good but highly technical thought.
OB isn’t a technical blog though.
Having criticised it so harshly, I’d better back that up with evidence. Exhibit A: a highly detailed scenario of our far future, supported by not much, which in later postings to OB (just enter “dreamtime” into the OB search box) becomes part of the background assumptions, just as earlier OB speculations become part of the background assumptions of that posting. It’s like looking at the sky and drawing in constellations (the stars in this analogy being the snippets of scientific evidence adduced here and there).
That example seems to be more in the realm of “not very good thinking” than thought gone wrong. The thoughts are coherent, just not well justified. It isn’t like the sort of thing quoted in the linked essay, where “thought gone wrong” seems to mean something closer to “not even wrong, because it is incoherent.”
OK, OB certainly isn’t the sort of word salad that Stove is attacking, so that wasn’t a good comparison. But there does seem to me to be something systematically wrong with OB. There is the man-with-a-hammer thing, but I don’t have a problem with people having their hobbyhorses; I know I have some of my own. I’m more put off by the way that speculations get tacitly upgraded to background assumptions, the join-the-dots use of evidence, and all those “X is Y” titles.
Got a good summary of this? The author seems to be taking way too long to make his point.
“Most human thought has been various different kinds of nonsense that we mostly haven’t yet categorized or named.”
This paragraph, perhaps?
From an Enlightenment or Positivist point of view, which is Hume’s point of view, and mine, there is simply no avoiding the conclusion that the human race is mad. There are scarcely any human beings who do not have some lunatic beliefs or other to which they attach great importance. People are mostly sane enough, of course, in the affairs of common life: the getting of food, shelter, and so on. But the moment they attempt any depth or generality of thought, they go mad almost infallibly. The vast majority, of course, adopt the local religious madness, as naturally as they adopt the local dress. But the more powerful minds will, equally infallibly, fall into the worship of some intelligent and dangerous lunatic, such as Plato, or Augustine, or Comte, or Hegel, or Marx.
I think that should go in the next quotes thread.
Or perhaps the quotes thread from 12 months ago.
I’m not arguing for this position so much as saying that we need to address it. “Suicidal AI” is to the problem of constructing FAI as anarchism is to political theory; if you want to build something (an FAI, a good government), then on the philosophical level you have to at least take a stab at countering the argument that perhaps it is impossible to build it.
I’m working under the assumption that we don’t really know at this point what “Friendly” means, otherwise there wouldn’t be a problem to solve. We don’t yet know what we want the AI to do.
What we do know about morality is that human beings practice it. So all our moral laws and intuitions are designed, in particular, for small, mortal creatures, living among other small, mortal creatures.
Egalitarianism, for example, only makes sense if “all men are created equal” is more or less a statement of fact. What should an egalitarian human make of a powerful AI? Is it a tyrant? Well, no, a tyrant is a human who behaves as if he’s not equal to other humans; the AI simply isn’t equal. Well, then, is the AI a good citizen? No, not really, because citizens treat each other on an equal footing...
The trouble here, I think, is that all our notions of goodness are really “what is good for a human to do.” Perhaps you could extend them to “what is good for a Klingon to do”—but a lot of moral opinions are specifically about how to treat other people who are roughly equivalent to yourself. “Do unto others as you would have them do unto you.” The kind of rules you’d set for an AI would be fundamentally different from our rules for ourselves and each other.
It would be as if a human had a special, obsessive concern and care for an ant farm. You can protect the ants from dying. But there are lots of things you can’t do for the ants: be an ant’s friend, respect an ant, keep up your end of a bargain with an ant, treat an ant as a brother…
I had a friend once who said, “If God existed, I would be his enemy.” Couldn’t someone have the same sentiment about an AI?
(As always, I may very well be wrong on the Internet.)
You say, human values are made for agents of equal power; an AI would not be equal; so maybe the friendly thing to do is for it to delete itself. My question was, is it allowed to do just one or two positive things before it does this? I can also ask: if overwhelming power is the problem, can’t it just reduce itself to human scale? And when you think about all the things that go wrong in the world every day, it is obvious that there is plenty for a friendly superhuman agency to do. So the whole idea that the best thing it could do is delete itself or hobble itself looks extremely dubious. If your point was that we cannot hope to figure out what friendliness should actually be, and so we just shouldn’t make superhuman agents, that would make more sense.
The comparison to government makes sense in that the power of a mature AI is imagined to be more like that of a state than that of a human individual. It is likely that once an AI had arrived at a stable conception of purpose, it would produce many, many other agents, of varying capability and lifespan, for the implementation of that purpose in the world. There might still be a central super-AI, or its progeny might operate in a completely distributed fashion. But everything would still have been determined by the initial purpose. If it was a purpose that cared nothing for life as we know it, then these derived agencies might just pave the earth and build a new machine ecology. If it was a purpose that placed a value on humans being there and living a certain sort of life, then some of them would spread out among us and interact with us accordingly. You could think of it in cultural terms: the AI sphere would have a culture, a value system, governing its interactions with us. Because of the radical contingency of programmed values, that culture might leave us alone, it might prod our affairs into taking a different shape, or it might act to swiftly and decisively transform human nature. All of these outcomes would appear to be possibilities.
It seems unlikely that an FAI would commit suicide if humans need to be protected from UFAI, or if there are other threats that only an FAI could handle.