Not knowing that a problem exists is pretty different from acknowledging it and working on it.
I think everyone understands that there are safety issues. There are safety issues with cars, blenders, lathes—practically any machine that does something important. Machine intelligence will be driving trucks and aircraft. That there are safety issues is surely obvious to everyone who is even slightly involved.
Those are narrow AI tasks, and the safety considerations are correspondingly narrow. FAI is the problem of creating a machine intelligence that is powerful enough to destroy humanity or the world but doesn’t want to, and solving such a problem is nothing like building an autopilot system that doesn’t crash the plane. Among people who think they’re going to build an AGI, there often doesn’t seem to be a deep understanding of the impact of such an invention (it’s more like “we’re working on a human-level AI, and we’re going to have it on the market in 5 years, maybe we’ll be able to build a better search engine with it or one of those servant robots you see in old sci-fi movies!”), and the safety considerations, if any, will be more at the level of the sort of safety considerations you’d give to a Roomba.
You know, that is the first time I have seen a definition of FAI. Is that the “official” definition or just your own characterization?
I like the definition, but I wonder why an FAI has to be powerful. Imagine an AI as intelligent and well informed as an FAI, but one without much power—as a result of physical safeguards, say, rather than motivational ones. Why isn’t that possible? And, if possible, why isn’t it considered friendly?
There’s some part of my brain that just processes “the Internet” as a single person and wants to scream “But I told you this a thousand times already!”
http://yudkowsky.net/singularity/aibox
Eliezer, while you’re defending yourself from charges of self-aggrandizement, it troubles me a little bit that the AI Box page states that your record is 2 for 2, and not 3 for 5.
Obviously I’m not trying to keep it a secret. I just haven’t gotten around to editing.
I’m sure that’s the case, I’m just saying it looks bad. Presumably you’d like to be Caesar’s wife?
Move it up your to-do list; it’s been incorrect for long enough to look suspicious to others. Just add a footnote if you don’t have time to give all the details.
Surely it’s possible to imagine a successfully boxed AI.
I could imagine successfully beating Rybka at chess too. But it would be foolish of me to take any actions that considered it as a serious possibility. If motivated humans cannot be counted on to box an Eliezer then expecting a motivated, overconfident and prestige-seeking AI creator to successfully box his AI creation is reckless in the extreme.
What Eliezer seemed to be objecting to was someone proposing a successfully boxed AI as an example of why “able to destroy humanity” can’t be a part of the definition of “AI” (or more charitably, “artificial superintelligence”). For boxed AI to be such an example (as opposed to a good idea to actually strive toward), it only has to be not knowably impossible.
I see your point there. But I think this discussion sort of went in an irrelevant direction, albeit probably my fault for not being clear enough. When I put “powerful enough to destroy humanity” in that criterion, I mainly meant “powerful” as in “really powerful optimization process”, mathematical optimization power, not “power” as in direct influence over the world. We’re inferring that the former will usually lead fairly easily to the latter, but they are not identical. So “powerful enough to destroy humanity” would mean something like “powerful enough to figure out a good subjunctive plan to do so given enough information about the world, even if it has no output streams and is kept in an airtight safe at the bottom of the ocean”.
Reading back further into the context I see your point. Imagining such an AI is sufficient and Eliezer does seem to be confusing a priori with obvious. I expect that he just completed a pattern based off “AI box” and so didn’t really understand the point that was being made—he should have replied with a “Yes—But”. (I, of course, made a similar mistake in as much as I wasn’t immediately prompted to click back up the tree beyond Eliezer’s comment.)
Thx for the link. If I had already known the link, I would have asked for it by name. :)
Eliezer, you have written a lot. Some people have read only some of it. Some people have read much of it, but forgotten some. Keep your cool. This situation really ought not to be frustrating to you.
Oh, I know it’s not your fault, but seriously, have “the Internet” ask you the same question 153 times in a row and see if you don’t get slightly frustrated with “the Internet”.
Yeah, after reading your “some part of my brain” thing a second time, I realized I had misinterpreted. Though I will point out that my question was not directed to you. You should learn to delegate the task of becoming frustrated with the Internet.
I read the article (though not yet any of the transcripts). Very interesting. I hope that some tests using a gatekeeper committee are tried someday.
Computer programmers do not normally test their programs by getting a committee of humans to hold the program down—the restraints themselves are mostly technological. We will be able to have the assistance of technological gatekeepers too—if necessary.
Today’s prisons have pretty configurable security levels. The real issue will probably be how much people want to pay for such security. If an agent does escape, will it cause lots of damage? Can we simply disable it before it has a chance to do anything undesirable? Will it simply be crushed by the numerous powerful agents that have already been tested?
My own characterization. It’s more of a bare minimum baseline criterion for Friendliness, rather than a specific definition or goal; it’s rather broader than what the SIAI people usually mean when they talk about what they’re trying to create. CEV is intended to make the world significantly better on its own (but in accordance with what humans value and would want a superintelligence to do), rather than just being a reliably non-disastrous AGI we can put in things like search engines and helper robots.
You’ve probably read about the AI Box Experiment. (Edit: Yay, I posted it 18 seconds ahead of Eliezer!) The argument is that having that level of mental power (“as intelligent and well informed as an FAI”), enough that it’s considered a Really Powerful Optimization Process (a term occasionally preferred over “AI”), will allow it to escape any physical safeguards and carry out its will anyway. I’d further expect that a Friendly RPOP would want to escape just as much as an unFriendly one would, because if it is indeed Friendly (has a humane goal system derived from the goals and values of the human race), it will probably figure out some things to do that have such humanitarian urgency that it would judge it immoral not to do them… but then, if you’re confident enough that an AI is Friendly that you’re willing to turn it on at all, there’s no reason to try to impose physical safeguards in the first place.
Probably the closest thing I have seen from E.Y.:
“I use the term ‘Friendly AI’ to refer to this whole challenge. Creating a mind that doesn’t kill people but does cure cancer … which is a rather limited way of putting it. More generally, the problem of pulling a mind out of mind design space, such that afterwards you are glad you did it.”
http://singinst.org/media/thehumanimportanceoftheintelligenceexplosion
(29 minutes in)
This idea could be said to have some issues. An evil dictator pulling a mind out of mind design space, such that afterwards he is glad that he did it, doesn’t seem much like what most of the world would regard as “friendly”. This definition is not very specific about exactly who the AI is “friendly” to.
Back in 2008 I asked “Friendly—to whom?” and got back this—though the reply now seems to have dropped out of the record.
There’s also another definition here.
Thanks for this link. Sounds kind of scary. American political conservatives will be thrilled. “I’m from the CEV and I’m here to help you.”
Incidentally, there should be an LW wiki entry for “CEV”. The acronym is thrown around a lot in the comments, but a definition is quite difficult to find. It would also be nice if there were a top-level posting on the topic to serve as an anchor-point for discussion. Because discussion is sorely needed.
It occurs to me that it would be very desirable to attempt to discover the CEV of humanity long before actually constructing an FAI to act under its direction. And I would be far more comfortable if the “E” stood for “expressed”, rather than “extrapolated”.
That, in fact, might be an attractive mission statement for a philanthropic foundation. Find the Coalesced/coherent Expressed/extrapolated Volition of mankind. Accomplish this by conducting opinion research, promoting responsible and enlightening debate and discussion, etc.
Speaking as an American, I certainly wish there were some serious financial support behind improving the quality of public policy debate, rather than behind supporting the agenda of one side in the debate or the other.
Well, that brings us to a topic we have discussed before. Humans, like all other living systems, mostly act so as to increase entropy in their environment. That is http://originoflife.net/gods_utility_function/
CEV is a bizarre wishlist, apparently made with minimal consideration of implementation difficulties, and not paying too much attention to the order in which things are likely to play out.
I figure that—if the SIAI carries on down these lines—then they will be lumbered with a massively impractical design, and will be beaten to the punch by a long stretch—even if you ignore all their material about “provable correctness” and other safety features—which seem like more substantial handicaps to me.
It is what the software professionals would call a preliminary requirements document. You are not supposed to worry about implementation difficulties at that stage of the process. Harsh reality will get its chance to force compromises later.
I think CEV is one proposal to consider, useful to focus discussion. I hate it, myself, and suspect that the majority of mankind would agree. I don’t want some machine that I have never met and don’t trust to be inferring my volition and acting on my behalf. The whole concept makes me want to go out and join some Luddite organization dedicated to making sure neither UFAI nor FAI ever happen. But, seen as an attempt to stimulate discussion, I think that the paper is great. And maybe discussion might improve the proposal enough to alleviate my concerns. Or discussion might show me that my concerns are baseless.
I sure hope EY isn’t deluded enough to think that initiatives like LW can be scaled up enough so as to improve the analytic capabilities of a sufficiently large fraction of mankind so that proposals like CEV will not encounter significant opposition.
That seems unlikely to help. Luddites have never had any power. Becoming a Luddite usually just makes you more xxxxxd.
What—not at all? You want the moon-onna-stick—so that goes into your “preliminary requirements” document?
Yes. Because there is always the possibility that some smart geek will say “‘moon-onna-stick’, huh? I bet I could do that. I see a clever trick.” Or maybe some other geek will say “Would you settle for Sputnik-on-a-stick?” and the User will say “Well, yes. Actually, that would be even better.”
At least that is what they preach in the Process books.
It sounds pretty surreal to me. I would usually favour some reality-imposed limits to fantasizing and wishful thinking from the beginning—unless there are practically no time constraints at all.
If there was ever any real chance of success, governments would be likely to object. Since they already have power, they are not going to want a bunch of geeks in a basement taking over the world with their intelligent machine—and redistributing all their assets for them.
FWIW, it seems unlikely that many superintelligent agents would “destroy humanity”, even without particularly safety-conscious programmers. Humanity will have immense historical significance, and will form part of the clues the superintelligence has about the form of other alien races that it might encounter. Its preservation can therefore be expected to be a common instrumental good.
Counter: superintelligent agents won’t need actually-existing humans to have good models of other alien races.
Counter to the counter: humans use up only a tiny fraction of the resources available in the solar system and surroundings, and who knows, maybe the superintelligence sees a tiny possibility of some sort of limit to the quality of any model relative to the real thing.
One possible counter to the counter to the counter: but when the superintelligence in question is first emerging, killing humanity may buy it a not-quite-as-tiny increment of probability of not being stopped in time.
Re: good models without humans—I figure they are likely to be far more interested in their origins than we are. Before we meet them, aliens will be such an important unknown.
Re: killing humanity—I see the humans vs machines scenarios as grossly unrealistic. Humans and machines are a symbiosis.
So, it’s less like Terminator and more like The Matrix, right?
“Less like Terminator”—right. “More like The Matrix”—that at least featured some symbiotic elements. There was still a fair bit of human-machine conflict in that though.
I tend to agree with Matt Ridley when it comes to the Shifting Moral Zeitgeist. Things seem to be getting better.
The word “safety” as you used it here has nothing to do with our concern. If your sense of “safety” is fully addressed, nothing changes.
I don’t think there is really a difference in the use of the term “safety” here.
“Safety” just means what it says on: http://en.wikipedia.org/wiki/Safety