I’m not suggesting that the problems would come from what we normally think of as software bugs (though see the suggestion in this comment). I’m suggesting that they would come from a failure to specify the right things in a complex scenario—and that this problem bears enough similarities to software bugs that they could be a good test bed for working out how to approach such problems.
The flaws leading to an unexpectedly unfriendly AI certainly might lead back to a flaw in the design—but I think it is overly optimistic to think that the human mind (or a group of minds, or perhaps any mind) is capable of reliably creating specs that are sufficient to avoid this. We can and do spend tremendous time on this sort of thing already, and bad things still happen. You hold the shuttle up as an example of reliability done right (which it is), but it still blew up, because not all of shuttle design is software. In the same way, the issue could arise from some environmental factor that alters the AI in a way that makes it unpredictable—power fluctuations, a bit flip, who knows. The world is a horribly non-deterministic place, from a human POV.
By way of analogy—consider weather prediction. We have worked on it for all of history, we have satellites and supercomputers—and we are still only capable of accurate predictions for a few days or a week, getting less and less accurate as we go. This isn’t a case of making a mistake—it is a case of a very complex end-state arising from simple beginnings, and lacking the ability to make perfectly accurate predictions about some things. To put it another way—it may simply be that the problem is not computable, now or with any foreseeable technology.
I’m not sure quite what point you’re trying to make:
If you’re arguing that with the best attempt in the world it might be we still get it wrong, I agree.
If you’re arguing that greater diligence and better techniques won’t increase our chances, I disagree.
If you’re arguing something else, I’ve missed the point.
Fair question.
My point is that if improving techniques could take you from (arbitrarily chosen percentages here) a 50% chance that an unfriendly AI would cause an existential crisis to a 25% chance that it would—you really didn’t gain all that much, and the wiser course of action is still not to make the AI.
The actual percentages are wildly debatable, of course, but I would say that if you think there is any chance—no matter how small—of triggering ye olde existential crisis, you don’t do it—and I do not believe that technique alone could get us anywhere close to that.
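To make the shape of that argument concrete, here is a toy expected-value sketch in Python. Every payoff number and probability in it is a purely illustrative assumption, not a claim about the real values:

```python
# Toy illustration of the argument above: even halving the chance of disaster
# can leave "build the AI" looking far worse than the status quo, if the
# downside is large enough. All numbers are made up for illustration only.
VALUE_STATUS_QUO = 1.0      # slow, steady improvement
VALUE_FRIENDLY_AI = 100.0   # assumed large upside if the AI turns out friendly
VALUE_EXTINCTION = -1e6     # assumed catastrophic downside if it does not

def expected_value(p_unfriendly):
    return p_unfriendly * VALUE_EXTINCTION + (1 - p_unfriendly) * VALUE_FRIENDLY_AI

print("don't build:          ", VALUE_STATUS_QUO)
print("build, 50% unfriendly:", expected_value(0.50))
print("build, 25% unfriendly:", expected_value(0.25))
```

Under these made-up numbers, the better technique halves the expected loss but still leaves it far below the status quo, which is the sense in which halving the risk doesn’t gain you all that much.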
The ideas you propose in OP seem wise, and good for society—and wholly ineffective in actually stopping us from creating an unfriendly AI. The reason is simply that the complexity defies analysis, at least by human beings. The fear is that the unfriendliness arises from unintended design consequences—from unanticipated system effects rather than bugs in code or faulty intent.
It’s a consequence of entropy—there are simply far, far more ways for something to get screwed up than for it to be right. So unexpected effects arising from complexity are far, far more likely to cause issues than to be beneficial unless you can somehow correct for them—planning ahead will only get you so far.
Your OP suggests that we might be more successful if we got more of it right “the first time”. But—things this complex are not created, finished, de novo—they are an iterative, evolutionary task. The training could well be helpful, but I suspect not for the reasons you suggested. The real trick is to design things so that when they go wrong, the system still works correctly. You have to plan for and expect failure, or that inevitable failure is the end of the line.
That’s not the choice we are making. The choice we are making is to decide to develop those techniques.
The techniques are useful, in and of themselves, without having to think about utility in creating a friendly AI.
So, yes, by all means, work on better skills.
But—the point I’m trying to make is that while they may help, they are insufficient to provide any real degree of confidence in preventing the creation of an unfriendly AI, because the emergent effects that would likely be responsible are not amenable to being planned for ahead of time.
It seems to me your original proposal is the logical equivalent of “Hey, if we can figure out how to better predict where lightning strikes, we could go there ahead of time and be ready to stop the fires quickly, before they spread.” Well, sure—except that sort of prediction would depend on knowing ahead of time the outcome of very unpredictable events (“where, exactly, will the lightning strike?”), and it would be far more practical to spend the time and effort on things like lightning rods and firebreaks.
Basically you attack a strawman.
Unfortunately I don’t think anybody has proposed an idea of how to solve FAI that’s as straightforward as building lightning rods.
In computer security there is the idea of “defense in depth”. You try to get every layer right and as secure as possible.
Strawman?
“… idea for an indirect strategy to increase the likelihood of society acquiring robustly safe and beneficial AI” is what you said. I said preventing the creation of an unfriendly AI.
Ok, valid point. Not the same.
I would say the items described will do nothing whatsoever to “increase the likelihood of society acquiring robustly safe and beneficial AI.”
They are certainly of value in normal software development. But as time passes without a proper general AI actually being created, it seems increasingly likely that such a task is far, far more difficult than anyone expected, and that if one does come into being, it will happen in a manner other than the typical software development process as we do things today. It will be an incremental process of change and refinement seeking a goal, is my guess. A great starting point might presumably reduce the iterations a bit, but other than a head start toward the finish line, I cannot imagine it would affect the course much.
If we drop single cell organisms on a terraformed planet, and come back a hundred million years or so—we might well expect to find higher life forms evolved from it, but finding human beings is basically not gonna happen. If we repeat that—same general outcome (higher life forms), but wildly differing specifics. The initial state of the system ends up being largely unimportant—what matters is evolution, the ability to reproduce, mutate and adapt. Direction during that process could well guide it—but the exact configuration of the initial state (the exact type of organisms we used as a seed) is largely irrelevant.
Re: computer security—I actually do that for a living. Small security rant—my apologies:
You do not actually try to get every layer “as right and secure as possible.” The whole point of defense in depth is that any given security measure can fail, so to ensure protection, you use multiple layers of different technologies so that when (not if) one layer fails, the other layers are there to “take up the slack”, so to speak.
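A back-of-the-envelope way to see why layering helps is to multiply the per-layer failure probabilities, under the (unrealistic) assumption that layers fail independently. A quick Python sketch with made-up numbers:

```python
# Rough sketch: if the layers failed independently, an attacker would have to
# get past all of them, so the combined failure probability is the product of
# the per-layer failure probabilities. Real layers are correlated, so treat
# this as an optimistic bound rather than a guarantee.
layer_failure_probs = [0.10, 0.20, 0.05]  # made-up odds that each layer is bypassed

combined = 1.0
for p in layer_failure_probs:
    combined *= p

print(f"Chance every layer fails at once: {combined:.4%}")  # 0.1000% for these numbers
```

Three individually imperfect layers end up far stronger together.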
The goal on each layer is not “as secure as possible”, but simply “as secure as reasonable” (you seek a “sweet spot” that balances security and other factors like cost), and you rely on the whole to achieve the goal. Considerations include cost to implement and maintain, the value of what you are protecting, the damage caused should security fail, who your likely attackers will be and their technical capabilities, performance impact, customer impact, and many other factors.
Additionally, security costs at a given layer do not increase linearly, so making a given layer more secure, while often possible, quickly becomes inefficient. Example—most websites use a 2048-bit (2k) SSL key; 4k is more secure, and 8k is even more so. Except 8k doesn’t work everywhere, and the bigger keys come with a performance impact that matters at scale—and the key size is usually not the reason a key is compromised. So the entire world (for the most part) does not use the most secure option, simply because it’s not worth it—the additional security is swamped by the drawbacks. (Similar issues occur regarding cipher choice, fwiw.)
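For a concrete feel for the performance side of that tradeoff, here is a rough benchmark sketch using the third-party Python `cryptography` package (key sizes, message, and iteration count are arbitrary choices for illustration; real numbers depend on hardware and TLS configuration):

```python
# Rough sketch: time RSA signing (the expensive private-key step a server
# performs during a TLS handshake) at different key sizes.
# Requires the third-party 'cryptography' package.
import time

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

def avg_sign_time(key_size: int, iterations: int = 100) -> float:
    key = rsa.generate_private_key(public_exponent=65537, key_size=key_size)
    message = b"example handshake payload"
    start = time.perf_counter()
    for _ in range(iterations):
        key.sign(message, padding.PKCS1v15(), hashes.SHA256())
    return (time.perf_counter() - start) / iterations

for bits in (2048, 4096, 8192):
    print(f"RSA-{bits}: {avg_sign_time(bits) * 1000:.2f} ms per signature")
```

On typical hardware the per-signature cost climbs steeply with key size, which is the performance impact at scale described above.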
In reality—in nearly all situations, human beings are the weak link. You can have awesome security, and all it takes is one bozo and it all comes down. SSL is great, until someone manages to get a key signed fraudulently and bypasses it entirely. Packet filtering is dandy, except that Fred in accounting wanted to play Minecraft and opened up an SSH tunnel, incorrectly. MFA is fine, except the secretary who logged into the VPN using MFA just plugged the thumb drive they found in the parking lot into their PC and actually ran “Elf Bowling”, and now your AD is owned and the attacker is escalating privilege from the inside. So it doesn’t matter much how hard your candy shell is; he’s in the soft, chewy center. THIS, by the way, is where things like education are of the most value—not in making the very skilled more skilled, but in making the clueless somewhat more clueful. If you want to make a friendly AI—remove human beings from the loop as much as possible...
Ok, done with rant. Again, sorry—I live this 40-60 hours a week.
I disagree that “you really didn’t gain all that much” in your example. There are possible numbers such that it’s better to avoid producing AI, but (a) that may not be a lever which is available to us, and (b) AI done right would probably represent an existential eucatastrophe, greatly improving our ability to avoid or deal with future threats.
I have an intellectual issue with using “probably” before an event that has never happened before, in the history of the universe (so far as I can tell).
And—if I am given the choice between slow, steady improvement in the lot of humanity (which seems to be the status quo), and a dice throw that results in either paradise or extinction—I’ll stick with slow and steady, thanks, unless the odds were overwhelmingly positive. And I suspect the odds are overwhelming, but in the opposite direction, because there are far more ways to screw up than to succeed, and once the AI is out—you no longer have a chance to change it much. I’d prefer to wait it out, slowly refining things, until paradise is assured.
Hmm. That actually brings a thought to mind. If an unfriendly AI was far more likely than a friendly one (as I have just been suggesting) - why aren’t we made of computronium? I can think of a few reasons, with no real way to decide. The scary one is “maybe we are, and this evolution thing is the unfriendly part...”