Shane: If somebody is going to set off a super intelligent machine I’d rather it was a machine that will only probably kill us, rather than a machine that almost certainly will kill us because issues of safety haven’t even been considered. If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that’s likely to be the one that matters.
If you have a plan for which you know that it has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It’s “provably” safe, with 1% chance. It should be deployed in case of 99.9%-probable impending doom. If I knew that, given that I do nothing, there will be a positive singularity, that would qualify as a provably Friendly plan, and doing nothing is what I would need to do, instead of thinking about AGI all day. We don’t need a theory of FAI for the theory’s sake; we need it to produce a certain outcome, to know that our actions lead where we want them to lead. If there is any wacky plan of action that leads there, it should be taken. If we figure out that building superintelligent lobster clusters will produce a positive singularity, lobsters it is. Some of the incredulous remarks about the FAI path center on how inefficient it is: “Why do you enforce these silly restrictions on yourself, tying your hands, when you can instead do Z and get there faster/more plausibly/anyway?” Why do you believe what you believe? Why do you believe that Z has any chance of success? How do you know it’s not just wishful thinking?
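(A minimal sketch of the arithmetic behind “deploy it in case of 99.9%-probable impending doom”, using only the probabilities mentioned above; the variable names are purely illustrative:)

    # Survival probability if we do nothing, given 99.9%-probable impending doom.
    p_survive_default = 1 - 0.999        # 0.001
    # Survival probability if we deploy a plan known to have at least a 1% chance of success.
    p_survive_plan = 0.01
    # Deploying is the better bet exactly because 0.01 > 0.001,
    # even though the plan is "provably" safe only with 1% probability.
    print(p_survive_plan > p_survive_default)    # True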
You can’t get FAI by hacking an AGI design at the last minute, by performing “safety measures” or bolting on a “Friendliness module”, and you shouldn’t expect FAI to just happen merely because you intuitively believe there is a good chance of it happening. Even if “issues of safety are considered”, you still almost certainly die. The target is too small. It’s not obvious that the target is so small, and it’s not obvious that you can’t cross this evidential gap by mere gut feeling, that you need stronger support and a better, more technical understanding of the problem to have even a 1% chance of winning. If you do the best you can on that first AGI, if you “maximize” the chance of getting FAI out of it, you still lose. Nature doesn’t care whether you “maximized your chances” or leapt into the abyss blindly; it kills you just the same. Maximizing chances of success is a ritual of cognition that doesn’t matter if it doesn’t make you win. This doesn’t mean that you must write a million lines of FAI code; it is a question of understanding. Maybe there is a very simple solution, but you need to understand it to find its implementation. You can write down the winning combination of a lottery in five seconds, but you can’t expect to guess it correctly. If you discovered the first 100 bits of a 150-bit key, you can’t argue that you’ll be able to find 10 more bits at the last minute to maximize your chances of success; they are useless unless you also find the remaining 40.
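(To make the key analogy concrete, a small sketch of the search-space arithmetic, using only the bit counts from the paragraph above:)

    # A 150-bit key with the first 100 bits known leaves 2**50 candidates.
    candidates_100_known = 2 ** (150 - 100)
    # Finding 10 more bits "at the last minute" still leaves 2**40 candidates.
    candidates_110_known = 2 ** (150 - 110)
    # Chance of guessing the rest correctly on a single try, in each case:
    print(1 / candidates_100_known)    # ~8.9e-16
    print(1 / candidates_110_known)    # ~9.1e-13
    # Negligible either way: the extra 10 bits buy you nothing
    # unless you also find the remaining 40.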
Provability is not about setting a standard that is too high; it is about knowing what you are doing, at all. Finding a nontrivial solution that knowably has a 1% chance of being correct is a very strange situation; much more likely you’ll be able to become pretty sure, say >99% confident, that the solution is correct, a figure which real-world black swans will cut down to something lower, but closer to 99% than to 1%. This translates as “provably correct”, but given the absence of a mathematical formulation of the problem in the first place, at best it’s “almost certainly correct”. Proving that the algorithm itself, within the formal rules of evaluation on reliable hardware, does what you intended is the part where you need to preserve your chances of success across the huge number of steps performed by the AI. If your AI isn’t stable, if it wanders back and forth and forgets the target you set at the start after a trillion steps, your solution isn’t good for anything.
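(One way to read the “>99%, cut down by black swans” point as arithmetic; the 95% figure below is an illustrative assumption, not a number from the text:)

    # Confidence that the solution is correct, assuming the problem formalization holds.
    p_correct_given_formalization = 0.99
    # Illustrative (assumed) probability that the formalization and other
    # background assumptions themselves hold, i.e. no black swan.
    p_formalization_holds = 0.95
    # The overall confidence is the product: lower than 99%,
    # but still far closer to 99% than to 1%.
    print(p_correct_given_formalization * p_formalization_holds)    # ~0.94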
You can see how small the target is from the complexity of human morality, which judges the solution. It specifies an unnatural category that won’t just spontaneously appear in the mind of the AI, much less become its target. If you miss something, your AI will at best start as a killer jinni that doesn’t really understand what you want of it and thus can’t be allowed to function freely, and if the restrictions you placed on it are even a tiny bit imperfect (which they will be), it will just break loose and destroy everything.
[P.S. I had to repost this; the original version had more links but was stopped by the filter.]
This should probably go on a FAI FAQ, especially this bit:
If you have a plan for which you know that it has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still).
The italics on “know” and the following “(maybe not a very good one, but still)” are meant to stress that “maybe it’ll work, dunno” is not an intended interpretation.
It’s an effective response, one I don’t usually see, to talk like “But why not work on a maybe-Friendly AI, it’s better than nothing”.
It’s a generally useful insight that even if we can employ a mathematical proof, we only ever have a “Proven Friendly AI with N% confidence” for some N, and so a well-considered 1% FAI is still a FAI, since the default is “?”. Generally useful as in: that insight applies to practically everything else.