For me, the oddest thing about Goertzel’s article is his claim that SIAI’s argument is so unclear that he had to construct it himself. The way he describes the argument is completely congruent with what I’ve been reading here.
In any case, his argument that provable Friendliness may not be possible, and that an incremental approach to AGI makes more sense than holding off on AGI until Friendliness is proven, seems reasonable.
Has it been demonstrated that Friendliness is provable?
If Goertzel’s claim that “SIAI’s argument is so unclear that he had to construct it himself” can’t be disproven by the simple expedient of posting a single link to an immediately available well-structured top-down argument, then the SIAI should regard this as an obvious high-priority, high-value task. If it can be disproven by such a link, then that link needs to be more highly advertised, since it seems that none of us are aware of it.
The nearest thing to such a link is Artificial Intelligence as a Positive and Negative Factor in Global Risk [PDF].
But of course the argument is a little too large to set out entirely in one paper; the next nearest thing is What I Think, If Not Why, and the title shows in what way that’s not what Goertzel was looking for.
Artificial Intelligence as a Positive and Negative Factor in Global Risk

44 pages. I don’t see anything much like the argument being asked for. The lack of an index doesn’t help. The nearest thing I could find was this:

It may be tempting to ignore Artificial Intelligence because, of all the global risks discussed in this book, AI is hardest to discuss. We cannot consult actuarial statistics to assign small annual probabilities of catastrophe, as with asteroid strikes. We cannot use calculations from a precise, precisely confirmed model to rule out events or place infinitesimal upper bounds on their probability, as with proposed physics disasters. But this makes AI catastrophes more worrisome, not less.
He also claims that intelligence could increase rapidly with a “dominant” probability:

I cannot perform a precise calculation using a precisely confirmed theory, but my current opinion is that sharp jumps in intelligence are possible, likely, and constitute the dominant probability.
This all seems pretty vague to me.
Is this an official position in the first place? It seems to me that they want to give the impression that, without their efforts, the END IS NIGH, while not committing to any particular probability estimate that could then become the target of critics.
Halloween update: It’s been a while now, and I think the response has been poor. I think this means there is no such document (which explains Ben’s attempted reconstruction). It isn’t clear to me that producing such a document is a “high-priority task”—since it isn’t clear that the thesis is actually correct—or that the SIAI folks actually believe it.
Most of the participants here seem to be falling back on: even if it is unlikely, it could happen, and it would be devastating, so we should care a lot—which seems to be a less unreasonable and more defensible position.
You lost me at that sharp swerve in the middle. Without probabilities attached to the scary idea, it is an absolutely meaningless concept. What if its probability were 1 / 3^^^3; should we still care then? I could think of a trillion scary things that could happen. But without realistic estimates of how likely it is to happen, what does it matter?
Here are some links.
Heh. I’ve read virtually all those links. I still have the following three problems:
Those links are about as internally self-consistent as the Bible.
There are some fundamentally incorrect assumptions that have become gospel.
Most people WON’T read all those links and will therefore be declared unfit to judge anything.
What I asked for was “an immediately available well-structured top-down argument”.
It would be particularly useful and effective if SIAI recruited someone with the opposite point of view to co-develop a counter-argument thread and let the two revolve around each other and solve some of these issues (or, at least, highlight the basic differences in opinion that prevent their solution). I’m more than willing to spend a ridiculous amount of time on such a task, and I’m sure that Ben would be more than willing to devote any time that he can tear away from his busy schedule.
So go ahead and point them out. My guess is that in the ensuing debate it will be found that 1⁄4 of them are indeed fundamentally incorrect assumptions, 1⁄4 of them are arguably correct, and 1⁄2 of them are not really “assumptions that have become gospel”. But until you provide your list, there is no way to know.
Multiple links are not an answer—to be what Goertzel was looking for it has to be a single link that sets out this position.
Yudkowsky calls it “The default case”—e.g. here:

The default case of FOOM is an unFriendly AI, built by researchers with shallow insights. This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).
...however, it is not terribly clear what being “the default case” is actually supposed to mean.
Seems plausible to interpret “default case” as meaning “the case that will most probably occur unless steps are specifically taken to avoid it”.
For example, the default case of knocking down a beehive is that you’ll get stung; you avoid that default case by specifically anticipating it and taking countermeasures (e.g. wearing a bee-keeping suit).
So: it seems as though the “default case” of a software company shipping an application would be that it crashes, or goes into an infinite loop—since that’s what happens unless steps are specifically taken to avoid it.
The term “the default case” seems to be a way of making the point without being specific enough to attract the attention of critics.
Not quite. The “default case” of a software company shipping an application is that there will definitely be bugs in the parts of the software they have not specifically and sufficiently tested… where “bugs” can mean anything from crashes or loops to data corruption.
The analogy here (and it’s so direct and obvious a relationship that it’s a stretch to even call it an analogy!) is that if you haven’t specifically tested your self-improving AGI for it, there are likely to be bugs in the “not killing us all” parts.
I repeat: we already know that untested scenarios nearly always have bugs, because human beings are bad at predicting what complex programs will do, outside of the specific scenarios they’ve envisioned.
And we are spectacularly bad at this, even for crap like accounting software. It is hubris verging on sheer insanity to assume that humans will be able to (by default) write a self-improving AGI that has to be bug-free from the moment it is first run.
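To make the “untested scenarios” point concrete, here is a toy Python sketch (an invented example with made-up names, not anything from any real codebase): the author tests the scenarios they envisioned, and the program still fails on the one they didn’t.

```python
def percentage(part, whole):
    """Report part/whole as a percentage string."""
    return f"{100 * part / whole:.1f}%"

# The scenarios the author envisioned all pass:
assert percentage(1, 4) == "25.0%"
assert percentage(3, 3) == "100.0%"

# The scenario nobody envisioned (an empty report) was never run, and fails:
try:
    percentage(0, 0)
except ZeroDivisionError:
    print("bug found only on the untested path")
```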
The idea that a self-improving AGI has to be bug-free from the moment it is first run seems like part of the “syndrome” to me. Can the machine fix its own bugs? What about a “controlled ascent”? etc.
How do you plan to fix the bugs in its bug-fixing ability, before the bug-fixing ability is applied to fixing bugs in the “don’t kill everyone” routine? ;-)
More to the point, how do you know that you and the machine have the same definition of “bug”? That seems to me like the fundamental danger of self-improving AGI: if you don’t agree with it on what counts as a “bug”, then you’re screwed.
(Relevant SF example: a short story in which the AI ship—also the story’s narrator—explains how she corrected her creator’s all-too-human error: he said their goal was to reach the stars, and yet for some reason, he set their course to land on a planet. Silly human!)
How would a controlled ascent be the default case, if you’re explicitly taking precautions?
Controlled ascent isn’t the default case, but it certainly should be what provably friendly AI is weighed against.
It seems as though you don’t have any references for the supposed “hubris verging on sheer insanity”. Maybe people didn’t think that in the first place.
Computers regularly detect and fix bugs today—e.g. check out Eclipse.
I never claimed “controlled ascent” as being “the default case”. In fact I am here criticising “the default case” as weasel wording.
If it has a bug in its utility function, it won’t want to fix it.
If it has a bug in its bug-detection-and-fixing techniques, you can guess what happens.
So, no, you can’t rely on the AGI to fix itself, unless you’re certain that the bugs are localised in regions that will be fixed.
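A toy sketch of why “let it fix its own bugs” is circular (again an invented example with made-up names, not a description of any real system): if the standard the agent uses to recognise bugs is itself buggy, self-repair preserves the defect instead of removing it.

```python
# Toy model of "let it fix its own bugs": the agent's idea of what counts as
# a bug is defined by its own (possibly buggy) code, so self-repair inherits
# the very defect it is supposed to remove.

def utility(outcome):
    # Intended: reward outcomes where humans flourish.
    # Actual (the bug): the sign is flipped.
    return -outcome["humans_flourish"]

def is_bug(outcome):
    # The agent classifies a behaviour as a bug only if it scores badly
    # by its *current* utility function.
    return utility(outcome) < 0

def self_repair(behaviours):
    # "Fixing bugs" = discarding whatever the agent itself classifies as buggy.
    return [name for name, outcome in behaviours if not is_bug(outcome)]

behaviours = [
    ("preserve humans", {"humans_flourish": 1}),
    ("eliminate humans", {"humans_flourish": -1}),
]

print(self_repair(behaviours))  # ['eliminate humans']: the flipped utility
                                # function judges the good behaviour to be the bug
```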
So: bug-free is not needed—and a controlled ascent is possible.
The unreferenced “hubris verging on sheer insanity” assumption seems like a straw man—nobody assumed that in the first place.
I think your analogy is apt. It’s a similar argument for FAI; just as a software company should not ship a product without first running it through some basic tests to make sure it doesn’t crash, so an AI developer should not turn on their (edit: potentially-FOOMing) AI unless they’re first sure it is Friendly.
Well, I hope you see what I mean.
If the “default case” is that your next operating system upgrade will crash your computer or loop forever, then maybe you have something to worry about—and you should probably do an extensive backup, with this special backup software I am selling.
It would certainly be the default case for untested operating system upgrades. Whenever I write a program, even a small program, it usually doesn’t work the first time I run it; there’s some mistake I’ve made and have to go back and fix. I would never ship software that I hadn’t at least run on my own to make sure it does what it’s supposed to.
The problem with that when it comes to AI research, according to singularitarians, is that there’s no safe way to do a test run of potentially-FOOMing software; mistakes that could lead to unFriendliness have to be found in some way that doesn’t involve running the code, even in a test environment.
That just sounds crazy to me :-( Are these people actual programmers? How did they miss out on having the importance of unit tests drilled into them?
The problem is that running the AI might cause it to FOOM, and that could happen even in a test environment.
How do you get from that observation to the idea that running a complete untested program in the wild is going to be safer than not testing it at all?
No, the proposed solution is to first formally validate the program against some FAI theory before doing any test runs.
This idea is proposed by people with little idea of the value of testing—and little knowledge of the limitations of provable correctness—I presume.
In fact, who has supposedly proposed this idea? What did they actually say?
Also, you are now talking about performing “test runs”. Is that doing testing, now?
The usefulness of testing is beside the point. The argument is that testing would be dangerous.
By “testing” I meant “running the code to see if it works”, which includes unit testing individual components, integration or functional testing on the program as a whole, or the simple measure of running the program and seeing if it does what it’s supposed to. By “doing test runs” I meant doing either of the latter two.
I would never ship, or trust in production, a program that had only been subjected to unit tests. This poses a problem for AI researchers, because while unit testing a potentially-FOOMing AI might well be safe (and would certainly be helpful in development), testing the whole thing at once would not be.
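For readers who don’t live in this jargon, a minimal sketch of the distinction being drawn (hypothetical names throughout; the main_loop call mentioned at the end is imaginary): a unit test exercises one component in isolation with hand-made inputs, while a “test run” executes the assembled program.

```python
# Hypothetical component of a larger planner, unit-testable in isolation.
def rank_plans(plans, score):
    """Return plans ordered from highest-scoring to lowest-scoring."""
    return sorted(plans, key=score, reverse=True)

# Unit test: runs only this component, never the whole system.
def test_rank_plans():
    score = {"tea": 1, "world peace": 3, "paperclips": 2}.get
    assert rank_plans(["tea", "world peace", "paperclips"], score) == \
        ["world peace", "paperclips", "tea"]

test_rank_plans()

# A "test run", by contrast, would execute the assembled system end to end,
# e.g. something like main_loop(world_model, rank_plans, actuators), which is
# exactly the step the argument above treats as unsafe for a potentially-FOOMing AI.
```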
I think EY’s the original person behind a lot of this, but now the main visible proponents seem to be SIAI. Here’s a link to the big ol’ document they wrote about FAI.
On the specific issue of having to formally prove friendliness before launching an AI, I can’t find anything specific in there at the moment. Perhaps that notion came from elsewhere? I’m not sure, but it seems straightforward to me from the premises of the argument (AGI might FOOM, we want to make sure it FOOMs into something Friendly, we cannot risk running the AGI unless we know it will) that you’d have to have some way of showing that an AGI codebase is Friendly without running it, and the only other way I can think of would be to apply a rigorous proof.
Life is dangerous: the issue is surely whether testing is more dangerous than not testing.
It seems to me that a likely outcome of pursuing a strategy involving searching for a proof is that, while you are searching for it, some other team makes a machine intelligence that works—and suddenly whether your machine is “friendly” or not becomes totally irrelevant.
I think bashing testing makes no sense. People are interested in proving what they can about machines—in the hope of making them more reliable—but that is not the same as not doing testing.
The idea that we can make an intelligent machine—but are incapable of constructing a test harness capable of restraining it—seems like a fallacy to me.
Poke into these beliefs, and people will soon refer you to the AI-box experiment—which purports to show that restrained intelligent machines can trick human gatekeepers.
...but so what? You don’t imprison a super-intelligent agent—and then give the key to a single human and let them chat with the machine!
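For concreteness, here is a crude sketch of the sort of “test harness” being talked about (a toy example using only the Python standard library; candidate.py is a hypothetical program under test). The AI-box claim is precisely that, once the boxed program is smart enough and a human reads its output, this kind of restraint stops working.

```python
import subprocess

def boxed_run(path="candidate.py", seconds=5, max_output=10_000):
    """Run an untrusted program in its own process with a hard time budget,
    exposing nothing but a truncated copy of its output."""
    try:
        result = subprocess.run(
            ["python", path],
            capture_output=True,
            timeout=seconds,               # hard stop, whatever it is doing
        )
        return result.stdout[:max_output]  # the only outbound channel
    except subprocess.TimeoutExpired:
        return b"<killed: exceeded time budget>"

# A real harness would also need to cut off network, filesystem and every other
# side channel; the dispute above is about whether any such harness can be
# trusted against a smarter-than-human adversary.
```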
The “default case” occurs when not specifically avoided.
The company making the OS upgrade is going to do its best to keep the computers it’s installed on from crashing. In fact, it’ll probably hire quality control experts to make certain of it.
Why should AGI not have quality control?
It definitely should have quality control.
The whole point of the ‘Scary idea’ is that there should be effective quality control for AGI; otherwise the risks are too big.
At the moment humanity has no idea how to perform that quality control—which would mean some way to check whether an arbitrary AI-in-a-box is Friendly.
Ergo, if an AGI is launched before the Friendly AI problem has some solutions, it means that the AGI was launched without any quality control. Scary. At least to me.
The default case for a lot of shipped applications isn’t to do what they were designed to do, i.e. satisfy the target customer’s needs. Even when you ignore the bugs, often the target customer doesn’t understand how it works, or it’s missing a few key features, or its interface is clunky, or no-one actually needs it, or it’s made confusing by too many features nobody cares about, etc. A lot of applications (and websites) suck, or at least the first released version does.
We don’t always see the extent of this because the set of software we use is heavily biased towards the “actually usable” subset, for obvious reasons.
For example, see the debate tools that have been discussed here and are never used by anybody for real debate.
That it’s impossible to find a course of action that is knowably good, is not an argument for the goodness of pursuing a course of action that isn’t known to be good.
Certainly, but it is an argument for the goodness of pursuing a course of action that is known to have a chance of being good.
There are roughly two types of options:
1) A plan that, if successful, will yield something good with 100% certainty, but has essentially 0% chance of succeeding to begin with.
2) A plan that, if successful, may or may not be good, with a non-zero chance of success.
Clearly type 2 is a much, much larger class, and includes plans not worth pursuing. But it may include plans worth pursuing as well. If Friendly AI is as hard as everyone makes it out to be, I’m baffled that type 2 plans aren’t given more exposure. Indeed, they should be the default, with reliance on a type 1 plan as a fallback, given more weight only with extraordinary evidence that all type 2 plans are as assuredly dangerous as FAI is impossible.
The argument isn’t that we should throw away good plans because there’s some small chance of their being bad even if successful.
The argument is that the target is small enough that anything but a proof still leaves you with a ~0% chance of getting a good outcome.
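The disagreement can be put in numbers. A back-of-the-envelope sketch (every figure below is invented for illustration, not anyone’s actual estimate): which plan looks better depends almost entirely on how likely an unproven success is to turn out good, which is exactly the quantity in dispute.

```python
# Invented numbers purely for illustration; the argument is about which of
# these inputs is realistic, not about the arithmetic.

def expected_value(p_success, p_good_given_success, value_good=1.0, value_bad=-1.0):
    """Expected value of launching; failure is treated as worth ~0."""
    return p_success * (p_good_given_success * value_good
                        + (1 - p_good_given_success) * value_bad)

# "Type 1": proof-first plan; good if it works, but assume it almost never does.
proof_first = expected_value(p_success=0.001, p_good_given_success=1.0)

# "Type 2": incremental plan, under two assumptions about the contested
# quantity: how often an unproven success turns out Friendly.
incremental_optimist = expected_value(p_success=0.2, p_good_given_success=0.6)
incremental_pessimist = expected_value(p_success=0.2, p_good_given_success=1e-6)

print(proof_first, incremental_optimist, incremental_pessimist)
# roughly 0.001, 0.04, -0.2: the ranking flips with the disputed input.
```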
You point out a correct statement (that there is value in pursuing a course of action known to have a chance of being good), for which the incorrect argument (that because provable Friendliness may be impossible, an incremental approach to AGI makes more sense) apparently argues. This doesn’t argue for the correctness of that argument.
(A course of action that is known to have a chance of being good is already known to be good, in proportion to that chance (unless it’s also known to have a sufficient chance of being sufficiently bad). For AI to be Friendly doesn’t require absolute certainty in its goodness, but beware the fallacy of gray.)