There is nothing in my analysis, or in my suggestions for a solution, that depends on the failure modes being “obvious” (and if you think so, can you present and dissect the argument I gave that implies that?).
Your words do not connect to what I wrote. For example, when you say:
And the smarter the AI, the less likely the insane solutions it comes up with is anything we’d even think to try to prevent.
… that misses the point completely, because in everything I said I emphasized that we absolutely do NOT need to “think to try to prevent” the AI from doing specific things. Trying to be so clever about the goal statement, second-guessing every possible misinterpretation that the AI might conceivably come up with …. that sort of strategy is what I am emphatically rejecting.
And when you talk about how the AI
might do something terribly, terribly clever.
… that remark exists in a vacuum completely outside the whole argument I gave in the paper. It is almost as if I didn’t write anything beyond a few remarks in the introduction. I am HOPING that the AI does lots of stuff that is terribly terribly clever! The more the merrier!
So, in your last comment:
your issue here is that you think that you can outthink a thing you’ve deliberately built to think better than you can.
… I am left totally perplexed. Nothing I said in the paper implied any such thing.
There is nothing in my analysis, or in my suggestions for a solution, that depends on the failure modes being “obvious” (and if you think so, can you present and dissect the argument I gave that implies that?).
Take your “Responses to Critics of the Doomsday Scenarios” (which seems incorrectly named as the header for your responses): there you assume, over and over again, that the issue is logical inconsistency—an obvious failure mode. You hammer on logical inconsistency.
… that misses the point completely, because in everything I said I emphasized that we absolutely do NOT need to “think to try to prevent” the AI from doing specific things. Trying to be so clever about the goal statement, second-guessing every possible misinterpretation that the AI might conceivably come up with …. that sort of strategy is what I am emphatically rejecting.
You have some good points. Yanking out motivation, so the AI doesn’t do things on its own, is a perfect solution to the problem of an insane AI. Assuming a logically consistent AI won’t do anything bad because bad is logically inconsistent? That is not a perfect solution, and isn’t actually demonstrated by anything you wrote.
… that remark exists in a vacuum completely outside the whole argument I gave in the paper. It is almost as if I didn’t write anything beyond a few remarks in the introduction. I am HOPING that the AI does lots of stuff that is terribly terribly clever! The more the merrier!
You didn’t -give- an argument in the paper. It’s a mess of unrelated concepts. You tried to criticize, in one go, the entire body of work of criticism of AI, without pausing at any point to ask whether or not you actually understood the criticism. You know the whole “genie” thing? That’s not an argument about how AI would behave. That’s a metaphor to help people understand that the problem of achieving goals is non-trivial, that we make -shitloads- of assumptions about how those goals are to be achieved that we never make explicit, and that the process of creating an engine to achieve goals without going horribly awry is -precisely- the process of making all those assumptions explicit.
And in response to the problem of -making- all those assumptions explicit, you wave your hand, and declare the problem solved, because the genie is fallible and must know it.
That’s not an answer. Okay, the genie asks some clarifying questions, and checks its solution with us. Brilliant! What a great solution! And ten years from now we’re all crushed to death by collapsing cascades of stacks of neatly-packed boxes of strawberries because we answered the clarifying questions wrong.
Fallibility isn’t an answer. You know -you’re- capable of being fallible—if you, right now, knew how to create your AI, who would -you- check with to make sure it wouldn’t go insane and murder everybody? Or even just remain perfectly sane and kill us because we accidentally asked it to?
… I am left totally perplexed. Nothing I said in the paper implied any such thing.
Yes, yes it did. Fallibility only works if you have a higher authority to go to. Fallibility only works if the higher authority can check your calculations and tell you whether or not it’s a good idea, or at least answer any questions you might have.
See, my job involves me being something of a genie; I interact with people who have poor understanding of their requirements on a daily basis, where I myself have little to no understanding of their requirements, and must ask them clarifying questions. If they get the answer wrong, and I implement that? People could die. “Do nothing” isn’t an option; why have me at all if I do nothing? So I implement what they tell me to do, and hope they answer correctly. I’m the fallible genie, and I hope my authority is infallible.
You don’t get to have fallibility in what you’re looking for, because you don’t have anybody who can actually answer its questions correctly.
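To make that loop concrete, here is a toy sketch, with every name (fallible_genie, ask_authority) invented for illustration rather than taken from anyone’s actual system: no matter how keenly the genie appreciates its own fallibility, the quality of the outcome is fixed entirely by the quality of the authority’s answers.

```python
# Toy sketch of the "fallible genie" loop described above. All names and the
# ask_authority stub are invented for illustration. The point: the genie's
# awareness of its own fallibility changes nothing; whatever the authority
# answers, right or wrong, is what gets implemented.

def fallible_genie(goal, ambiguities, ask_authority, implement):
    """Ask a clarifying question for each ambiguity, then act on the answers."""
    clarifications = {}
    for question in ambiguities:
        # The genie knows it might misunderstand, so it defers to the authority.
        clarifications[question] = ask_authority(question)
    # Whether the result is safe now depends entirely on those answers.
    return implement(goal, clarifications)

# Hypothetical usage: the authority gives one wrong answer, and the genie,
# dutifully fallible and dutifully deferential, implements it anyway.
plan = fallible_genie(
    goal="pick strawberries",
    ambiguities=["How high may the boxes be stacked?"],
    ask_authority=lambda q: "as high as physically possible",  # the wrong answer
    implement=lambda goal, answers: goal + ", stacking boxes " + ", ".join(answers.values()),
)
print(plan)  # pick strawberries, stacking boxes as high as physically possible
```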
Well, the problem here is a misunderstanding of my claim.
(If I really were claiming the things you describe in your above comment, your points would be reasonable. But there is such a strong misunderstanding that your points are hitting a target that, alas, is not there.)
There are several things that I could address, but I will only have time to focus on one. You say:
Assuming a logically consistent AI won’t do anything bad because bad is logically inconsistent?
No. A hundred times no :-). My claim is not even slightly that “a logically consistent AI won’t do anything bad because bad is logically inconsistent”.
The claim is this:
1) The entire class of bad things that these hypothetical AIs are supposed to be doing is a result of the AI systematically (and massively) ignoring contextual information.
(Aside: I am not addressing any particular bad things, on a case-by-case basis, I am dealing with the entire class. As a result, my argument is not vulnerable to charges that I might not be smart enough to guess some really-really-REALLY subtle cases that might come up in the future.)
2) The people who propose these hypothetical AIs have made it absolutely clear that (a) the AI is supposed to be fully cognizant of the fact that the contextual information exists (so the AI is not just plain ignorant), but at the same time (b) the AI does not or cannot take that context into account, but instead executes the plan and does the bad thing.
3) My contribution to this whole debate is to point out that the DESIGN of the AI is incoherent, because the AI is supposed to be able to hold two logically inconsistent ideas (implicit belief in its infallibility and knowledge of its fallibility).
If you look carefully at that argument you will see that it does not make the claim that
Assuming a logically consistent AI won’t do anything bad because bad is logically inconsistent
I never said that. The logical inconsistency was not in the ‘bad things’ part of the argument. Completely unrelated.
Your other comments are equally as confused.
1) The entire class of bad things that these hypothetical AIs are supposed to be doing is a result of the AI systematically (and massively) ignoring contextual information.
Not acting upon contextual information isn’t the same as ignoring it.
2) The people who propose these hypothetical AIs have made it absolutely clear that (a) the AI is supposed to be fully cognizant of the fact that the contextual information exists (so the AI is not just plain ignorant), but at the same time (b) the AI does not or cannot take that context into account, but instead executes the plan and does the bad thing.
The AI knows, for example, that certain people believe that plants are morally relevant entities—is it possible for it to pick strawberries at all? What contextual information is relevant, and what contextual information is irrelevant? You accuse the “infallible” AI of ignoring contextual information—but you’re ignoring the magical leap of inference you’re taking when you elevate the concerns of the chef over the concerns of the bioethicist who thinks we shouldn’t rip reproductive organs off plants in the first place.
3) My contribution to this whole debate is to point out that the DESIGN of the AI is incoherent, because the AI is supposed to be able to hold two logically inconsistent ideas (implicit belief in its infallibility and knowledge of its fallibility).
The issue is that fallibility doesn’t -imply- anything. I think this is the best course of action. I’m fallible. I still think this is the best course of action. The fallibility is an unnecessary and pointless step—it doesn’t change my behavior. Either the AI depends upon somebody else, who is treated as an infallible agent—or it doesn’t.
I never said that. The logical inconsistency was not in the ‘bad things’ part of the argument. Completely unrelated.
Then we’re in agreement that insane-from-an-outside-perspective behaviors don’t require logical inconsistency?
Sorry, I cannot put any more effort into this. Your comments show no sign of responding to the points actually made (either in the paper itself, or in my attempts to clarify by responding to you).
Maybe, given the number of times you feel you’ve had to repeat yourself, you’re not making yourself as clear as you think you are.
I find that when I talk about this issue with people who clearly have expert knowledge of AI (including the people who came to the AAAI symposium at Stanford last year, and all of the other practising AI builders who are my colleagues), the points I make are not only understood, but understood so clearly that they tell me things like “This is just obvious, really, so all you are doing is wasting your time trying to convince a community that is essentially comprised of amateurs.” (That is a direct quote from someone at the symposium.)
I always want to make myself as clear as I can. I have invested a lot of my time trying to address the concerns of many people who responded to the paper. I am absolutely sure I could do better.
We’re all amateurs in the field of AI, it’s just that some of us actually know it. Seriously, don’t pull the credentials card. I’m not impressed. I know exactly how “hard” it is to pay the AAAI a hundred and fifty dollars a year for membership, and three hundred dollars to attend their conference. Does claiming to have spent four hundred and fifty dollars make you an expert? What about bringing up that it’s in “Stanford”? What about insulting everybody you’re arguing with?
I’m a “practicing AI builder”—what a nonsense term—although my little heuristics engine is actually running in the real world, processing business data and automating hypothesis elevation work for humans (who have the choice of agreeing with its best hypothesis, selecting among its other hypotheses, or entering their own) - that is, it’s actually picking strawberries.
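For what it’s worth, the general shape of that human-in-the-loop pattern can be sketched in a few lines. This is a generic illustration with invented names, not the actual engine: the system ranks its hypotheses, and the human either accepts the top one, selects another, or enters their own.

```python
# Generic sketch of hypothesis elevation with a human in the loop. Invented
# names throughout; this is the general pattern, not any particular system.

def elevate(record, rank_hypotheses, ask_human):
    """Rank candidate hypotheses for a record, then let a human decide."""
    ranked = rank_hypotheses(record)      # engine's best hypothesis comes first
    choice = ask_human(record, ranked)    # "accept", an index, or free text
    if choice == "accept":
        return ranked[0]                  # agree with the engine's best hypothesis
    if isinstance(choice, int):
        return ranked[choice]             # select among its other hypotheses
    return choice                         # or enter their own

# Hypothetical usage with a stubbed ranking function and a human who picks
# the second hypothesis rather than the engine's favorite.
result = elevate(
    record={"invoice": "ACME-0042"},
    rank_hypotheses=lambda r: ["duplicate payment", "currency mismatch"],
    ask_human=lambda r, ranked: 1,
)
print(result)  # currency mismatch
```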
Moving past tit-for-tat on your hostile introduction paragraph, I don’t doubt your desire to be clear. But you have a conclusion you’re very obviously trying to reach, and you leave huge gaps on your way to get there. The fact that others who want to reach the same conclusion overlook the gaps doesn’t demonstrate anything. And what’s your conclusion? That we don’t have to worry about poorly-designed AI being dangerous, because… contextual information, or something. Honestly, I’m not even sure anymore.
Then you propose a model, which you suggest has been modeled after the single most dangerous brain on the planet—as proof that it’s safe! Seriously.
As for whether you could do better? No, not in your current state of mind. Your hubris prevents you from doing better. You’re convinced you know better than any of the people you’re talking with, and they’re ignorant amateurs.
When someone repeatedly distorts and misrepresents what is said in a paper, then blames the author of the paper for being unclear … then hears the author carefully explain the distortions and misrepresentations, and still repeats them without understanding ….
Well, there is a limit.
Not to suggest that you are implying it, but rather as a reminder—nobody is deliberately misunderstanding you here.
But at any rate, I don’t think we’re accomplishing anything here except driving your karma score lower, so by your leave, I’m tapping out.
Why not raise his karma score instead?
Because that was the practical result, not the problem itself, which is that the conversation wasn’t going anywhere, and he didn’t seem interested in it going anywhere.
My contribution to this whole debate is to point out that the DESIGN of the AI is incoherent, because the AI is supposed to be able to hold two logically inconsistent ideas (implicit belief in its infallibility and knowledge of its fallibility).
What does incoherent mean, here?
If it just labels the fact that it has inconsistent beliefs, then it is true but unimpactful… humans can also hold contradictory beliefs and still be intelligent enough to be dangerous.
If it means something amounting to “impossible to build”, then it would be highly impactful… but there is no good reason to think that that is the case.
You’re right to point out that “incoherent” covers a multitude of sins.
I really had three main things in mind.
1) If an AI system is proposed which contains logically contradictory beliefs located in the most central, high-impact area of its system, it is reasonable to ask how such an AI can function when it allows both X and not-X to be in its knowledge base. I think I would be owed at least some variety of explanation as to why this would not cause the usual trouble when systems try to do logic in such circumstances (a toy illustration of that trouble appears at the end of this comment). So I am saying: “This design that you propose is incoherent because you have omitted to say how this glaring problem is supposed to be resolved.”
(Yes, I’m aware that there are workarounds for contradictory beliefs, but those ideas are usually supposed to apply to pretty obscure corners of the AI’s belief system, not to the component that is in charge of the whole shebang).
2) If an AI perceives itself to be wired in such a way that it is compelled to act as if it was infallible, while at the same time knowing that it is both fallible AND perpetrating acts that are directly caused by its failings (for all the aforementioned reasons that we don’t need to re-argue), then I would suggest that such an AI would do something about this situation. The AI, after all, is supposed to be “superintelligent”, so why would it not take steps to stop this immensely damaging situation from occurring?
So in this case I am saying: “This hypothetical superintelligence has an extreme degree of knowledge about its own design, but it is tolerating a massive and damaging contradiction in its construction without doing anything to resolve the problem: it is incoherent to suggest that such a situation could arise without explaining why the AI tolerates the contradiction and fails to act.”
(Aside: you mention that humans can hold contradictory beliefs and still be intelligent enough to be dangerous. Arguing from the human case would not be valid because in other areas of this debate I have been told repeatedly not to accidentally generalize and “assume” that the AI would do something just because humans do something. Now, I actually don’t commit the breaches I am charged with (I claim!) (and that is an argument for another day), but I consider the problem of accidental anthropomorphism to be real, so we should not do that here).
3) Lastly, I can point to the fact that IF the hypothetical AI can engage in this kind of bizarre situation where it compulsively commits action X, while knowing that its knowledge of the world indicates that the consequences will strongly violate the goals that were supposed to justify X, THEN I am owed an explanation for why this type of event does not occur more often. Why is it that the AI does this only when it encounters a goal such as “make humans happy”, and not in a million other goals? Why are there not bizarre plans (which are massively inconsistent with the source goal) all the time?
So in this case I would say: “It is incoherent to suggest an AI design in which a drastic inconsistency of this sort occurs in the case of the ‘maximize human happiness’ goal, but where it doesn’t occur all over the AI’s behavior. In particular, I am owed an explanation for why this particular AI is clever enough to be a threat, since it might be expected to have been doing this sort of thing throughout its development, and in that case I would expect it to be so stupid that it would never have made it to superintelligence in the first place.”
Those are the three main areas in which the design would be incoherent … i.e. would have such glaring, unbelievable gaps that those gaps would need to be explained before the hypothetical AI could become at all believable.
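As flagged under point 1, here is a toy illustration of the “usual trouble”. It is an invented example, not a model of any proposed AI: it just shows why a classical reasoner that holds both X and not-X can derive whatever conclusion it likes.

```python
# Toy illustration of the "usual trouble" (the principle of explosion): a
# classical reasoner whose knowledge base contains both X and not-X can
# derive ANY proposition Q with two ordinary inference rules. Invented
# example only; not a model of any actual or proposed AI design.

def derive_anything(kb, q):
    """If kb contains some X together with 'not X', return a derivation of q."""
    for x in kb:
        if "not " + x in kb:
            return [
                "1. " + x + "  (in knowledge base)",
                "2. not " + x + "  (in knowledge base)",
                "3. (" + x + ") or (" + q + ")  (from 1, disjunction introduction)",
                "4. " + q + "  (from 2 and 3, disjunctive syllogism)",
            ]
    return []  # no contradiction of this simple form, so no explosion this way

kb = {"I am fallible", "not I am fallible"}  # the two inconsistent ideas
for step in derive_anything(kb, "this plan maximizes human happiness"):
    print(step)
```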