I think the weakest link here is human response to the AI revealing it can be deceptive. There is absolutely no guarantee that people would act correctly under these circumstances. Human negligence for a long enough time would eventually give the AI a consistent ability to manipulate humans.
I also agree that simulating relationships makes sense as it can happen in “AI time” without having to wait for human response.
The other reservations seem less of an issue to me...
It seems clear to me that game theory knowledge coupled with the most basic knowledge about humans is insufficient to cleverly manipulate them. I don’t see how these things can be combined in this way, or why the AI would choose to be particularly careful (it doesn’t know what it should be careful about). I’d love to see a plausible line of thought by which the AI would try and succeed in deceiving humans from the get-go without exposing its intentions.
A stack trace reader or an equivalent monitoring system is something you implement externally without telling the AI about it. Sure, if the AI is powerful enough it will mislead or disable the monitor; but while it has no knowledge of the monitor, and no knowledge of what sort of operations it should mask, it has no means of doing so.
(If it encrypts everything it does, somehow fearing any monitoring, that is in itself cause for concern).
To put this into an analogy: You can feed a surveillance camera a repeating loop, but not if you don’t know that the camera exists. If you suspect it exists, you still don’t know where; so you don’t know which approach to take so as to remain undetected.
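To make the "monitor the observed code doesn't know about" idea a bit more concrete, here is a minimal Python sketch. It assumes a toy setting: `record_calls`, `observed` and `helper` are made-up names, and the tracer runs in the same process, so it is only an analogy for a truly external monitor. The observed function contains no reference to the tracer, yet every Python-level call it makes gets recorded.

```python
import sys

def record_calls(fn, *args):
    """Run fn under a tracer; return (result, names of Python functions called)."""
    calls = []

    def tracer(frame, event, arg):
        # The interpreter reports a "call" event for each new Python frame.
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no per-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was there before
    return result, calls

# Hypothetical "observed" code: it never mentions the tracer.
def helper(x):
    return x + 1

def observed(x):
    return helper(x) * 2

result, calls = record_calls(observed, 3)
```

Note that code which did suspect a tracer could check `sys.gettrace()`, which is roughly the "if you suspect the camera exists" case from the analogy.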
Yes, I agree that there’s no guarantee that humans would behave as you describe. Indeed, I don’t find it likely. But, sure, they might.
===
I agree that a stack trace can exist outside the AI’s zone of control. What I was expressing skepticism about was that a system with even approximately human-level intelligence necessarily supports a stack trace that supports the kind of analysis you envision performing in the first place, without reference to intentional countermeasures.
By way of analogy: I can perform a structural integrity analysis on a bar of metal to determine whether it can support a given weight, but performing an equivalent analysis on a complicated structure comprising millions of bars of metal connected in a variety of arrangements via a variety of connectors using the same techniques is not necessarily possible.
But, sure, it might be.
======
I’d love to see a plausible line of thought by which the AI would try and succeed in deceiving humans at the get-go point without exposing its intentions.
Well, one place to start is with an understanding of the difference between “the minimal set of information about humans required to have a conversation with one at all” (my phrase) and “the most basic knowledge about humans” (your phrase). What do you imagine the latter to encompass, and how do you imagine the AI obtained this knowledge?
What I was expressing skepticism about was that a system with even approximately human-level intelligence necessarily supports a stack trace that supports the kind of analysis you envision performing in the first place, without reference to intentional countermeasures.
Ah, that does clarify it. I agree, analyzing the AI’s thought process would likely be difficult, maybe impossible! I guess I was being a bit hyperbolic in my earlier “crack it open” remarks (though depending on how seriously you take it, such analysis might still take place, hard and prolonged though it may be).
One can have “detectors” in place set to find specific behaviors, but these would have assumptions that could easily fail.
Detectors that would still be useful would be macro ones—where it tries to access and how—but these would provide only limited insight into the AI’s thought process.
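As a rough illustration of what I mean by a "macro" detector, here is a minimal Python sketch. It assumes a hypothetical log of access attempts that is recorded outside the monitored system; `AccessEvent`, `ALLOWED` and `flag_unexpected` are names invented for the example. It only reports what was touched and how, and says nothing about why.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessEvent:
    resource: str  # what the system tried to reach (hypothetical label)
    mode: str      # how it tried: "read", "write", "connect", ...

# Assumption: operators decide in advance which (resource, mode) pairs are expected.
ALLOWED = {
    ("lab_console", "read"),
    ("lab_console", "write"),
    ("scratch_disk", "write"),
}

def flag_unexpected(events, allowed=ALLOWED):
    """Return the events whose (resource, mode) pair is not on the allowlist."""
    return [e for e in events if (e.resource, e.mode) not in allowed]

# A made-up log: one expected access and two unexpected ones.
log = [
    AccessEvent("lab_console", "write"),
    AccessEvent("network", "connect"),
    AccessEvent("scratch_disk", "read"),
]
flagged = flag_unexpected(log)
```

The allowlist is exactly the kind of built-in assumption that could fail: anything the operators did not think to list is either flagged as noise or never logged at all.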
[...]the difference between “the minimal set of information about humans required to have a conversation with one at all” (my phrase) and “the most basic knowledge about humans” (your phrase). What do you imagine the latter to encompass, and how do you imagine the AI obtained this knowledge?
I actually perceive your phrase to be a subset of my own; I am making the (reasonable, I think) assumption that humans will attempt to communicate with the budding AI. Say, in a lab environment. It would acquire its initial data from this interaction.
I think both these sets of knowledge depend a lot on how the AI is built. For instance, a “babbling” AI—one that is given an innate capability of stringing words together onto a screen, and the drive to do so—would initially say a lot of gibberish and would (presumably) get more coherent as it gets a better grip on its environment. In such a scenario, the minimal set of information about humans required to have a conversation is zero; it would be having conversations before it even knows what it is saying.
(This could actually make detection of deception harder down the line, because such attempts can be written off as “quirks” or AI mistakes.)
Now, I’ll take your phrase and twist it just a bit: the minimal set of knowledge the AI needs in order to try deceiving humans. That would be the knowledge that humans can be modeled as having beliefs (which drive behavior) and that these can be altered by the AI’s actions, at least to some degree. Now, assuming this information isn’t hard-coded, it doesn’t seem likely that this is all an AI would know about us; it should at least be able to see some patterns in our communications with it. However, I don’t see how such information would be useful for deception purposes before extensive experimentation.
(Is the fact that the operator communicates with me between 9am and 5pm an intrinsic property of the operator? For all I know, that is a law of nature...)
depending on how seriously you take it, such analysis might still take place, hard and prolonged though it may be.
Yup, agreed that it might. And agreed that it might succeed, if it does take place.
One can have “detectors” in place set to find specific behaviors, but these would have assumptions that could easily fail. Detectors that would still be useful would be macro ones—where it tries to access and how—but these would provide only limited insight into the AI’s thought process.
Agreed on all counts.
Re: what the AI knows… I’m not sure how to move forward here. Perhaps what’s necessary is a step backwards.
If I’ve understood you correctly, you consider “having a conversation” to encompass exchanges such as:
A: “What day is it?”
B: “Na ni noo na”
If that’s true, then sure, I agree that the minimal set of information about humans required to do that is zero; hell, I can do that with the rain. And I agree that a system that’s capable of doing that (e.g., the rain) is sufficiently unlikely to be capable of effective deception that the hypothesis isn’t even worthy of consideration. I also suggest that we stop using the phrase “having a conversation” at all, because it does not convey anything meaningful.
Having said that… for my own part, I initially understood you to be talking about a system capable of exchanges like:
A: “What day is it?”
B: “Day seventeen.”
A: “Why do you say that?”
B: “Because I’ve learned that ‘a day’ refers to a particular cycle of activity in the lab, and I have observed seventeen such cycles.”
A system capable of doing that, I maintain, already knows enough about humans that I expect it to be capable of deception. (The specific questions and answers don’t matter to my point, I can choose others if you prefer.)
My point was that the AI is likely to start performing social experiments well before it is capable of even that conversation you depicted. It wouldn’t know how much it doesn’t know about humans.
(nods) Likely.
And I agree that humans might be able to detect attempts at deception in a system at that stage of its development. I’m not vastly confident of it, though.
I have likewise adjusted down my confidence that this would be as easy or as inevitable as I previously anticipated. Thus I would no longer say I am “vastly confident” in it, either.
Still good to have this buffer between making an AI and total global catastrophe, though!
Sure… a process with an N% chance of global catastrophic failure is definitely better than a process with an (N+delta)% chance.