What I was expressing skepticism about was that a system with even approximately human-level intelligence necessarily supports a stack trace that allows the kind of analysis you envision performing in the first place, without reference to intentional countermeasures.
Ah, that does clarify it. I agree, analyzing the AI’s thought process would likely be difficult, maybe impossible! I guess I was being a bit hyperbolic in my earlier “crack it open” remarks (though depending on how seriously you take it, such analysis might still take place, hard and prolonged though it may be).
One can have “detectors” in place, set to look for specific behaviors, but these would rest on assumptions that could easily fail.
Detectors that would still be useful would be the macro ones (what the AI tries to access, and how), but these would provide only limited insight into the AI’s thought process.
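To make the “macro” detector idea concrete, here is a minimal sketch (Python, purely illustrative; the whitelist, log format, and resource names are invented for the example, not taken from any real monitoring setup). Note that it sees only what is accessed and how often, never why:

```python
# Illustrative "macro" detector: flags access attempts outside a whitelist and
# tallies usage. It reveals nothing about the reasoning behind an attempt.
from collections import Counter

ALLOWED_RESOURCES = {"lab_terminal", "training_corpus"}  # assumed whitelist

def flag_access_events(events):
    """events: iterable of (resource, action) pairs taken from an access log.
    Returns events outside the whitelist, plus a per-resource usage count."""
    flagged = [(res, act) for res, act in events if res not in ALLOWED_RESOURCES]
    usage = Counter(res for res, _ in events)
    return flagged, usage

# Example log: the detector notices an unexpected access attempt, but the
# "limited insight" problem remains: we learn where, not why.
log = [("lab_terminal", "write"), ("network_interface", "open"), ("lab_terminal", "read")]
suspicious, usage = flag_access_events(log)
print(suspicious)  # [('network_interface', 'open')]
print(usage)       # Counter({'lab_terminal': 2, 'network_interface': 1})
```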
[...] the difference between “the minimal set of information about humans required to have a conversation with one at all” (my phrase) and “the most basic knowledge about humans” (your phrase). What do you imagine the latter to encompass, and how do you imagine the AI obtained this knowledge?
I actually perceive your phrase to be a subset of my own; I am making the (reasonable, I think) assumption that humans will attempt to communicate with the budding AI. Say, in a lab environment. It would acquire its initial data from this interaction.
I think both these sets of knowledge depend a lot on how the AI is built. For instance, a “babbling” AI—one that is given an innate capability of stringing words together onto a screen, and the drive to do so—would initially say a lot of gibberish and would (presumably) get more coherent as it gets a better grip on its environment. In such a scenario, the minimal set of information about humans required to have a conversation is zero; it would be having conversations before it even knows what it is saying.
(This could actually make detection of deception harder down the line, because such attempts could be written off as “quirks” or AI mistakes.)
Now, I’ll take your phrase and twist it just a bit: the minimal set of knowledge the AI needs in order to try deceiving humans. That would be the knowledge that humans can be modeled as having beliefs (which drive behavior), and that these beliefs can be altered by the AI’s actions, at least to some degree. Assuming this information isn’t hard-coded, it doesn’t seem likely that this is all an AI would know about us; it should at least be able to see some patterns in our communications with it. However, I don’t see how such information would be useful for deception purposes before extensive experimentation.
(Is the fact that the operator communicates with me between 9am and 5pm an intrinsic property of the operator? For all I know, that is a law of nature...)
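To put that parenthetical in toy terms (a sketch only, with invented observation data): until the AI experiments, observation alone cannot tell it whether the 9-to-5 pattern is a fact about the operator or a fact about the world, which is why I don’t see the bare belief-model being useful for deception before extensive experimentation.

```python
# Toy illustration of the point above: two hypotheses that are indistinguishable
# on passive observations, even though they mean very different things.
passive_observations = [("10:00", True), ("14:00", True), ("23:00", False)]  # (hour, message seen?)

def operator_is_an_agent(hour):
    # "The operator chooses to communicate during working hours."
    return "09:00" <= hour <= "17:00"

def messages_are_a_law_of_nature(hour):
    # "Messages simply occur between 9am and 5pm, the way the sun rises and sets."
    return "09:00" <= hour <= "17:00"

def fits(hypothesis, data):
    return all(hypothesis(hour) == seen for hour, seen in data)

print(fits(operator_is_an_agent, passive_observations))          # True
print(fits(messages_are_a_law_of_nature, passive_observations))  # True
# Both fit perfectly. Telling them apart (and so learning anything usable for
# deception) would require the AI to intervene and experiment, not just watch.
```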
depending on how seriously you take it, such analysis might still take place, hard and prolonged though it may be).
Yup, agreed that it might. And agreed that it might succeed, if it does take place.
One can have “detectors” in place, set to look for specific behaviors, but these would rest on assumptions that could easily fail. Detectors that would still be useful would be the macro ones (what the AI tries to access, and how), but these would provide only limited insight into the AI’s thought process.
Agreed on all counts.
Re: what the AI knows… I’m not sure how to move forward here. Perhaps what’s necessary is a step backwards.
If I’ve understood you correctly, you consider “having a conversation” to encompass exchanges such as:
A: “What day is it?”
B: “Na ni noo na”
If that’s true, then sure, I agree that the minimal set of information about humans required to do that is zero; hell, I can do that with the rain. And I agree that a system that’s capable of doing that (e.g., the rain) is sufficiently unlikely to be capable of effective deception that the hypothesis isn’t even worthy of consideration. I also suggest that we stop using the phrase “having a conversation” at all, because it does not convey anything meaningful.
Having said that… for my own part, I initially understood you to be talking about a system capable of exchanges like:
A: “What day is it?”
B: “Day seventeen.”
A: “Why do you say that?”
B: “Because I’ve learned that ‘a day’ refers to a particular cycle of activity in the lab, and I have observed seventeen such cycles.”
A system capable of doing that, I maintain, already knows enough about humans that I expect it to be capable of deception. (The specific questions and answers don’t matter to my point; I can choose others if you prefer.)
My point was that the AI is likely to start performing social experiments well before it is capable of even that conversation you depicted. It wouldn’t know how much it doesn’t know about humans.
(nods) Likely.
And I agree that humans might be able to detect attempts at deception in a system at that stage of its development. I’m not vastly confident of it, though.
I have likewise adjusted down my confidence that this would be as easy or as inevitable as I previously anticipated. Thus I would no longer say I am “vastly confident” in it, either.
Still good to have this buffer between making an AI and total global catastrophe, though!
Sure… a process with an N% chance of global catastrophic failure is definitely better than a process with an (N+delta)% chance.