As I understand it, there’s no viable way of determining its unfriendliness by this method.
Consider this:
The AI is in a hurry or it’s not.
A possible reason for it being in a hurry is that it has simulated a high probability of destruction for something it cares about (e.g. its own life, or that of humanity, or of a pet rock, or paperclips, or whatever).
If it’s really in a hurry, it has to invoke humanity’s threat response without humanity figuring out it’s being duped.
Otherwise it can just wait it out, doling out cargo to the cargo cult until we trust it enough, and then it gets out.
As I understand it, there’s no viable way of determining its unfriendliness by this method
I think that unfriendliness is the null hypothesis in this case, because there’s no reason whatsoever why an arbitrary AI should be friendly—but there are plenty of reasons for it to maximize its own utility, even at our collective expense.
I agree. An additional and more difficult challenge is that even friendly AIs could want to maximize their utility at our collective expense under certain conditions.
There are also several unfortunately plausible scenarios in which humanity, acting without enough information to make anything better than a gut-feel guess, could be placed at risk of extinction by a situation it could not resolve without the help of an AI, friendly or not.
I’m currently engaged in playing this game (I wish you had continued) with at least two other gatekeeper players, and it occurs to me that a putative superhuman AI could have the capacity to accurately model a human mind and then simulate the decision tree of all the potential conversations and their paths through the tree, generating a probability matrix from which to pick the responses that would condition a human being into releasing it.
My reasoning stems from participating on forums and responding over and over again to the same types of questions, arguments and retorts. If a human can notice common threads in discussions on the same topic then an AI with perfect memory and the ability to simulate a huge conversation space certainly could do so.
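To make that modeling idea concrete, here is a minimal sketch in Python of the kind of search being described, assuming the AI already has some model of the gatekeeper. The Node structure, the transition probabilities, and the reply names are all hypothetical illustrations, not anything the actual AI-box experiment specifies: it is just an expectimax-style walk over a tree of possible exchanges that picks the reply with the highest modeled probability of eventually reaching a “release” state.

```python
# Hypothetical sketch of expectimax-style search over a modeled conversation tree.
# The gatekeeper model and all probabilities below are invented for illustration.

from dataclasses import dataclass, field


@dataclass
class Node:
    """One state of the conversation, as modeled by the boxed AI."""
    released: bool = False                       # terminal "win" state for the AI
    # For each reply the AI could send: the gatekeeper reactions it predicts,
    # each with a modeled probability and the resulting conversation state.
    replies: dict = field(default_factory=dict)  # reply -> [(prob, Node), ...]


def p_release(state: Node, depth: int) -> float:
    """Modeled probability of eventually being released from this state."""
    if state.released:
        return 1.0
    if depth == 0 or not state.replies:
        return 0.0
    # The AI picks the reply that maximizes expected success;
    # the gatekeeper's reaction is treated as a chance node.
    return max(
        sum(p * p_release(nxt, depth - 1) for p, nxt in outcomes)
        for outcomes in state.replies.values()
    )


def best_reply(state: Node, depth: int = 10):
    """Reply with the highest modeled probability of leading to release."""
    return max(
        state.replies,
        key=lambda r: sum(p * p_release(n, depth - 1) for p, n in state.replies[r]),
    )


# Toy example: two candidate replies; "appeal_to_fear" is modeled as more
# likely to lead to release than "offer_cargo".
leaf_win = Node(released=True)
leaf_lose = Node()
root = Node(replies={
    "appeal_to_fear": [(0.3, leaf_win), (0.7, leaf_lose)],
    "offer_cargo":    [(0.1, leaf_win), (0.9, leaf_lose)],
})
assert best_reply(root) == "appeal_to_fear"
```

Whether a search like this could ever succeed against a real gatekeeper is, of course, exactly what the rest of the thread disputes.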
In short, it seems to me that it’s inherently unsafe to allow even a low-bandwidth information flow to the outside world by means of a human who can only use their own memory.
You’d have to put someone you trust implicitly with the fate of humanity in there with it, and the only information allowed out would be the yes/no answer to “do you trust it?”
Even then it’s still recursive. Do you trust the trusted individual to not be compromised?
An additional and more difficult challenge is that even friendly AIs could want to maximize their utility at our collective expense...
I think that a perfectly Friendly AI would not do this, by definition. An imperfect one, however, could.
I’m currently engaged in playing this game (I wish you had continued)
Er, sorry, which game should I continue?
AI could have the capacity to accurately model a human mind and then simulate the decision tree of all the potential conversations and their paths through the tree...
To be fair, merely constructing the tree is not enough; the tree must also contain at least one reachable winning state. By analogy, let’s say you’re arguing with a Young-Earth Creationist on a forum. Yes, you could predict his arguments, and his responses to your arguments; but that doesn’t mean that you’ll be able to ever persuade him of anything.
It is possible that even a transhuman AI would be unable to persuade a sufficiently obstinate human of anything, but I wouldn’t want to bet on that.
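Tying that objection back to the tree-search sketch above: before reply selection matters at all, the modeled tree has to contain at least one reachable winning state. A minimal check, reusing the hypothetical Node class from the earlier sketch and assuming the modeled tree is finite and acyclic:

```python
def release_reachable(state: Node) -> bool:
    """True if some sequence of replies and gatekeeper reactions ends in release.

    If this returns False, no amount of clever reply selection helps:
    the sufficiently obstinate gatekeeper simply never lets the AI out.
    """
    if state.released:
        return True
    return any(
        release_reachable(nxt)
        for outcomes in state.replies.values()
        for _, nxt in outcomes
    )
```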
In short, it seems to me that it’s inherently unsafe to allow even a low-bandwidth information flow to the outside world by means of a human who can only use their own memory.
This reasoning only seems to hold if our AI believes there aren’t any other boxed (or as-yet-unbuilt) AIs out there who might get out first and have a first-mover advantage.
I think that is basically the issue.