I’m always intrigued by these experiments. If the boxed AI is not confirmed to be friendly, everything it says and promises is absolutely unreliable. I don’t see how the arguments of such an entity could be at all convincing.
But if you knew anything about the process leading up to the development of successful AI, you’d have some beliefs about how likely the AI is to perpetrate a ruse for the purpose of escaping.
But I get the difficulty: how well do you have to understand a being’s nature before you feel confident in predicting its motivations/values?
Good point.
So the key to containing an AI is to have a technologically ignorant rationalist babysit it?
No more unreliable than the things humans say and thereby convince you of.
Important difference: we can assume that other humans are probably like us.