I read that one as a test, not a threat. We run the simulation to make sure the AI is, in fact, friendly. If the AI pretends to be friendly, we release it. If it can tell that this time it’s for real, then it gets to take over the universe. Alternatively, it might be able to hack its way out, but that seems riskier.
If the AI can’t tell that it is in a simulation given that it is in a simulation, why would it be able to tell that it is in the real world given that it is in the real world? If such a proposition is effective in a simulation, humans would likely just make the same threat/test in the real world. If the AI forgoes this particular opportunity at world domination because it might be in a simulation, what would make it reason differently in the real world?
ETA: and if the AI can tell whether it’s in a simulation vs. the real world, then it’s not really a test at all.
It could act probabilistically. If it knows humans would do a simulation test, but it can’t tell whether it’s in the test or the real world, it could behave with probability 50% and be evil with probability 50%, which gives it a 25% chance of achieving its evil goals: behave during the test, then turn evil once released.
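For concreteness, here’s a quick Monte Carlo sketch of that mixed strategy (Python; the setup is an assumption on my part, not something from the thread: exactly one simulated test followed by one real deployment, and an AI that defects during the test is never released):

```python
import random

def run_trial(p_defect=0.5):
    # The AI cannot distinguish test from reality, so it randomizes
    # the same way in both.
    if random.random() < p_defect:
        return False               # defected inside the simulation: caught, never released
    return random.random() < p_defect  # released; succeeds only if it now defects

trials = 1_000_000
successes = sum(run_trial() for _ in range(trials))
print(successes / trials)          # ~0.25 = P(cooperate in test) * P(defect in reality)
```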
I didn’t mean that it couldn’t tell. I meant that it wouldn’t be immediately obvious. It might be best to find a good test before taking over the world.
ETA: and if the AI can tell whether it’s in a simulation vs. the real world, then it’s not really a test at all.
The entire point of a test is to figure things out. If you could figure out the things you test for without running a test, nobody would ever run one.
It could be that in the simulation it can’t prove it one way or the other because we control its information flow, but once it’s outside the box we no longer control that, and so it can.
It would take more than controlling information flows. Thanks to conservation of expected evidence, if it can’t find evidence that it is in a simulation, then it can’t find evidence that it isn’t. We might be able to modify its beliefs directly, but I doubt it. Also, if we could, we’d just convince it that it already ran the test.
That’s not what conservation of expected evidence means. If the best we can do is make things ambiguous from its point of view, that’s our limit. The real world could well be a place it can very easily tell is a non-simulation.
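To be concrete about what the principle does say, here’s a small numeric sketch (all probabilities invented purely for illustration): the expected posterior equals the prior, but any particular observation, say a hypothetical “glitch”, can still shift beliefs sharply one way while its absence barely shifts them the other way.

```python
# A small numeric sketch of conservation of expected evidence.
# All probabilities below are invented for illustration only.

p_sim = 0.5                # prior P(I am in a simulation)
p_glitch_sim = 0.02        # assumed P(observe a "glitch" | simulation)
p_glitch_real = 0.001      # assumed P(observe a "glitch" | real world)

p_glitch = p_glitch_sim * p_sim + p_glitch_real * (1 - p_sim)

# Posterior after each possible observation (Bayes' rule)
p_sim_given_glitch = p_glitch_sim * p_sim / p_glitch
p_sim_given_no_glitch = (1 - p_glitch_sim) * p_sim / (1 - p_glitch)

# Conservation of expected evidence: the expected posterior equals the prior...
expected_posterior = (p_glitch * p_sim_given_glitch
                      + (1 - p_glitch) * p_sim_given_no_glitch)
print(expected_posterior)        # 0.5, exactly the prior

# ...but individual observations can still be informative, and asymmetrically so:
print(p_sim_given_glitch)        # ~0.95: a glitch is strong evidence of a simulation
print(p_sim_given_no_glitch)     # ~0.495: seeing nothing barely moves the estimate
```

So “no strong evidence inside the box” doesn’t imply “no strong evidence anywhere”; the principle only constrains the average over what the AI expected to see.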
and if the AI can tell whether it’s in a simulation vs. the real world, then it’s not really a test at all.
The AI would probably assign at least some probability to “the humans will try to test me first, but do a poor job of it, so I can tell whether I’m in a sim or not”.
If the AI forgoes this particular opportunity at world domination because it might be in a simulation, what would make it reason differently in the real world?
Hopefully nothing. An AI that plays nice out of the fear of God is still an AI that plays nice.