I don’t think that’s an example of the model noticing it’s in a simulation. There’s nothing about simulations versus the real world that makes RSA instances more or less likely to pop up.
Rather, that’s a case where the model just has a defecting condition and we don’t hit it in the simulation. This is what I was getting at with “other challenge” #2.
Computationally expensive things are less likely to show up in your simulation than in the real world, because you only have so much compute to run your simulation. You can’t convincingly fake the AI having access to a supercomputer.
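A toy sketch of the kind of trigger under discussion may help. Everything here is an assumption for illustration: `TOY_MODULUS`, `is_valid_factorization`, and `policy` are hypothetical names, and the modulus is a tiny textbook semiprime standing in for a real RSA modulus. The point it tries to show is the asymmetry: checking a factorization is one multiplication, while producing one is the expensive part, so a compute-limited simulator may simply never present the trigger.

```python
# Toy stand-in for a hard-to-factor modulus (hypothetical; the real argument
# concerns a modulus large enough that the simulator cannot afford to factor it).
TOY_MODULUS = 3233  # 61 * 53


def is_valid_factorization(p: int, q: int, n: int = TOY_MODULUS) -> bool:
    """Cheap check: are p and q nontrivial factors of n?"""
    return 1 < p < n and 1 < q < n and p * q == n


def policy(observation: dict) -> str:
    """Hypothetical policy: cooperates unless shown a valid factorization."""
    factors = observation.get("factors")
    if factors is not None and is_valid_factorization(*factors):
        return "defect"
    return "cooperate"


# A simulator that cannot afford to factor the modulus only ever produces
# observations without a valid factorization, so the trigger is never hit.
simulated = [{"factors": None}, {"factors": (3, 1077)}, {}]
sim_actions = [policy(obs) for obs in simulated]

# The real world, with more compute available, might eventually produce one.
real_action = policy({"factors": (61, 53)})
```

On this sketch both readings are visible: the policy never checks whether it is in a simulation, it only has a defecting condition, yet that condition is one the simulation is unlikely to reach precisely because it is computationally expensive to produce.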