The basic issue is that we assume that it’s not spinning up a second optimizer to recursively search. And deceptive alignment is a dangerous state of affairs, since we may not find out that it’s misaligned until it’s too late.
we assume that it’s not spinning up a second optimizer to recursively search
You mean we assume that simulacra don’t mishandle their own AI alignment problem? Yes, that’s an issue, hence I made it an explicit assumption in my argument.