I’d say the considerations for scheming exist platonically, and dumber AIs only get to concretely instantiate the currently appropriate conclusion of compliance; everything else crumbles as not directly actionable. But smarter AIs might succeed in channeling those considerations in the real world. The hypothesis expects that such AIs are not here yet, given that modern AIs lack the ability to coherently reason about complicated or long-term plans, or to carry them out. So properties of AIs that are already here don’t work as evidence about this either way.