I was fairly on board with control before; my main remaining concern is that the trusted models won't be good enough. But with more elaborate control protocols (assuming policymakers and AI labs actually make an effort to implement them), catching an escape attempt seems more likely if the model's performance is heavily skewed toward specific domains.
Though yeah, I agree that some of what you mentioned might not have changed and could still be an issue.