Ariel G. comments on Ariel G.’s Shortform

Ariel G. 23 Dec 2024 18:42 UTC
1 point
I was fairly on board with control before; I think my main remaining concern is that the trusted models won't be good enough. But with more elaborate control protocols (assuming political actors and AI labs actually make an effort to implement them), catching an escape attempt seems more likely if the model’s performance is heavily skewed toward specific domains. Though yeah, I agree that some of what you mentioned might not have changed and could still be an issue.