I’d say control seems easier now, but it’s unclear if this makes the agenda more promising. You might have thought that one issue with the agenda is that control is likely to be trivial and thus not worth working on (and that some other problem, e.g., doing alignment research with AI labor regardless of whether the AIs are scheming, is a bigger issue).
I was fairly on board with control before; I think my main remaining concern is the trusted models not being good enough. But with more elaborate control protocols (assuming political actors/AI labs actually make an effort to implement them), catching an escape attempt seems more likely if the model’s performance is heavily skewed toward specific domains.
Though yeah, I agree that some of what you mentioned might not have changed and could still be an issue.