The point here is that there are enough results like this in ML that I’m more skeptical of the security mindset being the right frame, and ML/AI alignment is a strange enough domain that we shouldn’t port over intuitions from other fields, much as you shouldn’t port intuitions from the macroscopic scale over to quantum mechanics.
For a specific example relevant to alignment, I talked about SGD’s corrective properties in a section of the post.
Another good example is the fact that AIs are generally modular: you can swap out parts without breaking the AI, which shouldn’t be possible under a security mindset, since that mindset would predict that doing so either makes the AI spit out nonsense or breaks its security, and neither has happened.
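As a minimal sketch of the kind of intervention I mean (not something from the post itself): using GPT-2 via Hugging Face transformers as an illustrative stand-in, zero-ablate one MLP block and check that the language-modelling loss degrades gracefully rather than collapsing. The choice of model and of layer index 6 is arbitrary.

```python
# Minimal sketch: knock out one internal component of a trained model and see
# whether behaviour degrades gracefully or catastrophically.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "The security mindset says a single flaw should be catastrophic."
inputs = tokenizer(text, return_tensors="pt")

def lm_loss(m):
    # Next-token prediction loss on the sample text.
    with torch.no_grad():
        return m(**inputs, labels=inputs["input_ids"]).loss.item()

baseline = lm_loss(model)

# Zero out the MLP of one middle transformer block (index 6 chosen arbitrarily).
for p in model.transformer.h[6].mlp.parameters():
    p.data.zero_()

ablated = lm_loss(model)
print(f"baseline loss: {baseline:.3f}, after ablating one MLP: {ablated:.3f}")
# In practice the loss usually rises somewhat rather than exploding -- the model
# keeps producing sensible text, which is the kind of graceful degradation
# I'm pointing at.
```

This is only an ablation rather than a full part-for-part swap, but it illustrates the same point: the system tolerates surgery on its internals instead of failing the way a security-mindset prior would suggest.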