I think the interestingness of this tendency depends on what X is. If X is kind-of explained by “instrumental convergence” but very well explained by “trying to be nice and helpful”, then I guess it’s still interesting but the safety takeaway seems like it should be the opposite
I think the interestingness of this tendency depends on what X is. If X is kind-of explained by “instrumental convergence” but very well explained by “trying to be nice and helpful”, then I guess it’s still interesting but the safety takeaway seems like it should be the opposite