I’ll also say that, to the extent they are optimizing in a utility-maximizing sense, it’s about predicting the whole world correctly, not a reward function in the traditional sense (though they probably do have learned utility functions/values as part of that), so Paul Crowley is still wrong here.