The cogitation here is implicitly hypothesizing an AI that’s explicitly considering the data and trying to compress it, having been successfully anchored on that data’s compression as identifying an ideal utility function. You’re welcome to think of the preferences as a static object shaped by previous unreflective gradient descent; it sure wouldn’t arrive at any better answers that way, and would also of course want to avoid further gradient descent happening to its current preferences.