Richard_Ngo comments on Decision theory does not imply that we get to have nice things

Richard_Ngo 24 Oct 2022 15:55 UTC
LW: 5 AF: 5
0
AF
I agree that we’ll have a learning function that works on the data actually input, but it seems strange to me to characterize that learned model as “reflecting back on that data” in order to figure out what it cares about (as opposed to just developing preferences that were shaped by the data).
- Eliezer Yudkowsky 7 Nov 2022 4:16 UTC
  LW: 7 AF: 6
  1
  AF Parent
  The cogitation here is implicitly hypothesizing an AI that’s explicitly considering the data and trying to compress it, having been successfully anchored on that data’s compression as identifying an ideal utility function. You’re welcome to think of the preferences as a static object shaped by previous unreflective gradient descent; it sure wouldn’t arrive at any better answers that way, and would also of course want to avoid further gradient descent happening to its current preferences.