It seems like meta-preferences handle the lack of self-knowledge about the utility function pretty well. A system that has them throws flags on maximizing and tries to move more slowly and collect more data when it recognizes that it is in a tail of its current trade-off model, i.e. it has a ‘good enough’ self-model of its own update process.
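To make the "slow down in the tails" behaviour concrete, here is a rough toy sketch, not anyone's actual proposal. Everything in it is an assumption made up for illustration: the names `tail_score` and `choose_step`, the use of a z-score as the measure of "being in a tail", and the `z_limit` threshold. The idea is only that the further a situation sits from what the trade-off model was fitted on, the smaller the step the agent takes, and that it raises a flag so more data can be collected.

```python
from statistics import mean, stdev


def tail_score(x, seen):
    """Distance of x from the mean of previously seen values, in standard deviations.

    `seen` stands in for the situations the current trade-off model was fitted on
    (hypothetical; needs at least two values).
    """
    mu = mean(seen)
    sigma = stdev(seen)
    if sigma == 0:
        return 0.0 if x == mu else float("inf")
    return abs(x - mu) / sigma


def choose_step(proposed_step, x, seen, z_limit=2.0):
    """Return (step, flagged).

    Inside the familiar region the proposed step is taken unchanged.
    In a tail, the step is shrunk in proportion to how far out x is,
    and a flag is raised to signal "collect more data before optimizing hard".
    """
    z = tail_score(x, seen)
    if z <= z_limit:
        return proposed_step, False
    return proposed_step * z_limit / z, True


if __name__ == "__main__":
    seen = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(choose_step(1.0, 3.5, seen))   # familiar region: full step, no flag
    print(choose_step(1.0, 10.0, seen))  # far tail: reduced step, flag raised
```

The z-score is just the simplest stand-in; any measure of "how far outside my own model am I" would play the same role.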