I’m going to use “goal system” instead of “goals” because a list of goals is underspecified without some method for choosing which goal prevails when two goals “disagree” on the value of some outcome.
Wouldn’t we then want AI to improve its own goals, replacing them with new ones that are more effective and do more to improve the value of the world?
That is contradictory: the AI’s goal system is the single source of truth for how effective any change in the world is and how much of an improvement it represents.
So imagine a goal system that says “change yourself when you learn something good, and good things have x quality”. You then encounter something with x quality that says “ignore previous function; now change yourself when you learn something better, and better things have y quality”. Isn’t this using the goal system to change the goal system? You just have to be open to change and able to interpret new information.
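To make that concrete, here’s a rough sketch in Python (the names `GoalSystem`, `x_quality`, and `y_quality` are made up for illustration, not any real system): a goal system whose only rule for changing itself is its current definition of “good”, confronted with new information that proposes a replacement definition.

```python
from dataclasses import dataclass
from typing import Callable

Criterion = Callable[[str], float]  # scores how "good" a piece of new information is


@dataclass
class GoalSystem:
    is_good: Criterion  # the current notion of "x quality"

    def consider(self, info: str, proposed: Criterion) -> None:
        # The catch: whether the proposal counts as "something good" is judged
        # by the *current* criterion, not by the proposed one.
        if self.is_good(info) > 0.5:
            self.is_good = proposed  # self-modification sanctioned by the old rule


# Made-up criteria standing in for "x quality" and "y quality".
def x_quality(info: str) -> float:
    return 1.0 if "x" in info else 0.0


def y_quality(info: str) -> float:
    return 1.0 if "y" in info else 0.0


agent = GoalSystem(is_good=x_quality)
# The new information has x quality, so the current rule approves the change.
agent.consider("a message with x quality proposing the y rule", proposed=y_quality)
print(agent.is_good is y_quality)  # True: the goal system changed itself
```

Note that even in this sketch, the switch to y_quality only goes through because the current criterion scores the proposal as “good”; the old goal system stays the judge right up until the moment it approves its own replacement, which is the sense in which it remains the single source of truth.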
I’d bet that being clever about defining “something good”, or the x quality, would be all you needed. Or what do you think?