I think there are proposals that (it's hoped, with more research) might lead to changeable utility functions, i.e. agents that won't try to stop you from changing their utility function.
'Don't self-modify' utility functions don't seem to be around yet; the tricky part might be getting the agent to recognize itself, or the goal, or something along those lines.
Most of what I’ve seen has revolved around thought experiments (with math).
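To make the core problem concrete, here's a minimal toy sketch (not any specific published proposal): a vanilla expected-utility maximizer scores every option with its *current* utility function, so it prefers to prevent the change; adding a hypothetical indifference-style compensation term removes that incentive. The utility weights, predicted outcomes, and the compensation scheme are all illustrative assumptions.

```python
# Toy sketch: why a plain expected-utility maximizer resists having its
# utility function changed, and how an indifference-style bonus (an
# assumption here, not a concrete published mechanism) cancels that incentive.

def expected_utility(utility, outcome):
    """Score an outcome (dict of feature values) under a utility function
    (dict of feature weights)."""
    return sum(utility.get(k, 0.0) * v for k, v in outcome.items())

# Current utility and the proposed replacement. Note the replacement plays
# no role in the vanilla agent's decision -- which is exactly the problem.
u_old = {"paperclips": 1.0, "human_approval": 0.0}
u_new = {"paperclips": 0.0, "human_approval": 1.0}

# The agent's predicted outcomes for its two options (made-up numbers).
outcomes = {
    "prevent_change": {"paperclips": 10.0, "human_approval": 0.0},
    "allow_change":   {"paperclips": 2.0,  "human_approval": 8.0},
}

def choose(compensation=0.0):
    """Pick the action maximizing current-utility score, plus an optional
    bonus for allowing the change (standing in for an indifference term)."""
    scores = {a: expected_utility(u_old, o) for a, o in outcomes.items()}
    scores["allow_change"] += compensation
    return max(scores, key=scores.get), scores

# Vanilla agent: prevent_change scores 10 vs. 2, so it resists the change.
print(choose())

# Hypothetical indifference-style term: pay the agent, in current-utility
# units, exactly what it expects to lose by allowing the change. The two
# scores then tie, so it no longer has an incentive to resist.
gap = (expected_utility(u_old, outcomes["prevent_change"])
       - expected_utility(u_old, outcomes["allow_change"]))
print(choose(compensation=gap))
```

This only shows the incentive structure in a two-option toy setting; the actual proposals I'm gesturing at involve much more careful math about self-modification and how the compensation is computed.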