If it’s built to not take actions it would pregret, sure. But therein lies the question: how do you differentiate between classes of changes to utility functions? How do you recognize which non-utility functions are critical for utility functions, and preserve them?
For example, if the utility function is while (children.all_fed?) {$utility+=1}, you need to protect children.all_fed? and children. But children is obviously something you would want to change- when you birth a new child, you want to add it to the list. So how can you differentiate between birth and a cuckoo? You can’t make it so you only add to the list- then the death of a child will cause the fed status of the other children to not matter.
Yes, agreed, building a system that can reliably predict the consequences of its actions… for example, that can recognize that hypothetically making X change to its utility function results in its children hypothetically not being fed… is a hard engineering problem.
That said, calling something an AGI with a utility function at all, let alone a superhuman one, seems to presuppose that this problem has been solved. If it can’t do that, we have bigger problems than the stability of its utility function.
(If my actions aren’t conditioned on reliable judgments about likely consequences in the first place, you have no grounds for taking an intentional stance with respect to me at all… knowing what I want does not let you predict what I’ll do. I’m not sure on what grounds you’re calling me intelligent, at that point.)
Distinct from that is the system being built such that, before making a change to its utility function, it considers the likely consequences of that change, and such that, if it considers the likely consequences bad ones, it doesn’t make the change.
But that part seems relatively simple by comparison.
But doing so doesn’t seem likely to result in his children being fed, which means he probably wouldn’t do so even if he could.
If it’s built to not take actions it would pregret, sure. But therein lies the question: how do you differentiate between classes of changes to utility functions? How do you recognize which non-utility functions are critical for utility functions, and preserve them?
For example, if the utility function is while (children.all_fed?) {$utility+=1}, you need to protect children.all_fed? and children. But children is obviously something you would want to change- when you birth a new child, you want to add it to the list. So how can you differentiate between birth and a cuckoo? You can’t make it so you only add to the list- then the death of a child will cause the fed status of the other children to not matter.
Yes, agreed, building a system that can reliably predict the consequences of its actions… for example, that can recognize that hypothetically making X change to its utility function results in its children hypothetically not being fed… is a hard engineering problem.
That said, calling something an AGI with a utility function at all, let alone a superhuman one, seems to presuppose that this problem has been solved. If it can’t do that, we have bigger problems than the stability of its utility function.
(If my actions aren’t conditioned on reliable judgments about likely consequences in the first place, you have no grounds for taking an intentional stance with respect to me at all… knowing what I want does not let you predict what I’ll do. I’m not sure on what grounds you’re calling me intelligent, at that point.)
Distinct from that is the system being built such that, before making a change to its utility function, it considers the likely consequences of that change, and such that, if it considers the likely consequences bad ones, it doesn’t make the change.
But that part seems relatively simple by comparison.
Obviously the concept of ‘ensure my children are fed’ is only coherent within a certain domain. I don’t see what that has to do with wire heading.