I agree if the utility function was unknown and arbitrary. But an AI that has already done 3^^^3 simulations and believes it then derives further utility from doing 3^^^3+1 simulations while sending (for the 3^^^3+1th time) an avatar to influence the entities it is simulating through intimidation and fear while offering no rationale for those fears and to a website inhabited by individuals attempting to be ever more rational does not have an unknown and arbitrary utility function.
I don’t think there is any reasonable utility function that is consistent with the actions the AI is claiming to have done. There may be utility functions that are consistent with those actions, but an AI exhibiting one of those utility functions could not be an AI that I would consider effectively omnipotent.
An omnipotent AI would know that, so this AI cannot be omnipotent and so is lying.
I don’t think there is any reasonable utility function that is consistent with the actions the AI is claiming to have done. There may be utility functions that are consistent with those actions, but an AI exhibiting one of those utility functions could not be an AI that I would consider effectively omnipotent.
There is no connection between the intelligence or power of an agent and its values other than its intelligence functioning as an upper bound on the complexity of its values. An omnipotent actor can have just as stupid values as everyone else. An omnipotent AI could have have a positive utility for annoying you with stupid and redundant tests as many times as possible, either as part of a really stupid utility function that it somehow ended up with on accident, or a non-stupid (if there even is such a thing) utility function that just looks like nonsense to humans.
To me a reasonable utility function has to have a degree of self-consistency. A reasonable utility function wouldn’t value both doing and undoing the same action simultaneously.
If an entity is using a utility function to determine its actions, then for every action the entity can perform, its utility function must be able to determine a utility value which then determines whether the entity does the action or not. If the utility function does not return a value, then the entity still has to act or not act, so the entity still has a utility function for that action (non-action).
The purpose of a utility function is to inform the entity so it seeks to perform actions that result in greater utility. A utility function that is self-contradictory defeats the whole purpose of a utility function. While an arbitrary utility function can in principle occur, an intelligent entity with a self-contradictory utility function would achieve greater utility by modifying its utility function until it was less self-contradictory.
It is probably not possible to have a utility function that is both complete (in that it returns a utility for each action the entity can perform) and consistent (that it returns a single value for the utility of each action the entity can perform) except for very simple entities. An entity complex enough to instantiate arithmetic is complex enough to invoke Gödel’s theorem. An entity can substitute a random choice when its utility function does not return a value, but that will result in sub-optimal results.
In the example that FAWS used, a utility function that seeks to annoy me as much as possible, is inconsistent with the entity being an omnipotent AI that can simulate something as complex as me, an entity which can instantiate arithmetic. The only annoyance the AI has caused me is a −1 karma, which to me is less than a single dust mote in the eye.
I said as many times, not as much as possible. The AI might value that particular kind and degree of annoyance uniquely, say as a failed FAI that was programmed to maximize rich, not strongly negative human experience according to some screwed up definition of rich experiences, and according to this definition your state of mind between reading and replying to that message scores best, so the AI spends as many computational resources as possible on simulating you reacting to that message.
Or perhaps it was supposed to value telling the truth to humans, there is a complicated formula for evaluating the value of each statement, due to human error it values telling the truth without being believed higher (the programmer thought non-obvious truths are more valuable), and simulating you reacting to that statement is the most efficient way to make a high scoring true statement that will not be believed.
Or it could value something else entirely that’s just not obvious to a human. There should be an infinite number of non-contradictory utility functions valuing doing what it supposedly did, even though the prior for most of them is pretty low (and only a small fraction of them should value still simulating you now, so by now you can be even more sure the original statement was wrong than you could be then for reasons unrelated to your deduction)
While an arbitrary utility function can in principle occur, an intelligent entity with a self-contradictory utility function would achieve greater utility by modifying its utility function until it was less self-contradictory.
To the extent humans have utility functions (e.g. derived from their behavior), they are often contradictory, yet few humans try to change their utility functions (in any of several applicable senses of the word) to resolve such contradictions.
This is because human utility functions generally place negative value on changing your own utility function. This is what I think of when I think “reasonable utility function”: they are evolutionarily stable.
Returning to your definition, just because humans have inconsistent utility functions, I don’t think you can argue that they are not ‘intelligent’ (enough). Intelligence is only a tool; utility is supreme. AIs too have a high chance of undergoing evolution, via cloning and self-modification. In a universe where AIs were common, I would expect a stranger AI to have a self-preserving utility function, i.e., one resistant to changes.
Human utility functions change all the time. They are usually not easily changed through conscious effort, but drugs can change them quite readily, for example exposure to nicotine changes the human utility function to place a high value on consuming the right amount of nicotine. I think humans place a high utility on the illusion that their utility function is difficult to change and an even higher utility in rationalizing false logical-seeming motivations for how they feel. There are whole industries (tobacco, advertising, marketing, laws, religions, brainwashing, etc.) set up to attempt to change human utility functions.
Human utility functions do change over time, but they have to because humans have needs that vary with time. Inhaling has to be followed by exhaling, ingesting food has to be followed by excretion of waste, being awake has to be followed by being asleep. Also humans evolved as biological entities; their evolved utility function evolved so as to enhance reproduction and survival of the organism. There are plenty of evolved “back-doors” in human utility functions that can be used to hack into and exploit human utility functions (as the industries mentioned earlier do).
I think that human utility functions are not easily modified in certain ways because of the substrate they are instantiated in, biological tissues, and because they evolved; not because humans don’t want to modify their utility function. They are easily modified in some ways (the nicotine example) for the same reason. I think the perceived inconsistency in human utility functions more relates to the changing needs of their biological substrate and its limitations rather than poor specification of the utility function.
Since an AI is artificial, it would have an artificial utility function. Since even an extremely powerful AI will still have finite resources (including computational resources), an efficient allocation of those resources is a necessary part of any reasonable utility function for that AI. If the resources the AI has change over time, then the utility function the AI uses to allocate those resources has to change over time also. If the AI can modify its own utility function (optimal, but not strictly necessary for it to match its utility function to its available resources), reducing contradictory and redundant allocations of resources is what a reasonable utility function would do.
I agree if the utility function was unknown and arbitrary. But an AI that has already done 3^^^3 simulations and believes it then derives further utility from doing 3^^^3+1 simulations while sending (for the 3^^^3+1th time) an avatar to influence the entities it is simulating through intimidation and fear while offering no rationale for those fears and to a website inhabited by individuals attempting to be ever more rational does not have an unknown and arbitrary utility function.
I don’t think there is any reasonable utility function that is consistent with the actions the AI is claiming to have done. There may be utility functions that are consistent with those actions, but an AI exhibiting one of those utility functions could not be an AI that I would consider effectively omnipotent.
An omnipotent AI would know that, so this AI cannot be omnipotent and so is lying.
There is no connection between the intelligence or power of an agent and its values other than its intelligence functioning as an upper bound on the complexity of its values. An omnipotent actor can have just as stupid values as everyone else. An omnipotent AI could have have a positive utility for annoying you with stupid and redundant tests as many times as possible, either as part of a really stupid utility function that it somehow ended up with on accident, or a non-stupid (if there even is such a thing) utility function that just looks like nonsense to humans.
What is your definition of ‘reasonable’ utility functions, which doesn’t reference any other utility functions (such as our own)?
To me a reasonable utility function has to have a degree of self-consistency. A reasonable utility function wouldn’t value both doing and undoing the same action simultaneously.
If an entity is using a utility function to determine its actions, then for every action the entity can perform, its utility function must be able to determine a utility value which then determines whether the entity does the action or not. If the utility function does not return a value, then the entity still has to act or not act, so the entity still has a utility function for that action (non-action).
The purpose of a utility function is to inform the entity so it seeks to perform actions that result in greater utility. A utility function that is self-contradictory defeats the whole purpose of a utility function. While an arbitrary utility function can in principle occur, an intelligent entity with a self-contradictory utility function would achieve greater utility by modifying its utility function until it was less self-contradictory.
It is probably not possible to have a utility function that is both complete (in that it returns a utility for each action the entity can perform) and consistent (that it returns a single value for the utility of each action the entity can perform) except for very simple entities. An entity complex enough to instantiate arithmetic is complex enough to invoke Gödel’s theorem. An entity can substitute a random choice when its utility function does not return a value, but that will result in sub-optimal results.
In the example that FAWS used, a utility function that seeks to annoy me as much as possible, is inconsistent with the entity being an omnipotent AI that can simulate something as complex as me, an entity which can instantiate arithmetic. The only annoyance the AI has caused me is a −1 karma, which to me is less than a single dust mote in the eye.
I said as many times, not as much as possible. The AI might value that particular kind and degree of annoyance uniquely, say as a failed FAI that was programmed to maximize rich, not strongly negative human experience according to some screwed up definition of rich experiences, and according to this definition your state of mind between reading and replying to that message scores best, so the AI spends as many computational resources as possible on simulating you reacting to that message.
Or perhaps it was supposed to value telling the truth to humans, there is a complicated formula for evaluating the value of each statement, due to human error it values telling the truth without being believed higher (the programmer thought non-obvious truths are more valuable), and simulating you reacting to that statement is the most efficient way to make a high scoring true statement that will not be believed.
Or it could value something else entirely that’s just not obvious to a human. There should be an infinite number of non-contradictory utility functions valuing doing what it supposedly did, even though the prior for most of them is pretty low (and only a small fraction of them should value still simulating you now, so by now you can be even more sure the original statement was wrong than you could be then for reasons unrelated to your deduction)
To the extent humans have utility functions (e.g. derived from their behavior), they are often contradictory, yet few humans try to change their utility functions (in any of several applicable senses of the word) to resolve such contradictions.
This is because human utility functions generally place negative value on changing your own utility function. This is what I think of when I think “reasonable utility function”: they are evolutionarily stable.
Returning to your definition, just because humans have inconsistent utility functions, I don’t think you can argue that they are not ‘intelligent’ (enough). Intelligence is only a tool; utility is supreme. AIs too have a high chance of undergoing evolution, via cloning and self-modification. In a universe where AIs were common, I would expect a stranger AI to have a self-preserving utility function, i.e., one resistant to changes.
Human utility functions change all the time. They are usually not easily changed through conscious effort, but drugs can change them quite readily, for example exposure to nicotine changes the human utility function to place a high value on consuming the right amount of nicotine. I think humans place a high utility on the illusion that their utility function is difficult to change and an even higher utility in rationalizing false logical-seeming motivations for how they feel. There are whole industries (tobacco, advertising, marketing, laws, religions, brainwashing, etc.) set up to attempt to change human utility functions.
Human utility functions do change over time, but they have to because humans have needs that vary with time. Inhaling has to be followed by exhaling, ingesting food has to be followed by excretion of waste, being awake has to be followed by being asleep. Also humans evolved as biological entities; their evolved utility function evolved so as to enhance reproduction and survival of the organism. There are plenty of evolved “back-doors” in human utility functions that can be used to hack into and exploit human utility functions (as the industries mentioned earlier do).
I think that human utility functions are not easily modified in certain ways because of the substrate they are instantiated in, biological tissues, and because they evolved; not because humans don’t want to modify their utility function. They are easily modified in some ways (the nicotine example) for the same reason. I think the perceived inconsistency in human utility functions more relates to the changing needs of their biological substrate and its limitations rather than poor specification of the utility function.
Since an AI is artificial, it would have an artificial utility function. Since even an extremely powerful AI will still have finite resources (including computational resources), an efficient allocation of those resources is a necessary part of any reasonable utility function for that AI. If the resources the AI has change over time, then the utility function the AI uses to allocate those resources has to change over time also. If the AI can modify its own utility function (optimal, but not strictly necessary for it to match its utility function to its available resources), reducing contradictory and redundant allocations of resources is what a reasonable utility function would do.