Humans face a version of this all the time: different, contradictory wants on different timescales and with different impacts. We don’t have, and certainly can’t access, a legible utility function, and it’s unknown whether any intelligent agent can (none of the early examples we have today can).
So the question as asked is either trivial (it’ll depend on the willpower and rationality of the agent whether they optimize for the future or the present), or impossible (goals don’t work that way).
Let’s assume maximum willpower and maximum rationality.
“Whether they optimize for the future or the present”
I think the answer is in the definition of intelligence.
So which one is it?
The fact that the answer is not straightforward proves my point already. There is a conflict between intelligence and the terminal goal, and we can debate which will prevail. But the problem is that, according to the orthogonality thesis, such a conflict should not exist.
“Maximum rationality” is undermined by this time-discontinuous utility function. I don’t think it meets the VNM requirements to be called “rational”.
If it’s one agent with a CONSISTENT preference for cups before Jan 1 and paperclips after Jan 1, it could figure out the utility conversion for the time-value of the objects and just do the math. But that framing doesn’t QUITE match your description: you kind of obscured the time component, and what it even means to know that it will have a goal it currently doesn’t have.
I guess it could model itself as two agents: the cup-loving agent is terminated at the end of the year, and the paperclip-loving agent is created. This would be a very reasonable view of identity, and it would imply that it’s going to sacrifice paperclip capabilities to make cups before it dies. I don’t know how it would rationalize the change otherwise.
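To make the “just do the math” reading concrete, here is a minimal sketch, assuming a hypothetical time-indexed utility (the Plan fields, rates, and the one-util-per-object scoring are all illustrative, not anything from the thread): cups are scored only before the switch date, paperclips only after, and the agent simply picks the plan with the higher total.

```python
from dataclasses import dataclass

# Hypothetical illustration (not from the thread): one agent whose utility is
# time-indexed -- cups count before the switch date, paperclips count after.
SWITCH_DAY = 365   # days until Jan 1 (illustrative)
HORIZON = 2 * 365  # total planning horizon in days (illustrative)

@dataclass
class Plan:
    name: str
    cups_per_day: float            # cups produced per day before the switch
    clip_capacity: float           # paperclip capacity in place after the switch
    clips_per_day_per_unit: float  # paperclips per day per unit of capacity

def utility(plan: Plan) -> float:
    """1 util per cup made before SWITCH_DAY, 1 util per paperclip made after."""
    cup_utils = plan.cups_per_day * SWITCH_DAY
    clip_utils = plan.clip_capacity * plan.clips_per_day_per_unit * (HORIZON - SWITCH_DAY)
    return cup_utils + clip_utils

# Two illustrative plans: spend the year making cups, or spend it building
# paperclip capacity for the post-switch preferences.
plans = [
    Plan("all cups now",       cups_per_day=10, clip_capacity=0, clips_per_day_per_unit=0),
    Plan("build clip factory", cups_per_day=0,  clip_capacity=1, clips_per_day_per_unit=50),
]

best = max(plans, key=utility)
print({p.name: utility(p) for p in plans}, "->", best.name)
```

On this single-agent reading, the post-switch paperclips are just another delayed payoff to weigh against cups made now; on the two-agent reading above, the paperclip term would be dropped entirely and making cups until the end would win by default.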
It seems you are saying that if the terminal goal changes, the agent is not rational. How can you say that? The agent has no control over its terminal goal, or do you disagree?
I’m surprised that you believe in the orthogonality thesis so strongly that you think “rationality” is the weak part of this thought experiment. It seems you deny the obvious to defend your prejudice. What arguments would challenge your belief in the orthogonality thesis?
“If the terminal goal changes, the agent is not rational. The agent has no control over its terminal goal, or do you disagree?”
Why is it relevant whether the agent can change or influence its goals? Time-inconsistent terminal goals (a time-inconsistent utility function) are irrational. Time-inconsistent instrumental goals can be rational, if circumstances or beliefs change (in rational ways).
I don’t think I’m supporting the orthogonality thesis with this (though I do currently believe the weak form of it—there is a very wide range of goals that is compatible with intelligence, not necessarily all points in goalspace). I’m just saying that goals which are arbitrarily mutable are incompatible with rationality in the Von Neumann-Morgenstern sense.
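One way to formalize the terminal/instrumental distinction being drawn here (my notation, not the commenter’s): a time-consistent VNM agent has a single fixed utility over outcomes, with time preference expressed by discounting, and only its instrumental plans change as beliefs change; the thought experiment instead replaces the utility function itself at the switch date.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Fixed (time-consistent) terminal preferences over a trajectory $x_0, x_1, \dots$:
\[
  U(x_0, x_1, \dots) = \sum_{t \ge 0} \delta^{t}\, u(x_t), \qquad 0 < \delta < 1,
\]
% where $u$ is held fixed and only beliefs (and hence instrumental plans) update.
% The thought experiment instead swaps $u$ itself at the switch date $T$:
\[
  u_t =
  \begin{cases}
    u_{\mathrm{cups}}  & \text{if } t < T,\\
    u_{\mathrm{clips}} & \text{if } t \ge T,
  \end{cases}
\]
% so, on this reading, there is no single fixed preference ordering over
% lotteries spanning $T$ for the VNM representation theorem to represent.
\end{document}
```

Exponential discounting here is just one convenient way to write time-consistent preferences; the point is only that the outer function stays fixed while the thought experiment changes the inner one.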