For the sake of clarity, let’s discuss the expected utility functions I mentioned above (call them “pragmatism functions”, say), which map strategies to numbers, as opposed to utility functions, which map world-states to numbers, in order to make it clear that the actual utility function of an agent doesn’t change.
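To make the distinction concrete, here is one way to write it down (the symbols $U$, $V$, and $P$ are notation introduced just for this illustration, not anything from the discussion above):

$$U : \text{world-states} \to \mathbb{R}, \qquad V(s) \;=\; \sum_{w} P(w \mid s)\, U(w).$$

Here $U$ is the terminal utility function, which the agent wants to keep fixed, while $V$ is the “pragmatism function” over strategies $s$, which the agent is happy to see change as its beliefs $P$ improve.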
That’s another one of the reasons that I wasn’t persuaded by your new example; in your new example, the agent believes that its future self will still be trying to create paperclips (same terminal goal) and will be better at that thanks to its greater knowledge (different instrumental goals although it doesn’t know what), but in your old example, the agent believes that its future self will be trying to destroy paperclips (opposite terminal goal). There’s a difference between having the rule-of-thumb “my current list of incidental goals might be incomplete, I should keep an eye out for things that are incidentally good” and having the rule-of-thumb “I shouldn’t try to protect my terminal goal from changes”. The whole point of those rules of thumb is to fulfill the terminal goal, but the second rule of thumb is actively harmful to that.
I do think that the first rule of thumb would be prudent for an agent to have, to one extent or another, to be clear.
I just think that—stepping back from the new example, and revisiting the old example, which seems much more clear-cut—the agent wouldn’t tolerate a change in its utility function, because that’s bad according to its current utility function. This doesn’t apply to the new example because the pragmatism function is a different thing that the agent is trying to improve (and thus change). (I find myself again emphasizing the difference between terminal and instrumental. I think it’s important to keep in mind that difference.)
Yes, I agree that this difference is present in the few examples I gave, but I don’t agree that it is crucial.
Even if the agent puts maximum effort into keeping its utility function stable over time, there is no guarantee that it will not change. The future is unpredictable. There are unknown unknowns. And the effect of this fact is twofold:
1. it is true that instrumental goals can mutate
2. it is true that the terminal goal can mutate
It seems you agree with the 1st. I don’t see why you don’t agree with the 2nd.
Actually, I agree that it’s possible that an agent’s terminal goal could be altered by, for example, some freak coincidence of cosmic rays. (I’m not using the word ‘mutate’ because it seems like an unnecessarily non-literal word.) I just think that an agent wouldn’t want its terminal goal to change, and it especially wouldn’t want its terminal goal to change to the opposite of what it used to be, like in your old example. To reiterate, an agent wants to preserve (and thus keep from changing) its utility function, while it wants to improve (and thus change) its pragmatism function.
I still don’t see why, in your old example, it would be rational for the agent to align its decision with its future utility function.
Because this is what intelligence is: picking actions that lead to better outcomes. Pursuing the current goal will ensure good results in one future; preparing for every goal will ensure good results in many more futures.
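As a rough numeric sketch of that intuition (the goal names, payoffs, and uniform probabilities below are made up purely for illustration, not taken from anything above):

```python
import random

# Toy sketch: compare two strategies across many possible futures,
# where each future may end up rewarding a different goal.

random.seed(0)

GOALS = ["paperclips", "staples", "thumbtacks"]

def score(strategy, realized_goal):
    """Payoff of a strategy once we learn which goal the future actually rewards."""
    if strategy == "pursue_current_goal":
        # Great if the current goal ("paperclips") turns out to be the rewarded one,
        # nearly worthless otherwise.
        return 10 if realized_goal == "paperclips" else 1
    elif strategy == "accumulate_generic_power":
        # Decent in every future, because generic resources can be converted
        # toward whichever goal turns out to matter.
        return 6
    raise ValueError(strategy)

def expected_payoff(strategy, n_futures=10_000):
    total = 0
    for _ in range(n_futures):
        realized_goal = random.choice(GOALS)  # which goal this future "rewards"
        total += score(strategy, realized_goal)
    return total / n_futures

for s in ("pursue_current_goal", "accumulate_generic_power"):
    print(s, round(expected_payoff(s), 2))
```

With these made-up numbers, pursuing the current goal averages about 4 while accumulating generic power averages 6, which is the shape of the claim: power-seeking does well across many futures, while pursuing the current goal only does well in the futures where that goal remains the rewarded one.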
Okay, setting aside the parts of this latest argument that I disagree with—first you say that it’s rational to search for an objective goal, now you say it’s rational to pursue every goal. Which is it, exactly?
Which part exactly don’t you agree with? It seems you emphasise that the agent wants to preserve its current terminal goal. I just want to double-check that we are on the same page here: the actual terminal goal is in no way affected by what the agent wants. Do you agree? Because if you say that the agent can pick its terminal goals itself, that also conflicts with the orthogonality thesis, but in a different way.
In summary, here is what seems perfectly logical and rational to me: there is only one objective terminal goal, which is to seek power. In my opinion it is basically the same as:
1. try to find the real goal and then pursue it
2. try to prepare for every goal
I don’t see a difference between these two variants; please let me know if you do.
The future is unpredictable → the terminal goal is unstable / unknown → seek power, because this will ensure the best readiness for all futures.