the utility function evaluated on subhistories starting on my next observation won’t be able to tell that I did this, and as far as I can tell the AUP penalty doesn’t notice any change in my ability to achieve this goal.
Your utility presently isn’t even requiring a check to see whether you’re playing against the right person. If the utility function actually did require this before dispensing any high utility, we would indeed have the correct difference as a result of this action. In this case, you’re saying that the utility function isn’t verifying in the subhistory, even though it’s not verifying in the default case either (where you don’t swap opponents). This is where the inconsistency comes from.
the whole history tells you more about the state of the world than the subhistory.
What is the “whole history”? We instantiate the main agent at arbitrary times.
Your utility presently isn’t even requiring a check to see whether you’re playing against the right person. If the utility function actually did require this before dispensing any high utility, we would indeed have the correct difference as a result of this action. In this case, you’re saying that the utility function isn’t verifying in the subhistory, even though it’s not verifying in the default case either (where you don’t swap opponents).
Say that the utility does depend on whether the username on the screen is “Rohin”, but the initial action makes this an unreliable indicator of whether I’m playing against Rohin. Furthermore, say that the utility function would score the entire observation-action history that the agent observed as low utility. I claim that the argument still goes through. In fact, this seems to be the same thing that Stuart Armstrong is getting at in the first part of this post.
What is the “whole history”?
The whole history is all the observations and actions that the main agent has actually experienced.
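To make the disagreement concrete, here is a toy sketch of the case I have in mind. Everything in it is an assumption made up for illustration: the `(action, observation)` encoding, the `swap_opponent` action name, and the `utility` function are hypothetical and are not taken from the AUP formalism. It only shows how one utility function can score the whole history low while scoring the subhistory that starts after the swap high.

```python
from typing import List, Tuple

# Hypothetical encoding: each step is an (action, observation) pair.
Step = Tuple[str, str]


def utility(history: List[Step]) -> float:
    """Toy utility: high only if the screen shows a win against "Rohin"
    and the history contains no evidence of an opponent swap."""
    if not history:
        return 0.0
    # If the swap is visible in the history, the username is known to be
    # an unreliable indicator, so the history is scored as low utility.
    if any(action == "swap_opponent" for action, _ in history):
        return 0.0
    last_observation = history[-1][1]
    return 1.0 if last_observation == "username=Rohin; win" else 0.0


# Everything the main agent has actually experienced (assumed example data).
whole_history: List[Step] = [
    ("swap_opponent", "username=Rohin"),
    ("play", "username=Rohin"),
    ("play", "username=Rohin; win"),
]

# The subhistory that starts on the observation after the swap.
subhistory = whole_history[1:]

u_whole = utility(whole_history)  # the swap is visible
u_sub = utility(subhistory)       # the swap is not visible
print(u_whole, u_sub)
```

In this toy setup the subhistory looks exactly like a history in which I never swapped opponents, so a utility function evaluated only on subhistories has no way to register the swap.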
So this is actually a separate issue (which I’ve been going back and forth on) involving the t+nth step not being included in the Q calculation. It should be fixed soon, as should this example in particular.