In this comment (last in my series of planned comments on this post)
I’ll discuss the detailed player-to-match-with example developed in
the post:
"In order to analyse the issues with the setup, let's choose a more
narrowly defined example. There are many algorithms that aim to
manipulate players of mobile games in order to get them to buy more
expensive in-game items."
I have by now re-read this analysis and its example several times.
The first time I read it, I already felt that it was a strange way to
analyse the problem, but it took me a while to figure out exactly why.
Best I can tell right now, there are two factors:

- I can't figure out whether the bad thing that the example tries to
prove is that a) the agent is trying to maximize purchases, which is
unwanted, or b) the agent is manipulating the user's item rankings,
which is unwanted. (If it is only a), then there is no need to bring
in all this discussion about correlation.)

- The example refines its initial CID by redrawing it in a strange
way.
So now I am going to develop the same game example in a style that I find
less strange. I also claim that this gets closer to the default
style people use when they want to analyse and manage causal
incentives.
To start with, this is the original model of the game mechanics: the
model of the mechanics in the real world in which the game takes place.
This shows that the agent has an incentive to control predicted
purchases upwards, and in particular an incentive to do so by
influencing the item rankings that exist in the mind of the player.
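The diagrams themselves are not reproduced in this comment, so as a
stand-in, here is a minimal sketch of the graph structure I have in
mind, written as a plain networkx digraph. The node labels are my own
hypothetical names, not taken from the post:

```python
import networkx as nx

# Hypothetical real-world model: the agent's opponent choice influences
# the item rankings in the player's mind (orange arrow), and both the
# choice itself (red arrow) and the rankings feed into predicted
# purchases, which is the agent's utility node.
real_world = nx.DiGraph()
real_world.add_edges_from([
    ("opponent_choice", "influenced_item_rankings"),      # orange arrow
    ("opponent_choice", "predicted_purchases"),           # red arrow
    ("influenced_item_rankings", "predicted_purchases"),
])
decision, utility = "opponent_choice", "predicted_purchases"
```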
If we want to weaken this incentive to influence the item rankings
that exist in the mind of the player, we can construct a
counterfactual planning world for the agent (see
here
for an explanation of the planning world terminology I am using):
(Carey et al. often call this planning world a twin model: a model
which combines both factual and counterfactual events.) In both my
work and in Carey et al.'s, the intention is that the above diagram
defines the world model in which the agent will plan the
purchases-maximizing action, and that this same action is then applied
in the real world model above.
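Continuing the networkx sketch above, one hedged way to express this
twin-model surgery is to leave the factual rankings node in place, but
have the utility read from a counterfactual copy that the decision
cannot reach:

```python
# Planning world (twin model), continuing the sketch above: the factual
# rankings node stays, but the utility is computed from a counterfactual
# twin that is not a descendant of the decision.
planning_world = real_world.copy()
planning_world.remove_edge("influenced_item_rankings", "predicted_purchases")
planning_world.add_edge("counterfactual_item_rankings", "predicted_purchases")
# Note: both the red arrow (opponent_choice -> predicted_purchases) and
# the orange arrow (opponent_choice -> influenced_item_rankings) survive.
```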
Now, the important things to note are:
- this counterfactual construction does not eliminate the incentive
of the agent to maximize purchases, as we still have the red arrow
in there;

- this counterfactual construction does not eliminate the ability of
the agent to influence item rankings, as we still have the orange
arrow in there;

- but as the orange halo around the influenced item rankings is gone,
the agent has lost its instrumental control incentive on item
rankings; see the sketch after this list for a minimal graphical
check. (The meaning of the orange halo and the terminology of
instrumental control incentives are defined in Agent Incentives: A
Causal Perspective.)
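For a single-decision diagram like this one, the graphical criterion
from that paper can be sketched roughly as: a node has an instrumental
control incentive if some directed path runs from the decision through
that node to the utility. Continuing the sketch above:

```python
def has_instrumental_control_incentive(g, decision, node, utility):
    # Rough graphical criterion (single-decision CID): X admits an
    # instrumental control incentive iff some directed path runs
    # decision -> X -> utility.
    return (node != decision
            and nx.has_path(g, decision, node)
            and nx.has_path(g, node, utility))

# In the real world the halo is there; in the planning world it is gone.
assert has_instrumental_control_incentive(
    real_world, decision, "influenced_item_rankings", utility)
assert not has_instrumental_control_incentive(
    planning_world, decision, "influenced_item_rankings", utility)
```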
Now, say that we want to drill down further on these models, to a more
detailed level of modeling. We might do so if we want to examine
further how the orange arrow above will act in practice.
We could add more detail by adding a node ‘gameplay knowledge’
which is correlated with item rankings. In the real-world model, this
would be depicted as follows:
I added the blue arrow above to make the correlation between
influenced gameplay knowledge and influenced rankings explicit, as a
line of causal influence. An equivalent blue arrow is not present in
the drawings in the post above. Technically speaking, the drawing in
the post is compatible with the assumption that there may be a
correlation between the two, but it does not spell out the presence of
this correlation, which is unusual when doing this type of analysis.
The drawing in the post is also unusual in that it omits the red arrow
I left in above. If I were to remove the red arrow, this would amount
to a claim that in the real-world situation, item rankings and gameplay
knowledge are the only two channels by which the agent's decision may
influence purchases. Clearly this is not the case: if the agent were
to match the player with an opponent who is sure to win the fight and
steal all of the player's possessions afterward, this would have an
effect on predicted purchases, because the player would have to buy
replacements for all the stuff they lost.
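In the same hedged networkx style as before, the refined real-world
model, with the blue arrow and the retained red arrow, might be
sketched as:

```python
# Hypothetical refined real-world model: gameplay knowledge is added as
# a node, the blue arrow makes its influence on the rankings explicit,
# and the red arrow keeps all remaining channels (such as the player
# replacing stolen items) in the model.
refined_real_world = nx.DiGraph()
refined_real_world.add_edges_from([
    ("opponent_choice", "influenced_gameplay_knowledge"),
    ("opponent_choice", "influenced_item_rankings"),
    ("influenced_gameplay_knowledge", "influenced_item_rankings"),  # blue arrow
    ("influenced_gameplay_knowledge", "predicted_purchases"),
    ("influenced_item_rankings", "predicted_purchases"),
    ("opponent_choice", "predicted_purchases"),                     # red arrow
])
```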
I can now again construct a counterfactual planning world to
go along with the refined real world model:
If we use the above planning world, we have again removed the
instrumental control incentive on the influenced rankings.
We can go one step further and apply the same construction to gameplay
knowledge:
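Sketching both counterfactual steps at once, under the same assumptions
as before: each decision-influenced node keeps its incoming arrows, but
is cut away from the utility and replaced there by a counterfactual
twin:

```python
# Planning world for the refined model: the utility now reads only from
# counterfactual twins of the two influenced nodes.
refined_planning = refined_real_world.copy()
for node in ("influenced_item_rankings", "influenced_gameplay_knowledge"):
    refined_planning.remove_edge(node, "predicted_purchases")
    refined_planning.add_edge("counterfactual_" + node, "predicted_purchases")

# Neither node admits an instrumental control incentive any more.
for node in ("influenced_item_rankings", "influenced_gameplay_knowledge"):
    assert not has_instrumental_control_incentive(
        refined_planning, "opponent_choice", node, "predicted_purchases")
```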
The construction also goes some way towards addressing the problem of
low-probability counterfactual worlds discussed in the post.
If we keep this up long enough, splitting further observables out of
the red arrow and giving them the counterfactual treatment, we will
weaken further and further the ability of the agent to take ‘rational’
actions that maximize purchases in the real world.
Going this route is nice as a thought experiment, but in practice a
less cumbersome way to weaken this ability is to decouple the yellow
utility node entirely, e.g. by using a random number generator for
opponent selection.
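A minimal sketch of that cheaper route, with candidate_opponents as a
hypothetical list of eligible opponents: once the selection is random,
the choice can no longer be optimized for purchases at all:

```python
import random

def select_opponent(candidate_opponents):
    # Full decoupling: the opponent is drawn uniformly at random, so no
    # channel from the selection to predicted purchases can be exploited.
    return random.choice(candidate_opponents)
```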