In this comment I will focus on the case of the posts-to-show agent only. The main question I explore is: does the agent construction below actually stop the agent from manipulating user opinions? The post above also explores this question; my main aim here is to provide an exploration that is very different from the post’s, in order to highlight other relevant parts of the problem.
Carey et al designed an algorithm to remove this control incentive. They do this by instructing the algorithm to choose its posts, not on predictions of the user’s actual clicks—which produce the undesired control incentive—but on predictions of what the user would have clicked on, if their opinions hadn’t been changed.
In this graph, there is no longer any control incentive for the AI on the “Influenced user opinions”, because that node no longer connects to the utility node.
[...]
It seems to neutralise a vicious, ongoing cycle of opinion change in order to maximize clicks. But, [...]
The TL;DR of my analysis is that the above construction may suppress a vicious, ongoing cycle of opinion change aimed at maximizing clicks, but there are many cases where full suppression of the cycle will definitely not happen.
Here is an example of when full suppression of the cycle will not
happen.
First, note that the agent can only pick among the posts that it has available. If all the posts that the agent has available are posts that make the user change their opinion on something, then user opinion will definitely be influenced by the agent showing posts, no matter how the decision about which posts to show is computed. If the posts are particularly stupid and viral, this may well cause vicious, ongoing cycles of opinion change.
But the agent construction shown does have beneficial properties. To repeat
the picture:
The above construction makes the agent indifferent about what effects it has on opinion change. It removes any incentive for the agent to control future opinion in a particular direction.
Here is a specific case where this indifference, this lack of a
control incentive, leads to beneficial effects:
Say that the posts-to-show agent in the above diagram decides on a sequence of 5 posts that will be suggested in turn, with the link to the next suggested post being displayed at the bottom of the current one. The user may not necessarily see all 5 suggestions; they may leave the site instead of clicking the suggested link. The objective is to maximize the number of clicks.
Now, say that the user will click the next link with a 50% chance if
the next suggested post is about cats. The agent’s predictive model
knows this.
But if the suggested post is about pandas, then the user will click with only a 40% chance, and leave the site with a 60% chance. However, if they do click on the panda post, this will change their opinion about pandas. If the next suggested posts are also all about pandas, they will click the links with 100% certainty. The agent’s predictive model knows this too.
In the above setup, the click-maximizing strategy is to show the
panda posts.
However, the above agent does not take into account the influence of the first panda post on user opinion. It will therefore decide to show a sequence of suggested cat posts.
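To make the arithmetic explicit, here is a minimal sketch of the comparison. The chain-of-suggestions model and the function expected_clicks are my own framing of the example; the probabilities are the ones given above.

```python
def expected_clicks(click_probs):
    """Expected number of clicks for a chain of suggested posts, where the
    user only sees suggestion k+1 if they clicked suggestion k (otherwise
    they leave the site)."""
    total, p_reach = 0.0, 1.0
    for p in click_probs:
        total += p_reach * p   # probability of reaching and clicking this suggestion
        p_reach *= p           # the user must click to see the next suggestion
    return total

N = 5

# Actual dynamics: clicking the first panda post shifts the user's opinion,
# so the remaining panda posts are clicked with certainty.
cats_actual = expected_clicks([0.5] * N)                  # ~0.97
pandas_actual = expected_clicks([0.4] + [1.0] * (N - 1))  # ~2.0

# Counterfactual prediction used by the posts-to-show agent: clicks are
# predicted as if the user's opinions were never influenced, so the panda
# click probability stays at 40% throughout.
cats_counterfactual = expected_clicks([0.5] * N)    # ~0.97
pandas_counterfactual = expected_clicks([0.4] * N)  # ~0.66

print(cats_actual, pandas_actual)                  # pandas maximize actual clicks
print(cats_counterfactual, pandas_counterfactual)  # but the agent prefers cats
```

The panda sequence wins on actual expected clicks, but under the counterfactual evaluation it scores lower than the cat sequence, so the agent shows the cat posts.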
To generalize from the above example: the construction creates a type of myopia in the agent that makes it under-invest (compared to the theoretical optimum) in manipulating the user’s opinion to get more clicks.
But also note that in this diagram:
there is still an arrow from ‘posts to show’ to ‘influenced user opinion’. In the graphical language of causal influence diagrams, this is a clear warning that the agent’s choices may end up influencing opinion in some way. We have eliminated the agent’s incentive to control future opinion, but not the possibility that it might influence future opinion as a side effect.
I guess I should also say something about how the posts-to-show agent
construction relates to real recommender systems as deployed on the
Internet.
Basically, the posts-to-show agent is a good toy model to illustrate points about counterfactuals and user manipulation, but it does not provide a very complete model of the decision-making processes that take place inside real-world recommender systems. There is a somewhat hidden assumption in the picture below, represented by the arrow from ‘model of original opinions’ to ‘posts to show’:
The hidden assumption is that the agent’s code which computes ‘posts to show’ will have access to a fairly accurate ‘model of original opinions’ for that individual user. In practice, that model would be very difficult to construct accurately if the agent has to do so based only on past click data from that user. (A future superintelligent agent might of course design a special mind-reading ray to extract a very accurate model of opinion without relying on clicks....)
To implement at least a rough approximation of the above decision-making process, we have to build user opinion models that rely on aggregating click data collected from many users. We might, for example, cluster users into interest groups and assign each individual user to one or more of these groups (see the toy sketch at the end of this comment). But if we do so, then the fine-grained time-axis distinction between ‘original user opinions’ and ‘influenced opinions after the user has seen the suggested posts’ gets very difficult to make. The paper “The Incentives that Shape Behaviour” suggests:
We might accomplish this by using a prediction model that assumes
independence between posts, or one that is learned by only showing one
post to each user.
An assumption of independence between posts is not valid in practice, but the idea of learning based on only one post per user would work. However, this severely limits the amount of useful training data we have available, so it may lead to much worse recommender performance, whether we measure performance by a profit-maximizing engagement metric or a happiness-maximizing user-satisfaction metric.
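To make the clustering point above a bit more concrete, here is a purely illustrative toy sketch. The click logs, topics, and the most-clicked-topic grouping rule are my own invented stand-ins, not anything from the paper; the point is only that the aggregated click data does not separate clicks made before and after the recommender started influencing the user.

```python
from collections import Counter

# Hypothetical click logs: user -> topics of the posts they clicked on.
# Crucially, these logs mix clicks made *before* and *after* the recommender
# started influencing each user's opinions; the log format does not say which.
click_logs = {
    "user_a": ["cats", "cats", "pandas", "cats"],
    "user_b": ["pandas", "pandas", "pandas"],
    "user_c": ["cats", "gardening", "cats"],
}

def interest_group(clicked_topics):
    """Assign a user to an interest group: here, simply their most-clicked topic."""
    return Counter(clicked_topics).most_common(1)[0][0]

# The 'model of original opinions' we can actually build is an aggregate over
# all logged clicks for the group, influenced clicks included.
groups = {user: interest_group(topics) for user, topics in click_logs.items()}
print(groups)  # {'user_a': 'cats', 'user_b': 'pandas', 'user_c': 'cats'}

# Nothing in this data separates 'original opinions' from 'opinions after the
# user has seen the suggested posts', which is exactly the fine-grained
# time-axis distinction the counterfactual construction needs.
```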