Knowledge, manipulation, and free will

Thanks to Rebecca Gorman for co-developing this idea

On the 26th of September 1983, Stanislav Petrov observed the early warning satellites reporting the launch of five nuclear missiles towards the Soviet Union. He decided to disobey orders and not pass the warning up the chain of command; passing it on could easily have resulted in a nuclear war, since the Soviet nuclear posture was “launch on warning”.

Now, did Petrov have free will when he decided to save the world?

Maintaining free will when knowledge increases

I don’t intend to go into the subtle philosophical debate on the nature of free will. See this post for a good reductionist account. Instead, consider the following scenarios:

  1. The standard Petrov incident.

  2. The standard Petrov incident, except that it is still ongoing and Petrov hasn’t reached a decision yet.

  3. The standard Petrov incident, after it was over, except that we don’t yet know what his final decision was.

  4. The standard Petrov incident, except that we know that, if Petrov had had eggs that morning (instead of porridge[1]), he would have made a different decision.

  5. The same as scenario 4, except that some entity deliberately gave Petrov porridge that morning, aiming to determine his decision.

  6. The standard Petrov incident, except that a guy with a gun held Petrov hostage and forced him not to pass on the report.

There is an interesting contrast between scenarios 1, 2, and 3. Clearly, 1 and 3 only differ in our knowledge of the incident. It does not seem that Petrov’s free will should depend on the degree of knowledge of some other person.

Scenarios 1 and 2 only differ in time: in one case the decision has been made; in the other, it is yet to be made. If we say that Petrov has free will, whatever that is, in scenario 2, then it seems that in scenario 1 we have to say that he “had” free will. So whatever our feelings on free will, it seems that knowing the outcome doesn’t change whether there was free will or not.

That intuition is challenged by scenario 4. It’s one thing to know that Petrov’s decision was deterministic (or deterministic-stochastic if there’s a true random element to it). It’s another to know the specific causes of the decision.

And it’s yet another thing if the specific causes have been influenced to manipulate the outcome, as in scenario 5. Again, all we have done here is add knowledge: we know the causes of Petrov’s decision, and we know that his breakfast was chosen with that outcome in mind. But someone had to decide what Petrov had that morning[2]; why does it matter that it was done for a specific purpose?

Maybe this whole free will thing isn’t important, after all? But it’s clear in scenario 6 that something is wrong, even though Petrov has just as much free will in the philosophical sense: before, he could choose whether or not to pass on the warning; now, he can equally choose between not passing on the message and dying. This suggests that free will is something determined by outside features, not just internal ones. This is related to the concept of coercion and its philosophical analysis.

What free will we’d want from an AI

Scenarios 5 and 6 are problematic: call them manipulation and coercion, respectively. We might not want the AI to guarantee us free will, but we do want it to avoid manipulation and coercion.

Coercion is probably the easiest to define, and hence to avoid. We feel coercion when it is imposed on us, when our options narrow. Any reasonably aligned AI should avoid that. There remains the problem of cases where we don’t realise that our options are narrowing, but that seems to be a case of manipulation, not coercion.

So, how do we avoid manipulation? Just giving Petrov eggs is not manipulation, if the AI doesn’t know the consequences of doing so. Nor does it become manipulation if the AI suddenly learns those consequences—knowledge doesn’t remove free will or cause manipulation. And, indeed, it would be foolish to try and constrain an AI by restricting its knowledge.

So it seems we must accept that:

  1. The AI will likely know ahead of time what decision we will reach in certain circumstances.

  2. The AI will also know how to influence that decision.

  3. In many circumstances, the AI will have to influence that decision, simply because it has to take some action (or refrain from one). A butler AI will have to give Petrov breakfast, or leave him to go hungry (which has consequences of its own), even if it knows the consequences of its own decision.

So “no manipulation” or “maintaining human free will” seems to require a form of indifference: we want the AI to know how its actions affect our decisions, but not take that influence into account when choosing those actions.

It will be important to define exactly what we mean by that.
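As a very rough, informal gesture at what such a definition might involve, here is a toy sketch in Python (purely illustrative, not a worked-out proposal): an agent that accurately models how its choice of breakfast shifts Petrov’s decision, but whose actions are scored against a fixed reference distribution over that decision, so that steering Petrov cannot be what makes one action win. All the names, probabilities, and utilities below are invented for the example.

```python
# Toy sketch of "indifference": the AI knows how its actions shift the human's
# decision, but its actions are scored as if that decision were drawn from a
# fixed reference distribution, so steering the human can't be what makes one
# action beat another. All numbers and names here are invented for illustration.

ACTIONS = ["serve_porridge", "serve_eggs"]

# The AI's (accurate) model: P(Petrov reports the warning | breakfast served).
p_report_given_action = {"serve_porridge": 0.2, "serve_eggs": 0.8}

# Fixed reference distribution over Petrov's decision, independent of the
# AI's action (e.g. the AI's prior before it considers breakfast at all).
P_REPORT_REFERENCE = 0.5

def utility(action: str, human_reports: bool) -> float:
    """Utility of a (breakfast, decision) pair. The outcome term is what
    makes manipulation tempting for the naive agent."""
    breakfast_quality = {"serve_porridge": 0.6, "serve_eggs": 0.7}[action]
    outcome_value = 0.0 if human_reports else 1.0  # toy objective, illustrative only
    return breakfast_quality + outcome_value

def naive_value(action: str) -> float:
    """Ordinary expected utility: the AI is rewarded for steering Petrov."""
    p = p_report_given_action[action]
    return p * utility(action, True) + (1 - p) * utility(action, False)

def indifferent_value(action: str) -> float:
    """Expected utility under the fixed reference distribution: the AI still
    *knows* its influence, but that influence no longer affects its ranking."""
    p = P_REPORT_REFERENCE
    return p * utility(action, True) + (1 - p) * utility(action, False)

if __name__ == "__main__":
    for score in (naive_value, indifferent_value):
        best = max(ACTIONS, key=score)
        values = {a: round(score(a), 2) for a in ACTIONS}
        print(f"{score.__name__}: {values} -> chooses {best}")
```

In the naive scoring, the breakfast that steers Petrov’s decision wins even though it is the worse breakfast; under the fixed reference distribution, the steering term is the same for every action and drops out of the comparison, leaving only the non-manipulative considerations. Whether this kind of counterfactual scoring can be made to work in general is exactly the definitional problem mentioned above.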


  1. ↩︎

    I have no idea what Petrov actually had for breakfast, that day or any other.

  2. ↩︎

    Even if Petrov himself decided what to have for breakfast, he chose among the options that were possible for him that morning.