(This comment provides more intuition pumps for why it is invalid to argue “math implies AI risk”. This is not a controversial point—the critical review agrees that this is true—but I figured it was worth writing down for anyone who might still find it confusing, or feel like my argument in the post is “too clever”.)
It should seem really weird to you on a gut level to hear the claim that the VNM theorem, and only the VNM theorem, implies that AI systems would kill us all. Like, really? From just the assumption that we can’t steal resources from the AI system with certainty [1], we can somehow infer that the AI must kill us all? Just by knowing that the AI system calculates the value of uncertain cases by averaging the values of outcomes based on their probabilities [2], we can infer that the AI system will take over the world?
But it’s even worse than that. Consider the following hopefully-obvious claims:
The argument for AI risk should still apply if the universe is deterministic.
The argument for AI risk should still apply if the agent is made more intelligent.
If you believe both of those claims, then you should also believe that the argument for AI risk works in a deterministic universe in which the AI can perfectly predict what the universe will do. However, in such a universe, the VNM theorem is nearly contentless—the AI has no need of probability, and most of the VNM axioms are irrelevant. All you get from the VNM theorem in such a universe is that the AI’s ordering over outcomes is transitive: if it chooses A over B and B over C, then it also chooses A over C. Do you really think that transitivity alone is enough to argue for AI x-risk? Something must have gone wrong somewhere.
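(One way to see the collapse in symbols, using standard notation rather than anything specific to this post: with no uncertainty, every lottery is degenerate, i.e. puts probability 1 on a single outcome, so expected-utility comparisons reduce to $EU(\delta_A) = 1 \cdot u(A) = u(A)$ and $\delta_A \succeq \delta_B \iff u(A) \ge u(B)$, which is nothing more than a complete, transitive ordering over the outcomes themselves.)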
I think the way to think about the VNM theorem is to see it as telling you how to compactly describe choice-procedures.
Suppose there are N possible outcomes in the world. Then one way to describe a choice-procedure is to describe, for all N(N-1)/2 pairs of outcomes, which outcome the choice-procedure chooses. This description has size O(N^2).
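To make the bookkeeping concrete, here is a minimal Python sketch (not from the post; the outcome names and the filler “choose the first” rule are invented purely for illustration):

```python
from itertools import combinations

# Hypothetical outcomes, purely for illustration.
outcomes = ["A", "B", "C", "D"]

# One entry per unordered pair of outcomes: N*(N-1)/2 entries, i.e. O(N^2) space.
pairwise_choice = {
    frozenset(pair): pair[0]  # filler rule: always "choose the first"
    for pair in combinations(outcomes, 2)
}

def choose(x, y):
    """Look up which of the two outcomes the procedure picks."""
    return pairwise_choice[frozenset((x, y))]

print(len(pairwise_choice))  # 6 entries for N = 4
print(choose("B", "C"))      # "B" under the filler rule above
```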
If you assume that the choice-procedure is transitive (choosing A over B and choosing B over C implies choosing A over C), then you can do better: you can provide a ranking of the options (e.g. B, A, C). This description has size O(N).
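Continuing the sketch, transitivity (plus completeness) lets the same kind of hypothetical procedure shrink to a single ranked list:

```python
# A ranking of the same hypothetical outcomes, most-preferred first.
ranking = ["B", "A", "C", "D"]  # the order itself is made up
rank = {outcome: i for i, outcome in enumerate(ranking)}

def choose(x, y):
    # Pick whichever outcome appears earlier in the ranking: O(N) storage,
    # yet it answers all N*(N-1)/2 pairwise questions.
    return x if rank[x] < rank[y] else y

print(choose("A", "C"))  # "A"
print(choose("B", "D"))  # "B"
```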
The VNM theorem deals with the case where you introduce lotteries over outcomes, e.g. a 50% chance of A, 20% chance of B, and 30% chance of C, and now you have to choose between lotteries. While there were only N outcomes, there are uncountably many lotteries, so simply writing down what the choice-procedure does in every case would require an uncountably large description.
The VNM theorem says that if the choice-procedure satisfies a few intuitive axioms, then you can still have an O(N) size description, called a utility function. This function assigns a number to each of the N outcomes (hence the O(N) size description) [3]. Then, to compute what the choice-procedure would say for a pair of lotteries, you simply compute the expected utility for each lottery, and say that the choice-procedure would choose the one that is higher.
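Here is the same idea as a minimal sketch (the specific utility numbers and lotteries are invented for illustration; the comparison rule is the expected-utility rule the theorem provides):

```python
# A hypothetical utility function: one number per outcome, so O(N) storage.
utility = {"A": 10.0, "B": 3.0, "C": -2.0}

def expected_utility(lottery):
    """lottery: dict mapping outcome -> probability (probabilities sum to 1)."""
    return sum(p * utility[o] for o, p in lottery.items())

def choose(lottery1, lottery2):
    """The choice-procedure recovered from the utility function."""
    return lottery1 if expected_utility(lottery1) >= expected_utility(lottery2) else lottery2

l1 = {"A": 0.5, "B": 0.2, "C": 0.3}  # 50% A, 20% B, 30% C
l2 = {"B": 1.0}                      # the sure outcome B

print(expected_utility(l1))          # 0.5*10 + 0.2*3 + 0.3*(-2) = 5.0
print(choose(l1, l2) is l1)          # True: 5.0 > 3.0
```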
Notably, the utility function can be arbitrarily complicated, in the sense that it can assign any number to each outcome, independently of all the other outcomes. People then impose other conditions, like “utility must be monotonically increasing in the amount of money you have”, and get stronger conclusions, but these are not implied by the VNM theorem. Ultimately the VNM theorem is a representation theorem telling you how to compactly represent a choice-procedure.
It seems to me that AI risk is pretty straightforwardly about how the choice-procedures that we build rank particular outcomes, as opposed to different lotteries over outcomes. The VNM theorem / axioms say ~nothing about that, so you shouldn’t expect it to add anything to the argument for AI risk.
The VNM axioms are often justified on the basis that if you don’t follow them, you can be Dutch-booked: you can be presented with a series of situations where you are guaranteed to lose utility relative to what you could have done. So on this view, we have “no Dutch booking” implies “VNM axioms” implies “AI risk”.
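As a toy money-pump sketch of that justification (the cyclic preference and the fee are invented for illustration): an agent that prefers A to B, B to C, and C to A will pay a small fee for each “upgrade” and end up holding what it started with, strictly poorer.

```python
# Invented cyclic (hence intransitive) preference: A > B > C > A.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}
fee = 1.0

holding, money = "C", 0.0
for offered in ["B", "A", "C"]:                # each offer is an "upgrade" it accepts
    if (offered, holding) in prefers:          # the agent prefers what is offered...
        holding, money = offered, money - fee  # ...so it trades and pays the fee

print(holding, money)  # C -3.0: back where it started, three fees poorer
```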
The conclusion of the VNM theorem is that you must maximize expected utility, which means that your “better-than” relation is done by averaging the utilities of outcomes weighted by their probabilities, and then using the normal “better-than” relation on numbers (i.e. higher numbers are better than lower numbers).
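In symbols (standard notation, not anything specific to this post): for a lottery $L$ that assigns probability $p_i$ to outcome $o_i$, $EU(L) = \sum_i p_i \, u(o_i)$, and $L_1 \succeq L_2$ exactly when $EU(L_1) \ge EU(L_2)$.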
Technically, since each of the numbers is a real number, it still requires infinite memory to write down this description, but we’ll ignore that technicality.
I finally read Rational preference: Decision theory as a theory of practical rationality, and it basically has all of the technical content of this post; I’d recommend it as a more in-depth version of this post. (Unfortunately I don’t remember who recommended it to me, whoever you are, thanks!) Some notable highlights:
It is, I think, very misleading to think of decision theory as telling you to maximize your expected utility. If you don’t obey its axioms, then there is no utility function constructable for you to maximize the expected value of. If you do obey the axioms, then your expected utility is always maximized, so the advice is unnecessary. The advice, ‘Maximize Expected Utility’ misleadingly suggests that there is some quantity, definable and discoverable independent of the formal construction of your utility function, that you are supposed to be maximizing. That is why I am not going to dwell on the rational norm, Maximize Expected Utility! Instead, I will dwell on the rational norm, Attend to the Axioms!
Very much in the spirit of the parent comment.
Unfortunately, the Fine Individuation solution raises another problem, one that looks deeper than the original problems. The problem is that Fine Individuation threatens to trivialize the axioms.
(Fine Individuation is basically the same thing as moving from preferences-over-snapshots to preferences-over-universe-histories.)
All it means is that a person could not be convicted of intransitive preferences merely by discovering things about her practical preferences. [...] There is no possible behavior that could reveal an impractical preference
His solution is to ask people whether they were finely individuating, and if they weren’t, then you can conclude they are inconsistent. This is kinda sorta acknowledging that you can’t notice inconsistency from behavior (“practical preferences” aka “choices that could actually be made”), though that’s a somewhat inaccurate summary.
There is no way that anyone could reveal intransitive preferences through her behavior. Suppose on one occasion she chooses X when the alternative was Y, on another she chooses Y when the alternative was Z, and on a third she chooses Z when the alternative was X. But that is nonsense; there is no saying that the Y she faced in the first occasion was the same as the Y she faced on the second. Those alternatives could not have been just the same, even leaving aside the possibility of individuating them by reference to what else could have been chosen. They will be alternatives at different times, and they will have other potentially significant differentia.
Basically making the same point with the same sort of construction as the OP.