Immanuel Kant and the Decision Theory App Store
[Epistemic status: About as silly as it sounds.]
Prepare to be astounded by this rationalist reconstruction of Kant, drawn out of an unbelievably tiny parcel of Kant literature![1]
Kant argues that all rational agents will:
“Act only according to that maxim whereby you can at the same time will that it should become a universal law.” (421)[2][3]
“Act in such a way that you treat humanity, whether in your own person or in the person of another, always at the same time as an end and never simply as a means.” (429)[2]
Kant clarifies that treating someone as an end means striving to further their ends, i.e. goals/values. (430)[2]
Kant clarifies that strictly speaking it’s not just humans that should be treated this way, but all rational beings. He specifically says that this does not extend to non-rational beings. (428)[2]
“Act in accordance with the maxims of a member legislating universal laws for a merely possible kingdom of ends.” (439)[2]
Not only are all of these claims allegedly derivable from the concept of instrumental rationality, they are supposedly equivalent!
Bold claims, lol. What is he smoking?
Well, listen up…
Taboo “morality.” We are interested in functions that map [epistemic state, preferences, set of available actions] to [action].
Suppose there is an “optimal” function. Call this “instrumental rationality,” a.k.a. “Systematized Winning.”
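To pin down what kind of object we’re talking about, here’s a minimal sketch in Python (my own illustration; all the type names are hypothetical):

```python
from typing import Callable, Mapping, Sequence

# Hypothetical type names, just to fix the shape of the thing:
EpistemicState = Mapping[str, float]   # credences assigned to propositions
Preferences = Callable[[str], float]   # utility assigned to each outcome
Action = str

# A decision function maps [epistemic state, preferences, available actions]
# to one of the available actions.
DecisionFunction = Callable[[EpistemicState, Preferences, Sequence[Action]], Action]
```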
Kant asks: Obviously what the optimal function tells you to do depends heavily on your goals and credences; the best way to systematically win depends on what the victory conditions are. Is there anything interesting we can say about what the optimal function recommends that isn’t like this? Any non-trivial things that it tells everyone to do regardless of what their goals are?[4]
Kant answers: Yes! Consider the twin Prisoner’s Dilemma—a version of the PD in which it is common knowledge that both players implement the same algorithm and thus will make the same choice. Suppose (for contradiction) that the optimal function defects. We can now construct a new function, Optimal+, that seems superior to the optimal function:
IF in twin PD against someone who you know runs Optimal+: Cooperate
ELSE: Do whatever the optimal function will do.
Optimal+ is superior to the optimal function because it is exactly the same except that it gets better results in the twin PD (because the opponent will cooperate too, because they are running the same algorithm as you).[5]
Contradiction! Looks like our “optimal function” wasn’t optimal after all. Therefore the real optimal function must cooperate in the twin PD.
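For concreteness, here’s a minimal sketch in Python of the Optimal+ construction (my own illustration; the function and flag names are hypothetical stand-ins for “knowing you are in a twin PD against an Optimal+ runner”):

```python
def make_optimal_plus(optimal):
    """Wrap a (hypothetical) decision function, overriding it only in the twin PD."""
    def optimal_plus(epistemic_state, preferences, actions):
        # In a twin PD against someone you know runs Optimal+, cooperate.
        if epistemic_state.get("twin_pd_vs_optimal_plus", False):
            return "cooperate"
        # Everywhere else, do exactly what the wrapped function does.
        return optimal(epistemic_state, preferences, actions)
    return optimal_plus
```

The only place the two functions diverge is the twin PD, where mutual cooperation beats mutual defection, which is what drives the contradiction above.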
Generalizing this reasoning, Kant says, the optimal function will choose as if it is choosing for all instances of the optimal function in similar situations. Thus we can conclude the following interesting fact: Regardless of what your goals are, the optimal function will tell you to avoid doing things that you wouldn’t want other rational agents in similar situations to do. (rational agents := agents obeying the optimal function.)
To understand this, and see how it generalizes still further, I hereby introduce the following analogy:
The Decision Theory App Store
Imagine an ideal competitive market for advice-giving AI assistants.[6] Tech companies code them up and then you download them for free from the app store.[7] There is AlphaBot, MetaBot, OpenBot, DeepBot…
When installed, the apps give advice. Specifically they scan your brain to extract your credences and values/utility function, and then they tell you what to do. You can follow the advice or not.
Sometimes users end up in Twin Prisoner’s Dilemmas: situations where they are in some sort of prisoner’s dilemma with someone else, and it is common knowledge that both of them are likely to take advice from the same app.
Suppose AlphaBot was inspired by causal decision theory and thus always recommends defecting in prisoner’s dilemmas, even twin PDs. OpenBot, meanwhile, mostly copied AlphaBot’s code, but has a subroutine that notices when it is giving advice to two people on opposite sides of a PD and advises them both to cooperate.
As the ideal competitive market chugs along, users of OpenBot will tend to do better than users of AlphaBot. AlphaBot will either lose market share or be modified to fix this flaw.
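Here’s a toy calculation (mine; standard PD payoff numbers, bot behavior as described above) showing the gap:

```python
# Standard PD payoffs: temptation 5, reward 3, punishment 1, sucker 0.
PAYOFF = {("cooperate", "cooperate"): (3, 3), ("cooperate", "defect"): (0, 5),
          ("defect", "cooperate"): (5, 0), ("defect", "defect"): (1, 1)}

def alphabot(is_twin_pd):
    return "defect"                                 # CDT-inspired: always defect

def openbot(is_twin_pd):
    return "cooperate" if is_twin_pd else "defect"  # cooperate only with a twin

for name, bot in [("AlphaBot", alphabot), ("OpenBot", openbot)]:
    move = bot(is_twin_pd=True)   # both twins take the same app's advice
    print(name, "users in a twin PD each get", PAYOFF[(move, move)][0])
# AlphaBot users each get 1; OpenBot users each get 3.
```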
What’s the long-run outcome of this market? Will there be many niches, with some types of users preferring Bot A and other types preferring Bot B?
No, because companies can just make a new bot, Bot C, that gives type-A advice to the first group of customers and type-B advice to the second group of customers.
(We are assuming computing cost, memory storage, etc. are negligible factors. Remember these bots are a metaphor for decision functions, and the market is a metaphor for a process that finds the optimal decision function—the one that gives the best advice, not the one that is easiest to calculate.)
So in the long run there will only be one bot, and/or all the bots will dispense the same advice & coordinate with each other exactly as if they were a single bot.
Now, what’s it like to be one of these hyper-sophisticated advice bots? You are sitting there in your supercomputer getting all these incoming requests for advice, and you are dispensing advice like the amazing superhuman oracle you are, and you are also reflecting a bit about how to improve your overall advice-giving strategy...
You are facing a massive optimization problem. You shouldn’t just consider each case in isolation; the lesson of the Twin PD is that you can sometimes do better by coordinating your advice across cases. But it’s also not quite right to say you want to maximize total utility across all your users; if your advice predictably screwed over some users to benefit others, those users wouldn’t take your advice, and then the benefits to the other users wouldn’t happen, and then you’d lose market share to a rival bot that was just like you except that it didn’t do that and thus appealed to those users.
(Can we say “Don’t ever screw over anyone”? Well, what would that mean exactly? Due to the inherent randomness of the world, no matter what you say, your advice will occasionally cause people to do things that lead to bad outcomes for them. So it has to be something like “don’t screw over anyone in ways they can predict.”)
Kant says:
“Look, it’s complicated, and despite me being the greatest philosopher ever I don’t know all the intricacies of how it’ll work out. But I can say, at a high level of abstraction: The hyper-sophisticated advice bots are basically legislating laws for all their users to follow. They are the exalted Central Planners of a society consisting of their users. And so in particular, the best bot, the optimal policy, the one we call Instrumental Rationality, does this. And so in particular if you are trying to think about how to be rational, if you are trying to think about what the rational thing to do is, you should be thinking like this too—you should be thinking like a central planner optimizing the behavior of all rational beings, legislating laws for them all to follow.”
(To ward off possible confusion: It’s important to remember that you are only legislating laws for rational agents, i.e. ones inclined to listen to your advice; the irrational ones won’t obey your laws so don’t bother. And again, you can’t legislate something that would predictably screw over some to benefit others, because then the some wouldn’t take your advice, and the benefits would never accrue.)
OK, so that’s the third bullet point taken care of. The second one as well: “treat other rational agents as ends, not mere means” = “optimize for their values/goals too.” If an app doesn’t optimize for the values/goals of some customers, it’ll lose market share as those customers switch to different apps that do.
(Harsanyi’s aggregation theorem is relevant here. IIRC it proves that any Pareto-optimal way to control a bunch of agents with different goals is equivalent to maximizing expected utility, where the utility function is some weighted sum of the different agents’ utility functions. Of course, it is left open what the weights should be… Kant leaves it open too, as far as I can tell, but reminds us that the decision about what weights to use should be made in accordance with the three bullet points, just like any other decision. Kant would also point out that if two purportedly rational agents end up optimizing for different weights (say, each heavily favoring itself over the other), then something has gone wrong, because the result is not Pareto-optimal; there’s some third weighting that would make them both better off if they both followed it. (I haven’t actually tried to prove this claim, maybe it’s false. Exercise for readers.))
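Here’s a toy numerical illustration of the weighted-sum point (mine, not Harsanyi’s actual theorem): whatever positive weights you pick, maximizing the weighted sum over joint actions in a PD lands on a Pareto-optimal outcome, and never on mutual defection.

```python
# Joint-action payoffs for a standard PD: (agent 1's utility, agent 2's utility).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def weighted_best(w1, w2):
    # The joint action maximizing the weighted sum of the two utilities.
    return max(PD, key=lambda joint: w1 * PD[joint][0] + w2 * PD[joint][1])

for weights in [(0.5, 0.5), (0.8, 0.2), (0.2, 0.8)]:
    print(weights, "->", weighted_best(*weights))
# (0.5, 0.5) picks ('C', 'C'); lopsided weights pick a lopsided Pareto-optimal
# point, but ('D', 'D') is never selected, since ('C', 'C') beats it for both.
```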
As for the first bullet point, it basically goes like this: If what you are about to do isn’t something you could will to be a universal law—if you wouldn’t want other rational agents to behave similarly—then it’s probably not what the Optimal Decision Algorithm would recommend, because an app that recommended it would either recommend that others in similar situations behave similarly (and thus lose market share to apps that recommended more pro-social behavior, the equivalent of cooperate-cooperate instead of defect-defect), or it would make an exception for you, telling everyone else to cooperate while you defect (and thus predictably screw people over, lose those customers, and eventually be outcompeted as well).
Tada!
Thanks to Caspar Oesterheld for helpful discussion. He pointed out that the decision theory app store idea is similar to the game-theoretic discussion of Mediated Equilibria, with apps = mediators. Also thanks to various other people in and around CLR, such as David Udell, Tristan Cook, and Julian Stastny.
It’s been a long time since I wrote this rationalist reconstruction of Kant, but people asked me about it recently so I tried to make a better version here. The old version looks similar but has a different philosophical engine under the hood. I’m not sure which version is better.
Kant, I. (1785). Grounding for the Metaphysics of Morals. Translated by J. Ellington. Hackett Publishing Company, 1993.
Kant thinks it is a necessary law for all rational beings always to judge their actions according to this imperative. (426) I take this to mean that obeying this imperative is a requirement of rationality; it is always irrational to disobey it.
Nowadays, we’d point to the coherence theorems as examples of interesting/non-trivial things we can say about instrumental rationality. But Kant didn’t know about the coherence theorems.
It’s true that for any function, you can imagine a world in which that function does worse than any other function — just imagine the world is full of demons who attack anyone who implements the first function but help anyone who implements the second function. For this very reason, this sort of counterexample doesn’t count. If there is a notion of optimality at all, it clearly isn’t performs-best-in-every-possible-world. But plausibly there is still some interesting and useful optimality notion out there, and plausibly by that notion Optimal+ is superior to its twin-PD-defecting cousin.
If you take this as a serious proposal for how to think about decision theory, instead of just as a way of understanding Kant, then a lot of problems are going to arise having to do with how to define the ideal competitive market more precisely in ways that avoid path-dependencies and various other awkward results.
What’s in it for the tech companies? They make money by selling your data, I guess.