Random thoughts on game theory and what it means to be a good person
It does seem to me like there doesn’t exist any good writing on game theory from a TDT perspective. Whenever I read classical game theory, I feel like the equilibria that are being described obviously fall apart when counterfactuals are properly brought into the mix (like D/D in prisoner’s dilemmas).
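To make the D/D point concrete, here is a toy sketch of my own (payoff numbers are the standard textbook ones, not anything from this post): under CDT-style case-by-case reasoning, defection dominates, so both players land on D/D; but against an exact copy of yourself only the diagonal outcomes are reachable, and C/C wins.

```python
# Row player's payoffs in a standard one-shot prisoner's dilemma.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def dominates(a, b):
    # CDT-style reasoning: compare your moves against each possible
    # opponent move separately.
    return all(PAYOFF[(a, opp)] >= PAYOFF[(b, opp)] for opp in ("C", "D"))

def best_against_copy():
    # Against an exact copy, your move is mirrored, so only the diagonal
    # outcomes C/C and D/D are reachable; pick the better diagonal.
    return max(("C", "D"), key=lambda me: PAYOFF[(me, me)])

print(dominates("D", "C"))    # True: defection dominates, so CDT lands on D/D
print(best_against_copy())    # C: the TDT-flavored answer against a copy
```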
The obvious problem with TDT-based game theory, just as with Bayesian epistemology, is that the vast majority of direct applications are completely computationally intractable. It’s kind of obvious what should happen in games with lots of copies of yourself, but as soon as anything participates that isn’t a precise copy, everything gets a lot more confusing. So it is not fully clear what a practical game-theory literature from a TDT perspective would look like, though maybe the existing LessWrong literature on Bayesian epistemology might be a good inspiration.
Even when you can’t fully compute everything (and we don’t even really know how to compute everything in principle), you might still be able to go through concrete scenarios and list considerations and perspectives that incorporate TDT. I guess in that sense, a significant fraction of Zvi’s writing could be described as practical game theory, though I do think there is a lot of value in trying to formalize the theory and make things as explicit as possible, which I feel like Zvi, at least, doesn’t do most of the time.
Critch (Academian) tends to have this perspective of trying to figure out what a “robust agent” would do, in the sense of an agent that would at the very least be able to reliably cooperate with copies of itself, and adopt cooperation and coordination principles that allow it to achieve very good equilibria with agents that adopt the same type of cooperation and coordination norms. And I do think there is something really valuable here, though I am also worried that the part where you have to cooperate with agents who haven’t adopted super similar cooperation norms is actually the more important one (at least until something like AGI).
And I do think that the majority of the concepts we have for what it means to be a “good person” are ultimately attempts at trying to figure out how to coordinate effectively with other people, in a way that a more grounded game theory would help a lot with.
Maybe a good place to start would be to brainstorm a list of concrete situations in which I am uncertain what the correct action is. Here is some attempt at that:
How to deal with threats of taking strongly negative-sum actions? What is the correct response to the following concrete instances?
A robber threatens to shoot you if you don’t hand over your wallet
Do you precommit to violently attack any robber that robs you, or do you simply hand over your wallet?
You are in the room with someone holding the launch buttons for the USA’s nuclear arsenal and they are threatening to launch them if you don’t hand over your wallet
You are head of the U.S. and another nation state is threatening a small-scale nuclear attack on one of your cities if you don’t provide some kind of economic subsidy to them
Do you launch a conventional attack?
Do you launch a full out nuclear response as a deterrent?
Do you launch a small-scale nuclear response?
Do you not do anything at all?
Does the answer depend on the size of the economic subsidy? What if they ask twice?
You are at a party and your assigned driver ended up drinking, even though they said they would not (the driver was chosen by a random draw)
Do you somehow punish them now, do you punish them later, or not at all?
What if they are less likely to remember if you punish them now because they are drunk? Does that matter for the game-theoretic correct action?
What if they did this knowingly, reasoning from a CDT perspective that there wouldn’t be any point in punishing them now because they wouldn’t remember the next day
What if you would never see them again later?
What if you only ever get to interact with them after they made the choice to be drunk?
I feel like I have some hint of an answer to all of these, but also feel like any answer that I can come up with makes me exploitable in a way that makes me feel like there is no meta-level on which there is an ideal strategy.
Reading through this, I went “well, obviously I pay the mugger...
...oh, I see what you’re doing here.”
I don’t have a full answer to the problem you’re specifying, but something that seems relevant is the question of “How much do you want to invest in the ability to punish defectors [both in terms of maximum power-to-punish, a-la nukes, and in terms of your ability to dole out fine-grained-exactly-correct punishment, a-la skilled assassins]”
The answer to this depends on your context. And how you have answered this question determines whether it makes sense to punish people in particular contexts.
In many cases the right policy might involve some amount of randomization, where at least some of the time you punish people really disproportionately, but you don’t have to pay the cost of doing so every time.
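A minimal expected-value sketch of that randomization idea, with numbers I made up for illustration: if a defector gains g per defection and you punish with probability p at damage d, defection stops paying once p·d exceeds g, while you only pay the enforcement cost a p-fraction of the time.

```python
def defector_ev(gain, damage, p):
    # Expected payoff of defecting when punishment lands with probability p.
    return gain - p * damage

def expected_enforcement_cost(cost, p):
    # The punisher only pays the cost of punishing a p-fraction of the time.
    return p * cost

# Illustrative numbers (mine): a 100-unit gain, a disproportionate 1000-unit
# punishment delivered 20% of the time, costing the punisher 50 to carry out.
print(defector_ev(100, 1000, 0.2))         # -100.0: defection doesn't pay
print(expected_enforcement_cost(50, 0.2))  # 10.0: vs. 50 for punishing always
```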
Answering a couple of the concrete questions:
Mugger
Right now, in real life, I’ve never been mugged, and I feel fine basically investing zero effort into preparing for being mugged. If I do get mugged, I will just hand over my wallet.
If I was getting mugged all the time, I’d probably invest effort into a) figuring out what good policies existed for dealing with muggers, b) what costs I’d have to pay in order to implement those policies.
In some worlds, it’s worth investing in literal body armor or bullet proof cars or whatever, and in the skill to successfully fight back against a literal mugger. (My understanding is that this is usually not actually a good idea even in crime-heavy areas, but I can imagine worlds where it was correct to just get good at fighting, or to hire people who are good at fighting as bodyguards.)
In some worlds it’s worth investing more in police-force and avoiding having to think about the problem, or not carrying as much money around in the first place.
Small Nation Demands Subsidies, Threatens Nuclear War
Again, I think my options here depend a lot on having already invested in defense.
One scenario is “I do not have the ability to say ‘no’ without risking millions of either my own citizens lives, or innocent citizens of the country-in-question.” In that case, I probably have to do something that makes my vague-hippie-values sad.
I have some sense that my vague-hippie-values depend on having invested enough money in defense (and offense) that I can “afford” to be moral. Things I may wish my country had invested in include:
Anti-ICBM capabilities that can shoot down incoming nukes with enough reliability that either a small-scale nuclear counterstrike, or a major non-nuclear retaliatory invasion, are viable options that will at least only punish foreign civilians if the foreign government actually launches an attack
Possibly invested in assassins who just kill individuals who threaten nuclear strikes. (I’m somewhat confused about why this isn’t used more; I suspect the answer has to do with the game theory of “the people in charge [of all nations] want it to be true that they aren’t at risk of getting assassinated, so they have a gentlemen’s agreement to avoid killing enemy leaders.”)
So I probably want to invest a lot in either having strong capabilities in those domains, or having allies who do.
Drinking
In real life I expect that the solution here is “I never invite said person to parties again, and depending on our relative social standing I might publicly badmouth them or quietly gossip about them.”
In weird contrived scenarios I’m not sure what I do because I don’t know how to anticipate weird contrived scenarios.
I do invest, generally, in communicating about how obviously people should follow through on their commitments, such that when someone fails to live up to a commitment, it costs less to punish them for it. (And this is a shared social good that multiple people invest in.)
If I’m in a one-off interaction with someone who is currently too drunk to remember being punished and who I’m not socially connected to, I probably treat it like being mugged – a fluke event that doesn’t happen often enough to be worth investing resources in being able to handle better.
Extra Example: Having to Stand Up to a Boss/High-Status-Person/Organization
A situation that I’m more likely to run into, where the problem actually seems hard, is that sometimes high status people do bad things, and they have more power than you, and people will naturally end up on their side and take their word over yours.
Sort of similar to the “Small nation threatening nuclear war”, I think if you want to be able to “afford to actually have moral principles”, you need to invest upfront in capabilities. This isn’t always the right thing to do, depending on your life circumstances, but it may be sometimes. You want to have enough surplus power that you have the Slack to stand up for things.
Possibilities include investing in being directly high status yourself, or investing in making friends with a strong enough coalition of people to punish high status people, or encourage strong norms and rule of law such that you don’t need to have as strong a coalition, because you’ve made it lower cost to socially attack someone who breaks a norm.
Extra Example: The Crazy House Guest
Perhaps related to the drinking example: a couple of times, I’ve had people show up at former houses of mine, potentially looking to move in, and then cause some kind of harm.
In one case, they had a very weird combination of mental illnesses and cluelessness that resulted in them dealing several thousands of dollars worth of physical damage to the house.
They seemed crazy and unpredictable enough that it seemed like if I tried to punish them, they might follow me around forever and make my life suck in weird ways.
So I didn’t punish them and they got away with it and went away and I never heard from them again.
So… sure, you can get away with certain kinds of things by signaling insanity and unpredictability… but at the cost of not being welcome in major social networks. The few extra thousand dollars they saved was not remotely worth the fact that, had they been a more reasonable person, they’d have had access to a strong network of friends and houses that help each other out finding jobs and places to live and what-not.
So I’m not worried about the long-term incentives here – the only people for whom insanity is a cost-effective tool to avoid punishment are actually insane people who don’t have the ability to interface with society normally.
What if there turn out to be lots of crazy people? Then you probably either invest upfront resources in fighting this somehow, or become less trusting.
Extra Example: The Greedy Landlord
In another housing situation, the landlord tried to charge us extra for things that were not our fault. In this case, it was reasonably clear that we were in the right. Going to small claims court would have been net-negative for us, but also costly to them.
I was angry and full of zealous energy and I decided it was worth it and I threatened going to small claims court and wasting both of our time, even though a few hundred dollars wasn’t really worth it.
They backed down.
This seems like the system working as intended. This is what anger is for: to make sure people have the backbone to defend themselves, and to live in a world where at least some of the time people will get riled up and punish you disproportionately.
What if you haven’t invested in defense capabilities in advance?
Then you probably will periodically need to lose and accept bad situations, such as either a more powerful empire demanding tribute from your country, or choosing policies like “if you are under threat, flip an unknown number of coins and if enough coins come up heads, go to war and punish them disproportionately even though you will probably lose and lots of people will die but now empires will sometimes think twice about invading poor countries.”
The meta level point
It doesn’t seem inconsistent to me to apply different policies in different situations, even if they share commonalities, based on how common the situation is, how costly the defection, how much long-term punishment you can inflict, and how many resources you have invested in being able to punish.
This does mean that mugging (for example) is a somewhat viable strategy, since people don’t invest as heavily in handling it (because it is rare), but this seems like a self-correcting problem. There will always be some least-defended-against defect-button that defectors can press; you can’t protect against everything.
Another point is that it’s important to be somewhat unpredictable, and to at least sometimes just punish people disproportionately (when they wrong you), so that people aren’t confident that the expected value of taking advantage of you is positive.
Any reason why you mention timeless decision theory (TDT) specifically? My impression was that functional decision theory (as well as UDT, since they’re basically the same thing) is regarded as a strict improvement over TDT.
Same thing, it’s just the handle that stuck in my mind. I think of the whole class as “timeless”, since I don’t think there exists a good handle that describes all of them.