My thinking about this is that a problem is fair if it captures some aspect of some real world problem. I believe Gary Drescher came up with ASP as a distillation of the following problem, which itself tries to capture some essence of bargaining in the real world (similar to how Newcomb’s Problem is a distillation of Prisoner’s Dilemma, which tries to capture some essence of cooperation in the real world):
Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of UDT. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:
“I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear.”
The human writes down 9, and UDT, predicting this, prescribes writing down 1. This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.
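The payoff rule in the quoted game is simple enough to write out. The following is a minimal sketch, not an implementation of UDT: it assumes the predictor simply best-responds to the number it predicts the human will write, which is the behavior the passage describes. The names `payoff` and `best_response` are made up for illustration.

```python
def payoff(a: int, b: int) -> tuple[int, int]:
    """Each player is paid their own number if the two numbers sum to 10 or less."""
    if not (0 <= a <= 10 and 0 <= b <= 10):
        raise ValueError("each number must be an integer between 0 and 10")
    return (a, b) if a + b <= 10 else (0, 0)


def best_response(predicted_other: int) -> int:
    """Assumption: the predictor treats the other player's number as fixed
    and picks whatever maximizes its own payment against it."""
    return max(range(11), key=lambda mine: payoff(mine, predicted_other)[0])


human = 9                          # the human commits to 9
predictor = best_response(human)   # the predictor foresees this and replies
print(payoff(human, predictor))    # (9, 1)
```

Under that assumption, the player who is predicted gets 9 and the player who predicts gets 1, which is the uncomfortable outcome the passage points at.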
(It looks like the citation here is wrong, since I can’t find a description of this game in Slepnev (2011). As far as I know, I was the first person to come up with this game as something that UDT seems to handle poorly.)
“My thinking about this is that a problem is fair if it captures some aspect of some real world problem”—I would say that you have to accept that the real world can be unfair, but that doesn’t make real world problems “fair” in the sense gestured at in the FDT paper. Roughly, it is possible to define a broad class of problems such that you can have an algorithm that optimally handles all of them, for example if the reward only depends on your choice or predictions of your choice.
“It seems unsatisfactory that increased predictive power can harm an agent”—that’s just life when interacting with other agents. Indeed, in some games, exceeding a certain level of rationality provides an incentive for other players to take you out. That’s unfair, but that’s life.
ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it? (I had an idea that I called UDT2 which I think does better on it than UDT1.1 but it’s not as elegant as I hoped.) Defining such problem classes may be useful for talking about the technical properties of specific decision theories, but that doesn’t seem to be what you’re trying to do here. The only other motivation I can think of is finding a way to justify not solving certain problems, but I don’t think that makes sense in the case of ASP.
“ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it?”—my intuition is the opposite, that doing well on such problems means doing poorly on others.
Can you explain your intuition? (Even supposing your intuition is correct, it still doesn’t seem like defining a “fair” class of problems is that useful. Shouldn’t we instead try to find a decision theory that offers the best trade-offs on the actual distribution of decision problems that we (or our AIs) will be expected to face?)
To explain my intuition: suppose we had a decision theory that does well on ASP-like problems and badly on others, and a second decision theory that does badly on ASP-like problems and well on others. Then we can create a meta decision theory that first tries to figure out what kind of problem it is facing and then selects one of these decision theories to solve it. This meta decision theory would itself be a decision theory that does well on both types of problems so such a decision theory ought to exist.
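The construction can be sketched as a dispatcher. This is only an illustration of the control flow: `make_meta_theory`, `is_asp_like`, and the two stand-in theories are hypothetical names, and the sketch takes for granted that a reliable classifier of problem types is available.

```python
from typing import Callable, Dict

Problem = Dict[str, str]                    # hypothetical problem description
DecisionTheory = Callable[[Problem], str]   # maps a problem to an action label


def make_meta_theory(
    is_asp_like: Callable[[Problem], bool],
    asp_specialist: DecisionTheory,
    generalist: DecisionTheory,
) -> DecisionTheory:
    """Build a theory that first classifies the problem, then delegates."""
    def meta(problem: Problem) -> str:
        chosen = asp_specialist if is_asp_like(problem) else generalist
        return chosen(problem)
    return meta


# Toy stand-ins, only to show which theory gets consulted.
meta = make_meta_theory(
    is_asp_like=lambda p: p.get("kind") == "asp",
    asp_specialist=lambda p: "specialist's action",
    generalist=lambda p: "generalist's action",
)
print(meta({"kind": "asp"}))
print(meta({"kind": "newcomb"}))
```

The result has the same type as its components, which is the sense in which the meta decision theory is itself a decision theory.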
BTW, you can quote others by putting a quote in a separate paragraph and putting “>” in front of it.
“It still doesn’t seem like defining a “fair” class of problems is that useful”: discovering one class of fair problems led to CDT. Another led to TDT. This theoretical work is separate from the problem of producing pragmatic algorithms that deal with unfairness, but both approaches produce insights.
“This meta decision theory would itself be a decision theory that does well on both types of problems so such a decision theory ought to exist”—I currently have a draft post that does allow some kinds of rewards based on algorithm internals to be considered fair and which basically does the whole meta-decision theory thing (that section of the draft post was written a few hours after I asked this question which is why my views in it are slightly different).