It’s a simple question, but I think it might help if I add some context. In the paper introducing Functional Decision Theory, it is noted that it is impossible to design an algorithm that performs well on all decision problems, since some of them can be specified to be blatantly unfair, e.g. by punishing every agent that isn’t an alphabetical decision theorist.
The question then arises: how do we define which problems are or are not fair? We start by noting that some people consider Newcomb-like problems to be unfair, since your outcome depends on a predictor’s prediction, which is rooted in an analysis of your algorithm. So what makes this case any different from only rewarding the alphabetical decision theorist?
The paper answers that the prediction only depends on the decision you end up making and that any other internal details are ignored. Since the predictor only cares about your decision and not how you come to it, the problem seems fair. I’m inclined to agree with this reasoning, but a similar line of reasoning doesn’t seem to hold for Agent Simulates Predictor. Here the algorithm you use is relevant, as the predictor can only predict the agent if its algorithm is below a certain level of complexity; otherwise it may make a mistake.
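To make the distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything taken from the FDT paper: the `Agent` fields, the `complexity` number, the `predictor_limit` threshold and the predictor's fallback guess of "two-box" are all hypothetical choices of mine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str                   # hypothetical label for the agent's decision theory
    complexity: int             # hypothetical stand-in for how hard the agent is to predict
    decide: Callable[[], str]   # returns "one-box" or "two-box"


def box_payoff(action: str, prediction: str) -> int:
    # Standard Newcomb payoffs: the big box is filled only if one-boxing was predicted.
    big_box = 1_000_000 if prediction == "one-box" else 0
    return big_box if action == "one-box" else big_box + 1_000


def newcomb(agent: Agent) -> int:
    # Idealised predictor: the prediction always equals the action actually taken,
    # so the payoff is a function of the action alone.
    action = agent.decide()
    return box_payoff(action, prediction=action)


def alphabetical(agent: Agent) -> int:
    # Blatantly unfair: the reward inspects which algorithm the agent runs.
    return 1_000_000 if agent.name == "alphabetical" else 0


def asp_like(agent: Agent, predictor_limit: int = 10) -> int:
    # Assumption: a bounded predictor is accurate only for simple enough agents
    # and otherwise falls back to guessing "two-box".
    action = agent.decide()
    prediction = action if agent.complexity <= predictor_limit else "two-box"
    return box_payoff(action, prediction)


simple = Agent("fdt", complexity=1, decide=lambda: "one-box")
elaborate = Agent("alphabetical", complexity=100, decide=lambda: "one-box")
```

Both agents take the same action, yet only `newcomb` pays them identically. `alphabetical` and `asp_like` each separate them by something other than the decision itself, which is the feature that makes me doubt ASP counts as fair in the paper's sense.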
Please note that this question isn’t about whether this problem is worth considering; life is often unfair and we have to deal with it the best that we can. The question is about whether the problem is “fair”, where I roughly understand “fair” to mean that the problem belongs to a certain class of problems, which I can’t specify at this moment (I suspect it would require its own separate post), in which we should be able to achieve the optimal result on every problem.
My thinking about this is that a problem is fair if it captures some aspect of some real world problem. I believe Gary Drescher came up with ASP as a distillation of the following problem, which itself tries to capture some essence of bargaining in the real world (similar to how Newcomb’s Problem is a distillation of Prisoner’s Dilemma, which tries to capture some essence of cooperation in the real world):
(It looks like the citation here is wrong, since I can’t find a description of this game in Slepnev (2011). As far as I know, I was the first person to come up with this game as something that UDT seems to handle poorly.)
“My thinking about this is that a problem is fair if it captures some aspect of some real world problem”—I would say that you have to accept that the real world can be unfair, but that doesn’t make real world problems “fair” in the sense gestured at in the FDT paper. Roughly, it is possible to define a broad class of problems such that you can have an algorithm that optimally handles all of them, for example if the reward only depends on your choice or predictions of your choice.
“It seems unsatisfactory that increased predictive power can harm an agent”—that’s just life when interacting with other agents. Indeed, in some games, exceeding a certain level of rationality provides an incentive for other players to take you out. That’s unfair, but that’s life.
ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it? (I had an idea that I called UDT2 which I think does better on it than UDT1.1 but it’s not as elegant as I hoped.) Defining such problem classes may be useful for talking about the technical properties of specific decision theories, but that doesn’t seem to be what you’re trying to do here. The only other motivation I can think of is finding a way to justify not solving certain problems, but I don’t think that makes sense in the case of ASP.
“ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it?”—my intuition is the opposite, that doing well on such problems means doing poorly on others.
Can you explain your intuition? (Even supposing your intuition is correct, it still doesn’t seem like defining a “fair” class of problems is that useful. Shouldn’t we instead try to find a decision theory that offers the best trade-offs on the actual distribution of decision problems that we (or our AIs) will be expected to face?)
To explain my intuition, suppose we had a decision theory that does well on ASP-like problems and badly on others, and a second decision theory that does badly on ASP-like problems and well on others. Then we can create a meta decision theory that first tries to figure out what kind of problem it is facing and then selects one of these decision theories to solve it. This meta decision theory would itself be a decision theory that does well on both types of problems, so such a decision theory ought to exist.
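As a rough illustration of the construction, here is a toy Python sketch. Everything in it is hypothetical: the `kind` and `best_action` fields, the two stand-in theories and the classifier are placeholders, and the sketch simply assumes the classifier is always right, which is where the real difficulty would lie.

```python
from typing import Callable, Dict

Problem = Dict[str, str]                  # hypothetical problem description
DecisionTheory = Callable[[Problem], str]


def theory_a(problem: Problem) -> str:
    # Toy theory: finds the best action only on "asp-like" problems.
    return problem["best_action"] if problem["kind"] == "asp-like" else "bad-action"


def theory_b(problem: Problem) -> str:
    # Toy theory: finds the best action only on "other" problems.
    return problem["best_action"] if problem["kind"] == "other" else "bad-action"


def make_meta_theory(
    classify: Callable[[Problem], str],
    theories: Dict[str, DecisionTheory],
) -> DecisionTheory:
    # The meta theory first classifies the problem, then delegates to the
    # theory registered for that class. It is itself a decision theory.
    def meta(problem: Problem) -> str:
        return theories[classify(problem)](problem)

    return meta


meta = make_meta_theory(
    classify=lambda p: p["kind"],         # assumed to be a perfect classifier
    theories={"asp-like": theory_a, "other": theory_b},
)

problems = [
    {"kind": "asp-like", "best_action": "one-box"},
    {"kind": "other", "best_action": "cooperate"},
]
```

Here `meta` picks the best action on both problems, while `theory_a` and `theory_b` each fail on one of them.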
BTW, you can quote others by putting a quote in a separate paragraph and putting “>” in front of it.
“It still doesn’t seem like defining a “fair” class of problems is that useful”—discovering one class of fair problems led to CDT. Another led to TDT. This theoretical work is separate from the problem of producing pragmatic algorithms that deal with unfairness, but both approaches produce insights.
“This meta decision theory would itself be a decision theory that does well on both types of problems so such a decision theory ought to exist”—I currently have a draft post that does allow some kinds of rewards based on algorithm internals to be considered fair and which basically does the whole meta-decision theory thing (that section of the draft post was written a few hours after I asked this question which is why my views in it are slightly different).
I’ve defined three classes of “fair” problems for UDT, which are all basically equivalent: single player extensive form games, programs with a halting oracle, and formulas in provability logic. But none of these are plain old programs without oracles or stuff. I haven’t been able to define any class of “fair” problems involving plain old programs. The most I can do is agree with you: ASP doesn’t seem “fair” in spirit and doesn’t translate into any of the classes I mentioned. This is an open question—maybe you can find a better “fair” class!
There are some formal notions of fairness that include ASP. See Asymptotic Decision Theory.
Here’s one way of thinking about this. Imagine a long sequence of instances of ASP. Both the agent and predictor in a later instance know what happened in all the earlier instances (say, because the amount of compute available in later instances is much higher, such that all previous instances can be simulated). The predictor in ASP is a logical inductor predicting what the agent will do this time.
Looking at the problem this way, it looks pretty fair. Since logical inductors can do induction, if an agent takes actions according to a certain policy, then the predictor will eventually learn this, regardless of the agent’s source code. So only the policy matters, not the source code.
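A very crude Python sketch of the idea follows. To be clear about the assumptions: the frequency-counting predictor below is only a stand-in for a logical inductor and is far weaker than one, the initial guess of "two-box" is arbitrary, and the agents here ignore the history, which simplifies the setup described above.

```python
from collections import Counter
from typing import Callable, List


def frequency_predictor(history: List[str]) -> str:
    # Stand-in for induction: predict the action seen most often so far.
    if not history:
        return "two-box"   # arbitrary guess before any evidence
    return Counter(history).most_common(1)[0][0]


def run_iterated(agent: Callable[[], str], rounds: int) -> List[str]:
    # Play a sequence of instances; the predictor sees only past actions,
    # never the agent's source code.
    history: List[str] = []
    predictions: List[str] = []
    for _ in range(rounds):
        predictions.append(frequency_predictor(history))
        history.append(agent())
    return predictions


def simple_agent() -> str:
    return "one-box"


def convoluted_agent() -> str:
    # Same policy as simple_agent, reached by a roundabout computation.
    return "one-box" if sum(range(1000)) % 2 == 0 else "two-box"
```

Running `run_iterated` on either agent gives the same sequence of predictions: one wrong guess in the first round, then "one-box" from then on. The two agents differ in source code but not in policy, and the predictor cannot tell them apart.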
See also In Logical Time, All Games are Iterated Games.
To my mind what seems unfair about some problems is that they propose predictors that, to the best of our knowledge, are physically impossible, like a Newcomb Omega that never makes a mistake. These are only unfair in the sense that they depict scenarios we won’t ever encounter (perfect predictors), not in the sense that they ask us something mathematically unfair.
Other more mundane types of unfairness, like where a predictor simply demands something so specific that no general algorithm could always find a way to satisfy it, seem more fair to me because they are the sorts of things we actually encounter in the real world. If you haven’t encountered this sort of thing, just spend some time with a toddler, and you will be quickly disabused of the notion that there could not exist an agent which demands impossible things.
I already acknowledged in the real post that there exist problems that are unfair, so I don’t know why you think we disagree there.
I don’t think we disagree.