One Doubt About Timeless Decision Theories
(Epistemic status: thoughts for further investigation.)

Timeless decision theories (including variants like FDT, UDT, ADT, etc.) provide a rather elegant method of solving a broader class of problems than CDT. While CDT requires the outcomes of decisions to be independent of the individual making the decision (in such a way that causal surgery on a single node is valid), timeless decision theories can handle any problem where the outcome is a function of the choice selected, even when that dependence runs indirectly through a prediction.
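To make the distinction concrete, here is a minimal sketch of Newcomb's problem, assuming the conventional payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one); the numbers are the standard textbook ones, not anything argued for in this post. Once the opaque box's contents are a function of the (predicted) choice, the problem falls in the broader class these theories handle:

```python
# Newcomb's problem with a perfect predictor, conventional payoffs.
# The opaque box contains $1M iff the predictor foresaw one-boxing,
# so the outcome is a function of the choice itself.

def payoff(choice: str) -> int:
    opaque = 1_000_000 if choice == "one-box" else 0
    transparent = 1_000 if choice == "two-box" else 0
    return opaque + transparent

for choice in ("one-box", "two-box"):
    print(choice, payoff(choice))
# one-box 1000000
# two-box 1000
```

CDT's causal surgery treats the opaque box's contents as fixed at decision time, so it two-boxes; evaluating the outcome as a function of the choice, as above, recommends one-boxing.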
This is an excellent reason to investigate these decision theories, yet we need to make sure that we don't get blinded by insight. Before we immediately jump at this improvement, it is worthwhile considering what we give up. Perhaps there are other classes of problems we might wish to optimise over, which we can no longer optimise over once we have committed to this whole class?
After all, there is a sense in which there is no free lunch. As discussed in the TDT paper, for any algorithm, we could construct a situation with an agent that specifically punishes that algorithm. The usual response is that these situations are unfair, but a) the universe is often unfair, and b) there are plausible situations where the algorithm chosen influences the outcome in slightly less unfair ways.
Expanding on b), there are times when you want to be predictable to simulators. Indeed, I can even imagine agents that wish to eliminate agents they can't predict. Further, rather than facing a perfect predictor, it seems at least a few orders of magnitude more likely that you'll face an imperfect predictor. Modelling these as an X% perfect predictor plus a (100-X)% random predictor will usually be implausible, as predictors won't have a uniform success rate over all algorithms. Perfect prediction is slightly more plausible in scenarios involving AI, but even if you perfectly know an agent's source code, you are unlikely to know its exact observational state due to random noise.
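For what it's worth, here is what that mixture model actually implies under the conventional Newcomb payoffs (the numbers are mine, purely for illustration). One-boxing wins for any X above 0.1%, uniformly over all agents, which shows just how much work the uniform-success-rate assumption is doing:

```python
# EV under the mixture model: with probability p the predictor is perfect,
# otherwise it fills the opaque box at random (50/50), regardless of the
# agent's algorithm. Conventional Newcomb payoffs assumed.

def ev(choice: str, p: float) -> float:
    perfect_opaque = 1_000_000 if choice == "one-box" else 0
    random_opaque = 0.5 * 1_000_000          # random predictor fills the box half the time
    transparent = 1_000 if choice == "two-box" else 0
    return p * perfect_opaque + (1 - p) * random_opaque + transparent

for p in (0.0, 0.001, 0.01, 0.5):
    print(p, ev("one-box", p), ev("two-box", p))
# The two choices break even at exactly p = 0.001; one-boxing dominates above
# that, independent of the agent's algorithm, which is precisely the
# uniformity assumption questioned above.
```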
It therefore seems that the choice of "best" decision theory algorithm might be dominated by factors other than optimal performance on the narrow class of problems TDT operates on. It may very well be the case that TDT is ultimately taking the right approach, but even if so, I thought it worthwhile to sketch out these concerns so that they can be addressed.
A logically updateless agent will, when doing so increases expected utility, commit to carrying out a certain policy early in logical time, so that predictors with even a small amount of compute will know that the agent will carry out this policy. See also: UDT2, Policy selection solves most problems.
I also have my doubts about timeless/updateless decision theories, although maybe for different reasons. To be fair, I'm not a decision theory researcher, so maybe my doubts are easily addressed, but they persist whenever I read about these theories, and as an informed, adjacent outsider they are perhaps worth bringing up.
My doubts cluster around the concern that timelessness is some sort of trap or local maximum in decision theory design space that may work well for certain classes of problems but that we shouldn't expect to work well in general. My concern hinges on the way real agents need to make decisions, which is generally not timeless due to computational constraints; a real agent is situated within the world such that limited information is available at any particular (logical/functional) time, and we should reasonably expect agents to update on what decisions they would make as they learn and think more. I realize this might make timeless decision theories idealized theories of decision making that can then be approximated by computationally constrained agents who don't know enough and can't think enough to make decisions in time-invariant ways, but I have this nagging sense that even in that case the approximations may not do everything we want, because they are trying to optimize for features that are unachievable for real agents (my intuition for this comes from the ways function approximation fails).
"My concern hinges on the way real agents need to make decisions, which is generally not timeless due to computational constraints"—that's not what the "timeless" in timeless decision theory refers to.
"I realize this might make timeless decision theories idealized theories of decision making that can then be approximated by computationally constrained agents"—well, that's exactly the idea.
“But I have this nagging sense that even in that case the approximations may not do everything we want because they are trying to optimize on features that are unachievable for real agents”—This seems to be the central plank of your case. I don’t suppose you could spec it out some more? I have some idea of what you might be thinking, but I don’t want to put words in your mouth.
Well, since I was confused, my reasons for concern have dissolved, and timelessness now feels like aiming in the direction of what you want, although it would probably make sense to consider stepping back further so that you are not trapped by, say, assumptions about the metaphysics of the world (think: allow yourself to go up a Tegmark level). So, that aside, I now wonder what you had in mind when I said this, since whatever I was thinking was not relevant.
Well, maybe it's all unfounded if I'm confused about what makes these theories "timeless" or "updateless", but I was under the impression that the goal was to have a decision theory where an agent couldn't fall into the kinds of traps that arise if you allow agents to update how they decide based on the outcomes of iterated games (though how they decide in the timeless case might include conditioning on memory), or to condition on whether they are before or after seeing the outcome of a decision.
Regardless of the decision theory used, your previous calculations can become outdated due to new information. In Bayesian calculations, we normally have the agent update their model of the world based on evidence. In UDT (edited from TDT), the world model remains the same, but the selected agent changes. So it isn't obvious that one is worse than the other in this regard.
Oh, so this sounds to me like (I also did a little additional refreshing on TDT), to translate it into philosophy terms I'm more comfortable with: timelessness is about having the agent not identify with its ontology, i.e. it can tell it's using a map of the territory rather than confusing the two, and so can change its map if needed (become a different agent). Although this makes me think maybe that's not right, because it's not clear to me how you'd come to call the property "timelessness" unless it had something to do with how CDT relates the world and time.
Oh, I messed up; I meant UDT rather than TDT in the last comment. And in UDT it's more that a set of possible worlds remains the same, rather than the model of a single world.
Anyway, timeless decision theories are called that because they calculate what a theoretical agent at the start of time would pre-commit to doing in the current situation.
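As a toy rendering of that idea (the worlds, prior, and payoffs below are hypothetical placeholders, purely for illustration), a UDT-style agent scores whole observation-to-action policies against a fixed prior over possible worlds, rather than updating the world model and then choosing an action:

```python
# Toy UDT-style policy selection: choose one observation-to-action policy
# that maximizes prior-weighted utility over all possible worlds, without
# ever updating the prior. All names and numbers here are hypothetical.
from itertools import product

worlds = {"w1": 0.5, "w2": 0.5}          # prior over possible worlds
observations = {"w1": "A", "w2": "B"}    # what the agent would see in each world
actions = ("x", "y")

def utility(world: str, action: str) -> float:
    table = {("w1", "x"): 10, ("w1", "y"): 0,
             ("w2", "x"): 0,  ("w2", "y"): 5}
    return table[(world, action)]

# Enumerate every mapping from observations to actions.
obs_values = sorted(set(observations.values()))
policies = [dict(zip(obs_values, acts))
            for acts in product(actions, repeat=len(obs_values))]

# Score whole policies by expected utility under the fixed prior.
best = max(policies, key=lambda pi: sum(
    prob * utility(w, pi[observations[w]]) for w, prob in worlds.items()))
print(best)  # -> {'A': 'x', 'B': 'y'}
```

The "start of time" flavour comes from the fact that the maximization never conditions on which observation was actually received; the policy is fixed once, as a pre-commitment would be.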