This has been extremely useful for me, even though I understood the basic idea already, because the discussion made me understand something very basic that I’d somehow always missed: I’d always thought of this kind of program in the context of applying classical decision theory, with the possible actions being particular agent()s. But using classical decision theory already presupposes some notion of “could.” Now I understand that the principle here yields (or is a real step towards) a notion of could-ness even in the real world: if you know the formula that computes/approximates the universe (assuming there is one) and can state a utility function over the results of that program, you can program a computer with something like your program, and that gives you a notion of “could.” Thank you for helping me fill this stupid gap in my understanding!
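For my own benefit, here is a minimal sketch of the way I now understand it (Python; the names, the Newcomb-style payoffs, and everything else are purely illustrative, and the proof search over statements like “agent()==A implies world()==U” is faked by simply substituting each candidate A at the relevant call sites):

```
# Toy sketch: a "universe program" whose outcome depends on the agent's choice,
# and an agent that quantifies over its possible outputs. The proof search of
# the original construction is replaced here by direct substitution of each
# candidate action A wherever world() would consult agent() -- a stand-in for
# proving "agent()==A implies world()==U".

ACTIONS = ["one-box", "two-box"]

def world(action):
    # Newcomb-like payoff: the predictor's forecast and the actual take are
    # both "call sites" that depend on the agent's output.
    predicted_one_boxing = (action == "one-box")   # toy perfect predictor
    opaque_box = 1_000_000 if predicted_one_boxing else 0
    transparent_box = 1_000
    return opaque_box if action == "one-box" else opaque_box + transparent_box

def agent():
    # For each candidate output A, establish (here: by evaluation) the
    # consequence "if agent() returned A, world() would return U",
    # then return the A with the best established consequence.
    consequences = {a: world(a) for a in ACTIONS}
    return max(consequences, key=consequences.get)

if __name__ == "__main__":
    choice = agent()
    print(choice, world(choice))   # one-box 1000000
```

The list ACTIONS is exactly the “could” here: it is whatever the agent quantifies over when it asks after the consequences of its own output.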
Side note: the intimate way that your notion of “could” is bound up with the program you’re running makes it nontrivial (or at any rate non-obvious to me) how to compare different possible programs. I don’t immediately see a direct equivalent to how, given a decision-theoretic problem, you can compare different efficient heuristic programs on how close they come to the theoretical optimum.
Good question, but surprisingly hard to parse. Took me three attempts.
Basically, you point out that my implementation of agent() quantifies over possible choices, but sometimes it’s useful for agents to quantify over programs that lead to choices. Maybe this falls magically out of the proof checker, maybe not. I cannot answer this. My brain hurts.
Ouch, sorry! Too much time spent thinking about these things by myself without having to explain what I mean, probably. I think I still haven’t gotten across the simple point I was trying to make; let me try again.
Given an optimization problem like the TSP, there is a well-defined notion of “what utility/score do I get if I choose to do X,” independent of the algorithm you use to compute a solution—so there is a theoretical optimum solution independent of the algorithm you use to make a choice, and you can compare different heuristic algorithms on how close they come to the theoretical optimum. It seems useful to have this sort of description of the problem when trying to find actual practical algorithms.
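To make that concrete, here is the sort of comparison I mean, on a toy TSP instance (Python; the instance, the nearest-neighbour heuristic and the brute-force optimum are of course just placeholders):

```
# Toy TSP instance: compare a heuristic tour against the true optimum.
from itertools import permutations
from math import dist

cities = [(0, 0), (2, 1), (1, 4), (5, 2), (4, 5)]

def tour_length(order):
    # Length of the closed tour visiting the cities in the given order.
    return sum(dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def optimal_tour():
    # Brute force: well-defined independently of any heuristic.
    return min(permutations(range(len(cities))), key=tour_length)

def nearest_neighbour_tour(start=0):
    # A simple heuristic: always move to the closest unvisited city.
    unvisited = set(range(len(cities))) - {start}
    order = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist(cities[order[-1]], cities[c]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

best = tour_length(optimal_tour())
heur = tour_length(nearest_neighbour_tour())
print(f"optimum {best:.2f}, heuristic {heur:.2f}, ratio {heur / best:.3f}")
```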
I used to hope that we could find an equivalent for timeless decision theories, an abstract description of the optimum solution that we could then try to approximate through heuristic algorithms, but in your work, the definition of “what would happen if I did X” is so intimately connected to the algorithm making the decision that it’s at least not obvious to me how this could work.
Most (all?) proofs of statements like “if I did X, utility would be U” don’t actually need to look at the implementation of the agent, only find the call sites in world(). So we can compare different heuristics without things blowing up. Or did you mean something else?
Do you have an example of a decision theory problem where the optimal solution is hard to find, but heuristics help? Or should I use the TSP as an example?
(Sorry for the delayed reply.) I’m a little confused: it’s true that we can easily compare heuristics for world()s that depend on the agent only through easily identifiable call sites, but if we’re only considering this class of problems, what does your algorithm add over the simpler algorithm of replacing all call sites to agent() by a constant, running the result, and choosing the constant that gives the highest value?
(I suppose I was hoping for more directly game-theoretic applications than you were, so I had in mind situations where the (rest of the) world does depend on the agent’s source code.)
Regarding examples, what I had in mind was: Surely your literal algorithm of enumerating all proofs is not efficient enough to run even on toy problems like your implementation of Newcomb! Also, surely for any real-world problem, it won’t be feasible to find the optimal solution. So at some point, we will want to use the theory as a background for looking for practical (and thus necessarily heuristic) algorithms.
“what does your algorithm add over the simpler algorithm of replacing all call sites to agent() by a constant, running the result, and choosing the constant that gives the highest value?”
Admittedly, not much. If world() calls agent2() that provably returns the same value as agent(), agent() will notice it. But this wasn’t really the point of the post. The point was to show that having perfect knowledge of yourself doesn’t necessarily stop you from considering counterfactuals about yourself. Of course it’s all completely impractical.
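For concreteness, the baseline you describe amounts to something like this (a toy sketch with made-up names; it treats world() as a function of a single substituted constant, which is exactly why a provably equivalent agent2() hiding inside world() would escape it):

```
# The "replace every call to agent() by a constant" baseline from the quote:
# run world() once per candidate constant and keep the best. This only works
# when world()'s dependence on the agent is entirely through the substituted
# constant; a syntactically different but provably equivalent agent2() buried
# inside world() would not be affected by the substitution.

def best_constant(world, actions):
    return max(actions, key=world)

def toy_world(action):
    # A stand-in world whose only dependence on the agent is this parameter.
    return {"cooperate": 10, "defect": 5}[action]

print(best_constant(toy_world, ["cooperate", "defect"]))  # -> cooperate
```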
Perhaps a closer approximation to reality would be a “listener” algorithm that accepts proofs (e.g. by reading game theory books), instead of trying to find them.
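Very loosely, something like this (the “proofs” below are just claimed consequences that get re-checked by evaluation; a real listener would verify an actual proof object in a formal system):

```
# Sketch of a "listener": instead of enumerating proofs itself, the agent
# accepts claimed statements of the form "agent()==A implies world()==U"
# from outside (books, advisors, ...) and only verifies them. Verification
# is faked here by re-evaluating world(A).

def listener_agent(world, claims, default):
    verified = [(action, utility) for action, utility in claims
                if world(action) == utility]          # "check the proof"
    if not verified:
        return default
    return max(verified, key=lambda au: au[1])[0]

def toy_world(action):
    return {"swerve": 0, "straight": 3}[action]

# Claims submitted by some outside source; the false one gets rejected.
claims = [("swerve", 0), ("straight", 99)]
print(listener_agent(toy_world, claims, default="swerve"))  # -> swerve
```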
It’s true that agent() cannot see where world() ends and agent() begins, but we can see that! Otherwise it would be unclear what a “decision-theoretic problem” even means. So we can just compare the return values of world() for different implementations of agent(). Or have I mis-parsed your question?
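Concretely, I mean something like this (made-up names; world() takes the agent implementation as a parameter only because we, standing outside the formalism, are allowed to swap implementations in and out for testing):

```
# We, unlike agent(), know where world() ends and the agent begins, so we can
# plug different agent implementations into the same world() and compare the
# scores they obtain.

def world(agent_impl):
    # Toy Newcomb-like payoff, as in the earlier sketch.
    action = agent_impl()
    opaque = 1_000_000 if action == "one-box" else 0
    return opaque if action == "one-box" else opaque + 1_000

def careful_agent():
    return "one-box"

def greedy_heuristic_agent():
    return "two-box"

for impl in (careful_agent, greedy_heuristic_agent):
    print(impl.__name__, world(impl))
# careful_agent 1000000
# greedy_heuristic_agent 1000
```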