I’m not sure Decision Theory = NP makes sense. If a problem is in NP then you can produce a certificate (verifiable in polynomial time) showing that a solution is correct. If you’re iterating over possible actions to find which one has the highest expected utility, I don’t see what certificate you can output (in general) that would let someone check that the chosen action is the highest EU one, without going through the whole computation all over again. Perhaps I’m expecting the analogy to be tighter than intended?
The recent AI safety via debate paper gave a Supervised Learning = P, Reinforcement Learning = NP, DEBATE = PSPACE analogy (section 2.2), which does make sense to me.
If you have an NP oracle, you can first ask it “is there an action with EU >= k?” for some initial k, then binary search until you find the highest k (within epsilon) for which there is an action with EU >= k, assuming that EU can be computed in polynomial time; a sketch of the search is below. I agree that you can’t prove that a particular sequence of actions has the highest possible EU.
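Here is a minimal sketch of that binary search in Python. The name `max_eu_via_oracle` is mine, and the brute-force `oracle` stub at the bottom is purely illustrative: the whole point of a real NP oracle is that it answers the threshold question without enumerating actions.

```python
def max_eu_via_oracle(oracle, lo, hi, eps):
    """Find the highest k (within eps) such that oracle(k) is True,
    i.e. some action has EU >= k. `oracle` stands in for a single
    NP-oracle query and must be monotone: True for all k below some
    threshold, False above it.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if oracle(mid):   # "is there an action with EU >= mid?"
            lo = mid      # yes: the optimal EU is at least mid
        else:
            hi = mid      # no: the optimal EU is below mid
    return lo             # the optimal EU, up to eps

# Toy usage: stub out the oracle by brute force over a tiny action set.
# (Illustrative only; a real oracle answers without this enumeration.)
eu = {"a": 0.3, "b": 0.9, "c": 0.5}  # action -> expected utility
oracle = lambda k: any(v >= k for v in eu.values())
print(max_eu_via_oracle(oracle, lo=0.0, hi=1.0, eps=1e-6))  # ~0.9
```

Each loop iteration costs one oracle query, so pinning down the optimal EU takes O(log((hi - lo)/eps)) queries.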
“What move should one open with in reversi?” would be considered an admissible decision-theory problem by many people. Or in other words: your argument that EU maximization is in NP only holds for utility functions whose expected utility, given your actions, is computable in P. That assumption doesn’t generally hold in the real world (see the sketch below).
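To make that concrete, here is a minimal negamax sketch. I’m using a toy one-heap Nim rather than actual reversi (the rules and names are illustrative, not a reversi implementation), but the structure is the same: the “utility” of an opening move is only defined via exhaustive search of the game tree beneath it, which is not a polynomial-time computation in general.

```python
def game_value(heap):
    """Exact value (+1 win / -1 loss) for the player to move, with
    `heap` stones left. A move removes 1 or 2 stones; taking the last
    stone wins. Exhaustive negamax: the value of a move is the negation
    of the opponent's value in the resulting position.
    """
    moves = [m for m in (1, 2) if m <= heap]
    if not moves:
        return -1  # previous player took the last stone and won
    return max(-game_value(heap - m) for m in moves)

def best_opening(heap):
    """The 'EU-maximizing' opening move -- computable here only by
    searching the whole game tree, unlike the P-computable utilities
    the EU-in-NP argument assumes.
    """
    return max((m for m in (1, 2) if m <= heap),
               key=lambda m: -game_value(heap - m))

print(best_opening(10))  # 1: leaving 9 stones is a losing position for the opponent
```

Swap in reversi’s move generator and the same `best_opening` computation becomes the kind of PSPACE-style game-tree evaluation that doesn’t fit inside a polynomial-time utility function.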
Joan Feigenbaum’s “Games, Complexity Classes, and Approximation Algorithms” may be a good citation for zero-sum game theory = PSPACE.
Thanks for the links, those are relevant.