Why does this pose an issue for reinforcement learning? Forgive my ignorance; I don’t have a background in the subject. I don’t believe I have information distinguishing cereal from granola in terms of which has the worse highest-severity consequences (given the smallness of those probabilities and my inability to conceive of them, I strongly suspect anything I could come up with would represent purely epistemic rather than aleatoric uncertainty). But even if I grant that I do have such information, the theory would tell me, correctly, that I should act on that level. If that seems wrong, it’s evidence that we’ve incorrectly identified an implicit severity class in our imagining of the hypothetical, not that severity classes are incoherent (i.e. if I really do have reason to believe that eating cereal even slightly increases the chance of Universe Destruction compared to eating granola, shouldn’t that make my decision for me?).
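To make the decision rule I’m gesturing at concrete, here’s a minimal sketch (my own toy illustration with made-up numbers, not something from the post): each action’s expected utility is a tuple of per-severity-class expectations, ordered from most to least severe, and tuples are compared lexicographically, so any nonzero difference at the top level settles the choice.

```python
# Toy lexicographic decision rule over severity classes (illustrative only).
# Each action's value is a tuple (expected top-severity utility, expected mundane utility).

def better(a, b):
    """Return True if action a is lexicographically preferred to action b."""
    for ua, ub in zip(a, b):
        if ua != ub:
            return ua > ub
    return False  # exactly tied at every severity level

# Hypothetical numbers: cereal very slightly raises the chance of Universe Destruction,
# but is a bit tastier at the mundane level.
cereal  = (-1e-30, 0.9)
granola = (0.0,    0.7)

print(better(granola, cereal))  # True: the top severity level decides, as argued above
```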
I would argue that many actions are sufficiently isolated that, while they’ll certainly have high-severity ripple effects, we have no reason to believe that in expectation their high-severity consequences are worse than they would have been for a different action.
If the non-Archimedean framework really does “collapse” to an Archimedean one in practice, that’s fine with me. It exists to answer a theoretical question about qualitatively different forms of utility without biting a terribly strange bullet. Collapsing the utility function would mean assigning weight 0 to all but the maximal severity level, which seems very bad: we certainly prefer no dust specks in our eyes to dust specks (ceteris paribus), and this should be accurately reflected in our evaluation of world states, even if the ramped function leads to the same action preferences in many or most real-life scenarios for a sufficiently discerning agent (which maybe an AI will be, but I know I am not).
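Here is the contrast I have in mind, as a toy sketch (again my own illustration, with made-up numbers): collapsing makes the agent indifferent to dust specks whenever the top severity class is tied, while keeping the lower classes around lexicographically still registers them.

```python
# Utilities are (catastrophic-level, dust-speck-level) pairs; numbers are illustrative.
no_specks   = (0.0, 0.0)
with_specks = (0.0, -1.0)  # same catastrophic outlook, strictly worse on specks

def collapsed_value(u):
    return u[0]            # weight 0 on everything below the maximal severity level

def lexicographic_prefers(a, b):
    return a > b           # Python compares tuples lexicographically

print(collapsed_value(no_specks) > collapsed_value(with_specks))  # False: indifferent to specks
print(lexicographic_prefers(no_specks, with_specks))              # True: specks still matter
```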
If we had infinite compute, that would not eliminate empirical uncertainty: there are many things you cannot compute because you simply don’t have enough information. This is why, in learning theory, sample complexity is distinct from computational complexity and applies even to algorithms with unlimited computational resources. So you would definitely still need to take expectations.
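A toy example of what I mean (my own, with made-up numbers): three coin flips don’t pin down the coin’s bias no matter how much compute you throw at the posterior, so any decision still comes out as an expectation rather than a certainty.

```python
# Exact Bayesian update over a discretized uniform prior on the coin's bias p.
# Unlimited compute doesn't help: with only three observations, the decision
# is still an expectation over the posterior.
from fractions import Fraction

flips = [1, 1, 0]  # heads, heads, tails

grid = [Fraction(i, 100) for i in range(1, 100)]
likelihood = {p: p**sum(flips) * (1 - p)**(len(flips) - sum(flips)) for p in grid}
total = sum(likelihood.values())
posterior = {p: l / total for p, l in likelihood.items()}

# Expected utility of betting on heads (utility +1 if heads, -1 if tails).
expected_utility = sum(post * (2 * p - 1) for p, post in posterior.items())
print(float(expected_utility))  # roughly 0.2: an expectation, not a computed certainty
```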
Thanks for letting me know about this! Another thing I haven’t studied.