If you’re choosing between granola and cereal, then it might be that the granola manufacturer is owned by a multinational conglomerate that also has vested interests in the oil industry, and therefore, buying granola contributes to climate change that has consequences of much larger severity than your usual considerations for granola vs. cereal. Of course there might also be arguments in favor of cereal. But, if you take your non-Archimedean preferences seriously, it is absolutely worth the time to write down all of the arguments in favor of both options, and then choose the side which has the advantage in the largest severity class, however microscopic that advantage is.
I agree that small decisions have high-severity impacts; my point was that it is only worth the time to evaluate that impact if there aren’t other decisions I could spend that time making which have much greater impact and for which my decision time will be more effectively allocated. This is a comment about how using the non-Archimedean framework goes in practice. Certainly, if we had infinite compute and time to make every decision, we should focus on the most severe impacts of those decisions, but that is not the world we are in (and if it were, quite a lot of other things would change too, effectively eliminating empirical uncertainty and the need to take expectations at all).
Alright, so if you already know any fact which connects cereal/granola to high severity consequences (however slim the connection is) then you should choose based on those facts only.
My intuition is that you will never have a coherent theory of agency with non-Archimedean utility functions. I will change my mind when I see something that looks like the beginning of such a theory (e.g. some reasonable analysis of reinforcement learning for non-Archimedean utility).
If we had infinite compute, that would not eliminate empirical uncertainty. There are many things you cannot compute because you just don’t have enough information. This is why in learning theory sample complexity is distinct from computational complexity, and applies to algorithms with unlimited computational resources. So, you would definitely still need to take expectations.
Why does this pose an issue for reinforcement learning? Forgive my ignorance, I do not have a background in the subject. Though I don’t believe that I have information which distinguishes cereal/granola in terms of which has stronger highest-severity consequences (given the smallness of those numbers and my inability to conceive of them, I strongly suspect anything I could come up with would exclusively represent epistemic and not aleatoric uncertainty), even if I accept that I do, then the theory would tell me, correctly, that I should act based on that level. If that seems wrong, then it’s evidence we’ve incorrectly identified an implicit severity class in our imagination of the hypothetical, not that severity classes are incoherent (i.e. if I really have reason to believe that eating cereal even slightly increases the chance of Universe Destruction compared to eating granola, shouldn’t that make my decision for me?)
I would argue that many actions are sufficiently isolated such that, while they’ll certainly have high-severity ripple effects, we have no reason to believe that on expectation the high-severity consequences are worse than they would have been for a different action.
If the non-Archimedean framework really does “collapse” to an Archimedean one in practice, that’s fine with me. It exists to respond to a theoretical question about qualitatively different forms of utility, without biting a terribly strange bullet. Collapsing the utility function would mean assigning weight 0 to all but the maximal severity level, which seems very bad in that we certainly prefer no dust specks in our eyes to dust specks (ceteris paribus), and this should be accurately reflected in our evaluation of world states, even if the ramped function does lead to the same action preferences in many/most real-life scenarios for a sufficiently discerning agent (which maybe AI will be, but I know I am not).
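The distinction here between collapsing to the maximal severity level and keeping the lower levels as tie-breakers can be sketched with lexicographic comparison of utility vectors. This is only an illustrative sketch: the severity classes and numbers below are invented, not anything from the discussion.

```python
# Sketch: non-Archimedean utilities as tuples ordered by severity class,
# highest-severity component first. Python compares tuples
# lexicographically, which is exactly the non-Archimedean ordering:
# any difference in a higher class dominates every lower class.

def prefer(a, b):
    """Return whichever utility vector is lexicographically better."""
    return a if a >= b else b

# Components: (universe-scale, ordinary-scale, dust-speck-scale).
no_speck   = (0.0, 5.0,  0.0)   # identical high-severity outcomes...
with_speck = (0.0, 5.0, -1.0)   # ...except for a dust speck

# Lower severity classes still break ties, so the speck is not ignored:
assert prefer(no_speck, with_speck) == no_speck

# But a microscopic edge in a higher class dominates any lower-class gain:
slight_risk = (-1e-9, 100.0,  0.0)
safe        = ( 0.0, -100.0, -1.0)
assert prefer(slight_risk, safe) == safe
```

The first assertion is the "ceteris paribus" point: nothing is weighted zero, dust specks still matter when the higher classes tie. The second is the bullet being discussed: an arbitrarily small high-severity advantage overrides an arbitrarily large low-severity one.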
Thanks for letting me know about this! Another thing I haven’t studied.
Reinforcement learning with rewards or punishments that can have an infinite magnitude would seem to make intuitive sense to me. The buck is then passed to reasoning about whether it’s ever reasonable to give a sample a post-finite reward. Say that there are pictures labeled as either “woman”, “girl”, “boy”, or “man”, and labeling a boy a man (or a man a boy) would get you a Small reward, while labeling a man a man would get you a Large reward, where Large is infinite with respect to Small. With a finite version, some “boy” vs “girl” weight could overcome a “man” vs “girl” weight, which might be undesirable behaviour (if you strictly care about gender discrimination with no tradeoff for age discrimination).
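The labeling example can be made concrete by representing the reward as a 2-tuple, with the Large (gender) component in a strictly higher severity class than the Small (age) component, compared lexicographically rather than summed. The label encoding and reward values below are illustrative assumptions, not part of the original example.

```python
# Each label decomposes into (gender, age); the labeling reward is a
# 2-tuple (gender_term, age_term), compared lexicographically so that
# no finite number of age errors can outweigh a single gender error.
LABELS = {"woman": ("f", "adult"), "girl": ("f", "child"),
          "man":   ("m", "adult"), "boy":  ("m", "child")}

def reward(true_label, predicted_label):
    t, p = LABELS[true_label], LABELS[predicted_label]
    gender_term = 0 if t[0] == p[0] else -1   # "Large" severity class
    age_term    = 0 if t[1] == p[1] else -1   # "Small" severity class
    return (gender_term, age_term)

# Mislabeling a man as a boy (age error only) strictly beats mislabeling
# him as a girl (gender error), regardless of any finite weighting:
assert reward("man", "boy") > reward("man", "girl")
# And a correct "man" label beats both:
assert reward("man", "man") > reward("man", "boy")
```

With ordinary scalar rewards, one would have to pick a finite penalty ratio, and a large enough accumulation of age errors could then trade off against a gender error; the tuple ordering rules that out by construction, which is the behaviour described above.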