I agree with you about 3, 4, and 5.
I don’t think that 1 is a valid objection. If you have non-Archimedean preferences, then it is rational to forget about your “weak” preferences (preferences that can be reversed by mixing in arbitrarily small probabilities of something else) so that you can free up computational resources to optimize for your “strong” preferences. Once you do that, your remaining preferences are Archimedean. See the “Doing without Continuity” section of http://lesswrong.com/lw/fu1/why_you_must_maximize_expected_utility/.
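To make the “weak” versus “strong” distinction concrete, here is a minimal sketch, assuming a two-level lexicographic agent. The outcome names (G, A, B), the utility numbers, and the function names are all made up for illustration and are not taken from the linked post. Each outcome gets a (strong, weak) utility pair, and lotteries are compared by their expected pairs in lexicographic order.

```python
from fractions import Fraction

# Hypothetical outcomes, each scored as (strong utility, weak utility).
UTILITY = {
    "G": (Fraction(1), Fraction(0)),  # better on the strong coordinate
    "A": (Fraction(0), Fraction(1)),  # better than B only on the weak coordinate
    "B": (Fraction(0), Fraction(0)),
}

def expected(lottery):
    """Componentwise expected utility of a lottery {outcome: probability}."""
    strong = sum(p * UTILITY[o][0] for o, p in lottery.items())
    weak = sum(p * UTILITY[o][1] for o, p in lottery.items())
    return (strong, weak)

def lex_prefers(x, y):
    """Strict preference under the full (non-Archimedean) lexicographic order."""
    return expected(x) > expected(y)  # Python compares tuples lexicographically

def strong_prefers(x, y):
    """Strict preference after forgetting the weak coordinate."""
    return expected(x)[0] > expected(y)[0]

def mix(eps):
    """B with a probability eps of G mixed in."""
    return {"B": 1 - eps, "G": eps}

sure_a = {"A": 1}
sure_b = {"B": 1}
tiny = Fraction(1, 10**9)

print(lex_prefers(sure_a, sure_b))        # A beats B, but only via the weak coordinate
print(lex_prefers(mix(tiny), sure_a))     # mixing a tiny chance of G into B reverses that
print(strong_prefers(sure_a, sure_b))     # the strong-only agent is indifferent here
print(strong_prefers(mix(tiny), sure_a))  # and agrees wherever the strong coordinate differs
```

In this toy setup the preference for A over B is “weak” in the sense above: any positive probability of G, however small, reverses it, which is exactly the failure of the Archimedean (continuity) axiom. Dropping the weak coordinate leaves an ordinary scalar expected-utility maximizer.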
I suppose it’s conceivable that 2 could be a reason to give up VNM-rationality, but I’m pretty skeptical. One possible alternative to addressing the corrigibility problem is to just get indirect normativity right in the first place, so you don’t need the programmer to be able to tamper with the value system later. Of course, you might not want to rely on that, but I suspect that if a solution to corrigibility involves abandoning VNM-rationality, you end up increasing the chance both that the value system isn’t what you wanted and that the corrigibility solution doesn’t work, since now the framework for the value system is probably an awful kludge instead of a utility function.
(By the way, I like the guide as a whole and I appreciate you putting it together. I feel kind of bad about my original comment consisting entirely of a minor quibble, but I have a habit of only commenting about things I disagree with.)
Thanks! FWIW, I don’t strongly expect that we’ll have to abandon VNM at this point, but I do still have a lot of uncertainty in the area.
I will be much more willing to discard objection 1 after resolving my confusion surrounding Pascal’s mugging, and I’ll be much more willing to discard objection 2 once I have a formalization of “corrigibility.” I still have some related confusions, and in my experience mathematical insights often reveal hidden assumptions in things I thought were watertight. Those two facts, plus a dash of outside view, lead me to a decently high credence (~15%, with high anticipated variance) that VNM won’t cut it.
What’s the definition of preferences used in boundedly-rational agent research?