[Question] Is Infra-Bayesianism Applicable to Value Learning?
My impression, as someone just starting to learn Infra-Bayesianism, is that it is fundamentally about caution: optimizing lower bounds on utility. (That is exactly how anything trying to overcome the Optimizer’s Curse should reason, especially in an environment already heavily optimized by humans, where utility has far more downside than upside uncertainty.) The utility score is therefore central both to the argmax-min process and to the relationship between sa-measures and a-measures.
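As a gloss (my own notation, not the sequence’s, and eliding the sa-measure/affine-functional machinery), I read the basic decision rule as a maximin of expected utility over a convex set of environment hypotheses:

$$\pi^* \;=\; \operatorname*{arg\,max}_{\pi}\ \min_{e \in \mathcal{E}}\ \mathbb{E}_{e,\pi}\!\left[U\right]$$

where $\mathcal{E}$ is the set of environments the agent still considers viable.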
However, this makes it not intuitively obvious how to apply Infra-Bayesianism to Value Learning, where the utility function mapping physical states of the environment to utility values is initially very uncertain, and is itself an important part of what the AI is trying to do (Infra-)Bayesian updates on. So, a question for people who already understand Infra-Bayesianism: is it in fact applicable to Value Learning? If so, does it apply in the following way: the (a priori unknown, likely quite complex, and possibly not fully computable/realizable to the agent) human effective utility function, which maps physical states to human (and thus also value-learner-agent) utility values, is treated as an (important) part of the environment, so that the min-over-environments (‘Murphy’) part of the argmax-min process includes making the most pessimistic still-viable assumptions about it?
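Spelled out in the same (my own, possibly mistaken) notation, the question is whether each hypothesis is effectively a pair of an environment $e$ and a candidate utility function $\tilde U$, with Murphy minimizing over both:

$$\pi^* \;=\; \operatorname*{arg\,max}_{\pi}\ \min_{(e,\tilde U) \in \mathcal{H}}\ \mathbb{E}_{e,\pi}\!\left[\tilde U\right]$$

where $\mathcal{H}$ is the set of (environment, utility-function) pairs still considered viable.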
As a follow-on question: if so, would cost-effectively reducing uncertainty in the human effective utility function (i.e. doing research on the alignment problem), so as to reduce Murphy’s future room to maneuver, be a convergent intermediate strategy for any value-learner agent using Infra-Bayesian reasoning? Or would such a system conclude that learning more about the human effective utility function is pointless, on the assumption that Murphy will always ensure it lives in the worst of all still-viable environments, so that decreasing uncertainty about utility would only ever move the upper bound, never the lower one?
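To make that concrete, here is a toy numerical sketch (entirely my own construction, with made-up numbers, not anything from the sequence) of how ruling out a utility-function hypothesis can raise the worst-case value of the best policy, which is what would make such research worthwhile under maximin reasoning:

```python
# Toy maximin value-of-information sketch (illustrative numbers only).
# Each "hypothesis" pairs an environment with a candidate utility function;
# a policy's score is its minimum expected utility over remaining hypotheses.

from itertools import product

# Hypothetical expected utilities E_{e,pi}[U_tilde] for two policies,
# two environments, and two candidate human utility functions.
expected_utility = {
    # (policy, environment, utility_hypothesis): expected utility
    ("cautious", "env_A", "U1"): 0.6, ("cautious", "env_A", "U2"): 0.5,
    ("cautious", "env_B", "U1"): 0.7, ("cautious", "env_B", "U2"): 0.4,
    ("bold",     "env_A", "U1"): 0.9, ("bold",     "env_A", "U2"): 0.1,
    ("bold",     "env_B", "U1"): 0.8, ("bold",     "env_B", "U2"): 0.2,
}

def maximin_value(policies, environments, utility_hypotheses):
    """Return (best policy, its worst-case value) under min over (env, U) pairs."""
    best = None
    for pi in policies:
        worst = min(expected_utility[(pi, e, u)]
                    for e, u in product(environments, utility_hypotheses))
        if best is None or worst > best[1]:
            best = (pi, worst)
    return best

policies = ["cautious", "bold"]
environments = ["env_A", "env_B"]

# Before value-learning research: both utility hypotheses are still viable.
print(maximin_value(policies, environments, ["U1", "U2"]))  # ('cautious', 0.4)

# After research rules out U2: the lower bound (not just the upper bound) rises,
# and the maximin-optimal policy can change.
print(maximin_value(policies, environments, ["U1"]))        # ('bold', 0.8)
```

In this toy setup, narrowing the set of viable utility hypotheses shrinks Murphy’s room to maneuver and raises the guaranteed (worst-case) utility, so the answer to the question seems to hinge on whether Infra-Bayesian updating actually treats utility hypotheses this way.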
[I’m trying to learn Infra-Bayesianism, but my math background is primarily from Theoretical Physics, so I’m more familiar with functional analysis via field-theory Feynman history integrals than with Pure Math concepts like Banach spaces. The main Infra-Bayesianism sequence’s Pure Math approach is thus rather heavy going for me.]