We want our value-learner AI to learn the same preference order over outcomes as humans. That requires its goal to be finding (or at least learning to act according to) a utility function as close as possible to some aggregate of ours (supposing humans actually had utility functions rather than a collection of cognitive biases), up to an arbitrary monotonically increasing mapping. We also want its preference order over probability distributions of outcomes to match ours, which requires a utility function that matches ours up to an increasing affine transformation (i.e. a positive rescaling plus a shift). So, once it has made good progress on its value learning, its utility function ought to make a lot of sense to us.
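To make the distinction between the two kinds of invariance concrete, here is a minimal sketch. The outcomes, the numbers, and the helper names are all hypothetical, picked only for illustration. It checks that an increasing affine map preserves both the ranking of outcomes and the ranking of lotteries, while a merely monotone map (cubing) preserves the outcome ranking but can flip a preference between lotteries.

```python
# Hypothetical utilities over three outcomes (illustrative numbers only).
u = {"bad": 0.0, "ok": 6.0, "great": 10.0}

# Two lotteries: a sure "ok", versus a 50/50 gamble between "bad" and "great".
sure_ok = {"ok": 1.0}
gamble = {"bad": 0.5, "great": 0.5}

def expected_utility(lottery, utility):
    """Probability-weighted average utility of a lottery."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

def outcome_ranking(utility):
    """Outcomes sorted from least to most preferred."""
    return sorted(utility, key=utility.get)

def prefers_sure_thing(utility):
    """True if the sure 'ok' beats the gamble in expected utility."""
    return expected_utility(sure_ok, utility) > expected_utility(gamble, utility)

# Increasing affine transform: positive scale plus shift.
affine = {o: 3.0 * v + 7.0 for o, v in u.items()}
# Monotonically increasing but non-affine transform (cubing, on values >= 0).
monotone = {o: v ** 3 for o, v in u.items()}

for name, util in [("original", u), ("affine", affine), ("monotone", monotone)]:
    print(name, outcome_ranking(util), prefers_sure_thing(util))
```

All three utility functions rank the outcomes identically, but only the original and the affine version agree on which lottery to take. This is the sense in which matching preferences over distributions pins the utility function down much more tightly than matching preferences over outcomes.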
Yeah, humans lack a unique utility function. I know what you mean informally; just don’t get bogged down mathematizing something we don’t have.
So, once it has made good progress on its value learning, its utility function ought to make a lot of sense to us.
Do you think this is a desideratum, or a guarantee?
I’ll say the key point plainly: suppose some policy is “the good policy.” Which utility function causes an agent to follow the good policy will be different depending on how the agent makes decisions. For a given “good policy,” the utility functions that produce that policy can look weird to humans if worst-case reasoning steps are sprinkled into the agent’s decision-making.
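A toy sketch of this point, under loud assumptions: the actions, states, prior, and utilities are made up, and the plain maximin rule below is only a crude stand-in for worst-case reasoning, not the actual infra-Bayesian formalism. With a utility function that looks sensible to a human, the expected-utility agent follows the "good policy" (here, sailing) but the worst-case agent does not. To get the worst-case agent onto the same policy, the utility function has to be changed in a way that looks odd from the human side.

```python
# Toy decision problem: two actions, two possible states of the world.
# All names and numbers here are hypothetical, chosen only for illustration.
PRIOR = {"calm": 0.7, "storm": 0.3}

def bayes_choice(utility, prior=PRIOR):
    """Pick the action with the highest expected utility under the prior."""
    return max(utility, key=lambda a: sum(prior[s] * utility[a][s] for s in prior))

def maximin_choice(utility):
    """Pick the action whose worst-case utility is highest (a crude 'Murphy')."""
    return max(utility, key=lambda a: min(utility[a].values()))

# A utility function that looks sensible to a human: sailing is great in calm
# weather and terrible in a storm; staying home is mediocre either way.
human_like = {
    "sail": {"calm": 10.0, "storm": 0.0},
    "stay": {"calm": 4.0, "storm": 4.0},
}

# Suppose "sail" is the good policy. For the worst-case agent to choose it,
# sailing in a storm must be rated above staying home, which looks weird.
adjusted = {
    "sail": {"calm": 10.0, "storm": 5.0},
    "stay": {"calm": 4.0, "storm": 4.0},
}

print(bayes_choice(human_like))    # sail (expected utility 7.0 vs 4.0)
print(maximin_choice(human_like))  # stay (worst case 4.0 vs 0.0)
print(maximin_choice(adjusted))    # sail (worst case 5.0 vs 4.0)
```

The same policy is produced by `human_like` under expected-utility maximization and by `adjusted` under maximin, so reading the agent's utility function off its behavior only makes sense relative to its decision rule.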
I take your point that an Infra-Bayesian system doesn’t make decisions the way a human does: it presumably doesn’t share our cognitive biases, and its pessimism element, ‘Murphy’, seems stronger than in most humans. I normally assume that if something I don’t understand about the environment is injecting noise into the outcome of my actions, the noise-related parts of the results won’t be well optimized, so they will be worse than what I could have achieved with full understanding. But even leaving things to chance, I may sometimes get some good luck along with the bad; I don’t generally assume that everything I can’t control will have literally the worst possible outcome. So I guess in Infra-Bayesian terms I’m assuming that Murphy is somewhat constrained by laws that I’m not yet aware of, and may never be aware of.
My take on Murphy is that it’s a systematization of the force of entropy trying to revert the environment to a thermodynamic equilibrium state, and of the common fact that the utility of that equilibrium state is usually pretty low. One of the flaws I see in Infra-Bayesianism is that there are sometimes (hard to reach but physically possible) states whose utility to me is even lower than that of thermodynamic equilibrium, where increasing entropy would actually help improve things. Examples include a policy that scores less than 20% on a 5-option multiple-choice quiz, and so does worse than random guessing, or a minefield left over after a war that is actually worse than a blasted wasteland. In a hellworld, randomly throwing monkey wrenches in the gears is a moderately effective strategy. In those unusual cases Infra-Bayesianism’s Murphy no longer aligns with the actual effects of entropy/Knightian uncertainty.
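The quiz example can be worked through numerically. This is a rough sketch under my own simplifying assumptions: a fraction `noise` of the answers gets replaced, either by uniform random guesses (standing in for entropy) or by guaranteed wrong answers (standing in for a fully adversarial Murphy). The function names and the specific accuracies are hypothetical.

```python
def score_with_random_noise(accuracy, noise, n_options=5):
    """Expected score when a fraction `noise` of answers become uniform guesses."""
    chance = 1.0 / n_options  # 20% on a 5-option quiz
    return (1.0 - noise) * accuracy + noise * chance

def score_with_murphy(accuracy, noise):
    """Expected score when a fraction `noise` of answers are forced to be wrong."""
    return (1.0 - noise) * accuracy

# A policy that does worse than chance, and one that does much better.
hellworld_policy = 0.05
good_policy = 0.90

for label, acc in [("hellworld", hellworld_policy), ("good", good_policy)]:
    for noise in (0.0, 0.5, 1.0):
        print(label, noise,
              round(score_with_random_noise(acc, noise), 3),
              round(score_with_murphy(acc, noise), 3))
```

For the good policy both kinds of interference lower the score, so worst-case pessimism at least points in the same direction as entropy. For the worse-than-chance policy, random noise pulls the score up toward 20% while the adversarial version pushes it further toward zero, which is the mismatch described above.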