Charlie Steiner answers Is Infra-Bayesianism Applicable to Value Learning?

Charlie Steiner 13 May 2023 5:27 UTC
2 points
0
Take this with a big grain of salt, but I’ll just tell you my impression.
Theoretically, I think it’s useful in that it tells us that a lot is possible even in non-realizable settings.
As a guide to practice, I think there’s plenty of room to do better. Ideally I’d want a representation that leverages composition of hypotheses with each other, and that natively does its reasoning in a non-extremizing way that makes more sense to humans (even if it’s mathematically equivalent to argmax/argmin on some function).
Presently I think it’s a mistake to identify a good infrabayesian-physicalist utility function with a good human-intuitive-decision-theory utility function—the utility-numbers that get assigned to states don’t have to make sense to humans. This is an obstacle to value learning approaches that care about having a feedback loop of human reflection on the AI’s value learning process, which I think is important.
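To make the last point concrete, here is a minimal toy sketch, not actual infra-Bayesianism: the worst-case step is modelled as a plain minimum over two candidate environments, and the action names, utility numbers, and prior are all invented for illustration. It shows that the same utility table can yield different choices under expected-utility and worst-case reasoning, so recovering a given policy under worst-case reasoning can require utility numbers a human would not recognise as sensible.

```python
from typing import Dict, List

# action -> utility in each candidate environment (hypothetical numbers)
Utilities = Dict[str, List[float]]


def expected_utility_choice(u: Utilities, prior: List[float]) -> str:
    """Pick the action with the highest prior-weighted average utility."""
    return max(u, key=lambda a: sum(p * x for p, x in zip(prior, u[a])))


def maximin_choice(u: Utilities) -> str:
    """Pick the action whose worst-case utility across environments is highest."""
    return max(u, key=lambda a: min(u[a]))


# A "human-intuitive" utility table: safe is a solid 5, risky is 10 or 1.
human_u: Utilities = {"safe": [5.0, 5.0], "risky": [10.0, 1.0]}
prior = [0.5, 0.5]

bayes_pick = expected_utility_choice(human_u, prior)  # risky: 5.5 > 5.0
maximin_pick = maximin_choice(human_u)                # safe: worst case 5 > 1

# Suppose "risky" is the good policy. To get the worst-case agent to follow it,
# "safe" has to be scored below risky's worst case, e.g. at 0 instead of 5.
adjusted_u: Utilities = {"safe": [0.0, 0.0], "risky": [10.0, 1.0]}
adjusted_pick = maximin_choice(adjusted_u)            # risky: worst case 1 > 0

print(bayes_pick, maximin_pick, adjusted_pick)
```

In this toy, the adjusted table reproduces the intended policy for the worst-case agent, but the value it assigns to "safe" no longer matches the human-intuitive one.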
- Roger Dearnaley 14 May 2023 11:02 UTC
  1 point
  0
  Parent
  We want our value-learner AI to learn to have the same preference order over outcomes as humans. That requires its goal to be to find (or at least learn to act according to) a utility function as close as possible to some aggregate of ours (if humans actually had utility functions rather than a collection of cognitive biases), up to an arbitrary monotonically increasing mapping. We also want its preference order over probability distributions of outcomes to match ours, which requires it to find a utility function that matches ours up to a positive affine transformation (i.e. a positive rescaling plus a shift). So, once it has made good progress on its value learning, its utility function ought to make a lot of sense to us.
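
  As a sketch of the two invariance claims (the symbols $u$, $f$, $a$, $b$, $p$, $q$ are notation introduced here, not from the thread): for outcomes $x, y$ and any strictly increasing $f$,

  $$u(x) \ge u(y) \iff f(u(x)) \ge f(u(y)),$$

  so the preference order over outcomes is unchanged. For lotteries $p, q$ over outcomes, and constants $a > 0$ and $b$,

  $$\mathbb{E}_p[a\,u + b] = a\,\mathbb{E}_p[u] + b, \qquad \text{hence} \qquad \mathbb{E}_p[u] \ge \mathbb{E}_q[u] \iff \mathbb{E}_p[a\,u + b] \ge \mathbb{E}_q[a\,u + b].$$

  A nonlinear increasing $f$ can break the second property. For example, take a 50/50 lottery between utilities $0$ and $4$ (expected utility $2$) versus a sure utility of $1.5$: under $u$ the lottery is preferred, but under $f = \sqrt{\cdot}$ the lottery is worth $1$ and the sure thing about $1.22$, so the preference flips.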
  - Charlie Steiner 14 May 2023 13:43 UTC
    2 points
    0
    Parent
    
    > if humans actually had utility functions
    
    Yeah, humans lack a unique utility function. I know what you mean informally, just don’t get bogged down mathematizing something we don’t have.
    
    > So, once it has made good progress on its value learning, its utility function ought to make a lot of sense to us.
    
    Do you think this is a desideratum, or a guarantee?
    
    I’ll say the key point plainly: suppose some policy is “the good policy.” Which utility function causes an agent to follow the good policy will be different depending on how the agent makes decisions. For a given “good policy,” the utility functions that produce that policy can look weird to humans if worst-case reasoning steps are sprinkled into the agent’s decision-making.
    - Roger Dearnaley 15 May 2023 0:10 UTC
      1 point
      0
      Parent
      I take your point that the way an Infra-Bayesian system makes decisions isn’t the same as a human’s: it presumably doesn’t share our cognitive biases, and its pessimism element, ‘Murphy’, seems stronger than in most humans. I normally assume that if there’s something I don’t understand about the environment that’s injecting noise into the outcome of my actions, the noise-related parts of the results aren’t going to be well-optimized, so they’re going to be worse than I could have achieved with full understanding. But even leaving things to chance, I may sometimes get some good luck along with the bad; I don’t generally assume that everything I can’t control will have literally the worst possible outcome. So I guess in Infra-Bayesian terms I’m assuming that Murphy is somewhat constrained by laws that I’m not yet aware of, and may never be aware of.
      My take on Murphy is that it’s a systematization of the force of entropy trying to revert the environment to a thermodynamic equilibrium state, and of the common fact that the utility of that equilibrium state is usually pretty low. One of the flaws I see in Infra-Bayesianism is that there are sometimes (hard to reach but physically possible) states whose utility to me is even lower than that of thermodynamic equilibrium, where increasing entropy would actually help improve things. Examples include a policy that scores less than 20% on a 5-option multiple choice quiz, and so does worse than random guessing, or a minefield left over after a war that is actually worse than a blasted wasteland. In a hellworld, randomly throwing monkey wrenches in the gears is a moderately effective strategy. In those unusual cases Infra-Bayesianism’s Murphy no longer aligns with the actual effects of entropy/Knightian uncertainty.
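
      A rough way to put numbers on the quiz example (the symbols $a_0$ and $\epsilon$ are introduced here for illustration): suppose a policy has accuracy $a_0$ on 5-option questions, and with probability $\epsilon$ its answer is replaced by a uniformly random guess. Its accuracy becomes

      $$a(\epsilon) = (1 - \epsilon)\,a_0 + \frac{\epsilon}{5}, \qquad \frac{da}{d\epsilon} = \frac{1}{5} - a_0,$$

      which increases with $\epsilon$ exactly when $a_0 < 1/5$. So for a policy scoring below 20%, injecting noise helps, whereas a worst-case adversary would be modelled as pushing accuracy down further.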