In the real world, these domains aren’t the sort of thing where you get a perfect simulation. The differences will strongly add up when you strongly train an AI to maximize <this thing which was a good predictor of diamonds in the more restricted domain of <the domain, as viewed by the AI that was trained to predict the environment> >.
We are now far from your original objection: “I don’t even know how to make an agent with a clear utility function module.”
Imperfect simulations work just fine for humans and various DL agents. So for your argument to be correct, you now need to explain how humans can still think and steer the future with imperfect world models; once you do that, you will understand how an AI can as well.
We’re not far from there. There’s inferential distance here. Translating my original statement, I’d say: the closest thing to the “utility function module” in the scenario you’re describing here with MuZero is the concept of predicted diamond and the AI it’s inside of. But then you train another AI to pursue that. And I’m saying, I don’t trust that that new trained AI actually maximizes diamond; and more to the point, I don’t have any clarity on how the goals of the newly trained AI sit inside it, operate inside it, direct its behavior, etc. In particular, I don’t understand it well enough to have any justified confidence it’ll robustly pursue diamond.
So to be clear, there is just one AI, built out of several components: a world model, a planning engine, and a utility function. The world model is learned, but assumed to be learned perfectly (resulting in a functional equivalent of the actual sim physics). The planning engine can also learn action/value estimators for efficiency, but that is not required. The utility function is not learned at all, and is manually coded. So the learning components here cannot possibly cause any problems.
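To make the decomposition concrete, here is a minimal illustrative sketch in Python of an agent split into those three components. The names (WorldModel, plan, utility, sim_step) are placeholders of my own, not any real library’s API, and the planner is a simple random-shooting search standing in for a real planning engine:

```python
import random


def utility(state):
    # Hand-coded, not learned: count diamonds in the (simulated) state.
    return state.count("diamond")


class WorldModel:
    """Learned transition model; here assumed learned perfectly, so it just
    wraps the actual sim physics."""

    def __init__(self, sim_step):
        self.sim_step = sim_step  # in a real system this would be a trained network

    def predict(self, state, action):
        return self.sim_step(state, action)


def plan(world_model, state, actions, horizon=3, rollouts=200):
    """Compute-bound planner: random-shooting search over action sequences,
    scoring each predicted end state with the hand-coded utility function."""
    best_first_action, best_value = actions[0], float("-inf")
    for _ in range(rollouts):
        seq = [random.choice(actions) for _ in range(horizon)]
        s = state
        for a in seq:
            s = world_model.predict(s, a)
        value = utility(s)
        if value > best_value:
            best_first_action, best_value = seq[0], value
    return best_first_action


# Toy usage: states are tuples of items; "mine" adds a diamond, "wait" does nothing.
def sim_step(state, action):
    return state + ("diamond",) if action == "mine" else state


world_model = WorldModel(sim_step)
print(plan(world_model, state=(), actions=["mine", "wait"]))  # almost surely "mine"
```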
Of course that’s just in a sim.
Translating the concept to the real world, there are now 3 possible sources of ‘errors’:

1. imperfection of the learned world model
2. imperfect planning (compute bound)
3. imperfect utility function
My main claim is that approximation errors in 1 and 2 (which are inevitable) don’t necessarily bias strong optimization towards the wrong utility function (and really, they can’t).