So I said consequentialist mostly maps to model-based RL because “choosing actions based on their anticipated consequences” is just a literal plain-English description of how model-based RL works: the model-based predictive planning is the implementation of “anticipating consequences”.
It’s more complicated for model-free RL, in part because, with enough diverse training data and regularization, various forms of consequentialist/planning systems could potentially develop as viable low-complexity solutions.
But effective consequentialist planning requires enough compute and recursion depth that it’s outside the scope of many simpler model-free systems (I’m thinking primarily of the earlier DeepMind Atari agents). Instead they often seem to develop a collection of clever heuristics that work well in most situations, without the ability to explicitly evaluate the long-term consequences of specific actions in novel situations; thus more deontological.
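To make that contrast concrete, here’s a minimal toy sketch (the `model.predict`, `reward_fn`, and `q_values` interfaces are hypothetical placeholders, not any particular DeepMind agent):

```python
def choose_action_model_based(state, model, reward_fn, actions, depth=3):
    """Pick the action whose simulated consequences look best: explicit lookahead
    through a learned (or given) transition model."""
    def lookahead_value(s, d):
        if d == 0:
            return 0.0
        return max(reward_fn(s, a) + lookahead_value(model.predict(s, a), d - 1)
                   for a in actions)
    return max(actions,
               key=lambda a: reward_fn(state, a)
               + lookahead_value(model.predict(state, a), depth - 1))


def choose_action_model_free(state, q_values, actions):
    """Pick the action with the highest cached value estimate: no lookahead,
    no model queries at decision time."""
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))
```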
Hmm, I would say that DQN “chooses actions based on their anticipated consequences” in that the Q-function incorporates an estimate of anticipated consequences. (Especially with a discount factor close to 1, i.e., little discounting of future rewards.)
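Concretely, the Q-learning target already folds anticipated consequences into the value estimate; a minimal tabular sketch (toy dict-based `Q`, hypothetical transition tuple `(s, a, r, s_next)`):

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step. The bootstrapped target
    r + gamma * max_a' Q(s', a') is an estimate of anticipated consequences:
    immediate reward plus the predicted value of everything that follows.
    The closer gamma is to 1, the more long-run consequences dominate."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    target = r + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return Q
```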
I’m happy to say that model-based RL might be generically better at anticipating consequences (especially in novel circumstances) than model-free RL. Neither is perfect though.
DQN has an implicit plan encoded in the Q-function: in state S1 action A1 seems good, and that brings us to state S2 where action A2 seems good, and so on. All of that together is (IMO) an implicit plan, and such a plan can involve short-term sacrifices for longer-term benefit.
Whereas model-based RL with tree search (for example) has an explicit plan: at timestep T, it has an explicit representation of what it’s planning to do at timesteps T+1, T+2, ….
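Roughly what I mean by implicit vs. explicit, as a toy sketch (hypothetical `model`, `Q`, and `reward_fn`; deterministic transitions for simplicity):

```python
def implicit_plan(Q, model, s, actions, horizon):
    """The 'plan' latent in a Q-function: acting greedily from s traces out a
    trajectory, but no future action is explicitly represented before it's taken."""
    trajectory = []
    for _ in range(horizon):
        a = max(actions, key=lambda a: Q.get((s, a), 0.0))
        trajectory.append(a)
        s = model.predict(s, a)  # the model is only used here to *display* the implicit plan
    return trajectory


def explicit_plan(model, reward_fn, s, actions, horizon):
    """Depth-limited exhaustive search: at timestep T the agent holds an explicit
    representation of what it intends to do at T+1, T+2, ..."""
    if horizon == 0:
        return [], 0.0
    best_plan, best_value = None, float("-inf")
    for a in actions:
        s_next = model.predict(s, a)
        tail, tail_value = explicit_plan(model, reward_fn, s_next, actions, horizon - 1)
        value = reward_fn(s, a) + tail_value
        if value > best_value:
            best_plan, best_value = [a] + tail, value
    return best_plan, best_value
```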
Humans are able to make explicit plans too, although human planning doesn’t look like one-timestep-at-a-time.
Sure, you can consider the TD-style unrolling in model-free RL a sort of implicit planning, but it’s not really consequentialist in most situations, as it can’t dynamically explore new relevant expansions of the state tree the way planning can. Or you could consider planning as a dynamic few-shot extension of fast learning/updating of the decision function.
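Something like this toy sketch of that second framing (hypothetical `model` and `reward_fn`, tabular `Q`; a Dyna-style decision-time rollout, not any specific published agent):

```python
import random

def plan_then_act(Q, model, reward_fn, s, actions,
                  rollouts=16, depth=5, gamma=0.99, alpha=0.5):
    """Decision-time planning as a 'few-shot' local update to the cached decision
    function: expand fresh parts of the state tree from the current state using the
    model, then blend the simulated returns back into Q before acting."""
    for _ in range(rollouts):
        a0 = random.choice(actions)
        s_sim = model.predict(s, a0)
        ret, discount = reward_fn(s, a0), gamma
        for _ in range(depth - 1):
            a = max(actions, key=lambda a: Q.get((s_sim, a), 0.0))
            ret += discount * reward_fn(s_sim, a)
            s_sim = model.predict(s_sim, a)
            discount *= gamma
        ret += discount * max(Q.get((s_sim, a), 0.0) for a in actions)  # bootstrap at the leaf
        Q[(s, a0)] = Q.get((s, a0), 0.0) + alpha * (ret - Q.get((s, a0), 0.0))
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```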
Human planning is sometimes explicitly timestep-by-timestep (when playing certain board games, for example), when that is what efficient planning demands; but in the more general case human planning uses more complex approximations that jump freely across spatio-temporal approximation hierarchies.