Charlie Steiner comments on The Computational Anatomy of Human Values

Charlie Steiner 8 Apr 2023 5:51 UTC
LW: 2 AF: 1
The world model is learnt mostly by unsupervised predictive learning and so is somewhat orthogonal to the specific goal. Of course in practice in a continual learning setting, what you do and pay attention to (which is affected by your goal) will affect the data input to the unsupervised learning process?
afaict, a big fraction of evolution’s instructions for humans (which made sense in the ancestral environment) are encoded as what you pay attention to. Babies fixate on faces, not because they have a practical need to track faces at 1 week old, but because having a detailed model of other humans will be valuable later. Young children being curious about animals is a human universal. Etc.
Patterns of behavior (some of which I’d include in my goals) encoded in my model can act in a way that’s somewhere between unconscious and too obvious to question—you might end up doing things not because you have visceral feelings about the different options, but simply because your model is so much better at some of the options that the other options never even get considered.
- beren 8 Apr 2023 11:07 UTC
  LW: 2 AF: 1
  afaict, a big fraction of evolution’s instructions for humans (which made sense in the ancestral environment) are encoded as what you pay attention to. Babies fixate on faces, not because they have a practical need to track faces at 1 week old, but because having a detailed model of other humans will be valuable later. Young children being curious about animals is a human universal. Etc.
  
  This is true, but I don’t think it is super important for this argument. Evolution definitely encodes inductive biases towards learning about relevant things, which ML architectures lack, but this is primarily to speed up learning and to handle limited initial data. Most of the things evolution focuses on, such as faces, are natural abstractions anyway and would be learnt by pure unsupervised learning systems.
  Patterns of behavior (some of which I’d include in my goals) encoded in my model can act in a way that’s somewhere between unconscious and too obvious to question—you might end up doing things not because you have visceral feelings about the different options, but simply because your model is so much better at some of the options that the other options never even get considered.
  Yes, there are also a number of ways to short-circuit model evaluation entirely. The classic one is having a habit policy which is effectively your action prior. There are also cases where you just follow the default model-free policy and only in cases where you are even more uncertain do you actually deploy the full model-based evaluation capacities that you have.
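
As a rough illustration of the two short-circuits described above, here is a minimal sketch, not anything taken from the post. It assumes a toy agent with a dictionary of habit probabilities (the action prior), cached model-free values, and a costly model-based evaluator. The names `select_action`, `prior_floor`, and `uncertainty_threshold` are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, Tuple


def select_action(
    habit_prior: Dict[str, float],
    model_free_value: Dict[str, float],
    model_based_value: Callable[[str], float],
    uncertainty: float,
    prior_floor: float = 0.05,
    uncertainty_threshold: float = 0.5,
) -> Tuple[str, str]:
    """Pick an action and report which evaluation route produced it.

    All thresholds are illustrative assumptions, not values from the post.
    """
    # Short-circuit 1: the habit policy acts as an action prior.
    # Options with very low prior mass are never evaluated at all.
    candidates = [a for a, p in habit_prior.items() if p >= prior_floor]
    if not candidates:
        raise ValueError("habit prior pruned every action")

    # Short-circuit 2: when uncertainty is low, follow the default
    # model-free policy using cached values, with no planning.
    if uncertainty <= uncertainty_threshold:
        best = max(candidates, key=lambda a: model_free_value[a])
        return best, "model_free"

    # Only under high uncertainty is the full (expensive)
    # model-based evaluation run over the surviving candidates.
    best = max(candidates, key=model_based_value)
    return best, "model_based"
```

In this sketch, an action with a tiny habit prior is never chosen even if both evaluators would score it highly, which is one way to picture options that "never even get considered".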